
[feature] Separate task to purge old cached subtitles for media that has been (re)moved #35

Closed
@sjorge

Description

After reorganizing some media, I noticed the subtitles directory had nearly doubled in size. After looking at the code for a bit, it seems that old cached subtitles are never cleaned up.

Without this plugin, when subtitles are extracted on demand, one could simply purge files older than X days, since there does not seem to be any reference to them stored in the database (that I could find). With this plugin, that simple approach doesn't work, and old files build up without ever being removed.
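For comparison, the simple age-based purge that works without the plugin could look like the sketch below. It is only a minimal sketch: `AgeBasedPurge.PurgeOlderThan` is a hypothetical helper, and using a file's last write time as its "age" is an assumption.

```csharp
using System;
using System.IO;

static class AgeBasedPurge
{
    // Hypothetical helper: deletes every file under cacheRoot (recursively) whose
    // last write time is older than maxAge relative to nowUtc.
    // Returns the number of files deleted.
    public static int PurgeOlderThan(string cacheRoot, TimeSpan maxAge, DateTime nowUtc)
    {
        DateTime cutoff = nowUtc - maxAge;
        int deleted = 0;

        // GetFiles materializes the whole list first, so deleting inside the loop
        // does not interfere with the directory enumeration.
        foreach (string file in Directory.GetFiles(cacheRoot, "*", SearchOption.AllDirectories))
        {
            if (File.GetLastWriteTimeUtc(file) < cutoff)
            {
                File.Delete(file);
                deleted++;
            }
        }

        return deleted;
    }
}
```

With the plugin extracting subtitles ahead of time, an age cutoff cannot tell a stale file from a valid one that was simply extracted long ago, so it would also delete still-valid files that the plugin would then extract again.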

I think a separate task that purges cached files for removed media would be a good addition.

One way this task could work is to build a list of all paths that should exist (using the same query plus calls to GetSubtitleCachePath()), then enumerate all files in data/subtitles/ and delete any file that is not on the list.
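The comparison step boils down to a set difference between the expected cache paths and the files actually on disk. A minimal sketch with the database lookup left out; `SubtitleCacheOrphans.FindOrphans` is a hypothetical helper, and the ordinal (case-sensitive) path comparison is an assumption that matches a Linux filesystem.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SubtitleCacheOrphans
{
    // Hypothetical helper: returns every on-disk cache file that is not in the
    // set of expected paths. Both inputs must use the same path format (e.g. both
    // built from the same cache root) for the comparison to be meaningful.
    public static List<string> FindOrphans(IEnumerable<string> expectedPaths, IEnumerable<string> filesOnDisk)
    {
        // A HashSet gives constant-time lookups; List.Contains would scan the
        // whole list once per file on disk.
        var expected = new HashSet<string>(expectedPaths, StringComparer.Ordinal);
        return filesOnDisk.Where(file => !expected.Contains(file)).ToList();
    }
}
```

The caller would compute `expectedPaths` with `GetSubtitleCachePath()` and `filesOnDisk` with `Directory.GetFiles(subtitleCachePath, "*", SearchOption.AllDirectories)`, then delete each returned path. Returning the list instead of deleting inside the helper also makes a dry run (just print the list) trivial.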

I have not touched C# in over 10 years (closer to 15), but I did manage to write a proof of concept (I first tried it in Python but ran into issues):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using Microsoft.Data.Sqlite;

namespace jf_subtitle_cache_cleaner;

class Program
{
    // https://github.com/jellyfin/jellyfin/blob/f7227c6ca184baecac9756c8123f9a3cfa075b5b/MediaBrowser.Common/Extensions/BaseExtensions.cs#L30
    private static Guid GetMD5(string str)
    {
        return new Guid(MD5.HashData(Encoding.Unicode.GetBytes(str)));
    }
    // https://github.com/jellyfin/jellyfin/blob/f7227c6ca184baecac9756c8123f9a3cfa075b5b/MediaBrowser.MediaEncoding/Subtitles/SubtitleEncoder.cs#L831
    private static string GetSubtitleCachePath(string subtitleCachePath, string path, int streamIndex, string outputSubtitleExtension)
    {
        var ticksParam = string.Empty;
        var date = File.GetLastWriteTimeUtc(path);
        ReadOnlySpan<char> filename = GetMD5(path + "_" + streamIndex.ToString(CultureInfo.InvariantCulture) + "_" + date.Ticks.ToString(CultureInfo.InvariantCulture) + ticksParam) + outputSubtitleExtension;
        var prefix = filename.Slice(0, 1);

        return Path.Join(subtitleCachePath, prefix, filename);
    }

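    // ASS/SSA streams keep their own extension; everything else is cached as SRT.
    private static string GetSubtitleExtension(string codec)
    {
        if (string.Equals(codec, "ass", StringComparison.OrdinalIgnoreCase)
            || string.Equals(codec, "ssa", StringComparison.OrdinalIgnoreCase))
        {
            return "." + codec;
        }

        return ".srt";
    }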

    // https://github.com/jellyfin/jellyfin/blob/f7227c6ca184baecac9756c8123f9a3cfa075b5b/MediaBrowser.Model/Entities/MediaStream.cs#L688
    private static bool IsTextFormat(string format)
    {
        string codec = format ?? string.Empty;

        // microdvd and dvdsub/vobsub share the ".sub" file extension, but microdvd is text-based.

        return codec.Contains("microdvd", StringComparison.OrdinalIgnoreCase)
               || (!codec.Contains("pgs", StringComparison.OrdinalIgnoreCase)
                   && !codec.Contains("dvdsub", StringComparison.OrdinalIgnoreCase)
                   && !codec.Contains("dvbsub", StringComparison.OrdinalIgnoreCase)
                   && !string.Equals(codec, "sup", StringComparison.OrdinalIgnoreCase)
                   && !string.Equals(codec, "sub", StringComparison.OrdinalIgnoreCase));
    }

    static void Main(string[] args)
    {
        var dataPath = "/var/lib/jellyfin/data";
        var dbPath = Path.Join(dataPath, "library.db");
        var subtitleCachePath = Path.Join(dataPath, "subtitles");

        // ItemId -> media path (null if the item has no path), so each item is queried only once.
        var pathMappingCache = new Dictionary<Guid, string?>();
        // HashSet rather than List: Contains() becomes a hash lookup instead of a linear scan.
        var whitelistSubtitleCacheFiles = new HashSet<string>(StringComparer.Ordinal);
        var purgeCount = 0;
        var keepCount = 0;

        Console.WriteLine("Opening " + dbPath + " ...");
        using (var connection = new SqliteConnection("Data Source=" + dbPath + ";Mode=ReadOnly;"))
        {
            connection.Open();

            using var commandMediaStream = connection.CreateCommand();
            commandMediaStream.CommandText =
            @"
                SELECT *
                FROM mediastreams
                WHERE StreamType = @streamType AND IsExternal = @isExternal;
            ";
            commandMediaStream.Parameters.AddWithValue("@streamType", "Subtitle");
            commandMediaStream.Parameters.AddWithValue("@isExternal", 0);

            Console.WriteLine("Looking up subtitle mediastreams ...");
            using (var readerMediaStream = commandMediaStream.ExecuteReader())
            {
                while (readerMediaStream.Read())
                {
                    // Column 0 of mediastreams is ItemId.
                    var guid = readerMediaStream.GetGuid(0);

                    // GetString() throws on NULL, so treat a missing codec as an empty string.
                    var codecOrdinal = readerMediaStream.GetOrdinal("Codec");
                    var codec = readerMediaStream.IsDBNull(codecOrdinal) ? string.Empty : readerMediaStream.GetString(codecOrdinal);
                    if (!IsTextFormat(codec))
                    {
                        continue;
                    }

                    if (!pathMappingCache.ContainsKey(guid))
                    {
                        using var commandBaseItem = connection.CreateCommand();
                        commandBaseItem.CommandText =
                        @"
                            SELECT *
                            FROM TypedBaseItems WHERE guid = @guid;
                        ";
                        commandBaseItem.Parameters.AddWithValue("@guid", guid.ToByteArray());

                        string? itemPath = null;
                        using (var readerBaseItem = commandBaseItem.ExecuteReader())
                        {
                            if (readerBaseItem.Read())
                            {
                                var pathOrdinal = readerBaseItem.GetOrdinal("Path");
                                if (!readerBaseItem.IsDBNull(pathOrdinal))
                                {
                                    itemPath = readerBaseItem.GetString(pathOrdinal);
                                }
                            }
                        }

                        // Cache misses as well, so streams without an item are not queried again.
                        pathMappingCache[guid] = itemPath;
                    }

                    var path = pathMappingCache[guid];
                    if (path == null)
                    {
                        // No item (or no path) for this stream: nothing to whitelist.
                        continue;
                    }

                    var streamIndex = readerMediaStream.GetInt32(readerMediaStream.GetOrdinal("StreamIndex"));
                    var extension = GetSubtitleExtension(codec);

                    whitelistSubtitleCacheFiles.Add(GetSubtitleCachePath(subtitleCachePath, path, streamIndex, extension));
                }
            }

            Console.WriteLine("Detected " + whitelistSubtitleCacheFiles.Count + " valid subtitle cache paths.");
            foreach (string subtitleFile in Directory.GetFiles(subtitleCachePath, "*", SearchOption.AllDirectories))
            {
                if (whitelistSubtitleCacheFiles.Contains(subtitleFile))
                {
                    keepCount += 1;
                }
                else
                {
                    purgeCount += 1;
                    File.Delete(subtitleFile);
                }
            }

            Console.WriteLine("Subtitle cache: purged=" + purgeCount + ", kept=" + keepCount);
        }
    }
}

I tested this by first making sure the extraction task found no new subtitles to extract and then running this tool: no files were purged. I then dropped a .ignore file in a series folder, ran "Scan All Libraries" twice, and ran the tool again; it purged the files belonging to those episodes. Removing the .ignore file and scanning again made the files reappear, as expected given how the filenames are calculated.
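The reappearance is expected because the cache filename is a pure function of the media path, the stream index and the media file's last-write ticks. The sketch below restates the `GetSubtitleCachePath()` derivation from the proof of concept, but takes the ticks as a parameter instead of reading them with `File.GetLastWriteTimeUtc`, which makes that property easy to check. `SubtitleCacheName.RelativePath` is a hypothetical name, and the returned path is relative to the subtitle cache directory.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class SubtitleCacheName
{
    // Same key as GetSubtitleCachePath() in the proof of concept:
    // "<path>_<streamIndex>_<lastWriteTicks>", hashed with MD5 over UTF-16 bytes.
    public static string RelativePath(string mediaPath, int streamIndex, long lastWriteTicks, string extension)
    {
        string key = mediaPath + "_"
            + streamIndex.ToString(CultureInfo.InvariantCulture) + "_"
            + lastWriteTicks.ToString(CultureInfo.InvariantCulture);

        // The 16-byte hash is wrapped in a Guid and formatted with the default "D" format.
        string fileName = new Guid(MD5.HashData(Encoding.Unicode.GetBytes(key))).ToString() + extension;

        // Files are bucketed into a subdirectory named after the first character.
        return Path.Join(fileName.Substring(0, 1), fileName);
    }
}
```

As long as the media file keeps the same path and modification time, a rescan followed by re-extraction lands on exactly the same name. Moving or modifying the file changes the key and therefore the name, which is also why reorganizing media leaves the old cache files orphaned.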

This just copies some bits from around the Jellyfin codebase into a simple program that talks to the database directly (eek, bad, I know). It is much more performant than I expected: I was expecting it to take an hour or more, but it finishes in under 2 minutes.

If there is interest, I'm willing to give implementing this a go myself. However, I am not sure whether there is a better approach.
