Sorting/isolating files with series numbers in middle of file names #294

Open
truth1ness opened this Issue Apr 2, 2015 · 11 comments

@truth1ness

A problem I'm running into is files that are part of a series where the series number is in the middle of the name.

For example:

Title Name part 1 - Author Album etc.ext
Title Name part 2 - Author Album etc.ext
Title Name part 3 - Author Album etc.ext

These clearly aren't duplicates but are hard to filter out without a near 100% hardness level. There are a few of these scattered throughout the list of matches that are hard to find manually. What would really help is one of the following sort options to isolate these:

  1. Column to sort any files with numbers in their name together
    This is the simplest method as it merely separates any file with numerical content so they can be easily combed through manually.

  2. Column to sort any files where dupes differ only in number
    A slightly more targeted approach: instead of sorting every file with numbers, sort files where the number is the only word that differs within the duplicate group. This would quickly separate file series from regular dupes.

You could also include logic to capture other number formats (1 2 3, one two three, I II III, etc).

Numbered file series are pretty common and people may place the number anywhere. Being able to sort these together using either of these two methods would save a ton of manual searching.
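To make the second option concrete, here is a minimal sketch of grouping names that differ only in their digits. It is not dupeGuru code: `series_key` and `group_series` are hypothetical helpers, and the sketch only handles Arabic numerals, not spelled-out or Roman numbers.

```python
import re
from collections import defaultdict

def series_key(filename):
    # Hypothetical helper: replace every run of digits with a placeholder
    # so names that differ only by number share the same key.
    return re.sub(r"\d+", "#", filename.lower())

def group_series(filenames):
    groups = defaultdict(list)
    for name in filenames:
        # Only names that contain at least one digit can belong to a series.
        if re.search(r"\d", name):
            groups[series_key(name)].append(name)
    # Keep groups with at least two distinct names; identical names are
    # ordinary duplicate candidates, not a series.
    return {key: members for key, members in groups.items()
            if len(set(members)) > 1}

names = [
    "Title Name part 1 - Author Album etc.ext",
    "Title Name part 2 - Author Album etc.ext",
    "Title Name part 3 - Author Album etc.ext",
    "Other Song - Author Album etc.ext",
]
series = group_series(names)
for key, members in series.items():
    print(key, members)
```

With the sample names above, the three "part N" files end up in one group and the unnumbered file is left out.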

@truth1ness truth1ness changed the title from Series number in middle to Series numbers in middle of file names Apr 2, 2015
@truth1ness truth1ness changed the title from Series numbers in middle of file names to Sorting/isolating files with series numbers in middle of file names Apr 2, 2015
@hsoft hsoft self-assigned this Apr 3, 2015
@hsoft
Owner
hsoft commented Apr 3, 2015

I was about to suggest enabling "Use regular expressions when filtering", which would have allowed you to use the filter box with something like part \d+. That would have been a solution similar to your first proposition, working right out of the box.

However, as I tried it myself, I noticed that the regex feature seems broken.

I'll check this out soon and see if there's a bug that needs fixing. But normally, that should help you solve your problem.

@truth1ness

Ah, sweet, thanks. Let me know when it's updated.

@hsoft
Owner
hsoft commented Apr 5, 2015

The Regexp filtering option is fine, it's just that its activation is reversed under Windows and Linux. So until the next bugfix release, you have to uncheck the "Use regular expressions when filtering" checkbox to enable the feature.

So my first recommendation still stands: If you enable regex filtering, you could search for something like parts \d+ to quickly locate the dupes you're looking for.
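As a rough illustration of what such a filter does, the sketch below applies `part \d+` to a few sample paths. It assumes the filter behaves like a case-insensitive Python `re.search` on each path. The folder `/music` is made up, and the singular `part` is used because that is how the sample names in the original report are spelled.

```python
import re

# Assumption: the filter box behaves like a case-insensitive re.search.
pattern = re.compile(r"part \d+", re.IGNORECASE)

# Made-up paths built from the sample names in the report.
paths = [
    "/music/Title Name part 1 - Author Album etc.ext",
    "/music/Title Name part 2 - Author Album etc.ext",
    "/music/Another Title - Author Album etc.ext",
]

matches = [p for p in paths if pattern.search(p)]
for p in matches:
    print(p)
```

Only the two "part N" paths are kept; the third has no "part" followed by a number.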

@hsoft hsoft removed their assignment Apr 5, 2015
@truth1ness

Ah, ok. I'm on Mac so it should be working like normal. It seems to be working with parts \d+ but for some reason \d+ doesn't work on its own. I want to use \d+ alone because some multi-part files simply have a number and don't actually say "part" or "parts". But when I type \d+ by itself it keeps showing all results, with or without digits, as if the filter box were empty. Is this a bug or am I doing something wrong?

Also, is there any way to do boolean searches with the filter? I tried NOT and - before a term but it didn't exclude the term.

@hsoft
Owner
hsoft commented Apr 5, 2015

The search is done on the whole path, so if there's any number anywhere in the path, \d+ would always match. Your best resource here would be a tutorial on regexes, like http://www.regular-expressions.info/tutorial.html
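A small sketch of why `\d+` alone matches so much: when the pattern is searched against the whole path, a digit in any folder name is enough. The paths are invented for illustration, and `posixpath.basename` is used only to show the contrast with a filename-only search.

```python
import posixpath
import re

digits = re.compile(r"\d+")

# Invented paths: the first has a digit only in a folder name.
paths = [
    "/Users/me/Music/2015 imports/Some Title - Author.ext",
    "/Users/me/Music/Albums/Some Title part 2 - Author.ext",
]

# Searching the whole path: both match, the first because of "2015".
whole_path_hits = [p for p in paths if digits.search(p)]

# Searching only the last path component: just the second matches.
filename_hits = [p for p in paths if digits.search(posixpath.basename(p))]

print(len(whole_path_hits), len(filename_hits))
```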

@truth1ness

Oh. In your documentation it says filename, not file path. You might want to update that if it is searching the path, as that could lead to accidents.

Is there a way to apply the filter to just the file names? Or should I open a new request? It seems like a genuinely useful feature if it's not already there.

@hsoft hsoft added a commit that referenced this issue Apr 5, 2015
@hsoft Clarify documentation about results filtering
It wasn't clear that filtering was applied to whole paths.

ref #294
3b6fe99
@hsoft
Owner
hsoft commented Apr 5, 2015

You're right, it wasn't clear in the documentation that filters are applied to whole paths. I fixed it in the commit referenced above.

As for filtering just the filename, again, it's possible with some regex-fu, but I'll admit that it's not the easiest thing to do. For example, with the filter d+[^/]*$, we get all dupes with a digit in their filenames. We do this by saying "give me all paths with a digit that is followed by no slash all the way to the end". This is equivalent to a filename constraint.

@truth1ness

Cool, thanks. Perhaps adding a page in the documentation for this and a few other common regexes would be useful if you don't want to add the specific feature. The first regex I could probably have done but the second one would probably take me a while to figure out and I'd still be hesitant about its reliability with a big set of files I'm about to delete.

A link in the preferences next to the regex option would help people to find this list, too.

@hsoft hsoft added accepted docs and removed feature thinking labels Apr 5, 2015
@hsoft
Owner
hsoft commented Apr 5, 2015

Yes, I agree the documentation can be improved further to give examples. I'm changing the scope of this ticket to a Documentation ticket with the goal of letting the next user with the same needs as you have a smoother ride :)

@truth1ness

Hey, I finally got around to trying out the d+[^/]*$ you suggested and I'm not sure if I'm using it wrong but it doesn't seem to be working. You can see here it returns some items that don't have any numbers in either the file name or the path:

http://i.imgur.com/j4E0b1y.png

@hsoft
Owner
hsoft commented Apr 12, 2015

Oh! Sorry, I mistyped my example and left out the leading backslash. The correct expression is \d+[^/]*$ (with the backslash before the d).
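For anyone comparing the two expressions, here is a hedged sketch of how they differ, assuming Python `re.search` semantics and case-sensitive matching. The paths are invented. Without the backslash, `d+` matches the literal letter "d", which would explain results with no digits at all.

```python
import re

mistyped = re.compile(r"d+[^/]*$")    # letter "d", then no slash up to the end
corrected = re.compile(r"\d+[^/]*$")  # a digit, then no slash up to the end

# Invented paths for illustration.
paths = [
    "/music/2015/Wonderland - Author.ext",      # digit in folder, "d" in filename
    "/music/albums/Title part 2 - Author.ext",  # digit in filename
    "/music/albums/Plain Title - Author.ext",   # no digit, no "d"
]

mistyped_hits = [p for p in paths if mistyped.search(p)]
corrected_hits = [p for p in paths if corrected.search(p)]

print(mistyped_hits)
print(corrected_hits)
```

The mistyped pattern picks up the first path because of the "d" in "Wonderland", while the corrected one ignores the "2015" folder and keeps only the path whose filename contains a digit.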

@hsoft hsoft removed the accepted label Apr 19, 2015