Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make 'lyric_filename()' search harder #2567

Merged
merged 24 commits into from Mar 18, 2018

Conversation

elbeardmorez
Copy link
Contributor

@elbeardmorez elbeardmorez commented Sep 17, 2017

Hi,

Here I wanted lyric files sourced via other programs to be visible in the View Lyrics pane

So specifically, before this commit I'd have to manually rename any lyric file I wanted to see in the ViewLyrics pane to the bespoke ViewLyrics format of '~/.lyrics/<artist/lyricartist>/<title>.lyric', and now I don't have to, and that convenience is why I would like this patch to go into Quod Libet.

To facilitate this the changes added are:.

-search alternate extensions: 'lyrics', '', 'txt'
-add another default 'root'/'<artist> - <title>/lyric search path
-add local song path as another default search path
-add support for lyrics filename configuration via two '[memory]'
config variables: 'lyric_rootpaths' and 'lyric_filenames'. song
variables can also be used in these comma delimited list values.

config example:
lyric_filenames = <artist>.-.<title>,<artist> - <title>.lyrics_mod
lyric_rootpaths = ~/music/lyrics

debug example:

D: 1.660: ViewLyrics.plugin_on_song_started: Looking for lyrics for ~/media/music/library/Dire Straits/Money for Nothing (1988)/09. Dire Straits - Private Investigations.mp3
D: 1.660: MP3File.lyric_filename:
searching for lyrics in:
~/media/music/lyrics/<artist>.-.<title>
~/media/music/lyrics/<artist> - <title>.lyrics_mod
~/media/music/lyrics/Dire Straits/Private Investigations.lyric
~/media/music/lyrics/Dire Straits - Private Investigations.lyric
~/.lyrics/<artist>.-.<title>
~/.lyrics/<artist> - <title>.lyrics_mod
~/.lyrics/Dire Straits/Private Investigations.lyric
~/.lyrics/Dire Straits - Private Investigations.lyric
~/media/music/library/Dire Straits/Money for Nothing (1988)/<artist>.-.<title>
~/media/music/library/Dire Straits/Money for Nothing (1988)/<artist> - <title>.lyrics_mod
~/media/music/library/Dire Straits/Money for Nothing (1988)/Dire Straits/Private Investigations.lyric
~/media/music/library/Dire Straits/Money for Nothing (1988)/Dire Straits - Private Investigations.lyric
D: 1.661: MP3File.lyric_filename: extending search to extensions: ['lyric', 'lyrics', '', 'txt']
D: 1.661: MP3File.call: Reading lyrics from ~/media/music/lyrics/Dire Straits - Private Investigations.txt

# add defaults
lyric_filenames.append(
(self.comma("lyricist") or self.comma("artist"))
.replace(u'/', u'')[:128] + os.sep +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract this repeated string munging to a named function please (e.g. sanitise, or something)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call. done

match_ = ""
for pathfile in pathfiles.keys():
path = expanduser(pathfile)
if '<' in path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, you can create a path with > on most OSes, however unlikely it might be...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, note that this 'issue' would only result in extra calls to the builtin 'ArbitraryExtensionFileFromPattern' function ..saves on cycles their, but we've added cycles by switching to regex.

@declension
point of me mentioning this detail, is that this code has already slipped into a previous merge #2560 | a64e2d6 ..happy to create another to PR to correct this if you'd like??

lyric_extensions = ('lyric', 'lyrics', '', 'txt')
for pathfile in pathfiles_expanded.keys():
ext = os.path.splitext(pathfile)[1][1:]
path = pathfile[:-1 * len(ext)].strip('.') if ext else pathfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding what this logic is trying to do - it's fairly highly nested with lots of paths through the logic. Can it be simplified / methods extracted to make it clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call again, i've blitzed it with comments and pulled out a big chunk of the logic to the 'generate_mod_ext_paths' nested function whose name denotes purpose

@declension
Copy link
Member

Also, my normal comment - please add some tests for this core code 😄

@elbeardmorez
Copy link
Contributor Author

@declension. thanks for your help and sorry for the delay! other work(s) will likely suffer delay now too i'm afraid so apologies in advance (i'd much prefer to be responsive!)

Copy link
Member

@declension declension left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking much improved, thanks @elbeardmorez. Not sure why the tests are failing right now

from quodlibet.pattern import ArbitraryExtensionFileFromPattern

def sanitise(sep, parts):
return sep.join(list(map(lambda s: s.replace(u'/', u'')[:128],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about sep.join(part.replace(u'/', u'')[:128] for part in parts)? We prefer comprehensions generally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


#print_d("searching for lyrics in:\n%s" % '\n'.join(pathfiles.keys()))

# expand each raw pathfile in turn and test for existence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a function - any time there's a block with a comment we can normally extract to function (then use docstrings :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

extra_extensions = [x for x in lyric_extensions if x != ext]

# join valid new extensions to pathfile stub and return
return list(map(lambda ext: '.'.join([path, ext])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also comprehension would read nicely here: ['.'.join([path, ext] if ext else path for ext in extra_extensions], or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

break
if match_:
break

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps debug output here would be useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added another (commented out) debug print line elsewhere for the 'found by extended extension search' case

msg_fail = str("item1: %s !=\nitem2: %s" % (a, b))
self.assertEqual(a, b, msg_fail)

with temp_filename() as filename:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all seem like great independent tests. Some people use the rule of "one assertion per test" - bit strict IMO but the comments here are good indicators that there are separate conditions being tests => separate tests. If there's code common to each of them you can use methods, setUp, decorators or helpers like some of the other tests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, although i didn't make use of the unit-test specific setup etc. as that would have required separating the rather broad set of tests in this 'test_format_audio' set into different classes in order that my setup (e.g. custom paths/name templates) wouldn't affect other tests with hard-coded expected paths/names

-search alternate extensions: 'lyrics', '', 'txt'
-add another default 'root'/'<artist> - <title>/lyric search path
-add support for lyrics filename configuration via two '[memory]'
config variables: 'lyric_rootpaths' and 'lyric_filenames'. song
variables can also be used in these comma delimited list values.
e.g.
lyric_filenames = <artist>.-.<title>,<artist> - <title>.lyrics_mod
lyric_rootpaths = ~/music/lyrics
-pull out duplicate string munging to separate 'sanitise' function
-use a regular expression over trivial string comparison to
determine whether a potential filename contains a Quod Libet
placeholder (e.g. <title>)
-pull out logic which creates the extended set of search paths
based on extension modification only to a 'generate_mod_ext_paths'
function
-add many more comments
-improve logging (commented by default)
-test built-in default
-test built-in default local path
-test custom default (fnf fallback!)
-test user defined
-test order priority
-test modified extension fallback search
-test '<' and/or '>' in name
-test '<' and '>' in name across path
-push filename escaping code path beyond the initial globbing
for existence
-make string comparison case insensitive in tests
-simple container used for discerning a pathfile's 'root directory
part' from its 'end part'. the variable depth of a pathfile's 'end
part' (e.g. the part of the path constructed from song attributes)
renders  os.path built-ins (basename etc.) useless for this purpose
-optionally recover 'escaped' versions of the parts too
-use RootPathFile containers to ease testing of escaped file parts
-pull out 'expand_pathfile' code to separate function
@elbeardmorez elbeardmorez force-pushed the lyricsfilesearch branch 2 times, most recently from 4d03464 to 97cb5e8 Compare February 17, 2018 14:05
@elbeardmorez
Copy link
Contributor Author

@declension: this one's good to go now :)


@property
def end_escaped(self):
escaped = list(map(lambda part: escape_filename(part),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like quite a lot of Python projects, we try to avoid map and especially anonymous functions in favour of comprehensions:

escaped = [escape_filename(part) for part in self.end.split(os.path.sep)]

is much clearer (and avoids the Python3 list conversion problem)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third time (at least!) that you've suggested this, apologies. I'll go through and remove all maps in the remainder of all other code too.

# pass custom error message to avoid truncation!
a2 = os.path.realpath(a).lower()
b2 = os.path.realpath(b).lower()
msg_fail = str("item1: %s !=\nitem2: %s" % (a2, b2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is roughly equivalent to pytest / unittest default one, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was a commit in its own right, use custom assert to avoid error truncation, where the original comment was much more derogatory - I could not quite believe I was in a situation where the part of the error message (only viewable through one of the build system interfaces) that I needed in order to diagnose the issue, was missing, truncated

However, having now just spent 20 minutes searching through the 100s of build logs for this commit, I can't find the concrete example that drove me to it. Willing to concede that I just couldn't see the wood for the trees at the time ..hence I'll strip it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe it depends on the parameter passed to pytest too (or maybe even that's changed over time) e.g. https://docs.pytest.org/en/latest/usage.html#modifying-python-traceback-printing

@@ -409,6 +410,193 @@ def test_lyric_filename(self):
song["lyricist"] = u"Lyricist"
self.assertTrue(isinstance(song.lyric_filename, fsnative))

def assert_paths_equal(self, a, b):
# pass custom error message to avoid truncation!
a2 = os.path.realpath(a).lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why /tmp/FOO and /tmp/foo should be the same path. This is only true on Windows, and even then only sometimes. Have I misunderstood this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests\test_formats__audio.py:418: in assert_paths_equal
    self.assertEqual(a2, b2, msg_fail)
E   AssertionError: 'C:\\users\\user\\temp\\ql-test-0r0eu8s0\\a%20%3C%20b\\b%20%3E%20a.lyric' != 'C:\\users\\user\\Temp\\QL-TEST-0r0eu8s0\\a%20%3C%20b\\b%20%3E%20a.lyric'
E   - C:\users\user\temp\ql-test-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric
E   ?               ^    ^^ ^^^^
E   + C:\users\user\Temp\QL-TEST-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric
E   ?               ^    ^^ ^^^^
E    : item1: C:\users\user\temp\ql-test-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric !=
E   item2: C:\users\user\Temp\QL-TEST-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric

When I strip this custom assert out, the lower() will have to go back much closer to the source of its need. You're right to question it, the issue is subtle.

I state 'create me a test file to dump lyrics in', tempfile returns blah\\Temp\\blah, i construct an expected path rpf.pathfile that the lyrics_filename search algo should yield. It constructs path guesses based on the file name (plus derivations using song info parts) etc.

Annoyingly self.comma('~filename') yields blah\\temp\\blah and as such

 586   lyric_paths.append(
 587       os.path.join(os.path.dirname(self.comma('~filename'))))

will result in a match. There's never anything wrong with that match per se as ~filename is always correct. Why is it 'always correct?'

AudioFile() -> sanitize(pathfile) -> _normalize_path -> os.path.normcase(filename)[quodlibet/quodlibet/util/path.py:~344]

os.path.normcase(path)
Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes

So it only gets lower()d on nt. Why do we bother with normcase? Forward slash conversion I assume, and the change of case is just unwanted fallout. "Correct that then?!" -> forgive me but I just can't be bothered with the rabbit hole(s) that changes to that might bring, and hence, i'll just ensure I set both test path and result path lower().

with temp_filename() as filename:
s = self.lyric_filename_search_test_song(filename)
root = os.path.dirname(filename)
fp = os.path.join(root, s["artist"] + " - " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated several times - perhaps it can be a helper method..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Pulled common code into lyric_filename_test_setup context helper

root = os.path.dirname(filename)
config.set("memory", "lyric_rootpaths", root)
root2 = os.path.join(get_home_dir(), ".lyrics") # built-in default
fp2 = os.path.join(root2, s["artist"] + " - " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...more here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

root = os.path.dirname(filename)

path_variants = \
['<oldskool>'] if (os.name == "nt") \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI we have a helper method for checking for windows, worth using probably (helps tracking of OS-specific stuff too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, switched to environment's 'is_windows()'

return os.path.sep.join([self.root, self.end_escaped])

@property
def is_valid(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd argue a method should be is_valid() but a property should just be valid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree entirely.

-ease OS specific code tracking
-truncation of vital debugging text doesn't seem to be an issue now,
no idea what (if anything) changed (e.g. pytest flags for verbosity)
-clarifies use of 'lower' function in string comparisons
-nest test file creation within a setup context
@declension declension merged commit 9c53f4e into quodlibet:master Mar 18, 2018
@declension
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants