New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make 'lyric_filename()' search harder #2567

Merged
merged 24 commits into from Mar 18, 2018

Conversation

Projects
None yet
2 participants
@elbeardmorez
Contributor

elbeardmorez commented Sep 17, 2017

Hi,

Here I wanted lyric files sourced via other programs to be visible in the View Lyrics pane

So specifically, before this commit I'd have to manually rename any lyric file I wanted to see in the ViewLyrics pane to the bespoke ViewLyrics format of '~/.lyrics/<artist/lyricartist>/<title>.lyric', and now I don't have to, and that convenience is why I would like this patch to go into Quod Libet.

To facilitate this the changes added are:.

-search alternate extensions: 'lyrics', '', 'txt'
-add another default 'root'/'<artist> - <title>/lyric search path
-add local song path as another default search path
-add support for lyrics filename configuration via two '[memory]'
config variables: 'lyric_rootpaths' and 'lyric_filenames'. song
variables can also be used in these comma delimited list values.

config example:
lyric_filenames = <artist>.-.<title>,<artist> - <title>.lyrics_mod
lyric_rootpaths = ~/music/lyrics

debug example:

D: 1.660: ViewLyrics.plugin_on_song_started: Looking for lyrics for ~/media/music/library/Dire Straits/Money for Nothing (1988)/09. Dire Straits - Private Investigations.mp3
D: 1.660: MP3File.lyric_filename:
searching for lyrics in:
~/media/music/lyrics/<artist>.-.<title>
~/media/music/lyrics/<artist> - <title>.lyrics_mod
~/media/music/lyrics/Dire Straits/Private Investigations.lyric
~/media/music/lyrics/Dire Straits - Private Investigations.lyric
~/.lyrics/<artist>.-.<title>
~/.lyrics/<artist> - <title>.lyrics_mod
~/.lyrics/Dire Straits/Private Investigations.lyric
~/.lyrics/Dire Straits - Private Investigations.lyric
~/media/music/library/Dire Straits/Money for Nothing (1988)/<artist>.-.<title>
~/media/music/library/Dire Straits/Money for Nothing (1988)/<artist> - <title>.lyrics_mod
~/media/music/library/Dire Straits/Money for Nothing (1988)/Dire Straits/Private Investigations.lyric
~/media/music/library/Dire Straits/Money for Nothing (1988)/Dire Straits - Private Investigations.lyric
D: 1.661: MP3File.lyric_filename: extending search to extensions: ['lyric', 'lyrics', '', 'txt']
D: 1.661: MP3File.call: Reading lyrics from ~/media/music/lyrics/Dire Straits - Private Investigations.txt

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from 411fa4e to 001c427 Sep 17, 2017

# add defaults
lyric_filenames.append(
(self.comma("lyricist") or self.comma("artist"))
.replace(u'/', u'')[:128] + os.sep +

This comment has been minimized.

@declension

declension Sep 22, 2017

Member

Can we extract this repeated string munging to a named function please (e.g. sanitise, or something)

This comment has been minimized.

@elbeardmorez

elbeardmorez Nov 5, 2017

Contributor

good call. done

match_ = ""
for pathfile in pathfiles.keys():
path = expanduser(pathfile)
if '<' in path:

This comment has been minimized.

@declension

declension Sep 22, 2017

Member

Hmmm, you can create a path with > on most OSes, however unlikely it might be...

This comment has been minimized.

@elbeardmorez

elbeardmorez Nov 5, 2017

Contributor

done, note that this 'issue' would only result in extra calls to the builtin 'ArbitraryExtensionFileFromPattern' function ..saves on cycles their, but we've added cycles by switching to regex.

@declension
point of me mentioning this detail, is that this code has already slipped into a previous merge #2560 | a64e2d6 ..happy to create another to PR to correct this if you'd like??

lyric_extensions = ('lyric', 'lyrics', '', 'txt')
for pathfile in pathfiles_expanded.keys():
ext = os.path.splitext(pathfile)[1][1:]
path = pathfile[:-1 * len(ext)].strip('.') if ext else pathfile

This comment has been minimized.

@declension

declension Sep 22, 2017

Member

I'm having trouble understanding what this logic is trying to do - it's fairly highly nested with lots of paths through the logic. Can it be simplified / methods extracted to make it clearer?

This comment has been minimized.

@elbeardmorez

elbeardmorez Nov 5, 2017

Contributor

good call again, i've blitzed it with comments and pulled out a big chunk of the logic to the 'generate_mod_ext_paths' nested function whose name denotes purpose

@declension

This comment has been minimized.

Member

declension commented Sep 22, 2017

Also, my normal comment - please add some tests for this core code 😄

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from 001c427 to 492b6dc Nov 5, 2017

@elbeardmorez

This comment has been minimized.

Contributor

elbeardmorez commented Nov 5, 2017

@declension. thanks for your help and sorry for the delay! other work(s) will likely suffer delay now too i'm afraid so apologies in advance (i'd much prefer to be responsive!)

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from 492b6dc to a2d90ed Nov 5, 2017

@declension

Looking much improved, thanks @elbeardmorez. Not sure why the tests are failing right now

from quodlibet.pattern import ArbitraryExtensionFileFromPattern
def sanitise(sep, parts):
return sep.join(list(map(lambda s: s.replace(u'/', u'')[:128],

This comment has been minimized.

@declension

declension Nov 14, 2017

Member

How about sep.join(part.replace(u'/', u'')[:128] for part in parts)? We prefer comprehensions generally.

This comment has been minimized.

@elbeardmorez

elbeardmorez Feb 25, 2018

Contributor

done

#print_d("searching for lyrics in:\n%s" % '\n'.join(pathfiles.keys()))
# expand each raw pathfile in turn and test for existence

This comment has been minimized.

@declension

declension Nov 14, 2017

Member

This could be a function - any time there's a block with a comment we can normally extract to function (then use docstrings :) )

This comment has been minimized.

@elbeardmorez

elbeardmorez Feb 25, 2018

Contributor

done

extra_extensions = [x for x in lyric_extensions if x != ext]
# join valid new extensions to pathfile stub and return
return list(map(lambda ext: '.'.join([path, ext])

This comment has been minimized.

@declension

declension Nov 14, 2017

Member

Also comprehension would read nicely here: ['.'.join([path, ext] if ext else path for ext in extra_extensions], or something.

This comment has been minimized.

@elbeardmorez

elbeardmorez Feb 25, 2018

Contributor

done

break
if match_:
break

This comment has been minimized.

@declension

declension Nov 14, 2017

Member

Perhaps debug output here would be useful

This comment has been minimized.

@elbeardmorez

elbeardmorez Feb 25, 2018

Contributor

added another (commented out) debug print line elsewhere for the 'found by extended extension search' case

msg_fail = str("item1: %s !=\nitem2: %s" % (a, b))
self.assertEqual(a, b, msg_fail)
with temp_filename() as filename:

This comment has been minimized.

@declension

declension Nov 14, 2017

Member

These all seem like great independent tests. Some people use the rule of "one assertion per test" - bit strict IMO but the comments here are good indicators that there are separate conditions being tests => separate tests. If there's code common to each of them you can use methods, setUp, decorators or helpers like some of the other tests...

This comment has been minimized.

@elbeardmorez

elbeardmorez Feb 25, 2018

Contributor

done, although i didn't make use of the unit-test specific setup etc. as that would have required separating the rather broad set of tests in this 'test_format_audio' set into different classes in order that my setup (e.g. custom paths/name templates) wouldn't affect other tests with hard-coded expected paths/names

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from b0a9af9 to 9e956c8 Feb 5, 2018

@quodlibet quodlibet deleted a comment from codecov bot Feb 5, 2018

elbeardmorez added some commits Sep 13, 2017

make 'lyric_filename()' search harder
-search alternate extensions: 'lyrics', '', 'txt'
-add another default 'root'/'<artist> - <title>/lyric search path
-add support for lyrics filename configuration via two '[memory]'
config variables: 'lyric_rootpaths' and 'lyric_filenames'. song
variables can also be used in these comma delimited list values.
e.g.
lyric_filenames = <artist>.-.<title>,<artist> - <title>.lyrics_mod
lyric_rootpaths = ~/music/lyrics
improve code readability
-pull out duplicate string munging to separate 'sanitise' function
avoid trying to resolve a filename unnecessarily
-use a regular expression over trivial string comparison to
determine whether a potential filename contains a Quod Libet
placeholder (e.g. <title>)
improve code readability
-pull out logic which creates the extended set of search paths
based on extension modification only to a 'generate_mod_ext_paths'
function
-add many more comments
-improve logging (commented by default)
add tests
-test built-in default
-test built-in default local path
-test custom default (fnf fallback!)
-test user defined
-test order priority
-test modified extension fallback search
-test '<' and/or '>' in name
-test '<' and '>' in name across path
adapt for case-insensitive filesystems
-push filename escaping code path beyond the initial globbing
for existence
-make string comparison case insensitive in tests
add RootPathFile utility container to discern file parts
-simple container used for discerning a pathfile's 'root directory
part' from its 'end part'. the variable depth of a pathfile's 'end
part' (e.g. the part of the path constructed from song attributes)
renders  os.path built-ins (basename etc.) useless for this purpose
-optionally recover 'escaped' versions of the parts too
fix searching for special escaped character on nt os
-use RootPathFile containers to ease testing of escaped file parts

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from 9e956c8 to 60ea806 Feb 15, 2018

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from 60ea806 to 2b9404c Feb 15, 2018

improve code readability
-pull out 'expand_pathfile' code to separate function

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch 2 times, most recently from 4d03464 to 97cb5e8 Feb 17, 2018

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch 10 times, most recently from 8309d3b to f1409cf Feb 17, 2018

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from f1409cf to 77695b2 Feb 25, 2018

@elbeardmorez elbeardmorez force-pushed the elbeardmorez:lyricsfilesearch branch from 77695b2 to 5876d95 Feb 25, 2018

@elbeardmorez

This comment has been minimized.

Contributor

elbeardmorez commented Mar 13, 2018

@declension: this one's good to go now :)

@property
def end_escaped(self):
escaped = list(map(lambda part: escape_filename(part),

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

like quite a lot of Python projects, we try to avoid map and especially anonymous functions in favour of comprehensions:

escaped = [escape_filename(part) for part in self.end.split(os.path.sep)]

is much clearer (and avoids the Python3 list conversion problem)

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 16, 2018

Contributor

The third time (at least!) that you've suggested this, apologies. I'll go through and remove all maps in the remainder of all other code too.

# pass custom error message to avoid truncation!
a2 = os.path.realpath(a).lower()
b2 = os.path.realpath(b).lower()
msg_fail = str("item1: %s !=\nitem2: %s" % (a2, b2))

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

This message is roughly equivalent to pytest / unittest default one, no?

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 16, 2018

Contributor

This change was a commit in its own right, use custom assert to avoid error truncation, where the original comment was much more derogatory - I could not quite believe I was in a situation where the part of the error message (only viewable through one of the build system interfaces) that I needed in order to diagnose the issue, was missing, truncated

However, having now just spent 20 minutes searching through the 100s of build logs for this commit, I can't find the concrete example that drove me to it. Willing to concede that I just couldn't see the wood for the trees at the time ..hence I'll strip it out.

This comment has been minimized.

@declension

declension Mar 16, 2018

Member

I think maybe it depends on the parameter passed to pytest too (or maybe even that's changed over time) e.g. https://docs.pytest.org/en/latest/usage.html#modifying-python-traceback-printing

@@ -409,6 +410,193 @@ def test_lyric_filename(self):
song["lyricist"] = u"Lyricist"
self.assertTrue(isinstance(song.lyric_filename, fsnative))
def assert_paths_equal(self, a, b):
# pass custom error message to avoid truncation!
a2 = os.path.realpath(a).lower()

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

Not sure why /tmp/FOO and /tmp/foo should be the same path. This is only true on Windows, and even then only sometimes. Have I misunderstood this?

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 17, 2018

Contributor
tests\test_formats__audio.py:418: in assert_paths_equal
    self.assertEqual(a2, b2, msg_fail)
E   AssertionError: 'C:\\users\\user\\temp\\ql-test-0r0eu8s0\\a%20%3C%20b\\b%20%3E%20a.lyric' != 'C:\\users\\user\\Temp\\QL-TEST-0r0eu8s0\\a%20%3C%20b\\b%20%3E%20a.lyric'
E   - C:\users\user\temp\ql-test-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric
E   ?               ^    ^^ ^^^^
E   + C:\users\user\Temp\QL-TEST-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric
E   ?               ^    ^^ ^^^^
E    : item1: C:\users\user\temp\ql-test-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric !=
E   item2: C:\users\user\Temp\QL-TEST-0r0eu8s0\a%20%3C%20b\b%20%3E%20a.lyric

When I strip this custom assert out, the lower() will have to go back much closer to the source of its need. You're right to question it, the issue is subtle.

I state 'create me a test file to dump lyrics in', tempfile returns blah\\Temp\\blah, i construct an expected path rpf.pathfile that the lyrics_filename search algo should yield. It constructs path guesses based on the file name (plus derivations using song info parts) etc.

Annoyingly self.comma('~filename') yields blah\\temp\\blah and as such

 586   lyric_paths.append(
 587       os.path.join(os.path.dirname(self.comma('~filename'))))

will result in a match. There's never anything wrong with that match per se as ~filename is always correct. Why is it 'always correct?'

AudioFile() -> sanitize(pathfile) -> _normalize_path -> os.path.normcase(filename)[quodlibet/quodlibet/util/path.py:~344]

os.path.normcase(path)
Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes

So it only gets lower()d on nt. Why do we bother with normcase? Forward slash conversion I assume, and the change of case is just unwanted fallout. "Correct that then?!" -> forgive me but I just can't be bothered with the rabbit hole(s) that changes to that might bring, and hence, i'll just ensure I set both test path and result path lower().

with temp_filename() as filename:
s = self.lyric_filename_search_test_song(filename)
root = os.path.dirname(filename)
fp = os.path.join(root, s["artist"] + " - " +

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

This is repeated several times - perhaps it can be a helper method..

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 17, 2018

Contributor

Done. Pulled common code into lyric_filename_test_setup context helper

root = os.path.dirname(filename)
config.set("memory", "lyric_rootpaths", root)
root2 = os.path.join(get_home_dir(), ".lyrics") # built-in default
fp2 = os.path.join(root2, s["artist"] + " - " +

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

...more here

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 17, 2018

Contributor

Done

root = os.path.dirname(filename)
path_variants = \
['<oldskool>'] if (os.name == "nt") \

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

FYI we have a helper method for checking for windows, worth using probably (helps tracking of OS-specific stuff too)

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 17, 2018

Contributor

Yep, switched to environment's 'is_windows()'

return os.path.sep.join([self.root, self.end_escaped])
@property
def is_valid(self):

This comment has been minimized.

@declension

declension Mar 14, 2018

Member

Personally I'd argue a method should be is_valid() but a property should just be valid

This comment has been minimized.

@elbeardmorez

elbeardmorez Mar 17, 2018

Contributor

agree entirely.

elbeardmorez added some commits Mar 17, 2018

use 'is_windows' helper
-ease OS specific code tracking
revert to built-in 'assertEqual'
-truncation of vital debugging text doesn't seem to be an issue now,
no idea what (if anything) changed (e.g. pytest flags for verbosity)
-clarifies use of 'lower' function in string comparisons
remove some repeated code sections from tests
-nest test file creation within a setup context

@declension declension merged commit 9c53f4e into quodlibet:master Mar 18, 2018

6 checks passed

ci/circleci: job.fedora26 Your tests passed on CircleCI!
Details
ci/circleci: job.ubuntu16.04 Your tests passed on CircleCI!
Details
ci/circleci: job.ubuntu17.10 Your tests passed on CircleCI!
Details
ci/circleci: job.win32 Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@declension

This comment has been minimized.

Member

declension commented Mar 18, 2018

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment