New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting tags fails for non-word characters before the separator #2678

Closed
pfps opened this Issue Dec 31, 2017 · 3 comments

Comments

Projects
None yet
2 participants
@pfps
Contributor

pfps commented Dec 31, 2017

I had a tag value that had multiple separators in it, namely
Adam Duritz (1964-); Dan Vickrey (1966-); Ben Mize; David Bryson
(my separator is ;). When I split this quodlibet only split off the last value instead of splitting into four values which is what I expected. After this split it was not possible to perform further splits.

As far as I can tell, there is very little documentation on what quodlibet does when splitting, and the testing in test_util_string.py is minimal. There is a cryptic comment in the code for split_value that might be construed as saying something different, but I think that splitting into more than two parts is the correct behaviour.

This appears to be related to #1088

@pfps

This comment has been minimized.

Contributor

pfps commented Dec 31, 2017

Oh, the problem I'm experiencing is due to the wrapping of separators in split_values. Because I have a non-word character at the end of a "value", the separation is not recognized.

The question is what should be done here. My view is that the word boundary bit should be removed, but that's a decided change from current behaviour.

@pfps pfps changed the title from splitting into multiple values (in current development version) to splitting tag values into multiple values (in current development version) Dec 31, 2017

@declension declension changed the title from splitting tag values into multiple values (in current development version) to Splitting tags fails for non-word characters before the separator Jan 1, 2018

@declension

This comment has been minimized.

Member

declension commented Jan 1, 2018

Thanks for the report @pfps - and I agree with the diagnosis.

I've got something together that seems to sort this "without breaking existing usage", but like you say the test coverage of this area (value-wise, not lines-of-code wise) is low but I'll add a few

declension added a commit that referenced this issue Jan 1, 2018

Allow non-word characters around tag separators
 * Move tests to better location (matching the packages)
 * Add some test cases matching #2678 and #1088
 * Extract regex to parts for clarity, it was going to get unreadable
 * Simplify logic slightly, use `any` with generator comprehension rather than `filter` etc
 * Allow word boundary _or_ non-word character between separators, but don't capture
 * Allows "foo (bar),baz" and "foo,(the) baz" without breaking current tests

Fixes #1088, #2678

@declension declension closed this Jan 1, 2018

@pfps

This comment has been minimized.

Contributor

pfps commented Jan 2, 2018

Looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment