replace fuzzywuzzy #858

maxbachmann · 2020-09-30T12:20:13Z

FuzzyWuzzy is GPLv2 licensed, which would force you to license the whole project under GPLv2.
For this reason, this pull request replaces FuzzyWuzzy with rapidfuzz, which implements the same algorithm but is based on a version of fuzzywuzzy that was MIT licensed.
Rapidfuzz is:

MIT licensed, so it can be used with the license used by this project
faster than FuzzyWuzzy

Since it is written in C++14, on Windows it requires the C++ Redistributable 2015 to be installed (or a newer version such as 2019, which includes the 2015 version automatically).

maxbachmann · 2020-09-30T12:20:54Z

spotdl/search/provider.py

+        for eachLetter in str2:
            if eachLetter.isalnum() or eachLetter.isspace():
                newStr2 += eachLetter


I am pretty certain that this should be str2, since otherwise it would just compare str1 with str1.
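
To illustrate the point, here is a minimal sketch of the filtering step with each input cleaned independently. The helper name `keep_alnum_and_spaces` and the sample strings are hypothetical and not part of the spotdl code; only the `isalnum()`/`isspace()` check is taken from the diff above.

```python
def keep_alnum_and_spaces(text: str) -> str:
    """Drop every character that is neither alphanumeric nor whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


# Hypothetical sample inputs, for illustration only.
str1 = "Fifth Harmony ★ Worth It"
str2 = "Fifth Harmony - Worth It (Audio)"

# Each input is filtered on its own, so the later comparison really is
# str1 against str2 rather than str1 against a second copy of str1.
new_str1 = keep_alnum_and_spaces(str1)
new_str2 = keep_alnum_and_spaces(str2)
```

Iterating over `str1` in both loops would make `new_str2` identical to `new_str1`, so every comparison would report a perfect match.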

maxbachmann · 2020-09-30T12:22:34Z

spotdl/search/provider.py


-    A wrapper around `fuzzywuzzy.partial_ratio` to handle UTF-8 encoded
+    A wrapper around `rapidfuzz.fuzz.partial_ratio` to handle UTF-8 encoded
    emojis that usually cause errors
    '''

    #! this will throw an error if either string contains a UTF-8 encoded emoji 


Do you have an example of when this occurs? I failed to reproduce this with both fuzzywuzzy and rapidfuzz.

It mostly happens when you try to parse YouTube Music playlists. We actually blanket-parse all YouTube Music results, so if a playlist has emojis, the whole thing breaks even though we have no use for emojis. The same goes for video results with emojis in them.

E.g. in EMOJI CHALLENGE ★ Guess the Fifth Harmony (including Camila) Song Titles, the ⭐ will cause an error.

Hm, strange, I still fail to reproduce this.

fuzz.partial_ratio("EMOJI CHALLENGE ★ Guess the Fifth Harmony (including Camila) Song Titles", "EMOJI CHALLENGE ★")

works fine for me. In RapidFuzz, at least, I would consider this a bug.

By the way, I saw that you usually lowercase the strings as well before you match them, so you could use

fuzz.partial_ratio(str1, str2, score_cutoff=score_cutoff, processor=True)

instead, which would lowercase the strings, remove all non-alphanumeric characters, and trim whitespace at the start and end of the string (but is faster than doing the same thing in Python)

or pass a custom function when you want a different kind of preprocessing (the preprocessor has to accept a string as its argument and return the preprocessed string)

fuzz.partial_ratio(str1, str2, score_cutoff=score_cutoff, processor=your_preprocessor_function)
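
As a rough sketch of what such a custom preprocessor could look like: the function below only approximates the built-in preprocessing described above (lowercase, non-alphanumeric characters replaced by spaces, surrounding whitespace trimmed). The sample strings and the cutoff of 85 are assumptions for the example, and the rapidfuzz import is guarded so the snippet also runs where the package is not installed.

```python
def your_preprocessor_function(text: str) -> str:
    """Accept a string and return the preprocessed string."""
    lowered = text.lower()
    # Replace anything that is not a letter or digit (emojis, punctuation)
    # with a space, then trim the ends.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in lowered)
    return cleaned.strip()


try:
    from rapidfuzz import fuzz
except ImportError:  # rapidfuzz is optional for this sketch
    fuzz = None

if fuzz is not None:
    # The preprocessor is applied to both strings before they are compared.
    score = fuzz.partial_ratio(
        "EMOJI CHALLENGE ★ Guess the Fifth Harmony Song Titles",
        "emoji challenge",
        score_cutoff=85,
        processor=your_preprocessor_function,
    )
```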

maxbachmann · 2020-09-30T12:25:02Z

spotdl/search/provider.py

@@ -402,15 +406,15 @@ def search_and_order_ytm_results(songName: str, songArtists: List[str],
        #! we use fuzzy matching because YouTube spellings might be mucked up
        if result['type'] == 'song':
            for artist in songArtists:
-                if match_percentage (artist.lower(), result['artist'].lower()) > 85:
+                if match_percentage (artist.lower(), result['artist'].lower(), 85):


Using score_cutoff this way allows the fuzzy matching to exit early when the score cannot be reached.

Good design decision.
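
The early-exit idea can be sketched with the standard library alone. This is a stand-in built on `difflib.SequenceMatcher`, not rapidfuzz's algorithm; the function name `ratio_with_cutoff` and the sample artist names are assumptions made for the example. It returns 0 when the cutoff cannot be reached, which mirrors how the diff above uses the return value directly as a condition.

```python
from difflib import SequenceMatcher


def ratio_with_cutoff(a: str, b: str, score_cutoff: float = 0.0) -> float:
    """Similarity on a 0-100 scale, or 0.0 if it falls below score_cutoff."""
    matcher = SequenceMatcher(None, a, b)
    cutoff = score_cutoff / 100

    # Both calls are cheap upper bounds on ratio(). If even the upper bound
    # is below the cutoff, the expensive comparison can be skipped entirely.
    if matcher.real_quick_ratio() < cutoff or matcher.quick_ratio() < cutoff:
        return 0.0

    score = matcher.ratio() * 100
    return score if score >= score_cutoff else 0.0


# The result can be used directly as a condition, as in the diff above.
is_match = bool(ratio_with_cutoff("the weeknd", "the weekend", 85))
```

Because a score below the cutoff comes back as 0, the caller no longer needs the explicit `> 85` comparison.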

ghost

Good job.

Use commit messages, not just commit titles.

maxbachmann · 2020-10-04T15:32:14Z

Use commit messages, not just commit titles.

You're right, I should really do this 👍

replace fuzzywuzzy

a15832e

maxbachmann commented Sep 30, 2020

ghost approved these changes Oct 4, 2020

ghost merged commit 845fd41 into spotDL:master Oct 4, 2020

maxbachmann mentioned this pull request Oct 4, 2020

add missing comma #868

Merged

Aeris1One mentioned this pull request Oct 4, 2020

Install failed due to rapidfuzz #869

Closed

ghost pushed a commit that referenced this pull request Oct 5, 2020

Fix Setup.py breakage

2edd131

Author: @maxbachmann comma missing in install_requires since #858. closes #869
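
For context on why a single missing comma can break installation: Python joins adjacent string literals, so two entries in a list silently become one. The sketch below assumes `install_requires` is a plain list of string literals and uses hypothetical package names; it does not reproduce the actual contents of setup.py.

```python
# Hypothetical package names, for illustration only.
broken_install_requires = [
    "package-a",
    "package-b"  # <- missing comma: joined with the next literal
    "package-c",
]

fixed_install_requires = [
    "package-a",
    "package-b",
    "package-c",
]

# The broken list has two entries, one of them a non-existent requirement.
print(broken_install_requires)
```

No syntax error is raised for the broken list, so the problem only surfaces when the installer tries to resolve the merged name.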

This pull request was closed.
