Review error rate definitions etc. #45

Open
mikegerber opened this issue Nov 10, 2020 · 2 comments

Comments

@mikegerber
Member

No description provided.

@mikegerber mikegerber added the documentation Improvements or additions to documentation label Nov 10, 2020
@mikegerber mikegerber self-assigned this Nov 19, 2020
@bertsky
Contributor

bertsky commented Jun 9, 2021

I suggest using the alignment path length as the denominator instead of the GT length (with the GT length, the error rate can be >1):

    if d == 0:
        return 0, n
    if n == 0:
        return float("inf"), n
    return d / n, n

(Ideally, you would implement all three length options: alignment path, maximum sequence, and GT sequence.)
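To make the three denominators concrete, here is a minimal sketch, not dinglehopper's actual code. The function names (`distance_and_path_length`, `error_rate`, `error_rates`) are hypothetical. It assumes unit edit costs and breaks ties between equally cheap alignments in favour of the shorter path, which is one possible convention among several.

```python
def distance_and_path_length(gt, ocr):
    """Return (edit distance, alignment path length) for two sequences.

    Assumption: unit costs; among minimum-distance alignments the
    shortest path is chosen (tuples compare by distance, then length).
    """
    # prev[j] holds (distance, path length) for the previous GT prefix vs ocr[:j]
    prev = [(j, j) for j in range(len(ocr) + 1)]
    for i in range(1, len(gt) + 1):
        curr = [(i, i)]  # i deletions against an empty OCR prefix
        for j in range(1, len(ocr) + 1):
            sub_cost = 0 if gt[i - 1] == ocr[j - 1] else 1
            candidates = (
                (prev[j - 1][0] + sub_cost, prev[j - 1][1] + 1),  # match or substitution
                (prev[j][0] + 1, prev[j][1] + 1),                 # deletion
                (curr[j - 1][0] + 1, curr[j - 1][1] + 1),         # insertion
            )
            curr.append(min(candidates))
        prev = curr
    return prev[-1]


def error_rate(d, n):
    """The snippet from the comment above, wrapped in a function."""
    if d == 0:
        return 0, n
    if n == 0:
        return float("inf"), n
    return d / n, n


def error_rates(gt, ocr):
    """Error rate under each of the three length options."""
    d, path_len = distance_and_path_length(gt, ocr)
    return {
        "alignment_path": error_rate(d, path_len)[0],
        "max_sequence": error_rate(d, max(len(gt), len(ocr)))[0],
        "gt_sequence": error_rate(d, len(gt))[0],
    }
```

With `gt="ab"` and `ocr="abcde"` the distance is 3, so the GT-length rate is 3/2, while the alignment path (2 matches plus 3 insertions) has length 5 and keeps the rate at or below 1.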

The problem for dinglehopper is that your levenshtein_matrix does not give you the alignment path; you only have the resulting minimum distance.
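If the full distance matrix is kept, one alignment path can be recovered by backtracing from the bottom-right cell. The sketch below is self-contained and does not call dinglehopper's `levenshtein_matrix`, whose exact signature I am not assuming here; `build_matrix` and `path_length_from_matrix` are hypothetical helpers with unit costs. The backtrace prefers diagonal steps, so it yields one optimal path, not necessarily the only one.

```python
def build_matrix(s1, s2):
    """Standard Levenshtein DP matrix with unit costs (illustrative helper)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        D[i][0] = i  # delete everything from s1
    for j in range(cols):
        D[0][j] = j  # insert everything from s2
    for i in range(1, rows):
        for j in range(1, cols):
            D[i][j] = min(
                D[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]),  # match or substitution
                D[i - 1][j] + 1,                             # deletion
                D[i][j - 1] + 1,                             # insertion
            )
    return D


def path_length_from_matrix(D, s1, s2):
    """Count the steps of one optimal alignment by walking back through D."""
    i, j = len(s1), len(s2)
    steps = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]):
            i, j = i - 1, j - 1  # diagonal: match or substitution
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            i -= 1               # up: deletion
        else:
            j -= 1               # left: insertion
        steps += 1
    return steps
```

The minimum distance is still `D[-1][-1]`; the backtrace only adds the path length needed for the denominator.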

@bertsky
Contributor

bertsky commented Apr 10, 2024

Update: I recommend using rapidfuzz's normalized_distance instead of just dividing the distance by the GT length. Internally (in the C++ backend), the denominator is calculated as the actual path length (= maximum distance).
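A small comparison sketch of the two normalizations follows. It assumes `rapidfuzz` is installed and uses `rapidfuzz.distance.Levenshtein`. As far as I understand the rapidfuzz documentation, the denominator is the maximum possible distance for the given lengths and weights; with the default uniform weights that is the length of the longer sequence, which can differ from the length of a particular alignment path. The helper `normalized_distance_sketch` is a hypothetical pure-Python mirror of that formula, not rapidfuzz code.

```python
def normalized_distance_sketch(distance, len1, len2):
    """Distance divided by the maximum possible distance (unit costs assumed)."""
    # With unit costs, at most max(len1, len2) edits are ever needed.
    maximum = max(len1, len2)
    return distance / maximum if maximum else 0.0


if __name__ == "__main__":
    try:
        from rapidfuzz.distance import Levenshtein
    except ImportError:
        print("rapidfuzz is not installed; skipping the comparison")
    else:
        gt, ocr = "ab", "abcde"
        d = Levenshtein.distance(gt, ocr)
        print("GT-length rate:", d / len(gt))  # may exceed 1
        print("rapidfuzz normalized_distance:", Levenshtein.normalized_distance(gt, ocr))
        print("sketch:", normalized_distance_sketch(d, len(gt), len(ocr)))
```

Unlike the GT-length rate, this normalization stays within [0, 1] and handles two empty sequences without dividing by zero.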


2 participants