Add other methods #6

Open
12 tasks
tresoldi opened this issue Mar 3, 2021 · 0 comments

tresoldi commented Mar 3, 2021

There are a handful of methods that could be added, given their potential in philological studies. These are (not considering alignment ones):

  • Hamming (easy to add, could be good as a baseline)
  • Python's various difflib ratios (SequenceMatcher.ratio, quick_ratio, real_quick_ratio; could be the fastest ones)
  • Osamu Gotoh (1982) affine gap (could be used as a generic alignment one)
  • Longest Common Subsequence similarity (perhaps not so useful, but trivial to implement)
  • Run-Length Encoding (an additional compression method that might be easier to explain/demonstrate; no point in implementing the Burrows-Wheeler transform, though)
  • Square Root NCD (additional compression method)
  • Tversky index (additional token method, but only makes sense after properly integrating ngrams)
  • Overlap coefficient (additional token method, but only makes sense after properly integrating ngrams)
  • Tanimoto distance (additional token method, but only makes sense after properly integrating ngrams)
  • Cosine similarity (additional token method, but only makes sense after properly integrating ngrams)
  • Monge-Elkan (additional token method, but only makes sense after properly integrating ngrams)
  • Bag distance (additional token method, but only makes sense after properly integrating ngrams)
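
To make the scope of some of these concrete, here is a minimal sketch of a few of the simpler candidates (Hamming, a difflib ratio, LCS similarity, and some n-gram token measures). All function names, the default bigram size, and the normalization choices are assumptions made for illustration; none of this reflects the project's existing API, and the Gotoh, compression-based, Tanimoto, and Monge-Elkan methods are left out.

```python
import difflib
import math
from collections import Counter


def hamming(a, b):
    """Normalized Hamming distance. Assumption of this sketch: positions
    missing from the shorter sequence count as mismatches."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


def difflib_ratio(a, b):
    """Similarity in [0, 1] from the standard library's SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def lcs_similarity(a, b):
    """Longest common subsequence length divided by the longer length."""
    if not a or not b:
        return 1.0 if not a and not b else 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def ngrams(seq, n=2):
    """Multiset of the n-grams in seq (hypothetical helper)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def overlap_coefficient(a, b, n=2):
    """Shared n-gram types over the size of the smaller n-gram set."""
    x, y = set(ngrams(a, n)), set(ngrams(b, n))
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


def tversky(a, b, n=2, alpha=0.5, beta=0.5):
    """Tversky index on n-gram sets; alpha = beta = 0.5 gives Dice."""
    x, y = set(ngrams(a, n)), set(ngrams(b, n))
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 0.0


def cosine_similarity(a, b, n=2):
    """Cosine of the angle between n-gram count vectors."""
    x, y = ngrams(a, n), ngrams(b, n)
    dot = sum(x[g] * y[g] for g in x.keys() & y.keys())
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(
        sum(v * v for v in y.values())
    )
    return dot / norm if norm else 0.0


def bag_distance(a, b):
    """Larger of the two multiset differences between the sequences."""
    x, y = Counter(a), Counter(b)
    return max(sum((x - y).values()), sum((y - x).values()))
```

The token measures all go through the same ngrams helper, which is why they only make sense once n-grams are properly integrated: swapping the helper for the library's own tokenization would be the only change needed.
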
@tresoldi tresoldi self-assigned this Mar 3, 2021
@tresoldi tresoldi added this to the v0.4 milestone Mar 3, 2021