There are a handful of methods that could be added, considering their potential in philological studies. These are (alignment-based ones aside):
- Hamming (easy to add; could serve as a baseline)
- Python's various `difflib` ratios (could be the fastest ones)
- Osamu Gotoh's (1982) affine-gap alignment (could serve as a generic alignment method)
- Longest Common Subsequence similarity (perhaps not so useful, but trivial to implement)
- Run-Length Encoding (an additional compression method that might be easier to explain and demonstrate; no point in implementing the Burrows-Wheeler transform, though)
- Square Root NCD (an additional compression method)
- Additional token methods, which only make sense after properly integrating n-grams:
  - Tversky index
  - Overlap coefficient
  - Tanimoto distance
  - Cosine similarity
  - Monge-Elkan
  - Bag distance
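For the first two items, both are only a few lines on top of the stdlib; a minimal sketch (function names here are illustrative, not a proposed API):

```python
from difflib import SequenceMatcher


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions; only defined for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def difflib_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] from the stdlib SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


print(hamming_distance("karolin", "kathrin"))  # → 3
```

Note that `SequenceMatcher` also offers `quick_ratio()` and `real_quick_ratio()` as cheaper upper bounds, which is presumably what "various difflib ratios" refers to.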
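For the affine-gap method, a compact scoring-only sketch of Gotoh's three-state recurrence (the scoring parameters and their defaults are illustrative; a real implementation would also want traceback):

```python
def gotoh_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh 1982), O(len(a) * len(b)) time."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    # Three states per cell: M ends in a (mis)match, X in a gap in b, Y in a gap in a.
    M = [NEG] * (m + 1)
    X = [NEG] * (m + 1)
    Y = [NEG] * (m + 1)
    M[0] = 0
    for j in range(1, m + 1):
        Y[j] = gap_open + (j - 1) * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        prev_M, prev_X, prev_Y = M[:], X[:], Y[:]
        M[0], Y[0] = NEG, NEG
        X[0] = gap_open + (i - 1) * gap_extend  # leading gap in b
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[j] = max(prev_M[j - 1], prev_X[j - 1], prev_Y[j - 1]) + s
            X[j] = max(prev_M[j] + gap_open, prev_X[j] + gap_extend)
            Y[j] = max(M[j - 1] + gap_open, Y[j - 1] + gap_extend)
    return max(M[m], X[m], Y[m])
```

The point of the affine scheme is that a gap of length k costs `gap_open + (k - 1) * gap_extend` rather than k times a flat penalty, which rewards one long gap over many scattered ones.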
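LCS similarity really is trivial: the classic dynamic program plus a normalisation. A sketch (normalising by the longer string is one convention among several):

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length, classic O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, 1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Normalise by the longer sequence (one of several possible conventions)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))
```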
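RLE would slot naturally into NCD as a pluggable compressed-length function; a sketch (the `ncd` signature here is illustrative, not an existing interface):

```python
def rle_encode(s: str) -> str:
    """Run-length encode, e.g. 'aaabcc' -> 'a3b1c2'."""
    if not s:
        return ""
    out = []
    run_char, run_len = s[0], 1
    for ch in s[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_char}{run_len}")
            run_char, run_len = ch, 1
    out.append(f"{run_char}{run_len}")
    return "".join(out)


def ncd(a: str, b: str, clen=lambda s: len(rle_encode(s))) -> float:
    """Normalised compression distance with a pluggable compressed-length function."""
    ca, cb, cab = clen(a), clen(b), clen(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)
```

The same `ncd` skeleton should accommodate other compressed-length functions (e.g. whatever length estimate the Square Root variant uses) without further changes.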
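Once n-grams are in place, most of the token methods reduce to a few multiset operations on `Counter` objects. A sketch of three of them over character bigrams (bigram size and the α = β = 0.5 defaults are illustrative; with those defaults Tversky equals the Dice coefficient):

```python
import math
from collections import Counter


def ngrams(s: str, n: int = 2) -> Counter:
    """Multiset of character n-grams (bigrams by default)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def tversky(a: Counter, b: Counter, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky index; with alpha = beta = 0.5 it equals the Dice coefficient."""
    inter = sum((a & b).values())
    return inter / (inter + alpha * sum((a - b).values()) + beta * sum((b - a).values()))


def overlap(a: Counter, b: Counter) -> float:
    """Overlap coefficient: shared mass over the smaller multiset."""
    return sum((a & b).values()) / min(sum(a.values()), sum(b.values()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity over n-gram count vectors."""
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    dot = sum(count * b[gram] for gram, count in a.items())
    return dot / (norm(a) * norm(b))
```

Tanimoto, Monge-Elkan, and bag distance would operate on the same `Counter` representation, so integrating n-grams once unblocks all six at once.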