AUTO: Score repeated bigrams correctly, fixes #1959 #1993

Merged
Merged 1 commit into 4.1 from auto-4.1-c6d40191e7c97f0f3f60bfb4d26c8590023280a1 on Jun 15, 2021

Conversation

github-actions[bot]
Contributor

No description provided.

The current Sorensen-Dice coefficient algorithm scores strings with repeated bigrams incorrectly: the score can exceed 1, the maximum possible value. The cause is that the algorithm does not consume bigrams as it matches them, so a bigram that occurs m times in one string and n times in the other contributes m × n matches, and the match count becomes the size of the cartesian join of the matching bigrams. The revised algorithm in this change consumes each bigram as it is matched, which prevents the cartesian join and yields correct scores.
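
To make the difference concrete, here is a minimal Python sketch of both counting strategies. It is not the APOC implementation: the function names (`bigrams`, `dice_cartesian`, `dice_consuming`) are hypothetical, and the handling of case, whitespace, and strings shorter than two characters is an assumption made only to keep the example short.

```python
from collections import Counter


def bigrams(text):
    """Return the overlapping character bigrams of text, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def dice_cartesian(a, b):
    """Flawed counting: every pair of equal bigrams counts as a match."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0  # assumption: no bigrams means no similarity
    matches = sum(1 for p in x for q in y if p == q)
    return 2.0 * matches / (len(x) + len(y))


def dice_consuming(a, b):
    """Corrected counting: a matched bigram is consumed and cannot match again."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0  # assumption: no bigrams means no similarity
    remaining = Counter(y)  # bigrams of b still available for matching
    matches = 0
    for p in x:
        if remaining[p] > 0:
            remaining[p] -= 1  # consume one occurrence
            matches += 1
    return 2.0 * matches / (len(x) + len(y))


print(dice_cartesian("aaaa", "aaaa"))  # 3.0, above the maximum of 1
print(dice_consuming("aaaa", "aaaa"))  # 1.0
```

For "aaaa" compared with itself, each string has three "aa" bigrams. Pairwise counting finds 3 × 3 = 9 matches and scores 2 · 9 / 6 = 3.0, while consuming finds 3 matches and scores 2 · 3 / 6 = 1.0. For strings without repeated bigrams, such as "night" and "nacht", both versions agree.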
@conker84 merged commit b811001 into 4.1 on Jun 15, 2021
@conker84 deleted the auto-4.1-c6d40191e7c97f0f3f60bfb4d26c8590023280a1 branch on June 15, 2021 14:37
vga91 pushed a commit to vga91/neo4j-apoc-procedures that referenced this pull request Jun 28, 2021
…4j-contrib#1993)


Co-authored-by: Tom Larsen <larsenthomasj@gmail.com>