AUTO: Score repeated bigrams correctly, fixes #1959 #1994

github-actions · 2021-06-11T15:38:17Z


The current Sørensen-Dice coefficient algorithm scores strings with repeated bigrams incorrectly: the result can exceed 1, the maximum possible score. The cause is that the algorithm does not consume bigrams as it matches them, so the match count becomes the size of the Cartesian join of the matching bigrams. The revised algorithm in this change consumes each bigram as it is matched, which prevents the Cartesian join and yields correct scores.
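To illustrate the difference described above, here is a minimal Python sketch. It is not the code from this change: the function names `bigrams`, `dice_cartesian`, and `dice_consuming` are hypothetical, and the handling of inputs shorter than two characters (returning 0.0) is an assumption made only to keep the example self-contained.

```python
from collections import Counter


def bigrams(text):
    """Return the adjacent character pairs of text, duplicates included."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def dice_cartesian(a, b):
    """Illustrative buggy variant: bigrams are never consumed, so each
    bigram of a is counted against every equal bigram of b."""
    left, right = bigrams(a), bigrams(b)
    if not left or not right:
        return 0.0  # assumption: strings with no bigrams score 0
    matches = sum(1 for x in left for y in right if x == y)
    return 2 * matches / (len(left) + len(right))


def dice_consuming(a, b):
    """Illustrative fixed variant: each bigram of b can be matched once."""
    left, right = bigrams(a), bigrams(b)
    if not left or not right:
        return 0.0  # assumption: strings with no bigrams score 0
    remaining = Counter(right)  # multiset of bigrams still available in b
    matches = 0
    for x in left:
        if remaining[x] > 0:
            remaining[x] -= 1  # consume the matched bigram
            matches += 1
    return 2 * matches / (len(left) + len(right))


print(dice_cartesian("aaaa", "aaaa"))  # 3.0, above the maximum of 1
print(dice_consuming("aaaa", "aaaa"))  # 1.0
```

With "aaaa" each string has three "aa" bigrams, so the non-consuming count is 3 × 3 = 9 matches and the score is 2 · 9 / 6 = 3.0, while the consuming count is 3 matches and the score is 1.0. For strings without repeated bigrams the two variants agree.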

Co-authored-by: Tom Larsen <larsenthomasj@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot assigned fbiville Jun 11, 2021

github-actions bot added the autocreated label Jun 11, 2021

conker84 merged commit 4a76eac into 4.3 Jun 15, 2021

conker84 deleted the auto-4.3-c6d40191e7c97f0f3f60bfb4d26c8590023280a1 branch June 15, 2021 14:37
