Bug Report: Sometimes, MWEs are uncommon phrases #139

Closed
meghdadFar opened this issue Apr 8, 2024 · 0 comments · Fixed by #141
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@meghdadFar
Owner

Description

In many cases, the top-ranked extracted MWEs are very uncommon phrases. This happens because both the MWE and some or all of its components have a very low frequency, which makes the PMI large.
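
To illustrate the effect, here is a minimal sketch of PMI computed from raw counts for a two-word candidate. The function name `pmi`, the corpus size, and all counts are made up for illustration and are not taken from the project's code; the library's actual association measure may be computed differently.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """PMI of a two-word candidate, using counts / total as probability estimates."""
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))


TOTAL = 1_000_000  # hypothetical corpus size

# Hypothetical rare candidate: the pair and both components occur exactly once.
rare_pmi = pmi(pair_count=1, x_count=1, y_count=1, total=TOTAL)

# Hypothetical common candidate: frequent pair made of frequent words.
common_pmi = pmi(pair_count=1000, x_count=5000, y_count=4000, total=TOTAL)

print(f"rare: {rare_pmi:.2f}, common: {common_pmi:.2f}")
```

With these made-up counts, the candidate seen only once gets the maximum possible score, log2 of the corpus size, while the frequent candidate scores much lower. This is why one-off phrases can dominate the top of the ranking.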

To Reproduce

Steps to reproduce the behavior:
Run MWE extraction on a corpus and inspect the top-ranked results.

Expected behavior

Top MWE results should be common expressions, not rare or unknown ones.

Examples

whip these ninjas

Possible Solutions

Add a frequency threshold (as a parameter) that defaults to 1. MWE candidates observed fewer times than this threshold are discarded before ranking.
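
A rough sketch of what such a filter could look like, assuming candidate counts are available as a mapping from phrase to frequency. The names `filter_by_frequency` and `min_freq`, and the counts below, are hypothetical and not the project's actual API.

```python
from collections import Counter


def filter_by_frequency(candidate_counts, min_freq=1):
    """Keep only candidates observed at least `min_freq` times.

    With the default of 1, every observed candidate is kept.
    """
    return {
        candidate: count
        for candidate, count in candidate_counts.items()
        if count >= min_freq
    }


# Hypothetical counts for illustration only.
counts = Counter(
    {"whip these ninjas": 1, "new york": 120, "machine learning": 45}
)

kept = filter_by_frequency(counts, min_freq=5)
print(sorted(kept))
```

Applying the filter before computing the association score means rare candidates never reach the ranking step. The default of 1 preserves the current behavior for users who do not set the parameter.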
