Bug Report: Sometimes, MWEs are wrong or misspelled phrases #138

Open
meghdadFar opened this issue Apr 8, 2024 · 0 comments
Labels
bug (Something isn't working), enhancement (New feature or request), up for grabs

meghdadFar (Owner) commented Apr 8, 2024

Description

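When the corpus contains misspelled or foreign words and phrases, the top-ranked MWEs often end up being those very rare misspelled expressions. This is a known weakness of PMI (pointwise mutual information): it overestimates the association of low-frequency word combinations, so a phrase whose words occur only once, and only together, receives a very high score.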
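To make the effect concrete, here is a minimal sketch of PMI computed from raw counts. The function name `pmi` and all counts are made up for illustration and are not taken from this project's code; the point is only that a pair seen once, whose words never occur elsewhere, outscores a frequent and genuinely associated pair.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information (base 2) estimated from raw counts."""
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical corpus size and counts, chosen only to illustrate the bias.
total = 1_000_000

# A common, well-formed pair: seen 100 times, its words 1000 and 500 times.
common = pmi(pair_count=100, x_count=1000, y_count=500, total=total)

# A misspelled pair: seen once, and each of its words occurs only that once.
rare = pmi(pair_count=1, x_count=1, y_count=1, total=total)

print(f"common pair PMI: {common:.2f}")
print(f"rare pair PMI:   {rare:.2f}")
```

With these counts the rare pair reaches the maximum possible PMI for the corpus, log2 of the corpus size, which is why one-off misspellings can float to the top of a PMI-ranked list.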

To Reproduce

Steps to reproduce the behavior:
Run MWE extraction on a corpus that contains misspelled or foreign words and inspect the top-ranked results.

Expected behavior

Top MWE results should be common expressions consisting of correct words.

Examples

Light Verb Constructions: LOCK THE DOOOOR

Possible Solutions

The proposed solution is to check the components of MWEs against a lexicon of the selected language to ensure they are actual words and not made-up words.
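A minimal sketch of what such a lexicon check could look like, assuming MWE candidates are whitespace-separated strings and the lexicon is a plain set of known words. The function `filter_mwes` and the toy lexicon are hypothetical and not part of this project's API; a real lexicon would be loaded from a word list for the selected language.

```python
def filter_mwes(candidates, lexicon):
    """Keep only candidates whose every token appears in the lexicon.

    Matching is case-insensitive and tokens are split on whitespace
    (both are assumptions made for this sketch).
    """
    known = {word.lower() for word in lexicon}
    kept = []
    for mwe in candidates:
        tokens = mwe.lower().split()
        if tokens and all(token in known for token in tokens):
            kept.append(mwe)
    return kept


# Toy lexicon and candidates for illustration only.
lexicon = {"lock", "the", "door", "take", "a", "walk"}
candidates = ["LOCK THE DOOOOR", "lock the door", "take a walk"]

filtered = filter_mwes(candidates, lexicon)
print(filtered)
```

Here "LOCK THE DOOOOR" is dropped because "doooor" is not in the lexicon, while the correctly spelled candidates are kept. A strict filter like this would also discard valid out-of-lexicon tokens such as proper nouns, so the lexicon's coverage matters.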

meghdadFar added the labels bug, enhancement, and up for grabs on Apr 8, 2024.
meghdadFar changed the title from "Bug Report: Sometimes, MWEs are pretty rare and uncommon or misspelled phrases" to "Bug Report: Sometimes, MWEs are uncommon, wrong or misspelled phrases" on Apr 8, 2024.
meghdadFar changed the title from "Bug Report: Sometimes, MWEs are uncommon, wrong or misspelled phrases" to "Bug Report: Sometimes, MWEs are wrong or misspelled phrases" on Apr 8, 2024.