Experiment with OpenNLP Language detection #512
@@ -9,8 +9,7 @@ dependencies {
     compile "com.googlecode.libphonenumber:carrier:$googleCarrierVersion"

     // Optimaize language detection
-    compile "com.salesforce.transmogrifai:language-detector:$optimaizeLangDetectorVersion"
+    compile "com.optimaize.languagedetector:language-detector:$optimaizeLangDetectorVersion"
I made some fixes to the Optimaize language detector, so we had to publish our own version. I think they were related to Guava.
I was hoping to get rid of this fork by shading if necessary.
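For reference, a minimal sketch of what shading could look like here, assuming the Gradle Shadow plugin is used and that Guava (`com.google.common`) is the conflicting package, as suggested above. The plugin being on the buildscript classpath and the `shaded.` target package are assumptions for illustration, not part of this PR:

```groovy
// Assumption: the Shadow plugin is already on the buildscript classpath.
apply plugin: 'com.github.johnrengelman.shadow'

shadowJar {
    // Hypothetical target package. Relocating Guava inside the shaded jar
    // keeps it from clashing with another Guava version on the classpath.
    relocate 'com.google.common', 'shaded.com.google.common'
}
```

Note that this relocates Guava for everything bundled into the shaded jar, not only for the language detector.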
Sure. No development is happening on that repo, so not a big deal imo.
Let us know if it works better or worse than the Optimaize one. And please update the PR description to explain the motivation behind it.
It seems to support more languages. Is that correct?
Codecov Report
@@             Coverage Diff             @@
##           master     #512       +/-   ##
===========================================
- Coverage   86.74%   25.68%   -61.06%
===========================================
  Files         347      349        +2
  Lines       11859    11886       +27
  Branches      388      612      +224
===========================================
- Hits        10287     3053     -7234
- Misses       1572     8833     +7261
Cool - curious to see how this compares. Another one we could try is FastText, which also has a language detection module: https://github.com/facebookresearch/fastText/#full-documentation
No description provided.