Experiment with OpenNLP Language detection #512

Closed
wants to merge 3 commits into from

gerashegalov
Contributor

No description provided.

@@ -9,8 +9,7 @@ dependencies {
compile "com.googlecode.libphonenumber:carrier:$googleCarrierVersion"

// Optimaize language detection
- compile "com.salesforce.transmogrifai:language-detector:$optimaizeLangDetectorVersion"
+ compile "com.optimaize.languagedetector:language-detector:$optimaizeLangDetectorVersion"
Collaborator

@tovbinm tovbinm Sep 12, 2020

There were some fixes I made to the Optimaize language detector, so we had to publish our own version. I think it was related to Guava.

Contributor Author

I was hoping to get rid of this fork by shading if necessary.

Collaborator

Sure. No development is happening on that repo, so not a big deal imo.
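
For reference, a minimal sketch of what shading could look like in the Gradle build, assuming the conflict really is with Guava (as suggested above) and assuming the Shadow Gradle plugin is used. The plugin version and the `shaded.guava` prefix are illustrative choices, not something taken from this PR:

```groovy
// Assumption: the Shadow plugin is applied to the module that bundles the
// language detector. The version below is illustrative only.
plugins {
    id 'com.github.johnrengelman.shadow' version '6.0.0'
}

dependencies {
    // Upstream artifact instead of the fork, as in the diff above
    compile "com.optimaize.languagedetector:language-detector:$optimaizeLangDetectorVersion"
}

shadowJar {
    // Rewrite Guava's package (com.google.common) to a private prefix so the
    // copy bundled in the shadow jar cannot clash with another Guava version
    // on the classpath. 'shaded.guava' is a hypothetical prefix.
    relocate 'com.google.common', 'shaded.guava'
}
```

Whether relocation alone would be enough to replace the fork depends on what the fork actually changed, which this thread does not spell out.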

Collaborator

tovbinm commented Sep 12, 2020

Let us know whether it works better or worse than the Optimaize one. And please update the PR description to explain the motivation behind it.

Collaborator

tovbinm commented Sep 12, 2020

It seems that it has more languages supported. Is this correct?


codecov bot commented Sep 12, 2020

Codecov Report

Merging #512 into master will decrease coverage by 61.05%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           master     #512       +/-   ##
===========================================
- Coverage   86.74%   25.68%   -61.06%     
===========================================
  Files         347      349        +2     
  Lines       11859    11886       +27     
  Branches      388      612      +224     
===========================================
- Hits        10287     3053     -7234     
- Misses       1572     8833     +7261     
Impacted Files Coverage Δ
...lesforce/op/stages/impl/feature/LangDetector.scala 0.00% <0.00%> (-100.00%) ⬇️
.../op/stages/impl/feature/NameEntityRecognizer.scala 0.00% <0.00%> (-100.00%) ⬇️
...esforce/op/stages/impl/feature/TextTokenizer.scala 0.00% <0.00%> (-97.37%) ⬇️
...sforce/op/utils/text/OpenNLPLanguageDetector.scala 0.00% <0.00%> (ø)
...a/com/salesforce/op/utils/text/OpenNLPModels.scala 0.00% <0.00%> (-97.62%) ⬇️
...orce/op/utils/text/OptimaizeLanguageDetector.scala 0.00% <0.00%> (-90.91%) ⬇️
...om/salesforce/op/utils/text/LanguageDetector.scala 0.00% <0.00%> (ø)
...main/scala/com/salesforce/op/dsl/RichFeature.scala 0.00% <0.00%> (-100.00%) ⬇️
...main/scala/com/salesforce/op/filters/Summary.scala 0.00% <0.00%> (-100.00%) ⬇️
.../scala/com/salesforce/op/cli/gen/ProblemKind.scala 0.00% <0.00%> (-100.00%) ⬇️
... and 210 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4d46181...6f42be6.

tovbinm marked this pull request as draft September 12, 2020 00:16
@Jauntbox
Contributor

Cool - curious to see how this compares. Another one we could try is FastText, which also has a language detection module: https://github.com/facebookresearch/fastText/#full-documentation
