Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional Language Detect #932

Merged
merged 141 commits into from Sep 6, 2021
Merged

Optional Language Detect #932

merged 141 commits into from Sep 6, 2021

Conversation

gavishpoddar
Copy link
Contributor

@gavishpoddar gavishpoddar commented Jun 3, 2021

Implementing optional language detection and allow the creation of custom language detection libraries by users.

This work is currently in progress and this PR is created for the monitoring purpose by the maintainers to suggest progress.

This PR is currently not complete.

TODOS:

  • Restructuring protect to pass function for language detection.
  • Managing HTTP exception for fasttext model download.
  • Updating setup.py for optional language detection.
  • Mock test of parse, search_dates and DateDataParser.
  • Unit tests for language detect functions.
  • Functionality analysis of the implemented language detection.
  • Testing optional language detection settings.
  • Make apply_setting independent functions.
  • Fasttext model download manager for models.
  • Documentation optional language detection settings.
  • Language Mapping (CLDR & ISO 639).
  • Check working with other settings.
  • Removing unsupported locale from language detecting
  • Improving docs.
  • Setting fasttext default language.
  • Removing langauge_map.json and sinking to language_data.py
  • langdetect set default DetectorFactory without changing global state.
  • dateparser-download set default caching folder.
  • Documenting dateparser-download.
  • Changes for preventing breaking changes.
  • detect_languages_func -> detect_languages_function

language/fast_text.py Outdated Show resolved Hide resolved
@lopuhin
Copy link
Member

lopuhin commented Jun 4, 2021

regarding adding dateparser/custom_lang_detect/data/lid.176.ftz - I'm not sure we should package this in dateparser even though it's not too big, but this is a minor thing which can be dealt with later. It's good to be able to download that automatically though.

@gavishpoddar
Copy link
Contributor Author

HI @lopuhin, I am actually still developing and testing so I have not deleted it but will do that but in the end, I will clean codes and remove them.

dateparser/__init__.py Outdated Show resolved Hide resolved
@noviluni
Copy link
Collaborator

noviluni commented Jun 4, 2021

Hey @gavishpoddar good job! 💪

I left some comments. We will iterate over this code and probably the result will be quite different, but really good to see that you are getting results so quickly 🚀

Let me know if there's something you need any clarification on. 😄

@gavishpoddar gavishpoddar changed the title WIP : Optional Language Detect [WIP] Optional Language Detect Jun 6, 2021
@gavishpoddar
Copy link
Contributor Author

gavishpoddar commented Jun 8, 2021

Hi @lopuhin and @noviluni, I have made some changes can you suggest. I have also tried to cache the model without the global keyword instead used a class for the same.

I have started writing additional tests will try to complete them and cleaning up the code by Friday.

dateparser/data/languages_info.py Outdated Show resolved Hide resolved
dateparser/data/languages_info.py Outdated Show resolved Hide resolved
Co-authored-by: Marc Hernández <noviluni@gmail.com>
tests/test_settings.py Outdated Show resolved Hide resolved
tests/test_settings.py Show resolved Hide resolved
Copy link
Collaborator

@noviluni noviluni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job! ✔️ 🚀

Really happy to see this happening 🎉 😄

Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great PR @gavishpoddar , added some minor notes, I think it's basically ready to be merged.

dateparser_cli/cli.py Outdated Show resolved Hide resolved
dateparser_cli/cli.py Outdated Show resolved Hide resolved
dateparser_cli/cli.py Outdated Show resolved Hide resolved
docs/settings.rst Outdated Show resolved Hide resolved
dateparser_cli/cli.py Outdated Show resolved Hide resolved
Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @gavishpoddar 👍

@kishan3 kishan3 self-requested a review September 3, 2021 05:38
dateparser_cli/cli.py Outdated Show resolved Hide resolved
@lopuhin lopuhin merged commit 544ea39 into scrapinghub:master Sep 6, 2021
@lopuhin
Copy link
Member

lopuhin commented Sep 6, 2021

@gavishpoddar congrats on the merge, great work 👍

@noviluni sorry that I didn't ask before merging, I hope a merge commit is fine, or would you prefer a squash and merge?

@gavishpoddar
Copy link
Contributor Author

Thanks for the merge :)

@noviluni
Copy link
Collaborator

Hi @lopuhin, sorry for the late answer. I prefer to squash because in that way it is easier for me to build the changelog when releasing a new version, but no problem, it is fine :)

(Congrats Gavish! 💪 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants