0.1.0 update #6

Open
ryszardtuora opened this issue Jan 13, 2020 · 0 comments

Comments

@ryszardtuora
Collaborator

0.1.0 UPDATE

Today we've released a new version which includes some important changes, mainly to the Morfeusz version.

The Morfeusz-based version gains two extensions in this release. The first is full morphosyntactic analysis (i.e. tagging with features such as grammatical case, gender, number or tense) in the NKJP tagset. This is done by using Toygger, the winner of PolEval 2017, as our new tagger; it requires TensorFlow to work. The second is a custom flexion component, whose purpose is to inflect words into a desired pattern. For details on both, please see the updated Jupyter notebook for Morfeusz.

Because of the new tagger, the Morfeusz version now runs substantially slower. It may also be prone to errors on out-of-vocabulary (OOV) words. If you expect your texts to contain many such words, we suggest trying out the basic model for tagging and comparing the results. We aim to mitigate these issues in future releases, and to further increase the capabilities of our integrated version of Toygger.
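If you want to compare the two taggers on your own texts, a minimal sketch could look like the one below. It is only an illustration: the helper names `tag_with_spacy` and `tag_disagreements` are hypothetical, the model names in the usage comment are assumptions that should be replaced with whatever names your installation uses, and the comparison assumes both pipelines tokenize the text identically, which may not hold in practice.

```python
from typing import List, Tuple

# A tagged text is a list of (token_text, tag) pairs.
Tagged = List[Tuple[str, str]]


def tag_with_spacy(model_name: str, text: str) -> Tagged:
    """Tag `text` with an installed spaCy model and return (token, tag) pairs.

    spaCy is imported lazily so the comparison helper below can be used
    without it. `model_name` is whatever name the model was installed under.
    """
    import spacy

    nlp = spacy.load(model_name)
    return [(token.text, token.tag_) for token in nlp(text)]


def tag_disagreements(tagged_a: Tagged, tagged_b: Tagged) -> List[Tuple[str, str, str]]:
    """Return (token, tag_a, tag_b) for every position where the tags differ.

    Assumption: both inputs have the same tokenization. If they do not,
    a ValueError is raised instead of silently misaligning tokens.
    """
    tokens_a = [token for token, _ in tagged_a]
    tokens_b = [token for token, _ in tagged_b]
    if tokens_a != tokens_b:
        raise ValueError("tokenizations differ; align the tokens before comparing")
    return [
        (token, tag_a, tag_b)
        for (token, tag_a), (_, tag_b) in zip(tagged_a, tagged_b)
        if tag_a != tag_b
    ]


# Example usage (model names are assumptions, adjust to your installation):
# text = "Ala ma kota."
# basic = tag_with_spacy("pl_spacy_model", text)
# morfeusz = tag_with_spacy("pl_spacy_model_morfeusz", text)
# for token, tag_basic, tag_morfeusz in tag_disagreements(basic, morfeusz):
#     print(token, tag_basic, tag_morfeusz)
```

Looking at the disagreements on a sample with many rare or foreign words should give a rough idea of which model suits your texts better.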

Additionally, the size of both models has been substantially reduced: we've found that we can use even fewer vectors without degrading performance. To fill the slot of a big model, we plan to release a larger model based on 300-dimensional vectors soon.

We are working towards having our model officially recognized as the spaCy model for Polish. We've already sent the reconstruction scripts to the authors of spaCy, and we hope that our model will soon gain their endorsement.

You can download the model here, and the evaluation metrics are available here.

Please do experiment with it, and let us know about your results!
