Improve the Accuracy of The Gender Tagger #10

Closed
4 tasks
djokester opened this issue Jun 23, 2018 · 1 comment

Comments

djokester commented Jun 23, 2018

The list of (word, gender) tuples is currently available here.
In accordance with Issue #9, we will move this file to Sangita Data.
We will also create a new repository for Hindi word vectors and another for machine learning models. These will be referenced in a separate issue.
Along with this, we will remove the scikit-learn dependency and work only with Keras.
The task list is given below:

  • Move gender.py to Sangita Data - Cakewalk.

  • Create a fresh set of word vectors and store it in a new repository dedicated to word vectors. - Pro.

  • Train a model on the word vectors to predict the gender tags, and store the model in a separate repository. - Intermediate.

  • Refactor the code here to accommodate these changes. - Intermediate.
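
As a rough illustration of the training step in the task list, the (word, gender) tuples would first need to be joined with the word vectors to form feature rows and class ids. The sketch below uses only the standard library. The function name `build_training_set`, the tag strings `"m"` and `"f"`, and the toy vectors are all assumptions made for illustration and are not taken from gender.py.

```python
from typing import Dict, List, Sequence, Tuple


def build_training_set(
    tagged_words: Sequence[Tuple[str, str]],
    vectors: Dict[str, List[float]],
) -> Tuple[List[List[float]], List[int], List[str]]:
    """Join (word, gender) tuples with word vectors.

    Returns feature rows, integer class ids, and the label list that maps
    each class id back to its gender tag. Words without a vector are skipped.
    """
    # Derive the label set from the data instead of hard-coding a tag set.
    labels = sorted({gender for _, gender in tagged_words})
    label_ids = {label: i for i, label in enumerate(labels)}

    features: List[List[float]] = []
    targets: List[int] = []
    for word, gender in tagged_words:
        vector = vectors.get(word)
        if vector is None:
            continue  # out-of-vocabulary word: no vector to train on
        features.append(list(vector))
        targets.append(label_ids[gender])
    return features, targets, labels


# Toy data for illustration only; the tag strings and vectors are made up.
tagged = [("लड़का", "m"), ("लड़की", "f"), ("किताब", "f"), ("घर", "m")]
toy_vectors = {
    "लड़का": [0.9, 0.1],
    "लड़की": [0.1, 0.9],
    "घर": [0.8, 0.3],
}
X, y, labels = build_training_set(tagged, toy_vectors)
```

The resulting `X` and `y` could then be converted to arrays and passed to a Keras classifier. Skipping words that have no vector is one possible design choice; a subword-based fallback would be another.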

@djokester added this to To Do in GirlScript Summer of Code 2018 via automation Jun 23, 2018
@djokester changed the title from "HindMonoCorp Extraction" to "Improve the Accuracy of The Gender Tagger" Jun 23, 2018
@SangitaNLP deleted a comment from MansiBreja Jun 24, 2018
@djokester added and removed the Pro label Jun 25, 2018
djokester commented Jun 25, 2018

For the word vectors, we will initially use sentences from HDTB.
Once @MansiBreja is done with the HindMonoCorp extraction, we can use its sentences too and retrain the word vectors.
We might also need to scrape more sentences for this.
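
To make the sentence-collection step concrete, here is a minimal sketch of turning treebank lines into tokenized sentences, the list-of-token-lists shape that word-vector trainers typically consume. It assumes a CoNLL-style layout (tab-separated columns, word form in the second column, blank line between sentences). The actual HDTB files may differ, and `read_conll_sentences` is a hypothetical helper, not part of this project.

```python
from typing import Iterable, Iterator, List


def read_conll_sentences(
    lines: Iterable[str], form_column: int = 1
) -> Iterator[List[str]]:
    """Yield one list of word forms per sentence from CoNLL-style lines."""
    sentence: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # comment or metadata line
        columns = line.split("\t")
        if len(columns) > form_column:
            sentence.append(columns[form_column])
    if sentence:
        yield sentence  # last sentence when the input has no trailing blank line


# Made-up sample in the assumed layout: id, form, lemma, coarse tag.
sample = [
    "1\tराम\tराम\tn\n",
    "2\tघर\tघर\tn\n",
    "3\tगया\tजा\tv\n",
    "\n",
    "1\tसीता\tसीता\tn\n",
    "2\tआई\tआ\tv\n",
]
sentences = list(read_conll_sentences(sample))
```

Because the function accepts any iterable of lines, the same code would work on an open file handle, and sentences from another corpus could be appended to the same list before retraining.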
