Telugu support #1
Thanks for the initiative!
And that should be it. It might also be worth scraping a Telugu news website to build a classification model on top of the LM. Let me know if I can help you with anything along the way!
Thanks a ton for the detailed pointers. @binga said he'd clean up what he already has here: https://github.com/binga/fastai_notes/tree/master/experiments/notebooks/lang_models and send a PR to iNLTK (reference). I'll follow his lead and take up any tasks he needs help with.
Okay, that'd be great!
I have built a Telugu dataset of 158,000 articles scraped from a newspaper website: https://github.com/AnushaMotamarri/Telugu-Newspaper-Article-Dataset. This dataset should be useful for classification. It is divided by year (3 years), and the data for each year is further divided into several categories. Each file has a date and time, a title, and the content. I have also built another dataset of around 26,000 files scraped from 300 novels: https://github.com/AnushaMotamarri/Telugu-Books-Dataset. The datasets can be downloaded directly from https://drive.google.com/file/d/1IbqM335M7imzG-2ZV0d8-JbRqCnyAii3/view and https://drive.google.com/file/d/1MDiP-_S2RtAN7c9TLnKi8I2pxIgONIP0/view, respectively. Here is the tokenizer I built for Telugu: https://github.com/AnushaMotamarri/TeluguTokenizer. I am currently working on a lemmatizer for Telugu.
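For anyone planning to use the news dataset for classification, here is a minimal sketch of indexing it by category. It assumes (not confirmed in this thread) that the unpacked download follows a `<root>/<year>/<category>/<article file>` directory layout matching the description above. `index_articles` and `category_counts` are hypothetical helper names, and parsing the date/title/content inside each file is left out because the exact file format isn't specified here.

```python
from collections import Counter
from pathlib import Path


def index_articles(root):
    """Walk an ASSUMED <root>/<year>/<category>/<article file> layout and
    return one (year, category, path) tuple per article file."""
    records = []
    for year_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for cat_dir in sorted(p for p in year_dir.iterdir() if p.is_dir()):
            for article in sorted(p for p in cat_dir.iterdir() if p.is_file()):
                records.append((year_dir.name, cat_dir.name, article))
    return records


def category_counts(records):
    """Count articles per category across all years (a class-balance check)."""
    return Counter(category for _, category, _ in records)
```

Checking the per-category counts first shows whether the classes are balanced before training a classifier on top of the LM; the `(year, category, path)` tuples can then be split by year for train/validation.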
@AnushaMotamarri Thanks for reaching out! Would you like to contribute by building the language model? @binga will be contributing the LM to iNLTK, so to avoid duplicating effort, it'd be great if you could contribute Telugu NER or translation.
Yes.
No, I'd only just started on it, so there's nothing in iNLTK yet.
OK, I will work on them.
Hey, I would love to contribute my part. Can I please collaborate with you all?
@Asrst It would be worth tagging and asking @AnushaMotamarri or @binga whether you can help them out with something, or elaborating on how you would like to contribute!
I would like to contribute as well. @goru001 maybe a Gitter channel would make conversation here easier and faster 🤔
@sainathadapa Yes, right! Here it is!
Hi all, I just wanted to introduce myself and see whether I can help add Telugu support. Please let me know if you have any initial thoughts on where I could contribute. PS: I posted on the Gitter channel but wasn't sure it was being monitored, so I'm posting here too.
Hi @praveenc1, it would be worth tagging and asking @AnushaMotamarri or @binga whether you can help them out with something. Otherwise, you could start on NER, coreference resolution, etc.; almost everything is unexplored territory.
@goru001 After training a model on a new language, how do I integrate it into iNLTK to get sentence vectors?
I haven't found a great source of Telugu text. We should build a collection by scraping Telugu web pages.
Hi, I can help you with Telugu language sources.
Hi all, |
Has any previous work been done on Tenglish (Telugu typed in English)? It's the form of Telugu we use every day on WhatsApp and similar apps.
With the latest release of iNLTK (v0.9), Telugu support has been added, thanks to @Shubhamjain27. Hence, closing this issue.
@hariperavali Tenglish (Telugu + English) support isn't there yet; code-mixed support was added for Hinglish, Tanglish and Manglish in v0.9. Feel free to work on it and raise a PR if you want to.
I need a Telugu SentiWordNet. I have tried many sites without finding one; please help me.
Hey, great repository.
I'd like to add Telugu support. If there's a framework I should follow to download Telugu Wikipedia and train a model on it, I'd love some instructions so I can get going.