gayatrivenugopal/hindi-corpus-stoplemmas

Hindi Aesthetics Corpus and Stop Lemma List

This repository contains an Aesthetics corpus created using text from the following sources:

  1. http://hindisamay.com, an e-library maintained by Mahatma Gandhi Antarrashtriya Hindi Vishwa Vidyalaya, Wardha
  2. http://premchand.co.in, a website dedicated to the popular novelist Premchand’s stories, and
  3. Bhandarkar Oriental Research Institute’s Digital Library (http://borilib.com)

The repository also contains an exhaustive stop word list prepared from the sources listed below:

  1. Wiktionary Top 1900
  2. https://1000mostcommonwords.com/1000-most-common-hindi-words
  3. https://blogs.transparent.com/hindi/first-100-high-frequency-words-in-hindi
  4. http://home.iitk.ac.in/~prasant/HindiCorpus/word.html
  5. https://github.com/oprogramador/most-common-words-by-language
  6. https://github.com/Alir3z4/stop-words
  7. https://github.com/stopwords-iso/stopwords-hi/blob/master/stopwords-hi.txt
  8. https://github.com/Xangis/extra-stopwords
  9. https://data.mendeley.com/datasets/bsr3frvvjc/1
  10. https://www.ranks.nl/stopwords/hindi
  11. Frequency list generated from Wiki Dump August 2019
  12. Aesthetics Corpus (custom)
  13. http://opus.nlpl.eu/
  14. CFILT Hindi Corpus (http://www.cfilt.iitb.ac.in/Downloads.html)
  15. CFILT Hindi English Parallel Corpus (Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya. The IIT Bombay English-Hindi Parallel Corpus. Language Resources and Evaluation Conference. 2018)
  16. TDIL English Hindi Tourism Text Corpus
  17. TDIL Hindi English ILCI II Corpus on Agriculture and Entertainment
  18. TDIL Hindi Monolingual Text Corpus ILCI II
  19. TDIL Hindi English Health ILCI
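
A stop word list of this kind is typically applied by dropping listed words from tokenized text. The following is a minimal sketch of that use, not the authors' own tooling. It assumes the list is stored as a UTF-8 text file with one entry per line; that file layout, the function names, and the three-word sample set are hypothetical and chosen only for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Set


def load_stop_words(path: str) -> Set[str]:
    """Read stop words from a UTF-8 file, one entry per line (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Ignore blank lines and surrounding whitespace.
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not in the stop word set, preserving order."""
    return [token for token in tokens if token not in stop_words]


# Illustrative sample only; the repository's actual list is far larger.
sample_stop_words = {"यह", "की", "है"}

# Whitespace tokenization is a simplification for this sketch.
tokens = "यह किताब राम की है".split()
filtered = remove_stop_words(tokens, sample_stop_words)
print(filtered)
```

To use the real list, `load_stop_words` would be pointed at the list file in this repository, provided it matches the one-entry-per-line assumption.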

The "Linguistic Resources" obtained from TDIL have been developed and made available by TDIL, MeitY, Government of India.

Co-Authors: Dr. Jatinderkumar R. Saini, Dr. Dhanya Pramod

Copyright © 2019, Gayatri Venugopal

This work is licensed under GNU GPL v3: https://www.gnu.org/licenses/gpl-3.0.html
