Malayalam Corpus, along with a web-based POS tagger. ✒️📃

Makri - Malayalam Knowledge Ripper

Makri is a Malayalam part-of-speech (POS) tagger built with RDRPOSTagger and trained on over 80,000 lines of a POS-tagged silver-standard corpus.

Details.

The scraper is built using Scrapy (Python). makri-links.py collects links to Malayalam articles, and makri-sentences.py extracts Malayalam text from websites and writes it to files divided by category.
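The post-processing half of that pipeline (keeping only Malayalam sentences and writing them to one file per category) can be illustrated with a minimal, standard-library sketch. This is not the code in makri-sentences.py and it does not use Scrapy; the function names, the corpus output directory, and the punctuation-based sentence splitting are all assumptions made for illustration.

```python
import re
from pathlib import Path

# The Malayalam Unicode block spans U+0D00..U+0D7F.
MALAYALAM_CHAR = re.compile(r"[\u0D00-\u0D7F]")
# Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def malayalam_sentences(text):
    """Return the sentences in text that contain at least one Malayalam character."""
    sentences = (s.strip() for s in SENTENCE_END.split(text))
    return [s for s in sentences if s and MALAYALAM_CHAR.search(s)]


def write_by_category(category, text, out_dir="corpus"):
    """Append the Malayalam sentences from text to <out_dir>/<category>.txt.

    Both the directory name and the one-file-per-category layout are
    hypothetical; they only mirror the description above.
    """
    path = Path(out_dir) / f"{category}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    sentences = malayalam_sentences(text)
    with path.open("a", encoding="utf-8") as handle:
        for sentence in sentences:
            handle.write(sentence + "\n")
    return len(sentences)
```

In a Scrapy spider, a helper like write_by_category could be called on the text extracted from each response, with the category taken from whatever the link-collection step recorded.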

A project by Adarsh S and Jithin James, carried out under ICFOSS and supervised by Dr. Rajeev RR.

FOSSASIA Talk

Tamil NLP creator and talk mentor Ashok R

Slides

BLARK Ideology

Language Resource Classification

Language Statistics Data

Web App

A Django-based web app is provided to distribute the service. Start the development server with python2 manage.py runserver

The web app has a text input option for tagging live data and a file upload option for tagging text files.

The service also exposes a web endpoint for use in other applications.

curl -G -v "http://127.0.0.1:8000/" --data-urlencode "q=input" will return the tagged data for the given input.
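The same request can be made from another application. The following is a minimal Python 3 client sketch that mirrors the curl call above: a GET to the development server address with the input passed as the q parameter. The function names are hypothetical, and the response is returned as raw text decoded as UTF-8, which is an assumption since the response format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Development server address taken from the curl example.
BASE_URL = "http://127.0.0.1:8000/"


def build_tag_url(text, base_url=BASE_URL):
    """Build the GET URL, passing the input as the URL-encoded q parameter."""
    return base_url + "?" + urlencode({"q": text})


def tag_text(text, base_url=BASE_URL, timeout=10):
    """Send text to the endpoint and return the raw response body.

    Decoding as UTF-8 is an assumption about the server's output.
    """
    with urlopen(build_tag_url(text, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the development server running, print(tag_text("input")) should show the same body that the curl command returns.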

Team Members

@isht3, @jjmachan -- creators of the project

@abinmn -- deployable web app creator