Natural-Language-Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/



Malaya is a Natural-Language-Toolkit library for bahasa Malaysia, powered by deep learning with TensorFlow.

Documentation

Full documentation is available at https://malaya.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya

GPU version

$ pip install malaya-gpu

Only Python 3.6 or later with TensorFlow 1.x is supported.
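Before installing, it can help to confirm that an environment matches the stated requirement of Python 3.6 or later with TensorFlow 1.x. The sketch below is a minimal check under that assumption; the helper name `meets_requirements` is hypothetical and not part of Malaya, and it only inspects the major version of the TensorFlow version string.

```python
def meets_requirements(python_version, tf_version):
    """Hypothetical helper: check versions against the README's stated
    support (Python 3.6 or later, TensorFlow 1.x).

    python_version: a tuple such as sys.version_info or (3, 6)
    tf_version: a version string such as tf.__version__, e.g. "1.13.1"
    """
    # Compare only (major, minor) of the interpreter version.
    if tuple(python_version[:2]) < (3, 6):
        return False
    # Only the TensorFlow major version matters here: it must be 1.
    major = int(tf_version.split(".")[0])
    return major == 1
```

In an installed environment, this could be called as `meets_requirements(sys.version_info, tf.__version__)` after `import sys` and `import tensorflow as tf`.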

Features

  • Emotion Analysis

    Builds deep emotion analysis models from BERT, Fast-Text, Dynamic-Memory Network, Sparse TensorFlow and Attention Neural Network.

  • Entities Recognition

    State-of-the-art CRF deep learning models for Named Entity Recognition.

  • Language Detection

    Uses Multinomial, SGD and XGB classifiers plus Fast-text N-gram deep learning to distinguish Malay, English and Indonesian.

  • Normalizer

    Uses local Malaysian NLP research to normalize bahasa text.

  • Num2Word

    Converts numbers to their cardinal or ordinal representation.

  • Part-of-Speech Recognition

    State-of-the-art CRF deep learning models for Part-of-Speech tagging.

  • Dependency Parsing

    State-of-the-art CRF deep learning models that analyze the grammatical structure of a sentence, establishing relationships between words.

  • ELMO (biLM)

    Provides ELMO models pretrained on bahasa Wikipedia and bahasa news, with an easy interface and visualization.

  • Sentiment Analysis

    Builds deep sentiment analysis models from BERT, Fast-Text, Dynamic-Memory Network, Sparse TensorFlow and Attention Neural Network.

  • Spell Correction

    Uses local Malaysian NLP research to auto-correct bahasa words.

  • Stemmer

  • Subjectivity Analysis

    Builds deep subjectivity analysis models from BERT, Fast-Text, Dynamic-Memory Network, Sparse TensorFlow and Attention Neural Network.

  • Summarization

    Uses state-of-the-art skip-thought with attention to produce precise unsupervised summaries.

  • Topic Modelling

    Provides LDA2Vec, LDA, NMF and LSA interfaces for easy topic modelling, with topic visualization.

  • Toxicity Analysis

    Builds deep toxicity analysis models from BERT, Fast-Text, Dynamic-Memory Network and Attention Neural Network.

  • Word2Vec

    Provides Word2Vec models pretrained on bahasa Wikipedia and bahasa news, with an easy interface and visualization.

  • Fast-text

    Provides Fast-text models pretrained on bahasa Wikipedia, with an easy interface and visualization.
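To illustrate what the Num2Word feature involves for Malay, here is a minimal, self-contained sketch of cardinal and ordinal conversion. It is not Malaya's implementation, and the function names `to_cardinal` and `to_ordinal` are hypothetical. It assumes standard Malay number words (`sifar` for zero, `belas`, `puluh`, `ratus` and `ribu` for teens, tens, hundreds and thousands, the `se-` prefix for one ten, hundred or thousand, `pertama` for "first" and the `ke-` prefix for other ordinals) and only handles values from 0 to 999,999.

```python
# Hypothetical sketch, not Malaya's Num2Word API.
UNITS = ["sifar", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]


def to_cardinal(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 as Malay cardinal words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0 <= n < 1,000,000")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return f"{UNITS[n - 10]} belas"          # 12..19: "dua belas" etc.
    if n < 100:
        tens, rest = divmod(n, 10)
        head = f"{UNITS[tens]} puluh"             # 20, 30, ...: "dua puluh"
        return head if rest == 0 else f"{head} {to_cardinal(rest)}"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else f"{UNITS[hundreds]} ratus"
        return head if rest == 0 else f"{head} {to_cardinal(rest)}"
    thousands, rest = divmod(n, 1000)
    head = "seribu" if thousands == 1 else f"{to_cardinal(thousands)} ribu"
    return head if rest == 0 else f"{head} {to_cardinal(rest)}"


def to_ordinal(n: int) -> str:
    """Malay ordinal: 'pertama' for 1, otherwise 'ke' + cardinal."""
    if n < 1:
        raise ValueError("ordinals start at 1")
    if n == 1:
        return "pertama"
    return "ke" + to_cardinal(n)
```

For example, `to_cardinal(2019)` returns `"dua ribu sembilan belas"` and `to_ordinal(3)` returns `"ketiga"`.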
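The Language Detection entry mentions Multinomial models over N-grams. One minimal way to sketch that idea is a multinomial naive Bayes classifier over character trigrams with add-one (Laplace) smoothing. The class name `NgramNaiveBayes` and the three-sentence training sets are hypothetical toy data for illustration only, not Malaya's models or dataset; a usable detector needs far more text per language.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Lowercase, pad with spaces, and slice into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical toy multinomial naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.doc_counts = Counter()          # label -> number of samples
        self.vocab = set()

    def fit(self, samples):
        """samples: dict mapping a label to a list of training texts."""
        for label, texts in samples.items():
            for text in texts:
                grams = char_ngrams(text, self.n)
                self.counts[label].update(grams)
                self.vocab.update(grams)
                self.doc_counts[label] += 1
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counter[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training sentences (not a real dataset).
TRAIN = {
    "malay": ["saya suka makan nasi lemak", "kereta itu sangat laju",
              "kami pergi ke sekolah"],
    "english": ["i like to eat fried rice", "that car is very fast",
                "we go to school"],
    "indonesian": ["saya suka naik mobil", "mobil itu sangat cepat",
                   "kami pergi ke kantor"],
}
```

With this toy data, `NgramNaiveBayes().fit(TRAIN).predict("mobil itu cepat")` returns `"indonesian"`, because words such as `mobil` and `cepat` only appear in the Indonesian samples.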
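At its simplest, text normalization expands common Malay shortforms into their full words. The sketch below assumes a small hypothetical lookup table (`SHORTFORMS`) and a regex word tokenizer; it is not Malaya's normalizer, which the README describes as based on local Malaysian NLP research, and only illustrates the lookup step.

```python
import re

# Hypothetical shortform table for illustration; far from complete.
SHORTFORMS = {
    "yg": "yang",
    "dgn": "dengan",
    "utk": "untuk",
    "x": "tak",
    "sbb": "sebab",
    "tp": "tapi",
}


def normalize(text, table=SHORTFORMS):
    """Replace each word found (case-insensitively) in `table` with its expansion."""
    def replace(match):
        word = match.group(0)
        # Unknown words are returned unchanged.
        return table.get(word.lower(), word)

    return re.sub(r"\w+", replace, text)
```

For example, `normalize("sy x suka yg ni")` returns `"sy tak suka yang ni"`; `sy` is left alone because it is not in the table.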
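A common baseline for auto-correcting a misspelled word is to choose the closest vocabulary word by Levenshtein edit distance, breaking ties by word frequency. The sketch below is that generic baseline, not the method Malaya uses; the names `edit_distance` and `correct` and the `VOCAB` frequencies are hypothetical toy values.

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling one-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the nearest vocabulary word, or `word` itself if none is close.

    vocabulary: dict mapping word -> frequency (higher wins ties).
    """
    if word in vocabulary:
        return word
    best = min(vocabulary,
               key=lambda w: (edit_distance(word, w), -vocabulary[w], w))
    return best if edit_distance(word, best) <= max_distance else word


# Toy vocabulary with made-up frequencies.
VOCAB = {"kerajaan": 50, "kereta": 30, "sekolah": 40, "makan": 60}
```

For example, `correct("kerta", VOCAB)` returns `"kereta"` (one insertion away), while a word with no vocabulary entry within `max_distance` edits is returned unchanged.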

License

Malaya is released under the MIT License; see the LICENSE file in the repository.
