Natural-Language-Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/
Latest commit abe93dd Feb 20, 2019
Type Name Latest commit message Commit time
accuracy add emotion analysis, release version 1.2 Jan 6, 2019
crawl
dataset update dataset docs Feb 7, 2019
docs fix readthedocs Feb 20, 2019
example added batch processing for word2vec calculator Feb 17, 2019
importtime add readme importtime Jan 6, 2019
malaya update readme Feb 19, 2019
session fix toxic fast-text session Feb 17, 2019
tests
translator add emotion analysis, release version 1.2 Jan 6, 2019
.gitignore release version 1.7, added word-mover distance, text similarity and etc Feb 15, 2019
.travis.yml fixing script Dec 11, 2018
LICENSE
README.rst fix readthedocs Feb 20, 2019
build-dependencies.sh release first beta, version 1.0 Dec 25, 2018
build-package.sh added anaconda build Jan 2, 2019
generate-rst.sh release version 0.9 Dec 19, 2018
readme-pypi.rst
readthedocs.yml fix readthedocs Jan 16, 2019
setup-gpu.py
setup.py added manual physical cores able to use for w2v Feb 18, 2019

README.rst



Malaya is a natural-language toolkit library for Bahasa Malaysia, powered by deep learning with TensorFlow.

Documentation

Proper documentation is available at https://malaya.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya

GPU version

$ pip install malaya-gpu

Only Python 3.6.x is supported.
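
Because of the version constraint above, it can help to check the interpreter before installing. The following is a minimal sketch using only the standard library; the helper name `is_supported` is hypothetical and is not part of Malaya.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter is some Python 3.6.x release.

    `is_supported` is a hypothetical helper for illustration only;
    it is not provided by Malaya.
    """
    return tuple(version_info[:2]) == (3, 6)


if not is_supported():
    # Only warn here; installation itself is done with pip as shown above.
    print("Warning: this README states that only Python 3.6.x is supported; "
          "found %d.%d" % tuple(sys.version_info[:2]))
```

Passing a tuple such as `(3, 6, 8)` lets the check be exercised without switching interpreters.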

Features

  • Emotion Analysis

    Uses BERT, Fast-Text, Dynamic-Memory Network, Sparse Tensorflow, and Attention Neural Network models for deep emotion analysis.

  • Entities Recognition

    State-of-the-art CRF and deep learning models for Named Entity Recognition.

  • Language Detection

    Uses Multinomial, SGD, XGB, and Fast-text N-gram models to distinguish Malay, English, and Indonesian.

  • Normalizer

    Uses local Malaysian NLP research to normalize any Bahasa text.

  • Num2Word

    Converts numbers to their cardinal or ordinal representation.

  • Part-of-Speech Recognition

    State-of-the-art CRF and deep learning models for Part-of-Speech tagging.

  • Dependency Parsing

    State-of-the-art CRF and deep learning models that analyze the grammatical structure of a sentence and establish relationships between words.

  • Sentiment Analysis

    Uses BERT, Fast-Text, Dynamic-Memory Network, Sparse Tensorflow, and Attention Neural Network models for deep sentiment analysis.

  • Spell Correction

    Uses local Malaysian NLP research to auto-correct any Bahasa words.

  • Stemmer

  • Subjectivity Analysis

    Uses BERT, Fast-Text, Dynamic-Memory Network, Sparse Tensorflow, and Attention Neural Network models for deep subjectivity analysis.

  • Summarization

    Uses state-of-the-art skip-thought with attention to give precise unsupervised summarization.

  • Topic Modelling

    Provides LDA2Vec, LDA, NMF, and LSA interfaces for easy topic modelling with topic visualization.

  • Topic and Influencers Analysis

    Uses deep learning and machine learning models to understand topic and influencer similarity in sentences.

  • Toxicity Analysis

    Uses BERT, Fast-Text, Dynamic-Memory Network, and Attention Neural Network models for deep toxicity analysis.

  • Word2Vec

    Provides pretrained Bahasa Wikipedia and Bahasa news Word2Vec, with an easy interface and visualization.

  • Fast-text

    Provides pretrained Bahasa Wikipedia Fast-text, with an easy interface and visualization.
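
To make the Num2Word feature in the list above more concrete, here is a minimal, self-contained sketch of converting small integers to Malay cardinal and ordinal words. It is an illustration only and is not Malaya's implementation: the function names `cardinal_ms` and `ordinal_ms` are hypothetical, and the sketch is deliberately limited to 0 through 999.

```python
# Hypothetical illustration; these names are NOT Malaya's API.
UNITS = ["kosong", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]


def cardinal_ms(n):
    """Spell out an integer in 0..999 as Malay cardinal words."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        # 12..19 are formed as <unit> + "belas"
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        words = UNITS[tens] + " puluh"
        return words if rest == 0 else words + " " + UNITS[rest]
    hundreds, rest = divmod(n, 100)
    # 100 uses the contracted prefix "se-" instead of "satu ratus"
    words = "seratus" if hundreds == 1 else UNITS[hundreds] + " ratus"
    return words if rest == 0 else words + " " + cardinal_ms(rest)


def ordinal_ms(n):
    """Spell out an integer in 1..999 as a Malay ordinal."""
    if n < 1:
        raise ValueError("ordinals start at 1")
    # "pertama" (first) is irregular; the rest prefix "ke" to the cardinal
    return "pertama" if n == 1 else "ke" + cardinal_ms(n)


print(cardinal_ms(21))
print(ordinal_ms(2))
```

A table-driven approach like this keeps the irregular forms (`sepuluh`, `sebelas`, `seratus`, `pertama`) explicit, which is where a naive digit-by-digit conversion tends to go wrong.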

License

License