GitHub - stjordanis/Malaya: Natural-Language-Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/

Malaya is a Natural-Language-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Documentation

Proper documentation is available at https://malaya.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya

GPU version

$ pip install malaya-gpu

Only Python 3.6.x and above and Tensorflow 1.X are supported.
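
The version requirement above can be checked before installing. The following is a minimal sketch, not part of Malaya itself: the helper names `is_supported` and `check_environment` are hypothetical, and it assumes TensorFlow exposes its version as a dotted string in `tensorflow.__version__`.

```python
import sys


def is_supported(python_version, tensorflow_version):
    """Return True if the versions satisfy the stated requirement.

    python_version: a tuple such as sys.version_info or (3, 6, 9).
    tensorflow_version: a dotted version string such as "1.15.0".
    Hypothetical helper for illustration; not a Malaya function.
    """
    py_ok = tuple(python_version[:2]) >= (3, 6)
    tf_ok = tensorflow_version.split(".")[0] == "1"
    return py_ok and tf_ok


def check_environment():
    """Check the running interpreter and the installed TensorFlow, if any."""
    try:
        import tensorflow as tf  # imported lazily so the sketch loads without it
    except ImportError:
        return False
    return is_supported(sys.version_info, tf.__version__)
```

Calling `check_environment()` returns `False` when TensorFlow is missing or is not a 1.x release, which is a reasonable signal to fix the environment before running `pip install malaya`.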

Features

Emotion Analysis

Builds deep emotion analysis models from fine-tuned BERT, Attention-Recurrent, Sparse Tensorflow and Self-Attention models.
Entities Recognition

Latest state-of-the-art CRF deep learning models for Named Entity Recognition.
Language Detection

Uses Multinomial, SGD, XGB and Fast-text N-grams deep learning models to distinguish Malay, English and Indonesian.
Normalizer

Uses local Malaysian NLP research to normalize any bahasa text.
Num2Word

Converts numbers to their cardinal or ordinal representation.
Part-of-Speech Recognition

Latest state-of-the-art CRF deep learning models for Part-of-Speech Recognition.
Dependency Parsing

Latest state-of-the-art CRF deep learning models that analyze the grammatical structure of a sentence, establishing relationships between words.
ELMO (biLM)

Provides ELMo models pretrained on bahasa Wikipedia and bahasa news, with an easy interface and visualization.
Relevancy Analysis

Builds deep relevancy analysis models from Dilated Convolutional Neural Network and Self-Attention models.
Sentiment Analysis

Builds deep sentiment analysis models from fine-tuned BERT, Attention-Recurrent, Sparse Tensorflow and Self-Attention models.
Spell Correction

Uses local Malaysian NLP research to auto-correct any bahasa word.
Stemmer

Uses a state-of-the-art character-level LSTM Seq2Seq model with attention for bahasa stemming.
Subjectivity Analysis

Builds deep subjectivity analysis models from fine-tuned BERT, Attention-Recurrent, Sparse Tensorflow and Self-Attention models.
Similarity

Uses deep LSTM siamese, deep Dilated CNN siamese, deep Self-Attention siamese, Doc2Vec and BERT to build deep semantic similarity models.
Summarization

Uses state-of-the-art skip-thought and residual-network models with attention, along with LDA, LSA and Doc2Vec, for unsupervised summarization, with TextRank as the scoring algorithm.
Topic Modelling

Provides LDA2Vec, LDA, NMF and LSA interfaces for easy topic modelling, with topic visualization.
Toxicity Analysis

Builds deep toxicity analysis models from fine-tuned BERT, Attention-Recurrent and Self-Attention models.
Word2Vec

Provides Word2Vec models pretrained on bahasa Wikipedia and bahasa news, with an easy interface and visualization.
Fast-text

Provides a Fast-text model pretrained on bahasa Wikipedia, with an easy interface and visualization.
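
To make the Num2Word feature listed above more concrete, here is a minimal, self-contained sketch of converting integers to Malay cardinal and ordinal words. It is an illustration only and does not reproduce Malaya's own implementation or API: the function names `cardinal_ms` and `ordinal_ms` are hypothetical, the range is limited to 0 to 999,999, and ordinals are assumed to follow the simple rule of "pertama" for 1 and the prefix "ke" otherwise.

```python
UNITS = ["kosong", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]


def _scaled(n, base, word):
    """Spell n as '<head> <word> <rest>', using the 'se' prefix when head is 1."""
    head, rest = divmod(n, base)
    prefix = "se" + word if head == 1 else cardinal_ms(head) + " " + word
    return prefix if rest == 0 else prefix + " " + cardinal_ms(rest)


def cardinal_ms(n):
    """Return the Malay cardinal words for an integer in [0, 999999]."""
    if not 0 <= n < 1000000:
        raise ValueError("only 0..999999 is handled in this sketch")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"  # 12..19, e.g. "lima belas"
    if n < 100:
        return _scaled(n, 10, "puluh")
    if n < 1000:
        return _scaled(n, 100, "ratus")
    return _scaled(n, 1000, "ribu")


def ordinal_ms(n):
    """Return the Malay ordinal words for a positive integer (assumed rule)."""
    if n < 1:
        raise ValueError("ordinals start at 1")
    return "pertama" if n == 1 else "ke" + cardinal_ms(n)
```

For example, `cardinal_ms(121)` yields "seratus dua puluh satu" and `ordinal_ms(3)` yields "ketiga". The recursive `_scaled` helper keeps the tens, hundreds and thousands cases in one place, so extending the sketch to larger scales would only need another scale word.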

References

If you use our software for research, please cite:

@misc{Malaya,
  note = {Natural-Language-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya}}
}

