Wrapper library for text cleansing, preprocessing, and POS tagging in NLP
https://jakartaresearch.github.io/maleo/
- Scanner: get insight into your text dataset (e.g. number of characters, words, and emojis)
- Remove hyperlinks, punctuation, stopwords, emoticons, etc.
- Extract hashtags and prices from text
- Convert email, phone number, date to <TAG>
- Convert Indonesian slang to formal words
- Convert emoji to word or <TAG>
- Convert words to numbers
- Predict Part-of-Speech (POS) tags
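To make the tag-conversion and extraction features above concrete, here is a minimal standalone sketch using only Python's standard `re` module. It is not maleo's implementation: the helper names `replace_with_tags` and `extract_hashtags`, the tag names `<EMAIL>` and `<PHONE>`, and the deliberately simplified regular expressions are all assumptions made for illustration.

```python
import re
from typing import List

# Simplified, illustrative patterns (assumptions, not maleo's own rules).
# Real-world emails and phone numbers vary far more than these cover.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?:\+62|0)\d{8,12}\b")
HASHTAG_RE = re.compile(r"#(\w+)")


def replace_with_tags(text: str) -> str:
    """Replace emails and phone numbers with placeholder tags (hypothetical helper)."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text


def extract_hashtags(text: str) -> List[str]:
    """Return the hashtags in the text without the leading '#' (hypothetical helper)."""
    return HASHTAG_RE.findall(text)


sample = "Hubungi saya di budi@example.com atau 081234567890 #promo #diskon"
print(replace_with_tags(sample))
print(extract_hashtags(sample))
```

Replacing sensitive or highly variable tokens with a fixed tag keeps the sentence structure intact while reducing vocabulary noise, which is the usual motivation for this kind of preprocessing step.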
pip install maleo
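import pandas as pd  # assumption: df below is a pandas DataFrame
from maleo.wizard import Wizard
from maleo.pos_tag import POS
df = pd.DataFrame({'text': ['saya mau pergi beli makan siang dulu']})  # example data with a 'text' column
wiz = Wizard()
pos = POS()
wiz.scanner(df, 'text')  # get insight about the 'text' column
wiz.emoji_to_word(df.text)  # convert emojis to words
wiz.slang_to_formal(df.text)  # convert Indonesian slang to formal words
pos.predict('saya mau pergi beli makan siang dulu', output_pair=False)  # predict POS tags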
POS tag reference (Universal Dependencies): https://universaldependencies.org/u/pos/index.html
- Ruben Stefanus