PKU-TANGENT/NLP-tools

NLP tools

This repository records the tools we like to use in Natural Language Processing.

Websites

NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

http://nlpprogress.com/

Papers with code ★

Just as the title says.

https://paperswithcode.com/

Preprocess

NLTK ★

Interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, plus wrappers for industrial-strength NLP libraries.

https://www.nltk.org/

Stanford CoreNLP

POS tagging, NER, parsing (including basic dependencies), coreference resolution, sentiment analysis, bootstrapped pattern learning, and information extraction

https://stanfordnlp.github.io/CoreNLP/

Gensim ★

tf-idf, LSA, LDA, word2vec

https://radimrehurek.com/gensim/
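
As a rough illustration of the first item in that list, here is a minimal standard-library sketch of tf-idf weighting. It is not Gensim's API, and Gensim's own weighting and normalization defaults may differ. The function name `tfidf` and the toy documents are made up for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus, already tokenized (an assumption of this sketch).
docs = [
    ["nlp", "tools", "for", "text"],
    ["nlp", "models", "for", "translation"],
    ["search", "engine", "for", "text"],
]
scores = tfidf(docs)
```

A term that appears in every document (here `for`) gets weight zero, while a term unique to one document (here `tools`) gets the highest weight in that document.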

GloVe

pretrained word vectors learned from global word co-occurrence statistics (an alternative to word2vec)

https://nlp.stanford.edu/projects/glove/

ELMo

contextualized word representations

https://allennlp.org/elmo

BERT

sentence encoding (via bert-as-service)

https://bert-as-service.readthedocs.io/en/latest/index.html

sentencepiece

unsupervised subword tokenizer and detokenizer

https://github.com/google/sentencepiece

jieba

Chinese text segmentation, POS tagging, keyword extraction, etc.

https://github.com/fxsjy/jieba

pyltp

Chinese text segmentation, POS tagging, NER, dependency parsing, etc.

https://github.com/HIT-SCIR/pyltp

HanLP

Chinese text segmentation, POS tagging, NER, dependency parsing, etc.

https://github.com/hankcs/HanLP
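
The Chinese tools above all start from word segmentation. To show what that task involves, here is a minimal dictionary-based sketch using forward maximum matching. This is a textbook baseline, not the algorithm used by jieba, pyltp, or HanLP, which rely on more elaborate statistical models. The function name `forward_max_match` and the tiny vocabulary are made up for this example.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedily split text into the longest dictionary words, left to right.

    Hypothetical helper: characters not covered by vocab become
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy vocabulary (an assumption of this sketch).
vocab = {"自然", "语言", "自然语言", "处理"}
tokens = forward_max_match("自然语言处理工具", vocab)
# tokens == ["自然语言", "处理", "工", "具"]
```

Because "工具" is missing from the toy vocabulary, it falls back to two single-character tokens, which is exactly the kind of out-of-vocabulary case the real tools handle with trained models.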

Chinese-Word-Vectors

Pretrained Chinese word embeddings: 100+ Chinese word vectors trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora.

https://github.com/Embedding/Chinese-Word-Vectors

Algorithm

sklearn

machine learning

https://scikit-learn.org/stable/index.html

pulp

Linear Programming

https://pythonhosted.org/PuLP/

scipy

numerical integration, interpolation, optimization, linear algebra, statistics, etc.

https://www.scipy.org/scipylib/index.html

CRF++

implementation of conditional random fields (CRFs)

https://taku910.github.io/crfpp/

Library

Transformers (Hugging Face)

State-of-the-art NLP for TensorFlow 2.0 and PyTorch, including BERT, OpenAI GPT, XLNet, etc.

https://github.com/huggingface/transformers

allennlp

implementations of high quality models for almost any NLP problem

https://allennlp.org/

ignite

a high-level library to help with training neural networks in PyTorch

https://pytorch.org/ignite/index.html

OpenNMT (onmt)

an open source ecosystem for neural machine translation and neural sequence learning

https://opennmt.net/

torchtext

Generic data loaders, abstractions, and iterators for text

https://github.com/pytorch/text

Googletrans

Google Translate API (unofficial)

elasticsearch

distributed search engine

https://elasticsearch-py.readthedocs.io/en/master/

Chinese analyzer: IK analysis for elasticsearch

Other

scrapy

An open source and collaborative framework for extracting the data you need from websites.

https://scrapy.org/

mongoDB

a general purpose, document-based, distributed database

https://www.mongodb.com/
