NLPlay is a PyTorch library that implements some key and baseline models for text classification problems

jeremypoulain/nlplay

NLPlay

What is NLPlay?

NLPlay is a toolbox / repository that centralizes implementations of key NLP algorithms in one place, to tackle Text Classification, Sentiment Analysis & Question Answering problems. The idea is to provide a collection of ready-to-use algorithms & building blocks, so that people can quickly benchmark and customize the different model architectures on standard datasets or on their own.

Supported models & features

Python/Sklearn (CPU Only)

PyTorch (CPU/GPU)

Additional PyTorch Optimizers

Additional PyTorch Activation Functions

Additional PyTorch Loss Functions

Datasets

Others

  • parlib : Parallel processing for large lists (e.g. corpus pre-processing), Pandas DataFrames or Series, using joblib
  • DSManager / WordVectorsManager : Automatic referencing and download of key datasets & pretrained word vectors (GloVe, FastText...)
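
To illustrate the idea behind the parlib item above, here is a minimal sketch of chunked parallel pre-processing of a list with joblib. It is not NLPlay's actual parlib API: the names `clean_text`, `split_into_chunks`, `process_chunk` and `parallel_map_list` are hypothetical and only show the general pattern (split the corpus into chunks, process each chunk in a worker, then flatten the results in order).

```python
# Hypothetical helper names for illustration; this is NOT NLPlay's parlib API.
try:
    from joblib import Parallel, delayed
except ImportError:  # joblib not installed: the sketch falls back to a plain loop
    Parallel = None


def clean_text(text):
    """Toy pre-processing step: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Apply the pre-processing step to every document of one chunk."""
    return [clean_text(text) for text in chunk]


def parallel_map_list(items, n_jobs=2):
    """Pre-process a list chunk by chunk, keeping the original order."""
    chunks = split_into_chunks(items, n_jobs)
    if Parallel is None:
        results = [process_chunk(chunk) for chunk in chunks]
    else:
        # prefer="threads" keeps the sketch lightweight; drop it to use
        # joblib's default process-based backend for CPU-bound work.
        results = Parallel(n_jobs=n_jobs, prefer="threads")(
            delayed(process_chunk)(chunk) for chunk in chunks
        )
    # Flatten the per-chunk results back into a single list.
    return [doc for chunk in results for doc in chunk]


corpus = ["  Hello   World ", "NLPlay  IS a Toolbox", "Text\tClassification"]
cleaned = parallel_map_list(corpus, n_jobs=2)
print(cleaned)
```

Processing whole chunks rather than single documents keeps the per-task overhead small, which usually matters when each document is cheap to process but the corpus is large.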

Examples

Tutorials

Todo / Next Steps:

  1. Include additional Models :

  2. Include additional Datasets :

  3. Others :

    • Include Nvidia Apex mixed precision to reduce the GPU memory footprint on Turing/Volta/Ampere architectures
    • Include support for Google TPU training & inference via PyTorch/XLA
    • Include a cross-validation mechanism
    • Include metrics (F1, AUC...) and a confusion matrix
    • Include automatic EDA reporting features
    • Include a Streamlit app to easily explore & debug model prediction errors and identify potential root causes (e.g. tokenization, unseen tokens, sentence length, class confusion)
    • Include Microsoft NNI for hyperparameter tuning (TPE, SMAC, Hyperband, BOHB...)
    • Include MLflow for experiment tracking
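
As a rough illustration of what the planned metrics item above could cover, the following dependency-free sketch computes a confusion matrix and a macro-averaged F1 score for a classifier's predictions. The function names and the toy labels are assumptions made for this example; they do not describe an existing NLPlay API.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, true class differs
        fn = sum(cm[i]) - tp                       # true i, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy sentiment example (made-up labels, for illustration only)
labels = ["neg", "pos"]
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]

cm = confusion_matrix(y_true, y_pred, labels)
score = macro_f1(y_true, y_pred, labels)
print(cm)
print(round(score, 4))
```

Macro averaging gives every class the same weight regardless of its frequency, which is often preferred on imbalanced text classification datasets.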