A starter project for getting going with transformers and deep learning for NLP tasks, using Hugging Face, PyTorch, and GLUE.


transformer_project_pytorch

Why use this template?

  • Experiment with various data structures, models, and methods involving transformers and language modeling
  • Learn more about Hugging Face, BERT, and PyTorch
  • Leverage state-of-the-art pretrained models from saved checkpoints for comparison testing locally
  • Environment dependencies and code that work out of the box, ready to configure for your own project

Environment Setup with Conda

  • Create a conda environment from the provided environment.yml
    # replace transformer_project with your own project name
    conda env create --file environment.yml -n transformer_project
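If you are writing the environment.yml yourself, a minimal file along these lines should work. This is a sketch with assumed channels and packages; the file shipped with this repo may differ.

```yaml
# Hypothetical minimal environment.yml -- assumed contents, not the repo's actual file
name: transformer_project
channels:
  - pytorch
  - huggingface
  - conda-forge
dependencies:
  - python=3.9
  - pytorch
  - transformers
  - pandas
  - scikit-learn
```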

Get Started

  • Run the full text-classification pipeline (BERT and the bundled corpora through PyTorch):
    python3 main.py
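For orientation, one preprocessing step that any BERT text-classification pipeline performs is padding a batch of token-ID sequences to a common length and building the attention mask. The sketch below illustrates the idea in plain Python; it is not code from main.py, which likely delegates this to the Hugging Face tokenizer's padding options.

```python
# Sketch: pad token-ID sequences and build attention masks, as a
# BERT-style classification pipeline does before a forward pass.
# Illustrative only -- not taken from main.py.

def pad_batch(batch, pad_id=0):
    """Pad each sequence of token IDs to the batch max length.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding, matching the convention BERT models use.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

# Example: two sequences of different lengths (101/102 are BERT's [CLS]/[SEP] IDs)
ids, mask = pad_batch([[101, 2023, 102], [101, 2023, 2003, 1037, 102]])
```

The padded `ids` and `mask` lists are what get converted to tensors and fed to the model.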

GLUE/Jiant Resources:

Hugging Face BERT PyTorch Examples/Resources:

Environment and Preliminaries from Scratch with Conda and Pip

  • Set the conda channel priority to strict conda-forge. Do this from the base env before creating other envs.
    # ensures predictable dependency resolution
    conda config --set channel_priority strict
  • Create a new conda environment with Python 3.9
    conda create -n transformer_project python=3.9
  • Activate Conda Environment:
    conda activate transformer_project
  • Alternatively, install all dependencies with pip (only required if building from scratch)
    pip3 install --no-cache-dir huggingface huggingface_hub transformers torch torchvision torchaudio torchtext progressbar2 tqdm boto3 requests regex sentencepiece sacremoses importlib_metadata pandas scikit-learn matplotlib seaborn
  • Not required if you used the pip installs above - install PyTorch with conda per the docs
    conda install pytorch torchvision torchaudio -c pytorch
  • Not required if you used the pip installs above - install Transformers with conda per the docs
    conda install -c huggingface transformers
    # verify with the following
    python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
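Alongside the verification one-liner above, a small helper can confirm the installed packages are importable without actually importing them. Note that pip names and import names differ for some packages (e.g. scikit-learn imports as sklearn, progressbar2 as progressbar); the function below is illustrative, not part of this repo.

```python
# Sketch: report which import names cannot be resolved in the current
# environment, using importlib.util.find_spec (no imports are executed).
import importlib.util

def missing_modules(names):
    """Return the subset of import names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Example with a few of the import names behind the pip installs above:
print(missing_modules(["transformers", "torch", "pandas", "sklearn"]))
```

An empty list means every listed package resolved.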

General Reference Commands (Mostly Conda)

  • Feature resolver flag, sometimes needed depending on your pip/conda/env

    # the 2020 resolver is the default in pip >= 20.3, so upgrading pip is usually the better fix
    pip install <package_name> --use-feature=2020-resolver
  • Upgrade Pip:

    pip3 install --upgrade pip
  • List Conda Environments:

    conda env list
  • Remove/Delete Environment:

    conda remove -n transformer_project --all
  • Display History of Revisions:

    conda list --revisions
  • Export Environment:

    conda env export > environment.yml
  • Update Conda:

    conda update -n base -c defaults conda
  • Git Things:

    # stop tracking a folder (e.g. an IDE config dir) without deleting it
    git rm -r --cached .idea/
    # stop tracking .DS_Store without deleting the actual file
    git rm --cached .DS_Store
    # untrack everything to reset the index cache (re-add files afterwards)
    git rm -r --cached .

Works Cited

[1] Wolf, Thomas, et al. "Transformers: State-of-the-Art Natural Language Processing." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Oct. 2020, pp. 38-45. https://www.aclweb.org/anthology/2020.emnlp-demos.6

[2] Paszke, Adam, et al. "PyTorch: An Imperative Style, High-Performance Deep Learning Library." Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

[3] McCormick, Chris, and Nick Ryan. "BERT Fine-Tuning Tutorial with PyTorch." 22 July 2019. http://www.mccormickml.com

[4] Neeraj, Trishala. "Feature-based Approach with BERT." trishalaneeraj.github.io, 2020. https://trishalaneeraj.github.io/2020-04-04/feature-based-approach-with-bert