

  1. The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

    Python · 37.6k stars · 1.7k forks

  2. Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

    Python · 3.9k stars · 610 forks

  3. Provide an input CSV and a target field to predict, and it generates a model plus the code to run it.

    Python · 1.7k stars · 153 forks

  4. Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts.

    Python · 1.9k stars · 381 forks

  5. Python package and CLI to generate stylistic word clouds, including gradients and icon shapes.

    Python · 651 stars · 38 forks

  6. A robust Python tool for text-based AI training and generation using GPT-2.

    Python · 644 stars · 36 forks

737 contributions in the last year


Contribution activity

July 2020

Created an issue in huggingface/transformers that received 4 comments

3.0.1: "unexpected keyword argument 'is_pretokenized'" when using batch_encode_plus() w/ Fast Tokenizers

🐛 Bug Information: See title. Does not occur with "slow" tokenizers. To reproduce: `from transformers import GPT2TokenizerFast; tokenizer = GPT2Tokeniz…`

