<a href="https://colab.research.google.com/github/pszemraj/ai-msgbot/blob/update-notebooks/notebooks/colab-notebooks/aitextgen_text_generation_%2B_training_on_GPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  aitextgen — Train a GPT-2 (or GPT Neo) Text-Generating Model w/ GPU

This notebook is based on the original tutorial from `aitextgen`!

- For more about `aitextgen`, you can visit [this GitHub repository](https://github.com/minimaxir/aitextgen) or [read the documentation](https://docs.aitextgen.io/).
- For `ai-msgbot` (which uses `aitextgen` for chatbot-esque purposes), see the [project repo](https://github.com/pszemraj/ai-msgbot).


_updates made by [Peter](https://peterszemraj.ch/)_



---

In [1]:
#@markdown add auto-Colab formatting with `IPython.display`
from IPython.display import HTML, display
# colab formatting
def set_css():
    display(
        HTML(
            """
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  """
        )
    )

get_ipython().events.register("pre_run_cell", set_css)

### GPU

Colaboratory assigns an Nvidia K80, P100, V100, or A100. For finetuning GPT-2 124M, any of these GPUs will be fine; for text generation, a GPU with more VRAM is preferable.

- In theory: **If you receive a T4 or a V100 GPU, you can enable `fp16=True` during training for faster/more memory efficient training.**
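
As a rough illustration of the fp16 note above, the sketch below reads the GPU name from `nvidia-smi` and checks it against the two cards the note mentions. The helper names `detect_gpu_name` and `fp16_recommended` are hypothetical, and the T4/V100 allow-list simply mirrors the note rather than any official compatibility table.

```python
import shutil
import subprocess

# GPUs the note above singles out for fp16 (assumption: a substring match is enough)
FP16_GPUS = ("T4", "V100")


def detect_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    names = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    return names[0] if names else None


def fp16_recommended(gpu_name):
    """True if the GPU name contains one of the cards listed in FP16_GPUS."""
    if not gpu_name:
        return False
    return any(tag in gpu_name for tag in FP16_GPUS)


gpu = detect_gpu_name()
print(f"GPU: {gpu!r}, fp16 suggested: {fp16_recommended(gpu)}")
```

On a machine without an Nvidia driver the check falls back to `None` and suggests leaving `fp16` off.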

In [2]:
#@title print GPU status
!nvidia-smi

Sun Jan 23 02:46:25 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    49W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
#@title print out the VM's CPU stats
from psutil import virtual_memory
import os
ram_gb = round(virtual_memory().total / (1024**3), 1)
print(f'Runtime has {ram_gb} gigs of memory and {os.cpu_count()} processors')

if ram_gb < 20: print("WARNING - your CPU RAM allocated is less than 20.",
                      " You may experience errors loading")

Runtime has 51.0 gigs of memory and 8 processors


## setup

In [4]:
#@title set torch version
!pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html -q
!pip install https://storage.googleapis.com/jax-releases/cuda111/jaxlib-0.1.71+cuda111-cp37-none-manylinux2010_x86_64.whl -q

#@markdown see this issue https://github.com/googlecolab/colabtools/issues/2452 for colab A100 GPU

In [5]:
#@title install aitextgen
!pip install -q aitextgen

import logging

logging.basicConfig(
    format="%(asctime)s — %(levelname)s — %(name)s — %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)

from aitextgen import aitextgen
from aitextgen.colab import mount_gdrive, copy_file_from_gdrive

In [6]:
mount_gdrive()


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Loading GPT-2 or GPT Neo


- A common use case is *continuing* to fine-tune a model that was originally pretrained, and then fine-tuned a little bit, but needs to be fine-tuned more for accuracy/saliency reasons or because Google cut off the runtime earlier.
    - In this case, `load_from_folder` should be set to `True` and `load_folder_dir` should point to where the model checkpoint is on your Google Drive.
- **The section below describes loading a new/pretrained model, from the original tutorial.**

> If you're retraining a model on new text, you need to download and load the GPT-2 model into the GPU. 

> There are several sizes of GPT-2:

    * `124M` (default): the "small" model, 500MB on disk.
    * `355M`: the "medium" model, 1.5GB on disk.
    * `774M`: the "large" model, 3GB on disk.

> You can also finetune a GPT Neo model instead ([_or any textgen GPT-architecture model on huggingface_](https://huggingface.co/models?pipeline_tag=text-generation)), which is more suitable for longer texts and the base model has more recent data:

* `125M`: Analogous to the GPT-2 124M model. (355M parameter model was removed)
*  `EleutherAI/gpt-neo-1.3B` : 1.3 billion parameter model. Have yet to see this train on Colab without crashing

> The next cell downloads the model and saves it in the Colaboratory VM. If the model has already been downloaded, running this cell will reload it.
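
Before downloading, it can help to confirm the VM has room for the chosen model. The sketch below is a minimal check using the approximate on-disk sizes listed above; the helper `has_room_for` and the `headroom` factor are assumptions for illustration, not part of `aitextgen`.

```python
import shutil

# Approximate on-disk sizes in GB, taken from the list above
MODEL_DISK_GB = {"124M": 0.5, "355M": 1.5, "774M": 3.0}


def has_room_for(model_size, path=".", headroom=2.0):
    """Check whether `path` has at least `headroom` times the model's size free."""
    needed_bytes = MODEL_DISK_GB[model_size] * headroom * 1024**3
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_bytes


for size in MODEL_DISK_GB:
    print(size, "fits:", has_room_for(size))
```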

In [7]:
model_size = "355M" #@param ["355M", "774M"]
load_from_folder = False #@param {type:"boolean"}
load_folder_dir = "/content/drive/MyDrive/Programming/ai-msgbot/your-previous-model-name" #@param {type:"string"}


In [8]:
if load_from_folder:
    ai = aitextgen(
        model_folder=load_folder_dir, 
        to_gpu=True,
        gradient_checkpointing=True,
    )
else:
    ai = aitextgen(
        tf_gpt2=model_size, 
        to_gpu=True,
        gradient_checkpointing=True,
    )
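# To load a Hugging Face model by name instead (e.g. GPT-2 XL below, or a
# GPT Neo model such as "EleutherAI/gpt-neo-125M"), comment out the block
# above and uncomment the lines below.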

# model_size = "gpt2-xl"
# ai = aitextgen(model='gpt2-xl', 
#                to_gpu=True, 
#                gradient_checkpointing=True)

01/23/2022 02:46:40 — INFO — aitextgen — Loading 355M GPT-2 model from /aitextgen.
01/23/2022 02:46:45 — INFO — aitextgen — GPT2 loaded with 354M parameters.
01/23/2022 02:46:45 — INFO — aitextgen — Gradient checkpointing enabled for model training.
01/23/2022 02:46:45 — INFO — aitextgen — Using the default GPT-2 Tokenizer.


## load training data


- links to my parsed data:

```
clean "large" whatsapp+iphone text dataset:

https://www.dropbox.com/s/gbk9lkbcx6axk07/clean_apple_and_whatsapp_msgs.txt?dl=1

clean "small" whatsapp+iphone text dataset:

https://www.dropbox.com/s/75hvz74ve2yux02/clean_dataset-V3-whatsapp-apple.txt?dl=1
```


In [9]:
dl_link = "https://github.com/pszemraj/ai-msgbot/raw/main/conversation-data/wizard-of-wikipedia/ScriptParse-wow-train-kilt.txt" #@param {type:"string"}
dataset_tag = "WoW" #@param {type:"string"}


In [10]:
#@markdown retrieve the file behind `dl_link`
from urllib import request
from os.path import join
import os
vm_wd = os.getcwd()
local_name = join(vm_wd, "training_script.txt")
request.urlretrieve(dl_link, local_name)


('/content/training_script.txt', <http.client.HTTPMessage at 0x7f344fdec190>)

In [11]:
#@title create the `update_script_names()` and `preview_script()` functions
#@markdown adjust names in script if needed  
import pprint as pp
from os.path import basename

def update_script_names(local_name, spkr_from="speaker a", 
                        spkr_to="person alpha",
                        resp_from="speaker b", resp_to="person beta",
                        verbose=False):
    """
    update_script_names - if the textfile script has different names for the 
    speaker/responder than desired (i.e. it is a group conversation, and the 
    chatbot is just supposed to simulate 1:1) this function can be used to 
    standardize
    """

    with open(local_name, 'r', encoding='utf-8', errors='ignore') as fi:
        orig_lines = fi.readlines()

    from tqdm.auto import tqdm

    upd_lines = []

    for line in tqdm(orig_lines, total=len(orig_lines), 
                     desc="replacing speaker names"):
        
        fixline = line.replace(spkr_from, spkr_to)
        fixline = fixline.replace(resp_from, resp_to)
        upd_lines.append(fixline)

    local_namev2 = join(vm_wd, "V2-rename-" + basename(local_name))

    with open(local_namev2, 'w', encoding='utf-8', errors='ignore') as fo:
        fo.writelines(upd_lines)

    if verbose: pp.pprint(upd_lines[:10])
    # return filepath
    return local_namev2

def preview_script(file_path, num_lines:int=20):
    # read from the path passed in, not the global `local_name`
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as fi:
        script_lines = fi.readlines()

    print(f"A preview of the first {num_lines} lines of {file_path} is: \n")
    pp.pprint(script_lines[:num_lines])

In [12]:
local_name = update_script_names(local_name)

file_name = local_name # update if using fn above

preview_script(file_name)

replacing speaker names:   0%|          | 0/1050198 [00:00<?, ?it/s]

A preview of the first 20 lines of /content/V2-rename-training_script.txt is: 

['person alpha:\n',
 'i like to watch ice hockey on tv. my favorite team is the chicago '
 'blackhawks.\n',
 '\n',
 'person beta:\n',
 "the blackhawks are one of my favorite teams, they've won 6 stanley cup "
 'championships since they started in 1926\n',
 '\n',
 'person alpha:\n',
 'the viking are sea pirates!\n',
 '\n',
 'person beta:\n',
 "i see! didn't they speak the norse language?\n",
 '\n',
 'person alpha:\n',
 "what's the norse language? what country speaks such?\n",
 '\n',
 'person beta:\n',
 'the north germans!\n',
 '\n',
 'person alpha:\n',
 'so what do the vikings do ?are they a cult group?\n']
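
The preview shows the layout the model is trained on: a `speaker:` header line, the message, then a blank line. As a quick sanity check on that layout, the sketch below counts turns per speaker. The function `count_turns` is hypothetical, and the sample lines are shortened from the preview above.

```python
from collections import Counter


def count_turns(lines, speakers=("person alpha", "person beta")):
    """Count turns per speaker in a script laid out as
    'speaker:' / message / blank line."""
    headers = {f"{name}:" for name in speakers}
    turns = Counter()
    for line in lines:
        stripped = line.strip()
        if stripped in headers:
            # drop the trailing colon to recover the speaker name
            turns[stripped[:-1]] += 1
    return turns


sample = [
    "person alpha:\n",
    "i like to watch ice hockey on tv.\n",
    "\n",
    "person beta:\n",
    "the blackhawks are one of my favorite teams\n",
    "\n",
    "person alpha:\n",
    "the viking are sea pirates!\n",
]
print(count_turns(sample))
```

A heavily lopsided count would suggest the rename step missed one of the speaker labels.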


## Train / Finetune GPT-2

The next cell will start the actual finetuning of GPT-2 in aitextgen. It runs for `num_steps`, and a progress bar will appear to show training progress, current loss (the lower the better the model), and average loss (to give a sense on loss trajectory).

The model will be saved every `save_every` steps in `trained_model` by default, and when training completes. If you mounted your Google Drive, the model will _also_ be saved there in a unique folder.

The training might time out after 4ish hours; if you did not mount to Google Drive, make sure you end training and save the results so you don't lose them! (if this happens frequently, you may want to consider using [Colab Pro](https://colab.research.google.com/signup))

Important parameters for `train()`:

- **`line_by_line`**: Set this to `True` if the input text file is a single-column CSV, with one record per row. aitextgen will automatically process it optimally.
- **`from_cache`**: If you compressed your dataset locally (as noted in the previous section) and are using that cache file, set this to `True`.
- **`num_steps`**: Number of steps to train the model for.
- **`generate_every`**: Interval of steps to generate example text from the model; good for qualitatively validating training.
- **`save_every`**: Interval of steps to save the model: the model will be saved in the VM to `/trained_model`.
- **`save_gdrive`**: Set this to `True` to copy the model to a unique folder in your Google Drive, if you have mounted it in the earlier cells
- **`fp16`**: Enables half-precision training for faster/more memory-efficient training. Only works on a T4 or V100 GPU.

Here are other important parameters for `train()` that are useful but you likely do not need to change.

- **`learning_rate`**: Learning rate of the model training.
- **`batch_size`**: Batch size of the model training; setting it too high will cause the GPU to go OOM. (if using `fp16`, you can increase the batch size more safely)
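
To get a feel for how these intervals interact, the sketch below does the arithmetic for the values used in this notebook's `train()` call. It assumes each interval fires at every multiple of its value and that the effective batch size is `batch_size` times `gradient_accumulation_steps`; the helper `training_schedule` is hypothetical.

```python
def training_schedule(num_steps, save_every, generate_every,
                      batch_size, gradient_accumulation_steps):
    """Summarize a run, assuming each interval fires at every multiple of its value."""
    return {
        "effective_batch_size": batch_size * gradient_accumulation_steps,
        "intermediate_saves": num_steps // save_every,
        "sample_generations": num_steps // generate_every,
    }


print(training_schedule(num_steps=10000, save_every=1500, generate_every=1000,
                        batch_size=1, gradient_accumulation_steps=4))
# {'effective_batch_size': 4, 'intermediate_saves': 6, 'sample_generations': 10}
```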

In [13]:
import gc, os
from os.path import join
from datetime import datetime
#@title admin params & setup
#@markdown creates folders etc
base_dir = "/content/drive/MyDrive/Programming/ai-msgbot" #@param {type:"string"}
# update to yours
def get_timestamp():
    return datetime.now().strftime("%b-%d-%Y_t-%H")

temp_gpu_path = join(base_dir, 
                     "GPT2-conversational-{sz}-{dt}".format(sz=model_size,
                                                            dt=get_timestamp(),
                                                            )
                     )
os.makedirs(temp_gpu_path, exist_ok=True)
gc.collect()

fp16_kwargs = {
    "amp_backend":'apex'
}
#@markdown **tips for training:**<br> do not use warmup steps. 
#@markdown if you run OOM, decrease `batch_size`, `gradient_accumulation_steps`, or the dataset size (length of the input text file), 
#@markdown or increase the number of frozen layers.

In [14]:
# DO NOT USE WARMUP STEPS

ai.train(
            file_name, # text file with training data
            output_dir=temp_gpu_path, # where it saves during "save_every"
            line_by_line=False, # set to True for single-column CSV input
            from_cache=False,
            num_steps=10000, # takes about 5 hours on 16 gb v100 GPU for 75000
            generate_every=1000,
            max_grad_norm=0.5,
            save_every=1500,
            gradient_accumulation_steps=4,
            save_gdrive=False, # this is an "automated" save which is worse than current method (IMO)
            learning_rate=1e-3,
            # fp16=True, # current bug in aitextgen is MisconfigurationException: You have asked for `amp_level='O1'` but it's only supported with `amp_backend='apex'`.
            batch_size=1, # if pushing model_size you probably want to leave this at 1
            freeze_layers= True, # whether to change weights on ALL layers or not
            num_layers_freeze = 22, # standard GPT-2 M has 24 layers. size L has 36
        #  fp16_opt_level="O2", # different types of FP16 are possible
        )

01/23/2022 02:46:51 — INFO — aitextgen — Loading text from /content/V2-rename-training_script.txt with generation length of 1024.


  0%|          | 0/1050198 [00:00<?, ?it/s]

01/23/2022 02:46:52 — INFO — aitextgen.TokenDataset — Encoding 1,050,198 sets of tokens from /content/V2-rename-training_script.txt.
01/23/2022 02:47:07 — INFO — aitextgen — Layer freezing enabled for model training.
  f"Setting `Trainer(checkpoint_callback={checkpoint_callback})` is deprecated in v1.5 and will "
  f"Setting `Trainer(progress_bar_refresh_rate={progress_bar_refresh_rate})` is deprecated in v1.5 and"
  "Setting `Trainer(weights_summary=None)` is deprecated in v1.5 and will be removed"
01/23/2022 02:47:07 — INFO — pytorch_lightning.utilities.distributed — GPU available: True, used: True
01/23/2022 02:47:07 — INFO — pytorch_lightning.utilities.distributed — TPU available: False, using: 0 TPU cores
01/23/2022 02:47:07 — INFO — pytorch_lightning.utilities.distributed — IPU available: False, using: 0 IPUs
01/23/2022 02:47:07 — INFO — pytorch_lightning.accelerators.gpu — LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


  0%|          | 0/10000 [00:00<?, ?it/s]

  "`trainer.progress_bar_dict` is deprecated in v1.5 and will be removed in v1.7."


1,000 steps reached: generating sample texts.

person beta:
yes. what?

person beta:
well, the original show was based on a tv series and it was based on american literature.

person alpha:
i just love to play video games, especially on the xbox

person beta:
it's a great hobby. i love all things electronic, even when it's a hobby.

person alpha:
i love to play the guitar, do you play guitar as well?

person beta:
yes, i like to play electric guitars, i think it is the most basic.

person alpha:
my favorite kind of music is the kind where even though it is a musical composition, it still manages to be a fun and engaging experience.

person beta:
that sounds like a lot of fun, i do too.

person alpha:
it's a great way to share your feelings with someone or maybe it's so much fun that you have to keep going.

person beta:
it is a lot of fun. music has been around for a while so it's hard to pick a favorite.

person alpha:
i always wanted to be a singer, but i don't think i ever


  rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")
01/23/2022 04:29:36 — INFO — aitextgen — Saving trained model pytorch_model.bin to //content/drive/MyDrive/Programming/ai-msgbot/GPT2-conversational-355M-Jan-23-2022_t-02


In [15]:
#@markdown save results to created folders
import os
from os.path import join
save_path = join(base_dir, 
                     "FIN-GPT-conv-{sz}-{tag}-{dt}".format(sz=model_size,
                                                             tag=dataset_tag,
                                                        dt=get_timestamp(),
                                                        )
                     )

os.makedirs(save_path, exist_ok=True)
ai.save(save_path)

print(f'saved! {get_timestamp()}')


saved! Jan-23-2022_t-04


You're done! Feel free to go to the **Generate Text From The Trained Model** section to generate text based on your retrained model.

---


# Use a Trained Model for Generation

If you already have a trained model from this notebook, running the next cell will copy the `pytorch_model.bin` and `config.json` files from the specified folder in Google Drive into the Colaboratory VM. (If no `from_folder` is specified, it assumes the two files are located at the root level of your Google Drive.)
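
Since loading depends on both files being present, a quick pre-check can save a confusing error later. The sketch below is a minimal, standard-library check; `missing_model_files` is a hypothetical helper, and the demo runs on a throwaway folder rather than a real model directory.

```python
import os
import tempfile

# The two files this notebook copies from Google Drive
REQUIRED_FILES = ("pytorch_model.bin", "config.json")


def missing_model_files(folder):
    """Return the required model files that are not present in `folder`."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(folder, name))]


# Demo on a temporary folder that only contains a config file
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "config.json"), "w").close()
    print(missing_model_files(tmp))
# ['pytorch_model.bin']
```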

In [16]:
!nvidia-smi

Sun Jan 23 04:29:58 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    40W / 300W |  11691MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [17]:
# best model thus far @ 1.3B parameters and tuned for 50k steps
# from_folder = "/content/drive/MyDrive/Programming/AI_peter/GPT-Neo-1B-V1"

from_folder = save_path

if len(from_folder) > 2:

    for file in ["pytorch_model.bin", "config.json"]:
        if from_folder:
            copy_file_from_gdrive(file, from_folder)
        else:
            copy_file_from_gdrive(file)

    ai = aitextgen(model_folder=from_folder, to_gpu=True)
else:
    ai = aitextgen(model_folder=".", to_gpu=True)


01/23/2022 04:30:07 — INFO — aitextgen — Loading model from provided weights and config in //content/drive/MyDrive/Programming/ai-msgbot/FIN-GPT-conv-355M-WoW-Jan-23-2022_t-04.
  "Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "
01/23/2022 04:30:11 — INFO — aitextgen — GPT2 loaded with 354M parameters.
01/23/2022 04:30:11 — INFO — aitextgen — Using the default GPT-2 Tokenizer.


## Generate Text From The Trained Model


`generate()` without any parameters generates a single text from the loaded model to the console.

In [18]:
ai.generate(n=3, max_length=256, 
            temperature=1.0, top_p=0.9)

person alpha:
yes they are delicious! do you like it with just plain old cheese?

person beta:
i enjoy that, i think it really is a combination of italian and american.

person alpha:
i think i prefer my cheese to be on the thick side, and to go with meat like lamb, goat, turkey, etc.

person beta:
that is an excellent way to go. i enjoy steak to have with a nice marinara sauce or steak and kidney.

person alpha:
i love to cook new dishes every weekend

person beta:
me too, the combination of one, two or all is great, but it's weird to think of it as an art.

person alpha:
i think the only thing that's not as well known is that chefs can make a wide variety of different meals at home.

person beta:
me too. i'm really not a big fan of the prepackaged food though. i prefer to cook food on the grill, either through the use of an open fire or through electric stoves.

person alpha:
hello! i live in a rural area and love it. are you from a rural
person alpha:
i want to get a job in the fede

In [19]:
ai.generate(prompt="give me a good pickup line!\n person beta:", temperature=1,
            min_length=10, batch_size =20, top_k=6)

give me a good pickup line!
 person beta:
well you don't have to have any special knowledge to get the most out of this. you can learn about it online and in stores and in class.

person alpha:
i love pizza! it is my favorite food, especially with friends over, do you enjoy a nice thin crust?

person beta:
i absolutely do. i like to fold it in half and eat it like a pizza!

person alpha:
i like to fold it in half, too, but it sounds kind of gross to eat it that way. do you have a preference on toppings or condiments?

person beta:
i like pepperoni, mushrooms, sausage, peppers and onions. i also like to put in some good ol' classic pizza

person alpha:
i love the sound of the violin, i used to play it when i was little!

person beta:
it's also informally known as a fiddle, and is also informally called a fiddle, as well.

person alpha:
i love to read, how about you?

person beta:
i do it for entertainment sometimes. do you have a favorite genre?


If you're creating an API based on your model and need to pass the generated text elsewhere, you can do `text = ai.generate_one()`

You can also pass in a `prompt` to the generate function to force the text to start with a given character sequence and generate text from there (good if you add an indicator when the text starts).

You can also generate multiple texts at a time by specifying `n`. You can pass a `batch_size` to generate multiple samples in parallel, giving a massive speedup (in Colaboratory, set a maximum of 50 for `batch_size` to avoid going OOM).

Other optional-but-helpful parameters for `ai.generate()` and friends:

*  **`min_length`**: The minimum length of the generated text: if the text is shorter than this value after cleanup, aitextgen will generate another one.
*  **`max_length`**: Number of tokens to generate (default 256, you can generate up to 1024 tokens with GPT-2 and 2048 with GPT Neo)
* **`temperature`**: The higher the temperature, the crazier the text (default 0.7, recommended to keep between 0.7 and 1.0)
* **`top_k`**: Limits the generated guesses to the top *k* guesses (default 0 which disables the behavior; if the generated output is super crazy, you may want to set `top_k=40`)
* **`top_p`**: Nucleus sampling: limits the generated guesses to a cumulative probability. (gets good results on a dataset with `top_p=0.9`)
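
To make `temperature`, `top_k`, and `top_p` more concrete, here is a toy sketch of how they shape the candidate set for a single next-token choice. It is a simplified, pure-Python illustration with made-up logits, not the code `aitextgen` or Hugging Face actually runs; `softmax` and `filter_top_k_top_p` are hypothetical helpers.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return the token indices that survive top-k and then top-p (nucleus) filtering."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # top_k=0 disables the top-k cut
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for idx in ranked:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:  # stop once the nucleus covers top_p
            break
    return kept


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
probs = softmax(logits, temperature=0.7)
print(filter_top_k_top_p(probs, top_k=3))
print(filter_top_k_top_p(probs, top_p=0.9))
```

In a real sampler the surviving probabilities are renormalized before a token is drawn; that step is omitted here to keep the filtering logic visible.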

In [20]:
ai.generate(
    n=3, batch_size=25, prompt="person beta:\n i just", max_length=256, 
    temperature=1.0, top_p=0.9
)

person beta:
 i just can't see anyone as an attractive woman! what are some other qualities of a woman you find attractive?

person alpha:
maybe just some stubby legs or long legs, and a face that is usually not smiling.

person beta:
that is one of my favorite traits! i never could get past the fact that she was unmarried.

person beta:
that's why it's interesting that she ended up marrying a guy, which is quite the accomplishment. it's not hard at all.

person alpha:
do you think that there is anything else you would like to know about a person before they meet them?

person beta:
people usually date after a few dates, so it's not that difficult to meet someone new. i like to meet someone in person.

person alpha:
hi, i love hiking! it is so relaxing.

person beta:
i love hiking too! when was the last time you hiked?

person alpha:
never, but it has changed a lot in the last few years.

person beta:
that makes sense, i've heard so many new ideas from the outdoors. is hiking t

For bulk generation, you can generate a large number of texts to a file and sort out the samples locally on your computer. The next cell will generate `num_files` files, each with `n` texts and whatever other parameters you would pass to `generate()`. The files can then be downloaded from the Files sidebar!

You can rerun the cells as many times as you want for even more generated texts!

In [25]:
save_loc = "/content/drive/MyDrive/Programming/ai-msgbot/output_files" #@param {type:"string"}
os.makedirs(save_loc, exist_ok=True)

In [26]:
p_list = [
          ["how are you doing?"+"\n", "\n", "person beta:" + "\n"], 
           ["person alpha:"+"\n", "it is obvious that "],
           ["person alpha:"+"\n", "this is ridiculous, "],
           ["person alpha: \n", "can you help me with my homework?"+"\n", "\n", "person beta:" + "\n"],
           ["person beta:" + "\n"],
           ["sarah 'nacho cheese' stanley:" + "\n", 
            "hi! I got a new phone" + "\n",
            "\n",
            "person beta:\n",],
           ["person beta: \n", 
            "Hey I’m meeting the astrophysics professor via zoom after school any tips?"+"\n",
            "\n", 
            "person beta:" + "\n"],
           ["person beta:" + "\n",
            "I know"],
]


prompts = ["".join(line) for line in p_list]

In [27]:
from datetime import datetime
import pprint as pp

ds_date_time = datetime.now().strftime("%m.%d.%Y")

base_header = "gpt-{}-textgen-{}".format(model_size, ds_date_time)
prompt_IDs = [base_header + "_file-{}.txt".format(i) for i in range(1, len(prompts)+1)]

prompt_mng = {}
for pid, text in zip(prompt_IDs, prompts):
    prompt_mng[pid] = text
pp.pprint(prompt_mng)

{'gpt-355M-textgen-01.23.2022_file-1.txt': 'how are you doing?\n'
                                           '\n'
                                           'person beta:\n',
 'gpt-355M-textgen-01.23.2022_file-2.txt': 'person alpha:\nit is obvious that ',
 'gpt-355M-textgen-01.23.2022_file-3.txt': 'person alpha:\n'
                                           'this is ridiculous, ',
 'gpt-355M-textgen-01.23.2022_file-4.txt': 'person alpha: \n'
                                           'can you help me with my homework?\n'
                                           '\n'
                                           'person beta:\n',
 'gpt-355M-textgen-01.23.2022_file-5.txt': 'person beta:\n',
 'gpt-355M-textgen-01.23.2022_file-6.txt': "sarah 'nacho cheese' stanley:\n"
                                           'hi! I got a new phone\n'
                                           '\n'
                                           'person beta:\n',
 'gpt-355M-textgen-01.23.2022_file-7.txt': 'pers

In [28]:
from os.path import join

for pfile, my_prompt in prompt_mng.items():
    ai.generate_to_file(
        n=60,
        batch_size=20,
        prompt=my_prompt,
        max_length=512,
        temperature=0.85,
        top_p=0.9,
        destination_path=join(save_loc, pfile)
    )


01/23/2022 04:33:11 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-1.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:33:55 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-2.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:34:40 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-3.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:35:23 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-4.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:36:07 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-5.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:36:52 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-6.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:37:35 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-7.txt


  0%|          | 0/60 [00:00<?, ?it/s]

01/23/2022 04:38:18 — INFO — aitextgen — Generating 60 texts to /content/drive/MyDrive/Programming/ai-msgbot/output_files/gpt-355M-textgen-01.23.2022_file-8.txt


  0%|          | 0/60 [00:00<?, ?it/s]