<a href="https://colab.research.google.com/github/pollinations/hive/blob/main/notebooks/8%20Text-To-Text/1%20GPT-NEO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://pollinations.ai/ipfs/QmbKLcozvgHwkzojhGgqvkooA31zxdTP31113ToadbUo2h" width="300" height="300" />

GPT-NEO is an open-source language model from EleutherAI, similar in design to OpenAI's GPT-3. It was trained on a large corpus of text collected from the internet (the Pile) and can write convincing continuations of a text prompt.

For a broader perspective, [Philosophers on GPT-3](https://dailynous.com/2020/07/30/philosophers-gpt-3/) discusses the philosophical implications of this new generation of language models.

You can choose from several finetuned models with the `model_type` parameter below; some of them were finetuned on erotic fiction.

In [None]:
# Text input which will be continued by GPT-NEO
text_prompt = "I think the carrot infinitely more fascinating than the geranium. The carrot has mystery. There is, you'll agree, a certain je ne sais quoi oh so very special about a firm young carrot.\n\n"  #@param {type: "string"}

# The type of language model to use.
model_type = "story"  #@param ['story', 'food recipes', 'erotic', 'ai dungeon']

# How much text to generate. Each iteration adds `number_generated_tokens` tokens (40 by default), roughly a couple of sentences
iterations = 5 #@param {type: "integer"}
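Each iteration in the Basic sampling cell feeds the whole text so far back into the model and asks for `number_generated_tokens` more tokens, so the text grows linearly with `iterations`. The notebook never truncates the text it feeds back in, so the final length has to fit in GPT-Neo's 2048-token context window. A minimal sketch of that arithmetic; `total_tokens` and `max_iterations` are hypothetical helpers, not part of the notebook, and the prompt length in the example is made up.

```python
# Hypothetical helpers: they only do the arithmetic, nothing is tokenized here.
CONTEXT_WINDOW = 2048  # GPT-Neo's maximum sequence length, in tokens


def total_tokens(prompt_tokens: int, iterations: int, tokens_per_iteration: int = 40) -> int:
    """Length of the text after `iterations` rounds, each adding a fixed number of tokens."""
    return prompt_tokens + iterations * tokens_per_iteration


def max_iterations(prompt_tokens: int, tokens_per_iteration: int = 40,
                   context_window: int = CONTEXT_WINDOW) -> int:
    """Largest number of iterations whose final text still fits in the context."""
    return max((context_window - prompt_tokens) // tokens_per_iteration, 0)


# Example with a hypothetical 50-token prompt and the defaults (5 x 40 tokens).
print(total_tokens(50, 5))  # 250
print(max_iterations(50))   # 49
```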


In [None]:
#@title Setup
#@markdown Run this to set up dependencies or to reset the actions.
!pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3-rp-b
#!wget -c http://ftp.us.debian.org/debian/pool/main/m/megatools/megatools_1.11.0~git20200404-1_amd64.deb -O megatools.deb
#!dpkg -i megatools.deb
!pip install gdown
!nvidia-smi

import os

from transformers import GPTNeoForCausalLM, AutoTokenizer
import tarfile
import codecs
import torch
import threading
import time
import subprocess

from IPython.display import HTML, display
import ipywidgets as widgets

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))

def warn_timeout_thread():
  global warn_time
  while True:
    while warn_time == -1 or time.time() < warn_time:
      time.sleep(1)
    warn_time = -1
    if warn_timeout:
      try:
        warning_out.append_stdout("Ten minutes are up. Remember to rerun this cell so you don't get disconnected.")
      except:
        pass

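# `initialized` only exists after the first run, so the except branch sets
# defaults and starts the timeout-warning thread once per session.
try:
  initialized += 1
except NameError: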
  get_ipython().events.register('pre_run_cell', set_css)
  tail_free_sampling, top_k, top_p, temperature, number_generated_tokens, repetition_penalty, repetition_penalty_range, repetition_penalty_slope, number_show_last_actions = 0.95, 60, 0.9, 0.8, 40, 1.25, 300, 3.33, 15
  prevent_square_brackets, prevent_angle_brackets, prevent_curly_brackets = True, True, True
  enable_top_k, enable_top_p, enable_tfs = False, False, True
  bad_words_ids = None
  initialized = 0
  last_free_edit = ""
  last_prompt = ""
  warn_timeout = False
  warn_time = -1
  threading.Thread(target=warn_timeout_thread).start()

actions = []
memory = ("", torch.zeros((1, 0)).long())
lmi = ["", torch.zeros((1, 0)).long()]
an = ("", torch.zeros((1, 0)).long())
an_depth = 3
history = None

Collecting git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3-rp-b
  Cloning https://github.com/finetuneanon/transformers (to revision gpt-neo-localattention3-rp-b) to /tmp/pip-req-build-o8rmckwi
  Running command git clone -q https://github.com/finetuneanon/transformers /tmp/pip-req-build-o8rmckwi
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Wed Sep 22 14:08:48 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   

In [None]:
#@title Model setup
#@markdown horni was finetuned for one epoch on about 800 MB of random text blocks from Literotica. Do not use the horni model if you dislike NSFW output. horni-ln uses horni as a base and was further finetuned for one epoch on 579 MB of text from a light novel dataset.
!sudo apt-get install p7zip-full aria2

print("Setting up model, this will take a few minutes. Don't interrupt this cell even if it takes a long while, or you can be left with broken, half-unpacked files.")

#model_gdrive = "/content/drive/MyDrive/gpt-neo-2.7B-horni-ln.tar" #@param {type:"string"}
#use_gdrive = False #@param {type:"boolean"}
#@markdown If you get download errors, the Google Drive downloads might be over their daily download quota. In that case, right-click, select "Interrupt execution", download the checkpoint from Mega yourself, upload it to your Google Drive, tick use_gdrive, enter the correct filename (e.g. `gpt-neo-2.7B-horni-ln.tar`) and rerun the cell.
#@markdown
#@markdown Warnings about certain attention bias parameters being uninitialized, or about Google Drive already being mounted, can be ignored.




#model_types = {"2.7B-horni": "https://mega.nz/file/6BNykLJb#B6gxK3TnCKBpeOF1DJMXwaLc_gcTcqMS0Lhzr1SeJmc",
#               "2.7B-horni-ln": "https://mega.nz/file/rQcWCTZR#tCx3Ztf_PMe6OtfgI95KweFT5fFTcMm7Nx9Jly_0wpg"}
model_map = { "story": "2.7B-picard",  
              "adventure": "2.7B-adventure",
              "erotic": "2.7B-horni",
              "horny": "2.7B-horni",
              "pornographic": "2.7B-shinen",
              "food recipes":"1.3B-ramsay",
              "ai dungeon":"2.7B-aid"}

model_name = model_map[model_type]  # checkpoint name, e.g. "story" -> "2.7B-picard"


model_types = {"2.7B-horni": "https://storage.henk.tech/KoboldAI/gpt-neo-2.7B-horni.7z",
               "2.7B-horni-ln": "https://storage.henk.tech/KoboldAI/gpt-neo-2.7B-horni-ln.7z",
               "2.7B-picard": "https://storage.henk.tech/KoboldAI/gpt-neo-2.7B-picard.7z",
               "2.7B-shinen": "https://storage.henk.tech/KoboldAI/gpt-neo-2.7B-shinen.7z",
               "1.3B-ramsay":"https://storage.henk.tech/KoboldAI/gpt-neo-1.3B-ramsay.7z",
               "2.7B-aid":"https://storage.henk.tech/KoboldAI/gpt-neo-2.7B-aid.7z"}

custom_models = model_types.keys()

print("SELECTED MODEL", model_name)

model = None
tokenizer = None
pipeline = None
checkpoint = None
if not os.path.isdir("gpt-neo-"+model_name) and model_name in custom_models:
  model_url = model_types[model_name]
  print("Downloading:", model_url)
  #!megadl $model_url --no-ask-password
  !wget -N $model_url
  model_file = "gpt-neo-" + model_name + ".tar"
  model_file_compressed = "gpt-neo-" + model_name + ".7z"
  print("Unzipping model with 7z...")
  !/usr/bin/7z x $model_file_compressed
  print("Done unzipping")
  #tar = tarfile.open(model_file, "r")
  #tar.extractall()
  #tar.close()

if model_name in custom_models:
  checkpoint = torch.load("gpt-neo-" + model_name + "/pytorch_model.bin", map_location="cuda:0")
  model = GPTNeoForCausalLM.from_pretrained("gpt-neo-" + model_name, state_dict=checkpoint).half().to("cuda").eval()
  for k in list(checkpoint.keys()):
    del checkpoint[k]
  del checkpoint
else:
  from transformers.file_utils import cached_path, WEIGHTS_NAME, hf_bucket_url
  archive_file = hf_bucket_url(model_name, filename=WEIGHTS_NAME)
  resolved_archive_file = cached_path(archive_file)
  checkpoint = torch.load(resolved_archive_file, map_location="cuda:0")
  for k in checkpoint.keys():
    checkpoint[k] = checkpoint[k].half()
  model = GPTNeoForCausalLM.from_pretrained(model_name, state_dict=checkpoint).half().to("cuda").eval()
  for k in list(checkpoint.keys()):
    del checkpoint[k]
  del checkpoint
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The torch.multinomial fp16 bug is now worked around inside the transformers fork
#if torch.cuda.get_device_properties(0).total_memory > 15000 * 1024 * 1024:
#  print("Big GPU detected, using fp32")
#  model = model.float()

Setting up model, this will take a few minutes. Don't interrupt this cell even takes a long while, or you can be left with broken, half unpacked files.
removed 'gpt-neo-2.7B-horni/all_results.json'
removed 'gpt-neo-2.7B-horni/merges.txt'
removed 'gpt-neo-2.7B-horni/pytorch_model.bin'
removed 'gpt-neo-2.7B-horni/config.json'
removed directory 'gpt-neo-2.7B-horni'
Downloading: https://drive.google.com/uc?id=1LWVp9HIyQl0FYEIDGhx5Kl3o_v8QQ2Jz
Downloading...
From: https://drive.google.com/uc?id=1LWVp9HIyQl0FYEIDGhx5Kl3o_v8QQ2Jz
To: /content/gpt-neo-2.7B-horni.tar
5.37GB [01:37, 55.3MB/s]



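The model cell loads the weights with `.half()`, i.e. as 16-bit floats. A rough estimate shows why: at 4 bytes per parameter (fp32) a 2.7B-parameter model needs twice as much memory for its weights as at 2 bytes (fp16), and the roughly 5.4 GB checkpoint download above is in line with the fp16 figure. The sketch assumes the nominal parameter counts (2.7e9 and 1.3e9) and ignores activations, the attention cache and CUDA overhead, so real usage is higher; `weight_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3
# Threshold from the commented-out "Big GPU detected, using fp32" check.
BIG_GPU_BYTES = 15000 * 1024 * 1024


def weight_bytes(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights (no activations or cache)."""
    return n_params * bytes_per_param


for name, n_params in [("2.7B", 2.7e9), ("1.3B", 1.3e9)]:
    fp32 = weight_bytes(n_params, 4) / GIB
    fp16 = weight_bytes(n_params, 2) / GIB
    print(f"{name}: fp32 ~{fp32:.1f} GiB, fp16 ~{fp16:.1f} GiB")
# 2.7B: fp32 ~10.1 GiB, fp16 ~5.0 GiB
# 1.3B: fp32 ~4.8 GiB, fp16 ~2.4 GiB

# Would the 2.7B fp32 weights alone fit under the "big GPU" threshold?
print(weight_bytes(2.7e9, 4) < BIG_GPU_BYTES)  # True
```

The commented-out fp32 switch only considered GPUs with more than 15000 MiB, which leaves headroom above the roughly 10 GiB of fp32 weights for everything the estimate ignores.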
In [None]:
#@title Copy downloaded model to google drive (optional)
#@markdown If the model checkpoint was downloaded automatically in the previous step, you can copy it to your Google Drive here for more reliable access in the future.
gdrive_target = "/content/drive/MyDrive/gpt-neo-2.7B-horni.7z" #@param {type:"string"}
copy_model_file = False #@param {type:"boolean"}

if copy_model_file:
  from google.colab import drive
  drive.mount('/content/drive')
  # The model cell downloads the archive as gpt-neo-<model_name>.7z into the
  # working directory (/content on Colab).
  model_archive = '/content/gpt-neo-' + model_name + ".7z"
  !cp -v $model_archive $gdrive_target

In [None]:
#@title Sampling settings (DO NOT SKIP)
#@markdown You can modify the sampling settings here; don't forget to rerun the cell after changing them. The number of generated tokens is subtracted from the context window size, so don't set it too high.
tail_free_sampling = 0.95 #@param {type:"number"}
top_k = 60 #@param {type:"number"}
top_p = 0.9 #@param {type:"number"}
temperature = 0.8 #@param {type:"number"}
number_generated_tokens = 40 #@param {type:"integer"}
repetition_penalty = 2.5 #@param {type:"number"}
repetition_penalty_range = 512 #@param {type:"number"}
repetition_penalty_slope = 3.33 #@param {type:"number"}
number_show_last_actions = 15 #@param {type:"integer"}

#@markdown If tail free sampling is enabled, top_p and top_k should probably not be used.
enable_tfs = False #@param {type:"boolean"}
enable_top_k = True #@param {type:"boolean"}
enable_top_p = True #@param {type:"boolean"}

if not enable_tfs:
  tail_free_sampling = None
if not enable_top_k:
  top_k = None
if not enable_top_p:
  top_p = None

#@markdown The same temperature seems to give different results here than in AI Dungeon (AID), so experiment with it. Even 0.5 can give good results.
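To make the settings above concrete, here is a plain-Python sketch of the standard formulations of top-k, top-p (nucleus) and tail-free sampling, plus a CTRL-style repetition penalty limited to the last `window` tokens. These are illustrations written for this notebook, not the code in the transformers fork it installs: the order in which filters are applied is an assumption, `repetition_penalty_slope` is not modeled, and all function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Set

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (max subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ranked(probs: List[float]) -> List[int]:
    """Token ids from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def filter_top_k(probs: List[float], k: int) -> Set[int]:
    """Top-k: keep the k most likely ids."""
    return set(ranked(probs)[:k])

def filter_top_p(probs: List[float], p: float) -> Set[int]:
    """Top-p (nucleus): keep the most likely ids until their mass reaches p."""
    kept: Set[int] = set()
    cum = 0.0
    for i in ranked(probs):
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept

def filter_tail_free(probs: List[float], z: float) -> Set[int]:
    """Tail-free: weight ranks by the |second difference| of the sorted
    probabilities and keep ranks while the normalized cumulative weight <= z.
    Apart from the degenerate cases below, the top id is always kept and
    the lowest-ranked id always dropped."""
    order = ranked(probs)
    if len(order) < 3:
        return set(order)
    p = [probs[i] for i in order]
    d1 = [b - a for a, b in zip(p, p[1:])]
    d2 = [abs(b - a) for a, b in zip(d1, d1[1:])]
    total = sum(d2)
    if total == 0:  # linear curve, no knee to cut at: this sketch keeps all
        return set(order)
    kept = {order[0]}
    cum = 0.0
    for rank, w in enumerate(d2):
        cum += w / total
        if cum > z:
            break
        kept.add(order[rank + 1])
    return kept

def apply_repetition_penalty(logits: List[float], previous_ids: List[int],
                             penalty: float, window: int) -> List[float]:
    """CTRL-style penalty on ids seen among the last `window` tokens: positive
    logits are divided by `penalty`, negative ones multiplied by it, so with
    penalty > 1 a repeated token's logit never increases. No slope ramp."""
    out = list(logits)
    recent = previous_ids[-window:] if window > 0 else []
    for i in set(recent):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def sample_next(logits: List[float], temperature: float = 0.8,
                top_k: Optional[int] = None, top_p: Optional[float] = None,
                tfs: Optional[float] = None,
                rng: Optional[random.Random] = None) -> int:
    """Temperature, then each enabled filter in turn (order assumed), then one
    weighted draw. None disables a filter, like the notebook's enable_* flags."""
    probs = softmax(logits, temperature)
    for setting, keep_ids in ((top_k, filter_top_k), (top_p, filter_top_p),
                              (tfs, filter_tail_free)):
        if setting is not None:
            kept = keep_ids(probs, setting)
            total = sum(probs[i] for i in kept)
            probs = [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]
    rng = rng if rng is not None else random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With this cell's defaults (`enable_tfs = False`, top-k and top-p enabled), the matching call would be `sample_next(logits, temperature=0.8, top_k=60, top_p=0.9)`. This also shows why the cell warns against combining tail-free sampling with top-k/top-p: each filter trims the tail again after the previous one.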

In [None]:
#@title Prevent tokens like [, <, > and { from being generated
#thanks STARSTRUCK

prevent_square_brackets = True #@param {type:"boolean"}
prevent_angle_brackets = True #@param {type:"boolean"}
prevent_curly_brackets = True #@param {type:"boolean"}

vocab = tokenizer.get_vocab()
vocab_keys = vocab.keys()
bad_keys = list()
find_keys = lambda char : [key for key in vocab_keys if key.find(char) != -1]

if prevent_square_brackets:
  bad_keys.extend(find_keys("["))
  #bad_keys.extend(find_keys("]"))

if prevent_angle_brackets:
  bad_keys.extend(find_keys("<"))
  bad_keys.extend(find_keys(">"))

if prevent_curly_brackets:
  bad_keys.extend(find_keys("{"))
  #bad_keys.extend(find_keys("}"))

bad_words_ids = list()
bad_keys_final = list()
for key in bad_keys:
  if key == "<|endoftext|>" or key in bad_keys_final:
    continue
  bad_id = vocab[key]
  bad_words_ids.append([bad_id])
  bad_keys_final.append(key)

if len(bad_words_ids) < 1:
  bad_words_ids = None

#print(f"Bad keys: {bad_keys_final} (Count: {len(bad_keys)})")
#print(f"Bad ids: {bad_words_ids}")
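The cell above collects every vocabulary entry containing `[`, `<`, `>` or `{` (except `<|endoftext|>`) and passes each one as a single-token entry of `bad_words_ids`. For single-token entries, Hugging Face's `generate` bans those ids outright by setting their scores to negative infinity before sampling. The sketch below mirrors both steps on a toy vocabulary; `banned_token_ids` and `mask_banned` are hypothetical helpers, and multi-token bad-word sequences (which are only banned once their prefix has been generated) are not handled.

```python
import math
from typing import Dict, Iterable, List, Optional


def banned_token_ids(vocab: Dict[str, int], chars: Iterable[str],
                     keep: Iterable[str] = ("<|endoftext|>",)) -> Optional[List[List[int]]]:
    """Ids of every token whose text contains one of `chars`, minus `keep`."""
    chars = list(chars)
    keep = set(keep)
    ids = sorted({idx for tok, idx in vocab.items()
                  if tok not in keep and any(c in tok for c in chars)})
    # Same shape as `bad_words_ids`: one single-token sequence per id, or None.
    return [[i] for i in ids] or None


def mask_banned(logits: List[float], bad_words_ids: Optional[List[List[int]]]) -> List[float]:
    """Set banned single-token ids to -inf so sampling can never pick them."""
    out = list(logits)
    for seq in bad_words_ids or []:
        if len(seq) == 1:
            out[seq[0]] = -math.inf
    return out


# Toy vocabulary ("Ġ" marks a leading space, as in GPT-2's tokenizer).
toy_vocab = {"hello": 0, "[": 1, "Ġ[": 2, "<|endoftext|>": 3, "{": 4, "world": 5}
bad = banned_token_ids(toy_vocab, "[<>{")
print(bad)                          # [[1], [2], [4]]
print(mask_banned([0.0] * 6, bad))  # [0.0, -inf, -inf, 0.0, -inf, 0.0]
```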

In [None]:
#@title Basic sampling

#@markdown Use this cell if you just want to sample from the model in a free form way.
from json import dumps, dump

def generate(prompt):
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cpu")
    n_ids = ids.shape[1]
    if n_ids < 1:
        n_ids = 1
        ids = torch.tensor([[tokenizer.eos_token_id]])
    max_length = n_ids + number_generated_tokens
    torch.cuda.empty_cache()
    basic_output = model.generate(
        ids.long().cuda(),
        do_sample=True,
        min_length=max_length,
        max_length=max_length,
        temperature=temperature,
        tfs = tail_free_sampling,
        top_k = top_k,
        top_p = top_p,
        repetition_penalty = repetition_penalty,
        repetition_penalty_range = repetition_penalty_range,
        repetition_penalty_slope = repetition_penalty_slope,
        use_cache=True,
        bad_words_ids=bad_words_ids,
        pad_token_id=tokenizer.eos_token_id
    ).long().to("cpu")
    torch.cuda.empty_cache()
    output=tokenizer.decode(basic_output[0])
    return output

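# `output_path` and `input_path` are not defined in this notebook; they are
# assumed to be provided by the Pollinations runtime that executes it.
# Generate three independent responses to the same prompt.
for r in range(3):
    basic_prompt = text_prompt
    # Each iteration feeds the whole text so far back in and appends
    # `number_generated_tokens` new tokens.
    for i in range(iterations):
        basic_prompt = generate(basic_prompt)
        print("generated", basic_prompt)

    # Copy the inputs into a per-response folder and overwrite its text_prompt
    # with the generated text, so the response can be continued on Pollinations.
    continuation_path = f"{output_path}/continuation_{r}"
    continuation_link = f"output/continuation_{r}/"
    !mkdir -p $continuation_path
    !cp -vr $input_path $continuation_path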

    with open(f"{continuation_path}/input/text_prompt", "w") as f:
        dump(basic_prompt, f)
    !sleep 5
    

    basic_prompt = basic_prompt.replace(text_prompt,f"**Prompt**: {text_prompt}\n\n**Response {r+1}**: ")
    basic_prompt = basic_prompt + f"""


   [Continue]({continuation_link}) the text on Pollinations. 
    """
    with open(f'{output_path}/output_{r}.md', 'w') as outfile:
        outfile.write(basic_prompt)

    print("---OUTPUT---")
    print(basic_prompt)

The rays of the evening sun falling in through the window bathed the room in a soft, warm light. It was another beautiful day in May and with spring just around the corner I had my favorite view of nature's beauty. The tall, green grass and flowers were fresh and still in full bloom and
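Each of the three responses is written to `output_<r>.md` with the prompt and the continuation labeled and a link to continue the text. A simplified, side-effect-free sketch of that formatting step follows; `format_response` is a hypothetical helper. One difference is deliberate: the cell uses `str.replace`, which relabels every occurrence of the prompt (for example if the model repeats it) and silently does nothing if the decoded text does not reproduce the prompt character for character. The sketch instead checks for the prompt as a prefix and leaves the text unlabeled otherwise.

```python
def format_response(generated: str, prompt: str, index: int, continuation_link: str) -> str:
    """Label prompt and continuation in Markdown and append a 'Continue' link."""
    if generated.startswith(prompt):
        continuation = generated[len(prompt):]
        body = f"**Prompt**: {prompt}\n\n**Response {index + 1}**: {continuation}"
    else:
        # Prompt not found verbatim at the start: leave the text unlabeled.
        body = generated
    return body + f"\n\n\n   [Continue]({continuation_link}) the text on Pollinations.\n"


print(format_response("Hello. World.", "Hello.", 0, "output/continuation_0/"))
```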
