## GPT-2 trained on a dialogue corpus

[Huggingface repo here](https://huggingface.co/microsoft/DialoGPT-large)

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-large')
model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-large')


In [2]:
# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, append the eos_token, and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty-print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

>> User:how are you?
DialoGPT: I'm good, you?
>> User:not bad. Just watching tv. do you like tv?
DialoGPT: I like it, but I don't like it as much as I used to.
>> User:what do you like now?
DialoGPT: I like a lot of things.
>> User:give me an example
DialoGPT: I like a lot of things.
>> User:fair enough
DialoGPT: I like a lot of things.
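
The loop above appends every turn to `chat_history_ids`, while `max_length=1000` caps the combined length of history and reply. One way to keep the history inside a fixed budget is to drop the oldest tokens before generating. The sketch below illustrates the idea with plain Python lists standing in for the token tensors; the helper name `trim_history` and the budget of 8 are hypothetical and chosen only for illustration.

```python
def trim_history(token_ids, max_tokens):
    """Keep only the most recent `max_tokens` ids of a flat list of token ids."""
    if max_tokens <= 0:
        # Guard: token_ids[-0:] would return the whole list, not an empty one.
        return []
    return token_ids[-max_tokens:]


# Stand-in for a chat history of 12 token ids (assumption: real ids come from the tokenizer).
history = list(range(12))
print(trim_history(history, 8))
```

For a tensor of shape `(1, n)` such as `chat_history_ids`, the same idea would be the slice `chat_history_ids[:, -max_tokens:]`.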


## Turkish GPT-2

[Huggingface repo here](https://huggingface.co/redrussianarmy/gpt2-turkish-cased)

In [16]:
turkish_tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")

turkish_model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")

turkish_generator = pipeline(
    'text-generation', model=turkish_model, tokenizer=turkish_tokenizer
)

print(turkish_generator('Merhaba. Ben', max_length=5)[0]['generated_text'])  # Hi. I wouldn't know

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Merhaba. Ben Bilmem


## Python code completion

[Huggingface repo here](https://huggingface.co/Sentdex/GPyT)

In [4]:
tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
model = AutoModelForCausalLM.from_pretrained("Sentdex/GPyT")

In [5]:
input_code = """import pandas as pd
import numpy as np

df = pd"""  # I'd expect a read_csv here

converted = input_code.replace("\n", "<N>")
tokenized = tokenizer.encode(converted, return_tensors='pt')
resp = model.generate(tokenized, num_beams=3, max_length=tokenized.shape[1] + 10)  # beam search uses `num_beams`

decoded = tokenizer.decode(resp[0])
reformatted = decoded.replace("<N>","\n")

print(reformatted)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


import pandas as pd
import numpy as np

df = pd.read_csv('data/data/data
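
The cell above swaps newlines for the `<N>` marker before tokenizing and swaps them back after decoding. A minimal sketch of that round trip as two small helpers follows; the function names `to_model_format` and `from_model_format` are hypothetical, and only the `<N>` marker itself comes from the cell above.

```python
NEWLINE_TOKEN = "<N>"  # marker used in place of newlines in the cell above


def to_model_format(code: str) -> str:
    """Replace real newlines with the <N> marker before tokenizing."""
    return code.replace("\n", NEWLINE_TOKEN)


def from_model_format(text: str) -> str:
    """Turn <N> markers in decoded model output back into newlines."""
    return text.replace(NEWLINE_TOKEN, "\n")


snippet = "import pandas as pd\nimport numpy as np\n\ndf = pd"
converted = to_model_format(snippet)
print(converted)  # import pandas as pd<N>import numpy as np<N><N>df = pd
```

Keeping both directions next to each other makes it harder to forget the conversion on one side of the `generate` call.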


In [6]:
# Examples inspired by https://nlpcloud.io/effectively-using-gpt-j-gpt-neo-gpt-3-alternatives-few-shot-learning.html

In [7]:
'''
https://huggingface.co/EleutherAI/gpt-neo-1.3B

GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. 
GPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular 
pre-trained model.

GPT-Neo 1.3B was trained on the Pile, a large scale curated dataset created by EleutherAI 
for the purpose of training this model. https://pile.eleuther.ai
'''

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

gpt_neo = pipeline(
    'text-generation', model=model, tokenizer=tokenizer
)

In [8]:
# spelling correction
for result in gpt_neo("""I love goin to the beach.
Correction: I love going to the beach.
###
Let me hav it!
Correction: Let me have it!
###
It have too many drawbacks.
Correction: It has too many drawbacks.
###
I do not wan to go
Correction:""",
    max_length=75, early_stopping=True):
    print(result['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I love goin to the beach.
Correction: I love going to the beach.
###
Let me hav it!
Correction: Let me have it!
###
It have too many drawbacks.
Correction: It has too many drawbacks.
###
I do not wan to go
Correction: I do not want to go.
###
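
The few-shot prompts in this section all follow one pattern: solved examples separated by `###`, followed by an unsolved query that ends in the output label. A small sketch of building such a prompt from a list of pairs is shown here; the helper name `build_few_shot_prompt` and its parameters are hypothetical, not part of `transformers`.

```python
def build_few_shot_prompt(examples, query, output_label, input_label=None, separator="###"):
    """Join (input, output) pairs into a few-shot prompt ending with an open output label."""

    def format_input(text):
        # Some prompts label the input line (e.g. "description: ..."), others do not.
        return f"{input_label}: {text}" if input_label else text

    blocks = [f"{format_input(src)}\n{output_label}: {tgt}" for src, tgt in examples]
    # The final block leaves the output empty so the model completes it.
    blocks.append(f"{format_input(query)}\n{output_label}:")
    return f"\n{separator}\n".join(blocks)


spelling_examples = [
    ("I love goin to the beach.", "I love going to the beach."),
    ("Let me hav it!", "Let me have it!"),
    ("It have too many drawbacks.", "It has too many drawbacks."),
]
prompt = build_few_shot_prompt(spelling_examples, "I do not wan to go", output_label="Correction")
print(prompt)
```

The printed prompt matches the string passed to `gpt_neo` in the spelling-correction cell, so the result could be handed to the pipeline in its place.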


In [9]:
# intent detection
for result in gpt_neo("""I want to start coding tomorrow because it seems to be so fun!
Intent: start coding
###
Show me the last pictures you have please.
Intent: show pictures
###
Search all these files as fast as possible.
Intent: search files
###
Can you please teach me Chinese next week?
Intent:""",
    max_length=80, early_stopping=True):
    print(result['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I want to start coding tomorrow because it seems to be so fun!
Intent: start coding
###
Show me the last pictures you have please.
Intent: show pictures
###
Search all these files as fast as possible.
Intent: search files
###
Can you please teach me Chinese next week?
Intent: teach me ch
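
The completion above stops at `teach me ch`, most likely because `max_length` counts the prompt tokens as well as the generated ones, leaving only a few tokens for the answer. A minimal sketch of deriving `max_length` from the prompt length plus a budget for new tokens is given below. The helper name `max_length_for` is hypothetical, and `str.split` is used only as a stand-in tokenizer so the example runs without a model; in the notebook, `tokenizer.encode` would be the natural function to pass in.

```python
def max_length_for(prompt, encode, new_token_budget):
    """Return a max_length that leaves `new_token_budget` tokens after the prompt."""
    return len(encode(prompt)) + new_token_budget


prompt = "Can you please teach me Chinese next week?\nIntent:"
# Assumption: whitespace splitting stands in for a real tokenizer here,
# so the count differs from what the GPT-Neo tokenizer would report.
print(max_length_for(prompt, str.split, 10))
```

Recent `transformers` releases also accept `max_new_tokens` in `generate`, which expresses the same budget directly.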


In [10]:
# HTML code generation
for result in gpt_neo("""description: a red button that says stop
code: <button style=color:white; background-color:red;>Stop</button>
###
description: a blue box that contains yellow circles with red borders
code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
###
description: a Headline saying Welcome to AI
code:""",
    max_length=150, early_stopping=True):
    print(result['generated_text'])
    


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


description: a red button that says stop
code: <button style=color:white; background-color:red;>Stop</button>
###
description: a blue box that contains yellow circles with red borders
code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
###
description: a Headline saying Welcome to AI
code: <span style=font-size: large;><h1><b>AI</b></h1>
###
description: a Headline saying Welcome
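
The pipeline returns the prompt plus everything the model wrote afterwards, and the model tends to continue past the answer into a new made-up example. A small sketch of keeping only the first completion, by dropping the prompt and cutting at the next `###`, is shown here; the helper name `first_completion` is hypothetical.

```python
def first_completion(generated_text, prompt, separator="###"):
    """Return the text generated after `prompt`, cut at the first separator."""
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        # Fall back to the full text if the prompt is not echoed verbatim.
        completion = generated_text
    return completion.split(separator, 1)[0].strip()


prompt = "description: a Headline saying Welcome to AI\ncode:"
generated = (
    prompt
    + " <span style=font-size: large;><h1><b>AI</b></h1>\n"
    + "###\n"
    + "description: a Headline saying Welcome"
)
print(first_completion(generated, prompt))  # <span style=font-size: large;><h1><b>AI</b></h1>
```

The same helper applies to the spelling and intent prompts, since they use the same `###` separator.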


In [11]:
# I will tweak their example a little bit to add a prompt. Some Sinan wisdom. The headline is much simpler code now :)
for result in gpt_neo("""HTML code
description: a red button that says stop
code: <button style=color:white; background-color:red;>Stop</button>
###
description: a blue box that contains yellow circles with red borders
code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
###
description: a Headline saying Welcome to AI
code:""",
    max_length=150, early_stopping=True):
    print(result['generated_text'])
    


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


HTML code
description: a red button that says stop
code: <button style=color:white; background-color:red;>Stop</button>
###
description: a blue box that contains yellow circles with red borders
code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
###
description: a Headline saying Welcome to AI
code: <h2>Welcome to AI</h2>
###
description: a circle with text saying something else
code: <circle id='
