## GPT-2 trained on a dialogue corpus

[Hugging Face repo here](https://huggingface.co/microsoft/DialoGPT-large)

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-large')
model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-large')




In [2]:
# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, append the eos_token, and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response; max_length=1000 caps the total sequence (history plus reply) at 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty-print the bot's latest reply, i.e. only the tokens generated after the input
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

>> User:Hey I'm Sinan how are you?
DialoGPT: I'm good, you?
>> User:What's my name again?
DialoGPT: Sinan, you?
>> User:What is your name?
DialoGPT: Sinan, you?
>> User:What a coincidence!
DialoGPT: I'm Sinan, you?
>> User:Sinan
DialoGPT: I'm Sinan, you?
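
Because `max_length=1000` in the loop above counts the history as well as the reply, a long conversation would eventually leave no room for new tokens. One possible way to handle that is to drop the oldest turns before calling `generate`. The sketch below is only an illustration on plain Python lists: `trim_history` is a hypothetical helper, not part of `transformers`, and the token ids are made up (50256 stands in for the end-of-turn id).

```python
from typing import List


def trim_history(turns: List[List[int]], max_tokens: int) -> List[int]:
    """Flatten the most recent turns into one id list of at most max_tokens ids.

    `turns` holds one list of token ids per turn, oldest first.
    Whole turns are dropped from the front, so no turn is cut in half.
    """
    kept: List[List[int]] = []
    total = 0
    # Walk backwards from the newest turn and stop once the budget is spent.
    for turn in reversed(turns):
        if total + len(turn) > max_tokens:
            break
        kept.append(turn)
        total += len(turn)
    # Restore chronological order and flatten into a single sequence.
    return [tok for turn in reversed(kept) for tok in turn]


# Hypothetical token ids; 50256 stands in for the id that ends each turn.
turns = [[11, 12, 50256], [21, 22, 23, 50256], [31, 50256]]
print(trim_history(turns, max_tokens=6))  # [21, 22, 23, 50256, 31, 50256]
```

In the chat loop, the trimmed list could be wrapped as `torch.tensor([ids])` and passed to `model.generate` in place of `bot_input_ids`. Note that this sketch returns an empty list when even the newest turn exceeds the budget, which a real loop would need to handle.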


## Turkish GPT-2

[Hugging Face repo here](https://huggingface.co/redrussianarmy/gpt2-turkish-cased)

In [9]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2-style models
turkish_tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
turkish_model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")



turkish_generator = pipeline(
    'text-generation', model=turkish_model, tokenizer=turkish_tokenizer
)

# the prompt means "Hello, my name is Sinan and I"
print(turkish_generator('Merhaba benim adım Sinan ve ben')[0]['generated_text'])

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using pad_token, but it is not set yet.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Merhaba benim adım Sinan ve ben size anlatacağım hikayemi yazmak istiyorum. Size anlatacağım olay tamamen gerçektir. Uzun boylu iri yapılı 1.60 boy ve 63 kilo zayıf bir bayanım ve sizlere hikayeme anlatacağım olay sizlerin aklinizı da değil sizlerin akliniz da.


## Python code completion

[Hugging Face repo here](https://huggingface.co/Sentdex/GPyT)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
model = AutoModelForCausalLM.from_pretrained("Sentdex/GPyT")

In [40]:
input_code = """import pandas as pd
import numpy as np

df = pd"""  # I'd expect a read_csv here

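# newlines are passed to the model as the literal marker "<N>"
converted = input_code.replace("\n", "<N>")
tokenized = tokenizer.encode(converted, return_tensors='pt')
# num_beams (not beams) is the generate() argument that enables beam search
resp = model.generate(tokenized, num_beams=3, max_length=tokenized.shape[1] + 10)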

decoded = tokenizer.decode(resp[0])
reformatted = decoded.replace("<N>","\n")

print(reformatted)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


import pandas as pd
import numpy as np

df = pd.read_csv('data/data/data
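
The `"\n"` to `"<N>"` round trip in the cell above can be wrapped in small helpers, which also makes it easy to show only the newly generated part. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the `decoded` string is a stand-in for `tokenizer.decode(resp[0])`, built from the output printed above, so no model is needed to run it.

```python
NEWLINE_TOKEN = "<N>"  # marker used in the cell above in place of "\n"


def to_model_text(source: str) -> str:
    """Encode real newlines as the marker before tokenizing."""
    return source.replace("\n", NEWLINE_TOKEN)


def from_model_text(text: str) -> str:
    """Turn the marker back into real newlines after decoding."""
    return text.replace(NEWLINE_TOKEN, "\n")


def completion_only(prompt: str, decoded: str) -> str:
    """Return the text that follows `prompt`, or everything if it is not a prefix."""
    restored = from_model_text(decoded)
    return restored[len(prompt):] if restored.startswith(prompt) else restored


prompt = "import pandas as pd\nimport numpy as np\n\ndf = pd"
# Stand-in for tokenizer.decode(resp[0]), based on the output shown above.
decoded = "import pandas as pd<N>import numpy as np<N><N>df = pd.read_csv('data/data/data"
print(completion_only(prompt, decoded))  # .read_csv('data/data/data
```

Keeping the marker in one constant avoids the two `replace` calls drifting apart if the marker ever changes.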
