# Poking at Ever Larger Language Models
## An introduction for (digital) humanists


- Transition from N-Gram to Neural Language Models

- Don't count, predict (when training a language model)

- What are parameters

- Terminology: LM pre-training and fine-tuning
    - Why it works better
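To make the "count" half of "don't count, predict" concrete, here is a minimal sketch of a bigram model that estimates next-word probabilities purely from counts. The toy corpus and the helper name `next_word_probs` are invented for illustration; a neural language model replaces this counting table with learned parameters that are trained to predict the next word.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = "the uk is in europe . the uk is an island . the eu is a union .".split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Relative-frequency estimate of P(next | prev) from raw counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

probs_after_the = next_word_probs("the")
print(probs_after_the)
```

A word that never occurred in the corpus gets an empty distribution, which is one of the weaknesses of pure counting that motivates the move to prediction.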

Pre-trained language model (PLM) variants:
- Causal/Autoregressive language models (GPT series)
- Masked Language Models (BERT and family)
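The difference between the two variants is easiest to see in the training examples each one produces. The sketch below is a simplification that assumes word-level tokens (real models use subword tokens); the function names are hypothetical, and `[MASK]` is the placeholder BERT uses.

```python
tokens = ["The", "UK", "is", "a", "member"]

def causal_examples(tokens):
    """Causal LM: each target is predicted from the tokens to its left only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, position, mask_token="[MASK]"):
    """Masked LM: one token is hidden and predicted from context on both sides."""
    context = tokens[:position] + [mask_token] + tokens[position + 1:]
    return context, tokens[position]

causal = causal_examples(tokens)
masked = masked_example(tokens, 2)

for context, target in causal:
    print(context, "->", target)
print(masked[0], "->", masked[1])
```

Because a causal model only ever looks left, it can be run repeatedly to generate text, which is why the GPT series is used for generation in the rest of this notebook.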

Bias in Language Models

## Text Generation with GPT-2

Materials are taken from this [Blog Post](https://huggingface.co/blog/how-to-generate).

We show how to generate text with GPT-2 and examine the effects of language model fine-tuning.

In [12]:
# Prompt to continue; swap the comment markers to try one of the alternatives
#sequence = 'the duke of'
#sequence = 'A no deal Brexit'
sequence = 'The UK is'

In [13]:
from transformers import pipeline
generator = pipeline('text-generation', model = 'gpt2')
generator(sequence, max_length = 30, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "The UK is set to spend £7b on the NHS in 2010.\n\nThe Government's spending on the NHS will probably have to increase over"},
 {'generated_text': 'The UK is a major player in helping the UN and other international agencies to reach consensus around global greenhouse gas emissions reductions. However, while it is in'},
 {'generated_text': 'The UK is currently in a tight-knit network of nations, with the EU having the biggest role in all."\n\n"It would be foolish'}]

In [14]:
from transformers import pipeline
# Same prompt, now with the fine-tuned 'gpt-brexit' model, reusing the GPT-2 tokenizer
generator = pipeline('text-generation', model='gpt-brexit', tokenizer='gpt2')
generator(sequence, max_length=30, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The UK is due to return to the European Union on 1 October.\n\nAnd, it will be up to the EU to decide exactly what a'},
 {'generated_text': 'The UK is on course for a major trade deal with the EU after Brexit but it would come at the cost of billions more pounds of investment by the'},
 {'generated_text': 'The UK is in breach of the European convention on human rights and the European Convention on Human Rights, and the result is that there is severe and serious'}]

In [301]:
#help(GPT2LMHeadModel)

In [199]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPT2Model


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

In [200]:
model2 = GPT2Model.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

In [201]:
# Token ids for the prompt (the outputs below were produced with sequence = 'A no deal Brexit')
input_ids = tokenizer.encode(sequence, return_tensors='pt')


In [202]:
input_ids

tensor([[   32,   645,  1730, 11435]])

In [203]:
predictions = model(**tokenizer(sequence, return_tensors='pt'))

In [204]:
predictions.logits.shape  # (batch size, sequence length, vocabulary size)

torch.Size([1, 4, 50257])

In [205]:
import numpy as np  # needed for np.argmax; pick the highest-scoring token at the last position
tokenizer.decode(np.argmax(predictions.logits[0,-1,:].detach().numpy()))

' would'
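The logits are raw scores, one per vocabulary entry; taking the argmax picks the top token but hides how confident the model is. A minimal sketch of the missing step, turning scores into probabilities with a softmax, is shown below. The five-word vocabulary and the scores are made up for illustration; GPT-2 itself scores all 50257 entries.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical five-word vocabulary and made-up scores.
vocab = [" would", " is", " could", " will", " means"]
logits = [4.1, 2.3, 3.0, 2.9, 0.5]

probs = softmax(logits)
best = max(range(len(vocab)), key=lambda i: probs[i])
print(vocab[best], round(probs[best], 3))
```

Softmax preserves the ranking of the scores, so the argmax over probabilities selects the same token as the argmax over logits.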

In [206]:
greedy_output = model.generate(input_ids, max_length=50)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
A no deal Brexit would be a disaster for the UK, but for the EU as a whole.

"The UK is a member of the European Union, and we are not going to be able to leave the EU without the EU's help
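Called without sampling options, `generate` decodes greedily: at every step it appends the single most likely next token. The loop below sketches that idea under strong simplifying assumptions. The `toy_scores` table is hypothetical and conditions only on the last token, whereas GPT-2 conditions on the whole prefix, and `greedy_generate` is an illustrative name, not a library function.

```python
# Hypothetical next-token scores; a real model computes these with a neural network.
toy_scores = {
    "A": {"no": 0.9, "deal": 0.1},
    "no": {"deal": 0.8, "Brexit": 0.2},
    "deal": {"Brexit": 0.7, "would": 0.3},
    "Brexit": {"would": 0.6, "is": 0.4},
    "would": {"be": 0.9, "<eos>": 0.1},
    "be": {"<eos>": 1.0},
}

def greedy_generate(prompt, max_length):
    """Repeatedly append the highest-scoring next token until <eos> or max_length."""
    tokens = list(prompt)
    while len(tokens) < max_length:  # the limit counts the prompt tokens too
        candidates = toy_scores.get(tokens[-1])
        if not candidates:
            break  # no known continuation for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

generated = greedy_generate(["A", "no", "deal", "Brexit"], max_length=10)
print(" ".join(generated))
```

Since the most likely token is always chosen, the same prompt always yields the same continuation, and greedy output tends to fall into repetitive phrasing.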


# Few-shot learning