# GPT-2
In this project I use the GPT-2 transformer to generate text from the Stanford Encyclopedia of Philosophy's (SEP's) entry on well-being (https://plato.stanford.edu/entries/well-being/).

This is a rough first attempt at using GPT-2 and NLP to generate philosophical text and/or answer philosophical questions.

In [2]:
pip install transformers


With the transformers package installed, I now import the packages and create tokenizer and model objects.

In [1]:
import transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

In [2]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

In [3]:
model = GPT2LMHeadModel.from_pretrained('gpt2')

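We can look under the hood of GPT-2 by inspecting the model's `num_parameters` method. Displayed without parentheses, as here, it shows the bound method, whose representation includes the full module structure; calling `model.num_parameters()` would return the parameter count instead.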

In [4]:
model.num_parameters

<bound method ModuleUtilsMixin.num_parameters of GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0): GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (1): GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0
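
The shapes in the printout above are enough for a rough parameter count. The following is a minimal sketch, not output from the notebook: the vocabulary size, position count, and hidden size are read off the printout, while the 12 layers and the 4x MLP width are assumptions taken from the published GPT-2 small configuration. The helper name `linear` is made up for illustration.

```python
# Shapes for the 124M "gpt2" checkpoint. vocab_size, n_positions and d_model
# appear in the printed model; n_layer and the 4x MLP width are assumed from
# the published GPT-2 small configuration.
vocab_size, n_positions, d_model, n_layer = 50257, 1024, 768, 12
d_mlp = 4 * d_model


def linear(n_in, n_out):
    """Weights plus biases of one projection (shown as Conv1D in the printout)."""
    return n_in * n_out + n_out


layer_norm = 2 * d_model  # one scale vector and one shift vector

per_block = (
    layer_norm                      # ln_1
    + linear(d_model, 3 * d_model)  # attn.c_attn: fused query/key/value
    + linear(d_model, d_model)      # attn.c_proj
    + layer_norm                    # ln_2
    + linear(d_model, d_mlp)        # mlp.c_fc
    + linear(d_mlp, d_model)        # mlp.c_proj
)

embeddings = vocab_size * d_model + n_positions * d_model  # wte + wpe
total = embeddings + n_layer * per_block + layer_norm       # plus the final layer norm

print(f"{total:,}")  # 124,439,808
```

Under these assumptions the total is about 124 million, which is the figure I would expect `model.num_parameters()` to report, since the output layer reuses the `wte` weights rather than adding its own.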

### Scraping Text
With the model set up, I now need starter text to feed the generator. I use BeautifulSoup to scrape some sentences from an entry in the Stanford Encyclopedia of Philosophy.

In [8]:
import webbrowser
from bs4 import BeautifulSoup
import pandas as pd
import urllib.request

In [9]:
url = 'https://plato.stanford.edu/entries/well-being/'
request = urllib.request.Request(url)
raw_response = urllib.request.urlopen(request).read()
html = raw_response.decode("utf-8")
soup = BeautifulSoup(html, 'html.parser')

I strip the HTML tags and clean the text. I display only the first 500 characters so as not to clutter the output.

In [17]:
p_tags = soup.find_all('p')
text = [tag.text for tag in p_tags]
text = ''.join(text)
text = text.split('https')[0].replace('\n',' ').strip()
text[:500]

'Well-being is most commonly used in philosophy to describe what is non-instrumentally or ultimately good for a person. The question of what well-being consists in is of independent interest, but it is of great importance in moral philosophy, especially in the case of utilitarianism, according to which the only moral requirement is that well-being be maximized. Significant challenges to the very notion have been mounted, in particular by G.E. Moore and T.M. Scanlon. It has become standard to dist'
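
To see what the cleanup step does without hitting the network, here is a dependency-free sketch that uses the standard library's `HTMLParser` in place of BeautifulSoup. The `ParagraphText` class and the `sample_html` string are hypothetical stand-ins, and the sketch joins paragraphs with a space and collapses whitespace, a small variation on the cell above.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements, ignoring nested tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # greater than 0 while inside a <p>
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.paragraphs[-1] += data


# Hypothetical stand-in for the downloaded page.
sample_html = """
<div><p>Well-being is <em>most commonly</em> used
in philosophy.</p>
<p>It matters in ethics.</p></div>
<div id="footer"><p>Mirror at https://example.org</p></div>
"""

parser = ParagraphText()
parser.feed(sample_html)
joined = " ".join(parser.paragraphs)
# Same idea as the cell above: cut at the first URL, then flatten whitespace.
cleaned = " ".join(joined.split("https")[0].split())
print(cleaned)
```

Note that cutting at the first `https` keeps whatever words precede that URL (here "Mirror at"), so the tail of the scraped text may need a manual check.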

This is a very long text, too long for the GPT-2 model I am using, whose position embedding `wpe` covers only 1024 tokens. So I take only the first 123 words (I initially chose 128 words, then rounded down to the end of the previous sentence).

In [85]:
text2 = text.split()[:123]
text2 = ' '.join(text2)
text2

'Well-being is most commonly used in philosophy to describe what is non-instrumentally or ultimately good for a person. The question of what well-being consists in is of independent interest, but it is of great importance in moral philosophy, especially in the case of utilitarianism, according to which the only moral requirement is that well-being be maximized. Significant challenges to the very notion have been mounted, in particular by G.E. Moore and T.M. Scanlon. It has become standard to distinguish theories of well-being as either hedonist theories, desire theories, or objective list theories. According to the view known as welfarism, well-being is the only value. Also important in ethics is the question of how a person’s moral character and actions relate to their well-being.'
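
The rounding down to a sentence boundary was done by hand here. A minimal sketch of how it could be automated follows; the function name `truncate_to_sentence` and the sample string are hypothetical, and the check is naive: any word ending in `.`, `!`, or `?` counts as a sentence end, so abbreviations such as "G.E." would be mistaken for one.

```python
def truncate_to_sentence(text, max_words):
    """Keep at most max_words words, then drop any trailing partial sentence."""
    words = text.split()[:max_words]
    # Walk back to the last word that looks like the end of a sentence.
    while words and not words[-1].endswith((".", "!", "?")):
        words.pop()
    return " ".join(words)


sample = (
    "Well-being matters. It is debated by many philosophers. "
    "Some views are hedonist."
)
print(truncate_to_sentence(sample, 10))
# Well-being matters. It is debated by many philosophers.
```

If no word within the limit ends a sentence, the function returns an empty string, which is worth checking before passing the result to the tokenizer.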

In [118]:
len(text2)

791

This text is 791 characters long. Knowing this, I can slice the prompt off the decoded output and display only the generated text.

Let's tokenize it and generate text now. I set `do_sample = True` so that the generator samples each next token from the model's probability distribution. This helps it avoid getting caught in deterministic loops, which can happen when the generator always chooses the most probable next token.
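
The difference between greedy decoding and sampling can be illustrated with a toy example. This is a sketch in plain Python, not what `generate` does internally: the four candidate words and their scores are made up, and the `softmax` helper is written here for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token candidates and scores, not real GPT-2 output.
vocab = ["good", "happiness", "virtue", "pleasure"]
logits = [2.0, 1.5, 1.0, 0.5]

probs = softmax(logits, temperature=1.0)

# Greedy decoding: always pick the single most probable token.
greedy = vocab[probs.index(max(probs))]

# Sampling: draw in proportion to probability, so less likely tokens also appear.
rng = random.Random(0)
samples = rng.choices(vocab, weights=probs, k=10)

print(greedy)   # good
print(samples)
```

Greedy decoding returns "good" every time, while the sampled list mixes in the other candidates. With a temperature below 1 the distribution sharpens and sampling behaves more like greedy decoding.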

In [112]:
inputs = tokenizer.encode(text2, return_tensors='pt')

encoding = model.generate(inputs, max_length=500, do_sample=True, temperature=1.)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
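
These messages appear because `tokenizer.encode` returns only the token ids. A minimal sketch of one way to avoid them is below, assuming the `model`, `tokenizer`, and `text2` objects defined earlier; the wrapper name `generate_with_mask` is hypothetical. Calling the tokenizer directly returns an attention mask alongside the ids, and both can be passed to `generate` together with an explicit `pad_token_id`.

```python
def generate_with_mask(model, tokenizer, prompt, max_length=500):
    """Sample a continuation, passing the attention mask and pad token explicitly."""
    # Calling the tokenizer (rather than .encode) returns both
    # input_ids and attention_mask.
    encoded = tokenizer(prompt, return_tensors="pt")
    return model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token of its own
        max_length=max_length,                # counts prompt tokens as well
        do_sample=True,
        temperature=1.0,
    )


# Usage with the objects created earlier in this notebook:
# encoding = generate_with_mask(model, tokenizer, text2)
```

With a single unpadded prompt the mask is all ones, so I would not expect the generated text to change in character; the call simply makes explicit what the library otherwise has to guess.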


The result is not bad, probably comparable in quality and coherence to many undergraduate papers.

In [120]:
# show only the generated text: skip the 791-character prompt and the space that follows it
generated_text = tokenizer.decode(encoding[0], skip_special_tokens=True)[792:]
generated_text

'Moral philosophers argue that well-being, in essence, is defined by what our desires, or desires for things, are. As with all things, the best people are also the ones who are best at maximizing the well-being of that person (the moral philosopher). As we see above, we find such an overall picture difficult to appreciate, and perhaps hard to attain. For some reasons, the concept of well-being has been associated with this view, perhaps reflecting both its importance for moral philosophy and its problems in non-philosophy. In Moral Philosophy Aristotle\'s view of well-being can be compared with that of C.S. Lewis who says that "well-being is defined in terms of what is not well in a situation in which it becomes a question of whether we ought to give it or not." In this view the well-being of the individual is determined by the level of our value to be valued for our own good. The best individuals are the ones who maximize the well-being of both. This view maintains that good qualities
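
Slicing at a fixed character offset works as long as the decoded text reproduces the prompt exactly. A slightly more defensive sketch is shown below; the helper name `strip_prompt` and the example strings are hypothetical.

```python
def strip_prompt(decoded, prompt):
    """Return only the continuation when the decoded text starts with the prompt."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    # Decoding did not reproduce the prompt exactly; return everything
    # rather than cut at the wrong place.
    return decoded


prompt = "Well-being is the only value."
decoded = prompt + " Moral philosophers argue otherwise."
print(strip_prompt(decoded, prompt))  # Moral philosophers argue otherwise.
```

The same helper can be reused when the prompt changes, since it does not depend on a hard-coded length.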

Just for fun, I decided to try another, much shorter prompt: "What is the meaning of life?"

In [12]:
text2 = 'What is the meaning of life?'
text_ids = tokenizer.encode(text2, return_tensors='pt')

encoding = model.generate(text_ids, max_length=200, do_sample=True, temperature=1.)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [16]:
answer = tokenizer.decode(encoding[0], skip_special_tokens=True)
answer.replace('\n',' ')

'What is the meaning of life? It should be the full story of the cosmos. You can never find a place in this history of the universe not within it, a place of meaning - an "inner" realm. The whole cosmos is as it is.  What is our purpose within this journey? We are going to become a powerful person. We\'re going to become a champion of others. We\'re going to have success. We\'re going to have love. We will make the world. We\'ll have a purpose for it. It will exist.  But of course there will always have been problems. There will always be tragedies. But there WILL always be good. A good work ethic can work. Maybe in a single moment that will save you. But I am sure it will never happen to you."  The man will take many actions and do many things to help others. However, one can see why this man is so much respected, respected by all and who'