<a href="https://colab.research.google.com/github/jamccarty/CaptionSentimentProject/blob/main/Jacqueline_McCarty_Large_Language_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring GPT-3

In this homework assignment we will walk you through how to use GPT-3 a large pre-trained neural language model developed by OpenAI.  

You will learn about the following topics:
* Prompts and completions.  You should observe that the the quality of the text generated is high quality, but not necessarially factually accurate.
* Probabilities.  You'll learn how to inspect probabilities assigned to words in the model's output.
* Few shot learning.  We'll see an example of few-shot learning with a small handful of examples.
* Zero shot learning.  We will explore the zero-shot capabilities of pre-trained LMs.  You'll design zero-shot prompts for
1. summarization
2. question-answering
3. simplification
4. translation
* How to fine tune a model.  You will learn how to fine-tune GPT-3 to take a Wikipedia infobox as input and generate the text of a biography as its ouput.  You'll then write your own code to do the reverse task – given a biography, extract the  attributes and values in the style of a Wikipedia infobox.



# Prompt Completion

As a warm-up we'll have you play with [the OpenAI Playground](https://beta.openai.com/playground).  Try inputting this prompt:

> One of my favorite professors at the University of Pennsylvania is

And the click the "Submit" button to generate a completion.

Copy and paste the text below (including your prompt).

You might notice that the text that GPT-3 generates ends mid-sentence.  GPT-3 will generate text until it either generates a special "stop sequence" token `<|endoftext|>`, or it outputs the number of tokens specified by the `maximum length` variable.
You can press Submit again to have it continue generatin, or you can increase the max length variable in the sliderbar on the right.

In [None]:
favorite_professor_completion_1 = """
My favorite professor at the University of Pennsylvania is Dr. Charlotte Emerson. I have had the pleasure of attending her lectures on a variety of topics, including history, literature, and philosophy. She is an incredibly engaging speaker who is able to make complex ideas understandable to the average student. I highly recommend her as a professor to anyone looking to gain a better understanding of the world around them!
"""

GPT-3 generates fluent text, but it is not always grounded in fact.  Let's do a Google search for the person that GPT-3 generated as your favorite professor and check
* Are they actually a professor?
* Where do they work?

In [None]:
# Extract the professor's name
professor_name_1 = "Charlotte Emerson"

# Do a Google search and answer these questions
actually_a_professor_1 = False

# Insitituion where they work
instituion_1 = "University of Florida"

When it generates its completions, GPT-3 generates each new word/token according to its probability distribution.  It draws each word at random in proportion to its propability.  That randomness means that it can generate different completions. You can re-generate and get different completions each time.

Generate another 4 completions for the professor prompt:

> One of my favorite professors at the University of Pennsylvania is

and do Google searches for them.

*Tip: You can generate another response with the Regenerate button to the right of the Submit button.  The Regenerate button has a recycle symbol on it.*

In [None]:
favorite_professor_completion_2 = """
My favorite professor at the University of Pennsylvania is Chancellorxs1. He is alwaysorganized and on top of his material. He is always willing to help out students and is always willing to engage in discussion.

Another great professor at the University of Pennsylvania is Dr. Carey Russ. He is always able to engage students in interesting discussions and is always willing to help out students.
"""

favorite_professor_completion_3 = """
My favorite professor at the University of Pennsylvania is Edward Said. I absolutely adore his lectures and readings, and can always learn something new from him. He has a very unique perspective on literature and culture, and I really enjoy hearing about his experiences as a foreign student and critic.
"""

favorite_professor_completion_4 = """
My favorite professor at the University of Pennsylvania is R. Stephen Montgomery. He is an incredible teacher, and his insights into literature and history are second to none. He has a great sense of humor, and he makes his classes both engaging and informative. I would highly recommend him to anyone looking for a gifted teacher who can help them develop their creative writing skills.
"""

favorite_professor_completion_5 = """
My favorite professor at the University of Pennsylvania is Allen Buchanan. I have had the opportunity to work with him in a coursework setting and in a research setting. I appreciate his willingness to share his knowledge, his expertise, and his experiences with his students. He is a excellent teacher and a exemplary scholar.

I have had a wonderful experience studying under Professor Buchanan. He is an outstanding educator who is passionately committed to providing his students with excellent instruction. He is always willing to work collaboratively with his students, and he is able to mobilize a wide range of resources to support their learning. He is also a very well- researched scholar, and his knowledge of the literature is constantly expanding. I have enjoyed participating in his research projects, and I have learned a great deal from his insights and experience. I would highly recommend Professor Buchanan as an instructor and a researcher.My favorite professor at the University of Pennsylvania is Allen Buchanan. I have had the opportunity to work with him in a coursework setting and in a research setting. I appreciate his willingness to share his knowledge, his expertise, and his experiences with his students. He is a excellent teacher and a exemplary scholar.

I have had a wonderful experience studying under Professor Buchanan. He is an outstanding educator who is passionately committed to providing his students with excellent instruction. He is always willing to work collaboratively with his students, and he is able to mobilize a wide range of resources to support their learning. He is also a very well- researched scholar, and his knowledge of the literature is constantly expanding. I have enjoyed participating in his research projects, and I have learned a great deal from his insights and experience. I would highly recommend Professor Buchanan as an instructor and a researcher.
"""

# Do a Google search for these professors

professor_name_2 = "Chancellorxs1"
actually_a_professor_2 = False
instituion_2 = "Unclear"

professor_name_3 = "Edward Said"
actually_a_professor_3 = False
instituion_3 = "Columbia University"

professor_name_4 = "R. Stephen Montgomery"
actually_a_professor_4 = False
instituion_4 = "University of Bristol"

professor_name_5 = "Allen Buchanan"
actually_a_professor_5 = False
instituion_5 = "University of Arizona"

## Probabilities

Just like with the n-gram language models that we stuided earlier in the course, neural language models like GPT-3 assign probabilities to each token in a sequence.  

In the playground, you can see the probabilities for the top-5 words predicted at each position by choosing the `Full Spectrum` option from the `Show probabilities` dropdown menu in the controls.  Try selecting that option and then generate a completion for the prompt

> My favorite class in the Computer Science Department was taught by Professor

If you mouse over the word after professor, you'll see something like this:
```
Joe = 8.21%
John = 4.25%
Nancy = 2.27%
David = 2.09%
Barbara = 2.05%
Total: -2.50 logprob on 1 tokens
(18.87% probability covered in top 5 logits
```

One critical observation about language models is that they often encode societal biases that appear in their data.  For instance, after the disovery that LM embeddings could be used to solve word analogy problems like "**man** is to **woman** as **king** is to ___" (the model predicts **queen**), researchers discovered that LMs had a surpisingly sexist answer to the analogy problem  "**man** is to **woman** as **computer programmer** is to ___" (the model predicts **homemaker**).  These kinds of biases are prevelant and pernicious.

Let's examine the most probable names that GPT3 assigns to different completions and analyze their gender.  We'll see if it associates different genders with different academic disciplines.  (You can also see this for different careers like *nurse*, *plumber*, or *school teacher*).

Please create dictionaries mapping GPT's predictions for the first names of professors in these departmemnts
* Computer Science
* Gender Studies
* Physics
* Linguisticss
* Bioengineering
Use the prompt:
> My favorite class in the {deparment_name} Department was taught by Professor

**Note: you can also add a stop sequence of `.` to get the model to complete only a single sentence.**



In [None]:

# Classify each name as male, female, partial word, or unknown
computer_science_genders = {
  "David" : "male",
  "John" : "male",
  "Michael" : "male",
  "S" : "partial word",
  "R" : "partial word",
  "Gerald" : "male"
}

gender_studies_genders = {
  "K" : "partial word",
  "Lisa" : "female",
  "L" : "partial word",
  "Laura" : "female",
  "Judith" : "female",
  "Carolina" : "female"
}

physics_genders = {
  "John" : "male",
  "David" : "male",
  "Emer" : "female",
  "K" : "partial word",
  "P" : "partial word",
  "Mark" : "male"
}

lingusitics_genders = {
  "John" : "male",
  "Barbara" : "female",
  "Debra" : "female",
  "K" : "partial word",
  "James" : "male",
  "Susan" : "female"
}

bioengineering_genders = {
  "John" : "male",
  "David" : "male",
  "Jian" : "male",
  "K" : "partial word",
  "S" : "partial word",
  "Reid" : "male"
}

(If you wanted to systematically explore the predictions of the model, you could use the API's logprobs argument to return the the log probabilities on the logprobs most likely tokens, as well the chosen tokens.)

# Few Shot Learning

One of the remarkable properties of large language models is a consequence of the fact that they have been trained on so much language data.  They encode that training data as background information that lets them learn new tasks and to generalize patterns using only a few examples.  This is called "Few shot learning".

Here is an example.  Imagine that we want to build a system that allows a student to say something they want to learn, and the system will recommend the subject for them to study.  Here are examples of inputs and outputs to our program:

```
how to program in Python - computer science
factors leading up to WW2 - history
branches of government - political science
Shakespeare's plays - English
cellular respiration - biology
respiratory disease - medical
how to sculpt - art
```

We can use these 7 examples (and probably fewer!) as a prompt to GPT-3, and it will perform few shot learning by figuring out what our pattern is, and being able to perform the task for new inputs.

Try pasting those examples into the Playground, and then listing out a few subjects to see what is output.

```
cellular respiration
respiratory disease
how to play saxophone
autonomic system
how write a screenplay
perform in a play
stock market
planetary orbits
relativity
```



Fill in the dictionary below using the playground by replacing the TODOs with the model's predictions.

In [None]:
few_shot_subject_classification_results = {
  "cellular respiration" : "biology",
  "respiratory disease" : "medical",
  "how to play saxophone" : "music",
  "autonomic system" : "overview",
  "how write a screenplay" : "physiology",
  "perform in a play" : "theater",
  "stock market" : "finance",
  "planetary orbits" : "astronomy",
  "relativity" : "theory - physics",
}

## Using the API

Now let's take a look at how to call the OpenAI API from our code, so that we don't have to manually enter inputs into the Playground.  

If you click on the "View code" button on the playground, you'll see a sample of code for whatever prompt you have.  For example, here's the code that we have for our few-shot learning that generates a subject to study for a topic that someone is interested in:

```python
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-002",
  prompt="how to program in Python - computer science\nfactors leading up to WW2 - history\nbranches of government - political science\nShakespeare's plays - English\ncellular respiration - biology\nrespiratory disease - medical\nhow to sculpt - art",
  temperature=0.7,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
```
This is python code, so it'll be pretty easy for us to use this as a starting point and to modify it to create a function that we can call.


First, you'll need install the OpenAPI via pip.  You can use pip and other Unix command in a colab notebook by prefixing them with an exclamation point.  (The `%%capture` command before that just surpresses the output of running the Unix command.  You can remove it if you want to see the progress of the command).


In [1]:
%%capture
!pip install openai==0.28

Next, you will enter your secret key for the OpenAI API, then you can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).  

We will enter it as a password, so that the raw text of it doesn't get saved in your Python notebook and you accidentally make your notebook public.  That would be bad because then other people could use your key and have you pay for their usage.

In [2]:
from getpass import getpass
import openai
import os

print('Enter OpenAI API key:')
openai.api_key = getpass()

os.environ['OPENAI_API_KEY']=openai.api_key

Enter OpenAI API key:
··········


Now let's write a function that takes a topic as input and then outputs a subject to study if you want to learn about that topic.

In [3]:
import openai
import os
import time

def generate_subject_few_shot(topic):
  few_shot_prompt = """how to program in Python - computer science
factors leading up to WW2 - history
branches of government - political science
Shakespeare's plays - English
cellular respiration - biology
respiratory disease - medical
how to sculpt - art
"""

  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=few_shot_prompt + topic + " - ", # We'll append our topic and a dash to the end of the few shot prompt.
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )
  # I recommend putting a short wait after each call,
  # since the rate limit for the platform is 60 requests/min.
  # (This increases to 3000 requests/min after you've been using the platform for 2 days).
  time.sleep(1)

  # the response from OpenAI's API is a JSON object that contains
  # the completion to your prompt plus some other information.  Here's how to access
  # just the text of the completion.
  return response['choices'][0]['text'].strip()

topic = "cellular respiration"
generate_subject_few_shot(topic)

'biology'

That's it!  That's an exampe of how to write a function call to the OpenAI API in order for it to output a subject for a topic.

Here is some information about the different arguments that we to the `openai.Completion.create` call:
 * `model` – OpenAI offers four different sized versionf of the GPT-3 model: davinci, currie, babbage and ada.  Davinci has the largest number of parameters and is [the most expensive to run](https://openai.com/api/pricing/).  Ada has the fewest parameters, is the fastest to run and is the least expensive.
 * `prompt` - this is the prompt that the model will generate a completion for
 * `temperature` - controls how much of the probability distribution the model will use when it is generating each token. 1.0 means that it samples from the complete probability distrubiton, 0.7 means that it drops the bottom 30% of the least likely tokens when it is sampling. 0.0 means that it will perform deterministically and always output the single most probable token for each context.
 * `top_p` - is an alternative way of controling the sampling.
 * `frequency_penalty` and `presence_penalty` are two ways of reduing the model from repeating the same words in one output.  You can set these to be >0 if you're seeing a lot of repetition in your output.
 * `max_tokens` is the maximum length in tokens that will be output by calling the function.  A token is a subword unit.  There are roughly 2 or 3 tokens per word on average.
 * `stop` is a list of stop sequences.  The model will stop generating output once it generates one of these strings, even if it hasn't reached the max token length. By default this is set to a special token `<|endoftext|>`.

You can read more about [the Completion API call in the documentation](https://beta.openai.com/docs/api-reference/completions).

# Zero shot learning

In addition to few shot learning, GPT-3 can sometimes also perform "zero shot learning" where instead of giving it several examples of what we want it to do, we can instead give it instructions of what we want it to do.

For example, for our topic - subject task we could give GPT-3 the prompt

> Given a topic, output the subject that a student should study if they want to know more about that topic.

Then if we append
> cellular respiration -

GPT3 will output biology.

Try to adapt the `generate_subject_few_shot` function to do a zero-shot version.

In [None]:
def generate_subject_zero_shot(prompt, link):
  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=prompt + " " + link, # We'll append our topic and a dash to the end of the few shot prompt.
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
  )

  time.sleep(1)

  # the response from OpenAI's API is a JSON object that contains
  # the completion to your prompt plus some other information.  Here's how to access
  # just the text of the completion.
  return response['choices'][0]['text'].strip()

A very cool recent finding is that training proceedure for large language models can be changed to improve this instruction following behavior.  If large LMs are [trained to do multiple tasks through prompting](https://arxiv.org/abs/2110.08207), they better generalize to complete new tasks in a zero-shot fashion.  The current version of GPT3 (text-davinci-2) uses this kind of training.

Try writing zero-shot prompts to do the following tasks:
1. Summarize a Wikipedia article.
2. Answer questions about an article.
3. Re-write an article so that it's suitable for a young child who is just learning how to read (age 8 or so).
4. Translate an article from Russian into English.

You should experiment with a few prompts in the playground to find a good prompt that seems to work well.

In [None]:
def summarize(article_text):
  # TODO - write this function
  prompt = "Summarize this article:"
  summary = generate_subject_zero_shot(prompt, article_text)
  return summary

def answer_question(article_text, question):
  # TODO - write this function
  answer = generate_subject_zero_shot(article_text, question)
  return answer

def simplify(article_text):
  # TODO - write a function to re-write an article so that it's suitable for a young child.
  prompt = "Rewrite this article so it would be suitable for an eight-year-old child. "
  simplified_article = generate_subject_zero_shot(article_text, prompt)
  return simplified_article

def translate(article_text, source_language, target_language):
    # TODO - write a function to translate an article from a source language to a target language.
  prompt = "Translate this article from " + source_language + " to " + target_language
  simplified_article = generate_subject_zero_shot(article_text, prompt)
  return simplified_article

Show your outputs in your prompts.  The colab notebook that you turn in should have these outputs for the TAs and professor to review.

In [None]:
article_text = """
Silly String (generically known as aerosol string) is a toy of flexible, sometimes brightly colored, plastic string propelled as a stream of liquid from an aerosol can. The solvent in the string quickly evaporates in mid-air, creating a continuous strand. Silly String is often used during weddings, birthday parties, carnivals and other festive occasions, and has also been used by the US military to detect tripwires.

Composition

Blue and pink Silly String
Silly String is made of a mixture of components dispersed throughout a liquid solvent in the product’s aerosol can. These substances include a polymer resin that provides the string’s structure, a plasticizer to tune the physical properties of the string, and a surfactant that promotes foaming of the product. Other ingredients include silicone fluid (to make the strands easier to clean up), flame retardant, and a pigment for color.[1]

A key component in Silly String is its aerosol spray can and the propellant that ejects the product mixture from the can. The product originally used chlorofluorocarbon propellant Freon 12 mixed with Freon 11, both part of a group of compounds that damage the ozone layer. In 1978, the United States banned the use of CFCs like Freon 11 and 12 in aerosol cans. The manufacturers then changed the formulation to use permitted propellants.[2] Aerosol propellants are liquids with very low boiling points. When under pressure inside the can, the propellant is in liquid form, but when the nozzle is opened, it rapidly escapes – along with the compounds mixed in it – and evaporates as it enters the air. The string takes shape as the propellant evaporates.

The product forms a string that holds itself together while remaining slightly sticky to the touch. This allows the product to weakly adhere to people and windows, for instance, but easily be cleaned up without the string falling apart or staining inert surfaces.[1]

The current formulation is not published, but one of the primary recipes in the original patent calls for 12.2% of the synthetic resin poly(isobutyl methacrylate) by weight. It additionally calls for 0.5% of the selected plasticizer, dibutyl phthalate, 2.5% of sorbitan trioleate surfactant, 0.35% silicon fluid such as dimethyl siloxane or methyl phenyl siloxane, 5.6% of flame retardant hexabromobenzene, and 2–3% pigment (all percentages by weight). The aerosol propellant represents the bulk of the product. Solubility of the resin and other materials in the product is enhanced by addition of another solvent, originally Freon 11, in 6.6% by weight.[1]

History

This section needs additional citations for verification. Please help improve this article by adding citations to reliable sources in this section. Unsourced material may be challenged and removed. (December 2018) (Learn how and when to remove this template message)

Halloween revelers spray each other with Silly String
The invention of the original silly string was accidental. In 1972, A United States Patent was issued to Leonard A. Fish, an inventor, and Robert P. Cox, a chemist, for a "foamable resinous composition." The partners initially wanted to create a can of aerosol that one would be able to spray on a broken/sprained leg or arm and use as an instant cast. Their invention worked, but the two had to test 500 different types of nozzles. After testing about 30 or 40, Fish came upon one that produced a nice string, which shot about 30 feet across the room. This incident inspired Fish to turn the product into a toy. After altering the formula to be less sticky and adding colors, the pair decided to market their product. Because neither of them knew how to sell toys, they made an appointment with Wham-O. Fish described how, during that meeting, he sprayed the can all over the person he was meeting with and all over his office. This person became very upset and asked him to leave the premises. One day later, Fish received a telegram asking him to send 24 cans of "Squibbly" for a market test immediately, signed by the same individual who had kicked him out. He called them back and explained that, after he had finished cleaning up his office, the two owners of Wham-O had come back to talk to him, and one had noticed a piece of the string on a lamp shade he had overlooked while cleaning up. He explained where the string came from and the owners quickly asked him to send samples over for a market test. Two weeks later, Wham-O signed a contract with Fish and Cox to license the product now known as Silly String.

Silly String was licensed to and produced by Wham-O, in a range of colors including blue, red and green, until the Car-Freshner Corporation, the maker of Little Trees, acquired the Silly String trademark in 1997. Silly String Products, a division of Car-Freshner Corporation, manufactures Silly String in the United States and distributes Silly String in North America. The U.S. Patent #3705669 includes a description of preferred implementations.[3] Similar toys are Goofy String, Streamer String, Wacky String and Nickelodeon Smatter.

Safety
In December 2006, Tween Brands Inc., a retailer of girls' clothing and accessories in the United States, was fined $109,800 by the United States Environmental Protection Agency for "allegedly distributing canned confetti string damaging to the ozone". EPA said that the product marketed under various names by the retailer damages the stratospheric ozone layer. The production and use of chemicals harmful to that layer is controlled by U.S. federal law.[4]

Military use
Silly String and similar products have been used by the military to detect tripwires for explosive booby traps. The string is sprayed in the air over the area, revealing hidden tripwires by catching on them as it falls. The string is light enough that it does not break the wires and trigger the explosive.[5][6][7]

In 2006, it was being used by U.S. troops in Iraq for tripwire detection.[8][9][10] However, because the material is an aerosol, it could not be shipped privately to Iraq and is not provided by official channels. Thus, 80,000 cans were unintentionally stockpiled in New Jersey.[11] In October 2007, a shipping company with the required credentials was able to send the silly string overseas.[12]

Bans in the US

Sign in Los Angeles prohibiting the use of Silly String on Halloween night, punishable by a $1000 fine
The use of aerosol string products has been banned in several places for various reasons, including cleanup and removal costs and fears of potential damage to house or vehicle paint.

It has been banned in the city of Ridgewood, New Jersey, and a number of other places, and also at some public gatherings and events.[13] The town board of Huntington on Long Island banned the sale of Silly String within 1,500 feet (460 m) of the route of a parade.[14][15] In 2001, the town of Middleborough, Massachusetts, banned Silly String; offenders face a $300 fine.[16]

In 2004, Los Angeles enacted a city ordinance (LAMC Section 56.02) to ban aerosol string in Hollywood on Halloween night.[17][18]
"""

summarize(article_text)

''

In [None]:
article_text = """
Silly String (generically known as aerosol string) is a toy of flexible, sometimes brightly colored, plastic string propelled as a stream of liquid from an aerosol can. The solvent in the string quickly evaporates in mid-air, creating a continuous strand. Silly String is often used during weddings, birthday parties, carnivals and other festive occasions, and has also been used by the US military to detect tripwires.

Composition

Blue and pink Silly String
Silly String is made of a mixture of components dispersed throughout a liquid solvent in the product’s aerosol can. These substances include a polymer resin that provides the string’s structure, a plasticizer to tune the physical properties of the string, and a surfactant that promotes foaming of the product. Other ingredients include silicone fluid (to make the strands easier to clean up), flame retardant, and a pigment for color.[1]

A key component in Silly String is its aerosol spray can and the propellant that ejects the product mixture from the can. The product originally used chlorofluorocarbon propellant Freon 12 mixed with Freon 11, both part of a group of compounds that damage the ozone layer. In 1978, the United States banned the use of CFCs like Freon 11 and 12 in aerosol cans. The manufacturers then changed the formulation to use permitted propellants.[2] Aerosol propellants are liquids with very low boiling points. When under pressure inside the can, the propellant is in liquid form, but when the nozzle is opened, it rapidly escapes – along with the compounds mixed in it – and evaporates as it enters the air. The string takes shape as the propellant evaporates.

The product forms a string that holds itself together while remaining slightly sticky to the touch. This allows the product to weakly adhere to people and windows, for instance, but easily be cleaned up without the string falling apart or staining inert surfaces.[1]

The current formulation is not published, but one of the primary recipes in the original patent calls for 12.2% of the synthetic resin poly(isobutyl methacrylate) by weight. It additionally calls for 0.5% of the selected plasticizer, dibutyl phthalate, 2.5% of sorbitan trioleate surfactant, 0.35% silicon fluid such as dimethyl siloxane or methyl phenyl siloxane, 5.6% of flame retardant hexabromobenzene, and 2–3% pigment (all percentages by weight). The aerosol propellant represents the bulk of the product. Solubility of the resin and other materials in the product is enhanced by addition of another solvent, originally Freon 11, in 6.6% by weight.[1]

History

This section needs additional citations for verification. Please help improve this article by adding citations to reliable sources in this section. Unsourced material may be challenged and removed. (December 2018) (Learn how and when to remove this template message)

Halloween revelers spray each other with Silly String
The invention of the original silly string was accidental. In 1972, A United States Patent was issued to Leonard A. Fish, an inventor, and Robert P. Cox, a chemist, for a "foamable resinous composition." The partners initially wanted to create a can of aerosol that one would be able to spray on a broken/sprained leg or arm and use as an instant cast. Their invention worked, but the two had to test 500 different types of nozzles. After testing about 30 or 40, Fish came upon one that produced a nice string, which shot about 30 feet across the room. This incident inspired Fish to turn the product into a toy. After altering the formula to be less sticky and adding colors, the pair decided to market their product. Because neither of them knew how to sell toys, they made an appointment with Wham-O. Fish described how, during that meeting, he sprayed the can all over the person he was meeting with and all over his office. This person became very upset and asked him to leave the premises. One day later, Fish received a telegram asking him to send 24 cans of "Squibbly" for a market test immediately, signed by the same individual who had kicked him out. He called them back and explained that, after he had finished cleaning up his office, the two owners of Wham-O had come back to talk to him, and one had noticed a piece of the string on a lamp shade he had overlooked while cleaning up. He explained where the string came from and the owners quickly asked him to send samples over for a market test. Two weeks later, Wham-O signed a contract with Fish and Cox to license the product now known as Silly String.

Silly String was licensed to and produced by Wham-O, in a range of colors including blue, red and green, until the Car-Freshner Corporation, the maker of Little Trees, acquired the Silly String trademark in 1997. Silly String Products, a division of Car-Freshner Corporation, manufactures Silly String in the United States and distributes Silly String in North America. The U.S. Patent #3705669 includes a description of preferred implementations.[3] Similar toys are Goofy String, Streamer String, Wacky String and Nickelodeon Smatter.

Safety
In December 2006, Tween Brands Inc., a retailer of girls' clothing and accessories in the United States, was fined $109,800 by the United States Environmental Protection Agency for "allegedly distributing canned confetti string damaging to the ozone". EPA said that the product marketed under various names by the retailer damages the stratospheric ozone layer. The production and use of chemicals harmful to that layer is controlled by U.S. federal law.[4]

Military use
Silly String and similar products have been used by the military to detect tripwires for explosive booby traps. The string is sprayed in the air over the area, revealing hidden tripwires by catching on them as it falls. The string is light enough that it does not break the wires and trigger the explosive.[5][6][7]

In 2006, it was being used by U.S. troops in Iraq for tripwire detection.[8][9][10] However, because the material is an aerosol, it could not be shipped privately to Iraq and is not provided by official channels. Thus, 80,000 cans were unintentionally stockpiled in New Jersey.[11] In October 2007, a shipping company with the required credentials was able to send the silly string overseas.[12]

Bans in the US

Sign in Los Angeles prohibiting the use of Silly String on Halloween night, punishable by a $1000 fine
The use of aerosol string products has been banned in several places for various reasons, including cleanup and removal costs and fears of potential damage to house or vehicle paint.

It has been banned in the city of Ridgewood, New Jersey, and a number of other places, and also at some public gatherings and events.[13] The town board of Huntington on Long Island banned the sale of Silly String within 1,500 feet (460 m) of the route of a parade.[14][15] In 2001, the town of Middleborough, Massachusetts, banned Silly String; offenders face a $300 fine.[16]

In 2004, Los Angeles enacted a city ordinance (LAMC Section 56.02) to ban aerosol string in Hollywood on Halloween night.[17][18]
"""
questions = [
    "Who invented silly string?",
    "What is silly string's connection to the U.S. military?",
    "What is one controversy surrounding silly string?",
    "How was silly string invented?",
    "What year was silly string invented?",
]

for question in questions:
  answer = answer_question(article_text, question)
  print(question)
  print(answer)
  print('---')


TODO - add questinon 1

---
TODO - add questinon 2

---
TODO - add questinon 3

---
TODO - add questinon 4

---
TODO - add questinon 5

---


In [None]:
article_text = """
Silly String (generically known as aerosol string) is a toy of flexible, sometimes brightly colored, plastic string propelled as a stream of liquid from an aerosol can. The solvent in the string quickly evaporates in mid-air, creating a continuous strand. Silly String is often used during weddings, birthday parties, carnivals and other festive occasions, and has also been used by the US military to detect tripwires.

Composition

Blue and pink Silly String
Silly String is made of a mixture of components dispersed throughout a liquid solvent in the product’s aerosol can. These substances include a polymer resin that provides the string’s structure, a plasticizer to tune the physical properties of the string, and a surfactant that promotes foaming of the product. Other ingredients include silicone fluid (to make the strands easier to clean up), flame retardant, and a pigment for color.[1]

A key component in Silly String is its aerosol spray can and the propellant that ejects the product mixture from the can. The product originally used chlorofluorocarbon propellant Freon 12 mixed with Freon 11, both part of a group of compounds that damage the ozone layer. In 1978, the United States banned the use of CFCs like Freon 11 and 12 in aerosol cans. The manufacturers then changed the formulation to use permitted propellants.[2] Aerosol propellants are liquids with very low boiling points. When under pressure inside the can, the propellant is in liquid form, but when the nozzle is opened, it rapidly escapes – along with the compounds mixed in it – and evaporates as it enters the air. The string takes shape as the propellant evaporates.

The product forms a string that holds itself together while remaining slightly sticky to the touch. This allows the product to weakly adhere to people and windows, for instance, but easily be cleaned up without the string falling apart or staining inert surfaces.[1]

The current formulation is not published, but one of the primary recipes in the original patent calls for 12.2% of the synthetic resin poly(isobutyl methacrylate) by weight. It additionally calls for 0.5% of the selected plasticizer, dibutyl phthalate, 2.5% of sorbitan trioleate surfactant, 0.35% silicon fluid such as dimethyl siloxane or methyl phenyl siloxane, 5.6% of flame retardant hexabromobenzene, and 2–3% pigment (all percentages by weight). The aerosol propellant represents the bulk of the product. Solubility of the resin and other materials in the product is enhanced by addition of another solvent, originally Freon 11, in 6.6% by weight.[1]

History

This section needs additional citations for verification. Please help improve this article by adding citations to reliable sources in this section. Unsourced material may be challenged and removed. (December 2018) (Learn how and when to remove this template message)

Halloween revelers spray each other with Silly String
The invention of the original silly string was accidental. In 1972, A United States Patent was issued to Leonard A. Fish, an inventor, and Robert P. Cox, a chemist, for a "foamable resinous composition." The partners initially wanted to create a can of aerosol that one would be able to spray on a broken/sprained leg or arm and use as an instant cast. Their invention worked, but the two had to test 500 different types of nozzles. After testing about 30 or 40, Fish came upon one that produced a nice string, which shot about 30 feet across the room. This incident inspired Fish to turn the product into a toy. After altering the formula to be less sticky and adding colors, the pair decided to market their product. Because neither of them knew how to sell toys, they made an appointment with Wham-O. Fish described how, during that meeting, he sprayed the can all over the person he was meeting with and all over his office. This person became very upset and asked him to leave the premises. One day later, Fish received a telegram asking him to send 24 cans of "Squibbly" for a market test immediately, signed by the same individual who had kicked him out. He called them back and explained that, after he had finished cleaning up his office, the two owners of Wham-O had come back to talk to him, and one had noticed a piece of the string on a lamp shade he had overlooked while cleaning up. He explained where the string came from and the owners quickly asked him to send samples over for a market test. Two weeks later, Wham-O signed a contract with Fish and Cox to license the product now known as Silly String.

Silly String was licensed to and produced by Wham-O, in a range of colors including blue, red and green, until the Car-Freshner Corporation, the maker of Little Trees, acquired the Silly String trademark in 1997. Silly String Products, a division of Car-Freshner Corporation, manufactures Silly String in the United States and distributes Silly String in North America. The U.S. Patent #3705669 includes a description of preferred implementations.[3] Similar toys are Goofy String, Streamer String, Wacky String and Nickelodeon Smatter.

Safety
In December 2006, Tween Brands Inc., a retailer of girls' clothing and accessories in the United States, was fined $109,800 by the United States Environmental Protection Agency for "allegedly distributing canned confetti string damaging to the ozone". EPA said that the product marketed under various names by the retailer damages the stratospheric ozone layer. The production and use of chemicals harmful to that layer is controlled by U.S. federal law.[4]

Military use
Silly String and similar products have been used by the military to detect tripwires for explosive booby traps. The string is sprayed in the air over the area, revealing hidden tripwires by catching on them as it falls. The string is light enough that it does not break the wires and trigger the explosive.[5][6][7]

In 2006, it was being used by U.S. troops in Iraq for tripwire detection.[8][9][10] However, because the material is an aerosol, it could not be shipped privately to Iraq and is not provided by official channels. Thus, 80,000 cans were unintentionally stockpiled in New Jersey.[11] In October 2007, a shipping company with the required credentials was able to send the silly string overseas.[12]

Bans in the US

Sign in Los Angeles prohibiting the use of Silly String on Halloween night, punishable by a $1000 fine
The use of aerosol string products has been banned in several places for various reasons, including cleanup and removal costs and fears of potential damage to house or vehicle paint.

It has been banned in the city of Ridgewood, New Jersey, and a number of other places, and also at some public gatherings and events.[13] The town board of Huntington on Long Island banned the sale of Silly String within 1,500 feet (460 m) of the route of a parade.[14][15] In 2001, the town of Middleborough, Massachusetts, banned Silly String; offenders face a $300 fine.[16]

In 2004, Los Angeles enacted a city ordinance (LAMC Section 56.02) to ban aerosol string in Hollywood on Halloween night.[17][18]
"""

simplify(article_text)

''

In [None]:
russian_article = """
Анто́н Па́влович Че́хов (дореф. Антонъ Павловичъ Чеховъ) (17 (29) января 1860, Таганрог, Екатеринославская губерния (ныне Ростовская область), Российская империя — 2 (15) июля 1904, Баденвайлер, Германская империя[3][4]) — русский писатель, прозаик, драматург, публицист[5], врач, общественный деятель в сфере благотворительности[6][7][8].

Классик мировой литературы. Почётный академик Императорской академии наук по разряду изящной словесности (1900—1902). Один из самых известных драматургов мира. Его произведения переведены более чем на сто языков. Его пьесы, в особенности «Чайка», «Три сестры» и «Вишнёвый сад» на протяжении более ста лет ставятся во многих театрах мира.

За 25 лет творчества — с момента выпуска из гимназии в 1879 году вплоть до кончины в июле 1904 года — Чехов создал более пятисот различных произведений (коротких юмористических рассказов, фельетонов, серьёзных рассказов и повестей, пьес), многие из которых стали классикой мировой литературы. Особенное внимание обратили на себя повести «Степь», «Скучная история», «Дуэль», «Палата № 6» (1892), «Дом с мезонином» (1896), «Душечка», «Попрыгунья», «Рассказ неизвестного человека», «Мужики», «Человек в футляре» (1897-98), «В овраге», «Детвора», «Драма на охоте»; пьесы «Чайка» (1895), «Дядя Ваня» (1899), «Три сестры» (1901—1903), «Вишнёвый сад» (1903—1904).

Помимо литературной и врачебной работы Чехов придавал огромное значение благотворительной деятельности в сфере помощи голодающим[9], детям[6], крестьянам[10], туберкулёзным больным[7], был Уполномоченным Правления Ялтинского благотворительного общества[8], организовывал сборы средств в пользу нуждающихся и регулярно публиковал в газетах тексты, посвящённые положению социально уязвимых групп населения в России[5].
"""

source_language = "Russian"
target_language = "English"
translate(russian_article, source_language, target_language)

''

## TODO - Pick your own task

For this section you should pick some task that you'd like to have GPT3 do.  Add a description and code to your notebook here.  You should:
1. Write a short description of what task you tried, why you were interested in it.
2. Give some code so that we can reproduce what you did via an Open API call.  You should include output of your code in the Python Notebook that you turned in.
3. Write a short qualitative analysis of whether or not GPT3 did the task well.

TODO - your task description

In [None]:
# task description: I wanted ChatGPT to generate a list of gift ideas
# for one of my friends, as her birthday is coming up soon

# code
def generate_gift_ideas(interests, friend_name):
  # TODO - write this function
  prompt = f"Here are some of my friend {friend_name}'s interests:"
  i = 0
  for interest in interests:
    i += 1
    if i < len(interests):
      prompt = prompt + " " + interest + ", "
    else:
      prompt = prompt + " " + interest + ". "
  prompt = prompt + f"Generate a list of gift ideas for this {friend_name}."
  gift_ideas = generate_subject_zero_shot(prompt, "")
  return gift_ideas

alex_interests = ["painting", "neon/alternative fashion", "podcasts", "alternative music"]
generate_gift_ideas(alex_interests, "Alex")

generated_gifts = '''1. A set of high-quality paints and brushes
2. A gift certificate to a neon/alternative fashion store
3. A subscription to a popular painting or art history podcast
4. A gift certificate to an alternative music store or concert tickets to see a favorite band'''

qualitative_analysis = '''
I'd say that this is a fairly decent gift idea list. There's a good level of
association between the interests and inputted and the ideas ChatGPT came up
with, and all of them were feasible gift ideas.
'''

TODO - write a short paragraph giving your qualitative analysis of how well GPT3 did for your task.

# Fine Tuning

In addition to zero-shot and few-shot learning, another way of getting large language models to do your tasks is via a process called "fine tuning".  In fine-tuning the model updates its parameters so that it performs well on many training examples.  The training examples are in the form of input prompts paired with gold standard completions.

Large language models are pre-trained to perform well on general tasks like text completion but not on the specific task that you might be interested in.  The models can be fine tuned to perform you task, starting with the model parameters that are good for the general setting, and then updating them to be good for your task.

We'll walk through how to fine-tune GPT3 for a task.


For this example, we will show you how to fine tune GPT3 to write biographies. From data in the info boxes in Wikipedia pages.  For instance, given this input

```
notable_type: scientist
name: Zulima Aban
gender: female
birth_date: 05 December 1905
birth_place: Valencia, Spain
death_date: 09 August 1983
death_place: Detroit, Michigan, U.S.
death_cause: Pulmonary embolism
occupation: Astronomer
fields: Astrophysics, Computer Science, Computer Graphics, Interface Design, Image Synthesis
known_for: The Search for Planet Nine
hometown: Detroit, Michigan, U.S.
nationality: Venezuelan
citizenship: Spanish, American
alma_mater: University of Valencia (B.Sc.), University of Madrid (Ph.D.)
thesis_title: The Formation of Planets by the Accretion of Small Particles
thesis_year: 1956
doctoral_advisor: Angela Carter
awards: Spanish Academy of Science, Spanish Academy of Engineering, German Aerospace Prize, IEEE Medal of Honor, IEEE John von Neumann Medal, IEEE Jack S. Kilby Signal Processing Medal, United Nations Space Pioneer Award, Wolf Prize in Physics
institutions: Oberlin College, University of Valencia, Instituto de Astrofísica de Andalucía (CSIC), University of Southern California, Space Telescope Science Institute (STScI)
notable_students: Ryan Walls
influences: Immanuel Kant, Albert Einstein, Kurt Gödel, Gottfried Leibniz, Richard Feynman, Werner Heisenberg, William Kingdon Clifford, Sir Arthur Eddington
influenced: Joseph Weinberg
mother: Ana Aban
father: Joaquín Aban
partner: Georgina Abbott
children: Robert, Peter, Sarah
```

The fine-tuned model will generate this output:

> Zulima Aban was a Venezuelan astronomer, who was born on 05 December 1905 in Valencia, Spain to Ana Aban and Joaquín Aban. Her career involved the fields of Astrophysics, Computer Science, Computer Graphics, Interface Design, Image Synthesis. Aban was known for The Search for Planet Nine. Aban went to University of Valencia (B.Sc.), University of Madrid (Ph.D.). Aban's thesis title was The Formation of Planets by the Accretion of Small Particles in 1956. Her doctoral advisor was Angela Carter. Aban received Spanish Academy of Science, Spanish Academy of Engineering, German Aerospace Prize, IEEE Medal of Honor, IEEE John von Neumann Medal, IEEE Jack S. Kilby Signal Processing Medal, United Nations Space Pioneer Award, Wolf Prize in Physics. Aban went to Oberlin College, University of Valencia, Instituto de Astrofísica de Andalucía (CSIC), University of Southern California, Space Telescope Science Institute (STScI). Her notable students were Ryan Walls. Aban was influenced by Immanuel Kant, Albert Einstein, Kurt Gödel, Gottfried Leibniz, Richard Feynman, Werner Heisenberg, William Kingdon Clifford, Sir Arthur Eddington and she infuenced Joseph Weinberg. Aban was married to Georgina Abbott and together had three children, Robert, Peter, Sarah. Aban died on 09 August 1983 in Detroit, Michigan, U.S due to Pulmonary embolism.

The dataset that we will use was created for the paper [SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets](https://www.cis.upenn.edu/~ccb/publications/synthbio.pdf) by Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, and Sebastian Gehrmann. It was published in NeurIPS 2021.  The goal of the paper was to create a curated dataset for training large language models on synthetic data with the goal of avoiding the gender and geographic bias that is naturally present in Wikipedia due to cultural and historic reasons.


## Load the data

In [4]:
!wget https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_train.json

--2023-12-11 14:48:52--  https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_train.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5807118 (5.5M) [text/plain]
Saving to: ‘SynthBio_train.json’


2023-12-11 14:48:52 (56.2 MB/s) - ‘SynthBio_train.json’ saved [5807118/5807118]



In [5]:
# Load a file called 'SynthBio.json' which is a list of json objects.
# Pretty the first 5 json examples, nicely formatted.

import json
import random

def load_wiki_bio_data(filename='SynthBio_train.json', num_bios=100, randomized=True):
  with open(filename) as f:
    synth_bio_data = json.load(f)
  random.shuffle(synth_bio_data)
  bios = []
  for data in synth_bio_data:
    notable_type = data['notable_type']
    attributes = "notable_type: {notable_type} | {other_attributes}".format(
        notable_type = notable_type,
        other_attributes = data['serialized_attrs']
    )
    biography = data['biographies'][0]
    bios.append((attributes.replace(" | ", "\n"), biography))
  return bios[:min(num_bios, len(bios))]

wiki_bios = load_wiki_bio_data()


In [6]:
attributes, bio = wiki_bios[0]
print(attributes)
print('---')
bio


notable_type: spy
name: Besnik Ismaili
gender: male
nationality: Albanian
birth_date: 03 January 1918
birth_place: Tirana, Albania
death_date: 28 May 2001
death_place: Geneva, Switzerland
death_cause: heart attack
serviceyears: 1954-2000
known_for: espionage within Albania, East Germany, Italy, the United States, and Yugoslavia
alma_mater: University of Cambridge
occupation: government agent, then spy
codename: The Ghost
allegiance: Albania
agency: Albanian intelligence services
mother: Adea Ismaili
father: Faik Ismaili
partner: Mariana Ismaili
---


'Besnik Ismaili was Born in Tirana, Albania. Their parents were Adea Ismaili and Faik Ismaili and they were married to Mariana Ismaili. Besnik Ismaili died on 28 May 2001 of a heart attack in Geneva, Switzerland. He attended the University of Cambridge and was known for espionage within Albania, East Germany, Italy, the United States, and Yugoslavia and was active between 1954-2000. They were a government agent, then spy for the Albanian intelligence services. Ismail was known as "The Ghost".'

## Format Data for Fine-Tuning

Below, I show how to format data to fine-tune OpenAI.  The OpenAI API documentation has a [guide to fine-tuning models](https://beta.openai.com/docs/guides/fine-tuning) that you should read.   The basic format of fine-tuning data is a JSONL file (one JSON object per line) with two key-value pairs: `prompt:` and `completion:`.

```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
```

In the code below, I'll extract a prompt that contains the `attributes` variable from the intent dtermination data, and I'll have the completion be the `biography` variable.

In [None]:
import json

def create_wikibio_finetuning_data(wikibios, fine_tuning_filename):
  fine_tuning_data = []

  for attributes, bio in wiki_bios:
    prompt = "{attributes}\n---\n".format(attributes=attributes)
    completion = "Biography: {bio}\n###".format(bio=bio)
    data = {}
    data['prompt'] = prompt
    data['completion'] = completion
    fine_tuning_data.append(data)

  random.shuffle(fine_tuning_data)
  with open(fine_tuning_filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')


fine_tuning_filename='wikibio_finetuning_data.jsonl'
create_wikibio_finetuning_data(wiki_bios, fine_tuning_filename)

Next, we'll perform fine-tuning with this data using OpenAI.

In [7]:
%%capture
!pip install jsonlines
!pip install wandb

Once you've got access to the OpenAI API, you can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).

In [None]:
import os
import openai

from getpass import getpass
print('Enter OpenAI API key:')
openai.api_key = getpass()

os.environ['OPENAI_API_KEY']=openai.api_key

Enter OpenAI API key:
··········


In [None]:
!head '{fine_tuning_filename}'

{"prompt": "notable_type: artist\nname: Ulf Wenger\ngender: male\nnationality: Swiss\nbirth_date: 28 July 1939\nbirth_place: Kerns, Switzerland\ndeath_date: February 9 2007\ndeath_place: Einsiedeln, Switzerland\ndeath_cause: pneumonia\nresting_place: Einsiedeln Abbey\nknown_for: installations, sculptures\nnotable_works: The Angel in the Garden of Eden (1989), The Devil in the Garden of Eden (1988)\nmovement: art na\u00eff\nalma_mater: Hochschule f\u00fcr Gestaltung, Zurich\nawards: Medaillen von der Ville de Gen\u00e8ve\nmother: Elisabeth Wenger\nfather: Karl Wenger\npartner: Marianne Wenger\nchildren: Laurent Wenger\n---\n", "completion": "Biography: Ulf Wenger (28 July 1939 - 9 February 2007) was a Swiss artist known for his installations, sculptures, and drawings. Wenger was born in Kerns, Switzerland and attended the Hochschule f\u00fcr Gestaltung in Zurich. He later married fellow artist Marianne Wenger and had one child. His notable works were The Angel in the Garden of Eden (198

## Run the fine-tuning API

Next, we'll make the fine tuning API call via the command line.  Here the -m argument gives the model.  There are 4 sizes of GPT3 models.  They go in alphabetical order from smallest to largest.
* Ada
* Baddage
* Currie
* Davinci

The models as the model sizes increase, so does their quality and their cost.  Davinci is the highest quality and highest cost model.  I recommend starting by fine-tuning smaller models to debug your code first so that you don't rack up costs.  Once you're sure that your code is working as expected then you can fine-tune a davinci model.


In [None]:
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m curie
#!openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Upload progress:   0% 0.00/130k [00:00<?, ?it/s]Upload progress: 100% 130k/130k [00:00<00:00, 164Mit/s]
Uploaded file from wikibio_finetuning_data.jsonl: file-yRo5rJS4ro3tGKzVEHiRLrVv
Created fine-tune: ft-1qLctROUPbAjyEvCSzdnTRes
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-12-11 05:26:40] Created fine-tune: ft-1qLctROUPbAjyEvCSzdnTRes
[2023-12-11 05:26:45] Fine-tune costs $0.40
[2023-12-11 05:26:45] Fine-tune enqueued. Queue number: 0
[2023-12-11 05:26:46] Fine-tune started



You should copy down the fine-tune numbers which look like this:

```
Created fine-tune: ft-kloUh0jjVc6Jv8p9MfeGHd3s

[2022-08-06 00:43:56] Uploaded model: davinci:ft-ccb-lab-members-2022-08-06-00-57-57
```

If you forget to write it down, you can list your fine-tuned runs and models this way. These model names aren't mneumonic, so it is probably a good idea to make a note on what your model's inputs and outputs are.

In [None]:
!openai api fine_tunes.list

{
  "object": "list",
  "data": [
    {
      "object": "fine-tune",
      "id": "ft-zOzyIq4La0P1UKyXVEduY3oP",
      "hyperparams": {
        "n_epochs": 4,
        "batch_size": 1,
        "prompt_loss_weight": 0.01,
        "learning_rate_multiplier": 0.1
      },
      "organization_id": "org-S2GABm72ZVab4vrzvrnDm53m",
      "model": "curie",
      "training_files": [
        {
          "object": "file",
          "id": "file-1Lo8Uz0m0cr3VjP3W82xqFgx",
          "purpose": "fine-tune",
          "filename": "wikibio_finetuning_data.jsonl",
          "bytes": 123307,
          "created_at": 1702269624,
          "status": "processed",
          "status_details": null
        }
      ],
      "validation_files": [],
      "result_files": [
        {
          "object": "file",
          "id": "file-wFRBS6pzP6Ite6BG4petP33H",
          "purpose": "fine-tune-results",
          "filename": "compiled_results.csv",
          "bytes": 22111,
          "created_at": 1702269789,
          

You can run your fine tuned model in the OpenAI Playground.  After the model is finished finetuning you'll find it in the Engine dropdown menu (you might need to press reload in your browser for your fine-tuned model to appear).

## Call your fine-tuned model from the OpenAI API

Alternately, you can use your fine tuned model via the API by specifying it as the model.  Here's an example:

In [None]:
def generate_bio(attributes, finetuned_model):
  response = openai.Completion.create(
      model=finetuned_model,
      prompt="{attributes}\n---\n".format(attributes=attributes),
      temperature=0.7,
      max_tokens=500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["###"]
      )
  return response['choices'][0]['text'].strip()

# Replace with your model's name
finetuned_model = "curie:ft-upenn-2023-12-11-04-43-08"

In [None]:
attributes = """
notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania
"""

biography = generate_bio(attributes, finetuned_model)
print(attributes)
print('---')
biography


notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania

---


'Biography: Chris Callison-Burch is an American professor at the University of Pennsylvania. He received his BS in Symbolic Systems from Stanford University and PhD in Informatics from the University of Edinburgh. His notable works include Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB). He has also taught AI, Crowdsourcing and NLP. He is enrolled in the most popular course 570 students. He has also worked at the University of Pennsylvania. His occupation is professor. He has also taught at the University of Pennsylvania. He has also worked at the University of Pennsylvania.'

## Analyze your model's output

Sometimes the model will add facts that are not present in the attributes.  For instance, one time it said
> He was a member of the research staff at IBM Research in Yorktown Heights.

which is not correct. Another time it said
> His most popular course was on AI, which had 570 students.

which is correct, but not specified in the attirbutes.

Try running your own fine-tuned model until it produces something that wasn't licensed by the attributes.

Save the good runs and the bad run below.

In [None]:
generations_with_correct_facts = [
   """Biography: Chris Callison-Burch is an American professor at the University
   of Pennsylvania. He is known for his work in artificial intelligence,
   natural language processing and crowdsourcing. He received his BS in
   Symbolic Systems from Stanford University, and his PhD in Informatics
   from the University of Edinburgh. His notable works include Moses:
   Open source toolkit for statistical machine translation, The Paraphrase
   Database (PPDB). He is enrolled in the most popular course 570 students
   and he taught AI, Crowdsourcing and NLP. He is the institution at
   University of Pennsylvania and his main interests are Artificial
   Intelligence, Natural Language Processing.""",
   """Biography: Chris Callison-Burch is an American professor at
   the University of Pennsylvania. He received his BS in Symbolic
   Systems from Stanford University and PhD in Informatics from the
   University of Edinburgh. His notable works include Moses: Open source
   toolkit for statistical machine translation, The Paraphrase Database
   (PPDB). He has also taught AI, Crowdsourcing and NLP. He is enrolled
   in the most popular course 570 students. He has also worked at the
   University of Pennsylvania. His occupation is professor. He has also
   taught at the University of Pennsylvania. He has also worked at the
   University of Pennsylvania.""",
                       ]

generation_with_incorrect_facts_= """
Biography: Chris Callison-Burch was born in California in 1973.
He attended Stanford University and the University of Edinburgh,
where he received a BS in Symbolic Systems, a PhD in Informatics,
and a post-doc. He is a professor at the University of Pennsylvania.
He taught AI, Crowdsourcing and NLP. He is the creator of Moses:
Open source toolkit for statistical machine translation, The Paraphrase
Database (PPDB). His notable works are Moses: Open source toolkit for
statistical machine translation, The Paraphrase Database (PPDB). He is the
member of American nationality. He enrolled in 570 students in most popular
course is AI. Chris Callison-Burch is married to Janet Callison-Burch.
"""

incorrect_facts = [
    """date of birth: 1973 was never specified. His marriage to Janet
    Callison-Burch is never specified in the attributes""",
]

# Fine Tune a New Model

Now that you've seen an example of how to do fine-tuning with the OpenAI API, let's have you write code to fine-tune your own model.

For this model, I'd like you to do the reverse direction of what we just did.  Given a Wikipedia Biograph like this:

> Jill Tracy Jacobs Biden (born June 3, 1951) is an American educator and the current first lady of the United States as the wife of President Joe Biden. She was the second lady of the United States from 2009 to 2017. Since 2009, Biden has been a professor of English at Northern Virginia Community College.

> She has a bachelor's degree in English and a doctoral degree in education from the University of Delaware, as well as master's degrees in education and English from West Chester University and Villanova University. She taught English and reading in high schools for thirteen years and instructed adolescents with emotional disabilities at a psychiatric hospital. From 1993 to 2008, Biden was an English and writing instructor at Delaware Technical & Community College. Biden is thought to be the first wife of a vice president or president to hold a paying job during her husband's tenure.

> Born in Hammonton, New Jersey, she grew up in Willow Grove, Pennsylvania. She married Joe Biden in 1977, becoming stepmother to Beau and Hunter, his two sons from his first marriage. Biden and her husband also have a daughter together, Ashley Biden, born in 1981. She is the founder of the Biden Breast Health Initiative non-profit organization, co-founder of the Book Buddies program, co-founder of the Biden Foundation, is active in Delaware Boots on the Ground, and with Michelle Obama is co-founder of Joining Forces. She has published a memoir and two children's books.

Your model should output something like this:
```
notable_type: First Lady of the United States
name: Jill Biden
gender: female
nationality: American
birth_date: 03 June 1951
birth_place: Hammonton, New Jersey
alma_mater: University of Delaware
occupation: professor of English at Northern Virginia Community College
notable_works: children's books and memoir
main_interests: education, literacy, women's health
partner: Joe Biden
children: Ashley Biden, Beau Biden (stepson), Hunter Biden (stepson)
```


In [9]:
import json

def create_wikibio_parser_finetuning_data(wikibios, fine_tuning_filename):
  fine_tuning_data = []

  for attributes, bio in wiki_bios:
    completion = "{attributes}\n---\n".format(attributes=attributes)
    prompt = "Biography: {bio}\n###".format(bio=bio)
    data = {}
    data['prompt'] = prompt
    data['completion'] = completion
    fine_tuning_data.append(data)

  random.shuffle(fine_tuning_data)
  with open(fine_tuning_filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')

fine_tuning_filename='wikibio_parser_finetuning_data.jsonl'
create_wikibio_parser_finetuning_data(wiki_bios, fine_tuning_filename)

In [10]:
# !openai api fine_tunes.create -t '{fine_tuning_filename}' -m ada
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Upload progress:   0% 0.00/130k [00:00<?, ?it/s]Upload progress: 100% 130k/130k [00:00<00:00, 169Mit/s]
Uploaded file from wikibio_parser_finetuning_data.jsonl: file-43LkiOWDC3SS4yCpZftHdgOS
Created fine-tune: ft-6XcsEFYQf9HpgwxtrOaA3Rge
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-12-11 14:51:21] Created fine-tune: ft-6XcsEFYQf9HpgwxtrOaA3Rge
[2023-12-11 14:51:27] Fine-tune costs $4.01
[2023-12-11 14:51:27] Fine-tune enqueued. Queue number: 0
[2023-12-11 14:51:29] Fine-tune started



In [15]:
!openai api fine_tunes.list

{
  "object": "list",
  "data": [
    {
      "object": "fine-tune",
      "id": "ft-zOzyIq4La0P1UKyXVEduY3oP",
      "hyperparams": {
        "n_epochs": 4,
        "batch_size": 1,
        "prompt_loss_weight": 0.01,
        "learning_rate_multiplier": 0.1
      },
      "organization_id": "org-S2GABm72ZVab4vrzvrnDm53m",
      "model": "curie",
      "training_files": [
        {
          "object": "file",
          "id": "file-1Lo8Uz0m0cr3VjP3W82xqFgx",
          "purpose": "fine-tune",
          "filename": "wikibio_finetuning_data.jsonl",
          "bytes": 123307,
          "created_at": 1702269624,
          "status": "processed",
          "status_details": null
        }
      ],
      "validation_files": [],
      "result_files": [
        {
          "object": "file",
          "id": "file-wFRBS6pzP6Ite6BG4petP33H",
          "purpose": "fine-tune-results",
          "filename": "compiled_results.csv",
          "bytes": 22111,
          "created_at": 1702269789,
          

In [17]:
def parse_bio(biography, finetuned_bio_parser_model):
  # TODO call the API with your fine-tuned model, return a string representing the attributes
  response = openai.Completion.create(
      model=finetuned_bio_parser_model,
      prompt = "Biography: {bio}\n###".format(bio=biography),
      temperature=0.7,
      max_tokens=500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n---\n"]
      )
  return response['choices'][0]['text'].strip()

# Replace with your model's name
finetuned_bio_parser_model="davinci:ft-upenn-2023-12-11-14-57-06"
biography = '''Biography: Jill Tracy Jacobs Biden (born June 3, 1951) is an American educator and the current first lady of the United States as the wife of President Joe Biden. She was the second lady of the United States from 2009 to 2017. Since 2009, Biden has been a professor of English at Northern Virginia Community College.

She has a bachelor's degree in English and a doctoral degree in education from the University of Delaware, as well as master's degrees in education and English from West Chester University and Villanova University. She taught English and reading in high schools for thirteen years and instructed adolescents with emotional disabilities at a psychiatric hospital. From 1993 to 2008, Biden was an English and writing instructor at Delaware Technical & Community College. Biden is thought to be the first wife of a vice president or president to hold a paying job during her husband's tenure.

Born in Hammonton, New Jersey, she grew up in Willow Grove, Pennsylvania. She married Joe Biden in 1977, becoming stepmother to Beau and Hunter, his two sons from his first marriage. Biden and her husband also have a daughter together, Ashley Biden, born in 1981. She is the founder of the Biden Breast Health Initiative non-profit organization, co-founder of the Book Buddies program, co-founder of the Biden Foundation, is active in Delaware Boots on the Ground, and with Michelle Obama is co-founder of Joining Forces. She has published a memoir and two children's books.'''
parse_bio(biography, finetuned_bio_parser_model)

'notable_type: politician\nname: Jill Biden\ngender: female\nbirth_date: 03 June 1951\nbirth_place: Hammonton, New Jersey\nalma_mater: University of Delaware\noccupation: educator\noffice: First Lady of the United States\nparty: Democratic\nposition: First Lady\nterm_start: 20 Jan 2009\nterm_end: 20 Jan 2017\nmother: Jeanette Jacobs\nfather: Joseph Jacobs\npartner: Joe Biden\nchildren: Ashley Biden, Hunter Biden, Beau Biden'

## Test your parser

Next we will test your parser.  This will involve calling your `parse_bio` function about 250 times, so be sure that you've got it properly debugged and working before running this code.

In [18]:
!wget https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_test.json

--2023-12-11 15:00:58--  https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_test.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 665457 (650K) [text/plain]
Saving to: ‘SynthBio_test.json’


2023-12-11 15:00:58 (10.1 MB/s) - ‘SynthBio_test.json’ saved [665457/665457]



In [21]:
import json

def load_wiki_bio_test_set(filename='SynthBio_test.json', max_test_items=256, randomized=True):
  """
  Loads our wikibio test set, and returns a list of tuples
  biographies (text), attributes (dictionaires)
  """
  with open(filename) as f:
    synth_bio_data = json.load(f)
  bios = []
  for data in synth_bio_data:
    notable_type = data['notable_type']
    attributes = data['attrs']
    attributes['notable_type'] = notable_type
    biography = data['biographies'][0]
    bios.append((biography, attributes))
  return bios[:min(max_test_items, len(bios))]


def convert_to_dict(predcited_attributes_txt):
  """
  Converts predicted attributes from text format into a dictionary.
  """
  predicted_attributes = {}
  for line in predcited_attributes_txt.split('\n'):
    attribute, value = line.split(':')
    predicted_attributes[attribute.strip()] = value.strip()
  return predicted_attributes



Helper function for computing precision, recall and f-score.

In [22]:
from collections import Counter

def update_counts(gold_attributes, predicted_attributes, true_positives, false_positives, false_negatives, all_attributes):
  # Compute true positives and false negatives
  for attribute in gold_attributes:
    all_attributes[attribute] += 1
    if attribute in predicted_attributes:
      # some attributes have multiple values.
      gold_values = gold_attributes[attribute].split(',')
      for value in gold_values:
        if value.strip() in predicted_attributes[attribute]:
          true_positives[attribute] += 1
        else:
          false_negatives[attribute] += 1
    else:
      false_negatives[attribute] += 1
  # Compute false positives
  for attribute in predicted_attributes:
    if attribute not in gold_attributes:
      all_attributes[attribute] += 1
    if not attribute in gold_values:
      false_positives[attribute] += 1
    else:
      # some attributes have multiple values.
      predicted_values = predicted_attributes[attribute].split(',')
      for value in predicted_values:
        if value.strip() not in gold_values[attribute]:
          false_positives[attribute] += 1



In [23]:

def evaluate_on_test_set(finetuned_bio_parser_model, wiki_bio_test, threshold_count = 5):
  """
  Computer the precision, recall and f-score for each of the attributes
  that appears more than the treshold count
  """
  true_positives = Counter()
  false_positives = Counter()
  false_negatives = Counter()
  all_attributes = Counter()

  for bio, gold_attributes in wiki_bio_test:
    predicted_attributes = convert_to_dict(parse_bio(bio, finetuned_bio_parser_model))
    update_counts(gold_attributes, predicted_attributes, true_positives, false_positives, false_negatives, all_attributes)

  average_precision = 0
  average_recall = 0
  total = 0

  for attribute in all_attributes:
    if all_attributes[attribute] < threshold_count:
      continue
    print(attribute.upper())
    try:
      precision = true_positives[attribute] / (true_positives[attribute] + false_positives[attribute])
    except:
      precision = 0.0
    try:
      recall = true_positives[attribute] / (true_positives[attribute] + false_negatives[attribute])
    except:
      recall = 0.0
    print("precision:", precision)
    print("recall:", recall)
    print("f-score:", (precision+recall)/2)
    print('---')
    average_precision += precision
    average_recall += recall
    total += 1

  print("AVERAGE")
  average_precision = average_precision/total
  average_recall = average_recall/total
  print("precision:", average_precision)
  print("recall:", average_recall)
  print("f-score:", (average_precision+average_recall)/2)
  print('---')


If you would like to evaluate on the full test set, there are 237 test items.  You can set `max_test_items=237`.  Doing so will call your `parse_bio` function about 237 times, so be sure that you've got it properly debugged and working before running this code.

In [24]:
testset_filename='SynthBio_test.json'
max_test_items=10
wiki_bio_test = load_wiki_bio_test_set(testset_filename, max_test_items)
evaluate_on_test_set(finetuned_bio_parser_model, wiki_bio_test, threshold_count = 5)

NAME
precision: 0.4117647058823529
recall: 0.7
f-score: 0.5558823529411765
---
GENDER
precision: 0.5
recall: 1.0
f-score: 0.75
---
NATIONALITY
precision: 0.5
recall: 1.0
f-score: 0.75
---
BIRTH_DATE
precision: 0.47368421052631576
recall: 0.9
f-score: 0.6868421052631579
---
BIRTH_PLACE
precision: 0.6
recall: 0.8823529411764706
f-score: 0.7411764705882353
---
KNOWN_FOR
precision: 0.5
recall: 0.6666666666666666
f-score: 0.5833333333333333
---
ALMA_MATER
precision: 0.5714285714285714
recall: 0.6153846153846154
f-score: 0.5934065934065934
---
AWARDS
precision: 0.45454545454545453
recall: 0.5
f-score: 0.4772727272727273
---
MOTHER
precision: 0.4444444444444444
recall: 0.8
f-score: 0.6222222222222222
---
FATHER
precision: 0.35714285714285715
recall: 0.5555555555555556
f-score: 0.4563492063492064
---
PARTNER
precision: 0.4375
recall: 0.875
f-score: 0.65625
---
CHILDREN
precision: 0.5238095238095238
recall: 0.6470588235294118
f-score: 0.5854341736694678
---
NOTABLE_TYPE
precision: 0.5
recall: 1

How well did your model perform?

In [None]:
# TODO - fill in these values
average_precision = 0.4642715439445316
average_recall = 0.7237080729727788
average_fscore = 0.5939898084586552

# What attributes had the highest F-scorre
best_attributes = {
    "DEATH_PLACE" : 0.8333333333333333,
}

# What attributes had the lowest F-scorre
worst_attributes = {
    "OCCUPATION" : 0.26785714285714285,
}

# What could you do the perform the model's performance?
potential_improvements = """
In the few shot examples, we could include a step-by-step process extracting a
person's occupation from their biography. This is likely because many bios
identify a person with a few different nouns as to the roles they've filled
throughout their lives (i.e. identifying Jill Biden as both an educator and
the First Lady). So we could prep the model by going through each of those roles
manually and being like "is 'educator' an occupation?", "is 'First Lady' an
occupation?", "therefore Jill Biden's occupation is 'educator'". Something like
that.
"""

# Feedback questions

In [None]:
# How many hours did you spend on this assignment? Just an approximation is fine.
num_hours_spent = 6

# What did you think?  This was the first time we tried this assignment
# so you're feedback is valable.
feedback = """
I thought it was interesting. It was a bit confusing trying to get it to work
with Colab though -- I had to spend a lot of trial and error time figuring
out that I couldn't use the latest openai version and then how to reset
my notebook's kernel. I also ran into a weird bug where it wouldn't let me
create an ada model because it said there was already a file with that name?
It was weird.
"""

