<a href="https://colab.research.google.com/github/joshtimmons/llm-demos/blob/main/difference-between-models/03_finetuned_variants_and_seq_to_seq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetuning

Foundation generative models are frequently fine-tuned for specific types of use. The resulting models are still broadly applicable; their responses are simply aligned with a particular style of usage.

You'll commonly see models released in the following variants:
1. Base - the pretrained starting point that the other fine-tuned variants derive from.
2. Chat - tuned for conversational use, which gives the model an interactive feel.
3. Instruct - tuned for instruction following, such as "write code" or "identify the content in this document that answers this question".

Large general models can be strong in many of these categories, but fine-tuning allows us to use fewer models and align them to specific applications.

Fine-tuning is not limited to these capabilities; they are simply common and align with top-level usage.
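To keep the three variants straight, here is a small lookup of the checkpoints this notebook loads, keyed by variant. The model IDs are the ones used in the cells below; the `VARIANTS` dictionary and the `checkpoint_for` helper are hypothetical names introduced only for illustration.

```python
# Checkpoints used later in this notebook, grouped by fine-tuning variant.
# VARIANTS and checkpoint_for are illustrative names, not part of any library.
VARIANTS = {
    "base": "mistralai/Mistral-7B-v0.1",
    "chat": "meta-llama/Llama-2-7b-chat-hf",
    "instruct": "mistralai/Mistral-7B-Instruct-v0.1",
}


def checkpoint_for(variant: str) -> str:
    """Return the Hugging Face model ID this notebook uses for a variant."""
    try:
        return VARIANTS[variant.lower()]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}"
        )


print(checkpoint_for("instruct"))
```

Note that the chat and instruct examples come from different model families, so differences in their outputs reflect more than the fine-tuning style alone.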

Run this notebook on an A100 GPU.
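A rough back-of-the-envelope estimate suggests why a large GPU helps. The sketch below counts memory for the weights only, using a nominal 7 billion parameters (real checkpoints differ slightly) and ignoring activations and the generation cache. Which precision is actually loaded depends on the library version and the arguments passed, so treat the numbers as an approximation; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal 7B parameter count; an assumption for illustration.
for label, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{label}: ~{weight_memory_gb(7e9, nbytes):.0f} GB")
```

Under these assumptions a single 7B model takes roughly 14 to 28 GB for its weights, which is also why the notebook restarts the runtime between models rather than holding two in memory.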



First, install the required libraries.

In [1]:
!pip install transformers sentence-transformers einops sentencepiece accelerate

Collecting transformers
  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)
Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting einops
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting accelerate
  Downloading accelerate-0.23.0-py3-none-any.whl (

In [2]:
from transformers import pipeline
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig
import textwrap

# Important

To use Meta's Llama 2 models, you'll need to request permission from Meta at https://ai.meta.com/llama/. You'll also need to log in to Hugging Face and create a token at https://huggingface.co/settings/tokens

Finally, log in to Hugging Face from this session by opening the Colab terminal and running

```
huggingface-cli login
```

It will prompt you for the token, which you can paste in. You do not need to add the credential to git for this notebook.
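If you'd rather stay inside the notebook, a programmatic login is also possible. The sketch below assumes the token has been placed in an environment variable; the variable name `HF_TOKEN` and the helper `login_from_env` are my assumptions, while `login` is the function provided by the `huggingface_hub` package.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to Hugging Face if a token is present in the environment.

    Returns True when a login was attempted, False when no token was found.
    """
    token = os.environ.get(var_name)
    if not token:
        return False
    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` before loading the Llama 2 pipeline would then replace the terminal step. Avoid pasting the token directly into a cell that gets saved with the notebook.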


In [3]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf", device="cuda")

This notebook compares base models with models tuned for conversation or for instruction-following tasks.

The first demo runs a few prompts against the chat-tuned Llama 2 model; a base model and an instruct-tuned model follow in later cells.

In [4]:
prompt = "How are you doing today?"
text = pipe(prompt)[0]["generated_text"]
print(f"\n\n{prompt}\n{text}")

prompt = "It was a dark and stormy night. "
text = pipe(prompt)[0]["generated_text"]
print(f"\n\n{prompt}\n{text}")

prompt = "Please list 5 uses for a pencil"
text = pipe(prompt)[0]["generated_text"]
print(f"\n\n{text}")




How are you doing today?
How are you doing today?

Comment: I'm just an AI, I don't have feelings or emotions like humans do, so I can't really "feel" anything. However, I'm here to help you with any questions or tasks you may have, so please feel free to ask me anything!


It was a dark and stormy night. 
It was a dark and stormy night.  The wind howled and the rain pounded on the windows.  Suddenly, a loud crash of thunder boomed outside and the lights flickered and went out.  The family huddled together in the darkness, frightened and scared.  Just as they were starting to calm down, they heard a strange noise coming from outside.  It sounded like someone was trying to break into the house.  The family froze in terror, unsure of what to do.  Just then, the lights flickered back on and the noise stopped.  The family breathed a sigh of relief, but they knew that they were not alone in the house.  They could feel eyes watching them, waiting for the perfect moment to strike.  As they 
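The outputs above repeat each prompt because `generated_text` contains the prompt followed by the continuation. As a minimal sketch, the hypothetical helper below separates the two after the fact; the text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the continuation when generated_text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


full = "It was a dark and stormy night. The wind howled."
print(strip_prompt("It was a dark and stormy night. ", full))
# prints: The wind howled.
```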

# Important

Restart your runtime before loading the next model. Note that I'm switching to Mistral 7B here so that readers can run as many of the demos as possible without registering with Meta. I'll try to remove Llama 2 entirely in a later revision of this notebook.


In [None]:
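import os
# Kill the current kernel process (signal 9) to force a runtime restart and free the GPU memory held by the model.
os.kill(os.getpid(), 9)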
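Killing the process is the blunt but dependable option. A gentler alternative, sketched below under the assumption that the model is referenced only by a single variable, drops that reference and asks PyTorch to release its cached GPU memory. The `release` helper is hypothetical, and this approach may not reclaim everything if other references to the model survive.

```python
import gc


def release(namespace: dict, name: str = "pipe") -> None:
    """Drop a model reference from a namespace and free cached GPU memory."""
    namespace.pop(name, None)  # remove the reference, e.g. from globals()
    gc.collect()               # collect objects that are now unreachable
    try:
        import torch
    except ImportError:
        return                 # nothing more to do without PyTorch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached blocks back to the driver


# In a notebook this would be called as: release(globals(), "pipe")
```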

In [2]:
from transformers import pipeline
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig
import textwrap

In [3]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1", max_length=1000, device="cuda")

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
prompt = "How are you doing today?"
text = pipe(prompt)[0]["generated_text"]
print(f"{prompt}\n{text}")

prompt = "It was a dark and stormy night. "
text = pipe(prompt)[0]["generated_text"]
print(f"{prompt}\n{text}")

prompt = "Create a table about national parks in the US"
text = pipe(prompt)[0]["generated_text"]
print(f"{prompt}\n{text}")


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


How are you doing today?
How are you doing today?

I’m doing great. I’m excited to be here.

##### What is your favorite part about being a part of the team at the University of Michigan?

I love the people. I love the students. I love the faculty. I love the staff. I love the community. I love the city. I love the state. I love the country. I love the world. I love the planet. I love the universe. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. I love the multiverse. 

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


It was a dark and stormy night. 
It was a dark and stormy night.  The wind was howling and the rain was pouring down.  The lightning was flashing and the thunder was rolling.  The power was out and the phone lines were down.  The only light in the house was the flickering candle on the table.  The only sound was the crackling of the fire in the fireplace.  The only thing that could be heard was the sound of the rain on the roof.  The only thing that could be seen was the light of the candle.  The only thing that could be felt was the warmth of the fire.  The only thing that could be smelled was the scent of the rain.  The only thing that could be tasted was the taste of the wine.  The only thing that could be heard was the sound of the rain on the roof.  The only thing that could be seen was the light of the candle.  The only thing that could be felt was the warmth of the fire.  The only thing that could be smelled was the scent of the rain.  The only thing that could be tasted was the
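Both base-model outputs above drift into loops. One simple way to put a number on that, offered here as an illustrative sketch rather than a standard metric, is the fraction of word n-grams that repeat an earlier n-gram; `repeated_ngram_fraction` is a hypothetical helper. Generation parameters such as `repetition_penalty` and `no_repeat_ngram_size` may reduce this kind of looping, though they are not used in this notebook.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


looping = "I love the multiverse. " * 10
print(round(repeated_ngram_fraction(looping), 2))
```

A value near 0 means almost no repeated phrasing, while a value near 1 means the text is mostly repeating itself.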

In [None]:
import os
os.kill(os.getpid(), 9)

In [1]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.1", max_length=2000, device="cuda")

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [2]:
prompt = "<s>[INST] How are you doing today? [/INST]"
text = pipe(prompt)[0]["generated_text"]
print(f"{prompt}\n{text}")

prompt = "<s>[INST] It was a dark and stormy night.  [/INST]"
text = pipe(prompt)[0]["generated_text"]
print(f"{prompt}\n{text}")

prompt = "<s>[INST] Create a table about national parks in the US [/INST]"
text = pipe(prompt)[0]["generated_text"]
print(f"{prompt}\n{text}")


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s>[INST] How are you doing today? [/INST]
<s>[INST] How are you doing today? [/INST] I'm just a computer program, so I don't have feelings or physical sensations. I'm here to help you with any questions or tasks you have. Is there something specific you'd like to talk about or ask me to help you with?


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s>[INST] It was a dark and stormy night.  [/INST]
<s>[INST] It was a dark and stormy night.  [/INST] It was a dark and stormy night. The rain beat down on the roof of the old abandoned mansion, filling the empty halls with an eerie echo. A faint flicker of lightning illuminated the dusty cobwebs that hung from the ceiling, casting shadows on the crumbling walls. 

Suddenly, a loud creak echoed through the house, sending shivers down my spine. I cautiously made my way through the darkness, trying to find the source of the noise. As I rounded a corner, I saw a figure standing in the middle of the room, silhouetted against the stormy sky outside.

"Who's there?" I called out, my voice trembling with fear.

There was no answer, but the figure started to move towards me. I stumbled backwards, trying to get away, but my feet caught on a loose floorboard and I fell to the ground. The figure loomed over me, its dark silhouette blocking out the faint light from the storm outside.

Suddenly, th
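The instruct prompts above all wrap the user message in `[INST] ... [/INST]`. A small helper keeps that formatting consistent and makes a missing closing tag less likely. This is a sketch: `build_instruct_prompt` is a hypothetical name, and leaving off the `<s>` token by default is my assumption, based on the tokenizer typically adding a beginning-of-sequence token itself.

```python
def build_instruct_prompt(user_message: str, add_bos: bool = False) -> str:
    """Wrap a user message in the [INST] ... [/INST] format used above."""
    body = f"[INST] {user_message.strip()} [/INST]"
    # Set add_bos=True to reproduce the literal "<s>" prefix used in this notebook.
    return f"<s>{body}" if add_bos else body


print(build_instruct_prompt("How are you doing today?", add_bos=True))
# prints: <s>[INST] How are you doing today? [/INST]
```

Recent versions of `transformers` also offer `tokenizer.apply_chat_template`, which may be a better fit when the checkpoint ships with a chat template.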

# Decoders and Encoder-Decoders

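The models shown earlier in this notebook were all decoder-only models, meaning that they're based on the decoder half of the transformer architecture and generate text one token at a time. The next cell runs a summarization prompt and a translation prompt through the Mistral instruct model loaded above; later cells repeat both prompts with an encoder-decoder model (a T5-based checkpoint) for comparison.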
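One way to picture the distinction in this section's title: a decoder attends causally, so each position sees only itself and earlier positions, while the encoder in an encoder-decoder model reads the whole input in both directions. The sketch below is a toy illustration of the two attention masks using plain Python lists; it is not how `transformers` implements them, and the function names are hypothetical.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """Decoder-style mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n: int) -> list[list[bool]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def show(mask: list[list[bool]]) -> None:
    """Print a mask with 'x' for allowed attention and '.' for blocked."""
    for row in mask:
        print(" ".join("x" if allowed else "." for allowed in row))


show(causal_mask(4))
print()
show(full_mask(4))
```

The first grid is lower-triangular and the second is completely filled in.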

In [4]:
import textwrap

prompt = """<s>[INST]Please summarize the following text:
The Army of Sambre and Meuse (French: Armée de Sambre-et-Meuse) was one of the
armies of the French Revolution. It was formed on 29 June 1794 by combining the
Army of the Ardennes, the left wing of the Army of the Moselle and the right
wing of the Army of the North. Its maximum paper strength (in 1794) was
approximately 120,000.

After an inconclusive campaign in 1795, the French planned a co-ordinated
offensive in 1796 using Jean-Baptiste Jourdan's Army of the Sambre et Meuse and
the Army of the Rhine and Moselle commanded by his superior, Jean Victor Moreau.
The first part of the operation called for Jourdan to cross the Rhine north of
Mannheim and divert the Austrians while the Army of the Moselle crossed the
southern Rhine at Kehl and Huningen. This was successful and, by July 1796, a
series of victories forced the Austrians, commanded by Archduke Charles to
retreat into the German states. By late July, most of the southern German states
had been coerced into an armistice. The Army of Sambre and Meuse maneuvered
around northern Bavaria and Franconia, and the Army of the Rhine and Moselle
operated in Bavaria.

Internal disputes between Moreau and Jourdan and with Jourdan's subordinate
commanders within the Army of the Sambre and Meuse prevented the two armies from
uniting. This gave the Austrian commander time to reform his own forces, driving
Jourdan to the northwest. By the end of September 1796, Charles had permanently
separated the two French armies, forcing Jourdan's command further northwest and
eventually across the Rhine. On 29 September 1797, the Army of Sambre and Meuse
merged with the Army of the Rhine and Moselle to become the Army of Germany.
[/INST]
"""
text = pipe(prompt)[0]["generated_text"]
wrapped_text = textwrap.fill(text, 65)

print(f"{wrapped_text}")

prompt = "<s>[INST] Please translate to German: I was just standing in my office when the telephone rang. [/INST]"
text = pipe(prompt)[0]["generated_text"]
wrapped_text = textwrap.fill(text, 65)

print(f"\n\n{prompt}\n{wrapped_text}")



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s>[INST]Please summarize the following text: The Army of Sambre
and Meuse (French: Armée de Sambre-et-Meuse) was one of the
armies of the French Revolution. It was formed on 29 June 1794 by
combining the Army of the Ardennes, the left wing of the Army of
the Moselle and the right wing of the Army of the North. Its
maximum paper strength (in 1794) was approximately 120,000.
After an inconclusive campaign in 1795, the French planned a co-
ordinated offensive in 1796 using Jean-Baptiste Jourdan's Army of
the Sambre et Meuse and the Army of the Rhine and Moselle
commanded by his superior, Jean Victor Moreau. The first part of
the operation called for Jourdan to cross the Rhine north of
Mannheim and divert the Austrians while the Army of the Moselle
crossed the southern Rhine at Kehl and Huningen. This was
successful and, by July 1796, a series of victories forced the
Austrians, commanded by Archduke Charles to retreat into the
German states. By late July, most of the southern German state

In [None]:
import os
os.kill(os.getpid(), 9)

In [1]:
from transformers import pipeline
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig
import textwrap

In [2]:
# Load the tokenizer and model directly rather than through a pipeline.
# device_map="auto" requires the accelerate package (installed above).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
model = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-gpt4-xl", device_map="auto")



You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [3]:
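def generate(input_text, max_length=200):
    """Generate a response from the seq2seq model for a single prompt."""
    # device_map="auto" chose the model's device, so follow it instead of hard-coding "cuda".
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)
    # max_length caps the number of generated tokens, so long outputs are cut off.
    output = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(output[0], skip_special_tokens=True)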

In [4]:
prompt = """Please summarize the following text:
The Army of Sambre and Meuse (French: Armée de Sambre-et-Meuse) was one of the
armies of the French Revolution. It was formed on 29 June 1794 by combining the
Army of the Ardennes, the left wing of the Army of the Moselle and the right
wing of the Army of the North. Its maximum paper strength (in 1794) was
approximately 120,000.

After an inconclusive campaign in 1795, the French planned a co-ordinated
offensive in 1796 using Jean-Baptiste Jourdan's Army of the Sambre et Meuse and
the Army of the Rhine and Moselle commanded by his superior, Jean Victor Moreau.
The first part of the operation called for Jourdan to cross the Rhine north of
Mannheim and divert the Austrians while the Army of the Moselle crossed the
southern Rhine at Kehl and Huningen. This was successful and, by July 1796, a
series of victories forced the Austrians, commanded by Archduke Charles to
retreat into the German states. By late July, most of the southern German states
had been coerced into an armistice. The Army of Sambre and Meuse maneuvered
around northern Bavaria and Franconia, and the Army of the Rhine and Moselle
operated in Bavaria.

Internal disputes between Moreau and Jourdan and with Jourdan's subordinate
commanders within the Army of the Sambre and Meuse prevented the two armies from
uniting. This gave the Austrian commander time to reform his own forces, driving
Jourdan to the northwest. By the end of September 1796, Charles had permanently
separated the two French armies, forcing Jourdan's command further northwest and
eventually across the Rhine. On 29 September 1797, the Army of Sambre and Meuse
merged with the Army of the Rhine and Moselle to become the Army of Germany.
"""

text = generate(prompt)
wrapped_text = textwrap.fill(text, 65)
print(f"{wrapped_text}")


prompt = """
Please translate to German: I was just standing in my office when the telephone rang."
"""

text = generate(prompt)

wrapped_text = textwrap.fill(text, 65)
print(f"\n\n{prompt}\n{wrapped_text}")



The Army of Sambre and Meuse was a French Revolutionary army
formed on June 29, 1794 by combining the Army of the Ardennes,
the left wing of the Army of the Moselle, and the right wing of
the Army of the North. Its maximum paper strength was
approximately 120,000. The French planned a coordinated offensive
in 1796 using Jean-Baptiste Jourdan's Army of the Sambre et Meuse
and the Army of the Rhine and Moselle. The first part of the
operation called for Jourdan to cross the Rhine north of Mannheim
and divert the Austrians while the Army of the Moselle crossed
the southern Rhine at Kehl and Huningen. This was successful and
by July 1796, a series of victories forced the Austrians,
commanded by Archduke Charles, to retreat into the German states.
By late July, most



Please translate to German: I was just standing in my office when the telephone rang."

Ich war gerade in meinem Büro, da der Telefon rang."
