<a href="https://colab.research.google.com/github/nemanovich/LLM-essentials/blob/main/Week1_practice_session.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this week's practice session and homework you will:

- Learn to make API calls to some of the popular LLMs,
- Try to solve some tasks using an LLM API,
- Explore the different parameters for LLM API inference,
- Explore multimodality in LLM APIs.

# Introduction

Text generative models (commonly referred to as LLMs) like ChatGPT or Claude are capable of doing great things, but you can't actually run them on your own servers. So, there are two options:
- Use open source models that are usually somewhat less capable than the top proprietary ones.
- Use the most powerful models by API.

In this part of the course we will concentrate on the second option.

# Prerequisite: working with APIs basics

We'll start by getting acquainted with the `requests` library. It's widely used to call URLs, access APIs, etc. You can jump to the next section, **Tasks you can solve with LLMs**, if you're already comfortable with APIs.

### URL request basics

There are two main types of request you can make to a server:

- `POST` requests ask a web server to accept the data. For example, it can be used to update a database.

- `GET` requests are used to receive information from the server.

In this prerequisite section we will only use `GET` requests, since we only want to call an API and receive a response. (The LLM client libraries used later send `POST` requests for you.)
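To make the difference between the two request types concrete, here is a minimal offline sketch. It uses only the standard library (`urllib`, not `requests`) and builds one `GET` and one `POST` request object without sending anything. The `example.com` URL and the payload are made up for illustration.

```python
import json
from urllib.request import Request

# A GET request carries no body; it just asks for a resource.
get_req = Request("https://example.com/items", method="GET")

# A POST request carries data for the server to accept (here: JSON bytes).
payload = json.dumps({"name": "Rex"}).encode("utf-8")
post_req = Request(
    "https://example.com/items",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_req.get_method(), get_req.data)    # GET None
print(post_req.get_method(), post_req.data)  # POST plus the JSON bytes
```

Nothing is sent over the network here; the objects only describe what would be sent.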

Let's practice using `requests` by calling the API of the `catfact.ninja` web resource, which returns a random fact about cats.

In [1]:
import requests

r = requests.request("GET", "https://catfact.ninja/fact")

It's always good to check the status code. A successful `GET` normally returns `200`; codes in the 4xx and 5xx ranges indicate client and server errors, see https://en.wikipedia.org/wiki/List_of_HTTP_status_codes for reference.
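Status codes are grouped into families by their first digit. The sketch below uses the standard-library `http.HTTPStatus` enum to turn a numeric code into a readable label. The helper name `describe_status` and the family labels are our own, chosen for illustration.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a label such as '200 OK (success)' for an HTTP status code."""
    families = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the standard codes known to Python.
        phrase = "Unknown"
    family = families.get(code // 100, "unknown")
    return f"{code} {phrase} ({family})"

for code in (200, 201, 404, 500):
    print(describe_status(code))
```

In a notebook you could call it as `describe_status(r.status_code)`.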

In [None]:
r.status_code

By accessing the `content` attribute of the response we can see what the server returned:

In [None]:
r.content

APIs usually return information in a structured format like JSON, YAML, ProtoBuf, etc. As we can see, in this case the response is JSON encoded. So let's decode it and see what we got!

In [None]:
import json

json.loads(r.text)

If something goes wrong, the server returns a response with an error message and an error status code. For example, if we send a request to a non-existent address, we'll get:

In [None]:
r = requests.request("GET", "https://catfact.ninja/facta")
r.content, r.status_code

### Working with API

The next two important things we need to learn are:

1. How to authorize ourselves with an API key.

2. How to request specific information from a server and not just a random thing.

Let's try that using [TheDogAPI](https://thedogapi.com/), a simple API created for educational purposes. It is free to try, but requires registration by email. It will provide you with an API key after registration.

When you've registered, create a file `.dog-api-key` in the directory where you're working now and put the API key there.

In [None]:
with open('.dog-api-key') as api_file:
    dog_api_key = api_file.read().strip()

TheDogAPI provides several services. For example, we can get a picture of a dog belonging to a specific breed. Let's analyze the code that is doing it:

In [None]:
url = "https://api.thedogapi.com/v1/images/search?format=json"

params = {"breed_ids": [10]}
headers = {
  'Content-Type': 'application/json',
  'x-api-key': dog_api_key
}

response = requests.request(
    "GET", url, headers=headers, params=params
)

You're already familiar with the `requests` call. Now we also pass:

- `headers` containing metadata such as the type of content and the API key.
- `params` containing whatever parameters the API expects from us. In this case we need to provide the id of a breed (`10` means American Bulldog).
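The `params` dictionary ends up in the query string of the URL. The following sketch approximates that merging step with the standard library so you can see the resulting URL without making a request. `build_url` is a hypothetical helper written for this illustration, not the code `requests` actually runs.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def build_url(base_url: str, params: dict) -> str:
    """Append params to the query string that base_url may already contain."""
    parts = urlsplit(base_url)
    # doseq=True expands list values such as [10] into repeated key=value pairs.
    extra = urlencode(params, doseq=True)
    query = "&".join(q for q in (parts.query, extra) if q)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )

print(build_url(
    "https://api.thedogapi.com/v1/images/search?format=json",
    {"breed_ids": [10]},
))
# https://api.thedogapi.com/v1/images/search?format=json&breed_ids=10
```

Headers, in contrast, are sent separately from the URL, which is one reason API keys are usually passed there.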

If you want to know more about TheDogAPI, feel free to browse the documentation: https://documenter.getpostman.com/view/5578104/2s935hRnak#9e7e4cf9-0e0a-4258-8ace-ed1862843c96

Let's check if everything went OK. As a top-level verification we can check the status code; it should be 200.

In [None]:
response.status_code

If the code is 200, we can check the breed name and see a photo of an American Bulldog

In [None]:
from IPython.display import Image

dog_json = json.loads(response.text)
print("Breed Name: ", dog_json[0]['breeds'][0]['name'])
Image(dog_json[0]['url'], height=256)

Congrats, we've mastered calling APIs and are ready to get on with the real calls.

# Tasks you can solve with LLMs

In this section we will browse through several text-related generative tasks.

Large Language Models (LLMs) can already solve a vast variety of different tasks, and they continue to improve. This illustration is a bit outdated, but it still shows the main evolution paths of LLMs:

<img src="https://github.com/Mooler0410/LLMsPracticalGuide/blob/main/imgs/tree.jpg?raw=true"  width="60%" height="60%">


There are many proprietary API offerings for LLMs. Currently two of the most notable ones are OpenAI's and Anthropic's, according to, for example, this benchmark: https://artificialanalysis.ai/.

There are also models offered by Google, but their authentication system is notoriously complicated and we don't want to scare you too much already.



### OpenAI API basics

The OpenAI API requires registration and, moreover, it is commercial (but hopefully quite affordable, at least at the scale required for our course). So please don't forget to register, acquire the API key and, if you are following along in Colab, put it in your secrets.

With the API key we can just call the API, but OpenAI has a special convenient library:

In [None]:
!pip install openai -q

We need to load the API key into it:

In [3]:
import openai
from google.colab import userdata
openai.api_key = userdata.get('OPENAI_API_KEY')

Now we're ready to harness the power of GPT!

Let's try something simple:

In [4]:
chat_completion = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello world"}]
)
chat_completion

ChatCompletion(id='chatcmpl-BvOQBazN5UNAfRaae3ug2sBJajFJF', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello! How can I assist you today?', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1753018271, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

Let's analyze the API call and the output.

We need to input:

1) **Model name**, you can choose from https://platform.openai.com/docs/models/, just mind the pricing.

2) **Conversation history**. Unlike the web version of ChatGPT, which remembers your conversations, with the API you need to provide the background story yourself. We'll talk more about it a bit later.

The output contains a `message` of the *assistant* (that's how an OpenAI model presents itself) and indicates that the generation ended naturally because the message was complete (`"finish_reason"` is `"stop"`).

The actual response of the model can be obtained as:

In [None]:
model_answer = chat_completion.choices[0].message.content
model_answer

Let's write a single shortcut function that takes a prompt and returns the completion suggested by the model:

In [None]:
# Write a function which for a given text returns ChatGPT response
def get_chatgpt_answer(message: str) -> str:
    pass

In [None]:
def get_chatgpt_answer(message: str) -> str:
    chat_completion = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": message}]
    )
    return chat_completion.choices[0].message.content

In [None]:
get_chatgpt_answer("Hello World!")

The API also provides statistics of token `usage`. It can be important for you because OpenAI bills you based on the number of tokens its models process for you.

Note that tokens are not the same as words; they are subword units of sorts. You will deal with them in more detail in the homework.
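Since billing is per token, a rough cost estimate is simple arithmetic over the `usage` numbers. The sketch below assumes prices are quoted per million tokens, separately for input and output. The prices used here are placeholders, not OpenAI's actual rates, so check the pricing page before relying on any figure.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one call from its token usage."""
    total = (
        prompt_tokens * input_price_per_million
        + completion_tokens * output_price_per_million
    )
    return total / 1_000_000

# Token counts taken from the "Hello world" call above (9 in, 9 out).
# The prices (1.0 and 2.0 per million tokens) are hypothetical.
cost = estimate_cost(9, 9, input_price_per_million=1.0, output_price_per_million=2.0)
print(f"{cost:.8f}")
```

With a real response you would pass `chat_completion.usage.prompt_tokens` and `chat_completion.usage.completion_tokens`.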

You can control the length of the model answer with the `max_tokens` parameter. For example:

In [None]:
chat_completion = openai.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=5,
    messages=[
        {
            "role": "user",
            "content": "Please generate a long sentence"
        }
    ]
)
chat_completion

Please note that `"finish_reason"` is now `"length"` which means that `max_tokens` was reached before the generation stopped naturally.

You can't ask the model to generate a text of arbitrary length. Each model has a restriction on a total number of tokens in **prompt + completion**. For example, it's 4,096 tokens for `gpt-3.5-turbo`. You can find the restrictions for each model at the [OpenAI model reference page](https://platform.openai.com/docs/models/gpt-3-5).
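Because the limit applies to prompt and completion together, the room left for the answer shrinks as the prompt grows. Here is a small sketch of that bookkeeping. `completion_budget` is a hypothetical helper, and the prompt token counts are made-up numbers.

```python
def completion_budget(
    context_limit: int, prompt_tokens: int, requested_max_tokens: int
) -> int:
    """Largest max_tokens value that still fits into the context window."""
    if prompt_tokens >= context_limit:
        raise ValueError("The prompt alone exceeds the context limit")
    return min(requested_max_tokens, context_limit - prompt_tokens)

# With a 4,096-token limit and a 4,000-token prompt only 96 tokens remain.
print(completion_budget(4096, 4000, 500))  # 96
# A short prompt leaves enough room for the full request.
print(completion_budget(4096, 100, 500))   # 500
```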

### Roles and communication history

With the `openai` library you can pass more than a single prompt to the API. The `messages` parameter takes a list of messages with several possible roles, among them:

- `"user"`, that's you.
- `"assistant"`, a model's cue.
- `"system"` used to pass our wishes regarding the assistant's tone of voice, restrictions etc.
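Since the API does not remember earlier turns, you have to keep the history yourself and resend it with every call. A minimal sketch of such bookkeeping follows. The `Conversation` class is our own illustration, and `fake_send` is a stand-in for a real API call so that the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

class Conversation:
    """Keeps the message history and resends it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str], system: str = ""):
        self.send = send
        self.messages: List[Message] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        reply = self.send(self.messages)  # the whole history goes out
        self.messages.append({"role": "assistant", "content": reply})
        return reply

def fake_send(messages: List[Message]) -> str:
    # Stand-in for a model: reports how many messages it received.
    return f"(reply to {len(messages)} messages)"

chat = Conversation(fake_send, system="You are a helpful assistant.")
chat.ask("Hello")
chat.ask("What did I just say?")
print([m["role"] for m in chat.messages])
```

To talk to a real model, `send` could wrap `openai.chat.completions.create` and return `choices[0].message.content`.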

Let's look at a toy example. We can use `"system"` input to make a model only answer in rhymes:

In [None]:
MODEL = "gpt-4o-mini"
response = openai.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "system",
            # Note the trailing space: adjacent string literals are
            # concatenated without any separator.
            "content": "You are a helpful assistant "
                "and you only answer in rhymed sentences."
        },
        {
            "role": "user",
            "content": "I need to write some code but "
                "I'd prefer to go for a stroll."},
    ]
)

print(response.choices[0].message.content)

Another useful way to utilize `system` is to ask for answers in a structured format, for example JSON.

In [None]:
MODEL = "gpt-4o-mini"
response = openai.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "system",
            # Trailing space keeps "assistant" and "and" apart.
            "content": "You are a helpful assistant "
                "and you only answer in json format."
        },
        {
            "role": "user",
            "content": "I need to write some code but "
                "I'd prefer to go for a stroll."},
    ]
)

response.choices[0].message.content

Another way of changing the tone of voice of the model is showing some actual examples. Let's make our AI optimist provide short slogan-ish answers:

In [None]:
MODEL = "gpt-4o-mini"
response = openai.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content":
         "You are a helpful assistant and you are sure that Generative AI can solve any problem."},
        {"role": "user", "content":
         "HR specialists spend too much time writing letters to the candidates."},
        {"role": "assistant", "content":
         "We should create a Generative AI-powered mailing bot to help them!"},
        {"role": "user", "content":
         "I want to submit a paper to NeurIPS, but I don't have time to write it."},
        {"role": "assistant", "content":
         "Just generate something with ChatGPT and become a NeurIPS star!"},
        {"role": "user", "content":
         "My child should draw a picture for tomorrow's art lesson, but doesn't have inspiration for this."},
        {"role": "assistant", "content":
         "Don't worry, you can generate something with a diffusion model!"},
        {"role": "user", "content":
         "I need to write some code but I'd prefer to go for a stroll."},
    ]
)

response.choices[0].message.content

So, we've taught our assistant to exhibit certain behavior without actual training just by showing some examples in a prompt. This is an example of **few-shot learning**.

**Note 1**. Please don't mistake the recommendations of this assistant as actual advice. Applying AI to real-world problems should come in an ethical and safe way. For example, we need to make sure that an HR-mailing bot produces no offensive, biased or incoherent answers before it comes to production, and this can be very tricky.

**Note 2**. Actually we can do the same thing in one `"user"` message, like:

```
You are a helpful assistant and you are sure that Generative AI can solve any problem.

Q: HR specialists spend too much time writing letters to the candidates.
A: We should create a Generative AI-powered mailing bot to help them!
Q: I want to submit a paper to NeurIPS, but I don't have time to write it.
A: Just generate something with ChatGPT and become a NeurIPS star!
Q: My child should draw a picture for tomorrow's art lesson, but doesn't have inspiration for this.
A: Don't worry, you can generate something with a diffusion model!
Q: I need to write some code but I'd prefer to go for a stroll.
A:
```

Moreover, this can actually work better, especially in some older models.
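Both few-shot layouts can be generated from the same list of example pairs. The sketch below shows one way to do it. The helper names `few_shot_messages` and `few_shot_prompt` are ours, and the `Q:`/`A:` labels simply follow the format shown above.

```python
from typing import Dict, List, Tuple

Examples = List[Tuple[str, str]]

def few_shot_messages(system: str, examples: Examples, question: str) -> List[Dict[str, str]]:
    """Chat layout: each example becomes a user/assistant message pair."""
    messages = [{"role": "system", "content": system}]
    for q, a in examples:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return messages

def few_shot_prompt(system: str, examples: Examples, question: str) -> str:
    """Single-message layout: examples are written as Q/A lines."""
    lines = [system, ""]
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)

examples = [
    ("HR specialists spend too much time writing letters to the candidates.",
     "We should create a Generative AI-powered mailing bot to help them!"),
]
system = "You are a helpful assistant and you are sure that Generative AI can solve any problem."
question = "I need to write some code but I'd prefer to go for a stroll."

print(len(few_shot_messages(system, examples, question)))
print(few_shot_prompt(system, examples, question))
```

The first result can be passed as `messages`, the second as the content of a single `"user"` message.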

We'll discuss few-shot learning more next week.

## Anthropic API Basics

In [None]:
!pip install anthropic -q

As you can see in the following code, a request to the Anthropic API is quite similar to OpenAI's, with a few differences.

In [None]:
from anthropic import Anthropic

client = Anthropic(
    api_key=userdata.get("anthropic_key")
)

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Hello, Claude",
        }
    ],
    model="claude-3-5-sonnet-20240620",
)
print(message.content)

Let's write a similar helper function, `get_anthropic_answer`:

In [None]:
client = Anthropic(
    api_key=userdata.get("anthropic_key")
)

def get_anthropic_answer(message: str) -> str:
    answer = client.messages.create(
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": message,
            }
        ],
        model="claude-3-5-sonnet-20240620",
    )
    return answer.content[0].text

In [None]:
get_anthropic_answer("Hello World!")

# Summarization


For this task we will use the OpenAI API, so make sure your key is loaded as shown above (or stored in a `.open-ai-api-key` file if you work locally).

In the practice folder you can find a file `wikipedia_article.txt` with the contents of the Wikipedia article about paws.

You can take a look at it yourself, but if you don't have much time, you can ask an LLM to summarise it for you. To do this, we feed the API the text of the article together with a specific prompt indicating which task we want the model to solve.

In [None]:
def summarise_with_gpt(text: str):
    return get_chatgpt_answer(
        f"Write a short summary of the following text.\n{text}"
    )

def summarise_with_claude(text: str):
    return get_anthropic_answer(
        f"Write a short summary of the following text.\n{text}"
    )

Let's compare the two responses.

In [None]:
article = open("wikipedia_article.txt").read()

summarise_with_gpt(article)

In [None]:
article = open("wikipedia_article.txt").read()

summarise_with_claude(article)

You can check that this is really a coherent summary.

If you modify the prompt, you can add a specific flavor to the summary. For example, you can ask the model to do it in simple English avoiding scientific terminology (try it!).

You can also control the length of the summary using a prompt like: `"Summarize the following text in 2-3 sentences."` Typically, the more precise your prompt is, the more stable the results you get.
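One way to experiment with such prompt variations is to assemble the instruction from optional parts. The following sketch does that. `build_summary_prompt` and its parameters are hypothetical names introduced for this example.

```python
from typing import Optional

def build_summary_prompt(
    text: str,
    sentences: Optional[str] = None,
    style: Optional[str] = None,
) -> str:
    """Build a summarization prompt with optional length and style hints."""
    instruction = "Summarize the following text"
    if sentences:
        instruction += f" in {sentences} sentences"
    if style:
        instruction += f", {style}"
    return f"{instruction}.\n{text}"

print(build_summary_prompt("Paws are ...", sentences="2-3"))
print(build_summary_prompt(
    "Paws are ...",
    sentences="2-3",
    style="using simple English without scientific terminology",
))
```

The resulting string can be passed to `get_chatgpt_answer` or `get_anthropic_answer` in place of the fixed prompt.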

As you can see, the differences between the two models aren't that big. We will focus on using OpenAI's models in the future. But you can choose either for any of the future tasks.

Just be sure not to spend too much on the API calls :)

# Translation

Another task you can solve with ChatGPT is translation. Even though the quality of such translations is not the best, it's still fascinating that one model can do so many things.

In [None]:
# Write a function to translate to another language with the following signature
# You can take summarise_with_gpt as an example

def translate_with_gpt(text: str, target_language: str) -> str:
    pass

In [None]:
def translate_with_gpt(text: str, target_language: str) -> str:
    return get_chatgpt_answer(
        f"Translate the following text to {target_language}:\n"\
        f"{text}"
    )

Now let's try it in action

In [None]:
translate_with_gpt(
    "I am a language model, nice to meet you!",
    target_language="Spanish"
)

In this simple example we can already see that it gives us a different translation compared to a dedicated translation engine. Try pasting it into Google Translate and see the difference.

Now let's try to implement a more complex pipeline with the ChatGPT API. In the practice directory there's a file with an article about paws, but in Japanese.

Let's try to translate it to English using the ChatGPT API and then summarise it and create a title for it.

In [None]:
def create_title_with_chat_gpt(text: str) -> str:
    return get_chatgpt_answer(
        f"Create a title for this text:\n{text}"
    )

def translate_and_summarise_with_chat_gpt(text):
    print("Making a translate request")
    translated_text = translate_with_gpt(
        text,
        target_language="English"
    )
    print('Making summarisation request')
    summarized_text = summarise_with_gpt(translated_text)
    print("making title request")
    title = create_title_with_chat_gpt(translated_text)
    return {
        "title": title,
        "original": text,
        "translated_text": translated_text,
        "summary": summarized_text
    }

In [None]:
from IPython.display import display
result = translate_and_summarise_with_chat_gpt(
    open('wikipedia_article_japanese.txt').read()
)
display(f"Title:\n{result['title']}")
display(result['summary'])

We hope that you're as curious as we are and you also wonder, what would happen if we do the same thing in a different order: summarise first, then translate. Let's try!

In [None]:
def summarise_and_translate_with_chat_gpt(text):
    print('Making summarisation request')
    summarized_text = summarise_with_gpt(text)
    print("making title request")
    title = create_title_with_chat_gpt(text)
    print("Making a translate request")
    translated_text = translate_with_gpt(
        summarized_text,
        target_language="English"
    )
    translated_title = translate_with_gpt(
        title,
        target_language="English"
    )
    return {
        "title": translated_title,
        "original": text,
        'summarized_text': summarized_text,
        "translated_text": translated_text,
    }

In [None]:
result = summarise_and_translate_with_chat_gpt(
    open('wikipedia_article_japanese.txt').read()
)
display(f"Title:\n{result['title']}")
display(result['summarized_text'])
display(result['translated_text'])

As we can see, the result depends on the order of summarization and translation. Moreover, if we translate after summarization, the output is somewhat less coherent.

On top of that, because we wrote the prompt in English, the summarization result is already in English.

**Note**. Even if we don't change our pipeline and use the same request several times, ChatGPT's response might differ. This is due to the fact that we cannot directly control the random state of the model with this API. You can make generation more reproducible using the `temperature` parameter of `openai.chat.completions.create`. It can take values between 0 and 2, with values closer to 0 making outputs more deterministic and those above 0.8 making outputs more random and creative.

If you run an LLM locally (we'll do this in the second part of the course), you can make the responses deterministic by fixing the random states.

# Generation parameters

If you dive into the [API reference](https://platform.openai.com/docs/api-reference/chat) of, for example, OpenAI's text completion models, you can see a lot of different parameters that you can set during generation.
Here are the ones we think are the most useful day-to-day:
- `max_tokens` - the maximum number of tokens to generate. Can be useful if you want to allow an especially long completion, or if you only want to generate one word like "True" or "False",
- `n` - the number of generations you want to receive. Can be useful for checking the consistency of the answer,
- `response_format` - allows you to request a JSON completion from the model,
- `temperature` - a value between 0 and 2, where higher values increase the randomness of the answer. Can be useful, for example, when the answer requires creativity or, vice versa, when you want it to be very consistent,
- `top_p` - an alternative sampling technique; the value you send is the probability mass of tokens to consider. For example, setting it to 0.1 makes the model sample only from the most probable tokens whose probabilities sum up to 0.1.

Let's try some of those parameters in action!

## Temperature

The higher the temperature, the more random the output.
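To build some intuition, here is a toy sketch of the standard temperature-scaled softmax: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits below are invented, and we are assuming the hosted models follow this textbook scheme; the API does not expose its internals. A temperature of exactly 0 is not covered by the formula and is usually treated as "always pick the most probable token".

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures concentrate almost all probability on the top token, while high temperatures spread it more evenly across candidates.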

In [None]:
MODEL = "gpt-4o-mini"
for _ in range(3):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "I need to write some code but "
                    "I'd prefer to go for a stroll."},
        ],
        temperature=0
    )

    print(response.choices[0].message.content)
    print("-" * 100)

In [None]:
MODEL = "gpt-4o-mini"
for _ in range(3):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "I need to write some code but "
                    "I'd prefer to go for a stroll."},
        ],
        temperature=1
    )

    print(response.choices[0].message.content)
    print("-" * 100)

## Top_p
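The `top_p` parameter corresponds to what is commonly called nucleus sampling: keep the most probable tokens until their cumulative probability reaches `top_p`, drop the rest, and renormalize. Here is a toy sketch of that filtering step. The token probabilities are invented, and the exact tie-breaking used by the hosted models is an assumption.

```python
from typing import Dict

def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

probs = {"walk": 0.5, "code": 0.3, "nap": 0.15, "fly": 0.05}
print(top_p_filter(probs, 0.1))  # only "walk" survives
print(top_p_filter(probs, 0.7))  # "walk" and "code" survive
```

With a small `top_p` the model effectively always picks from one or two candidates, which is why the outputs in the first cell below vary so little.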

In [None]:
MODEL = "gpt-4o-mini"
for _ in range(3):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "I need to write some code but "
                    "I'd prefer to go for a stroll."},
        ],
        top_p=0.1
    )

    print(response.choices[0].message.content)
    print("-" * 100)

In [None]:
MODEL = "gpt-4o-mini"
for _ in range(3):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "I need to write some code but "
                    "I'd prefer to go for a stroll."},
        ],
        top_p=0.5
    )

    print(response.choices[0].message.content)
    print("-" * 100)

## Structured outputs

Modern LLMs support outputting in a specific format, for example we can use "JSON mode" to force outputs to be in JSON format.

In [None]:
import os
from google.colab import userdata

# os.environ['OPENAI_API_KEY'] = open(".open-ai-api-key").read().strip()
os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")

from openai import OpenAI

client = OpenAI()

non_json_output = client.chat.completions.create(
    messages=[{
        'role': 'user',
        'content': 'Design a role play character\'s name, class and a short description'
    }],
    model="gpt-4o-mini",
).choices[0].message.content
print(non_json_output)
print("-" * 100)

json_output = client.chat.completions.create(
    messages=[{
        'role': 'user',
        'content': 'Design a role play character\'s name, class and a short description in json format'
    }],
    model="gpt-4o-mini",
    response_format={"type": "json_object"}
).choices[0].message.content
print(json_output)

This is useful, because that'll make it much easier for you later to parse the outputs:

In [None]:
import json
json.loads(json_output)
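JSON mode gives you syntactically valid JSON, but you still need to check that the fields you need are present. Here is a hand-rolled sketch of that check using only the standard library. The `CharacterSketch` fields and the sample string are invented for illustration and are not actual model output.

```python
import json
from dataclasses import dataclass, fields
from typing import List

@dataclass
class CharacterSketch:
    name: str
    character_class: str
    description: str
    traits: List[str]

def parse_character(raw: str) -> CharacterSketch:
    """Parse a JSON string and make sure all expected fields are present."""
    data = json.loads(raw)
    expected = [f.name for f in fields(CharacterSketch)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    # Extra keys returned by the model are ignored.
    return CharacterSketch(**{name: data[name] for name in expected})

sample = (
    '{"name": "Mira", "character_class": "Ranger", '
    '"description": "A quiet scout.", "traits": ["patient", "curious"]}'
)
print(parse_character(sample))
```

This sketch only checks that the keys exist; it does not verify the types of the values.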

We can go one step further and actually define a `pydantic` model for our outputs:

In [None]:
from typing import List
from pydantic import BaseModel

class CharacterProfile(BaseModel):
    name: str
    age: int
    special_skills: List[str]
    traits: List[str]
    character_class: str
    origin: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Design a role play character"}
    ],
    response_format=CharacterProfile,
)

completion.choices[0].message.parsed

So now we have a predefined output format, which is easy to work with.

# Prompt caching

There's a capability that is present in Anthropic's API only as a beta, but is fully implemented for Gemini.

Prompt caching allows you to reuse your prompt's prefix if it was already present in a previous request. This potentially saves a lot of computation.

Read more [here](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#how-prompt-caching-works); there are quite a lot of limitations right now.

In [None]:
from anthropic import Anthropic

# Here we inflate the article so that the prompt exceeds
# the 1024-token minimum required for caching to start
article = open("wikipedia_article.txt").read() * 6

client = Anthropic(
    api_key=userdata.get("anthropic_key"),
)

messages = [
    {
        "role": "user",
        "content": [{
            "type": "text",
            "text": f"Here's an article {article}, summarise it",
            "cache_control": {"type": "ephemeral"}
        }]
    }
]


response = client.beta.prompt_caching.messages.create(
    max_tokens=1024,
    messages=messages,
    model="claude-3-5-sonnet-20240620"
)
response

PromptCachingBetaMessage(id='msg_012FGL9C4RwSfZGvFTY5tRi2', content=[TextBlock(text="This article provides an overview of animal paws, focusing on their common characteristics and the animals that possess them. Here's a summary:\n\nPaw Characteristics:\n1. Thin, pigmented, keratinized, hairless epidermis\n2. Subcutaneous collagenous and adipose tissue forming pads\n3. Heart-shaped metacarpal/palmar (forelimb) or metatarsal/plantar (rear limb) pad\n4. Usually four load-bearing digital pads, sometimes five or six toes\n5. Carpal pad on forelimb for extra traction\n6. Horn-like, beak-shaped claws on each digit\n7. Generally hairless, but some animals have fur on paw soles (e.g., red panda)\n\nAnimals with Paws:\n1. Felids (cats, tigers)\n2. Canids (dogs, foxes)\n3. Lagomorphs (rabbits)\n4. Bears and raccoons\n5. Mustelids (weasels)\n6. Rodents\n\nThe article notes that paws act as cushions for load-bearing limbs and that some animals, like rabbits, have sharp nails but no pads underneath 

And now let's try to alter the end of the prompt

In [None]:
messages.append(
    {
        "role": "assistant",
        "content": response.content[0].text
    }
)
messages.append(
    {
        "role": "user",
        "content": "Translate it to Japanese"
    }
)

client.beta.prompt_caching.messages.create(
    max_tokens=1024,
    messages=messages,
    model="claude-3-5-sonnet-20240620"
)

PromptCachingBetaMessage(id='msg_01A6fN4gUhkDu7YeTwYe1ZGK', content=[TextBlock(text="Here's a Japanese translation of the summary:\n\n動物の肉球に関する概要：\n\n肉球の特徴：\n1. 薄く、色素があり、角化した、無毛の表皮\n2. 皮下の膠原組織と脂肪組織からなるパッド\n3. ハート形の中手/手掌（前肢）または中足/足底（後肢）パッド\n4. 通常4つの荷重用指パッド、時に5本または6本の指\n5. 前肢にある手根パッドで追加の牽引力を得る\n6. 各指に角状の嘴のような爪\n7. 通常は無毛だが、一部の動物（例：レッサーパンダ）では肉球に毛がある\n\n肉球を持つ動物：\n1. ネコ科（猫、虎）\n2. イヌ科（犬、キツネ）\n3. ウサギ目（ウサギ）\n4. クマとアライグマ\n5. イタチ科（イタチ）\n6. げっ歯類\n\nこの記事では、肉球が荷重肢のクッションとして機能することや、ウサギのように鋭い爪を持つが肉球の下にパッドがない動物もいることが指摘されています。", type='text')], model='claude-3-5-sonnet-20240620', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=PromptCachingBetaUsage(cache_creation_input_tokens=0, cache_read_input_tokens=1944, input_tokens=299, output_tokens=382))
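The `usage` field of the second response shows how much of the prompt was served from the cache. A quick sketch of the arithmetic follows, assuming that the total input is the sum of the three token counters (cache reads, cache writes and regular input tokens). `cached_fraction` is a hypothetical helper.

```python
def cached_fraction(
    cache_read_input_tokens: int,
    cache_creation_input_tokens: int,
    input_tokens: int,
) -> float:
    """Share of the input tokens that were read from the cache."""
    total = cache_read_input_tokens + cache_creation_input_tokens + input_tokens
    if total == 0:
        return 0.0
    return cache_read_input_tokens / total

# Numbers copied from the usage field of the response above.
share = cached_fraction(
    cache_read_input_tokens=1944,
    cache_creation_input_tokens=0,
    input_tokens=299,
)
print(f"{share:.1%} of the input tokens came from the cache")
```

Here the cached part is the inflated article, and the uncached part covers the earlier answer and the new translation request.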

# Multimodality

Modern LLMs also increasingly incorporate other modalities, usually images. We can see that represented in Anthropic's and OpenAI's APIs. Let's try to feed our LLM evolution image to Claude and see what it thinks about it.
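Images are sent as base64-encoded text inside a content block. The encoding step can be isolated into a small helper, sketched below with the same block layout as the cell that follows. `image_block` and `user_message_with_image` are hypothetical helper names, and the bytes in the usage example are a stand-in for real image data.

```python
import base64
from typing import Any, Dict

def image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> Dict[str, Any]:
    """Wrap raw image bytes into a base64 image content block."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }

def user_message_with_image(
    image_bytes: bytes, question: str, media_type: str = "image/jpeg"
) -> Dict[str, Any]:
    """Build a user message that contains an image followed by a text question."""
    return {
        "role": "user",
        "content": [
            image_block(image_bytes, media_type),
            {"type": "text", "text": question},
        ],
    }

fake_image = b"\xff\xd8\xff not a real image"  # placeholder bytes
message = user_message_with_image(fake_image, "What is in this picture?")
print(message["content"][0]["source"]["data"])
```

The `media_type` has to match the actual image format, for example `image/png` for PNG files.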

In [None]:
import base64
import requests

img_data = requests.get("https://github.com/Mooler0410/LLMsPracticalGuide/blob/main/imgs/tree.jpg?raw=true").content

base_64_encoded_data = base64.b64encode(img_data)
base64_string = base_64_encoded_data.decode('utf-8')

client = Anthropic(api_key=userdata.get("anthropic_key"))
MODEL_NAME = "claude-3-5-sonnet-20240620"


message_list = [
    {
        "role": 'user',
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg", "data": base64_string}
            },
            {"type": "text", "text": "What are the most important models in LLM evolution tree?"}
        ]
    }
]

response = client.messages.create(
    model=MODEL_NAME,
    max_tokens=2048,
    messages=message_list
)
print(response.content[0].text)

Based on the evolutionary tree shown in the image, some of the most important models in the development of Large Language Models (LLMs) appear to be:

1. GPT series: GPT-1, GPT-2, GPT-3, GPT-4, and their variations are prominently featured, showing their significant impact on the field.

2. BERT: This model appears as one of the earlier branches, indicating its importance in the evolution of LLMs.

3. T5: Shown as a major branch in the middle years of the evolution.

4. LLaMA: Featured prominently in the most recent years, with multiple variations like LLaMA-2.

5. PaLM: Another significant model in the recent developments.

6. Claude: Shown as one of the latest developments in the tree.

7. BLOOM: A large-scale model featured in the recent part of the evolution.

8. Chinchilla: Positioned as an important recent development.

9. ELECTRA and RoBERTa: Earlier models that contributed to the evolution.

10. Galactica: A more recent model shown in the upper branches.

These models represent

# Let's summarise

**We learned:**

* How to use the `requests` library and call different types of APIs

* How to call GenAI APIs and solve different tasks with them

* Some of the caveats of using specific APIs and how to tackle them


To follow up on the topics we touched on in this seminar, welcome to this week's homework. We'll practice the following skills:

* Summarizing long texts

* Extracting information using LLMs

* Understanding LLM's shortcomings