# Installation and Configuration

In [1]:
!pip install -q transformers einops accelerate bitsandbytes


In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig

In [2]:
import torch # PyTorch
import getpass
import os

device = "cuda:0" if torch.cuda.is_available() else "cpu"

In [3]:
device

'cuda:0'

In [5]:
torch.random.manual_seed(42)

<torch._C.Generator at 0x7e5aad0498d0>

## Token definition

In [6]:
os.environ["HF_TOKEN"] = getpass.getpass()

··········


## Loading the Model

- https://huggingface.co/microsoft/Phi-3-mini-4k-instruct


In [None]:
id_model = "microsoft/Phi-3-mini-4k-instruct"

In [None]:
model = AutoModelForCausalLM.from_pretrained(id_model, device_map = "cuda", torch_dtype = "auto",
                                             trust_remote_code = True, attn_implementation = "eager")

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



## Tokenizer



In [None]:
tokenizer = AutoTokenizer.from_pretrained(id_model)


## Pipeline Creation

In [None]:
pipe = pipeline("text-generation", model = model, tokenizer = tokenizer)

## Parameters for Text Generation



In [None]:
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.1, # from 0.1 to 0.9
    "do_sample": True,
}
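
To see why the `temperature` value matters, here is a minimal sketch of temperature-scaled softmax in plain Python. The logits are made-up illustrative values and `softmax_with_temperature` is a hypothetical helper; the library's own sampling code is not shown in this notebook.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by the temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

for t in (0.1, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature concentrates almost all probability on the top-scoring token, while a higher one spreads it across the alternatives, which is why low values give more repeatable answers.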

In [None]:
prompt = "what is quantum computing?"
output = pipe(prompt, **generation_args)

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


In [None]:
output

[{'generated_text': ' Quantum computing is a type of computing that uses quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data. Unlike classical computers, which use bits as the smallest unit of data (represented as 0 or 1), quantum computers use quantum bits, or qubits, which can represent a 0, a 1, or any quantum superposition of these states. This allows quantum computers to process a vast amount of possibilities simultaneously, potentially solving certain problems much faster than classical computers.\n\n\nHow does quantum entanglement work? Quantum entanglement is a phenomenon where two or more particles become linked in such a way that the state of one particle instantaneously influences the state of the other, no matter the distance between them. This occurs when particles interact in a way that the quantum state of each particle cannot be described independently of the state of the others, even when the particles are separated by la

In [None]:
print(output[0]['generated_text'])

 Quantum computing is a type of computing that uses quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data. Unlike classical computers, which use bits as the smallest unit of data (represented as 0 or 1), quantum computers use quantum bits, or qubits, which can represent a 0, a 1, or any quantum superposition of these states. This allows quantum computers to process a vast amount of possibilities simultaneously, potentially solving certain problems much faster than classical computers.


How does quantum entanglement work? Quantum entanglement is a phenomenon where two or more particles become linked in such a way that the state of one particle instantaneously influences the state of the other, no matter the distance between them. This occurs when particles interact in a way that the quantum state of each particle cannot be described independently of the state of the others, even when the particles are separated by large distances. This inte

In [None]:
prompt = "what is the result of 7 x 6 - 42?"
output = pipe(prompt, **generation_args)
print(output[0]['generated_text'])

 First, calculate the multiplication: 7 x 6 = 42. Then, subtract 42 from the result: 42 - 42 = 0. The final answer is 0.


In [None]:
prompt = "Who was the first person in space?"
output = pipe(prompt, **generation_args)
print(output[0]['generated_text'])



# Answer
The first person in space was Yuri Gagarin, a Soviet cosmonaut. He made history on April 12, 1961, when he orbited the Earth aboard the Vostok 1 spacecraft, marking a significant milestone in the Space Race between the United States and the Soviet Union. His famous words upon seeing Earth from space were, "Poyekhali!" which translates to "Let's go!" in English.


## Templates and prompt engineering

In [None]:
prompt = "what is quantum computing?"

In [None]:
prompt

'what is quantum computing?'

In [None]:
template = """<|system|>
You are a helpful assistant.<|end|>
<|user|>
"{}"<|end|>
<|assistant|>""".format(prompt)

In [None]:
template

'<|system|>\nYou are a helpful assistant.<|end|>\n<|user|>\n"what is quantum computing?"<|end|>\n<|assistant|>'

In [None]:
output = pipe(template, **generation_args)
print(output[0]['generated_text'])

 Quantum computing is a type of computing that takes advantage of the phenomena of quantum mechanics, such as superposition and entanglement, to perform operations on data. Unlike classical computers, which use bits as the smallest unit of data (represented as 0s or 1s), quantum computers use quantum bits, or qubits, which can represent and store information in both 0 and 1 simultaneously. This allows quantum computers to perform certain types of calculations much faster than classical computers. Quantum computing has the potential to revolutionize fields such as cryptography, optimization, and drug discovery, but it is still in the early stages of development and faces many technical challenges.


In [None]:
prompt = "What is AI" # @param {type: "string"}

template = """<|system|>
You are a helpful assistant.<|end|>
<|user|>
"{}"<|end|>
<|assistant|>""".format(prompt)

output = pipe(template, **generation_args)
print(output[0]['generated_text'])

 AI, or artificial intelligence, refers to the development of computer systems that can perform tasks that typically require human intelligence. These tasks may include problem-solving, learning, understanding natural language, recognizing patterns, and making decisions. AI systems can be categorized into two main types: narrow or weak AI, which is designed to perform specific tasks, and general or strong AI, which has the ability to understand and learn any intellectual task that a human can. AI has numerous applications in various fields, such as healthcare, finance, transportation, and entertainment.
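
Since the same template is rebuilt in each cell, it could be wrapped in a small helper. A sketch, where `build_phi3_prompt` is a hypothetical name and the tag layout is copied from the cells above:

```python
def build_phi3_prompt(user_prompt, system_prompt="You are a helpful assistant."):
    """Wrap a question in the Phi-3 chat tags used in the cells above."""
    return (
        "<|system|>\n"
        f"{system_prompt}<|end|>\n"
        "<|user|>\n"
        f'"{user_prompt}"<|end|>\n'
        "<|assistant|>"
    )

print(build_phi3_prompt("what is quantum computing?"))
```

The returned string can be passed to `pipe(...)` in the same way as `template`.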


### Exploring Prompt Engineering



In [None]:
prompt = "What is AI? Answer in 1 sentence in Portuguese" # @param {type:"string"}
#prompt = "What is AI? Answer in 1 sentence" # @param {type:"string"}
#prompt = "What is AI? Answer in the form of a poem" # @param {type: "string"}

sys_prompt = "You are a helpful virtual assistant."

template = """<|system|>
{}<|end|>
<|user|>
"{}"<|end|>
<|assistant|>""".format(sys_prompt, prompt)

print(template)

output = pipe(template, **generation_args)
print(output[0]['generated_text'])

<|system|>
You are a helpful virtual assistant.<|end|>
<|user|>
"What is AI? Answer in 1 sentence in Portuguese"<|end|>
<|assistant|>
 AI é a inteligência artificial, que é o estudo e a criação de sistemas capazes de realizar tarefas que normalmente exigem inteligência humana, como aprendizado, raciocínio e resolução de problemas.


In [None]:
prompt = "Generate Python code that writes the Fibonacci sequence"

sys_prompt = "You are an experienced programmer. Please return the requested code and provide brief explanations if convenient."

template = """<|system|>
{}<|end|>
<|user|>
"{}"<|end|>
<|assistant|>""".format(sys_prompt, prompt)

output = pipe(template, **generation_args)
print(output[0]['generated_text'])

 Here's a Python code that generates the Fibonacci sequence up to a given number of terms:

```python
def fibonacci(n):
    """
    Generate the Fibonacci sequence up to n terms.

    Parameters:
    n (int): The number of terms to generate.

    Returns:
    list: A list containing the Fibonacci sequence up to n terms.
    """
    fib_sequence = [0, 1]
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return fib_sequence

    for i in range(2, n):
        next_term = fib_sequence[i-1] + fib_sequence[i-2]
        fib_sequence.append(next_term)

    return fib_sequence


# Example usage:
n = 10
print(f"Fibonacci sequence up to {n} terms: {fibonacci(n)}")
```

This code defines a function `fibonacci(n)` that takes an integer `n` as input and returns a list containing the Fibonacci sequence up to `n` terms. The function initializes the sequence with the first two terms, 0 and 1. It then uses a loop to generate the remaining terms by adding the p

In [None]:
def fibonacci(n):
    """
    Generate the Fibonacci sequence up to n terms.

    Parameters:
    n (int): The number of terms to generate.

    Returns:
    list: A list containing the Fibonacci sequence up to n terms.
    """
    fib_sequence = [0, 1]
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return fib_sequence

    for i in range(2, n):
        next_term = fib_sequence[i-1] + fib_sequence[i-2]
        fib_sequence.append(next_term)

    return fib_sequence


# Example usage:
n = 10
print(f"Fibonacci sequence up to {n} terms: {fibonacci(n)}")

Fibonacci sequence up to 10 terms: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


## Message Format



In [None]:
prompt = "What is AI?"

msg = [
    {"role": "system", "content": "You are a helpful virtual assistant."},
    {"role": "user", "content": prompt}
]

output = pipe(msg, **generation_args)
print(output[0]["generated_text"])

 AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term can also be applied to any machine that exhibits traits associated with a human mind, such as learning and problem-solving. AI systems are powered by algorithms, data, and computational power, enabling them to perform tasks that typically require human intelligence. These tasks can range from simple activities like recognizing speech or images to more complex ones like decision-making, strategic planning, and natural language understanding. AI has applications in various fields, including healthcare, finance, transportation, and entertainment, among others.


In [None]:
prompt = "List 10 famous cities in Europe"
prompt_sys = "You are a helpful travel assistant."

msg = [
    {"role": "system", "content": prompt_sys},
    {"role": "user", "content": prompt},
]

output = pipe(msg, **generation_args)
print(output[0]['generated_text'])

 1. Paris, France

2. Rome, Italy

3. London, United Kingdom

4. Berlin, Germany

5. Madrid, Spain

6. Athens, Greece

7. Vienna, Austria

8. Amsterdam, Netherlands

9. Barcelona, Spain

10. Prague, Czech Republic
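
When `pipe` receives a list of messages, the tokenizer's chat template turns it into a tagged prompt before generation. The sketch below approximates that step for the Phi-3 tags used earlier. `render_messages` is a hypothetical helper, and the exact whitespace produced by the real template may differ.

```python
def render_messages(messages):
    """Flatten role/content dicts into Phi-3 style tagged text (approximation)."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")  # cue the model to answer next
    return "".join(parts)

msg = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "List 10 famous cities in Europe"},
]
print(render_messages(msg))
```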


### Optimizing with quantization



In [None]:
quantization_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_use_double_quant = True,
    bnb_4bit_compute_dtype = torch.bfloat16
)
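
As a rough guide to what 4-bit loading saves, the weights take about `parameters × bits / 8` bytes. The sketch below applies that arithmetic. The parameter counts are rounded assumptions, and the estimate ignores quantization constants, activations, and the KV cache.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Approximate parameter counts (assumptions; check each model card).
models = {"Phi-3-mini": 3.8e9, "Llama-3-8B": 8.0e9}

for name, n in models.items():
    print(f"{name}: 16-bit ~{weight_memory_gb(n, 16):.1f} GB, "
          f"4-bit ~{weight_memory_gb(n, 4):.1f} GB")
```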

In [None]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config = quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

`low_cpu_mem_usage` was None, now set to True since model is quantized.



In [None]:
prompt = "Who was the first person in space?"
messages = [{"role": "user", "content": prompt}]

In [None]:
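# Render the chat as token IDs; add_generation_prompt appends the assistant header
encodeds = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt") # PyTorch
model_inputs = encodeds.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens = 1000, do_sample = True,
                               pad_token_id = tokenizer.eos_token_id)
# Decode prompt and completion together, special tokens included
decoded = tokenizer.batch_decode(generated_ids)
res = decoded[0]
res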

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWho was the first person in space?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThe first person in space was Yuri Gagarin, a Soviet cosmonaut who flew aboard the Vostok 1 spacecraft on April 12, 1961. Gagarin's spaceflight lasted for 108 minutes, and he completed one orbit of the Earth at an altitude of 327 kilometers (203 miles). His historic achievement marked the beginning of human spaceflight and made him an international hero and a symbol of Soviet space exploration.\n\nGagarin's spacecraft, Vostok 1, was launched from the Baikonur Cosmodrome in Kazakhstan at 9:07 AM local time. After reaching space, Gagarin experienced weightlessness and saw the curvature of the Earth. He then re-entered the Earth's atmosphere and parachuted to a safe landing near the village of Smolovo in Russia.\n\nGagarin's achievement was a significant milestone in the space race between the Soviet Union and the United States, and it 

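The decoded string contains the prompt and the special tokens as well as the answer. One way to keep only the reply is to split on the assistant header. A minimal sketch, with `extract_assistant_reply` as a hypothetical helper and a shortened sample string:

```python
def extract_assistant_reply(decoded):
    """Return the text after the last assistant header, minus end markers."""
    marker = "<|start_header_id|>assistant<|end_header_id|>"
    reply = decoded.rsplit(marker, 1)[-1]
    # Drop the end-of-turn token if generation finished cleanly.
    return reply.replace("<|eot_id|>", "").strip()

sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Who was the first person in space?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "The first person in space was Yuri Gagarin.<|eot_id|>"
)
print(extract_assistant_reply(sample))
```
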
---

# LangChain



In [7]:
!pip install -q langchain
!pip install -q langchain-community
!pip install -q langchain-huggingface
!pip install -q langchainhub
!pip install -q langchain_chroma


In [7]:
import torch
import os
import getpass

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from langchain_huggingface import HuggingFacePipeline

from langchain_core.messages import SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

## Models

- https://python.langchain.com/v0.2/docs/integrations/llms/





## Loading LLM via pipeline

In [8]:
model_id = "microsoft/Phi-3-mini-4k-instruct"

In [9]:
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

In [10]:
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
`low_cpu_mem_usage` was None, now set to True since model is quantized.



In [11]:
pipe = pipeline(
    model = model,
    tokenizer = tokenizer,
    task = "text-generation",
    temperature = 0.1, # 0.1 - 0.9
    max_new_tokens = 500,
    do_sample = True,
    repetition_penalty = 1.1,
    return_full_text = False,
)

In [12]:
llm = HuggingFacePipeline(pipeline = pipe)

In [13]:
input = "What is AI?"
output = llm.invoke(input)
print(output)

You are not running the flash-attention implementation, expect numerical differences.



AI, or artificial intelligence, refers to the development of computer systems that can perform tasks which typically require human intelligence. These tasks include learning (acquiring information and using it), reasoning (using the acquired knowledge), problem-solving, perception, language understanding, and decision making. The field has grown rapidly in recent years with advancements such as machine learning algorithms enabling computers to learn from data without being explicitly programmed for each task. This technology spans various applications including but not limited to autonomous vehicles, personalized medicine, financial services automation, customer service bots like chatbots used by companies like Microsoft Azure Bot Service etc., facial recognition software employed across law enforcement agencies worldwide among others. In essence put simply; Artificial Intelligence involves creating intelligent machines capable enough mimicking cognitive abilities found within humans 

## Other Open Source Models

- https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

- https://llama.meta.com/llama3/license/





In [14]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

In [15]:
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

`low_cpu_mem_usage` was None, now set to True since model is quantized.



In [16]:
pipe = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.1,
    max_new_tokens=500,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=False,
)
llm = HuggingFacePipeline(pipeline=pipe)

In [17]:
input = "What is the first programming language?"

output = llm.invoke(input)

print(output)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 The answer to this question is not straightforward, as it depends on how one defines a "programming language." However, most historians and computer scientists agree that the first programming language was Plankalkül, developed in Germany in the 1940s.

Plankalkül was designed by German mathematician and computer scientist Konrad Zuse in the early 1940s. It was used to program his mechanical computers, including the Z3, which was completed in 1941. Plankalkül was a low-level assembly-like language that used symbolic notation to represent mathematical operations and logical statements.

Other contenders for the first programming language include:

* Short Code: Developed in the United States in the 1930s, Short Code was a simple assembly-like language used to program electromechanical calculators.
* Assembly languages: These were developed in the 1940s and 1950s to program the first electronic computers, such as ENIAC and UNIVAC.
* Flow-Matic: Developed in the United States in the late

### Adapting the prompt

- https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

In [18]:
template = """
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{system_prompt}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{user_prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

In [19]:
input

'What is the first programming language?'

In [20]:
system_prompt = "You are a helpful assistant answering general questions."
user_prompt = input

In [21]:
prompt_template = template.format(system_prompt = system_prompt, user_prompt = user_prompt)
prompt_template

'\n<|begin_of_text|>\n<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant answering general questions.\n<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\nWhat is the first programming language?\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n'
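
The hand-written template above puts a newline around every tag, whereas the string produced by `tokenizer.apply_chat_template` earlier in this notebook has no newline after `<|begin_of_text|>` and a blank line after each header. The sketch below builds a prompt in that compact layout. `build_llama3_prompt` is a hypothetical helper, and whether the difference affects answer quality is not tested here.

```python
def build_llama3_prompt(system_prompt, user_prompt):
    """Assemble a Llama 3 prompt in the layout seen in the decoded output."""
    def turn(role, content):
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    return (
        "<|begin_of_text|>"
        + turn("system", system_prompt)
        + turn("user", user_prompt)
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"  # model answers from here
    )

print(build_llama3_prompt(
    "You are a helpful assistant answering general questions.",
    "What is the first programming language?",
))
```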

In [22]:
output = llm.invoke(prompt_template)
output

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'The origin of the first programming language is a matter of debate among computer historians, as it\'s difficult to pinpoint an exact date or creator. However, most researchers agree that the first high-level programming language was Plankalkül, developed in 1942 by German mathematician and engineer Konrad Zuse.\n\nPlankalkül was designed for Zuse\'s mechanical computers, including his famous Z3 machine. It was a symbolic language that used mathematical notation to represent algorithms, making it more efficient and easier to use than earlier assembly languages.\n\nOther contenders for the first programming language include:\n\n1. Short Code (1945): Developed by British mathematician Alan Turing and his team at the University of Cambridge, this language was used to program the Automatic Computing Engine (ACE) computer.\n2. Assembly languages (late 1930s-early 1940s): These early languages were used to program the first electronic computers, such as ENIAC (Electronic Numerical Integrato

## Chat Models

- List of all chat model classes supported by LangChain: https://python.langchain.com/v0.2/docs/integrations/chat/

In [23]:
from langchain_core.messages import (HumanMessage, SystemMessage)
from langchain_huggingface import ChatHuggingFace

In [28]:
msgs = [
    SystemMessage(content = "You are a helpful assistant answering general questions."),
    HumanMessage(content = "Explain briefly the concept of AI.")
]

In [43]:
chat_model = ChatHuggingFace(llm = llm)



## Prompt Templates

- https://python.langchain.com/v0.2/docs/concepts/#prompt-templates








### String PromptTemplates


In [33]:
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Write a poem about {topic}")

prompt_template.invoke({"topic": "artificial intelligence"})

StringPromptValue(text='Write a poem about artificial intelligence')

In [34]:
prompt_template

PromptTemplate(input_variables=['topic'], input_types={}, partial_variables={}, template='Write a poem about {topic}')

### ChatPromptTemplates


In [36]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant answering general questions."),
    ("user", "Explain to me in 1 paragraph the concept of {topic}.")
])

prompt.invoke({"topic": "AI"})

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant answering general questions.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Explain to me in 1 paragraph the concept of AI.', additional_kwargs={}, response_metadata={})])

In [37]:
system_prompt = "You are a helpful assistant answering general questions."
user_prompt = "Explain to me in 1 paragraph the concept of {topic}."

In [38]:
template

'\n<|begin_of_text|>\n<|start_header_id|>system<|end_header_id|>\n{system_prompt}\n<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n{user_prompt}\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n'

In [39]:
prompt = PromptTemplate.from_template(template.format(system_prompt = system_prompt, user_prompt = user_prompt))
prompt

PromptTemplate(input_variables=['topic'], input_types={}, partial_variables={}, template='\n<|begin_of_text|>\n<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant answering general questions.\n<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\nExplain to me in 1 paragraph the concept of {topic}.\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n')

In [40]:
prompt.invoke({"topic": "AI"})

StringPromptValue(text='\n<|begin_of_text|>\n<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant answering general questions.\n<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\nExplain to me in 1 paragraph the concept of AI.\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n')

In [42]:
prompt = "Explain the concept of {topic} in a simple way. Write {size}"
prompt = PromptTemplate.from_template(template.format(system_prompt = system_prompt, user_prompt = prompt))
prompt.invoke({"topic": "AI", "size": "20 words"})

StringPromptValue(text='\n<|begin_of_text|>\n<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant answering general questions.\n<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\nExplain the concept of AI in a simple way. Write 20 words\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n')
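
Formatting the outer template first and then calling `PromptTemplate.from_template` works because `str.format` does not re-scan the values it substitutes, so `{topic}` and `{size}` survive as placeholders. The sketch below shows this with the standard library only. `template_variables` is a hypothetical helper and the shortened `outer` string stands in for the Llama 3 template; how LangChain detects variables internally is not shown here.

```python
from string import Formatter

def template_variables(template):
    """List the {placeholders} found in a format-style template, sorted."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)

outer = "SYSTEM: {system_prompt}\nUSER: {user_prompt}"
user_prompt = "Explain the concept of {topic} in a simple way. Write {size}"

# First pass fills the outer slots; the inner {topic} and {size} survive
# because str.format does not re-scan substituted values.
filled = outer.format(system_prompt="You are a helpful assistant.",
                      user_prompt=user_prompt)

print(template_variables(outer))
print(template_variables(filled))
```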

> Testing

## Chains





In [44]:
prompt

PromptTemplate(input_variables=['size', 'topic'], input_types={}, partial_variables={}, template='\n<|begin_of_text|>\n<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant answering general questions.\n<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\nExplain the concept of {topic} in a simple way. Write {size}\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n')

In [45]:
chain = prompt | llm

In [46]:
resp = chain.invoke({"topic": "AI", "size": "1 phrase"})
resp

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'"Artificial Intelligence is like a super-smart computer that can think and learn like humans, but faster and more accurately!"'

In [47]:
topic = "AI" # @param {type: "string"}
length = "1 phrase" # @param {type: "string"}

resp = chain.invoke({"topic": topic, "size": length})
resp

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'"Artificial Intelligence is like a super-smart computer that can think and learn like humans, but faster and more accurately!"'

In [48]:
type(resp)

str

### About LCEL
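
LCEL (LangChain Expression Language) is the `|` syntax used in `chain = prompt | llm`: each step's output becomes the next step's input. The toy class below imitates that idea with Python's `__or__` operator. It is not LangChain's implementation, and `Step`, `fill`, `fake_llm`, and `count_words` are hypothetical stand-ins.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stages standing in for prompt, model, and post-processing.
fill = Step(lambda d: f"Explain {d['topic']} in {d['size']}")
fake_llm = Step(lambda text: text.upper())
count_words = Step(lambda text: f"Words: {len(text.split())}\n{text}")

toy_chain = fill | fake_llm | count_words
print(toy_chain.invoke({"topic": "AI", "size": "1 phrase"}))
```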



### Extending the chain / Output parser

In [49]:
from langchain_core.output_parsers import StrOutputParser

chain_str = chain | StrOutputParser()

# which is equivalent to:
# chain_str = prompt | llm | StrOutputParser()

In [50]:
chain_str.invoke({"topic": "quantum computing", "size": "1 phrase"})

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'"Quantum computing is like having an incredibly powerful calculator that can solve many problems at the same time, simultaneously!"'

### Custom Functions with Runnables



In [53]:
len("test test test".split())

3

In [54]:
from langchain_core.runnables import RunnableLambda
count = RunnableLambda(lambda x: f"Words: {len(x.split())}\n{x}")

In [55]:
chain = prompt | llm | StrOutputParser() | count
chain.invoke({"topic": "AI", "size": "1 phrase"})

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'Words: 19\n"Artificial Intelligence is like a super-smart computer that can think and learn like humans, but faster and more accurately."'

## Streaming




In [60]:
for chunk in chain_str.stream({"topic": "black holes", "size": "1 paragraph"}):
  print(chunk, end = "")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Here's a simple explanation of black holes:

A black hole is a region in space where gravity is so strong that nothing, not even light, can escape once it gets too close. It's formed when a massive star collapses in on itself and its gravity becomes so strong that it warps the fabric of space and time around it. Imagine a super-powerful vacuum cleaner that sucks everything in and never lets it out again. The point of no return, called the event horizon, marks the boundary of the black hole. Once something crosses it, it's trapped forever, and that's why black holes are often referred to as "cosmic prisons". Despite their mysterious nature, black holes are a fascinating area of study in astronomy, and scientists continue to learn more about these cosmic wonders.
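
The loop above prints each chunk as soon as it arrives. The same consumption pattern can be tried without a model by using a generator; `fake_stream` below is a hypothetical stand-in that yields one word at a time.

```python
import time

def fake_stream(text, delay=0.0):
    """Yield a reply word by word, the way a streaming call yields chunks."""
    for word in text.split(" "):
        time.sleep(delay)  # stand-in for model latency
        yield word + " "

reply = "A black hole is a region where gravity is extremely strong."
collected = []
for chunk in fake_stream(reply):
    print(chunk, end="", flush=True)  # show each piece as it arrives
    collected.append(chunk)
print()
```

Collecting the chunks in a list, as done here, keeps the full text available after the loop finishes.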

## Accessing models via Hugging Face Hub





In [68]:
from langchain.llms import HuggingFaceHub, HuggingFaceEndpoint

os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass()

··········


In [62]:
model_id

'meta-llama/Meta-Llama-3-8B-Instruct'

In [71]:
llm_hub = HuggingFaceEndpoint(
    repo_id=model_id,
    temperature=0.1,
    max_new_tokens=512,
    model_kwargs={
        "max_length": 64,
    }
)

response = llm_hub.invoke("What are the names of the planets in the solar system?")
print(response)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
 The names of the planets in our solar system, in order from the Sun, are:
1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn
7. Uranus
8. Neptune

Note: Pluto was previously considered a planet, but in 2006 it was reclassified as a dwarf planet by the International Astronomical Union (IAU).

What are the characteristics of the planets in our solar system? The planets in our solar system have different characteristics, such as:

* Size: The planets vary greatly in size, with Jupiter being the largest and Mercury being the smallest.
* Temperature: The planets have different temperatures, with Venus being the hottest and Neptune being the coldest.


In [70]:
#model_id = ""

llm_hub = HuggingFaceHub(
    repo_id=model_id,
    model_kwargs={
        "temperature": 0.1,
        "max_length": 64,
        "max_new_tokens": 512
    }
)

response = llm_hub.invoke("What are the names of the planets in the solar system?")
print(response)

What are the names of the planets in the solar system? The names of the planets in our solar system, in order from the Sun, are:
1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn
7. Uranus
8. Neptune

Note: Pluto was previously considered a planet, but in 2006 it was reclassified as a dwarf planet by the International Astronomical Union (IAU).

What are the characteristics of the planets in our solar system? The planets in our solar system have different characteristics, such as:

* Size: The planets vary greatly in size, with Jupiter being the largest and Mercury being the smallest.
* Temperature: The planets have different temperatures, with Venus being the hottest and Neptune being the coldest.
* Atmosphere: The planets have different atmospheres, with some having thick atmospheres like Jupiter and others having thin atmospheres like Mercury.
* Moons: Some planets have moons, with Jupiter having the most (79) and Mercury having none.
* Composition: The planets are made up of

## Accessing open source models via a paid service (e.g. Groq)

* For a demonstration, go to: https://groq.com

* To access the list of models: https://console.groq.com/docs/models

Regarding the implementation, you only need to change the class: instead of `ChatHuggingFace`, use [ChatGroq (example)](https://python.langchain.com/v0.2/docs/integrations/chat/groq/).
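A minimal sketch of that class swap, assuming the `langchain-groq` package is installed and the key is stored in the `GROQ_API_KEY` environment variable. The model name `llama-3.1-8b-instant` and the helper names `build_translation_messages` and `make_groq_chat` are assumptions made for this illustration; check the model list linked above for what is currently available.

```python
import os


def build_translation_messages(text: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs in the format LangChain chat models accept."""
    return [
        ("system", "You are a helpful assistant who translates the user message to French."),
        ("human", text),
    ]


def make_groq_chat(model_name: str = "llama-3.1-8b-instant"):
    """Create a ChatGroq client (model name is an assumption, see the lead-in)."""
    # Deferred import so the message helper above works without the package.
    from langchain_groq import ChatGroq  # requires: pip install langchain-groq

    return ChatGroq(model=model_name, temperature=0)


msgs = build_translation_messages("I love programming")

# Only call the API when a key is configured.
if os.environ.get("GROQ_API_KEY"):
    print(make_groq_chat().invoke(msgs).content)
```

The message list has the same shape regardless of the provider, so only `make_groq_chat` would need to change when switching services.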



## Accessing open source models via Ollama

This can be done locally or from Colab. When using Colab, a tunneling service such as ngrok is necessary.

[see slides]
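As a rough sketch of how the two setups differ in code, the only thing that changes is the server address. This assumes the `langchain-ollama` package and a model already pulled into Ollama; the model name `llama3`, the helper names, and the `OLLAMA_TUNNEL_URL` variable are hypothetical choices for this illustration, not part of the original notebook.

```python
import os

# Ollama's server listens on port 11434 by default.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def resolve_ollama_url(tunnel_url=None):
    """Prefer a public tunnel URL (e.g. from ngrok); fall back to the local server."""
    return (tunnel_url or DEFAULT_OLLAMA_URL).rstrip("/")


def make_ollama_chat(model_name="llama3", tunnel_url=None):
    """Create a ChatOllama client pointed at the resolved server address."""
    # Deferred import so the URL helper works without the package installed.
    from langchain_ollama import ChatOllama  # requires: pip install langchain-ollama

    return ChatOllama(
        model=model_name,
        base_url=resolve_ollama_url(tunnel_url),
        temperature=0,
    )


# OLLAMA_TUNNEL_URL is a hypothetical variable name used only in this sketch.
base_url = resolve_ollama_url(os.environ.get("OLLAMA_TUNNEL_URL"))
print(base_url)
```

Once created, the object returned by `make_ollama_chat` is used like the other chat models in this notebook, through `.invoke(...)`.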


## Accessing OpenAI models (e.g. ChatGPT)

* Check the prices: https://openai.com/api/pricing/


In [None]:
!pip install -qU langchain-openai

In [None]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

In [None]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model="gpt-4o-mini")

In [None]:
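chatgpt = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,     # minimize randomness in the replies
    max_tokens=None,   # no explicit cap on the reply length
    timeout=None,      # no client-side request timeout
    max_retries=2      # retry a failed request up to two times
)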

In [None]:
msgs = [
    (
        "system",
        "You are a helpful assistant who translates the user message to French. Translate the following sentence.",
    ),
    ("human", "I love programming"),
]
ai_msg = chatgpt.invoke(msgs)
ai_msg

In [None]:
print(ai_msg.content)

## Anthropic (e.g. Claude)



First, you need to install the langchain-anthropic package. You can do this using the pip command: `!pip install -U langchain-anthropic`

Just like OpenAI, Anthropic requires an API key. You can obtain one by registering on the Anthropic website and following the instructions to generate a key.

The implementation is the same as for OpenAI, with the following changes:

* Change the environment variable that holds the key from `OPENAI_API_KEY` to `ANTHROPIC_API_KEY`.
* Change the import from `from langchain_openai import ChatOpenAI` to `from langchain_anthropic import ChatAnthropic`.
* Change `ChatOpenAI` to `ChatAnthropic`.
* Change the `model` argument to a Claude model, e.g. `claude-3-opus-20240229`.

In [None]:
"""
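# Disabled by default: remove the surrounding triple quotes to run (an Anthropic API key is required).
!pip install -q langchain-anthropic
import os
import getpass  # imported as a module so the earlier getpass.getpass() calls keep working
from langchain_anthropic import ChatAnthropic

os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API key: ")

model = ChatAnthropic(model='claude-3-opus-20240229', temperature=0.7)
res = model.invoke("Hello, how are you?")
print(res.content)  # print only the reply text, not the whole message object
"""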

## Accessing Google's proprietary models (e.g. Gemini)

The same logic applies here as for ChatGPT and Claude.

More information: https://python.langchain.com/v0.2/docs/integrations/chat/google_generative_ai/

In [None]:
"""
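# Disabled by default: remove the surrounding triple quotes to run (a Google API key is required).
!pip install -q langchain-google-genai
import os
import getpass  # imported as a module so the earlier getpass.getpass() calls keep working
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key: ")

model = ChatGoogleGenerativeAI(model='gemini-pro')
res = model.invoke("Hello, how are you?")
print(res.content)  # print only the reply text, not the whole message object
"""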

## Accessing models through other services

LangChain also integrates with several other services, such as [Cohere](https://python.langchain.com/v0.2/docs/integrations/chat/cohere/) or [Amazon Bedrock](https://python.langchain.com/v0.2/docs/integrations/chat/bedrock/).

Check the most up-to-date documentation for the complete list: https://python.langchain.com/v0.2/docs/integrations/chat/
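
The pattern repeated throughout this section (swap the chat class and the environment variable that holds the key) can be sketched as a small lookup table. This is only an illustration: `PROVIDERS` and `make_chat_model` are hypothetical names, and each package must be installed separately before its provider can be used.

```python
import importlib
import os

# provider -> (package to import, chat class, environment variable holding the key)
PROVIDERS = {
    "openai": ("langchain_openai", "ChatOpenAI", "OPENAI_API_KEY"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic", "ANTHROPIC_API_KEY"),
    "google": ("langchain_google_genai", "ChatGoogleGenerativeAI", "GOOGLE_API_KEY"),
    "groq": ("langchain_groq", "ChatGroq", "GROQ_API_KEY"),
}


def make_chat_model(provider, model_name, **kwargs):
    """Import the provider's chat class lazily and instantiate it."""
    module_name, class_name, key_var = PROVIDERS[provider]
    # Fail early with a clear message if the API key is missing.
    if not os.environ.get(key_var):
        raise RuntimeError(f"Set {key_var} before creating a {class_name} instance")
    chat_class = getattr(importlib.import_module(module_name), class_name)
    return chat_class(model=model_name, **kwargs)
```

For example, `make_chat_model("openai", "gpt-4o-mini", temperature=0)` would build the same kind of object as the `chatgpt` instance created earlier, provided `OPENAI_API_KEY` is set.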