**TASK** Recap:

- How does a perceptron work?
- What is the basic operation during forward propagation in a feed-forward network?
- What are activation functions for in neural networks?
- What are the limitations of training ANNs with stochastic gradient descent?
- What are Dropout and BatchNormalization layers, and why are they used in neural networks?
- What characterizes convolutional neural networks?
- What is the convolution operation?
- What characterizes recurrent neural networks?
- What methods exist for improving the performance of recurrent models in natural language processing?


## LLAMA

**NOTE** This notebook will only run in an environment with a T4 GPU.

In [None]:
!pip install transformers accelerate bitsandbytes

Load the Llama model locally; this might take a few seconds.

In [None]:
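from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-3.2-3B-Instruct"  # alternative: "meta-llama/Llama-2-7b-chat-hf"
prompt = "Tell me a joke about artificial neural networks"
access_token = "PUT_YOUR_APIKEY_HERE"  # Hugging Face access token

# Load the weights quantized to 4 bits (bitsandbytes) and let accelerate place them on the GPU.
# `token` replaces the deprecated `use_auth_token` argument.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    token=access_token,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, token=access_token)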

**NOTE** Given the memory needed to store a 3B-parameter LLM (see the cell above), how much memory would a model such as Gemini or GPT with ~200B parameters need?
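
A rough back-of-the-envelope sketch for this question: memory for the weights alone is approximately the number of parameters times the bytes per parameter. The helper name `weight_memory_gb` is hypothetical, the parameter counts are the nominal 3B and ~200B from the note, and the estimate ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal sizes from the note above; precisions are common storage formats.
for name, n_params in [("3B", 3e9), ("200B", 200e9)]:
    for bits in (4, 16, 32):
        gb = weight_memory_gb(n_params, bits)
        print(f"{name} model at {bits}-bit: ~{gb:.1f} GB")
```

Comparing the 200B figures with the memory of a single T4 GPU shows why models of that size are typically split across many accelerators.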

Test the model

In [None]:
# Move the inputs to whichever device the model was placed on by device_map="auto".
model_inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**model_inputs)

In [None]:
print(output[0])

In [None]:
print(tokenizer.decode(output[0], skip_special_tokens=True))

**TASK** Perform entity extraction with the Llama model on the text below. Extract the following information as JSON key-value pairs, using these keys:
- PERSON
- ORGANIZATION
- LOCATION
- DATE
- MONEY

In [None]:
prompt_entity = """
On March 15, 2023, OpenAI announced a new partnership with Microsoft during a press conference held in San Francisco, California. The collaboration aims to integrate advanced AI models into Microsoft’s Azure cloud platform, with a projected investment of over $1 billion.
According to Sam Altman, CEO of OpenAI, the partnership will accelerate the democratization of artificial intelligence across industries. In a related development, Google DeepMind unveiled a new version of its Gemini model on the same day in London.
Meanwhile, the European Commission is preparing new AI regulations, expected to be finalized by Q4 of 2024. These regulations could affect companies like Amazon, Meta, and IBM, which are actively developing generative AI applications.
Analysts from Goldman Sachs and McKinsey & Company predict that AI could contribute up to $15.7 trillion to the global economy by 2030, with the healthcare and finance sectors likely to benefit the most.
"""


**TIP** For this kind of inference a transformers pipeline might be more suitable. **Restart the kernel** before executing the cells below.

In [None]:
import torch
from transformers import pipeline

In [None]:
model_id = "meta-llama/Llama-3.2-3B-Instruct"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

In [None]:
messages = [
    {"role": "system", "content": "MODEL_EXPERTISE"},
    {"role": "user", "content": "USER_PROMPT"},
]

In [None]:
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

In [None]:
print(outputs[0]["generated_text"][-1]["content"])
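
One possible way to fill in the `MODEL_EXPERTISE` and `USER_PROMPT` placeholders for the entity extraction task is sketched below. The helper names `build_entity_messages` and `parse_entities` and the wording of the system prompt are assumptions, not part of the original notebook. The parser is deliberately lenient because instruction-tuned models often wrap the JSON in extra text.

```python
import json

ENTITY_KEYS = ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "MONEY"]


def build_entity_messages(text: str) -> list:
    """Build chat messages asking the model for a JSON object of entities."""
    system = (
        "You are an information extraction assistant. "
        "Reply with a single JSON object and nothing else. "
        "Use exactly these keys: " + ", ".join(ENTITY_KEYS) + ". "
        "Each value is a list of strings copied verbatim from the text."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text.strip()},
    ]


def parse_entities(reply: str) -> dict:
    """Parse the text between the first '{' and the last '}' as JSON."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in the model reply")
    data = json.loads(reply[start:end + 1])
    # Keep only the expected keys; missing keys become empty lists.
    return {key: data.get(key, []) for key in ENTITY_KEYS}
```

The list returned by `build_entity_messages` can be passed to `pipe` in place of `messages`, and the `content` string of the last generated message can be handed to `parse_entities`.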

**TASK** Test the Llama model on some coding tasks using your own prompt:
- Write a Python function that computes the nth Fibonacci number using recursion. Include basic input validation and a docstring.
- Write a Python function that returns the first non-repeating character in a string. If all characters repeat, return None.
- A coding task of your choice

**TIP** You might want to test models fine-tuned for code (if you have access to them): https://huggingface.co/meta-llama

In [None]:
coding_prompt = """
PUT_YOUR_PROMPT_HERE
"""
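
To judge the code the model produces, it can help to have reference solutions to compare against. The sketch below is one possible version of the first two tasks. The function names and the indexing convention F(0) = 0, F(1) = 1 are assumptions, since the task statement does not fix them.

```python
from collections import Counter
from typing import Optional


def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number (F(0) = 0, F(1) = 1) using plain recursion."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def first_non_repeating(text: str) -> Optional[str]:
    """Return the first character that occurs exactly once, or None if there is none."""
    counts = Counter(text)
    for ch in text:
        if counts[ch] == 1:
            return ch
    return None
```

Running the model's functions and these on the same inputs gives a quick check of correctness. The plain recursive Fibonacci is exponential in `n`, so keep test inputs small.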

**TASK** ($$) Build a simple chatbot class using the local Llama model:
- start with a simple chatbot base prompt that specifies the LLM's expertise
- pass the user question to the chatbot by adding it to the base prompt (inference through a method `invoke`)
- return the answer to the user as a string (decode the LLM output)
- ($) keep the history of the last 3 conversations in the chatbot by putting them into the prompt. If the conversation is too long, prepare a summary of each message and put the summaries into the LLM prompt
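
A minimal sketch of such a class is shown below, as a starting point rather than a complete solution. The names `Chatbot`, `generate_fn`, and `max_turns` are assumptions. The model call is injected as a function so the class can be tried without a GPU, and the summarization step for overly long conversations is left out.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Dict[str, str]


class Chatbot:
    """Chat wrapper that keeps the last `max_turns` question/answer pairs."""

    def __init__(
        self,
        generate_fn: Callable[[List[Message]], str],
        base_prompt: str,
        max_turns: int = 3,
    ) -> None:
        self.generate_fn = generate_fn  # maps chat messages to an answer string
        self.base_prompt = base_prompt  # system prompt describing the LLM's expertise
        # deque(maxlen=...) silently drops the oldest turn when full.
        self.history: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def _build_messages(self, question: str) -> List[Message]:
        messages = [{"role": "system", "content": self.base_prompt}]
        for past_question, past_answer in self.history:
            messages.append({"role": "user", "content": past_question})
            messages.append({"role": "assistant", "content": past_answer})
        messages.append({"role": "user", "content": question})
        return messages

    def invoke(self, question: str) -> str:
        answer = self.generate_fn(self._build_messages(question))
        self.history.append((question, answer))
        return answer
```

With the `pipe` object from the earlier cells, `generate_fn` could be `lambda msgs: pipe(msgs, max_new_tokens=256)[0]["generated_text"][-1]["content"]`, which mirrors how the answer is read from the pipeline output above.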

To learn more about transformers and the attention mechanism, the very engine of LLMs, you might:
- read the original paper on the attention mechanism (~175k citations): https://arxiv.org/abs/1706.03762
- watch a brilliant explanatory video on transformers: https://www.youtube.com/watch?v=KJtZARuO3JY&ab_channel=GrantSanderson
- practice coding transformers from scratch with TensorFlow: https://www.tensorflow.org/text/tutorials/transformer?hl=pl