In [12]:
%pip install "mistralai<1.0" cohere bitsandbytes accelerate transformers torch --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Mistral Model Family

### Mistral Base Model

In [1]:
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage
#from google.colab import userdata
import dotenv
import os

In [2]:
#api_key = userdata.get("MISTRAL_API_KEY")
dotenv.load_dotenv()
model = "open-mistral-7b"
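
`dotenv.load_dotenv()` does not raise when no `.env` file is found, so a missing key only surfaces later as a `KeyError` on `os.environ[...]`. A minimal sketch of an earlier, clearer check follows; the helper name `require_env` is hypothetical and is not part of `dotenv` or `mistralai`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value
```

It could then be used in place of the direct lookup, for example `MistralClient(api_key=require_env("MISTRAL_API_KEY"))`.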

In [7]:
client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

chat_response = client.chat(
    model=model,
    messages=[ChatMessage(role="user", content="What is meant by a Transformer in Large Language Models?")]
)

print(chat_response.choices[0].message.content)

In the context of large language models, a Transformer is a type of model architecture introduced in the paper "Attention is All You Need" by Vaswani et al. The Transformer model is particularly useful for tasks that involve understanding and generating human language.

The Transformer model uses self-attention mechanisms to allow the model to focus on different parts of the input sequence when producing an output. This is in contrast to traditional recurrent neural networks (RNNs) or long short-term memory (LSTM) networks, which process the input sequentially and may struggle with long-range dependencies.

The self-attention mechanism allows the model to assign different weights to different parts of the input sequence when producing an output. This means that the model can consider the context of the entire input sequence when making predictions, which is particularly useful for tasks like language translation and text generation.

In large language models, the Transformer architectu
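
The self-attention weighting described in the response above can be sketched in plain Python. This is a toy illustration under simplifying assumptions: it uses a single head and feeds the same vectors in as queries, keys, and values, whereas a real Transformer applies learned projection matrices first. The function names are chosen for this example only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights); weights[i][j] is how much position i
    attends to position j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Three toy token vectors; queries, keys, and values are the same vectors here.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every position computes its weights over the whole sequence at once, which is the property that lets the architecture consider full context without stepping through the input token by token.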

### Codestral Model

In [8]:
model = "codestral-latest"
client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

chat_response = client.chat(
    model=model,
    messages=[ChatMessage(role="user", content="Write the cypher query for printing age values greater than 45")]
)

print(chat_response.choices[0].message.content)

Assuming you're using a graph database like Neo4j, the Cypher query for printing age values greater than 45 would look like this:

```cypher
MATCH (n)
WHERE n.age > 45
RETURN n.age
```

This query will match all nodes (n) in the database, filter out those where the age property is greater than 45, and then return the age property of the remaining nodes.

Please replace `n` and `age` with your actual node label and property name if they are different.
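
Because the response mixes prose with a fenced query, it can be convenient to pull out only the code before passing it to a database client. A minimal sketch, assuming the model wraps code in standard triple-backtick fences; `extract_code_blocks` is a hypothetical helper, not part of the `mistralai` package.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block

# Opening fence with an optional language tag, then the body up to the closing fence.
_BLOCK_RE = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text):
    """Return a list of (language, code) pairs, one per fenced block in `text`."""
    return [(lang, code.strip()) for lang, code in _BLOCK_RE.findall(text)]


# A shortened stand-in for chat_response.choices[0].message.content.
reply = (
    "Assuming you're using Neo4j:\n\n"
    + FENCE + "cypher\nMATCH (n)\nWHERE n.age > 45\nRETURN n.age\n" + FENCE
    + "\n\nThis query will match all nodes."
)
for lang, code in extract_code_blocks(reply):
    print(lang)
    print(code)
```

Generated queries should still be reviewed before they are run against a real database.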


## Cohere Model Family

### Command Model

In [3]:
import cohere

#co = cohere.Client(userdata.get("COHERE_API_KEY")) 
co = cohere.Client(os.environ["COHERE_API_KEY"])

In [7]:
response = co.chat(message="What is meant by a Transformer in Large Language Models? Explain in only 100 words")

print(response.text)

A Transformer is a neural network architecture that revolutionized the field of Natural Language Processing (NLP) by allowing models to process input sequences (such as sentences or documents) in parallel, enabling more efficient training and improved performance on a variety of tasks. 

The key innovation of the Transformer is its use of self-attention mechanisms, which allow the model to weigh the importance of different input elements when generating a response. This attention mechanism enables the model to capture long-range dependencies and generate contextually relevant representations of the input data.


### Command R Plus Model

In [8]:
response = co.chat(model="command-r-plus", message="What is meant by a Transformer in Large Language Models? Explain in only 100 words")

print(response.text)

A Transformer is a neural network architecture that has revolutionized the field of natural language processing (NLP). It was introduced by Vaswani et al. in 2017. Unlike traditional recurrent neural networks (RNNs), the Transformer relies solely on attention mechanisms to process sequential data, such as text. The key innovation of the Transformer is its ability to process input sequences in parallel, making it highly efficient and capable of handling long-range dependencies in text. This has led to significant improvements in tasks such as machine translation, text summarization, and question answering.
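
Both prompts ask for an answer "in only 100 words", but a length instruction in a prompt is a request, not a guarantee. A small sketch for checking the replies follows; the helper names and the 10% tolerance are arbitrary choices for illustration.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in `text`."""
    return len(text.split())


def within_word_limit(text: str, limit: int = 100, tolerance: float = 0.1) -> bool:
    """True if `text` has at most `limit` words, allowing a fractional tolerance."""
    return word_count(text) <= limit * (1 + tolerance)


# In the notebook this would be called on response.text.
sample = "A Transformer is a neural network architecture built around attention."
print(word_count(sample), within_word_limit(sample))
```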


## Llama Model

In [3]:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"



In [5]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_4bit": False}
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
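
The last warning above means the model was loaded on the CPU even though an accelerator was present. One way to address this, sketched here under the assumption that `accelerate` is installed (it is in the `%pip install` cell), is to pass a placement argument to `transformers.pipeline`. The helper `placement_kwargs` is hypothetical and only builds the keyword arguments.

```python
def placement_kwargs(gpu_available: bool) -> dict:
    """Keyword arguments controlling where a transformers pipeline places the model.

    Hypothetical helper: device_map="auto" lets accelerate place the weights on
    the available accelerators; device="cpu" makes the CPU choice explicit.
    """
    if gpu_available:
        return {"device_map": "auto"}
    return {"device": "cpu"}


# Intended use (not executed here, since it loads the full model):
#   pipeline = transformers.pipeline(
#       "text-generation",
#       model=model_id,
#       model_kwargs={"torch_dtype": torch.bfloat16},
#       **placement_kwargs(torch.cuda.is_available()),
#   )
print(placement_kwargs(True))
```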


In [6]:
messages = [
    {"role": "system", "content": "Answer questions"},
    {"role": "user", "content": "What is meant by a Transformer in Large Language Models?"},
]

In [7]:
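# Render the chat messages into the model's prompt format as a plain string.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,  # temperature only applies when sampling is enabled
    temperature=0.2,
)
# The pipeline returns the prompt plus the completion, so slice off the prompt.
print(outputs[0]["generated_text"][len(prompt):], end=".")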

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
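
To make the `apply_chat_template` step less opaque, the sketch below approximates the string it builds for Llama 3 Instruct models. It is an illustration only: the template shipped with the tokenizer is authoritative, and `format_llama3_prompt` is a hypothetical function written for this example.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 Instruct prompt layout for a list of chat messages."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "Answer questions"},
    {"role": "user", "content": "What is meant by a Transformer in Large Language Models?"},
]
print(format_llama3_prompt(example_messages))
```

Setting `add_generation_prompt=True` is what leaves the prompt ending on an open assistant header, so the generated text starts directly with the answer.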
