In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])


In [None]:
## Libraries Required
!pip install langchain-huggingface
## For API Calls
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
!pip install bitsandbytes
!pip install langchain

In [None]:
import os
os.environ['HF_TOKEN'] = ""

In [None]:
hf_token=os.environ['HF_TOKEN']


In [None]:
## Environment secret keys
from google.colab import userdata
sec_key=userdata.get("llm_model")
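# Confirm the secret was found without printing its value into the notebook output
print("Token loaded:", sec_key is not None)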

In [None]:
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"]=sec_key

In [None]:
from langchain_huggingface import HuggingFaceEndpoint
repo_id="mistralai/Mistral-7B-Instruct-v0.2"
llm=HuggingFaceEndpoint(repo_id=repo_id,temperature=0.7,huggingfacehub_api_token=sec_key, task="conversational")

In [None]:
llm.invoke("What is machine learning")

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

# `token` replaces the deprecated `use_auth_token` argument
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto", token=hf_token)

# Prompt
prompt = "Explain quantum physics like I'm five."
# Send the inputs to wherever device_map="auto" placed the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(output[0], skip_special_tokens=True))

This guide introduces **Mistral from scratch** in a detailed, beginner-friendly way. It will help you understand and work with **Mistral models**: open-weight, decoder-only large language models (LLMs) released by [Mistral AI](https://mistral.ai).

---

## ✅ What is Mistral?

**Mistral** refers to a family of **open-weight Large Language Models (LLMs)**, similar to GPT, developed by [Mistral AI](https://mistral.ai). It is:

* **Decoder-only** transformer architecture.
* **Optimized for speed and performance** (uses grouped-query attention).
* Released under **Apache 2.0 license**.
* Good for **text generation, chatbots, summarization, reasoning, coding**, etc.
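
To make the grouped-query attention point concrete, here is a rough back-of-the-envelope sketch of how much key/value cache one token needs. The layer and head counts are assumptions taken from the published Mistral 7B configuration (32 layers, 32 query heads, 8 key/value heads, head dimension 128), and the function name is purely illustrative.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key/value cache stored for one token across all layers."""
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# Assumed configuration (published Mistral 7B values, float16 cache)
N_LAYERS, N_QUERY_HEADS, N_KV_HEADS, HEAD_DIM = 32, 32, 8, 128

# Grouped-query attention: only 8 KV heads are cached
gqa = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)
# Standard multi-head attention would cache one KV head per query head
mha = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)

print(f"GQA: {gqa // 1024} KiB per token")   # 128 KiB
print(f"MHA: {mha // 1024} KiB per token")   # 512 KiB
print(f"Reduction: {mha // gqa}x")           # 4x
```

Under these assumptions the cache is four times smaller than with one key/value head per query head, which is a large part of why inference is faster and cheaper in memory.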

---

## 🧠 Prerequisites

Before starting with Mistral:

* ✅ Python basics
* ✅ PyTorch (for working with models)
* ✅ Hugging Face Transformers
* ✅ Understanding of Transformer architecture (optional but helpful)

---

## 📦 Models Released

1. **Mistral 7B**: 7 billion parameter base model
2. **Mixtral 8x7B (Mixture of Experts)**: about 12.9B active parameters per token out of roughly 46.7B total; each token is routed to 2 of 8 experts
3. **Mistral-Instruct**: Mistral 7B fine-tuned for instruction following

---

## 🛠️ Setup Guide

### 🔧 1. Install Required Libraries

```bash
pip install torch transformers accelerate
```

### 🔧 2. Load Mistral from Hugging Face

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Prompt
prompt = "Explain quantum physics like I'm five."
# Send the inputs to wherever device_map="auto" placed the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

---

## 🧪 Use Cases

| Use Case          | Prompt Example                            |
| ----------------- | ----------------------------------------- |
| **Q&A Bot**       | "What is the capital of France?"          |
| **Summarization** | "Summarize this article: ..."             |
| **Code Gen**      | "Write a Python function to sort a list." |
| **Chatbot**       | "Hello, how can I help you today?"        |

---

## 🔍 Behind the Scenes: How Mistral Works

* **Decoder-only architecture**: Like GPT.
* **Sliding window attention** (for longer context without full quadratic cost).
* **Grouped-query attention (GQA)**: Faster inference.
* **Rotary positional embeddings (RoPE)**: For positional awareness.
* **Supports long context (up to 32k tokens)** in Mixtral.
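
The sliding window idea in the list above can be sketched as an attention mask. This is a minimal illustration in plain Python, not Mistral's actual implementation; the function name and the tiny sizes are made up for readability (Mistral 7B is described as using a window of 4096 tokens).

```python
def sliding_window_causal_mask(seq_len, window):
    """Return mask[i][j] == True when position i may attend to position j.

    A position sees itself and at most `window - 1` earlier positions,
    and never a later position (causal).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Each row holds at most `window` allowed positions, so the attention work per layer grows roughly with `seq_len * window` instead of `seq_len ** 2`. Information from older tokens can still reach later ones indirectly, because every layer shifts the reachable range back by another window.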

---

## 🧠 Instruction Tuning (Mistral-Instruct)

Mistral-Instruct is fine-tuned on prompt/response pairs so that it follows instructions more reliably. Each user message is wrapped in `[INST]` and `[/INST]` tags.

Example prompt:

```
<s>[INST] What's the fastest animal on land? [/INST]
```
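
For multi-turn conversations the same tags repeat, with each assistant reply closed by the `</s>` token. The helper below is a hypothetical sketch of that layout; the exact whitespace is an assumption and differs slightly between model versions, so `tokenizer.apply_chat_template` remains the reliable way to build prompts in practice.

```python
def build_instruct_prompt(turns):
    """Build a Mistral-Instruct style prompt from (role, text) pairs.

    `turns` alternates "user" and "assistant" and must end with a user turn.
    Illustrative only: whitespace conventions are an assumption.
    """
    if not turns or turns[-1][0] != "user":
        raise ValueError("the conversation must end with a user turn")
    prompt = "<s>"  # beginning-of-sequence token, written once
    for role, text in turns:
        if role == "user":
            prompt += f"[INST] {text} [/INST]"
        elif role == "assistant":
            # An assistant reply is closed with the end-of-sequence token
            prompt += f" {text}</s>"
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return prompt

print(build_instruct_prompt([
    ("user", "What's the fastest animal on land?"),
    ("assistant", "The cheetah."),
    ("user", "How fast can it run?"),
]))
```

Because this string already contains `<s>`, it should be tokenized with `add_special_tokens=False`; otherwise the tokenizer prepends a second beginning-of-sequence token.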

---

## 🚀 Deployment Ideas

* Host it locally using `transformers`
* Deploy on Hugging Face Spaces with Gradio
* Run on cloud GPUs (Google Colab, AWS, RunPod, etc.)

---

## 🧩 Tips

* Use `torch_dtype=torch.float16` for lower memory usage
* Use `device_map="auto"` to let Accelerate place the model on the available devices (spread across multiple GPUs, or offloaded to CPU when GPU memory runs out)
* Try prompt engineering to improve results
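
To see why the data type matters, the following rough estimate computes the memory needed just for the weights of a 7 billion parameter model. It is only a sketch: the 7e9 figure is the rounded size quoted above, and activations, the key/value cache, and framework overhead are ignored.

```python
# Bytes used to store one parameter in each format
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1, "4-bit": 0.5}

def weight_memory_gib(n_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(7e9, dtype):5.1f} GiB")
#  float32:  26.1 GiB
#  float16:  13.0 GiB
#     int8:   6.5 GiB
#    4-bit:   3.3 GiB
```

Halving the precision halves the footprint, which is why `float16` is often the difference between fitting on a single 16 GB GPU or not, and why quantized loading (for example with `bitsandbytes`) is popular on smaller cards.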

---

## 📚 Resources

* 🤖 [Hugging Face Mistral page](https://huggingface.co/mistralai)
* 📘 [Transformers Docs](https://huggingface.co/docs/transformers/index)
* 🧠 [Mistral blog](https://mistral.ai/news/)

---

## 📦 Sample Project Ideas

1. **Chatbot UI with Gradio + Mistral**
2. **CSV Question Answering with Mistral**
3. **Summarizer API using FastAPI + Mistral**
4. **Voice to Text to Mistral Response (Whisper + Mistral)**

---



The following is a **step-by-step, interview-ready project**:
🔊 **Voice to Text to Mistral Response using Whisper + Mistral**

This project shows your expertise in:

* Speech recognition (Whisper)
* LLM text generation (Mistral)
* Real-time interaction (optional: Gradio or Streamlit UI)
* Integration of two AI models

---

## ✅ Project Title

**AI Voice Assistant: Talk to an LLM using Whisper and Mistral**

---

## 🧩 Overview

1. 🎙️ **Voice Input**: User speaks a question or instruction
2. 📝 **Transcription**: Whisper converts voice to text
3. 🤖 **Mistral Response**: Mistral generates a natural-language response
4. 💬 **Display/Play Output**: Text is shown or spoken back
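
The four steps can be sketched as a small pipeline before any model is loaded. In this illustration the transcription and generation steps are passed in as plain functions, and the `fake_*` stand-ins are hypothetical placeholders for the Whisper and Mistral calls, so the flow can be tried without a GPU.

```python
def make_voice_pipeline(transcribe_fn, generate_fn):
    """Combine a speech-to-text function and a text-generation function."""
    def run(audio_path):
        # Steps 1-2: voice input is transcribed to text
        text = transcribe_fn(audio_path).strip()
        if not text:
            return "No speech detected."
        # Step 3: the LLM answers the transcribed text
        reply = generate_fn(text)
        # Step 4: format the result for display
        return f"You said: {text}\nAssistant says: {reply}"
    return run

# Hypothetical stand-ins for the real Whisper and Mistral calls
def fake_transcribe(audio_path):
    return " What is the capital of France? "

def fake_generate(text):
    return "Paris."

pipeline = make_voice_pipeline(fake_transcribe, fake_generate)
print(pipeline("question.wav"))
```

Keeping the two models behind simple function arguments also makes it easy to swap in a smaller Whisper size or a different LLM later.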

---

## 📦 Required Libraries

```bash
pip install torch transformers openai-whisper gradio
```

---

## 🛠️ Full Python Code

```python
import whisper
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import gradio as gr

# Load Whisper model (tiny, base, small, medium, large)
whisper_model = whisper.load_model("base")

# Load Mistral model
mistral_model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(mistral_model_id)
mistral_model = AutoModelForCausalLM.from_pretrained(
    mistral_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Function to convert voice to text
def transcribe(audio):
    print("Transcribing...")
    result = whisper_model.transcribe(audio)
    return result['text']

# Function to generate a response using Mistral
def mistral_response(text):
    # The tokenizer adds the <s> (BOS) token itself, so the prompt omits it
    prompt = f"[INST] {text} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(mistral_model.device)
    with torch.no_grad():
        output = mistral_model.generate(**inputs, max_new_tokens=100)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Combined function
def voice_to_mistral(audio):
    text = transcribe(audio)
    response = mistral_response(text)
    return f"🔈 You said: {text}\n🤖 Mistral says: {response}"

# Gradio UI
app = gr.Interface(
    fn=voice_to_mistral,
    # Gradio 4+ takes `sources=[...]`; Gradio 3.x used `source="microphone"`
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs="text",
    title="Voice to Mistral AI Assistant"
)

app.launch()
```

---

## 🧠 Interview Talking Points

**1. Why Whisper?**
"Whisper is a state-of-the-art open-source speech recognition model by OpenAI, capable of multilingual and noisy speech transcription."

**2. Why Mistral?**
"Mistral is an efficient, open-weight decoder-only LLM, optimized for text generation with competitive performance and open licensing."

**3. Use Cases**

* Voice Assistants
* Accessibility Tools
* Voice-controlled Chatbots
* Hands-free Question Answering

**4. Challenges**

* Latency in large model inference
* Handling diverse accents with Whisper
* GPU requirements for Mistral

---

## 🪛 Optional Enhancements

* 🔊 Add TTS (text-to-speech) using `gTTS` or `pyttsx3`
* 💾 Save transcript and response as conversation logs
* 🌐 Deploy on Hugging Face Spaces or Streamlit Cloud
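
As one possible take on the conversation-log idea, the sketch below appends each exchange to a JSON Lines file using only the standard library. The function names and the record fields are assumptions chosen for illustration, not part of the project code.

```python
import json
from datetime import datetime, timezone

def log_exchange(path, transcript, response):
    """Append one transcript/response pair to a JSON Lines log file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transcript": transcript,
        "response": response,
    }
    # Append mode keeps earlier exchanges; one JSON object per line
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

def read_log(path):
    """Load every logged exchange back as a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `log_exchange("conversation.jsonl", text, response)` inside `voice_to_mistral` would record every turn, and `read_log` can later feed the history into a report or a follow-up prompt.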

---

## 📁 Folder Structure

```
voice_to_mistral/
│
├── app.py                  # Main app script
├── requirements.txt        # Dependencies
└── README.md               # Project overview
```

---



In [None]:
!pip install openai-whisper gradio

In [None]:
import whisper
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import gradio as gr

# Load Whisper model (tiny, base, small, medium, large)
whisper_model = whisper.load_model("base")

# Load Mistral model
mistral_model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(mistral_model_id)
mistral_model = AutoModelForCausalLM.from_pretrained(
    mistral_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Function to convert voice to text
def transcribe(audio):
    print("Transcribing...")
    result = whisper_model.transcribe(audio)
    return result['text']

# Function to generate a response using Mistral
def mistral_response(text):
    # The tokenizer adds the <s> (BOS) token itself, so the prompt omits it
    prompt = f"[INST] {text} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(mistral_model.device)
    with torch.no_grad():
        output = mistral_model.generate(**inputs, max_new_tokens=100)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Combined function
def voice_to_mistral(audio):
    text = transcribe(audio)
    response = mistral_response(text)
    return f"🔈 You said: {text}\n🤖 Mistral says: {response}"

# Gradio UI
app = gr.Interface(
    fn=voice_to_mistral,
    # Gradio 4+ takes `sources=[...]`; Gradio 3.x used `source="microphone"`
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs="text",
    title="Voice to Mistral AI Assistant"
)

app.launch()