# **Buckle Up! We are starting our week 2 roller coaster**

In our first week we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLM, STT, and TTS models and their roles.

- Know how to connect to LLMs through APIs (using Groq as an example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to understand and generate text.
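To make the "guess the next word" idea concrete, here is a toy sketch: a bigram model that just counts which word most often follows another in a tiny made-up corpus. Real LLMs use neural networks over tokens and much more context, so treat this only as an illustration of next-word prediction; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict, word: str):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    counter = follows.get(word.lower())
    if not counter:
        return None
    return counter.most_common(1)[0][0]


def generate(follows: dict, start: str, n_words: int) -> str:
    """Greedily append the most likely next word, up to n_words times."""
    words = [start.lower()]
    for _ in range(n_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # → cat
print(generate(model, "the", 3))
```

An LLM does the same kind of "what comes next?" step, but it scores every possible token with a learned model instead of raw counts.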

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a fixed number of tokens (words or word pieces) at a time, known as the context window.

- Biases: Trained on internet text → may reflect biases/errors.
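Because of token limits, a chatbot has to decide which past messages still fit. A minimal sketch, assuming a rough rule of thumb of about 4 characters per token for English (real tokenizers differ by model, so rely on the provider's tokenizer or reported token counts when accuracy matters): keep the newest messages until an assumed token budget is used up. Function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list, max_tokens: int) -> list:
    """Keep the most recent messages whose estimated total fits within max_tokens."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    # Restore chronological order
    return list(reversed(kept))


history = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 40},  # ~10 tokens
    {"role": "user", "content": "c" * 40},       # ~10 tokens
]
trimmed = trim_to_budget(history, max_tokens=25)
print(len(trimmed))  # → 2
```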

### 💡 **Quick Questions**:

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 Encoder-only models are built for understanding text, not generating it, so they can't write free-form answers to open-ended questions.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.
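STT "accuracy" is commonly measured as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of reference words. A minimal sketch using a word-level edit distance (function name hypothetical); lower is better, and 0.0 means a perfect transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("turn the lights on", "turn the light on"))  # → 0.25
```

A noisy recording or a strong accent usually shows up as a higher WER on the same reference sentence.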


### 💡 **Quick Questions**:

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 Accuracy depends on clarity of speech, and overlapping voices make the audio unclear, so the model struggles to tell whose words are whose. API-based transcription can also suffer from a poor internet connection.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
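Putting the pieces together, a voice chatbot chains STT (the ears), an LLM in the middle, and TTS (the mouth), and the user waits for all three, which is why latency matters. A minimal sketch with hypothetical stub functions standing in for real STT, LLM, and TTS calls, timing each stage with `time.perf_counter`:

```python
import time


def speech_to_text(audio: bytes) -> str:
    # Hypothetical stub: pretends the audio bytes already hold the spoken words.
    # A real version would send `audio` to an STT model such as Whisper.
    return audio.decode("utf-8")


def ask_llm(prompt: str) -> str:
    # Hypothetical stub: a real version would call an LLM API such as Groq.
    return f"You asked: {prompt}"


def text_to_speech(text: str) -> bytes:
    # Hypothetical stub: a real version would return synthesized audio from a TTS service.
    return text.encode("utf-8")


def voice_turn(audio: bytes):
    """Run one STT -> LLM -> TTS turn and record how long each stage took (seconds)."""
    timings = {}

    start = time.perf_counter()
    text = speech_to_text(audio)
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply = ask_llm(text)
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_audio = text_to_speech(reply)
    timings["tts"] = time.perf_counter() - start

    return reply_audio, timings


reply_audio, timings = voice_turn(b"what is conversational ai")
print(reply_audio)
print({stage: f"{seconds * 1000:.2f} ms" for stage, seconds in timings.items()})
```

With real services, the sum of the three stage timings (plus playback start-up) is roughly how long the user waits before hearing a reply.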

### 💡 **Quick Questions**:

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 I would want clarity in its voice above all, and then a slightly softer, more natural tone.

---

## 🌟 Using APIs for LLMs with Groq 🌟

```python
!pip install groq
```



```python
import os

from groq import Groq

# Read the API key from an environment variable; never hard-code keys in a notebook.
# (Groq() also reads GROQ_API_KEY on its own if api_key is omitted.)
client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}],
)

print(response.choices[0].message.content)
```


```text
Conversational AI, also known as conversational systems or conversational interfaces, refers to technology that enables computers to understand and respond to natural language inputs from humans, creating a conversation-like experience.

Conversational AI uses various techniques from natural language processing (NLP), machine learning (ML), and human-computer interaction (HCI) to process and generate human-like responses. This technology can be used in various applications, such as:

1. Chatbots: Automated customer support systems that interact with users through text or voice.
2. Virtual Assistants: AI-powered assistants like Siri, Google Assistant, or Alexa that can answer questions, provide information, and perform tasks.
3. Voice Assistants: AI-powered systems that use voice inputs to perform tasks, such as Google Duplex.
4. Virtual Customer Service Representatives (VCSRs): AI-powered agents that interact with customers to resolve issues or answer questions.
5. Live Chat Systems: A
```
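Under the hood, the groq SDK sends an HTTP request with a JSON body. Since this week's goals include using `requests` + JSON directly, here is a minimal sketch of the same call made by hand, assuming Groq's OpenAI-compatible chat endpoint (`https://api.groq.com/openai/v1/chat/completions`) and an API key in the `GROQ_API_KEY` environment variable. The helper names are hypothetical; check Groq's API reference for the current endpoint and fields.

```python
import json
import os
from typing import Optional

GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "llama-3.1-8b-instant",
                       api_key: Optional[str] = None):
    """Return (headers, payload) for a single-turn chat completion request."""
    api_key = api_key or os.environ.get("GROQ_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload


def send_chat(prompt: str, model: str = "llama-3.1-8b-instant") -> str:
    """POST the request and return the reply text (needs network access and a valid key)."""
    import requests  # third-party; imported here so build_chat_request works without it

    headers, payload = build_chat_request(prompt, model)
    response = requests.post(GROQ_CHAT_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # raise on HTTP errors such as 401 or 429
    data = response.json()       # parse the JSON body into a Python dict
    return data["choices"][0]["message"]["content"]


# Inspect the JSON body without sending anything
headers, payload = build_chat_request("Hello! What is conversational AI?", api_key="YOUR_KEY")
print(json.dumps(payload, indent=2))
```

Calling `send_chat("Hello!")` would perform the real request; `requests.post(..., json=payload)` serializes the dict to JSON for you.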

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding (☑️)

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.


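Encoder-only models (e.g., BERT) are best at understanding text, for example sentiment analysis. Decoder-only models (e.g., the GPT models behind ChatGPT) are best at generating text, for example powering chatbots. Encoder-decoder models (e.g., BART) are best at transforming one text into another, for example summarization or translation.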

### 📝 Assignment 2: STT/TTS Exploration (☑️)

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

Speech-to-Text (STT) Model: DeepSpeech (by Mozilla)

* What it does: Converts spoken language into written text using a deep learning model trained on large speech datasets.

* Application: Can be used for creating real-time transcription tools for online meetings.

Text-to-Speech (TTS) Model: FastSpeech 2 (by Microsoft Research)

* What it does: A non-autoregressive TTS model that converts text into high-quality, natural-sounding speech quickly.

* Application: Useful for real-time applications like screen readers for visually impaired users.

### 📝 Assignment 3: Build a Chatbot with Memory (☑️)

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

```python
import os

from groq import Groq

# Initialize the client with a key from an environment variable (never hard-code keys)
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Keep the last 5 exchanges: 5 user + 5 bot messages
MAX_MESSAGES = 10
conversation = []

while True:
    user_input = input("You: ")

    if user_input.strip().lower() == "quit":
        print("Ending chat...")
        break

    # Add the user message and keep only the most recent MAX_MESSAGES
    conversation.append({"role": "user", "content": user_input})
    conversation = conversation[-MAX_MESSAGES:]

    # Send the trimmed history to the Groq API
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=conversation,
    )

    bot_reply = response.choices[0].message.content
    print("Bot:", bot_reply)

    # Add the bot reply and trim the history again
    conversation.append({"role": "assistant", "content": bot_reply})
    conversation = conversation[-MAX_MESSAGES:]
```


```text
You: give me short answers only. what do you think is pakistan
Bot: Pakistan is a country located in South Asia, situated east of Iran and west of India.
You: what about india
Bot: India is a country in South Asia, known for being the world's largest democracy, diverse culture, and vibrant cities.
You: and saudi arabia
Bot: Saudi Arabia is a Middle Eastern country, home to Islam's holiest site, the holy cities of Mecca and Medina, and a major oil producer.
You: and nigeria
Bot: Nigeria is a West African country, known for its rich oil reserves, diverse cultures, and vibrant music scene, especially Afrobeats.
You: and turkey
Bot: Turkey is a Middle Eastern and Mediterranean country, bridging Europe and Asia, with a mix of Islamic and secular culture.
You: and turkmenistan
Bot: Turkmenistan is a Central Asian country and one of the world's most isolated nations, with a unique blend of Turkic and Soviet influences, rich oil reserves, and a strict authoritarian government.
You: and afghani
```
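Notice that the first message ("give me short answers only") is an instruction, and once the history is trimmed to the last 10 messages it falls out of memory. A minimal sketch of an alternative, assuming the instruction is kept in a separate `system` message (a role supported by OpenAI-style chat APIs such as Groq's) and that `collections.deque(maxlen=...)` drops the oldest turns automatically. The names here are hypothetical.

```python
from collections import deque

MAX_HISTORY = 10  # last 5 exchanges: 5 user + 5 assistant messages

# Instructions live outside the rolling window, so they are never trimmed away
SYSTEM_MESSAGE = {"role": "system", "content": "Give short answers only."}

# A deque with maxlen discards the oldest item whenever a new one is appended past the limit
history = deque(maxlen=MAX_HISTORY)


def remember(role: str, content: str) -> None:
    """Store one message in the rolling history."""
    history.append({"role": role, "content": content})


def build_messages() -> list:
    """Messages to send to the API: the pinned system message plus recent history."""
    return [SYSTEM_MESSAGE] + list(history)


# Simulate 6 exchanges (12 messages); only the newest 10 are kept
for i in range(6):
    remember("user", f"question {i}")
    remember("assistant", f"answer {i}")

messages = build_messages()
print(len(messages))           # → 11
print(messages[1]["content"])  # → question 1
```

In the chat loop, `messages=build_messages()` would take the place of `messages=conversation`, and the manual slicing is no longer needed.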

### 📝 Assignment 4: Preprocessing Function (☑️)

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


```python
import string


def clean_text(text: str) -> str:
    # Lowercase
    text = text.lower()
    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Strip extra spaces
    text = " ".join(text.split())
    return text


# Test
sample = "  HELLo!!!  How ARE you?? "
print(clean_text(sample))
```

```text
hello how are you
```
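`string.punctuation` only covers ASCII punctuation, so curly quotes or an ellipsis character (common from phone keyboards) pass through `clean_text` untouched. A sketch of a Unicode-aware variant (function name hypothetical) that drops every character whose Unicode category starts with `P` (punctuation). One difference to keep in mind: ASCII symbols such as `$` or `+` are in category `S`, so this version keeps them while `clean_text` removes them.

```python
import unicodedata


def clean_text_unicode(text: str) -> str:
    # Lowercase
    text = text.lower()
    # Remove any Unicode punctuation (categories Pc, Pd, Ps, Pe, Pi, Pf, Po)
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Collapse runs of whitespace and strip the ends
    return " ".join(text.split())


print(clean_text_unicode("  HELLo!!!  How ARE you?? "))  # → hello how are you
print(clean_text_unicode("“Smart” quotes… too"))         # → smart quotes too
```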


### 📝 Assignment 5: Text Preprocessing (☑️)

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

```python
import spacy

# Load the small English pipeline (enough for POS tags and lemmas).
# If it is not installed yet, run once in a terminal: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


def preprocess_text_spacy(text: str) -> str:
    # Lowercase, then run the spaCy pipeline (tokenizer, tagger, lemmatizer)
    doc = nlp(text.lower())

    processed = []
    for token in doc:
        # Drop stopwords, punctuation, numbers, and short words (< 3 chars)
        if (not token.is_stop
                and not token.is_punct
                and not token.like_num
                and len(token.lemma_) >= 3):

            # Keep only nouns, verbs, and adjectives; store the lemma (lemmatization)
            if token.pos_ in {"NOUN", "VERB", "ADJ"}:
                processed.append(token.lemma_)

    return " ".join(processed)


# ---------- Test ----------
sample = "The cats are running quickly in the 2025 streets, and they looked very happy!"
print(preprocess_text_spacy(sample))
```

```text
cat run street look happy
```


### 📝 Assignment 6: Reflection (☑️)

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?

    * ANS: Context memory lets the chatbot keep track of what the conversation is about, so later replies stay personalized and relevant and you don't have to keep reminding it what you were just discussing. Without memory, a chatbot loses much of its purpose.
    * Why should beginners always check **API limits and pricing**?
    * ANS: If you don't check pricing, usage can cost more than you expected. Limits matter too: an API key allows only so many requests (and tokens) in a given period, and once you hit that limit further requests are rejected, cutting off access to the model until the limit resets.
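Related to API limits: when a request is rejected for exceeding a rate limit (usually HTTP 429), a common pattern is to wait and retry with exponentially growing delays. A minimal sketch; `RateLimitError` is a hypothetical stand-in for whatever exception your client actually raises, and the delays are illustrative. The `sleep` parameter is injectable so the demo does not really wait.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your API client raises on HTTP 429."""


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait base_delay * 2**attempt (+ jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: let the caller handle it
            # Exponential backoff plus a little random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Demo: a fake API call that is rate-limited twice, then succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits = []  # record the delays instead of sleeping
result = call_with_backoff(flaky_call, sleep=waits.append)
print(result)  # → ok
print(waits)   # roughly [1.0, 2.0] plus jitter
```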

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules.
    - Always gives a real word (the **lemma**).
    - Example:
        - "studies" → "study"
        - "running" → "run"

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
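To see the stemming vs. lemmatization difference from the hints in code, here is a toy sketch: a few hand-written suffix rules (loosely inspired by the Porter stemmer, not the real algorithm) next to a tiny hand-made lemma dictionary. Both functions and the word list are illustrative assumptions; in practice you would use a library such as NLTK for stemming or spaCy (as in Assignment 5) for lemmas.

```python
VOWELS = "aeiou"

# Tiny hand-made lemma table (illustrative only; real lemmatizers use full dictionaries + grammar)
LEMMAS = {"studies": "study", "running": "run", "ran": "run", "cats": "cat", "mice": "mouse"}


def toy_stem(word: str) -> str:
    """Chop common suffixes mechanically; the result may not be a real word."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"  # studies -> studi
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]  # running -> runn
        # Undouble a final consonant, except l, s, z: runn -> run, but fall stays fall
        if stem[-1] == stem[-2] and stem[-1] not in VOWELS + "lsz":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # cats -> cat
    return word


def toy_lemma(word: str) -> str:
    """Look the word up; fall back to the word itself if it is unknown."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["studies", "running", "ran", "cats"]:
    print(f"{w:>8}  stem={toy_stem(w):<6} lemma={toy_lemma(w)}")
```

The stemmer turns "studies" into the non-word "studi" and cannot handle the irregular "ran", while the lookup maps both to real words, which is exactly the trade-off described in the hints.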


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* Built your first chatbot foundation.