# **Buckle Up! We are starting our Week 2 roller coaster**

In our first week we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
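
To make the "text predictor" idea concrete, here is a toy sketch that counts which word follows which in a tiny made-up corpus and then predicts the most frequent follower. Real LLMs use neural networks over tokens rather than word counts; the corpus and the function names `train_bigrams` and `predict_next` are illustrative assumptions only.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    followers = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Tiny made-up corpus, for illustration only
corpus = "the cat sat on the mat . the cat sat down . the cat slept"
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "cat"))  # sat
```

The prediction is only as good as the text it was trained on, which is the same reason real models need huge amounts of data.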

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a limited number of tokens (words or word pieces) at a time.

- Biases: Trained on internet text → may reflect biases/errors.
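
The token-limit point can be sketched as a budget check before a request is sent. The estimate of roughly four characters per token is a rule of thumb for English text, not an exact count; real tokenizers vary by model, and `estimate_tokens` and `fit_to_budget` are hypothetical helper names.

```python
def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)

def fit_to_budget(messages, max_tokens):
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "Hi there!",                                 # 9 chars  -> ~2 tokens
    "Tell me about speech recognition models.",  # 40 chars -> ~10 tokens
    "Thanks, and what about TTS?",               # 27 chars -> ~6 tokens
]

# With a budget of 16 estimated tokens, only the two newest messages fit
print(fit_to_budget(history, max_tokens=16))
```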

### 💡 **Quick Questions**: 

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 BERT is good at understanding text but is not built to generate it freely, so it struggles with open-ended conversational responses.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.
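
One common way to put a number on STT accuracy is word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a plain-Python version under that definition; the sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] = edits needed to match the reference words seen so far
    # against the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

reference = "turn on the kitchen lights"
noisy_output = "turn on the chicken light"

# Two of the five reference words were misheard
print(word_error_rate(reference, noisy_output))  # 0.4
```

A lower value means a better transcript; 0.0 means the output matches the reference word for word.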


### 💡 **Quick Questions**: 

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 Meeting transcription apps struggle when multiple people talk at once because overlapping speech acts as noise for the model.

- The model has to separate different voices while understanding each word.

- When two or more people speak simultaneously, the audio is mixed together, making it hard for the speech model to identify who said what.

- This often leads to misheard words, jumbled sentences, or incorrect transcription.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.

### 💡 **Quick Questions**: 

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 The TTS voice should be clear, natural, engaging, and easy to follow, just like a real teacher.

---

## 🌟 Using APIs for LLMs with Groq 🌟

In [None]:
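import os

from groq import Groq

# Read the API key from the GROQ_API_KEY environment variable instead of hardcoding it
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))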

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

print(response.choices[0].message.content)


Conversational AI refers to the technology that enables computers or digital systems to simulate human-like conversations with humans. This is achieved through the use of natural language processing (NLP) and machine learning algorithms that allow AI systems to understand, interpret, and respond to human input in a way that feels natural and intuitive.

Conversational AI can take many forms, including:

1. **Chatbots**: These are AI-powered software programs that can engage in text-based conversations with humans, often used to provide customer support, answer frequently asked questions, or facilitate transactions.
2. **Virtual assistants**: These are AI-powered digital assistants, such as Siri, Google Assistant, or Alexa, that can understand voice commands and respond with relevant information or actions.
3. **Voice-controlled interfaces**: These are AI-powered interfaces that allow users to interact with devices using voice commands, such as smart speakers or home automation systems.
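
The goals for this week mention Python with requests and JSON, so here is a sketch of the same call made over plain HTTP instead of the `groq` client. It assumes Groq's OpenAI-compatible chat completions endpoint and a `GROQ_API_KEY` environment variable. `build_chat_request` and `ask_groq` are hypothetical helper names, and the response is assumed to have the same `choices[0].message.content` shape as the client call above.

```python
import json
import os

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(prompt, api_key, model="llama-3.1-8b-instant"):
    """Return the headers and JSON body for one chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return headers, body

def ask_groq(prompt):
    """Send the prompt to the API and return the reply text."""
    # Third-party package (pip install requests); imported here so the
    # payload helper above can be tried without it
    import requests

    headers, body = build_chat_request(prompt, os.environ["GROQ_API_KEY"])
    response = requests.post(GROQ_URL, headers=headers, data=body, timeout=30)
    response.raise_for_status()  # surface HTTP errors such as 401 or 429
    return response.json()["choices"][0]["message"]["content"]

# Inspect the JSON payload without sending anything ("demo-key" is a placeholder)
headers, body = build_chat_request("Hello! What is conversational AI?", "demo-key")
print(body)
```

With a real key set in the environment, `print(ask_groq("Hello!"))` would send the request and print the reply.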

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.

<p style="color:#1F618D; font-weight:bold; font-size:1.1em;">1. Encoder-only LLMs</p>
<ul>
  <li><b>What it does:</b> <span style="color:#117A65;">Focus on understanding input text, great for tasks like classification or extracting meaning.</span></li>
  <li><b>Example:</b> <span style="color:#D35400;">RoBERTa for spam detection.</span></li>
</ul>

<p style="color:#884EA0; font-weight:bold; font-size:1.1em;">2. Decoder-only LLMs</p>
<ul>
  <li><b>What it does:</b> <span style="color:#117A65;">Generate text from a given prompt, ideal for creative writing or chatbots.</span></li>
  <li><b>Example:</b> <span style="color:#D35400;">GPT-Neo for story or dialogue generation.</span></li>
</ul>

<p style="color:#2874A6; font-weight:bold; font-size:1.1em;">3. Encoder-decoder LLMs</p>
<ul>
  <li><b>What it does:</b> <span style="color:#117A65;">Can both understand input and generate output, suitable for summarization or translation.</span></li>
  <li><b>Example:</b> <span style="color:#D35400;">BART for summarizing news articles.</span></li>
</ul>

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

<p style="color:#1F618D; font-weight:bold; font-size:1.1em;">1. STT Model: DeepSpeech (by Mozilla)</p>
<ul>
  <li><b>What it does:</b> <span style="color:#117A65;">Converts spoken audio into written text (speech-to-text).</span></li>
  <li><b>Application:</b> <span style="color:#D35400;">Real-time transcription for meetings or podcasts.</span></li>
</ul>

<p style="color:#884EA0; font-weight:bold; font-size:1.1em;">2. TTS Model: FastSpeech 2</p>
<ul>
  <li><b>What it does:</b> <span style="color:#117A65;">Converts written text into natural, human-like speech (text-to-speech).</span></li>
  <li><b>Application:</b> <span style="color:#D35400;">Audiobook generation or voice assistants.</span></li>
</ul>

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

In [None]:
import os

from groq import Groq

# Step 1: Create client (key is read from the GROQ_API_KEY environment variable)
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# Step 2: Memory for last 5 messages
memory = []

print('Chatbot: Type "Quit" to exit!')

while True:
    userInput = input("You: ")
    
    if userInput.strip().lower() == "quit":
        print("Chatbot: Bye!")
        break
    
    # Add user input to memory
    memory.append({"role": "user", "content": userInput})
    
    # Keep only last 5 messages
    if len(memory) > 5:
        memory = memory[-5:]
    
    # Send to Groq API
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=memory
    )
    
    # Get AI reply
    bot_reply = response.choices[0].message.content
    
    # Add AI response to memory
    memory.append({"role": "assistant", "content": bot_reply})
    
    print("Chatbot:", bot_reply)
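
As an alternative to slicing the list by hand, the rolling memory can be sketched with `collections.deque`, which drops the oldest item automatically once `maxlen` is reached. This only demonstrates the memory part: no API call is made, and the sample messages and the `remember` helper are made up for illustration.

```python
from collections import deque

# A deque with maxlen=5 discards the oldest message when a sixth is added
memory = deque(maxlen=5)

def remember(role, content):
    """Store one chat message in the rolling memory."""
    memory.append({"role": role, "content": content})

for turn in range(1, 5):
    remember("user", f"question {turn}")
    remember("assistant", f"answer {turn}")

# Eight messages were added, but only the last five remain
print(len(memory))           # 5
print(memory[0]["content"])  # answer 2

# The API expects a plain list, so convert before sending
messages = list(memory)
```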


### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


In [4]:
import re

def pre_processing(user_input):
    # Lowercase the text
    user_input = user_input.lower()
    
    # Remove punctuation
    user_input = re.sub(r'[^\w\s]', '', user_input)
    
    # Remove extra spaces
    user_input = re.sub(r'\s+', ' ', user_input).strip()
    
    return user_input


In [5]:
text = "  Hello, World! This is   amazing...  "
cleaned = pre_processing(text)
print(cleaned)


hello world this is amazing
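
The assignment asks for a test with a different string than the one used above. Running the same cleaning steps on it would look like this; the function is repeated so the snippet runs on its own.

```python
import re

def pre_processing(user_input):
    """Lowercase, drop punctuation, and collapse extra whitespace."""
    user_input = user_input.lower()
    user_input = re.sub(r'[^\w\s]', '', user_input)
    return re.sub(r'\s+', ' ', user_input).strip()

print(pre_processing("  HELLo!!!  How ARE you?? "))  # hello how are you
```

Note that `\w` also matches the underscore, so a word like `snake_case` keeps its underscore.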


### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [7]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download required NLTK data (recent NLTK releases may also need 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

# Initialize stopwords and lemmatizer
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# POS tags we want to keep: nouns, verbs, adjectives
keep_pos = ['NN', 'NNS', 'NNP', 'NNPS',  # Nouns
            'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ',  # Verbs
            'JJ', 'JJR', 'JJS']  # Adjectives

def preprocess_text(text):
    # 1. Lowercase
    text = text.lower()
    
    # 2. Remove punctuation and numbers
    text = re.sub(r'[^a-z\s]', '', text)  # Keep only letters and spaces
    
    # 3. Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()
    
    # 4. Tokenize
    words = word_tokenize(text)
    
    # 5. Remove stopwords and short words (<3 characters)
    words = [w for w in words if w not in stop_words and len(w) >= 3]
    
    # 6. Lemmatize every word as a verb (irregular noun plurals such as "mice" stay unchanged)
    words = [lemmatizer.lemmatize(w, pos='v') for w in words]
    
    # 7. POS tagging
    pos_tags = nltk.pos_tag(words)
    
    # 8. Keep only nouns, verbs, adjectives
    filtered_words = [w for w, pos in pos_tags if pos in keep_pos]
    
    return filtered_words

# Example usage
sample_text = "The cats are running on the green grass and chasing the mice!"
cleaned_words = preprocess_text(sample_text)
print(cleaned_words)


['cat', 'run', 'green', 'grass', 'chase', 'mice']


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\shama\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\shama\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\shama\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\shama\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
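
Because every word above is lemmatized as a verb, irregular nouns such as "mice" come through unchanged. One possible refinement is to tag the words first and pass a matching part of speech to the lemmatizer. The helper below is a hypothetical sketch of that mapping, from Penn Treebank tags to the single-letter codes that `WordNetLemmatizer.lemmatize` accepts in its `pos` argument.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to a WordNet POS code."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # everything else is treated as a noun

for tag in ["NNS", "VBG", "JJ", "RB"]:
    print(tag, "->", penn_to_wordnet(tag))

# Intended use with the objects defined in the cell above (not run here):
# tagged = nltk.pos_tag(words)
# lemmas = [lemmatizer.lemmatize(w, pos=penn_to_wordnet(t)) for w, t in tagged]
```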


### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?
    
<p style="color:#1F618D; font-weight:bold; font-size:1.1em;">• Why is context memory important in chatbots?</p>
<ul>
  <li><b>Explanation:</b> <span style="color:#117A65;">Context memory lets a chatbot <b>remember previous messages</b> in a conversation. Without it, the bot would treat each message as <b>a brand-new question</b>, making interactions feel <b>disjointed and unnatural</b>.</span></li>
  <li><b>Benefit:</b> <span style="color:#D35400;">With memory, the bot can provide <b>coherent, personalized, and relevant responses</b>.</span></li>
</ul>

<p style="color:#884EA0; font-weight:bold; font-size:1.1em;">• Why should beginners always check API limits and pricing?</p>
<ul>
  <li><b>Explanation:</b> <span style="color:#117A65;">APIs (like Groq, OpenAI, etc.) often <b>charge per request or per token</b> and may have <b>rate limits</b>. Ignoring this can lead to <b>unexpected bills or failed requests</b>.</span></li>
  <li><b>Benefit:</b> <span style="color:#D35400;">Checking limits ensures <b>efficient usage, cost control, and smooth experimentation</b>.</span></li>
</ul>
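
When a request is rejected because of a rate limit, a common response is to wait and retry with growing delays. The sketch below shows that pattern in isolation. `RateLimited` is a stand-in exception and `flaky_call` is a fake API call, both assumptions for illustration; with a real client you would catch whatever rate-limit error its library raises.

```python
import time

class RateLimited(Exception):
    """Stand-in for a client library's rate-limit error (hypothetical)."""

def call_with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying with doubling delays when it is rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries, let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Fake API call that is rejected twice before succeeding
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("too many requests")
    return "ok"

# Record the delays in a list instead of actually sleeping
delays = []
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```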

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules.
    - Always gives a real word (the **lemma**).
    - Example:
        - "studies" → "study"
        - "running" → "run"

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
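
The difference between the first two hints can be sketched without any libraries. The stemmer below is a deliberately crude suffix stripper and the lemmatizer is a tiny hand-written lookup table. Both are toy assumptions meant to mirror the examples above, not the Porter stemmer or the WordNet lemmatizer that NLTK provides.

```python
def toy_stem(word):
    """Chop common endings mechanically; the result may not be a real word."""
    if word.endswith("ies"):
        return word[:-3] + "i"  # studies -> studi
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo a doubled final consonant: runn -> run
        if len(stem) > 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

# Hand-written vocabulary lookup standing in for a real dictionary
LEMMAS = {"studies": "study", "running": "run", "mice": "mouse"}

def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)

for word in ["studies", "running", "mice"]:
    print(word, "->", toy_stem(word), "|", toy_lemmatize(word))
```

The stemmer has no rule for "mice", so it leaves the word alone, while the lookup table knows the lemma is "mouse".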


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* Built your first chatbot foundation.
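
How the three pieces connect can be sketched as a single voice turn: audio goes through STT, the text goes to the LLM, and the reply goes through TTS. The three stand-in functions below are placeholders invented for illustration; in a real assistant each would call an actual model or API.

```python
def voice_turn(audio, transcribe, generate_reply, synthesize):
    """Run one voice turn: audio in -> text -> reply text -> audio out."""
    user_text = transcribe(audio)           # STT: the "ears"
    reply_text = generate_reply(user_text)  # LLM: produces the answer
    return synthesize(reply_text)           # TTS: the "mouth"

# Placeholder components so the flow can be run without any models
def fake_stt(audio):
    return audio.decode("utf-8")

def fake_llm(text):
    return f"You said: {text}"

def fake_tts(text):
    return text.encode("utf-8")

reply_audio = voice_turn(b"hello", fake_stt, fake_llm, fake_tts)
print(reply_audio)  # b'You said: hello'
```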