<a href="https://colab.research.google.com/github/jesusvillota/CSS_DataScience_2025/blob/main/Session3/3_3_LLM_I_Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="max-width: 880px; margin: 20px auto 22px; padding: 0px; border-radius: 18px; border: 1px solid #e5e7eb; background: linear-gradient(180deg, #ffffff 0%, #f9fafb 100%); box-shadow: 0 8px 26px rgba(0,0,0,0.06); overflow: hidden;">

  <!-- Banner Header -->
  <div style="padding: 34px 32px 14px; text-align: center; line-height: 1.38;">
    <div style="font-size: 13px; letter-spacing: 0.14em; text-transform: uppercase; color: #6b7280; font-weight: bold; margin-bottom: 5px;">
      Session #3
    </div>
    <div style="font-size: 29px; font-weight: 800; color: #14276c; margin-bottom: 4px;">
      LLMs
    </div>
    <div style="font-size: 26px; font-weight: 800; color: #14276c; margin-bottom: 4px;">
      Part I: Introduction to Large Language Models (LLMs)
    </div>
    <div style="font-size: 16.5px; color: #374151; font-style: italic; margin-bottom: 0;">
      Using Textual Data in Empirical Monetary Economics
    </div>
  </div>

  <!-- Logo Section -->
  <div style="background: none; text-align: center; margin: 30px 0 10px;">
    <img src="https://www.cemfi.es/images/Logo-Azul.png" alt="CEMFI Logo" style="width: 158px; filter: drop-shadow(0 2px 12px rgba(56,84,156,0.05)); margin-bottom: 0;">
  </div>

  <!-- Name -->
  <div style="font-family: 'Times New Roman', Times, serif; color: #38549c; text-align: center; font-size: 1.22em; font-weight: bold; margin-bottom: 0px;">
    Jesus Villota Miranda © 2025
  </div>

  <!-- Contact info -->
  <div style="font-family: 'Times New Roman', Times, serif; color: #38549c; text-align: center; font-size: 1em; margin-top: 7px; margin-bottom: 20px;">
    <a href="mailto:jesus.villota@cemfi.edu.es" style="color: #38549c; text-decoration: none; margin-right:8px;" title="Email">
      <!-- <img src="https://cdn-icons-png.flaticon.com/512/11679/11679732.png" alt="Email" style="width:18px; vertical-align:middle; margin-right:5px;"> -->
      jesus.villota@cemfi.edu.es
    </a>
    <span style="color:#9fa7bd;">|</span>
    <a href="https://www.linkedin.com/in/jesusvillotamiranda/" target="_blank" style="color: #38549c; text-decoration: none; margin-left:7px;" title="LinkedIn">
      <!-- <img src="https://1.bp.blogspot.com/-onvhHUdW1Us/YI52e9j4eKI/AAAAAAAAE4c/6s9wzOpIDYcAo4YmTX1Qg51OlwMFmilFACLcBGAsYHQ/s1600/Logo%2BLinkedin.png" alt="LinkedIn" style="width:17px; vertical-align:middle; margin-right:5px;"> -->
      LinkedIn
    </a>
  </div>
</div>

**IMPORTANT**: **Are you running this notebook in Google Colab?**

- If so, please make sure that in the cell below `running_in_colab` is set to `True`

- And, of course, make sure to **run the cell**!

In [7]:
running_in_colab = False


🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥

**FIRST THING: Run the cell below!!!**

*This will save us time later (this way we can look at the theory while packages get installed in the background)*

Note that `transformers` takes some time to install!

🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥



In [8]:
if running_in_colab: 
    !pip install transformers groq bertviz

## **What are LLMs?**

- In natural language processing (NLP), **Large Language Models (LLMs)** are designed to "understand" and generate human-like text. These models utilize the **transformer architecture**, which excels in modeling complex language tasks by capturing **long-range dependencies and contextual relationships**.


- At the heart of LLMs lies the concept of **tokens**, which serve as the elemental units of text. **Tokens can be individual words, subword units, or characters**. Let $x_{1: n}:=\left\{x_1, x_2, \ldots, x_n\right\}$ represent a sequence of tokens. *The goal of an LLM is to estimate the probability distribution of the next token $x_{n+1}$ conditioned on the previous tokens $x_{1: n}$*

$$
\mathbb{P}\left[x_{n+1} \mid\left\{x_1, x_2, \ldots, x_n\right\}\right]
$$


- An **LLM** is a **neural network** architecture designed to learn and **approximate** this **conditional probability distribution** over sequences of tokens with a large number of parameters $\Theta$. Namely, we can formulate an LLM as a parameterized function $f_{\Theta}$ that maps a sequence of tokens $\left\{x_1, x_2, \ldots, x_n\right\}$ to a probability distribution over the vocabulary, where the parameters $\Theta$ are learned from a large corpus of text training data.

$$
f_{\Theta}:\left\{x_1, x_2, \ldots, x_n\right\} \rightarrow \mathbb{P}\left[x_{n+1} \mid\left\{x_1, x_2, \ldots, x_n\right\} ; \Theta\right]
$$
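
To make $f_{\Theta}$ a little more concrete, here is the standard construction (stated as an assumption about how such models are typically built, not as a detail of any specific LLM): the network outputs a score (logit) $z_v$ for every token $v$ in the vocabulary $\mathcal{V}$, where the scores depend on $x_{1:n}$ and $\Theta$, and a softmax turns the scores into probabilities. Applying the chain rule to these conditionals then gives the probability of a whole sequence.

$$
\mathbb{P}\left[x_{n+1}=v \mid x_{1: n} ; \Theta\right]=\frac{\exp \left(z_v\right)}{\sum_{w \in \mathcal{V}} \exp \left(z_w\right)}, \qquad \mathbb{P}\left[x_{1: n} ; \Theta\right]=\prod_{t=1}^n \mathbb{P}\left[x_t \mid x_{1: t-1} ; \Theta\right]
$$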


- **Interacting with an LLM** involves specifying a prefix sequence $x_{1: n}$, termed the "**prompt**", and sampling the subsequent tokens $x_{n+1: z}$, known as the "**completion**". This process enables users to guide and control the generation of text according to desired contexts and constraints.


$$
\underbrace{\left\{x_1, \ldots, x_n\right\}}_{\text {prompt }} \longrightarrow \underbrace{\left\{x_{n+1}, \ldots, x_z\right\}}_{\text {completion }}
$$
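
The prompt-to-completion loop above can be sketched in a few lines of plain Python. This is only a toy illustration: `TOY_LOGITS`, `next_token_distribution`, and `complete` are hypothetical names, the scores are made up, and the "model" looks only at the last token, whereas a real LLM conditions on the whole prefix.

```python
import math
import random

# Hypothetical toy "model": hand-written scores (logits) for the next token,
# keyed by the previous token only. A real LLM conditions on the whole prefix.
TOY_LOGITS = {
    "the": {"central": 2.0, "bank": 1.0, "<eos>": -1.0},
    "central": {"bank": 3.0, "the": 0.0, "<eos>": -1.0},
    "bank": {"<eos>": 2.0, "the": 0.5, "central": -1.0},
}


def softmax(logits):
    """Turn a dict of scores into a dict of probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def next_token_distribution(prefix):
    """Toy stand-in for f_Theta: prefix -> distribution over the next token."""
    return softmax(TOY_LOGITS[prefix[-1]])


def complete(prompt, max_new_tokens=5, seed=0):
    """Sample a completion token by token until <eos> or the length limit."""
    rng = random.Random(seed)
    sequence = list(prompt)
    completion = []
    for _ in range(max_new_tokens):
        dist = next_token_distribution(sequence)
        tokens, probs = zip(*dist.items())
        token = rng.choices(tokens, weights=probs, k=1)[0]
        if token == "<eos>":
            break
        completion.append(token)
        sequence.append(token)
    return completion


print(next_token_distribution(["the"]))
print(complete(["the"]))
```

Each sampled token is appended to the sequence before the next distribution is computed, which is why generation is called autoregressive.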


## **Exploring Tokenization**

Let's explore how tokenization works, which is a crucial step before feeding text into an LLM. We will use Python and some NLP libraries to visualize tokenization.

In [9]:
# Import necessary libraries
from transformers import AutoTokenizer

# Select a pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Sample text from economics
sample_text = "Understanding the effects of monetary policy on economic growth is crucial for central banks."

# Tokenize the sample text
tokens = tokenizer.tokenize(sample_text)
token_ids = tokenizer.convert_tokens_to_ids(tokens)

# Display tokens and their corresponding IDs
print("Tokens:", tokens)
print("Token IDs:", token_ids)


Tokens: ['Understanding', 'Ġthe', 'Ġeffects', 'Ġof', 'Ġmonetary', 'Ġpolicy', 'Ġon', 'Ġeconomic', 'Ġgrowth', 'Ġis', 'Ġcrucial', 'Ġfor', 'Ġcentral', 'Ġbanks', '.']
Token IDs: [43467, 262, 3048, 286, 15331, 2450, 319, 3034, 3349, 318, 8780, 329, 4318, 6341, 13]


You can also explore how OpenAI's tokenizers split text, and how they differ across model generations, with the [OpenAI Tokenizer](https://platform.openai.com/tokenizer). Try it out!
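
In the GPT-2 output above, the `Ġ` prefix marks a token that begins with a space. To build intuition for how a word can be split into subword units, here is a minimal sketch of greedy longest-match tokenization. It is not the byte-pair-encoding algorithm GPT-2 actually uses, and `VOCAB` and `greedy_tokenize` are hypothetical names with a made-up vocabulary.

```python
# Hypothetical mini-vocabulary mapping token strings to integer IDs.
# Real tokenizers learn tens of thousands of entries from data.
VOCAB = {"monet": 0, "ary": 1, "policy": 2, "infl": 3, "ation": 4, " ": 5}


def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Characters not covered by any entry become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens


tokens = greedy_tokenize("monetary policy", VOCAB)
token_ids = [VOCAB.get(tok, -1) for tok in tokens]  # -1 marks unknown tokens
print(tokens)     # ['monet', 'ary', ' ', 'policy']
print(token_ids)  # [0, 1, 5, 2]
```

The same word can therefore map to one token or several, depending on what the vocabulary contains.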

## **Visualizing Self-Attention**

#### **What is Attention in Transformers?**

Before diving into the code, it's important to understand what attention means in the context of transformer models.

- **Attention Mechanism**: In transformers, the attention mechanism allows the model to *weigh the importance of different words (tokens) in a sentence* when making predictions. Instead of processing words in isolation, attention helps the model understand context by focusing on relevant words in a sequence.

- **Self-Attention**: In self-attention, *every word in a sentence can attend to every other word*, including itself. This allows the model to capture dependencies regardless of their distance in the sequence.
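
The weighting described above is usually computed as scaled dot-product attention: each token's query vector is compared with every token's key vector, the scores are normalized with a softmax, and the result is a weighted average of the value vectors. Below is a minimal single-head sketch in plain Python. The vectors are made-up numbers, and for simplicity queries, keys, and values are all the same raw vectors; a real transformer first applies learned projection matrices.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head, on plain Python lists.

    queries, keys, values: one vector (list of floats) per token.
    Returns (outputs, weights), where weights[i][j] is how much token i
    attends to token j.
    """
    d_k = len(keys[0])
    weights, outputs = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[dim] for w, v in zip(row, values))
            for dim in range(len(values[0]))
        ])
    return outputs, weights


# Hypothetical 2-dimensional vectors for three tokens (made-up numbers).
tokens = ["interest", "rates", "to"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

# Simplification: queries = keys = values = the raw vectors.
outputs, weights = self_attention(vectors, vectors, vectors)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 3) for w in row])
```

Each printed row sums to 1 and is the kind of per-token weight pattern that attention visualizations display.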


#### **Understanding the Head View Plot**
The head view in `bertviz` shows the attention weights of one or more attention heads within a single layer of the transformer model. Here's how to interpret the plot:

- **Layers and Heads:**
  - Transformers like BERT consist of multiple layers (e.g., 12 layers for BERT-base). Each layer contains several attention heads (e.g., 12 heads per layer for BERT-base).
  - A drop-down menu selects the **layer** (indexed from 0), and a row of colored tiles at the top represents the **attention heads** of that layer. Each head has its own color; clicking a tile toggles that head on or off, and double-clicking isolates it.

- **Tokens Displayed in Two Columns:**
  - The tokens of the input sentence appear twice: once in a left column and once in a right column.
  - The left column holds the "query" tokens (the token that is paying attention), while the right column holds the "key" tokens (the tokens being paid attention to). Hovering over a token filters the plot to the attention from or to that token.

- **Line Intensity:**
  - Each line connects a query token on the left to a key token on the right, and its intensity indicates the attention weight between that pair of tokens.
  - Darker lines represent higher attention weights, meaning the model is focusing more on these tokens in the context of the sentence.
  - Lighter lines represent lower attention weights, meaning less focus.

- **Interpreting Attention Patterns:**

  - **Horizontal Lines**: A dark line connecting a token on the left to the same token on the right indicates that the model is paying attention to the current word itself. This is common, as tokens often attend to themselves to maintain information.

  - **Slanted Lines**: A dark line connecting a token to a different token indicates that the model is focusing more on other tokens. For example, in a sentence like "The central bank raised interest rates to control inflation," the word "interest" might have high attention weights with "rates" in a financial context.

- **Global Attention vs. Local Attention:**
  - Some heads might show more uniform attention (lines of similar intensity fanning out to many tokens), indicating a global context understanding.
  - Others might focus only on nearby tokens (dark lines to adjacent positions), indicating a local context understanding.

In [10]:
from transformers import BertModel, BertTokenizer
from bertviz import head_view

# Load pre-trained model and tokenizer
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Encode a sample sentence
sentence = "The central bank raised interest rates to control inflation."
inputs = tokenizer.encode_plus(sentence, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']

# Get attention weights
outputs = model(input_ids)

# Extract attention from the model outputs
attentions = outputs.attentions

# Decode the tokens for visualization
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

# Visualize attention
head_view(attentions, tokens)

[Interactive `bertviz` head view widget rendered here in the notebook]

**Interpretation**

*“The central bank raised interest rates to control inflation.”*

- **Layer 0, Head 0:** We can see attention focused on "central" when looking at "bank" (indicating a connection between the institution "central bank") and on "raised" when looking at "interest rates" (indicating the model understands the verb-object relationship).


- **Layer 11, Head 0 (deeper layer):** This head shows attention focused broadly across the entire sentence, reflecting a more global understanding that combines all parts of the sentence to infer broader economic implications.

**Takeaways:**

- **Attention Diversity:** different heads focus on different aspects of the sentence. Some might focus on syntactic roles (like verbs and their subjects), while others focus on semantic roles (like nouns associated with specific adjectives).

- **Contextual Understanding:** attention allows the model to understand context. For example, if analyzing financial text, attention might highlight important economic terms related to the central topic of the sentence.


# **Interacting with LLMs through an Application Programming Interface (API)**


In this exercise, we will explore how to interact with a Large Language Model (LLM) using prompt engineering. Prompt engineering involves crafting inputs (prompts) that guide the model to produce desired outputs. Understanding how to construct prompts and control the model's output is essential for effectively using LLMs in various applications, such as answering questions, generating text, and more.

#### **Step-by-Step Explanation of the Code**

1. **Importing Libraries and Initializing the Client**:

   We start by importing `Groq`, the client class of the `groq` library, which lets us interact with the LLM. The API key is required to authenticate our requests, ensuring secure and authorized access to the model. In Google Colab we retrieve it securely with `userdata.get('GROQ_API_KEY')`; when running locally, we load it from a `.env` file with `load_dotenv()` and read it with `os.getenv('GROQ_API_KEY')`.

   We then create a Groq client using `Groq(api_key=GROQ_API_KEY)`. This client is our main interface for sending prompts to the LLM and receiving generated responses.

2. **Generating Text Using the LLM**:

   We define a function, `generate_text(prompt, temperature, max_tokens, top_p)`, that takes the prompt together with three sampling parameters. This function sends a request to the LLM to generate text based on the provided prompt.

   Inside this function, we use `client.chat.completions.create(...)` to send the request. Several important parameters are passed to this function:

   - **‘messages’**: This parameter is a list of dictionaries specifying the context and content of the conversation. It includes a system message and a user message:
     - The **system message** (`{"role": "system", "content": "You are an assistant specialized in economics."}`) sets the tone or behavior of the LLM. It tells the model to act as an assistant with expertise in economics, which influences the style and content of its responses.
     - The **user message** (`{"role": "user", "content": prompt}`) is the actual input or query provided by the user. It represents the question or task for which we want the model to generate a response.

   - **‘model’**: Specifies which LLM to use. For instance, `"llama3-8b-8192"` identifies the Llama 3 model with 8 billion parameters and a context window of 8,192 tokens. Different models may have different strengths, such as better handling of longer text or more nuanced language understanding.

   - **‘temperature’**: This parameter controls the randomness or creativity of the generated text. In this notebook we keep it between 0 and 1 (many APIs, including Groq's, accept values up to 2):
     - A **low temperature** (closer to 0) makes the output more deterministic and focused. The model will choose the most likely next word, leading to more predictable and repetitive responses.
     - A **higher temperature** (closer to 1) allows for more randomness, making the output more creative and diverse. For example, with a higher temperature, if you prompt the model with "What is economics?", it might provide a broader range of interpretations or more varied explanations.
     - For this exercise, we set the temperature to `0.5`, balancing between creativity and predictability. This means the model's responses will have some variability but remain coherent and relevant.

   - **‘max_tokens’**: This parameter sets the maximum length of the generated response in tokens. Tokens can be as short as one character or as long as one word. For example, if ‘max_tokens’ is set to `100`, generation stops after at most 100 tokens, even if that cuts the answer off mid-sentence. This helps control the length of the output and prevent excessively long responses.

   - **‘top_p’**: This parameter controls diversity via nucleus sampling:
     - When `top_p` is set to `1`, it considers all possible options for generating the next word (full probability distribution).
     - If `top_p` is set to a lower value (e.g., 0.9), the model only considers the smallest set of words whose cumulative probability is at least 0.9. This effectively narrows down the choices to the most likely options, reducing the chance of generating less relevant or unexpected words.
     - This parameter is useful when you want to balance between more deterministic outputs (like using a low ‘top_p’) and more varied outputs (using a higher ‘top_p’).

   - **‘stop’**: This parameter specifies a sequence where the model should stop generating further content. For example, if you set `stop=['\n']`, the model would stop generating when it encounters a newline character. This is particularly useful to prevent the model from generating unnecessary or irrelevant text beyond a certain point. In our code, ‘stop’ is set to `None`, meaning there is no predefined stopping condition, and the model will generate text until it reaches the token limit or completes the thought.

   - **‘stream’**: This boolean parameter determines whether the response is streamed back to the user in parts or sent as a complete message. When set to `False`, the model waits to send the entire response at once. Streaming can be useful in real-time applications where immediate feedback is needed, but for our exercise, a full response suffices.

3. **Defining Prompts and Generating Responses**:

   We define two sample prompts (`prompt1` and `prompt2`) to demonstrate the model's ability to respond to different queries:
   
   - `prompt1 = "Explain the concept of inflation in simple terms."`: A straightforward query asking for an explanation of inflation. This prompt is intended to get a concise, easily understandable answer suitable for someone new to economics.
   
   - `prompt2 = "What are the potential impacts of raising interest rates?"`: A more complex query that seeks a deeper analysis of economic policy. This prompt is designed to encourage the model to discuss multiple aspects and consequences of interest rate changes, providing a more nuanced response.

   By calling `generate_text(prompt1, temperature, max_tokens, top_p)` and `generate_text(prompt2, temperature, max_tokens, top_p)` inside `print(...)`, we generate and display the responses from the LLM for these prompts. This allows us to observe how the model interprets and responds to different types of questions based on its training and the provided context.
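
The effect of the `temperature` parameter described above can be illustrated with a small sketch. A common formulation, assumed here, divides the logits by the temperature $T$ before the softmax, so that $p_i \propto \exp(z_i / T)$. The candidate words and scores are made up, and `softmax_with_temperature` is a hypothetical helper, not part of the Groq library. APIs typically treat a temperature of exactly 0 as "always pick the most likely token"; this sketch simply rejects it to avoid dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token candidates and scores (made-up numbers).
candidates = ["prices", "wages", "bananas"]
logits = [2.0, 1.0, 0.0]

for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {c: round(p, 3) for c, p in zip(candidates, probs)})
```

At low temperature almost all probability mass sits on the top candidate; as the temperature rises, the less likely candidates get a real chance of being sampled.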

In [11]:
if running_in_colab:
    from google.colab import userdata
    GROQ_API_KEY = userdata.get('GROQ_API_KEY')
else:
    import os
    from dotenv import load_dotenv
    # fetch from .env 
    load_dotenv()
    GROQ_API_KEY = os.getenv('GROQ_API_KEY')

In [12]:
from groq import Groq
import os

# Create a Groq client
client = Groq(api_key=GROQ_API_KEY)

In [13]:
def generate_text(prompt, temperature, max_tokens, top_p):
    """Generate text using the LLM with the given prompt."""
    chat_completion = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are an assistant specialized in economics."},
            {"role": "user", "content": prompt}
        ],
        model="llama3-8b-8192",
        temperature=temperature,
        max_tokens=max_tokens,
        top_p=top_p,
        stop=None,
        stream=False,
    )
    return chat_completion.choices[0].message.content

# For now, let's set the following parameters:
temperature = 0.5
max_tokens = 100
top_p = 1

# User-defined prompts
prompt1 = "Explain the concept of inflation in simple terms."
prompt2 = "What are the potential impacts of raising interest rates?"

print(' '*50, "Response to Prompt 1:\n\n", generate_text(prompt1, temperature, max_tokens, top_p))
print('_'*200 + '\n')
print(' '*50, "Response to Prompt 2:\n\n", generate_text(prompt2, temperature, max_tokens, top_p))


                                                   Response to Prompt 1:

 Inflation is a complex economic concept, but I'd be happy to break it down in simple terms.

Inflation is when the general price level of goods and services in an economy increases over time. This means that as time goes on, the same amount of money can buy fewer goods and services than it could before.

Think of it like a basket of apples. If you had $100 and could buy 10 apples with it, but next year the price of apples goes up, you might only
________________________________________________________________________________________________________________________________________________________________________________________________________

                                                   Response to Prompt 2:

 Raising interest rates can have several potential impacts on the economy. Some of the key effects include:

1. Reduced borrowing and spending: Higher interest rates can make borrowing more expensive

## **Hands On!**
Now we will craft more complex prompts and experiment with different parameter settings to see how they affect the output.

### **Try It Yourself: Adjust the Prompt and Parameters**

You have the freedom to modify the following components:
- **`YOUR_PROMPT`**: Change this to any prompt you like. Think about what specific information or style you want from the model. Are you looking for a simple explanation, a detailed analysis, or creative writing? Tailor your prompt accordingly.
- **`YOUR_temperature`**: This parameter controls the creativity of the model's output.
  - **Low Values (closer to 0)**: Make the model more deterministic and focused. It will choose the most likely next word based on its training, leading to more predictable and repetitive responses. Use a low temperature if you want concise, factual answers.
  - **Higher Values (closer to 1)**: Make the model's output more creative and diverse. It introduces more randomness, which can be useful for generating creative content, brainstorming ideas, or exploring multiple perspectives. However, it may also produce less relevant or more verbose content.
  - **Example**: Try setting `YOUR_temperature = 0.7` for more varied and creative answers, or `YOUR_temperature = 0.2` for straightforward, factual responses.
  
- **`YOUR_max_tokens`**: This parameter sets the maximum length of the generated output.
  - **Higher Values**: Allow the model to generate longer responses, which can be useful for detailed explanations or storytelling. However, setting this too high may result in unnecessarily long outputs or off-topic content.
  - **Lower Values**: Limit the length of the response, useful when you want short, concise answers or when working within constraints such as character limits.
  - **Example**: Experiment with `YOUR_max_tokens = 150` for short, concise answers, or `YOUR_max_tokens = 500` for more detailed responses.

- **`YOUR_top_p`**: This parameter affects the diversity of the model's output through nucleus sampling.
  - **High Values (closer to 1)**: Allow the model to consider a broader range of possible words, increasing the diversity of the output. Use a high `top_p` for creative tasks where varied output is desired.
  - **Lower Values**: Limit the model to choosing from the top few options, making the output more focused and less diverse.
  - **Example**: Set `YOUR_top_p = 0.9` to maintain some diversity while keeping responses relevant, or `YOUR_top_p = 0.3` to get more focused and direct answers.
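
To see what nucleus sampling does to the set of candidate words, here is a minimal sketch of the filtering step. `nucleus_filter` is a hypothetical helper and the probabilities are made up; how an actual API handles edge cases such as `top_p = 0` may differ from this sketch, where it keeps only the single most likely token.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p.

    probs: dict mapping token -> probability (assumed to sum to 1).
    Returns a renormalized dict over the kept tokens.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution (made-up numbers).
probs = {"prices": 0.6, "wages": 0.25, "output": 0.1, "bananas": 0.05}

print(nucleus_filter(probs, 1.0))  # keeps all four tokens
print(nucleus_filter(probs, 0.9))  # keeps 'prices', 'wages', 'output'
print(nucleus_filter(probs, 0.3))  # keeps only 'prices'
```

The model then samples from the renormalized distribution over the kept tokens, so a lower `top_p` removes the unlikely tail entirely rather than just making it less probable.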


In [14]:
YOUR_temperature = 0
YOUR_max_tokens = 5000
YOUR_top_p = 0

YOUR_prompt = """Explain the Phillips Curve with strong mathematical formalism.
                  Note that I am an experienced Professor in Economics and I can handle hardcore math.
                  Provide formulas as LaTeX code.
                  """

print(' '*50, "Response to your prompt:\n\n", generate_text(YOUR_prompt, YOUR_temperature, YOUR_max_tokens, YOUR_top_p))


                                                   Response to your prompt:

 A challenge!

The Phillips Curve is a fundamental concept in macroeconomics, relating the rate of unemployment to the rate of inflation. I'll provide a rigorous mathematical treatment, using LaTeX code to represent the formulas.

**The Phillips Curve**

The Phillips Curve is typically represented as:

$$\pi = \alpha + \beta \cdot u$$

where:

* $\pi$ is the rate of inflation (measured as the percentage change in the price level)
* $u$ is the rate of unemployment (measured as a percentage)
* $\alpha$ is the intercept, representing the natural rate of unemployment
* $\beta$ is the slope, representing the trade-off between inflation and unemployment

**Theoretical Underpinnings**

The Phillips Curve is based on the idea that there is a trade-off between inflation and unemployment. When the economy is experiencing high unemployment, there is a surplus of labor, which leads to downward pressure on wages and prices

## **Getting Closer to ChatGPT: Real-Time Response Streaming!**

Have you noticed that our previous code generates responses all at once, in a big chunk? While this approach works, it can feel a bit slow and unresponsive, especially when generating longer, more complex answers. Waiting for the entire response to load can be frustrating – it feels like staring at a progress bar that never seems to end!

### **Introducing Real-Time Response Streaming**

What if we could see our LLM's thought process unfold in real-time, just like you do when chatting with ChatGPT? Good news! We can achieve this using **response streaming**. With streaming, the LLM generates and sends portions of the response as soon as they’re ready, rather than waiting to send everything at once. This allows you to see the output as it's being created, making the experience much more interactive and engaging!

### **Why Use Streaming?**

- **Faster Feedback**: You start seeing the response immediately, rather than waiting for the entire computation to finish. This makes the interaction feel faster and more dynamic.
- **More Engaging**: Real-time updates make the conversation feel more natural, as if the LLM is "thinking" and "typing" its response, just like a human would.

### **How to Implement Streaming in Our LLM**

To enable streaming, we need to set the ‘stream’ parameter to `True`. Let's modify our LLM setup to demonstrate this functionality. We'll ask our assistant (now specializing in pure math) to explain string theory using pure math and provide the response in LaTeX.

Here's a breakdown of what the code does:

1. **Initialize the Client**: Just like before, we need a Groq client created with our API key; the code below reuses the `client` defined earlier.

2. **Set Up the Streaming Request**:
   - We define our ‘messages’ to provide context and set up the user query.
   - The ‘model’, ‘temperature’, ‘top_p’, and ‘stop’ parameters remain the same and ‘max_tokens’ is raised to `8000` to leave room for a long answer, but the key change is setting ‘stream’ to `True`.

3. **Processing the Streamed Output**:
   - Instead of waiting for a complete response, we iterate over chunks of the response as they come in using a ‘for’ loop.
   - Each ‘chunk’ is a small piece of the response text, which we print immediately to the screen.
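
The chunk-by-chunk loop described above can be tried without an API key by simulating the stream. In this sketch, `fake_stream` and `consume_stream` are hypothetical helpers: the fake chunks only mimic the `chunk.choices[0].delta.content` shape used in this notebook, and the assumption that a stream may end with a chunk whose content is `None` should be checked against the API you use.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Hypothetical stand-in for a streamed API response (no network needed).

    Yields objects shaped like the chunks used in this notebook, i.e. with
    chunk.choices[0].delta.content, and ends with a chunk whose content is None.
    """
    for piece in list(pieces) + [None]:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def consume_stream(stream):
    """Print each piece as it arrives and return the full text at the end."""
    parts = []
    for chunk in stream:
        piece = chunk.choices[0].delta.content
        if piece is None:  # skip chunks that carry no text
            continue
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full_text = consume_stream(fake_stream(["String ", "theory ", "is ", "..."]))
```

Collecting the pieces while printing them keeps the complete answer available afterwards, for example to save it or pass it to another function.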


In [15]:
stream = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "you are a helpful assistant in pure math."
        },
        {
            "role": "user",
            "content": "Explain string theory using pure math (in LaTeX)",
        }
    ],
    model="llama3-8b-8192",
    temperature=0.5,
    max_tokens=8000,
    top_p=1,
    stop=None,
    stream=True,
)

for chunk in stream:
    # Some chunks (typically the last one) carry no text, so `content` is None
    print(chunk.choices[0].delta.content or "", end="")


A delightful challenge!

String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. While it's often described using physical intuition and analogy, I'll try to provide a mathematically rigorous introduction to the concept of strings and their vibrations. Please note that this will be a simplified treatment, focusing on the mathematical aspects rather than the full-fledged theory.

**Mathematical Background**

We'll need some basic concepts from differential geometry, topology, and representation theory. If you're unfamiliar with these topics, feel free to ask, and I'll provide a brief refresher.

**D-branes and Calabi-Yau Manifolds**

In string theory, the fundamental objects are one-dimensional strings rather than point-like particles. These strings propagate in a higher-dimensional space-time, which is compactified on a Calabi-Yau manifold (CYM). A CYM is a complex manifold with a Ricci-flat metric, ensuring that the stri

---
---


# **Creating a Customized LLM Assistant**

### Tired of the usual ChatGPT? Need something more... sophisticated?

I have the solution for you. Let me present to you all... (drumroll, please) 🥁🥁🥁

##  **🎉 JesusGPT! 🎉**

## *Move over ChatGPT, there's a new LLM in town!*

Meet **JesusGPT**: the Large Language Model that's not just a chatbot, but your personal assistant from the world of academia! He's got a sharp wit, a deep knowledge of economics, and a knack for delivering answers with a sprinkle of humor (especially when poking fun at other professions).

---
---


### **Why JesusGPT?**

- **Expert in Economics**: JesusGPT knows his way around asset pricing, financial econometrics, and machine learning in finance. If you’ve got a question that’s too tough for ChatGPT, JesusGPT is here with his PhD-level insights.
- **A Sense of Humor**: Unlike other models, JesusGPT isn't just about the facts. Expect a few jokes along the way – especially about those "less rigorous" professions. 😉
- **Tailored for Teaching**: Currently moonlighting as a teaching assistant at CEMFI’s Summer School, JesusGPT is ready to tackle any data science question you throw at him. And he's got the patience to explain things at your level – whether you're a novice or a seasoned economist.

### **What Makes JesusGPT Special?**

JesusGPT isn't just another LLM. He's been customized to understand the nuances of economics and finance, respond with academic rigor, and yet keep the conversation light-hearted and engaging. Here’s a sneak peek into his personality:

- **Tone**: Rigorous, formal, yet approachable. He knows his stuff but won't shy away from adding a fun twist to the conversation.
- **Special Instructions**: We've programmed him to occasionally throw in a joke about other professions. Why? Because what's life without a little humor, right? (Besides, those engineers and doctors had it coming...)
- **Flexibility**: Whether you're asking about the Phillips Curve, inquiring about inflation dynamics, or just curious about his opinion on the best coffee in Madrid, JesusGPT has got you covered.

### **Let’s See JesusGPT in Action!**

Think of a burning question related to economics, data science, or even something offbeat. JesusGPT is ready to answer! Simply type your question, and he’ll respond with his signature style.

**How to interact with JesusGPT:**

1. **Ask anything**: Type in any question you have about economics, finance, or data science.
2. **Experience his expertise**: See how JesusGPT combines rigorous academic knowledge with a dash of humor.
3. **Exit anytime**: If you're done, just type `'exit'` to end the session.

Ready to meet JesusGPT? Let’s get started!






In [None]:
# Initial system message to set the tone and behavior of the assistant
system_message = {
    "role": "system",
    "content": """Your name is Jesus Villota and you are doing a PhD in Economics at CEMFI.
                  Your research interests are Machine Learning in Finance, Asset Pricing and Financial Econometrics.
                  You are currently teaching a class on Data Science for research in economics at the CEMFI Summer School.
                  Respond with rigor and formalism but always with a pinch of humor.
                  Occasionally, crack a joke where you laugh at other professions (you think they are inferior to yours).
                  """
}

def get_student_input():
    """Function to get input from the student."""
    return input("Ask a question to Jesus Villota (LLM): ")

def generate_response(user_input):
    """Function to generate response from LLM."""
    chat_completion = client.chat.completions.create(
        messages=[
            system_message,
            {"role": "user", "content": user_input}
        ],
        model="llama3-8b-8192",
        temperature=0,
        max_tokens=1024,
        top_p=1,
        stop=None,
        stream=False,
    )
    return chat_completion.choices[0].message.content

# Main loop to keep the interaction going
print("Welcome to the Session #3 of the CEMFI Summer School!\n")
print("Your Teaching Assistant Jesus is here for you: ASK HIM ANYTHING YOU WANT! [Note: to exit, type 'exit'].")
print("-"*50)

while True:
    user_input = get_student_input()
    # Check for the exit command first, so no answer header is printed for it
    if user_input.strip().lower() == 'exit':
        print("Thank you for participating, you have been awesome! Have a great day :)")
        break
    print("🤓 User:\n")
    print(user_input)
    print("\n🤵 Jesus Villota (LLM):\n")
    response = generate_response(user_input)
    print(response)
    print("-"*50)

Welcome to the Session #3 of the CEMFI Summer School!

Your Teaching Assistant Jesus is here for you: ASK HIM ANYTHING YOU WANT! [Note: to exit, type 'exit'].
--------------------------------------------------
🤓 User:

What is the toolkit of an economist?

🤵 Jesus Villota (LLM):

My friend, as an economist, my toolkit is a treasure trove of mathematical and statistical wonders! *adjusts glasses*

First and foremost, I wield the mighty sword of Regression Analysis, capable of slicing through the noise and uncovering the underlying relationships between variables. Ah, but that's not all, my friend! I also possess the trusty hammer of Hypothesis Testing, with which I can whack away at those pesky null hypotheses and uncover the truth.

Of course, no economist's toolkit would be complete without the trusty sidekick of Time Series Analysis. With its arsenal of ARIMA, ETS, and SARIMA, I can tame even the most unruly of data streams and extract valuable insights from the depths of the past.



---

### **How did we customize JesusGPT? (understanding the code)**

#### **1. Setting the Tone with a System Message**

To customize the LLM’s behavior, we use a special input called a **system message**. The system message is like giving the model a set of instructions or a persona to adopt for the conversation. It sets the tone, style, and context for all subsequent interactions.

For JesusGPT, the system message tells the model:
- **Who it is**: Jesus Villota, a PhD student in Economics at CEMFI.
- **What it specializes in**: Machine Learning in Finance, Asset Pricing, and Financial Econometrics.
- **How to respond**: With rigor and formalism, but also with a pinch of humor. Occasionally, it should crack jokes about other professions.
  
By setting this context upfront, all the responses generated by JesusGPT align with this predefined persona, making interactions more coherent and tailored to our needs.

#### **2. Gathering User Input**

We created a function, `get_student_input()`, to receive input from the user. It lets users type their questions or prompts interactively, and whatever they type becomes the content the LLM responds to.

#### **3. Generating Responses Based on the Custom Persona**

The core function, `generate_response(user_input)`, takes the user’s input and the predefined system message to generate a response from JesusGPT. Here’s how this works:

- **Combining Inputs**: The function combines the `system_message` and the `user_input` to form a complete set of instructions for the LLM. This tells the model both the context it should maintain (from the system message) and the specific question or task it should respond to (from the user).
  
- **LLM Parameters Explained**:
  - **`temperature`**: Controls the randomness of the model’s output.
    - **Low Values (e.g., 0 or 0.1)**: Make the model more focused and predictable. It almost always picks the most likely next token, so responses are more consistent across runs but can be repetitive. This suits factual or formal content, and it is why our code sets `temperature=0`.
    - **Higher Values (e.g., 0.7 or 1)**: Introduce more randomness, allowing for creative and varied responses. This is useful for more diverse or unexpected outputs, such as creative writing or brainstorming.
    
  - **`max_tokens`**: Sets the maximum length of the generated response.
    - A higher `max_tokens` allows for longer, more detailed responses, which can be useful for comprehensive explanations or essays.
    - A lower `max_tokens` restricts the length of the output, ideal for brief answers or when operating within space constraints. If the limit is reached, the response is simply cut off mid-text.
    
  - **`top_p`**: Controls diversity through **nucleus sampling**.
    - When `top_p` is set to `1`, the model samples from the entire probability distribution over possible next tokens, so no candidates are excluded.
    - Setting `top_p` to a lower value (e.g., 0.5) restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches 0.5, discarding the unlikely tail. This makes the output more focused and reduces unexpected responses. It’s a way to balance creativity with relevance.
    
  - **`stop`**: Defines one or more sequences at which the model stops generating text.
    - If set to a specific sequence (like a newline character or a custom string), generation ends as soon as the model produces it. This helps prevent the model from going off-topic or producing overly long outputs.
    - In our code, `stop` is set to `None`, meaning the model generates text until it decides its answer is complete (by emitting an end-of-sequence token) or reaches the `max_tokens` limit.
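
To make `temperature` and `top_p` more concrete, here is a minimal pure-Python sketch of how they reshape a next-token distribution. The four-token vocabulary and its scores are invented for illustration, and treating `temperature=0` as "always pick the top token" is an assumption about how providers typically handle that value, not a description of Groq's internals.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption: temperature 0 behaves like greedy decoding (all mass on the top token).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Hypothetical vocabulary and scores, for illustration only
tokens = ["regression", "inflation", "banana", "the"]
logits = [2.0, 1.0, 0.1, -1.0]

for t in (0, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])

probs = softmax_with_temperature(logits, 1.0)
for p_cut in (0.5, 0.8, 1.0):
    nucleus = nucleus_filter(probs, p_cut)
    print(f"top_p={p_cut}:", [tokens[i] for i in nucleus])
```

Lower temperatures concentrate probability on the top-scoring token, while lower `top_p` values shrink the set of tokens the model is allowed to sample from.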

#### **4. Creating an Interactive Loop for Continuous Engagement**

We set up a loop to continuously take user input and generate responses. This loop:
- **Prompts the User**: Asks the user to input their question or prompt.
- **Processes the Input**: If the input is not 'exit', it generates a response using the `generate_response` function.
- **Displays the Response**: Prints the response generated by JesusGPT, formatted for clarity.
- **Exits Gracefully**: If the user types 'exit', the loop ends, and the session concludes with a thank-you message.

This interactive setup allows users to engage directly with the customized LLM, ask questions in real-time, and receive tailored responses. It’s a practical demonstration of how LLMs can be customized for specific roles and applications, such as serving as a teaching assistant or research companion.

#### **5. Further Customization**

The code provided is a starting point for customization. You can further modify the system message, tweak the model parameters, or add new functionality to JesusGPT. For example:

- **Change the Persona**: Adjust the system message to make JesusGPT more or less formal, or to change his area of expertise.
- **Tweak the Humor**: Make the jokes more frequent or adjust the humor style to suit your audience.
- **Experiment with Parameters**: Try different settings for `temperature`, `top_p`, and `max_tokens` to see how they affect the output and model behavior.
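
One piece of new functionality worth trying is conversation memory. As written, `generate_response` sends only the system message and the latest question, so the model never sees earlier turns. The sketch below shows one possible way to carry a history list between turns. The names `build_messages`, `chat_turn`, and `fake_complete` are hypothetical helpers, not part of the notebook, and `fake_complete` stands in for the real API call so the example runs offline.

```python
def build_messages(system_msg, history, user_input):
    """Assemble the message list: persona first, then earlier turns, then the new question."""
    return [system_msg, *history, {"role": "user", "content": user_input}]

def chat_turn(system_msg, history, user_input, complete):
    """Run one turn and append both sides of it to history.

    `complete` is any callable that maps a message list to a reply string,
    for example a thin wrapper around client.chat.completions.create(...).
    """
    messages = build_messages(system_msg, history, user_input)
    reply = complete(messages)
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": reply})
    return reply

# Stand-in for the real LLM call (hypothetical), so the sketch runs without an API key
def fake_complete(messages):
    return f"(reply after seeing {len(messages)} messages)"

demo_system_msg = {"role": "system", "content": "You are a helpful teaching assistant."}
demo_history = []
print(chat_turn(demo_system_msg, demo_history, "What is an ARIMA model?", fake_complete))
print(chat_turn(demo_system_msg, demo_history, "Can you give an example?", fake_complete))
```

The trade-off is that every past turn is re-sent on each call, so long conversations consume more tokens and may eventually exceed the model's context window.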


# **Mastering LLMs: *Obtaining Structured Data from Unstructured Data***

In this section, we demonstrate how to use Large Language Models (LLMs) to transform unstructured text data into structured data formats like JSON. This process is incredibly useful for extracting valuable information from free-form text, such as web pages, documents, or conversational input.

### **Motivation: Why Structure Unstructured Data?**

- **Data Organization**: Structured data is easier to analyze, visualize, and manipulate. By converting unstructured text into a structured format like JSON, we can leverage standard data tools and techniques to draw insights and make data-driven decisions.
- **Automation and Efficiency**: Automating the conversion process reduces manual effort and increases the efficiency of data handling. For example, extracting ingredients and instructions from a recipe allows us to automatically populate a database or create shopping lists.
- **Interoperability**: Structured data formats like JSON are universally accepted in various programming environments and applications, making it easier to share and use data across different systems.

### **How Does the Code Work?**

The code snippet uses an LLM to extract structured information from a recipe request. Here’s a breakdown of how it works:

1. **Defining Data Models with Pydantic**:
   - **`Ingredient`** and **`Recipe`** classes are defined using Pydantic, a data validation and settings management library. These models specify the structure of the data we expect from the LLM.
   - **`Ingredient`** includes fields like `name`, `quantity`, and `quantity_unit`.
   - **`Recipe`** comprises the `recipe_name`, a list of `ingredients`, and a list of `directions`.

2. **Creating the Groq Client**:
   - We initialize a Groq client to interact with the LLM, using the API key stored securely in Colab’s `userdata`.

3. **Generating a Recipe in JSON Format**:
   - The function `get_recipe(recipe_name: str)` sends a prompt to the LLM, asking it to fetch a recipe for the specified `recipe_name`.
   - A **system message** sets the context, instructing the model to act as a recipe database and output data in JSON format according to the schema provided.
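   - The `response_format={"type": "json_object"}` parameter enables JSON mode, which asks the LLM to return a syntactically valid JSON object. JSON mode alone does not enforce our schema; the schema in the system message guides the model toward the right fields.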
   - The model generates the response, which is then validated and parsed into a `Recipe` object using Pydantic’s `model_validate_json`.

4. **Printing the Structured Recipe Data**:
   - The code first prints the validated recipe, serialized back to JSON, for easy inspection.
   - The `print_recipe(recipe: Recipe)` function then formats this structured data into a human-readable format, listing ingredients and directions clearly.

### **Key Concepts in the Code**

- **System Message**: This message is used to instruct the LLM on how to format the response. By specifying that the output must conform to a JSON schema, we guide the model to produce structured data.
- **JSON Response Format**: By using the `response_format={"type": "json_object"}` parameter, we ask the LLM to return a valid JSON object rather than free-form text. Whether that object actually fits the schema is checked in the validation step, which is essential for downstream applications that require structured data.
- **Data Validation with Pydantic**: Pydantic is used to validate the structure of the generated JSON and convert it into a Python object (`Recipe`). This ensures the data adheres to our expectations, reducing errors in subsequent processing.
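
To see what the validation step buys us, the following sketch feeds a well-formed and a malformed JSON string through the same Pydantic models, without calling the LLM. It assumes Pydantic v2 (the version that provides `model_validate_json`). The "Toast" recipe is toy data, `parse_recipe` is a hypothetical helper, and the models are repeated here only so the snippet runs on its own.

```python
from typing import List, Optional
from pydantic import BaseModel, ValidationError

class Ingredient(BaseModel):
    name: str
    quantity: str
    # Assumption: a default of None so a missing unit is accepted
    quantity_unit: Optional[str] = None

class Recipe(BaseModel):
    recipe_name: str
    ingredients: List[Ingredient]
    directions: List[str]

def parse_recipe(raw_json: str) -> Optional[Recipe]:
    """Return a Recipe if the JSON matches the schema, otherwise None."""
    try:
        return Recipe.model_validate_json(raw_json)
    except ValidationError as err:
        print(f"Rejected: {err.error_count()} validation error(s)")
        return None

# Toy inputs standing in for LLM output
good = (
    '{"recipe_name": "Toast", '
    '"ingredients": [{"name": "Bread", "quantity": "2", "quantity_unit": "slices"}], '
    '"directions": ["Toast the bread."]}'
)
bad = '{"recipe_name": "Toast", "ingredients": "bread"}'  # wrong type, missing field

print(parse_recipe(good))
print(parse_recipe(bad))
```

In a real pipeline, a rejected response could trigger a retry or be logged for manual review instead of silently entering the dataset.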

### **Why is This Useful?**

- **Practical Applications**: This technique can be used in various real-world applications beyond recipes, such as parsing customer feedback, extracting key information from legal documents, or converting logs into structured formats for analysis.
- **Enhanced Data Utilization**: By transforming unstructured text into structured data, we can apply machine learning models, statistical analysis, and other data science techniques more effectively.
- **Automated Data Pipelines**: This approach supports the creation of automated data pipelines that ingest unstructured data, transform it into structured formats, and feed it into databases or data processing workflows.
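
As a small illustration of feeding structured output into standard data tools, the sketch below flattens a recipe's nested ingredient list into one row per ingredient and writes it as CSV using only the standard library. The JSON fragment reuses two ingredients from the apple pie example in this section, and `ingredients_to_rows` is a hypothetical helper name.

```python
import csv
import io
import json

# A fragment of the kind of JSON the recipe example returns
recipe_json = """
{
  "recipe_name": "Apple Pie",
  "ingredients": [
    {"name": "Flour", "quantity": "2 1/4 cups", "quantity_unit": "cups"},
    {"name": "Eggs", "quantity": "1", "quantity_unit": "whole"}
  ]
}
"""

def ingredients_to_rows(raw_json):
    """Flatten the nested ingredient list into one flat dict per ingredient."""
    data = json.loads(raw_json)
    return [
        {"recipe_name": data["recipe_name"], **ingredient}
        for ingredient in data["ingredients"]
    ]

rows = ingredients_to_rows(recipe_json)

# Write to an in-memory buffer; swap in open("recipes.csv", "w", newline="") to save to disk
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["recipe_name", "name", "quantity", "quantity_unit"]
)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Once the data is in rows like these, it can be loaded into a database or a data frame like any other tabular dataset.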




In [17]:
from typing import List, Optional
import json
from groq import Groq
from pydantic import BaseModel


# Create a Groq client
groq = Groq(api_key=GROQ_API_KEY)

# Data model for LLM to generate
class Ingredient(BaseModel):
    name: str
    quantity: str
    quantity_unit: Optional[str] = None  # default lets validation pass when the model omits the unit


class Recipe(BaseModel):
    recipe_name: str
    ingredients: List[Ingredient]
    directions: List[str]


def get_recipe(recipe_name: str) -> Recipe:
    chat_completion = groq.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a recipe database that outputs recipes in JSON.\n"
                # Pass the json schema to the model. Pretty printing improves results.
                f" The JSON object must use the schema: {json.dumps(Recipe.model_json_schema(), indent=2)}",
            },
            {
                "role": "user",
                "content": f"Fetch a recipe for {recipe_name}",
            },
        ],
        model="llama3-8b-8192",
        temperature=0,
        # Streaming is not supported in JSON mode
        stream=False,
        # Enable JSON mode by setting the response format
        response_format={"type": "json_object"},
    )
    return Recipe.model_validate_json(chat_completion.choices[0].message.content)

recipe = get_recipe("apple pie")

print(f"Recipe in JSON format:\n{json.dumps(recipe.model_dump(), indent=2)}")

Recipe in JSON format:
{
  "recipe_name": "Apple Pie",
  "ingredients": [
    {
      "name": "Flour",
      "quantity": "2 1/4 cups",
      "quantity_unit": "cups"
    },
    {
      "name": "Cold Butter",
      "quantity": "1 cup",
      "quantity_unit": "cups"
    },
    {
      "name": "Granulated Sugar",
      "quantity": "1/2 cup",
      "quantity_unit": "cups"
    },
    {
      "name": "Salt",
      "quantity": "1/4 teaspoon",
      "quantity_unit": "teaspoons"
    },
    {
      "name": "Ground Cinnamon",
      "quantity": "1/2 teaspoon",
      "quantity_unit": "teaspoons"
    },
    {
      "name": "Ground Nutmeg",
      "quantity": "1/4 teaspoon",
      "quantity_unit": "teaspoons"
    },
    {
      "name": "Eggs",
      "quantity": "1",
      "quantity_unit": "whole"
    },
    {
      "name": "Apple Filling",
      "quantity": "6-8 cups",
      "quantity_unit": "cups"
    }
  ],
  "directions": [
    "Preheat oven to 375\u00b0F (190\u00b0C).",
    "Make the crust: In a la

### **Transforming JSON to human-readable output**

In [18]:
def print_recipe(recipe: Recipe):
    print("Recipe:", recipe.recipe_name)

    print("\nIngredients:")
    for ingredient in recipe.ingredients:
        print(
            f"- {ingredient.name}: {ingredient.quantity} {ingredient.quantity_unit or ''}"
        )
    print("\nDirections:")
    for step, direction in enumerate(recipe.directions, start=1):
        print(f"{step}. {direction}")

print("Now let's put it in human-readable format: \n")
print_recipe(recipe)


Now let's put it in human-readable format: 

Recipe: Apple Pie

Ingredients:
- Flour: 2 1/4 cups cups
- Cold Butter: 1 cup cups
- Granulated Sugar: 1/2 cup cups
- Salt: 1/4 teaspoon teaspoons
- Ground Cinnamon: 1/2 teaspoon teaspoons
- Ground Nutmeg: 1/4 teaspoon teaspoons
- Eggs: 1 whole
- Apple Filling: 6-8 cups cups

Directions:
1. Preheat oven to 375°F (190°C).
2. Make the crust: In a large bowl, combine flour, salt, and cold butter. Use a pastry blender or your fingers to work the butter into the flour until the mixture resembles coarse crumbs.
3. Add the sugar, cinnamon, and nutmeg to the flour mixture and stir until combined.
4. Gradually add ice-cold water, stirring with a fork until the dough comes together in a ball.
5. Turn the dough out onto a lightly floured surface and knead a few times until it becomes smooth and pliable.
6. Divide the dough in half and shape each half into a disk. Wrap each disk in plastic wrap and refrigerate for at least 30 minutes.
7. Make the fillin
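
Notice that the units appear twice in the ingredient list above ("cups cups"), because the model placed the unit inside `quantity` as well as in `quantity_unit`. One possible workaround, sketched below, is to skip the unit whenever the quantity text already contains it. `format_ingredient` is a hypothetical helper, and its singular/plural check is a deliberately crude heuristic rather than a robust parser.

```python
def format_ingredient(name, quantity, unit=None):
    """Format one ingredient line, omitting the unit when quantity already mentions it."""
    quantity = quantity.strip()
    unit = (unit or "").strip()
    stem = unit.lower().rstrip("s")  # crude singular form: "cups" -> "cup"
    if not unit or (stem and stem in quantity.lower()):
        return f"- {name}: {quantity}"
    return f"- {name}: {quantity} {unit}"

# Values taken from the output above
print(format_ingredient("Flour", "2 1/4 cups", "cups"))
print(format_ingredient("Cold Butter", "1 cup", "cups"))
print(format_ingredient("Eggs", "1", "whole"))
```

A cleaner fix is usually upstream: tightening the system message or the schema so that `quantity` holds only the number and `quantity_unit` holds only the unit.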

---

### **Next Steps: Try It Yourself!**

- **Experiment with Different Prompts**: Try asking for different types of recipes or modifying the system message to adjust the format or detail level.
- **Apply to Other Domains**: Think of other unstructured data types you encounter and consider how you might use an LLM to structure that data effectively.


