<a href="https://colab.research.google.com/github/jesusvillota/CSS_DataScience_2025/blob/main/Session3/3_4_LLM_II_Function_Calling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="max-width: 880px; margin: 20px auto 22px; padding: 0px; border-radius: 18px; border: 1px solid #e5e7eb; background: linear-gradient(180deg, #ffffff 0%, #f9fafb 100%); box-shadow: 0 8px 26px rgba(0,0,0,0.06); overflow: hidden;">

  <!-- Banner Header -->
  <div style="padding: 34px 32px 14px; text-align: center; line-height: 1.38;">
    <div style="font-size: 13px; letter-spacing: 0.14em; text-transform: uppercase; color: #6b7280; font-weight: bold; margin-bottom: 5px;">
      Session #3
    </div>
    <div style="font-size: 29px; font-weight: 800; color: #14276c; margin-bottom: 4px;">
      LLMs
    </div>
    <div style="font-size: 26px; font-weight: 800; color: #14276c; margin-bottom: 4px;">
      Part II: Function Calling
    </div>
    <div style="font-size: 16.5px; color: #374151; font-style: italic; margin-bottom: 0;">
      Using Textual Data in Empirical Monetary Economics
    </div>
  </div>

  <!-- Logo Section -->
  <div style="background: none; text-align: center; margin: 30px 0 10px;">
    <img src="https://www.cemfi.es/images/Logo-Azul.png" alt="CEMFI Logo" style="width: 158px; filter: drop-shadow(0 2px 12px rgba(56,84,156,0.05)); margin-bottom: 0;">
  </div>

  <!-- Name -->
  <div style="font-family: 'Times New Roman', Times, serif; color: #38549c; text-align: center; font-size: 1.22em; font-weight: bold; margin-bottom: 0px;">
    Jesus Villota Miranda © 2025
  </div>

  <!-- Contact info -->
  <div style="font-family: 'Times New Roman', Times, serif; color: #38549c; text-align: center; font-size: 1em; margin-top: 7px; margin-bottom: 20px;">
    <a href="mailto:jesus.villota@cemfi.edu.es" style="color: #38549c; text-decoration: none; margin-right:8px;" title="Email">
      <!-- <img src="https://cdn-icons-png.flaticon.com/512/11679/11679732.png" alt="Email" style="width:18px; vertical-align:middle; margin-right:5px;"> -->
      jesus.villota@cemfi.edu.es
    </a>
    <span style="color:#9fa7bd;">|</span>
    <a href="https://www.linkedin.com/in/jesusvillotamiranda/" target="_blank" style="color: #38549c; text-decoration: none; margin-left:7px;" title="LinkedIn">
      <!-- <img src="https://1.bp.blogspot.com/-onvhHUdW1Us/YI52e9j4eKI/AAAAAAAAE4c/6s9wzOpIDYcAo4YmTX1Qg51OlwMFmilFACLcBGAsYHQ/s1600/Logo%2BLinkedin.png" alt="LinkedIn" style="width:17px; vertical-align:middle; margin-right:5px;"> -->
      LinkedIn
    </a>
  </div>
</div>

**IMPORTANT**: **Are you running this notebook in Google Colab?**

- If so, please make sure that in the cell below `running_in_colab` is set to `True`

- And, of course, make sure to **run the cell**!

In [2]:
running_in_colab = False

## What we'll be doing here (big picture)

 

- Goal: show how “function calling” lets an LLM return structured data by invoking your Python functions with validated arguments.

- We’ll build two small applications:

  1) News → firm-level shocks: from a Spanish article, extract affected Spanish listed firms and classify shock type/magnitude/direction.

  2) Constitution articles → camera-implied tasks: decide whether a constitutional article assigns/implies tasks to Congress, Senate, both, or neither.

- Core pattern you’ll see in both:

  1) Define a tool schema (name, description, JSON parameters) for the function you want the model to call.

  2) Ask the LLM with tools enabled; if it decides to call your tool, it provides JSON args matching the schema.

  3) Validate/execute your local Python function; return its result to the model as a tool response.

  4) Make a second model call so the LLM can integrate the tool’s structured output into a final answer.
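
The four steps can be walked through offline, without any API call. In the minimal sketch below, the trimmed-down schema, the `echo_firms` function, and the hard-coded `fake_tool_call` are all hypothetical stand-ins: in a real run the tool call comes from the model's response, and the message format assumes an OpenAI-compatible chat API.

```python
import json

# Step 1: a tool schema (hypothetical, trimmed down for illustration)
tool_schema = {
    "type": "function",
    "function": {
        "name": "echo_firms",
        "description": "Return the list of firms found in a text.",
        "parameters": {
            "type": "object",
            "properties": {
                "firms": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["firms"],
        },
    },
}

def echo_firms(firms):
    """Local Python function that the model is allowed to 'call'."""
    return [{"firm": name} for name in firms]

# Step 2: stand-in for what a model returns when it decides to call the tool.
# The arguments arrive as a JSON *string*, not as a Python dict.
fake_tool_call = {
    "id": "call_0",
    "function": {"name": "echo_firms", "arguments": '{"firms": ["Inditex", "Naturgy"]}'},
}

# Step 3: parse and check the arguments, then execute the local function
available_functions = {"echo_firms": echo_firms}
args = json.loads(fake_tool_call["function"]["arguments"])
required = tool_schema["function"]["parameters"]["required"]
missing = [key for key in required if key not in args]
if missing:
    raise ValueError(f"Missing required arguments: {missing}")
result = available_functions[fake_tool_call["function"]["name"]](**args)

# Step 4: package the result as a tool message for the second model call
tool_message = {
    "role": "tool",
    "tool_call_id": fake_tool_call["id"],
    "name": fake_tool_call["function"]["name"],
    "content": json.dumps(result),
}
print(tool_message["content"])
```

The point to notice is that the model never executes anything itself: it only proposes a function name and JSON arguments, and your code decides whether and how to run them.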

In [3]:
import os
import json
import pandas as pd

if running_in_colab:
    ! pip install groq
    from google.colab import userdata
    api_key = userdata.get('GROQ_API_KEY')
else:
    # `os` is already imported at the top of the cell
    from dotenv import load_dotenv
    load_dotenv()  # load GROQ_API_KEY from a local .env file, if present
    api_key = os.getenv('GROQ_API_KEY')

In [4]:
from groq import Groq
client = Groq(api_key=api_key)
MODEL = 'llama3-70b-8192'

<div style="max-width: 880px; margin: 20px auto 22px; padding: 20px 32px; border-radius: 18px; border: 1px solid #e5e7eb; background: linear-gradient(180deg, #ffffff 0%, #f9fafb 100%); box-shadow: 0 8px 26px rgba(0,0,0,0.06); overflow: hidden; text-align: center;">

  <div style="font-size: 20px; font-weight: 800; color: #14276c; margin-bottom: 8px;">
    Application #1)
  </div>
  <div style="font-size: 18px; font-weight: 700; color: #374151; font-style: italic;">
    Extracting & categorizing news-implied firm-specific shocks
  </div>
  
</div>


This notebook replicates the methodology in [CEMFI Working Paper 2501](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5006857).

In [4]:
def news_parser(firms):
    response = []
    for firm in firms:
        response.append({
            "firm": firm["firm"],
            "ticker": firm.get("ticker", ""),
            "shock_type": firm.get("shock_type", ""),
            "shock_magnitude": firm.get("shock_magnitude", ""),
            "shock_direction": firm.get("shock_direction", ""),
        })
    return response

def run_conversation(user_prompt):
    # Step 1: send the conversation and available functions to the model
    messages = [
        {
            "role": "system",
            "content":  f"""
                            You are a function calling LLM that analyses business news in Spanish. 
                        """
        },
        {
            "role": "user",
            "content": user_prompt,
        }
    ]
    
    tools = [
        {
            "type": "function",
            "function": {
                "name": "news_parser",
                "description": f"""
                                    For every article, you must identify the firms directly affected by the news. Do not include every firm mentioned in the article, only include those that are directly affected by the shocks narrated therein. 
                                    The identified firms must be Spanish and should be publicly listed in the Spanish exchange (their ticker is of the form 'TICKER.MC'). Do not include non-Spanish foreign firms. Do not include Spanish firms that are not publicly traded.
                                    For each identified firm, classify the shocks that affect them (type, magnitude, direction). The type of shock can be 'demand', 'supply', 'financial', 'policy', or 'technology'. The magnitude can be 'minor' or 'major'. The direction can be 'positive' or 'negative'.
                                    If a firm is neutral to the article, do NOT include it in the analysis.
                                """,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "firms": {
                            "type": "array",
                            "description": f"""
                                                List the Spanish firms impacted by the reported news. Such firms must be publicly listed in the Spanish stock exchange and have a stock market ticker of the form TICKER.MC. 
                                                Foreign firms (not listed in the Spanish exchange and whose ticker is not TICKER.MC) are not to be included here. Do not include firms that are mentioned just for contextual comparison but are not directly affected by the events described in the article.
                                                If a firm is neutral to the article, do not include it in the list.
                                                Sometimes the article explicitly mentions the Spanish ticker of the firms that are directly affected (and hence the firms to include here), e.g., Iberdrola (IBE.MC).
                                            """,
                            "items": {
                                "type": "object",
                                "properties": {
                                    "firm": {
                                        "type": "string",
                                        "description": "State the Spanish firm (within the list 'firms') on which the analysis will focus. This firm should be publicly traded in the Spanish exchange with a ticker of the form 'TICKER.MC'.",
                                    },
                                    "ticker": {
                                        "type": "string",
                                        "description": "Specify the stock market ticker of the Spanish firm in Yahoo Finance format (note that Spanish firms' tickers end with '.MC', e.g., ITX.MC for Inditex, ACX.MC for Acerinox, SAN.MC for Banco Santander, NTGY.MC for Naturgy).",
                                    },
                                    "shock_type": {
                                        "type": "string",
                                        "enum": ["demand", "supply", "financial", "policy", "technology"],
                                        "description": "Classify the type of shock implied by the news article. Choose 'demand' for events impacting consumer demand, 'supply' for events affecting the supply of goods or services, 'financial' for events related to financial markets or conditions, 'policy' for events stemming from changes in government policies or regulations, and 'technology' for events resulting from significant technological advancements or disruptions.",
                                    },
                                    "shock_magnitude": {
                                        "type": "string",
                                        "enum": ["minor", "major"],
                                        "description": "How strong do you expect the shock to be: 'minor' or 'major'?",
                                    },
                                    "shock_direction": {
                                        "type": "string",
                                        "enum": ["positive", "negative"],
                                        "description": f"""
                                                        In what direction do you expect the shock to affect this firm? Choose one of the available options: 'positive' or 'negative'.
                                                        Choose 'positive' for beneficial impacts and 'negative' for adverse impacts.
                                                        Do not state 'neutral' here. If the firm is neutral to the article, do not include it in the list of firms.
                                                        """,
                                    },
                                },
                                "required": ["firm"],
                            },
                        },
                    },
                    "required": ["firms"],
                },
            },
        },
    ]

    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        tool_choice="auto",
        max_tokens=4096
    )

    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        
        # Step 3: call the function
        available_functions = {
            "news_parser": news_parser,
        }
        messages.append(response_message)  # extend conversation with assistant's reply
        
        # Step 4: send the info for each function call and function response to the model
        structured_output = []  # collects the results of every tool call
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                firms=function_args.get("firms", [])
            )
            structured_output.extend(function_response)
            messages.append(
                {
                    "role": "tool",  # tool results are tied to the call through its id
                    "tool_call_id": tool_call.id,
                    "name": function_name,
                    "content": json.dumps(function_response),
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model=MODEL,
            messages=messages
        )  # get a new response from the model where it can see the function response

        return second_response.choices[0].message.content, structured_output

    # No tool call: return the plain-text reply and an empty structured output,
    # so that callers can always unpack two values
    return response_message.content, []


user_prompt = f"""
Cellnex tendrá más competencia en Europa.  La filial de Telefónica (TEF.MC) Telxius Telecom ha acordado vender su división de torres de telecomunicaciones en Europa y Latinoamérica a American Tower (AMT), lo cual aumentará la presencia de ésta en Europa e incrementará la competencia para el grupo español de telecomunicaciones inalámbricas Cellnex Telecom (CLNX.MC), señala Equita Sim. La transacción "supone la entrada de un nuevo operador independiente de torres en el mercado español y potencialmente más competencia para el crecimiento futuro también en el mercado europeo", sostiene la correduría. Cellnex llegó a un acuerdo en noviembre con CK Hutchison (0001.HK) para comprar el negocio europeo de torres y sus activos del conglomerado cotizado en Hong Kong. La acción de Telefónica sube un 9,6% a EUR3,94 y la de Cellnex avanza un 0,3% a EUR47,79.
"""

completion_text, structured_output = run_conversation(user_prompt)
print("Completion Text:", completion_text)
print("Structured Output:", structured_output)


Completion Text: The function call has been successful. The output is a list of two dictionaries, each representing a firm mentioned in the news article.

The first dictionary corresponds to Cellnex Telecom, indicating that the firm is expected to experience a negative supply shock of minor magnitude, which means that the company may face increased competition in the European market.

The second dictionary corresponds to Telefónica, indicating that the firm is expected to experience a positive financial shock of minor magnitude, which means that the sale of its tower division to American Tower is likely to have a beneficial impact on its financial performance.

This analysis can be useful for investors, analysts, and other stakeholders to assess the potential impact of this news on the stock prices and business prospects of these companies.
Structured Output: [{'firm': 'Cellnex Telecom', 'ticker': 'CLNX.MC', 'shock_type': 'supply', 'shock_magnitude': 'minor', 'shock_direction': 'negati

In [5]:
structured_output

[{'firm': 'Cellnex Telecom',
  'ticker': 'CLNX.MC',
  'shock_type': 'supply',
  'shock_magnitude': 'minor',
  'shock_direction': 'negative'},
 {'firm': 'Telefónica',
  'ticker': 'TEF.MC',
  'shock_type': 'financial',
  'shock_magnitude': 'minor',
  'shock_direction': 'positive'}]
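
Since the schema marks only `firm` as required, the remaining fields may come back empty or outside the allowed values. A minimal sketch of a post-hoc check follows. `ALLOWED` and `validate_shock` are hypothetical names, the rules simply mirror the enums and the '.MC' ticker convention stated in the tool description, and the second record is made up to show what a failure looks like.

```python
ALLOWED = {
    "shock_type": {"demand", "supply", "financial", "policy", "technology"},
    "shock_magnitude": {"minor", "major"},
    "shock_direction": {"positive", "negative"},
}

def validate_shock(record):
    """Return a list of problems found in one firm-level record (empty if none)."""
    problems = []
    if not record.get("firm"):
        problems.append("missing firm name")
    if not (record.get("ticker") or "").endswith(".MC"):
        problems.append("ticker does not end with '.MC'")
    for field, allowed in ALLOWED.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}={record.get(field)!r} not in {sorted(allowed)}")
    return problems

records = [
    # First record copied from the structured output above
    {"firm": "Cellnex Telecom", "ticker": "CLNX.MC", "shock_type": "supply",
     "shock_magnitude": "minor", "shock_direction": "negative"},
    # Hypothetical malformed record
    {"firm": "American Tower", "ticker": "AMT", "shock_type": "",
     "shock_magnitude": "minor", "shock_direction": "neutral"},
]

for rec in records:
    print(rec["firm"], "->", validate_shock(rec) or "ok")
```

Records that fail the check can be dropped or sent back to the model for correction, depending on how strict the downstream analysis needs to be.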

<div style="max-width: 880px; margin: 20px auto 22px; padding: 20px 32px; border-radius: 18px; border: 1px solid #e5e7eb; background: linear-gradient(180deg, #ffffff 0%, #f9fafb 100%); box-shadow: 0 8px 26px rgba(0,0,0,0.06); overflow: hidden; text-align: center;">

  <div style="font-size: 20px; font-weight: 800; color: #14276c; margin-bottom: 8px;">
    Application #2
  </div>
  <div style="font-size: 18px; font-weight: 700; color: #374151; font-style: italic;">
    Extracting the camera-implied task in each article of Argentina's Constitution (Extra)
  </div>
  
</div>


In [7]:
import PyPDF2
import re

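def extract_text_from_pdf(file_path):
    with open(file_path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        # Concatenate the text of every page; `or ""` guards against pages
        # for which no text can be extracted
        text = "".join(page.extract_text() or "" for page in reader.pages)
    return text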

def split_into_articles(text):
    # Normalize text to lowercase to capture "artículo" consistently
    text = text.lower()

    # Find the start index of "disposiciones transitorias" to ignore it and anything after
    disp_trans_index = text.find("disposiciones transitorias")
    if disp_trans_index != -1:
        text = text[:disp_trans_index]

    # Use regex to split on "artículo X" where X is a number, capturing each article
    # The pattern captures the article header and the content until the next article or end of text
    pattern = re.compile(r"(artículo \d+.*?)(?=artículo \d+|$)", re.DOTALL)

    articles = pattern.findall(text)

    # Strip whitespace from articles
    articles = [article.strip() for article in articles if article.strip()]

    return articles

In [None]:
# Read the constitution (stored in Session3/docs/constitucion.pdf)
file_path = "docs/constitucion.pdf"
full_text = extract_text_from_pdf(file_path)

# Separate the full text (string) into a list of articles (strings)
articles_list = split_into_articles(full_text)
articles_list

['artículo 1.- la nación argentina adopta para su gob ierno la forma representativa republicana \nfederal, según lo establece la presente constitució n.',
 'artículo 2.- el gobierno federal sostiene el culto católico apostólico romano.',
 'artículo 3.- las autoridades que ejercen el gobiern o federal, residen en la ciudad que se \ndeclare capital de la república por una ley especia l del congreso, previa cesión hecha por una o \nmas legislaturas provinciales, del territorio que h aya de federalizarse.',
 'artículo 4.- el gobierno federal provee a los gasto s de la nación con los fondos del tesoro \nnacional formado del producto de derechos de import ación y exportación, del de la venta o \nlocación de tierras de propiedad nacional, de la re nta de correos, de las demás contribuciones \nque equitativa y proporcionalmente a la población i mponga el congreso general, y de los \nempréstitos y operaciones de crédito que decrete el  mismo congreso para urgencias de la \nnación, o para empres
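
The lookahead-based split in `split_into_articles` can be tried on a toy string. The sketch below uses invented text (not the Constitution's wording) to show one caveat: any in-body cross-reference such as "artículo 1" also starts a new piece. The `strict` pattern is a hypothetical variant that only treats "artículo <number>.-" as a header, which assumes every article heading in the PDF follows that format.

```python
import re

toy_text = (
    "Artículo 1.- La Nación adopta la forma representativa.\n"
    "Artículo 2.- Lo dispuesto en el artículo 1 no se altera.\n"
    "Artículo 3.- Las autoridades residen en la capital.\n"
    "Disposiciones transitorias\nPrimera: ..."
).lower()

# Drop the transitional provisions, as split_into_articles does
cut = toy_text.find("disposiciones transitorias")
if cut != -1:
    toy_text = toy_text[:cut]

# Same pattern as in the notebook: any "artículo <number>" starts a new piece
loose = re.compile(r"(artículo \d+.*?)(?=artículo \d+|$)", re.DOTALL)
# Hypothetical stricter variant: a header must look like "artículo <number>.-"
strict = re.compile(r"(artículo \d+\s*\.-.*?)(?=artículo \d+\s*\.-|$)", re.DOTALL)

loose_parts = [p.strip() for p in loose.findall(toy_text) if p.strip()]
strict_parts = [p.strip() for p in strict.findall(toy_text) if p.strip()]

print(len(loose_parts), len(strict_parts))
for part in strict_parts:
    print(repr(part))
```

On this toy input the loose pattern cuts the second article in two at the cross-reference, while the stricter one keeps it whole. It is worth checking `len(articles_list)` against the known number of articles before classifying anything.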

## Function Calling: Camera‑Implied Task Classification (Argentina’s Constitution)

We define a function-calling schema to classify, for each article of Argentina’s Constitution, whether it assigns or implies a task/role to: (a) Congress, (b) Senate, (c) both Congress and Senate, or (d) neither. Here "camera" stands for a legislative chamber (Spanish *cámara*). The tool returns a structured output with the target, whether the attribution is explicit or inferred, an evidence snippet, and a short rationale.
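
Because the model may answer with Spanish or accented variants of the four target labels, they need normalising before analysis. One option, sketched below under stated assumptions, is to strip accents first so that the lookup table needs only one spelling per label. `strip_accents`, `CANONICAL`, and `normalise_target` are hypothetical names, and the table is a small illustrative subset rather than an exhaustive list.

```python
import unicodedata

def strip_accents(text):
    """Remove diacritics, e.g. 'cámara' becomes 'camara'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical lookup table, keyed by accent-free lowercase labels
CANONICAL = {
    "congress": "congress", "congreso": "congress", "congreso de la nacion": "congress",
    "senate": "senate", "senado": "senate", "camara de senadores": "senate",
    "both": "both", "ambas": "both", "ambas camaras": "both",
    "neither": "neither", "ninguna": "neither", "ninguno": "neither",
}

def normalise_target(label):
    """Map a free-text label to one of the four canonical targets."""
    key = strip_accents((label or "").strip().lower())
    # Unknown or empty labels fall back to 'neither'
    return CANONICAL.get(key, "neither")

for raw in ["Congreso de la Nación", " Ambas Cámaras ", "Senado", None]:
    print(repr(raw), "->", normalise_target(raw))
```

Falling back to 'neither' keeps the pipeline running but silently hides unexpected labels, so logging the unmatched values is a reasonable addition when classifying the full text.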


In [9]:
def camera_task_parser(items):
    def _norm_target(val: str) -> str:
        v = (val or "").strip().lower()
        if v in {
            "congress", "congreso", "congreso de la nación", "congreso de la nacion",
            "diputados", "cámara de diputados", "camara de diputados",
        }:
            return "congress"
        if v in {
            "senate", "senado", "cámara de senadores", "camara de senadores",
        }:
            return "senate"
        if v in {
            "both", "ambas", "ambos", "ambas cámaras", "ambas camaras",
            "congress and senate", "congreso y senado", "senado y congreso",
            "ambas cámaras del congreso", "ambas camaras del congreso",
        }:
            return "both"
        if v in {"neither", "ninguna", "ninguno", "none", "no aplica"}:
            return "neither"
        return "neither"

    def _norm_eoi(val: str) -> str:
        v = (val or "").strip().lower()
        if v in {"explicit", "explícita", "explicita", "expreso", "expresa"}:
            return "explicit"
        return "inferred"

    parsed = []
    for it in items:
        tgt = _norm_target(it.get("target"))
        eoi = _norm_eoi(it.get("explicit_or_inferred"))
        evi = (it.get("evidence", "") or "").strip()
        if len(evi) > 240:
            evi = evi[:240]
        rat = (it.get("rationale", "") or "").strip()
        parsed.append({
            "article_id": it.get("article_id", ""),
            "target": tgt,
            "explicit_or_inferred": eoi,
            "evidence": evi,
            "rationale": rat,
        })
    return parsed


def run_camera_task_conversation(article_text, article_id=None, model=None):
    # Resolve model lazily to avoid NameError if MODEL isn't defined yet
    if model is None:
        model = globals().get("MODEL", "llama3-70b-8192")

    # Ensure Groq client exists (handles fresh kernels)
    cli = globals().get("client")
    if cli is None:
        try:
            from groq import Groq
        except Exception as e:
            raise RuntimeError("Groq SDK is not available; please run the Groq setup cell.") from e
        api = globals().get("api_key")
        if not api:
            raise RuntimeError("Groq client is not initialized and GROQ_API_KEY is missing.")
        globals()["client"] = Groq(api_key=api)
        cli = globals()["client"]

    # System and user prompts geared to constitutional analysis in Spanish.
    messages = [
        {
            "role": "system",
            "content": (
                "Eres un LLM experto en derecho constitucional argentino. "
                "Analizas artículos de la Constitución Nacional de Argentina y determinas si el artículo asigna o implica una tarea/rol a las cámaras legislativas: Congreso, Senado, ambas, o ninguna. "
                "Si la referencia es indirecta pero razonablemente deducible (p. ej., menciones a 'Congreso' o 'Senado', o a funciones propias como sanción de leyes, juicio político, acuerdos, etc.), clasifícala como 'inferred'. "
                "Cuando el texto menciona explícitamente 'Congreso' o 'Senado' con una atribución específica, clasifícala como 'explicit'."
            ),
        },
        {
            "role": "user",
            "content": (
                f"Analiza el siguiente artículo y clasifica la tarea implicada en relación con las cámaras.\n\n"
                f"Artículo ID: {article_id or ''}\n"
                f"Texto: {article_text}"
            ),
        },
    ]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "camera_task_parser",
                "description": (
                    "Dado el texto de un artículo constitucional argentino, devuelve una lista con una única entrada que clasifica si el artículo asigna/implica una tarea a: 'congress', 'senate', 'both', o 'neither'. "
                    "Incluye si la atribución es 'explicit' o 'inferred', cita una evidencia breve del texto y explica brevemente el criterio."
                ),
                "parameters": {
                    "type": "object",
                    "properties": {
                        "items": {
                            "type": "array",
                            "description": (
                                "Una lista con una única clasificación para el artículo analizado."
                            ),
                            "items": {
                                "type": "object",
                                "properties": {
                                    "article_id": {
                                        "type": "string",
                                        "description": "Identificador del artículo (p. ej., 'Artículo 1', o índice numérico).",
                                    },
                                    "target": {
                                        "type": "string",
                                        "description": (
                                            "A qué cámara(s) asigna o implica la tarea: 'congress' (Congreso de la Nación), 'senate' (Senado), 'both' (ambas cámaras), o 'neither' (ninguna). Es obligatorio elegir una de estas cuatro."
                                        ),
                                    },
                                    "explicit_or_inferred": {
                                        "type": "string",
                                        "description": (
                                            "Especifica si la atribución es 'explicit' (mencionada expresamente) o 'inferred' (deducida razonablemente)."
                                        ),
                                    },
                                    "evidence": {
                                        "type": "string",
                                        "description": "Fragmento breve del artículo que respalda la clasificación (máx. ~240 caracteres).",
                                    },
                                    "rationale": {
                                        "type": "string",
                                        "description": (
                                            "Explicación corta (1-3 frases) de por qué el artículo implica la tarea para esa(s) cámara(s)."
                                        ),
                                    },
                                },
                                # No required fields inside to avoid tool validation failures; defaults handled in parser
                            },
                        },
                    },
                    "required": ["items"],
                },
            },
        }
    ]

    def _call(messages):
        return cli.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_tokens=2048,
        )

    try:
        response = _call(messages)
    except Exception:
        # Retry once with stricter instruction to avoid empty fields
        messages.insert(1, {
            "role": "system",
            "content": (
                "Usa la herramienta 'camera_task_parser' y rellena todos los campos. "
                "Elige exactamente uno en target: congress | senate | both | neither. "
                "No dejes valores vacíos; si dudas, usa 'neither' y marca 'inferred'."
            )
        })
        response = _call(messages)

    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls

    if tool_calls:
        available_functions = {"camera_task_parser": camera_task_parser}
        # Extend messages with assistant's function call
        messages.append(response_message)
        structured = []  # collects the results of every tool call
        for tc in tool_calls:
            fn_name = tc.function.name
            fn = available_functions[fn_name]
            fn_args = json.loads(tc.function.arguments)
            fn_response = fn(items=fn_args.get("items", []))
            structured.extend(fn_response)
            messages.append({
                "role": "tool",  # tool results are tied to the call through its id
                "tool_call_id": tc.id,
                "name": fn_name,
                "content": json.dumps(fn_response),
            })
        second = cli.chat.completions.create(model=model, messages=messages)
        return second.choices[0].message.content, structured

    # If no tool call, fall back to plain text
    return response_message.content, []


# Convenience: classify all parsed articles

def classify_all_camera_tasks(articles):
    results = []
    for idx, art in enumerate(articles, start=1):
        text = art.strip()
        if not text:
            continue
        completion, structured = run_camera_task_conversation(text, article_id=f"Artículo {idx}")
        results.append({
            "article_id": f"Artículo {idx}",
            "completion": completion,
            "structured": structured,
        })
    return results
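
Classifying every article means many consecutive API calls, and a hosted API may occasionally reject a request, for example when a rate limit is hit. A minimal sketch of a generic retry helper with exponential backoff follows. `with_retries` and `flaky_call` are hypothetical names, and in practice one would catch the SDK's specific exception types rather than a bare `Exception`.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Hypothetical stand-in for an API call that fails twice before succeeding
calls = {"n": 0}

def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []
# Record the delays instead of actually sleeping, to keep the demo instant
outcome = with_retries(flaky_call, sleep=delays.append)
print(outcome, delays)
```

Inside the loop of `classify_all_camera_tasks`, the call could then be wrapped as `with_retries(lambda: run_camera_task_conversation(text, article_id=f"Artículo {idx}"))`.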

In [None]:
# Quick smoke test on first articles (optional)
preview = articles_list[:5] if isinstance(articles_list, list) else []
results = classify_all_camera_tasks(preview)

# Normalize to a flat DataFrame for inspection
rows = []
for r in results:
    structured = r.get("structured") or []
    if structured and isinstance(structured, list):
        for item in structured:
            # Each tool return from camera_task_parser is a list, so unpack
            if isinstance(item, list):
                for sub in item:
                    rows.append({
                        "article_id": sub.get("article_id", r.get("article_id")),
                        "target": sub.get("target"),
                        "explicit_or_inferred": sub.get("explicit_or_inferred"),
                        "evidence": sub.get("evidence"),
                        "rationale": sub.get("rationale"),
                    })
            elif isinstance(item, dict):
                rows.append({
                    "article_id": item.get("article_id", r.get("article_id")),
                    "target": item.get("target"),
                    "explicit_or_inferred": item.get("explicit_or_inferred"),
                    "evidence": item.get("evidence"),
                    "rationale": item.get("rationale"),
                })
    else:
        rows.append({
            "article_id": r.get("article_id"),
            "target": None,
            "explicit_or_inferred": None,
            "evidence": None,
            "rationale": r.get("completion"),
        })

df_camera_tasks = pd.DataFrame(rows)
df_camera_tasks.head()

| | article_id | target | explicit_or_inferred | evidence | rationale |
|---|---|---|---|---|---|
| 0 | Artículo 1 | neither | inferred | Adopta para su gobierno la forma representativ... | El artículo establece la forma de gobierno, lo... |
| 1 | Artículo 2 | neither | inferred | El gobierno federal sostiene el culto católico... | No se menciona explícitamente la intervención ... |
| 2 | Artículo 3 | congress | explicit | por una ley especial del Congreso | El artículo especifica que una ley especial de... |
| 3 | Artículo 4 | congress | explicit | del congreso general, y de los empréstitos y o... | El artículo menciona explícitamente al Congres... |
| 4 | Artículo 5 | congress | inferred | dictará para sí una constitución | La provincia dictará una constitución que aseg... |
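
Once the classifications are in tabular form, a natural first check is to tally them. The sketch below does this with the standard library on the five preview rows shown above (only the `article_id`, `target`, and `explicit_or_inferred` columns are copied). With the full DataFrame, the equivalent tally would be `df_camera_tasks["target"].value_counts()`.

```python
from collections import Counter

# Rows copied from the preview above: (article_id, target, explicit_or_inferred)
rows = [
    ("Artículo 1", "neither", "inferred"),
    ("Artículo 2", "neither", "inferred"),
    ("Artículo 3", "congress", "explicit"),
    ("Artículo 4", "congress", "explicit"),
    ("Artículo 5", "congress", "inferred"),
]

# How many articles point to each chamber
by_target = Counter(target for _, target, _ in rows)
# Cross-tabulation of target and explicit/inferred
by_pair = Counter((target, eoi) for _, target, eoi in rows)

print(by_target)
print(by_pair)
```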
