## Modularizing the solution ✂️

Currently, our solution does everything at once: a single prompt extracts all the information from the given article.

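For our use case, the current solution may be sufficient.
For more complex use cases, however, a "one prompt to rule them all" approach is generally not the best choice. When a single prompt asks for many things at once, it is harder to tell which part of the output goes wrong and to improve each part on its own.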

In this notebook, we will explore how we can modularize our solution.

We will split our solution into four tasks:

1. Extract general information from the article (e.g. `title`, `summary`)
2. Classify whether the article is business-related (i.e. `is_about_business`)<br><br>
*If business-related*: <br>
3. Identify which businesses or companies are involved
4. Extract information about each business involved (`stock_price_change`, etc.)

In [1]:
# For autoreloading external modules
%load_ext autoreload
%autoreload 2

## Redefining the LLM outputs

In [2]:
from typing import List, Literal
from pydantic import BaseModel, Field

This is what we had previously:

In [3]:
class BusinessSpecificInfo(BaseModel):
    business: str = Field(..., description="The business or company involved")
    stock_price_change: Literal["increase", "decrease", "none"] = Field(
        ..., description="""
        Possible stock price change as a result of the article.
        - "increase" if article speaks positively about the business
        - "decrease" if article speaks negatively about the business
        - "none" if article speaks neutrally about the business
        """
    )
    reason: str = Field(..., description="A single sentence reason for the possible stock price change")
    relevant_substring: str = Field(
        ..., description="A relevant substring from the article supporting the reason (10-20 words)"
    )

class ArticleInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")
    is_about_business: bool = Field(..., description="Whether the article is about business")
    business_info: List[BusinessSpecificInfo]
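Because `stock_price_change` is typed as a `Literal`, Pydantic rejects any value outside the three allowed strings. The minimal sketch below restates the model without its descriptions so that it runs on its own. It shows what happens when an LLM returns a label that is not in the allowed vocabulary. The example values are taken from the Lululemon output further down. The out-of-vocabulary label `"up"` is an illustrative assumption.

```python
from pydantic import BaseModel, ValidationError
from typing import Literal


class BusinessSpecificInfo(BaseModel):
    # Same fields as above, descriptions omitted for brevity
    business: str
    stock_price_change: Literal["increase", "decrease", "none"]
    reason: str
    relevant_substring: str


# A well-formed response parses into a typed object
ok = BusinessSpecificInfo.model_validate(
    {
        "business": "Lululemon Athletica",
        "stock_price_change": "decrease",
        "reason": "The company experienced poor sales in its primary US market.",
        "relevant_substring": "its share price falling almost 50% in the past year",
    }
)
print(ok.stock_price_change)  # decrease

# A label outside the Literal (here "up") is rejected with a ValidationError
try:
    BusinessSpecificInfo.model_validate({**ok.model_dump(), "stock_price_change": "up"})
except ValidationError as err:
    print(err.errors()[0]["loc"])  # ('stock_price_change',)
```

Validation failures like this are a useful signal. Either the prompt or the schema description needs tightening, or the call can be retried.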

Now let's create a separate model for each of our modularized tasks:

In [4]:
class GeneralInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")
    
class BusinessCategory(BaseModel):
    is_about_business: bool = Field(..., description="Whether the article is about business")
    
class BusinessesInvolved(BaseModel):
    businesses: List[str] = Field(
        ..., description="Which main businesses or companies are involved in the article"
    )
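Each smaller model asks the LLM for only a few fields. Structured-output helpers commonly send a model's JSON schema to the LLM. Whether `generate_object` does exactly this is an assumption, since its implementation is not shown here. You can inspect what each task requests with Pydantic's `model_json_schema()`:

```python
from typing import List

from pydantic import BaseModel, Field


class GeneralInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")


class BusinessCategory(BaseModel):
    is_about_business: bool = Field(..., description="Whether the article is about business")


class BusinessesInvolved(BaseModel):
    businesses: List[str] = Field(
        ..., description="Which main businesses or companies are involved in the article"
    )


# Print the required fields each task's schema asks the LLM to fill in
for model in (GeneralInfo, BusinessCategory, BusinessesInvolved):
    schema = model.model_json_schema()
    print(model.__name__, schema["required"])
# GeneralInfo ['title', 'summary']
# BusinessCategory ['is_about_business']
# BusinessesInvolved ['businesses']
```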

## Extracting info in modularized calls

In [5]:
from llmops_training.news_reader.data import get_bbc_news_sample
from llmops_training.news_reader.generation import generate_object
from llmops_training.news_reader.extraction import (
    get_business_category_prompt_template, 
    get_business_specific_prompt_template, 
    get_businesses_involved_prompt_template, 
    get_general_info_prompt_template,
    format_prompt,
)

In [6]:
import markdown

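# Read the article (the file is closed automatically) and convert its Markdown to HTML
with open("article.md", encoding="utf-8") as f:
    article = markdown.markdown(f.read())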

> **Exercise** 📝
>
> - Fill in the TODOs below to extract info from the article in modularized calls. Use the Pydantic models and the functions imported above.

In [9]:
# Task 1: extract general information (title, summary)
general_info_prompt_template = get_general_info_prompt_template()
general_info_prompt = format_prompt(general_info_prompt_template, article)
general_info = generate_object(general_info_prompt, GeneralInfo)

# Task 2: classify whether the article is business-related
business_category_prompt_template = get_business_category_prompt_template()
business_category_prompt = format_prompt(business_category_prompt_template, article)
business_category = generate_object(business_category_prompt, BusinessCategory)

business_info = []
# Tasks 3 and 4 only apply to business-related articles
if business_category.is_about_business:
    # Task 3: identify which businesses are involved
    businesses_involved_prompt_template = get_businesses_involved_prompt_template()
    businesses_involved_prompt = format_prompt(businesses_involved_prompt_template, article)
    businesses_involved = generate_object(businesses_involved_prompt, BusinessesInvolved)

    # Task 4: extract information about each business (one LLM call per business)
    business_info_prompt_template = get_business_specific_prompt_template()
    for business in businesses_involved.businesses:
        business_info_prompt = format_prompt(business_info_prompt_template, article, business)
        business_info.append(generate_object(business_info_prompt, BusinessSpecificInfo))

# Combine the partial results into the original output model
output = ArticleInfo(
    title=general_info.title,
    summary=general_info.summary,
    is_about_business=business_category.is_about_business,
    business_info=business_info,
)
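To check the control flow without calling an LLM, you can swap `generate_object` for a stub. The sketch below is hypothetical: `run_pipeline` and `make_fake_generate` are illustrative names, and the prompts are plain f-strings rather than the package's templates. It follows the "If business-related" condition from the task list, so Tasks 3 and 4 are skipped for non-business articles. Task 4 makes one call per business.

```python
from typing import Any, Callable, Dict, List, Literal, Tuple

from pydantic import BaseModel


# Trimmed copies of the models above (descriptions omitted)
class GeneralInfo(BaseModel):
    title: str
    summary: str

class BusinessCategory(BaseModel):
    is_about_business: bool

class BusinessesInvolved(BaseModel):
    businesses: List[str]

class BusinessSpecificInfo(BaseModel):
    business: str
    stock_price_change: Literal["increase", "decrease", "none"]
    reason: str
    relevant_substring: str

class ArticleInfo(BaseModel):
    title: str
    summary: str
    is_about_business: bool
    business_info: List[BusinessSpecificInfo]


def run_pipeline(article: str, generate: Callable[[str, type], Any]) -> ArticleInfo:
    """Run the four tasks, skipping Tasks 3 and 4 for non-business articles."""
    general = generate(f"Extract title and summary:\n{article}", GeneralInfo)
    category = generate(f"Is this article about business?\n{article}", BusinessCategory)
    business_info: List[BusinessSpecificInfo] = []
    if category.is_about_business:
        involved = generate(f"Which businesses are involved?\n{article}", BusinessesInvolved)
        for business in involved.businesses:
            business_info.append(
                generate(f"Describe {business}:\n{article}", BusinessSpecificInfo)
            )
    return ArticleInfo(
        title=general.title,
        summary=general.summary,
        is_about_business=category.is_about_business,
        business_info=business_info,
    )


def make_fake_generate(is_business: bool) -> Tuple[Callable[[str, type], Any], List[str]]:
    """Return a stub generator with canned answers, plus a log of requested models."""
    calls: List[str] = []
    canned: Dict[type, Dict[str, Any]] = {
        GeneralInfo: {"title": "Example title", "summary": "Example summary."},
        BusinessCategory: {"is_about_business": is_business},
        BusinessesInvolved: {"businesses": ["Company A", "Company B"]},
        # Placeholder: a real LLM would describe the business named in the prompt
        BusinessSpecificInfo: {
            "business": "placeholder",
            "stock_price_change": "none",
            "reason": "Placeholder reason.",
            "relevant_substring": "placeholder substring",
        },
    }

    def fake_generate_object(prompt: str, model: type) -> Any:
        calls.append(model.__name__)
        return model.model_validate(canned[model])

    return fake_generate_object, calls


fake, calls = make_fake_generate(is_business=False)
run_pipeline("Some article text", fake)
print(calls)  # ['GeneralInfo', 'BusinessCategory']
```

With `is_business=True`, the stub would additionally record one `BusinessesInvolved` call and two `BusinessSpecificInfo` calls, one per canned business.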

In [10]:
# %load ../solutions/modularizing-the-solution/modularize.py

Let's see if that worked!

In [11]:
output.model_dump()

{'title': 'Lululemon boss to step down early next year',
 'summary': 'Calvin McDonald, CEO of Lululemon Athletica, will leave the company at the end of January after over seven years, as Lululemon faces declining sales in the US and a significant drop in share price. Amid competitive pressures, tariff challenges, and mixed international performance, the company has upgraded its revenue forecasts following recent strong sales, and has appointed co-interim chief executives while searching for a permanent replacement.',
 'is_about_business': True,
 'business_info': [{'business': 'Lululemon Athletica',
   'stock_price_change': 'decrease',
   'reason': 'The company experienced poor sales in its primary US market and its share price fell nearly 50% over the past year.',
   'relevant_substring': 'poor sales for Lululemon in the US and its share price falling almost 50%'},
  {'business': 'Vuori',
   'stock_price_change': 'none',
   'reason': 'The article neutrally mentions Vuori as one of the 

## Integrate solution in the app

We have already structured the above solution for you as functions in our package.

In [12]:
from llmops_training.news_reader.extraction import (
    extract_article_info,
    extract_info_from_articles,
)

As you can see, a single function from our package does everything we defined above:

In [13]:
result, _ = extract_article_info(article)
result.model_dump()

{'title': 'Lululemon boss to step down early next year',
 'summary': 'Lululemon CEO Calvin McDonald is set to step down at the end of January after over seven years at the helm. The decision comes amid declining sales in its US market and increased competition, despite some positive international performance and upgraded revenue forecasts. Interim co-chief executives have been named as the company seeks new leadership.',
 'is_about_business': True,
 'business_info': [{'business': 'Lululemon Athletica',
   'stock_price_change': 'decrease',
   'reason': "Lululemon's share price dropped due to poor sales in its US market and challenges such as increased tariffs.",
   'relevant_substring': 'its share price falling almost 50% in the past year'},
  {'business': 'Vuori',
   'stock_price_change': 'none',
   'reason': 'Vuori is only mentioned as a competitor without any specific news affecting its stock performance.',
   'relevant_substring': 'competition from lower-priced rivals such as Vuori 

We also have a function that does the same thing, but on multiple articles in one go:

In [14]:
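# Load the BBC news sample. Assumption: get_bbc_news_sample() needs no arguments
# and returns a DataFrame with `article` and `is_business` columns.
data = get_bbc_news_sample()
business_articles = data[data["is_business"]]
article1 = business_articles.iloc[0].article
article2 = business_articles.iloc[1].article

results, _ = extract_info_from_articles([article1, article2])
results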
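We have not reproduced `extract_info_from_articles` here, and the package's version may work differently. For example, it may run sequentially or return extra data as its second value. As a hedged illustration, the sketch below shows how a batch helper *could* wrap a single-article extractor. Each article is independent and the work is dominated by waiting on LLM calls, so a thread pool is a natural fit. `extract_many` and `fake_extract` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

R = TypeVar("R")


def extract_many(
    articles: List[str],
    extract_one: Callable[[str], R],
    max_workers: int = 4,
) -> List[R]:
    """Run a single-article extractor over many articles concurrently.

    ThreadPoolExecutor.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_one, articles))


# Stand-in extractor for illustration; a real one would call the LLM
def fake_extract(article: str) -> dict:
    return {"title": article.split(".")[0], "n_chars": len(article)}


results = extract_many(["First article. Body.", "Second article. Body."], fake_extract)
print([r["title"] for r in results])  # ['First article', 'Second article']
```

Keeping `max_workers` small is a reasonable default when the LLM provider enforces rate limits.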

> **Exercise** 📝
>
> - Import the function `extract_info_from_articles` into the `components.py` file and use it to extract information from the articles.
> - Run the app and see if it works!
>   ```bash
>   uv run streamlit run src/llmops_training/news_reader/app/app.py
>   ```
>
> 💡 Hint:
>
> - Replace the `mock_extract_info_from_articles` function.

---