## Modularizing the solution ✂️

Currently, our solution does everything at once: a single prompt extracts all the information from the given article.

For our use case, the current solution may be sufficient.
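However, for more complex use cases, a "one prompt to rule them all" approach generally does not scale well, because a single prompt that handles every task is harder to test, debug, and improve in isolation.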

In this notebook, we will explore how we can modularize our solution.

We will split our solution into four tasks:

1. Extract general information from the article (e.g. `title`, `summary`)
2. Classify whether the article is business-related (i.e. `is_about_business`)<br><br>
*If business-related*: <br>
3. Identify which businesses or companies are involved
4. Extract information about each business involved (`stock_price_change`, etc.)
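
The conditional structure of these four tasks can be sketched as plain control flow. The sketch below is illustrative only: `run_pipeline` and the four callables passed to it are hypothetical stand-ins for LLM calls, not functions from our package.

```python
from typing import Callable, Dict, List


def run_pipeline(
    article: str,
    extract_general: Callable[[str], Dict[str, str]],
    classify_business: Callable[[str], bool],
    find_businesses: Callable[[str], List[str]],
    extract_business: Callable[[str, str], Dict[str, str]],
) -> dict:
    """Always run tasks 1 and 2; run tasks 3 and 4 only for business articles."""
    result = dict(extract_general(article))  # Task 1
    result["is_about_business"] = classify_business(article)  # Task 2
    result["business_info"] = []
    if result["is_about_business"]:
        for business in find_businesses(article):  # Task 3
            # Task 4: one call per business found in task 3
            result["business_info"].append(extract_business(article, business))
    return result


# Hypothetical stubs standing in for the LLM calls.
demo = run_pipeline(
    "ACME shares rose after strong earnings.",
    extract_general=lambda a: {"title": "ACME earnings", "summary": a},
    classify_business=lambda a: "shares" in a,
    find_businesses=lambda a: ["ACME"],
    extract_business=lambda a, b: {"business": b, "stock_price_change": "increase"},
)
print(demo["business_info"])
```

Because tasks 3 and 4 sit behind the result of task 2, a non-business article costs only two calls instead of four or more.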

In [1]:
# For autoreloading external modules
%load_ext autoreload
%autoreload 2

## Redefining the LLM outputs

In [2]:
from typing import List, Literal
from pydantic import BaseModel, Field

This is what we had previously:

In [4]:
class BusinessSpecificInfo(BaseModel):
    business: str = Field(..., description="The business or company involved")
    stock_price_change: Literal["increase", "decrease", "none"] = Field(
        ..., description="""
        Possible stock price change as result of the article. 
        - "increase" if article speaks positively about the business
        - "decrease" if article speaks negatively about the business
        - "none" if article speaks neutrally about the business
        """
    )
    reason: str = Field(..., description="A single sentence reason for the possible stock price change")
    relevant_substring: str = Field(
        ..., description="A relevant substring from the article supporting the reason (10-20 words)"
    )

class ArticleInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")
    is_about_business: bool = Field(..., description="Whether the article is about business")
    business_info: List[BusinessSpecificInfo]

Now let's create a separate model for each of our modularized tasks:

In [5]:
class GeneralInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")
    
class BusinessCategory(BaseModel):
    is_about_business: bool = Field(..., description="Whether the article is about business")
    
class BusinessesInvolved(BaseModel):
    businesses: List[str] = Field(
        ..., description="Which main businesses or companies are involved in the article"
    )
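
One way to check what each smaller model asks for is to inspect its JSON schema. The sketch below assumes Pydantic v2 (consistent with the `model_dump()` call used later in this notebook) and re-declares two of the models so that it runs on its own. Whether `generate_object` passes exactly this schema to the LLM is an assumption.

```python
from typing import List

from pydantic import BaseModel, Field


class GeneralInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")


class BusinessesInvolved(BaseModel):
    businesses: List[str] = Field(
        ..., description="Which main businesses or companies are involved in the article"
    )


# Each schema lists only the fields that belong to its own task.
general_schema = GeneralInfo.model_json_schema()
businesses_schema = BusinessesInvolved.model_json_schema()

print(general_schema["required"])
print(businesses_schema["properties"]["businesses"]["type"])
```

Keeping each schema this small means every call has a narrow, easily validated output.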

## Extracting info in modularized calls

In [6]:
from llmops_training.news_reader.data import get_bbc_news_sample
from llmops_training.news_reader.generation import generate_object
from llmops_training.news_reader.extraction import (
    get_business_category_prompt_template, 
    get_business_specific_prompt_template, 
    get_businesses_involved_prompt_template, 
    get_general_info_prompt_template,
    format_prompt,
)

In [7]:
article = """
The US firm behind the Roomba smart vacuum cleaner, iRobot, has filed for bankruptcy protection after facing competition from Chinese rivals and being hit by tariffs.

Under the so-called pre-packaged Chapter 11 process, the main manufacturer of its devices, Shenzhen-based Picea Robotics, will take ownership of the firm.

The tough commercial landscape had forced iRobot to cut its prices and make major investments in new technology, according to documents filed on Sunday.

US import duties of 46% on goods from Vietnam, where most of iRobot's devices for the American market are made, increased its costs by $23m (£17.2m) this year, the firm said.

The loss-making company was valued at $3.56bn in 2021 after the pandemic helped to drive strong demand for its products. It is now valued at around $140m.

On Friday, iRobot's shares fell by more than 13% on the technology-heavy Nasdaq trading platform in New York.

iRobot said the bankruptcy filing was not expected to disrupt its app, supply chains or product support.

Founded in 1990 by three members of the Massachusetts Institute of Technology's (MIT) Artificial Intelligence Lab, iRobot initially focused on defence and space technology before launching the Roomba in 2002.

The Roomba holds about 42% of the US market share and 65% of the Japanese market share for robotic vacuum cleaners, according to the company.

Last year, a planned $1.7bn takeover deal by online retail giant Amazon was derailed by the European Union's competition watchdog.

Trade tariffs imposed by US Donald Trump on goods entering America from overseas has added to costs to many businesses, including iRobot, which rely on imports for product manufacturing.

Trump has argued that the import taxes will boost American jobs and industry.

Picea is a manufacturer of robotic vacuum cleaners, with research and development and production facilities in China and Vietnam.

It has more than 7,000 employees worldwide and has sold more than 20 million robotic vacuum cleaners.
"""

> **Exercise** 📝
>
> - Fill in the TODOs below to extract info from the article in modularized calls. Use the Pydantic models and the imported functions from above.

In [15]:
# Task 1
general_info_prompt_template = get_general_info_prompt_template()
general_info_prompt = format_prompt(general_info_prompt_template, article)
general_info = generate_object(general_info_prompt, GeneralInfo)

# Task 2
business_category_prompt_template = get_business_category_prompt_template()
business_category_prompt = format_prompt(business_category_prompt_template, article)
business_category = generate_object(business_category_prompt, BusinessCategory)

# Tasks 3 and 4 only apply to business-related articles
business_info = []
if business_category.is_about_business:
    # Task 3
    businesses_involved_prompt_template = get_businesses_involved_prompt_template()
    businesses_involved_prompt = format_prompt(businesses_involved_prompt_template, article)
    businesses_involved = generate_object(businesses_involved_prompt, BusinessesInvolved)

    # Task 4: one call per business, reusing the same prompt template
    business_info_prompt_template = get_business_specific_prompt_template()
    for business in businesses_involved.businesses:
        prompt = format_prompt(business_info_prompt_template, article, business)
        business_info.append(generate_object(prompt, BusinessSpecificInfo))

output = ArticleInfo(
    title=general_info.title,
    summary=general_info.summary,
    is_about_business=business_category.is_about_business,
    business_info=business_info,
)

Let's see if that worked!

In [17]:
output.model_dump()

{'title': 'iRobot Files for Bankruptcy Protection Amidst Rising Competition and Tariffs',
 'summary': 'US firm iRobot, the maker of the Roomba smart vacuum cleaner, has filed for bankruptcy protection after facing stiff competition from Chinese rivals and rising costs due to US tariffs, with Shenzhen-based Picea Robotics set to take over the firm in a pre-packaged Chapter 11 process.',
 'is_about_business': True,
 'business_info': [{'business': 'iRobot',
   'stock_price_change': 'decrease',
   'reason': 'Bankruptcy filings, tariffs, and fierce Chinese competition have led to a costly decline resulting in a significant drop in share prices.',
   'relevant_substring': "iRobot's shares fell by more than 13% on the technology-heavy Nasdaq trading platform in New York."},
  {'business': 'Picea Robotics',
   'stock_price_change': 'none',
   'reason': "The article does not provide any information about a change in Picea Robotics' stock price.",
   'relevant_substring': 'Under the so-called pr

## Integrate solution in the app

We have already structured the above solution for you as functions in our package.

In [18]:
from llmops_training.news_reader.extraction import (
    extract_article_info,
    extract_info_from_articles,
)

As you can see, a single function from our package does everything that we defined above:

In [19]:
result, _ = extract_article_info(article)
result.model_dump()

{'title': 'iRobot Files for Bankruptcy Amid Tariff Pressures and Chinese Competition',
 'summary': 'iRobot, the US firm known for its Roomba smart vacuum cleaner, has filed for bankruptcy protection under a pre-packaged Chapter 11 process. The company faces intense competition from Chinese rivals, escalating costs from US import tariffs, and a dramatic decline in valuation from $3.56bn to around $140m. Shenzhen-based Picea Robotics is set to take ownership as iRobot struggles with price cuts and heavy investments in new technology, while its operations remain largely unaffected.',
 'is_about_business': True,
 'business_info': [{'business': 'iRobot',
   'stock_price_change': 'decrease',
   'reason': 'The article highlights a steep share drop due to bankruptcy filing, cost increases from tariffs, and stiff competition.',
   'relevant_substring': "iRobot's shares fell by more than 13% on the technology-heavy Nasdaq trading platform in New York."},
  {'business': 'Picea Robotics',
   'st

We also have a function that does the same thing, but on multiple articles in one go:

In [None]:
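# `data` is not defined in this notebook. It is assumed to be the BBC news sample as a
# DataFrame with `is_business` and `article` columns (see `get_bbc_news_sample`, imported above).
business_articles = data[data["is_business"]]
article1 = business_articles.iloc[0].article
article2 = business_articles.iloc[1].article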

results, _ = extract_info_from_articles([article1, article2])
results
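
How `extract_info_from_articles` processes the list internally is not shown here. As a rough sketch of one possible approach, independent per-article extractions could run concurrently in a thread pool. `extract_many` and `fake_extract` below are hypothetical names, and `fake_extract` merely stands in for a per-article call such as `extract_article_info`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def extract_many(
    articles: List[str],
    extract_one: Callable[[str], T],
    max_workers: int = 4,
) -> List[T]:
    """Apply extract_one to every article concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(extract_one, articles))


def fake_extract(article: str) -> dict:
    # Hypothetical stand-in for an LLM-backed extraction call.
    return {"title": article.split(".")[0], "n_chars": len(article)}


batch = extract_many(["First article. More text.", "Second article."], fake_extract)
print([item["title"] for item in batch])
```

Threads suit this kind of workload because LLM calls spend most of their time waiting on the network rather than computing.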

> **Exercise** 📝
>
> - Import the function `extract_info_from_articles` into the `components.py` file and use it to extract information from the articles.
> - Run the app and see if it works!
>   ```bash
>   uv run streamlit run src/llmops_training/news_reader/app/app.py
>   ```
>
> 💡 Hint:
>
> - Replace the `mock_extract_info_from_articles` function.

---