## Modularizing the solution ✂️

Currently, our solution does everything at once: a single prompt extracts all information from the given article.

For our use case, the current solution may be sufficient.
For more complex use cases, however, a "one prompt to rule them all" approach usually becomes harder to test and improve, since a change to the prompt can affect every extracted field.

In this notebook, we will explore how to modularize our solution.

We will split our solution into four tasks:

1. Extract general information from the article (e.g. `title`, `summary`)
2. Classify whether the article is business-related (i.e. `is_about_business`)<br><br>
*If business-related*: <br>
3. Identify which businesses or companies are involved
4. Extract information about each business involved (`stock_price_change`, etc.)
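
The control flow of these four tasks can be sketched as follows. This is a minimal illustration, not the package's implementation: `run_pipeline`, `ArticleResult`, and the `fake_*` functions are hypothetical stand-ins for the LLM calls, so that the sketch runs offline. The point is the conditional step: tasks 3 and 4 only run when task 2 classifies the article as business-related.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ArticleResult:
    title: str
    summary: str
    is_about_business: bool
    business_info: List[Dict[str, str]] = field(default_factory=list)


def run_pipeline(
    article: str,
    extract_general: Callable[[str], Dict[str, str]],
    classify_business: Callable[[str], bool],
    find_businesses: Callable[[str], List[str]],
    extract_business_info: Callable[[str, str], Dict[str, str]],
) -> ArticleResult:
    """Always run tasks 1 and 2; run tasks 3 and 4 only for business articles."""
    general = extract_general(article)        # Task 1
    is_business = classify_business(article)  # Task 2

    business_info: List[Dict[str, str]] = []
    if is_business:
        for business in find_businesses(article):  # Task 3
            # Task 4: one call per business found in task 3
            business_info.append(extract_business_info(article, business))

    return ArticleResult(
        title=general["title"],
        summary=general["summary"],
        is_about_business=is_business,
        business_info=business_info,
    )


# Hypothetical stand-ins for the four LLM calls (assumptions, not package code).
def fake_general(article: str) -> Dict[str, str]:
    headline = article.strip().splitlines()[0]
    return {"title": headline, "summary": headline}


def fake_classifier(article: str) -> bool:
    return "shares" in article.lower()


def fake_find_businesses(article: str) -> List[str]:
    return ["Acme"] if "Acme" in article else []


def fake_business_info(article: str, business: str) -> Dict[str, str]:
    return {"business": business, "stock_price_change": "none"}


fakes = (fake_general, fake_classifier, fake_find_businesses, fake_business_info)
business_result = run_pipeline("Acme shares jump\nAcme reported record profits.", *fakes)
other_result = run_pipeline("Local cat wins show\nAcme sponsored the event.", *fakes)

print(business_result.business_info)
print(other_result.business_info)
```

In the second call the article mentions a company, but because the classifier says it is not business-related, the per-business calls are skipped entirely.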

In [1]:
# For autoreloading external modules
%load_ext autoreload
%autoreload 2

## Redefining the LLM outputs

In [2]:
from typing import List, Literal
from pydantic import BaseModel, Field

This is what we had previously:

In [3]:
class BusinessSpecificInfo(BaseModel):
    business: str = Field(..., description="The business or company involved")
    stock_price_change: Literal["increase", "decrease", "none"] = Field(
        ..., description="""
        Possible stock price change as result of the article. 
        - "increase" if article speaks positively about the business
        - "decrease" if article speaks negatively about the business
        - "none" if article speaks neutrally about the business
        """
    )
    reason: str = Field(..., description="A single sentence reason for the possible stock price change")
    relevant_substring: str = Field(
        ..., description="A relevant substring from the article supporting the reason (10-20 words)"
    )

class ArticleInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")
    is_about_business: bool = Field(..., description="Whether the article is about business")
    business_info: List[BusinessSpecificInfo]

Now let's create separate models for each of our modularized tasks:

In [4]:
class GeneralInfo(BaseModel):
    title: str = Field(..., description="The title of the article")
    summary: str = Field(..., description="A single sentence summary of the article")
    
class BusinessCategory(BaseModel):
    is_about_business: bool = Field(..., description="Whether the article is about business")
    
class BusinessesInvolved(BaseModel):
    businesses: List[str] = Field(
        ..., description="Which main businesses or companies are involved in the article"
    )
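
To see what one of these small models describes and enforces, it can help to inspect it in isolation. The sketch below assumes Pydantic v2 and redefines `BusinessesInvolved` so that it runs on its own. How `generate_object` uses the model internally is not shown in this notebook, so this only illustrates the model itself: its JSON schema, and what happens when a response does or does not match it.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class BusinessesInvolved(BaseModel):
    businesses: List[str] = Field(
        ..., description="Which main businesses or companies are involved in the article"
    )


# The JSON schema carries the field type and the description we wrote.
schema = BusinessesInvolved.model_json_schema()
print(schema["properties"]["businesses"])

# A well-formed JSON response parses into a typed object.
parsed = BusinessesInvolved.model_validate_json('{"businesses": ["NHS", "Bank of England"]}')
print(parsed.businesses)

# A malformed response (a string instead of a list) raises a ValidationError
# instead of silently passing bad data to the next task.
try:
    BusinessesInvolved.model_validate_json('{"businesses": "NHS"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Because each task has its own small model, a validation failure points to one specific step rather than to the whole extraction.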

## Extracting info in modularized calls

In [5]:
from llmops_training.news_reader.data import get_bbc_news_sample
from llmops_training.news_reader.generation import generate_object
from llmops_training.news_reader.extraction import (
    get_business_category_prompt_template, 
    get_business_specific_prompt_template, 
    get_businesses_involved_prompt_template, 
    get_general_info_prompt_template,
    format_prompt,
)

In [6]:
article = """

UK economy shrinks again in April as strikes hit output
The UK economy shrank for a second month in April as strikes hit output in key sectors including transport and health.
Gross domestic product (GDP) fell by 0.1% in April, the Office for National Statistics (ONS) said.
The fall follows a 0.3% contraction in March, which was the largest monthly drop since January 2021.
Strikes by workers in sectors including transport, health and education disrupted services in April.
The ONS said the biggest drag on GDP came from the transport sector, which shrank by 2.5% as rail strikes hit services.
The health sector also contracted by 1.3% due to strikes by NHS workers.
Overall, the services sector, which makes up around 80% of the UK economy, fell by 0.2% in April.
The construction sector, however, grew by 0.5%, while the manufacturing sector was flat.
Economists had expected the economy to remain unchanged in April.
The Bank of England has warned that the UK economy is likely to enter a recession later this year, with high inflation and rising interest rates weighing on growth.
The latest GDP figures suggest that the economy is already under significant pressure.
However, some economists believe that the impact of the strikes may be temporary and that the economy could rebound in the coming months.
The government has said it is committed to resolving the disputes with striking workers and ensuring that public services are restored as soon as possible.
"""  # A sample article about the UK economy

> **Exercise** 📝
>
> - Fill in the TODOs below to extract info from the article in modularized calls. Use the Pydantic models and the imported functions from above.

In [8]:
# Task 1
general_info_prompt_template = get_general_info_prompt_template()
general_info_prompt = format_prompt(general_info_prompt_template, article)
general_info = generate_object(general_info_prompt, GeneralInfo)

# Task 2
business_category_prompt_template = get_business_category_prompt_template()
business_category_prompt = format_prompt(business_category_prompt_template, article)
business_category = generate_object(business_category_prompt, BusinessCategory)

# Task 3
businesses_involved_prompt_template = get_businesses_involved_prompt_template()
businesses_involved_prompt = format_prompt(businesses_involved_prompt_template, article)
businesses_involved = generate_object(businesses_involved_prompt, BusinessesInvolved)

# Task 4
business_info_prompt_template = get_business_specific_prompt_template()  # same template for every business
business_info = []
for business in businesses_involved.businesses:
    prompt = format_prompt(business_info_prompt_template, article, business=business)
    business_info.append(generate_object(prompt, BusinessSpecificInfo))

output = ArticleInfo(
    title=general_info.title,
    summary=general_info.summary,
    is_about_business=business_category.is_about_business,
    business_info=business_info,
)
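
Each task above follows the same three steps: get a template, format the prompt, generate an object. One possible way to factor out that repetition is sketched below. `run_task` is a hypothetical helper, not part of the package, and the `fake_*` functions and the `Echo` class are stand-ins (with assumed signatures mirroring the calls above) so that the example runs without the package or an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def run_task(
    get_template: Callable[[], str],
    model: Type[T],
    article: str,
    format_prompt: Callable[..., str],
    generate_object: Callable[[str, Type[T]], T],
    **template_vars: str,
) -> T:
    """Fetch a template, fill it in, and request an object of type `model`."""
    prompt = format_prompt(get_template(), article, **template_vars)
    return generate_object(prompt, model)


# Hypothetical stand-ins (assumptions): the real functions live in the package.
@dataclass
class Echo:
    prompt: str


def fake_template() -> str:
    return "Business: {business}\nArticle: {article}"


def fake_format_prompt(template: str, article: str, **template_vars: str) -> str:
    return template.format(article=article, **template_vars)


def fake_generate_object(prompt: str, model: Type[T]) -> T:
    # A real implementation would call an LLM; here we just wrap the prompt.
    return model(prompt)


echo = run_task(
    fake_template,
    Echo,
    "GDP fell by 0.1%.",
    fake_format_prompt,
    fake_generate_object,
    business="NHS",
)
print(echo.prompt)
```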

In [None]:
# %load ../solutions/modularizing-the-solution/modularize.py

Let's see if that worked!

In [9]:
output.model_dump()

{'title': 'UK economy shrinks again in April as strikes hit output',
 'summary': 'The UK economy contracted by 0.1% in April due to significant disruptions in key sectors, notably a 2.5% decline in transport amid rail strikes and a 1.3% drop in health services, following a 0.3% contraction in March.',
 'is_about_business': True,
 'business_info': [{'business': 'Office for National Statistics',
   'stock_price_change': 'none',
   'reason': 'The article merely reports that ONS provided GDP data, without indicating any change or performance impact on the business.',
   'relevant_substring': 'GDP fell by 0.1% in April, the Office for National Statistics (ONS) said.'},
  {'business': 'NHS',
   'stock_price_change': 'decrease',
   'reason': 'Strikes by NHS workers led to a contraction of the health sector by 1.3%, indicating negative performance.',
   'relevant_substring': 'the health sector also contracted by 1.3% due to strikes by NHS workers'},
  {'business': 'Bank of England',
   'stock_

## Integrate solution in the app

We have already structured the above solution for you as functions in our package.

In [10]:
from llmops_training.news_reader.extraction import (
    extract_article_info,
    extract_info_from_articles,
)

As you can see, a single function from our package does everything that we defined above:

In [11]:
result, _ = extract_article_info(article)
result.model_dump()

{'title': 'UK economy shrinks again in April as strikes hit output',
 'summary': 'The UK economy contracted in April, with a 0.1% drop in GDP driven by significant strikes in key sectors such as transport and health. The transport sector fell by 2.5%, and the health sector by 1.3%, amid disruptions. This follows a 0.3% shrinkage in March, and while the construction sector grew by 0.5%, the overall services sector, which dominates the economy, fell by 0.2%. Economists had expected stability, but the Bank of England warns of a recession potential, though some believe these disruptions may be temporary.',
 'is_about_business': True,
 'business_info': [{'business': 'Office for National Statistics (ONS)',
   'stock_price_change': 'none',
   'reason': "The article cites ONS's report on a UK GDP contraction with no direct impact on its stock performance.",
   'relevant_substring': 'Gross domestic product (GDP) fell by 0.1% in April, the Office for National Statistics (ONS) said.'},
  {'busine

We also have a function that does the same thing, but on multiple articles in one go:

In [13]:
article1 = article
article2 = article

results, _ = extract_info_from_articles([article1, article2])
results
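
How `extract_info_from_articles` is implemented is not shown in this notebook. As a rough sketch of what a batch helper could look like, the example below applies a per-article function to a list of articles using a thread pool, which tends to suit I/O-bound work such as LLM API calls. `extract_many` and `fake_extract` are hypothetical names, and unlike the package function this sketch returns only the list of results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

R = TypeVar("R")


def extract_many(
    articles: List[str],
    extract_one: Callable[[str], R],
    max_workers: int = 4,
) -> List[R]:
    """Apply `extract_one` to every article, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(extract_one, articles))


# Hypothetical stand-in for a single-article extraction call.
def fake_extract(article: str) -> Dict[str, str]:
    return {"title": article.strip().splitlines()[0]}


batch_results = extract_many(
    ["First headline\nbody text", "Second headline\nbody text"],
    fake_extract,
)
print(batch_results)
```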



> **Exercise** 📝
>
> - Import the function `extract_info_from_articles` in the `components.py` file and use it to extract information from the articles.
> - Run the app and see if it works!
>   ```bash
>   uv run streamlit run src/llmops_training/news_reader/app/app.py
>   ```
>
> 💡 Hint:
>
> - Replace the `mock_extract_info_from_articles` function.

---