# Week 2 | Assignment 1

## Objective
Build a LangChain-powered pipeline (Chain) that:

1. Accepts a company name as input.  
2. Extracts or generates its stock code.  
3. Uses finance news search tools in LangChain to fetch news about the company.  
4. Sends the news to an LLM (Azure OpenAI GPT-4o or GPT-4o-mini) to generate a structured sentiment profile.  
5. Outputs the result as a JSON object.  
6. Uses **mlflow** for tracing, prompt debugging, and monitoring.  

---

## Tech Stack & Tools
- **Framework:** LangChain  
- **LLM:** GPT-4o or GPT-4o-mini (deployed via Azure OpenAI)  
- **Data Source:** GOAT / Yahoo Finance tool / Brave Search / Exa Search in LangChain  
- **Prompt Management & Observability:** mlflow  
- **Environment:** Python (v3.10+ recommended)  

---

## Tasks Breakdown

### Step 1: Accept Input
- Accept a company name as input (e.g., `"Apple Inc"`).  

### Step 2: Get Stock Code
- Generate or extract the stock ticker/symbol using:
  - Either a static lookup table or an API/tool (like Yahoo Finance OR GOAT Symbol Suggest).  
  - Integrate this as the first link in your chain.  

### Step 3: Fetch Company News
- Use LangChain’s integration with new search tools to fetch the latest news for the extracted stock symbol.  
- Extract a concise list of recent headlines or article summaries.  

### Step 4: Analyze Sentiment with GPT-4o / GPT-4o-mini
- Pass the news summaries to the GPT-4o model via LangChain with a prompt that asks the LLM to:
  - Classify sentiment.  
  - Extract named entities: people, places, other companies.  
  - Provide a structured JSON with the following fields:

```json
{
  "company_name": "",
  "stock_code": "",
  "newsdesc": "",
  "sentiment": "Positive/Negative/Neutral",
  "people_names": [],
  "places_names": [],
  "other_companies_referred": [],
  "related_industries": [],
  "market_implications": "",
  "confidence_score": 0.0
}
```
*Tip: Use StructuredOutputParser from LangChain or PydanticOutputParser for JSON formatting.*

### Step 5: Integrate MLflow
- Log prompts, outputs, and metadata using mlflow.
- Add tracing spans for:
  - Stock code extraction.
  - News fetching.
  - Sentiment parsing.

#### Deliverables & Bonus Ideas

#### Deliverables

#### 1. Python Script / Notebook (`.py` or `.ipynb`)
- Full implementation of the chain.  
- LangChain Chain definition.  
- Proper use of **Azure OpenAI APIs** and **mlflow**.  

#### 2. README
- Setup instructions.  
- Azure and mlflow API configuration steps.  
- Sample command to run the chain.  

#### 3. Sample Output JSON
- Provide a real example for a company (e.g., `"Microsoft"`).  


#### Bonus Ideas (Optional)
- Add **entity linking** for extracted people and companies (e.g., using Wikipedia or LinkedIn).  
- **Visualize sentiment trend** if historical news is considered.  
- Use **LangChain’s MultiPromptChain** to classify and process different types of news differently.  
- Build a **UI using Streamlit**.  


In [1]:
# !python -m pip install --upgrade --quiet  yfinance

In [2]:
import mlflow
print(mlflow.__version__)

3.3.2


In [3]:
from dotenv import load_dotenv
load_dotenv()

True

In [4]:
import os
from typing import List, Optional
from langchain_community.tools.yahoo_finance_news import YahooFinanceNewsTool
from langchain_openai import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
import yfinance as yf

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [5]:
# from openai import AzureOpenAI
# client=AzureOpenAI()
# client.chat.completions.create( messages=[
#         {
#             "role": "user",
#             "content": "What is capital of france?",
#         }
#     ],
#     model="gpt4o",
# ).choices[0].message.content

In [6]:
import mlflow
mlflow.langchain.autolog()

In [7]:
mlflow.set_tracking_uri("http://20.75.92.162:5000")
mlflow.set_experiment("kc-company-stock-sentiment-week2_assignment1")


2025/09/16 17:04:49 INFO mlflow.tracking.fluent: Experiment with name 'kc-company-stock-sentiment-week2_assignment1' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/911887883162768483', creation_time=1758022489343, experiment_id='911887883162768483', last_update_time=1758022489343, lifecycle_stage='active', name='kc-company-stock-sentiment-week2_assignment1', tags={}>

In [8]:
client = AzureChatOpenAI(
    deployment_name=os.environ['AZURE_OPENAI_DEPLOYMENT'],  # Your deployment name
        model_name="gpt-4o",
        temperature=0.1
)

In [9]:
client.invoke('What is the capital of France?').content

'The capital of France is Paris.'

In [11]:
# Pydantic model for structured output
class SentimentProfile(BaseModel):
    company_name: str = Field(description="Name of the company")
    stock_code: str = Field(description="Stock ticker symbol")
    newsdesc: str = Field(description="Summary of recent news")
    sentiment: str = Field(description="Overall sentiment: Positive/Negative/Neutral")
    people_names: List[str] = Field(description="Names of people mentioned in news")
    places_names: List[str] = Field(description="Places/locations mentioned")
    other_companies_referred: List[str] = Field(description="Other companies mentioned")
    related_industries: List[str] = Field(description="Related industries mentioned")
    market_implications: str = Field(description="Potential market implications")
    confidence_score: float = Field(description="Confidence score between 0-1")

In [20]:
class CompanySentimentPipeline:
    def __init__(self, azure_client):
        self.llm = azure_client
        self.yahoo_news = YahooFinanceNewsTool()
        self.parser = PydanticOutputParser(pydantic_object=SentimentProfile)
        
        # Create the prompt template
        self.prompt_template = PromptTemplate(
            template="""
            You are a financial analyst. Analyze the following news about {company_name} (Stock: {stock_code}) and provide a structured sentiment analysis.
            
            Recent News:
            {news_content}
            
            Please analyze this news and extract the following information in the specified JSON format:
            
            {format_instructions}
            
            Focus on:
            - Overall sentiment (Positive/Negative/Neutral)
            - Key people mentioned
            - Places or regions mentioned
            - Other companies referenced
            - Related industries
            - Market implications
            - Your confidence in this analysis (0-1 scale)
            
            Provide your analysis:
            """,
            input_variables=["company_name", "stock_code", "news_content"],
            partial_variables={"format_instructions": self.parser.get_format_instructions()}
        )
        
        # Create the LLM chain
        # self.chain = LLMChain(
        #     llm=self.llm,
        #     prompt=self.prompt_template,
        #     output_parser=self.parser
        # )
       
        self.chain = self.prompt_template | self.llm | self.parser

        
        # Set MLflow experiment
        mlflow.set_experiment("company-sentiment-analysis")
    
    def get_stock_symbol(self, company_name: str) -> Optional[str]:
        """Extract stock symbol with MLflow logging"""
        mlflow.log_param("symbol_extraction_method", "yfinance_with_fallback")
        
        try:
            # Try common approaches to find ticker
            ticker = yf.Ticker(company_name)
            info = ticker.info
            
            if info and 'symbol' in info:
                mlflow.log_param("extraction_source", "yfinance_api")
                return info['symbol']
            
            # Fallback: try searching with yfinance
            # This is a simple mapping for common companies
            company_mappings = {
                "apple": "AAPL",
                "microsoft": "MSFT", 
                "google": "GOOGL",
                "amazon": "AMZN",
                "tesla": "TSLA",
                "meta": "META",
                "nvidia": "NVDA",
                "netflix": "NFLX"
            }
            
            company_lower = company_name.lower()
            for key, symbol in company_mappings.items():
                if key in company_lower:
                    mlflow.log_param("extraction_source", "fallback_mapping")
                    return symbol
            
            mlflow.log_param("extraction_result", "not_found")
            return None
            
        except Exception as e:
            mlflow.log_param("extraction_error", str(e))
            print(f"Error getting stock symbol: {e}")
            return None
    
    def fetch_company_news(self, stock_symbol: str) -> str:
        """Fetch news with MLflow logging"""
        mlflow.log_param("news_source", "yahoo_finance")
        
        try:
            news_data = self.yahoo_news.run(stock_symbol)
            mlflow.log_param("news_fetch_success", True)
            return news_data
        except Exception as e:
            mlflow.log_param("news_fetch_error", str(e))
            mlflow.log_param("news_fetch_success", False)
            print(f"Error fetching news: {e}")
            return f"No recent news available for {stock_symbol}"
    
    def run_pipeline(self, company_name: str) -> dict:
        """Main pipeline execution with detailed MLflow tracing"""
        with mlflow.start_run(run_name=f"sentiment_analysis_{company_name}"):
            try:
                # Log input parameter
                mlflow.log_param("company_name", company_name)
                
                # SPAN 1: Stock code extraction
                with mlflow.start_run(run_name="stock_extraction", nested=True):
                    mlflow.log_param("input_company", company_name)
                    stock_symbol = self.get_stock_symbol(company_name)
                    
                    if not stock_symbol:
                        mlflow.log_param("extraction_status", "failed")
                        mlflow.log_param("error", "Stock symbol not found")
                        return {"error": f"Could not find stock symbol for {company_name}"}
                    
                    mlflow.log_param("extraction_status", "success")
                    mlflow.log_param("extracted_symbol", stock_symbol)
                    mlflow.log_text(f"Extracted symbol: {stock_symbol}", "stock_extraction_output.txt")
                
                print(f"Found stock symbol: {stock_symbol}")
                
                # SPAN 2: News fetching
                with mlflow.start_run(run_name="news_fetching", nested=True):
                    mlflow.log_param("stock_symbol", stock_symbol)
                    news_content = self.fetch_company_news(stock_symbol)
                    
                    mlflow.log_param("news_length_chars", len(news_content))
                    mlflow.log_param("fetch_status", "success" if news_content else "failed")
                    # Log the actual news content (truncate if too long)
                    mlflow.log_text(
                        news_content[:2000] + "..." if len(news_content) > 2000 else news_content, 
                        "fetched_news.txt"
                    )
                
                print(f"Fetched news for {company_name}")
                
                # SPAN 3: Sentiment parsing with LLM
                with mlflow.start_run(run_name="sentiment_analysis", nested=True):
                    mlflow.log_param("llm_model", "gpt-4o")
                    mlflow.log_param("company_name", company_name)
                    mlflow.log_param("stock_code", stock_symbol)
                    
                    # Log the prompt being sent to LLM
                    formatted_prompt = self.prompt_template.format(
                        company_name=company_name,
                        stock_code=stock_symbol,
                        news_content=news_content,
                        format_instructions=self.parser.get_format_instructions()
                    )
                    mlflow.log_text(formatted_prompt, "llm_prompt.txt")
                    
                    # Execute LLM call
                    result = self.chain.invoke({
                        "company_name": company_name,
                        "stock_code": stock_symbol,
                        "news_content": news_content
                    })
                    
                    # Log LLM output
                    result_dict = result.dict()
                    mlflow.log_text(str(result_dict), "llm_output.json")
                    mlflow.log_param("output_sentiment", result_dict.get("sentiment"))
                    mlflow.log_metric("confidence_score", result_dict.get("confidence_score", 0.0))
                    mlflow.log_param("entities_count", 
                                   len(result_dict.get("people_names", [])) + 
                                   len(result_dict.get("other_companies_referred", [])))
                
                # Log final results at parent run level
                mlflow.log_param("final_sentiment", result_dict.get("sentiment"))
                mlflow.log_metric("final_confidence", result_dict.get("confidence_score", 0.0))
                
                return result_dict
                
            except Exception as e:
                mlflow.log_param("pipeline_error", str(e))
                return {"error": f"Pipeline failed: {str(e)}"}

In [21]:
# Create pipeline instance
pipeline = CompanySentimentPipeline(client)

In [22]:
# Run analysis
company_name = "Apple Inc"
result = pipeline.run_pipeline(company_name)

# Print results
import json
print(json.dumps(result, indent=2))


HTTP Error 404: 


🏃 View run stock_extraction at: http://20.75.92.162:5000/#/experiments/464813825302503577/runs/8192c412d0e64698bc7b06ab83d9fb60
🧪 View experiment at: http://20.75.92.162:5000/#/experiments/464813825302503577
Found stock symbol: AAPL
🏃 View run news_fetching at: http://20.75.92.162:5000/#/experiments/464813825302503577/runs/d3687fbedced4702947510003591d688
🧪 View experiment at: http://20.75.92.162:5000/#/experiments/464813825302503577
Fetched news for Apple Inc


/tmp/ipykernel_12831/2214151142.py:162: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  result_dict = result.dict()


🏃 View run sentiment_analysis at: http://20.75.92.162:5000/#/experiments/464813825302503577/runs/ca156cb0722c41f5a7e8fe3291e188af
🧪 View experiment at: http://20.75.92.162:5000/#/experiments/464813825302503577
🏃 View run sentiment_analysis_Apple Inc at: http://20.75.92.162:5000/#/experiments/464813825302503577/runs/f828517d663d4e09aaee4db35b4c1697
🧪 View experiment at: http://20.75.92.162:5000/#/experiments/464813825302503577
{
  "company_name": "Apple Inc.",
  "stock_code": "AAPL",
  "newsdesc": "Jim Cramer praised Apple Inc. during his show, highlighting the company's iPhone Air and visiting its glass supplier's factory in Kentucky.",
  "sentiment": "Positive",
  "people_names": [
    "Jim Cramer"
  ],
  "places_names": [
    "Kentucky"
  ],
  "other_companies_referred": [],
  "related_industries": [
    "Technology",
    "Consumer Electronics"
  ],
  "market_implications": "Positive sentiment around Apple could lead to increased investor confidence and potential stock price apprecia

In [None]:
break

## Just Direct Testing

In [None]:
class CompanySentimentPipeline:
    def __init__(self, azure_client):
        self.llm = azure_client
        self.yahoo_news = YahooFinanceNewsTool()
        self.parser = PydanticOutputParser(pydantic_object=SentimentProfile)
        
        # Create the prompt template
        self.prompt_template = PromptTemplate(
            template="""
            You are a financial analyst. Analyze the following news about {company_name} (Stock: {stock_code}) and provide a structured sentiment analysis.
            
            Recent News:
            {news_content}
            
            Please analyze this news and extract the following information in the specified JSON format:
            
            {format_instructions}
            
            Focus on:
            - Overall sentiment (Positive/Negative/Neutral)
            - Key people mentioned
            - Places or regions mentioned
            - Other companies referenced
            - Related industries
            - Market implications
            - Your confidence in this analysis (0-1 scale)
            
            Provide your analysis:
            """,
            input_variables=["company_name", "stock_code", "news_content"],
            partial_variables={"format_instructions": self.parser.get_format_instructions()}
        )
        
        # Create the LLM chain
        self.chain = LLMChain(
            llm=self.llm,
            prompt=self.prompt_template,
            output_parser=self.parser
        )
    
    def get_stock_symbol(self, company_name: str) -> Optional[str]:
        """Extract stock symbol from company name using yfinance"""
        try:
            # Try common approaches to find ticker
            ticker = yf.Ticker(company_name)
            info = ticker.info
            
            if info and 'symbol' in info:
                return info['symbol']
            
            # Fallback: try searching with yfinance
            # This is a simple mapping for common companies
            company_mappings = {
                "apple": "AAPL",
                "microsoft": "MSFT", 
                "google": "GOOGL",
                "amazon": "AMZN",
                "tesla": "TSLA",
                "meta": "META",
                "nvidia": "NVDA",
                "netflix": "NFLX"
            }
            
            company_lower = company_name.lower()
            for key, symbol in company_mappings.items():
                if key in company_lower:
                    return symbol
                    
            return None
            
        except Exception as e:
            print(f"Error getting stock symbol: {e}")
            return None
    
    def fetch_company_news(self, stock_symbol: str) -> str:
        """Fetch recent news for the company using Yahoo Finance"""
        try:
            news_data = self.yahoo_news.run(stock_symbol)
            return news_data
        except Exception as e:
            print(f"Error fetching news: {e}")
            return f"No recent news available for {stock_symbol}"
    
    def run_pipeline(self, company_name: str) -> dict:
        """Main pipeline execution"""
        try:
            # Step 1: Get stock symbol
            stock_symbol = self.get_stock_symbol(company_name)
            if not stock_symbol:
                return {"error": f"Could not find stock symbol for {company_name}"}
            
            print(f"Found stock symbol: {stock_symbol}")
            
            # Step 2: Fetch news
            news_content = self.fetch_company_news(stock_symbol)
            print(f"Fetched news for {company_name}")
            
            # Step 3: Analyze with LLM
            result = self.chain.run(
                company_name=company_name,
                stock_code=stock_symbol,
                news_content=news_content
            )
            
            # Convert Pydantic model to dict
            return result.dict()
            
        except Exception as e:
            return {"error": f"Pipeline failed: {str(e)}"}
    
    # def run_pipeline(self, company_name: str) -> dict:
    #     """Main pipeline execution"""
    #     with mlflow.start_run(run_name=f"sentiment_analysis_{company_name}"):
    #         try:
    #             # Log input parameter
    #             mlflow.log_param("company_name", company_name)
                
    #             # Step 1: Get stock symbol
    #             stock_symbol = self.get_stock_symbol(company_name)
    #             if not stock_symbol:
    #                 mlflow.log_param("error", "Stock symbol not found")
    #                 return {"error": f"Could not find stock symbol for {company_name}"}
                
    #             mlflow.log_param("stock_symbol", stock_symbol)
    #             print(f"Found stock symbol: {stock_symbol}")
                
    #             # Step 2: Fetch news
    #             news_content = self.fetch_company_news(stock_symbol)
    #             mlflow.log_param("news_length", len(news_content))
    #             print(f"Fetched news for {company_name}")
                
    #             # Step 3: Analyze with LLM (automatically traced by autolog)
    #             result = self.chain.run(
    #                 company_name=company_name,
    #                 stock_code=stock_symbol,
    #                 news_content=news_content
    #             )
                
    #             # Log key results
    #             result_dict = result.dict()
    #             mlflow.log_param("sentiment", result_dict.get("sentiment"))
    #             mlflow.log_metric("confidence_score", result_dict.get("confidence_score", 0.0))
                
    #             return result_dict
                
    #         except Exception as e:
    #             mlflow.log_param("error", str(e))
    #             return {"error": f"Pipeline failed: {str(e)}"}

In [11]:
# Create pipeline instance
pipeline = CompanySentimentPipeline(client)

  self.chain = LLMChain(


In [12]:
# Run analysis
company_name = "Apple Inc"
result = pipeline.run_pipeline(company_name)

# Print results
import json
print(json.dumps(result, indent=2))


HTTP Error 404: 


Found stock symbol: AAPL
Fetched news for Apple Inc


  result = self.chain.run(


{
  "company_name": "Apple Inc.",
  "stock_code": "AAPL",
  "newsdesc": "Jim Cramer praised Apple Inc. during his show, highlighting the company's iPhone Air and visiting its glass supplier's factory in Kentucky.",
  "sentiment": "Positive",
  "people_names": [
    "Jim Cramer"
  ],
  "places_names": [
    "Kentucky"
  ],
  "other_companies_referred": [],
  "related_industries": [
    "Technology",
    "Consumer Electronics"
  ],
  "market_implications": "Positive sentiment may lead to increased investor confidence and potential stock price appreciation.",
  "confidence_score": 0.85
}


/tmp/ipykernel_12370/2354147279.py:106: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  return result.dict()
