# AI-Based Equity Research Tool

## 1. Problem Definition & Objective

### Selected Project Track
This project is part of the **LLM understanding / Applied LLM Systems** track,
focusing on real-world usage of Large Language Models through
Retrieval-Augmented Generation (RAG).

### Problem Statement
Equity research analysts rely on multiple financial news articles to extract
specific insights. These articles are long, scattered across sources, and
require manual cross-referencing, making the process time-consuming and inefficient.

### Real-World Relevance & Motivation
In practical finance and research roles, faster access to relevant information
directly improves productivity and decision-making. This project demonstrates
how LLMs can assist analysts by enabling question answering over multiple documents.


## 2. Data Understanding & Preparation

### Dataset Source
This project uses **synthetic data that simulates publicly available financial news articles**.
Synthetic data is used so that results are reproducible and outputs can be checked against known content.

### Data Loading & Exploration
Each document represents a short financial news excerpt related to
automobile launches, pricing, or market analysis.

### Cleaning & Preprocessing
- Text data is already structured
- Unnecessary symbols and noise are avoided
- Data is split into smaller chunks for efficient retrieval
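
The chunking step could look roughly like the sketch below. It is only an illustration: the function name `chunk_text`, the word-based splitting, and the `max_words` and `overlap` parameters are assumptions, not part of the original notebook.

```python
def chunk_text(text: str, max_words: int = 8, overlap: int = 2) -> list[str]:
    """Split text into word-based chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some of its surrounding context.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text(
    "Analysts predict strong growth in the electric vehicle market.",
    max_words=4,
    overlap=1,
))
```

The sample excerpts in this project are single sentences, so chunking only matters once longer articles are loaded.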

### Handling Missing Values or Noise
Since the data is synthetic and text-based, there are no missing values.
Noise is minimized by using clean, domain-relevant text samples.


## 3. Model / System Design

### AI Technique Used
This project uses **Natural Language Processing (NLP)** and
**Large Language Model (LLM)** concepts through a
**Retrieval-Augmented Generation (RAG)** pipeline.

### System Architecture
1. Input documents are stored as text
2. A retrieval step selects relevant documents based on the query
3. Selected content is passed to the LLM for answer generation
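
As a rough sketch of step 2, retrieval can be approximated by counting the words a document shares with the query. The helper names `tokenize` and `retrieve` and the `top_k` parameter are hypothetical, and token overlap is a simple stand-in for the embedding-based similarity a production RAG system would more likely use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return up to top_k documents ranked by tokens shared with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    # Drop documents with no shared token, then sort by score, highest first.
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [doc for _, doc in ranked[:top_k]]


documents = [
    "Tata Motors launched the Tiago iCNG at an affordable price.",
    "Analysts predict strong growth in the electric vehicle market.",
    "The automobile sector showed recovery after recent supply chain issues.",
]
print(retrieve("What is mentioned about Tiago iCNG pricing?", documents))
```

Because no stop words are removed, common words such as "the" can produce spurious matches on larger collections; filtering them would be a natural refinement.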

### Justification of Design Choices
- RAG improves factual grounding
- Retrieval reduces hallucinations
- Modular design allows future scalability



## 4. Core Implementation

### Inference Logic
The system retrieves relevant document chunks based on the user query
and generates an answer using retrieved context.

### Prompt Engineering
The query acts as a prompt guiding the system to extract only
relevant information from the available documents.
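
When an actual LLM is attached, the query is typically wrapped in a template together with the retrieved context. The sketch below shows one possible template; the function name `build_prompt` and the instruction wording are assumptions made for illustration, not the prompt used in this project.

```python
def build_prompt(query: str, context_docs: list[str]) -> str:
    """Combine retrieved context and the user query into a single prompt."""
    # One bullet per retrieved document keeps the sources visually separate.
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


print(build_prompt(
    "What is mentioned about Tiago iCNG pricing?",
    ["Tata Motors launched the Tiago iCNG at an affordable price."],
))
```

Restricting the model to the supplied context is what ties the generated answer back to the retrieved documents.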

### Pipeline Overview
- Input query
- Document retrieval
- Context-based response generation
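
The three stages above can be wired together as in the following sketch. All names here (`answer_query`, `tiago_retriever`, `echo_generator`) are hypothetical. The retriever is a hard-coded keyword filter and the generator merely echoes the retrieved context, standing in for a real LLM call.

```python
from typing import Callable


def answer_query(
    query: str,
    documents: list[str],
    retriever: Callable[[str, list[str]], list[str]],
    generator: Callable[[str, list[str]], str],
) -> str:
    """Run the pipeline: take a query, retrieve context, generate a response."""
    context = retriever(query, documents)
    if not context:
        # Do not call the generator without supporting context.
        return "No relevant documents found."
    return generator(query, context)


def tiago_retriever(query: str, documents: list[str]) -> list[str]:
    """Placeholder retriever: a hard-coded keyword filter that ignores the query."""
    return [doc for doc in documents if "Tiago" in doc]


def echo_generator(query: str, context: list[str]) -> str:
    """Placeholder for an LLM call: returns the retrieved context verbatim."""
    return " ".join(context)


documents = [
    "Tata Motors launched the Tiago iCNG at an affordable price.",
    "Analysts predict strong growth in the electric vehicle market.",
]
print(answer_query(
    "What is mentioned about Tiago iCNG pricing?",
    documents,
    tiago_retriever,
    echo_generator,
))
```

Passing the retriever and generator as arguments keeps the stages independent, so either one can later be swapped for a vector search or a hosted model without changing `answer_query`.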


```python
documents = [
    "Tata Motors launched the Tiago iCNG at an affordable price.",
    "Analysts predict strong growth in the electric vehicle market.",
    "The automobile sector showed recovery after recent supply chain issues."
]

query = "What is mentioned about Tiago iCNG pricing?"

# Simplified retrieval: keep the documents that mention the keyword "Tiago".
# The keyword is hard-coded here; the query string itself is not parsed.
retrieved_docs = [doc for doc in documents if "Tiago" in doc]
retrieved_docs
```

Output:

```
['Tata Motors launched the Tiago iCNG at an affordable price.']
```

## 5. Evaluation & Analysis

### Evaluation Metrics
Since this is an LLM-based qualitative system, evaluation is performed
using **manual qualitative assessment**.
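
Manual assessment could be complemented by a small automated check of the retrieval step alone. The sketch below is an assumption rather than part of the original evaluation: `retrieval_hit_rate`, the test cases, and the lambda retriever are all hypothetical, and the retriever is a hard-coded "Tiago" keyword filter.

```python
from typing import Callable


def retrieval_hit_rate(
    cases: list[tuple[str, str]],
    documents: list[str],
    retriever: Callable[[str, list[str]], list[str]],
) -> float:
    """Fraction of (query, expected_doc) cases where expected_doc is retrieved."""
    if not cases:
        return 0.0
    hits = sum(
        1 for query, expected in cases
        if expected in retriever(query, documents)
    )
    return hits / len(cases)


documents = [
    "Tata Motors launched the Tiago iCNG at an affordable price.",
    "Analysts predict strong growth in the electric vehicle market.",
]
cases = [
    ("What is mentioned about Tiago iCNG pricing?", documents[0]),
    ("What do analysts predict for electric vehicles?", documents[1]),
]

# Hard-coded keyword filter used as the retriever under test.
hit_rate = retrieval_hit_rate(
    cases, documents, lambda q, docs: [d for d in docs if "Tiago" in d]
)
print(hit_rate)  # 0.5: the hard-coded filter misses the second query
```

A score below 1.0 on even two queries shows how quickly a fixed keyword filter stops generalizing, which is consistent with the limitations listed in this section.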

### Sample Output
For the sample query about Tiago iCNG pricing, the retrieval step returns the
one document that mentions the Tiago iCNG, which then serves as context for
response generation.

### Performance Analysis & Limitations
- Works well for focused queries
- Limited by size and diversity of input documents
- Performance depends on document quality


## 6. Ethical Considerations & Responsible AI

### Bias and Fairness
Synthetic, neutrally worded data is used to reduce the risk of bias.

### Dataset Limitations
The dataset does not represent all real-world financial scenarios.

### Responsible Use of AI Tools
The system is intended as a decision-support tool and not a
replacement for human judgment.


## 7. Conclusion & Future Scope

### Conclusion
This project demonstrates the application of LLM concepts
for document-based question answering in equity research.

### Future Scope
- Integration with real-time news APIs
- Use of real LLMs and vector databases
- Deployment as a web-based application
