# FinInsights: LLM-Powered Financial Assistant (Demo)

This notebook demonstrates a simple LLM-powered assistant that can answer natural language queries about financial data such as portfolio allocations, sector performance, or company fundamentals.

We use the OpenAI API to power a question-answering assistant that can:
- Interpret structured data (CSV-based financial or portfolio data)
- Generate textual summaries
- Provide sector insights from plain English queries

This is a simple prototype, designed to run as a demonstration in a single notebook.

### Setup & Data Load

In [66]:
import os
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv
import requests
import httpx

# Load API key and proxy from .env
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
http_proxy = os.getenv("HTTP_PROXY")
https_proxy = os.getenv("HTTPS_PROXY")

proxies = {
    "http": http_proxy,
    "https": https_proxy
}

# Load sample portfolio data
df = pd.read_csv("sample_portfolio_30.csv", encoding="utf-8", encoding_errors="ignore")
print(df.head())
print(df.shape)

    Symbol                                          Name  Allocation  \
0  6889106  Taiwan Semiconductor Manufacturing Co., Ltd.    0.269198   
1  BMMV2K8                          Tencent Holdings Ltd    0.120909   
2  BK6YZP5                 Alibaba Group Holding Limited    0.071246   
3  6771720                 Samsung Electronics Co., Ltd.    0.068551   
4  BK1N461                             HDFC Bank Limited    0.042680   

   Adjusted_Market_Value  Dividend_Yield  Price_to_Earnings  Debt_to_Equity  \
0                 957463            1.30              26.87           24.49   
1                 451376            0.82              21.09           36.78   
2                 253403            1.18              16.97           24.57   
3                 243820            2.71              11.27            4.94   
4                 130462            1.10              19.48          129.46   

  Country                  Sector                                   Industry  \
0  Taiwan  I
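
The later cells rely on columns that the truncated preview above does not show, such as `Business_Description` and `YTD_Performance`. A minimal sketch of an early check, assuming the column names used elsewhere in this notebook; `missing_columns` is a hypothetical helper, not part of pandas:

```python
# Columns referenced by later cells in this notebook.
REQUIRED_COLUMNS = [
    "Name", "Allocation", "Country", "Sector", "YTD_Performance",
    "Adjusted_Market_Value", "Dividend_Yield", "Price_to_Earnings",
    "Debt_to_Equity", "Business_Description",
]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in order."""
    present = set(available)
    return [col for col in required if col not in present]


# Illustrative header only; in the notebook, pass df.columns instead.
example_header = ["Symbol", "Name", "Allocation", "Sector"]
print(missing_columns(example_header))
```

In the notebook this could run right after `pd.read_csv`, for example `missing_columns(df.columns)`, so that a renamed or absent column fails early rather than partway through the examples.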

### Create Smaller Custom Dataframes

In [67]:
# Truncate Business_Description to its first 40 words (the "..." suffix is added even to shorter descriptions)
df["Short_Description"] = df["Business_Description"].apply(lambda x: " ".join(str(x).split()[:40]) + "...")

df_allocations = df[["Name", "Allocation", "Sector"]]
df_perf = df[["Name", "Allocation", "Country", "Sector", "YTD_Performance"]]
df_perf_fundamentals = df[["Name", "Allocation", "Country", "Sector", "YTD_Performance", "Adjusted_Market_Value", "Dividend_Yield", "Price_to_Earnings", "Debt_to_Equity"]]
df_business_desc = df[["Name", "Country", "Sector", "Short_Description"]]
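
The one-line truncation above appends `...` to every description and turns a missing value into the text `nan...`. A possible alternative is sketched below, assuming missing descriptions should become empty strings; `truncate_words` is a hypothetical helper, not something the notebook defines:

```python
import math


def truncate_words(text, max_words=40):
    """Keep the first `max_words` words; add "..." only if words were dropped."""
    # Treat missing values (None or float NaN, as pandas uses) as empty text.
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return ""
    words = str(text).split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


print(truncate_words("one two three four five", max_words=3))  # one two three...
print(truncate_words("short text", max_words=3))               # short text
```

It would slot into the cell as `df["Short_Description"] = df["Business_Description"].apply(truncate_words)`.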

### LLM Prompt Function

In [68]:
def ask_fininsights(data: pd.DataFrame, question: str, model="gpt-4"):
    """Send natural language question and tabular data to OpenAI via Chat API"""

    # Convert the entire DataFrame to markdown (table format) for inclusion in the prompt
    # Consider limiting rows (e.g., with data.head(10)) to avoid token limits
    data_sample = data.to_markdown(index=False)

    # Build the prompt to send to the model, including dataset and natural language question
    prompt = f"""
You are a financial analyst assistant. Given the following dataset:

{data_sample}

Answer this question clearly and concisely:
{question}
"""

    # Set headers for OpenAI API request, including authorization
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    # Construct the payload to send to OpenAI
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant specialised in portfolio analysis."},
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.4,   # Lower temperature for more deterministic output
        "max_tokens": 500     # Cap output length; answers that need more tokens are cut off
    }

    try:
        # Make the POST request to OpenAI API with optional proxy and timeout
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers=headers,
            json=payload,
            proxies=proxies,
            timeout=60
        )

        # If successful, return the LLM’s generated message content
        if response.status_code == 200:
            return response.json()["choices"][0]["message"]["content"]
        else:
            # Print the error message if API returns non-200 status
            print(f"OpenAI API Error: {response.status_code} - {response.text}")
            return None

    except requests.exceptions.RequestException as e:
        # Handle timeout, connection, or other request-related errors
        print(f"Request error: {e}")
        return None
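
Because of the `max_tokens` cap, a long answer can stop mid-sentence. The Chat Completions response reports this through the `finish_reason` field on each choice, which is `"length"` when generation stopped at the token limit. A minimal sketch of surfacing that to the reader; `extract_answer` is a hypothetical helper, and the sample response is hand-written for illustration rather than real API output:

```python
def extract_answer(response_json):
    """Return (content, truncated) from a Chat Completions response body."""
    choice = response_json["choices"][0]
    content = choice["message"]["content"]
    # finish_reason is "length" when generation stopped at the max_tokens cap.
    truncated = choice.get("finish_reason") == "length"
    return content, truncated


# Hand-written stand-in for response.json(); not real API output.
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "1. Information Technology: ..."},
            "finish_reason": "length",
        }
    ]
}

answer, truncated = extract_answer(sample)
if truncated:
    answer += "\n[answer cut off at max_tokens]"
print(answer)
```

Inside `ask_fininsights`, the same check could replace the direct `response.json()["choices"][0]["message"]["content"]` lookup, so a clipped answer is flagged instead of being printed as if it were complete.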

### Example Prompts

In [69]:
example_question_1 = "Which sectors have the highest allocations in the portfolio? Provide allocations."
example_question_2 = "What are top 3 performing companies (with their YTD Performance)?"
example_question_3 = "What are the 3 most similar companies to Alibaba based on business description?"
example_question_4 = "Are there any commonalities among strong performing companies?"

### Example 1: Sector Allocations
**Prompt:** "Which sectors have the highest allocations in the portfolio? Provide allocations."  

In [70]:
print("\n\nExample 1:")
print(ask_fininsights(df_allocations, example_question_1))



Example 1:
The sectors with the highest allocations in the portfolio are:

1. Information Technology: 0.269198 (Taiwan Semiconductor Manufacturing Co., Ltd.) + 0.0685511 (Samsung Electronics Co., Ltd.) + 0.0325908 (Xiaomi Corporation Class B) + 0.0302744 (SK hynix Inc.) + 0.0209634 (Hon Hai Precision Industry Co., Ltd.) + 0.0196372 (MediaTek Inc) + 0.0160924 (Infosys Limited) + 0.010404 (Delta Electronics, Inc.) = 0.4677111

2. Financials: 0.0426797 (HDFC Bank Limited) + 0.0277273 (China Construction Bank Corporation Class H) + 0.0250624 (ICICI Bank Limited) + 0.0140366 (Industrial and Commercial Bank of China Limited Class H) + 0.0139188 (Al Rajhi Bank) + 0.0129511 (Ping An Insurance (Group) Company of China, Ltd. Class H) + 0.0115572 (Bank of China Limited Class H) + 0.0110032 (Nu Holdings Ltd. Class A) = 0.1589363

3. Consumer Discretionary: 0.0712461 (Alibaba Group Holding Limited) + 0.0221492 (PDD Holdings Inc. Sponsored ADR Class A) + 0.0216779 (Meituan Class B) + 0.0152249 (BY
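
The totals the model reports can be checked with ordinary arithmetic. The sketch below re-adds only the figures quoted in the output above; the third sector is left out because its list is cut off.

```python
import math

# Figures copied from the model's answer above.
info_tech = [0.269198, 0.0685511, 0.0325908, 0.0302744,
             0.0209634, 0.0196372, 0.0160924, 0.010404]
financials = [0.0426797, 0.0277273, 0.0250624, 0.0140366,
              0.0139188, 0.0129511, 0.0115572, 0.0110032]

# Totals as stated by the model.
reported = {"Information Technology": 0.4677111, "Financials": 0.1589363}

# math.fsum avoids accumulating floating-point rounding error.
recomputed = {
    "Information Technology": math.fsum(info_tech),
    "Financials": math.fsum(financials),
}

for sector, total in recomputed.items():
    gap = abs(total - reported[sector])
    print(f"{sector}: recomputed {total:.7f}, reported {reported[sector]:.7f}, gap {gap:.7f}")
```

The quoted Information Technology figures add up to 0.4677113 rather than the reported 0.4677111, a difference in the seventh decimal place, while the Financials total matches. This only checks the addition; it does not confirm that the model listed every company in each sector.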

### Example 2: Top Performing Companies  
**Prompt:** "What are top 3 performing companies (with their YTD Performance)?"

In [71]:
print("\n\nExample 2:")
print(ask_fininsights(df_perf, example_question_2))



Example 2:
The top three performing companies based on the Year-to-Date (YTD) performance are:

1. Delta Electronics, Inc. with a YTD Performance of 0.595656
2. SK hynix Inc. with a YTD Performance of 0.587134
3. Xiaomi Corporation Class B with a YTD Performance of 0.548867


### Example 3: Similar Businesses
**Prompt:** "What are the 3 most similar companies to Alibaba based on business description?"

In [72]:
print("\n\nExample 3:")
print(ask_fininsights(df_business_desc, example_question_3)) 



Example 3:
Based on the business description, the three most similar companies to Alibaba Group Holding Limited are:

1. Tencent Holdings Ltd - Similar to Alibaba, Tencent provides a range of online and mobile services, including fintech and business services. Both companies are heavily involved in the technology sector and are based in China.

2. JD.com, Inc. Class A - JD.com is also a technology-driven E-commerce company, similar to Alibaba. Both companies are involved in the sale of various products and services online, and are based in China.

3. Meituan Class B - Meituan, like Alibaba, provides a technology platform that connects consumers and merchants. It operates in similar segments such as food delivery, in-store, hotel, and travel. This company is also based in China.


### Example 4: Patterns Among Top Performers
**Prompt:** "Are there any commonalities among strong performing companies?"

In [73]:
print("\n\nExample 4:")
print(ask_fininsights(df_perf_fundamentals, example_question_4)) 



Example 4:
Based on the dataset provided, strong performing companies can be identified by their Year-to-Date (YTD) Performance. Here are some commonalities observed among these companies:

1. Sector: Most of the strong performing companies (with high YTD Performance) are from the Information Technology and Consumer Discretionary sectors. Companies like Taiwan Semiconductor Manufacturing Co., Ltd., Tencent Holdings Ltd, Alibaba Group Holding Limited, and Samsung Electronics Co., Ltd. are all from these sectors.

2. Country: A significant number of strong performing companies are based in China and Taiwan. This includes companies like Taiwan Semiconductor Manufacturing Co., Ltd., Tencent Holdings Ltd, Alibaba Group Holding Limited, and Xiaomi Corporation Class B.

3. Debt-to-Equity Ratio: There doesn't seem to be a clear pattern in the Debt-to-Equity ratio among the strong performing companies. Some companies like Taiwan Semiconductor Manufacturing Co., Ltd. and Samsung Electronics Co

### Conclusion

This notebook explored using OpenAI's API to generate natural language insights from portfolio data. I tested the approach on a range of questions, including sector allocations, top performers, and company similarity based on business descriptions.

The model was generally able to extract and summarise useful information, particularly for questions involving rankings or comparisons (e.g. top performers). It handled business descriptions quite well when asked about company similarities. However, the accuracy of numerical breakdowns (like allocation totals) was mixed, and explanations were often generic or shallow.
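
Totals of this kind do not have to come from the model. A minimal sketch of aggregating allocations by sector before prompting, in plain Python; `sector_totals` is a hypothetical helper, and the four rows are taken from the Example 1 output purely for illustration:

```python
from collections import defaultdict


def sector_totals(rows):
    """Sum allocation per sector; return (sector, total) pairs, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Sector"]] += row["Allocation"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few rows as quoted in the Example 1 output, not the full portfolio.
rows = [
    {"Name": "Taiwan Semiconductor Manufacturing Co., Ltd.",
     "Sector": "Information Technology", "Allocation": 0.269198},
    {"Name": "Samsung Electronics Co., Ltd.",
     "Sector": "Information Technology", "Allocation": 0.0685511},
    {"Name": "HDFC Bank Limited",
     "Sector": "Financials", "Allocation": 0.0426797},
    {"Name": "Alibaba Group Holding Limited",
     "Sector": "Consumer Discretionary", "Allocation": 0.0712461},
]

for sector, total in sector_totals(rows):
    print(f"{sector}: {total:.4f}")
```

With the notebook's DataFrame, the pandas equivalent would be `df.groupby("Sector")["Allocation"].sum().sort_values(ascending=False)`, and the resulting table could be passed to the model in place of the raw rows.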

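One key limitation is token limits. As written, `ask_fininsights` sends the whole table, but a larger dataset would have to be truncated (e.g. with `data.head(10)`), in which case the model sees only a subset of the data and answers may miss context or appear biased towards the first few rows. Responses are also capped at 500 tokens (`max_tokens`), so longer answers can be cut off. It's also worth noting that the model isn't accessing real-time news or external information; it's only working off the data I provided.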
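
One way to keep the prompt within a budget while knowing exactly how much was left out is to build the table row by row. A minimal sketch, assuming a character count is an acceptable rough proxy for tokens; `rows_to_markdown` and its `max_chars` parameter are hypothetical, and the rows are the three figures quoted in the Example 2 output:

```python
def rows_to_markdown(rows, columns, max_chars=6000):
    """Render rows as a markdown table without exceeding `max_chars`.

    Returns (table_text, rows_included) so the caller can tell the model
    how many rows were omitted.
    """
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    lines = [header, divider]
    used = len(header) + 1 + len(divider)  # +1 for the newline between them
    included = 0
    for row in rows:
        line = "| " + " | ".join(str(row[col]) for col in columns) + " |"
        # Stop before the next row (plus its newline) would exceed the budget.
        if used + 1 + len(line) > max_chars:
            break
        lines.append(line)
        used += 1 + len(line)
        included += 1
    return "\n".join(lines), included


# Rows as quoted in the Example 2 output.
rows = [
    {"Name": "Delta Electronics, Inc.", "YTD_Performance": 0.595656},
    {"Name": "SK hynix Inc.", "YTD_Performance": 0.587134},
    {"Name": "Xiaomi Corporation Class B", "YTD_Performance": 0.548867},
]

# A deliberately small budget so that the cut-off is visible.
table, included = rows_to_markdown(rows, ["Name", "YTD_Performance"], max_chars=120)
print(table)
print(f"{included} of {len(rows)} rows included")
```

The row count makes it possible to add a line such as "only the first N of M rows are shown" to the prompt, so the model does not treat a partial table as the full portfolio.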

That said, it shows promise as a lightweight assistant for interpreting structured data. Future iterations could include pre-processed summaries (e.g. sector totals), embedding-based similarity rather than text matching, and integration with a wider data pipeline (e.g. newsflow, fundamentals, macro data).
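
As an illustration of the embedding-based idea: once each description has been turned into a vector, similarity becomes a cosine computation and the ranking no longer depends on the model's judgement. A minimal sketch in plain Python; the three-dimensional vectors and company labels are made-up placeholders rather than real embeddings, and `most_similar` is a hypothetical helper:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def most_similar(target, vectors, top_n=3):
    """Rank every other entry by cosine similarity to `target`."""
    target_vec = vectors[target]
    scores = [
        (name, cosine_similarity(target_vec, vec))
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Placeholder vectors for illustration only; real ones would come from
# an embedding model applied to each business description.
vectors = {
    "Company A": [1.0, 0.0, 0.0],
    "Company B": [0.9, 0.1, 0.0],
    "Company C": [0.0, 1.0, 0.0],
    "Company D": [0.5, 0.5, 0.0],
}

for name, score in most_similar("Company A", vectors, top_n=2):
    print(f"{name}: {score:.3f}")
```

In a fuller version, each `Short_Description` would be embedded once and cached, and the language model would only be asked to explain the ranking rather than produce it.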