# **AF3005 – Programming for Finance**  

---

## **📘 Assignment 3: Machine Learning on Financial Data with Streamlit**  

📍 **FAST National University of Computer and Emerging Sciences (FAST-NUCES), Islamabad**  
👨‍🏫 **Instructor:** Dr. Usama Arshad (Assistant Professor, FSM)  
🎓 **Program:** BS Financial Technology (BSFT)  
📅 **Semester:** Spring 2025  
📌 **Sections:** BSFT06A, BSFT06B, BSFT06C  

---

## 🎯 **Objective**

The objective of this assignment is to develop a fully interactive machine learning application using financial datasets. Students will upload Kaggle datasets, fetch data from Yahoo Finance, implement machine learning models, and visualize the step-by-step ML workflow through a well-designed Streamlit interface.

---

## 🧠 **Relevant Learning Outcomes (LOs)**

| LO | Description |
|:---:|:------------|
| LO3 | Develop financial models and algorithms for decision-making. (PLO 3, PLO 5) |
| LO5 | Visualize and interpret financial data effectively using Python tools. (PLO 1, PLO 3) |
| LO8 | Demonstrate self-learning skills to enhance programming capabilities for finance. (PLO 8) |

---

## 📦 **Requirements**

- **Data Sources**:
  - Upload financial datasets from **Kaggle**.
  - Fetch real-time stock market data using **Yahoo Finance API** (`yfinance` library).

- **Machine Learning Models** (Choose one):
  - Linear Regression
  - Logistic Regression
  - K-Means Clustering

- **Python Libraries**:
  - `streamlit`
  - `pandas`
  - `numpy`
  - `scikit-learn`
  - `matplotlib`
  - `plotly`
  - `yfinance`

- **Design Elements**:
  - Apply a consistent **color scheme** and **theme**.
  - Add **GIFs, animations, and pictures** to enhance the user experience.
  - Include **button-based navigation**, **notifications**, and **step-by-step visualizations**.
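
One possible way to get a consistent color scheme is to inject a small CSS block once at the top of the app. The sketch below is only an illustration: the function names, the default colors, and the CSS selectors `.stApp` and `.stButton > button` are assumptions about Streamlit's generated HTML and may need adjusting for your Streamlit version.

```python
def build_theme_css(background: str = "#0e1117", accent: str = "#00b386") -> str:
    """Return a <style> block that sets the page background and button colors.

    The selectors `.stApp` and `.stButton > button` are assumptions about
    Streamlit's generated HTML and may change between versions.
    """
    return f"""
    <style>
    .stApp {{ background-color: {background}; }}
    .stButton > button {{
        background-color: {accent};
        color: white;
        border-radius: 8px;
    }}
    </style>
    """


def apply_theme() -> None:
    """Inject the CSS once; call this before drawing other widgets."""
    import streamlit as st  # imported lazily so build_theme_css works without it

    st.markdown(build_theme_css(), unsafe_allow_html=True)
```

Streamlit also supports a `[theme]` section in `.streamlit/config.toml`, which may be the more robust choice if you only need to set base colors.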

---

## 🛠️ **Task Description**

You are required to build an interactive Streamlit application following these major steps:

### 1. Welcome Interface
- Display a **welcome message** with a **finance-themed GIF**.
- Apply a **custom background color** and **themed buttons**.
- Sidebar should allow:
  - Uploading a Kaggle dataset.
  - Fetching Yahoo Finance stock data.
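
The two sidebar data sources could be wired up roughly as follows. This is a minimal sketch under assumptions: the helper names (`flatten_columns`, `fetch_prices`, `render_sidebar`), the default ticker, and the `"df"` session key are hypothetical. Recent `yfinance` versions can return two-level column headers, so the sketch flattens them before use.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse MultiIndex columns such as ("Close", "AAPL") to "Close"."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = out.columns.get_level_values(0)
    return out


def fetch_prices(ticker: str, period: str = "1y") -> pd.DataFrame:
    """Download daily prices for one ticker via yfinance (needs network access)."""
    import yfinance as yf  # imported lazily so flatten_columns works without it

    data = yf.download(ticker, period=period, progress=False)
    if data.empty:
        raise ValueError(f"No data returned for ticker {ticker!r}")
    return flatten_columns(data).reset_index()


def render_sidebar():
    """Hypothetical sidebar: returns a DataFrame, or None if nothing is loaded yet."""
    import streamlit as st

    source = st.sidebar.radio("Data source", ["Upload Kaggle CSV", "Yahoo Finance"])
    if source == "Upload Kaggle CSV":
        uploaded = st.sidebar.file_uploader("CSV file", type=["csv"])
        if uploaded is not None:
            st.session_state["df"] = pd.read_csv(uploaded)
    else:
        ticker = st.sidebar.text_input("Ticker", value="AAPL")
        if st.sidebar.button("Fetch stock data"):
            st.session_state["df"] = fetch_prices(ticker)
    # The DataFrame lives in session state so it survives Streamlit reruns
    return st.session_state.get("df")
```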

---

### 2. Step-by-Step Machine Learning Pipeline

Each step must be activated using a **separate button** and **confirmed with a notification** (`st.success`, `st.info`, etc.).

| Step | Description | Visual/Notification |
|:----|:------------|:--------------------|
| **Load Data** | Upload or fetch financial data. | Data preview table, load success message |
| **Preprocessing** | Clean missing values, outliers, etc. | Display missing value stats, preprocessing notification |
| **Feature Engineering** | Select and transform features. | Show feature importance/selection results |
| **Train/Test Split** | Split data into training/testing sets. | Visualize split with pie chart |
| **Model Training** | Train chosen ML model. | Notification of training completion |
| **Evaluation** | Evaluate the model. | Display metrics and evaluation charts |
| **Results Visualization** | Predict outcomes or show clusters. | Graphs, cluster visualizations |
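
Streamlit reruns the whole script on every interaction, so a button is only "pressed" for a single run. One way to keep the pipeline steps in order is to record completed steps in `st.session_state`. The sketch below assumes a hypothetical `handlers` dictionary that maps each step name to a function you write yourself.

```python
STEPS = [
    "Load Data",
    "Preprocessing",
    "Feature Engineering",
    "Train/Test Split",
    "Model Training",
    "Evaluation",
    "Results Visualization",
]


def missing_prerequisites(step, completed):
    """Return the earlier steps that have not been completed yet."""
    return [s for s in STEPS[: STEPS.index(step)] if s not in completed]


def render_pipeline(handlers):
    """Draw one button per step; `handlers` maps step name -> callable (hypothetical)."""
    import streamlit as st  # imported lazily so the helper above works without it

    if "completed" not in st.session_state:
        st.session_state["completed"] = set()
    completed = st.session_state["completed"]

    for step in STEPS:
        if st.button(step, key=f"btn_{step}"):
            blockers = missing_prerequisites(step, completed)
            if blockers:
                st.warning(f"Finish these steps first: {', '.join(blockers)}")
            else:
                handlers[step]()          # run the user-supplied step logic
                completed.add(step)       # remember it across reruns
                st.success(f"{step} complete")
```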

---

### 3. Additional Features
- Use **Plotly** for interactive charts wherever possible.
- Display appropriate **notifications** after each stage.
- Add **themed GIFs/pictures** on key pages (start, end, etc.).
- (Bonus) Allow users to **download the results**.
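
As an illustration of the Plotly and download items above, the following sketch builds a pie chart of the train/test split and serializes a results table for download. The function names and the column labels are assumptions chosen for this example.

```python
import pandas as pd


def split_counts(n_train: int, n_test: int) -> pd.DataFrame:
    """Tabulate the number of rows in each subset for plotting."""
    return pd.DataFrame({"subset": ["Train", "Test"], "rows": [n_train, n_test]})


def split_pie(n_train: int, n_test: int):
    """Interactive pie chart of the split (requires plotly)."""
    import plotly.express as px  # imported lazily so the helpers work without it

    return px.pie(
        split_counts(n_train, n_test),
        names="subset",
        values="rows",
        title="Train/Test split",
    )


def results_to_csv_bytes(results: pd.DataFrame) -> bytes:
    """Encode a results table as UTF-8 CSV, ready for st.download_button."""
    return results.to_csv(index=False).encode("utf-8")


# Inside the Streamlit app (X_train, X_test and results_df are assumed to exist):
# st.plotly_chart(split_pie(len(X_train), len(X_test)))
# st.download_button("Download results", results_to_csv_bytes(results_df),
#                    file_name="results.csv", mime="text/csv")
```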

---

## 📊 **Expected Deliverables**

- A **Jupyter Notebook (.ipynb)** with complete, properly commented code.
- A **Streamlit application** that runs locally.
- (Optional Bonus) A deployed **Streamlit Cloud link** for public access.

---

## 🎁 **Bonus Points**
- Allow dynamic model selection.
- Add feature importance visualization (if using Linear or Logistic Regression).
- Schedule real-time Yahoo Finance data updates.
- Provide a downloadable model or results.
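
For the feature importance bonus, one common approach with linear models is to standardize the features and rank them by the absolute size of their coefficients. The sketch below is one such approach, not the only valid one; the function name and the synthetic demo data are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def coefficient_importance(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Rank features by the absolute value of their standardized coefficients."""
    pipeline = make_pipeline(StandardScaler(), LinearRegression())
    pipeline.fit(X, y)
    coefs = pipeline.named_steps["linearregression"].coef_
    return pd.Series(np.abs(coefs), index=X.columns).sort_values(ascending=False)


# Synthetic demo data (invented for illustration, not real market data)
rng = np.random.default_rng(0)
demo_X = pd.DataFrame(
    {"volume_change": rng.normal(size=200), "noise": rng.normal(size=200)}
)
demo_y = 3.0 * demo_X["volume_change"] + 0.1 * demo_X["noise"]
importance = coefficient_importance(demo_X, demo_y)
```

In the app, the resulting series could be shown with `st.bar_chart(importance)`. Standardizing first matters because raw coefficients depend on each feature's units.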

---

## ⚡ **Important Instructions**
- **Use consistent colors and visual themes** throughout the app.
- **Interactive navigation** is mandatory (buttons for every major step).
- **Proper error handling** must be included (e.g., if no dataset is loaded).
- Submit a neat, **well-commented** and **easy-to-navigate notebook**.
- Include a brief **"How to Run"** note at the top of your notebook.
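
The error handling requirement could be met by validating the dataset before any pipeline step runs. The sketch below is one possible shape for that check; the function names and the `min_rows` threshold are arbitrary assumptions for illustration.

```python
import numpy as np
import pandas as pd


def dataset_problems(df, min_rows: int = 10):
    """Return human-readable problems; an empty list means the data is usable.

    `min_rows` is an arbitrary illustrative threshold, not a course requirement.
    """
    if df is None:
        return ["No dataset loaded. Upload a CSV or fetch a ticker first."]
    problems = []
    numeric = df.select_dtypes(include=np.number).dropna()
    if numeric.shape[1] < 2:
        problems.append("Need at least 2 numeric columns (features + target).")
    if len(numeric) < min_rows:
        problems.append(f"Only {len(numeric)} complete rows; need at least {min_rows}.")
    return problems


def require_valid_dataset(df) -> None:
    """Show every problem in the app and halt the current script run if any exist."""
    import streamlit as st  # imported lazily so dataset_problems works without it

    problems = dataset_problems(df)
    for problem in problems:
        st.error(problem)
    if problems:
        st.stop()  # ends this run cleanly instead of raising a traceback
```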

---

## 🧾 **Submission Guidelines**

1. **GitHub Repository**  
   - Upload all code files, `requirements.txt`, and a proper `README.md`.  
   - README must include:
     - Course Name
     - Instructor Name
     - App Overview
     - Deployment Link
     - Demo Video  
   - **Share the GitHub repository link.**

2. **Streamlit Share Deployment**  
   - Deploy the app using Streamlit Share (now called Streamlit Community Cloud).  
   - **Submit the deployed app link.**

3. **LinkedIn Post**  
   - Make a post about your project, including:
     - **Project Description**
     - **GitHub Link**
     - **Streamlit App Link**
     - **Demo Video**
     - **Tag your Instructor** and **use relevant hashtags**.

4. **Submission Form**  
   - Submit the **GitHub link**, **Streamlit link**, and **LinkedIn post link** in the designated online form provided.

---

## 🏆 **Total Marks: 20**

| Section | Marks |
|:--------|:------|
| Correct Workflow and Implementation | 8 |
| Visualizations and Interactivity | 5 |
| Design, Theme, and User Experience | 4 |
| Submission Requirements Fulfilled | 3 |

---

## 📅 **Submission Deadline**
- **4 May 2025**

---

# 🚀 **Good Luck! Build smart, code smart, and showcase your finance-tech skills!**

---


# ✅ Works in both Colab and Streamlit: `is_streamlit` selects st.* calls or print()
try:
    import streamlit as st
    is_streamlit = True
except ImportError:
    is_streamlit = False

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Only the names used below are imported (unused seaborn, KMeans and io removed)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# ---------------------------------------------
# 🧾 Streamlit UI (only runs if in Streamlit)
if is_streamlit:
    st.set_page_config(page_title="Finance ML App", layout="centered")
    st.title("📈 Financial Data ML App")
    st.markdown("Upload a financial dataset and apply ML models step-by-step.")

    uploaded_file = st.file_uploader("📂 Upload CSV file", type=["csv"])
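else:
    # 🧪 Colab: manual file upload (returns a dict of filename -> bytes)
    from google.colab import files
    uploaded = files.upload()
    # Take the first uploaded filename, or None if nothing was uploaded
    uploaded_file = next(iter(uploaded), None)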

# ---------------------------------------------
# 📊 Load Data
if uploaded_file:
    # Both environments read the CSV the same way; only the output differs
    df = pd.read_csv(uploaded_file)
    if is_streamlit:
        st.success("✅ Data Loaded!")
        st.dataframe(df.head())
    else:
        print("✅ Data Loaded from:", uploaded_file)
        print(df.head())

    # ---------------------------------------------
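    # 🧹 Preprocessing: count missing values first, then drop incomplete rows
    missing_counts = df.isnull().sum()
    df_clean = df.dropna()
    if is_streamlit:
        st.info("🧹 Removed rows with missing values.")
        st.write(missing_counts)
    else:
        print("🧹 Missing values per column (before cleaning):")
        print(missing_counts)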

    # ---------------------------------------------
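    # 📈 Feature Selection: numeric columns only; the last one is the target
    features = df_clean.select_dtypes(include=np.number)
    if features.shape[1] < 2:
        msg = "Need at least 2 numeric columns (features + target)"
        if is_streamlit:
            st.error(msg)
            st.stop()  # halt this script run without a traceback
        raise ValueError(msg)

    X = features.iloc[:, :-1]  # all numeric columns except the last
    y = features.iloc[:, -1]   # last numeric column is the target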

    # ---------------------------------------------
    # ✂️ Split Data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    if is_streamlit:
        st.success("✅ Data Split (80/20)")
        st.write("Training Size:", X_train.shape[0])
        st.write("Testing Size:", X_test.shape[0])
    else:
        print("✅ Data Split:")
        print("Train size:", X_train.shape)
        print("Test size:", X_test.shape)

    # ---------------------------------------------
    # 📚 Train Model (Linear Regression)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    if is_streamlit:
        st.success("✅ Model Trained: Linear Regression")
        st.write("📉 Mean Squared Error:", round(mse, 3))
        st.write("📈 R² Score:", round(r2, 3))
    else:
        print("✅ Linear Regression Trained")
        print("MSE:", mse)
        print("R2 Score:", r2)

    # ---------------------------------------------
    # 📊 Visualization
    if is_streamlit:
        st.subheader("📊 Actual vs Predicted")
        fig, ax = plt.subplots()
        ax.scatter(y_test, y_pred, alpha=0.7)
        ax.set_xlabel("Actual")
        ax.set_ylabel("Predicted")
        ax.set_title("Actual vs Predicted")
        st.pyplot(fig)
    else:
        plt.figure(figsize=(6, 4))
        plt.scatter(y_test, y_pred, alpha=0.7)
        plt.xlabel("Actual")
        plt.ylabel("Predicted")
        plt.title("Actual vs Predicted")
        plt.grid(True)
        plt.show()
