# 📜 Automated Trading System - ETL Process

## ✅ Purpose
This notebook (`ETL.ipynb`) performs the **Extract, Transform, Load (ETL)** process on historical stock data downloaded from **SimFin**. It prepares the dataset for **Machine Learning** by cleaning the raw tables, merging share prices with company details, and saving the result as a single structured file.

---

## 📌 Steps in the ETL Process

### 1️⃣ Extract Data
- **Loaded CSV files** from the SimFin bulk download:
  - `us-shareprices-daily.csv` → **Daily stock prices**
  - `us-companies.csv` → **Company details**
  - `us-income-quarterly.csv` → **Quarterly financial reports** (optional)
- Used **Pandas** to read each dataset, passing `;` as the delimiter (the SimFin files are semicolon-separated, not comma-separated).
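
A minimal sketch of the extract step, assuming a small helper named `load_simfin_csv` (hypothetical, not taken from the notebook). The inline sample rows are made up and stand in for `us-shareprices-daily.csv` so the snippet runs without the real download.

```python
import io

import pandas as pd

# Made-up rows standing in for us-shareprices-daily.csv.
# The real file has more columns and far more rows.
SAMPLE_PRICES = """Ticker;Date;Open;High;Low;Close;Volume
AAA;2023-01-03;10.0;11.0;9.5;10.5;1000
BBB;2023-01-03;20.0;21.0;19.0;20.5;2000
"""


def load_simfin_csv(source):
    """Read a semicolon-delimited CSV from a file path or file-like object."""
    return pd.read_csv(source, sep=";")


prices_df = load_simfin_csv(io.StringIO(SAMPLE_PRICES))
print(prices_df.shape)
print(prices_df.columns.tolist())
```

With the real files in place, the same helper would be called with a path, for example `load_simfin_csv("us-shareprices-daily.csv")`.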

---

### 2️⃣ Transform Data
- **Converted `date` column** to datetime format.
- **Dropped rows with missing values** in essential columns (`ticker`, `close`).
- **Converted price-related columns** (`Open`, `High`, `Low`, `Close`, `Volume`) to float.
- **Merged stock prices with company info** on `ticker`.
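
The transform steps above can be sketched roughly as follows. This is an illustration on made-up data, not the notebook's exact code: it assumes column names have already been lowercased, and it assumes a left join (`how="left"`), since the join type is not stated here.

```python
import pandas as pd

# Made-up sample frames; column names are assumed to be lowercased already.
prices_df = pd.DataFrame({
    "ticker": ["AAA", "AAA", None, "BBB"],
    "date": ["2023-01-03", "2023-01-04", "2023-01-03", "2023-01-03"],
    "open": ["10.0", "10.4", "5.0", "20.0"],
    "high": ["11.0", "10.9", "5.5", "21.0"],
    "low": ["9.5", "10.1", "4.8", "19.0"],
    "close": ["10.5", None, "5.2", "20.5"],
    "volume": [1000, 1200, 300, 2000],
})
companies_df = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "company name": ["Alpha Corp", "Beta Inc"],
})

# 1. Parse the date strings into datetime values.
prices_df["date"] = pd.to_datetime(prices_df["date"])

# 2. Drop rows where an essential column is missing.
prices_df = prices_df.dropna(subset=["ticker", "close"])

# 3. Cast the price-related columns to float.
price_cols = ["open", "high", "low", "close", "volume"]
prices_df = prices_df.astype({col: float for col in price_cols})

# 4. Attach company info to each price row (left join is an assumption).
merged_df = prices_df.merge(companies_df, on="ticker", how="left")
print(merged_df.dtypes)
```

A left join keeps every price row even when a ticker has no match in the company table, which is one way the missing company fields handled below can arise.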

#### 🔹 Handling Missing Data:
| Column | Issue | Solution |
|---------|--------|-----------|
| `dividend` | Many NaN values | Filled with `0` (assuming no dividend was paid) |
| `shares outstanding` | Missing values | Replaced with the **median value** |
| `industryid` & `isin` | Missing industry and identifier data | Filled with `"Unknown Industry"` and `"Unknown"`, respectively |
| `business summary`, `number employees`, `CIK` | Too many NaNs | **Dropped from dataset** |
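
The rules in the table could be applied along these lines. This is a sketch on made-up data, with two assumptions: the median is taken over the whole `shares outstanding` column (rather than per ticker), and `industryid` is cast to `object` first so that a text placeholder can sit next to numeric IDs.

```python
import numpy as np
import pandas as pd

# Made-up sample of the merged dataset, using the column names from the table.
merged_df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "dividend": [0.23, np.nan, np.nan],
    "shares outstanding": [100.0, np.nan, 300.0],
    "industryid": [101.0, np.nan, 102.0],
    "isin": ["ISIN-A", None, None],
    "business summary": [None, None, "text"],
    "number employees": [np.nan, np.nan, 10.0],
    "CIK": [np.nan, 1.0, np.nan],
})

# No recorded dividend is treated as a dividend of 0.
merged_df["dividend"] = merged_df["dividend"].fillna(0)

# Fill missing share counts with the column-wide median (assumption).
median_shares = merged_df["shares outstanding"].median()
merged_df["shares outstanding"] = merged_df["shares outstanding"].fillna(median_shares)

# Placeholder labels for missing industry and ISIN values.
merged_df["industryid"] = merged_df["industryid"].astype(object).fillna("Unknown Industry")
merged_df["isin"] = merged_df["isin"].fillna("Unknown")

# Drop the columns that are mostly empty.
merged_df = merged_df.drop(columns=["business summary", "number employees", "CIK"])

print(merged_df.isnull().sum())
```

Note that after this step `industryid` holds a mix of numbers and text, so it would need encoding before being used as a model feature.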

---

### 3️⃣ Load Data
- **Saved cleaned dataset** as `fully_cleaned_stock_data.csv` for Machine Learning.
- **Confirmed no missing values** using:
  ```python
  print(merged_df.isnull().sum())  # Should all be 0
  ```