# Python Financial Data Analysis Mini Project

This notebook performs an exploratory data analysis (EDA) on a synthetic financial dataset containing daily closing prices for four exchange-traded funds (ETFs). The goal is to demonstrate skills in data cleaning, transformation, visualization, and basic return analysis using Python.

**Tickers:** ETF_A, ETF_B, ETF_C, ETF_D

**Dataset file:** `financial_data.csv` (long format, with `date`, `ticker`, and `close_price` columns)

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# Load dataset
df = pd.read_csv('financial_data.csv', parse_dates=['date'])
df.head()

## 1. Basic Overview

In [None]:
# Shape and basic info
print('Shape:', df.shape)
print('\nTickers:', df['ticker'].unique())
df.describe(include='all')
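Before reshaping, it can help to check for missing prices and repeated `(date, ticker)` rows. The sketch below is illustrative only: `sample` is a hypothetical toy frame that borrows the column names of `financial_data.csv`, and the cleaning rule (drop missing prices, keep the last row per pair) is an assumption, not something the dataset prescribes.

```python
import pandas as pd

# Hypothetical toy data with the same columns as financial_data.csv
sample = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-02', '2024-01-02', '2024-01-03',
                            '2024-01-03', '2024-01-03']),
    'ticker': ['ETF_A', 'ETF_B', 'ETF_A', 'ETF_B', 'ETF_B'],
    'close_price': [100.0, 50.0, 101.0, None, 50.5],
})

# Count missing values in each column
missing_counts = sample.isna().sum()

# Count rows that repeat an earlier (date, ticker) pair
duplicate_count = int(sample.duplicated(subset=['date', 'ticker']).sum())

# Assumed cleaning rule: drop missing prices, then keep the last row per pair
cleaned = (
    sample.dropna(subset=['close_price'])
          .drop_duplicates(subset=['date', 'ticker'], keep='last')
          .sort_values(['date', 'ticker'])
          .reset_index(drop=True)
)

print(missing_counts)
print('Duplicate (date, ticker) rows:', duplicate_count)
print(cleaned)
```

The same three checks could be run on `df` directly; whether to drop, fill, or investigate the flagged rows depends on how the data was produced.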

## 2. Pivot to Wide Format
Many time-series operations (returns, correlations, plotting) are simpler with one column per ticker, so the long-format data is reshaped into a wide table indexed by date.

In [None]:
# Pivot: rows = dates, columns = tickers, values = close_price
# Note: df.pivot raises a ValueError if any (date, ticker) pair appears more than once
prices = df.pivot(index='date', columns='ticker', values='close_price').sort_index()
prices.head()

## 3. Visualize Price Series

In [None]:
# Plot price series for all ETFs
ax = prices.plot(figsize=(10, 5))
ax.set_title('Daily Closing Prices of ETFs')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
plt.tight_layout()
plt.show()

## 4. Daily Returns

In [None]:
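# Compute daily percentage returns: r_t = P_t / P_{t-1} - 1
# fill_method=None avoids silently forward-filling missing prices
# (the implicit forward fill is deprecated in recent pandas versions)
returns = prices.pct_change(fill_method=None).dropna()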
returns.head()
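As a quick sanity check on the return definition, the sketch below computes simple returns by hand and compares them with `pct_change`. The `toy_prices` frame is hypothetical. Log returns are included only as a common alternative, since they add across time.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two tickers over four business days
toy_prices = pd.DataFrame(
    {'ETF_A': [100.0, 102.0, 101.0, 103.0],
     'ETF_B': [50.0, 50.5, 51.0, 50.0]},
    index=pd.date_range('2024-01-02', periods=4, freq='B'),
)

# Simple return by hand: r_t = P_t / P_{t-1} - 1
manual_returns = (toy_prices / toy_prices.shift(1) - 1).dropna()

# The same calculation via pct_change
pct_returns = toy_prices.pct_change().dropna()

# Log returns: their sum over a period equals log(P_end / P_start)
log_returns = np.log(toy_prices / toy_prices.shift(1)).dropna()

print(np.allclose(manual_returns, pct_returns))
print(log_returns.sum())
```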

In [None]:
# Summary statistics of daily returns
returns.describe()
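Daily statistics are often easier to interpret on an annual scale. The following is a minimal sketch, assuming 252 trading days per year and hypothetical random returns in place of the notebook's `returns` frame. Scaling the mean by 252 is a simple arithmetic approximation that ignores compounding.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year

# Hypothetical daily returns standing in for the notebook's `returns` frame
rng = np.random.default_rng(0)
toy_returns = pd.DataFrame(
    rng.normal(loc=0.0005, scale=0.01, size=(250, 2)),
    columns=['ETF_A', 'ETF_B'],
)

annual_stats = pd.DataFrame({
    # Arithmetic scaling of the mean daily return
    'annualized_mean': toy_returns.mean() * TRADING_DAYS,
    # Volatility scales with the square root of time
    'annualized_vol': toy_returns.std() * np.sqrt(TRADING_DAYS),
})
print(annual_stats)
```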

## 5. Correlation Analysis

In [None]:
# Correlation matrix of returns
corr_matrix = returns.corr()
corr_matrix

In [None]:
# Visualize correlation matrix as a heatmap
fig, ax = plt.subplots(figsize=(6, 5))
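# Fix the color scale to the full correlation range so plots are comparable
cax = ax.matshow(corr_matrix.values, cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ax.set_xticks(range(len(corr_matrix.columns)))
ax.set_xticklabels(corr_matrix.columns, rotation=45, ha='left')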
ax.set_yticks(range(len(corr_matrix.index)))
ax.set_yticklabels(corr_matrix.index)
ax.set_title('Correlation Matrix of ETF Returns', pad=20)
plt.tight_layout()
plt.show()
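With more than a handful of tickers, reading pairs off a heatmap becomes tedious. One possible way to list the strongly correlated pairs is sketched below. The `toy_returns` data is hypothetical (`ETF_B` is constructed to track `ETF_A`), and the 0.8 threshold is only an example value.

```python
import numpy as np
import pandas as pd

# Hypothetical returns: ETF_B is built to track ETF_A, ETF_C is independent
rng = np.random.default_rng(1)
base = rng.normal(0, 0.01, 500)
toy_returns = pd.DataFrame({
    'ETF_A': base,
    'ETF_B': base + rng.normal(0, 0.002, 500),
    'ETF_C': rng.normal(0, 0.01, 500),
})
toy_corr = toy_returns.corr()

# Keep the strict upper triangle (k=1 excludes the diagonal)
# so each pair appears once and self-correlations are ignored
mask = np.triu(np.ones(toy_corr.shape, dtype=bool), k=1)
pairs = toy_corr.where(mask).stack()

threshold = 0.8  # example cutoff for "highly correlated"
high_pairs = pairs[pairs > threshold]
print(high_pairs)
```

Applying the same masking to `corr_matrix` would give the equivalent list for the notebook's ETFs.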

## 6. Distribution of Returns

In [None]:
# Plot histogram of daily returns for one ETF
ax = returns['ETF_A'].hist(bins=30, figsize=(6, 4))
ax.set_title('Distribution of Daily Returns - ETF_A')
ax.set_xlabel('Daily Return')
ax.set_ylabel('Frequency')
plt.tight_layout()
plt.show()

## 7. Cumulative Returns and Performance Comparison

In [None]:
# Compute cumulative returns for each ETF
cumulative_returns = (1 + returns).cumprod() - 1
cumulative_returns.tail()
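Compounding daily returns should give the same result as comparing each price with the first observed price. The sketch below checks that identity on hypothetical toy prices.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two tickers over four business days
toy_prices = pd.DataFrame(
    {'ETF_A': [100.0, 102.0, 101.0, 103.0],
     'ETF_B': [50.0, 50.5, 51.0, 50.0]},
    index=pd.date_range('2024-01-02', periods=4, freq='B'),
)
toy_returns = toy_prices.pct_change().dropna()

# Cumulative return from compounding daily returns
compounded = (1 + toy_returns).cumprod() - 1

# Cumulative return from the price ratio against the first observation
from_prices = (toy_prices / toy_prices.iloc[0] - 1).iloc[1:]

print(np.allclose(compounded, from_prices))
print(compounded.iloc[-1])
```

If the two results disagree on real data, missing prices or dropped rows are a likely cause.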

In [None]:
# Plot cumulative returns
ax = cumulative_returns.plot(figsize=(10, 5))
ax.set_title('Cumulative Returns of ETFs')
ax.set_xlabel('Date')
ax.set_ylabel('Cumulative Return')
plt.tight_layout()
plt.show()

In [None]:
# Identify total return for each ETF over the full period
total_returns = cumulative_returns.iloc[-1]
print('Total cumulative return over period:')
print(total_returns.sort_values(ascending=False))

## 8. Key Insights

Fill in this section after running the notebook on your data. Questions worth answering include:
- Which ETF had the highest total return?
- Which ETF had the lowest volatility (standard deviation of returns)?
- Are any ETFs highly correlated (e.g., correlation > 0.8)?
- How do drawdowns and recovery periods differ between ETFs?

Written answers to these questions form the narrative core when the analysis is turned into a short report.
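Several of the questions above can be answered with a small summary table. The following is a minimal sketch on hypothetical random returns. It measures drawdown as the percentage drop from the running peak of a wealth index that starts at 1, and it does not attempt to measure recovery periods.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns standing in for the notebook's `returns` frame
rng = np.random.default_rng(42)
toy_returns = pd.DataFrame(
    rng.normal(0.0003, 0.01, size=(500, 3)),
    columns=['ETF_A', 'ETF_B', 'ETF_C'],
    index=pd.date_range('2022-01-03', periods=500, freq='B'),
)

# Wealth index: value of 1 unit invested at the start
wealth = (1 + toy_returns).cumprod()

# Running peak, floored at the starting value of 1
running_peak = wealth.cummax().clip(lower=1.0)

# Drawdown: percentage drop from the running peak (always <= 0)
drawdown = wealth / running_peak - 1

summary = pd.DataFrame({
    'total_return': wealth.iloc[-1] - 1,
    'daily_vol': toy_returns.std(),
    'max_drawdown': drawdown.min(),
    'max_drawdown_date': drawdown.idxmin(),
})

print(summary.sort_values('total_return', ascending=False))
print('Highest total return:', summary['total_return'].idxmax())
print('Lowest volatility:', summary['daily_vol'].idxmin())
```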