# Using Libraries in Python

# Part 1: Pandas DataFrames

Pandas is a powerful library for data manipulation and analysis in Python. It provides data structures like Series and DataFrame, which are essential for handling structured data.

This part shows how to create, inspect, and manipulate Pandas DataFrames, using examples from financial markets.

## 1. Introduction to Pandas DataFrames

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table.

### Importing Pandas
First, we need to import the Pandas library.

In [None]:
import pandas as pd

### Creating a DataFrame
Let's create a DataFrame with some financial market data.

In [None]:
# Creating a DataFrame
data = {
    'Date': pd.date_range(start='2023-01-01', periods=5, freq='D'),
    'AAPL': [150.75, 153.30, 149.50, 155.00, 157.25],
    'GOOGL': [2800.50, 2820.00, 2790.00, 2850.00, 2900.00],
    'MSFT': [299.00, 305.00, 300.00, 310.00, 315.00]
}
df = pd.DataFrame(data)
print(df)

## 2. Basic DataFrame Operations

### Viewing Data
You can view the first few rows of the DataFrame using the `head()` method and the last few rows using the `tail()` method.

In [None]:
# Viewing the first few rows
print(df.head())

# Viewing the last few rows
print(df.tail())

### Getting DataFrame Information
You can get information about the DataFrame using the `info()` method and summary statistics using the `describe()` method.

In [None]:
# Getting DataFrame information
# info() prints its report itself and returns None, so it is not wrapped in print()
df.info()

# Getting summary statistics
print(df.describe())

### Accessing Data
You can access data in a DataFrame using column names and row indices.

In [None]:
# Accessing a single column
print(df['AAPL'])

# Accessing multiple columns
print(df[['AAPL', 'GOOGL']])

# Accessing a single row by integer position
print(df.iloc[0])

# Accessing multiple rows by integer position (the end of the slice is excluded)
print(df.iloc[0:2])

# Accessing a specific value by row label and column name
print(df.at[0, 'AAPL'])
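
For time series it is often more convenient to select rows by date than by position. The sketch below assumes the `Date` column is moved into the index first; the names `prices_df` and `by_date` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Sample prices, rebuilt here so the cell runs on its own
prices_df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=5, freq='D'),
    'AAPL': [150.75, 153.30, 149.50, 155.00, 157.25],
    'MSFT': [299.00, 305.00, 300.00, 310.00, 315.00],
})

# Use the dates as row labels
by_date = prices_df.set_index('Date')

# .loc selects by label, .iloc selects by integer position
aapl_jan_2 = by_date.loc[pd.Timestamp('2023-01-02'), 'AAPL']
first_row = by_date.iloc[0]

# Label slices with .loc include both endpoints
jan_2_to_4 = by_date.loc['2023-01-02':'2023-01-04']

print(aapl_jan_2)
print(jan_2_to_4)
```

Note the difference from `iloc[0:2]`, where the end position is excluded.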

## 3. Data Manipulation

### Adding and Removing Columns
You can add new columns to the DataFrame and remove existing columns.

In [None]:
# Adding a new column
df['AMZN'] = [3400.00, 3420.00, 3390.00, 3450.00, 3500.00]
print(df)

# Removing a column
df = df.drop(columns=['AMZN'])
print(df)

### Adding and Removing Rows
You can add new rows to the DataFrame and remove existing rows.

In [None]:
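# Adding a new row (a Timestamp keeps the Date column's datetime dtype; a plain string would turn it into object)
new_row = {'Date': pd.Timestamp('2023-01-06'), 'AAPL': 160.00, 'GOOGL': 2950.00, 'MSFT': 320.00}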
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)

# Removing a row by its index label (here, label 5 is the row just added)
if 5 in df.index:
    df = df.drop(index=5)
print(df)

### Filtering Data
You can filter data in the DataFrame based on certain conditions.

In [None]:
# Filtering rows where AAPL price is greater than 150
filtered_df = df[df['AAPL'] > 150]
print(filtered_df)
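
Conditions can also be combined. The following is a minimal sketch on a small stand-alone frame (`prices_df` is an illustrative name); the thresholds are arbitrary and chosen only to show the syntax.

```python
import pandas as pd

prices_df = pd.DataFrame({
    'AAPL': [150.75, 153.30, 149.50, 155.00, 157.25],
    'MSFT': [299.00, 305.00, 300.00, 310.00, 315.00],
})

# Wrap each condition in parentheses; combine with & (and), | (or), ~ (not)
both = prices_df[(prices_df['AAPL'] > 150) & (prices_df['MSFT'] >= 305)]
either = prices_df[(prices_df['AAPL'] < 150) | (prices_df['MSFT'] > 312)]

# query() expresses the same "and" filter as a string
both_query = prices_df.query('AAPL > 150 and MSFT >= 305')

print(both)
print(either)
```

Python's `and` / `or` keywords do not work element-wise on Series, which is why the bitwise operators are used in the boolean-mask form.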

### Sorting Data
You can sort data in the DataFrame by one or more columns.

In [None]:
# Sorting by AAPL prices
sorted_df = df.sort_values(by='AAPL')
print(sorted_df)

# Sorting by multiple columns
sorted_df = df.sort_values(by=['AAPL', 'GOOGL'])
print(sorted_df)
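
`sort_values` sorts in ascending order by default. A short sketch of a descending sort, assuming a stand-alone frame named `prices_df` (an illustrative name):

```python
import pandas as pd

prices_df = pd.DataFrame({
    'AAPL': [150.75, 153.30, 149.50, 155.00, 157.25],
    'MSFT': [299.00, 305.00, 300.00, 310.00, 315.00],
})

# Highest AAPL price first; the original row labels travel with the rows
highest_first = prices_df.sort_values(by='AAPL', ascending=False)

# Keep the two highest rows and renumber them from 0
top_two = highest_first.head(2).reset_index(drop=True)

print(highest_first)
print(top_two)
```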

## 4. Handling Missing Data

### Detecting Missing Data
You can detect missing data in the DataFrame.

In [None]:
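# Detecting missing data
# Work on a copy and blank out one price to simulate a missing value
df_with_nan = df.copy()
df_with_nan.loc[2, 'AAPL'] = None  # stored as NaN in a float column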
print(df_with_nan.isnull())
print(df_with_nan.isnull().sum())

### Handling Missing Data
You can handle missing data by filling or dropping missing values.

In [None]:
# Filling missing values (0 is only a placeholder; a zero price would distort later statistics)
df_filled = df_with_nan.fillna(0)
print(df_filled)

# Dropping rows with missing values
df_dropped = df_with_nan.dropna()
print(df_dropped)
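
For price series, carrying the last known value forward or interpolating between neighbours is often a more plausible fill than a constant. This is a sketch of both options, not a recommendation for any particular dataset; `prices_df` is an illustrative name.

```python
import pandas as pd

# One missing AAPL price at position 2
prices_df = pd.DataFrame({'AAPL': [150.75, 153.30, None, 155.00, 157.25]})

# Forward fill: carry the last known price into the gap
filled_forward = prices_df.ffill()

# Linear interpolation between the neighbouring observations
interpolated = prices_df.interpolate()

print(filled_forward)
print(interpolated)
```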

## 5. Grouping and Aggregating Data

You can group and aggregate data in the DataFrame to perform summary statistics.

In [None]:
# Grouping and aggregating data
data = {
    'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'Stock': ['AAPL', 'GOOGL', 'MSFT', 'AAPL', 'GOOGL', 'MSFT', 'AAPL', 'GOOGL', 'MSFT', 'AAPL'],
    'Price': [150.75, 2800.50, 299.00, 153.30, 2820.00, 305.00, 149.50, 2790.00, 300.00, 155.00]
}
df_group = pd.DataFrame(data)
print(df_group)

# Grouping by Stock and calculating the mean price
# (select 'Price' so the Date column is not aggregated as well)
grouped_df = df_group.groupby('Stock')['Price'].mean()
print(grouped_df)
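
Several statistics can be computed in one pass with `agg`. A minimal sketch on a shortened version of the data above:

```python
import pandas as pd

df_group = pd.DataFrame({
    'Stock': ['AAPL', 'GOOGL', 'MSFT', 'AAPL', 'GOOGL', 'MSFT'],
    'Price': [150.75, 2800.50, 299.00, 153.30, 2820.00, 305.00],
})

# One row per stock, one column per statistic
summary = df_group.groupby('Stock')['Price'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```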

## 6. Merging and Joining DataFrames

You can merge and join DataFrames to combine data from multiple sources.

In [None]:
# Creating two DataFrames
data1 = {
    'Date': pd.date_range(start='2023-01-01', periods=5, freq='D'),
    'AAPL': [150.75, 153.30, 149.50, 155.00, 157.25]
}
data2 = {
    'Date': pd.date_range(start='2023-01-01', periods=5, freq='D'),
    'GOOGL': [2800.50, 2820.00, 2790.00, 2850.00, 2900.00]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(df1)
print(df2)

# Merging DataFrames on the shared Date column (an inner join by default)
merged_df = pd.merge(df1, df2, on='Date')
print(merged_df)
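
When the two sources do not cover the same dates, the `how` parameter decides which rows survive. The sketch below uses two small frames with deliberately offset date ranges; the names `aapl` and `googl` are illustrative.

```python
import pandas as pd

aapl = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=3, freq='D'),
    'AAPL': [150.75, 153.30, 149.50],
})
googl = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-02', periods=3, freq='D'),
    'GOOGL': [2820.00, 2790.00, 2850.00],
})

# Inner join (the default): only dates present in both frames
inner = pd.merge(aapl, googl, on='Date')

# Outer join: every date from either frame, with NaN where a price is missing
outer = pd.merge(aapl, googl, on='Date', how='outer')

print(inner)
print(outer)
```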

# Part 2: NumPy

NumPy provides fast array operations that are ideal for vectorized financial calculations such as returns, risk metrics, and portfolio algebra. The examples below illustrate common workflows.

### Importing NumPy
First, import NumPy to enable fast numerical operations.

In [None]:
import numpy as np

### Creating NumPy Arrays
NumPy arrays store numeric data efficiently and enable vectorized operations commonly used in finance.

In [None]:
# Creating a NumPy array from a list
stock_prices = [150.75, 153.30, 149.50, 155.00, 157.25]
np_stock_prices = np.array(stock_prices)
print(np_stock_prices)

### Creating Arrays with NumPy Functions
NumPy offers helpers to initialize arrays with common patterns useful for simulations and baselines.

In [None]:
# Creating arrays with common patterns
zeros_array = np.zeros(5)
ones_array = np.ones(5)
range_array = np.arange(1, 6)
linspace_array = np.linspace(0, 1, 5)

print("Zeros:", zeros_array)
print("Ones:", ones_array)
print("Range:", range_array)
print("Linspace:", linspace_array)

### Basic Operations on NumPy Arrays
Vectorized arithmetic keeps calculations concise and fast for pricing and returns.

In [None]:
# Basic arithmetic operations
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([10, 20, 30, 40, 50])

print("Addition:", array1 + array2)
print("Subtraction:", array1 - array2)
print("Multiplication:", array1 * array2)
print("Division:", array1 / array2)
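
The same element-wise arithmetic gives period-over-period returns without a loop. A small sketch using the sample prices from earlier (the variable names are illustrative):

```python
import numpy as np

prices = np.array([150.75, 153.30, 149.50, 155.00, 157.25])

# Simple return for each day: P_t / P_{t-1} - 1
simple_returns = prices[1:] / prices[:-1] - 1

# Equivalent form using np.diff for the price changes
simple_returns_diff = np.diff(prices) / prices[:-1]

print("Simple returns:", simple_returns)
```

The result has one element fewer than `prices`, because the first price has no predecessor.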

### Statistical Operations
Quickly compute summary statistics to monitor performance or risk.

In [None]:
# Statistical operations on price data
print("Mean:", np.mean(np_stock_prices))
print("Median:", np.median(np_stock_prices))
print("Standard Deviation:", np.std(np_stock_prices))
print("Variance:", np.var(np_stock_prices))
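
One detail worth knowing: `np.std` and `np.var` divide by `n` by default (`ddof=0`), whereas pandas' `Series.std` divides by `n - 1`. When the data is a sample rather than the whole population, passing `ddof=1` gives the sample estimate. A brief sketch:

```python
import numpy as np

prices = np.array([150.75, 153.30, 149.50, 155.00, 157.25])

population_std = np.std(prices)      # ddof=0, NumPy's default: divide by n
sample_std = np.std(prices, ddof=1)  # divide by n - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```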

### Indexing and Slicing NumPy Arrays
Slice arrays to inspect subsets or adjust values in place.

In [None]:
# Indexing and slicing examples
print("First element:", np_stock_prices[0])
print("Last element:", np_stock_prices[-1])
print("First three elements:", np_stock_prices[:3])
print("Last two elements:", np_stock_prices[-2:])

np_stock_prices[0] = 160.00
print("Modified array:", np_stock_prices)

### Working with Multi-Dimensional Arrays
Model matrices (e.g., correlations or pricing grids) with 2D arrays.

In [None]:
# 2D array (matrix) examples
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

print("Element at (0, 0):", matrix[0, 0])
print("Element at (1, 2):", matrix[1, 2])
print("First two rows:", matrix[:2, :])
print("Last two columns:", matrix[:, -2:])

### Financial Markets Examples
Apply NumPy to returns, portfolio math, and covariance estimation.

In [None]:
# Example 1: Calculating log returns
prices = np.array([100, 101, 102, 103, 104])
log_returns = np.log(prices[1:] / prices[:-1])
print("Log returns:", log_returns)

# Example 2: Portfolio returns
weights = np.array([0.4, 0.3, 0.3])  # Weights for AAPL, GOOGL, MSFT
returns = np.array([0.01, 0.02, 0.015])  # Daily returns for AAPL, GOOGL, MSFT
portfolio_return = np.dot(weights, returns)
print("Portfolio return:", portfolio_return)

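# Example 3: Covariance matrix of returns
# Each column is treated as one asset's returns; np.cov expects one variable per row, hence the transpose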
returns_matrix = np.array([
    [0.01, 0.02, 0.015],
    [0.02, 0.025, 0.02],
    [0.015, 0.02, 0.01]
])
cov_matrix = np.cov(returns_matrix.T)
print("Covariance matrix:\n", cov_matrix)
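
The weights and the covariance matrix combine into a portfolio variance, $\sigma_p^2 = w^\top \Sigma w$. The sketch below reuses the sample numbers from the examples above and assumes each row of `returns_matrix` is one day and each column one asset; `portfolio_variance` and `portfolio_volatility` are illustrative names.

```python
import numpy as np

weights = np.array([0.4, 0.3, 0.3])
returns_matrix = np.array([
    [0.01, 0.02, 0.015],
    [0.02, 0.025, 0.02],
    [0.015, 0.02, 0.01],
])

# rowvar=False tells np.cov that columns are the variables (same result as passing the transpose)
cov_matrix = np.cov(returns_matrix, rowvar=False)

# Portfolio variance w' * Sigma * w and its square root, the volatility
portfolio_variance = weights @ cov_matrix @ weights
portfolio_volatility = np.sqrt(portfolio_variance)

print("Portfolio variance:", portfolio_variance)
print("Portfolio volatility:", portfolio_volatility)
```

As a sanity check, the result should match the sample variance of the daily portfolio returns, `np.var(returns_matrix @ weights, ddof=1)`.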

# Part 3: Matplotlib

Matplotlib helps translate numerical results into charts for communicating trends visually.

### Importing Matplotlib
Import `matplotlib.pyplot` for quick plotting in notebooks.

In [None]:
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8")  # this style name requires Matplotlib 3.6 or newer

### Plotting Price History
Line plots are useful for visualizing price trends over time.

In [None]:
# Line chart of sample prices
dates = pd.date_range(start="2023-01-01", periods=5, freq="D")
prices = [150.75, 153.30, 149.50, 155.00, 157.25]

plt.figure(figsize=(8, 4))
plt.plot(dates, prices, marker="o", label="AAPL close")
plt.title("Sample Price History")
plt.xlabel("Date")
plt.ylabel("Price ($)")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

### Visualizing Return Distributions
Histograms help inspect the distribution of returns for risk assessment.

In [None]:
# Histogram of simulated daily returns
np.random.seed(42)
simulated_returns = np.random.normal(loc=0.001, scale=0.01, size=250)

plt.figure(figsize=(8, 4))
plt.hist(simulated_returns, bins=30, color="steelblue", edgecolor="black")
plt.title("Simulated Daily Returns Distribution")
plt.xlabel("Return")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()
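
A reference line makes a histogram easier to read. The following sketch marks the sample mean with `axvline`, using Matplotlib's object-oriented interface (`fig`, `ax`) and NumPy's `default_rng` generator; both are stylistic choices made for this example, and the simulated values will differ from those in the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated daily returns (illustrative parameters, not market data)
rng = np.random.default_rng(42)
simulated_returns = rng.normal(loc=0.001, scale=0.01, size=250)
mean_return = simulated_returns.mean()

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(simulated_returns, bins=30, color="steelblue", edgecolor="black")

# Dashed vertical line at the sample mean
ax.axvline(mean_return, color="darkred", linestyle="--",
           label=f"Mean: {mean_return:.4f}")

ax.set_title("Simulated Daily Returns with Mean")
ax.set_xlabel("Return")
ax.set_ylabel("Frequency")
ax.legend()
fig.tight_layout()
plt.show()
```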