# Market EDA for AAPL

**Author:** galafis  
**Date:** 2025-09-29  
**Objective:** Explore intraday price behavior, volume patterns, and potential anomalies in synthetic Apple Inc. (AAPL) trading data


## 1. Context & Questions

This notebook explores synthetic AAPL market data to understand:

- Intraday price volatility and trends
- Volume distribution and outliers
- Price-volume correlation
- Potential data quality issues

**Data Source:** Synthetic sample generated for demonstration purposes.

In [None]:
# Import dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Configure plotting
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
%matplotlib inline

## 2. Data Loading and Sampling

For this EDA, we generate synthetic intraday data to demonstrate analytical workflows without exposing production data.

In [None]:
# Generate synthetic AAPL intraday data
np.random.seed(42)

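# Create a datetime index of 20 business days with 390 one-minute bars each
# (09:30 to 15:59). Market holidays such as Labor Day (2025-09-01) are not excluded.
trading_days = pd.bdate_range(start='2025-09-01', periods=20)
session_minutes = pd.timedelta_range(start=pd.Timedelta(hours=9, minutes=30), periods=390, freq='1min')
# Broadcasting adds every minute offset to every day: shape (20, 390), flattened day by day
date_range = pd.DatetimeIndex((trading_days.values[:, None] + session_minutes.values).ravel())
periods = len(date_range)  # 20 trading days, 390 minutes each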

# Simulate price movement with random walk
base_price = 175.0
price_changes = np.random.randn(periods) * 0.5
prices = base_price + np.cumsum(price_changes)

# Simulate volume with lognormal distribution
volumes = np.random.lognormal(mean=11, sigma=0.8, size=periods).astype(int)

# Create DataFrame
df = pd.DataFrame({
    'timestamp': date_range,
    'symbol': 'AAPL',
    'price': prices,
    'volume': volumes
})

# Add derived fields
df['hour'] = df['timestamp'].dt.hour
df['date'] = df['timestamp'].dt.date

print(f"Dataset shape: {df.shape}")
df.head()

## 3. Quality Checks

In [None]:
# Check for missing values
print("Missing values per column:")
print(df.isnull().sum())
print("\n")

# Check data types
print("Data types:")
print(df.dtypes)
print("\n")

# Check for duplicates
duplicates = df.duplicated(subset=['timestamp', 'symbol']).sum()
print(f"Duplicate records: {duplicates}")

In [None]:
# Value ranges
print("Price statistics:")
print(df['price'].describe())
print("\n")

print("Volume statistics:")
print(df['volume'].describe())
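
The checks above cover missing values, types, duplicates, and value ranges. A minimal sketch of a few further sanity checks follows, assuming regular trading hours of 09:30 to 16:00 and the same `timestamp`, `price`, and `volume` columns used in this notebook. The helper name `check_market_data` and the tiny sample frame are hypothetical and only illustrate the idea.

```python
import pandas as pd


def check_market_data(frame: pd.DataFrame) -> dict:
    """Count rows that violate basic sanity rules (hypothetical helper).

    Assumes regular trading hours of 09:30 (inclusive) to 16:00 (exclusive).
    """
    minute_of_day = frame["timestamp"].dt.hour * 60 + frame["timestamp"].dt.minute
    outside_session = (minute_of_day < 9 * 60 + 30) | (minute_of_day >= 16 * 60)
    bars_per_day = frame.groupby(frame["timestamp"].dt.normalize()).size()
    return {
        "outside_session": int(outside_session.sum()),
        "non_positive_price": int((frame["price"] <= 0).sum()),
        "negative_volume": int((frame["volume"] < 0).sum()),
        "max_bars_per_day": int(bars_per_day.max()),
    }


# Small hand-made sample with deliberately bad rows (illustration only)
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-09-02 09:30", "2025-09-02 15:59",
        "2025-09-02 16:00", "2025-09-03 03:15",
    ]),
    "price": [175.0, 175.4, -1.0, 176.1],
    "volume": [60000, 58000, 61000, -5],
})
report = check_market_data(sample)
print(report)
```

Calling `check_market_data(df)` on the notebook's DataFrame would show whether any bars fall outside the session and whether any day holds more than the expected 390 bars.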

## 4. Exploratory Plots and Summary Stats

In [None]:
# Price time series
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(df['timestamp'], df['price'], linewidth=0.8, alpha=0.8)
ax.set_title('AAPL Intraday Price Movement - September 2025', fontsize=14)
ax.set_xlabel('Timestamp')
ax.set_ylabel('Price ($)')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Volume distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].hist(df['volume'], bins=50, edgecolor='black', alpha=0.7)
axes[0].set_title('Volume Distribution')
axes[0].set_xlabel('Volume')
axes[0].set_ylabel('Frequency')

axes[1].boxplot(df['volume'])
axes[1].set_title('Volume Boxplot')
axes[1].set_ylabel('Volume')

plt.tight_layout()
plt.show()
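
Because lognormal volume is right-skewed, an outlier rule applied on the raw scale tends to flag much of the upper tail. One common alternative is the 1.5 × IQR rule applied to log-volume, sketched below. The helper name `flag_volume_outliers`, the multiplier `k=1.5`, and the injected spike are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd


def flag_volume_outliers(volume: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking outliers by the k * IQR rule on log(volume).

    Assumes strictly positive volumes; zero-volume bars would need separate handling.
    """
    log_volume = np.log(volume)
    q1, q3 = log_volume.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (log_volume < q1 - k * iqr) | (log_volume > q3 + k * iqr)


# Demo data drawn with the same lognormal parameters as the notebook
rng = np.random.default_rng(42)
demo_volume = pd.Series(rng.lognormal(mean=11, sigma=0.8, size=5000).astype(int))
demo_volume.iloc[0] = 50_000_000  # inject one obvious spike for illustration
outlier_mask = flag_volume_outliers(demo_volume)
print(f"Flagged {outlier_mask.sum()} of {len(demo_volume)} bars")
```

On the notebook's data, `df[flag_volume_outliers(df['volume'])]` would list the flagged bars for manual inspection.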

In [None]:
# Intraday patterns by hour
hourly_stats = df.groupby('hour').agg({
    'price': ['mean', 'std'],
    'volume': ['mean', 'sum']
})

fig, axes = plt.subplots(2, 1, figsize=(12, 8))

axes[0].plot(hourly_stats.index, hourly_stats[('price', 'mean')], marker='o', linewidth=2)
axes[0].fill_between(
    hourly_stats.index,
    hourly_stats[('price', 'mean')] - hourly_stats[('price', 'std')],
    hourly_stats[('price', 'mean')] + hourly_stats[('price', 'std')],
    alpha=0.3
)
axes[0].set_title('Average Price by Hour (with std dev)')
axes[0].set_ylabel('Price ($)')
axes[0].grid(alpha=0.3)

axes[1].bar(hourly_stats.index, hourly_stats[('volume', 'mean')], alpha=0.7)
axes[1].set_title('Average Volume by Hour')
axes[1].set_xlabel('Hour of Day')
axes[1].set_ylabel('Volume')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()
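
The standard deviation of price levels by hour mixes intraday variation with drift across days. A returns-based view may be more informative. The sketch below assumes one-minute bars and computes log returns within each trading day, so overnight gaps are not counted. The function name `hourly_return_volatility` and the three-day demo frame are hypothetical.

```python
import numpy as np
import pandas as pd


def hourly_return_volatility(frame: pd.DataFrame) -> pd.Series:
    """Standard deviation of one-minute log returns, grouped by hour of day."""
    frame = frame.sort_values("timestamp")
    day = frame["timestamp"].dt.normalize()
    # diff() within each day leaves the first bar of a session as NaN,
    # so the overnight gap never enters the return series.
    log_return = np.log(frame["price"]).groupby(day).diff()
    return log_return.groupby(frame["timestamp"].dt.hour).std()


# Three synthetic sessions of 390 one-minute bars each (illustration only)
rng = np.random.default_rng(0)
days = pd.bdate_range("2025-09-02", periods=3)
minutes = pd.timedelta_range(
    start=pd.Timedelta(hours=9, minutes=30), periods=390, freq="1min"
)
stamps = pd.DatetimeIndex((days.values[:, None] + minutes.values).ravel())
demo = pd.DataFrame({
    "timestamp": stamps,
    "price": 175.0 + np.cumsum(rng.normal(0, 0.05, size=len(stamps))),
})
vol_by_hour = hourly_return_volatility(demo)
print(vol_by_hour)
```

For a pure random walk the values should be roughly flat across hours; a pronounced shape in real data would point to genuine intraday structure.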

In [None]:
# Price-volume correlation
correlation = df[['price', 'volume']].corr()
print("Price-Volume Correlation Matrix:")
print(correlation)
print("\n")

# Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(df['price'], df['volume'], alpha=0.3, s=10)
plt.title('Price vs Volume Scatter')
plt.xlabel('Price ($)')
plt.ylabel('Volume')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
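
Correlating price levels with volume can mislead, because a random-walk price series is non-stationary. A common alternative is to correlate absolute one-minute returns with volume, using a rank (Spearman) correlation to limit the influence of the heavy right tail in volume. The sketch below uses independently generated demo data, so the coefficient should be close to zero; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Independent synthetic price and volume series (illustration only)
rng = np.random.default_rng(42)
n = 5000
demo = pd.DataFrame({
    "price": 175.0 + np.cumsum(rng.normal(0, 0.05, size=n)),
    "volume": rng.lognormal(mean=11, sigma=0.8, size=n).astype(int),
})

# Pair each bar's absolute return with its volume; the first bar has no return
pairs = pd.DataFrame({
    "abs_return": demo["price"].pct_change().abs(),
    "volume": demo["volume"],
}).dropna()

# Spearman correlation is the Pearson correlation of the ranks
rho = pairs["abs_return"].rank().corr(pairs["volume"].rank())
print(f"Spearman correlation, |return| vs volume: {rho:.3f}")
```

With real data, intraday sessions should be separated before calling `pct_change()` so that overnight gaps do not appear as one-minute returns.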

## 5. Findings and Next Steps

### Key Observations

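- **Price Movement:** The synthetic price is a Gaussian random walk by construction (per-minute step standard deviation of $0.50), so any apparent trend in the time series plot is noise rather than signal
- **Volume Patterns:** Volume is drawn from a lognormal distribution (`mean=11`, `sigma=0.8`), which produces the right-skewed histogram; this shape is commonly used to approximate market volume
- **Intraday Dynamics:** The generator adds no intraday structure, so differences between hours reflect sampling noise; real data typically shows patterns such as heavier volume near the open and close
- **Data Quality:** No missing values or duplicates detected, as expected for generated data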

### Next Steps

1. Replace synthetic data with sanitized production samples
2. Analyze multi-day trends and seasonality
3. Investigate volume spikes and their correlation with price movements
4. Develop anomaly detection logic for unusual trading patterns
5. Extract reusable plotting functions to `notebooks/utils/plotting.py`
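
As a possible starting point for steps 3 and 4, volume spikes could be flagged with a rolling z-score on log-volume. This is a sketch rather than a production detector: the helper name `rolling_volume_zscore`, the 60-bar window, the threshold of 4, and the injected spike are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def rolling_volume_zscore(volume: pd.Series, window: int = 60) -> pd.Series:
    """Z-score of log(volume) against the preceding `window` bars."""
    log_volume = np.log(volume)
    # shift(1) keeps the current bar out of its own baseline
    baseline = log_volume.shift(1).rolling(window, min_periods=window)
    return (log_volume - baseline.mean()) / baseline.std()


# Synthetic volume with one extreme bar injected (illustration only)
rng = np.random.default_rng(7)
demo_volume = pd.Series(rng.lognormal(mean=11, sigma=0.8, size=1000))
demo_volume.iloc[500] = 1e9
z = rolling_volume_zscore(demo_volume)
spikes = z[z > 4].index.tolist()
print(spikes)
```

The first `window` bars have no score because the baseline is incomplete. On multi-day data the window would also span overnight gaps unless the score is computed per session.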