# Example: Quick Data Analysis

This example demonstrates a simple data analysis workflow using the template structure.

**Author:** Template Team

**Date:** 2024-10-30

---

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
%matplotlib inline

## Generate Sample Data

For this example, we'll create synthetic data.

In [None]:
# Create sample dataset
np.random.seed(42)
n_samples = 100

data = {
    'id': range(1, n_samples + 1),
    'category': np.random.choice(['A', 'B', 'C'], n_samples),
    'value1': np.random.randn(n_samples) * 10 + 50,
    'value2': np.random.randn(n_samples) * 15 + 100,
    'score': np.random.uniform(0, 100, n_samples)
}

df = pd.DataFrame(data)
print(f"Dataset shape: {df.shape}")
df.head()
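
The cell above seeds NumPy's global random state. As an alternative, here is a minimal sketch that builds the same columns with a local `np.random.default_rng` generator, so the seed does not leak into other cells. The helper name `make_sample_data` and the variable `df_alt` are hypothetical, and the drawn values will differ from those produced by `np.random.seed(42)` because the two APIs use different generators.

```python
import numpy as np
import pandas as pd


def make_sample_data(n_samples=100, seed=42):
    """Build the synthetic dataset with a local generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # local generator; global state is untouched
    return pd.DataFrame({
        'id': np.arange(1, n_samples + 1),
        'category': rng.choice(['A', 'B', 'C'], size=n_samples),
        # Same parameters as the original cell: mean 50 / std 10 and mean 100 / std 15
        'value1': rng.normal(loc=50, scale=10, size=n_samples),
        'value2': rng.normal(loc=100, scale=15, size=n_samples),
        'score': rng.uniform(0, 100, size=n_samples),
    })


df_alt = make_sample_data()
print(f"Dataset shape: {df_alt.shape}")
```

Calling `make_sample_data` twice with the same `seed` should return identical frames, which keeps the example reproducible without relying on cell execution order.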

## Basic Statistics

In [None]:
# Summary statistics
print("\nSummary Statistics:")
print(df.describe())

print("\nCategory Distribution:")
print(df['category'].value_counts())

## Visualizations

In [None]:
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Histogram
axes[0, 0].hist(df['value1'], bins=20, edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Distribution of Value1')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')

# Scatter plot
axes[0, 1].scatter(df['value1'], df['value2'], alpha=0.6, c=df['score'], cmap='viridis')
axes[0, 1].set_title('Value1 vs Value2')
axes[0, 1].set_xlabel('Value1')
axes[0, 1].set_ylabel('Value2')

# Bar plot
category_means = df.groupby('category')['score'].mean()
category_means.plot(kind='bar', ax=axes[1, 0], color='steelblue', edgecolor='black')
axes[1, 0].set_title('Average Score by Category')
axes[1, 0].set_ylabel('Average Score')
axes[1, 0].set_xlabel('Category')

# Box plot
df.boxplot(column='score', by='category', ax=axes[1, 1])
axes[1, 1].set_title('Score Distribution by Category')
axes[1, 1].set_xlabel('Category')  # match the capitalized label used in the bar plot
axes[1, 1].set_ylabel('Score')

plt.suptitle('Sample Data Analysis', fontsize=16, y=1.00)
plt.tight_layout()
plt.show()

## Correlation Analysis

In [None]:
# Correlation matrix (drop 'id': it is a row identifier, not a measurement)
numeric_cols = df.select_dtypes(include=[np.number]).columns.drop('id')
correlation = df[numeric_cols].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Correlation Matrix')
plt.tight_layout()
plt.show()

## Key Findings

Based on this example analysis:

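1. **Data Distribution**: Value1 and Value2 are drawn from normal distributions (mean 50, standard deviation 10 and mean 100, standard deviation 15), so their histograms should look approximately bell-shaped
2. **Categories**: Categories A, B, and C are sampled uniformly at random, so each should account for roughly one third of the rows
3. **Correlations**: The columns are generated independently, so pairwise correlations should be close to zero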
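
The findings above can be checked numerically rather than read off the plots. The following is a rough sketch, not part of the original notebook: it rebuilds the same synthetic dataset so that it runs on its own, and the names `shape_stats`, `category_share`, and `max_abs_corr` are illustrative. Skewness and excess kurtosis are only informal indicators of normality, not a formal test.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic dataset from the "Generate Sample Data" cell
np.random.seed(42)
n_samples = 100
df = pd.DataFrame({
    'id': range(1, n_samples + 1),
    'category': np.random.choice(['A', 'B', 'C'], n_samples),
    'value1': np.random.randn(n_samples) * 10 + 50,
    'value2': np.random.randn(n_samples) * 15 + 100,
    'score': np.random.uniform(0, 100, n_samples),
})

# Finding 1: skewness and excess kurtosis are both 0 for a normal distribution,
# so sample values near 0 are consistent with (but do not prove) normality
shape_stats = pd.DataFrame({
    'skew': df[['value1', 'value2']].skew(),
    'excess_kurtosis': df[['value1', 'value2']].kurt(),
})

# Finding 2: share of rows per category (a perfectly even split would be 1/3 each)
category_share = df['category'].value_counts(normalize=True).sort_index()

# Finding 3: largest absolute pairwise correlation, ignoring 'id' and the diagonal
corr = df[['value1', 'value2', 'score']].corr()
off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))  # diagonal becomes NaN
max_abs_corr = off_diagonal.abs().max().max()

print(shape_stats)
print(category_share)
print(f"Largest absolute off-diagonal correlation: {max_abs_corr:.3f}")
```

With only 100 samples, some deviation from the ideal values is expected, so treat these numbers as a sanity check on the plots rather than as evidence for a conclusion.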

### Next Steps

- [ ] Collect real data
- [ ] Perform deeper analysis
- [ ] Build predictive models
- [ ] Create interactive dashboards

## Save Results (Optional)

Uncomment the lines below to write the dataset to `data/processed/` and the figure to `output/figures/`. Both directories must already exist.

In [None]:
# Save processed data
# df.to_csv('data/processed/example_data.csv', index=False)

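# Save figure via the `fig` handle from the Visualizations cell.
# Calling plt.savefig here would likely write a blank image, because the
# earlier figures have already been shown and are no longer the current figure.
# fig.savefig('output/figures/example_analysis.png', dpi=300, bbox_inches='tight')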

print("Analysis complete!")
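
The save calls above fail if their target directories are missing. A minimal sketch of creating them first is shown below; the helper name `ensure_output_dirs` and its `base_dir` parameter are hypothetical, and the relative paths are taken from the commented-out lines in the cell above.

```python
from pathlib import Path


def ensure_output_dirs(base_dir='.'):
    """Create the data and figure directories if missing (hypothetical helper)."""
    base = Path(base_dir)
    processed_dir = base / 'data' / 'processed'
    figures_dir = base / 'output' / 'figures'
    for directory in (processed_dir, figures_dir):
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe
        directory.mkdir(parents=True, exist_ok=True)
    return processed_dir, figures_dir


# Example usage (uncomment in the notebook once `df` and `fig` exist):
# processed_dir, figures_dir = ensure_output_dirs()
# df.to_csv(processed_dir / 'example_data.csv', index=False)
# fig.savefig(figures_dir / 'example_analysis.png', dpi=300, bbox_inches='tight')
```

Returning the two paths keeps the file names in one place, so the save calls do not need to repeat the directory strings.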