# Python Data Analysis with Polars and Seaborn

This notebook demonstrates basic data analysis using modern Python tools: **Polars** for data manipulation and **Seaborn** for visualization.


## Setup


In [None]:
import polars as pl
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Set up plotting style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)


## Create Sample Data

Let's generate 1,000 rows of synthetic data with NumPy and load them into a Polars DataFrame:


In [None]:
# Generate sample data
np.random.seed(42)
n_samples = 1000

data = pl.DataFrame({
    "category": np.random.choice(["A", "B", "C", "D"], n_samples),
    "value": np.random.normal(50, 15, n_samples),
    "score": np.random.uniform(0, 100, n_samples),
    "group": np.random.choice(["Group1", "Group2"], n_samples)
})

print(f"Dataset shape: {data.shape}")
data.head()


## Data Exploration with Polars

Polars expressions keep grouped aggregations concise. The cell below computes the mean `value`, the mean `score`, and the row count for each category:


In [None]:
# Aggregate per category, then sort by average value (highest first)
category_summary = (
    data
    .group_by("category")
    .agg([
        pl.col("value").mean().alias("avg_value"),
        pl.col("score").mean().alias("avg_score"),
        pl.len().alias("count")
    ])
    .sort("avg_value", descending=True)
)

print("Summary by Category:")
category_summary


## Data Visualization with Seaborn

Now let's visualize the data with Seaborn: a histogram of `value` split by category, and a box plot of `score` for each category.


In [None]:
# Convert to pandas for seaborn (to_pandas() needs pandas and pyarrow installed)
data_pd = data.to_pandas()

# Two side-by-side panels: histogram (left) and box plot (right)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

sns.histplot(data=data_pd, x="value", hue="category", alpha=0.7, ax=ax1)
ax1.set_title("Distribution of Values by Category")

sns.boxplot(data=data_pd, x="category", y="score", ax=ax2)
ax2.set_title("Score Distribution by Category")

plt.tight_layout()
plt.show()


## Summary

This notebook demonstrates:

- ✅ **Polars** for fast, expressive data manipulation
- ✅ **Seaborn** for statistical data visualization
- ✅ Integration between Polars and visualization libraries
- ✅ Modern Python data science workflow

### Key Takeaways

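1. **Polars** is often faster than pandas on large datasets, since it runs queries in parallel on columnar data, and its expression-based API stays consistent across operations
2. **Seaborn** works well with pandas DataFrames, and a Polars DataFrame converts easily with `.to_pandas()` (which requires pandas and pyarrow to be installed)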
3. The combination provides a powerful, modern data analysis stack
