In [None]:
!pip install polars

In [2]:
import polars as pl

### 1. SQL Integration (pl.SQLContext)

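Polars ships with a SQL interface: register a `DataFrame` (or `LazyFrame`) under a table name in a `pl.SQLContext`, then query it with SQL. By default `execute()` returns a `LazyFrame`, so call `.collect()` to materialize the result.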

**Example: Using SQL in Polars**

In [3]:
df = pl.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45]
})

# Create SQLContext and register the DataFrame
sql_context = pl.SQLContext()
sql_context.register('people', df)

# Run SQL Query
result = sql_context.execute('SELECT name, age FROM people WHERE age > 35')

print(result.collect())

shape: (2, 2)
┌───────┬─────┐
│ name  ┆ age │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ David ┆ 40  │
│ Eve   ┆ 45  │
└───────┴─────┘


**Why Use SQL with Polars?**
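- Lets users who already know SQL be productive without learning the expression API first.
- Eases migration of existing SQL-based workflows.
- SQL queries are translated into Polars' lazy query plan, so they benefit from the same query optimizer as native expressions.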

### 2. Working with Nested JSON Data

Polars reads nested JSON objects into `Struct` columns, whose fields can then be pulled out into ordinary columns. This makes it a good fit for API responses and other semi-structured data.

**Example: Parsing Nested JSON**

In [4]:
json_data = [
    {'id': 1, 'name': 'Alice', 'info': {'age': 25, 'city': 'NY'}},
    {'id': 2, 'name': 'Bob', 'info': {'age': 30, 'city': 'LA'}},
    {'id': 3, 'name': 'Charlie', 'info': {'age': 35, 'city': 'SF'}}
]

# Convert JSON to DataFrame
df = pl.from_dicts(json_data)

# Extract nested fields
df = df.with_columns(
    pl.col('info').struct.field('age').alias('age'),
    pl.col('info').struct.field('city').alias('city')
).drop('info')

print(df)

shape: (3, 4)
┌─────┬─────────┬─────┬──────┐
│ id  ┆ name    ┆ age ┆ city │
│ --- ┆ ---     ┆ --- ┆ ---  │
│ i64 ┆ str     ┆ i64 ┆ str  │
╞═════╪═════════╪═════╪══════╡
│ 1   ┆ Alice   ┆ 25  ┆ NY   │
│ 2   ┆ Bob     ┆ 30  ┆ LA   │
│ 3   ┆ Charlie ┆ 35  ┆ SF   │
└─────┴─────────┴─────┴──────┘


**Why Use Polars for JSON?**
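- Supports nested field extraction with `struct.field` (or `unnest` to expand every field at once).
- Handles large newline-delimited JSON files lazily with `scan_ndjson()`; regular JSON documents can be loaded eagerly with `read_json()`.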

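### 3. Creating Custom Functions (`map_elements`, `map_batches`)

Polars supports custom Python transformations through `map_elements` (called once per value) and `map_batches` (called once per whole `Series`). In older releases these were named `apply` and `map`. Because both run Python code, they are slower than native expressions and are best kept for logic that expressions cannot express.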

In [5]:
df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Erick'],
    'score': [85, 92, 78, 61]
})

# Custom function to grade scores
def grade(score):
    return 'A' if score >= 90 else 'B' if score >= 80 else 'C'

# Apply custom function
df = df.with_columns(
    pl.col('score').map_elements(grade, return_dtype=pl.Utf8).alias('grade')
)

print(df)

shape: (4, 3)
┌─────────┬───────┬───────┐
│ name    ┆ score ┆ grade │
│ ---     ┆ ---   ┆ ---   │
│ str     ┆ i64   ┆ str   │
╞═════════╪═══════╪═══════╡
│ Alice   ┆ 85    ┆ B     │
│ Bob     ┆ 92    ┆ A     │
│ Charlie ┆ 78    ┆ C     │
│ Erick   ┆ 61    ┆ C     │
└─────────┴───────┴───────┘


### 4. Using Polars in ETL and Data Pipelines

Polars is great for **ETL (Extract, Transform, Load)** processes due to its speed and memory efficiency.

**Example: ETL Pipeline in Polars**

In [None]:
# Extract: Load large dataset lazily
lf = pl.scan_csv('../sales.csv')

# Transform: compute total revenue per product
lf_transformed = (
    lf.group_by('product')
    .agg((pl.col('sale_price') * pl.col('units_sold')).sum().alias('total'))
)

# Load: Save as Parquet for efficient storage
lf_transformed.collect().write_parquet('../sales_summary.parquet')

**Why Use Polars for ETL?**
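- **Lazy execution** lets the query optimizer skip unused columns and apply filters early, before all the data is loaded.
- **Parquet** is a compressed, columnar format, so output files are typically much smaller than the equivalent CSV.
- **Parallel execution** spreads transformations across CPU cores.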

### 📌 Summary

| Feature | Benefit |
| ------- | ------- |
| SQL Integration (`pl.SQLContext`) | Use SQL queries on DataFrames |
| Nested JSON Handling | Easily extract and transform complex JSON structures |
| Custom Functions (`map_elements`, `map_batches`) | Apply element-wise or whole-column Python transformations |
| ETL & Data Pipelines | Process large datasets efficiently |

Polars is a **high-performance alternative** to pandas, well suited to large-scale **data processing and ETL workflows.** 🚀