## What is Polars?
#### Polars is a fast DataFrame library written in Rust with Python bindings. It is optimized for performance and multi-threaded execution, which makes it suitable for large-scale data processing.

### Why use Polars?
#### Multithreaded execution by default
#### Lazy evaluation support (like Spark)
#### Rust-backed performance
#### Memory efficient
#### Supports both eager and lazy APIs
#### Ideal for Data Science and ETL pipelines

In [2]:
!pip install polars

Collecting polars
  Downloading polars-1.27.1-cp39-abi3-win_amd64.whl.metadata (15 kB)
Downloading polars-1.27.1-cp39-abi3-win_amd64.whl (35.6 MB)

## Core Features & Functionality
### 1. Create a DataFrame

In [5]:
import polars as pl

df = pl.DataFrame({
    "name": ["Alice","Bob","Charlie"],
    "age":[25,32,37],
    "salary":[50000, 60000, 70000]
})
print(df)

shape: (3, 3)
┌─────────┬─────┬────────┐
│ name    ┆ age ┆ salary │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ i64    │
╞═════════╪═════╪════════╡
│ Alice   ┆ 25  ┆ 50000  │
│ Bob     ┆ 32  ┆ 60000  │
│ Charlie ┆ 37  ┆ 70000  │
└─────────┴─────┴────────┘


## 2. 🧮 Basic Operations

In [14]:
# select a column
df.select("name")

# filter rows
df.filter(pl.col("age") > 30)

# add a new column
df.with_columns((pl.col("salary") * 1.1).alias("updated_salary"))

# group by and aggregate (row order of the result is not guaranteed;
# only this last expression is displayed by the notebook)
df.group_by("age").agg(pl.col("salary").mean().alias("avg_salary"))

age,avg_salary
i64,f64
37,70000.0
25,50000.0
32,60000.0


## 3. 💤 Lazy API (Like Spark)

In [17]:
lazy_df = df.lazy()
result = (
    lazy_df
    .filter(pl.col("salary") > 55000)
    .with_columns((pl.col("salary") * 1.1).alias("bonus_salary"))
    .collect()
)
print(result)

shape: (2, 4)
┌─────────┬─────┬────────┬──────────────┐
│ name    ┆ age ┆ salary ┆ bonus_salary │
│ ---     ┆ --- ┆ ---    ┆ ---          │
│ str     ┆ i64 ┆ i64    ┆ f64          │
╞═════════╪═════╪════════╪══════════════╡
│ Bob     ┆ 32  ┆ 60000  ┆ 66000.0      │
│ Charlie ┆ 37  ┆ 70000  ┆ 77000.0      │
└─────────┴─────┴────────┴──────────────┘


## 4. Joins

In [20]:
df1 = pl.DataFrame({"id":[1,2], "name":["Alice","Bob"]})
df2 = pl.DataFrame({"id":[1,2], "city":["NY", "LA"]})

joined_df = df1.join(df2, on="id", how="inner")
print(joined_df)

shape: (2, 3)
┌─────┬───────┬──────┐
│ id  ┆ name  ┆ city │
│ --- ┆ ---   ┆ ---  │
│ i64 ┆ str   ┆ str  │
╞═════╪═══════╪══════╡
│ 1   ┆ Alice ┆ NY   │
│ 2   ┆ Bob   ┆ LA   │
└─────┴───────┴──────┘


## 5. 📊 Pivot & Melt

In [23]:
df = pl.DataFrame({
    "year":[2020, 2020, 2021],
    "product":["A","B","C"],
    "sales": [100,200,150]
})
# pivot (in Polars 1.x the `columns` argument is deprecated and renamed to `on`)
df.pivot(on="product", index="year", values="sales")

# unpivot (replaces `melt`, which is deprecated in Polars 1.x)
df.unpivot(index="year", on="sales")


year,variable,value
i64,str,i64
2020,"""sales""",100
2020,"""sales""",200
2021,"""sales""",150


## 6. Null Handling

In [26]:
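df = pl.DataFrame({
    "name": ["Alice", None, "Charlie"],
    "age": [25, None, 37]
})
# drop every row that contains at least one null (returns a new DataFrame)
df.drop_nulls()
# fill nulls with a string; only the string column is filled, "age" keeps its null
df.fill_null("Unknown")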

name,age
str,i64
"""Alice""",25.0
"""Unknown""",
"""Charlie""",37.0


## 7. Advanced Expressions

In [28]:
df = pl.DataFrame({"values": [1,2,3,4]})
df.with_columns(
    (pl.col("values")**2).alias("squared")
)

values,squared
i64,i64
1,1
2,4
3,9
4,16


## 8. Multi-threading
#### Polars automatically runs operations in parallel, which often makes it much faster than pandas on large datasets. No manual thread handling is required.

## Performance vs pandas
### Multithreading: Polars yes (by default); pandas mostly single-threaded
### Lazy evaluation: Polars yes; pandas no
### Memory usage: Polars lower; pandas higher
### Speed (large data): Polars faster; pandas slower
### API: Polars similar but more expressive; pandas more mature


### Pros of Polars
#### Blazing fast due to Rust backend
#### 🧠 Lazy evaluation helps optimize complex pipelines
#### 🧵 Multi-threaded out of the box
#### 🧽 Lower memory consumption
#### 📁 Supports Parquet, CSV, JSON, IPC formats

### Cons of Polars
#### 🧩 Smaller community than pandas
#### 🛠️ API still evolving — some edge cases may not be supported
#### 📉 Limited visualization — relies on external libraries like matplotlib
#### 🧪 Some features in pandas (e.g., complex indexing) are not available yet
#### 🧍‍♂️ Steeper learning curve if you're used to pandas

### 🔄 Interoperability
#### Convert between pandas and polars:

In [35]:
import pandas as pd
import polars as pl

# pandas to polars (sample data for illustration; the conversion needs pyarrow installed)
pandas_source = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 32]})
df = pl.from_pandas(pandas_source)

# polars to pandas
pandas_df = df.to_pandas()

### 📚 Real-Life Use Cases
#### ETL Pipelines – High-speed processing for data extraction and transformation.

#### ML Preprocessing – Clean and process data faster before feeding into ML models.

#### Big Data – Load and process millions of rows in seconds.

#### Streaming / Log Analysis – Process large log files efficiently using the lazy API.

### 🔎 Tips and Best Practices
#### Use lazy mode for heavy data pipelines.

#### Avoid loops, use expressions instead.

#### Filter first, then aggregate.

#### Use scan_csv() instead of read_csv() for lazy loading large CSVs.

#### Use with_columns() to chain multiple new column expressions efficiently.



### 📦 Additional Tools
#### polars-sql (experimental): SQL interface to Polars.

#### Works well with pyarrow for Parquet/Arrow file formats.

#### Easily integrates into Airflow, Dagster, or Prefect pipelines.


