# <font color="#418FDE" size="6.5" uppercase>**Setting Up Polars**</font>

>Last update: 20260101.
    
By the end of this lecture, you will be able to:
- Install Polars and configure a Python environment that supports both pandas and Polars. 
- Verify Polars installation by running simple DataFrame operations and inspecting results. 
- Organize project files and notebooks to keep pandas and Polars examples clear and comparable. 


## **1. Configuring Your Environment**

### **1.1. Creating a virtual environment**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_01_01.jpg?v=1767308727" width="250">



>* Use a virtual environment for isolated tools
>* Prevents version conflicts and protects other projects

>* Virtual environments isolate projects like separate labs
>* They ensure consistent library versions and reproducible results

>* Virtual environments improve reproducibility and team collaboration
>* They keep projects isolated, organized, and disposable
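
As a minimal sketch of the isolation idea above, Python's standard-library `venv` module can create an environment directly from code. The folder name `.venv-polars-demo` and the helper `create_demo_env` are hypothetical, and `with_pip=False` is used only to keep the demo fast and offline; a real project environment would normally include pip.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_env(parent_dir):
    """Create a bare virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / ".venv-polars-demo"  # hypothetical folder name
    # with_pip=False skips bootstrapping pip, so no network access is needed.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_path = create_demo_env(tmp)
    # pyvenv.cfg is the file that marks a folder as a virtual environment.
    has_config = (env_path / "pyvenv.cfg").exists()
    print("Environment created:", has_config)
```

Because the environment lives in its own folder, deleting that folder removes it completely, which is what makes environments disposable.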



### **1.2. Installing pandas and Polars**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_01_02.jpg?v=1767308738" width="250">



>* Install pandas and Polars with a package manager
>* Use both together to compare performance and behavior

>* Choose library versions intentionally for your goals
>* Standardize versions to avoid collaboration and deployment issues

>* Run quick imports and tiny DataFrame tests
>* Use smoke tests to catch issues early



In [None]:
#@title Python Code - Installing pandas and Polars

# Install pandas and Polars inside Colab environment.
# Verify both libraries import correctly without hidden errors.
# Create tiny tables to confirm everything works.

# !pip install pandas polars --quiet  # Uncomment if the libraries are not installed.

# Import pandas and Polars libraries together.
import pandas as pd
import polars as pl

# Print simple confirmation message for both libraries.
print("pandas and Polars successfully imported together.")

# Create tiny pandas DataFrame with temperatures Fahrenheit.
pd_df = pd.DataFrame({"city": ["Denver", "Miami"], "temp_F": [68, 86]})

# Create tiny Polars DataFrame with same temperatures Fahrenheit.
pl_df = pl.DataFrame({"city": ["Denver", "Miami"], "temp_F": [68, 86]})

# Print pandas DataFrame summary with shape and columns.
print("pandas DataFrame:", pd_df.shape, list(pd_df.columns))

# Print Polars DataFrame summary with shape and columns.
print("Polars DataFrame:", pl_df.shape, pl_df.columns)

# Add a Celsius column to the pandas DataFrame.
pd_df["temp_C"] = (pd_df["temp_F"] - 32) * 5 / 9

# Add the same Celsius column to the Polars DataFrame.
pl_df = pl_df.with_columns(((pl.col("temp_F") - 32) * 5 / 9).alias("temp_C"))

# Print final confirmation with both converted tables.
print("pandas converted:", pd_df.to_dict(orient="records"))
print("Polars converted:", pl_df.to_dicts())
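
A smoke test can also run before any import is attempted. The sketch below uses only the standard library to report which required packages are missing from the current environment; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["pandas", "polars"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Checking with `find_spec` avoids a traceback on a fresh machine and gives a clear list of what still needs to be installed.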



### **1.3. Managing versions and dependencies**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_01_03.jpg?v=1767308755" width="250">



>* Treat your setup as a tested snapshot
>* Snapshots ensure stability and reproducible future analyses

>* Record exact pandas and Polars versions used
>* Documentation ensures reproducible, consistent environments across projects

>* Plan updates carefully and test before upgrading
>* Actively manage conflicts to keep work reproducible



In [None]:
#@title Python Code - Managing versions and dependencies

# Demonstrate checking installed pandas and Polars versions together.
# Show how to record versions for reproducible future environments.
# Illustrate simple manual dependency snapshot using a small dictionary.

# !pip install pandas polars pyarrow  # Uncomment if the libraries are not installed.

# Import pandas and polars to inspect installed versions.
import pandas as pd
import polars as pl
import sys

# Build a small dictionary capturing key environment version information.
env_snapshot = {
    "python_version": sys.version.split()[0],
    "pandas_version": pd.__version__,
    "polars_version": pl.__version__,
}

# Print a clear header describing the environment snapshot content.
print("Environment snapshot for pandas and Polars dependencies:")

# Loop through snapshot items and print each recorded version line.
for name, version in env_snapshot.items():
    print(f"- {name}: {version}")
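
The snapshot can also be expressed as pinned requirement lines, which is the form most package managers accept. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `pinned_requirements` is hypothetical, and packages that are not installed are skipped rather than treated as errors.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return 'name==version' lines for each package that is installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Skip packages that are absent from this environment.
            continue
    return lines


pins = pinned_requirements(["pandas", "polars", "pyarrow"])
print("\n".join(pins) if pins else "None of the listed packages are installed.")
```

Saving these lines to a file such as `requirements.txt` lets a teammate rebuild the same versions later.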



## **2. Importing Polars Basics**

### **2.1. Importing Polars Aliases**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_02_01.jpg?v=1767308778" width="250">



>* Choose a short, consistent alias for Polars
>* Alias improves readability and separates Polars from others

>* Alias separates Polars code from other libraries
>* Consistency reduces mental effort and builds familiarity

>* Shared Polars aliases improve teamwork and code reviews
>* Consistent aliases reduce errors and aid long-term maintenance



In [None]:
#@title Python Code - Importing Polars Aliases

# Show how to import Polars with aliases.
# Compare Polars alias with pandas alias usage.
# Confirm imports by printing simple DataFrame outputs.

# !pip install polars pandas  # Uncomment if the libraries are not installed.

# Import pandas using common alias pd.
import pandas as pd
# Import polars using common alias pl.
import polars as pl

# Create small pandas DataFrame using pd alias.
pd_df = pd.DataFrame({"city": ["Boston", "Dallas"], "sales_dollars": [120.0, 95.5]})
# Create small polars DataFrame using pl alias.
pl_df = pl.DataFrame({"city": ["Boston", "Dallas"], "sales_dollars": [120.0, 95.5]})

# Print pandas DataFrame label and content.
print("Pandas DataFrame using pd alias:")
print(pd_df)

# Print polars DataFrame label and content.
print("Polars DataFrame using pl alias:")
print(pl_df)

# Show that aliases help distinguish libraries quickly.
print("Notice pd and pl prefixes clearly separate pandas and Polars.")



### **2.2. Polars Version Check**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_02_02.jpg?v=1767308793" width="250">



>* Check the Polars version after importing it
>* Version info supports compatibility, debugging, and reproducibility

>* Use version info as a debugging clue
>* Compare versions to explain missing or changed features

>* Recording versions helps track changes over time
>* Supports reproducible, maintainable, professional-quality data projects



In [None]:
#@title Python Code - Polars Version Check

# Demonstrate checking installed Polars version clearly.
# Show how version helps compare tutorials and environments.
# Encourage recording version for future reproducibility.

# !pip install polars --quiet  # Uncomment if Polars is not installed.

# Import polars with a short convenient alias.
import polars as pl

# Get the current Polars version string.
current_version = pl.__version__

# Print a clear message showing the version.
print("Current Polars version:", current_version)

# Imagine a tutorial requiring a specific minimum version.
required_version = "1.0.0"

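# Compare versions numerically; raw string comparison would rank "1.10.0" below "1.9.0".
def version_tuple(version):
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())

version_ok = version_tuple(current_version) >= version_tuple(required_version)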

# Print whether current version meets the tutorial requirement.
print("Meets tutorial minimum version:", version_ok)

# Suggest recording the version inside a simple project note.
project_note = f"Project used Polars version {current_version} for all monthly reports."

# Print the project note as an example documentation line.
print("Example project note:", project_note)



### **2.3. Polars in Practice**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_02_03.jpg?v=1767308811" width="250">



>* Create a tiny, realistic Polars DataFrame example
>* Check labeled columns, row numbers, and inferred types

>* Filter rows and select or add columns
>* Confirm Polars works by checking that results are correct and returned quickly

>* Use Polars summaries to check statistics and types
>* Compare outputs with expectations to confirm correctness



In [None]:
#@title Python Code - Polars in Practice

# Demonstrate basic Polars DataFrame creation and display for verification.
# Show simple filtering, column selection, and new column calculation operations.
# Inspect summary statistics to confirm Polars behavior and data interpretation.

# !pip install polars --quiet  # Uncomment if Polars is not installed.

# Import polars library using conventional alias pl.
import polars as pl

# Create small sales data dictionary with simple example values.
sales_data = {
    "order_id": [1, 2, 3, 4],
    "customer": ["Alice", "Bob", "Alice", "Dana"],
    "category": ["books", "electronics", "books", "toys"],
    "quantity": [1, 3, 2, 5],
    "price_usd": [12.5, 199.0, 8.0, 15.0],
}

# Build Polars DataFrame from dictionary data structure.
df = pl.DataFrame(sales_data)

# Display original DataFrame to visually confirm structure.
print("Original DataFrame:")
print(df)

# Filter rows where quantity exceeds two units threshold.
filtered_df = df.filter(pl.col("quantity") > 2)

# Add total_revenue column as quantity multiplied by price_usd.
filtered_with_revenue = filtered_df.with_columns(
    (pl.col("quantity") * pl.col("price_usd")).alias("total_revenue_usd")
)

# Select subset columns to focus on customer and revenue information.
selected_columns = filtered_with_revenue.select([
    "order_id",
    "customer",
    "total_revenue_usd",
])

# Print transformed DataFrame to verify operations executed correctly.
print("\nFiltered with revenue:")
print(selected_columns)

# Compute descriptive statistics for every column in the original DataFrame.
stats = df.describe()

# Print summary statistics to confirm numeric interpretation and plausibility.
print("\nSummary statistics:")
print(stats)



## **3. Consistent Sample Data Loading**

### **3.1. CSV Loading Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_03_01.jpg?v=1767308837" width="250">



>* CSV loading choices strongly affect library comparisons
>* Use consistent settings to avoid misleading differences

>* Make CSV loading a repeatable, documented step
>* Standardize options so pandas and Polars match

>* Centralized CSV loading keeps notebooks consistent and aligned
>* Shared loading rules improve comparability, maintenance, and collaboration



In [None]:
#@title Python Code - CSV Loading Basics

# Demonstrate consistent CSV loading with pandas and Polars together.
# Show how shared options keep data types and values aligned.
# Provide a simple reusable loader function for both libraries.

# !pip install polars pandas pyarrow  # Uncomment if the libraries are not installed.

# Import required libraries for CSV handling and dataframes.
import pandas as pd
import polars as pl
from io import StringIO

# Create a small CSV text with deliberate quirks included.
csv_text = """order_id;order_date;quantity;price_usd
1;2024-01-01;3;19.99
2;2024-01-02;;29.50
3;2024-01-03;5;unknown
"""

# Wrap the CSV text with StringIO for file like behavior.
csv_buffer = StringIO(csv_text)

# Define shared CSV options for delimiter and missing values.
shared_read_options = {
    "sep": ";",
    "na_values": ["", "unknown"],
}

# Load CSV with pandas using shared options and explicit dtypes.
df_pandas = pd.read_csv(
    csv_buffer,
    **shared_read_options,
    dtype={"order_id": "int64", "quantity": "float64"},
    parse_dates=["order_date"],
)

# Reset buffer position before reading again with Polars.
csv_buffer.seek(0)

# Load CSV with Polars using matching options and schema.
df_polars = pl.read_csv(
    csv_buffer,
    separator=";",
    null_values=["", "unknown"],
    schema_overrides={"order_id": pl.Int64, "quantity": pl.Float64},  # called dtypes in older Polars releases
    try_parse_dates=True,
)

# Print concise pandas info showing column types and null handling.
print("Pandas dtypes and head:")
print(df_pandas.dtypes)
print(df_pandas.head())

# Print concise Polars schema and head for direct comparison.
print("\nPolars schema and head:")
print(df_polars.schema)
print(df_polars.head())



### **3.2. Schema Alignment Across Libraries**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_03_02.jpg?v=1767308859" width="250">



>* Keep column names and data types consistent
>* Aligned schemas make cross-library comparisons fair

>* Define each column's meaning and desired type
>* Verify both libraries match this schema before comparing

>* Handle missing and categorical data consistently
>* Document schemas to ensure fair library comparisons



In [None]:
#@title Python Code - Schema Alignment Across Libraries

# Demonstrate matching schemas between pandas and Polars for fair comparisons.
# Show how the same CSV can load with different inferred column types.
# Fix mismatched types so both libraries share an aligned, documented schema.

# !pip install pandas polars pyarrow  # Uncomment if the libraries are not installed.

# Import required libraries for pandas and Polars usage.
import pandas as pd
import polars as pl

# Create small CSV content with mixed numeric and date values.
csv_text = """order_id,order_date,quantity,unit_price
A001,2024-01-01,2,19.99
A002,2024-01-02,3,25.50
A003,2024-01-03,1,17.00
"""

# Save CSV content into a local file for consistent loading.
csv_path = "orders_sample.csv"
with open(csv_path, "w", encoding="utf-8") as f:
    f.write(csv_text)

# Load CSV using pandas with default type inference behavior.
df_pd_raw = pd.read_csv(csv_path)

# Load CSV using Polars with default type inference behavior.
df_pl_raw = pl.read_csv(csv_path)

# Print pandas inferred dtypes to inspect initial schema.
print("Pandas dtypes before alignment:")
print(df_pd_raw.dtypes)

# Print Polars inferred dtypes to inspect initial schema.
print("\nPolars dtypes before alignment:")
print(df_pl_raw.dtypes)

# Define desired schema where identifiers are strings and dates are datetimes.
desired_pd_dtypes = {"order_id": "string", "quantity": "int64"}

# Apply pandas conversions for identifiers and numeric quantity column.
df_pd_aligned = df_pd_raw.astype(desired_pd_dtypes)

# Convert pandas order_date column into proper datetime type.
df_pd_aligned["order_date"] = pd.to_datetime(df_pd_aligned["order_date"])

# Define Polars schema using explicit data types for each column.
pl_schema = {"order_id": pl.Utf8, "order_date": pl.Date, "quantity": pl.Int64, "unit_price": pl.Float64}

# Reload CSV using Polars with explicit schema for alignment.
df_pl_aligned = pl.read_csv(csv_path, schema=pl_schema)

# Print aligned pandas dtypes to confirm schema corrections.
print("\nPandas dtypes after alignment:")
print(df_pd_aligned.dtypes)

# Print aligned Polars dtypes to confirm schema corrections.
print("\nPolars dtypes after alignment:")
print(df_pl_aligned.dtypes)



### **3.3. Reusable Test Datasets**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_01/Lecture_C/image_03_03.jpg?v=1767308882" width="250">



>* Curate a few small, reusable practice datasets
>* Reuse them to compare pandas and Polars behavior

>* Store shared datasets in a labeled data folder
>* Use the same files in paired pandas and Polars notebooks

>* Shared datasets improve reproducibility and collaboration workflows
>* Curated data library grows with real-world scenarios



# <font color="#418FDE" size="6.5" uppercase>**Setting Up Polars**</font>


In this lecture, you learned to:
- Install Polars and configure a Python environment that supports both pandas and Polars. 
- Verify Polars installation by running simple DataFrame operations and inspecting results. 
- Organize project files and notebooks to keep pandas and Polars examples clear and comparable. 

In the next module (Module 2), we will cover 'Core Data Operations'.