# <font color="#418FDE" size="6.5" uppercase>**DataFrames and Series**</font>

>Last update: 20251227.
    
By the end of this Lecture, you will be able to:
- Create Polars DataFrames and Series from common Python and file inputs used in Pandas workflows. 
- Inspect Polars DataFrame schemas and summarize data using built‑in methods. 
- Explain how Polars’ data types and null handling compare to Pandas. 


## **1. Building Polars DataFrames**

### **1.1. From Python Collections**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_01_01.jpg?v=1766892459" width="250">



>* Polars builds DataFrames from familiar Python collections
>* Keys become columns, list items become rows

>* Use dictionaries of lists to define columns
>* Polars aligns list positions into rows for analysis

>* Polars handles lists of record-like dictionaries
>* Design data shapes for easy DataFrame creation



In [None]:
#@title Python Code - From Python Collections

# Demonstrate creating Polars DataFrames from simple Python collections.
# Show dictionary of lists and list of dictionaries as DataFrame inputs.
# Print small DataFrames to connect Python objects with tabular structures.

import polars as pl

# Create simple Python lists representing order information for a small store.
order_ids = [101, 102, 103, 104]
regions = ["West", "East", "South", "West"]
order_totals_usd = [29.99, 49.50, 15.75, 99.10]

# Build a dictionary where keys are column names and values are data lists.
orders_dict = {
    "order_id": order_ids,
    "region": regions,
    "total_usd": order_totals_usd,
}

# Create a Polars DataFrame directly from the dictionary of lists.
df_from_dict = pl.DataFrame(orders_dict)

# Print the resulting DataFrame to see columns and rows clearly.
print("DataFrame from dictionary of lists:")
print(df_from_dict)

# Create a list of dictionaries, each dictionary representing one order row.
orders_records = [
    {"order_id": 201, "region": "North", "total_usd": 10.00},
    {"order_id": 202, "region": "West", "total_usd": 75.25},
]

# Create another Polars DataFrame from the list of record dictionaries.
df_from_records = pl.DataFrame(orders_records)

# Print the second DataFrame to compare input shapes and results.
print("\nDataFrame from list of dictionaries:")
print(df_from_records)



### **1.2. Loading CSV and Parquet**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_01_02.jpg?v=1766892478" width="250">



>* Polars loads CSV and Parquet into DataFrames
>* Optimized for large files, memory, and speed

>* Polars reads CSV headers and infers column types
>* You can override types, then use the DataFrame

>* Parquet stores typed, columnar data for Polars
>* Enables fast, reliable loading and large-scale analysis



In [None]:
#@title Python Code - Loading CSV and Parquet

# Demonstrate loading CSV files into Polars DataFrames in a simple, beginner-friendly way.
# Demonstrate loading Parquet files into Polars DataFrames using the same sample data.
# Compare inferred data types between CSV loading and Parquet loading in Polars.

import polars as pl
import pandas as pd

# Create a small pandas DataFrame that mimics daily sales in dollars.
# This DataFrame will be saved as both CSV and Parquet files locally.
sales_pd = pd.DataFrame({"day": ["Mon", "Tue", "Wed"], "orders": [10, 14, 9], "revenue_usd": [250.5, 310.0, 199.9]})

# Save the pandas DataFrame as a CSV file for Polars loading demonstration.
# Save the same DataFrame as a Parquet file to compare loading behavior.
sales_pd.to_csv("daily_sales.csv", index=False)
sales_pd.to_parquet("daily_sales.parquet", index=False)

# Load the CSV file using Polars read_csv function into a new DataFrame.
# Print the DataFrame and its schema to inspect inferred column data types.
df_csv = pl.read_csv("daily_sales.csv")
print("CSV DataFrame loaded by Polars:")
print(df_csv)

# Display the schema for the CSV DataFrame to show inferred types clearly.
# This helps beginners see how Polars guesses column data types automatically.
print("\nCSV DataFrame schema:")
print(df_csv.schema)

# Load the Parquet file using Polars read_parquet function into another DataFrame.
# Print the DataFrame and its schema to compare with the CSV loading behavior.
df_parquet = pl.read_parquet("daily_sales.parquet")
print("\nParquet DataFrame loaded by Polars:")
print(df_parquet)

# Display the schema for the Parquet DataFrame which uses stored type information.
# This highlights that Parquet provides exact types without additional inference steps.
print("\nParquet DataFrame schema:")
print(df_parquet.schema)



### **1.3. From existing Pandas DataFrames**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_01_03.jpg?v=1766892494" width="250">



>* Convert existing Pandas DataFrames into Polars
>* Keep structure while gaining faster, columnar performance

>* Prepare data in Pandas, then convert to Polars
>* Use Polars for heavy analysis and performance

>* Convert Pandas outputs to Polars for processing
>* Adopt Polars gradually without changing existing interfaces



In [None]:
#@title Python Code - From existing Pandas DataFrames

# Demonstrate converting a Pandas DataFrame into a Polars DataFrame.
# Show a simple workflow starting with familiar Pandas objects.
# Highlight that structure and values remain consistent after conversion.

import pandas as pd
import polars as pl

# Create a small Pandas DataFrame representing monthly sales in dollars.
pd_df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "sales_usd": [1200, 1500, 1700]})

# Display the original Pandas DataFrame to confirm its structure and values.
print("Pandas DataFrame:")
print(pd_df)

# Convert the Pandas DataFrame into a Polars DataFrame using from_pandas.
pl_df = pl.from_pandas(pd_df)

# Show the Polars DataFrame to verify that columns and values are preserved.
print("\nPolars DataFrame:")
print(pl_df)

# Perform a simple Polars operation to demonstrate continued analysis after conversion.
print("\nTotal quarterly sales using Polars:")
print(pl_df["sales_usd"].sum())



## **2. Inspecting Polars DataFrames**

### **2.1. Quick DataFrame Peeks**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_02_01.jpg?v=1766892512" width="250">



>* Use quick peeks to verify DataFrame structure
>* Polars shows small slices efficiently, even for big data

>* Small peeks quickly reveal messy, inconsistent data
>* Repeated samples guide cleaning, typing, and trust decisions

>* Use small, on-screen slices for huge datasets
>* Frequent peeks catch issues early, before costly full runs



In [None]:
#@title Python Code - Quick DataFrame Peeks

# Demonstrate quick Polars DataFrame peeks for fast initial inspection.
# Show how to view a small representative slice of a larger dataset.
# Highlight lightweight methods that avoid printing every single available row.

import polars as pl

# Create a small example DataFrame representing simple store sales data.
data = {"store": ["North", "South", "East", "West", "North"],
        "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
        "revenue_usd": [120.5, 99.0, 143.2, 87.3, 160.0]}

# Build the Polars DataFrame from the dictionary of column lists.
df = pl.DataFrame(data)

# Show a quick peek at the first few rows using head method.
print("First rows quick peek:")
print(df.head(3))

# Show a quick peek at the last few rows using tail method.
print("\nLast rows quick peek:")
print(df.tail(2))

# Show a random sample of rows to quickly inspect varied values.
print("\nRandom sample quick peek:")
print(df.sample(n=3, with_replacement=False))



### **2.2. Descriptive Statistics Overview**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_02_02.jpg?v=1766892534" width="250">



>* Use Polars summaries to understand data distributions
>* Run stats early to validate data quality

>* Polars summarizes each column using type-aware stats
>* Numeric and non-numeric columns get appropriate measures

>* Use stats flexibly to compare data patterns
>* Iterate summaries on subsets to refine insights



In [None]:
#@title Python Code - Descriptive Statistics Overview

# Show simple Polars descriptive statistics for numeric and non-numeric columns.
# Compare describe output with a filtered subset of the same DataFrame.
# Help beginners see how summaries reveal distributions and potential unusual values.

import polars as pl

# Create a small DataFrame with monthly sales and region categories.
data = {
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "region": ["East", "West", "East", "West", "East", "West"],
    "sales_dollars": [1200, 800, 1500, 4000, 900, 700],
}

# Build the Polars DataFrame from the dictionary data.
df = pl.DataFrame(data)

# Show the full DataFrame to understand the raw values.
print("Full sales DataFrame:")
print(df)

# Compute descriptive statistics for all columns using describe method.
print("\nDescriptive statistics for all columns:")
summary_all = df.describe()
print(summary_all)

# Filter to East region only and recompute descriptive statistics.
print("\nDescriptive statistics for East region only:")
summary_east = df.filter(pl.col("region") == "East").describe()
print(summary_east)



### **2.3. Schema and Dtypes**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_02_03.jpg?v=1766892546" width="250">



>* Schema lists each column and its type
>* Checking schema prevents type mistakes in analysis

>* Column dtypes affect speed, memory, and methods
>* Check and adjust dtypes for accurate analysis

>* Consistent schemas improve data quality and collaboration
>* Regular checks prevent join issues and silent errors



In [None]:
#@title Python Code - Schema and Dtypes

# Show Polars DataFrame schema and dtypes clearly.
# Compare inferred dtypes with manual casting choices.
# Help beginners inspect and adjust column data types.

import polars as pl

# Create a simple DataFrame with mixed column types.
# Prices use floats, quantities use integers, dates use strings.

df = pl.DataFrame({"item": ["apple", "banana", "orange"], "price_usd": [1.5, 0.75, 2.25], "quantity": [10, 5, 8], "sale_date": ["2024-01-01", "2024-01-02", "2024-01-03"]})

# Show the DataFrame to understand the raw values.
# This helps connect values with their inferred types.

print("Original DataFrame values:")
print(df)

# Inspect the schema to see column names and Polars dtypes.
# This is similar to a contract for your data.

print("\nInferred schema and dtypes:")
print(df.schema)

# Cast the sale_date column into a proper Date dtype.
# This enables time based operations later.

df_cast = df.with_columns(pl.col("sale_date").str.strptime(pl.Date, strict=False))

# Inspect the new schema after casting the date column.
# Notice the sale_date dtype changed from String (shown as Utf8 in older Polars) to Date.

print("\nSchema after casting sale_date to Date:")
print(df_cast.schema)



## **3. Types and Nulls**

### **3.1. Polars Type System**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_03_01.jpg?v=1766892563" width="250">



>* Polars enforces clear, consistent column data types
>* Strict typing enables efficient, scalable query execution

>* Polars keeps column types strict and unchanged
>* This prevents subtle bugs and clarifies analysis

>* Specialized types model real-world, nested, temporal data
>* Expressive types reduce ambiguity and boost performance



In [None]:
#@title Python Code - Polars Type System

# Compare Polars column types with the dtypes Pandas reports for the same data.
# Create small example tables and inspect their inferred data types.
# Show that numeric-looking strings stay strings in both libraries until cast explicitly.

import pandas as pd
import polars as pl

pandas_data = {"quantity": [1, 2, 3], "price": ["9.99", "10.50", "8.75"]}
polars_data = {"quantity": [1, 2, 3], "price": ["9.99", "10.50", "8.75"]}

pandas_df = pd.DataFrame(pandas_data)
polars_df = pl.DataFrame(polars_data)

print("Pandas dtypes for each column:")
print(pandas_df.dtypes)

print("\nPolars dtypes for each column:")
print(polars_df.dtypes)

polars_casted = polars_df.with_columns(pl.col("price").cast(pl.Float64))
print("\nPolars dtypes after explicit cast:")
print(polars_casted.dtypes)



### **3.2. Handling nulls and NaNs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_03_02.jpg?v=1766892576" width="250">



>* Polars separates nulls from floating-point NaNs clearly
>* Nulls mean no value; NaNs mean invalid float

>* Aggregations usually skip nulls instead of counting
>* Flexible options to drop, fill, or keep nulls

>* Nulls and NaNs signal different kinds of missing data
>* Polars lets you handle nulls and NaNs separately



In [None]:
#@title Python Code - Handling nulls and NaNs

# Show the difference between Polars nulls and NaNs.
# Create a small DataFrame containing both nulls and NaNs.
# Compare how basic operations handle nulls and NaNs.

import polars as pl
import numpy as np

# Create simple data including null and NaN values.
data = {
    "temp_f": [72.0, np.nan, 68.0, None],
    "city": ["Austin", "Boston", None, "Denver"],
}

# Build Polars DataFrame from dictionary data.
df = pl.DataFrame(data)

# Show DataFrame and schema for quick inspection.
print("DataFrame with nulls and NaNs:")
print(df)
print("\nSchema showing column data types:")
print(df.schema)

# Check which rows contain null or NaN values.
print("\nIs null in temp_f column:")
print(df["temp_f"].is_null())
print("Is NaN in temp_f column:")
print(df["temp_f"].is_nan())

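# The plain mean skips nulls, but a NaN propagates and makes the result NaN.
mean_with_nan = df["temp_f"].mean()
print("\nMean temperature (nulls skipped, NaN propagates):", mean_with_nan)

# Convert NaNs to nulls first to get a mean that ignores both.
mean_temp = df["temp_f"].fill_nan(None).mean()
print("Mean temperature ignoring nulls and NaNs:", mean_temp)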



### **3.3. Casting and Coercion**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas to Polars Migration/Module_02/Lecture_A/image_03_03.jpg?v=1766892590" width="250">



>* Polars enforces clear, consistent column data types
>* You must explicitly choose and apply type conversions

>* Polars uses explicit casts with strict, clear rules
>* With `strict=False`, invalid or out-of-range values become nulls; strict casts raise errors

>* Polars promotes numeric types to safe common types
>* Mixed incompatible types raise errors or produce nulls



In [None]:
#@title Python Code - Casting and Coercion

# Demonstrate Polars casting behavior on messy string columns with explicit conversions.
# Show how non-strict casts turn invalid values into null instead of silently incorrect values.
# Chain casts (string to float to integer) to see each conversion step.

import polars as pl

# Create a DataFrame whose columns hold numbers stored as strings, plus invalid entries.
# Polars infers a string type for both columns.
# This simulates messy real world imported data.

sales_df = pl.DataFrame({"order_id": ["1001", "1002", "oops"], "amount_dollars": ["10.5", "20.0", "bad"]})

# Attempt to cast order_id to integer and observe null for the invalid value.
# Non-strict casting (strict=False) turns non-numeric strings into null values.
# A strict cast (the default) would raise an error instead.

order_cast = sales_df.with_columns(pl.col("order_id").cast(pl.Int64, strict=False).alias("order_id_int"))

# Attempt to cast amount_dollars to float and then to integer dollars.
# Invalid numeric strings become null during the float conversion.
# Integer casting then truncates fractional cents values.

amount_cast = order_cast.with_columns(
    [
        pl.col("amount_dollars").cast(pl.Float64, strict=False).alias("amount_float"),
        pl.col("amount_dollars")
        .cast(pl.Float64, strict=False)
        .cast(pl.Int64, strict=False)
        .alias("amount_int_dollars"),
    ]
)

# Show final DataFrame to inspect casting and resulting null values.
# Output remains short and readable for beginners.
# Notice how invalid entries become null instead of incorrect numbers.

print(amount_cast)



# <font color="#418FDE" size="6.5" uppercase>**DataFrames and Series**</font>


In this lecture, you learned to:
- Create Polars DataFrames and Series from common Python and file inputs used in Pandas workflows. 
- Inspect Polars DataFrame schemas and summarize data using built‑in methods. 
- Explain how Polars’ data types and null handling compare to Pandas. 

In the next Lecture (Lecture B), we will go over 'Selecting and Filtering'.