In [None]:
!pip install polars

In [1]:
import polars as pl

In [35]:
df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Los Angeles', 'Chicago']
})

### 1. Selecting and Filtering Columns

### Selecting Columns (`select`)
In Polars, the `select` method returns a new DataFrame containing only the columns you name, in the order you list them. The original DataFrame is left unchanged.

**Example: Selecting Specific Columns**

In [22]:
# Select the 'name' and 'age' columns
selected_df = df.select(['name', 'age'])
print(selected_df)

shape: (3, 2)
┌─────────┬─────┐
│ name    ┆ age │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ Alice   ┆ 25  │
│ Bob     ┆ 30  │
│ Charlie ┆ 35  │
└─────────┴─────┘


### Filtering Rows (`filter`)
You can filter rows in a DataFrame by applying conditions using the `filter` method.

**Example: Filtering Rows**

In [21]:
# Filter rows where age is greater than 30
filtered_df = df.filter(pl.col('age') > 30)
print(filtered_df)

shape: (1, 3)
┌─────────┬─────┬─────────┐
│ name    ┆ age ┆ city    │
│ ---     ┆ --- ┆ ---     │
│ str     ┆ i64 ┆ str     │
╞═════════╪═════╪═════════╡
│ Charlie ┆ 35  ┆ Chicago │
└─────────┴─────┴─────────┘


### 2. Expressions in Polars
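Polars transformations are built from expressions. An expression such as `pl.col('age') * 2` describes a computation on one or more columns; it does nothing on its own and is evaluated only when passed to a method such as `select`, `filter`, or `with_columns`.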

### Adding or Modifying Columns (`with_columns`)
You can create new columns or modify existing ones by passing expressions to the `with_columns` method. (Polars has no separate `mutate` method.)

**Example: Creating a New Column**

In [20]:
# Add a new column 'double_age' by multiplying 'age' by 2
df = df.with_columns((pl.col('age') * 2).alias('double_age'))
print(df)

shape: (3, 4)
┌─────────┬─────┬─────────────┬────────────┐
│ name    ┆ age ┆ city        ┆ double_age │
│ ---     ┆ --- ┆ ---         ┆ ---        │
│ str     ┆ i64 ┆ str         ┆ i64        │
╞═════════╪═════╪═════════════╪════════════╡
│ Alice   ┆ 25  ┆ New York    ┆ 50         │
│ Bob     ┆ 30  ┆ Los Angeles ┆ 60         │
│ Charlie ┆ 35  ┆ Chicago     ┆ 70         │
└─────────┴─────┴─────────────┴────────────┘


**Explanation:**

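We use `pl.col('age')` to refer to the `age` column, multiply it by 2, and name the result `double_age` with `alias`. `with_columns` returns a new DataFrame rather than changing `df` in place, which is why the result is assigned back to `df`.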

**Mutate-Style Transformations**

The name `mutate` comes from R's dplyr; in Polars the same job is done by `with_columns`, including for more involved transformations such as bucketing values into groups.

In [30]:
# Bucket 'age' into decades, stored as an integer (e.g., 25 -> 20, 35 -> 30)
df = df.with_columns((pl.col('age') // 10 * 10).alias('age_group'))
print(df)

shape: (3, 5)
┌─────────┬─────┬─────────────┬────────────┬───────────┐
│ name    ┆ age ┆ city        ┆ double_age ┆ age_group │
│ ---     ┆ --- ┆ ---         ┆ ---        ┆ ---       │
│ str     ┆ i64 ┆ str         ┆ i64        ┆ i64       │
╞═════════╪═════╪═════════════╪════════════╪═══════════╡
│ Alice   ┆ 25  ┆ New York    ┆ 50         ┆ 20        │
│ Bob     ┆ 30  ┆ Los Angeles ┆ 60         ┆ 30        │
│ Charlie ┆ 35  ┆ Chicago     ┆ 70         ┆ 30        │
└─────────┴─────┴─────────────┴────────────┴───────────┘


**Alias**

The `alias` method sets the output name of an expression. Without it, the result keeps the name of the column the expression started from, so `pl.col('age') * 2` inside `with_columns` would overwrite `age` instead of creating a new column.

In [31]:
# Name the result 'double_age' with 'alias'; the column already exists here, so it is replaced in place
df = df.with_columns((pl.col('age') * 2).alias('double_age'))
print(df)

shape: (3, 5)
┌─────────┬─────┬─────────────┬────────────┬───────────┐
│ name    ┆ age ┆ city        ┆ double_age ┆ age_group │
│ ---     ┆ --- ┆ ---         ┆ ---        ┆ ---       │
│ str     ┆ i64 ┆ str         ┆ i64        ┆ i64       │
╞═════════╪═════╪═════════════╪════════════╪═══════════╡
│ Alice   ┆ 25  ┆ New York    ┆ 50         ┆ 20        │
│ Bob     ┆ 30  ┆ Los Angeles ┆ 60         ┆ 30        │
│ Charlie ┆ 35  ┆ Chicago     ┆ 70         ┆ 30        │
└─────────┴─────┴─────────────┴────────────┴───────────┘


### 3. Operations with LazyFrames (Lazy Execution)
Lazy execution allows you to build a sequence of transformations that is executed only when you explicitly ask for the result (e.g., with `collect()`).

**Why LazyFrames?**

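LazyFrames are useful for performance optimization, especially with large datasets. Operations are not executed immediately; Polars first records them as a query plan, optimizes that plan as a whole (for example, by applying filters and column selections as early as possible), and then runs it in one go.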

**Creating a LazyFrame**

You can create a **LazyFrame** from a `DataFrame` by calling `.lazy()`.

In [33]:
# Convert the DataFrame into a LazyFrame
lazy_df = df.lazy()

# Perform multiple transformations on the LazyFrame
result = lazy_df.filter(pl.col('age') > 30).select(['name', 'double_age'])

# Trigger execution by collecting the results
final_result = result.collect()
print(final_result)

shape: (1, 2)
┌─────────┬────────────┐
│ name    ┆ double_age │
│ ---     ┆ ---        │
│ str     ┆ i64        │
╞═════════╪════════════╡
│ Charlie ┆ 70         │
└─────────┴────────────┘


**Explanation:**

- A `LazyFrame` is created by calling `.lazy()` on a `DataFrame`. Then we apply transformations (filter and select) to it.
- The actual execution occurs only when `collect()` is called. This allows Polars to optimize the execution plan, improving performance.
