In [None]:
!pip install polars

In [1]:
import polars as pl

In [31]:
df = pl.DataFrame({
    'product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Orange'],
    'category': ['Fruit', 'Fruit', 'Fruit', 'Fruit', 'Fruit', 'Fruit'],
    'sales': [100, 150, 200, 50, 120, 80]
})

### 1. Grouping and Aggregation (group_by, agg)

Polars allows you to perform group-based operations with `group_by`, followed by aggregations using `agg`.

**Example: Grouping and Aggregating Data**

Let's assume we have a dataset of sales transactions and we want to group the data by product category and calculate the total sales for each category.

In [24]:
# Group by category and calculate the total sales per category
result = df.group_by('category').agg(
    pl.col('sales').sum().alias('total_sales')
)
print(result)

shape: (1, 2)
┌──────────┬─────────────┐
│ category ┆ total_sales │
│ ---      ┆ ---         │
│ str      ┆ i64         │
╞══════════╪═════════════╡
│ Fruit    ┆ 700         │
└──────────┴─────────────┘


**Explanation:**

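- `group_by("category")` groups the rows by the `category` column.
- `agg(pl.col("sales").sum().alias("total_sales"))` sums `sales` within each group and names the result `total_sales`. Every row in this dataset has the category `Fruit`, so the result contains a single row.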

You can aggregate with multiple operations like `sum`, `mean`, `min`, `max`, `count`, etc.

### 2. Sorting and Joins (sort, join)

**Sorting (`sort`)**

You can sort a DataFrame by one or more columns using `sort`. By default, sorting is done in ascending order.

**Example: Sorting a DataFrame by Sales**

In [25]:
# Sort by sales in ascending order (the default)
sorted_df = df.sort('sales')
print(sorted_df)

shape: (6, 3)
┌─────────┬──────────┬───────┐
│ product ┆ category ┆ sales │
│ ---     ┆ ---      ┆ ---   │
│ str     ┆ str      ┆ i64   │
╞═════════╪══════════╪═══════╡
│ Orange  ┆ Fruit    ┆ 50    │
│ Orange  ┆ Fruit    ┆ 80    │
│ Apple   ┆ Fruit    ┆ 100   │
│ Banana  ┆ Fruit    ┆ 120   │
│ Banana  ┆ Fruit    ┆ 150   │
│ Apple   ┆ Fruit    ┆ 200   │
└─────────┴──────────┴───────┘


**Joining DataFrames (`join`)**

You can perform SQL-style joins between DataFrames using the `join` method. Polars supports inner, left, right, and outer joins; recent Polars versions name the outer join `full`.

**Example: Inner Join**

Let's join two DataFrames: one containing sales data and another containing product details.

In [32]:
products_df = pl.DataFrame({
    'product': ['Apple', 'Banana', 'Orange'],
    'category': ['Fruit', 'Fruit', 'Fruit'],
    'price': [1.5, 1.2, 1.0]
})

# Perform an inner join on the 'product' column
joined_df = df.join(products_df, on='product', how='inner')

print(joined_df)

shape: (6, 5)
┌─────────┬──────────┬───────┬────────────────┬───────┐
│ product ┆ category ┆ sales ┆ category_right ┆ price │
│ ---     ┆ ---      ┆ ---   ┆ ---            ┆ ---   │
│ str     ┆ str      ┆ i64   ┆ str            ┆ f64   │
╞═════════╪══════════╪═══════╪════════════════╪═══════╡
│ Apple   ┆ Fruit    ┆ 100   ┆ Fruit          ┆ 1.5   │
│ Banana  ┆ Fruit    ┆ 150   ┆ Fruit          ┆ 1.2   │
│ Apple   ┆ Fruit    ┆ 200   ┆ Fruit          ┆ 1.5   │
│ Orange  ┆ Fruit    ┆ 50    ┆ Fruit          ┆ 1.0   │
│ Banana  ┆ Fruit    ┆ 120   ┆ Fruit          ┆ 1.2   │
│ Orange  ┆ Fruit    ┆ 80    ┆ Fruit          ┆ 1.0   │
└─────────┴──────────┴───────┴────────────────┴───────┘


**Explanation:**

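The `join` operation merges the two DataFrames on the `product` column, keeping only the rows where the `product` value exists in both DataFrames (`how="inner"`). Both DataFrames also have a `category` column that is not part of the join key, so Polars keeps both and adds the suffix `_right` to the one from `products_df`, which is why the output contains `category_right`.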

You can also perform `left`, `right`, and `outer` joins by changing the `how` argument. In recent Polars versions the outer join is spelled `how="full"`.