### Project 1: Movie Ratings Analysis
#### 📝 Goal:
Analyze IMDb-style movie ratings to find:

- Top-rated movies
- Average rating per genre
- Most active users

#### 📦 Dataset:
Simulated or real CSV with columns:
movie_id, title, genre, user_id, rating, timestamp

In [7]:
import pandas as pd

# Sample movie rating data
data = {
    # One ID per title, so repeated titles share the same movie_id
    "movie_id": [101, 102, 103, 104, 101, 103, 102, 104, 101, 103],
    "title": [
        "Inception", "The Matrix", "The Godfather", "The Dark Knight", "Inception",
        "The Godfather", "The Matrix", "The Dark Knight", "Inception", "The Godfather"
    ],
    "genre": [
        "Sci-Fi", "Sci-Fi", "Crime", "Action", "Sci-Fi",
        "Crime", "Sci-Fi", "Action", "Sci-Fi", "Crime"
    ],
    "user_id": [1, 2, 1, 3, 2, 2, 1, 2, 3, 3],
    "rating": [4.5, 5.0, 5.0, 4.7, 4.0, 4.8, 4.9, 4.8, 5.0, 4.9],
    "timestamp": [
        "2022-06-01", "2022-06-02", "2022-06-03", "2022-06-01", "2022-06-02",
        "2022-06-04", "2022-06-03", "2022-06-05", "2022-06-04", "2022-06-06"
    ]
}

# Create DataFrame and save as CSV
df = pd.DataFrame(data)
df.to_csv("movie_ratings.csv", index=False)

print("✅ movie_ratings.csv created successfully!")


✅ movie_ratings.csv created successfully!


In [15]:
import polars as pl

# Load dataset
df = pl.read_csv("movie_ratings.csv")

# Top 10 highest rated movies
top_movies = (
    df.group_by("title")
    .agg(pl.col("rating").mean().alias("avg_rating"))
    .sort("avg_rating", descending=True)
    .head(10)
)

# Average rating per genre
avg_genre = df.group_by("genre").agg(pl.col("rating").mean().alias("genre_avg"))

# Most active users
active_users = (
    df.group_by("user_id")
    .agg(pl.len().alias("rating_count"))  # pl.len() replaces the deprecated pl.count()
    .sort("rating_count", descending=True)
)


### 📊 Project 2: Sales Dashboard from CSV
#### 📝 Goal:
Generate insights from daily sales:

- Total daily sales
- Best-selling product
- Sales trends by category



In [22]:
import polars as pl

# Load the CSV
df = pl.read_csv("sales_data.csv", try_parse_dates=True)

# 📅 Total Daily Sales
daily_sales = (
    df.group_by("date")
    .agg(pl.col("amount").sum().alias("total_sales"))
    .sort("date")
)
print("📅 Total Daily Sales:\n", daily_sales)

# 🥇 Best Selling Product (by total amount)
best_products = (
    df.group_by("product")
    .agg(pl.col("amount").sum().alias("total_revenue"))
    .sort("total_revenue", descending=True)
)
print("\n🥇 Best Selling Products:\n", best_products)

# 📈 Sales Trends by Category
category_trends = (
    df.group_by(["date", "category"])
    .agg(pl.col("amount").sum().alias("daily_category_sales"))
    .sort(["date", "category"])
)
print("\n📈 Sales Trends by Category:\n", category_trends)


📅 Total Daily Sales:
 shape: (4, 2)
┌────────────┬─────────────┐
│ date       ┆ total_sales │
│ ---        ┆ ---         │
│ date       ┆ i64         │
╞════════════╪═════════════╡
│ 2023-06-01 ┆ 1618        │
│ 2023-06-02 ┆ 771         │
│ 2023-06-03 ┆ 959         │
│ 2023-06-04 ┆ 1812        │
└────────────┴─────────────┘

🥇 Best Selling Products:
 shape: (6, 2)
┌────────────┬───────────────┐
│ product    ┆ total_revenue │
│ ---        ┆ ---           │
│ str        ┆ i64           │
╞════════════╪═══════════════╡
│ Phone      ┆ 2396          │
│ Laptop     ┆ 1998          │
│ Headphones ┆ 300           │
│ Tablet     ┆ 299           │
│ T-Shirt    ┆ 88            │
│ Jeans      ┆ 79            │
└────────────┴───────────────┘

📈 Sales Trends by Category:
 shape: (8, 3)
┌────────────┬─────────────┬──────────────────────┐
│ date       ┆ category    ┆ daily_category_sales │
│ ---        ┆ ---         ┆ ---                  │
│ date       ┆ str         ┆ i64                  │
╞════════

### Project 3: Weather

In [29]:
import polars as pl

# Load the dataset
df = pl.read_csv("weather.csv", try_parse_dates=True)

# 🔧 Fill missing temperature values (forward fill)
df_clean = df.with_columns([
    pl.col("temperature").fill_null(strategy="forward").alias("temperature")
])

# 🔥 Hottest 5 Days
hottest_days = df_clean.sort("temperature", descending=True).head(5)
print("🔥 Hottest Days:\n", hottest_days)

# 📆 Extract Month & Calculate Monthly Average Temperature
df_with_month = df_clean.with_columns(
    pl.col("date").dt.month().alias("month")
)

monthly_avg_temp = (
    df_with_month
    .group_by("month")
    .agg(pl.col("temperature").mean().alias("avg_monthly_temp"))
    .sort("month")
)
print("\n📆 Monthly Avg Temperature:\n", monthly_avg_temp)


🔥 Hottest Days:
 shape: (5, 5)
┌────────────┬──────────┬─────────────┬──────────┬──────────┐
│ date       ┆ location ┆ temperature ┆ humidity ┆ rainfall │
│ ---        ┆ ---      ┆ ---         ┆ ---      ┆ ---      │
│ date       ┆ str      ┆ f64         ┆ i64      ┆ f64      │
╞════════════╪══════════╪═════════════╪══════════╪══════════╡
│ 2023-03-05 ┆ Pune     ┆ 35.6        ┆ 28       ┆ 0.0      │
│ 2023-03-03 ┆ Pune     ┆ 34.0        ┆ 32       ┆ 0.0      │
│ 2023-03-04 ┆ Pune     ┆ 34.0        ┆ 30       ┆ 0.0      │
│ 2023-03-02 ┆ Pune     ┆ 33.0        ┆ 33       ┆ 0.0      │
│ 2023-03-01 ┆ Pune     ┆ 31.5        ┆ 35       ┆ 0.0      │
└────────────┴──────────┴─────────────┴──────────┴──────────┘

📆 Monthly Avg Temperature:
 shape: (3, 2)
┌───────┬──────────────────┐
│ month ┆ avg_monthly_temp │
│ ---   ┆ ---              │
│ i8    ┆ f64              │
╞═══════╪══════════════════╡
│ 1     ┆ 24.84            │
│ 2     ┆ 28.88            │
│ 3     ┆ 33.62            │
└───────┴───