# TV Show Decision Maker (Main Notebook)

This notebook is the main entry point for the project. It demonstrates:
- Loading the Kaggle TMDB TV dataset (local CSV) using **pandas**
- Basic data cleaning / feature extraction
- Creating `TVShow` objects and using `ShowRecommender` (composition)
- Basic plots using **matplotlib**
- An optional interactive menu loop (`while`) for exploring recommendations

Note: The dataset CSV is not committed to git because it is large. You must provide a local path.


## 1) Imports

In [None]:
import os
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from src.tv_show import TVShow
from src.show_recommender import ShowRecommender

## 2) Configure dataset path

Set `CSV_PATH` to wherever you stored `TMDB_tv_dataset_v3.csv`.

Example (Windows):
- `r"C:\Users\91940\Downloads\archive\TMDB_tv_dataset_v3.csv"` (a raw string, so backslashes are written once)

In [None]:
# Update this to your local file path.
# In a raw string (r"..."), backslashes are literal, so write each one once.
CSV_PATH = r"C:\Users\91940\Downloads\archive\TMDB_tv_dataset_v3.csv"

csv_file = Path(CSV_PATH)
if not csv_file.exists():
    raise FileNotFoundError(f"CSV not found at: {csv_file}")

print(csv_file)
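
A hard-coded personal path makes the notebook awkward to share. One possible alternative, sketched below, reads the location from an environment variable and falls back to a default. The variable name `TMDB_CSV_PATH` and the helper `resolve_csv_path` are hypothetical and not part of this project.

```python
import os
from pathlib import Path


def resolve_csv_path(default: str, env_var: str = "TMDB_CSV_PATH") -> Path:
    """Return the dataset path from an environment variable, else `default`.

    `TMDB_CSV_PATH` is a hypothetical variable name chosen for this sketch.
    """
    # An unset or empty variable falls back to the default.
    raw = os.environ.get(env_var) or default
    return Path(raw).expanduser()


candidate = resolve_csv_path("TMDB_tv_dataset_v3.csv")
print(candidate)
```

The existence check from the cell above would still apply to the resolved path.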

## 3) Load dataset with pandas (advanced library #1)

We load only the columns we need to build `TVShow` objects.

In [None]:
usecols = [
    "name",
    "genres",
    "number_of_episodes",
    "original_language",
    "vote_average",
    "first_air_date",
]

# Limit rows for fast demos; set to None to load everything (slow on a 70MB file).
ROW_LIMIT = 1000

df = pd.read_csv(csv_file, usecols=usecols, nrows=ROW_LIMIT)
df.head()
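
To see what `usecols` and `nrows` do without touching the large file, the sketch below parses a tiny in-memory CSV. The rows are made up for illustration and the `id` column is only there to show a column being dropped.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV (made-up rows, illustration only).
sample_csv = io.StringIO(
    'id,name,genres,vote_average\n'
    '1,Show A,"Drama, Crime",8.1\n'
    '2,Show B,Comedy,6.4\n'
    '3,Show C,,7.0\n'
)

# usecols skips the unused "id" column at parse time;
# nrows stops reading after the first two data rows.
sample = pd.read_csv(
    sample_csv,
    usecols=["name", "genres", "vote_average"],
    nrows=2,
)

print(sample.shape)
print(list(sample.columns))
```

Only two rows and three columns are parsed, which is why a small `ROW_LIMIT` keeps the demo fast.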

## 4) Clean / transform columns

- Extract `year` from `first_air_date`
- Convert numeric columns
- Split genres into lists

In [None]:
# Extract the year (YYYY) from date strings like "2011-04-17".
# Unparseable or missing dates become NaN, so the column is float, not int.
df["year"] = pd.to_datetime(df["first_air_date"], errors="coerce").dt.year

# Convert numeric columns; invalid values become NaN.
df["number_of_episodes"] = pd.to_numeric(df["number_of_episodes"], errors="coerce")
df["vote_average"] = pd.to_numeric(df["vote_average"], errors="coerce")

# Drop rows missing required numeric values.
df = df.dropna(subset=["number_of_episodes", "vote_average"]).copy()
df["number_of_episodes"] = df["number_of_episodes"].astype(int)
df["vote_average"] = df["vote_average"].astype(float)

# Normalize strings.
df["name"] = df["name"].fillna("").astype(str).str.strip()
df["genres"] = df["genres"].fillna("").astype(str)
df["original_language"] = df["original_language"].fillna("").astype(str).str.strip()

df.head()
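
The effect of `errors="coerce"` is easiest to see on a few deliberately bad values. The frame below is a made-up example, not data from the TMDB file.

```python
import pandas as pd

# Made-up rows: one valid record, one bad date/number, one missing date.
demo = pd.DataFrame(
    {
        "first_air_date": ["2011-04-17", "not a date", None],
        "number_of_episodes": ["73", "ten", "12"],
    }
)

# Unparseable or missing dates become NaT, so the derived year is NaN.
demo["year"] = pd.to_datetime(demo["first_air_date"], errors="coerce").dt.year

# Non-numeric strings become NaN instead of raising an exception.
demo["number_of_episodes"] = pd.to_numeric(
    demo["number_of_episodes"], errors="coerce"
)

# Rows without a usable episode count are dropped; the rest can be cast to int.
clean = demo.dropna(subset=["number_of_episodes"]).copy()
clean["number_of_episodes"] = clean["number_of_episodes"].astype(int)

print(clean)
```

The row with a missing date survives because only the numeric columns are required; its `year` stays NaN and is mapped to `None` when the `TVShow` objects are built.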

## 5) Build `TVShow` objects + `ShowRecommender`

This section demonstrates:
- A `for` loop
- Exception handling when creating objects
- Composition: `ShowRecommender` stores many `TVShow` objects

In [None]:
recommender = ShowRecommender()

tv_shows = []
skipped = 0

for _, row in df.iterrows():
    try:
        genres_list = [g.strip() for g in row["genres"].split(",") if g.strip()]

        show = TVShow(
            title=row["name"],
            genre=genres_list,
            num_episodes=int(row["number_of_episodes"]),
            avg_rating=float(row["vote_average"]),
            language=row["original_language"],
            year=None if pd.isna(row["year"]) else int(row["year"]),
        )

        tv_shows.append(show)
    except (ValueError, TypeError):
        skipped += 1

recommender.add_shows_from_list(tv_shows)

print(f"Loaded shows: {recommender.get_total_shows()}  |  Skipped rows: {skipped}")
recommender.get_statistic()
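
Because `src.tv_show` and `src.show_recommender` are not shown in this notebook, the following is a minimal sketch of the same pattern using hypothetical stand-ins (`DemoShow`, `DemoRecommender`). The real classes may validate differently and expose other methods.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemoShow:
    """Hypothetical stand-in for `TVShow`; the real class may differ."""

    title: str
    genres: List[str]
    avg_rating: float

    def __post_init__(self) -> None:
        # Reject obviously invalid records so the caller can skip them.
        if not self.title:
            raise ValueError("title must not be empty")
        if not 0.0 <= self.avg_rating <= 10.0:
            raise ValueError("avg_rating must be between 0 and 10")


@dataclass
class DemoRecommender:
    """Hypothetical stand-in for `ShowRecommender`: it holds shows (composition)."""

    shows: List[DemoShow] = field(default_factory=list)

    def add(self, show: DemoShow) -> None:
        self.shows.append(show)

    def top(self, limit: int) -> List[DemoShow]:
        # Highest-rated shows first.
        return sorted(self.shows, key=lambda s: s.avg_rating, reverse=True)[:limit]


# Made-up rows: two valid, two that fail validation.
rows = [
    ("Show A", "Drama, Crime", 8.1),
    ("", "Comedy", 6.4),         # empty title
    ("Show C", "Comedy", 11.0),  # rating out of range
    ("Show D", "", 9.0),
]

demo = DemoRecommender()
demo_skipped = 0
for title, genres, rating in rows:
    try:
        genre_list = [g.strip() for g in genres.split(",") if g.strip()]
        demo.add(DemoShow(title, genre_list, rating))
    except ValueError:
        demo_skipped += 1

print(len(demo.shows), demo_skipped)
print(demo.top(1)[0].title)
```

The recommender never inherits from the show class; it only stores and queries instances, which is what composition means here.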

## 6) Quick exploration + plots (advanced library #2)

We use matplotlib to visualize the rating distribution and the most common genres in the loaded subset.

In [None]:
# Ratings histogram
ratings = np.array([s.avg_rating for s in tv_shows], dtype=float)

plt.figure(figsize=(8, 4))
plt.hist(ratings, bins=np.arange(0, 10.5, 0.5), edgecolor="black")
plt.title("Rating Distribution (vote_average)")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.tight_layout()
plt.show()
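
The bin edges passed to `plt.hist` can be checked without drawing anything, since `np.histogram` accepts the same `bins` argument. The ratings below are made up for illustration.

```python
import numpy as np

# Made-up ratings, chosen to land on and around bin edges.
demo_ratings = np.array([0.0, 7.4, 7.5, 10.0])

# Same edges as the histogram cell: 0.0, 0.5, ..., 10.0.
edges = np.arange(0, 10.5, 0.5)
counts, _ = np.histogram(demo_ratings, bins=edges)

print(len(edges), len(counts))  # number of edges, number of bins
print(counts[14], counts[15])   # bins [7.0, 7.5) and [7.5, 8.0)
print(counts[-1])               # last bin [9.5, 10.0] includes its right edge
```

Each bin is half-open on the right except the last one, so a perfect 10.0 is still counted.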

In [None]:
# Top genres bar chart
genre_counts = (
    df["genres"]
    .str.split(",")
    .explode()
    .str.strip()
    .replace("", np.nan)
    .dropna()
    .value_counts()
    .head(10)
)

plt.figure(figsize=(10, 4))
genre_counts.sort_values().plot(kind="barh")
plt.title("Top 10 Genres (from loaded subset)")
plt.xlabel("Count")
plt.tight_layout()
plt.show()

## 7) Optional interactive menu (`while` loop)

To avoid hanging automated grading, this is disabled by default.

Set `RUN_INTERACTIVE = True` if you want to use it.

In [None]:
RUN_INTERACTIVE = False

if RUN_INTERACTIVE:
    while True:
        print("\nMenu:")
        print("  1) Filter by genre")
        print("  2) Filter by minimum rating")
        print("  3) Search by title")
        print("  4) Get recommendations (top N by rating)")
        print("  5) Quit")

        choice = input("Enter choice: ").strip()

        if choice == "1":
            genre = input("Genre: ").strip()
            matches = recommender.filter_by_genre(genre)
            print(f"Found {len(matches)} shows")
            for s in matches[:10]:
                print(f"- {s.title} ({s.avg_rating}/10)")

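        elif choice == "2":
            try:
                min_rating = float(input("Minimum rating (0-10): ").strip())
            except ValueError:
                # Non-numeric input would otherwise crash the menu loop.
                print("Please enter a number.")
                continue
            matches = recommender.filter_by_rating(min_rating)
            print(f"Found {len(matches)} shows")
            for s in matches[:10]:
                print(f"- {s.title} ({s.avg_rating}/10)")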

        elif choice == "3":
            query = input("Search term: ").strip()
            matches = recommender.search_by_title(query)
            print(f"Found {len(matches)} shows")
            for s in matches[:10]:
                print(f"- {s.title} ({s.avg_rating}/10)")

        elif choice == "4":
            try:
                limit = int(input("How many recommendations? ").strip())
            except ValueError:
                print("Please enter a whole number.")
                continue
            recs = recommender.get_recommendations(limit=limit)
            for i, s in enumerate(recs, 1):
                print(f"{i}. {s.title} ({s.avg_rating}/10)")

        elif choice == "5":
            print("Goodbye!")
            break

        else:
            print("Invalid choice, try again.")
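
Numeric prompts in a menu like this are a common source of crashes. One way to keep the loop body short is a small parsing helper; `parse_number` below is a hypothetical function for illustration, not part of the project.

```python
from typing import Optional


def parse_number(text: str, low: float, high: float) -> Optional[float]:
    """Return `text` as a float within [low, high], or None if it is invalid."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # NaN fails both comparisons, so it is rejected along with out-of-range values.
    return value if low <= value <= high else None


for raw in [" 7.5 ", "abc", "11", "nan"]:
    print(repr(raw), "->", parse_number(raw, 0.0, 10.0))
```

A menu branch could then test the result for `None` and re-prompt instead of wrapping every `input` call in its own `try` block.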