# AI/Machine Learning Intern Challenge  
## Name: Hasham Khan  
**University**: Michigan Technological University  
**Program**: Data Science Graduate Student


# Movie Recommendation System

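A content-based recommendation system for Netflix titles. It converts each title's text (title, description, and genre) into TF-IDF vectors and ranks titles by cosine similarity to a free-text description of what you want to watch.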


**Step 1 :**   **Load the Netflix dataset and clean missing values**

In [5]:
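import pandas as pd

def load_data(file_path):
    """Load the Netflix catalogue and build one text field per title."""
    df = pd.read_csv(file_path)
    # Replace missing values with empty strings so string concatenation never yields NaN
    df = df.fillna("")
    # Combine title, description, and genre labels into a single searchable string
    df['combined_text'] = df['title'] + " " + df['description'] + " " + df['listed_in']
    return df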

# Load dataset
df = load_data("netflix_titles.csv")
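To see why the missing values are filled before concatenating, here is a minimal sketch on a tiny hand-made DataFrame. The three rows are invented for illustration and do not come from `netflix_titles.csv`; only the column names match the ones used in `load_data`.

```python
import pandas as pd

# Hypothetical rows, invented for illustration; not taken from netflix_titles.csv
toy = pd.DataFrame({
    "title": ["Space Race", "Kitchen Wars", "Night Drive"],
    "description": ["Astronauts compete to reach Mars.", None, "A driver takes a risky job."],
    "listed_in": ["Sci-Fi", "Reality TV", None],
})

# Same cleaning steps as load_data: blank out missing values, then concatenate
toy = toy.fillna("")
toy["combined_text"] = toy["title"] + " " + toy["description"] + " " + toy["listed_in"]

for text in toy["combined_text"]:
    print(repr(text))
```

Without the `fillna("")` step, any row with a missing description or genre would end up with a missing `combined_text` as well, and the vectorizer in the next step could not process it.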

**Step 2 :**   **Vectorize the combined text (title, description, genre) using TF-IDF (term frequency-inverse document frequency)**

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer
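def build_tfidf_matrix(df):
    # Drop common English stop words so they do not dominate the vocabulary
    vectorizer = TfidfVectorizer(stop_words='english')
    # Learn the vocabulary and IDF weights, then encode each title as a sparse row
    tfidf_matrix = vectorizer.fit_transform(df['combined_text'])
    return vectorizer, tfidf_matrix

# Build the TF-IDF matrix for the dataset
vectorizer, tfidf_matrix = build_tfidf_matrix(df)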
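As a rough illustration of what TF-IDF does, the sketch below fits the same kind of vectorizer on three made-up sentences (assumed for this example, not from the dataset). It shows that stop words are dropped and that a word appearing in fewer documents gets a higher weight than a word shared across documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus, invented for illustration
docs = [
    "space adventure with robots",
    "romantic comedy in the city",
    "space battle adventure",
]

vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform(docs)

# One row per document, one column per remaining vocabulary term
print(matrix.shape)
print(sorted(vec.vocabulary_))

# Compare two weights inside the first document:
# "space" occurs in two documents, "robots" in only one
row0 = matrix[0].toarray().ravel()
w_space = row0[vec.vocabulary_["space"]]
w_robots = row0[vec.vocabulary_["robots"]]
print(f"space={w_space:.3f} robots={w_robots:.3f}")
```

The rarer term `robots` receives the larger weight, which is why distinctive words in a description influence the recommendations more than common genre words.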

**Step 3 :**   **Calculate the similarity between the user input and each title in the dataset**

In [3]:
from sklearn.metrics.pairwise import cosine_similarity

# Function to get movie recommendations based on user input
def get_recommendations(user_input, vectorizer, tfidf_matrix, df):
    # Transform user input into the TF-IDF space
    user_tfidf = vectorizer.transform([user_input])
    
    # Calculate cosine similarity
    similarities = cosine_similarity(user_tfidf, tfidf_matrix).flatten()
    
    # Attach similarity scores to a copy so the caller's dataframe is not modified
    scored = df.assign(similarity=similarities)
    
    # Get the top 3 most similar titles; change 3 to return more or fewer
    recommendations = scored.nlargest(3, 'similarity')
    
    # Display recommended movies
    print("\nRecommended Movies:")
    print("-" * 50)
    for idx, movie in recommendations.iterrows():
        print(f"\n{movie['title']}")
        print(f"Similarity: {movie['similarity']*100:.1f}%")
        print(f"Genre: {movie['listed_in']}")
        print("-" * 50)

# Example usage (this part will be inside the main loop or called separately)
# get_recommendations("comedy action adventure", vectorizer, tfidf_matrix, df)
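The scoring step can be checked on a small scale. The sketch below uses three made-up sentences and a made-up query (all assumed for illustration). Because `TfidfVectorizer` L2-normalizes each row by default, the cosine similarity between the query and a document equals their plain dot product, which the example verifies.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus and query, invented for illustration
docs = [
    "space adventure with robots",
    "romantic comedy in the city",
    "car racing championship",
]
vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform(docs)

# The query must be transformed with the already-fitted vectorizer
query = vec.transform(["adventure in space"])

# Cosine similarity of the query against every document
scores = cosine_similarity(query, matrix).flatten()

# With unit-length rows, the dot product gives the same numbers
dot = (query @ matrix.T).toarray().ravel()
print(np.allclose(scores, dot))

best = int(np.argmax(scores))
print(best, docs[best])
```

Documents that share no vocabulary with the query score exactly zero, so only the first sentence is a real match here.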


**Step 4 :**   **Interactive user input**

In [4]:
# Main loop for interactive user input (optional)
def main_loop():
    while True:
        print("\nEnter what kind of movies you like (or 'quit' to exit):")
        user_input = input("> ")
        
        if user_input.strip().lower() == 'quit':
            print("Goodbye!")
            break
        
        get_recommendations(user_input, vectorizer, tfidf_matrix, df)

# Run the interactive loop
main_loop()



Enter what kind of movies you like (or 'quit' to exit):


>  i love racing



Recommended Movies:
--------------------------------------------------

Race 2
Similarity: 27.5%
Genre: Action & Adventure, International Movies, Music & Musicals
--------------------------------------------------

Race
Similarity: 24.9%
Genre: Action & Adventure, International Movies, Music & Musicals
--------------------------------------------------

MONKART
Similarity: 23.7%
Genre: Kids' TV, Korean TV Shows
--------------------------------------------------

Enter what kind of movies you like (or 'quit' to exit):


>  I like action movies set in space



Recommended Movies:
--------------------------------------------------

A StoryBots Space Adventure
Similarity: 32.1%
Genre: Children & Family Movies
--------------------------------------------------

Star Trek: Deep Space Nine
Similarity: 28.8%
Genre: TV Action & Adventure, TV Sci-Fi & Fantasy
--------------------------------------------------

The Epic Tales of Captain Underpants in Space
Similarity: 25.4%
Genre: Kids' TV, TV Action & Adventure, TV Comedies
--------------------------------------------------

Enter what kind of movies you like (or 'quit' to exit):


>  quit


Goodbye!
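
The session above always prints three titles, even when the similarity scores are low. One possible refinement, sketched here under assumptions, is to drop results at or below a minimum score so that a query with no matching words returns nothing. The helper `top_matches` and its `min_score` parameter are hypothetical and not part of the notebook; the non-zero scores in the first call are the ones printed for "i love racing", while their positions and the zeros are made up.

```python
import numpy as np

def top_matches(scores, k=3, min_score=0.0):
    """Return indices of the k highest scores strictly above min_score, best first."""
    scores = np.asarray(scores, dtype=float)
    # Sort descending and keep the k best candidates
    order = np.argsort(scores)[::-1][:k]
    # Discard candidates that do not clear the threshold
    return [int(i) for i in order if scores[i] > min_score]

# Three real matches among some zero-score titles
print(top_matches([0.0, 0.275, 0.249, 0.0, 0.237], k=3))  # [1, 2, 4]

# A query that shares no words with any title yields no recommendations
print(top_matches([0.0, 0.0, 0.0], k=3))  # []
```

Raising `min_score` would also filter weak matches, at the cost of sometimes returning fewer than three titles.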
