# Lumaa AI/Machine Learning Intern Challenge

**By Grace Siu**

## Import Packages

In [1]:
# Import the necessary packages
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Load in the Data

The first step is to load the data. The dataset has 1000 rows, and since we want a small dataset, we keep only the 500 top-ranked movies (those with `Rank` of 500 or less).

In [2]:
# Load in the movies dataset and make a pandas df
movies = pd.read_csv('IMDB-Movie-Data.csv')

# Keep the top 500 ranked movies
movies_sample = movies[movies['Rank'] <= 500]

# View the new df
movies_sample

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced ... | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a te... | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fa... | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diag... | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richar... | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling thea... | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth Ma... | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of th... | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola D... | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 496 | I Am Legend | Drama,Horror,Sci-Fi | Years after a plague kills most of humanity an... | Francis Lawrence | Will Smith, Alice Braga, Charlie Tahan, Salli ... | 2007 | 101 | 7.2 | 565721 | 256.39 | 65.0 |
| 496 | 497 | Men in Black 3 | Action,Adventure,Comedy | Agent J travels in time to M.I.B.'s early days... | Barry Sonnenfeld | Will Smith, Tommy Lee Jones, Josh Brolin,Jemai... | 2012 | 106 | 6.8 | 278379 | 179.02 | 58.0 |
| 497 | 498 | Super 8 | Mystery,Sci-Fi,Thriller | During the summer of 1979, a group of friends ... | J.J. Abrams | Elle Fanning, AJ Michalka, Kyle Chandler, Joel... | 2011 | 112 | 7.1 | 298913 | 126.98 | 72.0 |
| 498 | 499 | Law Abiding Citizen | Crime,Drama,Thriller | A frustrated man decides to take justice into ... | F. Gary Gray | Gerard Butler, Jamie Foxx, Leslie Bibb, Colm M... | 2009 | 109 | 7.4 | 228339 | 73.34 | 34.0 |


## TF-IDF Vectorizer

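The next step is to vectorize the movie descriptions using TF-IDF, which weights each term by how often it appears in a description and how rare it is across all descriptions. Common English stop words are removed so they do not influence the vectors.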

In [3]:
# Vectorize the movie descriptions using TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies_sample['Description'])
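
To make the weighting concrete, here is a minimal standard-library sketch of TF-IDF on a toy corpus. It is not the code this notebook runs: `tfidf_vectors` and `toy_docs` are hypothetical names, tokenization is a plain whitespace split with no stop-word removal, and the smoothed IDF formula with L2 normalization is assumed to match scikit-learn's documented defaults.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of strings.

    Illustrative only: whitespace tokenization, no stop-word removal.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[tok] * idf[tok] for tok in vocab]
        # Scale each vector to unit length; guard against an empty document
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors


# Hypothetical toy corpus standing in for the movie descriptions
toy_docs = [
    "space adventure crew",
    "romantic comedy adventure",
    "space horror crew",
]
vocab, vectors = tfidf_vectors(toy_docs)

# Show the non-zero weights of the second document
for term, weight in zip(vocab, vectors[1]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```

In the second toy document, "comedy" and "romantic" receive a higher weight than "adventure" because "adventure" also appears in another document, which is the behavior that lets rare, descriptive words dominate the similarity scores.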

## Compute Similarity
This function computes the cosine similarity between a user preference query and every movie description, then returns the highest-scoring movies.

In [4]:
def movie_recommendations(input_description, number_of_recs):

    # Vectorize the input description using TF-IDF
    input_vec = tfidf.transform([input_description])

    # Compute cosine similarity between the input description vector and the movie description vectors
    similarity = cosine_similarity(input_vec, tfidf_matrix).flatten()

    # Find the indices of the top requested number of movies
    top_recommendations = similarity.argsort()[::-1][:number_of_recs]

    # Locate the top movies from the movies_sample df
    recommendations = movies_sample.iloc[top_recommendations]
    
    return recommendations
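
The ranking step can be illustrated without any libraries. The sketch below uses hypothetical names (`cosine`, `top_k`, `doc_vecs`, `query`) and hand-made vectors over an assumed four-word vocabulary. It also drops results with a score of zero, which is a design choice of this sketch and not what `movie_recommendations` does: that function always returns `number_of_recs` rows, even when some of them share no terms with the query.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k):
    """Return up to k (index, score) pairs, best first, skipping zero scores."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k] if scores[i] > 0]


# Assumed vocabulary order: [adventure, comedy, romance, space]
doc_vecs = [
    [1, 0, 0, 1],  # adventure + space
    [1, 1, 1, 0],  # adventure + comedy + romance
    [0, 0, 0, 1],  # space only
]
query = [1, 1, 1, 0]  # "romance that includes adventure and comedy"

for index, score in top_k(query, doc_vecs, 3):
    print(index, round(score, 3))
```

Here the third toy document shares no terms with the query, so it is left out even though three results were requested.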

## Recommend Top Movies
The following code combines the steps above to output the top movie recommendations for a user preference query.

In [5]:
# Example user preference query
user_preference = "I would like to watch a romance that includes adventure and comedy."

# Call the function to recommend movies
recommended_movies = movie_recommendations(user_preference, 5)

# Print the user preference and the given movie recommendations
print("User Preference:", user_preference)
print("\nMovie Recommendations:")
top_movies = recommended_movies[['Title', 'Description']].to_dict('records')
for movie in top_movies:
    print(movie)
    print()

User Preference: I would like to watch a romance that includes adventure and comedy.

Movie Recommendations:
{'Title': 'Percy Jackson & the Olympians: The Lightning Thief', 'Description': "A teenager discovers he's the descendant of a Greek god and sets out on an adventure to settle an on-going battle between the gods."}

{'Title': 'Patriots Day', 'Description': 'The story of the 2013 Boston Marathon bombing and the aftermath, which includes the city-wide manhunt to find the terrorists responsible.'}

{'Title': 'Brooklyn', 'Description': 'An Irish immigrant lands in 1950s Brooklyn, where she quickly falls into a romance with a local. When her past catches up with her, however, she must choose between two countries and the lives that exist within.'}

{'Title': 'Folk Hero & Funny Guy', 'Description': "A successful singer-songwriter hatches a plan to help his friend's struggling comedy career and broken love life by hiring him as his opening act on his solo tour."}

{'Title': 'District 9'