<a href="https://colab.research.google.com/github/mohdomer/CodSoft/blob/main/Movie_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> ## **Movie Recommendation System Project** ##

First, we will import the necessary libraries and the dataset and create a Lemmatizer object.

In [5]:
!pip install contractions
import pandas as pd
import re
from nltk.stem import WordNetLemmatizer
from contractions import fix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Collecting contractions
  Downloading contractions-0.1.73-py2.py3-none-any.whl (8.7 kB)
Collecting textsearch>=0.0.21 (from contractions)
  Downloading textsearch-0.0.24-py2.py3-none-any.whl (7.6 kB)
Collecting anyascii (from textsearch>=0.0.21->contractions)
  Downloading anyascii-0.3.2-py3-none-any.whl (289 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m289.9/289.9 kB[0m [31m5.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting pyahocorasick (from textsearch>=0.0.21->contractions)
  Downloading pyahocorasick-2.0.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (110 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m110.8/110.8 kB[0m [31m6.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pyahocorasick, anyascii, textsearch, contractions
Successfully installed anyascii-0.3.2 contractions-0.1.73 pyahocorasick-2.0.0 textsearch-0.0.24


In [4]:
lemmatizer = WordNetLemmatizer()

In [13]:
# Read CSV files
df1 = pd.read_csv('movie_data_train.csv')
df2 = pd.read_csv('movie_data_solution.csv',quoting=3, error_bad_lines=False)




In [14]:
# Concatenate DataFrames
df = pd.concat([df1, df2], axis=0)

In [18]:
import nltk
nltk.download('wordnet')


[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [20]:
def preprocess(text):
    if pd.isnull(text) or not isinstance(text, str):
        return ""
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)  # remove non-alpha
    text = fix(text)  # expand contractions using the 'fix' function from contractions
    lemmatizer = WordNetLemmatizer()
    text = ' '.join([lemmatizer.lemmatize(word) for word in text.split()])
    return text

In [21]:
# Preprocess plots: lower case, remove punctuation, lemmatize
df['Plot'] = df['Plot Summary'].apply(preprocess)

In [22]:
# Create a TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer(max_features=10_000)
tfidf_matrix = tfidf_vectorizer.fit_transform(df['Plot'])

In [23]:
def recommend_and_print(plot):
    processed_plot = preprocess(plot)
    plot_vec = tfidf_vectorizer.transform([processed_plot])

    # Calculate cosine similarities
    similarities = cosine_similarity(tfidf_matrix, plot_vec)

    # Get top indices
    indices = similarities.argsort(axis=0)[-10:][::-1]

    # Get movie titles using the indices
    recommended_movies = []
    for idx in indices:
        original_idx = df.index[idx]  # Get the original index
        movie_title = df.loc[original_idx, 'Title']
        recommended_movies.append(movie_title)

    # Print recommended movies
    for idx, movie_title in enumerate(recommended_movies, start=1):
        print(f"{idx}. {movie_title}")

    return recommended_movies


In [28]:
# Example plot
plot = input("Enter a plot of a type of movie you would like to see: ")

Enter a plot of a type of movie you would like to see: detective smart


In [29]:
recommend_and_print(plot)

1. (18618, Las altas presiones (2014), drama, "The film follows what happens inside a lonely and heartbroken man who visits his hometown on the pretext of working there. He looks isolated and indifferent to everything. At the beginning,  he walks alone,  and he films an extremely desolate landscape,  and what the audience sees on the screen also lacks vigor. Such a feeling of ruin reflects the man's inner self. Consequently)     love and friendship invigorate the film. What...
Name: Title, dtype: object
2. 46438    In Plain View (2008)
Name: Title, dtype: object
3. 22811    Lieutenant Rex (2013)
Name: Title, dtype: object
4. 10101    Dirk Dagger and the Fallen Idol (2008)
Name: Title, dtype: object
5. 34852    Joe Wilkinson (1999)
Name: Title, dtype: object
6. 45777    The Curious Case of Countess Martina (2012)
Name: Title, dtype: object
7. 46759    Stranger by Night (1994)
Name: Title, dtype: object
8. 8396    Perdition City (????)
Name: Title, dtype: object
9. 43256    "The Great De

[(18618, Las altas presiones (2014), drama, "The film follows what happens inside a lonely and heartbroken man who visits his hometown on the pretext of working there. He looks isolated and indifferent to everything. At the beginning,  he walks alone,  and he films an extremely desolate landscape,  and what the audience sees on the screen also lacks vigor. Such a feeling of ruin reflects the man's inner self. Consequently)     love and friendship invigorate the film. What...
 Name: Title, dtype: object,
 46438    In Plain View (2008)
 Name: Title, dtype: object,
 22811    Lieutenant Rex (2013)
 Name: Title, dtype: object,
 10101    Dirk Dagger and the Fallen Idol (2008)
 Name: Title, dtype: object,
 34852    Joe Wilkinson (1999)
 Name: Title, dtype: object,
 45777    The Curious Case of Countess Martina (2012)
 Name: Title, dtype: object,
 46759    Stranger by Night (1994)
 Name: Title, dtype: object,
 8396    Perdition City (????)
 Name: Title, dtype: object,
 43256    "The Great Dete