## A basic movie recommendation system that scrapes data from the IMDb website. The program retrieves information from the Most Popular Movies page on IMDb. The data scraped is:

1) Movie Titles <br>
2) Runtime <br>
3) Rating <br>
4) Age Restriction <br>
5) Genre <br>
6) Writer(s) <br>
7) Director(s) <br>
8) Fun Movie Trivia <br> 

The data is stored in a pandas DataFrame.

Subsequently, the program will prompt the user to specify their preferences for:

1) Genre <br>
2) Minimum Rating<br>
3) Maximum Runtime<br>
4) Age Restriction<br>

It will then filter the dataframe accordingly and select a movie for recommendation. The recommendation will display the movie title, writer, director, and trivia. If there are no movies that match the user's preferences, the program will recommend any movie from the original dataframe.

In [134]:
import time
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

In [147]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

#opening browser
driver3 = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

In [148]:
url3="https://www.imdb.com/"

In [149]:
driver3.get(url3)

# Going to the top 200 popular IMDB movies

In [150]:
#selecting the menu
menuid=driver3.find_element(By.ID,"imdbHeader-navDrawerOpen")
menuid.click()

In [152]:
#clicking popular movies on menu
menu_items=driver3.find_element(By.CLASS_NAME,"navlinkcat__targetWrapper")
most_pop_movies=menu_items.find_element(By.XPATH,"/html/body/div[2]/nav/div[2]/aside[1]/div/div[2]/div/div[1]/span/div/div/ul/a[3]")
most_pop_movies.click()

# The following data is being scraped:

## 1) Movie Titles 
## 2) Runtime 
## 3) Rating
## 4) Age Restriction
## 5) Genre 
## 6) Writer(s)
## 7) Director(s) 
## 8) Fun Movie Trivia 


In [226]:

#selecting whole movies
movies_section=driver3.find_element(By.XPATH,"/html/body/div[2]/main/div/div[3]/section/div/div[2]/div")

#creating grid of all movies
movies_grid=movies_section.find_elements(By.CLASS_NAME,"ipc-metadata-list-summary-item")


Movie_name=[]
Movie_runtime=[]
Movie_year=[]
Movie_restriction=[]
Movie_rating=[]
Movie_genre=[]
Movie_writer=[]
Movie_director=[]
Movie_trivia=[]

for items in range(len(movies_grid)):
    boolhelper=False
    try:
        items =movies_section.find_elements(By.CLASS_NAME,"ipc-metadata-list-summary-item")[items]
        
        #Getting year, runtime and Restriction
        try:
            movie_titlebar=items.find_elements(By.CLASS_NAME,"sc-479faa3c-7.jXgjdT.cli-title-metadata")    
            
            for i in movie_titlebar:

                #getting movie year
                try:
                    movie_year=i.find_elements(By.CLASS_NAME,"sc-479faa3c-8.bNrEFi.cli-title-metadata-item")[0]

                    if(int(movie_year.text)>=2024):
                        boolhelper=True
                        break
                    
                    Movie_year.append(movie_year.text)
                    print(movie_year.text)
                
                except NoSuchElementException:
                    continue
               
                #getting movie runtime
                try:
                    movie_runtime=i.find_elements(By.CLASS_NAME,"sc-479faa3c-8.bNrEFi.cli-title-metadata-item")[1]
                    Movie_runtime.append(movie_runtime.text)
                    
                    print(movie_runtime.text)
                except IndexError:
                    Movie_runtime.append('N/A')
            
                #getting movie age restriction
                try:
                    movie_restrictions=i.find_elements(By.CLASS_NAME,"sc-479faa3c-8.bNrEFi.cli-title-metadata-item")[2]
                    Movie_restriction.append(movie_restrictions.text)
                    
                    print(movie_restrictions.text)
                except IndexError:
                    Movie_restriction.append('N/A')
            
            if(boolhelper==True):
                continue
            
            #getting name of movie
            movie_name_section = items.find_element(By.CLASS_NAME,"ipc-title-link-wrapper")
            movie_name = movie_name_section.find_element(By.CLASS_NAME,"ipc-title__text")
            movie_link = movie_name_section.get_attribute('href')
            Movie_name.append(movie_name.text)
            print(movie_name.text)
            print(movie_link)
            
            driver3.get(movie_link)
            time.sleep(5)

            #movie rating
            try:
                movie_ratingclass = driver3.find_elements(By.CLASS_NAME, "sc-e226b0e3-3.dwkouE")
                movie_rating = movie_ratingclass[0].find_element(By.CLASS_NAME, "sc-bde20123-1.cMEQkK")
                Movie_rating.append(movie_rating.text)
                print(movie_rating.text)
            
            except (NoSuchElementException, IndexError):  # find_elements returns an empty list when nothing matches
                Movie_rating.append('N/A')
            
            #Movie genre
            movie_namesection=driver3.find_element(By.CLASS_NAME,"sc-9aa2061f-4.egqNEn")
            movie_Genre=movie_namesection.find_element(By.CLASS_NAME,"ipc-chip-list__scroller")
            genre=movie_Genre.text
            genre=genre.replace('\n',', ')
            Movie_genre.append(genre) 
            
            print(genre)
            
            movie_writdir=movie_namesection.find_elements(By.CLASS_NAME,"sc-9aa2061f-3.iRxAxS")

            for names in movie_writdir:

                #getting movie directors
                Directorsclass=names.find_elements(By.CLASS_NAME, "ipc-metadata-list__item")[0]
                director_name=Directorsclass.find_element(By.CLASS_NAME,"ipc-metadata-list-item__content-container")
                Movie_director.append(director_name.text)

                #Getting movie writers
                writerclass=names.find_elements(By.CLASS_NAME,"ipc-metadata-list__item")[1]
                writer_name=writerclass.find_element(By.CLASS_NAME,"ipc-metadata-list-item__content-container")
                Movie_writer.append(writer_name.text)
                
            print(writer_name.text)
            print(director_name.text)

            #Fun movie trivia
            Trivia_class=driver3.find_element(By.CLASS_NAME,"chCwWk.ipc-list-card--base")    
            movie_trivia=Trivia_class.find_element(By.CLASS_NAME,"ipc-metadata-list-item__content-container")
            Movie_trivia.append(movie_trivia.text)
            
            driver3.execute_script("window.history.go(-1)")
            WebDriverWait(driver3, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "ipc-title-link-wrapper")))
            time.sleep(5)
                    
        except NoSuchElementException:
                continue
      

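    except StaleElementReferenceException:
        # The page was reloaded, so re-locate the list; the next iteration fetches fresh elements.
        # (Subtracting 1 from the loop variable had no effect on the range and fails once it holds a WebElement.)
        movies_section=driver3.find_element(By.XPATH,"/html/body/div[2]/main/div/div[3]/section/div/div[2]/div")
        movies_grid=movies_section.find_elements(By.CLASS_NAME,"ipc-metadata-list-summary-item")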

2023
1h 58m
R
The Killer
https://www.imdb.com/title/tt1136617/?ref_=chtmvm_t_1
6.9
Action, Adventure, Crime
Alexis NolentLuc JacamonAndrew Kevin Walker
David Fincher
2023
2h 37m
PG-13
The Hunger Games: The Ballad of Songbirds & Snakes
https://www.imdb.com/title/tt10545296/?ref_=chtmvm_t_3
7.2
Action, Adventure, Drama
Michael LesslieMichael ArndtSuzanne Collins
Francis Lawrence
2023
1h 49m
PG-13
Five Nights at Freddy's
https://www.imdb.com/title/tt4589218/?ref_=chtmvm_t_6
5.5
Horror, Mystery, Thriller
Scott CawthonSeth CuddebackEmma Tammi
Emma Tammi
2023
2h 13m
PG-13
The Creator
https://www.imdb.com/title/tt11858890/?ref_=chtmvm_t_8
7.0
Action, Adventure, Drama
Gareth EdwardsChris Weitz
Gareth Edwards
2023
1h 46m
R
Thanksgiving
https://www.imdb.com/title/tt1448754/?ref_=chtmvm_t_11
7.1
Horror, Mystery, Thriller
Jeff RendellEli Roth
Eli Roth
2023
2h 13m
PG-13
Rebel Moon - Part One: A Child of Fire
https://www.imdb.com/title/tt14998742/?ref_=chtmvm_t_13
Action, Adventure, Drama
Shay Hatte

In [227]:
print("Year: ",len(Movie_year))
print(Movie_year)
print("Movie_runtime: ",len(Movie_runtime))
print(Movie_runtime)
print("age restriction: ",len(Movie_restriction))
print(Movie_restriction)
print('name',len(Movie_name))
print(Movie_name)
print("Movie_rating: ",len(Movie_rating))
print(Movie_rating)
print("Movie_genre: ",len(Movie_genre))
print(Movie_genre)
print("Movie_director: ",len(Movie_director))
print(Movie_director)
print("Movie_writer: ",len(Movie_writer))
print(Movie_writer)

Year:  48
['2023', '2023', '2023', '2023', '2023', '2023', '2023', '2023', '2023', '2012', '2023', '2023', '2023', '2023', '2023', '2023', '2022', '2023', '2023', '2022', '2023', '2023', '2023', '2023', '2023', '2013', '2023', '2023', '1990', '2014', '2001', '2023', '2023', '2023', '2015', '2023', '2023', '2023', '2023', '2003', '2023', '2003', '2023', '2021', '2023', '2023', '2023', '2022']
Movie_runtime:  48
['1h 58m', '2h 37m', '1h 49m', '2h 13m', '1h 46m', '2h 13m', '1h 43m', '2h 13m', '1h 54m', '2h 22m', '2h 1m', '1h 53m', '1h 36m', '1h 32m', '1h 44m', '2h 21m', '1h 45m', '1h 45m', '2h 14m', '1h 35m', '1h 52m', '1h 56m', '1h 39m', '2h 10m', '1h 58m', '2h 26m', '2h 6m', '1h 22m', '1h 43m', '2h 49m', '2h 32m', '1h 32m', '1h 53m', '1h 41m', '1h 35m', '1h 53m', '2h', '1h 50m', '2h 14m', '2h 15m', '1h 45m', '1h 37m', '1h 56m', '2h 35m', '2h 10m', '2h 14m', '1h 51m', '2h 2m']
age restriction:  48
['R', 'PG-13', 'PG-13', 'PG-13', 'R', 'PG-13', 'PG-13', 'R', 'PG-13', 'PG-13', 'PG-13', 'R'
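The length checks above matter because `pd.DataFrame` rejects columns of unequal length, and nine parallel lists drift apart as soon as one field is missing for one movie. One alternative, sketched below under assumptions, is to collect a single dictionary per movie and default every missing field to `'N/A'`. The helper `make_record` and the `FIELDS` list are hypothetical names introduced for illustration; they are not part of the notebook.

```python
# Hypothetical field names mirroring the columns used later in this notebook.
FIELDS = ["Movie Name", "Year", "Run time", "Age restriction",
          "Rating", "Genre", "Writer", "Director", "Trivia"]


def make_record(**scraped):
    """Return one row per movie, filling any field that was not scraped with 'N/A'."""
    # Map keyword-friendly names (movie_name) to column names (Movie Name).
    key_map = {name.lower().replace(" ", "_"): name for name in FIELDS}
    record = {name: "N/A" for name in FIELDS}
    for key, value in scraped.items():
        if key not in key_map:
            raise KeyError(f"unknown field: {key}")
        if value:  # keep the 'N/A' default for empty strings and None
            record[key_map[key]] = value
    return record


records = [
    make_record(movie_name="The Killer", year="2023", run_time="1h 58m",
                age_restriction="R", rating="6.9"),
    make_record(movie_name="Rebel Moon - Part One: A Child of Fire",
                year="2023", rating=""),  # rating text was empty on the page
]
```

Passing `records` to `pd.DataFrame` would then produce a table in which every column has the same length by construction.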

# Storing the data in an organized DataFrame, which can be exported to CSV or a database

In [239]:
import pandas as pd
data3={"Movie Name":Movie_name,
        "Year":Movie_year,
        "Run time":Movie_runtime,
       "Age restriction":Movie_restriction,
       "Rating":Movie_rating,
       "Genre":Movie_genre,
       "Writer":Movie_writer,
       "Director":Movie_director,
       "Trivia":Movie_trivia
      }
df3=pd.DataFrame(data3)
df3

# Scraped data

Unnamed: 0,Movie Name,Year,Run time,Age restriction,Rating,Genre,Writer,Director,Trivia
0,The Killer,2023,1h 58m,R,6.9,"Action, Adventure, Crime",Alexis NolentLuc JacamonAndrew Kevin Walker,David Fincher,"The graphic novel ""The Killer"" (written by Ale..."
1,The Hunger Games: The Ballad of Songbirds & Sn...,2023,2h 37m,PG-13,7.2,"Action, Adventure, Drama",Michael LesslieMichael ArndtSuzanne Collins,Francis Lawrence,"According to director Francis Lawrence, he was..."
2,Five Nights at Freddy's,2023,1h 49m,PG-13,5.5,"Horror, Mystery, Thriller",Scott CawthonSeth CuddebackEmma Tammi,Emma Tammi,"Because of Foxy's skeletal-like structure, he ..."
3,The Creator,2023,2h 13m,PG-13,7.0,"Action, Adventure, Drama",Gareth EdwardsChris Weitz,Gareth Edwards,Gareth Edwards tried to make this film as trad...
4,Thanksgiving,2023,1h 46m,R,7.1,"Horror, Mystery, Thriller",Jeff RendellEli Roth,Eli Roth,Based on the mock-trailer from Grindhouse (200...
5,Rebel Moon - Part One: A Child of Fire,2023,2h 13m,PG-13,,"Action, Adventure, Drama",Shay HattenKurt JohnstadZack Snyder,Zack Snyder,Zack Snyder wrote a script for a Star Wars spi...
6,A Haunting in Venice,2023,1h 43m,PG-13,6.6,"Crime, Drama, Horror",Michael GreenAgatha Christie,Kenneth Branagh,Sir Kenneth Branagh worked with the technical ...
7,The Holdovers,2023,2h 13m,R,8.4,"Comedy, Drama",David Hemingson,Alexander Payne,"The entire film was shot in real, practical lo..."
8,Barbie,2023,1h 54m,PG-13,7.0,"Adventure, Comedy, Fantasy",Greta GerwigNoah Baumbach,Greta Gerwig,Barbie is 23% larger than everything in Barbie...
9,The Hunger Games,2012,2h 22m,PG-13,7.2,"Action, Adventure, Sci-Fi",Gary RossSuzanne CollinsBilly Ray,Gary Ross,There was a swear jar on the set. Co-writer an...


# Prompting user to give the following preferences
## Genre
## Minimum Rating
## Maximum Runtime
## Age Restriction

In [316]:
inputgenre=input("Which movie genre do you prefer: ")
inputminrating=float(input("What should be the minimum rating of the movie: "))  # float, so thresholds such as 7.5 are accepted
inputmaxruntime=int(input("What should be the maximum runtime of the movie(in minutes): "))
inputagerestriction=input("What should be the age restriction for the movie: ")

Which movie genre do you prefer: Action
What should be the minimum rating of the movie: 6
What should be the maximum runtime of the movie(in minutes): 300
What should be the age restriction for the movie: PG-13
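IMDb ratings are decimals (6.9, 7.2 and so on in the scraped output), and a mistyped answer would otherwise surface only as a traceback. The following is a minimal sketch of how the numeric prompts could be validated; `parse_min_rating` and `parse_max_runtime` are hypothetical helper names, and the accepted ranges are assumptions rather than anything the notebook defines.

```python
def parse_min_rating(text):
    """Parse a minimum rating such as '6' or '7.5' (assumed range: 0 to 10)."""
    value = float(text.strip())
    if not 0 <= value <= 10:
        raise ValueError(f"rating must be between 0 and 10, got {value}")
    return value


def parse_max_runtime(text):
    """Parse a maximum runtime in whole minutes, e.g. '120'."""
    value = int(text.strip())
    if value <= 0:
        raise ValueError(f"runtime must be positive, got {value}")
    return value
```

Each helper could wrap the corresponding `input(...)` call, for example `parse_min_rating(input("What should be the minimum rating of the movie: "))`.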


In [317]:
newfiltered3=df3[df3["Age restriction"]==inputagerestriction]

In [318]:
import pandas as pd

# Assuming newfiltered3 is a DataFrame and inputminrating is defined
newfiltered3["Rating"] = pd.to_numeric(newfiltered3["Rating"], errors='coerce')
newfiltered3 = newfiltered3.dropna(subset=["Rating"], how="any")
newfiltered3 = newfiltered3[newfiltered3["Rating"] >= inputminrating]
newfiltered3 = newfiltered3.reset_index(drop=True)
newfiltered3=newfiltered3[newfiltered3["Genre"].str.contains(inputgenre,case=False)]
newfiltered3=newfiltered3.reset_index(drop=True)
# Display the resulting DataFrame
newfiltered3

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  newfiltered3["Rating"] = pd.to_numeric(newfiltered3["Rating"], errors='coerce')


Unnamed: 0,Movie Name,Year,Run time,Age restriction,Rating,Genre,Writer,Director,Trivia
0,The Hunger Games: The Ballad of Songbirds & Sn...,2023,2h 37m,PG-13,7.2,"Action, Adventure, Drama",Michael LesslieMichael ArndtSuzanne Collins,Francis Lawrence,"According to director Francis Lawrence, he was..."
1,The Creator,2023,2h 13m,PG-13,7.0,"Action, Adventure, Drama",Gareth EdwardsChris Weitz,Gareth Edwards,Gareth Edwards tried to make this film as trad...
2,The Hunger Games,2012,2h 22m,PG-13,7.2,"Action, Adventure, Sci-Fi",Gary RossSuzanne CollinsBilly Ray,Gary Ross,There was a swear jar on the set. Co-writer an...
3,The Hunger Games: Catching Fire,2013,2h 26m,PG-13,7.5,"Action, Adventure, Sci-Fi",Simon BeaufoyMichael ArndtSuzanne Collins,Francis Lawrence,(At around one h 4 mins) When Katniss is in th...
4,Gran Turismo,2023,2h 14m,PG-13,7.2,"Action, Adventure, Drama",Jason HallZach BaylinAlex Tse,Neill Blomkamp,Jann Mardenborough plays a stunt double in the...
5,Dune,2021,2h 35m,PG-13,8.0,"Action, Adventure, Drama",Jon SpaihtsDenis VilleneuveEric Roth,Denis Villeneuve,Denis Villeneuve confirmed in a Vanity Fair ar...
6,Dungeons & Dragons: Honor Among Thieves,2023,2h 14m,PG-13,7.3,"Action, Adventure, Comedy",Jonathan GoldsteinJohn Francis DaleyMichael Gilio,John Francis DaleyJonathan Goldstein,Despite not being involved in the production o...
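The `SettingWithCopyWarning` shown above appears because `newfiltered3` was produced by boolean indexing and a column was then assigned on it. A common way to avoid the ambiguity is to take an explicit `.copy()` of the filtered frame before modifying it. The sketch below illustrates this on a small stand-in frame built from rows of the scraped table; the variable names `df`, `filtered`, and `mask` are illustrative only.

```python
import pandas as pd

# Small stand-in for df3, using rows shown in the scraped table above.
df = pd.DataFrame({
    "Movie Name": ["The Killer", "The Creator",
                   "Rebel Moon - Part One: A Child of Fire"],
    "Age restriction": ["R", "PG-13", "PG-13"],
    "Rating": ["6.9", "7.0", ""],
    "Genre": ["Action, Adventure, Crime", "Action, Adventure, Drama",
              "Action, Adventure, Drama"],
})

# .copy() makes the filtered frame independent, so assigning to it is unambiguous.
filtered = df[df["Age restriction"] == "PG-13"].copy()
filtered["Rating"] = pd.to_numeric(filtered["Rating"], errors="coerce")

# Combine the remaining conditions into one boolean mask.
# Rows whose rating could not be parsed are NaN and fail the >= comparison.
mask = (filtered["Rating"] >= 6) & filtered["Genre"].str.contains("Action", case=False)
filtered = filtered[mask].reset_index(drop=True)
```

Because unparsed ratings become `NaN` and compare as false, a separate `dropna` step is not needed in this variant.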


In [319]:
raw_data = newfiltered3["Run time"]

for counter, runtime in enumerate(raw_data):  # named "runtime" so the imported time module is not shadowed
    hours, minutes = runtime.split('h ')
    minutes=minutes.replace("m","")
    total_minutes = int(hours) * 60 + int(minutes)
    raw_data[counter]=total_minutes

raw_data

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  raw_data[counter]=total_minutes


0    157
1    133
2    142
3    146
4    134
5    155
6    134
Name: Run time, dtype: object
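Splitting on `'h '` assumes every runtime has both an hour and a minute part, yet the scraped list also contains a bare `'2h'`, and missing runtimes were stored as `'N/A'`. A more tolerant conversion could look like the sketch below; `runtime_to_minutes` is a hypothetical helper, and returning `None` for unparseable values is a design assumption.

```python
import re

# Matches strings such as "2h 13m", "2h", or "45m".
_RUNTIME = re.compile(r"^\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*$")


def runtime_to_minutes(text):
    """Convert an IMDb-style runtime string to minutes; return None if it cannot be parsed."""
    match = _RUNTIME.match(text)
    if not match or not any(match.groups()):
        return None
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)
```

Applied to a whole column, `newfiltered3["Run time"].map(runtime_to_minutes)` would return a new Series instead of writing into the existing one element by element.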

In [320]:
newfiltered3

Unnamed: 0,Movie Name,Year,Run time,Age restriction,Rating,Genre,Writer,Director,Trivia
0,The Hunger Games: The Ballad of Songbirds & Sn...,2023,157,PG-13,7.2,"Action, Adventure, Drama",Michael LesslieMichael ArndtSuzanne Collins,Francis Lawrence,"According to director Francis Lawrence, he was..."
1,The Creator,2023,133,PG-13,7.0,"Action, Adventure, Drama",Gareth EdwardsChris Weitz,Gareth Edwards,Gareth Edwards tried to make this film as trad...
2,The Hunger Games,2012,142,PG-13,7.2,"Action, Adventure, Sci-Fi",Gary RossSuzanne CollinsBilly Ray,Gary Ross,There was a swear jar on the set. Co-writer an...
3,The Hunger Games: Catching Fire,2013,146,PG-13,7.5,"Action, Adventure, Sci-Fi",Simon BeaufoyMichael ArndtSuzanne Collins,Francis Lawrence,(At around one h 4 mins) When Katniss is in th...
4,Gran Turismo,2023,134,PG-13,7.2,"Action, Adventure, Drama",Jason HallZach BaylinAlex Tse,Neill Blomkamp,Jann Mardenborough plays a stunt double in the...
5,Dune,2021,155,PG-13,8.0,"Action, Adventure, Drama",Jon SpaihtsDenis VilleneuveEric Roth,Denis Villeneuve,Denis Villeneuve confirmed in a Vanity Fair ar...
6,Dungeons & Dragons: Honor Among Thieves,2023,134,PG-13,7.3,"Action, Adventure, Comedy",Jonathan GoldsteinJohn Francis DaleyMichael Gilio,John Francis DaleyJonathan Goldstein,Despite not being involved in the production o...


# The best movies according to user's preference

In [321]:
print(inputmaxruntime)
newdataframe=newfiltered3.copy()
newdataframe=newdataframe[newdataframe["Run time"]<=inputmaxruntime]
newdataframe

300


Unnamed: 0,Movie Name,Year,Run time,Age restriction,Rating,Genre,Writer,Director,Trivia
0,The Hunger Games: The Ballad of Songbirds & Sn...,2023,157,PG-13,7.2,"Action, Adventure, Drama",Michael LesslieMichael ArndtSuzanne Collins,Francis Lawrence,"According to director Francis Lawrence, he was..."
1,The Creator,2023,133,PG-13,7.0,"Action, Adventure, Drama",Gareth EdwardsChris Weitz,Gareth Edwards,Gareth Edwards tried to make this film as trad...
2,The Hunger Games,2012,142,PG-13,7.2,"Action, Adventure, Sci-Fi",Gary RossSuzanne CollinsBilly Ray,Gary Ross,There was a swear jar on the set. Co-writer an...
3,The Hunger Games: Catching Fire,2013,146,PG-13,7.5,"Action, Adventure, Sci-Fi",Simon BeaufoyMichael ArndtSuzanne Collins,Francis Lawrence,(At around one h 4 mins) When Katniss is in th...
4,Gran Turismo,2023,134,PG-13,7.2,"Action, Adventure, Drama",Jason HallZach BaylinAlex Tse,Neill Blomkamp,Jann Mardenborough plays a stunt double in the...
5,Dune,2021,155,PG-13,8.0,"Action, Adventure, Drama",Jon SpaihtsDenis VilleneuveEric Roth,Denis Villeneuve,Denis Villeneuve confirmed in a Vanity Fair ar...
6,Dungeons & Dragons: Honor Among Thieves,2023,134,PG-13,7.3,"Action, Adventure, Comedy",Jonathan GoldsteinJohn Francis DaleyMichael Gilio,John Francis DaleyJonathan Goldstein,Despite not being involved in the production o...
