# Web Scraping Project:
**In this project we will `scrape` the `top rated movies` from a `specific page` of the `IMDb website`**
___
* `Website page Link`: [Click here](https://www.imdb.com/chart/top/)
* `Date`: 03/06/2023
* `Author`: Malik Hasnain Ali
* `Helper YouTube channel name`: techTFQ
* `Helper YouTube video link`: [Click here](https://www.youtube.com/watch?v=LCVSmkyB4v8)

**`Tools Used:`**
- BeautifulSoup
- Requests
- openpyxl
- HTML
- Pandas
- Numpy

> **`Importing Libraries:`**
___


In [1]:
from bs4 import BeautifulSoup
import requests

In [21]:
!pip install openpyxl






In [22]:
import openpyxl

> **`Creating the Excel workbook with openpyxl:`**
___


In [26]:
excel = openpyxl.Workbook()
print(excel.sheetnames)
sheet = excel.active
# Give the active worksheet a descriptive title
sheet.title = 'IMDB Top 250 Movies'
print(excel.sheetnames)
# Add the column headers as the first row of the sheet
sheet.append(['Movie Rank','Movie Name','Year of Release','IMDB Rating'])

['Sheet']
['IMDB Top 250 Movies']
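
The cell above relies on `append()` writing each list as the next free row. As a minimal sketch of that behaviour (the variable names `workbook` and `rows` are illustrative and not part of the notebook), the workbook can be built in memory and read back without saving a file:

```python
import openpyxl

# Build a workbook in memory; nothing is written to disk here.
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.title = 'IMDB Top 250 Movies'

# The first append() call fills row 1, which acts as the header row.
sheet.append(['Movie Rank', 'Movie Name', 'Year of Release', 'IMDB Rating'])
# Each later append() adds one row below the last used row.
sheet.append([1, 'The Shawshank Redemption', 1994, 9.2])

# Read every row back as a plain tuple of cell values.
rows = list(sheet.iter_rows(values_only=True))
print(rows)
```

Reading the rows back this way is a quick check that the headers and data landed in the expected columns before the scraper fills the sheet.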


> **`Scraping the data:`**
___


In [27]:
try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()  # Raises an HTTPError if the server returns an error status (e.g. the link is wrong)

    soup = BeautifulSoup(source.text, 'html.parser')
    # First locate the table body that holds the whole list (its tag and class were found by inspecting the page)
    # movies = soup.find('tbody', class_='lister-list')
    # Then collect every row (<tr>) inside that body; each row is one movie
    movies = soup.find('tbody', class_='lister-list').find_all('tr')

    # print(movies)
    # print(len(movies))  # Check that the number of rows matches the number of movies on the page

    for movie in movies:
        # name = movie.find('td', class_='titleColumn')  # Run this first to inspect the cell, then comment it out and run the next line
        name = movie.find('td', class_='titleColumn').a.text  # .a selects the first <a> tag in the cell; .text returns its text
        rank = movie.find('td', class_='titleColumn').get_text(strip=True).split('.')[0]
        year = movie.find('td', class_='titleColumn').span.text.strip('()')
        rating = movie.find('td', class_='ratingColumn imdbRating').strong.text

        # Print each record to check the extraction
        print(rank, name, year, rating)
        # break  # Used while testing to extract only the first movie; disabled now so every row is processed
        # Append the extracted values as a new row under the column headers
        sheet.append([rank, name, year, rating])

except Exception as e:
    print(e)

# Save the data to an .xlsx file
excel.save('IMDB Movie Ratings.xlsx')

1 The Shawshank Redemption 1994 9.2
2 The Godfather 1972 9.2
3 The Dark Knight 2008 9.0
4 The Godfather Part II 1974 9.0
5 12 Angry Men 1957 9.0
6 Schindler's List 1993 8.9
7 The Lord of the Rings: The Return of the King 2003 8.9
8 Pulp Fiction 1994 8.8
9 The Lord of the Rings: The Fellowship of the Ring 2001 8.8
10 Il buono, il brutto, il cattivo 1966 8.8
11 Forrest Gump 1994 8.8
12 Fight Club 1999 8.7
13 The Lord of the Rings: The Two Towers 2002 8.7
14 Inception 2010 8.7
15 Star Wars: Episode V - The Empire Strikes Back 1980 8.7
16 The Matrix 1999 8.7
17 GoodFellas 1990 8.7
18 One Flew Over the Cuckoo's Nest 1975 8.6
19 Se7en 1995 8.6
20 It's a Wonderful Life 1946 8.6
21 Shichinin no samurai 1954 8.6
22 The Silence of the Lambs 1991 8.6
23 Saving Private Ryan 1998 8.6
24 Cidade de Deus 2002 8.6
25 Interstellar 2014 8.6
26 La vita è bella 1997 8.6
27 The Green Mile 1999 8.6
28 Star Wars 1977 8.5
29 Terminator 2: Judgment Day 1991 8.5
30 Back to the Future 1985 8.5
31 Sen to Chihiro n
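
The extraction steps in the cell above can be tried offline, without requesting the live page. The sketch below is only an illustration: `SAMPLE_HTML` is a hand-written fragment that assumes the same tags and class names the selectors above look for, and `parse_rows` is a hypothetical helper, not part of the notebook. It also converts the values to numbers, which the original cell does not do.

```python
from bs4 import BeautifulSoup

# Hand-written sample, assumed to mirror the structure the selectors expect.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn">1. <a href="#">The Shawshank Redemption</a> <span>(1994)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">2. <a href="#">The Godfather</a> <span>(1972)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
    </tr>
  </tbody>
</table>
"""


def parse_rows(html):
    """Return (rank, name, year, rating) tuples from a 'lister-list' table."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = soup.find('tbody', class_='lister-list').find_all('tr')
    results = []
    for row in rows:
        title_cell = row.find('td', class_='titleColumn')
        # The cell text looks like "1.The Shawshank Redemption(1994)";
        # everything before the first dot is the rank.
        rank = title_cell.get_text(strip=True).split('.')[0]
        name = title_cell.a.text
        # The year is wrapped in parentheses, so strip them off.
        year = title_cell.span.text.strip('()')
        rating = row.find('td', class_='ratingColumn imdbRating').strong.text
        results.append((int(rank), name, int(year), float(rating)))
    return results


movies = parse_rows(SAMPLE_HTML)
print(movies)
```

Looking up the title cell once per row and reusing it avoids repeating the same `find()` call three times, and a fixed sample like this makes it easier to notice when the selectors stop matching the real page.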

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

> **`Loading the data into a DataFrame:`**
___


In [2]:
df = pd.read_excel('./IMDB Movie Ratings.xlsx')

In [3]:
df.head()

| Unnamed: 0 | Movie Rank | Movie Name | Year of Release | IMDB Rating |
|---|---|---|---|---|
| 0 | 1 | The Shawshank Redemption | 1994 | 9.2 |
| 1 | 2 | The Godfather | 1972 | 9.2 |
| 2 | 3 | The Dark Knight | 2008 | 9.0 |
| 3 | 4 | The Godfather Part II | 1974 | 9.0 |
| 4 | 5 | 12 Angry Men | 1957 | 9.0 |


> **`Top 10 movies based on IMDb rating:`**
___


In [7]:
# Top 10 movies with the highest IMDb rating
df.sort_values(by='IMDB Rating',ascending=False).head(10)

| Unnamed: 0 | Movie Rank | Movie Name | Year of Release | IMDB Rating |
|---|---|---|---|---|
| 0 | 1 | The Shawshank Redemption | 1994 | 9.2 |
| 1 | 2 | The Godfather | 1972 | 9.2 |
| 2 | 3 | The Dark Knight | 2008 | 9.0 |
| 3 | 4 | The Godfather Part II | 1974 | 9.0 |
| 4 | 5 | 12 Angry Men | 1957 | 9.0 |
| 5 | 6 | Schindler's List | 1993 | 8.9 |
| 6 | 7 | The Lord of the Rings: The Return of the King | 2003 | 8.9 |
| 7 | 8 | Pulp Fiction | 1994 | 8.8 |
| 8 | 9 | The Lord of the Rings: The Fellowship of the Ring | 2001 | 8.8 |
| 9 | 10 | Il buono, il brutto, il cattivo | 1966 | 8.8 |
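
Because the scraper stores whatever `.text` returns, the rank, year and rating columns may reach the spreadsheet as text rather than numbers, and sorting text compares characters instead of values. The sketch below assumes that situation: the small DataFrame is made up from a few of the rows shown above, deliberately out of order. It converts the columns with `pd.to_numeric` and then sorts by rating, using the rank to break ties.

```python
import pandas as pd

# Illustrative rows with every value stored as text, deliberately out of order.
df = pd.DataFrame({
    'Movie Rank': ['3', '1', '6', '2'],
    'Movie Name': ['The Dark Knight', 'The Shawshank Redemption',
                   "Schindler's List", 'The Godfather'],
    'Year of Release': ['2008', '1994', '1993', '1972'],
    'IMDB Rating': ['9.0', '9.2', '8.9', '9.2'],
})

# Convert the text columns to numeric dtypes so comparisons use the values.
for column in ['Movie Rank', 'Year of Release', 'IMDB Rating']:
    df[column] = pd.to_numeric(df[column])

# Highest rating first; movies with equal ratings are ordered by rank.
top = df.sort_values(by=['IMDB Rating', 'Movie Rank'],
                     ascending=[False, True]).head(3)
print(top)
```

Adding the rank as a second sort key keeps the order of equally rated movies deterministic instead of depending on the sorting algorithm.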
