### Web Scraping
This Python program downloads IMDb's top 250 movies from the search listing at https://www.imdb.com/search/title/?groups=top_100&sort=user_rating,desc (the URL is modified below to request the top_250 group on a single page) and loads them into a pandas DataFrame. For each movie, we scrape its title, director name, list of actors, release year, and IMDb rating.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
# Modified URL: request the top_250 group and return all 250 results on a single page
url = r'https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&count=250'
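
The query string above can also be assembled from its parts, which makes the `groups`, `sort`, and `count` parameters easier to change. This is a minimal sketch using only the standard library; the helper name `build_search_url` is hypothetical, and whether the site still honors these parameters is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.imdb.com/search/title/"

def build_search_url(group="top_250", sort="user_rating,desc", count=250):
    """Return the search URL for the given group, sort order, and page size."""
    params = {"groups": group, "sort": sort, "count": count}
    # safe="," keeps the comma in "user_rating,desc" unescaped, matching the URL above
    return BASE_URL + "?" + urlencode(params, safe=",")

url = build_search_url()
```

With the default arguments, the helper should reproduce the hand-written URL used in this notebook.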

In [3]:
html = requests.get(url, timeout=30)
html.raise_for_status()  # stop early if the request did not succeed

In [4]:
soup = BeautifulSoup(html.content, "html.parser")

In [5]:
top_250_html = soup.find_all(name='div', attrs={"class": "lister-item-content"}, limit=250)

In [6]:
headers = ['TITLE', 'RELEASE_YEAR', 'RATING', 'DIRECTOR', 'ACTORS']
# One empty list per column; each movie appends one value to every list
scraped_250_movie_dct = {head: [] for head in headers}

In [7]:
for movie in top_250_html:
    # Each item is already a parsed Tag, so it can be searched directly
    title = movie.h3.a.text
    # The second <span> in the header holds the year, e.g. "(1994)"
    year = movie.h3.find_all('span')[1].text
    rating = movie.find('div', {"class": "inline-block ratings-imdb-rating"}).strong.text
    # The <p> with an empty class lists the director link first, then the actor links
    cast = movie.find('p', {"class": ""}).find_all(name='a')
    # Assumes a single director; a second director would be counted as an actor
    director = cast[0].text
    actors = ",".join(a.text for a in cast[1:])

    scraped_250_movie_dct['TITLE'].append(title)
    scraped_250_movie_dct['RELEASE_YEAR'].append(year)
    scraped_250_movie_dct['RATING'].append(rating)
    scraped_250_movie_dct['DIRECTOR'].append(director)
    scraped_250_movie_dct['ACTORS'].append(actors)
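
The loop above treats the first link in the cast paragraph as the only director, so a film with two directors would have the second one listed among the actors. One possible way around this is to split the links at the separator between the two groups. The sketch below runs on hypothetical markup that imitates one list item; the `<span class="ghost">` separator, the sample names, and the helper `split_cast` are all assumptions, not confirmed details of the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating one "lister-item-content" block
SAMPLE = """
<div class="lister-item-content">
  <h3><span>1.</span><a href="/title/tt0000001/">Example Movie</a><span>(1999)</span></h3>
  <p class="text-muted">120 min</p>
  <p class="">
    Directors:
    <a href="/name/nm1/">Director One</a>,
    <a href="/name/nm2/">Director Two</a>
    <span class="ghost">|</span>
    Stars:
    <a href="/name/nm3/">Actor One</a>,
    <a href="/name/nm4/">Actor Two</a>
  </p>
</div>
"""

def split_cast(cast_paragraph):
    """Return (directors, actors) by splitting the links at the separator span."""
    separator = cast_paragraph.find("span", class_="ghost")
    if separator is None:
        # No separator found: nothing can be split reliably
        return [], []
    # find_previous_siblings yields the closest sibling first, so reverse it
    directors = [a.text for a in separator.find_previous_siblings("a")][::-1]
    actors = [a.text for a in separator.find_next_siblings("a")]
    return directors, actors

movie = BeautifulSoup(SAMPLE, "html.parser")
# Pick the paragraph that mentions the director instead of relying on its class
cast_paragraph = next(p for p in movie.find_all("p") if "Director" in p.get_text())
directors, actors = split_cast(cast_paragraph)
```

Joining `directors` with a comma would keep the `DIRECTOR` column a single string, consistent with how `ACTORS` is stored.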

In [8]:
scraped_250_movie_df = pd.DataFrame(scraped_250_movie_dct)
scraped_250_movie_df.head(1)

| | TITLE | RELEASE_YEAR | RATING | DIRECTOR | ACTORS |
|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | (1994) | 9.3 | Frank Darabont | Tim Robbins,Morgan Freeman,Bob Gunton,William ... |


In [9]:
# Clean the year: keep only the text inside the last pair of parentheses
scraped_250_movie_df['RELEASE_YEAR'] = scraped_250_movie_df['RELEASE_YEAR'].apply(lambda x : x.split('(')[-1].split(')')[0])
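
The split-based cleanup keeps whatever sits inside the last pair of parentheses. A regular expression that looks for four digits is a slightly stricter alternative. The sketch below assumes the raw value may carry a disambiguation prefix such as "(I) (2019)"; the helper name `extract_year` is hypothetical.

```python
import re

# Four digits enclosed in parentheses, e.g. "(1994)"
YEAR_PATTERN = re.compile(r"\((\d{4})\)")

def extract_year(raw):
    """Return the four-digit year found in parentheses, or None if absent."""
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None

years = [extract_year(s) for s in ["(1994)", "(I) (2019)", ""]]
```

It could be passed to `.apply` on the raw `RELEASE_YEAR` column in place of the lambda above, which would also yield integers rather than strings.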

In [10]:
scraped_250_movie_df.head(5)

| | TITLE | RELEASE_YEAR | RATING | DIRECTOR | ACTORS |
|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | 9.3 | Frank Darabont | Tim Robbins,Morgan Freeman,Bob Gunton,William ... |
| 1 | The Godfather | 1972 | 9.2 | Francis Ford Coppola | Marlon Brando,Al Pacino,James Caan,Diane Keaton |
| 2 | The Dark Knight | 2008 | 9.0 | Christopher Nolan | Christian Bale,Heath Ledger,Aaron Eckhart,Mich... |
| 3 | Schindler's List | 1993 | 9.0 | Steven Spielberg | Liam Neeson,Ralph Fiennes,Ben Kingsley,Carolin... |
| 4 | The Lord of the Rings: The Return of the King | 2003 | 9.0 | Peter Jackson | Elijah Wood,Viggo Mortensen,Ian McKellen,Orlan... |
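
Every scraped value is a string, so `RELEASE_YEAR` and `RATING` would sort and compare as text. A possible follow-up step is to convert them to numeric types. The sketch below uses a two-row stand-in frame built from the rows shown above, not the full scraped DataFrame.

```python
import pandas as pd

# Stand-in frame with the same string-typed columns as the scraped data
df = pd.DataFrame({
    "TITLE": ["The Shawshank Redemption", "The Godfather"],
    "RELEASE_YEAR": ["1994", "1972"],
    "RATING": ["9.3", "9.2"],
})

# errors="coerce" turns unparseable values into missing values instead of raising
df["RELEASE_YEAR"] = pd.to_numeric(df["RELEASE_YEAR"], errors="coerce").astype("Int64")
df["RATING"] = pd.to_numeric(df["RATING"], errors="coerce")

# Numeric columns now sort by value rather than alphabetically
newest_first = df.sort_values("RELEASE_YEAR", ascending=False)
```

The nullable `Int64` dtype is used here so that a year that fails to parse becomes a missing value rather than forcing the whole column to floats.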
