# IMDb's Top 250 Movies (All Languages)

**Link**: https://www.imdb.com/chart/top/

**IMDb (Internet Movie Database)** is an online database of information related to films, television programs, home videos, video games, and streaming content online – including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews. An additional fan feature, message boards, was abandoned in February 2017. Originally a fan-operated website, the database is owned and operated by IMDb.com, Inc., a subsidiary of Amazon.

### Scraping the website

The page lists the 250 top-rated movies of all time and is updated regularly. For each movie, the list shows the title, the release year, and the IMDb rating it has achieved.

IMDb takes all the individual votes cast by registered users and uses them to calculate a single rating. It does not use the arithmetic mean or the median of the votes, however; the rating displayed on a movie's page is a weighted average.
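To make the difference between an arithmetic mean and a weighted average concrete, here is a minimal sketch. IMDb does not disclose how it weights individual votes, so the votes, the weights, and the helper `weighted_average` below are all hypothetical and only illustrate the general idea.

```python
from statistics import mean


def weighted_average(votes, weights):
    """Return sum(w * v) / sum(w) for paired votes and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(votes, weights)) / total_weight


# Hypothetical votes on a 1-10 scale and hypothetical per-vote weights.
# These are NOT IMDb's real weights, which are not published.
votes = [10, 10, 9, 2]
weights = [1.0, 1.0, 1.0, 0.5]

plain = mean(votes)                          # every vote counts equally
weighted = weighted_average(votes, weights)  # the last vote counts half

print(f"arithmetic mean:  {plain:.2f}")
print(f"weighted average: {weighted:.2f}")
```

Giving the outlying vote a smaller weight pulls the result closer to the other votes, which is why a weighted average can differ from the plain mean of the same votes.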

In [1]:
from bs4 import BeautifulSoup
import requests

In [2]:
# Download the chart page; note that `url` holds the Response object, not the address.
url = requests.get('https://www.imdb.com/chart/top/')

In [3]:
soup = BeautifulSoup(url.text, 'lxml')

In [4]:
print(soup.prettify())

<!DOCTYPE html>
<html xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="app-id=342792525, app-argument=imdb:///?src=mdot" name="apple-itunes-app"/>
  <style>
   body#styleguide-v2 {
                    background: no-repeat fixed center top #000;
                }
  </style>
  <script type="text/javascript">
   var IMDbTimer={starttime: new Date().getTime(),pt:'java'};
  </script>
  <script>
   if (typeof uet == 'function') {
      uet("bb", "LoadTitle", {wb: 1});
    }
  </script>
  <script>
   (function(t){ (t.events = t.events || {})["csm_head_pre_title"] = new Date().getTime(); })(IMDbTimer);
  </script>
  <title>
   IMDb Top 250 - IMDb
  </title>
  <script>
   (function(t){ (t.events = t.events || {})["csm_head_post_title"] = new Date().getTime(); })(IMDbTimer);
  </script>
  <script>
   if (typeof uet == 'function') {
      uet("be", "LoadTitle", {w

In [5]:
table = soup.find('table', class_='chart full-width')
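The row-by-row extraction used later in this notebook can be tried offline first. The sketch below runs the same kind of lookups on a small hand-written table; the markup, the movie, and the `sample_` names are hypothetical and only mirror the structure the notebook assumes the chart page has. It uses the built-in `html.parser` so that it does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the chart table this notebook expects.
sample_html = """
<table class="chart full-width">
  <tr><th></th><th>Rank &amp; Title</th><th>IMDb Rating</th></tr>
  <tr>
    <td class="posterColumn"></td>
    <td class="titleColumn">1. <a href="/title/tt0000001/">Example Movie</a>
      <span class="secondaryInfo">(1999)</span></td>
    <td class="ratingColumn">
      <strong title="9.0 based on 1,234 user ratings">9.0</strong></td>
  </tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# A class_ string containing a space matches the exact class attribute value.
sample_table = sample_soup.find('table', class_='chart full-width')

# Skip the header row, then read the cells of the first movie row.
first_row = sample_table.find_all('tr')[1:][0]
cells = first_row.find_all('td')

sample_title = cells[1].a.text
sample_year = cells[1].span.text
sample_link = 'https://www.imdb.com' + cells[1].a['href']
sample_rating = cells[2].strong.text

print(sample_title, sample_year, sample_rating, sample_link)
```

If `find` returns `None` on the live page, the page layout has probably changed and the class name needs to be checked against the current HTML.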

### Storing the required info

The data is stored in a CSV file with 7 columns:

1. 'Movie title': The title of the movie, extracted from the main page (the Top 250 chart).
2. 'Release year': The year of release of the movie, extracted from the main page (the Top 250 chart).
3. 'Ratings': The IMDb rating received by the movie.
4. 'User count': The total number of registered users who voted for the movie.
5. 'Director': The director of the movie, extracted from the movie's own IMDb page.
6. 'Language': The languages in which the movie was made, extracted from the movie's own IMDb page.
7. 'IMDB link': The official IMDb link of the movie.
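As a small illustration of this layout, the sketch below writes the header and one row to an in-memory buffer and reads it back. The row values are made up for the example, and `io.StringIO` stands in for the real output file so nothing is written to disk.

```python
import csv
import io

columns = ['Movie title', 'Release year', 'Ratings', 'User count',
           'Director', 'Language', 'IMDB link']

# Hypothetical row; none of these values come from IMDb.
example_row = ['Example Movie', '(1999)', '9.0', '1,234',
               'Jane Doe', 'English', 'https://www.imdb.com/title/tt0000001/']

# Write the header and the row into an in-memory text buffer.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerow(example_row)

# Read the data back, keyed by column name.
buffer.seek(0)
rows = list(csv.DictReader(buffer))

print(rows[0]['Movie title'], rows[0]['User count'])
```

The user count contains a comma, so the `csv` module quotes that field when writing and restores it as a single value when reading.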


In [41]:
import csv

# Column headers for the output file.
header = ['Movie title', 'Release year', 'Ratings', 'User count', 'Director', 'Language', 'IMDB link']

# newline='' stops the csv module from writing blank lines between rows on Windows;
# the with block closes the file even if a request or lookup fails.
with open('IMDBTop250.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(header)

    # Skip the header row of the chart table.
    for movie in table.find_all('tr')[1:]:
        cells = movie.find_all('td')
        movie_title = cells[1].a.text
        movie_year = cells[1].span.text
        movie_link = 'https://www.imdb.com' + cells[1].a['href']
        rating = cells[2].strong.text
        # The vote count is the fourth space-separated token of the title attribute.
        user_count = cells[2].strong['title'].split(' ')[3]

        # Fetch the movie's own page for the director and the languages.
        movie_url = requests.get(movie_link)
        movie_soup = BeautifulSoup(movie_url.text, 'lxml')
        director_name = movie_soup.find('div', class_='credit_summary_item').a.text
        details = movie_soup.find('div', {'id': 'titleDetails'})
        language = ' '.join(lang.text for lang in details.find_all('div', class_='txt-block')[2].find_all('a'))

        csv_writer.writerow([movie_title, movie_year, rating, user_count, director_name, language, movie_link])
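The loop above stores the release year and the user count as raw strings. If numeric values are wanted, they can be cleaned before writing. The sketch below assumes the rating tooltip reads like "9.2 based on 2,345,678 user ratings" and that the year arrives wrapped in parentheses; both formats are inferred from the code above rather than confirmed, and the helper names are hypothetical.

```python
def parse_user_count(title_attr):
    """Return the vote count from text like '9.2 based on 2,345,678 user ratings'."""
    token = title_attr.split(' ')[3]        # same token the scraping loop picks
    return int(token.replace(',', ''))      # drop thousands separators


def parse_year(year_text):
    """Return the year from text like '(1994)' as an integer."""
    return int(year_text.strip().strip('()'))


# Hypothetical inputs in the assumed formats.
print(parse_user_count('9.2 based on 2,345,678 user ratings'))
print(parse_year('(1994)'))
```

Both helpers raise an exception on input in any other format, which makes a layout change on the site visible immediately instead of silently writing bad values.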