# Scraping Project

# IMDB Top Rated TV shows 

This project consists in Scrapping an IMDB Top Rated TV shows list, that contains the best 250 TV shows.

The database includes the following variables: 

- Rank
- TV show
- IMDB Rating
- URL of the TV show in IMDB
- Year
- Star Cast

## 1. Importing the webpage

In [1]:
import requests
page = requests.get('https://www.imdb.com/chart/toptv/?ref_=nv_tvv_250')

In [2]:
print(page.text[0:500])




<!DOCTYPE html>
<html
    xmlns:og="http://ogp.me/ns#"
    xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
         
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">

    
    
    

    
    
    

    <meta name="apple-itunes-app" content="app-id=342792525, app-argument=imdb:///?src=mdot">
            <style>
                body#styleguide-v2 {
                    background: no-repeat fixed center top #000;
                }
           


## 2. Parsing HTML with BeautifulSoup

In [3]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.text, 'html.parser')

## 3. Collecting the records

In [4]:
import re

In [5]:
soup.title.string

'IMDb Top 250 TV - IMDb'

In [6]:
TVshows = soup.select('td.titleColumn')

In [7]:
rank = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=rk]')]

In [8]:
urls = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]

In [9]:
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]

In [10]:
ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=ir]')]

In [11]:
imdb_list_tv = []

for i in range(0, len(TVshows)):
    TV_string = TVshows[i].get_text()
    TV = (' '.join(TV_string.split()).replace('.', ''))
    TV_title = TV[len(str(i))+1:-7]
    year = re.search('\((.*?)\)', TV_string).group(1)
    place = TV[:len(str(i))-(len(TV))]
    data = {"TV Show": TV_title,
            "Year": year,
            "Rank": rank[i],
            "Star Cast": crew[i],
            "Rating": ratings[i],
            "url": 'www.imdb.com' + urls[i]}
    imdb_list_tv.append(data)

## 4. Constructing the Data Frame

In [12]:
import pandas as pd
dftv = pd.DataFrame(imdb_list_tv, columns=['Rank', 'TV Show', 'Year', 'Star Cast', 'Rating', 'url'])

In [13]:
dftv.head(15)

Unnamed: 0,Rank,TV Show,Year,Star Cast,Rating,url
0,1,Planeta Tierra II,2016,David Attenborough,9.498023497499815,www.imdb.com/title/tt5491994/?pf_rd_m=A2FGELUU...
1,2,Hermanos de sangre,2001,"Scott Grimes, Damian Lewis",9.455660989345809,www.imdb.com/title/tt0185906/?pf_rd_m=A2FGELUU...
2,3,Juego de tronos,2011,"Emilia Clarke, Peter Dinklage",9.444816116815035,www.imdb.com/title/tt0944947/?pf_rd_m=A2FGELUU...
3,4,Planeta Tierra,2006,"David Attenborough, Sigourney Weaver",9.443685136356558,www.imdb.com/title/tt0795176/?pf_rd_m=A2FGELUU...
4,5,Breaking Bad,2008,"Bryan Cranston, Aaron Paul",9.410885777486014,www.imdb.com/title/tt0903747/?pf_rd_m=A2FGELUU...
5,6,The Wire (Bajo escucha),Bajo escucha,"Dominic West, Lance Reddick",9.306207737185034,www.imdb.com/title/tt0306414/?pf_rd_m=A2FGELUU...
6,7,Cosmos: Una odisea en el espacio-tiempo,2014,"Neil deGrasse Tyson, Stoney Emshwiller",9.252967388132207,www.imdb.com/title/tt2395695/?pf_rd_m=A2FGELUU...
7,8,Rick y Morty,2013,"Justin Roiland, Chris Parnell",9.231547136982352,www.imdb.com/title/tt2861424/?pf_rd_m=A2FGELUU...
8,9,Cosmos,1980,"Carl Sagan, Jaromír Hanzlík",9.224728275671469,www.imdb.com/title/tt0081846/?pf_rd_m=A2FGELUU...
9,10,Planeta azul II,2017,"David Attenborough, Peter Drost",9.211800559044011,www.imdb.com/title/tt6769208/?pf_rd_m=A2FGELUU...


## 5. Putting the information in a CSV

In [14]:
dftv.to_csv('IMDB_250_TV.csv', index=False, encoding='utf-8')

In [None]:
import requests
page = requests.get('https://www.imdb.com/chart/toptv/?ref_=nv_tvv_250')