 ## <center>TOP ACTION MOVIES</center>

Using the well-known Rotten Tomatoes website, we scrape its list of 140 essential action movies and their details, including each movie's Rotten Tomatoes score.

The scraping is done with the Python packages 'requests' and 'BeautifulSoup'.

### Step-1 Setup

In this step, we set up our environment for scraping the website.

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

Starting from the Rotten Tomatoes guide to essential action movies, we use the '.get' method of the 'requests' package to download the HTML of the page.

In [3]:
base_url='http://editorial.rottentomatoes.com/guide/140-essential-action-movies-to-watch-now/2/'
r = requests.get(base_url)
html=r.content
html

b'<!DOCTYPE html>\n<html lang="en-US">\n    <head prefix="og: http://ogp.me/ns# flixstertomatoes: http://ogp.me/ns/apps/flixstertomatoes#">\n    <meta http-equiv="content-type" content="text/html; charset=UTF-8" />\n        <meta property=\'og.description\' content="From John Wick and Die Hard to Mad Max and Atomic Blonde, these best action movies ever will thrill you and get the adrenaline pumping!" />\n    <meta name=\'description\' content="From John Wick and Die Hard to Mad Max and Atomic Blonde, these best action movies ever will thrill you and get the adrenaline pumping!" />\n    <meta property=\'og:title\' content="140 Essential Action Movies To Watch Now" />\n    <meta property=\'og:type\' content="article" />\n    <meta property=\'og:image\' content="https://s3-us-west-2.amazonaws.com/flx-editorial-wordpress/wp-content/uploads/2019/06/06180025/RT_140_ESSENTIAL_ACTION_600x314.jpg" />\n    <meta property=\'og:url\' content="https://editorial.rottentomatoes.com/guide/140-essentia
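Before parsing, it can help to confirm that the request actually succeeded. The following is a minimal sketch and not part of the original notebook: the helper name `fetch_html` and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, failing loudly on HTTP errors."""
    # The timeout (in seconds) is an assumed value; without one a stalled
    # server can keep the call waiting indefinitely.
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.content
```

With this helper, the cell above could read `html = fetch_html(base_url)`, so a blocked or missing page shows up as an exception instead of as an error page being parsed silently.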

The raw output above is hard to read, so we use the BeautifulSoup module to parse the HTML.

BeautifulSoup supports several parsers (Python's built-in 'html.parser', 'lxml' and 'html5lib'), and its documentation recommends 'lxml' for speed.

In [4]:
soup = BeautifulSoup(html, "lxml")
soup

<!DOCTYPE html>
<html lang="en-US">
<head prefix="og: http://ogp.me/ns# flixstertomatoes: http://ogp.me/ns/apps/flixstertomatoes#">
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="From John Wick and Die Hard to Mad Max and Atomic Blonde, these best action movies ever will thrill you and get the adrenaline pumping!" property="og.description"/>
<meta content="From John Wick and Die Hard to Mad Max and Atomic Blonde, these best action movies ever will thrill you and get the adrenaline pumping!" name="description"/>
<meta content="140 Essential Action Movies To Watch Now" property="og:title"/>
<meta content="article" property="og:type"/>
<meta content="https://s3-us-west-2.amazonaws.com/flx-editorial-wordpress/wp-content/uploads/2019/06/06180025/RT_140_ESSENTIAL_ACTION_600x314.jpg" property="og:image"/>
<meta content="https://editorial.rottentomatoes.com/guide/140-essential-action-movies-to-watch-now/" property="og:url"/>
<meta content="175594" name="edi

From the above output, we can see that the parsed HTML is much better organized, which makes it easier to locate the classes and tags we need.

To cross-check, we can write the parsed HTML to a file on our local machine and confirm that it was parsed correctly.
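The cross-check could look roughly like the sketch below. It uses a tiny stand-in document rather than the downloaded page, writes to the system temp directory, and the file name `parsed_page.html` is a hypothetical choice; 'html.parser' is used only so the sketch runs without lxml installed.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Tiny stand-in document; in the notebook this would be the downloaded page.
sample_html = "<html><body><h2><a href='/m/hero/'>Hero</a></h2></body></html>"
sample_soup = BeautifulSoup(sample_html, "html.parser")

# prettify() returns the parse tree as an indented string.
out_path = Path(tempfile.gettempdir()) / "parsed_page.html"  # hypothetical name
out_path.write_text(sample_soup.prettify(), encoding="utf-8")

# Reading the file back confirms the content survived the round trip.
saved = out_path.read_text(encoding="utf-8")
print(saved)
```

Opening the saved file in a browser or editor then makes it easy to compare against what the live page shows.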

Now that the HTML is parsed, we need to find out which part of it holds the required information.
We need the **title, release year, director, Rotten Tomatoes score, adjusted score, synopsis, critics consensus and cast of each film**.


To find out which part of the HTML holds this data, open the page in a browser (I used Chrome), right-click the title and select 'Inspect' from the menu, which shows the page's HTML in the developer tools panel.

Upon inspection, we find that all the related data sits under **<div class='col-sm-18 col-full-xs countdown-item-content'>**.

Using BeautifulSoup's find_all method, we retrieve every element with that tag and class name from the HTML.

In [5]:
divs=soup.find_all('div',class_='col-sm-18 col-full-xs countdown-item-content')
divs

[<div class="col-sm-18 col-full-xs countdown-item-content">
 <div class="row countdown-item-title-bar">
 <div class="col-sm-20 col-full-xs" style="height: 100%;">
 <div class="article_movie_title" style="float: left;">
 <div><h2><a href="https://www.rottentomatoes.com/m/1018009-running_scared/">Running Scared</a> <span class="subtle start-year">(1986)</span> <span class="icon tiny fresh" title="Fresh"></span> <span class="tMeterScore">60%</span></h2></div>
 </div>
 </div>
 <div class="col-sm-4 col-full-xs" style="height: 100%;">
 <div class="countdown-index">#140</div>
 </div>
 </div>
 <div class="row countdown-item-details">
 <div class="col-sm-24">
 <div class="info countdown-adjusted-score"><span class="descriptor">Adjusted Score: </span>61.182% <span class="glyphicon glyphicon-question-sign" data-html="true" data-original-title="The Adjusted Score comes from a weighted formula (Bayesian) that we use that accounts for variation in the number of reviews per movie." data-placement="to
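One detail worth knowing about the lookup above: when class_ is given a string that contains spaces, BeautifulSoup compares it with the exact text of the class attribute, so the same classes written in a different order would not match. The sketch below contrasts that with a CSS selector on the single class 'countdown-item-content'. The two-div HTML is made up for illustration and is not taken from the real page, and 'html.parser' is used only so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same three classes, written in two different orders.
sample_html = """
<div class="col-sm-18 col-full-xs countdown-item-content"><h2>A</h2></div>
<div class="countdown-item-content col-sm-18 col-full-xs"><h2>B</h2></div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Exact-string match: only a div whose class attribute is written in this order.
exact = sample_soup.find_all(
    "div", class_="col-sm-18 col-full-xs countdown-item-content"
)

# CSS selector on one distinctive class: order and extra classes do not matter.
by_selector = sample_soup.select("div.countdown-item-content")

print(len(exact), len(by_selector))
```

If the site ever reorders or adds layout classes, the selector form should keep working, while the exact-string form would return an empty list.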

### Step-2

In this step, we filter and extract the required content from the HTML we parsed.

Once we have retrieved the related content, we need to find where the movie details we need are located.

Upon inspection, we see that the **title, year and score** all sit inside the **<h2>** element of each block.

Using a list comprehension and the find method, we pull the **<h2>** element out of every div.

In [12]:
headings=[div.find('h2') for div in divs]
type(headings)

list

We see that the result is a list, so we take its first 4 elements using slicing and study their contents.


In [13]:
headings[0:4]

[<h2><a href="https://www.rottentomatoes.com/m/1018009-running_scared/">Running Scared</a> <span class="subtle start-year">(1986)</span> <span class="icon tiny fresh" title="Fresh"></span> <span class="tMeterScore">60%</span></h2>,
 <h2><a href="https://www.rottentomatoes.com/m/equilibrium/">Equilibrium</a> <span class="subtle start-year">(2002)</span> <span class="icon tiny rotten" title="Rotten"></span> <span class="tMeterScore">40%</span></h2>,
 <h2><a href="https://www.rottentomatoes.com/m/hero/">Hero</a> <span class="subtle start-year">(2004)</span> <span class="icon tiny certified" title="Certified Fresh"></span> <span class="tMeterScore">95%</span></h2>,
 <h2><a href="https://www.rottentomatoes.com/m/1017666-road_house/">Road House</a> <span class="subtle start-year">(1989)</span> <span class="icon tiny rotten" title="Rotten"></span> <span class="tMeterScore">39%</span></h2>]

Once we have all the **h2** elements from the HTML, we observe that

**title** is in the **<a>** tag,

**year** is in a **<span>** tag with **class='subtle start-year'**,

**score** is in a **<span>** tag with **class='tMeterScore'**.

Using list comprehensions, we extract the desired content.

In [14]:
movie_names=[heading.find('a').string for heading in headings]
movie_names

['Running Scared',
 'Equilibrium',
 'Hero',
 'Road House',
 'Unstoppable',
 'Shaft',
 'The Villainess (Ak-Nyeo)',
 'Highlander',
 'Die Hard 2',
 'National Treasure',
 'The Protector (Tom yum goong) (Warrior King)',
 'Revenge',
 'El Mariachi',
 'A Touch of Zen',
 'Top Gun',
 'Con Air',
 'The Expendables 2',
 'The Mummy',
 'Mr. & Mrs. Smith',
 'Rush Hour',
 'The Equalizer',
 'Captain America: Civil War',
 'Air Force One',
 'Bloodsport',
 'Blade',
 'Bad Boys',
 'Die Hard: With a Vengeance',
 'The Running Man',
 'Code of Silence',
 "Shoot 'Em Up",
 'Crank',
 'Machete',
 'Drive',
 'Batman',
 'Under Siege',
 'Independence Day',
 'Bullitt',
 'Wanted',
 'Superman',
 'Ronin',
 'They Live',
 'Cliffhanger',
 "Marvel's The Avengers",
 'Hot Fuzz',
 'The Warriors',
 'Starship Troopers',
 'Elite Squad: The Enemy Within',
 'Point Break',
 'The Long Kiss Goodnight',
 'The Guest',
 'Taken',
 '300',
 'True Lies',
 'Demolition Man',
 'Hardcore Henry',
 'Police Story (Ging chaat goo si) (Police Force)',
 '

In [15]:
movie_years=[year.find('span', class_='subtle start-year').string for year in headings]
movie_years

['(1986)',
 '(2002)',
 '(2004)',
 '(1989)',
 '(2010)',
 '(1971)',
 '(2017)',
 '(1986)',
 '(1990)',
 '(2004)',
 '(2005)',
 '(2018)',
 '(1993)',
 '(1969)',
 '(1986)',
 '(1997)',
 '(2012)',
 '(1999)',
 '(2005)',
 '(1998)',
 '(2014)',
 '(2016)',
 '(1997)',
 '(1988)',
 '(1998)',
 '(1995)',
 '(1995)',
 '(1987)',
 '(1985)',
 '(2007)',
 '(2006)',
 '(2010)',
 '(2011)',
 '(1989)',
 '(1992)',
 '(1996)',
 '(1968)',
 '(2008)',
 '(1978)',
 '(1998)',
 '(1988)',
 '(1993)',
 '(2012)',
 '(2007)',
 '(1979)',
 '(1997)',
 '(2011)',
 '(1991)',
 '(1996)',
 '(2014)',
 '(2009)',
 '(2007)',
 '(1994)',
 '(1993)',
 '(2016)',
 '(1985)',
 '(2001)',
 '(2015)',
 '(1997)',
 '(1986)',
 '(2017)',
 '(1995)',
 '(2006)',
 '(1984)',
 '(2005)',
 '(2004)',
 '(2001)',
 '(1981)',
 '(2000)',
 '(2004)',
 '(2011)',
 '(1992)',
 '(1989)',
 '(2005)',
 '(2010)',
 '(2008)',
 '(2018)',
 '(2017)',
 '(1964)',
 '(1976)',
 '(2017)',
 '(1972)',
 '(2014)',
 '(2005)',
 '(1971)',
 '(2015)',
 '(1990)',
 '(1996)',
 '(1971)',
 '(2014)',
 '(2003)',

We notice that the years are strings wrapped in '(' and ')'. We need to remove the parentheses and convert the values to the int data type.

In [16]:
years=[year.strip('()') for year in movie_years]
years= [int(year) for year in years]
years

[1986,
 2002,
 2004,
 1989,
 2010,
 1971,
 2017,
 1986,
 1990,
 2004,
 2005,
 2018,
 1993,
 1969,
 1986,
 1997,
 2012,
 1999,
 2005,
 1998,
 2014,
 2016,
 1997,
 1988,
 1998,
 1995,
 1995,
 1987,
 1985,
 2007,
 2006,
 2010,
 2011,
 1989,
 1992,
 1996,
 1968,
 2008,
 1978,
 1998,
 1988,
 1993,
 2012,
 2007,
 1979,
 1997,
 2011,
 1991,
 1996,
 2014,
 2009,
 2007,
 1994,
 1993,
 2016,
 1985,
 2001,
 2015,
 1997,
 1986,
 2017,
 1995,
 2006,
 1984,
 2005,
 2004,
 2001,
 1981,
 2000,
 2004,
 2011,
 1992,
 1989,
 2005,
 2010,
 2008,
 2018,
 2017,
 1964,
 1976,
 2017,
 1972,
 2014,
 2005,
 1971,
 2015,
 1990,
 1996,
 1971,
 2014,
 2003,
 1993,
 2018,
 2010,
 1995,
 2002,
 2019,
 2012,
 2002,
 2010,
 1997,
 1985,
 2008,
 2011,
 2011,
 1987,
 1996,
 1987,
 2017,
 2006,
 2017,
 1994,
 1989,
 2014,
 1973,
 1985,
 1982,
 2015,
 1984,
 2000,
 2003,
 1994,
 1994,
 1994,
 2014,
 2001,
 1987,
 2007,
 1990,
 1982,
 1995,
 2012,
 2018,
 1981,
 1986,
 1992,
 1999,
 1991,
 1988,
 2015]
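The strip-and-convert step above assumes every entry looks exactly like '(1986)'. As a slightly more defensive sketch, a regular expression can pull out the four digits and return None when no year is present. The helper name `parse_year` and the sample strings are assumptions for illustration.

```python
import re


def parse_year(text):
    """Return the first 4-digit run in text as an int, or None if there is none."""
    match = re.search(r"\d{4}", text)
    return int(match.group()) if match else None


# Made-up inputs: the usual form, stray whitespace, no parentheses, no year.
sample_years = ["(1986)", " (2002) ", "1999", "(n/a)"]
print([parse_year(y) for y in sample_years])
# → [1986, 2002, 1999, None]
```

A plain `int()` call would raise a ValueError on the last sample, so this form trades a little strictness for robustness.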

The scores are strings that end with '%', so we use the strip() method to remove the % sign and convert them to integers.

In [48]:
t_score=[heading.find('span',class_="tMeterScore").string for heading in headings]
original_scores=[int(x.strip('%')) for x in t_score]
original_scores[0:5]

[60, 40, 95, 39, 86]

That gives us all the important data held in the heading. Next we need the **Consensus**. Again, by inspecting the HTML page, we find that the consensus is in a **<div>** tag with **class='info critics-consensus'**.

Using BeautifulSoup's find method on each div, we select that class.

In [18]:
consensus=[div.find('div', class_='info critics-consensus') for div in divs]
consensus[0]

<div class="info critics-consensus"><span class="descriptor">Critics Consensus:</span> Running Scared struggles to strike a consistent balance between violent action and humor, but the chemistry between its well-matched leads keeps things entertaining.</div>

Now, let's see what the consensus element contains.

In [19]:
for x in consensus[0]:
    print(x)

<span class="descriptor">Critics Consensus:</span>
 Running Scared struggles to strike a consistent balance between violent action and humor, but the chemistry between its well-matched leads keeps things entertaining.


So the second child of each **<div>** tag holds the consensus text. We also observe an unwanted space right before the text, which we remove with the strip method.

In [21]:
consensus[0].contents[1].strip()

'Running Scared struggles to strike a consistent balance between violent action and humor, but the chemistry between its well-matched leads keeps things entertaining.'
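The positional lookup `contents[1]` relies on the text always being the second child. A hedged alternative is to find the label span first and take the node that follows it through `next_sibling`. The sketch below uses a single hand-written div modelled on the output above, and 'html.parser' only so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one consensus div.
sample_html = (
    '<div class="info critics-consensus">'
    '<span class="descriptor">Critics Consensus:</span>'
    " Equilibrium is a reheated mishmash of other sci-fi movies.</div>"
)
sample_div = BeautifulSoup(sample_html, "html.parser").find(
    "div", class_="info critics-consensus"
)

# Locate the label first, then take the text node that follows it.
label = sample_div.find("span", class_="descriptor")
consensus_via_sibling = label.next_sibling.strip()

# Same value as the positional lookup used in the notebook.
consensus_via_index = sample_div.contents[1].strip()
print(consensus_via_sibling)
```

Both approaches give the same string here; the sibling form simply states the intent ("the text after the label") more directly.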

Now we apply this to every element of the consensus list using a list comprehension.

In [22]:
consensus_text=[con.contents[1].strip() for con in consensus]
consensus_text[0:5]

['Running Scared struggles to strike a consistent balance between violent action and humor, but the chemistry between its well-matched leads keeps things entertaining.',
 'Equilibrium is a reheated mishmash of other sci-fi movies.',
 'With death-defying action sequences and epic historic sweep, Hero offers everything a martial arts fan could ask for.',
 "Whether Road House is simply bad or so bad it's good depends largely on the audience's fondness for Swayze -- and tolerance for violently cheesy action.",
 "As fast, loud, and relentless as the train at the center of the story, Unstoppable is perfect popcorn entertainment -- and director Tony Scott's best movie in years."]

Next are the directors. Again, by inspecting the HTML page, we find that the director of each title is in a **<div>** tag with **class='info director'**.

In [23]:
directors_list=[div.find('div', class_='info director') for div in divs]
directors_list[0:5]

[<div class="info director">
 <span class="descriptor">Directed By:</span> <a class="" href="/celebrity/peter_hyams/">Peter Hyams</a></div>,
 <div class="info director">
 <span class="descriptor">Directed By:</span> <a class="" href="/celebrity/kurt_wimmer/">Kurt Wimmer</a></div>,
 <div class="info director">
 <span class="descriptor">Directed By:</span> <a class="" href="/celebrity/zhang_yimou/">Zhang Yimou</a></div>,
 <div class="info director">
 <span class="descriptor">Directed By:</span> <a class="" href="/celebrity/rowdy_herrington/">Rowdy Herrington</a></div>,
 <div class="info director">
 <span class="descriptor">Directed By:</span> <a class="" href="/celebrity/tony_scott/">Tony Scott</a></div>]

Once we have the divs that hold the director, we observe that the director's name is in an **<a>** tag.

Using a list comprehension, we pick out the **<a>** tag to get the director's name.

In [24]:
directors=[x.find('a') for x in directors_list]
directors

[<a class="" href="/celebrity/peter_hyams/">Peter Hyams</a>,
 <a class="" href="/celebrity/kurt_wimmer/">Kurt Wimmer</a>,
 <a class="" href="/celebrity/zhang_yimou/">Zhang Yimou</a>,
 <a class="" href="/celebrity/rowdy_herrington/">Rowdy Herrington</a>,
 <a class="" href="/celebrity/tony_scott/">Tony Scott</a>,
 <a class="" href="/celebrity/gordon_parks/">Gordon Parks</a>,
 <a class="" href="/celebrity/jung_byoung_gil/">Jung Byung-gil</a>,
 <a class="" href="/celebrity/russell_mulcahy/">Russell Mulcahy</a>,
 <a class="" href="/celebrity/renny_harlin/">Renny Harlin</a>,
 <a class="" href="/celebrity/jon_turteltaub/">Jon Turteltaub</a>,
 <a class="" href="/celebrity/prachya_pinkaew/">Prachya Pinkaew</a>,
 <a class="" href="/celebrity/coralie_fargeat/">Coralie Fargeat</a>,
 <a class="" href="/celebrity/robert_rodriguez/">Robert Rodriguez</a>,
 <a class="" href="/celebrity/king_hu/">King Hu</a>,
 <a class="" href="/celebrity/tony_scott/">Tony Scott</a>,
 <a class="" href="/celebrity/simon_

From the output we observe that some of the entries are 'None', and inspecting the website confirms that some movies do not have a director listed. We need to keep None in those positions of our list.

If we read the name straight from the **<a>** tag, we get an error for those movies, because None has no string attribute. Hence we assign **None** wherever there is no director.

In [25]:
directors=[None if x.find('a') is None else x.find('a').string for x in directors_list]
directors[0:5]

['Peter Hyams', 'Kurt Wimmer', 'Zhang Yimou', 'Rowdy Herrington', 'Tony Scott']
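The conditional expression above calls `find('a')` twice per movie. A small helper can do the lookup once and also cope with a block that is missing entirely. This is only a sketch: the name `link_text_or_none` and the two-div sample HTML are assumptions, and 'html.parser' is used so it runs without lxml.

```python
from bs4 import BeautifulSoup


def link_text_or_none(container):
    """Return the text of the first <a> inside container, or None if absent."""
    if container is None:  # the whole block is missing
        return None
    link = container.find("a")
    return link.get_text(strip=True) if link is not None else None


# Made-up sample: one div with a director link, one without.
sample_html = """
<div class="info director"><span class="descriptor">Directed By:</span> <a href="/celebrity/tony_scott/">Tony Scott</a></div>
<div class="info director"><span class="descriptor">Directed By:</span></div>
"""
sample_divs = BeautifulSoup(sample_html, "html.parser").find_all(
    "div", class_="info director"
)
sample_directors = [link_text_or_none(d) for d in sample_divs]
print(sample_directors)
# → ['Tony Scott', None]
```

Keeping None in place, rather than dropping the entry, matters because every list must stay the same length as the list of movies.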

Similarly, we get the cast. Inspecting the website shows that the cast details are in a **<div>** with **class='info cast'**.

Using a list comprehension, we select those **<div>** tags.

In [26]:
cast_list=[x.find('div',class_='info cast') for x in divs]
cast_list[0]

<div class="info cast">
<span class="descriptor">Starring:</span> <a class="" href="/celebrity/gregory_hines/">Gregory Hines</a>, <a class="" href="/celebrity/billy_crystal/">Billy Crystal</a>, <a class="" href="/celebrity/jimmy_smits/">Jimmy Smits</a>, <a class="" href="/celebrity/steven_bauer/">Steven Bauer</a></div>

In [27]:
links=cast_list[0].find_all('a')
links

[<a class="" href="/celebrity/gregory_hines/">Gregory Hines</a>,
 <a class="" href="/celebrity/billy_crystal/">Billy Crystal</a>,
 <a class="" href="/celebrity/jimmy_smits/">Jimmy Smits</a>,
 <a class="" href="/celebrity/steven_bauer/">Steven Bauer</a>]

From the above output, we see that the first element of cast_list contains several **<a>** tags, one per cast member, so find_all returns a list of tags for each movie. For easier reading, we join the names of each movie's cast into a single comma-separated string.

In [29]:
cast_names=', '.join([x.string for x in links])
cast_names

'Gregory Hines, Billy Crystal, Jimmy Smits, Steven Bauer'

Now we apply the same logic to all the elements of the cast_list variable.

We can do it in two ways. The first is a regular for loop.

In [30]:
cast=[]

for c in cast_list:
    links=c.find_all('a')
    cast_names=[link.string for link in links]
    result=', '.join(cast_names)
    cast.append(result)
cast

['Gregory Hines, Billy Crystal, Jimmy Smits, Steven Bauer',
 'Christian Bale, Emily Watson, Taye Diggs, Angus Macfadyen',
 'Jet Li, Tony Leung Chiu Wai, Maggie Cheung, Daoming Chen',
 'Patrick Swayze, Kelly Lynch, Sam Elliott, Ben Gazzara',
 'Denzel Washington, Chris Pine, Rosario Dawson, Ethan Suplee',
 'Richard Roundtree, Moses Gunn, Gwen Mitchell, Christopher St. John',
 'Ok-bin Kim, Kim Seo-hyung, Shin Ha-kyun, Bang Sung-jun',
 'Christopher Lambert, Sean Connery, Roxanne Hart, Clancy Brown',
 'Bruce Willis, Bonnie Bedelia, William Atherton, Reginald VelJohnson',
 'Nicolas Cage, Diane Kruger, Justin Bartha, Sean Bean',
 'Tony Jaa, Petchtai Wongkamlao, Bongkoj Khongmalai, Bongkoo Kongmalai',
 'Matilda Anna Ingrid Lutz, Kevin Janssens, Vincent Colombe, Guillaume Bouchède',
 'Carlos Gallardo, Consuelo Gómez, Reinol Martinez, Peter Marquardt',
 'Feng Hsu, Chun Shih, Pai Ying, Tien Peng',
 'Tom Cruise, Kelly McGillis, Anthony Edwards, Val Kilmer',
 'Nicolas Cage, John Cusack, John Malkov

The second is to convert the above logic into a list comprehension.

In [31]:
cast=[', '.join([link.string for link in c.find_all('a')]) for c in cast_list]
cast

['Gregory Hines, Billy Crystal, Jimmy Smits, Steven Bauer',
 'Christian Bale, Emily Watson, Taye Diggs, Angus Macfadyen',
 'Jet Li, Tony Leung Chiu Wai, Maggie Cheung, Daoming Chen',
 'Patrick Swayze, Kelly Lynch, Sam Elliott, Ben Gazzara',
 'Denzel Washington, Chris Pine, Rosario Dawson, Ethan Suplee',
 'Richard Roundtree, Moses Gunn, Gwen Mitchell, Christopher St. John',
 'Ok-bin Kim, Kim Seo-hyung, Shin Ha-kyun, Bang Sung-jun',
 'Christopher Lambert, Sean Connery, Roxanne Hart, Clancy Brown',
 'Bruce Willis, Bonnie Bedelia, William Atherton, Reginald VelJohnson',
 'Nicolas Cage, Diane Kruger, Justin Bartha, Sean Bean',
 'Tony Jaa, Petchtai Wongkamlao, Bongkoj Khongmalai, Bongkoo Kongmalai',
 'Matilda Anna Ingrid Lutz, Kevin Janssens, Vincent Colombe, Guillaume Bouchède',
 'Carlos Gallardo, Consuelo Gómez, Reinol Martinez, Peter Marquardt',
 'Feng Hsu, Chun Shih, Pai Ying, Tien Peng',
 'Tom Cruise, Kelly McGillis, Anthony Edwards, Val Kilmer',
 'Nicolas Cage, John Cusack, John Malkov
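One caveat with `.string`: it returns None when a tag has more than one child, and `', '.join(...)` raises a TypeError if any item is None. The sketch below shows the difference on made-up markup where one name is partly wrapped in a `<b>` tag. That nesting is hypothetical and was not observed on the real page; `get_text` is shown as a more forgiving alternative.

```python
from bs4 import BeautifulSoup

# Made-up cast div; the <b> inside the second link is a hypothetical variation.
sample_html = (
    '<div class="info cast"><span class="descriptor">Starring:</span> '
    '<a href="/celebrity/jet_li/">Jet Li</a>, '
    '<a href="/celebrity/maggie_cheung/"><b>Maggie</b> Cheung</a></div>'
)
sample_cast = BeautifulSoup(sample_html, "html.parser").find(
    "div", class_="info cast"
)
sample_links = sample_cast.find_all("a")

# .string is None for the second link: it has two children (<b> and a text node).
strings = [link.string for link in sample_links]

# get_text(" ", strip=True) gathers all nested text, so join never sees a None.
names = ", ".join(link.get_text(" ", strip=True) for link in sample_links)
print(strings)
print(names)
```

On the page as scraped above, every cast link holds plain text, so `.string` works; `get_text` only adds a safety margin if the markup changes.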

What remains are the adjusted scores and the synopsis of each movie.

For the adjusted scores, inspecting the website shows that they are in a **<div>** tag with **class='info countdown-adjusted-score'**.

Using a list comprehension, we select those **<div>** tags.

In [37]:
adjusted_score=[div.find('div', class_='info countdown-adjusted-score') for div in divs]
adjusted_score[0]

<div class="info countdown-adjusted-score"><span class="descriptor">Adjusted Score: </span>61.182% <span class="glyphicon glyphicon-question-sign" data-html="true" data-original-title="The Adjusted Score comes from a weighted formula (Bayesian) that we use that accounts for variation in the number of reviews per movie." data-placement="top" data-toggle="tooltip" rel="tooltip" title=""></span></div>

In [38]:
for x in adjusted_score[0]:
    print(x)

<span class="descriptor">Adjusted Score: </span>
61.182% 
<span class="glyphicon glyphicon-question-sign" data-html="true" data-original-title="The Adjusted Score comes from a weighted formula (Bayesian) that we use that accounts for variation in the number of reviews per movie." data-placement="top" data-toggle="tooltip" rel="tooltip" title=""></span>


Using a for loop, we print the contents of the first element of adjusted_score, and we see that the score is the second item.

In [39]:
adjusted_score[0].contents[1]

'61.182% '

We also notice that there is a **%** symbol in the result and that the value is a string, while we need it as a number (a float, since the score has decimals).

So, using list comprehensions and the technique above, we apply the same logic to all the elements of the list.

In [40]:
ad_scores=[score.contents[1].strip('% ') for score in adjusted_score]
ad_scores=[float(scores) for scores in ad_scores]
ad_scores[0:5]

[61.182, 41.992, 100.762, 41.986, 91.476]
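The percentage cleanup can also be wrapped in one small function that serves both the plain scores and the adjusted scores. This is a sketch with an assumed helper name, `parse_percent`, and sample strings copied from the outputs above.

```python
def parse_percent(text):
    """Convert strings such as '60%' or '61.182% ' to a float."""
    # Remove surrounding whitespace first, then the trailing percent sign.
    return float(text.strip().rstrip("%"))


samples = ["61.182% ", "41.992%", "60%"]
print([parse_percent(s) for s in samples])
# → [61.182, 41.992, 60.0]
```

Note that the adjusted score can exceed 100 (Hero shows 100.762 above), so the values should not be clamped to the 0-100 range.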

The last data we need is the synopsis of each movie. Once again, inspecting the page with the developer tools shows that the synopsis is in a **<div>** tag with **class='info synopsis'**.

Using the find method on each div, we select the **<div>** tag with that class.

In [41]:
synopsis=[div.find('div', class_='info synopsis') for div in divs]
synopsis[0]

<div class="info synopsis"><span class="descriptor">Synopsis:</span> Distinguished by a sharp, witty dialogue between its two cop protagonists, Ray and Danny (Gregory Hines and Billy Crystal), this...<a class="" data-pageheader="" href="https://www.rottentomatoes.com/m/1018009-running_scared/" target="_top"> [More]</a></div>

In [42]:
for x in synopsis[0]:
    print(x)

<span class="descriptor">Synopsis:</span>
 Distinguished by a sharp, witty dialogue between its two cop protagonists, Ray and Danny (Gregory Hines and Billy Crystal), this...
<a class="" data-pageheader="" href="https://www.rottentomatoes.com/m/1018009-running_scared/" target="_top"> [More]</a>


Again, the movie synopsis is the second item of the contents, and there is an unwanted space at the beginning of the text, which we remove with the strip method.

In [44]:
synopsis[0].contents[1].strip()

'Distinguished by a sharp, witty dialogue between its two cop protagonists, Ray and Danny (Gregory Hines and Billy Crystal), this...'

In [46]:
synopsis_text=[x.contents[1].strip() for x in synopsis]
synopsis_text[0:5]

['Distinguished by a sharp, witty dialogue between its two cop protagonists, Ray and Danny (Gregory Hines and Billy Crystal), this...',
 'In the nation of Libria, there is always peace among men. The rules of the Librian system are simple. If...',
 "Hero is two-time Academy Award nominee Zhang Yimou's directorial attempt at exploring the concept of a Chinese hero. During the...",
 'Dalton (Swayze) is a true gentleman with a degree in philosophy from NYU. He also has a flip side -...',
 'In this action thriller from director Tony Scott, rookie train operator Will (Chris Pine) and grizzled veteran engineer Frank (Denzel...']
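The extraction so far builds one list per field, which works as long as every list stays aligned with the list of movies. An alternative sketch is to parse each movie block into one dictionary, so all fields of a movie travel together. The function names `second_child_text` and `parse_movie` are assumptions, the sample block is hand-written from the outputs shown above (with some fields left out), and 'html.parser' is used so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hand-written block modelled on the page structure seen in the outputs above.
sample_html = """
<div class="col-sm-18 col-full-xs countdown-item-content">
<h2><a href="https://www.rottentomatoes.com/m/equilibrium/">Equilibrium</a> <span class="subtle start-year">(2002)</span> <span class="tMeterScore">40%</span></h2>
<div class="info countdown-adjusted-score"><span class="descriptor">Adjusted Score: </span>41.992% </div>
<div class="info critics-consensus"><span class="descriptor">Critics Consensus:</span> Equilibrium is a reheated mishmash of other sci-fi movies.</div>
<div class="info director"><span class="descriptor">Directed By:</span> <a href="/celebrity/kurt_wimmer/">Kurt Wimmer</a></div>
</div>
"""


def second_child_text(block, css_class):
    """Stripped text of the second child of the info div with this class."""
    return block.find("div", class_=css_class).contents[1].strip()


def parse_movie(block):
    """Collect the fields of one movie block into a dictionary."""
    heading = block.find("h2")
    director = block.find("div", class_="info director").find("a")
    year_text = heading.find("span", class_="subtle start-year").string
    score_text = heading.find("span", class_="tMeterScore").string
    adjusted = second_child_text(block, "info countdown-adjusted-score")
    return {
        "title": heading.find("a").get_text(strip=True),
        "year": int(year_text.strip("()")),
        "score": int(score_text.strip("%")),
        "adjusted_score": float(adjusted.strip("% ")),
        "consensus": second_child_text(block, "info critics-consensus"),
        "director": director.get_text(strip=True) if director is not None else None,
    }


sample_block = BeautifulSoup(sample_html, "html.parser").find(
    "div", class_="col-sm-18 col-full-xs countdown-item-content"
)
record = parse_movie(sample_block)
print(record)
```

A list of such dictionaries, `[parse_movie(div) for div in divs]`, could then be passed to `pd.DataFrame` directly.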

### Step-3

We successfully scraped the top action movies page on Rotten Tomatoes and got all the data we need. Now we create a data frame to view the data in tabular format.

Another option is to connect to a database and load the data into a table there.

Using the pandas library, we create a DataFrame.

In [49]:
movies_info = pd.DataFrame()

movies_info['Movie Title']=movie_names
movies_info['Year']=years
movies_info['scores']=original_scores
movies_info['Adjusted Scores']=ad_scores
movies_info['Director']=directors
movies_info['Synopsis']=synopsis_text
movies_info['cast']=cast
movies_info['Consensus']=consensus_text

movies_info

| | Movie Title | Year | scores | Adjusted Scores | Director | Synopsis | cast | Consensus |
|---|---|---|---|---|---|---|---|---|
| 0 | Running Scared | 1986 | 60 | 61.182 | Peter Hyams | Distinguished by a sharp, witty dialogue betwe... | Gregory Hines, Billy Crystal, Jimmy Smits, Ste... | Running Scared struggles to strike a consisten... |
| 1 | Equilibrium | 2002 | 40 | 41.992 | Kurt Wimmer | In the nation of Libria, there is always peace... | Christian Bale, Emily Watson, Taye Diggs, Angu... | Equilibrium is a reheated mishmash of other sc... |
| 2 | Hero | 2004 | 95 | 100.762 | Zhang Yimou | Hero is two-time Academy Award nominee Zhang Y... | Jet Li, Tony Leung Chiu Wai, Maggie Cheung, Da... | With death-defying action sequences and epic h... |
| 3 | Road House | 1989 | 39 | 41.986 | Rowdy Herrington | Dalton (Swayze) is a true gentleman with a deg... | Patrick Swayze, Kelly Lynch, Sam Elliott, Ben ... | Whether Road House is simply bad or so bad it'... |
| 4 | Unstoppable | 2010 | 86 | 91.476 | Tony Scott | In this action thriller from director Tony Sco... | Denzel Washington, Chris Pine, Rosario Dawson,... | As fast, loud, and relentless as the train at ... |
| 5 | Shaft | 1971 | 88 | 92.024 | Gordon Parks | Shaft, a highly successful film, spawned an in... | Richard Roundtree, Moses Gunn, Gwen Mitchell, ... | This is the man that would risk his neck for h... |
| 6 | The Villainess (Ak-Nyeo) | 2017 | 84 | 86.913 | Jung Byung-gil | Since she was a little girl, Sook-hee was rais... | Ok-bin Kim, Kim Seo-hyung, Shin Ha-kyun, Bang ... | The Villainess offers enough pure kinetic thri... |
| 7 | Highlander | 1986 | 69 | 71.927 | Russell Mulcahy | Among humans for centuries, an immortal specie... | Christopher Lambert, Sean Connery, Roxanne Har... | People hate Highlander because it's cheesy, bo... |
| 8 | Die Hard 2 | 1990 | 68 | 72.291 | Renny Harlin | "Another basement, another elevator...how can ... | Bruce Willis, Bonnie Bedelia, William Atherton... | It lacks the fresh thrills of its predecessor,... |
| 9 | National Treasure | 2004 | 46 | 50.848 | Jon Turteltaub | Benjamin Franklin Gates is a third generation ... | Nicolas Cage, Diane Kruger, Justin Bartha, Sea... | National Treasure is no treasure, but it's a f... |
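The database option mentioned at the start of this step could be sketched with Python's built-in sqlite3 module. The table name `movies`, the column names and the in-memory database are assumptions made for illustration; the two rows are copied from the output above, with the longer text columns left out for brevity.

```python
import sqlite3

# Two rows copied from the output above (text-heavy columns omitted).
rows = [
    ("Running Scared", 1986, 60, 61.182, "Peter Hyams"),
    ("Equilibrium", 2002, 40, 41.992, "Kurt Wimmer"),
]

# ":memory:" keeps the example self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies ("
    "title TEXT, year INTEGER, score INTEGER, adjusted_score REAL, director TEXT)"
)
# The ? placeholders let sqlite3 handle quoting of titles such as "Shoot 'Em Up".
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Query the table back, highest score first.
top = conn.execute(
    "SELECT title, score FROM movies ORDER BY score DESC"
).fetchall()
print(top)
# → [('Running Scared', 60), ('Equilibrium', 40)]
conn.close()
```

Once the data is in a table, filtering and sorting can be done in SQL instead of in pandas, which may suit cases where the scrape feeds another application.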
