## Promoting Tourism in San Francisco
<p>San Francisco has been home to many famous films, including the action classic “Bullitt” and the recent science-fiction epic “Rise of the Planet of the Apes”. To celebrate the cinematic history of the city, the tourism board has asked you to perform some analyses.</p>
<p>Their idea is to promote the 10 most popular filming locations in San Francisco. The board plans to create an attraction at each of these locations based on the highest-grossing film (by worldwide gross income) shot there.</p>
<p>You have two datasets at your disposal. One lists every film shot in San Francisco along with its filming locations. The other contains movie details drawn from the Internet Movie Database (IMDb).</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:16px"><b>datasets/locations.csv - Filming locations of movies shot in San Francisco since 1924</b>
    </div>
    <div> Source: <a href="https://data.sfgov.org/Culture-and-Recreation/Film-Locations-in-San-Francisco/yitu-d5am">Film Locations in San Francisco</a></div>

<ul>
    <li><b>Title: </b>Title of the movie. Note that some films may share the same title, and are only differentiated by year of release.</li>
    <li><b>Release Year: </b>Year of release in cinemas.</li>
    <li><b>Locations: </b>Name of location in San Francisco where a scene was shot for the movie.</li>
    <li><b>Production Company: </b>Company that produced the film.</li>
    <li><b>Distributor: </b>Company that distributed the film.</li>
</ul>
    </div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6; margin-top: 17px;">
    <div style="font-size:16px"><b>datasets/imdb_movies.csv - Data on over 85,000 movies up to 2020</b>
    </div>
    <div>Source: <a href="https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset">Kaggle (IMDb movies extensive dataset)</a></div>
<ul>
    <li><b>imdb_title_id: </b>Unique film id.</li>
    <li><b>title: </b>Title of the film. Note that some films may share the same title, and are only differentiated by year of release.</li>
    <li><b>year: </b>The year of release.</li> 
    <li><b>genre: </b>The genres of the film. The primary genre of the film is the first genre listed.</li>
    <li><b>duration: </b>The duration of the film in minutes.</li>
    <li><b>director: </b>The name of the director.</li>
    <li><b>actors: </b>The leading actors of the film.</li>
    <li><b>avg_vote: </b>Average IMDb user rating of the film (out of 10).</li>
    <li><b>worldwide_gross_income: </b>Total income for the film worldwide in US dollars.</li>
</ul>
    </div>

In [1]:
import pandas as pd
import numpy as np

In [2]:
location = pd.read_csv('datasets/locations.csv')

In [3]:
location.head()

Unnamed: 0,Title,Release Year,Locations,Production Company,Distributor
0,180,2011,Epic Roasthouse (399 Embarcadero),SPI Cinemas,
1,180,2011,Mason & California Streets (Nob Hill),SPI Cinemas,
2,180,2011,Justin Herman Plaza,SPI Cinemas,
3,180,2011,200 block Market Street,SPI Cinemas,
4,180,2011,City Hall,SPI Cinemas,


In [4]:
location.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1743 entries, 0 to 1742
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Title               1743 non-null   object
 1   Release Year        1743 non-null   int64 
 2   Locations           1689 non-null   object
 3   Production Company  1741 non-null   object
 4   Distributor         1642 non-null   object
dtypes: int64(1), object(4)
memory usage: 68.2+ KB


In [5]:
movies = pd.read_csv('datasets/imdb_movies.csv')

In [6]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85854 entries, 0 to 85853
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   imdb_title_id           85854 non-null  object 
 1   title                   85854 non-null  object 
 2   year                    85854 non-null  int64  
 3   genre                   85854 non-null  object 
 4   duration                85854 non-null  int64  
 5   director                85767 non-null  object 
 6   actors                  85785 non-null  object 
 7   avg_vote                85854 non-null  float64
 8   worldwide_gross_income  31016 non-null  object 
dtypes: float64(1), int64(2), object(6)
memory usage: 5.9+ MB


In [7]:
movies.head()

Unnamed: 0,imdb_title_id,title,year,genre,duration,director,actors,avg_vote,worldwide_gross_income
0,tt0000009,Miss Jerry,1894,Romance,45,Alexander Black,"Blanche Bayliss, William Courtenay, Chauncey D...",5.9,
1,tt0000574,The Story of the Kelly Gang,1906,"Biography, Crime, Drama",70,Charles Tait,"Elizabeth Tait, John Tait, Norman Campbell, Be...",6.1,
2,tt0001892,Den sorte drøm,1911,Drama,53,Urban Gad,"Asta Nielsen, Valdemar Psilander, Gunnar Helse...",5.8,
3,tt0002101,Cleopatra,1912,"Drama, History",100,Charles L. Gaskill,"Helen Gardner, Pearl Sindelar, Miss Fielding, ...",5.2,
4,tt0002130,L'Inferno,1911,"Adventure, Drama, Fantasy",68,"Francesco Bertolini, Adolfo Padovan","Salvatore Papa, Arturo Pirovano, Giuseppe de L...",7.0,


In [8]:
# Keep only films with an average rating above 6.0
movies = movies[movies['avg_vote'] > 6.0]

In [9]:
# Keep films whose genre list includes Action, Drama, or Biography (regex alternation)
movies = movies[movies['genre'].str.contains('Action|Drama|Biography')]
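`str.contains` treats its pattern as a regular expression by default, so one alternation such as `'Action|Drama|Biography'` covers all three genres. The sketch below uses invented genre strings in the same comma-separated style as the `genre` column. It also shows `na=False` (defensive here, since `genre` has no missing values) and an exact-token variant that splits on `", "`, in case substring matching is ever too loose.

```python
import pandas as pd

# Invented genre strings in the comma-separated style of the IMDb `genre` column
genres = pd.Series([
    "Biography, Crime, Drama",
    "Adventure, Drama, Fantasy",
    "Comedy, Romance",
    "Action, Adventure, Sci-Fi",
    None,  # a missing genre, to show the effect of na=False
])

# One regex alternation instead of several chained str.contains(...) calls;
# na=False treats a missing genre as a non-match instead of returning NaN
mask = genres.str.contains("Action|Drama|Biography", na=False)

# Exact-token alternative: split each genre list and test set membership
wanted = {"Action", "Drama", "Biography"}
exact = genres.fillna("").str.split(", ").apply(lambda gs: bool(wanted & set(gs)))

print(mask.tolist())   # [True, True, False, True, False]
print(exact.tolist())  # [True, True, False, True, False]
```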

In [10]:
movies.head()

Unnamed: 0,imdb_title_id,title,year,genre,duration,director,actors,avg_vote,worldwide_gross_income
1,tt0000574,The Story of the Kelly Gang,1906,"Biography, Crime, Drama",70,Charles Tait,"Elizabeth Tait, John Tait, Norman Campbell, Be...",6.1,
4,tt0002130,L'Inferno,1911,"Adventure, Drama, Fantasy",68,"Francesco Bertolini, Adolfo Padovan","Salvatore Papa, Arturo Pirovano, Giuseppe de L...",7.0,
6,tt0002423,Madame DuBarry,1919,"Biography, Drama, Romance",85,Ernst Lubitsch,"Pola Negri, Emil Jannings, Harry Liedtke, Edua...",6.8,
7,tt0002445,Quo Vadis?,1913,"Drama, History",120,Enrico Guazzoni,"Amleto Novelli, Gustavo Serena, Carlo Cattaneo...",6.2,
10,tt0002646,Atlantis,1913,Drama,121,August Blom,"Olaf Fønss, Ida Orloff, Ebba Thomsen, Carl Lau...",6.6,


In [11]:
title = list(location['Title'].unique())

In [12]:
title

['180',
 '24 Hours on Craigslist',
 'Summertime',
 'Ballers Season 3',
 'Chance Season 2',
 'A Night Full of Rain',
 'Vegas in Space',
 'Nine Months',
 'Beautiful Boy',
 'About a Boy',
 'Age of Adaline',
 'After the Thin Man',
 'Ant-Man',
 'Americana',
 'Another 48 Hours',
 'Around the Fire',
 'Attack of the Killer Tomatoes',
 'Basic Instinct',
 'Bedazzled',
 'Blue Jasmine',
 'Bee Season',
 'Bicentennial Man',
 'Big Eyes',
 'Big Sur',
 'Boys and Girls',
 'Smile Again, Jenny Lee',
 'Broken-A Modern Love Story',
 'Silicon Valley Season 4',
 'Cardinal X',
 'Superman',
 'The Bachelor',
 'Murder in the First, Season 1',
 'Time After Time',
 'Bullitt',
 'The Game',
 'Nine to Five',
 'Herbie Rides Again',
 'Edtv',
 'Venom',
 'Sister Act 2: Back in the Habit',
 'Casualties of War',
 'Memoirs of an Invisible Man',
 'The Organization',
 'Looking Season 2 ep 202',
 'San Andreas',
 'The Caine Mutiny',
 'Looking Season 2 ep 203',
 'The Last Black Man in San Francisco',
 'Class Action',
 'Common Thr

In [13]:
len(title)

311

In [14]:
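# Keep only IMDb movies whose title appears in the San Francisco locations data.
# Note: this matches on title alone, so films that share a title but differ in year can both match.
# .copy() avoids a SettingWithCopyWarning when the income column is modified later.
movies = movies[movies['title'].isin(title)].copy()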
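Both dataset descriptions warn that some films share a title and differ only by release year, so filtering with `isin` on title alone can keep the wrong film. A minimal sketch of joining on title *and* year instead is below. The rows and film names are invented for illustration; only the column names follow the two CSV files. It also assumes the `Release Year` in the locations data agrees with the IMDb `year`, which is worth spot-checking on the real data.

```python
import pandas as pd

# Invented rows; column names follow locations.csv and imdb_movies.csv
location = pd.DataFrame({
    "Title": ["Film A", "Film A", "Film B"],
    "Release Year": [1978, 1978, 1992],
    "Locations": ["Golden Gate Bridge", "City Hall", "Chinatown"],
})
movies = pd.DataFrame({
    "title": ["Film A", "Film A", "Film B", "Film C"],
    "year": [1978, 2006, 1992, 2010],
    "worldwide_gross_income": [300, 500, 350, 900],
})

# Title-only matching keeps both "Film A" entries (1978 and 2006)
by_title = movies[movies["title"].isin(location["Title"])]

# Matching on (title, year) keeps only films that appear in the locations data;
# drop_duplicates stops a film with several locations from being repeated
films_shot = location[["Title", "Release Year"]].drop_duplicates()
by_title_year = movies.merge(
    films_shot,
    left_on=["title", "year"],
    right_on=["Title", "Release Year"],
    how="inner",
).drop(columns=["Title", "Release Year"])

print(len(by_title), len(by_title_year))  # 3 2
```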

In [15]:
len(movies)

134

In [16]:
# Find the 10 most frequent filming locations (one count per row of locations.csv)
popular_location = location['Locations'].value_counts().head(10)
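`value_counts` counts rows per location, sorts them in descending order, and drops missing values by default (`Locations` has 1689 non-null entries out of 1743). Because it counts rows, a film that lists the same location twice is counted twice. The sketch below, on invented rows, contrasts that with a distinct-film count. Which definition of "popular" the board wants is an assumption to confirm.

```python
import pandas as pd

# Invented rows; one row per (film, location) entry as in locations.csv
location = pd.DataFrame({
    "Title": ["Film A", "Film A", "Film B", "Film C", "Film C", "Film D"],
    "Release Year": [1978, 1978, 1992, 2014, 2014, 2015],
    "Locations": ["City Hall", "City Hall", "City Hall",
                  "Coit Tower", None, "Coit Tower"],
})

# Row counts: the missing location is dropped because dropna=True by default
row_counts = location["Locations"].value_counts()

# Distinct-film counts: remove repeated (film, location) rows before counting
film_counts = (
    location.drop_duplicates(["Title", "Release Year", "Locations"])["Locations"]
    .value_counts()
)

print(row_counts.to_dict())   # City Hall: 3, Coit Tower: 2
print(film_counts.to_dict())  # City Hall: 2, Coit Tower: 2
```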

In [17]:
popular_location = list(popular_location.index)

In [18]:
popular_location

['Golden Gate Bridge',
 'City Hall',
 'Fairmont Hotel (950 Mason Street, Nob Hill)',
 'Treasure Island',
 'Coit Tower',
 'Palace of Fine Arts (3301 Lyon Street)',
 'Chinatown',
 'Bay Bridge',
 'Grace Cathedral Episcopal Church (1100 California Street)',
 'Hall of Justice (850 Bryant Street)']

In [19]:
# Convert worldwide gross income from strings such as "$ 856085151" to integers;
# films with no recorded income get 0 so they sort to the bottom
income = movies['worldwide_gross_income'].fillna('$ 0')
movies['worldwide_gross_income'] = income.str.replace('$', '', regex=False).str.strip().astype(int)
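The conversion above assumes every income value carries the same `"$ "` prefix, and `astype(int)` raises on anything else. A more defensive sketch strips every non-digit character and lets `pd.to_numeric(..., errors="coerce")` turn unparseable leftovers into NaN, which are then set to 0 as in the notebook. The raw values here are invented; the `"$ "` format is an assumption about the original column.

```python
import numpy as np
import pandas as pd

# Invented raw values; the "$ " prefix format is an assumption about the column
raw = pd.Series(["$ 856085151", "$ 710644566", np.nan, "n/a"])

# Remove every non-digit character, then convert; anything left unparseable
# (an empty string or a missing value) becomes NaN instead of raising
digits = raw.str.replace(r"[^0-9]", "", regex=True)
income = pd.to_numeric(digits, errors="coerce")

# Treat missing income as 0, matching the notebook's sorting convention
income = income.fillna(0).astype("int64")
print(income.tolist())  # [856085151, 710644566, 0, 0]
```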

In [20]:
#Sort movies by gross income
movies = movies.sort_values(by=['worldwide_gross_income'],ascending=False)

In [21]:
movies.head()

Unnamed: 0,imdb_title_id,title,year,genre,duration,director,actors,avg_vote,worldwide_gross_income
56282,tt1270797,Venom,2018,"Action, Adventure, Sci-Fi",112,Ruben Fleischer,"Tom Hardy, Michelle Williams, Riz Ahmed, Scott...",6.7,856085151
64581,tt2103281,Dawn of the Planet of the Apes,2014,"Action, Adventure, Drama",130,Matt Reeves,"Andy Serkis, Jason Clarke, Gary Oldman, Keri R...",7.6,710644566
28066,tt0109830,Forrest Gump,1994,"Drama, Romance",142,Robert Zemeckis,"Tom Hanks, Rebecca Williams, Sally Field, Mich...",8.8,678229452
76734,tt5095030,Ant-Man and the Wasp,2018,"Action, Adventure, Comedy",118,Peyton Reed,"Paul Rudd, Evangeline Lilly, Michael Peña, Wal...",7.1,622674139
50523,tt0831387,Godzilla,2014,"Action, Adventure, Sci-Fi",123,Gareth Edwards,"Aaron Taylor-Johnson, CJ Adams, Ken Watanabe, ...",6.4,524976069


In [22]:
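# For each popular location, record the highest-grossing film shot there.
# `movies` is sorted by income in descending order, so the first match is the top film.
# Note: .iloc[0] raises IndexError if no film in `movies` was shot at a location.
title = []
year = []
for loc in popular_location:
    films_at_loc = location.loc[location['Locations'] == loc, 'Title']
    top_film = movies[movies['title'].isin(films_at_loc)].iloc[0]
    title.append(top_film['title'])
    year.append(top_film['year'])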
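The loop filters both DataFrames once per location, and `.iloc[0]` fails when no qualifying film was shot at a location. As an alternative sketch (not the notebook's method), one merge attaches each film's income to every location row, a sort puts the highest earners first, and `drop_duplicates` keeps the top row per location. The rows, film names, and the extra "Treasure Island" entry are invented, and the join on title plus year is a deliberate choice to avoid same-title mismatches.

```python
import pandas as pd

# Invented rows; column names follow locations.csv and imdb_movies.csv
location = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film B", "Film C"],
    "Release Year": [1978, 2014, 2014, 1992],
    "Locations": ["Golden Gate Bridge", "Golden Gate Bridge", "City Hall", "Chinatown"],
})
movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "year": [1978, 2014, 1992],
    "worldwide_gross_income": [300_000_000, 710_000_000, 350_000_000],
})
popular_location = ["Golden Gate Bridge", "City Hall", "Chinatown", "Treasure Island"]

# Attach each film's income to every location row it was shot at
shot = location.merge(
    movies,
    left_on=["Title", "Release Year"],
    right_on=["title", "year"],
    how="inner",
)

# Sort by income, then keep the first (highest-grossing) row per location
top = (
    shot[shot["Locations"].isin(popular_location)]
    .sort_values("worldwide_gross_income", ascending=False)
    .drop_duplicates(subset="Locations")
    .set_index("Locations")
    .loc[:, ["title", "year"]]
    .reindex(popular_location)  # a location with no matching film becomes a NaN row
)
print(top)
```

Unlike the loop, a location with no matching film yields a row of NaN values instead of an exception, which makes gaps easy to spot before the results go to the tourism board.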

In [23]:
title

['Superman',
 'Dawn of the Planet of the Apes',
 'The Rock',
 'Patch Adams',
 'San Andreas',
 'Forrest Gump',
 'Basic Instinct',
 'The Game',
 'The Towering Inferno',
 'Basic Instinct']

In [24]:
year

[1978, 2014, 1996, 1998, 2015, 1994, 1992, 1997, 1974, 1992]

In [25]:
data = {'Location':popular_location, 'Title':title, 'Year':year}

In [26]:
sf_hits = pd.DataFrame(data)

In [27]:
sf_hits

Unnamed: 0,Location,Title,Year
0,Golden Gate Bridge,Superman,1978
1,City Hall,Dawn of the Planet of the Apes,2014
2,"Fairmont Hotel (950 Mason Street, Nob Hill)",The Rock,1996
3,Treasure Island,Patch Adams,1998
4,Coit Tower,San Andreas,2015
5,Palace of Fine Arts (3301 Lyon Street),Forrest Gump,1994
6,Chinatown,Basic Instinct,1992
7,Bay Bridge,The Game,1997
8,Grace Cathedral Episcopal Church (1100 Califor...,The Towering Inferno,1974
9,Hall of Justice (850 Bryant Street),Basic Instinct,1992
