## Promoting Tourism in San Francisco
<p>San Francisco has been home to many famous films, including the action classic “Bullitt” and the recent science-fiction epic “Rise of the Planet of the Apes”. To celebrate the cinematic history of the city, the tourism board has asked you to perform some analyses.</p>
<p>Their idea is to promote the 10 most popular filming locations in San Francisco. The board plans to create an attraction at each of the 10 locations based on the biggest film (by worldwide income) shot there.</p>
<p>At your disposal are two datasets. One contains every location and film shot in San Francisco. The other dataset contains movie details drawn from the Internet Movie Database (IMDb).</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:16px"><b>datasets/locations.csv - Filming locations of movies shot in San Francisco since 1924</b>
    </div>
    <div> Source: <a href="https://data.sfgov.org/Culture-and-Recreation/Film-Locations-in-San-Francisco/yitu-d5am">Film Locations in San Francisco</a></div>

<ul>
    <li><b>Title: </b>Title of the movie. Note that some films may share the same title, and are only differentiated by year of release.</li>
    <li><b>Release Year: </b>Year of release.</li>
    <li><b>Locations: </b>Name of location in San Francisco where a scene was shot for the movie.</li>
    <li><b>Production Company: </b>Company that produced the film.</li>
    <li><b>Distributor: </b>Company that distributed the film.</li>
</ul>
    </div>
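
A minimal sketch of ranking locations by how often they appear, assuming "popular" means the number of rows recorded for a location. The `toy_locations` rows are hypothetical and not taken from the dataset.

```r
library(dplyr)

# Hypothetical toy rows standing in for datasets/locations.csv
toy_locations <- tibble(
  Title = c("Film A", "Film B", "Film C", "Film D"),
  Locations = c("City Hall", "City Hall", NA, "Coit Tower")
)

# Drop rows without a location, count rows per location,
# sort by count (descending) and keep the top entry
top_locations <- toy_locations %>%
  filter(!is.na(Locations)) %>%
  count(Locations, sort = TRUE) %>%
  head(1)

top_locations
```

`count(Locations, sort = TRUE)` is shorthand for grouping by `Locations`, counting, and arranging by descending `n`.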
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6; margin-top: 17px;">
    <div style="font-size:16px"><b>datasets/imdb_movies.csv - Data on over 85,000 movies up to 2020</b>
    </div>
    <div>Source: <a href="https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset">Kaggle (IMDb movies extensive dataset)</a></div>
<ul>
    <li><b>imdb_title_id: </b>Unique film id.</li>
    <li><b>title: </b>Title of the film. Note that some films may share the same title, and are only differentiated by year of release.</li>
    <li><b>year: </b>The year of release.</li> 
    <li><b>genre: </b>The genres of the film. The primary genre of the film is the first genre listed.</li>
    <li><b>duration: </b>The duration of the film in minutes.</li>
    <li><b>director: </b>The name of the director.</li>
    <li><b>actors: </b>The leading actors of the film.</li>
    <li><b>avg_vote: </b>Average review given to the film.</li>
    <li><b>worldwide_gross_income: </b>Total income for the film worldwide in US dollars.</li>
</ul>
    </div>
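
Two details in the column descriptions matter when the datasets are combined: a title is only unique together with its release year, and `worldwide_gross_income` is read as a character column. The sketch below illustrates both on made-up rows. It assumes the income strings look like `"$ 1000"`; the toy titles and amounts are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical toy rows, not taken from the real datasets
toy_locations <- tibble(
  Title = c("Remake", "Remake"),
  `Release Year` = c(1970, 2010),
  Locations = c("City Hall", "Coit Tower")
)
toy_imdb <- tibble(
  title = c("Remake", "Remake"),
  year = c(1970, 2010),
  worldwide_gross_income = c("$ 1000", "$ 250000")
)

# Joining on title AND year keeps same-titled films apart
toy_joined <- left_join(
  toy_locations, toy_imdb,
  by = c("Title" = "title", "Release Year" = "year")
) %>%
  # Strip the dollar sign, then convert; as.numeric() tolerates the leading space
  mutate(income = as.numeric(str_replace(worldwide_gross_income, "\\$", "")))

toy_joined
```

Joining these toy tables on the title alone would match each location row to both films and return four rows instead of two.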

In [15]:
# PACKAGES
library(readr)
library(dplyr)
library(stringr)

# LOADING THE DATASETS
locations <- read_csv('datasets/locations.csv')
imdb_movies <- read_csv('datasets/imdb_movies.csv')

# FINDING THE MOST POPULAR LOCATIONS IN SF
popular_locations <- locations %>%
    filter(!is.na(Locations)) %>% #Drop the rows with no location
    group_by(Locations) %>%
    count() %>%
    arrange(desc(n)) %>%
    head(10)

# FILTERING THE MOVIES FILMED ON THE MOST POPULAR LOCATIONS
location_filter <- popular_locations$Locations

popular_movies <- locations %>%
    select(-`Production Company`, -Distributor) %>% #Not interested in the Production Company or Distributor information
    filter(Locations %in% location_filter)

# ADDING THE GENRE AND GROSS INCOME FOR EACH MOVIE
imdb_movies_selected <- imdb_movies %>%
    select(title, year, genre, avg_vote, worldwide_gross_income)

movies <- left_join(popular_movies,
                    imdb_movies_selected,
                    by = c("Title" = "title", "Release Year" = "year")) #Some films may share the same title and are only differentiated by year of release

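movies_filtered <- movies %>%
    filter(avg_vote > 6) %>% #Keep films with an average vote above 6
    filter(str_detect(tolower(genre), "action|drama|biography")) %>% #Keep films whose genres include action, drama or biography
    mutate(gross_income_numeric = as.numeric(str_replace(worldwide_gross_income, "\\$", ""))) #Strip the dollar sign and convert to numeric; values that still contain non-numeric text become NA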

# FINDING THE HIGHEST GROSSING FILM FOR EACH LOCATION
sf_hits <- movies_filtered %>%
    filter(!is.na(gross_income_numeric)) %>%
    group_by(Locations) %>%
    filter(gross_income_numeric == max(gross_income_numeric)) %>%
    arrange(match(Locations, location_filter)) %>%
    mutate(`Release Year` = as.integer(`Release Year`)) %>% #Year from numeric to integer
    select(Location = Locations, Title, Year =`Release Year`)

sf_hits

Rows: 1743 Columns: 5

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Title, Locations, Production Company, Distributor
dbl (1): Release Year


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Rows: 85854 Columns: 9

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): imdb_title_id, title, genre, director, actors, worldwide_gross_income
dbl (3): year, duration, avg_vote


ℹ Use `spec()` to retrieve the full column specification for this

Location,Title,Year
<chr>,<chr>,<int>
Golden Gate Bridge,Superman,1978
City Hall,Dawn of the Planet of the Apes,2014
"Fairmont Hotel (950 Mason Street, Nob Hill)",The Rock,1996
Treasure Island,Patch Adams,1998
Coit Tower,San Andreas,2015
Palace of Fine Arts (3301 Lyon Street),Forrest Gump,1994
Chinatown,Basic Instinct,1992
Bay Bridge,The Game,1997
Grace Cathedral Episcopal Church (1100 California Street),The Towering Inferno,1974
Hall of Justice (850 Bryant Street),Basic Instinct,1992
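
The step `filter(gross_income_numeric == max(gross_income_numeric))` keeps every row that ties for the maximum within a location, so a location could in principle return more than one row. One possible way to guarantee a single row per location is `slice_max()` with `with_ties = FALSE`. This is a sketch on hypothetical data, not a change to the analysis above.

```r
library(dplyr)

# Hypothetical rows: two films tie for the top income at "Pier 1"
toy <- tibble(
  Locations = c("Pier 1", "Pier 1", "Pier 2"),
  Title = c("Film A", "Film B", "Film C"),
  income = c(500, 500, 300)
)

# Comparing against max() keeps both tied rows for "Pier 1"
tied_rows <- toy %>%
  group_by(Locations) %>%
  filter(income == max(income)) %>%
  ungroup()

# slice_max() with with_ties = FALSE keeps exactly one row per group
one_each <- toy %>%
  group_by(Locations) %>%
  slice_max(income, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(tied_rows)
nrow(one_each)
```

With `with_ties = FALSE` the choice between tied films is arbitrary, so a tie-breaking rule (for example, the higher `avg_vote`) may be preferable if ties occur in the real data.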
