#### Regression | Project Proposal

# Exporting American Movie Adaptations 

## Abstract

This project attempts to build a linear regression model that predicts international movie gross revenue from 44 years of movie data available at [boxofficemojo.com](https://www.boxofficemojo.com/genre/sg1625223425/?ref_=bo_gs_table_10), to help decide which movie adaptations to export. The Ridge regression model showed a good fit based on its R<sup>2</sup> score. However, the mean absolute error showed the model was off by `$`42.7 to `$`45.7 million. Feature transformation and engineering did not improve the model.

## Design

Recent press has reported that [book adaptations](https://www.forbes.com/sites/adamrowe1/2018/07/11/why-book-based-films-earn-53-more-at-the-worldwide-box-office/?sh=290c1dfc306f) earn more at the worldwide box office. A small [2020 survey](https://www.bhg.com/news/book-vs-movie-debate/) of Americans found that respondents enjoyed the book and its movie adaptation almost equally. Some [top-earning](https://filmthreat.com/features/what-makes-a-book-to-film-adaptation-so-successful/) book-to-film adaptations followed top-earning book sales, such as the *Harry Potter* series and *Lord of the Rings*. Could this also be true in other countries?

The [original source of an adaptation](https://stephenfollows.com/highest-grossing-movie-adaptations/) can influence its success. Certain artistic qualities also make [book-to-movie](https://news.northeastern.edu/2012/03/27/kelly/) adaptations successful, but they can be subjective and hard to model.

The (fictitious) international movie distributor, Movies Worldwide, Inc., wants a model to inform its choice of which movie adaptations to export. Its niche is exporting movies with affordable distribution rights.

**Research Question:** Can a model predict a movie adaptation's international total gross revenue based on movie data available on boxofficemojo.com?

## Data

This analysis uses data scraped from boxofficemojo.com, specifically the data for movie adaptations of books<sup>1</sup>, television shows<sup>2</sup>, events, video games, and plays from 1978 to 2022.

The dataset has robust domestic data. However, it did not result in a powerful model for predicting international gross revenue for movie adaptations. Feature transformations include converting categorical data (MPAA ratings and distributors) to dummy variables and applying a log transformation to domestic total gross. Engineered features include derived and interaction terms, such as profit (domestic total gross - budget), opening profit (domestic opening revenue - budget), and opening data (domestic opening revenue * opening theaters).
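The transformations described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the rows are made up, the column names (`mpaa_rating`, `distributor`, `domestic_opening`, `opening_theaters`, and so on) are hypothetical, and the use of `drop_first=True` is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names and values are hypothetical, not the scraped data.
movies = pd.DataFrame({
    "mpaa_rating": ["PG", "PG-13", "R", "PG"],
    "distributor": ["Studio A", "Studio B", "Studio A", "Studio C"],
    "domestic_total_gross": [120e6, 45e6, 8e6, 210e6],
    "domestic_opening": [35e6, 12e6, 2e6, 70e6],
    "opening_theaters": [3500, 2800, 900, 4100],
    "budget": [60e6, 30e6, 10e6, 150e6],
})

# Categorical columns -> dummy variables.
# drop_first=True (an assumption) drops one level per category to avoid redundancy.
features = pd.get_dummies(
    movies, columns=["mpaa_rating", "distributor"], drop_first=True
)

# Log transformation of domestic total gross.
features["log_domestic_total_gross"] = np.log(features["domestic_total_gross"])

# Engineered features as described in the text.
features["profit"] = features["domestic_total_gross"] - features["budget"]
features["opening_profit"] = features["domestic_opening"] - features["budget"]
features["opening_data"] = features["domestic_opening"] * features["opening_theaters"]

print(features.columns.tolist())
```

Only `opening_data` is a product of two features; `profit` and `opening_profit` are differences, which a linear model can already represent through the individual coefficients of their components.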

The Ridge regression model with selected features scored almost equal R<sup>2</sup> values on train and test data, suggesting a good fit. However, evaluation with the mean absolute error metric showed that the model was off by `$`45.5 million; after tuning, the error decreased to `$`42.7 million.
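The evaluation described above, comparing R<sup>2</sup> on train and test data and then checking the mean absolute error, could look something like the sketch below. It runs on synthetic data rather than the Box Office Mojo data, and the feature count, `alpha=1.0`, and the 80/20 split are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real features and target (target in "millions").
n_samples = 500
X = rng.normal(size=(n_samples, 3))
y = 50 + X @ np.array([30.0, 10.0, -5.0]) + rng.normal(scale=20.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale features before Ridge so the penalty treats them comparably.
# alpha=1.0 is an arbitrary starting value, not a tuned one.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
mae_test = mean_absolute_error(y_test, model.predict(X_test))

print(f"train R^2: {r2_train:.3f}")
print(f"test  R^2: {r2_test:.3f}")
print(f"test  MAE: {mae_test:.1f} (same units as the target)")
```

Similar train and test R<sup>2</sup> values indicate the model is not overfitting, but they say nothing about how large a typical error is. The mean absolute error is reported in the units of the target, which is why it is the more useful number for an investment decision.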

The analysis showed that predicting international gross revenue from domestic data with this model is not reliable enough to make investment decisions. A recommended next step is to look at country-specific box office revenue from boxofficemojo.com, which could better predict international gross revenue.
<br/>
<br/>
<sup>1</sup> Books include young adult novels, contemporary novels, children's books, and comic books. 

<sup>2</sup> Television shows include children's shows and cartoons.


## Tools

* Requests, BeautifulSoup, pickle  <br/>
* Pandas, Numpy, Statsmodels, Scikit-learn <br/>


## Communication

Slides and code are available on https://github.com/slp22/regression-project.
<br/>
<br/>
<br/>

#### Figure 1. 
### Heatmap of Adaptation Movie Dataframe Correlations

![heatmap.png](attachment:heatmap.png)

#### Figure 2. 
### Simple Linear Regression Model: Residuals vs. Predicted

![lin-reg-residuals-v-predicted.png](attachment:lin-reg-residuals-v-predicted.png)

#### Figure 3. 
### Tuned Linear Regression Model: Residuals vs. Predicted

![lin-reg-3-res-v-pred.png](attachment:lin-reg-3-res-v-pred.png)