# Team Popcorn

---

You're a force to be reckoned with. You are team "Popcorn". Working for a big movie studio, you need to report on metrics that will help: **A)** sell movie ideas to potential investors, and **B)** maximize product placement and/or sponsorships.

**Your studio's lead data scientist has given you some direction / starting points:**
 - Which movies remained in the top 10 the longest?
 - Which movies were good investments?
 - Are there any interesting trends throughout the year?
 - Google anything interesting about flagship movies' partnerships, and consider how those deals could be relevant to our own research.
 
**Bonus:**
 - Do any holidays impact sales performance or position?  How could we leverage this?
 - What could we look at outside our dataset that may help project good investments?


#### End with some kind of recommendation for new partnership engagements, tied to your opening goals / metrics.

How reliable can your recommendation(s) be?  Back up your assumptions with facts and data.

_[There's a data dictionary available!](http://www.amstat.org/publications/jse/v17n1/datasets.mclaren.html)_

Keep in mind the main points when presenting your findings!  It's interesting to share details and side points, but make sure they support and relate to pitching movies to investors and maximizing our partnership goals.


In [15]:
import pandas as pd
import numpy as np

movies = pd.read_csv("./movie_weekend.csv")
movies.head()

Unnamed: 0,NUMBER,MOVIE,WEEK_NUM,WEEKEND_PER_THEATER,WEEKEND_DATE
0,1.0,A Beautiful Mind,1.0,701.0,12/21/01
1,1.0,A Beautiful Mind,2.0,14820.0,12/28/01
2,1.0,A Beautiful Mind,3.0,8940.0,1/4/02
3,1.0,A Beautiful Mind,4.0,6850.0,1/11/02
4,1.0,A Beautiful Mind,5.0,5280.0,1/18/02


In [25]:
# Weekend per-theater box office receipts, grouped by movie
g = movies.groupby('MOVIE')['WEEKEND_PER_THEATER']

In [27]:
g.sum().sort_values(ascending = False)[:10]

MOVIE
Star Wars                   228181.0
ET                          201257.0
Empire Strikes Back, The    178013.0
American Beauty             165891.0
Titanic                     165701.0
Return of the Jedi          163572.0
Million Dollar Baby         154115.0
Chicago                     146062.0
Raiders of the Lost Ark     144778.0
Forrest Gump                128534.0
Name: WEEKEND_PER_THEATER, dtype: float64

In [30]:
# Week numbers grouped by movie; the max is the last week each movie was recorded
most = movies.groupby('MOVIE')['WEEK_NUM']

In [34]:
most.max().sort_values(ascending = False)[:10]

MOVIE
ET                         52.0
Raiders of the Lost Ark    43.0
Return of the Jedi         42.0
Forrest Gump               42.0
Titanic                    41.0
American Beauty            38.0
Chicago                    36.0
Beverly Hills Cop          33.0
Shakespeare in Love        33.0
Gladiator                  33.0
Name: WEEK_NUM, dtype: float64
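
The separate `sum` and `max` lookups above can be collected into one per-movie table, which makes it easier to compare total take against staying power. The following is a minimal sketch on made-up toy rows; the helper name `summarize_by_movie` and the output column names are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for movie_weekend.csv; the values are invented for illustration.
toy = pd.DataFrame({
    "MOVIE": ["A", "A", "A", "B", "B"],
    "WEEK_NUM": [1, 2, 3, 1, 2],
    "WEEKEND_PER_THEATER": [9000.0, 6000.0, 3000.0, 4000.0, 2000.0],
})

def summarize_by_movie(df):
    """Return one row per movie with total, mean, last week, and row count."""
    return (
        df.groupby("MOVIE")
          .agg(
              total_per_theater=("WEEKEND_PER_THEATER", "sum"),
              mean_per_theater=("WEEKEND_PER_THEATER", "mean"),
              last_week=("WEEK_NUM", "max"),
              weekends_recorded=("WEEKEND_PER_THEATER", "count"),
          )
          .sort_values("total_per_theater", ascending=False)
    )

summary = summarize_by_movie(toy)
print(summary)
```

Comparing `last_week` with `weekends_recorded` shows whether a movie has gaps in its weekly rows, which matters when dividing a total by the last week number.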

In [39]:
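# Total per-theater receipts divided by the last recorded week number.
# This matches g.mean() only when a movie has exactly one row per week.
per_bo_per_weekend = g.sum() / most.max()
per_bo_per_weekend.sort_values(ascending = False)[:10]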

MOVIE
Empire Strikes Back, The    11867.533333
Star Wars                    7360.677419
Million Dollar Baby          6164.600000
Spider-Man                   5865.875000
Spider-Man 3                 5864.000000
Batman                       5604.692308
Shrek 2                      5564.000000
Shrek the Third              5366.333333
Jurassic Park                4643.809524
American Beauty              4365.552632
dtype: float64

In [57]:
g.mean().sort_values(ascending = False)[:10]

MOVIE
Empire Strikes Back, The    11867.533333
Star Wars                    7360.677419
Million Dollar Baby          6164.600000
Spider-Man                   5865.875000
Spider-Man 3                 5864.000000
Batman                       5604.692308
Shrek 2                      5564.000000
Shrek the Third              5366.333333
Jurassic Park                4643.809524
American Beauty              4365.552632
Name: WEEKEND_PER_THEATER, dtype: float64

In [51]:
movies['date'] = pd.to_datetime(movies['WEEKEND_DATE'])
movies['month'] = movies['date'].dt.month

In [52]:
movies.head()

Unnamed: 0,NUMBER,MOVIE,WEEK_NUM,WEEKEND_PER_THEATER,WEEKEND_DATE,date,month
0,1.0,A Beautiful Mind,1.0,701.0,12/21/01,2001-12-21,12.0
1,1.0,A Beautiful Mind,2.0,14820.0,12/28/01,2001-12-28,12.0
2,1.0,A Beautiful Mind,3.0,8940.0,1/4/02,2002-01-04,1.0
3,1.0,A Beautiful Mind,4.0,6850.0,1/11/02,2002-01-11,1.0
4,1.0,A Beautiful Mind,5.0,5280.0,1/18/02,2002-01-18,1.0
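
The dates in `WEEKEND_DATE` use two-digit years, and the float months in the output suggest some rows have missing dates. A hedged sketch of a stricter parse follows, assuming every populated value follows the `month/day/two-digit-year` pattern seen in the preview. With `%y`, values 69 to 99 map to 1969 to 1999 and 00 to 68 map to 2000 to 2068, which suits a dataset spanning 1977 to 2007. The sample strings are illustrative, not taken from the file.

```python
import pandas as pd

# Illustrative values in the same style as WEEKEND_DATE, plus one missing entry.
dates = pd.Series(["12/21/01", "1/4/02", "5/25/77", None])

# An explicit format avoids ambiguous inference; errors="coerce" turns
# anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(dates, format="%m/%d/%y", errors="coerce")

# The .dt accessor is vectorized, so no per-row lambda is needed.
month = parsed.dt.month
year = parsed.dt.year

print(pd.DataFrame({"raw": dates, "parsed": parsed, "month": month, "year": year}))
```

Counting `parsed.isna()` afterwards gives a quick check on how many rows would silently drop out of any month or year grouping.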


In [55]:
months = movies['WEEKEND_PER_THEATER'].groupby(movies['month']).mean()
months.sort_values(ascending = False)

month
6.0     5747.409091
5.0     5666.909091
7.0     4897.986111
12.0    4428.346535
1.0     4179.762887
11.0    3723.229167
8.0     3158.563380
2.0     2845.222222
3.0     2357.727273
9.0     2292.787611
10.0    1698.093750
4.0     1509.526316
Name: WEEKEND_PER_THEATER, dtype: float64
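
A monthly mean alone hides how many weekends sit behind each number, and a few blockbuster openings can pull a month up. As a sketch on invented toy values, reporting the median and the row count next to the mean gives a sense of how much to trust each month; the threshold `min_count` is a hypothetical cut-off chosen only for this example.

```python
import pandas as pd

# Toy month / receipts pairs; the values are invented for illustration.
toy = pd.DataFrame({
    "month": [6, 6, 4, 12, 12, 12],
    "WEEKEND_PER_THEATER": [6000.0, 5000.0, 1500.0, 4000.0, 4400.0, 4800.0],
})

by_month = (
    toy.groupby("month")["WEEKEND_PER_THEATER"]
       .agg(["mean", "median", "count"])
       .sort_values("mean", ascending=False)
)

# Hypothetical threshold: ignore months backed by fewer than min_count rows.
min_count = 2
reliable = by_month[by_month["count"] >= min_count]

print(by_month)
print(reliable)
```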

In [58]:
movies['year'] = movies['date'].dt.year
years = movies['WEEKEND_PER_THEATER'].groupby(movies['year']).mean()
years.sort_values(ascending = False)

year
1980.0    13985.500000
1990.0    10547.428571
1977.0     8236.909091
1989.0     5604.692308
1978.0     5218.777778
1993.0     5014.736842
1998.0     4717.560000
1981.0     4155.000000
1997.0     4067.222222
2002.0     4047.650602
1994.0     4039.625000
1982.0     4034.854167
1984.0     3679.818182
1983.0     3642.277778
2004.0     3631.077670
2005.0     3565.937500
1999.0     3292.270492
2003.0     3082.861702
2007.0     3074.260870
1996.0     2994.208333
2000.0     2871.819672
1985.0     2780.724138
2006.0     2737.015152
2001.0     2559.634409
1991.0     2375.652174
1995.0     1244.821429
Name: WEEKEND_PER_THEATER, dtype: float64
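
For the bonus question about holidays, one possible starting point is to flag weekends that sit next to a holiday and compare their average receipts with ordinary weekends. The sketch below is an assumption-laden example: it uses the US federal calendar shipped with pandas (`USFederalHolidayCalendar`), treats each `WEEKEND_DATE` as the Friday that starts the weekend, and counts a weekend as holiday-adjacent if a holiday falls within a hypothetical `window_days` of that date. The helper name `flag_holiday_weekends` and the receipt values are invented.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

def flag_holiday_weekends(dates, window_days=3):
    """Return True where a US federal holiday falls in [date, date + window_days]."""
    dates = pd.to_datetime(pd.Series(dates), format="%m/%d/%y")
    window = pd.Timedelta(days=window_days)
    holidays = USFederalHolidayCalendar().holidays(
        start=dates.min(), end=dates.max() + window
    )
    flags = []
    for d in dates:
        in_window = (holidays >= d) & (holidays <= d + window)
        flags.append(bool(in_window.any()))
    return pd.Series(flags, index=dates.index)

# Toy rows: Memorial Day weekend 2002, an ordinary March weekend,
# and the weekend before Martin Luther King Jr. Day 2002.
toy = pd.DataFrame({
    "WEEKEND_DATE": ["5/24/02", "3/8/02", "1/18/02"],
    "WEEKEND_PER_THEATER": [8000.0, 3000.0, 5000.0],
})
toy["holiday_weekend"] = flag_holiday_weekends(toy["WEEKEND_DATE"])
by_flag = toy.groupby("holiday_weekend")["WEEKEND_PER_THEATER"].mean()
print(by_flag)
```

Federal holidays are only one choice of calendar; school breaks or international release dates would need a different holiday list.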
