# Now You Code 4: Movie Goers Zipcode Lookup

The movie company has hired you to help them enhance their data set. They would like to know which **US State** each of the respondents in their movie goers survey comes from, and they have asked you to produce a list of states with a count of movie goers from each state.

The movie goers dataset `'NYC1-moviegoers.csv'` from NYC1 contains a `'zip_code'` column but no city or state.

We will load another dataset into pandas, **the Zipcode Database**, from here:  
`'https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv'`

This data set contains zip codes with their primary city, state, and approximate location.

Your goal is to figure out how to use the `DataFrame.merge()` method to combine these two data sets on matching zip code values.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html 
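
A minimal sketch of `DataFrame.merge()` on key columns that have different names, using tiny made-up DataFrames (`people` and `zips` are illustrative names, not part of the assignment). `left_on` and `right_on` name the key column on each side, and `how="inner"` keeps only the rows whose key appears in both tables.

```python
import pandas as pd

# Made-up survey rows: the key column is named zip_code
people = pd.DataFrame({
    "user_id": [1, 2, 3],
    "zip_code": ["13244", "10001", "00000"],
})

# Made-up lookup rows: the matching key column is named Zipcode
zips = pd.DataFrame({
    "Zipcode": ["10001", "13244"],
    "State": ["NY", "NY"],
})

# The key columns have different names, so name each side explicitly.
# how="inner" keeps only keys found in both tables, so user 3 ("00000") drops out.
merged = people.merge(zips, how="inner", left_on="zip_code", right_on="Zipcode")
print(merged)
```

Because the key names differ, pandas keeps both `zip_code` and `Zipcode` in the result, which is why a merged table built this way shows the zip code twice.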

After you merge the data sets, you can complete the task and provide a count of movie goers by state.


```python
# import pandas
import pandas as pd

# this turns off warning messages
import warnings
warnings.filterwarnings('ignore')
```

### Part 1: Load the movie goers dataset into a Pandas DataFrame

Write code to load the movie goers dataset (in csv format) into the variable `moviegoers` and then print the first few rows. 
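
A small sketch of the two usual ways to preview a DataFrame, assuming made-up CSV text in place of `NYC1-moviegoers.csv`: `head(n)` returns the first `n` rows in file order, while `sample(n)` returns `n` randomly chosen rows (pass `random_state` to make the choice repeatable).

```python
import io
import pandas as pd

# Made-up CSV text standing in for NYC1-moviegoers.csv
csv_text = """user_id,age,gender,occupation,zip_code
1,30,F,student,10001
2,41,M,engineer,94105
3,25,F,writer,60601
4,52,M,educator,30301
"""

moviegoers = pd.read_csv(io.StringIO(csv_text))

# head(n): the first n rows, in file order
first_rows = moviegoers.head(2)
# sample(n): n random rows; random_state makes the pick repeatable
random_rows = moviegoers.sample(2, random_state=0)

print(first_rows)
print(random_rows)
```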

```python
# Load the survey data and display a random sample of 10 rows
moviegoers = pd.read_csv("NYC1-moviegoers.csv")
moviegoers.sample(10)
```

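| Unnamed: 0 | user_id | age | gender | occupation | zip_code |
|---|---|---|---|---|---|
| 342 | 343 | 43 | M | engineer | 30093 |
| 779 | 780 | 49 | M | programmer | 94560 |
| 135 | 136 | 51 | M | other | 97365 |
| 136 | 137 | 50 | M | educator | 84408 |
| 431 | 432 | 22 | M | entertainment | 50311 |
| 473 | 474 | 51 | M | executive | 93711 |
| 476 | 477 | 23 | F | student | 2125 |
| 901 | 902 | 45 | F | artist | 97203 |
| 857 | 858 | 63 | M | educator | 9645 |
| 509 | 510 | 34 | M | other | 98038 |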


### Part 2: Load the zip code database into a Pandas DataFrame

Write code to load the zip code database (in csv format) into the variable `zipcodes` and then print the first few rows.

The database (in csv format) can be found here: `'https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv'`  

**HINT:** You must pass the keyword argument `dtype={'Zipcode': object}` to `read_csv()` to force the `Zipcode` column to have the same type as the `zip_code` column in the `moviegoers` DataFrame.
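
A short sketch of why the `dtype` hint matters, assuming the zip codes are stored as text on the survey side. The CSV text and the one-row `survey` DataFrame are made up for illustration. By default `read_csv()` infers an integer column (dropping leading zeros), and recent pandas versions refuse to merge a text key against an integer key.

```python
import io
import pandas as pd

# Made-up rows shaped like the Zipcode / City / State columns of the zip code database
csv_text = """Zipcode,City,State
06489,SOUTHINGTON,CT
14605,ROCHESTER,NY
"""

# Default parsing infers integers, so the leading zero of 06489 is lost
as_int = pd.read_csv(io.StringIO(csv_text))
# dtype={"Zipcode": object} keeps each value as the exact text in the file
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Zipcode": object})

print(as_int["Zipcode"].tolist())   # [6489, 14605]
print(as_text["Zipcode"].tolist())  # ['06489', '14605']

# Hypothetical respondent whose zip code is stored as text
survey = pd.DataFrame({"zip_code": ["06489"]})

# Merging a text key with an integer key raises a ValueError in recent pandas
try:
    survey.merge(as_int, left_on="zip_code", right_on="Zipcode")
    merge_error = None
except ValueError as err:
    merge_error = err

# With matching (text) types the merge succeeds
matched = survey.merge(as_text, left_on="zip_code", right_on="Zipcode")
print(matched["State"].tolist())    # ['CT']
```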

```python
# Keep Zipcode as text (object dtype) so it matches the type of moviegoers["zip_code"]
url = "https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv"
zipcodes = pd.read_csv(url, dtype={"Zipcode": object})
zipcodes.sample(10)
```

| Unnamed: 0 | Zipcode | ZipCodeType | City | State | LocationType | Lat | Long | Location | Decommisioned | TaxReturnsFiled | EstimatedPopulation | TotalWages |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42402 | 77097 | STANDARD | HOUSTON | TX | PRIMARY | 29.76 | -95.36 | NA-US-TX-HOUSTON | True |  |  |  |
| 5358 | 14605 | STANDARD | ROCHESTER | NY | PRIMARY | 43.16 | -77.61 | NA-US-NY-ROCHESTER | False | 3841.0 | 6470.0 | 61777051.0 |
| 18419 | 80836 | STANDARD | STRATTON | CO | PRIMARY | 39.3 | -102.6 | NA-US-CO-STRATTON | False | 519.0 | 925.0 | 12978098.0 |
| 2154 | 6489 | STANDARD | SOUTHINGTON | CT | PRIMARY | 41.6 | -72.88 | NA-US-CT-SOUTHINGTON | False | 16183.0 | 28496.0 | 854742647.0 |
| 32204 | 78469 | PO BOX | CORPUS CHRISTI | TX | PRIMARY | 27.8 | -97.39 | NA-US-TX-CORPUS CHRISTI | False | 429.0 | 693.0 | 15979579.0 |
| 38247 | 21531 | STANDARD | FRIENDSVILLE | MD | PRIMARY | 39.66 | -79.4 | NA-US-MD-FRIENDSVILLE | False | 1004.0 | 1787.0 | 27624499.0 |
| 32672 | 77982 | PO BOX | PORT O CONNOR | TX | PRIMARY | 28.44 | -96.4 | NA-US-TX-PORT O CONNOR | False | 513.0 | 909.0 | 14980762.0 |
| 31312 | 76466 | PO BOX | OLDEN | TX | PRIMARY | 32.42 | -98.73 | NA-US-TX-OLDEN | False |  |  |  |
| 8349 | 25854 | STANDARD | HICO | WV | PRIMARY | 38.11 | -80.94 | NA-US-WV-HICO | False | 446.0 | 821.0 | 14043779.0 |
| 40545 | 29209 | STANDARD | COLUMBIA | SC | PRIMARY | 34.0 | -81.03 | NA-US-SC-COLUMBIA | False | 15114.0 | 25598.0 | 558351530.0 |


### Part 3: Merge both data sets into a single combined DataFrame

Next we must merge the `moviegoers` DataFrame with the `zipcodes` DataFrame. To do this you must specify which zip code column from `moviegoers` matches the zip code column from `zipcodes`, since they have different names (`zip_code` and `Zipcode`).

```
Help on method merge in module pandas.core.frame:

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False) method of pandas.core.frame.DataFrame instance
    Merge DataFrame objects by performing a database-style join operation by
    columns or indexes.
```

We will do an `inner` merge because we only want rows where the zip codes match in both DataFrames. This is called an *intersection*.

To complete the merge, we must specify the key column names from the left and right DataFrames. Most of the code has been written for you. Your task is to complete the columns for the merge, replacing `????` with the appropriate column names.

```python
# Inner merge: keep only respondents whose zip_code matches a Zipcode in the lookup
result = pd.merge(moviegoers, zipcodes, how="inner", left_on="zip_code", right_on="Zipcode")
result.sample(10)
```

| Unnamed: 0 | user_id | age | gender | occupation | zip_code | Zipcode | ZipCodeType | City | State | LocationType | Lat | Long | Location | Decommisioned | TaxReturnsFiled | EstimatedPopulation | TotalWages |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 416 | 401 | 46 | F | healthcare | 84107 | 84107 | STANDARD | SALT LAKE CITY | UT | PRIMARY | 40.65 | -111.88 | NA-US-UT-SALT LAKE CITY | False | 14631.0 | 24075.0 | 488226100.0 |
| 219 | 185 | 53 | F | librarian | 97403 | 97403 | STANDARD | EUGENE | OR | PRIMARY | 44.03 | -123.05 | NA-US-OR-EUGENE | False | 3668.0 | 5658.0 | 136848700.0 |
| 102 | 640 | 20 | M | student | 61801 | 61801 | STANDARD | URBANA | IL | PRIMARY | 40.1 | -88.2 | NA-US-IL-URBANA | False | 9801.0 | 14473.0 | 282794000.0 |
| 704 | 759 | 20 | F | student | 68503 | 68503 | STANDARD | LINCOLN | NE | PRIMARY | 40.81 | -96.68 | NA-US-NE-LINCOLN | False | 6301.0 | 9617.0 | 144304600.0 |
| 161 | 134 | 31 | M | programmer | 80236 | 80236 | STANDARD | DENVER | CO | PRIMARY | 39.76 | -104.87 | NA-US-CO-DENVER | False | 6910.0 | 11942.0 | 226626500.0 |
| 187 | 263 | 41 | M | programmer | 55346 | 55346 | STANDARD | EDEN PRAIRIE | MN | PRIMARY | 44.84 | -93.45 | NA-US-MN-EDEN PRAIRIE | False | 8543.0 | 15754.0 | 603815400.0 |
| 382 | 363 | 20 | M | student | 87501 | 87501 | STANDARD | SANTA FE | NM | PRIMARY | 35.67 | -105.95 | NA-US-NM-SANTA FE | False | 8454.0 | 12352.0 | 281913400.0 |
| 170 | 146 | 45 | M | artist | 83814 | 83814 | STANDARD | COEUR D ALENE | ID | PRIMARY | 47.59 | -116.91 | NA-US-ID-COEUR D ALENE | False | 10311.0 | 16915.0 | 316216500.0 |
| 291 | 264 | 36 | F | writer | 90064 | 90064 | STANDARD | LOS ANGELES | CA | PRIMARY | 34.03 | -118.43 | NA-US-CA-LOS ANGELES | False | 13940.0 | 22483.0 | 1105519000.0 |
| 218 | 184 | 37 | M | librarian | 76013 | 76013 | STANDARD | ARLINGTON | TX | PRIMARY | 32.69 | -97.12 | NA-US-TX-ARLINGTON | False | 14338.0 | 23891.0 | 522424400.0 |
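
An inner merge silently drops any respondent whose zip code has no match in the lookup (for example a mistyped or non-US code), so the per-state counts can add up to fewer than the number of survey rows. A sketch for finding those rows, using made-up data: `how="left"` keeps every respondent and `indicator=True` adds a `_merge` column marking each row as `"both"` or `"left_only"`.

```python
import pandas as pd

# Made-up respondents; "T8H1N" stands in for a code with no US zip code match
moviegoers = pd.DataFrame({
    "user_id": [1, 2, 3],
    "zip_code": ["14605", "29209", "T8H1N"],
})
zipcodes = pd.DataFrame({
    "Zipcode": ["14605", "29209"],
    "State": ["NY", "SC"],
})

# Inner merge: only the 2 respondents whose zip code matched survive
inner = moviegoers.merge(zipcodes, how="inner", left_on="zip_code", right_on="Zipcode")

# Left merge with indicator=True: every respondent is kept and labeled
check = moviegoers.merge(zipcodes, how="left", left_on="zip_code",
                         right_on="Zipcode", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "zip_code"].tolist()

print(len(inner), len(check))  # 2 3
print(unmatched)               # ['T8H1N']
```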


### Part 4: Count movie goers by state

Finally, produce the desired output: a list of states with the count of movie goers from the survey in each state.

Here's the top 5 for reference:

```
CA    116
MN     78
NY     60
TX     51
IL     50
```
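
A minimal sketch of the counting step on a made-up merged result: `Series.value_counts()` tallies each distinct state and sorts the counts from most to fewest, and `head(5)` on that result would give a top-5 listing like the one above. `groupby(...).size()` is shown as an alternative that yields the same counts ordered by state code.

```python
import pandas as pd

# Made-up merged result: one row per respondent, with a State column
result = pd.DataFrame({"State": ["CA", "NY", "CA", "TX", "CA", "NY"]})

# value_counts(): count per distinct state, sorted by count (descending)
counts = result["State"].value_counts()
print(counts.head(2))  # the top 2 states

# groupby(...).size(): the same counts, ordered by state code instead
by_state = result.groupby("State").size()
print(by_state)
```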

```python
# Count the merged rows per state, sorted from most to fewest
result["State"].value_counts()
```

```
CA    116
MN     78
NY     60
TX     51
IL     50
PA     34
OH     32
VA     27
MD     27
FL     24
WA     24
MI     23
WI     22
OR     20
CO     20
GA     19
NC     19
MO     17
AZ     14
DC     14
IA     14
TN     12
KY     11
SC     11
IN      9
UT      9
OK      9
ID      7
LA      6
NE      6
AK      5
KS      4
WV      3
MS      3
AL      3
DE      3
NV      3
NM      2
MT      2
HI      2
ND      2
AR      1
WY      1
SD      1
AP      1
Name: State, dtype: int64
```

### Part 5: Questions

1. Pandas programs are different from typical Python programs. Explain the process by which you arrived at the final solution.

    a. The final solution comes from pandas reading the data sets and organizing their information into columns and rows, which makes it an easier and cleaner way of reading a file.

2. What was the most difficult aspect of this assignment?

    a. Coding specific parts, and the little details that cause name, key, and value errors.

## Reminder of Evaluation Criteria

1. Was the problem attempted (analysis, code, and answered questions)?
2. Was the problem analysis thought out? (Does the program match the plan?)
3. Does the code execute without syntax error?
4. Does the code solve the intended problem?
5. Is the code well written? (easy to understand, modular, and self-documenting, handles errors)
