## Reading DataFrames from multiple files
When data is spread among several files, you usually invoke pandas' `read_csv()` (or a similar data import function) multiple times to load the data into several DataFrames.

The data files for this example have been derived from a list of Olympic medals awarded between 1896 & 2008 compiled by the Guardian.

In these files, each DataFrame has the columns `NOC` and `Total`, where `NOC` is a three-letter code for the country and `Total` is the number of medals of that type (bronze, silver, or gold) the country has won.

In [2]:
import numpy as np
import pandas as pd

In [2]:
# Import pandas
import pandas as pd

# Read 'Bronze.csv' into a DataFrame: bronze
bronze = pd.read_csv('../data/23. Uniendo datafarmes/Bronze.csv', sep=';')

# Read 'Silver.csv' into a DataFrame: silver
silver = pd.read_csv('../data/23. Uniendo datafarmes/Silver.csv', sep=';')

# Read 'Gold.csv' into a DataFrame: gold
gold = pd.read_csv('../data/23. Uniendo datafarmes/Gold.csv', sep=';')

# Print the first five rows of gold
gold.head()

| | NOC | Total |
|---|---|---|
| 0 | USA | 930 |
| 1 | | 395 |
| 2 | GER | 247 |
| 3 | GBR | 207 |
| 4 | FRA | 192 |


## Reading DataFrames from multiple files in a loop

Notice that this approach is not restricted to working with CSV files. That is, even if your data comes in other formats, as long as pandas has a suitable data import function, you can apply a loop or comprehension to generate a list of DataFrames imported from the source files.

In [4]:
# Create the list of file names: filenames
path = '../data/23. Uniendo datafarmes/'
filenames = ['Gold.csv', 'Silver.csv', 'Bronze.csv']

# Create the list of three DataFrames: dataframes
dataframes = []
for filename in filenames:
    dataframes.append(pd.read_csv(path+filename, sep=';'))

# Print top 5 rows of 1st DataFrame in dataframes
dataframes[0].head()

| | NOC | Total |
|---|---|---|
| 0 | USA | 930 |
| 1 | | 395 |
| 2 | GER | 247 |
| 3 | GBR | 207 |
| 4 | FRA | 192 |
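The loop in the cell above can also be written as a list comprehension. This is a minimal sketch that assumes the file contents are available in memory: the `sources` dict and its CSV strings are hypothetical stand-ins for `Gold.csv`, `Silver.csv`, and `Bronze.csv`, read through `io.StringIO` so the example runs without the data folder. With real files you would pass `path + filename` to `pd.read_csv()`, or swap in another reader such as `pd.read_excel()` for other formats.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for Gold.csv, Silver.csv and Bronze.csv,
# using the same ';' separator and NOC/Total columns as the real files
sources = {
    'Gold.csv': 'NOC;Total\nUSA;930\nGER;247\n',
    'Silver.csv': 'NOC;Total\nUSA;728\nGER;284\n',
    'Bronze.csv': 'NOC;Total\nUSA;639\nGER;320\n',
}

# One read_csv() call per source, collected with a list comprehension
dataframes = [pd.read_csv(io.StringIO(text), sep=';') for text in sources.values()]

print(len(dataframes))
print(dataframes[0].head())
```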


## Combining DataFrames from multiple data files
In this exercise, you'll combine the three DataFrames loaded earlier (`gold`, `silver`, and `bronze`) into a single DataFrame called `medals`.

The approach you'll use here is clumsy. Later on in the course, you'll see various powerful methods that are frequently used in practice for concatenating or merging DataFrames.

In [5]:
# Make a copy of gold: medals
medals = gold.copy()

# Create list of new column labels: new_labels
new_labels = ['NOC', 'Gold']

# Rename the columns of medals using new_labels
medals.columns = new_labels

# Add columns 'Silver' & 'Bronze' to medals
medals['Silver'] = silver['Total']
medals['Bronze'] = bronze['Total']

# Print the head of medals
medals.head()

| | NOC | Gold | Silver | Bronze |
|---|---|---|---|---|
| 0 | USA | 930 | 728 | 639 |
| 1 | | 395 | 319 | 296 |
| 2 | GER | 247 | 284 | 320 |
| 3 | GBR | 207 | 255 | 252 |
| 4 | FRA | 192 | 212 | 234 |
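The assignment `medals['Silver'] = silver['Total']` aligns on the row index (0, 1, 2, ...), not on `NOC`, so it only gives correct results if all three files list the countries in the same order, as they appear to here. A minimal sketch with hypothetical, deliberately reordered data shows what goes wrong otherwise, and how making `NOC` the index lets pandas align the values by country:

```python
import pandas as pd

# Hypothetical medal tables that list the same countries in different row orders
gold = pd.DataFrame({'NOC': ['USA', 'GER', 'GBR'], 'Total': [930, 247, 207]})
silver = pd.DataFrame({'NOC': ['GER', 'USA', 'GBR'], 'Total': [284, 728, 255]})

# Positional approach (as in the cell above): values are matched on the 0, 1, 2 index
naive = gold.rename(columns={'Total': 'Gold'})
naive['Silver'] = silver['Total']  # USA wrongly receives GER's 284

# Safer approach: make NOC the index so the assignment aligns by country code
safe = gold.set_index('NOC').rename(columns={'Total': 'Gold'})
safe['Silver'] = silver.set_index('NOC')['Total']

print(naive)
print(safe)
```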


## Appending pandas Series
In this exercise, you'll load sales data for January, February, and March into DataFrames. Then you'll extract the 'Units' column from each as a Series and stack the three Series with chained calls to `.append()`. Note that `Series.append()` was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use `pd.concat()` instead (covered in the next section).

To check that the stacking worked, you'll print slices of the combined Series. Summing the result with `.sum()` then gives the total number of units sold in the first quarter.
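A minimal sketch of the stacking and the quarterly total, using small hypothetical Series (invented dates, a few unit counts) in place of the sales files. It uses `pd.concat()`, which works on both older and current pandas versions, rather than chained `.append()` calls:

```python
import pandas as pd

# Hypothetical monthly 'Units' Series indexed by sale date
jan_units = pd.Series([11, 8], index=pd.to_datetime(['2015-01-21', '2015-01-09']), name='Units')
feb_units = pd.Series([4, 1], index=pd.to_datetime(['2015-02-26', '2015-02-27']), name='Units')
mar_units = pd.Series([17], index=pd.to_datetime(['2015-03-06']), name='Units')

# Stack the Series end to end, keeping the original date labels
quarter1 = pd.concat([jan_units, feb_units, mar_units])

# Same stacking, but with a fresh 0..n-1 index (like ignore_index=True with .append())
quarter1_reset = pd.concat([jan_units, feb_units, mar_units], ignore_index=True)

# Total units sold in the first quarter
total_units = quarter1.sum()
print(total_units)
```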

In [45]:
# Load 'sales-jan-2015.csv' into a DataFrame: jan
jan = pd.read_csv('../data/23. Uniendo datafarmes/sales-jan-2015.csv', parse_dates=True, index_col='Date')

# Load 'sales-feb-2015.csv' into a DataFrame: feb
feb = pd.read_csv('../data/23. Uniendo datafarmes/sales-feb-2015.csv', parse_dates=True, index_col='Date')

# Load 'sales-mar-2015.csv' into a DataFrame: mar
mar = pd.read_csv('../data/23. Uniendo datafarmes/sales-mar-2015.csv', parse_dates=True, index_col='Date')

In [46]:
jan.head()

| Date | Company | Product | Units |
|---|---|---|---|
| 2015-01-21 19:13:21 | Streeplex | Hardware | 11 |
| 2015-01-09 05:23:51 | Streeplex | Service | 8 |
| 2015-01-06 17:19:34 | Initech | Hardware | 17 |
| 2015-01-02 09:51:06 | Hooli | Hardware | 16 |
| 2015-01-11 14:51:02 | Hooli | Hardware | 11 |


In [10]:

# Extract the 'Units' column from jan: jan_units
jan_units = jan['Units']

# Extract the 'Units' column from feb: feb_units
feb_units = feb['Units']

# Extract the 'Units' column from mar: mar_units
mar_units = mar['Units']

# Append feb_units and then mar_units to jan_units: quarter1
quarter1 = jan_units.append(feb_units).append(mar_units)

quarter1.head()

Date
2015-01-21 19:13:21    11
2015-01-09 05:23:51     8
2015-01-06 17:19:34    17
2015-01-02 09:51:06    16
2015-01-11 14:51:02    11
Name: Units, dtype: int64

In [11]:
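# Slice quarter1 from Feb 26 to Mar 7, 2015 using partial date strings.
# The index is not sorted; recent pandas versions may raise a KeyError for such
# slices unless the index is sorted first (quarter1.sort_index().loc[...]).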
quarter1.loc['feb 26, 2015':'mar 7, 2015']

Date
2015-02-26 08:57:45     4
2015-02-26 08:58:51     1
2015-03-06 10:11:45    17
2015-03-06 02:03:56    17
Name: Units, dtype: int64

In [14]:
quarter2 = jan_units.append(feb_units, ignore_index=True).append(mar_units, ignore_index=True)
quarter2.head()

0    11
1     8
2    17
3    16
4    11
Name: Units, dtype: int64

## Concatenating pandas Series along row axis
Having learned how to append Series, you'll now learn how to achieve the same result by concatenating Series instead. You'll continue to work with the sales data from the previous section: the DataFrames `jan`, `feb`, and `mar` loaded above are reused here.

Your job is to use `pd.concat()` with a list of Series to achieve the same result that you would get by chaining calls to `.append()`.

You may be wondering about the difference between `pd.concat()` and pandas' `.append()` method. One way to think of the difference is that `.append()` is a specific case of a concatenation, while `pd.concat()` gives you more flexibility, as you'll see in later exercises.

In [12]:
# Initialize empty list: units
units = []

# Build the list of Series
for month in [jan, feb, mar]:
    units.append(month['Units'])

# Concatenate the list of Series: quarter1
quarter1 = pd.concat(units)

# Print slices from quarter1
quarter1.loc['feb 26, 2015':'mar 7, 2015']

Date
2015-02-26 08:57:45     4
2015-02-26 08:58:51     1
2015-03-06 10:11:45    17
2015-03-06 02:03:56    17
Name: Units, dtype: int64


## Concatenating pandas DataFrames along column axis
The function `pd.concat()` can concatenate DataFrames horizontally as well as vertically (vertical is the default). To make the DataFrames stack horizontally, you have to specify the keyword argument `axis=1` or `axis='columns'`.

In this exercise, the maximum and mean daily temperatures are taken from the same weather DataFrame, so the two Series share an index and line up row for row. When the inputs are sampled at different rates (for example, quarterly versus monthly), `pd.concat()` aligns them on their index and inserts null values wherever a row is missing from the coarser one. This corresponds to an outer join (which you will explore in more detail in later exercises).
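To see the outer-join behaviour with inputs that really are sampled at different rates, here is a minimal sketch with hypothetical temperature values: a monthly Series and a quarterly one built with `pd.date_range()`. The names `monthly_mean` and `quarterly_max` are illustrative and do not come from the weather file:

```python
import pandas as pd

# Hypothetical data: a monthly Series and a coarser quarterly Series
months = pd.date_range('2013-01-01', periods=6, freq='MS')
monthly_mean = pd.Series([28, 30, 41, 52, 63, 72], index=months, name='Mean TemperatureF')
quarterly_max = pd.Series([45, 78], index=months[[0, 3]], name='Max TemperatureF')

# axis='columns' aligns both Series on their DatetimeIndex (an outer join):
# months missing from the quarterly Series are filled with NaN
combined = pd.concat([quarterly_max, monthly_mean], axis='columns')
print(combined)
```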

In [22]:
weather = pd.read_csv('../data/23. Uniendo datafarmes/monthly_max_temp.csv')
weather

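| | Date | Max TemperatureF | Mean TemperatureF | Min TemperatureF | Max Dew PointF | MeanDew PointF | Min DewpointF | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityMiles | Mean VisibilityMiles | Min VisibilityMiles | Max Wind SpeedMPH | Mean Wind SpeedMPH | Max Gust SpeedMPH | PrecipitationIn | CloudCover | Events | WindDirDegrees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-1-1 | 32 | 28 | 21 | 30 | 27 | 16 | 100 | 89 | 77 | ... | 10 | 6 | 2 | 10 | 8 | | 0.00 | 8 | Snow | 277 |
| 1 | 2013-1-2 | 25 | 21 | 17 | 14 | 12 | 10 | 77 | 67 | 55 | ... | 10 | 10 | 10 | 14 | 5 | | 0.00 | 4 | | 272 |
| 2 | 2013-1-3 | 32 | 24 | 16 | 19 | 15 | 9 | 77 | 67 | 56 | ... | 10 | 10 | 10 | 17 | 8 | 26.0 | 0.00 | 3 | | 229 |
| 3 | 2013-1-4 | 30 | 28 | 27 | 21 | 19 | 17 | 75 | 68 | 59 | ... | 10 | 10 | 6 | 23 | 16 | 32.0 | 0.00 | 4 | | 250 |
| 4 | 2013-1-5 | 34 | 30 | 25 | 23 | 20 | 16 | 75 | 68 | 61 | ... | 10 | 10 | 10 | 16 | 10 | 23.0 | 0.21 | 5 | | 221 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 360 | 2013-12-27 | 41 | 34 | 27 | 25 | 20 | 14 | 66 | 56 | 48 | ... | 10 | 10 | 10 | 12 | 8 | | 0.00 | 4 | | 207 |
| 361 | 2013-12-28 | 52 | 43 | 34 | 27 | 24 | 19 | 56 | 47 | 38 | ... | 10 | 10 | 10 | 16 | 10 | 20.0 | 0.00 | 0 | | 197 |
| 362 | 2013-12-29 | 44 | 42 | 39 | 40 | 37 | 23 | 100 | 87 | 45 | ... | 10 | 5 | 0 | 15 | 8 | 21.0 | 0.66 | 8 | Fog-Rain | 301 |
| 363 | 2013-12-30 | 41 | 32 | 23 | 37 | 21 | 14 | 87 | 76 | 64 | ... | 10 | 8 | 1 | 16 | 9 | 26.0 | 0.01 | 8 | Rain-Snow | 320 |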


In [24]:
weather_max = weather['Max TemperatureF']
weather_mean = weather['Mean TemperatureF']

# Create a list of weather_max and weather_mean
weather_list = [weather_max, weather_mean]

# Concatenate weather_list horizontally
weather1 = pd.concat(weather_list, axis='columns')  # same result as weather.loc[:, ['Max TemperatureF', 'Mean TemperatureF']]
weather1

| | Max TemperatureF | Mean TemperatureF |
|---|---|---|
| 0 | 32 | 28 |
| 1 | 25 | 21 |
| 2 | 32 | 24 |
| 3 | 30 | 28 |
| 4 | 34 | 30 |
| ... | ... | ... |
| 360 | 41 | 34 |
| 361 | 52 | 43 |
| 362 | 44 | 42 |
| 363 | 41 | 32 |


## Reading multiple files to build a DataFrame
It is often convenient to build a large DataFrame by parsing many files as DataFrames and concatenating them all at once. You'll do this here with three files, but, in principle, this approach can be used to combine data from dozens or hundreds of files.

In [31]:
medal_types = ['bronze', 'silver', 'gold']
path = '../data/23. Uniendo datafarmes/'

# Initialize an empty list: medals
medals = []

for medal in medal_types:
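    # Create the file name, e.g. 'Bronze.csv' (capitalized to match the file
    # names used earlier, which matters on case-sensitive file systems)
    file_name = path + '%s.csv' % medal.capitalize()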
    # Create list of column names: columns
    columns = ['Country', medal]
    # Read file_name into a DataFrame: medal_df
    medal_df = pd.read_csv(file_name, header=0, index_col='Country', names=columns, sep=';')
    # Append medal_df to medals
    medals.append(medal_df)

In [33]:
# Concatenate medals horizontally: medals_df
medals_df = pd.concat(medals, axis='columns')
medals_df.head()

| Country | bronze | silver | gold |
|---|---|---|---|
| USA | 639 | 728 | 930 |
| | 296 | 319 | 395 |
| GER | 320 | 284 | 247 |
| GBR | 252 | 255 | 207 |
| FRA | 234 | 212 | 192 |


## Concatenating vertically to get MultiIndexed rows
When stacking a sequence of DataFrames vertically, it is sometimes desirable to construct a MultiIndex that records which DataFrame each row came from. You can do this by passing the `keys` parameter to `pd.concat()`, which builds a hierarchical index with the labels from `keys` as the outermost level. Because the keys already identify the source of each row, you don't have to rename the columns of each DataFrame as you load it; you only need to set the index column (here `NOC`).
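A minimal sketch of the index that `keys` produces, using hypothetical two-country versions of the medal tables indexed by `NOC`. The outer level can be selected with `.loc`, and the inner level with `.xs()`:

```python
import pandas as pd

# Hypothetical medal tables indexed by NOC, as produced by index_col='NOC'
noc = pd.Index(['USA', 'GER'], name='NOC')
gold = pd.DataFrame({'Total': [930, 247]}, index=noc)
silver = pd.DataFrame({'Total': [728, 284]}, index=noc)
bronze = pd.DataFrame({'Total': [639, 320]}, index=noc)

# keys become the outer level of a two-level row index
medals_df = pd.concat([gold, silver, bronze], keys=['gold', 'silver', 'bronze'])

# All rows that came from one source DataFrame
print(medals_df.loc['silver'])

# One country across all sources, selected on the inner 'NOC' level
print(medals_df.xs('GER', level='NOC'))
```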

In [39]:
medal_types = ['Gold', 'Silver', 'Bronze']
medals = []
for medal in medal_types:

    file_name = "../data/23. Uniendo datafarmes/%s.csv" % medal
    
    # Read file_name into a DataFrame: medal_df
    medal_df = pd.read_csv(file_name, sep=';', index_col='NOC')
    
    # Append medal_df to medals
    medals.append(medal_df)

In [40]:
# Concatenate medals, labelling each block with its medal type: medals_df
medals_df = pd.concat(medals, keys=['gold', 'silver', 'bronze'])
medals_df.head()

| | NOC | Total |
|---|---|---|
| gold | USA | 930 |
| gold | | 395 |
| gold | GER | 247 |
| gold | GBR | 207 |
| gold | FRA | 192 |


## Concatenating horizontally to get MultiIndexed columns
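It is also possible to construct a DataFrame with hierarchically indexed columns: pass `keys` together with `axis=1` (or `axis='columns'`), and the keys become the outer level of the column index.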
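As a minimal sketch with hypothetical two-country tables, the keys end up as the outer level of the columns, so `wide['silver']` selects one block and `.xs(..., axis=1, level=1)` selects across the inner level:

```python
import pandas as pd

# Hypothetical medal tables sharing the same NOC index
noc = pd.Index(['USA', 'GER'], name='NOC')
gold = pd.DataFrame({'Total': [930, 247]}, index=noc)
silver = pd.DataFrame({'Total': [728, 284]}, index=noc)

# With axis=1 the keys become the outer level of a column MultiIndex
wide = pd.concat([gold, silver], keys=['gold', 'silver'], axis=1)
print(wide.columns.tolist())  # [('gold', 'Total'), ('silver', 'Total')]

# Select one outer key, or cross-section the inner 'Total' level
print(wide['silver'])
print(wide.xs('Total', axis=1, level=1))
```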

In [41]:
medals_wide = pd.concat(medals, keys=['gold', 'silver', 'bronze'], axis=1)
medals_wide

| NOC | (gold, Total) | (silver, Total) | (bronze, Total) |
|---|---|---|---|
| USA | 930 | 728 | 639 |
| | 395 | 319 | 296 |
| GER | 247 | 284 | 320 |
| GBR | 207 | 255 | 252 |
| FRA | 192 | 212 | 234 |
| ... | ... | ... | ... |
| SUD | 0 | 1 | 0 |
| TOG | 0 | 0 | 1 |
| TGA | 0 | 1 | 0 |
| UAE | 1 | 0 | 0 |


## Concatenating DataFrames from a dict

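`pd.concat()` also accepts a dictionary of DataFrames; the dictionary keys then play the role of the `keys` argument and become the outermost index level. You'll do this here by grouping each month's sales by company, storing the results in a dictionary keyed by month name, and concatenating them.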
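A minimal sketch of concatenating a dict, using hypothetical per-company unit totals for two months:

```python
import pandas as pd

# Hypothetical per-company unit totals for two months
company = pd.Index(['Hooli', 'Initech'], name='Company')
month_dict = {
    'january': pd.DataFrame({'Units': [70, 37]}, index=company),
    'february': pd.DataFrame({'Units': [30, 45]}, index=company),
}

# The dict keys become the outer index level, as if passed via keys=
sales = pd.concat(month_dict)
print(sales)
print(sales.loc['february'])
```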

In [43]:
# Make the list of tuples: month_list
month_list = [('january', jan), ('february', feb), ('march', mar)]

# Create an empty dictionary: month_dict
month_dict = {}

for month_name, month_data in month_list:

    # Sum the Units column per company (selecting it avoids summing text columns): month_dict[month_name]
    month_dict[month_name] = month_data.groupby('Company')[['Units']].sum()

# Concatenate data in month_dict: sales
sales = pd.concat(month_dict)

In [44]:
sales.head()

| | Company | Units |
|---|---|---|
| january | Acme Coporation | 76 |
| january | Hooli | 70 |
| january | Initech | 37 |
| january | Mediacore | 15 |
| january | Streeplex | 50 |


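## Concatenating DataFrames with inner join

By default, `pd.concat()` performs an outer join on the axis it is not stacking along, keeping every label that appears in any input. Passing `join='inner'` keeps only the labels shared by all inputs. Here, the three medal DataFrames are placed side by side with `axis=1`, and `keys` labels each block. Note that `bronze`, `silver`, and `gold` were read with the default integer index, so their rows are aligned by position rather than by country; reading them with `index_col='NOC'` would align them by country instead.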
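A minimal sketch of the difference between the two join types, using hypothetical, simplified medal tables indexed by `NOC` in which `TOG` appears only in `bronze` and `UAE` only in `gold`:

```python
import pandas as pd

# Hypothetical, simplified medal tables indexed by NOC
bronze = pd.DataFrame({'Total': [639, 320, 1]}, index=pd.Index(['USA', 'GER', 'TOG'], name='NOC'))
gold = pd.DataFrame({'Total': [930, 247, 1]}, index=pd.Index(['USA', 'GER', 'UAE'], name='NOC'))

# Default join='outer': union of the row labels, NaN where a table has no row
outer = pd.concat([bronze, gold], keys=['bronze', 'gold'], axis=1)

# join='inner': only the row labels present in every table
inner = pd.concat([bronze, gold], keys=['bronze', 'gold'], axis=1, join='inner')

print(outer)
print(inner)
```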

In [48]:
# Create the list of DataFrames: medal_list
medal_list = [bronze, silver, gold]

# Concatenate medal_list horizontally using an inner join: medals
medals = pd.concat(medal_list, join='inner',keys=['bronze', 'silver', 'gold'], axis=1)
medals.head()

| | (bronze, NOC) | (bronze, Total) | (silver, NOC) | (silver, Total) | (gold, NOC) | (gold, Total) |
|---|---|---|---|---|---|---|
| 0 | USA | 639 | USA | 728 | USA | 930 |
| 1 | | 296 | | 319 | | 395 |
| 2 | GER | 320 | GER | 284 | GER | 247 |
| 3 | GBR | 252 | GBR | 255 | GBR | 207 |
| 4 | FRA | 234 | FRA | 212 | FRA | 192 |


# JOINs

<img src= '../images/joins.jpg'>

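Understanding the difference between a one-to-one and a one-to-many relationship is a useful skill. In a one-to-one merge, each key value appears at most once in each table, so every row matches at most one row on the other side. In a one-to-many merge, a key value can appear several times in one of the tables, so the matching row from the other table is repeated once for each match.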
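A minimal sketch of a one-to-many merge, using hypothetical `branches` and `sales` tables. The `validate` argument of `pd.merge()` checks the expected relationship and raises `pandas.errors.MergeError` when the data violates it:

```python
import pandas as pd

# Hypothetical tables: one row per branch vs. several sales rows per branch
branches = pd.DataFrame({'branch_id': [10, 20], 'city': ['Austin', 'Denver']})
sales = pd.DataFrame({'branch_id': [10, 10, 20], 'units': [2, 3, 4]})

# One-to-many: the Austin row is repeated once for each of its two sales rows
merged = pd.merge(branches, sales, on='branch_id', validate='one_to_many')
print(merged)

# Asking pandas to check for one-to-one fails, because branch_id 10 repeats in sales
raised = False
try:
    pd.merge(branches, sales, on='branch_id', validate='one_to_one')
except pd.errors.MergeError as err:
    raised = True
    print('MergeError:', err)
```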

## Merging on a specific column

Let's merge DataFrames using the `pd.merge()` function with the `on=` parameter. It merges DataFrame or named Series objects with a database-style join.

The join is done on columns or indexes. If you join columns on columns, the DataFrame indexes are ignored. Otherwise, if you join indexes on indexes or indexes on a column or columns, the index is passed on.

By default, `pd.merge()` computes an inner join (`how='inner'`), which returns only the rows with matching key values in both tables.
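To illustrate the index rules above, here is a minimal sketch using hypothetical versions of the `revenue` and `managers` tables, indexed by `branch_id`. An index-on-index merge uses `left_index=True` and `right_index=True`:

```python
import pandas as pd

# Hypothetical versions of revenue and managers, indexed by branch_id
revenue = pd.DataFrame({'revenue': [100, 83, 4]}, index=pd.Index([10, 20, 31], name='branch_id'))
managers = pd.DataFrame({'manager': ['Charlers', 'Joel', 'Brett']}, index=pd.Index([10, 20, 47], name='branch_id'))

# Index on index: the matching branch_id labels are kept as the result's index
by_index = pd.merge(revenue, managers, left_index=True, right_index=True)
print(by_index)

# Column on column: the original indexes are ignored and a new 0..n-1 index is built
by_column = pd.merge(revenue.reset_index(), managers.reset_index(), on='branch_id')
print(by_column)
```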

In [55]:
revenue = pd.DataFrame({'city':['Austin','Denver','Springfield'], 'revenue':[100, 83, 4], 'branch_id':[10, 20, 31]})
revenue

| | city | revenue | branch_id |
|---|---|---|---|
| 0 | Austin | 100 | 10 |
| 1 | Denver | 83 | 20 |
| 2 | Springfield | 4 | 31 |


In [54]:
managers = pd.DataFrame({'city':['Austin','Denver','Mendocino'], 'manager':['Charlers','Joel','Brett'], 'branch_id':[10, 20, 47]})
managers

| | city | manager | branch_id |
|---|---|---|---|
| 0 | Austin | Charlers | 10 |
| 1 | Denver | Joel | 20 |
| 2 | Mendocino | Brett | 47 |


In [56]:
merge_by_city = pd.merge(revenue, managers, on='city')
merge_by_city

| | city | revenue | branch_id_x | manager | branch_id_y |
|---|---|---|---|---|---|
| 0 | Austin | 100 | 10 | Charlers | 10 |
| 1 | Denver | 83 | 20 | Joel | 20 |


In [58]:
merge_by_id = pd.merge(revenue, managers, on='branch_id')
merge_by_id

| | city_x | revenue | branch_id | city_y | manager |
|---|---|---|---|---|---|
| 0 | Austin | 100 | 10 | Austin | Charlers |
| 1 | Denver | 83 | 20 | Denver | Joel |


## Merging on columns with non-matching labels

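In `managers`, the city name is now stored in a column called `branch`, while `revenue` still calls it `city`. Because the key columns have different labels, merging on the city/branch name takes a bit more work: you have to specify the `left_on` and `right_on` parameters in the call to `pd.merge()`.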
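Because `left_on`/`right_on` keep both key columns in the result, you often drop one of them afterwards. A minimal sketch with trimmed-down versions of the two DataFrames (without `branch_id`), also showing the alternative of renaming the key column before merging:

```python
import pandas as pd

# Trimmed-down versions of managers and revenue
managers = pd.DataFrame({'branch': ['Austin', 'Denver', 'Mendocino'], 'manager': ['Charlers', 'Joel', 'Brett']})
revenue = pd.DataFrame({'city': ['Austin', 'Denver', 'Springfield'], 'revenue': [100, 83, 4]})

# left_on/right_on keep both key columns, so drop the redundant 'branch' afterwards
combined = pd.merge(revenue, managers, left_on='city', right_on='branch').drop(columns='branch')

# Alternative: rename the key first, then merge with on=
renamed = pd.merge(revenue, managers.rename(columns={'branch': 'city'}), on='city')
print(combined)
print(renamed)
```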

In [59]:
managers = pd.DataFrame({'branch':['Austin','Denver','Mendocino'], 'manager':['Charlers','Joel','Brett'], 'branch_id':[10, 20, 47]})
revenue = pd.DataFrame({'city':['Austin','Denver','Springfield'], 'revenue':[100, 83, 4], 'branch_id':[10, 20, 31]})

In [60]:
combined = pd.merge(revenue, managers, left_on='city', right_on='branch')
combined

| | city | revenue | branch_id_x | branch | manager | branch_id_y |
|---|---|---|---|---|---|---|
| 0 | Austin | 100 | 10 | Austin | Charlers | 10 |
| 1 | Denver | 83 | 20 | Denver | Joel | 20 |


## Merging on multiple columns

Your goal in this exercise is to use `pd.merge()` to merge DataFrames using multiple columns (using `'branch_id'`, `'city'`, and `'state'` in this case).

In [66]:
revenue = pd.DataFrame({'city':['Austin','Denver','Springfield','Mendocino'], 'revenue':[100, 83, 4, 200], 'branch_id':[10, 20, 30, 47]})
managers = pd.DataFrame({'city':['Austin','Denver','Mendocino','Springfield'], 'manager':['Charlers','Joel','Brett','Sally'], 'branch_id':[10, 20, 47,31]})

In [67]:
# Add 'state' column to revenue: revenue['state']
revenue['state'] = ['TX','CO','IL','CA']

# Add 'state' column to managers: managers['state']
managers['state'] = ['TX','CO','CA','MO']

In [68]:
# Merge revenue & managers on 'branch_id', 'city', & 'state': combined
combined = pd.merge(revenue, managers, on=['branch_id', 'city', 'state'])
combined

| | city | revenue | branch_id | state | manager |
|---|---|---|---|---|---|
| 0 | Austin | 100 | 10 | TX | Charlers |
| 1 | Denver | 83 | 20 | CO | Joel |
| 2 | Mendocino | 200 | 47 | CA | Brett |


## Left & right merging on multiple columns

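A right merge keeps every row of the right DataFrame (`sales`) and fills in NaN wherever `revenue` has no matching `city`/`state` pair, so merging `revenue` and `sales` this way exposes the missing revenue values.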
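A minimal sketch with hypothetical subsets of the two tables. Because NaN is a float, the integer `revenue` column becomes `float64` after the right merge (which is why merged values look like `100.0`), and the unmatched rows can then be selected with `.isna()`:

```python
import pandas as pd

# Hypothetical subsets of revenue and sales
revenue = pd.DataFrame({'city': ['Austin', 'Springfield'], 'state': ['TX', 'IL'], 'revenue': [100, 4]})
sales = pd.DataFrame({'city': ['Austin', 'Springfield', 'Springfield'],
                      'state': ['TX', 'MO', 'IL'], 'units': [2, 5, 1]})

# how='right' keeps every sales row; unmatched rows get NaN for revenue
right = pd.merge(revenue, sales, how='right', on=['city', 'state'])

# NaN forces the integer revenue column to float64
print(right.dtypes)

# Rows with no matching revenue
missing = right[right['revenue'].isna()]
print(missing)
```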

In [76]:
sales = pd.DataFrame({'city': {0: 'Mendocino', 1: 'Denver', 2: 'Austin', 3: 'Springfield', 4: 'Springfield'},
         'state': {0: 'CA', 1: 'CO', 2: 'TX', 3: 'MO', 4: 'IL'},
         'units': {0: 1, 1: 4, 2: 2, 3: 5, 4: 1}})
sales

| | city | state | units |
|---|---|---|---|
| 0 | Mendocino | CA | 1 |
| 1 | Denver | CO | 4 |
| 2 | Austin | TX | 2 |
| 3 | Springfield | MO | 5 |
| 4 | Springfield | IL | 1 |


In [77]:
revenue = pd.DataFrame({'branch_id': {0: 10, 1: 20, 2: 30, 3: 47}, 'city': {0: 'Austin', 1: 'Denver', 2: 'Springfield', 3: 'Mendocino'},
           'revenue': {0: 100, 1: 83, 2: 4, 3: 200},
           'state': {0: 'TX', 1: 'CO', 2: 'IL', 3: 'CA'}})
revenue

| | branch_id | city | revenue | state |
|---|---|---|---|---|
| 0 | 10 | Austin | 100 | TX |
| 1 | 20 | Denver | 83 | CO |
| 2 | 30 | Springfield | 4 | IL |
| 3 | 47 | Mendocino | 200 | CA |


Here, you don't need to specify `left_on` or `right_on` because the columns to merge on have matching labels.

In [78]:
revenue_and_sales = pd.merge(revenue, sales, how='right', on=['city','state'])
revenue_and_sales

| | branch_id | city | revenue | state | units |
|---|---|---|---|---|---|
| 0 | 10.0 | Austin | 100.0 | TX | 2 |
| 1 | 20.0 | Denver | 83.0 | CO | 4 |
| 2 | 30.0 | Springfield | 4.0 | IL | 1 |
| 3 | 47.0 | Mendocino | 200.0 | CA | 1 |
| 4 | | Springfield | | MO | 5 |


Setting `how='left'` in `pd.merge()` is a useful technique for enriching a dataset with additional information from a different table: every row of the left table is kept, and columns from the right table are added wherever a match exists.

By merging sales and managers with a left merge, you can identify the missing manager.

In [83]:
managers = pd.DataFrame({'branch': {0: 'Austin', 1: 'Denver', 2: 'Mendocino', 3: 'Springfield'},
            'branch_id': {0: 10, 1: 20, 2: 47, 3: 31},
            'manager': {0: 'Charlers', 1: 'Joel', 2: 'Brett', 3: 'Sally'},
            'state': {0: 'TX', 1: 'CO', 2: 'CA', 3: 'MO'}})
managers

| | branch | branch_id | manager | state |
|---|---|---|---|---|
| 0 | Austin | 10 | Charlers | TX |
| 1 | Denver | 20 | Joel | CO |
| 2 | Mendocino | 47 | Brett | CA |
| 3 | Springfield | 31 | Sally | MO |


In [84]:
sales_and_managers = pd.merge(sales, managers, how='left', left_on=['city', 'state'],right_on=['branch', 'state'])
sales_and_managers

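| | city | state | units | branch | branch_id | manager |
|---|---|---|---|---|---|---|
| 0 | Mendocino | CA | 1 | Mendocino | 47.0 | Brett |
| 1 | Denver | CO | 4 | Denver | 20.0 | Joel |
| 2 | Austin | TX | 2 | Austin | 10.0 | Charlers |
| 3 | Springfield | MO | 5 | Springfield | 31.0 | Sally |
| 4 | Springfield | IL | 1 | | | |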


## Merging DataFrames with outer join

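Next, you'll merge the two DataFrames built above, `sales_and_managers` and `revenue_and_sales`. Without `on`, `pd.merge()` uses all the columns the two DataFrames have in common as keys (here `city`, `state`, `units`, and `branch_id`) and computes an inner join by default. You can compare the result with an outer join on the same keys and with an outer join that uses only a subset of the columns as keys.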

In [89]:
merge_default = pd.merge(sales_and_managers, revenue_and_sales) # inner as default
merge_default

| | city | state | units | branch | branch_id | manager | revenue |
|---|---|---|---|---|---|---|---|
| 0 | Mendocino | CA | 1 | Mendocino | 47.0 | Brett | 200.0 |
| 1 | Denver | CO | 4 | Denver | 20.0 | Joel | 83.0 |
| 2 | Austin | TX | 2 | Austin | 10.0 | Charlers | 100.0 |


One useful aspect of an outer join is that it returns all rows from both tables, with NaN wherever a row has no match, so you can use it to find rows that do not have a match in the other table.
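A minimal sketch with hypothetical tables: passing `indicator=True` to `pd.merge()` adds a `_merge` column that labels each row `both`, `left_only`, or `right_only`, which makes the unmatched rows easy to filter:

```python
import pandas as pd

# Hypothetical small tables keyed on city and state
left = pd.DataFrame({'city': ['Austin', 'Springfield'], 'state': ['TX', 'MO'], 'manager': ['Charlers', 'Sally']})
right = pd.DataFrame({'city': ['Austin', 'Springfield'], 'state': ['TX', 'IL'], 'revenue': [100, 4]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
outer = pd.merge(left, right, on=['city', 'state'], how='outer', indicator=True)

# Rows that have no match in the other table
unmatched = outer[outer['_merge'] != 'both']
print(unmatched)
```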

In [91]:
merge_outer = pd.merge(sales_and_managers, revenue_and_sales, how='outer')
merge_outer

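| | city | state | units | branch | branch_id | manager | revenue |
|---|---|---|---|---|---|---|---|
| 0 | Mendocino | CA | 1 | Mendocino | 47.0 | Brett | 200.0 |
| 1 | Denver | CO | 4 | Denver | 20.0 | Joel | 83.0 |
| 2 | Austin | TX | 2 | Austin | 10.0 | Charlers | 100.0 |
| 3 | Springfield | MO | 5 | Springfield | 31.0 | Sally | |
| 4 | Springfield | IL | 1 | | | | |
| 5 | Springfield | IL | 1 | | 30.0 | | 4.0 |
| 6 | Springfield | MO | 5 | | | | |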


In [92]:
merge_outer_on = pd.merge(sales_and_managers, revenue_and_sales, on=['city','state'],  how='outer')
merge_outer_on

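| | city | state | units_x | branch | branch_id_x | manager | branch_id_y | revenue | units_y |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Mendocino | CA | 1 | Mendocino | 47.0 | Brett | 47.0 | 200.0 | 1 |
| 1 | Denver | CO | 4 | Denver | 20.0 | Joel | 20.0 | 83.0 | 4 |
| 2 | Austin | TX | 2 | Austin | 10.0 | Charlers | 10.0 | 100.0 | 2 |
| 3 | Springfield | MO | 5 | Springfield | 31.0 | Sally | | | 5 |
| 4 | Springfield | IL | 1 | | | | 30.0 | 4.0 | 1 |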


## Self join
Merging a table to itself can be useful when you want to compare values in a column to other values in the same column.

By default, overlapping column names get the suffixes `_x` and `_y`. To make them more meaningful, set `suffixes=` to `'_dir'` and `'_crew'` for the left and right tables respectively.

In [3]:
crews = pd.DataFrame({'id': {0: 19995, 2: 19995, 4: 19995, 6: 19995, 7: 19995},
         'job': {0: 'Editor',2: 'Sound Designer',4: 'Casting',6: 'Director',7: 'Writer'},
         'name': {0: 'Stephen E. Rivkin',2: 'Christopher Boyes',4: 'Mali Finn',6: 'James Cameron',7: 'James Cameron'}})
crews

| | id | job | name |
|---|---|---|---|
| 0 | 19995 | Editor | Stephen E. Rivkin |
| 2 | 19995 | Sound Designer | Christopher Boyes |
| 4 | 19995 | Casting | Mali Finn |
| 6 | 19995 | Director | James Cameron |
| 7 | 19995 | Writer | James Cameron |


In [6]:
crews_self_merged = crews.merge(crews, on='id', how='inner', suffixes=('_dir','_crew'))
crews_self_merged.head()

| | id | job_dir | name_dir | job_crew | name_crew |
|---|---|---|---|---|---|
| 0 | 19995 | Editor | Stephen E. Rivkin | Editor | Stephen E. Rivkin |
| 1 | 19995 | Editor | Stephen E. Rivkin | Sound Designer | Christopher Boyes |
| 2 | 19995 | Editor | Stephen E. Rivkin | Casting | Mali Finn |
| 3 | 19995 | Editor | Stephen E. Rivkin | Director | James Cameron |
| 4 | 19995 | Editor | Stephen E. Rivkin | Writer | James Cameron |


pandas treats a merge of a table with itself the same as any other merge, so nothing stops you from chaining several `.merge()` calls together.
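A common next step after a self merge is to filter the resulting pairs. A minimal sketch with a trimmed-down version of `crews` that keeps only the rows pairing the director with the other crew members:

```python
import pandas as pd

# A trimmed-down crews table (one film id, three jobs)
crews = pd.DataFrame({'id': [19995, 19995, 19995],
                      'job': ['Editor', 'Director', 'Writer'],
                      'name': ['Stephen E. Rivkin', 'James Cameron', 'James Cameron']})

# Self merge: every crew member is paired with every member of the same film
pairs = crews.merge(crews, on='id', suffixes=('_dir', '_crew'))

# Keep only the pairs where the left side is the director and the right side is not
director_crew = pairs[(pairs['job_dir'] == 'Director') & (pairs['job_crew'] != 'Director')]
print(director_crew[['name_dir', 'job_crew', 'name_crew']])
```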

# PRACTICE 1

Suppose you have two DataFrames: `students` (with columns 'StudentID', 'LastName', 'FirstName', and 'Major') and `midterm_results` (with columns 'StudentID', 'Q1', 'Q2', and 'Q3' for their scores on midterm questions).

You want to combine the DataFrames into a single DataFrame `grades`, and be able to easily spot which students wrote the midterm and which didn't (their midterm question scores 'Q1', 'Q2', & 'Q3' should be filled with NaN values).

You also want to drop rows from `midterm_results` in which the 'StudentID' is not found in `students`.

Which of the following strategies gives the desired result?

    1) grades = pd.merge(students, midterm_results, how='left')
    2) grades = pd.merge(students, midterm_results, how='right')
    3) grades = pd.merge(students, midterm_results, how='inner')
    4) grades = pd.merge(students, midterm_results, how='outer')

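# PRACTICE 2

In this practice, GDP data for the US and China are resampled to one value per year (the last observation of each year), converted into ten-year percentage growth with `.pct_change(10)`, and concatenated side by side with an inner join so that only the years present in both series remain.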

In [96]:
us = pd.read_csv('../data/23. Uniendo datafarmes/gdp_usa.csv', index_col='DATE', parse_dates=True)
us_annual = us.resample('A').last().pct_change(10).dropna()
us_annual

| DATE | VALUE |
|---|---|
| 1957-12-31 | 0.827507 |
| 1958-12-31 | 0.782686 |
| 1959-12-31 | 0.953137 |
| 1960-12-31 | 0.689354 |
| 1961-12-31 | 0.630959 |
| 1962-12-31 | 0.608342 |
| 1963-12-31 | 0.694179 |
| 1964-12-31 | 0.744691 |
| 1965-12-31 | 0.765875 |
| 1966-12-31 | 0.809885 |
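`.resample('A').last()` keeps the last observation of each year (pandas 2.2 and later prefer the alias `'YE'`), and `.pct_change(10)` divides each value by the value 10 periods earlier and subtracts 1. Each row is therefore the growth over the previous ten years, and the first ten rows are NaN, hence the `.dropna()`. A minimal sketch with hypothetical year-end values:

```python
import pandas as pd

# Hypothetical year-end values for 12 consecutive years
index = pd.to_datetime([f'{year}-12-31' for year in range(2000, 2012)])
gdp = pd.Series([50, 60] + [70] * 8 + [100, 90], index=index)

# pct_change(10) computes value_t / value_{t-10} - 1, i.e. growth over 10 periods;
# the first 10 results are NaN because there is no value 10 periods earlier
growth = gdp.pct_change(10).dropna()
print(growth)
```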


In [97]:
china = pd.read_csv('../data/23. Uniendo datafarmes/gdp_china.csv', index_col='Year', parse_dates=True)
china_annual = china.resample('A').last().pct_change(10).dropna()
china_annual

| Year | GDP |
|---|---|
| 1970-12-31 | 0.546128 |
| 1971-12-31 | 0.98886 |
| 1972-12-31 | 1.402472 |
| 1973-12-31 | 1.730085 |
| 1974-12-31 | 1.408556 |
| 1975-12-31 | 1.311927 |
| 1976-12-31 | 0.998271 |
| 1977-12-31 | 1.391842 |
| 1978-12-31 | 1.119941 |
| 1979-12-31 | 1.246687 |


In [98]:
# Concatenate china_annual and us_annual: gdp
gdp = pd.concat([china_annual, us_annual], join='inner', axis=1)
gdp

| Year | GDP | VALUE |
|---|---|---|
| 1970-12-31 | 0.546128 | 1.017187 |
| 1971-12-31 | 0.98886 | 1.05227 |
| 1972-12-31 | 1.402472 | 1.172566 |
| 1973-12-31 | 1.730085 | 1.258858 |
| 1974-12-31 | 1.408556 | 1.295246 |
| 1975-12-31 | 1.311927 | 1.284181 |
| 1976-12-31 | 0.998271 | 1.321715 |
| 1977-12-31 | 1.391842 | 1.455503 |
| 1978-12-31 | 1.119941 | 1.558705 |
| 1979-12-31 | 1.246687 | 1.623907 |


# PRACTICE 3
In this exercise, 2015 prices in US dollars for the S&P 500 have been obtained from Yahoo Finance. The exercise refers to the files `sp500.csv` for the S&P 500 and `exchange.csv` for the exchange rates; in this notebook the S&P 500 data is read from `yahoo.txt` instead.

Using the daily exchange rate to Pounds Sterling, your task is to convert both the Open and Close prices.

In [105]:
# Read the S&P 500 data ('yahoo.txt') into a DataFrame: sp500
sp500 = pd.read_csv('../data/23. Uniendo datafarmes/yahoo.txt',index_col='Date',parse_dates=True)
dollars = sp500[['Open','Close']]
dollars

| Date | Open | Close |
|---|---|---|
| 2015-01-02 | 2058.899902 | 2058.199951 |
| 2015-01-05 | 2054.439941 | 2020.579956 |
| 2015-01-06 | 2022.150024 | 2002.609985 |
| 2015-01-07 | 2005.550049 | 2025.900024 |
| 2015-01-08 | 2030.609985 | 2062.139893 |
| ... | ... | ... |
| 2015-12-24 | 2063.520020 | 2060.989990 |
| 2015-12-28 | 2057.770020 | 2056.500000 |
| 2015-12-29 | 2060.540039 | 2078.360107 |
| 2015-12-30 | 2077.340088 | 2063.360107 |


In [103]:
# Read 'exchange.csv' into a DataFrame: exchange
exchange = pd.read_csv('../data/23. Uniendo datafarmes/exchange.csv', index_col='Date', parse_dates=True)
exchange

| Date | GBP/USD |
|---|---|
| 2015-01-02 | 0.65101 |
| 2015-01-05 | 0.65644 |
| 2015-01-06 | 0.65896 |
| 2015-01-07 | 0.66344 |
| 2015-01-08 | 0.66151 |
| ... | ... |
| 2015-12-23 | 0.67285 |
| 2015-12-24 | 0.66926 |
| 2015-12-29 | 0.67597 |
| 2015-12-30 | 0.67427 |


In [107]:
pounds = pd.merge(dollars, exchange, how='inner', on='Date')
pounds

| Date | Open | Close | GBP/USD |
|---|---|---|---|
| 2015-01-02 | 2058.899902 | 2058.199951 | 0.65101 |
| 2015-01-05 | 2054.439941 | 2020.579956 | 0.65644 |
| 2015-01-06 | 2022.150024 | 2002.609985 | 0.65896 |
| 2015-01-07 | 2005.550049 | 2025.900024 | 0.66344 |
| 2015-01-08 | 2030.609985 | 2062.139893 | 0.66151 |
| ... | ... | ... | ... |
| 2015-12-23 | 2042.199951 | 2064.290039 | 0.67285 |
| 2015-12-24 | 2063.520020 | 2060.989990 | 0.66926 |
| 2015-12-29 | 2060.540039 | 2078.360107 | 0.67597 |
| 2015-12-30 | 2077.340088 | 2063.360107 | 0.67427 |
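The merge lines up each day's prices with that day's exchange rate but does not yet convert them. Since the `GBP/USD` values are around 0.65, they appear to be pounds per US dollar (an assumption based on their size), so multiplying the prices by this column row by row gives prices in pounds. A minimal sketch on two rows copied from the merged output, using `DataFrame.multiply()` with `axis='rows'`:

```python
import pandas as pd

# Two rows copied from the merged 'pounds' output
pounds = pd.DataFrame(
    {'Open': [2058.899902, 2054.439941],
     'Close': [2058.199951, 2020.579956],
     'GBP/USD': [0.65101, 0.65644]},
    index=pd.to_datetime(['2015-01-02', '2015-01-05']),
)

# Assumes GBP/USD is pounds per dollar: multiply each price row by that day's rate
# (axis='rows' aligns the rate Series with the DataFrame's row index)
in_pounds = pounds[['Open', 'Close']].multiply(pounds['GBP/USD'], axis='rows')
print(in_pounds.round(2))
```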
