# Joining Data with Pandas

**Description**
Being able to combine and work with multiple datasets is an essential skill for 
any aspiring Data Scientist. pandas is a cornerstone of the Python data 
science ecosystem, with Stack Overflow recording 5 million views for pandas questions. 
Learn to handle multiple DataFrames by combining, organizing, joining, and reshaping 
them using pandas. You'll work with datasets from the World Bank and the City of 
Chicago. You will finish the course with a solid skillset for data-joining in pandas.

## Data Merging Basics

Learn how you can merge disparate data using inner joins. By combining information 
from multiple sources you’ll uncover compelling insights that may have previously 
been hidden. You’ll also learn how the relationship between those sources, such 
as *one-to-one* or *one-to-many*, can affect your result.
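To make the effect of the relationship concrete, here is a minimal sketch on tiny hypothetical tables (`left`, `right_1to1`, and `right_1toM` and their values are invented for illustration). With `.merge()`'s default inner join, a one-to-one relationship yields one output row per key, while a one-to-many relationship repeats the left row once for every matching right row.

```python
import pandas as pd

# Hypothetical left table with one row per key
left = pd.DataFrame({'key': [1, 2, 3], 'city': ['A', 'B', 'C']})

# One-to-one: each key appears exactly once on the right
right_1to1 = pd.DataFrame({'key': [1, 2, 3], 'mayor': ['X', 'Y', 'Z']})

# One-to-many: key 1 appears twice on the right
right_1toM = pd.DataFrame({'key': [1, 1, 2, 3],
                           'business': ['shop', 'cafe', 'bar', 'gym']})

# .merge() performs an inner join by default
one_to_one = left.merge(right_1to1, on='key')
one_to_many = left.merge(right_1toM, on='key')

print(one_to_one.shape)   # (3, 3): one output row per key
print(one_to_many.shape)  # (4, 3): the key-1 row of `left` is repeated
print(one_to_many)
```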


### Inner Join

**Your first inner join**

You have been tasked with finding the most popular fuel types used in Chicago 
taxis. To complete the analysis, you need to merge the taxi_owners and 
taxi_veh tables together on the vid column. You can then use the merged table along 
with the `.value_counts()` method to find the most common fuel_type.

Since you'll be working with pandas throughout the course, the package is preloaded 
for you as pd in each exercise. The taxi_owners and taxi_veh DataFrames are also 
loaded for you.
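The merge-then-count pattern described above can be sketched on toy data. The `owners` and `vehicles` tables below are hypothetical miniature stand-ins for `taxi_owners` and `taxi_veh`, with made-up values. Two details are worth noticing: the join key `vid` appears only once in the result, while the other shared column, `owner`, is split into `owner_x` and `owner_y` (pandas' default suffixes); and `.value_counts()` sorts from most to least frequent, so its first index label is the most common value.

```python
import pandas as pd

# Hypothetical miniature versions of taxi_owners and taxi_veh
owners = pd.DataFrame({'vid': [10, 11, 12, 13],
                       'owner': ['A CAB', 'B CAB', 'C CAB', 'D CAB']})
vehicles = pd.DataFrame({'vid': [10, 11, 12, 13],
                         'fuel_type': ['HYBRID', 'HYBRID', 'GASOLINE', 'HYBRID'],
                         'owner': ['A CAB', 'B CAB', 'C CAB', 'D CAB']})

# Inner join on vid; the overlapping 'owner' column gets _x/_y suffixes
own_veh = owners.merge(vehicles, on='vid')
print(list(own_veh.columns))  # ['vid', 'owner_x', 'fuel_type', 'owner_y']

# value_counts() returns counts sorted from most to least frequent
counts = own_veh['fuel_type'].value_counts()
print(counts)
print('Most common fuel type:', counts.index[0])  # HYBRID
```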

In [43]:
import numpy as np
import pandas as pd

# Load data into DataFrames
taxi_owners = pd.read_pickle('taxi_owners.p')
print('Taxi Owners: \n',taxi_owners.sample(8))
print('Shape: ',taxi_owners.shape,'\n')

taxi_veh = pd.read_pickle('taxi_vehicles.p')
print('Taxi Vehicles: \n',taxi_veh.sample(8))
print('Shape: ',taxi_veh.shape)

Taxi Owners: 
         rid   vid                          owner                address    zip
1352  T3753  3753                   3753 EJM INC  4118 W. LAWRENCE AVE.  60630
2073  T3780  3780                 CITY TAXI INC.    4536 N. ELSTON AVE.  60630
1563  T2124  2124           ALEXIS CAB CO., INC.  2945 W. PETERSON AVE.  60659
462   T2991  2991                    KIM CAB INC  4626 W. CORNELIA AVE.  60641
2437  T6968  6968             B & B TAXI CAB INC    3351 W. ADDISON ST.  60618
2459   T232   232                 BABY CAB CORP.    2617 S. WABASH AVE.  60616
2803  T3979  3979                 3979 TAXI CORP    9696 W. FOSTER AVE.  60656
2516  T2599  2599  STEFANIE AND AUDREY CAB CORP.    9696 W. FOSTER AVE.  60656
Shape:  (3519, 5) 

Taxi Vehicles: 
        vid     make    model  year fuel_type                owner
2627  2668   TOYOTA    CAMRY  2015    HYBRID  CHICAGO CAB 2 CORP.
2573  1140     FORD   ESCAPE  2011    HYBRID          NECT 20 LLC
158   3059   TOYOTA    CAMRY  2014    H

In [45]:
# STEP 1: Merge taxi_owners with taxi_veh on the column vid, and save the result 
# to taxi_own_veh.
# Merge the taxi_owners and taxi_veh tables
taxi_own_veh = taxi_owners.merge(taxi_veh, on='vid')

# Print the column names of the taxi_own_veh
print(taxi_own_veh.columns,'\n')

# Print taxi owners and vehicles table
print('Taxi Owners and Vehicles:\n',taxi_own_veh.sample(8))
print('Shape: ',taxi_own_veh.shape)

Index(['rid', 'vid', 'owner_x', 'address', 'zip', 'make', 'model', 'year',
       'fuel_type', 'owner_y'],
      dtype='object') 

Taxi Owners and Vehicles:
         rid   vid  ...               fuel_type                              owner_y
2460  T3051  3051  ...                GASOLINE                        Z J A CAB INC
181   T4012  4012  ...                  HYBRID                      TRAVEL SURF INC
2821  T6747  6747  ...                  HYBRID  TAXI MEDALLION GROUP, LLC SERIES IV
3370  T1279  1279  ...  COMPRESSED NATURAL GAS                          NECT 11 LLC
1198  T5642  5642  ...                  HYBRID                          ASHRAF CORP
329   T6344  6344  ...                GASOLINE                       MYNEWLOVE INC.
2676  T2375  2375  ...                  HYBRID          CHICAGO MEDALLION NINE, LLC
1110  T2352  2352  ...                  HYBRID                   NEXTGEN TAXI I INC

[8 rows x 10 columns]
Shape:  (3519, 10)


In [46]:
# STEP 2: Set the left and right table suffixes for overlapping columns of the merge 
# to _own and _veh, respectively.

# Merge the taxi_owners and taxi_veh tables setting a suffix
taxi_own_veh = taxi_owners.merge(taxi_veh, on='vid', suffixes=('_own','_veh'))

# Print the column names of taxi_own_veh
print(taxi_own_veh.columns,'\n')

# Print taxi owners and vehicles table
print('Taxi Owners and Vehicles: \n',taxi_own_veh.sample(8))
print('Shape: ',taxi_own_veh.shape)

Index(['rid', 'vid', 'owner_own', 'address', 'zip', 'make', 'model', 'year',
       'fuel_type', 'owner_veh'],
      dtype='object') 

Taxi Owners and Vehicles: 
         rid   vid                owner_own  ...  year fuel_type                owner_veh
3377  T6002  6002             BRANDON INC.  ...  2017    HYBRID             BRANDON INC.
345   T3247  3247               Z L R CORP  ...  2015    HYBRID               Z L R CORP
2589   T714   714          JZG TRANSIT CO.  ...  2016    HYBRID          JZG TRANSIT CO.
1396  T4678  4678       TRIPS AND TIPS INC  ...  2014    HYBRID       TRIPS AND TIPS INC
1528   T384   384      MASHAALLAH HNZ CORP  ...  2018    HYBRID      MASHAALLAH HNZ CORP
437   T5410  5410  SANTORINI TWO CAB CORP.  ...  2016    HYBRID  SANTORINI TWO CAB CORP.
1342  T3477  3477            GHANA CAN INC  ...  2015    HYBRID            GHANA CAN INC
2527  T1787  1787               MAKENA INC  ...  2016    HYBRID               MAKENA INC

[8 rows x 10 columns]
Shape:  (3519, 10)

In [15]:
# STEP 3: Select the fuel_type column from taxi_own_veh and print the value_counts() 
# to find the most popular fuel_types used

# Merge the taxi_owners and taxi_veh tables setting a suffix
taxi_own_veh = taxi_owners.merge(taxi_veh, on='vid', suffixes=('_own','_veh'))

# Print the value_counts to find the most popular fuel_type
print(taxi_own_veh['fuel_type'].value_counts())

fuel_type
HYBRID                    2792
GASOLINE                   611
FLEX FUEL                   89
COMPRESSED NATURAL GAS      27
Name: count, dtype: int64


**Inner joins and number of rows returned**

All of the merges you have studied to this point are called inner joins. Inner 
joins return only the rows whose key values match in both tables. You will 
explore this further by reviewing the merge between the wards and census tables, 
then comparing it to merges of slightly altered copies of these tables, named 
wards_altered and census_altered. The value in the first row of the ward column 
has been changed in the altered tables, and you will examine how this affects 
the merge between them. The tables have been loaded for you.

For this exercise, it is important to know that the wards and census tables 
each start with 50 rows.
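A minimal sketch of this row-loss effect, using hypothetical 5-row tables (`wards_toy` and `census_toy`) rather than the real data: after one key value on the left is changed, that row has no partner on the right, so the inner join returns one row fewer. As an extra diagnostic that the exercise itself does not use, `indicator=True` combined with `how='left'` adds a `_merge` column that flags the unmatched row as `'left_only'`.

```python
import pandas as pd

# Hypothetical stand-ins: wards 1-5 appear on both sides
wards_toy = pd.DataFrame({'ward': [1, 2, 3, 4, 5],
                          'alderman': ['a', 'b', 'c', 'd', 'e']})
census_toy = pd.DataFrame({'ward': [1, 2, 3, 4, 5],
                           'pop_2010': [100, 200, 300, 400, 500]})

# Alter the first key so it no longer exists in census_toy
wards_toy_altered = wards_toy.copy()
wards_toy_altered.loc[0, 'ward'] = 63

print(wards_toy.merge(census_toy, on='ward').shape)  # (5, 3)
inner = wards_toy_altered.merge(census_toy, on='ward')
print(inner.shape)                                    # (4, 3)

# A left join with an indicator column shows which left row found no match
check = wards_toy_altered.merge(census_toy, on='ward', how='left', indicator=True)
print(check.loc[check['_merge'] == 'left_only', 'ward'].tolist())  # [63]
```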

In [48]:
# Load data into DataFrames
wards = pd.read_pickle('ward.p')
print('Wards: \n',wards.sample(8))
print('Shape: ',wards.shape,'\n')

# Load data into DataFrames
census = pd.read_pickle('census.p')
print('Census: \n',census.sample(8))
print('Shape: ',census.shape)

Wards: 
    ward               alderman                        address    zip
18   19      Matthew J. O'Shea     10400 SOUTH WESTERN AVENUE  60643
8     9       Anthony A. Beale            34 EAST 112TH PLACE  60628
17   18      Derrick G. Curtis        8359 SOUTH PULASKI ROAD  60652
40   41  Anthony V. Napolitano       7442 NORTH HARLEM AVENUE  60631
4     5     Leslie A. Hairston          2325 EAST 71ST STREET  60649
29   30      Ariel E. Reyboras    3559 NORTH MILWAUKEE AVENUE  60641
24   25   Daniel "Danny" Solis  1800 SOUTH BLUE ISLAND AVENUE  60608
38   39       Margaret Laurino      4404 WEST LAWRENCE AVENUE  60630
Shape:  (50, 4) 

Census: 
    ward  pop_2000  pop_2010 change                                 address    zip
35   36     63376     54766   -14%            2918 NORTH RUTHERFORD AVENUE  60634
1     2     54361     55805     3%                WM WASTE MANAGEMENT 1500  60622
24   25     55954     54539    -3%           1632-1746 SOUTH MILLER STREET  60608
15   16     50

In [16]:
# Load the ward file into its own variable so that census is not overwritten
wards_census = pd.read_pickle('ward.p')
print('Ward Census: \n',wards_census.sample(8),'\n')

Ward Census: 
    ward                alderman                           address    zip
24   25    Daniel "Danny" Solis     1800 SOUTH BLUE ISLAND AVENUE  60608
10   11  Patrick Daley Thompson         3659 SOUTH HALSTED STREET  60609
32   33            Deborah Mell        3001 WEST IRVING PARK ROAD  60618
28   29        Chris Taliaferro            6272 WEST NORTH AVENUE  60639
41   42          Brendan Reilly  325 WEST HURON STREET, SUITE 510  60654
42   43          Michelle Smith         2523 NORTH HALSTED STREET  60614
35   36        Gilbert Villegas                6934 WEST DIVERSEY  60607
16   17          David H. Moore         7313 SOUTH ASHLAND AVENUE  60636 



In [37]:
# Merge wards and census on the ward column and save the result to wards_census.
# Merge the wards and census tables on the ward column
wards_census = wards.merge(census, on='ward')

# Validate data
print(wards.head(),'\n')

# Print the shape of wards_census
print('wards_census table shape:', wards_census.shape)

  ward            alderman                          address    zip
0    1  Proco "Joe" Moreno        2058 NORTH WESTERN AVENUE  60647
1    2       Brian Hopkins       1400 NORTH  ASHLAND AVENUE  60622
2    3          Pat Dowell          5046 SOUTH STATE STREET  60609
3    4    William D. Burns  435 EAST 35TH STREET, 1ST FLOOR  60616
4    5  Leslie A. Hairston            2325 EAST 71ST STREET  60649 

wards_census table shape: (50, 9)


In [38]:
# Copying wards table and changing the first ward value
wards_altered = wards.copy()
wards_altered.loc[0,'ward'] = 63

# Print the first few rows of the wards_altered table to view the change 
print(wards_altered[['ward']].head())

  ward
0   63
1    2
2    3
3    4
4    5


In [39]:
# Merge the wards_altered and census tables on the ward column, and notice the 
# difference in returned rows.

# Merge the wards_altered and census tables on the ward column
wards_altered_census = wards_altered.merge(census, on='ward')

# Print the shape of wards_altered_census
print('wards_altered_census table shape:', wards_altered_census.shape)

wards_altered_census table shape: (49, 9)


### One-to-many Relationships

**One-to-many merge**

A business may have one or multiple owners. In this exercise, you will continue 
to gain experience with one-to-many merges by merging a table of business owners, 
called biz_owners, to the licenses table. Recall from the video lesson that, in a 
one-to-many relationship, a row in the left table may be repeated if it is related 
to multiple rows in the right table. In this exercise, you will explore this further 
by finding the most common business owner title (e.g., secretary, CEO, or vice 
president).

The licenses and biz_owners DataFrames are loaded for you.
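The row repetition described above can be reproduced on a toy example; `licenses_toy` and `owners_toy` are hypothetical. The sketch also uses the optional `validate` argument of `.merge()`, which makes pandas raise a `MergeError` when the keys do not have the relationship you expect: `'one_to_many'` passes here because `account` is unique on the left, whereas `'one_to_one'` fails because one account has two owners.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: account 100 has two owners, account 200 has one
licenses_toy = pd.DataFrame({'account': [100, 200],
                             'business': ['TAP ROOM', 'NAIL SPA']})
owners_toy = pd.DataFrame({'account': [100, 100, 200],
                           'title': ['PRESIDENT', 'SECRETARY', 'SOLE PROPRIETOR']})

# The license row for account 100 appears once per owner
merged = licenses_toy.merge(owners_toy, on='account', validate='one_to_many')
print(merged)
print(merged.shape)  # (3, 3)

# Requiring a one-to-one relationship fails on the same data
try:
    licenses_toy.merge(owners_toy, on='account', validate='one_to_one')
    one_to_one_ok = True
except MergeError:
    one_to_one_ok = False
print('one_to_one valid?', one_to_one_ok)  # False
```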

In [49]:
# Load data into DataFrames
licenses = pd.read_pickle('licenses.p')
print('Licenses: \n',licenses.sample(8))
print('Shape: ',licenses.shape,'\n')

# Load data into DataFrames
biz_owners = pd.read_pickle('business_owners.p')
print('Business Owners: \n',biz_owners.sample(8))
print('Shape: ',biz_owners.shape,'\n')

Licenses: 
      account ward  aid                    business                   address    zip
4716  310354   27  763  WASHINGTON FOOD MART, INC.  2100 W WASHINGTON BLVD 1  60612
7280  387360   25  895      ELLE NAILS SPA 1, LTD.          912 W MADISON ST  60607
1537  230053   15  NaN           SANCHEZ GROCERIES            2042 W 51ST ST  60609
9599   67559    8  NaN          DIALLO & TAILORING  8647 S COTTAGE GROVE AVE  60619
4080  294963   22  197           LA VILLITA TRAVEL          3851 W 26TH ST 1  60623
8352   47568   24  NaN       SAM'S MUFFLER & BRAKE   3818 W ROOSEVELT RD 1ST  60624
8488   50268   25  829                 BUTCH'S TAP        1801 W 19TH ST 1ST  60608
1067  212040   35  NaN       GRANITE GALLERY, INC.       3430 W HENDERSON ST  60618
Shape:  (10000, 6) 

Business Owners: 
       account first_name  last_name            title
6833   277997     SANDRA   WECHSLER            OTHER
1541    19580    IGNACIO      LOPEZ            OTHER
7104     2808      CHERI        N

In [50]:
# Merge the licenses and biz_owners table on account
licenses_owners = licenses.merge(biz_owners, on='account')
print('Shape: ',licenses_owners.shape)

# Group the results by title then count the number of accounts
counted_df = licenses_owners.groupby('title').agg({'account':'count'})

# Sort the counted_df in descending order
sorted_df = counted_df.sort_values('account',ascending=False)

# Use .head() method to print the first few rows of sorted_df
print(sorted_df.head())

Shape:  (19497, 9)
                 account
title                   
PRESIDENT           6259
SECRETARY           5205
SOLE PROPRIETOR     1658
OTHER               1200
VICE PRESIDENT       970


In [None]:
# Sort pop_vac_lic by vacant, account, and pop_2010 in descending, ascending, and 
# ascending order respectively. Save it as sorted_pop_vac_lic.

# land_use is not loaded until a later exercise, so load it here first
land_use = pd.read_pickle('land_use.p')

# Merge land_use and census and merge result with licenses including suffixes
land_cen_lic = land_use.merge(census, on='ward') \
                    .merge(licenses, on='ward', suffixes=('_cen','_lic'))

# Group by ward, pop_2010, and vacant, then count the # of accounts
pop_vac_lic = land_cen_lic.groupby(['ward','pop_2010','vacant'], 
                                   as_index=False).agg({'account':'count'})

# Sort pop_vac_lic and print the results
sorted_pop_vac_lic = pop_vac_lic.sort_values(['vacant','account','pop_2010'], 
                                             ascending=[False,True,True])

# Print the top few rows of sorted_pop_vac_lic
print(sorted_pop_vac_lic.head(20))
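How `sort_values` treats several columns with a list of directions can be checked on a hypothetical table: rows are ordered by the first column, and each later column only breaks ties left by the previous ones. As in the cell above, `vacant` is sorted descending, then `account` and `pop_2010` ascending.

```python
import pandas as pd

# Hypothetical ward-level summary
toy = pd.DataFrame({'ward': [1, 2, 3, 4],
                    'pop_2010': [500, 400, 300, 200],
                    'vacant': [10, 15, 10, 10],
                    'account': [50, 90, 20, 20]})

# vacant descending; ties broken by account, then pop_2010, both ascending
ranked = toy.sort_values(['vacant', 'account', 'pop_2010'],
                         ascending=[False, True, True])
print(ranked)
```

Ward 2 comes first because it has the most vacant land; wards 3 and 4 tie on both `vacant` and `account`, so the smaller `pop_2010` (ward 4) wins.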

### Merging Multiple DataFrames

**Total riders in a month**

Your goal is to find the total number of rides provided to passengers passing 
through the Wilson station (`station_name == 'Wilson'`) when riding Chicago's 
public transportation system on weekdays (`day_type == 'Weekday'`) in July (`month == 7`).
Luckily, Chicago provides this detailed data, but it is in three different tables. 
You will work on merging these tables together to answer the question. This data is 
different from the business-related data you have seen so far, but all the information 
you need to answer the question is provided.

The cal, ridership, and stations DataFrames have been loaded for you. 
The relationship between the tables can be seen in the diagram below.

<img title="Total Riders in a Month Relations" alt="Total Riders in a Month Relations" 
    src="Total Riders in a Month Relations.png">
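A sketch of the chained, multi-key merge on three hypothetical mini tables shaped like `ridership`, `cal`, and `stations` (all values invented). Passing a list to `on` matches rows only when every listed column agrees, and because each `.merge()` returns a DataFrame, the calls can be chained.

```python
import pandas as pd

# Hypothetical miniature versions of ridership, cal, and stations
ridership_toy = pd.DataFrame({'station_id': [1, 1, 2],
                              'year': [2019, 2019, 2019],
                              'month': [7, 7, 7],
                              'day': [1, 6, 1],
                              'rides': [100, 40, 250]})
cal_toy = pd.DataFrame({'year': [2019, 2019],
                        'month': [7, 7],
                        'day': [1, 6],
                        'day_type': ['Weekday', 'Saturday']})
stations_toy = pd.DataFrame({'station_id': [1, 2],
                             'station_name': ['Wilson', 'Clark/Lake']})

# Match on the full date (year, month, day), then attach station names
merged = (ridership_toy
          .merge(cal_toy, on=['year', 'month', 'day'])
          .merge(stations_toy, on='station_id'))

# Sum weekday rides at the (hypothetical) Wilson station
weekday_wilson = merged.loc[(merged['station_name'] == 'Wilson')
                            & (merged['day_type'] == 'Weekday'), 'rides'].sum()
print(weekday_wilson)  # 100
```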

In [52]:
# Load data into DataFrames
cal = pd.read_pickle('cta_calendar.p')
print('Calendar: \n',cal.sample(8))
print('Shape: ',cal.shape,'\n')

# Load data into DataFrames
ridership = pd.read_pickle('cta_ridership.p')
print('Ridership: \n',ridership.sample(8))
print('Shape: ',ridership.shape,'\n')

# Load data into DataFrames
stations = pd.read_pickle('cta_stations.p')
print('Stations: \n',stations.sample(8))
print('Shape: ',stations.shape)

Calendar: 
      year  month  day  day_type
91   2019      4    2   Weekday
102  2019      4   13  Saturday
301  2019     10   29   Weekday
297  2019     10   25   Weekday
273  2019     10    1   Weekday
307  2019     11    4   Weekday
219  2019      8    8   Weekday
76   2019      3   18   Weekday
Shape:  (365, 4) 

Ridership: 
      station_id  year  month  day  rides
643       40080  2019     10    6   2258
2233      41500  2019      2   13   2723
472       40080  2019      4   18   4590
1618      40540  2019      6    8   4566
1584      40540  2019      5    5   3167
1178      40120  2019      3   25   2854
3062      41660  2019      5   23  21757
2367      41500  2019      6   27   2702
Shape:  (3285, 5) 

Stations: 
     station_id    station_name                     location
11       40130            51st       (41.80209, -87.618487)
27       40290    Ashland/63rd       (41.77886, -87.663766)
19       40210    Damen-Cermak      (41.854517, -87.675975)
101      41090    Monroe/St

In [54]:
# Merge the ridership and cal tables together, starting with the ridership table 
# on the left and save the result to the variable ridership_cal. If your code takes 
# too long to run, your merge conditions might be incorrect.

# Merging 2 tables together using multiple key columns
ridership_cal = ridership.merge(cal, on=['year','month','day'])

print(ridership_cal.head())
print('Shape: ',ridership_cal.shape,'\n')

# Extend the previous merge to three tables by also merging the stations table.
ridership_cal_stations = ridership.merge(cal, on=['year','month','day']) \
                                  .merge(stations, on='station_id')

print(ridership_cal_stations.head())
print('Shape: ',ridership_cal_stations.shape,'\n')

# Create a variable called filter_criteria to select the appropriate rows 
# from the merged table so that you can sum the rides column.

# Create a filter to filter ridership_cal_stations
filter_criteria = ((ridership_cal_stations['month'] == 7) 
                   & (ridership_cal_stations['day_type'] == 'Weekday') 
                   & (ridership_cal_stations['station_name'] == 'Wilson'))

# Use .loc and the filter to select for rides
print('Total number of rides: ',ridership_cal_stations.loc[filter_criteria, 'rides'].sum())

  station_id  year  month  day  rides        day_type
0      40010  2019      1    1    576  Sunday/Holiday
1      40010  2019      1    2   1457         Weekday
2      40010  2019      1    3   1543         Weekday
3      40010  2019      1    4   1621         Weekday
4      40010  2019      1    5    719        Saturday
Shape:  (3285, 6) 

  station_id  year  month  ...        day_type        station_name                 location
0      40010  2019      1  ...  Sunday/Holiday  Austin-Forest Park  (41.870851, -87.776812)
1      40010  2019      1  ...         Weekday  Austin-Forest Park  (41.870851, -87.776812)
2      40010  2019      1  ...         Weekday  Austin-Forest Park  (41.870851, -87.776812)
3      40010  2019      1  ...         Weekday  Austin-Forest Park  (41.870851, -87.776812)
4      40010  2019      1  ...        Saturday  Austin-Forest Park  (41.870851, -87.776812)

[5 rows x 8 columns]
Shape:  (3285, 8) 

Total number of rides:  140005
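The comment in the cell above warns that a slow run may mean incorrect merge conditions. One likely cause is merging on only part of the composite key: every row then matches all calendar rows that share the partial key, and the result grows multiplicatively. A hypothetical sketch comparing a full-key merge with a merge on `day` alone (`rides_toy` and `cal_toy` are invented):

```python
import pandas as pd

# Hypothetical: 2 months x 3 days of rides for one station, plus a calendar
days = [(m, d) for m in (1, 2) for d in (1, 2, 3)]
rides_toy = pd.DataFrame({'month': [m for m, _ in days],
                          'day': [d for _, d in days],
                          'rides': range(6)})
cal_toy = pd.DataFrame({'month': [m for m, _ in days],
                        'day': [d for _, d in days],
                        'day_type': ['Weekday'] * 6})

full_key = rides_toy.merge(cal_toy, on=['month', 'day'])
day_only = rides_toy.merge(cal_toy, on='day')

print(full_key.shape)  # (6, 4): one calendar row per ride row
print(day_only.shape)  # (12, 5): each ride row matches both months' rows
```

With a full year of data and many stations, the same duplication would inflate the result far more, which is why the wrong key can make the cell run slowly.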


**Three table merge**

To solidify the concept of a three-DataFrame merge, practice another exercise. A 
reasonable extension of our review of Chicago business data would include looking 
at demographic information about the neighborhoods where the businesses are. A 
table with the median income by zip code has been provided to you. You will merge 
the `licenses` and `wards` tables with this new income-by-zip-code table called `zip_demo`.

The licenses, wards, and zip_demo DataFrames have been loaded for you.
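Because every step in a chained merge is an inner join, a license whose zip code is missing from `zip_demo`, or whose ward is missing from `wards`, is silently dropped, so the three-table result can have fewer rows than `licenses`. The sketch below, on hypothetical toy tables, counts unmatched keys with `isin` before merging. It assumes `zip` is unique in `zip_demo_toy` and `ward` is unique in `wards_toy`; otherwise matches could also add rows.

```python
import pandas as pd

# Hypothetical stand-ins for licenses, zip_demo, and wards
licenses_toy = pd.DataFrame({'account': [1, 2, 3, 4],
                             'ward': [1, 1, 2, 3],
                             'zip': ['60601', '60602', '60603', '99999']})
zip_demo_toy = pd.DataFrame({'zip': ['60601', '60602', '60603'],
                             'income': [50000, 60000, 70000]})
wards_toy = pd.DataFrame({'ward': [1, 2],
                          'alderman': ['Smith', 'Jones']})

# Flag license rows that will find no partner at each stage
no_zip = ~licenses_toy['zip'].isin(zip_demo_toy['zip'])
no_ward = ~licenses_toy['ward'].isin(wards_toy['ward'])
expected_loss = (no_zip | no_ward).sum()
print('unmatched zip:', no_zip.sum(), '| unmatched ward:', no_ward.sum())

licenses_zip_ward_toy = (licenses_toy
                         .merge(zip_demo_toy, on='zip')
                         .merge(wards_toy, on='ward'))
print(len(licenses_toy), '->', len(licenses_zip_ward_toy),
      '(expected loss:', expected_loss, ')')
```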

In [56]:
# Load data into DataFrames
zip_demo = pd.read_pickle('zip_demo.p')
print('Zip Demographics: \n',zip_demo.sample(8))
print('Shape: ',zip_demo.shape)

Zip Demographics: 
       zip  income
26  60657   88708
53  60173   79024
40  60655   94524
12  60659   50554
56  60653   28411
22  60638   67045
23  60623   31445
48  60661  104714
Shape:  (66, 2)


In [57]:
# Starting with the licenses table, merge to it the zip_demo table on 
# the zip column. Then merge the resulting table to the wards table on 
# the ward column. Save result of the three merged tables to a variable 
# named licenses_zip_ward.

# Merge licenses and zip_demo, on zip; and merge the wards on ward
licenses_zip_ward = licenses.merge(zip_demo, on='zip').merge(wards, on='ward')
print(licenses_zip_ward.sample(5))
print('Shape: ',licenses_zip_ward.shape,'\n')

# Group the results of the three merged tables by the column alderman and 
# find the median income.

# Print the results by alderman and show median income
print(licenses_zip_ward.groupby('alderman').agg({'income':'median'}))

     account ward  aid  ...             alderman                         address_y  zip_y
8669    5265    9  763  ...     Anthony A. Beale               34 EAST 112TH PLACE  60628
3783  287775   42  NaN  ...       Brendan Reilly  325 WEST HURON STREET, SUITE 510  60654
1444  222383   28  NaN  ...       Jason C. Ervin             2602 WEST 16TH STREET  60612
5421  328992   27  942  ...  Walter Burnett, Jr.            4 NORTH WESTERN AVENUE  60612
3125  273046   15  NaN  ...     Raymond A. Lopez             1650 WEST 63RD STREET  60636

[5 rows x 10 columns]
Shape:  (9994, 10) 

                             income
alderman                           
Ameya Pawar                 66246.0
Anthony A. Beale            38206.0
Anthony V. Napolitano       82226.0
Ariel E. Reyboras           41307.0
Brendan Reilly             110215.0
Brian Hopkins               87143.0
Carlos Ramirez-Rosa         66246.0
Carrie M. Austin            38206.0
Chris Taliaferro            55566.0
Daniel "Danny" Solis

**One-to-many merge with multiple tables**

In this exercise, assume that you are looking to start a business in the city of 
Chicago. Your perfect idea is to start a company that uses goats to mow the lawn 
for other businesses. However, you have to choose a location in the city to put 
your goat farm. You need a location with a great deal of space and relatively few 
businesses and people around to avoid complaints about the smell. You will need 
to merge three tables to help you choose your location. The `land_use` table has 
information on the percentage of vacant land by city ward. The `census` table has population 
by `ward`, and the `licenses` table lists businesses by ward.

The land_use, census, and licenses tables have been loaded for you.

In [58]:
# Load data into DataFrames
land_use = pd.read_pickle('land_use.p')
print('Land Use: \n',land_use.sample(8))
print('Shape: ',land_use.shape)

Land Use: 
    ward  residential  commercial  industrial  vacant  other
19   20           23           2           3      15     57
32   33           42           5           4       1     48
42   43           34           9           0       1     56
25   26           36           5           4       2     53
12   13           49           3           2       1     45
43   44           41           6           0       0     53
3     4           22          13           0       7     58
30   31           46           8           8       0     38
Shape:  (50, 6)


In [60]:
# Merge land_use and census on the ward column. Merge the result of this with 
# licenses on the ward column, using the suffix _cen for the left table and _lic 
# for the right table. Save this to the variable land_cen_lic.

# Merge land_use and census and merge result with licenses including suffixes
land_cen_lic = (land_use.merge(census, on='ward')
                        .merge(licenses, on='ward', suffixes=('_cen','_lic')))

# Group by ward, pop_2010, and vacant, then count the # of accounts
pop_vac_lic = land_cen_lic.groupby(['ward','pop_2010','vacant'],
                                   as_index=False).agg({'account':'count'})

# Sort pop_vac_lic and print the results
sorted_pop_vac_lic = pop_vac_lic.sort_values(['vacant','account','pop_2010'], 
                                             ascending=[False,True,True])

# Print the top few rows of sorted_pop_vac_lic
print(sorted_pop_vac_lic.head())

   ward  pop_2010  vacant  account
47    7     51581      19       80
12   20     52372      15      123
1    10     51535      14      130
16   24     54909      13       98
7    16     51954      13      156
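Passing `as_index=False` to `.groupby()`, as in the cell above, keeps the grouping keys as ordinary columns instead of moving them into the index. That is why `ward`, `pop_2010`, and `vacant` print as columns and can be referenced directly in `sort_values`. A small sketch on a hypothetical `toy` table compares the two settings:

```python
import pandas as pd

# Hypothetical licenses-by-ward table
toy = pd.DataFrame({'ward': [1, 1, 2],
                    'vacant': [5, 5, 9],
                    'account': [101, 102, 201]})

# Default: grouping keys become a MultiIndex
with_index = toy.groupby(['ward', 'vacant']).agg({'account': 'count'})

# as_index=False: grouping keys stay as regular columns
as_columns = toy.groupby(['ward', 'vacant'], as_index=False).agg({'account': 'count'})

print(list(with_index.index.names))  # ['ward', 'vacant']
print(list(as_columns.columns))      # ['ward', 'vacant', 'account']
```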


## Merging Tables With Different Join Types

Take your knowledge of joins to the next level. In this chapter, you’ll work with 
TMDb movie data as you learn about left, right, and outer joins. You’ll also discover 
how to merge a table to itself and merge on a DataFrame index.


### Left Join

**Counting missing rows with left join**

The Movie Database is supported by volunteers going out into the world, collecting 
data, and entering it into the database. This includes financial data, such as 
movie budget and revenue. If you wanted to know which movies are still missing 
data, you could use a left join to identify them. Practice using a left join by 
merging the movies table and the financials table.

The movies and financials tables have been loaded for you.

In [61]:
# Load data into DataFrames
movies = pd.read_pickle('mov_movies.p')
print('Movies: \n',movies.sample(8))
print('Shape: ',movies.shape,'\n')

financials = pd.read_pickle('mov_financials.p')
print('Movie Financials: \n',financials.sample(8))
print('Shape: ',financials.shape)

Movies: 
           id                        title  popularity release_date
4723   43074                 Ghostbusters   66.218060   2016-07-14
2001    9339                        Click   41.176631   2006-06-22
2770   16727                 The Namesake    3.604863   2006-09-02
1153  125537  Smiling Fish & Goat On Fire    0.007340   1999-09-16
4479    4380              Shall We Dance?   14.231899   2004-10-15
3593   11876     The Horseman on the Roof    2.877488   1995-09-20
4660   26748                    Lone Star    5.960149   1996-06-21
2199    6537                The Orphanage   29.071955   2007-08-27
Shape:  (4803, 4) 

Movie Financials: 
          id    budget      revenue
2378  38448  12000000    1644755.0
924    2252  51500000   55112356.0
2319  36691  15000000    5024782.0
1657  10030  25000000   59192128.0
410    8247  85000000  222231186.0
1375  24662  32000000    3566637.0
2916  11697   3200000    8000000.0
2260  10876  13500000    7060876.0
Shape:  (3229, 3)
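The left join itself is not performed in the cell above, so here is a hedged sketch of the technique on hypothetical mini tables shaped like `movies` and `financials` (titles and numbers are invented). With `how='left'`, every movie is kept, and movies without a financials row get NaN in `budget`, which `.isnull().sum()` counts. On the real tables the analogous call would presumably be `movies.merge(financials, on='id', how='left')`, but its result is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-ins for movies and financials
movies_toy = pd.DataFrame({'id': [1, 2, 3, 4],
                           'title': ['Movie A', 'Movie B', 'Movie C', 'Movie D']})
financials_toy = pd.DataFrame({'id': [1, 3],
                               'budget': [1000, 2000],
                               'revenue': [1500.0, 2500.0]})

# Inner join keeps only movies that have financial data
inner = movies_toy.merge(financials_toy, on='id')

# Left join keeps every movie; missing financials become NaN
movies_financials = movies_toy.merge(financials_toy, on='id', how='left')

n_missing = movies_financials['budget'].isnull().sum()
print(inner.shape, movies_financials.shape)      # (2, 4) (4, 4)
print('Movies missing budget data:', n_missing)  # 2
```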


### Other Joins

TBD

### Merging a Table to Itself

TBD

### Merging on indexes

TBD

## Advanced Merging and Concatenating

In this chapter, you’ll leverage powerful filtering techniques, including semi-joins and anti-joins. You’ll also learn how to glue DataFrames together by combining them vertically with the pandas.concat function to create new datasets. Finally, because data is rarely clean, you’ll also learn how to validate your newly combined data structures.


### Filtering Joins

TBD

### Concatenate DataFrames Together Vertically

TBD

### Verifying Integrity

TBD

## Merging Ordered and Time-Series Data

In this final chapter, you’ll step up a gear and learn to apply pandas' specialized methods for merging time-series and ordered data together with real-world financial and economic data from the city of Chicago. You’ll also learn how to query resulting tables using a SQL-style format, and unpivot data using the melt method.

### Using `.merge_ordered()`

TBD

### Using `.merge_asof()`

TBD

### Selecting Data with `.query()`

TBD

### Reshaping Data with `.melt()`

TBD