# Merging Extra Practice

For these questions, you'll be using data from the Austin Animal Center, containing information on animal intakes and animal outcomes. The original sources of this data are [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) and [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/about_data).

Read the provided CSV files into DataFrames named "intakes" and "outcomes".


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
```

```python
# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
intakes = pd.read_csv('../data/Austin_Animal_Center_Intakes.csv')
outcomes = pd.read_csv('../data/Austin_Animal_Center_Outcomes.csv')
```
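
The `Unnamed: 0` column in the outputs below suggests the CSV files were saved along with their DataFrame index. That is an inference, not something the source states. If so, `read_csv`'s `index_col` parameter can absorb that column instead of keeping it as data. Here is a minimal sketch on a hypothetical two-row CSV held in memory:

```python
import io

import pandas as pd

# Hypothetical CSV saved with its index: the first header cell is empty.
csv_text = """,Animal ID,Name
3993,A006100,Scamp
65517,A047759,Oreo
"""

# Default read: the saved index becomes an extra "Unnamed: 0" column.
with_extra_col = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra_col.columns.tolist())  # ['Unnamed: 0', 'Animal ID', 'Name']
print(clean.columns.tolist())           # ['Animal ID', 'Name']
```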

#### 0. Is the relationship between the intakes and outcomes tables one-to-one, many-to-one, one-to-many, or many-to-many?


```python
intakes.sort_values('Animal ID')
```

| Unnamed: 0 | Animal ID | Name | DateTime | MonthYear | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3993 | A006100 | Scamp | 12/19/2014 10:21:00 AM | 12/19/2014 10:21:00 AM | 8700 Research Blvd in Austin (TX) | Public Assist | Normal | Dog | Neutered Male | 7 years | Spinone Italiano Mix | Yellow/White |
| 18662 | A006100 | Scamp | 12/07/2017 02:07:00 PM | 12/07/2017 02:07:00 PM | Colony Creek And Hunters Trace in Austin (TX) | Stray | Normal | Dog | Neutered Male | 10 years | Spinone Italiano Mix | Yellow/White |
| 84304 | A006100 | Scamp | 03/07/2014 02:26:00 PM | 03/07/2014 02:26:00 PM | 8700 Research in Austin (TX) | Public Assist | Normal | Dog | Neutered Male | 6 years | Spinone Italiano Mix | Yellow/White |
| 65517 | A047759 | Oreo | 04/02/2014 03:55:00 PM | 04/02/2014 03:55:00 PM | Austin (TX) | Owner Surrender | Normal | Dog | Neutered Male | 10 years | Dachshund | Tricolor |
| 89109 | A134067 | Bandit | 11/16/2013 09:02:00 AM | 11/16/2013 09:02:00 AM | 12034 Research Blvd in Austin (TX) | Public Assist | Injured | Dog | Neutered Male | 16 years | Shetland Sheepdog | Brown/White |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124113 | A830173 |  | 03/03/2021 03:59:00 PM | 03/03/2021 03:59:00 PM | 14912 Fagerquist Rd in Travis (TX) | Stray | Normal | Dog | Intact Male | 1 year | Cairn Terrier | Brown |
| 124102 | A830174 |  | 03/03/2021 03:59:00 PM | 03/03/2021 03:59:00 PM | 14912 Fagerquist Rd in Travis (TX) | Stray | Normal | Dog | Intact Female | 1 month | Black Mouth Cur | Brown/Black |
| 124117 | A830180 | Gigi | 03/03/2021 04:31:00 PM | 03/03/2021 04:31:00 PM | Austin (TX) | Owner Surrender | Normal | Dog | Intact Female | 9 years | Australian Cattle Dog/Belgian Malinois | Brown Brindle/White |
| 124119 | A830181 | Nona | 03/03/2021 04:31:00 PM | 03/03/2021 04:31:00 PM | Austin (TX) | Owner Surrender | Normal | Cat | Spayed Female | 4 years | Domestic Shorthair Mix | White/Black |


```python
outcomes.sort_values('Animal ID')
```

| Unnamed: 0 | Animal ID | Name | DateTime | MonthYear | Date of Birth | Outcome Type | Outcome Subtype | Animal Type | Sex upon Outcome | Age upon Outcome | Breed | Color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 94523 | A006100 | Scamp | 12/07/2017 12:00:00 AM | 12/07/2017 12:00:00 AM | 07/09/2007 | Return to Owner |  | Dog | Neutered Male | 10 years | Spinone Italiano Mix | Yellow/White |
| 52735 | A006100 | Scamp | 12/20/2014 04:35:00 PM | 12/20/2014 04:35:00 PM | 07/09/2007 | Return to Owner |  | Dog | Neutered Male | 7 years | Spinone Italiano Mix | Yellow/White |
| 107324 | A006100 | Scamp | 03/08/2014 05:10:00 PM | 03/08/2014 05:10:00 PM | 07/09/2007 | Return to Owner |  | Dog | Neutered Male | 6 years | Spinone Italiano Mix | Yellow/White |
| 36406 | A047759 | Oreo | 04/07/2014 03:12:00 PM | 04/07/2014 03:12:00 PM | 04/02/2004 | Transfer | Partner | Dog | Neutered Male | 10 years | Dachshund | Tricolor |
| 75922 | A134067 | Bandit | 11/16/2013 11:54:00 AM | 11/16/2013 11:54:00 AM | 10/16/1997 | Return to Owner |  | Dog | Neutered Male | 16 years | Shetland Sheepdog | Brown/White |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124455 | A830112 |  | 03/02/2021 04:37:00 PM | 03/02/2021 04:37:00 PM | 01/18/2021 | Euthanasia | Suffering | Dog | Unknown | 1 month | Pit Bull | Tan/White |
| 124461 | A830114 |  | 03/02/2021 05:07:00 PM | 03/02/2021 05:07:00 PM | 03/02/2020 | Euthanasia | Suffering | Other | Unknown | 1 year | Fox | Brown/White |
| 124465 | A830138 |  | 03/03/2021 10:49:00 AM | 03/03/2021 10:49:00 AM | 03/03/2019 | Euthanasia | Rabies Risk | Other | Unknown |  | Skunk | Black |
| 124475 | A830156 |  | 03/03/2021 02:46:00 PM | 03/03/2021 02:46:00 PM | 03/03/2020 | Euthanasia | Rabies Risk | Other | Unknown |  | Raccoon | Black |


- **Joined on Animal ID, the relationship is many-to-many: the same ID can appear in several rows of both tables. For example, A006100 (Scamp) has three intake rows and three outcome rows above, so Animal ID alone cannot tell which intake goes with which outcome.**
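
The relationship can also be checked in code rather than by eye. The sketch below uses hypothetical toy tables that mimic the Scamp rows. It tests whether the key repeats on each side. It also asks `merge` to enforce an expected relationship through its `validate` parameter, which raises `pandas.errors.MergeError` when the check fails:

```python
import pandas as pd

# Hypothetical toy tables: one animal (A006100) with three intakes and
# three outcomes, plus one animal with a single row on each side.
intakes = pd.DataFrame({
    "Animal ID": ["A006100", "A006100", "A006100", "A047759"],
    "Intake Type": ["Public Assist", "Stray", "Public Assist", "Owner Surrender"],
})
outcomes = pd.DataFrame({
    "Animal ID": ["A006100", "A006100", "A006100", "A047759"],
    "Outcome Type": ["Return to Owner"] * 3 + ["Transfer"],
})

# A key that repeats on both sides makes the relationship many-to-many.
dup_left = intakes["Animal ID"].duplicated().any()
dup_right = outcomes["Animal ID"].duplicated().any()
print(dup_left, dup_right)  # True True

# validate="one_to_many" requires unique keys on the left; it fails here.
try:
    pd.merge(intakes, outcomes, on="Animal ID", validate="one_to_many")
    relationship_ok = True
except pd.errors.MergeError:
    relationship_ok = False
print(relationship_ok)  # False
```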

#### 1. The key identifier variable for this data is the Animal ID. Perform a merge to determine whether any animal IDs from the intakes data do not appear in the outcomes data, and vice versa. Think carefully about which column or columns you'd like to merge on.
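
Here is a minimal sketch, assuming toy IDs in place of the real tables. It deduplicates the IDs on each side and then runs an outer merge with `indicator=True`. The resulting `_merge` column records whether each ID was found only in the left table, only in the right table, or in both:

```python
import pandas as pd

# Hypothetical toy data: A1 appears only in intakes, A3 only in outcomes.
intakes = pd.DataFrame({"Animal ID": ["A1", "A2", "A2"]})
outcomes = pd.DataFrame({"Animal ID": ["A2", "A3"]})

# Deduplicate first so the outer merge compares sets of animals
# instead of multiplying repeated rows.
id_check = pd.merge(
    intakes[["Animal ID"]].drop_duplicates(),
    outcomes[["Animal ID"]].drop_duplicates(),
    on="Animal ID",
    how="outer",
    indicator=True,  # adds "_merge": left_only / right_only / both
)

only_in_intakes = id_check.loc[id_check["_merge"] == "left_only", "Animal ID"]
only_in_outcomes = id_check.loc[id_check["_merge"] == "right_only", "Animal ID"]
print(only_in_intakes.tolist())   # ['A1']
print(only_in_outcomes.tolist())  # ['A3']
print(id_check["_merge"].value_counts())
```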


#### 2. Sometimes, the column you want to merge on is the index of the DataFrame. For this problem, let's try merging on the index. Set the index of the intakes and outcomes DataFrames to "Animal ID" (using the [set_index method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html)) and then merge the two. Afterwards, you may want to set the index back to how it originally was (using the [reset_index method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html)).
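
Below is a sketch of the index-based version on hypothetical toy tables. `left_index=True` and `right_index=True` are the `pd.merge` parameters for joining on indexes. `DataFrame.join`, which joins on the index by default, is an alternative:

```python
import pandas as pd

# Hypothetical toy data with one intake and one outcome per animal.
intakes = pd.DataFrame({"Animal ID": ["A1", "A2"], "Intake Type": ["Stray", "Owner Surrender"]})
outcomes = pd.DataFrame({"Animal ID": ["A1", "A2"], "Outcome Type": ["Adoption", "Transfer"]})

# Move the key into the index on both sides.
intakes_idx = intakes.set_index("Animal ID")
outcomes_idx = outcomes.set_index("Animal ID")

# left_index/right_index join on the indexes rather than on columns.
merged = pd.merge(intakes_idx, outcomes_idx, left_index=True, right_index=True, how="inner")

# reset_index turns "Animal ID" back into an ordinary column.
merged = merged.reset_index()
print(merged.columns.tolist())  # ['Animal ID', 'Intake Type', 'Outcome Type']
```

On the real tables, a repeated index produces the same many-to-many row multiplication as merging on the column.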


#### 3. Merge the intakes and outcomes DataFrames on just the Animal ID column. Notice that any other columns they have in common get `_x` and `_y` suffixes. Use the suffixes parameter to change these to `_intake` and `_outcome`. Are there any animal IDs that have different Name values in the intakes and outcomes DataFrames?
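
This is a sketch of the suffix handling and the name comparison on hypothetical toy data. The main pitfall it addresses is that pandas treats two missing names as unequal. Pairs where both names are missing are therefore excluded explicitly:

```python
import pandas as pd

# Hypothetical toy data: A2 was renamed between intake and outcome,
# and A3 has no name on either side.
intakes = pd.DataFrame({"Animal ID": ["A1", "A2", "A3"], "Name": ["Scamp", "Oreo", None]})
outcomes = pd.DataFrame({"Animal ID": ["A1", "A2", "A3"], "Name": ["Scamp", "Cookie", None]})

merged = pd.merge(intakes, outcomes, on="Animal ID", suffixes=("_intake", "_outcome"))

# Element-wise != returns True when either side is missing, so count a
# pair as "different" only when the names are not both missing.
both_missing = merged["Name_intake"].isna() & merged["Name_outcome"].isna()
name_differs = (merged["Name_intake"] != merged["Name_outcome"]) & ~both_missing

renamed_ids = merged.loc[name_differs, "Animal ID"].unique()
print(list(renamed_ids))  # ['A2']
```

A name present on one side and missing on the other still counts as different here. Whether that is the right call depends on the question.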


#### 4. Merging just on Animal ID doesn't necessarily match a given intake to its corresponding outcome: for an animal with several stays, every intake is paired with every outcome. However, we can change the way that we merge to try to match these up correctly.
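
The sketch below shows what a plain merge on Animal ID produces for two of Scamp's 2014 stays from the outputs above. The times are retyped as ISO strings, and only the columns needed are kept:

```python
import pandas as pd

# Two of Scamp's 2014 stays, taken from the outputs above.
intakes = pd.DataFrame({
    "Animal ID": ["A006100", "A006100"],
    "Intake Time": ["2014-03-07 14:26", "2014-12-19 10:21"],
})
outcomes = pd.DataFrame({
    "Animal ID": ["A006100", "A006100"],
    "Outcome Time": ["2014-03-08 17:10", "2014-12-20 16:35"],
})

merged = pd.merge(intakes, outcomes, on="Animal ID")

# Every intake pairs with every outcome: 2 x 2 = 4 rows, two of which
# pair an intake with an outcome from a different stay.
print(len(merged))  # 4
```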


a. We'll need to make use of the DateTime columns in order to make this merge work. Convert these columns to the datetime type.
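
Here is a sketch of the conversion. It assumes that every row follows the `MM/DD/YYYY HH:MM:SS AM/PM` layout seen in the outputs above. Passing an explicit `format` avoids format inference and day/month ambiguity, and raises an error if a row does not match:

```python
import pandas as pd

# Two DateTime strings in the format shown in the outputs above.
raw = pd.Series(["12/19/2014 10:21:00 AM", "12/07/2017 02:07:00 PM"])

# Month/day/year, 12-hour clock (%I) with an AM/PM marker (%p).
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")
print(parsed.dt.hour.tolist())  # [10, 14]

# Applied to the real tables, this would look like:
# intakes["DateTime"] = pd.to_datetime(intakes["DateTime"], format="%m/%d/%Y %I:%M:%S %p")
# outcomes["DateTime"] = pd.to_datetime(outcomes["DateTime"], format="%m/%d/%Y %I:%M:%S %p")
```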


b. Now, use the [merge_asof function](https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html) to match intake rows to their corresponding outcome rows. That is, each intake row should be matched with the nearest outcome that comes after it (still matched by Animal ID).
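
A minimal `merge_asof` sketch on hypothetical toy data follows. Both inputs must be sorted by the time key. Renaming the outcome's `DateTime` column first keeps both timestamps in the result. `by` restricts matches to the same Animal ID, and `direction="forward"` picks the nearest outcome at or after each intake. On the real data, rows with a missing DateTime would have to be dropped first, because `merge_asof` rejects null keys:

```python
import pandas as pd

# Hypothetical toy data based on Scamp's 2014 stays, plus an animal (A2)
# whose intake has no later outcome.
intakes = pd.DataFrame({
    "Animal ID": ["A006100", "A006100", "A2"],
    "DateTime": pd.to_datetime(["2014-03-07 14:26", "2014-12-19 10:21", "2015-01-01 09:00"]),
})
outcomes = pd.DataFrame({
    "Animal ID": ["A006100", "A006100"],
    "DateTime": pd.to_datetime(["2014-03-08 17:10", "2014-12-20 16:35"]),
})

# Rename the outcome time so both timestamps survive the merge.
outcomes = outcomes.rename(columns={"DateTime": "DateTime_outcome"})

# merge_asof requires both frames to be sorted by the time key.
matched = pd.merge_asof(
    intakes.sort_values("DateTime"),
    outcomes.sort_values("DateTime_outcome"),
    left_on="DateTime",
    right_on="DateTime_outcome",
    by="Animal ID",       # only match rows for the same animal
    direction="forward",  # nearest outcome at or after the intake
)
print(matched)
```

Intakes with no later outcome, like A2 here, get `NaT` in `DateTime_outcome`.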


c. Are there any instances of an intake that doesn't have an outcome before the next intake? For those cases, keep only the last intake row.
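
Suppose an animal has two intakes with no outcome between them. The forward `merge_asof` then matches both intakes to the same outcome. The sketch below, on a hypothetical toy result of that step, flags those rows and keeps only the latest intake per (animal, outcome) pair:

```python
import pandas as pd

# Hypothetical toy result of the merge_asof step: the first two intakes
# both matched the same outcome, so no outcome fell between them.
matched = pd.DataFrame({
    "Animal ID": ["A1", "A1", "A1"],
    "DateTime": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-03-01"]),
    "DateTime_outcome": pd.to_datetime(["2020-01-10", "2020-01-10", "2020-03-04"]),
})

key = ["Animal ID", "DateTime_outcome"]

# Rows sharing an (animal, outcome) pair are intakes without their own outcome.
shared = matched[matched.duplicated(subset=key, keep=False)]
print(len(shared))  # 2

# Sort by intake time, then keep only the latest intake for each shared outcome.
deduped = (
    matched.sort_values("DateTime")
    .drop_duplicates(subset=key, keep="last")
    .reset_index(drop=True)
)
print(deduped["DateTime"].dt.strftime("%Y-%m-%d").tolist())  # ['2020-01-05', '2020-03-01']
```

`drop_duplicates` treats missing values as equal. Several intakes for one animal that have no outcome at all (`NaT`) would therefore also collapse to the last one. Whether that is desired is a judgment call.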


d. Create a new column showing the time between intake and outcome. What does the distribution of these values look like? Does it vary by animal type?
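
Here is a sketch on hypothetical toy rows. Subtracting the two datetime columns gives timedeltas. Converting them to days makes them easy to summarize, and a groupby on Animal Type compares the groups:

```python
import pandas as pd

# Hypothetical toy matched rows (intake time, outcome time, animal type).
matched = pd.DataFrame({
    "Animal Type": ["Dog", "Dog", "Cat", "Cat"],
    "DateTime": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-01", "2020-03-01"]),
    "DateTime_outcome": pd.to_datetime(["2020-01-03", "2020-02-05", "2020-01-11", "2020-03-21"]),
})

# Subtracting two datetime columns gives a timedelta column.
matched["Time to Outcome"] = matched["DateTime_outcome"] - matched["DateTime"]

# Days as a float are easier to summarize and plot than raw timedeltas.
matched["Days to Outcome"] = matched["Time to Outcome"].dt.total_seconds() / 86400

summary = matched.groupby("Animal Type")["Days to Outcome"].median()
print(summary.to_dict())  # {'Cat': 15.0, 'Dog': 3.0}

# To look at the distribution itself, a plot such as
# sns.boxplot(data=matched, x="Animal Type", y="Days to Outcome")
# is one option with the seaborn import used earlier.
```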


e. Find all rows where the animal is intact at intake and spayed or neutered at outcome. What percentage of intact intakes result in a spay or neuter?
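
The sketch below uses hypothetical toy rows. It assumes, based on the outputs above, that intact labels start with `Intact` and that the altered labels are exactly `Spayed Female` and `Neutered Male`:

```python
import pandas as pd

# Hypothetical toy matched rows using the sex labels seen in the outputs above.
matched = pd.DataFrame({
    "Sex upon Intake": ["Intact Male", "Intact Female", "Intact Male", "Neutered Male", None],
    "Sex upon Outcome": ["Neutered Male", "Spayed Female", "Intact Male", "Neutered Male", "Unknown"],
})

# na=False treats a missing intake label as "not intact".
intact_in = matched["Sex upon Intake"].str.startswith("Intact", na=False)
# Assumed set of altered labels at outcome.
altered_out = matched["Sex upon Outcome"].isin(["Spayed Female", "Neutered Male"])

fixed = matched[intact_in & altered_out]
pct_fixed = 100 * (intact_in & altered_out).sum() / intact_in.sum()
print(len(fixed), round(pct_fixed, 1))  # 2 66.7
```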