# Austin Shelter Wrangle Notes <a name='top'></a>

This notebook contains notes and code to develop the `final_adoption_report` for the <a href='https://github.com/stephenfitzsimon/pet_adoption_project'>Austin Shelter Pet Outcomes</a> project.

- two tables (outcomes and intakes) are downloaded
- rows with a repeated `animal_id` are dropped, then the tables are inner-joined on `animal_id`
- columns dropped: `animal_id_i`
- datetime columns are reduced to `outcome_date` and `intake_date`
- `animal_type`, `breed`, and `color` were consistent between the two tables, so the intake copies were dropped
- the only mismatches in the `name` column were nulls; these are replaced with the string `no name`
- drop 15 `NaN` `outcome_type` rows
    - `outcome_subtype` `NaN` values are replaced with `no subtype`
    - SCRP is the Stray Cat Return Program
- drop nulls for `sex_upon_outcome`, `age_upon_outcome`, and `sex_upon_intake`
- converted `age_upon_intake` and `age_upon_outcome` into days (`age_at_intake` and `age_at_outcome`)

### Contents

1. <a href='#download'>Getting data from the internet</a>
2. <a href='#joining'>Joining the table data</a>
3. <a href='#datetime'>Handling the datetime columns </a>
4. <a href='#integrity'>Checking data integrity of select columns </a>
5. <a href='#scripts'>Testing the scripts</a>

In [1]:
import os
import requests
import pandas as pd
from sodapy import Socrata

## Getting the data from the internet <a name='download'></a>

Write a function that downloads the data from the internet.  Use the <a href='https://dev.socrata.com/'>Socrata Open Data API.</a>

In [2]:
def download_data():
    """
    Returns the pet outcome and pet intake dataframes from the SODA
    """
    client = Socrata("data.austintexas.gov", None)
    results_outcome = client.get("9t4d-g238", limit=200_000)
    results_intake = client.get("wter-evkm", limit=200_000)

    # Convert to pandas DataFrame
    df_outcome = pd.DataFrame.from_records(results_outcome)
    df_intake = pd.DataFrame.from_records(results_intake)
    return df_outcome, df_intake

#df_o, df_i = download_data()

In [3]:
# df_i

Wrap the download in a function that first checks for saved `.csv` files and reads them if both exist. Passing `query_url=True` forces a fresh download from the url.

In [4]:
def get_pet_data(query_url = False):
    file_o = 'pet_outcomes.csv'
    file_i = 'pet_intake.csv'
    if os.path.isfile(file_o) and os.path.isfile(file_i) and not query_url:
        #return dataframe from file
        print('Returning saved csv files.')
        df_o = pd.read_csv(file_o).drop(columns = ['Unnamed: 0'])
        df_i = pd.read_csv(file_i).drop(columns = ['Unnamed: 0'])
        return df_o, df_i
    else:
        print('Getting data from url...')
        df_o, df_i = download_data()
        print('Saving to .csv files...')
        df_o.to_csv(file_o)
        df_i.to_csv(file_i)
        print('Returned dataframes.')
        return df_o, df_i

#df_o, df_i = get_pet_data(query_url=True)
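The `drop(columns = ['Unnamed: 0'])` calls above are needed because `to_csv` writes the index as an unnamed first column by default. The following is a minimal sketch, on a hypothetical toy frame and an in-memory buffer instead of the project files, of how `index=False` avoids that extra column.

```python
import io
import pandas as pd

# Toy frame standing in for the downloaded data (hypothetical values).
toy = pd.DataFrame({'animal_id': ['A1', 'A2'], 'name': ['Rex', 'Tom']})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_default = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = list(pd.read_csv(without_index).columns)

print(cols_default)
print(cols_clean)
```

Switching `get_pet_data` to `index=False` would also mean removing the `drop` calls and regenerating any saved files, since the current read path expects the `Unnamed: 0` column to exist.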

In [5]:
# df_i

In [6]:
# df_o

In [7]:
df_o, df_i = get_pet_data()

Returning saved csv files.


All the data looks like it's here

<a href='#top'>Back to Top</a>

## Joining the table data <a name='joining'></a>

In [8]:
df_o.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 141170 entries, 0 to 141169
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   animal_id         141170 non-null  object
 1   name              99536 non-null   object
 2   datetime          141170 non-null  object
 3   monthyear         141170 non-null  object
 4   date_of_birth     141170 non-null  object
 5   outcome_type      141148 non-null  object
 6   animal_type       141170 non-null  object
 7   sex_upon_outcome  141169 non-null  object
 8   age_upon_outcome  141146 non-null  object
 9   breed             141170 non-null  object
 10  color             141170 non-null  object
 11  outcome_subtype   64771 non-null   object
dtypes: object(12)
memory usage: 12.9+ MB


In [9]:
df_i.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 141303 entries, 0 to 141302
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   animal_id         141303 non-null  object
 1   name              99619 non-null   object
 2   datetime          141303 non-null  object
 3   datetime2         141303 non-null  object
 4   found_location    141303 non-null  object
 5   intake_type       141303 non-null  object
 6   intake_condition  141303 non-null  object
 7   animal_type       141303 non-null  object
 8   sex_upon_intake   141302 non-null  object
 9   age_upon_intake   141303 non-null  object
 10  breed             141303 non-null  object
 11  color             141303 non-null  object
dtypes: object(12)
memory usage: 12.9+ MB


First, join the tables. Start by adding an `_i` suffix to every intake column so that the intake columns can be identified after the join.

In [10]:
def rename_intake(df):
    return df.add_suffix('_i')

df_i = rename_intake(df_i)

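Now join the tables on `animal_id`. Some `animal_id` values are repeated because the same animal can pass through the shelter more than once (for example, `A700407` below has nine intakes and nine outcomes). These repeats represent $\approx 20,000$ rows. Simply drop every row whose `animal_id` is repeated and join the two tables with an inner join.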
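Dropping repeated ids can mean two different things in pandas, so here is a small sketch on hypothetical toy data. With `keep=False`, every row of a repeated `animal_id` is removed; with the default `keep='first'`, one row per id is retained.

```python
import pandas as pd

# Toy outcomes table; A1 appears twice (hypothetical values).
toy = pd.DataFrame({
    'animal_id': ['A1', 'A1', 'A2', 'A3'],
    'outcome_type': ['Return to Owner', 'Adoption', 'Transfer', 'Adoption'],
})

# keep=False removes every row whose animal_id appears more than once.
only_single_visits = toy.drop_duplicates(subset='animal_id', keep=False)

# The default (keep='first') retains the first row of each repeated animal_id.
first_visit_kept = toy.drop_duplicates(subset='animal_id')

print(only_single_visits['animal_id'].tolist())
print(first_visit_kept['animal_id'].tolist())
```

With `keep=False`, each remaining id has exactly one intake row and one outcome row, so the inner join cannot pair an intake with the wrong outcome.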

In [11]:
df_i.animal_id_i.value_counts()

A721033    33
A718223    14
A718877    12
A706536    11
A700407     9
           ..
A785847     1
A785845     1
A785952     1
A785955     1
A521520     1
Name: animal_id_i, Length: 126412, dtype: int64

In [12]:
df_o[df_o['animal_id']=='A700407']

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,outcome_subtype
9760,A700407,Beaux,2021-09-19T15:59:00.000,2021-09-19T15:59:00.000,2014-07-13T00:00:00.000,Adoption,Dog,Neutered Male,7 years,Labrador Retriever Mix,Black,
29139,A700407,Beaux,2019-11-29T18:21:00.000,2019-11-29T18:21:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,5 years,Labrador Retriever Mix,Black,
29353,A700407,Beaux,2019-11-25T16:38:00.000,2019-11-25T16:38:00.000,2014-07-13T00:00:00.000,Rto-Adopt,Dog,Neutered Male,5 years,Labrador Retriever Mix,Black,
45434,A700407,Beaux,2019-02-16T15:41:00.000,2019-02-16T15:41:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,4 years,Labrador Retriever Mix,Black,
62099,A700407,Beaux,2018-02-24T15:43:00.000,2018-02-24T15:43:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,3 years,Labrador Retriever Mix,Black,
80746,A700407,Beaux,2017-01-25T18:09:00.000,2017-01-25T18:09:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,2 years,Labrador Retriever Mix,Black,
97189,A700407,Beaux,2016-02-25T16:45:00.000,2016-02-25T16:45:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,1 year,Labrador Retriever Mix,Black,
110732,A700407,Beaux,2015-06-07T18:16:00.000,2015-06-07T18:16:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,10 months,Labrador Retriever Mix,Black,
113305,A700407,Beaux,2015-04-24T17:53:00.000,2015-04-24T17:53:00.000,2014-07-13T00:00:00.000,Return to Owner,Dog,Neutered Male,9 months,Labrador Retriever Mix,Black,


In [13]:
df_i[df_i['animal_id_i']=='A700407']

Unnamed: 0,animal_id_i,name_i,datetime_i,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i
13782,A700407,Beaux,2021-06-17T11:10:00.000,2021-06-17T11:10:00.000,Cameron And Saint Johns in Austin (TX),Stray,Normal,Dog,Neutered Male,6 years,Labrador Retriever Mix,Black
29069,A700407,Beaux,2019-11-29T16:57:00.000,2019-11-29T16:57:00.000,906 East Leslie Circle in Austin (TX),Stray,Normal,Dog,Neutered Male,5 years,Labrador Retriever Mix,Black
30034,A700407,Beaux,2019-11-09T12:59:00.000,2019-11-09T12:59:00.000,St.Johns And Blessing Avenue in Austin (TX),Stray,Normal,Dog,Neutered Male,5 years,Labrador Retriever Mix,Black
45694,A700407,Beaux,2019-02-16T10:41:00.000,2019-02-16T10:41:00.000,Coronado Hills in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Labrador Retriever Mix,Black
62812,A700407,Beaux,2018-02-16T17:35:00.000,2018-02-16T17:35:00.000,5800 Wellington in Austin (TX),Stray,Normal,Dog,Neutered Male,3 years,Labrador Retriever Mix,Black
81143,A700407,Beaux,2017-01-24T18:17:00.000,2017-01-24T18:17:00.000,1111 Rutland Drive in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Labrador Retriever Mix,Black
97697,A700407,Beaux,2016-02-22T13:21:00.000,2016-02-22T13:21:00.000,1807 W Rundberg in Austin (TX),Stray,Normal,Dog,Neutered Male,1 year,Labrador Retriever Mix,Black
111299,A700407,Beaux,2015-05-30T11:01:00.000,2015-05-30T11:01:00.000,Austin (TX),Stray,Normal,Dog,Neutered Male,10 months,Labrador Retriever Mix,Black
114188,A700407,Beaux,2015-04-13T16:20:00.000,2015-04-13T16:20:00.000,6501 Ridge Oak in Austin (TX),Stray,Normal,Dog,Intact Male,9 months,Labrador Retriever Mix,Black


In [14]:
df_o.animal_id.value_counts()

A721033    33
A718223    14
A718877    12
A706536    11
A700407     9
           ..
A785882     1
A783829     1
A784017     1
A785742     1
A659834     1
Name: animal_id, Length: 126274, dtype: int64

In [15]:
def join_tables(df_o, df_i):
    df_i = df_i.drop_duplicates(subset='animal_id_i', keep=False)
    df_o = df_o.drop_duplicates(subset='animal_id', keep=False)
    df = df_o.merge(df_i, how='inner', left_on='animal_id', right_on='animal_id_i')
    df = df.drop(columns=['animal_id_i'])
    return df

df = join_tables(df_o, df_i)

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 113984 entries, 0 to 113983
Data columns (total 23 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   animal_id           113984 non-null  object
 1   name                73216 non-null   object
 2   datetime            113984 non-null  object
 3   monthyear           113984 non-null  object
 4   date_of_birth       113984 non-null  object
 5   outcome_type        113969 non-null  object
 6   animal_type         113984 non-null  object
 7   sex_upon_outcome    113983 non-null  object
 8   age_upon_outcome    113960 non-null  object
 9   breed               113984 non-null  object
 10  color               113984 non-null  object
 11  outcome_subtype     59084 non-null   object
 12  name_i              73216 non-null   object
 13  datetime_i          113984 non-null  object
 14  datetime2_i         113984 non-null  object
 15  found_location_i    113984 non-null  object
 16  in

<a href='#top'>Back to top</a>
## Function to produce single dataframe <a name='make_dataframe'></a>

In [16]:
def get_pet_dataframe():
    df_o, df_i = get_pet_data()
    df_i = rename_intake(df_i)
    df = join_tables(df_o, df_i)
    return df

df = get_pet_dataframe()

Returning saved csv files.


<a href='#top'>Back to top</a>
## Handling the datetime columns <a name='datetime'></a>

First convert the columns to a datetime dtype.

In [17]:
df[['animal_id', 'datetime', 'monthyear', 'datetime_i', 'datetime2_i']]

Unnamed: 0,animal_id,datetime,monthyear,datetime_i,datetime2_i
0,A859339,2022-06-29T13:37:00.000,2022-06-29T13:37:00.000,2022-06-12T14:45:00.000,2022-06-12T14:45:00.000
1,A860179,2022-06-29T12:36:00.000,2022-06-29T12:36:00.000,2022-06-23T17:57:00.000,2022-06-23T17:57:00.000
2,A860475,2022-06-29T12:13:00.000,2022-06-29T12:13:00.000,2022-06-29T08:05:00.000,2022-06-29T08:05:00.000
3,A860471,2022-06-29T11:46:00.000,2022-06-29T11:46:00.000,2022-06-29T07:56:00.000,2022-06-29T07:56:00.000
4,A860434,2022-06-29T11:32:00.000,2022-06-29T11:32:00.000,2022-06-28T15:42:00.000,2022-06-28T15:42:00.000
...,...,...,...,...,...
113979,A664258,2013-10-01T12:27:00.000,2013-10-01T12:27:00.000,2013-10-01T11:15:00.000,2013-10-01T11:15:00.000
113980,A648744,2013-10-01T12:27:00.000,2013-10-01T12:27:00.000,2013-10-01T11:15:00.000,2013-10-01T11:15:00.000
113981,A664236,2013-10-01T10:44:00.000,2013-10-01T10:44:00.000,2013-10-01T08:33:00.000,2013-10-01T08:33:00.000
113982,A664237,2013-10-01T10:44:00.000,2013-10-01T10:44:00.000,2013-10-01T08:33:00.000,2013-10-01T08:33:00.000


In [18]:
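df['datetime'] = pd.to_datetime(df['datetime'])
df['monthyear'] = pd.to_datetime(df['monthyear'])
# NOTE: 'dateime_i' is a misspelling of 'datetime_i'; this line adds a new column
# (the 24th in the output below) and leaves 'datetime_i' as strings
df['dateime_i'] = pd.to_datetime(df['datetime_i'])
df['datetime2_i'] = pd.to_datetime(df['datetime2_i'])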

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 113984 entries, 0 to 113983
Data columns (total 24 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   animal_id           113984 non-null  object        
 1   name                73216 non-null   object        
 2   datetime            113984 non-null  datetime64[ns]
 3   monthyear           113984 non-null  datetime64[ns]
 4   date_of_birth       113984 non-null  object        
 5   outcome_type        113969 non-null  object        
 6   animal_type         113984 non-null  object        
 7   sex_upon_outcome    113983 non-null  object        
 8   age_upon_outcome    113960 non-null  object        
 9   breed               113984 non-null  object        
 10  color               113984 non-null  object        
 11  outcome_subtype     59084 non-null   object        
 12  name_i              73216 non-null   object        
 13  datetime_i          113984 no

Within each table, the two datetime columns look identical: neither comparison below returns any mismatched rows.

In [20]:
df[df['datetime'] != df['monthyear']]

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,...,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i,dateime_i


In [21]:
df[df['datetime_i'] != df['datetime2_i']]

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,...,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i,dateime_i


Therefore, make two columns, `outcome_date` and `intake_date`, that contain the day of each event.
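As a side illustration on hypothetical toy rows, there are two common ways to reduce a timestamp to its day in pandas. `dt.normalize()` keeps the datetime dtype, while `dt.strftime()` returns text. This is only a sketch of the trade-off, not the project's implementation.

```python
import pandas as pd

# Two toy rows in the same string format as the raw tables.
toy = pd.DataFrame({
    'datetime': ['2022-06-29T13:37:00.000', '2013-10-01T10:44:00.000'],
    'datetime_i': ['2022-06-12T14:45:00.000', '2013-10-01T08:33:00.000'],
})

outcome_ts = pd.to_datetime(toy['datetime'])
intake_ts = pd.to_datetime(toy['datetime_i'])

# normalize() resets the time of day to midnight and keeps the datetime64 dtype.
toy['outcome_date'] = outcome_ts.dt.normalize()
toy['intake_date'] = intake_ts.dt.normalize()

# strftime() returns text instead, which sorts and compares as strings.
outcome_text = outcome_ts.dt.strftime('%m %d, %Y')

print(toy[['outcome_date', 'intake_date']])
print(outcome_text.tolist())
```

Keeping the datetime dtype makes later date arithmetic and resampling possible without re-parsing the strings.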

In [22]:
def make_date_columns(df):
    """
    Adds outcome_date and intake_date columns and drops the raw datetime columns
    """
    # convert the raw strings to datetime dtype
    df['datetime'] = pd.to_datetime(df['datetime'])
    df['monthyear'] = pd.to_datetime(df['monthyear'])
    df['datetime_i'] = pd.to_datetime(df['datetime_i'])
    df['datetime2_i'] = pd.to_datetime(df['datetime2_i'])
    df['outcome_date'] = df['monthyear'].dt.strftime('%m %d, %Y')
    # the intake date must come from the intake timestamp, not the outcome one
    df['intake_date'] = df['datetime_i'].dt.strftime('%m %d, %Y')
    df = df.drop(columns = ['datetime', 'monthyear', 'datetime_i', 'datetime2_i'])
    return df

make_date_columns(get_pet_dataframe()).info()

Returning saved csv files.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 113984 entries, 0 to 113983
Data columns (total 21 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   animal_id           113984 non-null  object
 1   name                73216 non-null   object
 2   date_of_birth       113984 non-null  object
 3   outcome_type        113969 non-null  object
 4   animal_type         113984 non-null  object
 5   sex_upon_outcome    113983 non-null  object
 6   age_upon_outcome    113960 non-null  object
 7   breed               113984 non-null  object
 8   color               113984 non-null  object
 9   outcome_subtype     59084 non-null   object
 10  name_i              73216 non-null   object
 11  found_location_i    113984 non-null  object
 12  intake_type_i       113984 non-null  object
 13  intake_condition_i  113984 non-null  object
 14  animal_type_i       113984 non-null  object
 15  sex_upon_intake_i   1139

<a href='#top'>Back to top</a>

## Checking data integrity of select columns <a name='integrity'></a>

All the animal types are the same in both tables.

In [23]:
df[df['animal_type'] != df['animal_type_i']]

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,...,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i,dateime_i


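The only mismatched names are rows where both values are `NaN`. They show up as mismatches because `NaN != NaN` evaluates to `True`.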
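The reason null rows surface in a `!=` comparison can be shown on toy data. The sketch below uses hypothetical values and also shows one way to exclude rows where both sides are missing.

```python
import numpy as np
import pandas as pd

# Toy frame where the second animal has no name in either table.
toy = pd.DataFrame({
    'name':   ['Beaux', np.nan, 'Alloy'],
    'name_i': ['Beaux', np.nan, 'Alloy'],
}, dtype=object)

# Plain != flags the NaN row, because NaN is never equal to NaN.
naive_mismatch = toy['name'] != toy['name_i']

# Treat two missing values as a match.
both_missing = toy['name'].isna() & toy['name_i'].isna()
real_mismatch = naive_mismatch & ~both_missing

print(naive_mismatch.tolist())
print(real_mismatch.tolist())
```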

In [24]:
df[df['name'] != df['name_i']][['name', 'name_i']].value_counts(dropna=False)

name  name_i
NaN   NaN       40768
dtype: int64

All the breeds are consistent

In [25]:
df[df['breed'] != df['breed_i']]

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,...,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i,dateime_i


Color is consistent as well.

In [26]:
df[df['color'] != df['color_i']]

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,...,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i,dateime_i


<a href='#top'>Back to top</a>

## Considering nulls <a name='nulls'></a>

The `name` column can be filled with a placeholder string, since the only mismatches between `name` and `name_i` are `NaN`:

In [27]:
df[df['name'] != df['name_i']][['name', 'name_i']].value_counts(dropna=False)

name  name_i
NaN   NaN       40768
dtype: int64

In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 113984 entries, 0 to 113983
Data columns (total 24 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   animal_id           113984 non-null  object        
 1   name                73216 non-null   object        
 2   datetime            113984 non-null  datetime64[ns]
 3   monthyear           113984 non-null  datetime64[ns]
 4   date_of_birth       113984 non-null  object        
 5   outcome_type        113969 non-null  object        
 6   animal_type         113984 non-null  object        
 7   sex_upon_outcome    113983 non-null  object        
 8   age_upon_outcome    113960 non-null  object        
 9   breed               113984 non-null  object        
 10  color               113984 non-null  object        
 11  outcome_subtype     59084 non-null   object        
 12  name_i              73216 non-null   object        
 13  datetime_i          113984 no

`outcome_type` has only fifteen `NaN` values. These rows can be dropped safely.

In [29]:
df.outcome_type.value_counts(dropna=False)

Adoption           49533
Transfer           37752
Return to Owner    15252
Euthanasia          8915
Died                1276
Disposal             628
Rto-Adopt            532
Missing               55
Relocate              25
NaN                   15
Lost                   1
Name: outcome_type, dtype: int64

Replace the `NaN` values in `outcome_subtype` with `no subtype`. Consider the cross tab of `outcome_subtype` against `outcome_type`.
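One thing to keep in mind when reading a cross tab of these two columns is that `pd.crosstab` leaves out rows where either column is `NaN` by default. The sketch below uses hypothetical toy data to show that filling the nulls first brings those rows back under their own label.

```python
import numpy as np
import pandas as pd

# Toy data: two rows have no subtype.
toy = pd.DataFrame({
    'outcome_type':    ['Adoption', 'Adoption', 'Transfer', 'Euthanasia'],
    'outcome_subtype': [np.nan, 'Foster', 'Partner', np.nan],
})

# By default, rows with a NaN in either factor are not counted.
raw_table = pd.crosstab(toy['outcome_subtype'], toy['outcome_type'])

# After filling, those rows are counted under the 'no subtype' label.
filled = toy['outcome_subtype'].fillna('no subtype')
filled_table = pd.crosstab(filled, toy['outcome_type'])

print(raw_table)
print(filled_table)
```

An outcome type whose rows all lack a subtype disappears from the unfilled table entirely, as `Euthanasia` does in this toy example.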

In [30]:
df.outcome_subtype.value_counts(dropna=False)

NaN                    54900
Partner                31386
Foster                 11019
Rabies Risk             4093
Suffering               3463
SCRP                    2941
Snr                     2832
In Kennel                669
Out State                570
Aggressive               414
Offsite                  338
Medical                  318
In Foster                315
At Vet                   282
Behavior                 125
Enroute                   90
Field                     79
Underage                  36
In Surgery                27
Court/Investigation       25
Prc                       11
In State                  11
Customer S                11
Barn                      10
Possible Theft             7
Emer                       6
Emergency                  6
Name: outcome_subtype, dtype: int64

In [31]:
pd.crosstab(df.outcome_subtype, df.outcome_type)

outcome_type,Adoption,Died,Euthanasia,Missing,Return to Owner,Transfer
outcome_subtype,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Aggressive,0,0,414,0,0,0
At Vet,0,90,192,0,0,0
Barn,3,0,0,0,0,7
Behavior,0,0,125,0,0,0
Court/Investigation,0,0,25,0,0,0
Customer S,0,0,0,0,11,0
Emer,0,0,0,0,0,6
Emergency,0,6,0,0,0,0
Enroute,0,90,0,0,0,0
Field,0,0,0,0,79,0


SCRP is the Stray Cat Return Program, a spay and release program for cats.

In [32]:
df[df.outcome_subtype == 'SCRP'].animal_type.value_counts()

Cat    2941
Name: animal_type, dtype: int64

`sex_upon_outcome` has one `NaN`, `age_upon_outcome` has 24, and `sex_upon_intake_i` has one. Drop these rows.

In [33]:
df.sex_upon_outcome.value_counts(dropna=False)

Neutered Male    36136
Spayed Female    33915
Intact Female    16435
Intact Male      16331
Unknown          11166
NaN                  1
Name: sex_upon_outcome, dtype: int64

In [34]:
df.age_upon_outcome.value_counts(dropna=False)

1 year       18689
2 years      16956
2 months     16000
3 months      6079
1 month       5749
3 years       5734
4 months      3969
4 years       3215
5 years       3141
5 months      2922
6 months      2878
3 weeks       2383
2 weeks       2354
6 years       2063
4 weeks       2008
8 years       1937
7 years       1810
8 months      1738
10 years      1694
10 months     1494
7 months      1388
9 months      1070
1 weeks       1038
9 years       1019
12 years       848
1 week         794
11 years       600
11 months      565
13 years       534
3 days         436
2 days         402
14 years       391
1 day          343
15 years       335
6 days         262
4 days         255
0 years        205
5 days         173
5 weeks        152
16 years       143
17 years        84
18 years        52
19 years        25
NaN             24
20 years        21
22 years         6
28 years         1
30 years         1
23 years         1
21 years         1
24 years         1
25 years         1
Name: age_up

In [35]:
df.sex_upon_intake_i.value_counts(dropna=False)

Intact Male      40623
Intact Female    39824
Neutered Male    11844
Unknown          11166
Spayed Female    10526
NaN                  1
Name: sex_upon_intake_i, dtype: int64

## Function to fill nulls and drop nulls

In [36]:
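def null_fill_and_drop(df):
    """
    Fills nulls in name and outcome_subtype, drops the redundant intake columns,
    then drops any rows that still contain a null
    """
    df['name'] = df['name'].fillna('no name')
    df['outcome_subtype'] = df['outcome_subtype'].fillna('no subtype')
    # these intake columns duplicate the outcome columns
    df = df.drop(columns=['name_i', 'breed_i', 'color_i', 'animal_type_i'])
    df = df.dropna()
    return df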

df = null_fill_and_drop(get_pet_dataframe())

Returning saved csv files.
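The function above calls `dropna()` with no arguments, which removes a row if any column is null. The sketch below uses toy data (the `notes` column is hypothetical and not part of the shelter tables) to show how `subset` restricts the check to named columns.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'animal_id': ['A1', 'A2', 'A3'],
    'outcome_type': ['Adoption', np.nan, 'Transfer'],
    'sex_upon_outcome': ['Spayed Female', 'Unknown', 'Intact Male'],
    'notes': [np.nan, 'ok', 'ok'],  # hypothetical extra column with a null
})

# Only rows with a null in one of the required columns are dropped.
required = ['outcome_type', 'sex_upon_outcome']
subset_drop = toy.dropna(subset=required)

# With no subset, a null in any column drops the row.
full_drop = toy.dropna()

print(subset_drop['animal_id'].tolist())
print(full_drop['animal_id'].tolist())
```

Using `subset` guards against losing rows if a column with many nulls is added later. With the columns used in this notebook, the two approaches should behave the same once `name` and `outcome_subtype` are filled.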


In [37]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 113944 entries, 0 to 113983
Data columns (total 19 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   animal_id           113944 non-null  object
 1   name                113944 non-null  object
 2   datetime            113944 non-null  object
 3   monthyear           113944 non-null  object
 4   date_of_birth       113944 non-null  object
 5   outcome_type        113944 non-null  object
 6   animal_type         113944 non-null  object
 7   sex_upon_outcome    113944 non-null  object
 8   age_upon_outcome    113944 non-null  object
 9   breed               113944 non-null  object
 10  color               113944 non-null  object
 11  outcome_subtype     113944 non-null  object
 12  datetime_i          113944 non-null  object
 13  datetime2_i         113944 non-null  object
 14  found_location_i    113944 non-null  object
 15  intake_type_i       113944 non-null  object
 16  in

<a href='#top'>Back to top</a>

## Fixing the age columns <a name='age'></a>

The age columns contain a variety of strings that can be turned into numbers. The smallest unit present is days, so convert everything into days.



In [38]:
df[['age_upon_outcome']].value_counts()[:5]

age_upon_outcome
1 year              18684
2 years             16953
2 months            16000
3 months             6079
1 month              5747
dtype: int64

In [39]:
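def convert_age_column(df):
    """
    Converts the age_upon_outcome and age_upon_intake_i strings into days
    (age_at_outcome and age_at_intake) and drops the original columns
    """
    new_data = []
    # approximate number of days per unit; singular and plural both appear in the data
    multipliers = {
        'day': 1,
        'days': 1,
        'week':7,
        'weeks':7,
        'month': 30.5,
        'months':30.5,
        'year':365.25,
        'years':365.25
    }
    for i, row in df.iterrows():
        # each age is a string such as '2 years': a count followed by a unit
        outcome_age_split = row['age_upon_outcome'].split()
        outcome_age_calc = int(outcome_age_split[0])*multipliers[outcome_age_split[1]]
        intake_age_split = row['age_upon_intake_i'].split()
        intake_age_calc = int(intake_age_split[0])*multipliers[intake_age_split[1]]
        datum_calc = {
            'animal_id':row['animal_id'],
            'age_at_outcome':outcome_age_calc,
            'age_at_intake':intake_age_calc
        }
        new_data.append(datum_calc)
    df_calc = pd.DataFrame(new_data)
    # animal_id is unique after the join, so this merge is one-to-one
    df = df.merge(df_calc, on='animal_id')
    return df.drop(columns = ['age_upon_outcome', 'age_upon_intake_i'])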

df = convert_age_column(df)
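The same conversion can be written without a row loop. The following is a minimal sketch, not the project's implementation: the helper name `age_to_days` and the toy values are hypothetical, and the day-equivalents are taken from the `multipliers` dictionary above.

```python
import pandas as pd

# Same day-equivalents as the multipliers dictionary above, keyed by singular unit.
DAYS_PER_UNIT = {'day': 1, 'week': 7, 'month': 30.5, 'year': 365.25}

def age_to_days(ages):
    """Convert strings such as '2 years' or '3 weeks' to a number of days."""
    # Capture the count and the unit; an optional trailing 's' is left out of the unit.
    parts = ages.str.extract(r'^(?P<count>\d+)\s+(?P<unit>[a-z]+?)s?$')
    counts = pd.to_numeric(parts['count'])
    per_unit = parts['unit'].map(DAYS_PER_UNIT)
    return counts * per_unit

toy = pd.Series(['1 year', '2 years', '3 weeks', '1 weeks',
                 '10 months', '4 days', '0 years'])
days = age_to_days(toy)
print(days.tolist())
```

Strings that do not match the pattern, including nulls, come out as `NaN` instead of raising an error, so this version could be applied before the null rows are dropped.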

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 113944 entries, 0 to 113943
Data columns (total 19 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   animal_id           113944 non-null  object 
 1   name                113944 non-null  object 
 2   datetime            113944 non-null  object 
 3   monthyear           113944 non-null  object 
 4   date_of_birth       113944 non-null  object 
 5   outcome_type        113944 non-null  object 
 6   animal_type         113944 non-null  object 
 7   sex_upon_outcome    113944 non-null  object 
 8   breed               113944 non-null  object 
 9   color               113944 non-null  object 
 10  outcome_subtype     113944 non-null  object 
 11  datetime_i          113944 non-null  object 
 12  datetime2_i         113944 non-null  object 
 13  found_location_i    113944 non-null  object 
 14  intake_type_i       113944 non-null  object 
 15  intake_condition_i  113944 non-nul

<a href='#top'>Back to Top</a>

## Trying out the scripts from `.py` files <a name='scripts'></a>


In [41]:
import wrangle

In [42]:
df = wrangle.get_pet_dataframe()
df

Returning saved csv files.


Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,animal_type,sex_upon_outcome,age_upon_outcome,breed,...,datetime_i,datetime2_i,found_location_i,intake_type_i,intake_condition_i,animal_type_i,sex_upon_intake_i,age_upon_intake_i,breed_i,color_i
0,A859339,*Bodhi,2022-06-29T13:37:00.000,2022-06-29T13:37:00.000,2021-02-14T00:00:00.000,Adoption,Dog,Spayed Female,1 year,German Shepherd,...,2022-06-12T14:45:00.000,2022-06-12T14:45:00.000,1156 W Cesar Chavez Street in Austin (TX),Stray,Normal,Dog,Spayed Female,1 year,German Shepherd,Black/Brown
1,A860179,Alloy,2022-06-29T12:36:00.000,2022-06-29T12:36:00.000,2018-06-23T00:00:00.000,Transfer,Dog,Neutered Male,4 years,German Shepherd Mix,...,2022-06-23T17:57:00.000,2022-06-23T17:57:00.000,Austin (TX),Owner Surrender,Normal,Dog,Neutered Male,4 years,German Shepherd Mix,Black/Brown
2,A860475,,2022-06-29T12:13:00.000,2022-06-29T12:13:00.000,2022-05-26T00:00:00.000,Transfer,Cat,Intact Female,4 weeks,Domestic Shorthair,...,2022-06-29T08:05:00.000,2022-06-29T08:05:00.000,On Ih 35 Between Exit 240 And 241 in Austin (TX),Stray,Injured,Cat,Intact Female,4 weeks,Domestic Shorthair,Black
3,A860471,A860471,2022-06-29T11:46:00.000,2022-06-29T11:46:00.000,2022-05-16T00:00:00.000,Euthanasia,Cat,Intact Male,,Domestic Shorthair,...,2022-06-29T07:56:00.000,2022-06-29T07:56:00.000,13776 Us 183 in Austin (TX),Stray,Normal,Cat,Intact Male,1 month,Domestic Shorthair,Brown Tabby/White
4,A860434,,2022-06-29T11:32:00.000,2022-06-29T11:32:00.000,2021-06-28T00:00:00.000,Euthanasia,Other,Unknown,1 year,Bat,...,2022-06-28T15:42:00.000,2022-06-28T15:42:00.000,8350 Bluff Springs Road Apt 1515 in Austin (TX),Wildlife,Normal,Other,Unknown,1 year,Bat,Brown
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113979,A664258,Sylvio,2013-10-01T12:27:00.000,2013-10-01T12:27:00.000,2006-10-01T00:00:00.000,Return to Owner,Dog,Neutered Male,7 years,Weimaraner Mix,...,2013-10-01T11:15:00.000,2013-10-01T11:15:00.000,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Dog,Neutered Male,7 years,Weimaraner Mix,Silver
113980,A648744,Claire,2013-10-01T12:27:00.000,2013-10-01T12:27:00.000,2012-03-04T00:00:00.000,Return to Owner,Dog,Spayed Female,1 year,Anatol Shepherd Mix,...,2013-10-01T11:15:00.000,2013-10-01T11:15:00.000,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Dog,Spayed Female,1 year,Anatol Shepherd Mix,White/Tricolor
113981,A664236,,2013-10-01T10:44:00.000,2013-10-01T10:44:00.000,2013-09-24T00:00:00.000,Transfer,Cat,Unknown,1 week,Domestic Shorthair Mix,...,2013-10-01T08:33:00.000,2013-10-01T08:33:00.000,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White
113982,A664237,,2013-10-01T10:44:00.000,2013-10-01T10:44:00.000,2013-09-24T00:00:00.000,Transfer,Cat,Unknown,1 week,Domestic Shorthair Mix,...,2013-10-01T08:33:00.000,2013-10-01T08:33:00.000,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White


In [43]:
df = wrangle.prepare_pet_dataframe(df)
df

Unnamed: 0,animal_id,name,outcome_type,animal_type,sex_upon_outcome,breed,color,outcome_subtype,found_location,intake_type,intake_condition,sex_upon_intake,outcome_date,intake_date,age_at_outcome,age_at_intake
0,A859339,*Bodhi,Adoption,Dog,Spayed Female,German Shepherd,Black/Brown,no subtype,1156 W Cesar Chavez Street in Austin (TX),Stray,Normal,Spayed Female,2022-06-29,2022-06-12,365,365
1,A860179,Alloy,Transfer,Dog,Neutered Male,German Shepherd Mix,Black/Brown,Partner,Austin (TX),Owner Surrender,Normal,Neutered Male,2022-06-29,2022-06-23,1461,1461
2,A860475,no name,Transfer,Cat,Intact Female,Domestic Shorthair,Black,Partner,On Ih 35 Between Exit 240 And 241 in Austin (TX),Stray,Injured,Intact Female,2022-06-29,2022-06-29,28,28
3,A860434,no name,Euthanasia,Other,Unknown,Bat,Brown,Rabies Risk,8350 Bluff Springs Road Apt 1515 in Austin (TX),Wildlife,Normal,Unknown,2022-06-29,2022-06-28,365,365
4,A854024,Charles,Transfer,Cat,Neutered Male,Domestic Medium Hair,Orange Tabby,Snr,16600 Sydney Carol in Austin (TX),Stray,Injured,Intact Male,2022-06-29,2022-03-29,730,730
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113939,A664258,Sylvio,Return to Owner,Dog,Neutered Male,Weimaraner Mix,Silver,no subtype,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Neutered Male,2013-10-01,2013-10-01,2557,2557
113940,A648744,Claire,Return to Owner,Dog,Spayed Female,Anatol Shepherd Mix,White/Tricolor,no subtype,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Spayed Female,2013-10-01,2013-10-01,365,365
113941,A664236,no name,Transfer,Cat,Unknown,Domestic Shorthair Mix,Orange/White,Partner,Abia in Austin (TX),Stray,Normal,Unknown,2013-10-01,2013-10-01,7,7
113942,A664237,no name,Transfer,Cat,Unknown,Domestic Shorthair Mix,Orange/White,Partner,Abia in Austin (TX),Stray,Normal,Unknown,2013-10-01,2013-10-01,7,7


In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 113944 entries, 0 to 113943
Data columns (total 16 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   animal_id         113944 non-null  object        
 1   name              113944 non-null  object        
 2   outcome_type      113944 non-null  object        
 3   animal_type       113944 non-null  object        
 4   sex_upon_outcome  113944 non-null  object        
 5   breed             113944 non-null  object        
 6   color             113944 non-null  object        
 7   outcome_subtype   113944 non-null  object        
 8   found_location    113944 non-null  object        
 9   intake_type       113944 non-null  object        
 10  intake_condition  113944 non-null  object        
 11  sex_upon_intake   113944 non-null  object        
 12  outcome_date      113944 non-null  datetime64[ns]
 13  intake_date       113944 non-null  datetime64[ns]
 14  age_

In [45]:
df.rename(columns = {'found_location_i':'found_location',
                    'intake_type_i':'intake_type',
                    'intake_condition_i': 'intake_condition',
                    'sex_upon_intake_i': 'sex_upon_intake'})

Unnamed: 0,animal_id,name,outcome_type,animal_type,sex_upon_outcome,breed,color,outcome_subtype,found_location,intake_type,intake_condition,sex_upon_intake,outcome_date,intake_date,age_at_outcome,age_at_intake
0,A859339,*Bodhi,Adoption,Dog,Spayed Female,German Shepherd,Black/Brown,no subtype,1156 W Cesar Chavez Street in Austin (TX),Stray,Normal,Spayed Female,2022-06-29,2022-06-12,365,365
1,A860179,Alloy,Transfer,Dog,Neutered Male,German Shepherd Mix,Black/Brown,Partner,Austin (TX),Owner Surrender,Normal,Neutered Male,2022-06-29,2022-06-23,1461,1461
2,A860475,no name,Transfer,Cat,Intact Female,Domestic Shorthair,Black,Partner,On Ih 35 Between Exit 240 And 241 in Austin (TX),Stray,Injured,Intact Female,2022-06-29,2022-06-29,28,28
3,A860434,no name,Euthanasia,Other,Unknown,Bat,Brown,Rabies Risk,8350 Bluff Springs Road Apt 1515 in Austin (TX),Wildlife,Normal,Unknown,2022-06-29,2022-06-28,365,365
4,A854024,Charles,Transfer,Cat,Neutered Male,Domestic Medium Hair,Orange Tabby,Snr,16600 Sydney Carol in Austin (TX),Stray,Injured,Intact Male,2022-06-29,2022-03-29,730,730
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113939,A664258,Sylvio,Return to Owner,Dog,Neutered Male,Weimaraner Mix,Silver,no subtype,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Neutered Male,2013-10-01,2013-10-01,2557,2557
113940,A648744,Claire,Return to Owner,Dog,Spayed Female,Anatol Shepherd Mix,White/Tricolor,no subtype,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Spayed Female,2013-10-01,2013-10-01,365,365
113941,A664236,no name,Transfer,Cat,Unknown,Domestic Shorthair Mix,Orange/White,Partner,Abia in Austin (TX),Stray,Normal,Unknown,2013-10-01,2013-10-01,7,7
113942,A664237,no name,Transfer,Cat,Unknown,Domestic Shorthair Mix,Orange/White,Partner,Abia in Austin (TX),Stray,Normal,Unknown,2013-10-01,2013-10-01,7,7
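
Two things about `rename` matter for the cell above: it returns a new dataframe instead of changing `df` in place, and by default it silently skips keys that are not column names. Because `df.info()` already lists `found_location`, `intake_type`, `intake_condition`, and `sex_upon_intake` without the `_i` suffix, the call may have had nothing left to rename. The sketch below shows both behaviours on a small made-up frame; `demo` and its values are hypothetical and not taken from the shelter data.

```python
import pandas as pd

# Hypothetical two-row frame that still carries an `_i` suffix from the join
demo = pd.DataFrame({
    'animal_id': ['A000001', 'A000002'],
    'intake_type_i': ['Stray', 'Owner Surrender'],
})

# rename returns a new frame; demo keeps its old column names
renamed = demo.rename(columns={'intake_type_i': 'intake_type'})

# errors='raise' turns a silent no-op into a KeyError when the old
# column name is no longer present
missing_key_raised = False
try:
    renamed.rename(columns={'intake_type_i': 'intake_type'}, errors='raise')
except KeyError:
    missing_key_raised = True

print(list(demo.columns))
print(list(renamed.columns))
print(missing_key_raised)
```

To keep the renamed columns, assign the result back, for example `df = df.rename(columns=...)`.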


In [51]:
def make_target_column(df):
    """
    Returns the dataframe with a target_outcome column: Adoption, Transfer,
    and Return to Owner are kept, all other outcome types become Other
    """
    kept_outcomes = ['Adoption', 'Transfer', 'Return to Owner']
    new_data = []
    for i, row in df.iterrows():
        # keep the three main outcomes, group everything else as Other
        if row['outcome_type'] in kept_outcomes:
            target_string = row['outcome_type']
        else:
            target_string = 'Other'
        datum_calc = {
            'animal_id':row['animal_id'],
            'target_outcome':target_string
        }
        new_data.append(datum_calc)
    df_calc = pd.DataFrame(new_data)
    # join on animal_id explicitly; relies on duplicate animal_id rows being dropped earlier
    df = df.merge(df_calc, on='animal_id')
    return df

make_target_column(df)

Unnamed: 0,animal_id,name,outcome_type,animal_type,sex_upon_outcome,breed,color,outcome_subtype,found_location,intake_type,intake_condition,sex_upon_intake,outcome_date,intake_date,age_at_outcome,age_at_intake,target_outcome
0,A859339,*Bodhi,Adoption,Dog,Spayed Female,German Shepherd,Black/Brown,no subtype,1156 W Cesar Chavez Street in Austin (TX),Stray,Normal,Spayed Female,2022-06-29,2022-06-12,365,365,Adoption
1,A860179,Alloy,Transfer,Dog,Neutered Male,German Shepherd Mix,Black/Brown,Partner,Austin (TX),Owner Surrender,Normal,Neutered Male,2022-06-29,2022-06-23,1461,1461,Transfer
2,A860475,no name,Transfer,Cat,Intact Female,Domestic Shorthair,Black,Partner,On Ih 35 Between Exit 240 And 241 in Austin (TX),Stray,Injured,Intact Female,2022-06-29,2022-06-29,28,28,Transfer
3,A860434,no name,Euthanasia,Other,Unknown,Bat,Brown,Rabies Risk,8350 Bluff Springs Road Apt 1515 in Austin (TX),Wildlife,Normal,Unknown,2022-06-29,2022-06-28,365,365,Other
4,A854024,Charles,Transfer,Cat,Neutered Male,Domestic Medium Hair,Orange Tabby,Snr,16600 Sydney Carol in Austin (TX),Stray,Injured,Intact Male,2022-06-29,2022-03-29,730,730,Transfer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113939,A664258,Sylvio,Return to Owner,Dog,Neutered Male,Weimaraner Mix,Silver,no subtype,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Neutered Male,2013-10-01,2013-10-01,2557,2557,Return to Owner
113940,A648744,Claire,Return to Owner,Dog,Spayed Female,Anatol Shepherd Mix,White/Tricolor,no subtype,Fm 1626/Manchaca Rd in Travis (TX),Stray,Normal,Spayed Female,2013-10-01,2013-10-01,365,365,Return to Owner
113941,A664236,no name,Transfer,Cat,Unknown,Domestic Shorthair Mix,Orange/White,Partner,Abia in Austin (TX),Stray,Normal,Unknown,2013-10-01,2013-10-01,7,7,Transfer
113942,A664237,no name,Transfer,Cat,Unknown,Domestic Shorthair Mix,Orange/White,Partner,Abia in Austin (TX),Stray,Normal,Unknown,2013-10-01,2013-10-01,7,7,Transfer
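
The row-by-row loop works, but on more than 100,000 rows `iterrows` is slow. A possible alternative is to build the column with `Series.where` and `isin`, which avoids both the loop and the merge. This is only a sketch: the name `make_target_column_vectorized` and the `sample` frame are made up for illustration and are not part of the project scripts.

```python
import pandas as pd

def make_target_column_vectorized(df):
    """
    Returns a copy of df with a target_outcome column
    (hypothetical alternative to the loop-based version)
    """
    kept_outcomes = ['Adoption', 'Transfer', 'Return to Owner']
    out = df.copy()
    # where() keeps values where the mask is True and fills the rest with 'Other'
    out['target_outcome'] = out['outcome_type'].where(
        out['outcome_type'].isin(kept_outcomes), 'Other'
    )
    return out

# Small made-up sample to check the mapping
sample = pd.DataFrame({
    'animal_id': ['A1', 'A2', 'A3', 'A4'],
    'outcome_type': ['Adoption', 'Euthanasia', 'Return to Owner', 'Disposal'],
})
result = make_target_column_vectorized(sample)
print(result)
```

Unlike the merge-based version, this keeps the original index and does not depend on `animal_id` being unique.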


In [58]:
df[df['outcome_type'] == 'Disposal']

Unnamed: 0,animal_id,name,outcome_type,animal_type,sex_upon_outcome,breed,color,outcome_subtype,found_location,intake_type,intake_condition,sex_upon_intake,outcome_date,intake_date,age_at_outcome,age_at_intake
110,A860288,no name,Disposal,Bird,Unknown,Grackle,Black,no subtype,7201 Levander Loop in Austin (TX),Wildlife,Injured,Unknown,2022-06-26,2022-06-25,730,730
227,A860153,no name,Disposal,Other,Unknown,Bat,Brown,no subtype,Austin (TX),Wildlife,Injured,Unknown,2022-06-24,2022-06-23,730,730
588,A859196,A859196,Disposal,Cat,Intact Male,Domestic Shorthair,Brown Tabby,no subtype,1304 Kirkwood Road in Austin (TX),Stray,Injured,Intact Male,2022-06-14,2022-06-10,30,30
1842,A856412,no name,Disposal,Cat,Intact Female,Domestic Shorthair,Gray/White,no subtype,2918 Castro St in Austin (TX),Stray,Injured,Intact Female,2022-05-06,2022-05-02,730,730
1930,A856179,no name,Disposal,Cat,Intact Male,Domestic Medium Hair,Black,no subtype,23Rd And Rio Grande in Austin (TX),Stray,Normal,Intact Male,2022-05-03,2022-04-29,30,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113514,A665255,no name,Disposal,Other,Unknown,Bat Mix,Brown,no subtype,715 8Th St in Austin (TX),Wildlife,Normal,Unknown,2013-10-15,2013-10-15,730,730
113725,A664923,no name,Disposal,Other,Unknown,Bat Mix,Brown,no subtype,43 Rainey in Austin (TX),Wildlife,Normal,Unknown,2013-10-11,2013-10-10,730,730
113726,A664786,no name,Disposal,Other,Unknown,Bat Mix,Brown,no subtype,2901 Gardne St. in Austin (TX),Wildlife,Sick,Unknown,2013-10-10,2013-10-08,183,183
113770,A664853,no name,Disposal,Other,Unknown,Bat Mix,Brown,no subtype,901 E. Ben White in Austin (TX),Wildlife,Sick,Unknown,2013-10-10,2013-10-10,183,183
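
Scrolling through the filtered rows gives only a rough impression of which animals end up as `Disposal`. A count by `animal_type` summarises the same filter more directly. The sketch below uses a made-up `sample` frame, so the counts say nothing about the real data.

```python
import pandas as pd

# Made-up rows standing in for the wrangled dataframe
sample = pd.DataFrame({
    'outcome_type': ['Disposal', 'Disposal', 'Adoption', 'Disposal'],
    'animal_type': ['Other', 'Cat', 'Dog', 'Other'],
})

# Same boolean filter as above, then count the animal types that remain
is_disposal = sample['outcome_type'] == 'Disposal'
disposal_counts = sample.loc[is_disposal, 'animal_type'].value_counts()
print(disposal_counts)
```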
