### Apply to Animal Shelter Data

Use `.apply()` to convert the `DateTime` strings to datetime objects. Similarly, use `.apply()` to convert the animals' ages from strings to floats.

In [6]:
import pandas as pd
animal_outcomes = pd.read_csv('https://data.austintexas.gov/api/views/9t4d-g238/rows.csv?accessType=DOWNLOAD')

In [7]:
# Your code here
animal_outcomes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 114111 entries, 0 to 114110
Data columns (total 12 columns):
Animal ID           114111 non-null object
Name                78241 non-null object
DateTime            114111 non-null object
MonthYear           114111 non-null object
Date of Birth       114111 non-null object
Outcome Type        114104 non-null object
Outcome Subtype     51693 non-null object
Animal Type         114111 non-null object
Sex upon Outcome    114107 non-null object
Age upon Outcome    114083 non-null object
Breed               114111 non-null object
Color               114111 non-null object
dtypes: object(12)
memory usage: 10.4+ MB


In [44]:
import datetime
animal_outcomes.DateTime = animal_outcomes.DateTime.apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))

In [5]:
pd.to_datetime(animal_outcomes.DateTime)

0        2019-02-17 11:44:00
1        2016-02-13 17:59:00
2        2014-03-18 11:47:00
3        2014-10-18 18:52:00
4        2014-08-05 16:59:00
                 ...        
114100   2017-10-18 13:27:00
114101   2018-03-01 18:28:00
114102   2018-06-23 11:59:00
114103   2018-05-21 12:59:00
114104   2018-03-12 13:27:00
Name: DateTime, Length: 114105, dtype: datetime64[ns]
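As a possible alternative to the row-wise `apply`, `pd.to_datetime` can parse a whole column at once. The sketch below uses a small made-up Series (not the shelter data) with the same `'%m/%d/%Y %I:%M:%S %p'` format; `errors='coerce'` turns unparseable strings into `NaT` instead of raising.

```python
import pandas as pd

# Toy stand-in for the DateTime column (values are illustrative only)
raw = pd.Series(['02/17/2019 11:44:00 AM',
                 '02/13/2016 05:59:00 PM',
                 'not a date'])

# Vectorized parse; errors='coerce' maps bad strings to NaT
parsed = pd.to_datetime(raw, format='%m/%d/%Y %I:%M:%S %p', errors='coerce')

print(parsed)
```

Passing an explicit `format` also avoids pandas having to guess the layout of each string.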

In [7]:
animal_outcomes['Age upon Outcome'].head()

0     2 years
1    4 months
2      6 days
3    2 months
4    2 months
Name: Age upon Outcome, dtype: object

In [13]:
animal_outcomes['Age upon Outcome'].isna().sum()

28

In [45]:
import numpy as np
def age_string_to_days_old(age):
    # Missing ages arrive as NaN; pass them through as NaN
    if pd.isna(age):
        return np.nan
    # Strings look like '2 years' or '4 months'
    qty, unit = age.split(' ')
    qty = int(qty)
    if 'day' in unit:
        return qty
    elif 'week' in unit:
        return qty * 7
    elif 'month' in unit:
        return qty * 30   # approximate month length
    elif 'year' in unit:
        return qty * 365  # ignores leap years
    # Unrecognized unit
    return np.nan
animal_outcomes['Age upon Outcome'].apply(age_string_to_days_old)

0          730.0
1          120.0
2            6.0
3           60.0
4           60.0
           ...  
114100     365.0
114101    1460.0
114102      60.0
114103    1825.0
114104     240.0
Name: Age upon Outcome, Length: 114105, dtype: float64

In [46]:
animal_outcomes['age'] = animal_outcomes['Age upon Outcome'].apply(age_string_to_days_old)
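For comparison, here is one way the same conversion might be done without a Python-level function, using the `.str` accessor. This is only a sketch on a made-up Series; the regular expression and the `days_per_unit` dictionary are assumptions for illustration, with the same approximate factors (30 days per month, 365 per year) as `age_string_to_days_old`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'Age upon Outcome' column
ages = pd.Series(['2 years', '4 months', '6 days', '1 week', np.nan])

# Split each string into a number and a singular unit ('years' -> 'year')
parts = ages.str.extract(r'(?P<qty>\d+)\s+(?P<unit>[a-z]+?)s?$')

# Same approximate conversion factors as age_string_to_days_old
days_per_unit = {'day': 1, 'week': 7, 'month': 30, 'year': 365}

# Missing ages stay NaN through both steps
age_days = parts['qty'].astype(float) * parts['unit'].map(days_per_unit)
print(age_days)
```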

## 3. Methods for Re-Organizing DataFrames
#### `.groupby()`

Those of you familiar with SQL have probably used the GROUP BY command. Pandas has this, too.

The `.groupby()` method is especially useful for aggregate functions applied to the data grouped in particular ways.
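A minimal sketch of the idea, on a made-up DataFrame (the `species` and `weight` columns are hypothetical): the rows are split by the grouping column, an aggregate is computed per group, and the results are combined into one Series indexed by group.

```python
import pandas as pd

# Toy data, not from heart.csv
toy = pd.DataFrame({'species': ['cat', 'dog', 'cat', 'dog', 'dog'],
                    'weight': [4.0, 20.0, 5.0, 30.0, 25.0]})

# Roughly: SELECT species, AVG(weight) FROM toy GROUP BY species
mean_weight = toy.groupby('species')['weight'].mean()
print(mean_weight)
```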

In [18]:
uci = pd.read_csv('data/heart.csv')

In [24]:
uci.groupby('age')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x115be83d0>

#### `.groups` and `.get_group()`

In [29]:
uci.groupby('age').groups[60]

Int64Index([82, 136, 147, 174, 176, 186, 193, 194, 201, 207, 237], dtype='int64')

In [30]:
uci.groupby('age').get_group(60) # .tail()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
82,60,0,2,102,318,0,1,160,0,0.0,2,1,2,1
136,60,0,2,120,178,1,1,96,0,0.0,2,0,2,1
147,60,0,3,150,240,0,1,171,0,0.9,2,0,2,1
174,60,1,0,130,206,0,0,132,1,2.4,1,2,3,0
176,60,1,0,117,230,1,1,160,1,1.4,2,2,3,0
186,60,1,0,130,253,0,1,144,1,1.4,2,1,3,0
193,60,1,0,145,282,0,0,142,1,2.8,1,2,3,0
194,60,1,2,140,185,0,0,155,0,3.0,1,0,2,0
201,60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
207,60,0,0,150,258,0,0,157,0,2.6,1,2,3,0


In [31]:
# .get_group() returns the rows of the DataFrame that belong to the given group key

### Aggregating

In [32]:
uci.groupby('sex').std()

sex,age,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,9.409396,0.972427,19.311119,65.088946,0.332455,0.55715,20.047969,0.422503,1.119844,0.593736,0.881026,0.44129,0.435286
1,8.883803,1.059064,16.658246,42.782392,0.366955,0.510754,24.130882,0.484505,1.174632,0.627378,1.074082,0.659949,0.498626
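If you want several aggregates at once, `.agg()` accepts a list of function names. A small sketch on made-up numbers (the values below are illustrative, not taken from `heart.csv`):

```python
import pandas as pd

# Toy data with the same column names as the heart dataset
toy = pd.DataFrame({'sex': [0, 0, 1, 1, 1],
                    'chol': [200, 260, 180, 220, 230]})

# One row per group, one column per aggregate
summary = toy.groupby('sex')['chol'].agg(['mean', 'std', 'count'])
print(summary)
```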


Exercise: Tell me the average cholesterol level for those with heart disease.

In [34]:
# Your code here!
uci.groupby('target').get_group(1).chol.mean()

242.23030303030302

In [40]:
uci.loc[uci.target == 1].chol.mean()

242.23030303030302

### Apply to Animal Shelter Data

#### Task 1
- Use a groupby to show the average age for each animal type.
- What about by animal type **and** sex?
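Note that the cells below group by `Breed`. A sketch of grouping by `Animal Type` instead, on a made-up frame that borrows the shelter column names (the rows are illustrative only):

```python
import pandas as pd

# Toy rows using the shelter data's column names
toy = pd.DataFrame({
    'Animal Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Cat'],
    'Sex upon Outcome': ['Neutered Male', 'Spayed Female',
                         'Intact Male', 'Intact Male', 'Spayed Female'],
    'age': [730.0, 365.0, 60.0, 120.0, 30.0],
})

# One grouping key: average age per animal type
by_type = toy.groupby('Animal Type')['age'].mean()

# Two grouping keys: the result has a two-level index
by_type_and_sex = toy.groupby(['Animal Type', 'Sex upon Outcome'])['age'].mean()
print(by_type_and_sex)
```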
 

In [58]:
animal_outcomes.groupby('Breed').age.mean()

Breed
Abyssinian                                        217.500000
Abyssinian Mix                                    960.000000
Affenpinscher Mix                                 958.125000
Afghan Hound Mix                                  730.000000
Afghan Hound/German Shepherd                      730.000000
                                                    ...     
Yorkshire Terrier/Shih Tzu                       1095.000000
Yorkshire Terrier/Soft Coated Wheaten Terrier     730.000000
Yorkshire Terrier/Standard Poodle                 486.666667
Yorkshire Terrier/Toy Poodle                     1173.500000
Yorkshire Terrier/Yorkshire Terrier              1825.000000
Name: age, Length: 2537, dtype: float64

In [69]:
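# numeric_only=True keeps the numeric columns only; recent pandas versions raise on string columns otherwise
animal_outcomes.groupby(['Breed', 'Sex upon Outcome']).mean(numeric_only=True)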

Breed,Sex upon Outcome,age,year,month
Abyssinian,Intact Female,365.000000,2019.000000,7.000000
Abyssinian,Spayed Female,70.000000,2019.000000,6.666667
Abyssinian,Unknown,365.000000,2019.000000,7.000000
Abyssinian Mix,Intact Female,365.000000,2019.000000,4.000000
Abyssinian Mix,Neutered Male,395.000000,2016.000000,6.500000
...,...,...,...,...
Yorkshire Terrier/Toy Poodle,Intact Female,90.000000,2019.000000,6.000000
Yorkshire Terrier/Toy Poodle,Intact Male,1460.000000,2018.000000,4.666667
Yorkshire Terrier/Toy Poodle,Neutered Male,1703.333333,2017.666667,5.333333
Yorkshire Terrier/Toy Poodle,Spayed Female,718.333333,2019.000000,7.333333


#### Task 2:
- Create new columns `year` and `month` by applying lambda functions (`lambda x: x.year`, `lambda x: x.month`) to the `DateTime` column
- Use `groupby` and `.size()` to tell me how many animals are adopted in each month
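One possible shape for this task, sketched on made-up rows: the `.dt` accessor is a vectorized alternative to the lambda, and filtering on `Outcome Type` first restricts the count to adoptions. The toy data is an assumption for illustration.

```python
import pandas as pd

# Toy rows using the shelter data's column names
toy = pd.DataFrame({
    'DateTime': pd.to_datetime(['2019-02-17', '2019-02-20',
                                '2019-03-01', '2019-03-05']),
    'Outcome Type': ['Adoption', 'Transfer', 'Adoption', 'Adoption'],
})

# .dt exposes datetime parts without a row-wise lambda
toy['year'] = toy['DateTime'].dt.year
toy['month'] = toy['DateTime'].dt.month

# Keep adoptions only, then count rows per month
adoptions_by_month = toy[toy['Outcome Type'] == 'Adoption'].groupby('month').size()
print(adoptions_by_month)
```

`.size()` counts rows per group, whereas `.count()` counts non-null values per column.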

In [70]:
# Your code here
animal_outcomes['year'] = animal_outcomes['DateTime'].apply(lambda x: x.year)
animal_outcomes['month'] = animal_outcomes['DateTime'].apply(lambda x: x.month)
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,age,year,month
0,A789027,Lennie,2019-02-17 11:44:00,02/17/2019 11:44:00 AM,02/13/2017,Adoption,,Dog,Neutered Male,2 years,Chihuahua Shorthair Mix,Cream,730.0,2019,2
1,A720371,Moose,2016-02-13 17:59:00,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff,120.0,2016,2
2,A674754,,2014-03-18 11:47:00,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby,6.0,2014,3
3,A689724,*Donatello,2014-10-18 18:52:00,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black,60.0,2014,10
4,A680969,*Zeus,2014-08-05 16:59:00,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby,60.0,2014,8


In [73]:
animal_outcomes.groupby('month').count()['Animal ID']

month
1      8209
2      6882
3      8230
4      8208
5      9999
6     10888
7     11050
8     10451
9      9701
10    10995
11     9701
12     9791
Name: Animal ID, dtype: int64

## 4. Reshaping a DataFrame

### `.pivot()`

Those of you familiar with Excel have probably used Pivot Tables. Pandas has a similar functionality.
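As a minimal sketch of what a pivot does, assume a made-up long-format table (the names and `HP` values below are illustrative). `.pivot()` turns the distinct values of one column into new column headers:

```python
import pandas as pd

# Long format: one row per (name, year) pair
long_df = pd.DataFrame({'name': ['greg', 'greg', 'alan', 'alan'],
                        'year': [2018, 2019, 2018, 2019],
                        'HP': [150, 200, 120, 170]})

# Wide format: one row per name, one column per year
wide_df = long_df.pivot(index='name', columns='year', values='HP')
print(wide_df)
```

`.pivot()` raises an error if an (index, columns) pair occurs more than once; `pd.pivot_table()` handles that case by aggregating.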

In [4]:
# `uci` must be loaded in the current session before this cell runs
uci = pd.read_csv('data/heart.csv')
uci.pivot(values='sex', columns='target').head()

### Methods for Combining and Reshaping DataFrames: `.join()`, `.merge()`, `pd.concat()`, `pd.melt()`

### `.join()`

In [10]:
toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns=['age', 'HP'])

In [11]:
toy1.join(toy2.set_index('age'),
          on='age',
          lsuffix='_A',
          rsuffix='_B').head()

Unnamed: 0,age,HP_A,HP_B
0,63,142,100
1,33,47,200


### `.merge()`

In [12]:
ds_chars = pd.read_csv('data/ds_chars.csv', index_col=0)

In [13]:
states = pd.read_csv('data/states.csv', index_col=0)

In [14]:
ds_chars.merge(states,
               left_on='home_state',
               right_on='state',
               how='inner')

Unnamed: 0,name,HP,home_state,state,nickname,capital
0,greg,200,WA,WA,evergreen,Olympia
1,miles,200,WA,WA,evergreen,Olympia
2,alan,170,TX,TX,alamo,Austin
3,rachel,200,TX,TX,alamo,Austin
4,alison,300,DC,DC,district,Washington
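The `how` argument controls which rows survive the merge. A sketch on two made-up frames (`people` and `toy_states` are hypothetical stand-ins for `ds_chars` and `states`): an inner merge drops rows with no match, while a left merge keeps them and fills the missing side with NaN. `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# 'sam' has a home_state that is missing from toy_states
people = pd.DataFrame({'name': ['greg', 'alan', 'sam'],
                       'home_state': ['WA', 'TX', 'NY']})
toy_states = pd.DataFrame({'state': ['WA', 'TX', 'OH'],
                           'capital': ['Olympia', 'Austin', 'Columbus']})

# Inner: only keys present in both frames
inner = people.merge(toy_states, left_on='home_state', right_on='state', how='inner')

# Left: every row of `people`, matched or not
left = people.merge(toy_states, left_on='home_state', right_on='state',
                    how='left', indicator=True)
print(left)
```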


### `pd.concat()`

Exercise: Look up the documentation on pd.concat (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) and use it to concatenate ds_chars and states.
<br/>
Your result should still have only five rows!

In [15]:
pd.concat([ds_chars, states])

Unnamed: 0,HP,capital,home_state,name,nickname,state
0,200.0,,WA,greg,,
1,200.0,,WA,miles,,
2,170.0,,TX,alan,,
3,300.0,,DC,alison,,
4,200.0,,TX,rachel,,
0,,Olympia,,,evergreen,WA
1,,Austin,,,alamo,TX
2,,Washington,,,district,DC
3,,Columbus,,,buckeye,OH
4,,Salem,,,beaver,OR
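The result above has ten rows because `pd.concat` stacks along the rows (`axis=0`) by default. A sketch of the difference on two small made-up frames: with `axis=1` the frames are placed side by side, so the row count stays the same.

```python
import pandas as pd

# Toy frames with no columns in common
a = pd.DataFrame({'name': ['greg', 'alan'], 'HP': [200, 170]})
b = pd.DataFrame({'state': ['WA', 'TX'], 'capital': ['Olympia', 'Austin']})

# axis=0 (default): rows stacked, NaN wherever a column is missing
stacked = pd.concat([a, b], sort=False)

# axis=1: columns placed next to each other, aligned on the index
side_by_side = pd.concat([a, b], axis=1)
print(side_by_side)
```

Keep in mind that `axis=1` aligns on the index, not on a key column, so the rows only line up meaningfully if both frames share the same index order.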


### `pd.melt()`

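Melting reshapes a DataFrame from wide to long format: the columns named in `id_vars` are kept as identifiers, and each column named in `value_vars` is unpivoted into rows, with its name in a `variable` column and its contents in a `value` column.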

In [17]:
ds_chars.head()

Unnamed: 0,name,HP,home_state
0,greg,200,WA
1,miles,200,WA
2,alan,170,TX
3,alison,300,DC
4,rachel,200,TX


In [18]:
pd.melt(ds_chars,
        id_vars=['name'],
        value_vars=['HP', 'home_state'])

Unnamed: 0,name,variable,value
0,greg,HP,200
1,miles,HP,200
2,alan,HP,170
3,alison,HP,300
4,rachel,HP,200
5,greg,home_state,WA
6,miles,home_state,WA
7,alan,home_state,TX
8,alison,home_state,DC
9,rachel,home_state,TX
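Melting and pivoting are roughly inverse operations. A sketch on a two-row made-up frame (a hypothetical stand-in for `ds_chars`) that melts to long format and then pivots back to wide:

```python
import pandas as pd

# Toy stand-in for ds_chars
chars = pd.DataFrame({'name': ['greg', 'alan'],
                      'HP': [200, 170],
                      'home_state': ['WA', 'TX']})

# Wide -> long: one row per (name, variable) pair
long_df = pd.melt(chars, id_vars=['name'], value_vars=['HP', 'home_state'])

# Long -> wide again: 'variable' values become column headers
wide_df = long_df.pivot(index='name', columns='variable', values='value')
print(wide_df)
```

Because `value` mixes numbers and strings, the round-tripped columns come back with a generic object dtype rather than their original types.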


## Bringing it all together with the Animal Shelter Data

Join the data from the [Austin Animal Shelter Intake dataset](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) to the outcomes dataset by Animal ID.

Use the dates from each dataset to see how long animals spend in the shelter. Does it differ by time of year? By outcome?

The URL for the Intake Dataset is here: https://data.austintexas.gov/api/views/wter-evkm/rows.csv?accessType=DOWNLOAD

_Hints_ :
- import and clean the intake dataset first
- use apply/applymap/lambda to change the variables to their proper format in the intake data
- rename the columns in the intake dataset *before* joining
- create a new days-in-shelter variable
- Notice that some values in the `days_in_shelter` column are NaN or negative (remove these rows using the `<` operator and `~` with `.isna()`)
- Use `groupby` to get some interesting information about the dataset
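The hints above could be strung together roughly as follows. This is only a sketch on tiny made-up frames: the toy `intakes` column names, the `intake_date`/`outcome_date` names, and the date format are assumptions, and the real intake file may need different cleaning.

```python
import pandas as pd

# Hypothetical toy data; real intake columns may differ
intakes = pd.DataFrame({
    'Animal ID': ['A1', 'A2', 'A3'],
    'DateTime': ['01/01/2019 10:00:00 AM', '01/05/2019 09:00:00 AM',
                 '02/01/2019 08:00:00 AM'],
})
outcomes = pd.DataFrame({
    'Animal ID': ['A1', 'A2', 'A4'],
    'DateTime': ['01/11/2019 10:00:00 AM', '01/03/2019 09:00:00 AM',
                 '03/01/2019 08:00:00 AM'],
    'Outcome Type': ['Adoption', 'Transfer', 'Adoption'],
})

# Parse the date strings in both frames
fmt = '%m/%d/%Y %I:%M:%S %p'
intakes['DateTime'] = pd.to_datetime(intakes['DateTime'], format=fmt)
outcomes['DateTime'] = pd.to_datetime(outcomes['DateTime'], format=fmt)

# Rename before joining so the two date columns stay distinguishable
intakes = intakes.rename(columns={'DateTime': 'intake_date'})
outcomes = outcomes.rename(columns={'DateTime': 'outcome_date'})

# Outer merge keeps unmatched animals, which then get NaT dates
merged = intakes.merge(outcomes, on='Animal ID', how='outer')
merged['days_in_shelter'] = (merged['outcome_date'] - merged['intake_date']).dt.days

# Drop rows where the stay is unknown (NaN) or negative
bad = merged['days_in_shelter'].isna() | (merged['days_in_shelter'] < 0)
valid = merged[~bad]

mean_stay = valid.groupby('Outcome Type')['days_in_shelter'].mean()
print(mean_stay)
```

If an animal appears more than once in either file, merging on `Animal ID` alone pairs every intake with every outcome for that animal, so the real data will likely need extra care there.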

Make sure to export and save your cleaned dataset. We will use it in a later lecture!

Use `df.to_csv()` to write the DataFrame `df` to a CSV file. Read more in the `to_csv()` documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html).
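A small sketch of the export step, using an in-memory buffer in place of a real file path so that it runs anywhere (the toy frame and the file name in the comment are hypothetical):

```python
import io
import pandas as pd

# Toy stand-in for the cleaned dataset
df = pd.DataFrame({'Animal ID': ['A1', 'A2'], 'days_in_shelter': [10, 3]})

# Stand-in for a path such as 'cleaned_shelter_data.csv' (hypothetical name)
buffer = io.StringIO()

# index=False avoids writing the row index as an extra unnamed column
df.to_csv(buffer, index=False)

# Read it back to check the round trip
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Without `index=False`, reading the file back produces an extra `Unnamed: 0` column unless you pass `index_col=0` to `read_csv`.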