### Apply to Animal Shelter Data

Use `apply` to convert the dates from strings to datetime objects. Similarly, use `apply` to convert the animals' ages from strings to floats.

In [36]:
import pandas as pd
import numpy as np
animal_outcomes = pd.read_csv('https://data.austintexas.gov/api/views/9t4d-g238/rows.csv?accessType=DOWNLOAD')

In [31]:
# Your code here
import datetime

In [32]:
animal_outcomes.DateTime.apply(lambda x: datetime.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p"))

0        2019-02-17 11:44:00
1        2016-02-13 17:59:00
2        2014-03-18 11:47:00
3        2014-10-18 18:52:00
4        2014-08-05 16:59:00
                 ...        
114100   2017-10-18 13:27:00
114101   2018-03-01 18:28:00
114102   2018-06-23 11:59:00
114103   2018-05-21 12:59:00
114104   2018-03-12 13:27:00
Name: DateTime, Length: 114105, dtype: datetime64[ns]

In [15]:
pd.to_datetime(animal_outcomes.DateTime)  # Alternative to apply; without an explicit format= pandas has to infer the format, which can be slow on a column this size

0        2019-02-17 11:44:00
1        2016-02-13 17:59:00
2        2014-03-18 11:47:00
3        2014-10-18 18:52:00
4        2014-08-05 16:59:00
                 ...        
114100   2017-10-18 13:27:00
114101   2018-03-01 18:28:00
114102   2018-06-23 11:59:00
114103   2018-05-21 12:59:00
114104   2018-03-12 13:27:00
Name: DateTime, Length: 114105, dtype: datetime64[ns]
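Both cells above return a new Series and leave `animal_outcomes.DateTime` as strings unless the result is assigned back. The following is a minimal sketch, assuming a small hypothetical sample in the same format as the `DateTime` column, that compares row-by-row parsing with `pd.to_datetime` given an explicit `format`:

```python
import datetime

import pandas as pd

# Hypothetical sample in the same format as the DateTime column
raw = pd.Series(["02/13/2016 05:59:00 PM", "03/18/2014 11:47:00 AM"])
fmt = "%m/%d/%Y %I:%M:%S %p"

# Row-by-row parsing with apply, as in the cell above
via_apply = raw.apply(lambda x: datetime.datetime.strptime(x, fmt))

# Vectorized parsing; the explicit format avoids format inference
via_to_datetime = pd.to_datetime(raw, format=fmt)

# Both approaches should produce the same timestamps
print((via_apply == via_to_datetime).all())
```

To keep the converted values, the result would be assigned back, for example `animal_outcomes["DateTime"] = pd.to_datetime(animal_outcomes["DateTime"], format=fmt)`.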

In [39]:
# ages of the animals from strs to floats
animal_outcomes["Age upon Outcome"].head()

0    4 months
1      6 days
2    2 months
3    2 months
4     2 years
Name: Age upon Outcome, dtype: object

In [43]:
def age_str_to_days_old(age):
    if pd.isna(age):  # missing value: return it unchanged
        return age
    quant, unit = age.split(" ")  # e.g. "4 months" -> ["4", "months"]
    quant = int(quant)
    if "day" in unit:
        return quant
    elif "week" in unit:
        return quant * 7
    elif "month" in unit:
        return quant * 30  # approximation: 30 days per month
    elif "year" in unit:
        return quant * 365  # approximation: ignores leap years

    return np.nan  # unrecognized unit

animal_outcomes["Age upon Outcome"].apply(age_str_to_days_old)

0          120.0
1            6.0
2           60.0
3           60.0
4          730.0
           ...  
114101     365.0
114102    1460.0
114103      60.0
114104    1825.0
114105     240.0
Name: Age upon Outcome, Length: 114106, dtype: float64

## 3. Methods for Re-Organizing DataFrames
#### `.groupby()`

Those of you familiar with SQL have probably used the GROUP BY command. Pandas has this, too.

The `.groupby()` method is especially useful for aggregate functions applied to the data grouped in particular ways.
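As a minimal sketch, assuming a small hypothetical frame in place of the real data, calling `.groupby()` alone only defines the groups; nothing is computed until an aggregate is applied:

```python
import pandas as pd

# Hypothetical stand-in for a few rows of a larger dataset
toy = pd.DataFrame({
    "sex": [0, 0, 1, 1, 1],
    "chol": [200, 300, 210, 250, 230],
})

grouped = toy.groupby("sex")        # defines the groups, computes nothing yet
mean_chol = grouped["chol"].mean()  # one aggregated value per group
print(mean_chol)
```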

In [44]:
uci = pd.read_csv('data/heart.csv')

In [45]:
uci.groupby('sex')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11c1d0040>

#### `.groups` and `.get_group()`

In [61]:
uci.groupby('sex').groups

{0: Int64Index([  2,   4,   6,  11,  14,  15,  16,  17,  19,  25,  28,  30,  35,
              36,  38,  39,  40,  43,  48,  49,  50,  53,  54,  59,  60,  65,
              67,  69,  74,  75,  82,  84,  85,  88,  89,  93,  94,  96, 102,
             105, 107, 108, 109, 110, 112, 115, 118, 119, 120, 122, 123, 124,
             125, 127, 128, 129, 130, 131, 134, 135, 136, 140, 142, 143, 144,
             146, 147, 151, 153, 154, 155, 161, 167, 181, 182, 190, 204, 207,
             213, 215, 216, 220, 223, 241, 246, 252, 258, 260, 263, 266, 278,
             289, 292, 296, 298, 302],
            dtype='int64'),
 1: Int64Index([  0,   1,   3,   5,   7,   8,   9,  10,  12,  13,
             ...
             288, 290, 291, 293, 294, 295, 297, 299, 300, 301],
            dtype='int64', length=207)}

In [57]:
uci.groupby('sex').get_group(0) # .tail()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
6,56,0,1,140,294,0,0,153,0,1.3,1,0,2,1
11,48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
14,58,0,3,150,283,1,0,162,0,1.0,2,0,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
289,55,0,0,128,205,0,2,130,1,2.0,1,1,3,0
292,58,0,0,170,225,1,0,146,1,2.8,1,2,1,0
296,63,0,0,124,197,0,1,136,1,0.0,1,0,2,0
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0


### Aggregating

In [62]:
uci.groupby('sex').std()  # Standard deviation of each numeric column within each group

sex,age,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,9.409396,0.972427,19.311119,65.088946,0.332455,0.55715,20.047969,0.422503,1.119844,0.593736,0.881026,0.44129,0.435286
1,8.883803,1.059064,16.658246,42.782392,0.366955,0.510754,24.130882,0.484505,1.174632,0.627378,1.074082,0.659949,0.498626


Exercise: Tell me the average cholesterol level for those with heart disease.

In [64]:
# Your code here!
uci.groupby("target").get_group(1).chol.mean()  # Group by target, take the group where target == 1, and average chol

242.23030303030302

In [67]:
uci.loc[uci.target == 1].chol.mean()  # Second way: filter rows with a boolean mask in .loc

242.23030303030302

In [68]:
uci.loc[uci["target"] == 1].chol.mean()  # Same as above, using bracket notation to select the column

242.23030303030302

### Apply to Animal Shelter Data

#### Task 1
- Use a groupby to show the average age for each animal type.
- What about by animal type **and** sex?
 

In [85]:
animal_outcomes

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
1,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
2,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
3,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby
4,A684617,,07/27/2014 09:00:00 AM,07/27/2014 09:00:00 AM,07/26/2012,Transfer,SCRP,Cat,Intact Female,2 years,Domestic Shorthair Mix,Black
...,...,...,...,...,...,...,...,...,...,...,...,...
114101,A760365,,10/18/2017 01:27:00 PM,10/18/2017 01:27:00 PM,10/17/2016,Transfer,Partner,Cat,Intact Female,1 year,Domestic Shorthair Mix,Silver Tabby
114102,A767465,Loco,03/01/2018 06:28:00 PM,03/01/2018 06:28:00 PM,03/01/2014,Return to Owner,,Dog,Neutered Male,4 years,Chihuahua Shorthair Mix,Black/Cream
114103,A774386,,06/23/2018 11:59:00 AM,06/23/2018 11:59:00 AM,04/07/2018,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Brown Tabby
114104,A772554,Muneca,05/21/2018 12:59:00 PM,05/21/2018 12:59:00 PM,11/01/2012,Return to Owner,,Dog,Spayed Female,5 years,Norfolk Terrier Mix,Tan


In [106]:
animal_outcomes.groupby("Animal Type").mean().age

DataError: No numeric types to aggregate

In [105]:
animal_outcomes.groupby(["Animal Type", "Sex upon Outcome"]).mean()

DataError: No numeric types to aggregate
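Both attempts above fail because every column in `animal_outcomes` is still a string, so there is nothing numeric to average, and there is no column named `age`. A minimal sketch of one way around this, assuming a hypothetical four-row frame and a hypothetical `age_days` column built with the same day-count approximations as `age_str_to_days_old`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with the same column names as animal_outcomes
toy = pd.DataFrame({
    "Animal Type": ["Dog", "Dog", "Cat", "Cat"],
    "Sex upon Outcome": ["Neutered Male", "Spayed Female",
                         "Intact Male", "Intact Male"],
    "Age upon Outcome": ["4 months", "2 years", "6 days", "2 weeks"],
})

UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

def age_to_days(age):
    """Convert strings like '4 months' to an approximate number of days."""
    if pd.isna(age):
        return np.nan
    quant, unit = age.split(" ")
    return int(quant) * UNIT_DAYS[unit.rstrip("s")]

# Store the numeric age in a new column, then aggregate only that column
toy["age_days"] = toy["Age upon Outcome"].apply(age_to_days)
by_type = toy.groupby("Animal Type")["age_days"].mean()
by_type_and_sex = toy.groupby(["Animal Type", "Sex upon Outcome"])["age_days"].mean()
print(by_type)
print(by_type_and_sex)
```

Selecting the numeric column before calling `.mean()` keeps the string columns out of the aggregation.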

#### Task 2:
- Create new columns `year` and `month` by applying lambda functions (e.g. `lambda x: x.year`) to the date column
- Use `groupby` and `.size()` to count how many animals are adopted in each month

In [108]:
# Your code here
animal_outcomes["year"] = animal_outcomes.DateTime.apply(lambda x: x.year)
animal_outcomes["month"] = animal_outcomes.DateTime.apply(lambda x: x.month)

AttributeError: 'str' object has no attribute 'year'
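The `AttributeError` above shows that `DateTime` still holds strings: the earlier conversion was displayed but never assigned back to the column. A minimal sketch, assuming a hypothetical frame built from four of the rows shown earlier, converts the column first and then completes the task:

```python
import pandas as pd

# Hypothetical subset of animal_outcomes, using values shown earlier
toy = pd.DataFrame({
    "DateTime": ["02/13/2016 05:59:00 PM", "03/18/2014 11:47:00 AM",
                 "10/18/2014 06:52:00 PM", "08/05/2014 04:59:00 PM"],
    "Outcome Type": ["Adoption", "Transfer", "Adoption", "Adoption"],
})

# Assign the parsed dates back so later cells see datetime values
toy["DateTime"] = pd.to_datetime(toy["DateTime"], format="%m/%d/%Y %I:%M:%S %p")

toy["year"] = toy["DateTime"].apply(lambda x: x.year)
toy["month"] = toy["DateTime"].apply(lambda x: x.month)

# Keep only adoptions, then count rows per month
adoptions_by_month = toy[toy["Outcome Type"] == "Adoption"].groupby("month").size()
print(adoptions_by_month)
```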

## 4. Reshaping a DataFrame

### `.pivot()`

Those of you familiar with Excel have probably used Pivot Tables. pandas has similar functionality.

In [110]:
uci.pivot(values='sex', columns='target').head()

target,0,1
0,,1.0
1,,1.0
2,,0.0
3,,1.0
4,,0.0
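In the output above no `index` was given, so the existing row index is kept and each row fills only one of the two `target` columns, leaving `NaN` in the other. A minimal sketch, assuming a hypothetical long-format frame (the names `patient`, `measure`, and `value` are made up for illustration), shows `.pivot()` with all three arguments:

```python
import pandas as pd

# Hypothetical long-format data: one row per (patient, measure) pair
readings = pd.DataFrame({
    "patient": ["a", "a", "b", "b"],
    "measure": ["chol", "trestbps", "chol", "trestbps"],
    "value": [233, 145, 250, 130],
})

# One row per patient, one column per measure
wide = readings.pivot(index="patient", columns="measure", values="value")
print(wide)
```

`.pivot()` requires each (index, columns) pair to be unique; it reshapes but does not aggregate.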


### Methods for Combining and Reshaping DataFrames: `.join()`, `.merge()`, `pd.concat()`, `pd.melt()`

### `.join()`

In [111]:
toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns=['age', 'HP'])

In [123]:
print(toy1.join(toy2.set_index('age'), on='age', lsuffix='_A', rsuffix='_B').head())

   age  HP_A  HP_B
0   63   142   100
1   33    47   200


### `.merge()`

In [113]:
ds_chars = pd.read_csv('data/ds_chars.csv', index_col=0)

In [114]:
states = pd.read_csv('data/states.csv', index_col=0)

In [115]:
ds_chars.merge(states,
               left_on='home_state',
               right_on='state',
               how='inner')

Unnamed: 0,name,HP,home_state,state,nickname,capital
0,greg,200,WA,WA,evergreen,Olympia
1,miles,200,WA,WA,evergreen,Olympia
2,alan,170,TX,TX,alamo,Austin
3,rachel,200,TX,TX,alamo,Austin
4,alison,300,DC,DC,district,Washington
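The inner merge above keeps only states that match some `home_state`. As a sketch, assuming the two frames are rebuilt by hand from the outputs shown in this notebook rather than read from the CSV files, an outer merge with `indicator=True` shows which rows had no match:

```python
import pandas as pd

# Rebuilt by hand from the displayed outputs (an assumption, not the CSV files)
ds_chars = pd.DataFrame({
    "name": ["greg", "miles", "alan", "alison", "rachel"],
    "HP": [200, 200, 170, 300, 200],
    "home_state": ["WA", "WA", "TX", "DC", "TX"],
})
states = pd.DataFrame({
    "state": ["WA", "TX", "DC", "OH", "OR"],
    "nickname": ["evergreen", "alamo", "district", "buckeye", "beaver"],
    "capital": ["Olympia", "Austin", "Washington", "Columbus", "Salem"],
})

# how="outer" keeps unmatched rows from both sides;
# indicator=True adds a _merge column naming each row's origin
outer = ds_chars.merge(states, left_on="home_state", right_on="state",
                       how="outer", indicator=True)
unmatched = outer[outer["_merge"] == "right_only"]
print(unmatched[["state", "capital"]])
```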


### `pd.concat()`

Exercise: Look up the documentation on `pd.concat` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) and use it to concatenate `ds_chars` and `states`.
<br/>
Your result should still have only five rows!

In [116]:
pd.concat([ds_chars, states])


Unnamed: 0,HP,capital,home_state,name,nickname,state
0,200.0,,WA,greg,,
1,200.0,,WA,miles,,
2,170.0,,TX,alan,,
3,300.0,,DC,alison,,
4,200.0,,TX,rachel,,
0,,Olympia,,,evergreen,WA
1,,Austin,,,alamo,TX
2,,Washington,,,district,DC
3,,Columbus,,,buckeye,OH
4,,Salem,,,beaver,OR
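The result above has ten rows because `pd.concat` stacks the frames vertically by default. A minimal sketch, assuming the two frames are rebuilt by hand from the outputs shown in this notebook, uses `axis=1` to place them side by side and keep five rows:

```python
import pandas as pd

# Rebuilt by hand from the displayed outputs (an assumption, not the CSV files)
ds_chars = pd.DataFrame({
    "name": ["greg", "miles", "alan", "alison", "rachel"],
    "HP": [200, 200, 170, 300, 200],
    "home_state": ["WA", "WA", "TX", "DC", "TX"],
})
states = pd.DataFrame({
    "state": ["WA", "TX", "DC", "OH", "OR"],
    "nickname": ["evergreen", "alamo", "district", "buckeye", "beaver"],
    "capital": ["Olympia", "Austin", "Washington", "Columbus", "Salem"],
})

# axis=1 aligns rows by index label and appends columns
combined = pd.concat([ds_chars, states], axis=1)
print(combined.shape)
```

Rows are paired by index label here, not by matching `home_state` to `state`, so this satisfies the exercise but is not a substitute for `.merge()`.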


### `pd.melt()`

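Melting reshapes a DataFrame from wide to long format: the columns named in `id_vars` are kept as identifiers, and each column in `value_vars` is unpivoted into rows with a 'variable' entry (the original column name) and a 'value' entry.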

In [117]:
ds_chars.head()

Unnamed: 0,name,HP,home_state
0,greg,200,WA
1,miles,200,WA
2,alan,170,TX
3,alison,300,DC
4,rachel,200,TX


In [118]:
pd.melt(ds_chars,
        id_vars=['name'],
        value_vars=['HP', 'home_state'])

Unnamed: 0,name,variable,value
0,greg,HP,200
1,miles,HP,200
2,alan,HP,170
3,alison,HP,300
4,rachel,HP,200
5,greg,home_state,WA
6,miles,home_state,WA
7,alan,home_state,TX
8,alison,home_state,DC
9,rachel,home_state,TX


## Bringing it all together with the Animal Shelter Data

Join the data from the [Austin Animal Shelter Intake dataset](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) to the outcomes dataset by Animal ID.

Use the dates from each dataset to see how long animals spend in the shelter. Does it differ by time of year? By outcome?

The URL for the intake dataset is: https://data.austintexas.gov/api/views/wter-evkm/rows.csv?accessType=DOWNLOAD

_Hints_ :
- import and clean the intake dataset first
- use apply/applymap/lambda to change the variables to their proper format in the intake data
- rename the columns in the intake dataset *before* joining
- create a new days-in-shelter variable
- Notice that some values in the "days_in_shelter" column are NaN or negative (remove these rows using the "<" operator and `~` combined with `.isna()`)
- Use `groupby` to get some interesting information about the dataset

Make sure to export and save your cleaned dataset. We will use it in a later lecture!

Use `df.to_csv()` to write `df` to a CSV file. Read more in the `to_csv()` documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html).
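The hints above can be sketched on toy data. This is only an outline under stated assumptions: the rows, the IDs `A1` to `A4`, and the names `intake_date` and `outcome_date` are hypothetical, and the real files have more columns. Animals that appear more than once would need extra care, which this sketch ignores.

```python
import pandas as pd

fmt = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical rows standing in for the intake and outcome files
intakes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3"],
    "DateTime": ["01/01/2016 10:00:00 AM", "03/10/2014 09:00:00 AM",
                 "05/01/2015 08:00:00 AM"],
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A4"],
    "DateTime": ["01/11/2016 10:00:00 AM", "03/05/2014 09:00:00 AM",
                 "06/01/2015 08:00:00 AM"],
    "Outcome Type": ["Adoption", "Transfer", "Adoption"],
})

# Clean the dates, then rename before joining so the columns do not collide
intakes["DateTime"] = pd.to_datetime(intakes["DateTime"], format=fmt)
outcomes["DateTime"] = pd.to_datetime(outcomes["DateTime"], format=fmt)
intakes = intakes.rename(columns={"DateTime": "intake_date"})
outcomes = outcomes.rename(columns={"DateTime": "outcome_date"})

merged = outcomes.merge(intakes, on="Animal ID", how="left")
merged["days_in_shelter"] = (merged["outcome_date"] - merged["intake_date"]).dt.days

# Drop rows with a missing or negative stay
days = merged["days_in_shelter"]
cleaned = merged[~days.isna() & ~(days < 0)]

mean_stay = cleaned.groupby("Outcome Type")["days_in_shelter"].mean()
print(mean_stay)

# With no path, to_csv returns the CSV text; pass a file path to save to disk
csv_text = cleaned.to_csv(index=False)
```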

In [None]:
#code here
