# Module 1 - Manipulating data with Pandas (continued)
## Pandas Part 2

![austin](http://www.austintexas.gov/sites/default/files/aac_logo.jpg)

## Scenario:
You have decided that you want to start your own animal shelter, but you want to get an idea of what that will entail and get more information about planning. In this lecture, we continue to look at a real data set collected by the [Austin Animal Center](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) over several years. We will use our Pandas skills from last class, and learn some new ones, to explore this data further.

#### *Our goals today are to be able to*:  

Use the pandas library to:

- Get summary info about a dataset and its variables
  - Apply and use `info`, `describe`, and `dtypes`
  - Use `mean`, `min`, `max`, and `value_counts` 
- Use `apply` and `applymap` to transform columns and create new values

- Explain lambda functions and use them with `apply` on a DataFrame
- Explain what a `groupby` object is and split a DataFrame using `groupby`
- Reshape a DataFrame using joins, merges, pivoting, stacking, and melting


## Getting started

Before we look at the animal shelter data, let's practice on a simpler dataset.
Read about this dataset here: https://www.kaggle.com/ronitf/heart-disease-uci
![heart-data](images/heartbloodpres.jpeg)

The dataset is most often used to practice classification algorithms. Can one develop a model to predict the likelihood of heart disease based on other measurable characteristics? We will return to that specific question in a few weeks, but for now we wish to use the dataset to practice some pandas methods.

### 1. Get summary info about a dataset and its variables

Applying and using `info`, `describe`, `mean`, `min`, `max`, `apply`, and `applymap` from the Pandas library

The Pandas library has several useful tools built in. Let's explore some of them.

In [1]:
!pwd
!ls -al

/Users/yongweigao/Code/datascience/hbs-ds-060120/module-1/day-6-pandas-2
total 40
drwxr-xr-x  6 yongweigao  staff    192 Jun  8 10:42 .
drwxr-xr-x  9 yongweigao  staff    288 Jun  8 10:33 ..
drwxr-xr-x  3 yongweigao  staff     96 Jun  8 10:42 .ipynb_checkpoints
drwxr-xr-x  3 yongweigao  staff     96 Jun  8 10:33 data
drwxr-xr-x  4 yongweigao  staff    128 Jun  8 10:33 images
-rw-r--r--  1 yongweigao  staff  16953 Jun  8 10:42 manipulating_data_with_pandas.ipynb


In [1]:
import pandas as pd
uci = pd.read_csv('data/heart.csv')

In [2]:
uci.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


#### The `.columns` and `.shape` Attributes

In [3]:
uci.columns

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')

In [4]:
uci.shape

(303, 14)

#### The `.info()` and `.describe()` methods and the `.dtypes` attribute

Pandas DataFrames have many useful methods and attributes! Let's look at `.info()`, `.describe()`, and `.dtypes`.

In [5]:
# Call the .info() method on our dataset. What do you observe?

uci.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
age         303 non-null int64
sex         303 non-null int64
cp          303 non-null int64
trestbps    303 non-null int64
chol        303 non-null int64
fbs         303 non-null int64
restecg     303 non-null int64
thalach     303 non-null int64
exang       303 non-null int64
oldpeak     303 non-null float64
slope       303 non-null int64
ca          303 non-null int64
thal        303 non-null int64
target      303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.2 KB


In [6]:
# Call the .describe() method on our dataset. What do you observe?

uci.describe()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
count,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0
mean,54.366337,0.683168,0.966997,131.623762,246.264026,0.148515,0.528053,149.646865,0.326733,1.039604,1.39934,0.729373,2.313531,0.544554
std,9.082101,0.466011,1.032052,17.538143,51.830751,0.356198,0.52586,22.905161,0.469794,1.161075,0.616226,1.022606,0.612277,0.498835
min,29.0,0.0,0.0,94.0,126.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,47.5,0.0,0.0,120.0,211.0,0.0,0.0,133.5,0.0,0.0,1.0,0.0,2.0,0.0
50%,55.0,1.0,1.0,130.0,240.0,0.0,1.0,153.0,0.0,0.8,1.0,0.0,2.0,1.0
75%,61.0,1.0,2.0,140.0,274.5,0.0,1.0,166.0,1.0,1.6,2.0,1.0,3.0,1.0
max,77.0,1.0,3.0,200.0,564.0,1.0,2.0,202.0,1.0,6.2,2.0,4.0,3.0,1.0


In [7]:
# Use the code below. How does the output differ from info() ?
uci.dtypes

age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal          int64
target        int64
dtype: object

#### `.mean()`, `.min()`, `.max()`, `.sum()`

The methods `.mean()`, `.min()`, and `.max()` will perform just the way you think they will!

Note that these are methods both for Series and for DataFrames.

In [8]:
uci.ca.mean()

0.7293729372937293

#### The Axis Variable

In [9]:
uci.mean(axis = 0) # Try [shift] + [tab] here!

age          54.366337
sex           0.683168
cp            0.966997
trestbps    131.623762
chol        246.264026
fbs           0.148515
restecg       0.528053
thalach     149.646865
exang         0.326733
oldpeak       1.039604
slope         1.399340
ca            0.729373
thal          2.313531
target        0.544554
dtype: float64
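
The `axis` argument sets the direction of the aggregation: `axis=0` (the default) collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row. Here is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are hypothetical, not taken from the heart dataset):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the axis argument
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

col_means = toy.mean(axis=0)  # one mean per column (collapses the rows)
row_means = toy.mean(axis=1)  # one mean per row (collapses the columns)

print(col_means)
print(row_means)
```

For the heart data, `axis=1` would average unrelated measurements within a patient, so `axis=0` is usually the meaningful choice there.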

#### `.value_counts()`

For a DataFrame _Series_, the `.value_counts()` method will tell you how many of each value you've got.

In [10]:
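uci['age'].value_counts()[:10]  # bracket notation, top 10 only; not displayed because it is not the cell's last expression
uci.age.value_counts()          # dot notation; this is the result displayed below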

58    19
57    17
54    16
59    14
52    13
51    12
62    11
44    11
60    11
56    11
64    10
41    10
63     9
67     9
55     8
45     8
42     8
53     8
61     8
65     8
43     8
66     7
50     7
48     7
46     7
49     5
47     5
39     4
35     4
68     4
70     4
40     3
71     3
69     3
38     3
34     2
37     2
77     1
76     1
74     1
29     1
Name: age, dtype: int64
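
By default `.value_counts()` sorts the result from most to least frequent, and passing `normalize=True` returns proportions instead of raw counts. The sketch below uses a short made-up Series (`codes` is a hypothetical name) to show both, along with `.unique()` for listing the distinct values:

```python
import pandas as pd

# Hypothetical category codes, not taken from the heart dataset
codes = pd.Series([1, 1, 0, 2, 1, 0])

counts = codes.value_counts()                # most frequent value first
shares = codes.value_counts(normalize=True)  # proportions that sum to 1
distinct = sorted(codes.unique())            # the distinct values themselves

print(counts)
print(shares)
print(distinct)
```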

Exercise: What are the different values for restecg?

In [11]:
# Your code here!
len(uci.restecg.value_counts())

3

### Apply to Animal Shelter Data
Using `.info()`, `.describe()`, and `.dtypes`, what observations can we make about the data?

What are the breed value counts?

How about age counts for dogs?

In [62]:
animal_outcomes = pd.read_csv('https://data.austintexas.gov/api/views/9t4d-g238/rows.csv?accessType=DOWNLOAD')

In [13]:
animal_outcomes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118007 entries, 0 to 118006
Data columns (total 12 columns):
Animal ID           118007 non-null object
Name                81037 non-null object
DateTime            118007 non-null object
MonthYear           118007 non-null object
Date of Birth       118007 non-null object
Outcome Type        117999 non-null object
Outcome Subtype     53644 non-null object
Animal Type         118007 non-null object
Sex upon Outcome    118003 non-null object
Age upon Outcome    117955 non-null object
Breed               118007 non-null object
Color               118007 non-null object
dtypes: object(12)
memory usage: 10.8+ MB


In [14]:
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A794011,Chunk,05/08/2019 06:20:00 PM,05/08/2019 06:20:00 PM,05/02/2017,Rto-Adopt,,Cat,Neutered Male,2 years,Domestic Shorthair Mix,Brown Tabby/White
1,A776359,Gizmo,07/18/2018 04:02:00 PM,07/18/2018 04:02:00 PM,07/12/2017,Adoption,,Dog,Neutered Male,1 year,Chihuahua Shorthair Mix,White/Brown
2,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
3,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
4,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black


In [15]:
animal_outcomes.describe()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
count,118007,81037,118007,118007,118007,117999,53644,118007,118003,117955,118007,118007
unique,105488,18949,97248,97248,6852,9,22,5,5,50,2571,585
top,A721033,Max,04/18/2016 12:00:00 AM,04/18/2016 12:00:00 AM,04/21/2014,Adoption,Partner,Dog,Neutered Male,1 year,Domestic Shorthair Mix,Black/White
freq,33,531,39,39,117,51979,29272,67135,41428,21238,30780,12402


What are the breed `value_counts`?
What's the top breed for adopted dogs?

How about outcome counts for dogs?




In [16]:
animal_outcomes.Name.value_counts()[:10]

Max         531
Bella       500
Luna        461
Rocky       358
Daisy       345
Princess    328
Charlie     316
Coco        310
Lucy        304
Blue        297
Name: Name, dtype: int64

In [17]:
animal_outcomes.Breed.value_counts()[:10]

Domestic Shorthair Mix       30780
Pit Bull Mix                  8305
Labrador Retriever Mix        6645
Chihuahua Shorthair Mix       6175
Domestic Shorthair            5229
Domestic Medium Hair Mix      3105
German Shepherd Mix           2900
Bat Mix                       1747
Domestic Longhair Mix         1527
Australian Cattle Dog Mix     1456
Name: Breed, dtype: int64

In [18]:
animal_outcomes['Animal Type'].unique()

array(['Cat', 'Dog', 'Other', 'Bird', 'Livestock'], dtype=object)

In [19]:
animal_outcomes.loc[(animal_outcomes['Animal Type'] == 'Dog')&(animal_outcomes['Outcome Type'] == 'Adoption'),
                   'Breed'].value_counts()[:2]

Labrador Retriever Mix    3417
Pit Bull Mix              3234
Name: Breed, dtype: int64
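
The filter in the previous cell combines two boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==`. A small sketch of the same pattern, using a made-up `pets` DataFrame (hypothetical names and values), with the masks given names for readability:

```python
import pandas as pd

# Hypothetical miniature version of the outcomes table
pets = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Dog'],
    'outcome_type': ['Adoption', 'Adoption', 'Transfer', 'Adoption'],
    'breed': ['Pit Bull Mix', 'Domestic Shorthair Mix',
              'Pit Bull Mix', 'Labrador Retriever Mix'],
})

is_dog = pets['animal_type'] == 'Dog'            # boolean Series, one entry per row
is_adopted = pets['outcome_type'] == 'Adoption'  # boolean Series, one entry per row

# .loc[row_mask, column] keeps rows where both masks are True
adopted_dog_breeds = pets.loc[is_dog & is_adopted, 'breed']
print(adopted_dog_breeds.value_counts())
```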

### 2.  Changing data

#### DataFrame.applymap() and Series.map()

The `.applymap()` method takes a function as input that it will then apply to every entry in the DataFrame.

In [21]:
uci.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [20]:
def successor(x):
    return x + 1

In [22]:
uci.applymap(successor).head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,64,2,4,146,234,2,1,151,1,3.3,1,1,2,2
1,38,2,3,131,251,1,2,188,1,4.5,1,1,3,2
2,42,1,2,131,205,1,1,173,1,2.4,3,1,3,2
3,57,2,2,121,237,1,2,179,1,1.8,3,1,3,2
4,58,1,1,121,355,1,2,164,2,1.6,3,1,3,2
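
For simple arithmetic like this, a vectorized expression gives the same result as the element-wise function and is usually faster on large DataFrames. Note also that in pandas 2.1 and later, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`. The sketch below uses a made-up `toy` DataFrame (hypothetical) and applies `successor` column by column through `Series.map`, which works across pandas versions:

```python
import pandas as pd

def successor(x):
    return x + 1

# Hypothetical toy data
toy = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})

# Element-wise: call successor on every entry, one column at a time
elementwise = toy.apply(lambda col: col.map(successor))

# Vectorized: pandas adds 1 to every entry in a single operation
vectorized = toy + 1

print(elementwise.equals(vectorized))
```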


The `.map()` or `.apply()` method takes a function as input that it will then apply to every entry in the Series.

In [24]:
uci['age'].map(successor).head(10)


0    64
1    38
2    42
3    57
4    58
5    58
6    57
7    45
8    53
9    58
Name: age, dtype: int64

In [27]:
uci['sex_name'] = uci['sex'].map({0: 'female', 1: 'male'})  # in this dataset, 1 = male and 0 = female

In [29]:
uci.tail()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target,sex_name
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0,female
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3,0,male
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3,0,male
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3,0,male
302,57,0,1,130,236,0,0,174,0,0.0,1,1,2,0,female


#### Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".

In [30]:
uci['oldpeak'].map(lambda x: round(x))[:4]

0    2
1    4
2    1
3    1
Name: oldpeak, dtype: int64
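
A lambda is interchangeable with a named function that has the same body. The following sketch (with a hypothetical helper `round_value` and made-up values) shows that both forms produce the same Series when passed to `.map()`:

```python
import pandas as pd

def round_value(x):
    return round(x)

# Hypothetical values, similar in spirit to the oldpeak column
values = pd.Series([2.3, 3.5, 1.4, 0.8])

named = values.map(round_value)             # named function
anonymous = values.map(lambda x: round(x))  # equivalent lambda

print(named.equals(anonymous))
```

A lambda is convenient for a one-off transformation; a named function is easier to test and reuse.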

Exercise: Use an anonymous function to turn the entries in age to strings

In [31]:
uci['age'].map(lambda x: str(x))[:4]

0    63
1    37
2    41
3    56
Name: age, dtype: object

### Apply to Animal Shelter Data

Use an `apply` to change the dates from strings to datetime objects. Similarly, use an apply to change the ages of the animals from strings to floats.

In [33]:
animal_outcomes.head()
animal_outcomes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118007 entries, 0 to 118006
Data columns (total 12 columns):
Animal ID           118007 non-null object
Name                81037 non-null object
DateTime            118007 non-null object
MonthYear           118007 non-null object
Date of Birth       118007 non-null object
Outcome Type        117999 non-null object
Outcome Subtype     53644 non-null object
Animal Type         118007 non-null object
Sex upon Outcome    118003 non-null object
Age upon Outcome    117955 non-null object
Breed               118007 non-null object
Color               118007 non-null object
dtypes: object(12)
memory usage: 10.8+ MB


In [39]:
# Your code here
animal_outcomes['DateTime'] = pd.to_datetime(animal_outcomes['DateTime'])
animal_outcomes['MonthYear'] = pd.to_datetime(animal_outcomes['MonthYear'])

In [40]:
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A794011,Chunk,2019-05-08 18:20:00,2019-05-08 18:20:00,05/02/2017,Rto-Adopt,,Cat,Neutered Male,2 years,Domestic Shorthair Mix,Brown Tabby/White
1,A776359,Gizmo,2018-07-18 16:02:00,2018-07-18 16:02:00,07/12/2017,Adoption,,Dog,Neutered Male,1 year,Chihuahua Shorthair Mix,White/Brown
2,A720371,Moose,2016-02-13 17:59:00,2016-02-13 17:59:00,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
3,A674754,,2014-03-18 11:47:00,2014-03-18 11:47:00,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
4,A689724,*Donatello,2014-10-18 18:52:00,2014-10-18 18:52:00,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black


In [41]:
animal_outcomes['Date of Birth'] = pd.to_datetime(animal_outcomes['Date of Birth'])

In [47]:
import datetime
# Subtracting a datetime from a date mixes two types and raises a TypeError
datetime.date.today() - datetime.datetime(2010, 5, 15)

TypeError: unsupported operand type(s) for -: 'datetime.date' and 'datetime.datetime'

In [51]:
def calculate_age(val):
    return round((datetime.datetime.now() - val).days / 365, 2)

In [52]:
calculate_age(datetime.datetime(1776,7,4))

244.09

In [53]:
animal_outcomes['Date of Birth'].map(calculate_age)

0          3.10
1          2.91
2          4.67
3          6.25
4          5.86
5          6.02
6          7.87
7         10.39
8          6.00
9          0.84
10         6.01
11         2.10
12         4.15
13         4.06
14        11.39
15        12.64
16         3.28
17         3.23
18        10.89
19         1.09
20         3.31
21         6.50
22         4.75
23         1.31
24         4.01
25         3.37
26         7.19
27         4.61
28        10.24
29         5.15
          ...  
117977     0.18
117978     2.45
117979     0.19
117980     0.19
117981     0.17
117982     0.18
117983     1.05
117984     4.05
117985     2.04
117986     0.17
117987     0.87
117988     0.47
117989     0.11
117990     0.19
117991     2.58
117992     1.01
117993     0.93
117994     0.18
117995    13.87
117996     1.03
117997     0.04
117998     1.02
117999     0.83
118000    11.96
118001    13.87
118002     0.21
118003     3.13
118004     1.56
118005     0.49
118006     1.24
Name: Date of Birth, Len
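
`.map(calculate_age)` calls a Python function once per row. The same calculation can also be written on the whole column at once. This is a sketch under two assumptions: the birth dates are a small made-up sample, and `reference` is a fixed date standing in for "now" so that the result is reproducible.

```python
import pandas as pd

# Hypothetical sample of birth dates in the same MM/DD/YYYY format as the data
births = pd.Series(pd.to_datetime(['05/02/2017', '07/12/2017', '10/08/2015'],
                                  format='%m/%d/%Y'))

# Assumed fixed reference date instead of datetime.datetime.now()
reference = pd.Timestamp('2020-06-08')

# Timestamp minus Series gives timedeltas; .dt.days extracts whole days
age_years = ((reference - births).dt.days / 365).round(2)
print(age_years)
```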

In [54]:
animal_outcomes

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A794011,Chunk,2019-05-08 18:20:00,2019-05-08 18:20:00,2017-05-02,Rto-Adopt,,Cat,Neutered Male,2 years,Domestic Shorthair Mix,Brown Tabby/White
1,A776359,Gizmo,2018-07-18 16:02:00,2018-07-18 16:02:00,2017-07-12,Adoption,,Dog,Neutered Male,1 year,Chihuahua Shorthair Mix,White/Brown
2,A720371,Moose,2016-02-13 17:59:00,2016-02-13 17:59:00,2015-10-08,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
3,A674754,,2014-03-18 11:47:00,2014-03-18 11:47:00,2014-03-12,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
4,A689724,*Donatello,2014-10-18 18:52:00,2014-10-18 18:52:00,2014-08-01,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
5,A680969,*Zeus,2014-08-05 16:59:00,2014-08-05 16:59:00,2014-06-03,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby
6,A684617,,2014-07-27 09:00:00,2014-07-27 09:00:00,2012-07-26,Transfer,SCRP,Cat,Intact Female,2 years,Domestic Shorthair Mix,Black
7,A742354,Artemis,2017-01-22 11:56:00,2017-01-22 11:56:00,2010-01-20,Return to Owner,,Cat,Neutered Male,7 years,Domestic Shorthair Mix,Blue/White
8,A681036,,2014-06-11 17:11:00,2014-06-11 17:11:00,2014-06-09,Transfer,Partner,Cat,Intact Male,2 days,Domestic Shorthair Mix,Brown Tabby
9,A803149,*Birch,2019-08-31 16:26:00,2019-08-31 16:26:00,2019-08-08,Transfer,Partner,Cat,Intact Male,3 weeks,Domestic Shorthair,Brown Tabby


In [63]:
# Make all column names lower case and replace spaces with underscores
animal_outcomes.columns = [x.lower().replace(' ', '_') for x in animal_outcomes.columns]
animal_outcomes.columns


Index(['animal_id', 'name', 'datetime', 'monthyear', 'date_of_birth',
       'outcome_type', 'outcome_subtype', 'animal_type', 'sex_upon_outcome',
       'age_upon_outcome', 'breed', 'color'],
      dtype='object')

In [65]:
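# Drop rows with a null outcome_type.
# Note: dropna() returns a new DataFrame; without reassignment, animal_outcomes itself is unchanged.
animal_outcomes.dropna(subset=['outcome_type'])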

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color
0,A794011,Chunk,05/08/2019 06:20:00 PM,05/08/2019 06:20:00 PM,05/02/2017,Rto-Adopt,,Cat,Neutered Male,2 years,Domestic Shorthair Mix,Brown Tabby/White
1,A776359,Gizmo,07/18/2018 04:02:00 PM,07/18/2018 04:02:00 PM,07/12/2017,Adoption,,Dog,Neutered Male,1 year,Chihuahua Shorthair Mix,White/Brown
2,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
3,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
4,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
5,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby
6,A684617,,07/27/2014 09:00:00 AM,07/27/2014 09:00:00 AM,07/26/2012,Transfer,SCRP,Cat,Intact Female,2 years,Domestic Shorthair Mix,Black
7,A742354,Artemis,01/22/2017 11:56:00 AM,01/22/2017 11:56:00 AM,01/20/2010,Return to Owner,,Cat,Neutered Male,7 years,Domestic Shorthair Mix,Blue/White
8,A681036,,06/11/2014 05:11:00 PM,06/11/2014 05:11:00 PM,06/09/2014,Transfer,Partner,Cat,Intact Male,2 days,Domestic Shorthair Mix,Brown Tabby
9,A803149,*Birch,08/31/2019 04:26:00 PM,08/31/2019 04:26:00 PM,08/08/2019,Transfer,Partner,Cat,Intact Male,3 weeks,Domestic Shorthair,Brown Tabby


In [67]:
# Verify the shape of animal_outcomes (the dropna() result above was not assigned back)
animal_outcomes.shape

(118010, 12)

In [70]:
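# Count the rows where datetime and monthyear are identical
print((animal_outcomes.datetime == animal_outcomes.monthyear).sum())
# monthyear duplicates datetime, so drop the redundant column
animal_outcomes = animal_outcomes.drop(columns='monthyear')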


118010


In [80]:
animal_outcomes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118010 entries, 0 to 118009
Data columns (total 11 columns):
animal_id           118010 non-null object
name                81039 non-null object
datetime            118010 non-null object
date_of_birth       118010 non-null object
outcome_type        118001 non-null object
outcome_subtype     53645 non-null object
animal_type         118010 non-null object
sex_upon_outcome    118006 non-null object
age_upon_outcome    117957 non-null object
breed               118010 non-null object
color               118010 non-null object
dtypes: object(11)
memory usage: 9.9+ MB


In [79]:
animal_outcomes.age_upon_outcome.head()

0     2 years
1      1 year
2    4 months
3      6 days
4    2 months
Name: age_upon_outcome, dtype: object

In [81]:
def convert_to_days_old(val):
    number, unit = val.split(' ')
    number = int(number)
    if 'year' in unit:
        return 365*number
    if 'month' in unit:
        return 30*number
    if 'week' in unit:
        return 7*number
    if 'day' in unit:
        return number
    return 'unknown'

In [82]:
animal_outcomes.age_upon_outcome.map(convert_to_days_old)

AttributeError: 'float' object has no attribute 'split'
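
The `AttributeError` arises because missing values in `age_upon_outcome` are stored as `NaN`, which is a float and has no `.split()` method. One possible fix, sketched below, is to return `NaN` for missing or unrecognized values. The name `convert_to_days_old_safe` and the sample `ages` Series are hypothetical.

```python
import numpy as np
import pandas as pd

def convert_to_days_old_safe(val):
    """Convert strings like '2 years' to an approximate number of days."""
    if pd.isna(val):          # missing values are floats, so skip them early
        return np.nan
    number, unit = val.split(' ')
    number = int(number)
    if 'year' in unit:
        return 365 * number
    if 'month' in unit:
        return 30 * number    # approximation: 30 days per month
    if 'week' in unit:
        return 7 * number
    if 'day' in unit:
        return number
    return np.nan             # unrecognized unit

# Hypothetical sample including a missing value
ages = pd.Series(['2 years', '4 months', np.nan, '3 weeks', '6 days'])
days_old = ages.map(convert_to_days_old_safe)
print(days_old)
```

Returning `NaN` rather than a string keeps the resulting column numeric, so aggregations such as `.mean()` still work.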

## 3. Methods for Re-Organizing DataFrames
#### `.groupby()`

Those of you familiar with SQL have probably used the GROUP BY command. Pandas has this, too.

The `.groupby()` method is especially useful for aggregate functions applied to the data grouped in particular ways.

In [None]:
uci.groupby('sex')

#### `.groups` and `.get_group()`

In [None]:
uci.groupby('sex').groups

In [None]:
uci.groupby('sex').get_group(0) # .tail()

### Aggregating

In [None]:
uci.groupby('sex').std()
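
A `groupby` follows a split-apply-combine pattern: the rows are split by the grouping key, an aggregate is applied to each group, and the results are combined into one object indexed by the key. A minimal sketch on made-up data (the `toy` DataFrame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with a grouping column and a numeric column
toy = pd.DataFrame({
    'sex': [0, 0, 1, 1, 1],
    'chol': [200, 240, 180, 220, 230],
})

grouped = toy.groupby('sex')         # nothing is computed yet
mean_chol = grouped['chol'].mean()   # one mean per group
group_sizes = grouped.size()         # number of rows in each group

print(mean_chol)
print(group_sizes)
```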

Exercise: Tell me the average cholesterol level for those with heart disease.

In [None]:
# Your code here!


### Apply to Animal Shelter Data

#### Task 1
- Use a groupby to show the average age of the different kinds of animal types.
- What about by animal types **and** gender?
 

#### Task 2:
- Create new columns `year` and `month` by applying a lambda function (for example, `lambda x: x.year`) to the date column
- Use `groupby` and `.size()` to tell me how many animals are adopted by month

In [None]:
# Your code here

## 4. Reshaping a DataFrame

### `.pivot()`

Those of you familiar with Excel have probably used Pivot Tables. Pandas has a similar functionality.

In [None]:
uci.pivot(values='sex', columns='target').head()
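
`.pivot()` reshapes long data into wide data: one column supplies the new row labels, another supplies the new column labels, and a third fills the cells. Each index/column pair must appear only once, otherwise pandas raises a `ValueError`. A sketch with a made-up long-format table (all names and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical long-format data: one row per (animal, year) pair
long_df = pd.DataFrame({
    'animal': ['Cat', 'Cat', 'Dog', 'Dog'],
    'year': [2018, 2019, 2018, 2019],
    'adoptions': [5, 7, 9, 4],
})

# Rows become animals, columns become years, cells hold adoption counts
wide_df = long_df.pivot(index='animal', columns='year', values='adoptions')
print(wide_df)
```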

### Methods for Combining DataFrames: `.join()`, `.merge()`, `.concat()`, `.melt()`

### `.join()`

In [None]:
toy1 = pd.DataFrame([[63, 142], [33, 47]], columns = ['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns = ['age', 'HP'])

In [None]:
toy1.join(toy2.set_index('age'),
          on = 'age',
          lsuffix = '_A',
          rsuffix = '_B').head()
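
The join above matches the `age` column of `toy1` against the index of `toy2` (set to `age`). Because both frames have an `HP` column, `lsuffix` and `rsuffix` are used to tell the two apart. The sketch below repeats the same call and inspects the result:

```python
import pandas as pd

toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns=['age', 'HP'])

# Left frame joins on its 'age' column; right frame joins on its index
joined = toy1.join(toy2.set_index('age'),
                   on='age',
                   lsuffix='_A',
                   rsuffix='_B')

print(joined.columns.tolist())
print(joined)
```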

### `.merge()`

In [None]:
ds_chars = pd.read_csv('data/ds_chars.csv', index_col = 0)

In [None]:
states = pd.read_csv('data/states.csv', index_col = 0)

In [None]:
ds_chars.merge(states,
               left_on='home_state',
               right_on = 'state',
               how = 'inner')
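
The `how` argument decides which rows survive a merge. The sketch below uses two made-up frames, `people` and `state_info` (hypothetical stand-ins, since the contents of `ds_chars.csv` and `states.csv` are not shown here), to contrast an inner merge with a left merge:

```python
import pandas as pd

# Hypothetical stand-ins for the two tables being merged
people = pd.DataFrame({'name': ['Ann', 'Bo', 'Cy'],
                       'home_state': ['TX', 'WA', 'ZZ']})
state_info = pd.DataFrame({'state': ['TX', 'WA', 'NY'],
                           'nickname': ['Lone Star State',
                                        'Evergreen State',
                                        'Empire State']})

# Inner: keep only rows whose key appears in both frames
inner = people.merge(state_info, left_on='home_state', right_on='state', how='inner')

# Left: keep every row of people; unmatched rows get NaN on the right side
left = people.merge(state_info, left_on='home_state', right_on='state', how='left')

print(inner)
print(left)
```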

### `pd.concat()`

Exercise: Look up the documentation on pd.concat (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) and use it to concatenate ds_chars and states.
<br/>
Your result should still have only five rows!

In [None]:
# Note: by default pd.concat stacks rows (axis=0); pass axis=1 to place the frames side by side
pd.concat([ds_chars, states], sort=False)
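
`pd.concat` stacks rows by default (`axis=0`), so the row counts add up. With `axis=1` the frames are aligned on their index and placed side by side, so the row count stays the same when the indexes match. A sketch with two made-up frames (hypothetical names and values):

```python
import pandas as pd

# Hypothetical frames that share the same default index (0, 1)
left = pd.DataFrame({'name': ['Ann', 'Bo']})
right = pd.DataFrame({'state': ['TX', 'WA']})

stacked = pd.concat([left, right], sort=False)   # axis=0: rows are appended
side_by_side = pd.concat([left, right], axis=1)  # axis=1: columns are appended

print(stacked.shape)
print(side_by_side.shape)
```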

### `pd.melt()`

Melting removes the structure from your DataFrame and puts the data in a 'variable' and 'value' format.

In [None]:
ds_chars.head()

In [None]:
pd.melt(ds_chars,
        id_vars=['name'],
        value_vars=['HP', 'home_state'])
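
To make the 'variable' and 'value' layout concrete, here is a sketch on a made-up frame named `chars` (a hypothetical stand-in, since the contents of `ds_chars` are not shown here). Each `value_vars` column becomes a block of rows, and the `id_vars` column is repeated alongside:

```python
import pandas as pd

# Hypothetical stand-in for ds_chars
chars = pd.DataFrame({'name': ['Ann', 'Bo'],
                      'HP': [142, 47],
                      'home_state': ['TX', 'WA']})

melted = pd.melt(chars,
                 id_vars=['name'],
                 value_vars=['HP', 'home_state'])

print(melted)
```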

## Bringing it all together with the Animal Shelter Data

Join the data from the [Austin Animal Shelter Intake dataset](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) to the outcomes dataset by Animal ID.

Use the dates from each dataset to see how long animals spend in the shelter. Does it differ by time of year? By outcome?

_Hints_ :
- import and clean the intake dataset first
- use `apply`/`applymap`/`lambda` to change the variables to their proper format in the intake data
- rename the columns in the intake dataset *before* joining
- create a new `days_in_shelter` column
- Notice that some values in `days_in_shelter` are `NaN` or negative (remove these rows using the `<` operator together with `isna()` or `dropna()`)
- Use `groupby` to get aggregate information about the dataset (your choice)
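
The hints above can be sketched on a tiny made-up example. Everything here is hypothetical: the frames `intakes` and `outcomes`, the column names `intake_datetime` and `outcome_datetime`, and the values. The real data also contains animals with several intakes, which this sketch does not handle.

```python
import pandas as pd

# Hypothetical miniature intake and outcome tables
intakes = pd.DataFrame({
    'animal_id': ['A1', 'A2', 'A3'],
    'intake_datetime': pd.to_datetime(['2019-01-01', '2019-02-10', '2019-03-05']),
})
outcomes = pd.DataFrame({
    'animal_id': ['A1', 'A2', 'A4'],
    'outcome_datetime': pd.to_datetime(['2019-01-11', '2019-02-01', '2019-04-01']),
})

# Keep every intake; animals without an outcome get NaT
stays = intakes.merge(outcomes, on='animal_id', how='left')

# Subtracting datetimes gives timedeltas; .dt.days converts them to day counts
stays['days_in_shelter'] = (stays['outcome_datetime'] - stays['intake_datetime']).dt.days

# Remove rows with a missing or negative stay
valid = stays.dropna(subset=['days_in_shelter'])
valid = valid[valid['days_in_shelter'] >= 0]
print(valid)
```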

To save your dataset:
Use `df.to_csv()` or `df.to_excel()` to write `df` to a CSV or Excel file. Read more in the `to_csv()` documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)
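
By default `to_csv()` also writes the index as an extra first column; passing `index=False` leaves it out. The sketch below writes a made-up DataFrame to an in-memory buffer instead of a file (an assumption made so the example has no side effects) and reads it back:

```python
import io
import pandas as pd

# Hypothetical result table
df = pd.DataFrame({'animal_id': ['A1', 'A2'], 'days_in_shelter': [3, 10]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False omits the row index column
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the same columns and values
round_trip = pd.read_csv(io.StringIO(csv_text))
```

To write to disk instead, pass a file path such as `'my_data.csv'` (a hypothetical filename) as the first argument.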

In [None]:
#code here

In [8]:
names = ["Benjamin", "Bernadette", "Brian", "Betty", "Bella", "Brunhilda", "Bruno"]
def is_short(l_name):
    # True when the name has fewer than 8 characters
    return len(l_name) < 8
short_names = filter(is_short, names)
list(short_names)

['Brian', 'Betty', 'Bella', 'Bruno']

In [13]:
short_names = filter(lambda x: len(x) < 8, names)
list(short_names)

['Brian', 'Betty', 'Bella', 'Bruno']
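
The same filtering can also be written as a list comprehension, which many Python programmers find easier to read than `filter` with a lambda. A short sketch using the same `names` list:

```python
names = ["Benjamin", "Bernadette", "Brian", "Betty", "Bella", "Brunhilda", "Bruno"]

# Keep only the names with fewer than 8 characters
short_names = [name for name in names if len(name) < 8]
print(short_names)
```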