# Module 1 - Manipulating data with Pandas
## Pandas Part 2

![austin](http://www.austintexas.gov/sites/default/files/aac_logo.jpg)

## Scenario:
You have decided that you want to start your own animal shelter, but first you want to get an idea of what that will entail and gather more information for planning. In this lecture, we will continue to look at a real data set collected by Austin Animal Center over several years. We will use our pandas skills from the last lecture, and learn some new ones, to explore this data further.

#### _Our goals today are to be able to_: <br/>

Use the pandas library to:

- Get summary info about a dataset and its variables
  - Apply and use info, describe and dtypes
  - Use mean, min, max, and value_counts 
- Use apply and applymap to transform columns and create new values

- Explain lambda functions and use them with `apply` on a DataFrame
- Explain what a groupby object is and split a DataFrame using a groupby
- Reshape a DataFrame using joins, merges, pivoting, stacking, and melting


## Getting started

Let's take a moment to examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data). What kinds of questions can we ask this data and what kinds of information can we get back?

In pairs and as a class, let's generate ideas.

## Switch gears

Before we answer those questions about the animal shelter data, let's practice on a simpler dataset.
Read about this dataset here: https://www.kaggle.com/ronitf/heart-disease-uci
![heart-data](images/heartbloodpres.jpeg)

The dataset is most often used to practice classification algorithms. Can one develop a model to predict the likelihood of heart disease based on other measurable characteristics? We will return to that specific question in a few weeks, but for now we wish to use the dataset to practice some pandas methods.

### 1. Get summary info about a dataset and its variables

Applying and using `info`, `describe`, `mean`, `min`, `max`, `apply`, and `applymap` from the Pandas library

The Pandas library has several useful tools built in. Let's explore some of them.

In [2]:
# Check the current location (pwd = print working directory), then list its files.
# The comment sits on its own line so that it is not passed to the shell as an argument.
!pwd
!ls -al

/c/Users/Rocio/Desktop/Class_files/dc-ds-071519/Module-1/week-2/day-6-pandas-part-2


total 417
drwxr-xr-x 1 Rocio 197609      0 Jul 23 12:32 .
drwxr-xr-x 1 Rocio 197609      0 Jul 23 12:13 ..
drwxr-xr-x 1 Rocio 197609      0 Jul 23 12:32 .ipynb_checkpoints
-rw-r--r-- 1 Rocio 197609  11325 Jul 23 12:13 heart.csv
-rw-r--r-- 1 Rocio 197609  18136 Jul 23 12:13 manipulating_data_with_pandas.ipynb
-rw-r--r-- 1 Rocio 197609 196047 Jul 23 12:13 manipulating_data_with_pandas_sol.ipynb
-rw-r--r-- 1 Rocio 197609 188157 Jul 23 12:13 manipulating_data_with_pandas-Copy1.ipynb
-rw-r--r-- 1 Rocio 197609   3356 Jul 23 12:13 pre_process_animal_shelter_data.py
-rw-r--r-- 1 Rocio 197609    136 Jul 23 12:13 states.csv


In [3]:
import pandas as pd
uci = pd.read_csv('heart.csv')

In [92]:
uci.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


#### The `.columns` and `.shape` Attributes

In [None]:
uci.columns

In [None]:
uci.shape

#### The `.info()` and `.describe()` methods and the `.dtypes` attribute

Pandas DataFrames have many useful methods and attributes! Let's look at `.info()`, `.describe()`, and `.dtypes`.

In [None]:
# Call the .info() method on our dataset. What do you observe?

uci.info()

In [None]:
# Call the .describe() method on our dataset. What do you observe?

#uci.describe()
uci.describe().T # .T = Transpose output

In [None]:
# Use the code below. How does the output differ from info() ?
uci.dtypes # lists the dtype of every column; unlike info(), it shows no non-null counts or memory usage

In [None]:
type(uci.age)
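To make the difference between these summaries concrete, here is a minimal sketch on a small made-up DataFrame (the `toy` data is hypothetical and not taken from `heart.csv`). It illustrates that `.describe()` summarizes only numeric columns by default, while `.dtypes` reports a type for every column.

```python
import pandas as pd

# Hypothetical toy data with one numeric and one text column.
toy = pd.DataFrame({"age": [63, 37], "name": ["a", "b"]})

numeric_summary = toy.describe()           # numeric columns only, by default
all_summary = toy.describe(include="all")  # include="all" adds non-numeric columns
column_types = toy.dtypes                  # one dtype per column, numeric or not

print(numeric_summary.columns.tolist())
print(all_summary.columns.tolist())
print(column_types)
```

Every column in `uci` happens to be numeric, so `uci.describe()` and `uci.dtypes` cover the same columns there.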

#### `.mean()`, `.min()`, `.max()`, `.sum()`

The methods `.mean()`, `.min()`, and `.max()` will perform just the way you think they will!

Note that these are methods both for Series and for DataFrames.

In [None]:
uci.ca.mean()

#### The Axis Variable

In [None]:
uci.sum() # Try [shift] + [tab] here!

In [None]:
uci.mean()
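The `axis` argument controls the direction in which these methods collapse the data. The following is a minimal sketch on a made-up DataFrame (the `toy` values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical toy data, not taken from heart.csv.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default) collapses the rows: one result per column.
column_sums = toy.sum(axis=0)

# axis=1 collapses the columns: one result per row.
row_sums = toy.sum(axis=1)

print(column_sums)
print(row_sums)
```

So `uci.sum()` and `uci.mean()` above return one value per column, and passing `axis=1` would instead return one value per patient.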

#### `.value_counts()`

For a DataFrame _Series_, the `.value_counts()` method will tell you how many of each value you've got.

In [None]:
uci['age'].value_counts()[:10] # The ten most frequent ages, sorted by count (descending)

Exercise: What are the different values for restecg?

In [None]:
uci['restecg'].value_counts()
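As a small illustration of how `.value_counts()` orders its result, the sketch below uses a made-up Series (the `ages` values are hypothetical). It contrasts the default ordering by count with ordering by the values themselves, and shows proportions via `normalize=True`.

```python
import pandas as pd

ages = pd.Series([54, 58, 54, 41, 58, 54])  # hypothetical values

by_count = ages.value_counts()               # most frequent value first
by_value = ages.value_counts().sort_index()  # ordered by the value itself
shares = ages.value_counts(normalize=True)   # proportions instead of counts

print(by_count)
print(by_value)
print(shares)
```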


### Apply to Animal Shelter Data
Using `.info()`, `.describe()`, and `.dtypes`, what observations can we make about the data?

What are the breed value counts?

How about age counts for dogs?

In [4]:
animal_outcomes = pd.read_csv('https://data.austintexas.gov/api/views/9t4d-g238/rows.csv?accessType=DOWNLOAD')

In [100]:
animal_outcomes.describe()

Unnamed: 0,age_in_days,age_years,year,month
count,104492.0,104492.0,104492.0,104492.0
mean,811.304119,2.222751,2016.132211,6.595328
std,1077.076051,2.950893,1.730878,3.315509
min,-376.0,-1.030137,2013.0,1.0
25%,95.0,0.260274,2015.0,4.0
50%,371.0,1.016438,2016.0,7.0
75%,1028.0,2.816438,2018.0,9.0
max,9137.0,25.032877,2019.0,12.0


In [102]:
animal_outcomes['Animal Type'].value_counts()

Dog          59507
Cat          38974
Other         5517
Bird           478
Livestock       16
Name: Animal Type, dtype: int64

In [51]:
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,date_outcome,dob,age
0,A800396,,07/22/2019 09:01:00 AM,07/22/2019 09:01:00 AM,07/20/2017,Euthanasia,Rabies Risk,Other,Unknown,2 years,Raccoon,Gray,2019-07-22,2017-07-20,732 days
1,A800130,Kolby,07/21/2019 10:56:00 PM,07/21/2019 10:56:00 PM,05/01/2019,Adoption,,Dog,Spayed Female,2 months,Boxer,Brown,2019-07-21,2019-05-01,81 days
2,A799457,Hazel,07/21/2019 10:55:00 PM,07/21/2019 10:55:00 PM,07/08/2013,,,Dog,Spayed Female,6 years,Pit Bull,Tan/White,2019-07-21,2013-07-08,2204 days
3,A800069,,07/21/2019 07:57:00 PM,07/21/2019 07:57:00 PM,06/02/2019,Transfer,Partner,Cat,Intact Male,1 month,Domestic Shorthair,Orange Tabby,2019-07-21,2019-06-02,49 days
4,A795483,*Herb,07/21/2019 07:15:00 PM,07/21/2019 07:15:00 PM,04/21/2019,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair,Orange Tabby/White,2019-07-21,2019-04-21,91 days


In [34]:
animal_outcomes.Breed.value_counts()

Domestic Shorthair Mix                         29921
Pit Bull Mix                                    7934
Labrador Retriever Mix                          6181
Chihuahua Shorthair Mix                         5984
Domestic Medium Hair Mix                        3018
German Shepherd Mix                             2685
Bat Mix                                         1741
Domestic Shorthair                              1689
Domestic Longhair Mix                           1487
Australian Cattle Dog Mix                       1345
Siamese Mix                                     1202
Bat                                              990
Dachshund Mix                                    973
Boxer Mix                                        876
Border Collie Mix                                855
Miniature Poodle Mix                             799
Siberian Husky Mix                               611
Catahoula Mix                                    608
Australian Shepherd Mix                       

In [35]:
animal_outcomes[animal_outcomes['Animal Type'] == "Dog"].Breed.value_counts()

Pit Bull Mix                                         7934
Labrador Retriever Mix                               6181
Chihuahua Shorthair Mix                              5984
German Shepherd Mix                                  2685
Australian Cattle Dog Mix                            1345
Dachshund Mix                                         973
Boxer Mix                                             876
Border Collie Mix                                     855
Miniature Poodle Mix                                  799
Siberian Husky Mix                                    611
Catahoula Mix                                         608
Staffordshire Mix                                     605
Australian Shepherd Mix                               605
Rat Terrier Mix                                       589
Yorkshire Terrier Mix                                 586
Beagle Mix                                            523
Great Pyrenees Mix                                    520
Miniature Schn

In [None]:
animal_outcomes[(animal_outcomes['Animal Type'] == "Dog") & (animal_outcomes['Outcome Type'] == "Adoption")].Breed.value_counts()

What are the breed `value_counts`?
What's the top breed for adopted dogs?

How about outcome counts for dogs?




### 2.  Changing data

#### DataFrame.applymap() and Series.map()

The `.applymap()` method takes a function as input and applies it to every entry in the DataFrame. (In pandas 2.1 and later, `applymap` is deprecated in favor of the equivalent `DataFrame.map`.)

In [None]:
def successor(x):
    return x + 1

In [None]:
uci.head(1)

In [None]:
uci.applymap(successor).head()

The `.map()` method takes a function as input that it will then apply to every entry in the Series.

In [None]:
uci['age'].map(successor).tail(10)
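The sketch below puts the two side by side on a made-up DataFrame (the `toy` data is hypothetical). Because the name of the element-wise DataFrame method depends on the installed pandas version, the example picks whichever one is available; that fallback is a convenience for this illustration, not something the lecture code requires.

```python
import pandas as pd

def successor(x):
    return x + 1

toy = pd.DataFrame({"a": [1, 2], "b": [10, 20]})  # hypothetical data

# Series.map applies the function to each entry of a single column.
a_plus_one = toy["a"].map(successor)

# Element-wise application over the whole DataFrame. DataFrame.map is the
# newer name (pandas 2.1+); fall back to applymap on older versions.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
all_plus_one = elementwise(successor)

print(a_plus_one)
print(all_plus_one)
```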

#### Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".

In [None]:
uci['oldpeak'].map(lambda x: round(x))[:4]

Exercise: Use an anonymous function to turn the entries in age to strings

In [None]:
uci.age.map(lambda x: str(x)).head(2)
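To show that a lambda is just an unnamed function, here is a minimal sketch comparing a named function with the equivalent lambda on a made-up Series (the `ages` values and the helper name `to_string` are hypothetical).

```python
import pandas as pd

def to_string(x):
    return str(x)

ages = pd.Series([63, 37, 41])  # hypothetical values

named = ages.map(to_string)             # named function
anonymous = ages.map(lambda x: str(x))  # same logic, defined inline

same = named.equals(anonymous)
print(same)
```

A lambda is handy when the function is short and used only once; a named function is usually easier to read and test once the logic grows.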

### Apply to Animal Shelter Data

Use an `apply` to change the dates from strings to datetime objects. Similarly, use an apply to change the ages of the animals from strings to floats.

In [None]:
# Your code here

In [6]:
animal_outcomes.DateTime.dtype

dtype('O')
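A dtype of `'O'` (object) means the dates are stored as plain strings. As a sketch of the conversion, the example below uses two sample strings in the same format as the `DateTime` column and converts them in two ways: element by element with `map` and a lambda, and in one vectorized call. The variable names are assumptions for illustration.

```python
import pandas as pd

# Sample strings in the same format as the DateTime column.
stamps = pd.Series(["07/22/2019 09:01:00 AM", "07/21/2019 10:56:00 PM"])

# Element-wise: keep the first 10 characters (the date) and parse each one.
dates_mapped = stamps.map(lambda x: pd.to_datetime(x[:10], format="%m/%d/%Y"))

# Vectorized: slice every string at once, then convert the whole Series.
dates_vectorized = pd.to_datetime(stamps.str[:10], format="%m/%d/%Y")

print(dates_mapped)
print(dates_vectorized)
```

Both approaches give the same dates; the vectorized form is typically faster on a column with around a hundred thousand rows.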

In [1]:
animal_outcomes["date_outcome"] = animal_outcomes.DateTime.map(lambda x : pd.to_datetime(x[:10], format = '%m/%d/%Y'))


In [84]:
animal_outcomes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104443 entries, 0 to 104442
Data columns (total 15 columns):
Animal ID           104443 non-null object
Name                71690 non-null object
DateTime            104443 non-null object
MonthYear           104443 non-null object
Date of Birth       104443 non-null object
Outcome Type        104436 non-null object
Outcome Subtype     47572 non-null object
Animal Type         104443 non-null object
Sex upon Outcome    104441 non-null object
Age upon Outcome    104429 non-null object
Breed               104443 non-null object
Color               104443 non-null object
date_outcome        104443 non-null datetime64[ns]
dob                 104443 non-null datetime64[ns]
age                 104443 non-null timedelta64[ns]
dtypes: datetime64[ns](2), object(12), timedelta64[ns](1)
memory usage: 12.0+ MB


In [90]:
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,date_outcome,dob,age
0,A800396,,07/22/2019 09:01:00 AM,07/22/2019 09:01:00 AM,07/20/2017,Euthanasia,Rabies Risk,Other,Unknown,2 years,Raccoon,Gray,2019-07-22,2017-07-20,732 days
1,A800130,Kolby,07/21/2019 10:56:00 PM,07/21/2019 10:56:00 PM,05/01/2019,Adoption,,Dog,Spayed Female,2 months,Boxer,Brown,2019-07-21,2019-05-01,81 days
2,A799457,Hazel,07/21/2019 10:55:00 PM,07/21/2019 10:55:00 PM,07/08/2013,,,Dog,Spayed Female,6 years,Pit Bull,Tan/White,2019-07-21,2013-07-08,2204 days
3,A800069,,07/21/2019 07:57:00 PM,07/21/2019 07:57:00 PM,06/02/2019,Transfer,Partner,Cat,Intact Male,1 month,Domestic Shorthair,Orange Tabby,2019-07-21,2019-06-02,49 days
4,A795483,*Herb,07/21/2019 07:15:00 PM,07/21/2019 07:15:00 PM,04/21/2019,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair,Orange Tabby/White,2019-07-21,2019-04-21,91 days


In [8]:
animal_outcomes['Age upon Outcome']

0           1 month
1          3 months
2            1 year
3          2 months
4           2 years
5           9 years
6           6 years
7          10 years
8          3 months
9          8 months
10         3 months
11         3 months
12        11 months
13         3 months
14         4 months
15          3 years
16          2 years
17         4 months
18           1 year
19         5 months
20         4 months
21         2 months
22         3 months
23         2 months
24         6 months
25          9 years
26          5 years
27         4 months
28         3 months
29          2 years
            ...    
104461      3 years
104462     14 years
104463      2 years
104464     14 years
104465     9 months
104466     10 years
104467      3 weeks
104468      3 weeks
104469      1 weeks
104470      1 weeks
104471       1 week
104472     4 months
104473      1 month
104474      3 years
104475      7 years
104476       1 year
104477      2 years
104478      3 years
104479       1 year
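Strings like the ones above can be turned into floats with `map`. The following is a rough sketch, not the lecture's solution: the helper `age_to_days` and the conversion table `DAYS_PER_UNIT` are hypothetical, and the 30-day month and 365-day year are approximations.

```python
import pandas as pd

# Approximate number of days per unit; an assumption for illustration.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

def age_to_days(text):
    """Convert a string such as '3 months' to an approximate number of days."""
    if pd.isna(text):
        return float("nan")
    number, unit = text.split()
    # Strip a trailing 's' so that 'months' and 'month' share one entry.
    return float(number) * DAYS_PER_UNIT[unit.rstrip("s")]

ages = pd.Series(["1 month", "3 weeks", "2 years", None])  # hypothetical sample
age_days = ages.map(age_to_days)

print(age_days)
```

Computing the age from the outcome date and the date of birth, as the cells below do, avoids these approximations.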


In [6]:
animal_outcomes["dob"] = animal_outcomes\
['Date of Birth'].map(lambda x : pd.to_datetime(x, format = '%m/%d/%Y'))

In [7]:
animal_outcomes['age_days'] = (animal_outcomes.date_outcome - animal_outcomes.dob).dt.days

In [8]:
animal_outcomes['age_years'] = animal_outcomes['age_days']/365

In [38]:
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,year,date_outcome,dob,month,age,age_in_days,age_years
0,A798909,,07/22/2019 09:34:00 PM,07/22/2019 09:34:00 PM,06/02/2019,Adoption,,Cat,Spayed Female,1 month,Domestic Shorthair,Black,2019,2019-07-22,2019-06-02,7,50 days,50,0.136986
1,A798933,,07/22/2019 09:33:00 PM,07/22/2019 09:33:00 PM,04/17/2019,Adoption,,Cat,Neutered Male,3 months,Domestic Shorthair,Blue Tabby/White,2019,2019-07-22,2019-04-17,7,96 days,96,0.263014
2,A800552,,07/22/2019 07:34:00 PM,07/22/2019 07:34:00 PM,07/22/2018,Euthanasia,Rabies Risk,Other,Unknown,1 year,Bat,Brown,2019,2019-07-22,2018-07-22,7,365 days,365,1.0
3,A799925,,07/22/2019 07:27:00 PM,07/22/2019 07:27:00 PM,05/01/2019,Adoption,,Dog,Neutered Male,2 months,Labrador Retriever/German Shepherd,Black,2019,2019-07-22,2019-05-01,7,82 days,82,0.224658
4,A800293,Brixton,07/22/2019 07:26:00 PM,07/22/2019 07:26:00 PM,07/18/2017,Adoption,,Dog,Neutered Male,2 years,Pit Bull,Brown/White,2019,2019-07-22,2017-07-18,7,734 days,734,2.010959



In [137]:
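# Not recommended: prefer (date_outcome - dob).dt.days, as in the cells above.
# Converts timedelta64[ns] to days by dividing nanoseconds by 86,400,000,000,000 (ns per day).
animal_outcomes['age'] = animal_outcomes['age'].astype('int64') / 86400000000000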

In [138]:
animal_outcomes.head(1)

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,date_outcome,dob,age
0,A798405,*Benny,07/22/2019 10:38:00 AM,07/22/2019 10:38:00 AM,06/25/2018,Adoption,,Cat,Neutered Male,1 year,Domestic Shorthair,Black/White,2019-07-22,2018-06-25,392.0


## 3. Methods for Re-Organizing DataFrames
#### `.groupby()`

Those of you familiar with SQL have probably used the GROUP BY command. Pandas has this, too.

The `.groupby()` method is especially useful for aggregate functions applied to the data grouped in particular ways.

In [17]:
uci.groupby('sex')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001A54334CB38>

#### `.groups` and `.get_group()`

In [18]:
uci.groupby('sex').groups #Two groups are 0 and 1

{0: Int64Index([  2,   4,   6,  11,  14,  15,  16,  17,  19,  25,  28,  30,  35,
              36,  38,  39,  40,  43,  48,  49,  50,  53,  54,  59,  60,  65,
              67,  69,  74,  75,  82,  84,  85,  88,  89,  93,  94,  96, 102,
             105, 107, 108, 109, 110, 112, 115, 118, 119, 120, 122, 123, 124,
             125, 127, 128, 129, 130, 131, 134, 135, 136, 140, 142, 143, 144,
             146, 147, 151, 153, 154, 155, 161, 167, 181, 182, 190, 204, 207,
             213, 215, 216, 220, 223, 241, 246, 252, 258, 260, 263, 266, 278,
             289, 292, 296, 298, 302],
            dtype='int64'),
 1: Int64Index([  0,   1,   3,   5,   7,   8,   9,  10,  12,  13,
             ...
             288, 290, 291, 293, 294, 295, 297, 299, 300, 301],
            dtype='int64', length=207)}

In [28]:
uci.groupby('sex').get_group(0) # .tail()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
6,56,0,1,140,294,0,0,153,0,1.3,1,0,2,1
11,48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
14,58,0,3,150,283,1,0,162,0,1.0,2,0,2,1
15,50,0,2,120,219,0,1,158,0,1.6,1,0,2,1
16,58,0,2,120,340,0,1,172,0,0.0,2,0,2,1
17,66,0,3,150,226,0,1,114,0,2.6,0,0,2,1
19,69,0,3,140,239,0,1,151,0,1.8,2,2,2,1
25,71,0,1,160,302,0,1,162,0,0.4,2,2,2,1


### Aggregating

In [42]:
uci.groupby('sex').std()

sex,age,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,9.409396,0.972427,19.311119,65.088946,0.332455,0.55715,20.047969,0.422503,1.119844,0.593736,0.881026,0.44129,0.435286
1,8.883803,1.059064,16.658246,42.782392,0.366955,0.510754,24.130882,0.484505,1.174632,0.627378,1.074082,0.659949,0.498626
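Aggregating a groupby follows a split-apply-combine pattern: split the rows by a key, apply a function to each group, and combine the results. Here is a minimal sketch on made-up data (the `toy` values are hypothetical and are not rows from `heart.csv`).

```python
import pandas as pd

toy = pd.DataFrame({
    "sex": [0, 1, 0, 1],
    "chol": [200, 240, 220, 260],
})  # hypothetical values

# Split by 'sex', apply the mean to 'chol', combine into one Series.
mean_chol = toy.groupby("sex")["chol"].mean()

# Several aggregations at once with .agg().
summary = toy.groupby("sex")["chol"].agg(["mean", "min", "max"])

print(mean_chol)
print(summary)
```

Selecting a column before aggregating, as in `["chol"]` here, keeps the output focused on the quantity of interest.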


Exercise: Tell me the average cholesterol level for those with heart disease.

In [None]:
# Your code here!


### Apply to Animal Shelter Data

#### Task 1
- Use a groupby to show the average age of the different kinds of animal types.
- What about by animal types **and** gender?
 

In [140]:
animal_outcomes.groupby(['Animal Type', 'Sex upon Outcome']).mean(numeric_only=True) # average only the numeric columns

Animal Type,Sex upon Outcome,age
Bird,Intact Female,824.4
Bird,Intact Male,564.667
Bird,Unknown,406.365
Cat,Intact Female,335.6
Cat,Intact Male,229.874
Cat,Neutered Male,722.122
Cat,Spayed Female,737.802
Cat,Unknown,186.856
Dog,Intact Female,838.046
Dog,Intact Male,902.391


In [141]:
animal_outcomes.groupby('Animal Type')['age'].mean()

Animal Type
Bird         511.893
Cat          539.721
Dog         1023.865
Livestock    419.688
Other        463.718
Name: age, dtype: float64

In [70]:
pd.set_option('display.float_format', lambda x: '%.3f' % x) # show floats with 3 decimal places instead of scientific notation

#### Task 2:
- Create new columns `year` and `month` by applying lambda functions (`x.year` and `x.month`) to the outcome date
- Use `groupby` and `.size()` to tell me how many animals are adopted by month

In [153]:
animal_outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,date_outcome,dob,age,year,month
0,A798405,*Benny,07/22/2019 10:38:00 AM,07/22/2019 10:38:00 AM,06/25/2018,Adoption,,Cat,Neutered Male,1 year,Domestic Shorthair,Black/White,2019-07-22,2018-06-25,392.0,2019,7
1,A800396,,07/22/2019 09:01:00 AM,07/22/2019 09:01:00 AM,07/20/2017,Euthanasia,Rabies Risk,Other,Unknown,2 years,Raccoon,Gray,2019-07-22,2017-07-20,732.0,2019,7
2,A800130,Kolby,07/21/2019 10:56:00 PM,07/21/2019 10:56:00 PM,05/01/2019,Adoption,,Dog,Spayed Female,2 months,Boxer,Brown,2019-07-21,2019-05-01,81.0,2019,7
3,A799457,Hazel,07/21/2019 10:55:00 PM,07/21/2019 10:55:00 PM,07/08/2013,,,Dog,Spayed Female,6 years,Pit Bull,Tan/White,2019-07-21,2013-07-08,2204.0,2019,7
4,A800069,,07/21/2019 07:57:00 PM,07/21/2019 07:57:00 PM,06/02/2019,Transfer,Partner,Cat,Intact Male,1 month,Domestic Shorthair,Orange Tabby,2019-07-21,2019-06-02,49.0,2019,7


In [9]:
animal_outcomes["year"] = animal_outcomes.date_outcome.map(lambda x : x.year)

In [10]:
animal_outcomes["month"] = animal_outcomes.date_outcome.map(lambda x : x.month)

In [15]:
animal_outcomes[animal_outcomes['Outcome Type'] == 'Adoption']\
.groupby(['year','month']).size()

year  month
2013  10        606
      11        552
      12        684
2014  1         518
      2         437
      3         483
      4         439
      5         507
      6         660
      7         907
      8         816
      9         607
      10        582
      11        519
      12        652
2015  1         540
      2         484
      3         472
      4         402
      5         629
      6         706
      7         896
      8         721
      9         630
      10        583
      11        658
      12        656
2016  1         599
      2         539
      3         540
               ... 
2017  2         647
      3         440
      4         510
      5         702
      6         746
      7         867
      8         851
      9         686
      10        709
      11        575
      12        649
2018  1         572
      2         508
      3         592
      4         471
      5         620
      6         790
      7         808
      8 

In [12]:
animal_outcomes.groupby(['Animal Type'])['age_days'].mean()

Animal Type
Bird          511.893305
Cat           539.390612
Dog          1024.058715
Livestock     419.687500
Other         463.700381
Name: age_days, dtype: float64

In [13]:
animal_outcomes.groupby(['Animal Type'])['age_years'].mean()

Animal Type
Bird         1.402447
Cat          1.477782
Dog          2.805640
Livestock    1.149829
Other        1.270412
Name: age_years, dtype: float64

## 4. Reshaping a DataFrame

### `.pivot()`

Those of you familiar with Excel have probably used Pivot Tables. Pandas has a similar functionality.

In [154]:
uci.pivot(values = 'sex', columns = 'target').head()

target,0,1
0,,1.0
1,,1.0
2,,0.0
3,,1.0
4,,0.0


In [39]:
animal_outcomes.pivot_table(values = 'age_years', index ='Animal Type', columns = 'Outcome Type')
# index sets the rows, columns sets the columns, and values is aggregated (mean by default) in each cell

Animal Type,Adoption,Died,Disposal,Euthanasia,Missing,Relocate,Return to Owner,Rto-Adopt,Transfer
Bird,1.436348,1.207828,1.003767,1.315504,1.052055,2.882496,1.29025,,1.437576
Cat,1.30813,1.194283,2.043955,3.209395,1.385898,,4.339341,3.653078,1.164983
Dog,2.083231,2.819843,2.448532,4.68475,2.285985,,4.109438,3.580149,2.380233
Livestock,1.089954,0.041096,,,,,0.447489,,1.745662
Other,1.360477,0.999807,1.144907,1.309469,0.564384,0.832055,2.519048,,1.085887
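To see what `pivot_table` computes in each cell, here is a minimal sketch on made-up data (the `toy` rows are hypothetical). Two dogs share the same outcome, so their ages are averaged; a combination that never occurs becomes NaN.

```python
import pandas as pd

toy = pd.DataFrame({
    "Animal Type": ["Dog", "Dog", "Cat", "Cat"],
    "Outcome Type": ["Adoption", "Adoption", "Adoption", "Transfer"],
    "age_years": [2.0, 4.0, 1.0, 3.0],
})  # hypothetical rows

# Default aggregation is the mean of 'values' within each (index, columns) pair.
table = toy.pivot_table(values="age_years",
                        index="Animal Type",
                        columns="Outcome Type")

print(table)
```

Unlike `.pivot()`, which raises an error when an index/column pair appears more than once, `.pivot_table()` aggregates the duplicates.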


### Methods for Combining and Reshaping DataFrames: `.join()`, `.merge()`, `pd.concat()`, `pd.melt()`

### `.join()`

In [22]:
toy1 = pd.DataFrame([[63, 142], [33, 47]], columns = ['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns = ['age', 'HP'])
toy2

Unnamed: 0,age,HP
0,63,100
1,33,200


In [20]:
toy1.join(toy2.set_index('age'),
          on = 'age',
          lsuffix = '_A',
          rsuffix = '_B').head()
# toy1 is the left DataFrame and toy2 is the right one; they are joined on the 'age' column.

Unnamed: 0,age,HP_A,HP_B
0,63,142,100
1,33,47,200


### `.merge()`

In [23]:
ds_chars = pd.read_csv('ds_chars.csv', index_col = 0)

FileNotFoundError: [Errno 2] File b'ds_chars.csv' does not exist: b'ds_chars.csv'

In [None]:
states = pd.read_csv('states.csv', index_col = 0)

In [None]:
ds_chars.merge(states,
               left_on='home_state',
               right_on = 'state',
               how = 'inner')
# Merge defaults to inner, join defaults to left
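Since the contents of `ds_chars.csv` and `states.csv` are not shown here, the sketch below uses small made-up stand-ins (`chars` and `states_toy`, with invented rows and a hypothetical `nickname` column) to illustrate how the `how` argument changes the result of a merge.

```python
import pandas as pd

# Hypothetical stand-ins for ds_chars.csv and states.csv.
chars = pd.DataFrame({"name": ["Ann", "Bo", "Cy"],
                      "home_state": ["TX", "WA", "ZZ"]})
states_toy = pd.DataFrame({"state": ["TX", "WA"],
                           "nickname": ["Lone Star", "Evergreen"]})

# Inner: keep only rows whose key appears in both frames.
inner = chars.merge(states_toy, left_on="home_state", right_on="state", how="inner")

# Left: keep every row of the left frame, filling unmatched columns with NaN.
left = chars.merge(states_toy, left_on="home_state", right_on="state", how="left")

print(inner)
print(left)
```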

### `pd.concat()`

Exercise: Look up the documentation on pd.concat (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) and use it to concatenate ds_chars and states.
<br/>
Your result should still have only five rows!

In [24]:
pd.concat([ds_chars, states])

NameError: name 'ds_chars' is not defined
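The direction of concatenation is set by `axis`. As a hint for the exercise, here is a minimal sketch with two made-up frames (`left` and `right` are hypothetical and much smaller than the real data).

```python
import pandas as pd

left = pd.DataFrame({"name": ["Ann", "Bo"]})    # hypothetical
right = pd.DataFrame({"state": ["TX", "WA"]})   # hypothetical

# axis=0 (default): stack the rows; columns missing from a frame become NaN.
stacked = pd.concat([left, right])

# axis=1: place the frames side by side, aligned on the row index.
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```

Stacking doubles the row count here, while the side-by-side version keeps the original number of rows.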

### `pd.melt()`

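Melting reshapes a DataFrame from wide to long format: the columns listed in `id_vars` are kept as identifiers, and the columns listed in `value_vars` are unpivoted into two columns named 'variable' and 'value'.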

In [None]:
ds_chars.head()

In [None]:
pd.melt(ds_chars,
        id_vars=['name'],
        value_vars=['HP', 'home_state'])
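Because `ds_chars.csv` is not available here, the following sketch melts a made-up stand-in with the same column names as the call above (the `wide` rows are hypothetical).

```python
import pandas as pd

# Hypothetical stand-in using the column names from the call above.
wide = pd.DataFrame({"name": ["Ann", "Bo"],
                     "HP": [142, 47],
                     "home_state": ["TX", "WA"]})

# Each (name, column) pair becomes its own row in the long format.
long = pd.melt(wide, id_vars=["name"], value_vars=["HP", "home_state"])

print(long)
```

Two rows and two value columns in the wide frame yield four rows in the long one.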

## Bringing it all together with the Animal Shelter Data

Join the data from the [Austin Animal Shelter Intake dataset](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) to the outcomes dataset by Animal ID.

Use the dates from each dataset to see how long animals spend in the shelter. Does it differ by time of year? By outcome?

The URL for the intake dataset is here: https://data.austintexas.gov/api/views/wter-evkm/rows.csv?accessType=DOWNLOAD

_Hints_ :
- import and clean the intake dataset first
- use apply/applymap/lambda to change the variables to their proper format in the intake data
- rename the columns in the intake dataset *before* joining
- create a new days-in-shelter variable
- Notice that some values in the `days_in_shelter` column are NaN or less than 0 (remove these rows by combining a comparison such as `>= 0` with `.notna()`, or equivalently `~` with `.isna()`)
- Use `groupby` to get some interesting information about the dataset

Make sure to export and save your cleaned dataset. We will use it in a later lecture!

Use `df.to_csv()` to write the DataFrame `df` to a CSV file. Read more in the `to_csv()` documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html).
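As a small sketch of the export step, the example below writes a made-up DataFrame to a temporary file and reads it back. The frame `cleaned` and the file name `animals_cleaned.csv` are hypothetical; in practice you would choose your own path.

```python
import os
import tempfile

import pandas as pd

cleaned = pd.DataFrame({"Animal ID": ["A1", "A2"],
                        "days_in_shelter": [10, 3]})  # hypothetical data

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "animals_cleaned.csv")

cleaned.to_csv(path, index=False)  # index=False leaves the row index out of the file
restored = pd.read_csv(path)

print(restored)
```

Leaving out the index avoids an extra unnamed column when the file is read back later.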

In [14]:
intake_data = pd.read_csv('https://data.austintexas.gov/api/views/wter-evkm/rows.csv?accessType=DOWNLOAD')

In [62]:
intake_data.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,date_income
0,A800561,,07/23/2019 08:52:00 AM,07/23/2019 08:52:00 AM,11209 Metric Blvd in Austin (TX),Stray,Injured,Dog,Neutered Male,6 years,Miniature Pinscher Mix,Black/Tan,2019-07-23
1,A800557,,07/23/2019 07:48:00 AM,07/23/2019 07:48:00 AM,1109 S Pleasant Valley in Austin (TX),Stray,Normal,Cat,Unknown,1 year,Domestic Shorthair,Black,2019-07-23
2,A800556,,07/23/2019 07:20:00 AM,07/23/2019 07:20:00 AM,4434 Frontier Trail in Austin (TX),Stray,Injured,Cat,Intact Female,1 year,Domestic Shorthair,Tortie,2019-07-23
3,A660391,,07/22/2019 06:54:00 PM,07/22/2019 06:54:00 PM,Pedernales And East 6Th Street in Austin (TX),Stray,Normal,Dog,Intact Male,7 years,Pit Bull Mix,Brown,2019-07-22
4,A739213,Zorro,07/22/2019 06:46:00 PM,07/22/2019 06:46:00 PM,Manor (TX),Public Assist,Normal,Dog,Intact Male,4 years,Australian Shepherd Mix,Black/Gray,2019-07-22


In [15]:
intake_data["date_intake"] = intake_data.DateTime.map(lambda x : pd.to_datetime(x[:10], format = '%m/%d/%Y'))

In [70]:
intake_data.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,date_income,year_income,month_income,year,month
0,A800561,,07/23/2019 08:52:00 AM,07/23/2019 08:52:00 AM,11209 Metric Blvd in Austin (TX),Stray,Injured,Dog,Neutered Male,6 years,Miniature Pinscher Mix,Black/Tan,2019-07-23,2019,7,2019,7
1,A800557,,07/23/2019 07:48:00 AM,07/23/2019 07:48:00 AM,1109 S Pleasant Valley in Austin (TX),Stray,Normal,Cat,Unknown,1 year,Domestic Shorthair,Black,2019-07-23,2019,7,2019,7
2,A800556,,07/23/2019 07:20:00 AM,07/23/2019 07:20:00 AM,4434 Frontier Trail in Austin (TX),Stray,Injured,Cat,Intact Female,1 year,Domestic Shorthair,Tortie,2019-07-23,2019,7,2019,7
3,A660391,,07/22/2019 06:54:00 PM,07/22/2019 06:54:00 PM,Pedernales And East 6Th Street in Austin (TX),Stray,Normal,Dog,Intact Male,7 years,Pit Bull Mix,Brown,2019-07-22,2019,7,2019,7
4,A739213,Zorro,07/22/2019 06:46:00 PM,07/22/2019 06:46:00 PM,Manor (TX),Public Assist,Normal,Dog,Intact Male,4 years,Australian Shepherd Mix,Black/Gray,2019-07-22,2019,7,2019,7


In [16]:
intake_data["year"] = intake_data.date_intake.map(lambda x : x.year)

In [17]:
intake_data["month"] = intake_data.date_intake.map(lambda x : x.month)

In [18]:
animals_merged = pd.merge(intake_data, animal_outcomes, on=['Animal ID', 'year'], how = 'left', suffixes = ('_intake', '_outcome'))

In [19]:
animals_merged.head()

Unnamed: 0,Animal ID,Name_intake,DateTime_intake,MonthYear_intake,Found Location,Intake Type,Intake Condition,Animal Type_intake,Sex upon Intake,Age upon Intake,...,Animal Type_outcome,Sex upon Outcome,Age upon Outcome,Breed_outcome,Color_outcome,date_outcome,dob,age_days,age_years,month_outcome
0,A800566,,07/23/2019 10:47:00 AM,07/23/2019 10:47:00 AM,6603 Mesa Drive in Austin (TX),Stray,Normal,Cat,Unknown,2 years,...,,,,,,NaT,NaT,,,
1,A800561,,07/23/2019 08:52:00 AM,07/23/2019 08:52:00 AM,11209 Metric Blvd in Austin (TX),Stray,Injured,Dog,Neutered Male,6 years,...,,,,,,NaT,NaT,,,
2,A800556,,07/23/2019 07:20:00 AM,07/23/2019 07:20:00 AM,4434 Frontier Trail in Austin (TX),Stray,Injured,Cat,Intact Female,1 year,...,,,,,,NaT,NaT,,,
3,A660391,,07/22/2019 06:54:00 PM,07/22/2019 06:54:00 PM,Pedernales And East 6Th Street in Austin (TX),Stray,Normal,Dog,Intact Male,7 years,...,,,,,,NaT,NaT,,,
4,A739213,Zorro,07/22/2019 06:46:00 PM,07/22/2019 06:46:00 PM,Manor (TX),Public Assist,Normal,Dog,Intact Male,4 years,...,,,,,,NaT,NaT,,,


In [79]:
animals_merged['days_in_shelter'] = (animals_merged.date_outcome - animals_merged.date_intake).dt.days

In [80]:
animals_merged['days_in_shelter']

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          0.0
7          NaN
8          NaN
9          NaN
10         NaN
11         NaN
12         NaN
13         NaN
14         NaN
15         NaN
16         NaN
17         NaN
18       -36.0
19         NaN
20         NaN
21         NaN
22         NaN
23         NaN
24         NaN
25         NaN
26         NaN
27         NaN
28         NaN
29         NaN
          ... 
119607     0.0
119608    10.0
119609     6.0
119610     1.0
119611    12.0
119612    47.0
119613     0.0
119614    34.0
119615     7.0
119616    13.0
119617    11.0
119618     0.0
119619     0.0
119620    18.0
119621    17.0
119622    91.0
119623     0.0
119624     0.0
119625     0.0
119626     0.0
119627     4.0
119628    23.0
119629     9.0
119630     0.0
119631     0.0
119632     0.0
119633     0.0
119634     0.0
119635     0.0
119636     0.0
Name: days_in_shelter, Length: 119637, dtype: float64
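The output above contains both NaN values (no matching outcome) and negative values. A minimal sketch of how such rows could be filtered out, using a made-up DataFrame (the `toy` dates are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "date_intake": pd.to_datetime(["2019-07-01", "2019-07-10", "2019-07-05"]),
    "date_outcome": pd.to_datetime(["2019-07-11", "2019-06-04", None]),
})  # hypothetical rows: one valid stay, one negative stay, one missing outcome

toy["days_in_shelter"] = (toy["date_outcome"] - toy["date_intake"]).dt.days

# Keep rows where the value is present and not negative.
valid = toy[toy["days_in_shelter"].notna() & (toy["days_in_shelter"] >= 0)]

print(toy["days_in_shelter"])
print(valid)
```

The comparison `>= 0` is already False for NaN, so the `.notna()` check is there mainly to make the intent explicit.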

In [1]:
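# Assumes a 'rounded_age' column was created beforehand (it is not defined in
# the cells above), for example by rounding 'age_years'.
shelter_2018 = animals_merged[(animals_merged.days_in_shelter >= 0)
                              & (animals_merged.year == 2018)]
df1 = shelter_2018.pivot_table(values='days_in_shelter',
                               index=['Animal Type_intake', 'rounded_age'],
                               columns='Outcome Type')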