# More Pandas

![more_pandas](https://media.giphy.com/media/H0Qi5W2KzU5UI/giphy.gif)

### Scenario
You have decided that you want to start your own animal shelter, but you want to get an idea of what that will entail and to get more information about planning. In this lecture, we'll look at a real data set collected by Austin Animal Center over several years and use our pandas skills from the last lecture and learn some new ones in order to explore these data further.




#### Our goals in this notebook are to be able to: <br/>

- Apply and use `.map()` and `.applymap()` and `.apply()` from the Pandas library
- Briefly review lambda functions and use them in coordination with above functions
- Explain what a groupby object is and split a DataFrame using `.groupby()`

#### Getting started

Let's take a moment to download and to examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data). 

Let's take a look at the data:

In [1]:
import numpy as np
import pandas as pd
import requests

url = 'https://data.austintexas.gov/resource/9t4d-g238.json'
response = requests.get(url)
animals = pd.DataFrame(response.json())
animals.head()

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color
0,A817210,Serenity,2020-05-18T09:31:00.000,2020-05-18T09:31:00.000,2008-05-10T00:00:00.000,Euthanasia,Suffering,Cat,Neutered Male,12 years,Domestic Shorthair,White
1,A817506,,2020-05-17T16:12:00.000,2020-05-17T16:12:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Cat,Spayed Female,2 years,Siamese,White/Gray
2,A817505,,2020-05-17T15:36:00.000,2020-05-17T15:36:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Bird,Unknown,,Chicken,White/Brown
3,A817063,*Violeta,2020-05-17T15:30:00.000,2020-05-17T15:30:00.000,2019-05-07T00:00:00.000,Adoption,,Dog,Spayed Female,1 year,German Shepherd Mix,Brown/Black
4,A792391,Max,2020-05-17T15:29:00.000,2020-05-17T15:29:00.000,2016-05-15T00:00:00.000,Return to Owner,,Dog,Intact Male,4 years,American Bulldog Mix,White/Brown Brindle


![hive mind](https://media.giphy.com/media/l0MYttFGk98Y4e4h2/giphy.gif)


What kinds of questions can we ask these data and what kinds of information can we get back?
Start filling in the [group question doc](https://docs.google.com/document/d/15VFVdzx1-oFHuzal9xcNvXgr9QMtwbqIzpIzpzfzhFM/edit) together. You can either add your own question or help fill out another student's question.

# Quick Exploration

In [2]:
animals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
animal_id           1000 non-null object
name                625 non-null object
datetime            1000 non-null object
monthyear           1000 non-null object
date_of_birth       1000 non-null object
outcome_type        998 non-null object
outcome_subtype     762 non-null object
animal_type         1000 non-null object
sex_upon_outcome    1000 non-null object
age_upon_outcome    1000 non-null object
breed               1000 non-null object
color               1000 non-null object
dtypes: object(12)
memory usage: 93.9+ KB


In [3]:
animals.describe()

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color
count,1000,625,1000,1000,1000,998,762,1000,1000,1000,1000,1000
unique,996,566,854,854,598,7,13,4,5,42,207,110
top,A815987,Milo,2020-03-18T13:36:00.000,2020-03-18T13:36:00.000,2020-01-24T00:00:00.000,Adoption,Foster,Dog,Neutered Male,2 years,Domestic Shorthair,Black/White
freq,2,4,7,7,12,386,344,594,284,207,184,115


In [4]:
# Let's apply a bit of what we learned on Friday and investigate na's
animals.isna().sum()

animal_id             0
name                375
datetime              0
monthyear             0
date_of_birth         0
outcome_type          2
outcome_subtype     238
animal_type           0
sex_upon_outcome      0
age_upon_outcome      0
breed                 0
color                 0
dtype: int64

In [5]:
animals['name'] = animals['name'].fillna('unnamed')

In [6]:
animals.fillna('no_type_or_subtype', inplace=True)

In [7]:
animals.isna().sum()

animal_id           0
name                0
datetime            0
monthyear           0
date_of_birth       0
outcome_type        0
outcome_subtype     0
animal_type         0
sex_upon_outcome    0
age_upon_outcome    0
breed               0
color               0
dtype: int64

### 1. Applying and using map and applymap from the Pandas library

The built-in **map** function takes a function and applies it to every element of an iterable.

`map` is an example of Python's support for [functional programming](https://docs.python.org/3/howto/functional.html), which we won't spend much lecture time on beyond a brief contrast with object-oriented programming (OOP). In OOP, objects have state that changes over time. A DataFrame object has a `shape` attribute that reports the number of rows and columns; dropping rows in place changes the underlying object, and `shape` then reflects the reduced number of rows. Functional programming instead relies on functions (like the ones we've learned to write in Python) that take inputs and produce outputs without depending on the changing state of objects. Haskell and Lisp are examples of functional programming languages.

In [8]:
def divisible_by_5(number):
    
    '''
    Parameter: an integer
    Returns: True if the number is divisible by five, otherwise False
    '''
    
    # The comparison already evaluates to a boolean, so return it directly
    return number % 5 == 0

numbers = [17,29,30045, 125]

list(map(divisible_by_5, numbers))

[False, False, True, True]
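
One detail worth seeing in isolation: `map` returns a lazy iterator, which is why the cell above wraps it in `list()`. The sketch below uses a hypothetical helper, `is_divisible_by_5`, that is not part of the notebook, and compares `map` with the equivalent list comprehension.

```python
def is_divisible_by_5(number):
    """Hypothetical helper: True when `number` is a multiple of 5."""
    return number % 5 == 0


numbers = [17, 29, 30045, 125]

lazy_result = map(is_divisible_by_5, numbers)  # nothing has been computed yet
as_list = list(lazy_result)                    # consuming the iterator runs the function

# The same result written as a list comprehension
comprehension = [is_divisible_by_5(n) for n in numbers]

print(as_list)
print(comprehension)
```

Note that a `map` object can only be consumed once: calling `list(lazy_result)` a second time gives an empty list.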

The Pandas library has several similar methods associated with DataFrames and Series. Let's explore them.

# DataFrame.applymap(),  Series.apply(), and DataFrame.apply()

## DataFrame.applymap()
The ```.applymap()``` method takes a function as input that it will then apply to every entry in the dataframe.

In [9]:
def long_string(string):
    '''
    Parameter: a string
    returns: a boolean denoting whether it is 
    longer than 10 characters
    '''
    
    if len(string) > 10:
        return True
    else:
        return False

In [10]:
animals_strings = animals.applymap(str)
animals_strings.applymap(long_string)

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color
0,False,False,True,True,True,False,False,False,True,False,True,False
1,False,False,True,True,True,False,False,False,True,False,False,False
2,False,False,True,True,True,False,False,False,False,False,False,True
3,False,False,True,True,True,False,True,False,True,False,True,True
4,False,False,True,True,True,True,True,False,True,False,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...
995,False,False,True,True,True,False,True,False,True,False,True,True
996,False,False,True,True,True,False,True,False,True,False,False,True
997,False,False,True,True,True,False,True,False,False,False,False,False
998,False,False,True,True,True,False,True,False,True,False,True,False
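
A note on versions: as far as we know, pandas 2.1 and later deprecate `DataFrame.applymap()` in favor of an equivalent `DataFrame.map()`. The sketch below picks whichever one is available. The `toy` DataFrame is made up for illustration and is not part of the shelter data.

```python
import pandas as pd

# Made-up data, only for illustration
toy = pd.DataFrame({
    'name': ['Max', 'Serenity the Cat'],
    'color': ['White', 'Brown/Black Brindle'],
})


def long_string(string):
    """True when the string is longer than 10 characters."""
    return len(string) > 10


# Newer pandas offers DataFrame.map; older versions only have DataFrame.applymap
if hasattr(pd.DataFrame, 'map'):
    flags = toy.map(long_string)
else:
    flags = toy.applymap(long_string)

print(flags)
```

Either way, the function is called once per cell and the result has the same shape as the input.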


## Series.apply()

The **.apply()** method takes a function as input that it will then apply to every entry in the Series.

Let's write a function to consolidate sex_upon_outcome to male, female, or unknown.

First, explore the unique values:

In [11]:
animals['sex_upon_outcome'].unique()

array(['Neutered Male', 'Spayed Female', 'Unknown', 'Intact Male',
       'Intact Female'], dtype=object)

In [12]:
# we could also use np.unique() with the return_counts parameter

np.unique(animals['sex_upon_outcome'], return_counts=True)


(array(['Intact Female', 'Intact Male', 'Neutered Male', 'Spayed Female',
        'Unknown'], dtype=object), array([142, 156, 284, 275, 143]))

In [13]:
def male_or_female(value_from_series):
    
    """
    Consolidate a sex_upon_outcome value into a single sex label.
    
    Parameter: a value from the sex_upon_outcome series
    in the Austin Animal Shelter dataset.  
    There are five possible values:
    'Spayed Female', 'Unknown', 'Intact Female', 'Intact Male',
    'Neutered Male'
       
    Returns:
    'female', 'male', or 'unknown'
    
    """
    
    if 'female' in value_from_series.lower():
        return 'female'
    
    #Add a space before male to ensure that female is not included
    elif ' male' in value_from_series.lower():
        return 'male'
    
    else:
        return 'unknown'
    
    
animals['sex_upon_outcome'].apply(male_or_female)   

0         male
1       female
2      unknown
3       female
4         male
        ...   
995     female
996       male
997    unknown
998       male
999       male
Name: sex_upon_outcome, Length: 1000, dtype: object
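
Because the column holds only five distinct values, the same consolidation can also be sketched with a dictionary passed to `Series.map()`. The `sex_lookup` dictionary and the small Series below are assumptions made for illustration; the notebook itself uses the function above.

```python
import pandas as pd

# The five values reported by .unique() above, as a stand-in for the real column
sex_upon_outcome = pd.Series([
    'Neutered Male', 'Spayed Female', 'Unknown', 'Intact Male', 'Intact Female',
])

# Hypothetical lookup table from detailed value to consolidated label
sex_lookup = {
    'Neutered Male': 'male',
    'Intact Male': 'male',
    'Spayed Female': 'female',
    'Intact Female': 'female',
    'Unknown': 'unknown',
}

sex = sex_upon_outcome.map(sex_lookup)
print(sex)
```

One difference to keep in mind: any value missing from the dictionary becomes `NaN`, whereas the function above falls back to `'unknown'`.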

In [14]:
# We can now make a new column, sex:
    
animals['sex'] = animals['sex_upon_outcome'].apply(male_or_female)
animals.head()

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,sex
0,A817210,Serenity,2020-05-18T09:31:00.000,2020-05-18T09:31:00.000,2008-05-10T00:00:00.000,Euthanasia,Suffering,Cat,Neutered Male,12 years,Domestic Shorthair,White,male
1,A817506,unnamed,2020-05-17T16:12:00.000,2020-05-17T16:12:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Cat,Spayed Female,2 years,Siamese,White/Gray,female
2,A817505,unnamed,2020-05-17T15:36:00.000,2020-05-17T15:36:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Bird,Unknown,,Chicken,White/Brown,unknown
3,A817063,*Violeta,2020-05-17T15:30:00.000,2020-05-17T15:30:00.000,2019-05-07T00:00:00.000,Adoption,no_type_or_subtype,Dog,Spayed Female,1 year,German Shepherd Mix,Brown/Black,female
4,A792391,Max,2020-05-17T15:29:00.000,2020-05-17T15:29:00.000,2016-05-15T00:00:00.000,Return to Owner,no_type_or_subtype,Dog,Intact Male,4 years,American Bulldog Mix,White/Brown Brindle,male


Now let's have some fun.  Let's convert age upon outcome to days, using map():

In [15]:
# First, check out what happens when we split on a space

list(animals['age_upon_outcome'].str.split(' '))

[['12', 'years'],
 ['2', 'years'],
 ['NULL'],
 ['1', 'year'],
 ['4', 'years'],
 ['4', 'years'],
 ['1', 'weeks'],
 ['12', 'years'],
 ['1', 'day'],
 ['9', 'years'],
 ['9', 'months'],
 ['1', 'year'],
 ['5', 'months'],
 ['1', 'year'],
 ['NULL'],
 ['5', 'years'],
 ['2', 'years'],
 ['2', 'years'],
 ['1', 'month'],
 ['3', 'days'],
 ['5', 'days'],
 ['5', 'days'],
 ['5', 'days'],
 ['2', 'years'],
 ['1', 'year'],
 ['1', 'year'],
 ['NULL'],
 ['NULL'],
 ['2', 'years'],
 ['1', 'year'],
 ['10', 'years'],
 ['3', 'months'],
 ['3', 'months'],
 ['1', 'year'],
 ['1', 'year'],
 ['2', 'years'],
 ['3', 'years'],
 ['1', 'year'],
 ['9', 'years'],
 ['1', 'year'],
 ['1', 'year'],
 ['4', 'years'],
 ['3', 'years'],
 ['1', 'year'],
 ['10', 'months'],
 ['6', 'years'],
 ['10', 'months'],
 ['2', 'years'],
 ['5', 'months'],
 ['2', 'months'],
 ['2', 'years'],
 ['3', 'years'],
 ['1', 'year'],
 ['4', 'days'],
 ['3', 'months'],
 ['5', 'months'],
 ['1', 'year'],
 ['6', 'months'],
 ['6', 'months'],
 ['2', 'years'],
 ...

# Pair program #1: 
Take 5 minutes to fill in the function below with code that converts age upon outcome to days upon outcome.

In [16]:
# check what values we have for time frame
unit_values = [age[0] if age[0] == 'NULL' 
               else age[1] for age in 
               animals['age_upon_outcome'].str.split(' ')]
set(unit_values)

{'NULL', 'day', 'days', 'month', 'months', 'week', 'weeks', 'year', 'years'}

Now, fill in the definition below to convert the ages to days

In [21]:

def age_to_days(age):
    
    '''
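    params: age upon outcome of shelter animal. 
    Either 'NULL' or a number followed by a unit of time:
    'day', 'days', 'week', 'weeks', 'month', 'months', 'year', 'years'
    
    returns: days old at outcome (a month is approximated as 30 days
    and a year as 365 days), or 'NULL' when the age is missing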
    '''
    
    age_split = age.split(' ')
    
    
    if len(age_split) == 1:
        return 'NULL'
    
    elif age_split[1] in ['month', 'months']:
        return int(age_split[0]) * 30
    
    elif age_split[1] in ['year', 'years']:
        return int(age_split[0]) * 365
    
    elif age_split[1] in ['week', 'weeks']:
        return int(age_split[0]) * 7
                   
    else:
        return int(age_split[0])
    
    
    
animals['days_upon_outcome'] = animals['age_upon_outcome'].map(age_to_days)

animals.head()


Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome
0,A817210,Serenity,2020-05-18T09:31:00.000,2020-05-18T09:31:00.000,2008-05-10T00:00:00.000,Euthanasia,Suffering,Cat,Neutered Male,12 years,Domestic Shorthair,White,male,4380.0
1,A817506,unnamed,2020-05-17T16:12:00.000,2020-05-17T16:12:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Cat,Spayed Female,2 years,Siamese,White/Gray,female,730.0
2,A817505,unnamed,2020-05-17T15:36:00.000,2020-05-17T15:36:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Bird,Unknown,,Chicken,White/Brown,unknown,
3,A817063,*Violeta,2020-05-17T15:30:00.000,2020-05-17T15:30:00.000,2019-05-07T00:00:00.000,Adoption,no_type_or_subtype,Dog,Spayed Female,1 year,German Shepherd Mix,Brown/Black,female,365.0
4,A792391,Max,2020-05-17T15:29:00.000,2020-05-17T15:29:00.000,2016-05-15T00:00:00.000,Return to Owner,no_type_or_subtype,Dog,Intact Male,4 years,American Bulldog Mix,White/Brown Brindle,male,1460.0
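
Returning the string `'NULL'` leaves the new column with mixed types, which matters later when we aggregate. As an alternative sketch (not the solution used in this notebook), the hypothetical `age_to_days_numeric` below uses a lookup table for the units and returns `nan` for anything it cannot parse, so the result stays numeric.

```python
import math

# Approximate number of days per unit (a month as 30 days, a year as 365 days)
DAYS_PER_UNIT = {
    'day': 1, 'days': 1,
    'week': 7, 'weeks': 7,
    'month': 30, 'months': 30,
    'year': 365, 'years': 365,
}


def age_to_days_numeric(age):
    """Hypothetical variant of age_to_days that returns nan for missing ages."""
    parts = age.split(' ')
    if len(parts) != 2 or parts[1] not in DAYS_PER_UNIT:
        return math.nan
    return int(parts[0]) * DAYS_PER_UNIT[parts[1]]


examples = ['12 years', '1 weeks', '9 months', '3 days', 'NULL']
converted = [age_to_days_numeric(age) for age in examples]
print(converted)
```

A function like this could be passed to `.map()` in the same way as `age_to_days`.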


## DataFrame.apply()
DataFrame.apply() takes a function as a parameter and applies it to each column (`axis=0`, the default) or each row (`axis=1`) of the DataFrame. It is especially useful when we want logic that compares values from multiple columns.
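
To make the `axis` argument concrete, here is a minimal sketch on a made-up DataFrame (`toy` is not part of the shelter data). With `axis=0` the function receives each column as a Series; with `axis=1` it receives each row.

```python
import pandas as pd

# Made-up numbers, only to show what the function receives
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0: the function is called once per column
col_max = toy.apply(lambda column: column.max(), axis=0)

# axis=1: the function is called once per row, so it can combine columns
row_sum = toy.apply(lambda row: row['a'] + row['b'], axis=1)

print(col_max)
print(row_sum)
```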

In [22]:
animals.head()

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome
0,A817210,Serenity,2020-05-18T09:31:00.000,2020-05-18T09:31:00.000,2008-05-10T00:00:00.000,Euthanasia,Suffering,Cat,Neutered Male,12 years,Domestic Shorthair,White,male,4380.0
1,A817506,unnamed,2020-05-17T16:12:00.000,2020-05-17T16:12:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Cat,Spayed Female,2 years,Siamese,White/Gray,female,730.0
2,A817505,unnamed,2020-05-17T15:36:00.000,2020-05-17T15:36:00.000,2018-05-17T00:00:00.000,Euthanasia,Suffering,Bird,Unknown,,Chicken,White/Brown,unknown,
3,A817063,*Violeta,2020-05-17T15:30:00.000,2020-05-17T15:30:00.000,2019-05-07T00:00:00.000,Adoption,no_type_or_subtype,Dog,Spayed Female,1 year,German Shepherd Mix,Brown/Black,female,365.0
4,A792391,Max,2020-05-17T15:29:00.000,2020-05-17T15:29:00.000,2016-05-15T00:00:00.000,Return to Owner,no_type_or_subtype,Dog,Intact Male,4 years,American Bulldog Mix,White/Brown Brindle,male,1460.0


In [24]:
dog_days_std = animals.groupby('animal_type').std().loc['Dog']
dog_days_mean = animals.groupby('animal_type').mean().loc['Dog']
dog_days_mean


KeyError: 'animal_type'

In [None]:
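# days_upon_outcome contains 'NULL' strings, so coerce it to numeric before aggregating
days = pd.to_numeric(animals['days_upon_outcome'], errors='coerce')
dog_days_mean = days[animals['animal_type'] == 'Dog'].mean()
dog_days_std = days[animals['animal_type'] == 'Dog'].std()
(days > dog_days_mean + 2 * dog_days_std).value_counts()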

In [27]:
# let's make a boolean column that flags old dogs that were adopted

def old_dogs_adopted(row):

    '''
    Parameter: Row from the Austin Animal Shelter
    Returns: Boolean signifying records of old dogs that were adopted
    '''
    
    if (row['animal_type'] == 'Dog')\
            and (row['outcome_type'] =='Adoption')\
            and (row['days_upon_outcome'] > 2000):  # fixed cutoff; an earlier version used int(dog_days_mean + dog_days_std*2)
        return True

    else:
        return False

animals['old_adopted_dogs'] = animals.apply(old_dogs_adopted, axis = 1)  # axis = 1 passes each row to the function

In [29]:
animals[animals.old_adopted_dogs == True]

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome,old_adopted_dogs
30,A815992,*Sadie,2020-05-16T13:18:00.000,2020-05-16T13:18:00.000,2010-04-02T00:00:00.000,Adoption,Foster,Dog,Spayed Female,10 years,Labrador Retriever/German Shepherd,Tan,female,3650,True
38,A813497,Tucker,2020-05-16T10:36:00.000,2020-05-16T10:36:00.000,2011-02-11T00:00:00.000,Adoption,Foster,Dog,Intact Male,9 years,English Coonhound,Black,male,3285,True
45,A723494,Banksy,2020-05-15T12:41:00.000,2020-05-15T12:41:00.000,2014-04-02T00:00:00.000,Adoption,Foster,Dog,Spayed Female,6 years,Labrador Retriever Mix,Black/White,female,2190,True
78,A781510,Diesel,2020-05-13T12:13:00.000,2020-05-13T12:13:00.000,2013-09-30T00:00:00.000,Adoption,Foster,Dog,Neutered Male,6 years,Boxer Mix,Brown/White,male,2190,True
199,A754762,Ivy,2020-05-05T09:19:00.000,2020-05-05T09:19:00.000,2013-07-24T00:00:00.000,Adoption,Foster,Dog,Spayed Female,6 years,German Shepherd,Black/Brown,female,2190,True
200,A815198,*Donut,2020-05-05T09:02:00.000,2020-05-05T09:02:00.000,2010-04-06T00:00:00.000,Adoption,Foster,Dog,Spayed Female,10 years,Rat Terrier Mix,Tricolor,female,3650,True
226,A814790,*Lebron,2020-05-02T11:51:00.000,2020-05-02T11:51:00.000,2013-03-06T00:00:00.000,Adoption,Foster,Dog,Neutered Male,7 years,Boxer Mix,Black/White,male,2555,True
227,A813842,Pixi,2020-05-02T10:56:00.000,2020-05-02T10:56:00.000,2011-02-19T00:00:00.000,Adoption,Foster,Dog,Spayed Female,9 years,Pembroke Welsh Corgi Mix,Tan,female,3285,True
242,A815300,Budda,2020-05-01T07:59:00.000,2020-05-01T07:59:00.000,2014-03-13T00:00:00.000,Adoption,Foster,Dog,Neutered Male,6 years,American Bulldog Mix,Tan/White,male,2190,True
272,A813383,Gibson,2020-04-29T10:29:00.000,2020-04-29T10:29:00.000,2012-11-09T00:00:00.000,Adoption,Foster,Dog,Neutered Male,7 years,Catahoula Mix,Tan/White,male,2555,True
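
For simple conditions like this one, the same flag can usually be built without `.apply()` by combining boolean Series with `&`. The sketch below uses a made-up DataFrame with numeric ages and checks that both approaches agree. It is an alternative for comparison, not the method used in the cells above.

```python
import pandas as pd

# Made-up records, only for illustration
toy = pd.DataFrame({
    'animal_type': ['Dog', 'Dog', 'Cat', 'Dog'],
    'outcome_type': ['Adoption', 'Transfer', 'Adoption', 'Adoption'],
    'days_upon_outcome': [3650, 2555, 2920, 365],
})

# Row-wise version, in the spirit of old_dogs_adopted
row_wise = toy.apply(
    lambda row: (row['animal_type'] == 'Dog')
    and (row['outcome_type'] == 'Adoption')
    and (row['days_upon_outcome'] > 2000),
    axis=1,
)

# Vectorized version: each comparison yields a boolean Series
vectorized = (
    (toy['animal_type'] == 'Dog')
    & (toy['outcome_type'] == 'Adoption')
    & (toy['days_upon_outcome'] > 2000)
)

print(vectorized)
```

The vectorized form tends to be faster on large DataFrames, while `.apply()` with `axis=1` is more flexible when the logic does not reduce to column-wise comparisons.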


## Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".

In [33]:
# Here is an example of a lambda function that splits off the number from the animal_id.
animals.animal_id.apply(lambda x: int(x.split('A')[1]))

0      817210
1      817506
2      817505
3      817063
4      792391
        ...  
995    815316
996    815171
997    814785
998    810250
999    801585
Name: animal_id, Length: 1000, dtype: int64
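
To see that a lambda is just a function without a name, the sketch below writes the same logic both ways. The names `id_number` and `id_number_lambda` are hypothetical, and binding a lambda to a name is done here only for comparison.

```python
def id_number(animal_id):
    """Named version: strip the leading 'A' and convert to int."""
    return int(animal_id.split('A')[1])


# Anonymous version of the same logic, bound to a name only for comparison
id_number_lambda = lambda animal_id: int(animal_id.split('A')[1])

ids = ['A817210', 'A792391']

named = [id_number(animal_id) for animal_id in ids]
anonymous = list(map(id_number_lambda, ids))

print(named)
print(anonymous)
```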

### Student Screen Share (without answer directly below)
Use a lambda function to convert days upon outcome to weeks upon outcome <br>


In [34]:
np.random.seed(42)
student_list = ['Amanda', 'Chum', 'Dann', 'Jacob', 'Jason', 'Johnhoy',  'Matt', 
'Maximilian', 'Adam', 'Ethan', 'Karim', 'Leana', 'Luluva']
np.random.choice(student_list)

'Matt'

In [45]:
# your code here
animals_no_null = animals[animals.days_upon_outcome != 'NULL']
animals_no_null['days_upon_outcome'].apply(lambda x: int(x)/7)

0      625.714286
1      104.285714
3       52.142857
4      208.571429
5      208.571429
          ...    
995      8.571429
996      4.285714
997    104.285714
998     52.142857
999     52.142857
Name: days_upon_outcome, Length: 977, dtype: float64

## Methods for Re-Organizing DataFrames: .groupby()

Those of you familiar with SQL have probably used the GROUP BY command. (And if you haven't, you'll see it very soon!) Pandas has this, too.

The .groupby() method is especially useful for aggregate functions applied to the data grouped in particular ways.
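
As a minimal sketch of the idea, assuming a made-up DataFrame with a numeric age column: the rows are split by `animal_type`, the mean is computed within each group, and the results are combined into one Series indexed by group name.

```python
import pandas as pd

# Made-up data, only for illustration
toy = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Cat', 'Bird'],
    'days_upon_outcome': [365, 730, 1095, 30, 365],
})

# Split by animal_type, take the mean of each group, combine the results
mean_days = toy.groupby('animal_type')['days_upon_outcome'].mean()

print(mean_days)
```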

In [46]:
animals.groupby('animal_type')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11fe4beb8>

Notice that the result is a [DataFrameGroupBy](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) object rather than a DataFrame.

We can also group by multiple columns, which again returns a DataFrameGroupBy object:

In [47]:
animals.groupby(['animal_type', 'outcome_type'])

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11fe4bd30>
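
One way to get a feel for what a groupby object holds is to iterate over it: each step yields a group name and the sub-DataFrame for that group. The sketch below does this on a made-up DataFrame; the `sizes` dictionary is only there for illustration.

```python
import pandas as pd

# Made-up data, only for illustration
toy = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Cat', 'Bird'],
    'name': ['Max', 'Milo', 'Sadie', 'Kate', 'Midori'],
})

grouped = toy.groupby('animal_type')

# Iterating yields (group name, sub-DataFrame) pairs
sizes = {}
for group_name, group_frame in grouped:
    sizes[group_name] = len(group_frame)

print(grouped.ngroups)
print(sizes)
```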

#### .groups and .get_group()

In [48]:
# This returns a dictionary keyed by group name (e.g. 'Bird'), with the row indices of each group as values
animals.groupby('animal_type').groups

{'Bird': Int64Index([2, 66, 407, 721, 722], dtype='int64'),
 'Cat': Int64Index([  0,   1,   6,   8,  11,  13,  15,  18,  19,  20,
             ...
             917, 918, 945, 946, 981, 982, 984, 985, 991, 992],
            dtype='int64', length=272),
 'Dog': Int64Index([  3,   4,   5,   7,   9,  10,  12,  17,  24,  25,
             ...
             983, 986, 987, 988, 989, 990, 993, 994, 995, 996],
            dtype='int64', length=594),
 'Other': Int64Index([ 14,  16,  39,  69,  79, 123, 137, 152, 153, 183,
             ...
             955, 961, 975, 976, 977, 978, 979, 997, 998, 999],
            dtype='int64', length=129)}

Once we know what type of object we are working with, a suite of attributes and methods opens up. One attribute we just looked at is `.groups`; a related method is `.get_group()`, which returns the rows that belong to a single group.

In [51]:
animals.groupby('animal_type').get_group('Bird').shape

(5, 15)
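
As a quick check of what `.get_group()` selects, the sketch below compares it with an ordinary boolean filter on a made-up DataFrame. Both pick out the same rows.

```python
import pandas as pd

# Made-up data, only for illustration
toy = pd.DataFrame({
    'animal_type': ['Dog', 'Bird', 'Dog', 'Cat', 'Bird'],
    'name': ['Max', 'Midori', 'Sadie', 'Kate', 'unnamed'],
})

bird_group = toy.groupby('animal_type').get_group('Bird')
bird_filter = toy[toy['animal_type'] == 'Bird']

# Both selections keep the original row labels of the 'Bird' rows
print(bird_group.index.tolist())
print(bird_filter.index.tolist())
```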

In [None]:
# Once we know the group indices, we can return the groups using those indices.

#### Groupby Methods and Aggregating

In [52]:
# Same goes for multi index groupbys
animal_outcome = animals.groupby(['animal_type', 'outcome_type'])
animal_outcome.groups


{('Bird', 'Adoption'): Int64Index([407, 721, 722], dtype='int64'),
 ('Bird', 'Euthanasia'): Int64Index([2, 66], dtype='int64'),
 ('Cat',
  'Adoption'): Int64Index([ 48,  57,  58,  65,  86, 104, 107, 109, 133, 139, 182, 185, 195,
             202, 203, 205, 208, 236, 238, 239, 240, 241, 243, 252, 253, 254,
             257, 280, 301, 302, 321, 323, 328, 333, 346, 347, 354, 357, 364,
             365, 369, 383, 389, 416, 417, 418, 419, 420, 421, 431, 432, 434,
             436, 482, 486, 487, 492, 493, 498, 499, 500, 503, 512, 513, 517,
             539, 550, 561, 572, 589, 613, 615, 619, 620, 624, 634, 638, 640,
             647, 649, 650, 656, 659, 684, 685, 686, 694, 702, 703, 704, 705,
             719, 727, 752, 753, 757, 790, 842, 913, 992],
            dtype='int64'),
 ('Cat', 'Died'): Int64Index([11, 53, 197, 264, 551, 714], dtype='int64'),
 ('Cat', 'Disposal'): Int64Index([63, 814], dtype='int64'),
 ('Cat',
  'Euthanasia'): Int64Index([  0,   1,  26,  27,  73,  96, 136, 171, 210

In [53]:
# animal_outcome.groups is a dictionary, so we can access the group names using keys()
animal_outcome.groups.keys()

dict_keys([('Bird', 'Adoption'), ('Bird', 'Euthanasia'), ('Cat', 'Adoption'), ('Cat', 'Died'), ('Cat', 'Disposal'), ('Cat', 'Euthanasia'), ('Cat', 'Return to Owner'), ('Cat', 'Rto-Adopt'), ('Cat', 'Transfer'), ('Cat', 'no_type_or_subtype'), ('Dog', 'Adoption'), ('Dog', 'Died'), ('Dog', 'Euthanasia'), ('Dog', 'Return to Owner'), ('Dog', 'Rto-Adopt'), ('Dog', 'Transfer'), ('Other', 'Adoption'), ('Other', 'Died'), ('Other', 'Disposal'), ('Other', 'Euthanasia'), ('Other', 'Transfer'), ('Other', 'no_type_or_subtype')])

In [54]:
# We can then get a specific group, such as Cats that were adopted
animal_outcome.get_group(('Cat', 'Adoption'))

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome,old_adopted_dogs
48,A813092,*Charmin,2020-05-15T10:52:00.000,2020-05-15T10:52:00.000,2019-12-11T00:00:00.000,Adoption,Foster,Cat,Spayed Female,5 months,Domestic Shorthair,Lynx Point,female,150,False
57,A812578,*Frida,2020-05-14T10:37:00.000,2020-05-14T10:37:00.000,2019-10-24T00:00:00.000,Adoption,Foster,Cat,Spayed Female,6 months,Domestic Shorthair,Brown Tabby,female,180,False
58,A812579,*Diego,2020-05-14T10:37:00.000,2020-05-14T10:37:00.000,2019-10-24T00:00:00.000,Adoption,Foster,Cat,Neutered Male,6 months,Domestic Shorthair,Blue/White,male,180,False
65,A801639,*Jambalaya,2020-05-14T08:55:00.000,2020-05-14T08:55:00.000,2019-06-07T00:00:00.000,Adoption,Foster,Cat,Neutered Male,11 months,Domestic Shorthair,Orange Tabby,male,330,False
86,A776428,*Ms. Kitty,2020-05-12T15:49:00.000,2020-05-12T15:49:00.000,2018-06-13T00:00:00.000,Adoption,Foster,Cat,Spayed Female,1 year,Domestic Medium Hair,Tortie,female,365,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
757,A814762,Paisley,2020-03-25T08:08:00.000,2020-03-25T08:08:00.000,2012-03-12T00:00:00.000,Adoption,Foster,Cat,Spayed Female,8 years,Domestic Shorthair,Calico,female,2920,False
790,A807933,*Benny,2020-03-24T14:19:00.000,2020-03-24T14:19:00.000,2019-06-01T00:00:00.000,Adoption,Foster,Cat,Neutered Male,9 months,Domestic Shorthair,Brown Tabby,male,270,False
842,A796479,Kate,2020-03-21T09:08:00.000,2020-03-21T09:08:00.000,2017-06-02T00:00:00.000,Adoption,Foster,Cat,Spayed Female,2 years,Domestic Longhair,Blue Cream,female,730,False
913,A801170,*Clarence,2020-03-18T07:33:00.000,2020-03-18T07:33:00.000,2011-08-01T00:00:00.000,Adoption,Foster,Cat,Neutered Male,8 years,Domestic Shorthair,Black/White,male,2920,False


In [55]:
# Other methods
animal_outcome.first()

Unnamed: 0_level_0,Unnamed: 1_level_0,animal_id,name,datetime,monthyear,date_of_birth,outcome_subtype,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome,old_adopted_dogs
animal_type,outcome_type,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Bird,Adoption,A816196,unnamed,2020-04-16T11:37:00.000,2020-04-16T11:37:00.000,2019-04-09T00:00:00.000,Foster,Unknown,1 year,Parakeet,Green/Yellow,unknown,365.0,False
Bird,Euthanasia,A817505,unnamed,2020-05-17T15:36:00.000,2020-05-17T15:36:00.000,2018-05-17T00:00:00.000,Suffering,Unknown,,Chicken,White/Brown,unknown,,False
Cat,Adoption,A813092,*Charmin,2020-05-15T10:52:00.000,2020-05-15T10:52:00.000,2019-12-11T00:00:00.000,Foster,Spayed Female,5 months,Domestic Shorthair,Lynx Point,female,150.0,False
Cat,Died,A817198,*Moon,2020-05-17T13:04:00.000,2020-05-17T13:04:00.000,2019-05-09T00:00:00.000,In Foster,Intact Female,1 year,Siamese,Blue Point,female,365.0,False
Cat,Disposal,A817348,Bat Girl,2020-05-14T10:08:00.000,2020-05-14T10:08:00.000,2017-05-14T00:00:00.000,no_type_or_subtype,Intact Female,3 years,Domestic Shorthair,Brown/Black,female,1095.0,False
Cat,Euthanasia,A817210,Serenity,2020-05-18T09:31:00.000,2020-05-18T09:31:00.000,2008-05-10T00:00:00.000,Suffering,Neutered Male,12 years,Domestic Shorthair,White,male,4380.0,False
Cat,Return to Owner,A817336,unnamed,2020-05-17T11:20:00.000,2020-05-17T11:20:00.000,2015-05-13T00:00:00.000,no_type_or_subtype,Spayed Female,5 years,Domestic Shorthair,Gray Tabby,female,1825.0,False
Cat,Rto-Adopt,A816636,Sugar,2020-05-17T12:03:00.000,2020-05-17T12:03:00.000,2019-05-14T00:00:00.000,no_type_or_subtype,Neutered Male,1 year,Domestic Shorthair,Brown Tabby,male,365.0,False
Cat,Transfer,A817482,unnamed,2020-05-17T14:49:00.000,2020-05-17T14:49:00.000,2020-05-05T00:00:00.000,Partner,Intact Female,1 weeks,Domestic Shorthair,Calico,female,7.0,False
Cat,no_type_or_subtype,A816753,*Marx,2020-04-26T16:23:00.000,2020-04-26T16:23:00.000,2020-03-26T00:00:00.000,no_type_or_subtype,Intact Male,,Domestic Medium Hair,Black/White,male,,False


In [56]:
animal_outcome.last()

Unnamed: 0_level_0,Unnamed: 1_level_0,animal_id,name,datetime,monthyear,date_of_birth,outcome_subtype,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome,old_adopted_dogs
animal_type,outcome_type,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Bird,Adoption,A815400,Midori,2020-03-27T06:57:00.000,2020-03-27T06:57:00.000,2019-03-15T00:00:00.000,Foster,Unknown,1 year,Parakeet,Green/Yellow,unknown,365.0,False
Bird,Euthanasia,A817345,unnamed,2020-05-13T17:37:00.000,2020-05-13T17:37:00.000,2019-05-13T00:00:00.000,Suffering,Unknown,1 year,Dove,Gray/White,unknown,365.0,False
Cat,Adoption,A815356,Butters,2020-03-15T17:37:00.000,2020-03-15T17:37:00.000,2019-09-14T00:00:00.000,no_type_or_subtype,Neutered Male,6 months,Siamese,Flame Point,male,180.0,False
Cat,Died,A815765,unnamed,2020-03-27T09:52:00.000,2020-03-27T09:52:00.000,2018-03-25T00:00:00.000,In Kennel,Intact Male,2 years,Domestic Shorthair,Brown Tabby,male,730.0,False
Cat,Disposal,A815716,unnamed,2020-03-24T09:21:00.000,2020-03-24T09:21:00.000,2017-03-23T00:00:00.000,no_type_or_subtype,Intact Male,3 years,Domestic Longhair,Gray Tabby,male,1095.0,False
Cat,Euthanasia,A815338,Beast,2020-03-18T15:11:00.000,2020-03-18T15:11:00.000,2017-03-14T00:00:00.000,Suffering,Neutered Male,3 years,Domestic Shorthair,Orange Tabby,male,1095.0,False
Cat,Return to Owner,A815409,Milo,2020-03-15T17:44:00.000,2020-03-15T17:44:00.000,2010-03-15T00:00:00.000,no_type_or_subtype,Neutered Male,10 years,Domestic Shorthair,Orange Tabby,male,3650.0,False
Cat,Rto-Adopt,A815029,Elena,2020-03-23T11:37:00.000,2020-03-23T11:37:00.000,2017-03-22T00:00:00.000,no_type_or_subtype,Spayed Female,3 years,Domestic Shorthair,Calico,female,1095.0,False
Cat,Transfer,A815438,unnamed,2020-03-15T18:28:00.000,2020-03-15T18:28:00.000,2020-03-12T00:00:00.000,Partner,Intact Female,3 days,Domestic Shorthair,Brown Tabby,female,3.0,False
Cat,no_type_or_subtype,A816753,*Marx,2020-04-26T16:23:00.000,2020-04-26T16:23:00.000,2020-03-26T00:00:00.000,no_type_or_subtype,Intact Male,,Domestic Medium Hair,Black/White,male,,False


In [59]:
animals.groupby('animal_type').mean(numeric_only=True)  # not very meaningful: days_upon_outcome holds 'NULL' strings, so only the boolean column is averaged

Unnamed: 0_level_0,old_adopted_dogs
animal_type,Unnamed: 1_level_1
Bird,0.0
Cat,0.0
Dog,0.065657
Other,0.0


As we will see in SQL, groupby objects are intended to be used with aggregation. In SQL, queries that include GROUP BY require an aggregation to be performed on the selected columns.

We can use sum, mean, count, max, min... Find a list of common aggregations [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)
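
Several aggregations can be computed in one call by passing a list of names to `.agg()`. The sketch below assumes a made-up DataFrame with a numeric age column. In the real data, `days_upon_outcome` would first need its `'NULL'` values handled.

```python
import pandas as pd

# Made-up data, only for illustration
toy = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Cat', 'Bird'],
    'days_upon_outcome': [365, 730, 1095, 30, 365],
})

# One row per group, one column per aggregation
summary = toy.groupby('animal_type')['days_upon_outcome'].agg(['count', 'mean', 'max'])

print(summary)
```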

In [60]:
# Group by multiple columns, then describe a single group
animals.groupby(['animal_type', 'outcome_type']).get_group(('Cat', 'Transfer')).describe()

Unnamed: 0,animal_id,name,datetime,monthyear,date_of_birth,outcome_type,outcome_subtype,animal_type,sex_upon_outcome,age_upon_outcome,breed,color,sex,days_upon_outcome,old_adopted_dogs
count,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110
unique,110,26,68,68,63,1,3,1,5,23,7,30,3,22,1
top,A815085,unnamed,2020-04-25T17:17:00.000,2020-04-25T17:17:00.000,2020-04-10T00:00:00.000,Transfer,Partner,Cat,Intact Female,2 years,Domestic Shorthair,Brown Tabby,female,730,False
freq,1,85,5,5,6,110,99,110,40,21,82,14,51,21,110
