# More Pandas

![more_pandas](https://media.giphy.com/media/H0Qi5W2KzU5UI/giphy.gif)

### Introduction
You have decided to start your own animal shelter, but first you want to understand what that will entail and gather information for planning. In this lecture, we'll look at a real data set collected by the Austin Animal Center over several years. We'll use our pandas skills from the last lecture, and learn some new ones, to explore this data further.

#### Our goals in this notebook are to be able to:

- Apply and use `.map()` and `.applymap()` from the Pandas library
- Explain lambda functions and use them on a DataFrame
- Explain what a groupby object is and split a DataFrame using `.groupby()`


#### Getting started

Let's take a moment to download and examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data). What kinds of questions can we ask of this data, and what kinds of information can we get back?

Let's take a look at the data:

In [None]:
import numpy as np
import pandas as pd
import requests

url = 'https://data.austintexas.gov/resource/9t4d-g238.json'
response = requests.get(url)
animals = pd.DataFrame(response.json())

In [None]:
animals.isnull()

In [None]:
animals.isnull().sum()

In [None]:
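# animals.fillna('null')  # would replace missing values with the string 'null'

In [None]:
# fillna returns a new DataFrame; `animals` itself is not modified
animals.fillna(np.nan)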

### 1. Applying and using map and applymap from the Pandas library

The Pandas library has several useful tools built in. Let's explore some of them.

#### DataFrame.applymap() and Series.map()

The ```.applymap()``` method takes a function as input and applies it to every entry in the DataFrame.

In [None]:
animals.applymap(str).head()
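
As a minimal, self-contained sketch of element-wise application, the toy DataFrame below (hypothetical data, not from the shelter set) converts every cell to a string. Recent pandas releases (2.1 and later) deprecate `DataFrame.applymap()` in favor of `DataFrame.map()`, so the sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"name": ["Rex", "Tom"], "weight_kg": [12.5, 4.0]})

# pandas >= 2.1 offers DataFrame.map(); older versions only have applymap()
elementwise = toy.map if hasattr(toy, "map") else toy.applymap

# str is called once per cell, so every entry becomes a string
as_text = elementwise(str)
print(as_text)
```

The function passed in receives one cell at a time, so it should accept a single value rather than a whole column.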

The **.map()** method takes a function as input that it will then apply to every entry in the Series.

Let's write a function to consolidate `sex_upon_outcome` into male, female, or unknown.

First, explore the unique values:

In [None]:
animals['sex_upon_outcome'].unique()

In [None]:
# we could also use np.unique() with the return_counts parameter

np.unique(animals['sex_upon_outcome'], return_counts=True)


In [None]:
def male_or_female(value_from_series):
    
    """
    Consolidate a sex_upon_outcome label into a single sex category.
    
    Parameter: a value from the sex_upon_outcome series
    in the Austin Animal Shelter dataset.  
    There are five possible values:
    'Spayed Female', 'Unknown', 'Intact Female', 'Intact Male',
    'Neutered Male'
       
    Returns:
    'female', 'male', or 'unknown'
    
    """
    
    if 'female' in value_from_series.lower():
        return 'female'
    
    
    #Add a space before male to ensure that female is not included
    elif ' male' in value_from_series.lower():
        return 'male'
    
    else:
        return 'unknown'
    
    
animals['sex_upon_outcome'].map(male_or_female)   
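
The order of the checks matters because the string 'female' contains 'male'. The sketch below illustrates this on a hypothetical toy Series. `simplify_sex` is an illustrative name, and the `isinstance` guard for missing values is an assumption added here, since calling `.lower()` on a missing value such as `NaN` would raise an error.

```python
import numpy as np
import pandas as pd

def simplify_sex(value):
    """Collapse a sex_upon_outcome label to 'female', 'male', or 'unknown'."""
    # Missing entries (NaN/None) are not strings, so treat them as unknown
    if not isinstance(value, str):
        return "unknown"
    label = value.lower()
    # Check 'female' first: 'male' is a substring of 'female'
    if "female" in label:
        return "female"
    if "male" in label:
        return "male"
    return "unknown"

# Hypothetical toy values, including one missing entry
toy = pd.Series(["Spayed Female", "Neutered Male", "Intact Male", "Unknown", np.nan])
result = toy.map(simplify_sex)
print(result.tolist())
```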

In [None]:
#We can now make a new column, sex:
    
animals['sex'] = animals['sex_upon_outcome'].map(male_or_female)
animals.head()

Now let's have some fun.  Let's convert age upon outcome to days, using map():

In [None]:
# First, check out what happens when we split on a space

list(animals['age_upon_outcome'].str.split(' '))

In [None]:
# check what values we have for time frame
unit_values = [age[0] if age[0] == 'NULL' else age[1] for age in animals['age_upon_outcome'].str.split(' ')]
set(unit_values)

Now, fill in the definition below to convert the ages to days.

In [None]:

def age_to_days(age):
    
    '''
    params: age upon outcome of shelter animal. 
    A number followed by a unit of time 
    'NULL', 'days', 'month', 'months', 'week', 'weeks', 'year', 'years'
    
    returns: days old at outcome
    '''
    
    age_split = age.split(' ')
    
    if len(age_split) == 1:
        return 'NULL'
    
    elif ... :
        return
    
    elif ... :
         pass
    
    elif ... :
         pass
    
    else:
         pass
    
    
animals['age_upon_outcome'].map(age_to_days)


In [None]:
#__SOLUTION__
def age_to_days(age):
    
    '''
    params: age upon outcome of shelter animal. 
    A number followed by a unit of time 
    'NULL', 'days', 'month', 'months', 'week', 'weeks', 'year', 'years'
    
    returns: days old at outcome
    '''
    
    age_split = age.split(' ')
    
    if len(age_split) == 1:
        return np.nan
    
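    elif 'day' in age_split[1]:
        # matches both 'day' and 'days'
        return int(age_split[0])
    
    elif 'month' in age_split[1]:
        # approximate a month as 30 days
        return 30 * int(age_split[0])
    
    elif 'week' in age_split[1]:
        # the unit is the second element of the split
        return 7 * int(age_split[0])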
    
    else:
        return 365 * int(age_split[0])

animals['days_at_outcome'] = animals['age_upon_outcome'].map(age_to_days)

animals.head()
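
One possible alternative to the chain of `elif` branches is a lookup table from unit to days. The sketch below is hypothetical (the names `DAYS_PER_UNIT` and `age_to_days_lookup` are not part of the lesson) and assumes the same approximations as above: 30 days per month and 365 days per year.

```python
import numpy as np
import pandas as pd

# Approximate number of days per unit (same approximations as the lesson)
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

def age_to_days_lookup(age):
    """Convert strings such as '2 years' or '3 weeks' to a number of days."""
    if not isinstance(age, str):
        return np.nan              # missing value
    parts = age.split(" ")
    if len(parts) != 2:
        return np.nan              # e.g. 'NULL'
    count, unit = parts
    unit = unit.rstrip("s")        # 'weeks' -> 'week', 'days' -> 'day'
    if unit not in DAYS_PER_UNIT:
        return np.nan              # unrecognized unit
    return int(count) * DAYS_PER_UNIT[unit]

# Hypothetical toy values covering each unit
ages = pd.Series(["2 years", "3 weeks", "1 month", "5 days", "NULL"])
days = ages.map(age_to_days_lookup)
print(days.tolist())
```

Stripping the trailing "s" lets singular and plural units share one table entry, which avoids a separate branch for each spelling.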

#### Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".

Let's use an anonymous function to convert days to weeks.

In [None]:
animals['days_at_outcome'].map(lambda x: x/7)
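
To make the idea concrete, the following sketch (with hypothetical toy values) shows that a lambda and an equivalent named function passed to `.map()` produce the same result. The name `days_to_weeks` is illustrative only.

```python
import pandas as pd

# Hypothetical toy ages in days
days = pd.Series([7, 14, 365])

def days_to_weeks(x):
    """Named equivalent of `lambda x: x / 7`."""
    return x / 7

named = days.map(days_to_weeks)
anonymous = days.map(lambda x: x / 7)

# Both approaches apply the same computation to each entry
print(named.equals(anonymous))
```

A lambda is convenient for a one-off expression; a named function is usually easier to read, document, and reuse once the logic grows beyond a single line.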

#### Your turn
Use another lambda function to convert days to dog years. <br>
Use the common rule of thumb that each calendar year of age counts as 7 dog years.
Use round() to round to the 

In [None]:
# your code here

In [None]:
#__SOLUTION__

animals['days_at_outcome'].map(lambda x: np.floor(x/365) * 7)

How old is the oldest dog?

In [None]:
#__SOLUTION__

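# restrict to dogs before converting days to dog years
dogs = animals[animals['animal_type'] == 'Dog']
dogs['days_at_outcome'].map(lambda x: np.floor(x/365) * 7).max()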

### 2. Methods for Re-Organizing DataFrames: .groupby()

Those of you familiar with SQL have probably used the GROUP BY command. (And if you haven't, you'll see it very soon!) Pandas has this, too.

The `.groupby()` method is especially useful for applying aggregate functions to subsets of the data, such as computing a statistic per animal type.

In [None]:
animals.groupby('animal_type')
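
Calling `.groupby()` by itself does not aggregate anything yet; it returns a `DataFrameGroupBy` object that records how the rows are split. A minimal sketch with hypothetical toy data shows that iterating over this object yields one (group name, sub-DataFrame) pair per group:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "animal_type": ["Dog", "Cat", "Dog", "Bird"],
    "days_at_outcome": [730, 60, 365, 30],
})

grouped = toy.groupby("animal_type")

# Each iteration gives the group's name and the rows that belong to it
group_sizes = {}
for name, frame in grouped:
    group_sizes[name] = len(frame)

print(group_sizes)
```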

#### .groups and .get_group()

In [None]:
# This returns a dictionary-like mapping from each group name (e.g. 'Bird') to the row indices that belong to that group
animals.groupby('animal_type').groups

In [None]:
animals.groupby('animal_type').get_group('Dog')

#### Aggregating

In [None]:
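# numeric_only=True skips non-numeric columns, which would otherwise raise an error in pandas 2.0+
animals.groupby('animal_type').mean(numeric_only=True)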
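
As a sketch of aggregation on hypothetical toy data: `numeric_only=True` restricts the mean to numeric columns, and selecting a single column before aggregating returns a Series indexed by group name.

```python
import pandas as pd

# Hypothetical toy data with one text column and one numeric column
toy = pd.DataFrame({
    "animal_type": ["Dog", "Cat", "Dog", "Cat"],
    "name": ["Rex", "Tom", "Fido", "Kit"],
    "days_at_outcome": [730, 60, 370, 40],
})

# Mean of every numeric column, one row per animal type
means = toy.groupby("animal_type").mean(numeric_only=True)

# Aggregate a single column: the result is a Series indexed by animal type
max_days = toy.groupby("animal_type")["days_at_outcome"].max()

print(means)
print(max_days)
```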

#### Datetime Objects

'Datetime' is a special data type for dates. We can convert an appropriately formatted column to the datetime type simply by calling `pd.to_datetime()`.

In [None]:
animals['date_of_birth'] = pd.to_datetime(animals['date_of_birth'])

One great thing about converting to datetime is that it lets you easily access each part of the date.

In [None]:
# Try replacing .day with .month or .year and see what happens
[date.day for date in animals['date_of_birth']]
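
A vectorized alternative to the list comprehension is the `.dt` accessor, which exposes the same date parts for a whole Series at once. A small sketch with hypothetical dates:

```python
import pandas as pd

# Hypothetical dates of birth, parsed from ISO-formatted strings
births = pd.to_datetime(pd.Series(["2017-05-02", "2014-11-30"]))

# .dt gives element-wise access to the parts of each date
years = births.dt.year.tolist()
months = births.dt.month.tolist()
days_of_month = births.dt.day.tolist()

print(years, months, days_of_month)
```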

**Exercise: Find the latest date of birth per animal type.**

In [None]:
# First make sure date_of_birth is a datetime type.
# Then group by animal_type and calculate the max.




In [None]:
#__SOLUTION__
animals.groupby('animal_type')['date_of_birth'].max()