# More Pandas

![more_pandas](https://media.giphy.com/media/H0Qi5W2KzU5UI/giphy.gif)

### Scenario
You have decided that you want to start your own animal shelter, but first you want an idea of what that will entail and more information for planning. In this lecture, we'll look at a real data set collected by the Austin Animal Center. The code below requests animal outcomes from the center's open-data API, which returns 1,000 records per request by default. We will use our pandas skills from the last lecture, and learn some new ones, to explore these data further.




#### Our goals in this notebook are to be able to: <br/>

- Apply `.map()`, `.apply()`, and `.applymap()` from the Pandas library
- Introduce lambda functions and use them together with the methods above
- Explain what a groupby object is and split a DataFrame using `.groupby()`


#### Getting started

Let's take a moment to download and to examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data). 

Let's take a look at the data:

In [None]:
import numpy as np
import pandas as pd
import requests

%load_ext autoreload
%autoreload 2

from src.student_caller import one_random_student, three_random_students
from src.student_list import student_list

In [None]:
url = 'https://data.austintexas.gov/resource/9t4d-g238.json'
response = requests.get(url)
animals = pd.DataFrame(response.json())
animals.head()

In [None]:
animals.info()

One way to become familiar with your data is to start asking questions. In your EDA notebooks, **markdown** will be especially helpful in tracking these questions and your methods of answering the questions.  

For example, a simple first question we might ask, after being presented with the above dataset, would be:

## What is the most commonly adopted animal type in the dataset?

We can then begin thinking about what parts of the DataFrame we need to answer the question.

    What features do we need?
     - 
    What type of logic and calculation do we perform?
     -  
    What type of visualization would help us answer the question?
     -

In [None]:
# Your code here
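One possible approach, sketched on a small made-up DataFrame that uses the same column names as the API response (`animal_type`, `outcome_type`). The rows are invented for illustration; swap in `animals` to run it on the real data:

```python
import pandas as pd

# Hypothetical rows; the real data comes from the Austin Animal Center API
sample = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Other', 'Cat', 'Dog', 'Dog'],
    'outcome_type': ['Adoption', 'Adoption', 'Transfer', 'Adoption',
                     'Adoption', 'Adoption', 'Adoption'],
})

# Keep only adoptions, then count how often each animal type appears
adopted = sample[sample['outcome_type'] == 'Adoption']
adoption_counts = adopted['animal_type'].value_counts()

print(adoption_counts)
print('Most commonly adopted:', adoption_counts.idxmax())

# A bar chart is one natural visualization (requires matplotlib):
# adoption_counts.plot(kind='bar')
```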

Questions lead to other questions. For the above example, the visualization raises a new question: which Other animals are being adopted?

To find out, we need to know which column encodes the specific type of animal in the Other category.
    
    What features do we need to find the most commonly adopted type of animal within the Other category?
        - 

In [None]:
# Your code here
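In the Austin data, the species of an animal in the Other category appears to be recorded in the `breed` column; treat that as an assumption and confirm it with `animals.loc[animals['animal_type'] == 'Other', 'breed'].unique()`. Under that assumption, a sketch on invented rows:

```python
import pandas as pd

# Hypothetical rows; assumes the species of 'Other' animals lives in 'breed'
sample = pd.DataFrame({
    'animal_type': ['Other', 'Other', 'Dog', 'Other', 'Other'],
    'breed': ['Rabbit Sh', 'Bat Mix', 'Beagle', 'Rabbit Sh', 'Guinea Pig'],
    'outcome_type': ['Adoption', 'Euthanasia', 'Adoption', 'Adoption', 'Adoption'],
})

# Combine two boolean masks with & to require both conditions
is_other = sample['animal_type'] == 'Other'
is_adopted = sample['outcome_type'] == 'Adoption'

other_adoptions = sample.loc[is_other & is_adopted, 'breed'].value_counts()
print(other_adoptions)
```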

![hive mind](https://media.giphy.com/media/l0MYttFGk98Y4e4h2/giphy.gif)


What kinds of questions can we ask these data and what kinds of information can we get back?
Start filling in the [group question doc](https://docs.google.com/document/d/1Oq9cHGbKxKzvO9Ep_JAxrRWrLFlTEn0VpJVqEGXNdUQ/edit) together. You can either add an individual question or contribute to another student's question.

# Quick Exploration

In [None]:
# Use .info() to check for missing values, dtypes, and shape

In [None]:
# Use describe to gain a bit more detail about certain features. 


In [None]:
# Use value counts to check a categorical feature's distribution

In [None]:
# Use .isna().sum() for a more legible count of missing values per column than .info() gives
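The exploration cells above might be filled in along these lines. This sketch runs on a small invented DataFrame (not the shelter data) so the output stays short:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a couple of missing names
sample = pd.DataFrame({
    'name': ['Rex', np.nan, 'Tom', np.nan],
    'animal_type': ['Dog', 'Cat', 'Cat', 'Bird'],
})

sample.info()                                # dtypes, non-null counts, and shape
print(sample.describe())                     # for string columns: count, unique, top, freq
print(sample['animal_type'].value_counts())  # distribution of a categorical feature
missing = sample.isna().sum()                # number of missing values per column
print(missing)
```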

Use `.fillna()` to replace missing names with 'unnamed'.

In [None]:
three_random_students(student_list)

In [None]:
#__SOLUTION__
animals['name'] = animals['name'].fillna('unnamed')

In [None]:
# Fill every remaining missing value, in any column, with a placeholder string
animals.fillna('no_type_or_subtype', inplace=True)

In [None]:
animals.isna().sum()

### 1. Applying and using map and applymap from the Pandas library

The built-in **map()** function takes a function and applies it to every element of an iterable. It returns a lazy map object, which is why the result below is wrapped in `list()`.

In [None]:
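def divisible_by_5(number):
    
    '''
    Parameter: an integer
    Returns: True if the number is divisible by five, otherwise False
    '''
    
    return number % 5 == 0

numbers = [17, 29, 30045, 125]

# map() applies divisible_by_5 to each number; list() collects the results
list(map(divisible_by_5, numbers))  # [False, False, True, True]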


The Pandas library has several similar methods for DataFrames and Series. Let's explore them.

# DataFrame.applymap(), Series.map(), and Series.apply()

## DataFrame.applymap()
The `.applymap()` method takes a function as input and applies it to every entry in the DataFrame.

In [None]:
animals.applymap(type)
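Note that pandas 2.1 deprecated `DataFrame.applymap()` and renamed it `DataFrame.map()`; both apply a function element-wise. A small sketch that picks whichever name the installed version provides (the DataFrame is made up):

```python
import pandas as pd

sample = pd.DataFrame({'name': ['Rex', 'Bella'], 'animal_type': ['Dog', 'Cat']})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = sample.map if hasattr(pd.DataFrame, 'map') else sample.applymap

# len is applied to every single cell, not to whole rows or columns
cell_lengths = elementwise(len)
print(cell_lengths)
```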

## Series.map()

The **.map()** method takes a function as input that it will then apply to every entry in the Series.

Let's use `.map()` to consolidate `sex_upon_outcome` into three classes: male, female, or unknown.

First, explore the unique values:

In [None]:
animals['sex_upon_outcome'].unique()

In [None]:
# we could also use np.unique() with the return_counts parameter

np.unique(animals['sex_upon_outcome'], return_counts=True)


In [None]:
# Your code here
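One way to do the consolidation, assuming the values look like 'Neutered Male', 'Spayed Female', 'Intact Male', 'Intact Female', and 'Unknown' (check this against the `.unique()` output above). The sketch shows two forms `.map()` accepts, a dictionary and a function, on a hypothetical Series:

```python
import pandas as pd

# Hypothetical values; confirm the real ones with .unique()
sex = pd.Series(['Neutered Male', 'Spayed Female', 'Intact Male',
                 'Intact Female', 'Unknown', 'no_type_or_subtype'])

# Option 1: a dictionary; values missing from the dict become NaN, so fill them
sex_map = {
    'Neutered Male': 'male', 'Intact Male': 'male',
    'Spayed Female': 'female', 'Intact Female': 'female',
}
by_dict = sex.map(sex_map).fillna('unknown')

# Option 2: a function; anything not ending in 'Male' or 'Female' is 'unknown'
# (endswith is case-sensitive, so 'Female' does not match 'Male')
def simplify_sex(value):
    if value.endswith('Male'):
        return 'male'
    elif value.endswith('Female'):
        return 'female'
    return 'unknown'

by_function = sex.map(simplify_sex)
print(by_function.value_counts())
```

The dictionary version is explicit about every known label; the function version is more forgiving if new labels appear.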

## Series.apply()

`Series.apply()` is similar to `.map()`, except that it only takes a function as a parameter, whereas `.map()` can take a function, a dictionary, or a Series. `.apply()` is meant for more complex functions.
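As an illustration of that extra flexibility, `.apply()` can forward additional positional arguments to the function through its `args=` tuple, after each element. The helper function and the ages below are hypothetical:

```python
import pandas as pd

ages = pd.Series(['2 years', '3 weeks', '1 month'])

def leading_number_at_least(age, minimum):
    """Hypothetical helper: is the leading number of `age` >= minimum?"""
    return int(age.split(' ')[0]) >= minimum

# Each element is passed first, then the values from args
at_least_two = ages.apply(leading_number_at_least, args=(2,))
print(at_least_two)
```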

Now let's define a custom function that converts all ages upon outcome to days, and create a new column with .apply():

In [None]:
# First, check out what happens when we split on a space

list(animals['age_upon_outcome'].str.split(' '))

# Pair program #1: 
Take 10 minutes to fill in the function below with code that converts age upon outcome to days upon outcome.

In [None]:
# check what values we have for time frame
unit_values = [age[0] if age[0] == 'NULL' 
               else age[1] for age in 
               animals['age_upon_outcome'].str.split(' ')]
set(unit_values)

Now, fill in the definition below to convert the ages to days

In [None]:

def age_to_days(age):
    
    '''
    params: age upon outcome of shelter animal,
    a number followed by a unit of time:
    'NULL', 'days', 'month', 'months', 'week', 'weeks', 'year', 'years'
    
    returns: days old at outcome
    '''
    
    age_split = age.split(' ')
    
    # 'NULL' has no unit, so there is no age to convert
    if len(age_split) == 1:
        return np.nan
    
    # Replace each ... with a condition on the unit in age_split[1]
    elif ... :
        pass
    
    elif ... :
        pass
    
    elif ... :
        pass
    
    else:
        pass
    
    
animals['age_upon_outcome'].apply(age_to_days)
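One possible way to complete the exercise, shown as a sketch rather than an official solution: it swaps the `elif` chain for a lookup dictionary and assumes a month is 30 days and a year is 365 days, which are approximations. The function is given the hypothetical name `age_to_days_sketch` so it does not overwrite your own version:

```python
import numpy as np
import pandas as pd

# Assumed conversion factors; months and years are approximations
DAYS_PER_UNIT = {
    'day': 1, 'days': 1,
    'week': 7, 'weeks': 7,
    'month': 30, 'months': 30,
    'year': 365, 'years': 365,
}

def age_to_days_sketch(age):
    """Convert strings such as '2 years' to an approximate number of days.

    Returns NaN for 'NULL' or any value that is not '<number> <unit>'.
    """
    age_split = age.split(' ')
    if len(age_split) != 2 or age_split[1] not in DAYS_PER_UNIT:
        return np.nan
    number, unit = age_split
    return int(number) * DAYS_PER_UNIT[unit]

ages = pd.Series(['2 years', '3 weeks', '1 month', '4 days', 'NULL'])
days = ages.apply(age_to_days_sketch)
print(days)
```

A dictionary keeps every unit and its factor in one place, so supporting a new unit is a one-line change.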


In [None]:
# Let's look at the average age upon outcome of Adopted animals
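A sketch of one approach, assuming the converted ages have been saved in a new column, here given the hypothetical name `days_upon_outcome` (for example via `animals['days_upon_outcome'] = animals['age_upon_outcome'].apply(age_to_days)`). The rows are invented:

```python
import pandas as pd

sample = pd.DataFrame({
    'outcome_type': ['Adoption', 'Transfer', 'Adoption', 'Adoption'],
    'days_upon_outcome': [730.0, 21.0, 60.0, None],
})

# Boolean indexing selects adopted animals; .mean() skips NaN by default
adopted_mean_days = sample.loc[sample['outcome_type'] == 'Adoption',
                               'days_upon_outcome'].mean()
print(adopted_mean_days)
```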


### Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".
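For example, the named function and the lambda below compute the same thing. Assigning a lambda to a name is done here only for comparison; in practice lambdas are usually passed straight into another call such as `map()`:

```python
# A named function...
def add_one(x):
    return x + 1

# ...and an equivalent anonymous function, bound to a name only for comparison
add_one_lambda = lambda x: x + 1

print(add_one(4), add_one_lambda(4))

# The typical use: define the function inline, right where it is needed
incremented = list(map(lambda x: x + 1, [1, 2, 3]))
print(incremented)
```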

In [None]:
student_list

In [None]:
list(map(lambda x: x + ' is '  + 
                    np.random.choice(['hungry', 'sleepy', 'hangry', 
                                      'super pumped about list comprehensions'],
                                     p=[.325,.325,.325,.025]), 
                 student_list))

# Student Screen Share (without answer directly below)
Use another lambda function to convert days upon outcome to weeks upon outcome.


In [None]:
# Your code here

# Methods for Re-Organizing DataFrames: .groupby()

Those of you familiar with SQL have probably used the GROUP BY command. (And if you haven't, you'll see it very soon!) Pandas has this, too.

The .groupby() method is especially useful for aggregate functions applied to the data grouped in particular ways.

In [None]:
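# Every column returned by the API is a string, so .mean() would fail here;
# .count() gives the number of non-null entries per column in each group
animals.groupby('animal_type').count()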

Without an aggregation such as `.count()`, `animals.groupby('animal_type')` returns a [DataFrameGroupBy](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) object rather than a DataFrame.
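To see the difference between the grouped object and an aggregation, here is a sketch on made-up rows that include one numeric column (`days_upon_outcome`, a hypothetical name). `numeric_only=True` tells `.mean()` to skip string columns, which pandas 2.x would otherwise refuse to average:

```python
import pandas as pd

sample = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Cat'],
    'name': ['Rex', 'Tom', 'Fido', 'Kit'],
    'days_upon_outcome': [700.0, 30.0, 100.0, 90.0],
})

grouped = sample.groupby('animal_type')
print(type(grouped).__name__)  # DataFrameGroupBy: no aggregation has happened yet

# Aggregating turns the groups back into a DataFrame, one row per group
mean_days = grouped.mean(numeric_only=True)
print(mean_days)
```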

#### .groups and .get_group()

In [None]:
# .groups returns a dict mapping each group name (e.g. 'Bird') to the index labels of the rows in that group
animals.groupby('animal_type').groups

Once we know we are working with a DataFrameGroupBy object, a suite of attributes and methods opens up. Besides the `.groups` attribute, the `.get_group()` method returns the rows of a single group as a DataFrame:

In [None]:
animals.groupby('animal_type').get_group('Dog')

We can also group by multiple columns; the result is still a DataFrameGroupBy object, and each group key becomes a tuple.

In [None]:
animals.groupby(['animal_type', 'outcome_type'])

In [None]:
animals.groupby(['animal_type', 'outcome_type']).groups.keys()
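On a small invented DataFrame, the tuple keys and the matching `.get_group()` call look like this:

```python
import pandas as pd

sample = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Cat'],
    'outcome_type': ['Adoption', 'Adoption', 'Transfer', 'Adoption'],
    'name': ['Rex', 'Tom', 'Fido', 'Kit'],
})

grouped = sample.groupby(['animal_type', 'outcome_type'])

# With several grouping columns, each group key is a tuple
print(sorted(grouped.groups.keys()))

# .groups maps each key to the index labels of its rows
cat_adoption_rows = list(grouped.groups[('Cat', 'Adoption')])

# get_group takes the same tuple and returns those rows as a DataFrame
cat_adoptions = grouped.get_group(('Cat', 'Adoption'))
print(cat_adoptions)
```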

#### Aggregating

In [None]:
# Just like with single-column groups, we can aggregate over multiple columns;
# .size() counts the rows in each (animal_type, outcome_type) pair
animals.groupby(['animal_type', 'outcome_type']).size()

In [None]:
# We can then get a specific group, such as Cats that were adopted
animals.groupby(['animal_type', 'outcome_type']).get_group(('Cat', 'Adoption'))

In [None]:
# Other methods: .first() returns the first non-null value of each column within each group
animals.groupby(['animal_type', 'outcome_type']).first()

In [None]:
animals.groupby(['animal_type', 'outcome_type']).last()
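Note that `.first()` and `.last()` work column by column: they return the first (or last) non-null value of each column, so the values in one output row can come from different input rows. A sketch on invented data, with `.head(1)` for comparison, which returns the actual first row of each group:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'animal_type': ['Dog', 'Dog', 'Cat'],
    'name': [np.nan, 'Rex', 'Tom'],
    'breed': ['Beagle', 'Poodle', 'Siamese'],
})

grouped = sample.groupby('animal_type')

firsts = grouped.first()     # Dog: name 'Rex' (row 0's name is NaN), breed 'Beagle'
lasts = grouped.last()       # Dog: name 'Rex', breed 'Poodle'
first_rows = grouped.head(1) # whole first row per group, original index kept

print(firsts, lasts, first_rows, sep='\n\n')
```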