# More Pandas

![more_pandas](https://media.giphy.com/media/H0Qi5W2KzU5UI/giphy.gif)

### Scenario
You have decided that you want to start your own animal shelter, but you want to get an idea of what that will entail and to get more information about planning. In this lecture, we'll look at a real data set collected by Austin Animal Center over several years and use our pandas skills from the last lecture and learn some new ones in order to explore these data further.




#### Our goals in this notebook are to be able to: <br/>

- Apply and use `.map()` and `.applymap()` and `.apply()` from the Pandas library
- Briefly review lambda functions and use them together with the methods above
- Explain what a groupby object is and split a DataFrame using `.groupby()`

#### Getting started

Let's take a moment to download and to examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data). 

Let's take a look at the data:

In [None]:
import numpy as np
import pandas as pd
import requests

url = 'https://data.austintexas.gov/resource/9t4d-g238.json'
response = requests.get(url)
animals = pd.DataFrame(response.json())
animals.head()

![hive mind](https://media.giphy.com/media/l0MYttFGk98Y4e4h2/giphy.gif)


What kinds of questions can we ask these data and what kinds of information can we get back?
Start filling in the [group question doc](https://docs.google.com/document/d/15VFVdzx1-oFHuzal9xcNvXgr9QMtwbqIzpIzpzfzhFM/edit) together. You can either add a question of your own or help fill out another student's question.

# Quick Exploration

In [None]:
animals.info()

In [None]:
animals.describe()

In [None]:
# Let's apply a bit of what we learned on Friday and investigate na's
animals.isna().sum()

In [None]:
animals['name'] = animals['name'].fillna('unnamed')

In [None]:
animals.fillna('no_type_or_subtype', inplace=True)

In [None]:
animals.isna().sum()

### 1. Applying and using map and applymap from the Pandas library

The built-in **map** function takes a function and applies it to every element of an iterable.

`map` is an example of Python's support for [functional programming](https://docs.python.org/3/howto/functional.html). We won't spend much lecture time on it beyond a brief description of how it differs from object-oriented programming (OOP). In OOP, objects have changing state. A DataFrame object has a `shape` attribute that holds the number of rows and columns. Dropping rows in place changes the underlying object, and its `shape` attribute changes to reflect the reduced number of rows. Functional programming relies on functions (like the ones we've learned to write in Python) that take inputs and produce outputs, instead of relying on the changing state of objects. Haskell and Lisp are examples of functional programming languages.

In [None]:
def divisible_by_5(number):
    
    '''
    Parameter: an integer
    Returns: True if the number is divisible by five, False otherwise
    '''
    
    if number % 5 == 0:
        return True
    else:
        return False

numbers = [17,29,30045, 125]

list(map(divisible_by_5, numbers))
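The contrast between changing state and pure functions described above can be sketched with a small DataFrame. The toy data and the helper `without_first_row` are made up for illustration and are not part of the shelter dataset.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Rex', 'Tom', 'Ivy'], 'age': [3, 5, 1]})

# Object-oriented style: the method call changes the object's state,
# so its shape attribute is different afterwards.
stateful = df.copy()
stateful.drop(index=0, inplace=True)
print(stateful.shape)

# Functional style: a function takes an input and returns a new output,
# leaving the input untouched.
def without_first_row(frame):
    return frame.iloc[1:]

result = without_first_row(df)
print(df.shape, result.shape)
```

Here `df` keeps all three rows after `without_first_row` runs, while `stateful` itself lost a row.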

The Pandas library has several similar methods associated with DataFrames and Series. Let's explore them.

# DataFrame.applymap(),  Series.apply(), and DataFrame.apply()

## DataFrame.applymap()
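The ```.applymap()``` method takes a function as input and applies it to every entry in the DataFrame. In pandas 2.1 and later, ```.applymap()``` is deprecated in favor of ```DataFrame.map()```, which does the same thing.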

In [None]:
def long_string(string):
    '''
    Parameter: a string
    returns: a boolean denoting whether it is 
    longer than 10 characters
    '''
    
    if len(string) > 10:
        return True
    else:
        return False

In [None]:
animals_strings = animals.applymap(str)
animals_strings.applymap(long_string)
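As a minimal sketch of element-wise application, the example below runs `long_string` over a small made-up DataFrame. It picks `DataFrame.map` when the installed pandas provides it and falls back to `.applymap()` otherwise; the toy names and breeds are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'name': ['Max', 'Bartholomew the Third'],
                    'breed': ['Labrador Retriever Mix', 'Pug']})

def long_string(string):
    # True when the string has more than 10 characters
    return len(string) > 10

# pandas 2.1 added DataFrame.map as the replacement for applymap;
# older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    flags = toy.map(long_string)
else:
    flags = toy.applymap(long_string)

print(flags)
```

The result has the same shape as the input, with one boolean per cell.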

## Series.apply()

The **.apply()** method takes a function as input that it will then apply to every entry in the Series.

Let's write a function to consolidate sex_upon_outcome to male, female, or unknown.

First, explore the unique values:

In [None]:
animals['sex_upon_outcome'].unique()

In [None]:
# we could also use np.unique() with the return_counts parameter

np.unique(animals['sex_upon_outcome'], return_counts=True)


In [None]:
def male_or_female(value_from_series):
    
    """
    This is a docstring...
    
    Parameter: a value from the sex_upon_outcome series
    in the Austin Animal Shelter dataset.  
    There are five possible values:
    'Spayed Female', 'Unknown', 'Intact Female', 'Intact Male',
       'Neutered Male'
       
    Returns:
    female, male, unknown
    
    """
    
    if 'female' in value_from_series.lower():
        return 'female'
    
    #Add a space before male to ensure that female is not included
    elif ' male' in value_from_series.lower():
        return 'male'
    
    else:
        return 'unknown'
    
    
animals['sex_upon_outcome'].apply(male_or_female)   

In [None]:
# We can now make a new column, sex:
    
animals['sex'] = animals['sex_upon_outcome'].apply(male_or_female)
animals.head()
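When the set of values is small and known, `Series.map()` with a dictionary is another way to do the same consolidation. This is a sketch on a hand-built Series; the lookup table `sex_lookup` is hypothetical and covers only the five values listed in the docstring above.

```python
import pandas as pd

sex_upon_outcome = pd.Series(['Spayed Female', 'Unknown', 'Intact Female',
                              'Intact Male', 'Neutered Male'])

# Hypothetical lookup table for the five known values
sex_lookup = {
    'Spayed Female': 'female',
    'Intact Female': 'female',
    'Intact Male': 'male',
    'Neutered Male': 'male',
    'Unknown': 'unknown',
}

# Values missing from the dictionary become NaN, so fill them afterwards
sex = sex_upon_outcome.map(sex_lookup).fillna('unknown')
print(sex.tolist())
```

The dictionary makes every mapping explicit, while the function version with `in` also handles values we did not anticipate.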

Now let's have some fun.  Let's convert age upon outcome to days, using map():

In [None]:
# First, check out what happens when we split on a space

list(animals['age_upon_outcome'].str.split(' '))

# Pair program #1: 
Take 5 minutes to fill in the function below with code that converts age upon outcome to days upon outcome.

In [None]:
# check what values we have for time frame
unit_values = [age[0] if age[0] == 'NULL' 
               else age[1] for age in 
               animals['age_upon_outcome'].str.split(' ')]
set(unit_values)

Now, fill in the definition below to convert the ages to days

In [None]:

def age_to_days(age):
    
    '''
    params: age upon outcome of shelter animal. 
    A number followed by a unit of time 
    'NULL', 'days', 'month', 'months', 'week', 'weeks', 'year', 'years'
    
    returns: days old at outcome
    '''
    
    age_split = age.split(' ')
    
    if len(age_split) == 1:
        return 'NULL'
    
    elif ... :
        return
    
    elif ... :
         pass
    
    elif ... :
         pass
    
    else:
         pass
    
    
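# Store the result: later cells use the days_upon_outcome column
animals['days_upon_outcome'] = animals['age_upon_outcome'].map(age_to_days)
animals['days_upon_outcome']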
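One possible way to fill in the function is sketched below on a few sample strings. The conversion factors are assumptions (a month is treated as 30 days and a year as 365), and this version returns NaN instead of the string 'NULL' so that the result stays numeric. The name `age_to_days_sketch` is made up to keep it separate from the exercise.

```python
import numpy as np
import pandas as pd

# Assumed conversion factors: 30 days per month, 365 days per year
DAYS_PER_UNIT = {'day': 1, 'week': 7, 'month': 30, 'year': 365}

def age_to_days_sketch(age):
    '''Convert strings such as "2 years" to a number of days.'''
    age_split = age.split(' ')
    if len(age_split) != 2:
        return np.nan            # e.g. 'NULL'
    number, unit = age_split
    unit = unit.rstrip('s')      # 'years' -> 'year', 'days' -> 'day'
    if unit not in DAYS_PER_UNIT:
        return np.nan
    return int(number) * DAYS_PER_UNIT[unit]

ages = pd.Series(['2 years', '1 month', '3 weeks', '5 days', 'NULL'])
days = ages.map(age_to_days_sketch)
print(days.tolist())
```

A dictionary lookup replaces the chain of `elif` branches; either approach works for the exercise.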


## DataFrame.apply()
DataFrame.apply() takes a function as a parameter and applies it along an axis: to each column by default (`axis=0`), or to each row with `axis=1`. It is especially useful when we want logic that compares values from multiple columns in the same row.

In [None]:
animals.head()

In [None]:
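# Assumes days_upon_outcome was created with age_to_days; 'NULL' becomes NaN
animals['days_upon_outcome'] = pd.to_numeric(animals['days_upon_outcome'], errors='coerce')
dog_days = animals.groupby('animal_type')['days_upon_outcome']
dog_days_std = dog_days.std().loc['Dog']
dog_days_mean = dog_days.mean().loc['Dog']
dog_days_mean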

In [None]:
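# Select the numeric column before aggregating so mean and std are scalars
dog_days_mean = animals.groupby('animal_type')['days_upon_outcome'].mean().loc['Dog']
dog_days_std = animals.groupby('animal_type')['days_upon_outcome'].std().loc['Dog']
(animals.days_upon_outcome > int(dog_days_mean + dog_days_std*2)).value_counts()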

In [None]:
# let's make a boolean column that flags old dogs that get adopted

def old_dogs_adopted(row):

    '''
    Parameter: Row from the Austin Animal Shelter
    Returns: Boolean signifying records of old dogs that were adopted
    '''
    
    if (row['animal_type'] == 'Dog')\
            and (row['outcome_type'] =='Adoption')\
            and (row['days_upon_outcome'] > int(dog_days_mean + dog_days_std*2)):
        return True

    else:
        return False

animals['old_adopted_dogs'] = animals.apply(old_dogs_adopted, axis = 1)

In [None]:
# The column is already boolean, so it can be used directly as a mask
animals[animals.old_adopted_dogs]
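For comparison, the same row-wise logic can often be written with column-wise boolean masks combined with `&`. The sketch below uses a made-up DataFrame and a hypothetical `cutoff` standing in for the mean plus two standard deviations.

```python
import pandas as pd

toy = pd.DataFrame({
    'animal_type': ['Dog', 'Dog', 'Cat', 'Dog'],
    'outcome_type': ['Adoption', 'Transfer', 'Adoption', 'Adoption'],
    'days_upon_outcome': [4000, 5000, 6000, 300],
})
cutoff = 3650  # hypothetical threshold, standing in for mean + 2 * std

def old_dogs_adopted(row):
    return bool(row['animal_type'] == 'Dog'
                and row['outcome_type'] == 'Adoption'
                and row['days_upon_outcome'] > cutoff)

# Row-wise: the function is called once per row
row_wise = toy.apply(old_dogs_adopted, axis=1)

# Column-wise: each comparison yields a boolean Series, combined with &
vectorized = ((toy['animal_type'] == 'Dog')
              & (toy['outcome_type'] == 'Adoption')
              & (toy['days_upon_outcome'] > cutoff))

print(row_wise.tolist())
print(vectorized.tolist())
```

Both approaches flag the same rows. `.apply()` with `axis=1` reads closer to plain Python, while masks avoid calling a Python function for every row.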

## Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".

In [None]:
# Here is an example of a lambda function that splits off the number from the animal_id.
animals.animal_id.apply(lambda x: x.split('A')[1])
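To make the link between a lambda and an ordinary function concrete, here is a small sketch showing that the two forms give the same result. The IDs are made up in the same format as `animal_id`, and `strip_prefix` is a hypothetical name.

```python
import pandas as pd

# Made-up IDs in the same format as the animal_id column
animal_ids = pd.Series(['A794011', 'A776359', 'A821648'])

# Named function
def strip_prefix(animal_id):
    return animal_id.split('A')[1]

named = animal_ids.apply(strip_prefix)

# The same logic as an anonymous function defined in the call
anonymous = animal_ids.apply(lambda x: x.split('A')[1])

print(anonymous.tolist())
print(named.equals(anonymous))
```

A lambda is convenient for one-off, single-expression logic; anything longer is usually easier to read as a named function with a docstring.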

### Student Screen Share (without answer directly below)
Use a lambda function to convert days upon outcome to weeks upon outcome <br>


In [None]:
np.random.seed(42)
student_list = ['Amanda', 'Chum', 'Dann', 'Jacob', 'Jason', 'Johnhoy',  'Matt', 
'Maximilian', 'Adam', 'Ethan', 'Karim', 'Leana', 'Luluva']
np.random.choice(student_list)

In [None]:
# your code here

## Methods for Re-Organizing DataFrames: .groupby()

Those of you familiar with SQL have probably used the GROUP BY command. (And if you haven't, you'll see it very soon!) Pandas has this, too.

The .groupby() method is especially useful for aggregate functions applied to the data grouped in particular ways.

In [None]:
animals.groupby('animal_type')

Notice that the returned object is a [DataFrameGroupBy](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) object.

We can also group by multiple columns, which returns a DataFrameGroupBy object as well.

In [None]:
animals.groupby(['animal_type', 'outcome_type'])

#### .groups and .get_group()

In [None]:
# This returns each group keyed by the group name (e.g. 'Bird'), along with the row indices that belong to it
animals.groupby('animal_type').groups

Once we know what type of object we are working with, a suite of attributes and methods opens up. One attribute we can look at is `.groups`; a related method is `.get_group()`.

In [None]:
# Once we know the group names (the keys of .groups), we can return a single group with get_group()
animals.groupby('animal_type').get_group('Dog')
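The relationship between `.groups` and `.get_group()` is easier to see on a tiny example. The DataFrame below is made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'animal_type': ['Dog', 'Cat', 'Dog', 'Bird'],
    'name': ['Rex', 'Tom', 'Fido', 'Tweety'],
})
grouped = toy.groupby('animal_type')

# .groups maps each group name to the index labels of its rows
dog_index = list(grouped.groups['Dog'])
print(dog_index)

# .get_group() returns the rows that belong to one group name
dogs = grouped.get_group('Dog')
print(dogs['name'].tolist())
```

The index labels stored under `'Dog'` in `.groups` are exactly the rows that `.get_group('Dog')` returns.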

#### Groupby Methods and Aggregating

In [None]:
# Same goes for multi index groupbys
animal_outcome = animals.groupby(['animal_type', 'outcome_type'])
animal_outcome.groups


In [None]:
# animal_outcome.groups is a dictionary, so we can access the group names using keys()
animal_outcome.groups.keys()

In [None]:
# We can then get a specific group, such as Cats that were adopted
animal_outcome.get_group(('Cat', 'Adoption'))

In [None]:
# Other methods
animal_outcome.first()

In [None]:
animal_outcome.last()

In [None]:
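# numeric_only=True skips text columns, which recent pandas versions refuse to average
animals.groupby('animal_type').mean(numeric_only=True)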

As we will see in SQL, groupby objects are intended to be used with aggregation. In SQL, queries that include GROUP BY require an aggregation to be performed on columns.

We can use sum, mean, count, max, min... Find a list of common aggregations [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)

In [None]:
# Group by multiple columns, then summarize a single group
animals.groupby(['animal_type', 'outcome_type']).get_group(('Cat', 'Transfer')).describe()
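As a short sketch of the aggregations mentioned above, the example below applies a single aggregation, several aggregations at once with `.agg()`, and a row count per group with `.size()`. The data are made up and only mimic the column names used in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'animal_type': ['Dog', 'Dog', 'Cat', 'Cat', 'Cat'],
    'outcome_type': ['Adoption', 'Transfer', 'Adoption', 'Adoption', 'Transfer'],
    'days_upon_outcome': [730, 365, 60, 90, 30],
})

# One aggregation on one column
mean_days = toy.groupby('animal_type')['days_upon_outcome'].mean()

# Several aggregations at once with .agg()
summary = toy.groupby('animal_type')['days_upon_outcome'].agg(['count', 'min', 'max'])

# Number of rows for each (animal_type, outcome_type) pair
pair_counts = toy.groupby(['animal_type', 'outcome_type']).size()

print(mean_days)
print(summary)
print(pair_counts)
```

Selecting the column before aggregating keeps the output focused and avoids trying to aggregate text columns.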