<img src='../images/gdd-logo.png' width='300px' align='right' style="padding: 15px">

# <font color='#1EB0E0'>Python Packaging</font>

For this assignment we shall add to our package and explore how it works.

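Running the cell below enables IPython's `autoreload` extension, so changes you make to the package's source files are picked up automatically without restarting the kernel.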

In [1]:
%load_ext autoreload
%autoreload 2
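
`%autoreload 2` tells IPython to reload modules before executing the code you type. As a rough sketch of what this automates, the standard-library function `importlib.reload` re-executes a module's source on demand. The module name `toy_module` is hypothetical and is written to a temporary directory only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a hypothetical module to a temporary directory and make it importable.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "toy_module.py"
module_file.write_text("GREETING = 'hello'\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

toy_module = importlib.import_module("toy_module")
print(toy_module.GREETING)  # hello

# Edit the source file, then reload: the same module object sees the change.
# The new source has a different length, so any .pyc cached by the first
# import is treated as stale (cached bytecode is checked against source size).
module_file.write_text("GREETING = 'hello again'\n")
importlib.reload(toy_module)
print(toy_module.GREETING)  # hello again
```

`autoreload` saves you from calling `reload` by hand every time you edit the package.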

You can load the data with the helper function `load_data()`.

*Note that you will need to have the `animal_shelter` package installed for the cell below to work!*

In [2]:
from animal_shelter.data import load_data

animal_outcomes = load_data('../data/train.csv')
animal_outcomes.head()

|   | animal_id | name | date_time | outcome_type | outcome_subtype | animal_type | sex_upon_outcome | age_upon_outcome | breed | color |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A671945 | Hambone | 2014-02-12 18:22:00 | Return_to_owner | Unknown | Dog | Neutered Male | 1 year | Shetland Sheepdog Mix | Brown/White |
| 1 | A656520 | Emily | 2013-10-13 12:44:00 | Euthanasia | Suffering | Cat | Spayed Female | 1 year | Domestic Shorthair Mix | Cream Tabby |
| 2 | A686464 | Pearce | 2015-01-31 12:28:00 | Adoption | Foster | Dog | Neutered Male | 2 years | Pit Bull Mix | Blue/White |
| 3 | A683430 | Unknown | 2014-07-11 19:09:00 | Transfer | Partner | Cat | Intact Male | 3 weeks | Domestic Shorthair Mix | Blue Cream |
| 4 | A667013 | Unknown | 2013-11-15 12:52:00 | Transfer | Partner | Dog | Neutered Male | 2 years | Lhasa Apso/Miniature Poodle | Tan |
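
If importing `animal_shelter` fails with `ModuleNotFoundError`, the package is most likely not installed in the environment the notebook kernel uses. A quick way to check without triggering the error is `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The helper name `is_installed` is hypothetical. While developing, installing the project in editable mode (`pip install -e .` from the project root) is a common choice because source edits then take effect without reinstalling; whether this project is meant to be installed that way is an assumption.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in this environment."""
    # find_spec only locates the package; it does not import it.
    return importlib.util.find_spec(package_name) is not None


print(is_installed("animal_shelter"))
```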


We would like to add some features to this dataset that may be helpful to the machine learning algorithms we will use at a later stage:

- boolean indicator for whether it is a dog
- boolean indicator for whether it has a name
- categorical feature indicating its sex
- categorical feature indicating whether it is neutered
- categorical feature indicating its hair type
- age upon outcome in days
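
The age feature is an approximation: the conversion in `add_features` treats a week as 7 days, a month as 30 days and a year as 365 days, and multiplies the number in the age string by the length of its unit. For a single string the arithmetic looks like this; `age_to_days` is a hypothetical plain-Python helper that uses the same unit lengths.

```python
# Days per unit, matching the period mapping used in add_features.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def age_to_days(age):
    """Convert an age string such as '2 years' to an approximate number of days."""
    if age == "Unknown":
        return None  # add_features uses NaN for unknown ages instead
    number, unit = age.split()
    # Strip a plural 's' so 'weeks' and 'week' map to the same entry.
    return int(number) * DAYS_PER_UNIT[unit.rstrip("s")]


for age in ["1 year", "2 years", "3 weeks"]:
    print(age, "->", age_to_days(age))
# 1 year -> 365
# 2 years -> 730
# 3 weeks -> 21
```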

<font color='#1EB0E0'>*Why would we want to add these features?*</font>

We can add all of these features to the dataset with the functions below.

In [3]:
import logging

import numpy as np
import pandas as pd


def add_features(df):
    """Add some features to our data.
    Parameters
    ----------
    df : pandas.DataFrame
        DataFrame with data (see load_data)
    Returns
    -------
    with_features : pandas.DataFrame
        DataFrame with some column features added
    """
    df['is_dog'] = check_is_dog(df['animal_type'])


    # Check if it has a name.
    df['has_name'] = df['name'].str.lower() != 'unknown'


    # Get sex.
    sexUponOutcome = df['sex_upon_outcome']
    sex = pd.Series('unknown', index=sexUponOutcome.index)

    sex.loc[sexUponOutcome.str.endswith('Female')] = 'female'
    sex.loc[sexUponOutcome.str.endswith('Male')] = 'male'
    df['sex'] = sex



    # Check if neutered.
    neutered = sexUponOutcome.str.lower()
    neutered.loc[neutered.str.contains('neutered')] = 'fixed'
    neutered.loc[neutered.str.contains('spayed')] = 'fixed'


    neutered.loc[neutered.str.contains('intact')] = 'intact'
    neutered.loc[~neutered.isin(['fixed', 'intact'])] = 'unknown'


    df['neutered'] = neutered



    # Get hair type.

    hairType = df['breed'].str.lower()
    Valid_hair_types = ['shorthair', 'medium hair', 'longhair']



    for hair in Valid_hair_types:
        is_hair_type = hairType.str.contains(hair)
        hairType[is_hair_type] = hair

    hairType[~hairType.isin(Valid_hair_types)] = 'unknown'


    df['hair_type'] = hairType


    # Age in days upon outcome.

    Split_Age = df['age_upon_outcome'].str.split()
    time = Split_Age.apply(lambda x: x[0] if x[0] != 'Unknown' else np.nan)
    period = Split_Age.apply(lambda x: x[1] if x[0] != 'Unknown' else None)
    period_Mapping = {'year': 365, 'years': 365, 'weeks': 7, 'week': 7,
                      'month': 30, 'months': 30, 'days': 1, 'day': 1}
    days_upon_outcome = time.astype(float) * period.map(period_Mapping)
    df['days_upon_outcome'] = days_upon_outcome



    return df

def check_is_dog(animal_type):
    """Check if the animal is a dog, otherwise return False.
    Parameters
    ----------
    animal_type : pandas.Series
        Type of animal
    Returns
    -------
    result : pandas.Series
        Dog or not
    """
    # Check if it's either a cat or a dog.
    is_cat_dog = animal_type.str.lower().isin(['dog', 'cat'])
    if not is_cat_dog.all():
        logging.warning('Found something else but dogs and cats:\n%s',
                        animal_type[~is_cat_dog])
        raise RuntimeError("Found pets that are not dogs or cats.")
    is_dog = animal_type.str.lower() == 'dog'
    return is_dog


def check_has_name(name):
    """Check if the animal is not called 'unknown'.
    Parameters
    ----------
    name : pandas.Series
        Animal name
    Returns
    -------
    result : pandas.Series
        Unknown or not.
    """
    return name  # TODO: Replace this.


def get_sex(sex_upon_outcome):
    """Determine if the sex was 'Male', 'Female' or unknown.
    Parameters
    ----------
    sex_upon_outcome : pandas.Series
        Sex and fixed state upon outcome
    Returns
    -------
    sex : pandas.Series
        Sex upon outcome
    """
    return sex_upon_outcome  # TODO: Replace this.


def get_neutered(sex_upon_outcome):
    """Determine if an animal was intact or not.
    Parameters
    ----------
    sex_upon_outcome : pandas.Series
        Sex and fixed state upon outcome
    Returns
    -------
    neutered : pandas.Series
        Intact, fixed or unknown
    """
    return sex_upon_outcome  # TODO: Replace this.


def get_hair_type(breed):
    """Get hair type of a breed.
    Parameters
    ----------
    breed : pandas.Series
        Breed of animal
    Returns
    -------
    hair_type : pandas.Series
        Hair type
    """
    return breed  # TODO: Replace this.


def compute_days_upon_outcome(age_upon_outcome):
    """Compute age in days upon outcome.
    Parameters
    ----------
    age_upon_outcome : pandas.Series
        Age as string
    Returns
    -------
    days_upon_outcome : pandas.Series
        Age in days
    """
    return age_upon_outcome  # TODO: Replace this.


There is some bad practice going on in the functions above. For example, the function `add_features` is doing [multiple things](https://blog.codinghorror.com/curlys-law-do-one-thing/).

We will improve the quality of these functions later on; for now, we shall focus on expanding the package.
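
One way to apply the "do one thing" principle is to let each helper compute a single feature and have `add_features` only assemble the results. The sketch below fills in three of the stubs (`check_has_name`, `get_sex`, `get_neutered`) along those lines and names the assembling function `add_features_refactored` so it does not shadow the original. It is one possible refactoring, not the package's reference solution; it also returns a new DataFrame via `DataFrame.assign` instead of modifying the input in place. The two sample rows reuse values from the data shown earlier.

```python
import pandas as pd


def check_has_name(name):
    """Return a boolean Series: True where the name is not 'unknown'."""
    return name.str.lower() != "unknown"


def get_sex(sex_upon_outcome):
    """Return 'female', 'male' or 'unknown' for each entry."""
    sex = pd.Series("unknown", index=sex_upon_outcome.index)
    sex.loc[sex_upon_outcome.str.endswith("Female")] = "female"
    sex.loc[sex_upon_outcome.str.endswith("Male")] = "male"
    return sex


def get_neutered(sex_upon_outcome):
    """Return 'fixed', 'intact' or 'unknown' for each entry."""
    lowered = sex_upon_outcome.str.lower()
    neutered = pd.Series("unknown", index=sex_upon_outcome.index)
    neutered.loc[lowered.str.contains("neutered|spayed")] = "fixed"
    neutered.loc[lowered.str.contains("intact")] = "intact"
    return neutered


def add_features_refactored(df):
    """Return a copy of df with one new column per single-purpose helper."""
    return df.assign(
        has_name=check_has_name(df["name"]),
        sex=get_sex(df["sex_upon_outcome"]),
        neutered=get_neutered(df["sex_upon_outcome"]),
    )


sample = pd.DataFrame({
    "name": ["Hambone", "Unknown"],
    "sex_upon_outcome": ["Neutered Male", "Intact Male"],
})
print(add_features_refactored(sample))
```

Each helper can now be tested on its own, and adding a feature means adding one small function plus one line in the assembling function.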

### <mark>Exercise - Expanding the package</mark>

Copy the functions from the `In [3]` cell above into the `animal_shelter.features` module of the package.
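
Copying the functions into the package works because a Python file inside a package directory becomes a submodule that can be imported with dotted syntax; typically `animal_shelter/features.py` is imported as `animal_shelter.features`. The sketch below reproduces this with a throwaway package. The names `toy_shelter` and `count_rows` are hypothetical, and the real project may use a different directory layout (for example, a `src/` folder).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: toy_shelter/__init__.py and toy_shelter/features.py.
root = Path(tempfile.mkdtemp())
package_dir = root / "toy_shelter"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "features.py").write_text(
    "def count_rows(rows):\n"
    "    return len(rows)\n"
)

# Make the directory that contains the package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The file toy_shelter/features.py is now the submodule toy_shelter.features.
from toy_shelter.features import count_rows

print(count_rows([1, 2, 3]))  # 3
```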
    
Afterwards, make sure you can import the `add_features` function and run it in the cell below to add features to the DataFrame.

In [None]:
# Fix this: Replace ... with the function that should be called

from animal_shelter.features import ...

with_features = ...(animal_outcomes)

In [None]:
# %load solutions/packaging.py
from animal_shelter.features import add_features
from animal_shelter.data import load_data

animal_outcomes = load_data('../data/train.csv')
with_features = add_features(animal_outcomes)
with_features.head()
