# Notebook Best Practices Exercise

# <a id='toc'>Table of Contents</a>

  - ## [Task 1](#task_1): Header Practice and Link Practice
  - ## [Task 2](#task_2): Images
  - ## [Task 3](#task_3): Code Formatting
  - ## [Task 4](#task_4): Packaging functions in .py files
  - ## [Task 5](task_5): Creating a Table of Contents


In [None]:
from src.student_caller import one_random_student
from src.student_list import student_first_names

<a id='task_1'></a>


# Task 1: Header Practice and Link Practice

In [None]:
#1a: convert this cell to a markdown cell and make the line below the largest header
Austin Animal Center Needs Analysis

![Austin-shelter](./images/austin-animal-center.jpg)

In [None]:
one_random_student(student_first_names)

**##1b: convert the line below into the second largest header**

Overview

**##1c: Add a link to the website of the Austin Animal Center `https://www.austintexas.gov/department/aac` in the space indicated below.**

This project analyzes the resource needs of the **insert_link_here** (AAC), which shelters 16,000 animals annually with a [No Kill policy](https://www.austintexas.gov/blog/no-kill-austin). Descriptive analysis of animal intake and outcome data shows that some animals require extended stays and that the number of sheltered animals varies seasonally. The Austin Animal Center can use this analysis to adjust outreach, hiring, and space utilization to improve resource allocation.

<a id='task_2'></a>
[back to TOC](#toc)

# Task 2: Images


## Business Problem
**2a: insert an image before the block of text using the `animals.png` found in the `images` folder**

The Austin Animal Center may be able to improve its resource allocation to both reduce costs and ensure that the center has the staff and space to care for the animals brought to it. Doing so will allow the Austin Animal Center to better serve its clients while also freeing up resources to expand the scope of services it can offer. Using AAC's animal intake and outcome data, I describe patterns in intakes, stays, and exits to anticipate AAC's outreach, space, and staffing needs for supporting sheltered animals.



**2b: Add the same image using an image tag with a `src` attribute, and set the width to 560**


In [None]:
one_random_student(student_first_names)

## Data Understanding

![img2](./images/pet-resource-center-og.jpg)

The Austin Animal Center has the longest running public dataset of animal rescues in the country. Every animal has a unique ID associated with both their [intake](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) and [outcome](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238) data. The data files provide the dates and types of each event, as well as other animal characteristics (e.g. type, sex). 

In [None]:
import pandas as pd
import numpy as np

In [None]:
intakes = pd.read_csv('./data/Austin_Animal_Center_Intakes_082620.csv')
outcomes = pd.read_csv('./data/Austin_Animal_Center_Outcomes_082620.csv')

In [None]:
intakes.info()

In [None]:
outcomes.info()

### Intake Data

The intake dataset covers October 2013 to August 2020 and includes a wide variety of intake types, intake conditions, and animal types.

In [None]:
intakes.head()

In [None]:
intakes['Intake Date'] =  pd.to_datetime(intakes['DateTime'])
intakes['Intake Date'].describe()

In [None]:
intakes['Intake Type'].value_counts()

In [None]:
intakes['Intake Condition'].value_counts()

In [None]:
intakes['Animal Type'].value_counts()

### Outcome Data

The outcome dataset also covers October 2013 to August 2020 and includes a variety of outcome types.

In [None]:
outcomes.head()

In [None]:
outcomes['Outcome Date'] =  pd.to_datetime(outcomes['DateTime'])
outcomes['Outcome Date'].describe()

In [None]:
outcomes['Outcome Type'].value_counts()

In [None]:
outcomes['Outcome Subtype'].value_counts()[:10]

## Data Preparation

### Data Cleaning

I make the intake and outcome files easier to work with by normalizing column names and dropping unnecessary columns.

In [None]:
# Make column names easier to use
intakes.columns = intakes.columns.str.lower().str.replace(' ', '_')

# Drop unnecessary columns
intakes.drop(columns=['name', 'datetime', 'monthyear', 'found_location', 'age_upon_intake', 'color'], inplace=True)

In [None]:
# Make column names easier to use
outcomes.columns = outcomes.columns.str.lower().str.replace(' ', '_')

# Drop unnecessary columns
outcomes.drop(columns=['name', 'datetime', 'monthyear', 'date_of_birth', 'age_upon_outcome', 'sex_upon_outcome', 'animal_type', 'breed', 'color'], inplace=True)

<a id='task_3'></a>
[back to TOC](#toc)


# Task 3: Code formatting in markdown


### Merging Datasets

**3a: Use backticks to reformat the word event_num into code format**  

Before merging the datasets, I create a new event_num variable indexing the count of the intake or outcome for each animal ID. This will allow for a 1:1 merge.

In [None]:
one_random_student(student_first_names)

**3b: Convert the cell below to markdown, then surround the code with triple backticks.**

Include the word `python` after the first trio of backticks.

This use is not as common in final notebooks, but it is helpful to know how to display code blocks in markdown, just in case.

In [None]:
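# Rank only the date column within each animal so the result is a single Series of event numbers
intakes['event_num'] = intakes.sort_values(['intake_date']).groupby(['animal_id'])['intake_date'].rank()
outcomes['event_num'] = outcomes.sort_values(['outcome_date']).groupby(['animal_id'])['outcome_date'].rank()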

In [None]:
one_random_student(student_first_names)

Combining the two files yields a single dataset for feature engineering and analysis. I exclude any unmatched outcome or intake data to ensure there are no missing values for the date features. I also exclude mismatched data so that analyses of stay lengths do not end up including animals with negative numbers of days in shelter.

In [None]:
# Merge intakes and outcomes on animal id and event number
animal_shelter_df  = pd.merge(intakes, 
                              outcomes, 
                              on=['animal_id', 'event_num'], 
                              how='left')

In [None]:
# Filter out animals that have yet to have outcomes and keep only rows where the outcome date is later than the intake date
animal_shelter_df = animal_shelter_df[(~animal_shelter_df['outcome_date'].isna()) 
                                    & (animal_shelter_df['outcome_date'] > animal_shelter_df['intake_date'])]
    
# Sorts the column names to be alphabetical
animal_shelter_df = animal_shelter_df[animal_shelter_df.columns.sort_values()]
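The numbering, merge, and filter steps can be easier to follow on a tiny example. The sketch below uses made-up records and hypothetical `toy_` names. It also assumes `method="first"` for the ranking and `validate="one_to_one"` for the merge, neither of which appears in the cells above; both are optional pandas arguments that make the 1:1 intent explicit.

```python
import pandas as pd

# Made-up records: animal A1 visits twice, animal A2 has not left yet
toy_intakes = pd.DataFrame({
    "animal_id": ["A1", "A1", "A2"],
    "intake_date": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-02-01"]),
})
toy_outcomes = pd.DataFrame({
    "animal_id": ["A1", "A1"],
    "outcome_date": pd.to_datetime(["2020-01-05", "2020-03-10"]),
})

# Number each animal's events in date order: 1 for the first visit, 2 for the second
toy_intakes["event_num"] = toy_intakes.groupby("animal_id")["intake_date"].rank(method="first")
toy_outcomes["event_num"] = toy_outcomes.groupby("animal_id")["outcome_date"].rank(method="first")

# validate="one_to_one" raises an error if the key pair is duplicated on either side
toy_merged = pd.merge(toy_intakes, toy_outcomes,
                      on=["animal_id", "event_num"],
                      how="left", validate="one_to_one")

# Drop visits with no outcome yet, and any where the outcome precedes the intake
toy_clean = toy_merged[toy_merged["outcome_date"].notna()
                       & (toy_merged["outcome_date"] > toy_merged["intake_date"])]
```

In this toy data the left merge keeps all three intakes, and the filter then removes the A2 row because it has no matching outcome.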

In [None]:
animal_shelter_df.head(3)

### Feature Engineering

I create a `days_in_shelter` feature to analyze the amount of time animals spend at AAC.

In [None]:
animal_shelter_df['days_in_shelter'] = (animal_shelter_df['outcome_date'] - animal_shelter_df['intake_date']).dt.days
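One detail worth knowing about a day count taken from a time difference: it keeps only the whole days, so partial days are dropped. The sketch below illustrates this with the standard library's `timedelta` rather than pandas, using made-up timestamps.

```python
from datetime import datetime

intake = datetime(2020, 1, 1, 17, 0)   # made-up intake: Jan 1, 5 pm
outcome = datetime(2020, 1, 3, 9, 0)   # made-up outcome: Jan 3, 9 am

stay = outcome - intake                          # 1 day and 16 hours
whole_days = stay.days                           # whole-day part only
fractional_days = stay.total_seconds() / 86400   # full duration in days
```

Here `whole_days` is 1 even though the animal was in the shelter for about 1.67 days.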

I create `year_month` features for aggregating values by month in my analysis.

In [None]:
# NOTE TO STUDENTS: You will learn better methods for handling time series data later in the course

animal_shelter_df['intake_year_month'] = animal_shelter_df['intake_date'].apply(lambda x: str(x.year) + '-' + x.strftime('%m'))
animal_shelter_df['outcome_year_month'] = animal_shelter_df['outcome_date'].apply(lambda x: str(x.year) + '-' + x.strftime('%m'))
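As a small aside, the same `YYYY-MM` label can be produced with a single format string. The sketch below uses a made-up date and the standard library's `datetime`; it also shows why the zero-padded month matters when the labels are sorted as text.

```python
from datetime import datetime

intake = datetime(2020, 3, 7)  # made-up date

# The construction used in the cell above
manual_label = str(intake.year) + '-' + intake.strftime('%m')
# A single format string that produces the same label
direct_label = intake.strftime('%Y-%m')

# Zero-padded months make plain string sorting match calendar order
sorted_labels = sorted(['2020-10', '2020-02', '2019-12'])
```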

In [None]:
animal_shelter_df.head(3)

In [None]:
# Save cleaned dataset as CSV
animal_shelter_df.to_csv('./data/cleaned_animal_center_df.csv')

## Analysis

In [None]:
import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

### Length of Stay

Most animals have short stays at AAC, with a median of 5 days. However, some animals take a very long time to exit - sometimes over 6 months! These extended stays may be partly a result of AAC's No Kill policy.

In [None]:
animal_shelter_df['days_in_shelter'].describe()
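The median is the more informative summary here because a few very long stays pull the mean upward. A minimal sketch with made-up stay lengths (not the AAC data) shows the effect:

```python
from statistics import mean, median

# Made-up stay lengths in days: mostly short, with one very long stay
toy_stays = [1, 2, 3, 5, 5, 6, 8, 10, 200]

typical_stay = median(toy_stays)   # middle value, barely affected by the long stay
average_stay = mean(toy_stays)     # pulled upward by the single long stay
```

In this toy list the median is 5 days while the mean is above 26 days.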

In [None]:
# Create plot
stay_length_fig, stay_length_axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

stay_length_axes[0].set_title('Short Stays (< 15 Days)')
stay_length_axes[0].set_ylabel('Number of Animals')
stay_length_axes[0].set_xlabel('Days in Shelter')

stay_length_axes[1].set_title('Long Stays (> 180 Days)')
stay_length_axes[1].set_ylabel('Number of Animals')
stay_length_axes[1].set_xlabel('Days in Shelter')

stay_lengths_low = animal_shelter_df['days_in_shelter'][animal_shelter_df['days_in_shelter'] < 15]
stay_lengths_high = animal_shelter_df['days_in_shelter'][animal_shelter_df['days_in_shelter'] > 180]

stay_length_axes[0].hist(stay_lengths_low, bins=15)
stay_length_axes[1].hist(stay_lengths_high, bins=30)

plt.savefig("./images/stay_distributions.png", dpi=150)
plt.show()

### Animals with Long Stays

AAC primarily shelters cats and dogs - cats have slightly longer stays, on average. However, dogs are more likely to have long stays of over 180 days.

In [None]:
animal_shelter_df[['animal_type','days_in_shelter']].groupby(['animal_type']).agg(['count', 'median', 'mean'])

<a id='task_4'></a>
[back to TOC](#toc)



# Task 4: Packaging Functions in .py Files

Moving functions to .py files can make your notebook look a lot cleaner.  
Doing so can also allow you to use functions in other notebooks and .py files.
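To see the mechanics in isolation, the sketch below writes a tiny module to a temporary folder and imports it. The module name `demo_helpers` and its `double` function are hypothetical and exist only for this illustration; in the exercise the file lives in `src` and is imported the same way as `src.student_caller` at the top of the notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, standing in for a real .py file
module_source = '''
def double(x):
    """Return twice the input."""
    return 2 * x
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "demo_helpers.py").write_text(module_source)
    sys.path.insert(0, tmp_dir)          # make the folder importable
    try:
        demo_helpers = importlib.import_module("demo_helpers")
        result = demo_helpers.double(21)
    finally:
        sys.path.remove(tmp_dir)         # leave sys.path as it was
```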


In [None]:
# Task 4a: create a file called data_cleaning.py in the src folder

In [None]:
one_random_student(student_first_names)

In [None]:
# Task 4b: copy the stay_length_type function into the data_cleaning.py file.

def stay_length_type(days):
    '''
    This function takes in a number of days and returns a descriptive string. 
    This is used to categorize animal stay lengths for further analysis.
    
    Less than 15 days: 'Short'
    15 to 180 days: 'Medium'
    More than 180 days: 'Long'
    '''
    if (days < 15):
        return 'Short'
    elif (days > 180):
        return 'Long'
    else:
        return 'Medium'
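A quick way to confirm the category boundaries is to call the function on the edge values. The block below restates the function so that it runs on its own, then checks 14, 15, 180, and 181 days.

```python
def stay_length_type(days):
    """Categorize a stay length in days as 'Short', 'Medium', or 'Long'."""
    if days < 15:
        return 'Short'
    elif days > 180:
        return 'Long'
    else:
        return 'Medium'

# Edge values on either side of each cutoff
boundary_labels = {days: stay_length_type(days) for days in (0, 14, 15, 180, 181)}
```

Both 15 and 180 fall in 'Medium', which matches the docstring's "15 to 180 days".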

In [None]:
one_random_student(student_first_names)

In [None]:
# Task 4c: import the file as you would any other package.


In [None]:
one_random_student(student_first_names)

In [None]:
animal_shelter_df['stay_length'] = animal_shelter_df['days_in_shelter'].apply(stay_length_type)

**Task 4d: Add the following two lines above the import statement.**

`%load_ext autoreload`  
`%autoreload 2`

Including these two lines is crucial. Without them, changes you make in your .py file will not register in the notebook until you restart the kernel.
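The reason is that Python caches a module the first time it is imported. The sketch below shows the stale value and a manual refresh with the standard library's `importlib.reload`, which is roughly the step that autoreload takes care of for you. The module name `demo_settings` and its `THRESHOLD` value are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep cached bytecode out of this small demo

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "demo_settings.py"
    module_path.write_text("THRESHOLD = 15\n")
    sys.path.insert(0, tmp_dir)
    try:
        demo_settings = importlib.import_module("demo_settings")
        first_value = demo_settings.THRESHOLD

        # Edit the file on disk, as you would in a text editor
        module_path.write_text("THRESHOLD = 30\n")
        stale_value = demo_settings.THRESHOLD    # still the old value

        # Re-execute the module's source so the edit takes effect
        importlib.invalidate_caches()
        importlib.reload(demo_settings)
        fresh_value = demo_settings.THRESHOLD
    finally:
        sys.path.remove(tmp_dir)
```

The edit only becomes visible after the reload: `stale_value` is still 15, while `fresh_value` is 30.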

[back to TOC](#toc)

# Task 5: Table of Contents
The link to this section in the TOC is not functional.  

Look at how the links are created in the markdown cells for tasks 1-4 as well as the TOC, and create an active link to Task 5.


In [None]:
one_random_student(student_first_names)

In [None]:
stay_length_type_by_animal_type = pd.crosstab(animal_shelter_df['animal_type'], animal_shelter_df['stay_length'], normalize = 'columns')
stay_length_type_by_animal_type

In [None]:
import matplotlib.ticker as mtick

stay_length_type_by_animal_fig, stay_length_type_by_animal_ax = plt.subplots(figsize=(10, 6))

stay_length_type_by_animal_ax.set_title('Stay Lengths by Animal Type')
stay_length_type_by_animal_ax.set_ylabel('Percent of Sheltered Animals')
stay_length_type_by_animal_ax.set_xlabel('Stay Lengths')

stay_lengths = stay_length_type_by_animal_type.columns
cat_lengths = stay_length_type_by_animal_type.loc['Cat']*100
dog_lengths = stay_length_type_by_animal_type.loc['Dog']*100

dog_bar_plt = stay_length_type_by_animal_ax.bar(stay_lengths, dog_lengths )
cat_bar_plt = stay_length_type_by_animal_ax.bar(stay_lengths, cat_lengths, bottom = dog_lengths )

stay_length_type_by_animal_ax.legend([cat_bar_plt, dog_bar_plt], ['Cats', 'Dogs'], loc = 'upper left')
stay_length_type_by_animal_ax.yaxis.set_major_formatter(mtick.PercentFormatter())

plt.savefig("./images/stay_lengths_by_type.png", dpi=150)
plt.show()

### Seasonality of Intake and Exit Volume

The number of animal intakes typically peaks in the spring and bottoms out in the fall and winter. Animal exits follow a similar trend, but tend to lag behind intakes by about one month. This pattern does not seem to hold in 2020, possibly as a result of the COVID-19 pandemic.

In [None]:
months = animal_shelter_df['intake_year_month'].sort_values().unique()

# Counts of Intakes and exits by Month
intakes_by_month = animal_shelter_df[['intake_year_month', 'animal_id']].groupby('intake_year_month').count()
exits_by_month = animal_shelter_df[['outcome_year_month', 'animal_id']].groupby('outcome_year_month').count()

in_out_df = pd.DataFrame(intakes_by_month).rename(columns={'animal_id': 'Number of Intakes'})
in_out_df['Number of Exits'] = exits_by_month

# Create plot
in_out_fig, in_out_ax = plt.subplots(figsize=(10, 6))

in_out_ax.set_title('Intakes and Exits by Month')
in_out_ax.set_ylabel('Number of Animals')
in_out_ax.set_xlabel('Month')

in_plt, out_plt = in_out_ax.plot(in_out_df)

in_out_ax.legend([in_plt, out_plt], ['Number of Intakes', 'Number of Exits'], loc = 'lower left')
plt.xticks(months[::3], rotation = 70)
plt.grid(True)

plt.savefig("./images/in_out_by_month.png", dpi=150)
plt.show()
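The monthly counts behind this plot are simple tallies of `YYYY-MM` labels. A minimal sketch of the same idea in plain Python, using made-up labels rather than the AAC data:

```python
from collections import Counter

# Made-up 'YYYY-MM' labels, one per intake
toy_intake_months = ['2020-01', '2020-02', '2020-01', '2020-03', '2020-02', '2020-01']

# Tally how many intakes fall in each month
intakes_per_month = Counter(toy_intake_months)

# Sorting the labels as strings puts the months in calendar order
monthly_counts = sorted(intakes_per_month.items())
```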

### Seasonality of Sheltered Animal Counts

The total number of sheltered animals typically peaks in May of each year and then hits its lowest point around January. There is often a secondary peak sometime after May before the number of sheltered animals drops rapidly. The number of sheltered animals has dropped precipitously in 2020, likely as a result of COVID-19.

In [None]:
# Net Change in Sheltered Animal Counts
in_out_df['Change in Sheltered Animal Counts'] = in_out_df['Number of Intakes'] - in_out_df['Number of Exits']
in_out_df['Total Sheltered Animal Counts'] = in_out_df['Change in Sheltered Animal Counts'].sort_index().cumsum()
in_out_df['Total Sheltered Animal Counts'] = in_out_df['Total Sheltered Animal Counts'] + in_out_df['Total Sheltered Animal Counts'].min()

# Create Plot
shelter_count_fig, shelter_count_ax = plt.subplots(figsize=(10, 6))

shelter_count_ax.set_title('Sheltered Animal Counts by Month')
shelter_count_ax.set_ylabel('Number of Animals')
shelter_count_ax.set_xlabel('Month')

shelter_count_plt, shelter_change_plt = shelter_count_ax.plot(in_out_df[['Total Sheltered Animal Counts', 'Change in Sheltered Animal Counts']])

shelter_count_ax.legend([shelter_count_plt, shelter_change_plt], ['Total Sheltered Animal Counts', 'Change in Sheltered Animal Counts'], loc = 'upper left')
plt.xticks(months[::3], rotation = 70)
plt.grid(True)

plt.savefig("./images/sheltered_by_month.png", dpi=150)
plt.show()
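The running total is the cumulative sum of each month's net change (intakes minus exits). A minimal sketch of that calculation with made-up monthly totals:

```python
from itertools import accumulate

# Made-up monthly totals, in calendar order
toy_intakes = [120, 150, 90, 80]
toy_exits = [100, 130, 110, 95]

# Net change per month: positive when more animals arrive than leave
net_change = [i - e for i, e in zip(toy_intakes, toy_exits)]

# Running total of net changes, relative to the count before the first month
running_total = list(accumulate(net_change))
```

Because the count before the first month is unknown, a running total like this shows the shape of the trend rather than the exact number of animals in the shelter.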

## Conclusions

This analysis leads to three recommendations for improving operations of the Austin Animal Center:

- **Engage in targeted outreach campaigns for dogs that have been sheltered at AAC for more than 30 days.** While most dogs will have been placed after 30 days, this may help reduce the number of dogs that end up having extended stays, potentially requiring many more months of care.
- **Reduce current spending until the numbers of intakes and sheltered animals return to normal.** Given the reduced activity during this period, AAC should consider ways to temporarily reduce costs by changing space utilization or staffing.
- **Hire seasonal staff and rent temporary space for May through December.** To accommodate the high volume of intakes and number of sheltered animals in the spring and fall, AAC should leverage seasonal resources, rather than full-year ones. This will allow AAC to cut back on expenditures during the months when there is lower volume.

### Next Steps

Further analyses could yield additional insights to further improve operations at AAC:

- **Better prediction of animals that are likely to have long stays.** This modeling could use already available data, such as breed and intake condition.
- **Model need for medical support.** This modeling could predict the need for specialized personnel to address animals' medical needs, including neutering, using intake condition and sex data.
- **Predicting undesirable outcomes.** This modeling could identify animals that are more likely to have undesirable outcomes (e.g. Euthanasia) for targeted medical support or outreach.