# Skills challenge \#1
Below is a series of questions. Use the loaded data to answer them. You will almost certainly need to import more packages (`pandas`, `numpy`, etc.) to complete these. You are welcome to use any source except your classmates. So Google away!

You will be graded on both the **correctness** and **cleanliness** of your work, so don't submit poorly written code or your grade will reflect that. Use Markdown to describe what you have done. If you get stuck, move on to another part. Most questions don't rely on the answers to earlier questions.

### Imports

```python
import pandas as pd
import numpy as np
```

### Data loading

```python
df = pd.read_csv('../data/2016_austin_crime.csv')
```

```python
df.head()
```

| Unnamed: 0 | GO Primary Key | Council District | GO Highest Offense Desc | Highest NIBRS/UCR Offense Description | GO Report Date | GO Location | Clearance Status | Clearance Date | GO District | GO Location Zip | GO Census Tract | GO X Coordinate | GO Y Coordinate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 201610188.0 | 8.0 | AGG ASLT ENHANC STRANGL/SUFFOC | Agg Assault | 1-Jan-16 | 8600 W SH 71 ... | C | 12-Jan-16 | D | 78735.0 | 19.08 | 3067322.0 | 10062796.0 |
| 1 | 201610643.0 | 9.0 | THEFT | Theft | 1-Jan-16 | 219 E 6TH ST ... | C | 4-Jan-16 | G | 78701.0 | 11.0 | 3114957.0 | 10070462.0 |
| 2 | 201610892.0 | 4.0 | AGG ROBBERY/DEADLY WEAPON | Robbery | 1-Jan-16 | 701 W LONGSPUR BLVD ... | N | 3-May-16 | E | 78753.0 | 18.23 | 3129181.0 | 10106923.0 |
| 3 | 201610893.0 | 9.0 | THEFT | Theft | 1-Jan-16 | 404 COLORADO ST ... | N | 22-Jan-16 | G | 78701.0 | 11.0 | 3113643.0 | 10070357.0 |
| 4 | 201611018.0 | 4.0 | SEXUAL ASSAULT W/ OBJECT | Rape | 1-Jan-16 |  | C | 10-Mar-16 | E | 78753.0 | 18.33 |  |  |


### Data description

This dataset contains all the crimes recorded by the Austin PD in 2016. The columns that we are interested in are:
- **Council District**: The district in which the crime was committed ([map of districts](https://www.austinchronicle.com/binary/35e1/pols_feature51.jpg))
- **GO Highest Offense Desc**: A text description of the offense using the APD description
- **Highest NIBRS/UCR Offense Description**: A text description using the FBI description
- **GO Report Date**: The date on which the crime was reported
- **Clearance Status**: Whether or not the crime was "cleared" (i.e. the case was closed due to an arrest)
- **Clearance Date**: When the crime was cleared
- **GO Location Zip**: The zip code where the crime occurred

## Tasks

### Data cleaning
**DC1:** Drop all columns that are not in the list above. Save this back as the variable `df`.
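One way to approach this, sketched on a one-row toy frame rather than the real file: list the columns you want to keep and select them, which is often less error-prone than listing everything to drop. The names `KEEP_COLUMNS` and `toy` are illustrative assumptions; the row values are copied from the first row of the `df.head()` output above.

```python
import pandas as pd

# Columns named in the data description; everything else is discarded.
KEEP_COLUMNS = [
    "Council District",
    "GO Highest Offense Desc",
    "Highest NIBRS/UCR Offense Description",
    "GO Report Date",
    "Clearance Status",
    "Clearance Date",
    "GO Location Zip",
]

# One-row stand-in for the real data (hypothetical variable name).
toy = pd.DataFrame({
    "GO Primary Key": [201610188.0],
    "Council District": [8.0],
    "GO Highest Offense Desc": ["AGG ASLT ENHANC STRANGL/SUFFOC"],
    "Highest NIBRS/UCR Offense Description": ["Agg Assault"],
    "GO Report Date": ["1-Jan-16"],
    "Clearance Status": ["C"],
    "Clearance Date": ["12-Jan-16"],
    "GO District": ["D"],
    "GO Location Zip": [78735.0],
})

# Selecting with a list keeps only those columns, in the listed order.
toy = toy[KEEP_COLUMNS]
print(list(toy.columns))
```

Selecting by name also fails loudly with a `KeyError` if a column is misspelled, which is usually what you want here.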

**DC2:** Rename the columns to be all lowercase, replace spaces with underscores ("_"), and remove "GO" from all column names. Finally, make sure there are no spaces at the start or finish of a column name. For example, ``'  my_col '`` should be renamed to `'my_col'` (notice that the spaces are gone), and "GO Report Date" should become "report_date". Rename "Highest NIBRS/UCR Offense Description" to "fbi_desc", and "GO Highest Offense Desc" to "apd_desc".
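The renaming rules can be sketched as a single helper passed to `DataFrame.rename`. This is one possible approach, not the only acceptable one; `EXPLICIT_NAMES`, `clean_column_name`, and `toy` are hypothetical names, and the frame is an empty stand-in that only carries column labels.

```python
import pandas as pd

# The two renames DC2 spells out; all other columns follow the general rule.
EXPLICIT_NAMES = {
    "Highest NIBRS/UCR Offense Description": "fbi_desc",
    "GO Highest Offense Desc": "apd_desc",
}

def clean_column_name(name):
    """Return the cleaned version of a single column name."""
    stripped = name.strip()
    if stripped in EXPLICIT_NAMES:
        return EXPLICIT_NAMES[stripped]
    # Drop "GO" only when it is a separate leading word.
    if stripped.startswith("GO "):
        stripped = stripped[len("GO "):]
    return stripped.strip().lower().replace(" ", "_")

# Empty frame: only the column labels matter for this sketch.
toy = pd.DataFrame(columns=[
    "GO Report Date", "  my_col ", "GO Highest Offense Desc", "Council District",
])
toy = toy.rename(columns=clean_column_name)
print(list(toy.columns))
# ['report_date', 'my_col', 'apd_desc', 'council_district']
```

Checking for the prefix `"GO "` rather than replacing every `"GO"` avoids mangling a column whose name merely contains those letters.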

**DC3:** For each column, print how many `None` or `NaN` values are in the column, along with what percentage of the rows are missing. Round the percentage to two decimal places. Your output should look like:

```
col1_name: 20 (0.05%) missing values 
col2_name: 150 (1.56%) missing values 
```
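A minimal sketch of how such a report might be built, assuming a small made-up frame in place of the crime data. `missing_report` and `toy` are illustrative names; `isna()` treats both `None` and `NaN` as missing.

```python
import numpy as np
import pandas as pd

def missing_report(frame):
    """Return one formatted line per column describing its missing values."""
    n_rows = len(frame)
    lines = []
    for col in frame.columns:
        n_missing = int(frame[col].isna().sum())
        # Guard against an empty frame to avoid dividing by zero.
        pct = round(100 * n_missing / n_rows, 2) if n_rows else 0.0
        lines.append(f"{col}: {n_missing} ({pct:.2f}%) missing values")
    return lines

# Made-up values: one NaN in "zip", two None entries in "status".
toy = pd.DataFrame({
    "zip": [78701.0, np.nan, 78753.0, 78735.0],
    "status": ["C", "N", None, None],
})
for line in missing_report(toy):
    print(line)
# zip: 1 (25.00%) missing values
# status: 2 (50.00%) missing values
```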

**DC4:** Drop any rows which have any missing values. Save the result back to `df`.

**DC5:** For any column which is a `float`, check if the numbers really are floats (i.e. is there a reason they're a decimal?). If they're not really decimals (for instance, if all of them have .0 at the end), then convert the column to integers.
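DC4 and DC5 are related: a column containing `NaN` is stored as `float` even when every real value is a whole number, so the conversion only becomes possible once missing rows are gone. The sketch below shows one way to test for "really a float" on made-up data; `floats_to_ints` and `toy` are hypothetical names.

```python
import pandas as pd

def floats_to_ints(frame):
    """Return a copy where float columns holding only whole numbers become int64."""
    result = frame.copy()
    for col in result.columns:
        values = result[col]
        if not pd.api.types.is_float_dtype(values):
            continue
        # Convert only when nothing is missing and every value is whole.
        if values.notna().all() and (values % 1 == 0).all():
            result[col] = values.astype("int64")
    return result

# Made-up frame: "zip" is whole-valued, "census_tract" has real decimals.
toy = pd.DataFrame({
    "zip": [78701.0, 78753.0, float("nan")],
    "census_tract": [11.0, 18.23, 19.08],
    "status": ["C", "N", "C"],
})
toy = toy.dropna()              # DC4: drop rows with any missing value
converted = floats_to_ints(toy)  # DC5: convert where it is safe
print(converted.dtypes)
```

Here `zip` ends up as `int64`, while `census_tract` stays `float64` because values such as 18.23 carry real decimals.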

### Data exploration
**DE1:** Print out each district, along with what percentage of the crimes occurred in it.

**DE2:** Do the same for each zip code.

**DE3:** Print what percentage of crimes were cleared and what percentage were not.

**DE4:** Do the same for crimes by the FBI description (so percentage of each type of crime).
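All four exploration tasks follow the same pattern, so a single helper may be enough. A minimal sketch, assuming the cleaned column names from DC2 and a four-row made-up frame; `percent_breakdown` and `toy` are illustrative names.

```python
import pandas as pd

def percent_breakdown(series):
    """Return each distinct value's share of the rows, as a percentage."""
    # normalize=True gives fractions that sum to 1; scale them to percentages.
    return (series.value_counts(normalize=True) * 100).round(2)

# Made-up rows standing in for the cleaned crime data.
toy = pd.DataFrame({
    "council_district": [9, 9, 4, 8],
    "clearance_status": ["C", "N", "N", "N"],
})

for district, pct in percent_breakdown(toy["council_district"]).items():
    print(f"District {district}: {pct}%")

for status, pct in percent_breakdown(toy["clearance_status"]).items():
    print(f"Status {status}: {pct}%")
```

The same call works for the zip code and FBI description columns; only the column name changes.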

### Bonus questions
**B1:** Create a dictionary (Python `dict`) that has the FBI description as the key and a list of all APD descriptions that map to it as the values. So for example, it may look like `{'Theft': ['THEFT FROM BUILDING', 'THEFT', ...], 'Robbery': ['AGG ROBBERY/DEADLY WEAPON', 'PURSE SNATCHING', ...]}`. 
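One possible way to build this mapping is to group by the FBI description and collect the unique APD descriptions in each group. The sketch assumes the renamed columns `fbi_desc` and `apd_desc` from DC2 and uses a small made-up frame; `toy` and `fbi_to_apd` are hypothetical names.

```python
import pandas as pd

# Made-up rows; the duplicate "THEFT" shows that repeats are collapsed.
toy = pd.DataFrame({
    "fbi_desc": ["Theft", "Theft", "Robbery", "Theft"],
    "apd_desc": ["THEFT", "THEFT FROM BUILDING", "AGG ROBBERY/DEADLY WEAPON", "THEFT"],
})

# Iterating over a grouped column yields (group key, Series) pairs.
fbi_to_apd = {
    fbi: sorted(group.unique())
    for fbi, group in toy.groupby("fbi_desc")["apd_desc"]
}
print(fbi_to_apd)
# {'Robbery': ['AGG ROBBERY/DEADLY WEAPON'], 'Theft': ['THEFT', 'THEFT FROM BUILDING']}
```

Sorting each list is a design choice that makes the output deterministic; the task itself does not require a particular order.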

**B2:** Write a function which allows a person to type in an FBI description, and the function returns a dictionary with the following summary:
- Number of crimes committed with that description.
- Percentage of crimes committed with that description. Leave it as a float between 0 and 1.
- The percentage of crimes with that description which were "cleared" (clearance status of "C").
- The zip in which the crime occurred most often.
- The district in which the crime occurred most often.

The function should still work even if the person types in the FBI description with incorrect capitalization or spacing. So for instance, if the FBI description is "Theft", then any of the following should still work:
- 'Theft'
- 'THEFT'
- 'theft'
- 'thEFt'
- '    theft'
- '    THeft   '
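A sketch of how such a function might look, assuming the cleaned column names `fbi_desc`, `clearance_status`, `location_zip`, and `council_district` that follow from the DC2 rules. The function name, the dictionary keys, and the toy data are all illustrative assumptions. This version normalizes capitalization and leading or trailing spaces only; it does not try to repair spacing inside a description.

```python
import pandas as pd

def summarize_crime(frame, fbi_desc):
    """Summarize the rows whose FBI description matches, ignoring case and outer spaces."""
    wanted = fbi_desc.strip().lower()
    normalized = frame["fbi_desc"].str.strip().str.lower()
    subset = frame[normalized == wanted]
    if subset.empty:
        raise ValueError(f"No crimes found for description {fbi_desc!r}")
    return {
        "count": len(subset),
        # Fraction of all crimes, left as a float between 0 and 1.
        "share_of_all": len(subset) / len(frame),
        "share_cleared": float((subset["clearance_status"] == "C").mean()),
        # mode() returns the most frequent value(s); take the first one.
        "top_zip": int(subset["location_zip"].mode().iloc[0]),
        "top_district": int(subset["council_district"].mode().iloc[0]),
    }

# Made-up rows standing in for the cleaned crime data.
toy = pd.DataFrame({
    "fbi_desc": ["Theft", "Theft", "Robbery", "Theft"],
    "clearance_status": ["C", "N", "N", "C"],
    "location_zip": [78701, 78701, 78753, 78735],
    "council_district": [9, 9, 4, 8],
})
summary = summarize_crime(toy, "    THeft   ")
print(summary)
```

Raising an error for an unknown description is one design choice; returning an empty summary would also be defensible, as long as the behavior is documented.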