# Crime Rates in Boston

Imagine that you have been tasked with exploring the crime rates in different districts of the city of Boston using the police dataset. The specific districts we will be looking at are B2, B3, and D14. Some stats about the main communities within these districts are shown below:

**Mattapan and Roxbury (Districts B2 and B3)**
- Median income: $19,362
- Total population: 136,191
- Unemployment rate: 13.6%

**Middlesex (District D14)**
- Median income: $59,435
- Total population: 113,021
- Unemployment rate: 3.4%

The findings from this project will be useful in uncovering certain trends of crime throughout Boston and the communities within it. The information here could determine where the Boston government distributes certain resources for reducing crime. Good luck!

## Jupyter Notebook

First things first, let's get some terminology straight.
- The *language* we're working in is Python 3.7
- The *editor* we're using is Google Colab: the code runs on Google's servers, and the results show up in our browser
- This file is an interactive Python notebook, a `.ipynb` file. These are pretty special, and are also known as **Jupyter notebooks**.

Jupyter notebooks have a few special properties that make them ideal for working with data:
 - Code is organized into cells, which can be **code** or **markdown**
 - We can run the cells in **any order**. Try it out!
 - The value of the last expression in a cell is displayed automatically, no need to wrap it with `print()`

In [None]:
# Set a variable

In [None]:
# Return it without print

Anything you can do in Python, you can do here! 

1. Write a function that takes a string as input, and does something to it 
2. In a new cell, call the function and test it out

In [None]:
# Write a function here

In [None]:
# Call it here
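
One possible answer to the exercise above, as a minimal sketch. The function name `shout` and its behavior are made up for illustration; any function that takes a string works just as well.

```python
def shout(text):
    """Return the text in upper case with an exclamation mark added."""
    return text.upper() + "!"

# In a notebook, a bare expression on the last line of a cell is displayed
shout("hello boston")
```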

## Importing packages

We use the `pandas` package to easily work with data as tables.
<br>The `numpy` package allows us to work with some other special data types, like missing values.
<br><br>We'll import these under the aliases `pd` and `np`, just so it's easier to refer to them later on.

In [None]:
# Import pandas and numpy
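
A minimal sketch of the imports, followed by a tiny example of the missing-value marker mentioned above. The numbers in `values` are made up for illustration.

```python
import numpy as np
import pandas as pd

# np.nan is numpy's marker for a missing value
values = pd.Series([10.0, np.nan, 30.0])

# isna() flags the missing entries; sum() counts them
missing_count = values.isna().sum()
```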

## Importing data

For this semester, we'll typically work with data in *tabular* format, the type you'd be used to in an Excel spreadsheet. Data files saved in this format will usually have a `.csv` file ending, short for comma-separated values.

For example, a CSV file could look something like...

```
INCIDENT_NUMBER,OFFENSE_DESCRIPTION,DISTRICT,SHOOTING
PLTEST005,BURGLARY - RESIDENTIAL,B2,True
PLTEST003,INVESTIGATE PROPERTY,B2,False
PLTEST002,INVESTIGATE PROPERTY,B2,False
```

To import this, let's use the `pd.read_csv()` function:

In [None]:
# Read in the dataframe
url = 'https://raw.githubusercontent.com/dt3zjy/node/master/week-1/workshop/boston_crime.csv'

Here, we've saved the data to a `DataFrame` object named `crime`
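
A minimal sketch of the import step. To keep it runnable without a network connection, it reads a few made-up rows from a string instead of the real file; with the real data, you would pass `url` to `pd.read_csv()` in place of the `StringIO` object.

```python
import io
import pandas as pd

# Made-up sample rows for illustration only, not the real dataset
csv_text = """INCIDENT_NUMBER,OFFENSE_DESCRIPTION,DISTRICT,SHOOTING
PLTEST005,BURGLARY - RESIDENTIAL,B2,True
PLTEST003,INVESTIGATE PROPERTY,B2,False
PLTEST002,INVESTIGATE PROPERTY,B2,False
"""

# read_csv accepts a URL, a file path, or any file-like object
crime = pd.read_csv(io.StringIO(csv_text))
```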

In [None]:
# Check the type

DataFrames hold our data in little "spreadsheet"-like structures. Whatever manipulation you can think of doing to the data, a quick search will likely show you how to do it.

## Exploring dataframes

Let's take a look at the data. We'll use the `.head()` method to display the first 5 rows

In [None]:
# Take a peek at the data

How big is the dataset? `.shape` returns a tuple with the dimensions as (rows, columns)

In [None]:
# Show shape
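
A short sketch of `.head()` and `.shape` on a small made-up table (the rows here are invented for illustration, not taken from the real dataset):

```python
import pandas as pd

# Made-up rows for illustration only
crime = pd.DataFrame({
    "OFFENSE_DESCRIPTION": ["VANDALISM", "LARCENY", "VANDALISM",
                            "ASSAULT", "LARCENY", "VANDALISM"],
    "DISTRICT": ["B2", "B3", "D14", "B2", "D14", "B3"],
})

first_rows = crime.head()      # head() returns the first 5 rows by default
n_rows, n_cols = crime.shape   # shape is an attribute, so no parentheses
```

Note that `.head()` is a method and needs parentheses, while `.shape` is an attribute and does not.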

Let's try to understand our data a bit better. 
- How many different crimes are in the dataset? 

In [None]:
# Number of unique

- Which crime happens the most frequently?

In [None]:
# Value counts
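
Both questions above can be sketched with `.nunique()` and `.value_counts()`. The offense names below are made up for illustration; on the real data, you would call the same methods on `crime['OFFENSE_DESCRIPTION']`.

```python
import pandas as pd

# Made-up offense descriptions for illustration only
offenses = pd.Series(["VANDALISM", "LARCENY", "VANDALISM",
                      "ASSAULT", "LARCENY", "VANDALISM"])

n_unique = offenses.nunique()     # number of distinct values
counts = offenses.value_counts()  # occurrences per value, most frequent first
most_common = counts.idxmax()     # label with the highest count
```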

Show the most recent crime by sorting the dataframe:

In [None]:
# Sort values
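
A minimal sketch of sorting with `.sort_values()`. The column name `OCCURRED_ON_DATE` and the rows are assumptions made for illustration; check the column names in your own dataframe before copying this.

```python
import pandas as pd

# Made-up rows; the date column name is an assumption
crime = pd.DataFrame({
    "INCIDENT_NUMBER": ["A1", "A2", "A3"],
    "OCCURRED_ON_DATE": pd.to_datetime(["2019-03-01", "2020-06-15", "2018-11-30"]),
})

# ascending=False puts the latest date at the top
newest_first = crime.sort_values("OCCURRED_ON_DATE", ascending=False)
most_recent = newest_first.iloc[0]  # first row after sorting
```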

### Subsetting

Subsetting is a super helpful tool. We'll take a look at this in more depth next week, but for now, here are the basics:

We can filter rows from a dataframe based on some condition

- Show crimes that happened on `WASHINGTON ST`

In [None]:
# Subset by Washington St
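
A sketch of how row filtering works, using a small made-up table. A comparison on a column produces a Series of `True`/`False` values, and passing that Series inside square brackets keeps only the rows marked `True`.

```python
import pandas as pd

# Made-up rows for illustration only
crime = pd.DataFrame({
    "STREET": ["WASHINGTON ST", "BLUE HILL AVE", "WASHINGTON ST"],
    "DISTRICT": ["B2", "B3", "D14"],
})

# Comparing a column to a value gives one True/False per row
on_washington = crime["STREET"] == "WASHINGTON ST"

# Indexing with that Boolean Series keeps only the True rows
washington_crimes = crime[on_washington]
```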

How would you show only crimes north of the Museum of Fine Arts in Boston (Lat > `42.3394`)?

Hint: Use a comparison operator, just as you would in a Python `if` statement, mirroring the syntax above

In [None]:
# Your turn!

## Data Manipulation

What is the percentage of crimes where a shooting occurred?

In [None]:
# Finding percentage
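
One way to sketch the calculation, assuming the `SHOOTING` column holds `True`/`False` values as in the CSV example earlier. If the real column stores something else (text flags or missing values, say), it would need converting first. The values below are made up.

```python
import pandas as pd

# Made-up flags for illustration only
shooting = pd.Series([True, False, False, False])

# True counts as 1 and False as 0, so the mean is the share of True values
pct_shooting = shooting.mean() * 100
```

Here `pct_shooting` is 25.0, since one of the four made-up rows is `True`.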

## Visualization

First things first, let's import the package to help us visualize the data, `plotly`.

If this package isn't installed yet, we can install it using `!pip install plotly`. More on this in week 5.

In [None]:
# Import

Note that we're using a submodule of the broader `plotly` package, called `plotly.express` (usually imported as `px`). It simplifies a lot of the more difficult steps.

Plotly Express has a broad range of options to play with, so let's take a look at the documentation.
<br>Do a quick Google search to pull up the documentation for `px.histogram` OR run `px.histogram?` in a Jupyter cell

In [None]:
# Find more info on px.histogram

Let's look at the top ten most frequent crimes 

In [None]:
# Take a random 20% sample, then keep only its ten most frequent offenses
crime_sample = crime.sample(frac=.2)
top_ten = crime_sample['OFFENSE_DESCRIPTION'].value_counts().head(10).index
crime_sample = crime_sample[crime_sample['OFFENSE_DESCRIPTION'].isin(top_ten)]

In [None]:
# Histogram

Look at the stacked histogram. What does the data tell you?

### Geographic Plots

Let's put this data on a geographic plot. These crimes happen at certain PLACES, so let's see if we can find a trend.

- Just because there are more data points in a certain district, does that always mean more crime happens there?
    - Think about where the data is coming from. Is there another story behind it?

In [None]:
# Plot the sampled crimes on a map, leaving out district B2
# "open-street-map" tiles need no access token
fig = px.scatter_mapbox(
    crime_sample[crime_sample['DISTRICT'] != 'B2'],
    lat='Lat', lon='Long', mapbox_style="open-street-map", zoom=10.5,
    color='OFFENSE_DESCRIPTION', hover_name='STREET', hover_data=['STREET', 'SHOOTING'],
)
fig.show()

## Skim Over These Articles...
* [Overpolicing in Boston](https://www.wgbh.org/news/local-news/2020/06/12/black-people-made-up-70-percent-of-boston-police-stops-department-data-show)
* [What Policing Costs in Boston](https://www.vera.org/publications/what-policing-costs-in-americas-biggest-cities/boston-ma)

Do these articles cast a different light on the data?


## So, What's the Takeaway?
* Don't blindly take a dataset, analyze it, and jump to a conclusion
* **Think**: Where is the data coming from? *Why* is this trend being shown through the data?
* Look at the *whole* picture!
