# Misleading Data

Let's take a look at how legitimate data can be turned into misleading visuals.

## The Dataset

The data used in this notebook was gathered from the [FBI Crime Data Explorer](https://cde.ucr.cjis.gov/).

__Homicides in the United States__

__file:__ `../data/exports/cleaned_homicide_data.csv` 

Contains 10 years of reported homicides in the United States.

* `Year` - The year
* `Total_Count` - Total count of homicides for the corresponding year

__Robberies in the United States__

__file:__ `../data/exports/cleaned_robbery_data.csv`

Contains 10 years of reported robberies in the United States.

* `Year` - The year
* `Total_Count` - Total count of robberies for the corresponding year

__Note:__

Datasets were cleaned from their original download.

__Cleaning:__

* Renamed columns for consistency (removed white space and normalized casing).
* Filtered columns: kept the `United States` column; dropped `United States Clearances` and `United States Percent of Population Coverage`.
* Grouped the data by year to get the total count per year.
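
The cleaning steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual cleaning script: the raw layout (a `Year` column plus the three `United States...` columns, possibly with several rows per year), the helper name `clean_crime_export`, and the sample rows are all hypothetical. The output column names `year` and `count` are chosen only because the plotting cells in this notebook index the data that way.

```python
import pandas as pd


def clean_crime_export(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the cleaning steps listed above."""
    df = raw.copy()

    # 1. Normalize column names: trim white space, lowercase, spaces -> underscores
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # 2. Keep only the year and the national count;
    #    the clearance and coverage columns are dropped by not selecting them
    df = df[["year", "united_states"]].rename(columns={"united_states": "count"})

    # 3. Total count per year
    return df.groupby("year", as_index=False)["count"].sum()


# Made-up rows in an ASSUMED raw layout, for illustration only
raw = pd.DataFrame({
    " Year ": [2001, 2001, 2002],
    "United States": [10, 5, 7],
    "United States Clearances": [4, 2, 3],
    "United States Percent of Population Coverage": [90.0, 90.0, 91.0],
})

cleaned = clean_crime_export(raw)
print(cleaned)
```

Running the helper on the made-up rows leaves one row per year with the summed count, which is the shape the plotting cells expect.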

In [None]:
# Import
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load Data
file = "../data/exports/cleaned_homicide_data.csv"

homicides = pd.read_csv(file)

homicides

# Homicides in the United States

## Inspect the Visualization Below 👀

The cell below creates a visualization from the data that was imported.

* What stories does the graph tell you?
* How might the everyday person interpret this graph?
* What is wrong with the graph?

In [None]:
# Filtering Data
mask = (homicides["year"] >= 2019) & (homicides["year"] <= 2020)
homicides_filtered = homicides[mask]

# Set Figure
plt.figure(figsize=(10,6))

# Generate Visualization
sns.barplot(
    data=homicides_filtered,
    x="year",
    y="count",
    palette="Set1",
    hue="year",
    legend=False)

plt.ylim(12500, 23000)

# Set Labels
plt.title("Reported Homicides in the United States")
plt.ylabel("Count")
plt.xlabel("Year")

plt.show()

## Modify the Graph

* Modify the code in the cell below to fix this poor excuse for a data representation!
* What will you fix, and how will you fix it?
* How does the new graph compare to the previous graph?
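
One way to put a number on that comparison is to check how much taller one bar looks than the other against how much larger the value really is. The sketch below is a hypothetical helper (`apparent_vs_actual` is not part of this notebook), and the two counts are made-up placeholders, not values from the dataset. Swap in the real 2019 and 2020 counts from `homicides_filtered` and the y-axis minimum you are testing.

```python
def apparent_vs_actual(before: float, after: float, y_min: float) -> tuple[float, float]:
    """Return (actual % change, % change in visible bar height)
    for a bar chart whose y-axis starts at y_min."""
    actual = (after - before) / before * 100
    # Only the part of each bar above y_min is visible,
    # so the visible height is value - y_min
    apparent = (after - before) / (before - y_min) * 100
    return actual, apparent


# Placeholder counts for illustration only (NOT from the dataset)
before, after = 100.0, 120.0

actual, apparent_truncated = apparent_vs_actual(before, after, y_min=80.0)
_, apparent_zero = apparent_vs_actual(before, after, y_min=0.0)

print(f"actual change: {actual:.0f}%")
print(f"visible bar height change, axis starting at 80: {apparent_truncated:.0f}%")
print(f"visible bar height change, axis starting at 0: {apparent_zero:.0f}%")
```

With these placeholder numbers a 20% increase is drawn as a bar twice as tall when the axis starts at 80, while an axis starting at 0 shows the same 20%.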

In [None]:
############ Modify this Code ############
# Filtering Data
mask = (homicides["year"] >= 2019) & (homicides["year"] <= 2020)
homicides_filtered = homicides[mask]

# Set Figure
plt.figure(figsize=(10,6))

# Generate Visualization
sns.barplot(
    data=homicides_filtered,
    x="year",
    y="count",
    palette="Set1",
    hue="year",
    legend=False)

plt.ylim(12500, 23000)

# Set Labels
plt.title("Reported Homicides in the United States")
plt.ylabel("Count")
plt.xlabel("Year")

plt.show()

# Robberies in the United States

## Inspect the Visualization Below 👀

The cell below creates a visualization from the data that was imported.

* What stories does the graph tell you?
* How might the everyday person interpret this graph?
* What is wrong with the graph?

In [None]:
# Load Data
file = "../data/exports/cleaned_robbery_data.csv"

robberies = pd.read_csv(file)

robberies.info()

In [None]:
# Import Ticker
import matplotlib.ticker as ticker

# Set Filter
mask = (robberies["year"] >= 2015) & (robberies["year"] <= 2016)
robberies_filtered = robberies[mask]

# Set Figure
plt.figure(figsize=(10,6))

# Generate Plot
sns.lineplot(
    data=robberies_filtered,
    x="year",
    y="count",
    color="blue")

plt.ylim(3000, 20000)

# Minor Modification to Keep the X-Axis as Integers
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))

# Set Title and Labels
plt.title("Reported Robberies in the United States")
plt.xlabel("Year")
plt.ylabel("Count")

plt.show()

## Modify the Graph

* Modify the code in the cell below to fix this graph.
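
For a line plot, one quick check is how much of the y-axis the plotted values actually occupy: a fraction near zero means the axis limits flatten (or hide) the change, while a fraction near one means the limits hug the data and make the slope look as steep as possible. The sketch below is a hypothetical helper (`axis_fill_fraction` is not part of this notebook) run on made-up placeholder values, not on the robbery data. Whether either extreme applies here depends on the actual counts in `robberies_filtered`.

```python
def axis_fill_fraction(values, y_min, y_max):
    """Fraction of the y-axis height spanned by the plotted values."""
    # Clip each value to the axis limits, since anything outside is not drawn
    visible = [min(max(v, y_min), y_max) for v in values]
    return (max(visible) - min(visible)) / (y_max - y_min)


# Placeholder values for illustration only (NOT from the dataset)
values = [100.0, 110.0]

wide = axis_fill_fraction(values, y_min=0, y_max=1000)
tight = axis_fill_fraction(values, y_min=95, y_max=115)

print(f"wide axis:  data spans {wide:.0%} of the axis height")
print(f"tight axis: data spans {tight:.0%} of the axis height")
```

The same ten-unit change fills 1% of the wide axis and 50% of the tight one, so the line looks nearly flat in one plot and steep in the other.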

In [None]:
############ Modify this Code ############

# Set Filter
mask = (robberies["year"] >= 2015) & (robberies["year"] <= 2016)
robberies_filtered = robberies[mask]

# Set Figure
plt.figure(figsize=(10,6))

# Generate Plot
sns.lineplot(
    data=robberies_filtered,
    x="year",
    y="count",
    color="blue")

plt.ylim(3000, 20000)

# Minor Modification to Keep the X-Axis as Integers
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))

# Set Title and Labels
plt.title("Reported Robberies in the United States")
plt.xlabel("Year")
plt.ylabel("Count")

plt.show()
############ Modify this Code ############