![](https://images.unsplash.com/photo-1570541510210-02630d54258b?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1050&q=80)

In the first part of this notebook we'll see which categories of crime exhibited the greatest year-over-year decrease in the lockdown year. In the second part we'll see which month generally has the greatest number of motor vehicle thefts. As a bonus, we'll see which community area of Chicago saw the biggest decrease in crime. But let's start by importing the necessary tools from bigquery and bq_helper and looking at the column descriptions. I'll comment on my work in both English and German.

*Im ersten Teil dieses Notebooks werden wir sehen, welche Kriminalitätskategorien die größte Veränderung im Lockdownjahr hatten. Im zweiten Teil beantworten wir die Frage, welcher Monat die höchste Zahl von Autodiebstählen hat. Zum Schluss sehen wir, welche Bezirke Chicagos den stärksten Rückgang der Kriminalität hatten. Aber wir starten mit dem Importieren von Werkzeugen aus bigquery und bq_helper. Unterwegs kommentiere ich auf Deutsch und Englisch.*

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client()

# Construct a reference to the "chicago_crime" dataset
dataset_ref = client.dataset("chicago_crime", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

# Construct a reference to the "crime" table
table_ref = dataset_ref.table("crime")

# API request - fetch the table
table = client.get_table(table_ref)

# Preview the first five lines of the table
client.list_rows(table, max_results=5).to_dataframe()

In [None]:
import bq_helper
from bq_helper import BigQueryHelper
bq_assistant = BigQueryHelper("bigquery-public-data", "chicago_crime")

In [None]:
bq_assistant.table_schema("crime")

In [None]:
def show_amount_of_data_scanned(query):
    # dry_run lets us see how much data the query uses without running it
    dry_run_config = bigquery.QueryJobConfig(dry_run=True)
    query_job = client.query(query, job_config=dry_run_config)
    print('Data processed: {} GB'.format(round(query_job.total_bytes_processed / 10**9, 3)))
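The dry-run estimate can also serve as a guard before a query is actually run. The following is a minimal sketch in plain Python; the helper names `bytes_to_gb` and `within_budget` and the 1 GB budget are assumptions made for illustration and are not part of the original notebook or of the BigQuery client.

```python
def bytes_to_gb(total_bytes):
    """Convert bytes to decimal gigabytes, using the same 10**9 divisor as above."""
    return round(total_bytes / 10**9, 3)


def within_budget(total_bytes, budget_gb=1.0):
    """Return True when a dry-run estimate fits a (hypothetical) per-query budget."""
    return bytes_to_gb(total_bytes) <= budget_gb


# Hypothetical dry-run estimates in bytes, not measured on the real table
print(bytes_to_gb(250_000_000))
print(within_budget(250_000_000))
print(within_budget(3_500_000_000))
```

In the notebook, the value passed in would be `query_job.total_bytes_processed` from a dry-run job.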

# Part 1 / Teil 1

**What categories of crime exhibited the greatest year-over-year decrease in the lockdown year?**

***Welche Kriminalitätskategorien hatten die größte Veränderung zwischen 2019 und 2020?***

I set some thresholds in my first query: for example, a crime category is kept only if it has more than 1000 arrests in 2019 or in 2020.

*Ich setze in meiner ersten Abfrage einige Schwellenwerte: Zum Beispiel muss die Zahl der Festnahmen in einer Kategorie 2019 oder 2020 über 1000 liegen.*
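The percentage change and the threshold that the query below expresses in SQL can be sketched in plain Python. The function names and the counts here are made up for illustration; only the arithmetic mirrors the query.

```python
def pct_change(count_2019, count_2020):
    """Year-over-year change in percent, relative to the 2019 count."""
    return (count_2020 - count_2019) / count_2019 * 100


def passes_threshold(count_2019, count_2020, minimum=1000):
    """Mirror the HAVING clause: 2019 count above 1, and either year above the minimum."""
    return count_2019 > 1 and (count_2019 > minimum or count_2020 > minimum)


# Made-up (category, arrests_2019, arrests_2020) rows, not taken from the dataset
rows = [("A", 2000, 1500), ("B", 40, 10), ("C", 900, 1200)]

kept = [
    (name, round(pct_change(n_2019, n_2020), 2))
    for name, n_2019, n_2020 in rows
    if passes_threshold(n_2019, n_2020)
]
print(kept)
```

Category "B" is dropped because neither year exceeds the minimum, which keeps tiny categories with volatile percentages out of the ranking.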

In [None]:
category_query = """SELECT
  primary_type,
  description,
  COUNTIF(year = 2019) AS arrests_2019,
  COUNTIF(year = 2020) AS arrests_2020,
  FORMAT('%3.2f', (COUNTIF(year = 2020) - COUNTIF(year = 2019)) / COUNTIF(year = 2019)*100) AS pct_change_2019_to_2020
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2019,
    2020)
GROUP BY
  primary_type,
  description
HAVING COUNTIF(year = 2019) > 1 AND (COUNTIF(year = 2019) > 1000 OR COUNTIF(year = 2020) > 1000)
ORDER BY
  ABS(COUNTIF(year = 2020) - COUNTIF(year = 2019)) / COUNTIF(year = 2019) DESC
        """
category = client.query(category_query).result().to_dataframe()

import pandas as pd

pd.set_option("display.max_rows", None, "display.max_columns", None)

category

The table above shows a large increase in some crime categories, but this seems to be a matter of bureaucracy. Just compare row 0 with row 5: both deal with the same type of crime. That's why I will continue with the "primary_type" column rather than the "description" column.

*Oben sehen wir, dass in manchen Kategorien die Zahl der Festnahmen im Lockdownjahr stark gestiegen ist. Aber das scheint eher mit Bürokratie zu tun zu haben, vergleichen Sie nur die Zeile 0 mit der Zeile 5. Deswegen arbeite ich weiter mit der Spalte primary_type und nicht mit der Spalte description.*
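To illustrate why grouping by "primary_type" is more robust, here is a small sketch with made-up rows in which one crime type is recorded under two different descriptions. The type and description labels are placeholders, not values from the dataset.

```python
from collections import defaultdict

# Made-up (primary_type, description, arrests_2019, arrests_2020) rows, for illustration only
rows = [
    ("TYPE_A", "old label", 1200, 100),
    ("TYPE_A", "new label", 10, 1100),
    ("TYPE_B", "other", 3000, 2500),
]

# Sum both years per primary_type, ignoring the description
totals = defaultdict(lambda: [0, 0])
for primary_type, _description, n_2019, n_2020 in rows:
    totals[primary_type][0] += n_2019
    totals[primary_type][1] += n_2020

totals = dict(totals)
print(totals)
```

At the description level, "new label" appears to explode and "old label" to collapse, while the combined TYPE_A total barely changes.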

Let's check the number of arrests.

*Sehen wir uns die Zahl der Festnahmen an.*

In [None]:
arrest_query = """SELECT
  COUNTIF(year = 2019) AS arrests_2019,
  COUNTIF(year = 2020) AS arrests_2020
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
    arrest = TRUE
    AND year IN (2019, 2020)
    """

arrest = client.query(arrest_query).result().to_dataframe()
arrest

In [None]:
type_query = """SELECT
  primary_type,
  COUNTIF(year = 2019) AS arrests_2019,
  COUNTIF(year = 2020) AS arrests_2020,
  FORMAT('%3.2f', (COUNTIF(year = 2020) - COUNTIF(year = 2019)) / COUNTIF(year = 2019)*100) AS pct_change_2019_to_2020
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2019,
    2020)
GROUP BY
  primary_type
HAVING COUNTIF(year = 2019) > 1 AND ((COUNTIF(year = 2019) > 365 OR COUNTIF(year = 2020) > 365))
ORDER BY
  ABS(COUNTIF(year = 2020) - COUNTIF(year = 2019)) / COUNTIF(year = 2019) DESC
        """
# Named crime_type to avoid shadowing Python's built-in type()
crime_type = client.query(type_query).result().to_dataframe()

crime_type

In [None]:
crime_type.plot.bar(x='primary_type', rot=90)

The chart above shows that the number of arrests decreased in almost every crime category in the lockdown year. The only exception is weapons violations.

*Auf der Grafik sehen wir, dass im Lockdownjahr die Zahl der Festnahmen in fast jeder Kategorie deutlich zurückgegangen ist. Verstöße gegen das Waffengesetz sind die einzige Ausnahme.*

# Part 2 / Teil 2

**Which month generally has the greatest number of motor vehicle thefts?**

***Welcher Monat hat die höchste Zahl von Autodiebstählen?***

In [None]:
month_query = """SELECT
  EXTRACT(MONTH FROM
      date) AS month,
      EXTRACT(YEAR
    FROM
      date) AS year,
  COUNT(*) AS incidents
FROM `bigquery-public-data.chicago_crime.crime`
  WHERE primary_type = 'MOTOR VEHICLE THEFT'
  GROUP BY
    year,
    month
ORDER BY
  incidents DESC
        """

show_amount_of_data_scanned(month_query)

In [None]:
month_query = """SELECT
  EXTRACT(MONTH FROM
      date) AS month,
      EXTRACT(YEAR
    FROM
      date) AS year,
  COUNT(*) AS incidents
FROM `bigquery-public-data.chicago_crime.crime`
  WHERE primary_type = 'MOTOR VEHICLE THEFT'
  GROUP BY
    year,
    month
ORDER BY
  incidents DESC
        """

month = client.query(month_query).result().to_dataframe()

month.head()

In [None]:
month.groupby('month')['incidents'].sum()

In [None]:
month.groupby('month')['incidents'].sum().idxmax()

In [None]:
month_summary = month.groupby('month')['incidents'].sum()

month_summary = month_summary.to_frame()

month_summary

In [None]:
month_summary.reset_index(level=0, inplace=True)

In [None]:
month_summary.sort_values(by='incidents', ascending=False)

In [None]:
import seaborn as sns

sns.barplot(x=month_summary['month'], y=month_summary['incidents'])

We see that July generally has the highest number of motor vehicle thefts, while October 2001 is the single month with the most thefts on record.

*Wir sehen, dass im Juli die meisten Autodiebstähle passieren und dass der Oktober 2001 die höchste Zahl von Autodiebstählen überhaupt hatte.*
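The two statements answer different questions: one sums each calendar month over all years, the other looks for the single busiest year-month pair. A minimal sketch with made-up numbers (not taken from the dataset) shows how the two can disagree:

```python
from collections import defaultdict

# Made-up (year, month, incidents) rows, for illustration only
rows = [(2001, 7, 90), (2001, 10, 120), (2002, 7, 95), (2002, 10, 60)]

# Question 1: which calendar month has the most incidents summed over all years?
by_month = defaultdict(int)
for _year, month_number, incidents in rows:
    by_month[month_number] += incidents
busiest_month_overall = max(by_month, key=by_month.get)

# Question 2: which single year-month pair has the most incidents?
peak_year, peak_month, _ = max(rows, key=lambda row: row[2])

print(busiest_month_overall, (peak_year, peak_month))
```

In this toy data July wins on the total, while October 2001 holds the single-month peak, which is the same pattern described above.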

In [None]:
month.groupby('month')['incidents'].max().idxmax()

# Bonus

In this part we'll check which community area of Chicago saw the biggest decrease in crime.

*In diesem Teil werden wir sehen, welche Bezirke Chicagos den stärksten Rückgang der Kriminalität hatten.*

In [None]:
community_query = """SELECT
  community_area,
  COUNT(*) AS incidents
FROM `bigquery-public-data.chicago_crime.crime`
  WHERE arrest = TRUE
    AND year = 2021
  GROUP BY
    community_area
ORDER BY
  incidents DESC
        """

show_amount_of_data_scanned(community_query)

In [None]:
community_query = """SELECT
  community_area,
  COUNT(*) AS arrests
FROM `bigquery-public-data.chicago_crime.crime`
  WHERE arrest = TRUE
    AND year = 2021
  GROUP BY
    community_area
ORDER BY
  arrests DESC
        """

community = client.query(community_query).result().to_dataframe()

community.head()

In [None]:
community2020_query = """SELECT
  community_area,
  COUNTIF(year = 2019) AS arrests_2019,
  COUNTIF(year = 2020) AS arrests_2020,
  FORMAT('%3.2f', (COUNTIF(year = 2020) - COUNTIF(year = 2019)) / COUNTIF(year = 2019)*100) AS pct_change_2019_to_2020
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2019,
    2020)
GROUP BY
  community_area
HAVING COUNTIF(year = 2019) > 1
ORDER BY
  ABS(COUNTIF(year = 2020) - COUNTIF(year = 2019)) / COUNTIF(year = 2019) DESC
        """
community2020 = client.query(community2020_query).result().to_dataframe()

community2020.head()

In [None]:
areas_names = pd.read_csv("../input/chicagocsv/chicago_area.csv", sep=';')

areas_names.columns

In [None]:
community2020_join = pd.merge(areas_names, community2020, left_on='dropoff_community_area', right_on="community_area")

del community2020_join['dropoff_community_area']

community2020_join.head()

In [None]:
community2020_join.dtypes

In [None]:
community2020_join["pct_change_2019_to_2020"] = pd.to_numeric(community2020_join["pct_change_2019_to_2020"])

In [None]:
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

community2020_join.plot.bar(x='community_name', y='pct_change_2019_to_2020', rot=90)

plt.gcf().set_size_inches(20, 10)

plt.title("Change in number of arrests, by Community Areas")
plt.ylabel("Difference between 2019 and 2020 (in percent)")
plt.xlabel("Chicago Community Areas")
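Because the query orders rows by the absolute change, increases and decreases are mixed at the top of the result. To read off the biggest decrease directly, the percentages can be converted to numbers and sorted by their signed value. The sketch below uses made-up area names and percentages; the values are strings because `FORMAT` returns strings in the query.

```python
# Made-up (community_name, pct_change_2019_to_2020) pairs, for illustration only
rows = [("Area A", "-55.10"), ("Area B", "12.40"), ("Area C", "-8.75")]

# Convert the formatted strings to floats, then sort ascending so the
# most negative change (the biggest decrease) comes first
numeric = [(name, float(pct)) for name, pct in rows]
by_decrease = sorted(numeric, key=lambda pair: pair[1])

print(by_decrease[0])
```

With the notebook's DataFrame, the equivalent step would be sorting `community2020_join` by `pct_change_2019_to_2020` in ascending order after the `pd.to_numeric` conversion.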