# Solving Crimes with Data Science
<br/>
<br/>
<br/>
<br/>
<br/>
<b>Markus Harrer</b>
  
`@feststelltaste`


<small>PyDay 1, 19 December 2019</small>

<img src="resources/innoq_logo.jpg" width=20% height="20%" align="right"/>


## Facts of the case
1. A white mini bus with a red "g" sign on the side window was stolen
1. The police ran an innovative mobile phone investigation
1. Only a phone number of unknown identity is left

Our approach: Where is the place of residence of the phone number's owner?

## What do we have?

CDRs (Call Data Records) in an Excel file!

That means: Information about the cell towers used for the phone calls!



## Import and Load
Using pandas to read an Excel file into a DataFrame.

In [None]:
import pandas as pd
cdr = pd.read_excel("cdr_data.xlsx")
cdr.head()

Let's look into the dataset

In [None]:
cdr.info()

## Data Cleaning
Convert the text in `Start` to a datetime data type

In [None]:
cdr['Start'] = pd.to_datetime(cdr['Start'])
cdr.head()
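
A minimal sketch of what the conversion does, using made-up timestamps (only the column name `Start` comes from the notebook). It assumes the real file might contain entries that cannot be parsed; `errors="coerce"` turns those into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical sample values; only the column name `Start` comes from the notebook.
raw = pd.Series(
    ["2010-12-25 23:58:31", "2010-12-26 05:14:02", "not a date"],
    name="Start",
)

# errors="coerce" turns unparseable entries into NaT instead of raising.
cleaned = pd.to_datetime(raw, errors="coerce")

print(cleaned)
print("Unparseable rows:", cleaned.isna().sum())
```

Counting the `NaT` values after the conversion is a quick way to see how many rows would be lost in later time-based filters.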

## Filtering
We know the suspect's phone number (4638472273). We also want to keep only the incoming calls (for these, the `TowerID` is the `Caller`'s tower).

In [None]:
suspect = cdr[cdr['Caller'] == 4638472273]
suspect = suspect[suspect['Event'] == 'Incoming']
suspect.head()
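
The two filter steps can also be written as one combined boolean mask. This is a sketch on made-up records: the column names follow the notebook, while the rows, the second phone number and the tower labels are hypothetical.

```python
import pandas as pd

# Hypothetical toy records; column names follow the notebook, values are made up.
toy_cdr = pd.DataFrame({
    "Caller": [4638472273, 4638472273, 1111111111],
    "Event": ["Incoming", "Outgoing", "Incoming"],
    "TowerID": ["t1", "t2", "t3"],
})

suspect_number = 4638472273

# Combine both conditions with & (each condition needs its own parentheses).
mask = (toy_cdr["Caller"] == suspect_number) & (toy_cdr["Event"] == "Incoming")
toy_suspect = toy_cdr[mask]

print(toy_suspect)
```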

## But...

Unfortunately: Information about the towers' locations is missing!

We need a second data source from the DARKNET!

## Load another dataset
This time: a CSV (comma-separated values) file

In [None]:
towers = pd.read_csv(
    "darknet.io/hacks/infrastructure/mobile_net/texas_towers.csv",
    index_col=0)
towers.head()

## Data Enrichment
Bringing datasets together by joining them

In [None]:
suspect_loc = suspect.join(towers, on='TowerID')
suspect_loc.head()
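
A small sketch of how `join` with `on='TowerID'` behaves, using made-up towers and coordinates. `DataFrame.join` performs a left join by default, so a call whose tower is missing from the second dataset stays in the result with empty coordinates. Whether that happens in the real data is an assumption worth checking.

```python
import pandas as pd

# Hypothetical calls; "zz" is a tower that is missing from the tower table.
calls = pd.DataFrame({"TowerID": ["a", "b", "zz"]})

# Hypothetical tower table, indexed by TowerID (like read_csv with index_col=0).
towers_toy = pd.DataFrame(
    {"TowerLat": [32.9, 32.7], "TowerLon": [-96.8, -96.9]},
    index=pd.Index(["a", "b"], name="TowerID"),
)

# Matches the TowerID column of `calls` against the index of `towers_toy`.
joined = calls.join(towers_toy, on="TowerID")
print(joined)

# Rows without a known location would break later steps such as clustering.
located = joined.dropna(subset=["TowerLat", "TowerLon"])
print(len(joined) - len(located), "call(s) without a tower location")
```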

## Explore the datapoints
Let's take a look at the locations

In [None]:
%matplotlib inline
suspect_loc.plot.scatter('TowerLon', "TowerLat");

## Modeling
Make some hypotheses about the suspect's home

#### Hypothesis 1: Home at weekends

In [None]:
# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() replaces it
suspect_loc['DoW'] = suspect_loc['Start'].dt.day_name()
suspect_loc.head()

In [None]:
suspect_on_weekend = suspect_loc[suspect_loc['DoW'].isin(['Saturday', 'Sunday'])].copy()
suspect_on_weekend.head()

Let's take a look at the weekend's locations

In [None]:
suspect_on_weekend.plot.scatter('TowerLon', "TowerLat");

#### Hypothesis 2: Sleeping at night

In [None]:
suspect_on_weekend['hour'] = suspect_on_weekend['Start'].dt.hour
suspect_on_weekend.head()

Keep only the sleeping hours

In [None]:
suspect_on_weekend_nights = suspect_on_weekend[
    (suspect_on_weekend['hour'] < 6)  | (suspect_on_weekend['hour'] > 22)]
suspect_on_weekend_nights.head()
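
Both hypotheses can be checked on a handful of made-up timestamps. This sketch uses `dt.dayofweek` (Monday is 0, Sunday is 6) as an alternative to comparing day names, and the same hour limits as above. The dates are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps, chosen to cover all four combinations.
toy = pd.DataFrame({"Start": pd.to_datetime([
    "2019-12-14 23:30:00",  # Saturday, late evening
    "2019-12-15 03:00:00",  # Sunday, night
    "2019-12-15 14:00:00",  # Sunday, afternoon
    "2019-12-16 02:00:00",  # Monday, night
])})

# Saturday is 5 and Sunday is 6.
is_weekend = toy["Start"].dt.dayofweek >= 5

# Same limits as in the notebook: before 6:00 or from 23:00 onwards.
hour = toy["Start"].dt.hour
is_night = (hour < 6) | (hour > 22)

weekend_nights = toy[is_weekend & is_night]
print(weekend_nights)
```

Note that `hour > 22` only matches calls from 23:00 onwards. Widening the window (for example `hour >= 22`) is a modeling decision.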

Let's see where the suspect sleeps on weekends

In [None]:
ax = suspect_on_weekend_nights.plot.scatter('TowerLat', 'TowerLon');

#### Clustering
* Cell phones connect to various towers over time
* Therefore: Find the main center of sleeping activities

In [None]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 1)
data = suspect_on_weekend_nights[['TowerLat', 'TowerLon']]
kmeans.fit_predict(data)
centroids = kmeans.cluster_centers_
centroids
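
With a single cluster, the k-means objective (sum of squared distances to the centroid) is minimized by the column-wise mean. The centroid can therefore be cross-checked without scikit-learn. The sketch below uses made-up coordinates, and the comparison with the median is an addition for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical tower coordinates: three close together, one far away.
points = pd.DataFrame({
    "TowerLat": [32.70, 32.72, 32.71, 32.90],
    "TowerLon": [-96.80, -96.82, -96.81, -96.60],
})

# For one cluster, the k-means centroid is the mean of each column.
mean_center = points.mean().to_numpy()

# The median is less sensitive to the far-away tower.
median_center = points.median().to_numpy()


def sum_squared_distances(center):
    """Sum of squared distances from all points to `center`."""
    return float(((points.to_numpy() - center) ** 2).sum())


print("mean:  ", mean_center, sum_squared_distances(mean_center))
print("median:", median_center, sum_squared_distances(median_center))
```

The mean always has the smaller sum of squared distances, but a single distant tower pulls it away from the dense group. If the real data contains such outliers, the median may be the more plausible estimate of the home location.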

Let's plot the main center of sleeping activity

In [None]:
ax.scatter(x = centroids[:, 0], y = centroids[:, 1], c = 'r', marker = 'x').figure

## Results
Let's check the result in Google Maps!

In [None]:
print("https://www.google.com/maps/search/?api=1&query={},{}".format(centroids[0][0], centroids[0][1]))
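
The link can also be built with the standard library, which takes care of URL-encoding the comma. This is a sketch: the helper name `maps_search_url` and the example coordinates are hypothetical, and the query expects latitude first, then longitude.

```python
from urllib.parse import urlencode


def maps_search_url(lat: float, lon: float) -> str:
    """Build a Google Maps search link for a latitude/longitude pair."""
    # Six decimal places are more than enough for a cell tower location.
    query = urlencode({"api": 1, "query": f"{lat:.6f},{lon:.6f}"})
    return "https://www.google.com/maps/search/?" + query


# Hypothetical coordinates, latitude first.
print(maps_search_url(32.5, -96.5))
```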

## Discussion: How realistic do you think this kind of investigation is?

<div align="center"><img src="resources/switch_news.png" width="70%" /></div>

<small>https://gonintendo.com/stories/351366-switch-leads-to-break-in-murder-case</small>

## Thanks!

Any questions?

<br/>
  
<b>Contact</b>

Markus Harrer  
  
markus.harrer@innoq.com  
`@feststelltaste`  
https://feststelltaste.de

<img src="resources/innoq_logo.jpg" width=20% height="20%" align="right"/>

## Appendix: Tools used
- Jupyter Notebook
- Python
- pandas
- matplotlib
- scikit-learn

## Appendix: Getting started
My recommendations

1. https://www.feststelltaste.de/top5-jupyter/
1. https://www.feststelltaste.de/top5-python/
1. https://www.feststelltaste.de/top5-pandas/    

## Appendix: Credits

This presentation is based on data and ideas from the online course "Programming with Python for Data Science": https://www.edx.org/course/programming-with-python-for-data-science

## Appendix: Run this notebook

GitHub-Repository: https://github.com/feststelltaste/SolvingCrimesWithDataScience

<img src="resources/qrcode.png" />