# TPM034A Machine Learning for socio-technical systems 
## `Mini-project #1: Criminality Anticipation System (CAS)`

**Delft University of Technology**<br>
**Q2 2022**<br>
**Module manager:** Dr. Sander van Cranenburgh <br>
**Instructors:** Dr. Sander van Cranenburgh, Dr. Nadia Metoui, Dr. Amir Pooyan Afghari <br>
**TAs:**  Francisco Garrido Valenzuela & Lucas Spierenburg <br>

## `Learning objectives:`
This mini-project addresses LO3, LO4, LO5 and LO6 in the course.

After the course, students can:
1.	explain fundamental concepts of machine learning (ML).
2.	conceptually explain the workings of a selected number of ML models and eXplainable AI (XAI) techniques, and apply these to empirical data.
3.	**identify applications of ML and XAI techniques in real-world socio-technical systems.**
4.	**examine the impact of ML-based solutions and interventions on individuals, organisations, and society through XAI.**
5.	**conduct an in-depth analysis of a real-world socio-technical challenge, by applying ML and XAI to empirical data.**
6.	**reflect on the strengths and limitations of ML and XAI in real-world socio-technical systems.**

## `Project description`

### **Introduction**

The Dutch police allocate resources over time and space to prevent crime, or to be as close as possible to a crime when it happens. To support this allocation, the police use an algorithmic system. The system uses socio-demographic data (for example, average net income in municipalities), spatial data (for example, the proximity of shops and facilities in municipalities), and crime-related data (for example, the addresses of suspects of a specific crime) to predict whether a given municipality will have a low or a high risk of a certain type of crime occurring. The output of the system is presented as a heatmap of crime risk across municipalities in the Netherlands. Resources can then be allocated for the longer term, that is, police officers are assigned to high-risk municipalities for the next year.

*You are asked to assist the Dutch police in investigating the best allocation of resources.*<br>

### **Data**

You have access to the following datasets:
1. Crime data per municipality, buurt, and wijk (in 2016, 2017 and 2018)
2. Socio-economic attributes of municipality, buurt, and wijk
3. Spatial attributes of municipality, buurt, and wijk

You may assume that the socio-economic attributes and spatial data have not substantially changed from 2016 to 2018.
For an English translation of the column names, please check [this file](source/dictionary.csv).

### **Tasks and grading**

There are 8 tasks in this project. In total, 10 points can be earned for these 8 tasks. The weight per task is shown below.
1. **Data preparation:** construct data from multiple data sources, separate training and testing data, handle the missing data, handle outliers. [1 point]
2. **Data discovery and visualisation:** investigate the distribution of variables, the correlation between variables, etc. [1 point]
3. **Selection and application of a proper analytical technique:** create a regression or ML model to predict the high-risk areas from socio-economic and spatial data, at the wijk level in the Netherlands. [1 point]
4. **Model evaluation and output visualisation:** evaluate the predictive ability of the selected model and create heatmaps of model predictions. [1 point]
5. **Model explanation:** identify the top 5 factors that contribute most to crime risk in wijks in the Netherlands. [1 point]
6. **Reflection (a):** name two strengths and two limitations of using your selected model to predict high-risk wijks in the Netherlands. [2 points]
7. **Reflection (b):** discuss the impact of these strengths and limitations on individuals, organisations, and society. [2 points]
8. **Reflection (c):** propose an alternative potential solution to mitigate the most severe limitation. [1 point]
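
Tasks 1 and 3 require a target that separates high-risk from low-risk areas, but the assignment does not prescribe how to define it. The sketch below shows one possible definition, under loudly stated assumptions: risk is measured as theft counts per 1,000 residents, and an area is "high risk" when its rate lies above the median. The helper names `crime_rate_per_1000` and `label_high_risk` are hypothetical. The column names `TotalTheft_8` and `NumberofResidents_5` and the three sample rows are taken from the municipality-level `df.head()` output in this notebook; the wijk-level data may need different columns.

```python
from statistics import median


def crime_rate_per_1000(crimes, residents):
    """Crimes per 1,000 residents; None when an input is missing or residents is 0."""
    if crimes is None or not residents:
        return None
    return 1000.0 * crimes / residents


def label_high_risk(rows, crime_col="TotalTheft_8", pop_col="NumberofResidents_5"):
    """Attach 'rate' and a binary 'high_risk' flag (rate above the median) to each row."""
    rates = [crime_rate_per_1000(r.get(crime_col), r.get(pop_col)) for r in rows]
    known = [x for x in rates if x is not None]
    # Assumption: a median split; other cut-offs (e.g. top quartile) are equally defensible.
    threshold = median(known)
    labelled = []
    for row, rate in zip(rows, rates):
        if rate is None:
            continue  # Assumption: rows without a usable rate are dropped.
        labelled.append({**row, "rate": rate, "high_risk": int(rate > threshold)})
    return labelled, threshold


# Three municipalities copied from the sample output of df.head() in this notebook.
rows = [
    {"name": "Aa en Hunze", "NumberofResidents_5": 25243, "TotalTheft_8": 237.0},
    {"name": "Aalburg", "NumberofResidents_5": 13038, "TotalTheft_8": 130.0},
    {"name": "Aalsmeer", "NumberofResidents_5": 31299, "TotalTheft_8": 561.0},
]
labelled, threshold = label_high_risk(rows)
for row in labelled:
    print(row["name"], round(row["rate"], 2), row["high_risk"])
```

Normalising by population keeps large municipalities from being flagged purely because of their size. Whether a rate, a raw count, or another crime category is the right target is itself a modelling choice worth discussing in the reflection tasks.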


### **Grading criteria:**

For the first 5 tasks:<br>
**Correctness of methods and techniques (45%)**<br>
**Completeness (45%)**<br>
**Coding skills (10%)**

For tasks 6, 7 and 8:<br>
**Depth of critical thinking and creativity (60%)**<br>
**Completeness (40%)**

### **Submission**
When you finish the project, please submit the Jupyter Notebook file of your work to Brightspace and prepare a final presentation (including the results of the tasks) to be delivered on the presentation day.

This is a group project, so each group submits one Jupyter Notebook file. All members of the group are nevertheless expected to contribute. **Please include a short "Member contributions" statement in your final presentation, outlining who did what in the project.**

The deadline for submission is **13/01/2023**.

In [1]:
# Getting the data

## Use the method below to get the data for this project.
## Choose:
## Aggregation level (agg_level): 'GM', 'WK' or 'BU' (for gemeente, wijk or buurt)
## Year (year): 2016, 2017 or 2018
## Language (language): 'EN' or 'ND'
## Set add_geo = True to include the geometry column (returns a GeoDataFrame),
## or add_geo = False to get a plain DataFrame.
## The method takes around 2 minutes to run, so store the result in a variable or a file to avoid downloading it again.

import source.criminality as cmty

df = cmty.get_database(agg_level = 'GM', year = 2016, language = 'EN', add_geo = True)

Downloading crime data...
... Crime data done
Downloading socio economic data...
... Socio economic data done
Aggregating and mixing data...
... aggregating and mixing done
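
Because the download takes a couple of minutes, it can help to cache the result on disk. The following is a minimal sketch of one way to do that with the standard-library `pickle` module. The helper `load_or_fetch` and the cache path are hypothetical and not part of the `source.criminality` module; the commented usage assumes the `cmty.get_database` call shown above.

```python
import pickle
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return the object cached at cache_path, or call fetch() once and cache its result."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    result = fetch()  # the slow step, e.g. the download described above
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


# Hypothetical usage with the loader from this notebook (not executed here):
# df = load_or_fetch(
#     "cache/GM_2016_EN.pkl",
#     lambda: cmty.get_database(agg_level='GM', year=2016, language='EN', add_geo=True),
# )
```

Pickle files should only be loaded when you created them yourself, and they may not load across different library versions; deleting the cache file forces a fresh download.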


In [2]:
df.head()

| Unnamed: 0 | ID | DistrictsAndNeighborhoods | Gemeentaam_1 | TypeRegion_2 | Coding_3 | Classificationchange_4 | NumberofResidents_5 | TotalPowerForgementAndViolence_6 | TotalPower Crimes_7 | TotalTheft_8 | ... | OppervlakteLand_101 | OppervlakteWater_102 | MeestVoorkomendePostcode_103 | Dekkingspercentage_104 | MateVanStedelijkheid_105 | Omgevingsadressendichtheid_106 | TotaalDiefstalUitWoningSchuurED_107 | VernielingMisdrijfTegenOpenbareOrde_108 | GeweldsEnSeksueleMisdrijven_109 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Aa en Hunze | Aa en Hunze | Gemeente | GM1680 | | 25243 | 507.0 | 326.0 | 237.0 | ... | 27625 | 262 | . | | 5.0 | 278.0 | 3.0 | 4.0 | 3.0 | POLYGON ((239585.806 560889.086, 239574.730 56... |
| 1 | 76 | Aalburg | Aalburg | Gemeente | GM0738 | | 13038 | 269.0 | 189.0 | 130.0 | ... | 5041 | 276 | . | | 5.0 | 372.0 | 1.0 | 3.0 | 3.0 | |
| 2 | 99 | Aalsmeer | Aalsmeer | Gemeente | GM0358 | | 31299 | 950.0 | 680.0 | 561.0 | ... | 2013 | 1216 | . | | 4.0 | 901.0 | 3.0 | 4.0 | 4.0 | POLYGON ((114563.503 478863.692, 114564.123 47... |
| 3 | 112 | Aalten | Aalten | Gemeente | GM0197 | | 26912 | 520.0 | 374.0 | 274.0 | ... | 9654 | 52 | . | | 4.0 | 780.0 | 2.0 | 2.0 | 3.0 | POLYGON ((241102.335 438964.060, 241140.888 43... |
| 4 | 146 | Achtkarspelen | Achtkarspelen | Gemeente | GM0059 | | 28007 | 598.0 | 354.0 | 255.0 | ... | 10226 | 172 | . | | 5.0 | 421.0 | 1.0 | 4.0 | 4.0 | POLYGON ((198897.831 582340.005, 198885.907 58... |
