# TP - Part 1
## Data analysis of the dataset ["Crimes reported in Chicago", 2024](https://data.cityofchicago.org/Public-Safety/Crimes-2024/dqcy-ctma/about_data)

### Assignment
The analysis must cover the following aspects:
 * Pose at least three questions to be answered through data analysis.
    * The suggested questions may be used as examples, or others may be proposed.
 * Data exploration and understanding:
    * Load the provided dataset and perform an exploratory data analysis.
    * Describe the main characteristics of the dataset, including the number of observations, the number of variables, and the data types.
    * Identify general patterns, distributions, and any initial anomalies in the data.
    * Visualize the most important variables to understand their relationships and distributions.
 * Application of visualization techniques:
    * Use suitable visualization techniques to illustrate the main characteristics of the dataset.
    * Make sure the visualizations are clear, concise, and effective at communicating the information.
    * Interpret the results obtained from the visualizations.
 * Dataset cleaning:
    * Identify and handle missing values in the dataset.
    * Detect and handle outliers using appropriate statistical or visual techniques.

### Suggested questions for the chosen dataset
 1. How does the distribution of crimes vary across the hours of the day, the days of the week, and the months of the year?
 2. Are there anomalies and/or seasonal patterns?
 3. Are there significant differences in the number of crimes between districts or community areas?
 4. Are police forces well distributed relative to the characteristics of each area? (e.g., is the highest level of police activity/arrests recorded in the critical areas?)
 5. How did crime in the city change after a major social change or event? (e.g., Covid-19, protests, etc.)


#### Team members:
* Mealla Pablo
* Viñas Gustavo

Column descriptions, as provided by the dataset source

| Column name | Description | Data type |
| --- | --- | --- |
|	ID	|	Unique identifier for the record.	|	Number	|
|	Case Number	|	The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.	|	Text	|
|	Date	|	Date when the incident occurred. This is sometimes a best estimate.	|	Floating Timestamp	|
|	Block	|	The partially redacted address where the incident occurred, placing it on the same block as the actual address.	|	Text	|
|	IUCR	|	The Illinois Uniform Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.	|	Text	|
|	Primary Type	|	The primary description of the IUCR code.	|	Text	|
|	Description	|	The secondary description of the IUCR code, a subcategory of the primary description.	|	Text	|
|	Location Description	|	Description of the location where the incident occurred.	|	Text	|
|	Arrest	|	Indicates whether an arrest was made.	|	Checkbox	|
|	Domestic	|	Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.	|	Checkbox	|
|	Beat	|	Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74.	|	Text	|
|	District	|	Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.	|	Text	|
|	Ward	|	The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.	|	Number	|
|	Community Area	|	Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6.	|	Text	|
|	FBI Code	|	Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). See the Chicago Police Department listing of these classifications at https://gis.chicagopolice.org/pages/crime_details.	|	Text	|
|	X Coordinate	|	The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.	|	Number	|
|	Y Coordinate	|	The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.	|	Number	|
|	Year	|	Year the incident occurred.	|	Number	|
|	Updated On	|	Date and time the record was last updated.	|	Floating Timestamp	|
|	Latitude	|	The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.	|	Number	|
|	Longitude	|	The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.	|	Number	|
|	Location	|	The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.	|	Point	|



In [1]:
# Import libraries
import pandas as pd

In [2]:
# Load the dataset (pandas reads the zipped CSV directly)
df_csv = pd.read_csv("dataset/Crimes_-_2024_20250502.zip")

# Initial look at the columns and the number of records
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 258082 entries, 0 to 258081
Data columns (total 22 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   ID                    258082 non-null  int64  
 1   Case Number           258082 non-null  object 
 2   Date                  258082 non-null  object 
 3   Block                 258082 non-null  object 
 4   IUCR                  258082 non-null  object 
 5   Primary Type          258082 non-null  object 
 6   Description           258082 non-null  object 
 7   Location Description  257054 non-null  object 
 8   Arrest                258082 non-null  bool   
 9   Domestic              258082 non-null  bool   
 10  Beat                  258082 non-null  int64  
 11  District              258082 non-null  int64  
 12  Ward                  258082 non-null  int64  
 13  Community Area        258080 non-null  float64
 14  FBI Code              258082 non-null  object 
 15  

The dataset has:
   * Columns: 22
      * 2 bool
      * 5 float
      * 5 int
      * 10 string (pandas `object` dtype)
   * Rows (observations): 258,082
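
The tally above can also be obtained programmatically instead of counting by hand from the `info()` output. The following is only a minimal sketch: `sample` is a hypothetical miniature stand-in for `df_csv`, with made-up values, so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical miniature stand-in for df_csv (values are made up for illustration)
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "Primary Type": ["BATTERY", "THEFT", "BATTERY"],
    "Arrest": [False, True, False],
    "Community Area": [43.0, None, 38.0],
})

# Number of observations (rows) and variables (columns)
n_rows, n_cols = sample.shape

# Number of columns per dtype; dtypes are converted to strings so they group cleanly
dtype_counts = sample.dtypes.astype(str).value_counts()

print(f"Rows: {n_rows}, columns: {n_cols}")
print(dtype_counts)
```

Applied to `df_csv`, the same two calls should reproduce the row, column, and per-type counts listed above.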

The non-null counts above already show that some columns contain null values.  
The columns with null values are:
   * Location Description
   * Community Area
   * X Coordinate
   * Y Coordinate
   * Latitude
   * Longitude
   * Location
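
A list like this one can be derived directly from the data rather than read off the `info()` output. The sketch below assumes a hypothetical `sample` frame with made-up values; on the real data, `sample` would be replaced by `df_csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for df_csv (values are made up for illustration)
sample = pd.DataFrame({
    "Location Description": ["APARTMENT", None, "STREET", "SIDEWALK"],
    "Community Area": [43.0, 38.0, np.nan, 70.0],
    "Arrest": [False, True, False, False],
})

# Count nulls per column and keep only the columns that have at least one
null_counts = sample.isna().sum()
columns_with_nulls = null_counts[null_counts > 0]

# Share of null values per column, as a percentage of all rows
null_share = (columns_with_nulls / len(sample) * 100).round(2)

print(pd.DataFrame({"nulls": columns_with_nulls, "percent": null_share}))
```

Looking at the percentage alongside the raw count helps decide later whether the affected rows can simply be dropped or need another treatment.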

In [3]:
# Check that the dataset only contains records from the chosen year

print(df_csv["Year"].unique())

[2024]


In [4]:
# Display a few rows of the dataset

with pd.option_context('display.max_columns', None):
    display(df_csv.head(10))

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,13709672,JJ101940,12/31/2024 11:58:00 PM,014XX E 68TH ST,1310,CRIMINAL DAMAGE,TO PROPERTY,APARTMENT,False,False,332,3,5,43.0,14,1186817.0,1860189.0,2024,01/08/2025 03:42:09 PM,41.77147,-87.590742,POINT (-87.59074212 41.771470188)
1,13707925,JJ100089,12/31/2024 11:56:00 PM,047XX S DR MARTIN LUTHER KING JR DR,1365,CRIMINAL TRESPASS,TO RESIDENCE,APARTMENT,True,True,223,2,3,38.0,26,1179661.0,1873623.0,2024,01/08/2025 03:42:09 PM,41.808501,-87.616563,POINT (-87.616562762 41.808500903)
2,13708038,JJ100035,12/31/2024 11:55:00 PM,077XX S CICERO AVE,498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",HOTEL / MOTEL,False,True,834,8,18,70.0,04B,1145740.0,1853048.0,2024,01/08/2025 03:42:09 PM,41.752749,-87.741498,POINT (-87.741497836 41.752748627)
3,13709164,JJ101392,12/31/2024 11:53:00 PM,066XX S GREENWOOD AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,321,3,20,42.0,14,1184362.0,1861188.0,2024,01/08/2025 03:42:09 PM,41.774269,-87.59971,POINT (-87.599709962 41.774269351)
4,13707823,JJ100020,12/31/2024 11:50:00 PM,012XX N MENARD AVE,460,BATTERY,SIMPLE,SIDEWALK,False,False,2531,25,29,25.0,08B,1137458.0,1907694.0,2024,01/08/2025 03:42:09 PM,41.902858,-87.770537,POINT (-87.770536741 41.902858242)
5,13707839,JJ100021,12/31/2024 11:46:00 PM,021XX W CULLERTON ST,486,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,True,1234,12,25,31.0,08B,1162508.0,1890389.0,2024,01/08/2025 03:42:09 PM,41.854884,-87.679008,POINT (-87.67900769 41.854883985)
6,13707986,JJ100019,12/31/2024 11:45:00 PM,117XX S STATE ST,486,BATTERY,DOMESTIC BATTERY SIMPLE,APARTMENT,False,True,532,5,9,53.0,08B,1178352.0,1827293.0,2024,01/08/2025 03:42:09 PM,41.681396,-87.622767,POINT (-87.622767037 41.68139574)
7,13707849,JJ100011,12/31/2024 11:45:00 PM,018XX W MAYPOLE AVE,1310,CRIMINAL DAMAGE,TO PROPERTY,APARTMENT,False,False,1223,12,27,28.0,14,1164081.0,1901067.0,2024,01/08/2025 03:42:09 PM,41.884152,-87.672933,POINT (-87.672932576 41.884152322)
8,13707847,JJ100007,12/31/2024 11:42:00 PM,029XX W CHICAGO AVE,1345,CRIMINAL DAMAGE,TO CITY OF CHICAGO PROPERTY,CTA BUS,False,False,1211,12,36,24.0,14,1156857.0,1905197.0,2024,01/08/2025 03:42:09 PM,41.895635,-87.699348,POINT (-87.699347915 41.895634912)
9,13707836,JJ100034,12/31/2024 11:40:00 PM,0000X S OAKLEY BLVD,910,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,1223,12,27,28.0,07,1161090.0,1899804.0,2024,01/08/2025 03:42:09 PM,41.880749,-87.683951,POINT (-87.683950956 41.880749175)


The "ID" and "Case Number" columns uniquely identify each crime. The former is the identifier within the dataset; the latter is the identifier used by the Chicago Police Department.  
The "Primary Type" column is the crime category and, together with "Description", classifies the incident.  
Both are derived from the "IUCR" column, the code that identifies the type of crime committed. There is also an "FBI Code" column, with the crime category according to the FBI classification.  
The "Block", "X Coordinate"/"Y Coordinate", "Latitude"/"Longitude", and "Location" columns represent the approximate location of the crime in different formats.  
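
These statements about the identifier and code columns can be checked before relying on them. The following is a sketch under stated assumptions: `sample` is a hypothetical three-row frame whose values are copied from the preview above, and the checks would need to run on the full `df_csv` (before any columns are dropped) to say anything about the real data.

```python
import pandas as pd

# Hypothetical three-row stand-in for df_csv, using values from the preview
sample = pd.DataFrame({
    "ID": [13709672, 13707925, 13707849],
    "Case Number": ["JJ101940", "JJ100089", "JJ100011"],
    "IUCR": ["1310", "1365", "1310"],
    "Primary Type": ["CRIMINAL DAMAGE", "CRIMINAL TRESPASS", "CRIMINAL DAMAGE"],
})

# Do the identifier columns really contain no repeated values?
id_is_unique = sample["ID"].is_unique
case_is_unique = sample["Case Number"].is_unique

# Does each IUCR code map to exactly one Primary Type?
types_per_code = sample.groupby("IUCR")["Primary Type"].nunique()
iucr_is_consistent = bool((types_per_code == 1).all())

print(id_is_unique, case_is_unique, iucr_is_consistent)
```

If any of these checks failed on the full dataset, the duplicated or inconsistent rows would be worth inspecting during the cleaning step.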


To continue the analysis, we drop the columns we consider unnecessary because they add no information to the analysis.  
We also create new columns from existing data, which will help us analyze other aspects of the cases.

In [5]:
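# Parse the "Date" strings into datetime values
df_csv["Date"] = pd.to_datetime(df_csv["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Derive time features (Day Of Week: Monday=0 ... Sunday=6)
df_csv["Hour"] = df_csv["Date"].dt.hour
df_csv["Day Of Week"] = df_csv["Date"].dt.dayofweek
df_csv["Month"] = df_csv["Date"].dt.month

# Drop the columns that are not needed for the rest of the analysis
drop_columns = ["ID", "Case Number", "Date", "Block", "X Coordinate", "Y Coordinate", "Year", "Updated On", "Location"]
df_csv.drop(columns=drop_columns, inplace=True)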
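
To illustrate what the derived columns contain, here is a minimal, self-contained sketch of the same date handling on a hypothetical `sample` frame. The first two timestamps come from the preview above; the third is made up. Note that `dt.dayofweek` numbers the days from Monday (0) to Sunday (6).

```python
import pandas as pd

# Hypothetical stand-in for the "Date" column of df_csv
sample = pd.DataFrame({
    "Date": [
        "12/31/2024 11:58:00 PM",
        "12/31/2024 11:56:00 PM",
        "12/30/2024 09:15:00 AM",  # made-up row for illustration
    ]
})

# Same parsing format as in the cell above (12-hour clock with AM/PM)
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Derived time features
sample["Hour"] = sample["Date"].dt.hour
sample["Day Of Week"] = sample["Date"].dt.dayofweek  # Monday=0 ... Sunday=6
sample["Month"] = sample["Date"].dt.month

# Number of incidents per hour of the day, ordered by hour
incidents_per_hour = sample["Hour"].value_counts().sort_index()

print(sample[["Hour", "Day Of Week", "Month"]])
print(incidents_per_hour)
```

The same `value_counts().sort_index()` pattern applied to "Day Of Week" or "Month" gives the counts needed for the first suggested question.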
