# Data Exploration: Historical Traffic Accidents from INEGI
This notebook explores historical traffic accident data from INEGI's "Accidentes de Tránsito Terrestre en Zonas Urbanas y Suburbanas" (ATUS) statistics. The goal is to understand the dataset's structure, quality, and main patterns before merging it with other traffic-related data sources.

## What is INEGI?

INEGI (Instituto Nacional de Estadística y Geografía) is a Mexican government agency responsible for collecting, processing, and disseminating statistical information about the country’s population, economy, and geography. It provides society and the government with accurate and timely data to support decision-making and public policies.

### About ATUS (Accidentes de Tránsito Terrestre en Zonas Urbanas y Suburbanas)
ATUS is a national dataset that INEGI compiles annually. It provides detailed statistics on traffic accidents in non-federal areas, which help assess transportation risks and support the planning of road infrastructure and accident prevention initiatives.

### Key Characteristics:
- **Geographic Coverage**: Nationwide data, disaggregated at the municipal level.
- **Temporal Coverage**: Data has been available annually since 1997, and each edition is released seven months after the reference year.
- **Data Sources**: Administrative records from civic courts, public safety agencies, and municipal transit offices.

## Dataset Features

The traffic accident dataset provides the following features:
  - Accident occurrence details: date, time, and location
  - Accident classifications and types
  - Types of vehicles involved
  - Causes of accidents
  - Road surface conditions
  - Types and classifications of victims
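
As a rough orientation, these feature groups can be matched to the column names that appear in the 2023 DBF file loaded later in this notebook. The grouping below is an assumption based on reading the abbreviated Spanish column names, not INEGI's official data dictionary, so it should be checked against the published documentation. The group labels (including `person`) are hypothetical names chosen for this sketch.

```python
# Column names are copied from the printed ATUS 2023 record in this notebook.
# The grouping is an assumption for illustration, not INEGI's official schema.
FEATURE_GROUPS = {
    "occurrence": [
        "EDO", "MPIO", "ANIO", "MES", "DIA", "DIASEMANA",
        "HORA", "MINUTOS", "URBANA", "SUBURBANA",
    ],
    "accident_type": ["TIPACCID"],
    "vehicles": [
        "AUTOMOVIL", "CAMPASAJ", "MICROBUS", "PASCAMION", "OMNIBUS",
        "TRANVIA", "CAMIONETA", "CAMION", "TRACTOR", "FERROCARRI",
        "MOTOCICLET", "BICICLETA", "OTROVEHIC",
    ],
    "cause": ["CAUSAACCI"],
    "road_surface": ["CAPAROD"],
    "person": ["SEXO", "ALIENTO", "CINTURON", "EDAD"],
    "victims": [
        "CONDMUERTO", "CONDHERIDO", "PASAMUERTO", "PASAHERIDO",
        "PEATMUERTO", "PEATHERIDO", "CICLMUERTO", "CICLHERIDO",
        "OTROMUERTO", "OTROHERIDO",
    ],
}

# Flatten the groups to confirm that every column is assigned exactly once.
all_columns = [col for cols in FEATURE_GROUPS.values() for col in cols]
print(len(all_columns), "columns in", len(FEATURE_GROUPS), "groups")
```

A mapping like this makes it easy to select one family of columns at a time, for example `dataResult[FEATURE_GROUPS["victims"]]`.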

In [4]:
from dbfread import DBF

In [12]:
import pandas as pd

In [8]:
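# load=True reads every record into memory so they can be indexed
table = DBF('../data/atus_2023.dbf', load=True)
# Print one record (index 1, the second row) to inspect the field names
print(table.records[1])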

OrderedDict({'EDO': 1, 'MES': 1, 'ANIO': 2023, 'MPIO': 1, 'HORA': 0, 'MINUTOS': 0, 'DIA': 1, 'DIASEMANA': 7, 'URBANA': 0, 'SUBURBANA': 2, 'TIPACCID': 3, 'AUTOMOVIL': 1, 'CAMPASAJ': 0, 'MICROBUS': 0, 'PASCAMION': 0, 'OMNIBUS': 0, 'TRANVIA': 0, 'CAMIONETA': 0, 'CAMION': 0, 'TRACTOR': 0, 'FERROCARRI': 0, 'MOTOCICLET': 0, 'BICICLETA': 0, 'OTROVEHIC': 0, 'CAUSAACCI': 4, 'CAPAROD': 1, 'SEXO': 2, 'ALIENTO': 6, 'CINTURON': 9, 'EDAD': 41, 'CONDMUERTO': 0, 'CONDHERIDO': 1, 'PASAMUERTO': 0, 'PASAHERIDO': 0, 'PEATMUERTO': 0, 'PEATHERIDO': 0, 'CICLMUERTO': 0, 'CICLHERIDO': 0, 'OTROMUERTO': 0, 'OTROHERIDO': 0})


In [14]:
dataResult = pd.DataFrame(iter(table))
dataResult.head()

Unnamed: 0,EDO,MES,ANIO,MPIO,HORA,MINUTOS,DIA,DIASEMANA,URBANA,SUBURBANA,...,CONDMUERTO,CONDHERIDO,PASAMUERTO,PASAHERIDO,PEATMUERTO,PEATHERIDO,CICLMUERTO,CICLHERIDO,OTROMUERTO,OTROHERIDO
0,1,1,2023,1,0,0,1,7,1,0,...,0,1,0,0,0,0,0,0,0,0
1,1,1,2023,1,0,0,1,7,0,2,...,0,1,0,0,0,0,0,0,0,0
2,1,1,2023,1,2,20,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
3,1,1,2023,1,3,20,1,7,1,0,...,1,0,0,1,0,0,0,0,0,0
4,1,1,2023,1,6,0,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
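
The date and time of each accident are spread over five integer columns (`ANIO`, `MES`, `DIA`, `HORA`, `MINUTOS`). A minimal sketch of combining them into a single timestamp follows, using the five rows shown above. The helper name `build_timestamp` is hypothetical, and the sketch assumes the columns hold plain calendar values; if the file uses sentinel codes for unknown hours or minutes, those would need to be masked first.

```python
import pandas as pd

# Date and time columns of the five rows shown in the head() output.
sample = pd.DataFrame({
    "ANIO": [2023, 2023, 2023, 2023, 2023],
    "MES": [1, 1, 1, 1, 1],
    "DIA": [1, 1, 1, 1, 1],
    "HORA": [0, 0, 2, 3, 6],
    "MINUTOS": [0, 0, 20, 20, 0],
})


def build_timestamp(df: pd.DataFrame) -> pd.Series:
    """Assemble a datetime column from the ATUS date and time parts."""
    # pd.to_datetime can assemble dates from columns named year, month, day, ...
    parts = df[["ANIO", "MES", "DIA", "HORA", "MINUTOS"]].rename(
        columns={
            "ANIO": "year",
            "MES": "month",
            "DIA": "day",
            "HORA": "hour",
            "MINUTOS": "minute",
        }
    )
    # errors="coerce" yields NaT for impossible calendar dates instead of raising.
    return pd.to_datetime(parts, errors="coerce")


sample["timestamp"] = build_timestamp(sample)
print(sample[["HORA", "MINUTOS", "timestamp"]])
```

A single timestamp column makes later steps such as resampling by hour or joining with other time-indexed sources more direct.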


In [23]:
# Keep only state 19 (Nuevo León) and municipality 39 (Monterrey) per INEGI's geographic codes
dataResult = dataResult[(dataResult.EDO == 19) & (dataResult.MPIO == 39)]

In [25]:
dataResult

Unnamed: 0,EDO,MES,ANIO,MPIO,HORA,MINUTOS,DIA,DIASEMANA,URBANA,SUBURBANA,...,CONDMUERTO,CONDHERIDO,PASAMUERTO,PASAHERIDO,PEATMUERTO,PEATHERIDO,CICLMUERTO,CICLHERIDO,OTROMUERTO,OTROHERIDO
18500,19,1,2023,39,1,18,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
18501,19,1,2023,39,4,44,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
18502,19,1,2023,39,7,17,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
18503,19,1,2023,39,8,6,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
18504,19,1,2023,39,10,10,1,7,1,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
384617,19,12,2023,39,20,43,31,7,1,0,...,0,0,0,0,0,0,0,0,0,0
384618,19,12,2023,39,20,56,31,7,1,0,...,0,1,0,0,0,0,0,0,0,0
384619,19,12,2023,39,21,45,31,7,1,0,...,0,1,0,1,0,0,0,0,0,0
384620,19,12,2023,39,22,7,31,7,1,0,...,0,0,0,0,0,0,0,0,0,0
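
The last ten columns appear to count victims by role and outcome: the `MUERTO` suffix for deaths and `HERIDO` for injuries, with prefixes that look like abbreviations for driver, passenger, pedestrian, cyclist, and other. Under that reading, which is an assumption to confirm against the ATUS data dictionary, per-accident totals can be sketched as below. The sample uses the victim columns of the last three rows shown above, and the names `DEATH_COLS`, `INJURY_COLS`, `total_deaths`, and `total_injuries` are hypothetical.

```python
import pandas as pd

# Assumed split of the victim columns by outcome (deaths vs. injuries).
DEATH_COLS = ["CONDMUERTO", "PASAMUERTO", "PEATMUERTO", "CICLMUERTO", "OTROMUERTO"]
INJURY_COLS = ["CONDHERIDO", "PASAHERIDO", "PEATHERIDO", "CICLHERIDO", "OTROHERIDO"]

# Victim columns of the rows with index 384618, 384619, and 384620.
sample = pd.DataFrame(
    [
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ],
    index=[384618, 384619, 384620],
    columns=[
        "CONDMUERTO", "CONDHERIDO", "PASAMUERTO", "PASAHERIDO",
        "PEATMUERTO", "PEATHERIDO", "CICLMUERTO", "CICLHERIDO",
        "OTROMUERTO", "OTROHERIDO",
    ],
)

# Row-wise sums give the number of deaths and injuries per accident record.
sample["total_deaths"] = sample[DEATH_COLS].sum(axis=1)
sample["total_injuries"] = sample[INJURY_COLS].sum(axis=1)
print(sample[["total_deaths", "total_injuries"]])
```

The same two sums applied to the full `dataResult` would give a quick severity indicator for each record.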


In [27]:
dataResult.SEXO.value_counts()

SEXO
2    23198
3     7724
1     2744
Name: count, dtype: int64

In [29]:
dataResult.DIASEMANA.value_counts()

DIASEMANA
2    5438
5    5410
3    5353
4    5254
1    5133
6    4394
7    2684
Name: count, dtype: int64
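
The `DIASEMANA` codes are not labeled in the output. One way to infer the convention is to compare the code with the calendar weekday computed from the date columns. In the rows shown earlier, both 1 January 2023 and 31 December 2023 carry `DIASEMANA` 7, and both dates fell on a Sunday, which is consistent with a Monday=1 to Sunday=7 numbering. The sketch below tests that assumption on those two dates only; on the real data the same comparison would run over every row.

```python
import pandas as pd

# Two (date, DIASEMANA) combinations that appear in the outputs above.
sample = pd.DataFrame({
    "ANIO": [2023, 2023],
    "MES": [1, 12],
    "DIA": [1, 31],
    "DIASEMANA": [7, 7],
})

# Build calendar dates from the three date columns.
dates = pd.to_datetime(
    sample[["ANIO", "MES", "DIA"]].rename(
        columns={"ANIO": "year", "MES": "month", "DIA": "day"}
    )
)

# pandas numbers weekdays Monday=0 ... Sunday=6; shift to Monday=1 ... Sunday=7.
iso_weekday = dates.dt.dayofweek + 1

# Assumption under test: DIASEMANA uses Monday=1 ... Sunday=7.
sample["matches_iso"] = iso_weekday == sample["DIASEMANA"]
sample["weekday_name"] = dates.dt.day_name()
print(sample)
```

If the check holds across the full table, the counts above would mean that Sunday (code 7) has the fewest recorded accidents and Saturday (code 6) the second fewest.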

In [31]:
dataResult.CAUSAACCI.value_counts()

CAUSAACCI
1    32115
4     1551
Name: count, dtype: int64
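
Raw counts are easier to compare as shares of the total. The sketch below rebuilds the `CAUSAACCI` counts from the output above and converts them to percentages; code 1 accounts for roughly 95% of the filtered records. The codes are left undecoded here because their meaning comes from the ATUS data dictionary, and the variable names are hypothetical.

```python
import pandas as pd

# Counts copied from the CAUSAACCI output above; codes are left undecoded.
cause_counts = pd.Series({1: 32115, 4: 1551}, name="count")
cause_counts.index.name = "CAUSAACCI"

# Share of each code relative to all filtered records.
cause_share = cause_counts / cause_counts.sum()

summary = pd.DataFrame({
    "count": cause_counts,
    "share_pct": (cause_share * 100).round(1),
})
print(summary)
```

On the full DataFrame, `dataResult.CAUSAACCI.value_counts(normalize=True)` returns the same shares directly.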

In [37]:
dataResult.TIPACCID.value_counts()

TIPACCID
1     27302
4      2717
10     2222
2       747
5       415
6        90
11       73
8        59
12       29
7         7
3         5
Name: count, dtype: int64