## Code analysis and results of the accident severity prediction model

Analysis and modeling of a traffic accident dataset, with the goal of predicting injury severity.

## Loading the data into a pandas DataFrame

In [1]:
import pandas as pd
import numpy as np

Mounted at /content/drive


In [None]:
df = pd.read_csv('crash_data.csv', low_memory=False)

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172105 entries, 0 to 172104
Data columns (total 43 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   Report Number                   172105 non-null  object 
 1   Local Case Number               172105 non-null  object 
 2   Agency Name                     172105 non-null  object 
 3   ACRS Report Type                172105 non-null  object 
 4   Crash Date/Time                 172105 non-null  object 
 5   Route Type                      155132 non-null  object 
 6   Road Name                       156168 non-null  object 
 7   Cross-Street Type               155099 non-null  object 
 8   Cross-Street Name               156154 non-null  object 
 9   Off-Road Description            15935 non-null   object 
 10  Municipality                    19126 non-null   object 
 11  Related Non-Motorist            5463 non-null    object 
 12  Collision Type  

## Dropping columns considered irrelevant to the analysis

In [4]:
columns_to_remove = [
    'Report Number',
    'Local Case Number',
    'Agency Name',
    'Equipment Problems',
    'Driverless Vehicle',
    'ACRS Report Type',
    'Off-Road Description',
    'Municipality',
    'Related Non-Motorist',
    'Non-Motorist Substance Abuse',
    'Person ID',
    'Parked Vehicle',
    'Drivers License State',
    'Vehicle ID',
    'Latitude',
    'Longitude',
    'Location'
]

df = df.drop(columns=columns_to_remove)

## Class distribution in the **"Injury Severity"** column, which reveals a significant imbalance: **"NO APPARENT INJURY"** is by far the most frequent class

In [5]:
# Count the frequency of each class in the 'Injury Severity' column
class_counts = df['Injury Severity'].value_counts()

# Show the frequency of each class as a table
print(class_counts)


Injury Severity
NO APPARENT INJURY          141185
POSSIBLE INJURY              17482
SUSPECTED MINOR INJURY       11870
SUSPECTED SERIOUS INJURY      1415
FATAL INJURY                   153
Name: count, dtype: int64
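
To quantify the imbalance, the counts printed above can be converted to shares. This is a minimal sketch that assumes the counts are re-entered by hand from the output; inside the notebook, `df['Injury Severity'].value_counts(normalize=True)` would presumably give the same shares directly.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
class_counts = pd.Series({
    "NO APPARENT INJURY": 141185,
    "POSSIBLE INJURY": 17482,
    "SUSPECTED MINOR INJURY": 11870,
    "SUSPECTED SERIOUS INJURY": 1415,
    "FATAL INJURY": 153,
})

# Share of each class in the full dataset
class_share = class_counts / class_counts.sum()
print(class_share.round(4))

# Ratio between the most and the least frequent class
imbalance_ratio = class_counts.max() / class_counts.min()
print(f"Majority/minority ratio: {imbalance_ratio:.0f}:1")
```

About 82% of the records belong to the majority class, which is a useful reference point when reading the accuracy figures of the models.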


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172105 entries, 0 to 172104
Data columns (total 26 columns):
 #   Column                          Non-Null Count   Dtype 
---  ------                          --------------   ----- 
 0   Crash Date/Time                 172105 non-null  object
 1   Route Type                      155132 non-null  object
 2   Road Name                       156168 non-null  object
 3   Cross-Street Type               155099 non-null  object
 4   Cross-Street Name               156154 non-null  object
 5   Collision Type                  171520 non-null  object
 6   Weather                         158751 non-null  object
 7   Surface Condition               151987 non-null  object
 8   Light                           170660 non-null  object
 9   Traffic Control                 146636 non-null  object
 10  Driver Substance Abuse          140781 non-null  object
 11  Driver At Fault                 172105 non-null  object
 12  Injury Severity               

In [8]:
df['Injury Severity'].unique()

array(['NO APPARENT INJURY', 'SUSPECTED MINOR INJURY', 'POSSIBLE INJURY',
       'FATAL INJURY', 'SUSPECTED SERIOUS INJURY'], dtype=object)

## Imputing missing values in each column with the mode

In [10]:
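# Fill missing values in each column with that column's most frequent value
for column in df.columns:
    if df[column].isnull().any():
        mode_value = df[column].mode()[0]
        # Assign back instead of using inplace=True on a column selection,
        # which newer pandas versions warn about and may not apply to df
        df[column] = df[column].fillna(mode_value)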
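
The effect of this loop can be checked on a small example. The sketch below uses a hypothetical toy frame (the values are invented for illustration, not taken from the dataset) and the assignment form of `fillna`.

```python
import pandas as pd

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({
    "Weather": ["CLEAR", None, "RAINING", "CLEAR"],
    "Light": ["DAYLIGHT", "DARK", None, "DAYLIGHT"],
})

for column in toy.columns:
    if toy[column].isnull().any():
        # mode() ignores missing values; [0] takes the first mode if there are ties
        mode_value = toy[column].mode()[0]
        toy[column] = toy[column].fillna(mode_value)

print(toy)
```

Mode imputation raises the share of the already dominant category in each column, so comparing `value_counts()` before and after the step is a reasonable sanity check.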

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172105 entries, 0 to 172104
Data columns (total 26 columns):
 #   Column                          Non-Null Count   Dtype 
---  ------                          --------------   ----- 
 0   Crash Date/Time                 172105 non-null  object
 1   Route Type                      172105 non-null  object
 2   Road Name                       172105 non-null  object
 3   Cross-Street Type               172105 non-null  object
 4   Cross-Street Name               172105 non-null  object
 5   Collision Type                  172105 non-null  object
 6   Weather                         172105 non-null  object
 7   Surface Condition               172105 non-null  object
 8   Light                           172105 non-null  object
 9   Traffic Control                 172105 non-null  object
 10  Driver Substance Abuse          172105 non-null  object
 11  Driver At Fault                 172105 non-null  object
 12  Injury Severity               

In [12]:
df.isnull().sum()

Crash Date/Time      0
Route Type           0
Road Name            0
Cross-Street Type    0
Cross-Street Name    0
Collision Type       0
Weather              0
Surface Condition    0
Light                0
Traffic Control      0


In [13]:
df['Collision Type'].value_counts()

Collision Type
SAME DIR REAR END               56340
STRAIGHT MOVEMENT ANGLE         30340
OTHER                           19030
SAME DIRECTION SIDESWIPE        16226
SINGLE VEHICLE                  15869
HEAD ON LEFT TURN               12926
SAME DIRECTION RIGHT TURN        3832
HEAD ON                          3786
SAME DIRECTION LEFT TURN         3715
OPPOSITE DIRECTION SIDESWIPE     2883


## Encoding the categorical variables with LabelEncoder to convert them to numeric values

In [14]:
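from sklearn.preprocessing import LabelEncoder

# Keep one fitted encoder per column so the integer codes can be mapped back to labels
encoders = {}

for column in df.select_dtypes(include=['object']).columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column].astype(str))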
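
Because the target column is encoded too, the class numbers in the later reports need a mapping back to the severity labels. `LabelEncoder` stores its classes in sorted order, so the mapping can be recovered as in this small sketch, which fits a separate encoder (the name `severity_encoder` is illustrative) on the five labels shown earlier.

```python
from sklearn.preprocessing import LabelEncoder

# The five labels returned by df['Injury Severity'].unique()
severity_labels = [
    "NO APPARENT INJURY",
    "SUSPECTED MINOR INJURY",
    "POSSIBLE INJURY",
    "FATAL INJURY",
    "SUSPECTED SERIOUS INJURY",
]

severity_encoder = LabelEncoder()
severity_encoder.fit(severity_labels)

# classes_ is sorted; the label at position i is encoded as integer i
code_to_label = dict(enumerate(severity_encoder.classes_))
for code, label in code_to_label.items():
    print(code, label)
```

With this ordering, class 1 in the confusion matrices is "NO APPARENT INJURY" and class 0 is "FATAL INJURY", which agrees with the support values in the classification reports.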

## Separating features and target, then splitting the dataset into training (80%) and test (20%) sets with train_test_split

In [15]:
X = df.drop(columns=['Injury Severity'])
y = df['Injury Severity']

## Model training and evaluation

- Eight different classification models are trained and evaluated:
    - **XGBoost:** A gradient boosting algorithm that usually performs well.
    - **Extra Trees Classifier:** An ensemble of extremely randomized decision trees, closely related to a random forest.
    - **Decision Tree:** A single, simple decision tree.
    - **Random Forest:** An ensemble algorithm that combines many decision trees.
    - **SVM (Support Vector Machine):** An algorithm that looks for the hyperplane that best separates the classes.
    - **LightGBM:** A gradient boosting algorithm known for its speed and efficiency.
    - **CatBoost:** A gradient boosting algorithm that handles categorical variables well.
    - **HistGradientBoostingClassifier:** A gradient boosting algorithm that uses histograms to improve efficiency.
- For each model, the following evaluation metrics are computed:
    - **Confusion matrix:** Shows, for each true class (row), how many samples were assigned to each predicted class (column); the true positives, true negatives, false positives and false negatives of each class follow from it.
    - **Accuracy:** The proportion of correct predictions over all predictions.
    - **Precision:** The proportion of true positives over all predicted positives, reported as a support-weighted average over the classes (`average='weighted'`).
    - **Recall:** The proportion of true positives over all actual positives, averaged in the same way.
    - **F1 Score:** The harmonic mean of precision and recall.
    - **Classification report:** Shows precision, recall, f1-score and support for each class.
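
The same evaluation steps are repeated for every model, so they can be collected in one helper. The following is a sketch only: `evaluate_model` is a hypothetical function that is not part of the notebook, and it runs on synthetic data from `make_classification` so that it is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_model(name, model, X_train, y_train, X_test, y_test):
    """Fit a classifier, print the metrics listed above and return them."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Support-weighted averages, matching average='weighted' in the notebook
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
    }

    print(f"Confusion Matrix ({name}):")
    print(confusion_matrix(y_test, y_pred))
    for metric_name, value in metrics.items():
        print(f"{metric_name} ({name}): {value:.4f}")
    print(classification_report(y_test, y_pred, zero_division=0))
    return metrics


# Synthetic stand-in data (not the crash dataset) so the sketch runs on its own
X_demo, y_demo = make_classification(n_samples=500, n_features=8, n_informative=4,
                                     n_classes=3, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

demo_metrics = evaluate_model("Decision Tree", DecisionTreeClassifier(random_state=42),
                              Xd_train, yd_train, Xd_test, yd_test)
```

Returning the metrics as a dictionary would also make it straightforward to gather all models in one comparison table.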

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
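
The split above is purely random. With classes as rare as the fatal one, a stratified split is an option worth considering, since it keeps the class shares equal in both subsets. The sketch below illustrates the `stratify` argument of `train_test_split` on hypothetical toy labels; it is not what the notebook ran.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 samples of class 0 and 10 of class 1
y_toy = np.array([0] * 90 + [1] * 10)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy preserves the 90/10 class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("Minority share in train:", y_tr.mean())
print("Minority share in test: ", y_te.mean())
```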

In [17]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
#!pip install XGBoost
import xgboost as xgb
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier



<strong>XGBoost</strong>

In [18]:
model = xgb.XGBClassifier(colsample_bytree = 0.8, learning_rate = 0.1, max_depth = 5, n_estimators = 200, subsample = 1.0)
model.fit(X_train, y_train)

# Predict on the test set
y_pred_xgb = model.predict(X_test)

# Confusion Matrix
conf_matrix_xgb = confusion_matrix(y_test, y_pred_xgb)
print("Confusion Matrix (XGBoost Classifier):")
print(conf_matrix_xgb)

# Accuracy
accuracy_xgb = accuracy_score(y_test, y_pred_xgb)
print("\nAccuracy (XGBoost Classifier):", accuracy_xgb)

# Precision
precision_xgb = precision_score(y_test, y_pred_xgb, average='weighted')
print("Precision (XGBoost Classifier):", precision_xgb)

# Recall
recall_xgb = recall_score(y_test, y_pred_xgb, average='weighted')
print("Recall (XGBoost Classifier):", recall_xgb)

# F1 Score
f1_xgb = f1_score(y_test, y_pred_xgb, average='weighted')
print("F1 Score (XGBoost Classifier):", f1_xgb)

# Print classification report
print("\nClassification Report (XGBoost Classifier):")
print(classification_report(y_test, y_pred_xgb))


Confusion Matrix (XGBoost Classifier):
[[    4    26     1     1     2]
 [    1 28185    23    37     5]
 [    1  3438    26    40    13]
 [    1  2280    29    37     6]
 [    3   227    10    18     7]]

Accuracy (XGBoost Classifier): 0.8209813776473665
Precision (XGBoost Classifier): 0.7281724028596237
Recall (XGBoost Classifier): 0.8209813776473665
F1 Score (XGBoost Classifier): 0.7454025834117611

Classification Report (XGBoost Classifier):
              precision    recall  f1-score   support

           0       0.40      0.12      0.18        34
           1       0.83      1.00      0.90     28251
           2       0.29      0.01      0.01      3518
           3       0.28      0.02      0.03      2353
           4       0.21      0.03      0.05       265

    accuracy                           0.82     34421
   macro avg       0.40      0.23      0.24     34421
weighted avg       0.73      0.82      0.75     34421
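
The summary figures can be recomputed from the confusion matrix alone, which helps in reading them. A minimal sketch, with the matrix copied by hand from the XGBoost output above:

```python
import numpy as np

# XGBoost confusion matrix (rows: true class, columns: predicted class)
cm = np.array([
    [4, 26, 1, 1, 2],
    [1, 28185, 23, 37, 5],
    [1, 3438, 26, 40, 13],
    [1, 2280, 29, 37, 6],
    [3, 227, 10, 18, 7],
])

support = cm.sum(axis=1)        # true samples per class
predicted = cm.sum(axis=0)      # predicted samples per class
true_positives = np.diag(cm)

recall_per_class = true_positives / support
precision_per_class = true_positives / predicted

accuracy = true_positives.sum() / cm.sum()
# Weighting by support lets the majority class dominate the average
weighted_recall = np.average(recall_per_class, weights=support)
# The macro average treats all five classes equally
macro_recall = recall_per_class.mean()

print("accuracy:       ", accuracy)
print("weighted recall:", weighted_recall)
print("macro recall:   ", macro_recall)
```

The support-weighted recall is algebraically identical to accuracy, which is why both values coincide in every model's output. The macro average is the figure that exposes the weak results on the rare classes.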



<span style="font-size: larger;"><strong>Extra Trees Classifier</strong></span>

In [19]:
extra_trees_model = ExtraTreesClassifier()
extra_trees_model.fit(X_train, y_train)

y_pred_extra_trees = extra_trees_model.predict(X_test)

# Confusion Matrix
conf_matrix_extra_trees = confusion_matrix(y_test, y_pred_extra_trees)
print("Confusion Matrix (Extra Trees Classifier):")
print(conf_matrix_extra_trees)

# Accuracy
accuracy_extra_trees = accuracy_score(y_test, y_pred_extra_trees)
print("\nAccuracy (Extra Trees Classifier):", accuracy_extra_trees)

# Precision
precision_extra_trees = precision_score(y_test, y_pred_extra_trees, average='weighted')
print("Precision (Extra Trees Classifier):", precision_extra_trees)

# Recall
recall_extra_trees = recall_score(y_test, y_pred_extra_trees, average='weighted')
print("Recall (Extra Trees Classifier):", recall_extra_trees)

# F1 Score
f1_extra_trees = f1_score(y_test, y_pred_extra_trees, average='weighted')
print("F1 Score (Extra Trees Classifier):", f1_extra_trees)

# Print classification report
print("\nClassification Report (Extra Trees Classifier):")
print(classification_report(y_test, y_pred_extra_trees))

Confusion Matrix (Extra Trees Classifier):
[[    3    29     0     1     1]
 [    0 28134    79    36     2]
 [    0  3440    47    29     2]
 [    0  2298    34    20     1]
 [    0   250    10     4     1]]

Accuracy (Extra Trees Classifier): 0.8194125679091252
Precision (Extra Trees Classifier): 0.7216780275173488
Recall (Extra Trees Classifier): 0.8194125679091252
F1 Score (Extra Trees Classifier): 0.7440119392500675

Classification Report (Extra Trees Classifier):
              precision    recall  f1-score   support

           0       1.00      0.09      0.16        34
           1       0.82      1.00      0.90     28251
           2       0.28      0.01      0.03      3518
           3       0.22      0.01      0.02      2353
           4       0.14      0.00      0.01       265

    accuracy                           0.82     34421
   macro avg       0.49      0.22      0.22     34421
weighted avg       0.72      0.82      0.74     34421



<span style="font-size: larger;"><strong>Decision Tree</strong></span>

In [20]:
decision_tree_model = DecisionTreeClassifier()
decision_tree_model.fit(X_train, y_train)

y_pred_decision_tree = decision_tree_model.predict(X_test)

# Confusion Matrix
conf_matrix_decision_tree = confusion_matrix(y_test, y_pred_decision_tree)
print("Confusion Matrix (Decision Tree Classifier):")
print(conf_matrix_decision_tree)

# Accuracy
accuracy_decision_tree = accuracy_score(y_test, y_pred_decision_tree)
print("\nAccuracy (Decision Tree Classifier):", accuracy_decision_tree)

# Precision
precision_decision_tree = precision_score(y_test, y_pred_decision_tree, average='weighted')
print("Precision (Decision Tree Classifier):", precision_decision_tree)

# Recall
recall_decision_tree = recall_score(y_test, y_pred_decision_tree, average='weighted')
print("Recall (Decision Tree Classifier):", recall_decision_tree)

# F1 Score
f1_decision_tree = f1_score(y_test, y_pred_decision_tree, average='weighted')
print("F1 Score (Decision Tree Classifier):", f1_decision_tree)

# Print classification report
print("\nClassification Report (Decision Tree Classifier):")
print(classification_report(y_test, y_pred_decision_tree))

Confusion Matrix (Decision Tree Classifier):
[[    2    11     5    11     5]
 [    9 23225  2849  1984   184]
 [    3  2476   580   402    57]
 [    2  1591   382   324    54]
 [    2   152    45    52    14]]

Accuracy (Decision Tree Classifier): 0.7014613172191395
Precision (Decision Tree Classifier): 0.7180893754007781
Recall (Decision Tree Classifier): 0.7014613172191395
F1 Score (Decision Tree Classifier): 0.7095317502041157

Classification Report (Decision Tree Classifier):
              precision    recall  f1-score   support

           0       0.11      0.06      0.08        34
           1       0.85      0.82      0.83     28251
           2       0.15      0.16      0.16      3518
           3       0.12      0.14      0.13      2353
           4       0.04      0.05      0.05       265

    accuracy                           0.70     34421
   macro avg       0.25      0.25      0.25     34421
weighted avg       0.72      0.70      0.71     34421



<span style="font-size: larger;"><strong>Random Forest</strong></span>

In [21]:
random_forest_model = RandomForestClassifier()
random_forest_model.fit(X_train, y_train)

y_pred_random_forest = random_forest_model.predict(X_test)

# Confusion Matrix
conf_matrix_random_forest = confusion_matrix(y_test, y_pred_random_forest)
print("Confusion Matrix (Random Forest Classifier):")
print(conf_matrix_random_forest)

# Accuracy
accuracy_random_forest = accuracy_score(y_test, y_pred_random_forest)
print("\nAccuracy (Random Forest Classifier):", accuracy_random_forest)

# Precision
precision_random_forest = precision_score(y_test, y_pred_random_forest, average='weighted')
print("Precision (Random Forest Classifier):", precision_random_forest)

# Recall
recall_random_forest = recall_score(y_test, y_pred_random_forest, average='weighted')
print("Recall (Random Forest Classifier):", recall_random_forest)

# F1 Score
f1_random_forest = f1_score(y_test, y_pred_random_forest, average='weighted')
print("F1 Score (Random Forest Classifier):", f1_random_forest)

# Print classification report
print("\nClassification Report (Random Forest Classifier):")
print(classification_report(y_test, y_pred_random_forest))

Confusion Matrix (Random Forest Classifier):
[[    1    31     0     1     1]
 [    0 28191    36    24     0]
 [    0  3468    27    20     3]
 [    0  2309    23    21     0]
 [    0   253     2     8     2]]

Accuracy (Random Forest Classifier): 0.8204874931001424
Precision (Random Forest Classifier): 0.7298265279070975
Recall (Random Forest Classifier): 0.8204874931001424
F1 Score (Random Forest Classifier): 0.7432555347844091

Classification Report (Random Forest Classifier):
              precision    recall  f1-score   support

           0       1.00      0.03      0.06        34
           1       0.82      1.00      0.90     28251
           2       0.31      0.01      0.01      3518
           3       0.28      0.01      0.02      2353
           4       0.33      0.01      0.01       265

    accuracy                           0.82     34421
   macro avg       0.55      0.21      0.20     34421
weighted avg       0.73      0.82      0.74     34421



<strong>Support Vector Machine (SVM)</strong>

In [22]:
svm_model = SVC()
svm_model.fit(X_train, y_train)

y_pred_svm = svm_model.predict(X_test)

# Confusion Matrix
conf_matrix_svm = confusion_matrix(y_test, y_pred_svm)
print("Confusion Matrix (SVM Classifier):")
print(conf_matrix_svm)

# Accuracy
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print("\nAccuracy (SVM Classifier):", accuracy_svm)

# Precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')
print("Precision (SVM Classifier):", precision_svm)

# Recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')
print("Recall (SVM Classifier):", recall_svm)

# F1 Score
f1_svm = f1_score(y_test, y_pred_svm, average='weighted')
print("F1 Score (SVM Classifier):", f1_svm)

# Print classification report
print("\nClassification Report (SVM Classifier):")
print(classification_report(y_test, y_pred_svm))


Confusion Matrix (SVM Classifier):
[[    0    34     0     0     0]
 [    0 28251     0     0     0]
 [    0  3518     0     0     0]
 [    0  2353     0     0     0]
 [    0   265     0     0     0]]

Accuracy (SVM Classifier): 0.8207489613898492
Precision (SVM Classifier): 0.6736288576225162
Recall (SVM Classifier): 0.8207489613898492
F1 Score (SVM Classifier): 0.73994699094411

Classification Report (SVM Classifier):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        34
           1       0.82      1.00      0.90     28251
           2       0.00      0.00      0.00      3518
           3       0.00      0.00      0.00      2353
           4       0.00      0.00      0.00       265

    accuracy                           0.82     34421
   macro avg       0.16      0.20      0.18     34421
weighted avg       0.67      0.82      0.74     34421
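
The confusion matrix shows that the SVM assigns every test sample to class 1. Its summary metrics therefore follow from the class supports alone, as this short arithmetic sketch shows (supports copied from the report above):

```python
# Test-set support per encoded class, copied from the SVM classification report
support = {0: 34, 1: 28251, 2: 3518, 3: 2353, 4: 265}
total = sum(support.values())

# Always predicting class 1 is correct exactly for the class-1 samples
majority_share = support[1] / total

# Only class 1 has non-zero precision (support[1] / total), and its weight
# in the support-weighted average is again support[1] / total
weighted_precision = majority_share * majority_share

print("accuracy of always predicting class 1:", majority_share)
print("weighted precision:", weighted_precision)
```

Both values match the SVM output, so an accuracy near 0.82 is what a constant majority-class prediction achieves on this test set.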





<strong>LightGBM</strong>

In [29]:
#!pip install lightgbm
import lightgbm as lgb

# Create the LightGBM model
lgb_model = lgb.LGBMClassifier()

# Train the model
lgb_model.fit(X_train, y_train)

# Predict on the test set
y_pred_lgb = lgb_model.predict(X_test)

# Confusion matrix
conf_matrix_lgb = confusion_matrix(y_test, y_pred_lgb)
print("Confusion Matrix (LightGBM Classifier):")
print(conf_matrix_lgb)

# Accuracy
accuracy_lgb = accuracy_score(y_test, y_pred_lgb)
print("\nAccuracy (LightGBM Classifier):", accuracy_lgb)

# Precision
precision_lgb = precision_score(y_test, y_pred_lgb, average='weighted')
print("Precision (LightGBM Classifier):", precision_lgb)

# Recall
recall_lgb = recall_score(y_test, y_pred_lgb, average='weighted')
print("Recall (LightGBM Classifier):", recall_lgb)

# F1 Score
f1_lgb = f1_score(y_test, y_pred_lgb, average='weighted')
print("F1 Score (LightGBM Classifier):", f1_lgb)

# Print classification report
print("\nClassification Report (LightGBM Classifier):")
print(classification_report(y_test, y_pred_lgb))


[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.032850 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1650
[LightGBM] [Info] Number of data points in the train set: 137684, number of used features: 25
[LightGBM] [Info] Start training from score -7.053593
[LightGBM] [Info] Start training from score -0.198158
[LightGBM] [Info] Start training from score -2.288479
[LightGBM] [Info] Start training from score -2.671882
[LightGBM] [Info] Start training from score -4.785199
Confusion Matrix (LightGBM Classifier):
[[    3    25     1     1     4]
 [  111 28036    37    39    28]
 [   21  3433    15    32    17]
 [   10  2274    21    32    16]
 [    5   231     6    16     7]]

Accuracy (LightGBM Classifier): 0.8161587403038842
Precision (LightGBM Classifier): 0.7149607208119397
Recall (LightGBM Classifier): 0.8161587403038842
F1 Score (LightGB

<strong>CatBoost</strong>

In [27]:
#!pip install catboost

from catboost import CatBoostClassifier

# Create the CatBoost model
catboost_model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, loss_function='MultiClass', random_seed=42, verbose=False)

# Train the model
catboost_model.fit(X_train, y_train)

# Make predictions
y_pred_catboost = catboost_model.predict(X_test)

# --- Model evaluation ---

# Confusion matrix
print("Confusion Matrix (CatBoost):\n", confusion_matrix(y_test, y_pred_catboost))

# Accuracy
print("\nAccuracy (CatBoost):", accuracy_score(y_test, y_pred_catboost))

# Precision
print("Precision (CatBoost):", precision_score(y_test, y_pred_catboost, average='weighted', zero_division=0))

# Recall
print("Recall (CatBoost):", recall_score(y_test, y_pred_catboost, average='weighted', zero_division=0))

# F1 Score
print("F1 Score (CatBoost):", f1_score(y_test, y_pred_catboost, average='weighted', zero_division=0))

# Classification report
print("\nClassification Report (CatBoost):\n", classification_report(y_test, y_pred_catboost))


Confusion Matrix (CatBoost):
 [[    0    31     0     1     2]
 [    0 28234    11     2     4]
 [    0  3486    14    16     2]
 [    0  2322    11    18     2]
 [    0   252     4     5     4]]

Accuracy (CatBoost): 0.8213009500014526
Precision (CatBoost): 0.7423746911657287
Recall (CatBoost): 0.8213009500014526
F1 Score (CatBoost): 0.7426888217393944

Classification Report (CatBoost):
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        34
           1       0.82      1.00      0.90     28251
           2       0.35      0.00      0.01      3518
           3       0.43      0.01      0.02      2353
           4       0.29      0.02      0.03       265

    accuracy                           0.82     34421
   macro avg       0.38      0.21      0.19     34421
weighted avg       0.74      0.82      0.74     34421





<strong>HistGradientBoostingClassifier</strong>

In [30]:
from sklearn.ensemble import HistGradientBoostingClassifier

# Create the HistGradientBoostingClassifier model
hist_gb_model = HistGradientBoostingClassifier(random_state=42)

# Train the model
hist_gb_model.fit(X_train, y_train)

# Make predictions
y_pred_hist_gb = hist_gb_model.predict(X_test)

# --- Model evaluation ---

# Confusion matrix
print("Confusion Matrix (HistGradientBoostingClassifier):\n", confusion_matrix(y_test, y_pred_hist_gb))

# Accuracy
print("\nAccuracy (HistGradientBoostingClassifier):", accuracy_score(y_test, y_pred_hist_gb))

# Precision
print("Precision (HistGradientBoostingClassifier):", precision_score(y_test, y_pred_hist_gb, average='weighted'))

# Recall
print("Recall (HistGradientBoostingClassifier):", recall_score(y_test, y_pred_hist_gb, average='weighted'))

# F1 Score
print("F1 Score (HistGradientBoostingClassifier):", f1_score(y_test, y_pred_hist_gb, average='weighted'))

# Classification report
print("\nClassification Report (HistGradientBoostingClassifier):\n", classification_report(y_test, y_pred_hist_gb))


Confusion Matrix (HistGradientBoostingClassifier):
 [[    3    25     0     1     5]
 [   76 28145     4     2    24]
 [   11  3475     3     5    24]
 [    8  2319     1     6    19]
 [    4   243     3     2    13]]

Accuracy (HistGradientBoostingClassifier): 0.8183957467824874
Precision (HistGradientBoostingClassifier): 0.7300151834085392
Recall (HistGradientBoostingClassifier): 0.8183957467824874
F1 Score (HistGradientBoostingClassifier): 0.7408319473013496

Classification Report (HistGradientBoostingClassifier):
               precision    recall  f1-score   support

           0       0.03      0.09      0.04        34
           1       0.82      1.00      0.90     28251
           2       0.27      0.00      0.00      3518
           3       0.38      0.00      0.01      2353
           4       0.15      0.05      0.07       265

    accuracy                           0.82     34421
   macro avg       0.33      0.23      0.21     34421
weighted avg       0.73      0.82   

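## Conclusions from the analysis and modeling of accident severity

- XGBoost, LightGBM, CatBoost and HistGradientBoostingClassifier all reach an accuracy of about 0.82, but their recall on every class other than the majority one is at most 0.12. LabelEncoder assigns codes in alphabetical order, so the classes in the reports are 0 = FATAL INJURY, 1 = NO APPARENT INJURY, 2 = POSSIBLE INJURY, 3 = SUSPECTED MINOR INJURY and 4 = SUSPECTED SERIOUS INJURY.
- Extra Trees Classifier and Random Forest reach a comparable accuracy (about 0.82) and show the same weakness on the minority classes. The Decision Tree has a clearly lower accuracy (0.70); its recall on classes 2 and 3 is somewhat higher (0.16 and 0.14), but at very low precision (0.15 and 0.12).
- The SVM assigns every test sample to the majority class "NO APPARENT INJURY", so precision, recall and f1-score are 0.00 for all other classes. Its accuracy of 0.8207 is simply the share of that class in the test set (28251 of 34421), which makes an accuracy near 0.82 the baseline reached by always predicting the majority class.
- The class imbalance in the dataset significantly affects model performance, especially on the minority classes: the macro-averaged f1-score stays between 0.18 and 0.25 for every model with a complete report.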
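
One common way to counter this kind of imbalance is to weight classes inversely to their frequency. The sketch below computes such weights from the class counts reported earlier, using the "balanced" heuristic `n_samples / (n_classes * count)`. It was not evaluated in this analysis, so whether it improves minority-class recall on this dataset remains untested.

```python
# Class counts from the value_counts() output, keyed by the encoded label
# (alphabetical order, as produced by LabelEncoder)
class_counts = {
    0: 153,      # FATAL INJURY
    1: 141185,   # NO APPARENT INJURY
    2: 17482,    # POSSIBLE INJURY
    3: 11870,    # SUSPECTED MINOR INJURY
    4: 1415,     # SUSPECTED SERIOUS INJURY
}

n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# "Balanced" heuristic: rare classes get proportionally larger weights
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

for label, weight in class_weights.items():
    print(label, round(weight, 3))
```

A dictionary like this can be passed as `class_weight` to scikit-learn estimators such as `RandomForestClassifier`. Passing `class_weight='balanced'` applies the same formula to the labels seen during `fit`, which would be the training labels rather than the full-dataset counts used here.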