# ETL process for ***"Accidents in London 2015-2018" (VERIFICATION)*** data

To verify that both the joins performed with the Pandas "***merge***" function and the record insertions were applied correctly, the following steps are carried out:

## Importing the *base libraries*:

In [2]:
import pandas as pd
from sqlalchemy import create_engine

## Connecting to the database

In [3]:
engine = create_engine('postgresql+psycopg2://postgres:12345678@localhost:5432/accidents')

## Retrieving all records from each table

In [4]:
df_accidents = pd.read_sql("SELECT * FROM accidents", engine)
df_route = pd.read_sql("SELECT * FROM routes", engine)
df_operator = pd.read_sql("SELECT * FROM operator", engine)
df_group_name = pd.read_sql("SELECT * FROM group_name", engine)
df_bus_garage = pd.read_sql("SELECT * FROM bus_garage", engine)
df_borough = pd.read_sql("SELECT * FROM borough", engine)
df_injury_result_description = pd.read_sql("SELECT * FROM injury_result_description", engine)
df_incident_event_type = pd.read_sql("SELECT * FROM incident_event_type", engine)
df_victim_category = pd.read_sql("SELECT * FROM victim_category", engine)
df_victims_sex = pd.read_sql("SELECT * FROM victims_sex", engine)
df_victims_age = pd.read_sql("SELECT * FROM victims_age", engine)
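
The eleven reads above follow the same pattern, so they could also be driven by a list of table names. The following is a minimal sketch of that idea, not the notebook's actual code: it swaps PostgreSQL for an in-memory SQLite database so it runs without a server, and the two tiny tables and the `tables` dictionary are hypothetical.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the PostgreSQL
# "accidents" database so that the sketch is self-contained.
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"id_operator": [1, 2], "operator_name": ["London General", "Metroline"]}
).to_sql("operator", conn, index=False)
pd.DataFrame(
    {"id_borough": [1, 2], "borough_name": ["Southwark", "Islington"]}
).to_sql("borough", conn, index=False)

# The table names come from a fixed list, never from user input, so
# interpolating them into the query string is acceptable here.
table_names = ["operator", "borough"]
tables = {name: pd.read_sql(f"SELECT * FROM {name}", conn) for name in table_names}

for name, frame in tables.items():
    print(name, frame.shape)
```

Keeping the DataFrames in a dictionary keyed by table name also makes it easy to iterate over the dimensions later when joining them.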

In [5]:
df_accidents

Unnamed: 0,id_accident,date_of_incident,id_route,id_operator,id_group_name,id_bus_garage,id_borough,id_injury_result_description,id_incident_event,id_victim_category,id_victims_sex,id_victims_age
0,1,2015-01-01,1,1,1,1,1,1,1,1,1,1
1,2,2015-01-01,2,2,2,1,2,1,1,1,1,2
2,3,2015-01-01,3,3,3,1,3,2,1,1,1,3
3,4,2015-01-01,3,3,3,1,4,2,1,1,1,3
4,5,2015-01-01,4,2,2,1,5,3,1,2,2,3
...,...,...,...,...,...,...,...,...,...,...,...,...
23153,23154,2018-09-01,612,3,3,57,28,1,6,4,1,4
23154,23155,2018-09-01,612,5,4,35,35,1,6,4,1,4
23155,23156,2018-09-01,612,5,4,65,29,1,7,11,1,4
23156,23157,2018-09-01,612,5,4,41,7,1,6,4,1,4


## Joining the elements

Using the IDs in each record of the main table "***accidents***", a join is applied with the tables of the other dimensions:

In [6]:
merged_df = df_accidents.merge(df_route, on='id_route', how='left')
merged_df

Unnamed: 0,id_accident,date_of_incident,id_route,id_operator,id_group_name,id_bus_garage,id_borough,id_injury_result_description,id_incident_event,id_victim_category,id_victims_sex,id_victims_age,route_name
0,1,2015-01-01,1,1,1,1,1,1,1,1,1,1,1
1,2,2015-01-01,2,2,2,1,2,1,1,1,1,2,4
2,3,2015-01-01,3,3,3,1,3,2,1,1,1,3,5
3,4,2015-01-01,3,3,3,1,4,2,1,1,1,3,5
4,5,2015-01-01,4,2,2,1,5,3,1,2,2,3,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...
23153,23154,2018-09-01,612,3,3,57,28,1,6,4,1,4,(blank)
23154,23155,2018-09-01,612,5,4,35,35,1,6,4,1,4,(blank)
23155,23156,2018-09-01,612,5,4,65,29,1,7,11,1,4,(blank)
23156,23157,2018-09-01,612,5,4,41,7,1,6,4,1,4,(blank)


In [7]:
# Perform the remaining joins with the other DataFrames
merged_df = merged_df.merge(df_operator, on='id_operator', how='left')
merged_df = merged_df.merge(df_group_name, on='id_group_name', how='left')
merged_df = merged_df.merge(df_bus_garage, on='id_bus_garage', how='left')
merged_df = merged_df.merge(df_borough, on='id_borough', how='left')

In [8]:
merged_df

Unnamed: 0,id_accident,date_of_incident,id_route,id_operator,id_group_name,id_bus_garage,id_borough,id_injury_result_description,id_incident_event,id_victim_category,id_victims_sex,id_victims_age,route_name,operator_name,group_name,bus_garage_name,borough_name
0,1,2015-01-01,1,1,1,1,1,1,1,1,1,1,1,London General,Go-Ahead,Garage Not Available,Southwark
1,2,2015-01-01,2,2,2,1,2,1,1,1,1,2,4,Metroline,Metroline,Garage Not Available,Islington
2,3,2015-01-01,3,3,3,1,3,2,1,1,1,3,5,East London,Stagecoach,Garage Not Available,Havering
3,4,2015-01-01,3,3,3,1,4,2,1,1,1,3,5,East London,Stagecoach,Garage Not Available,None London Borough
4,5,2015-01-01,4,2,2,1,5,3,1,2,2,3,6,Metroline,Metroline,Garage Not Available,Westminster
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23153,23154,2018-09-01,612,3,3,57,28,1,6,4,1,4,(blank),East London,Stagecoach,West Ham,Newham
23154,23155,2018-09-01,612,5,4,35,35,1,6,4,1,4,(blank),London United,London United,Hounslow,Not specified
23155,23156,2018-09-01,612,5,4,65,29,1,7,11,1,4,(blank),London United,London United,Park Royal,Harrow
23156,23157,2018-09-01,612,5,4,41,7,1,6,4,1,4,(blank),London United,London United,Shepherds Bush,Hammersmith & Fulham


In [9]:
merged_df = merged_df.merge(df_injury_result_description, on='id_injury_result_description', how='left')
merged_df = merged_df.merge(df_incident_event_type, on='id_incident_event', how='left')
merged_df = merged_df.merge(df_victim_category, on='id_victim_category', how='left')
merged_df = merged_df.merge(df_victims_sex, on='id_victims_sex', how='left')
merged_df = merged_df.merge(df_victims_age, on='id_victims_age', how='left')

In [10]:
merged_df

Unnamed: 0,id_accident,date_of_incident,id_route,id_operator,id_group_name,id_bus_garage,id_borough,id_injury_result_description,id_incident_event,id_victim_category,...,route_name,operator_name,group_name,bus_garage_name,borough_name,injury_result_description_name,incident_event_type_name,victim_category_name,victims_sex_name,victims_age_name
0,1,2015-01-01,1,1,1,1,1,1,1,1,...,1,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child
1,2,2015-01-01,2,2,2,1,2,1,1,1,...,4,Metroline,Metroline,Garage Not Available,Islington,Injuries treated on scene,Onboard Injuries,Passenger,Male,Unknown
2,3,2015-01-01,3,3,3,1,3,2,1,1,...,5,East London,Stagecoach,Garage Not Available,Havering,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
3,4,2015-01-01,3,3,3,1,4,2,1,1,...,5,East London,Stagecoach,Garage Not Available,None London Borough,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
4,5,2015-01-01,4,2,2,1,5,3,1,2,...,6,Metroline,Metroline,Garage Not Available,Westminster,Reported Minor Injury - Treated at Hospital,Onboard Injuries,Pedestrian,Female,Elderly
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23153,23154,2018-09-01,612,3,3,57,28,1,6,4,...,(blank),East London,Stagecoach,West Ham,Newham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23154,23155,2018-09-01,612,5,4,35,35,1,6,4,...,(blank),London United,London United,Hounslow,Not specified,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23155,23156,2018-09-01,612,5,4,65,29,1,7,11,...,(blank),London United,London United,Park Royal,Harrow,Injuries treated on scene,Slip Trip Fall,Operational Staff,Male,Adult
23156,23157,2018-09-01,612,5,4,41,7,1,6,4,...,(blank),London United,London United,Shepherds Bush,Hammersmith & Fulham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
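
A left join keeps every accident row, but on its own it does not report foreign keys that found no match, nor duplicated keys in a dimension table. As a hedged sketch of how each join could be checked, the example below uses the `validate` and `indicator` parameters of `DataFrame.merge` on hypothetical toy data; the frames `accidents` and `operators` are illustrative and are not taken from the database.

```python
import pandas as pd

# Hypothetical fact and dimension tables that reuse the notebook's column naming.
accidents = pd.DataFrame({"id_accident": [1, 2, 3], "id_operator": [1, 2, 9]})
operators = pd.DataFrame(
    {"id_operator": [1, 2], "operator_name": ["London General", "Metroline"]}
)

# validate="many_to_one" raises pandas.errors.MergeError if id_operator is
# duplicated in the dimension table, which would otherwise multiply rows.
# indicator=True adds a "_merge" column that records where each row came from.
checked = accidents.merge(
    operators, on="id_operator", how="left", validate="many_to_one", indicator=True
)

# Rows marked "left_only" carry a foreign key with no match in the dimension.
unmatched = checked[checked["_merge"] == "left_only"]

print(len(checked) == len(accidents))
print(unmatched["id_accident"].tolist())
```

In this toy data only accident 3 points at a missing operator, so it is the single row reported as unmatched.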


## Adding the "***Year***" column

The original file has the "***Year***" column in the first position; its value can be derived from the "date_of_incident" column:

In [11]:
merged_df['date_of_incident'] = pd.to_datetime(merged_df['date_of_incident'])
merged_df['Year'] = merged_df['date_of_incident'].dt.year
merged_df

Unnamed: 0,id_accident,date_of_incident,id_route,id_operator,id_group_name,id_bus_garage,id_borough,id_injury_result_description,id_incident_event,id_victim_category,...,operator_name,group_name,bus_garage_name,borough_name,injury_result_description_name,incident_event_type_name,victim_category_name,victims_sex_name,victims_age_name,Year
0,1,2015-01-01,1,1,1,1,1,1,1,1,...,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child,2015
1,2,2015-01-01,2,2,2,1,2,1,1,1,...,Metroline,Metroline,Garage Not Available,Islington,Injuries treated on scene,Onboard Injuries,Passenger,Male,Unknown,2015
2,3,2015-01-01,3,3,3,1,3,2,1,1,...,East London,Stagecoach,Garage Not Available,Havering,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly,2015
3,4,2015-01-01,3,3,3,1,4,2,1,1,...,East London,Stagecoach,Garage Not Available,None London Borough,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly,2015
4,5,2015-01-01,4,2,2,1,5,3,1,2,...,Metroline,Metroline,Garage Not Available,Westminster,Reported Minor Injury - Treated at Hospital,Onboard Injuries,Pedestrian,Female,Elderly,2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23153,23154,2018-09-01,612,3,3,57,28,1,6,4,...,East London,Stagecoach,West Ham,Newham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult,2018
23154,23155,2018-09-01,612,5,4,35,35,1,6,4,...,London United,London United,Hounslow,Not specified,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult,2018
23155,23156,2018-09-01,612,5,4,65,29,1,7,11,...,London United,London United,Park Royal,Harrow,Injuries treated on scene,Slip Trip Fall,Operational Staff,Male,Adult,2018
23156,23157,2018-09-01,612,5,4,41,7,1,6,4,...,London United,London United,Shepherds Bush,Hammersmith & Fulham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult,2018


## Reordering the columns

In [12]:
df_accidents_final = merged_df[['Year', 'date_of_incident', 'route_name', 'operator_name', 'group_name', 'bus_garage_name', 'borough_name', 'injury_result_description_name', 'incident_event_type_name', 'victim_category_name', 'victims_sex_name', 'victims_age_name']]
df_accidents_final

Unnamed: 0,Year,date_of_incident,route_name,operator_name,group_name,bus_garage_name,borough_name,injury_result_description_name,incident_event_type_name,victim_category_name,victims_sex_name,victims_age_name
0,2015,2015-01-01,1,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child
1,2015,2015-01-01,4,Metroline,Metroline,Garage Not Available,Islington,Injuries treated on scene,Onboard Injuries,Passenger,Male,Unknown
2,2015,2015-01-01,5,East London,Stagecoach,Garage Not Available,Havering,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
3,2015,2015-01-01,5,East London,Stagecoach,Garage Not Available,None London Borough,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
4,2015,2015-01-01,6,Metroline,Metroline,Garage Not Available,Westminster,Reported Minor Injury - Treated at Hospital,Onboard Injuries,Pedestrian,Female,Elderly
...,...,...,...,...,...,...,...,...,...,...,...,...
23153,2018,2018-09-01,(blank),East London,Stagecoach,West Ham,Newham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23154,2018,2018-09-01,(blank),London United,London United,Hounslow,Not specified,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23155,2018,2018-09-01,(blank),London United,London United,Park Royal,Harrow,Injuries treated on scene,Slip Trip Fall,Operational Staff,Male,Adult
23156,2018,2018-09-01,(blank),London United,London United,Shepherds Bush,Hammersmith & Fulham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult


## Renaming the columns

The column names in the original file differ from those of the database tables, so they need to be changed.

In [13]:
df_accidents_final = df_accidents_final.rename(columns= {
    'date_of_incident': 'Date Of Incident',
    'route_name': 'Route',
    'operator_name': 'Operator',
    'group_name': 'Group Name',
    'bus_garage_name': 'Bus Garage',
    'borough_name': 'Borough',
    'injury_result_description_name': 'Injury Result Description',
    'incident_event_type_name': 'Incident Event Type',
    'victim_category_name': 'Victim Category',
    'victims_sex_name': 'Victims Sex',
    'victims_age_name': 'Victims Age'
})
df_accidents_final

Unnamed: 0,Year,Date Of Incident,Route,Operator,Group Name,Bus Garage,Borough,Injury Result Description,Incident Event Type,Victim Category,Victims Sex,Victims Age
0,2015,2015-01-01,1,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child
1,2015,2015-01-01,4,Metroline,Metroline,Garage Not Available,Islington,Injuries treated on scene,Onboard Injuries,Passenger,Male,Unknown
2,2015,2015-01-01,5,East London,Stagecoach,Garage Not Available,Havering,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
3,2015,2015-01-01,5,East London,Stagecoach,Garage Not Available,None London Borough,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
4,2015,2015-01-01,6,Metroline,Metroline,Garage Not Available,Westminster,Reported Minor Injury - Treated at Hospital,Onboard Injuries,Pedestrian,Female,Elderly
...,...,...,...,...,...,...,...,...,...,...,...,...
23153,2018,2018-09-01,(blank),East London,Stagecoach,West Ham,Newham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23154,2018,2018-09-01,(blank),London United,London United,Hounslow,Not specified,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23155,2018,2018-09-01,(blank),London United,London United,Park Royal,Harrow,Injuries treated on scene,Slip Trip Fall,Operational Staff,Male,Adult
23156,2018,2018-09-01,(blank),London United,London United,Shepherds Bush,Hammersmith & Fulham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult


## Loading the original file

In [14]:
df_original = pd.read_excel('src/accidents.xlsx')

In [15]:
df_original

Unnamed: 0,Year,Date Of Incident,Route,Operator,Group Name,Bus Garage,Borough,Injury Result Description,Incident Event Type,Victim Category,Victims Sex,Victims Age
0,2015,2015-01-01,1,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child
1,2015,2015-01-01,4,Metroline,Metroline,Garage Not Available,Islington,Injuries treated on scene,Onboard Injuries,Passenger,Male,Unknown
2,2015,2015-01-01,5,East London,Stagecoach,Garage Not Available,Havering,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
3,2015,2015-01-01,5,East London,Stagecoach,Garage Not Available,None London Borough,Taken to Hospital – Reported Serious Injury or...,Onboard Injuries,Passenger,Male,Elderly
4,2015,2015-01-01,6,Metroline,Metroline,Garage Not Available,Westminster,Reported Minor Injury - Treated at Hospital,Onboard Injuries,Pedestrian,Female,Elderly
...,...,...,...,...,...,...,...,...,...,...,...,...
23153,2018,2018-09-01,(blank),East London,Stagecoach,West Ham,Newham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23154,2018,2018-09-01,(blank),London United,London United,Hounslow,Not specified,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult
23155,2018,2018-09-01,(blank),London United,London United,Park Royal,Harrow,Injuries treated on scene,Slip Trip Fall,Operational Staff,Male,Adult
23156,2018,2018-09-01,(blank),London United,London United,Shepherds Bush,Hammersmith & Fulham,Injuries treated on scene,Personal Injury,Bus Driver,Male,Adult


## Converting the "***Route***" column to String

In the DataFrame built from the original file, Pandas preserves the data type of each value, so the "***Route***" column, which contains both numbers and text, is stored as ***Object***. In the database, on the other hand, that column has the ***VARCHAR*** type, which is equivalent to a ***String***. This is why the column's data type is converted in both DataFrames.

In [17]:
df_original['Route'] = df_original['Route'].astype(str)
df_accidents_final['Route'] = df_accidents_final['Route'].astype(str)
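
To illustrate why this conversion matters for the comparison, here is a small sketch on hypothetical values. It assumes that numeric routes arrive from the spreadsheet as integers, while the same routes come back from the database as text.

```python
import pandas as pd

# Hypothetical values: numeric routes read from a spreadsheet as integers,
# alphanumeric routes read as strings.
from_file = pd.DataFrame({"Route": [1, 4, "W3"]})
# The same routes as they would come back from a VARCHAR column.
from_db = pd.DataFrame({"Route": ["1", "4", "W3"]})

# The file-based column mixes Python types within a single column.
print(from_file["Route"].map(type).tolist())
print(from_file.equals(from_db))  # False: the integer 1 is not the string "1"

# After casting both sides to str, the values compare as equal.
from_file["Route"] = from_file["Route"].astype(str)
from_db["Route"] = from_db["Route"].astype(str)
print(from_file.equals(from_db))  # True
```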

In [61]:
df_accidents_final.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23158 entries, 0 to 23157
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   Year                       23158 non-null  int64         
 1   Date Of Incident           23158 non-null  datetime64[ns]
 2   Route                      23158 non-null  object        
 3   Operator                   23158 non-null  object        
 4   Group Name                 23158 non-null  object        
 5   Bus Garage                 23158 non-null  object        
 6   Borough                    23158 non-null  object        
 7   Injury Result Description  23158 non-null  object        
 8   Incident Event Type        23158 non-null  object        
 9   Victim Category            23158 non-null  object        
 10  Victims Sex                23158 non-null  object        
 11  Victims Age                23158 non-null  object        
dtypes: d

In [87]:
df_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23158 entries, 0 to 23157
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   Year                       23158 non-null  int64         
 1   Date Of Incident           23158 non-null  datetime64[ns]
 2   Route                      23158 non-null  object        
 3   Operator                   23158 non-null  object        
 4   Group Name                 23158 non-null  object        
 5   Bus Garage                 23158 non-null  object        
 6   Borough                    23158 non-null  object        
 7   Injury Result Description  23158 non-null  object        
 8   Incident Event Type        23158 non-null  object        
 9   Victim Category            23158 non-null  object        
 10  Victims Sex                23158 non-null  object        
 11  Victims Age                23158 non-null  object        
dtypes: d

## Verifying that the DataFrames are equal

The Pandas "***equals***" function checks whether two DataFrames are equal, but it requires the records to be in the same ***position***, which is not the case for the DataFrame retrieved from the database. For that reason, the values are ***sorted*** by the ***columns*** with the most ***relevance***, those that have an immediate impact on the ordering, and ***the indexes are reset*** so that they start at 0 and run consecutively.
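
The effect of row position on `equals` can be seen in a small sketch with hypothetical data, where `shuffled` holds the same rows as `original` in a different order:

```python
import pandas as pd

original = pd.DataFrame(
    {
        "Route": ["1", "10", "W3"],
        "Borough": ["Southwark", "Westminster", "Haringey"],
    }
)
# Same rows in a different order; the index labels move with the rows.
shuffled = original.iloc[[2, 0, 1]]

print(original.equals(shuffled))  # False: positions and index labels differ

# Sorting by the same columns and resetting the index aligns both frames.
cols = ["Route", "Borough"]
a = original.sort_values(by=cols).reset_index(drop=True)
b = shuffled.sort_values(by=cols).reset_index(drop=True)
print(a.equals(b))  # True
```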

In [18]:
df_accidents_final.sort_values(by=['Date Of Incident', 'Route', 'Operator', 'Group Name', 'Bus Garage', 'Borough', 'Injury Result Description', 'Incident Event Type', 'Victim Category', 'Victims Sex', 'Victims Age' ]).reset_index(drop=True)

Unnamed: 0,Year,Date Of Incident,Route,Operator,Group Name,Bus Garage,Borough,Injury Result Description,Incident Event Type,Victim Category,Victims Sex,Victims Age
0,2015,2015-01-01,1,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child
1,2015,2015-01-01,10,London United,London United,Garage Not Available,Westminster,Injuries treated on scene,Onboard Injuries,Passenger,Female,Elderly
2,2015,2015-01-01,102,Arriva London North,Arriva London,Garage Not Available,Barnet,Injuries treated on scene,Onboard Injuries,Passenger,Unknown,Unknown
3,2015,2015-01-01,102,Arriva London North,Arriva London,Garage Not Available,Barnet,Injuries treated on scene,Onboard Injuries,Passenger,Unknown,Unknown
4,2015,2015-01-01,102,Arriva London North,Arriva London,Garage Not Available,Enfield,Injuries treated on scene,Collision Incident,Pedestrian,Unknown,Unknown
...,...,...,...,...,...,...,...,...,...,...,...,...
23153,2018,2018-09-01,W3,Arriva London North,Arriva London,Wood Green,Haringey,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult
23154,2018,2018-09-01,W4,Arriva London North,Arriva London,Wood Green,Haringey,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult
23155,2018,2018-09-01,W4,Arriva London North,Arriva London,Wood Green,Haringey,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult
23156,2018,2018-09-01,W4,Arriva London North,Arriva London,Wood Green,Not specified,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult


In [19]:
df_original.sort_values(by=['Date Of Incident', 'Route', 'Operator', 'Group Name', 'Bus Garage', 'Borough', 'Injury Result Description', 'Incident Event Type', 'Victim Category', 'Victims Sex', 'Victims Age' ]).reset_index(drop=True)

Unnamed: 0,Year,Date Of Incident,Route,Operator,Group Name,Bus Garage,Borough,Injury Result Description,Incident Event Type,Victim Category,Victims Sex,Victims Age
0,2015,2015-01-01,1,London General,Go-Ahead,Garage Not Available,Southwark,Injuries treated on scene,Onboard Injuries,Passenger,Male,Child
1,2015,2015-01-01,10,London United,London United,Garage Not Available,Westminster,Injuries treated on scene,Onboard Injuries,Passenger,Female,Elderly
2,2015,2015-01-01,102,Arriva London North,Arriva London,Garage Not Available,Barnet,Injuries treated on scene,Onboard Injuries,Passenger,Unknown,Unknown
3,2015,2015-01-01,102,Arriva London North,Arriva London,Garage Not Available,Barnet,Injuries treated on scene,Onboard Injuries,Passenger,Unknown,Unknown
4,2015,2015-01-01,102,Arriva London North,Arriva London,Garage Not Available,Enfield,Injuries treated on scene,Collision Incident,Pedestrian,Unknown,Unknown
...,...,...,...,...,...,...,...,...,...,...,...,...
23153,2018,2018-09-01,W3,Arriva London North,Arriva London,Wood Green,Haringey,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult
23154,2018,2018-09-01,W4,Arriva London North,Arriva London,Wood Green,Haringey,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult
23155,2018,2018-09-01,W4,Arriva London North,Arriva London,Wood Green,Haringey,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult
23156,2018,2018-09-01,W4,Arriva London North,Arriva London,Wood Green,Not specified,Injuries treated on scene,Slip Trip Fall,Passenger,Female,Adult


In [20]:
# Sort both DataFrames by the same columns and reset their indexes before comparing
sort_columns = ['Date Of Incident', 'Route', 'Operator', 'Group Name', 'Bus Garage', 'Borough', 'Injury Result Description', 'Incident Event Type', 'Victim Category', 'Victims Sex', 'Victims Age']
sorted_db = df_accidents_final.sort_values(by=sort_columns).reset_index(drop=True)
sorted_original = df_original.sort_values(by=sort_columns).reset_index(drop=True)
is_equal = sorted_db.equals(sorted_original)

if is_equal:
    print('Both datasets are equal.')
else:
    print('The datasets are not equal.')

Both datasets are equal.
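
`equals` only returns a single boolean, so a failed check gives no hint about which records differ. One possible way to diagnose a mismatch, sketched below under the assumption that both frames share the same columns and contain no null values, is to count how often each distinct row occurs on each side. The helper `row_count_diff` and the toy frames are hypothetical and are not part of the notebook.

```python
import pandas as pd


def row_count_diff(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the distinct rows whose number of occurrences differs."""
    cols = list(left.columns)
    # groupby drops rows with null keys by default, hence the no-null assumption.
    left_counts = (
        left.groupby(cols, as_index=False).size().rename(columns={"size": "left_count"})
    )
    right_counts = (
        right.groupby(cols, as_index=False).size().rename(columns={"size": "right_count"})
    )
    counts = left_counts.merge(right_counts, on=cols, how="outer")
    count_cols = ["left_count", "right_count"]
    # A row missing on one side gets a count of 0 instead of NaN.
    counts[count_cols] = counts[count_cols].fillna(0).astype(int)
    return counts[counts["left_count"] != counts["right_count"]].reset_index(drop=True)


# Hypothetical frames: one duplicated row on the left, one extra row on the right.
left = pd.DataFrame(
    {"Route": ["1", "1", "W3"], "Borough": ["Southwark", "Southwark", "Haringey"]}
)
right = pd.DataFrame(
    {"Route": ["1", "W3", "W4"], "Borough": ["Southwark", "Haringey", "Haringey"]}
)

diff = row_count_diff(left, right)
print(diff)
```

Counting occurrences instead of joining row by row keeps duplicated records, which do appear in this dataset, from being multiplied by the comparison.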
