## Database Format
### Date:	 Date of accident, in the format "January 01, 2001"
### Time:	 Local time, in 24-hour format
### Airline/Op:	 Airline or operator of the aircraft
### Flight #:	 Flight number assigned by the aircraft operator
### Route:	 Complete or partial route flown prior to the accident
### AC Type:	 Aircraft type
### Reg:	 ICAO registration of the aircraft
### cn / ln:	 Construction or serial number / Line or fuselage number
### Aboard:	 Total aboard (passengers / crew)
### Fatalities:	 Total fatalities aboard (passengers / crew)
### Ground:	 Total killed on the ground
### Summary:	 Brief description of the accident and cause if known
 

## To Do
### Data is cross-checked in Excel while cleaning with Pandas
1. Drop columns that are not needed
2. Replace "?" with "Unknown"
3. Separate passengers and crew in "Aboard" and "Fatalities" into a new column for each
4. Create new columns for Totals and Survivors
5. Cluster words in Summary
6. Reorder columns and drop what's not needed
7. Make the data usable for BI tools
8. Will use Streamlit; columns will be added as needed

In [47]:
import pandas as pd
import numpy as np

In [48]:
#loading the excel into a dataframe
df = pd.read_excel('./Data/plane_crash_info_data.xlsx')
df

Unnamed: 0,Date,Time,Location,Operator,Flight_Number,Route,AC_Type,Registration,cn_ln,Aboard,Fatalities,Ground,Summary
0,"September 17, 1908",1718,"Fort Myer, Virginia",Military - U.S. Army,?,Demonstration,Wright Flyer III,?,1,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly..."
1,"September 07, 1909",?,"Juvisy-sur-Orge, France",?,?,Air show,Wright Byplane,SC1,?,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...
2,"July 12, 1912",0630,"Atlantic City, New Jersey",Military - U.S. Navy,?,Test flight,Dirigible,?,?,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...
3,"August 06, 1913",?,"Victoria, British Columbia, Canada",Private,?,?,Curtiss seaplane,?,?,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...
4,"September 09, 1913",1830,Over the North Sea,Military - German Navy,?,?,Zeppelin L-1 (airship),?,?,20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
5058,"July 16, 2022",2247,"Eleftheroupolis, Greece",Meridian,MEM3032,Nis- Amman,Antonov An-12,UR-CIC,01347701,8 (passengers:0 crew:8),8 (passengers:0 crew:8),0,The cargo plane carrying eight crew members an...
5059,"November 06, 2022",0853,"Bukoba, Tanzania",Precision Air,PW494,Dar es-Salaam -Bukoba,ATR 42-500,5H-PWF,819,43 (passengers:39 crew:39),19 (passengers:17 crew:2),0,"While on final approach to Bukoba Airport, the..."
5060,"November 18, 2022",1511,"Lima, Peru",LATAM,LA2213,Lima - Juliaca,Airbus 320-271N,CC-BHB,7864,108 (passengers:102 crew:6),0 (passengers:0 crew:0),2,The Airbus A320 collided with a fire truck whi...
5061,"November 21, 2022",1015,"Medellín, Colombia",AeroPaca SAS,?,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,HK-5121,31-7652004,8 (passengers:6 crew:2),8 (passengers:6 crew:2),0,The plane was chartered to carry a team of six...


In [49]:
# dropping columns that are not needed
df.drop(columns=["Flight_Number", 'Registration', 'cn_ln'], inplace=True)

In [50]:
#renaming Ground Column
df.rename(columns={'Ground': 'Ground_Fatalities'},inplace=True)

In [51]:
#Checking for missing values
percent_missing = df.isnull().sum() * 100 / len(df)
percent_missing


Date                 0.0
Time                 0.0
Location             0.0
Operator             0.0
Route                0.0
AC_Type              0.0
Aboard               0.0
Fatalities           0.0
Ground_Fatalities    0.0
Summary              0.0
dtype: float64
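
The all-zero result is expected at this stage: the source file marks missing values with the placeholder "?" rather than with empty cells, so `isnull()` finds nothing until the placeholder is replaced. A minimal sketch on a made-up three-row frame (the values are illustrative, not taken from the file) shows one way to measure the placeholder share directly:

```python
import pandas as pd

# Toy frame; the values are made up for illustration.
sample = pd.DataFrame({
    "Time": ["1718", "?", "0630"],
    "Operator": ["Private", "?", "?"],
    "Route": ["Test flight", "Air show", "Demonstration"],
})

# isnull() does not treat the "?" placeholder as missing...
percent_null = sample.isnull().sum() * 100 / len(sample)

# ...so compare against the placeholder explicitly instead.
percent_placeholder = sample.eq("?").sum() * 100 / len(sample)
print(percent_placeholder.round(1))
```

Running the same comparison on the real `df` before the replacement step would give a per-column picture of how much is actually unknown.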

In [52]:
# replace the "?" placeholder with NaN so pandas treats it as missing; text columns are filled with 'Unknown' later
df = df.replace(to_replace='?', value=np.nan)
df.head()

Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Aboard,Fatalities,Ground_Fatalities,Summary
0,"September 17, 1908",1718.0,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly..."
1,"September 07, 1909",,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...
2,"July 12, 1912",630.0,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...
3,"August 06, 1913",,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...
4,"September 09, 1913",1830.0,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...


In [53]:
df.dtypes

Date                 object
Time                 object
Location             object
Operator             object
Route                object
AC_Type              object
Aboard               object
Fatalities           object
Ground_Fatalities    object
Summary              object
dtype: object

In [54]:
# need to change Date to datetime object
df["Date"] = pd.to_datetime(df["Date"])
df.head()

Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Aboard,Fatalities,Ground_Fatalities,Summary
0,1908-09-17,1718.0,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly..."
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...
2,1912-07-12,630.0,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...
4,1913-09-09,1830.0,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...


In [55]:
df.dtypes

Date                 datetime64[ns]
Time                         object
Location                     object
Operator                     object
Route                        object
AC_Type                      object
Aboard                       object
Fatalities                   object
Ground_Fatalities            object
Summary                      object
dtype: object
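
Since the dates follow a single known layout ("January 01, 2001"), the conversion can also be given an explicit format. The sketch below is an optional variant, not what the notebook ran; the sample strings, including the deliberately invalid one, are made up for illustration:

```python
import pandas as pd

dates = pd.Series(["September 17, 1908", "November 21, 2022", "not a date"])

# %B = full month name, %d = zero-padded day, %Y = four-digit year.
# errors="coerce" turns anything that does not match into NaT instead of raising.
parsed = pd.to_datetime(dates, format="%B %d, %Y", errors="coerce")
print(parsed)
```

An explicit format makes the parse stricter, so a stray value shows up as `NaT` rather than being guessed at.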

In [56]:
# some Time values contain stray characters (c, z, Z, ":" and ";") that need to be removed, along with leading and trailing spaces
df['Time'] = df['Time'].str.replace(r'[czZ:;]', '', regex=True)
df['Time'] = df['Time'].str.strip()

In [57]:
# insert ":" before the last two digits to get HH:MM
# converting to datetime did not work, but Excel reads the value correctly; will revisit if there is a need
df['Time'] = df['Time'].apply(lambda x: x[:-2]+':'+x[-2:] if not pd.isnull(x) else x)

In [58]:
# append ":00" for seconds to get HH:MM:SS (is this even needed?)
df['Time'] = df['Time'].apply(lambda x: x+':00' if not pd.isnull(x) else x)
df

Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Aboard,Fatalities,Ground_Fatalities,Summary
0,1908-09-17,17:18:00,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly..."
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...
2,1912-07-12,06:30:00,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...
4,1913-09-09,18:30:00,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...
...,...,...,...,...,...,...,...,...,...,...
5058,2022-07-16,22:47:00,"Eleftheroupolis, Greece",Meridian,Nis- Amman,Antonov An-12,8 (passengers:0 crew:8),8 (passengers:0 crew:8),0,The cargo plane carrying eight crew members an...
5059,2022-11-06,08:53:00,"Bukoba, Tanzania",Precision Air,Dar es-Salaam -Bukoba,ATR 42-500,43 (passengers:39 crew:39),19 (passengers:17 crew:2),0,"While on final approach to Bukoba Airport, the..."
5060,2022-11-18,15:11:00,"Lima, Peru",LATAM,Lima - Juliaca,Airbus 320-271N,108 (passengers:102 crew:6),0 (passengers:0 crew:0),2,The Airbus A320 collided with a fire truck whi...
5061,2022-11-21,10:15:00,"Medellín, Colombia",AeroPaca SAS,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,8 (passengers:6 crew:2),8 (passengers:6 crew:2),0,The plane was chartered to carry a team of six...
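
The three Time steps above (strip stray characters, insert ":", append ":00") could be folded into one helper that also rejects values that are not a plausible clock time. This is a sketch under assumptions: `normalize_time` is a hypothetical name, and the sample strings are made up, since the exact stray substrings in the file are not listed here.

```python
import re

import numpy as np
import pandas as pd


def normalize_time(value):
    """Return 'HH:MM:SS' for a raw time string, or NaN if it cannot be parsed."""
    if pd.isnull(value):
        return np.nan
    digits = re.sub(r"\D", "", str(value))  # keep digits only
    if len(digits) == 3:                    # e.g. "630" -> "0630"
        digits = "0" + digits
    if len(digits) != 4:
        return np.nan
    hours, minutes = int(digits[:2]), int(digits[2:])
    if hours > 23 or minutes > 59:
        return np.nan
    return f"{hours:02d}:{minutes:02d}:00"


# Made-up examples of messy inputs.
raw = pd.Series(["1718", "c 06:30", "2247Z", "630", np.nan, "9999"])
cleaned = raw.apply(normalize_time)
print(cleaned)
```

Unlike the slice-based approach, this returns NaN for out-of-range values such as "9999" instead of producing "99:99:00".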


### let's separate and create new cols for Aboard Passengers and Crew and Fatalities Passengers and Crew

In [59]:
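# raw strings avoid invalid-escape warnings for \d; expand=False returns a Series for the single group
df['Passengers_Aboard'] = df['Aboard'].str.extract(r'passengers:(\d+)', expand=False)
df['Crew_Aboard'] = df['Aboard'].str.extract(r'crew:(\d+)', expand=False)
df['Passengers_Fatalities'] = df['Fatalities'].str.extract(r'passengers:(\d+)', expand=False)
df['Crew_Fatalities'] = df['Fatalities'].str.extract(r'crew:(\d+)', expand=False)
# the first run of digits in the string is the overall total
df['Aboard_Aircraft'] = df['Aboard'].str.extract(r'(\d+)', expand=False)
df['Aboard_Fatalities'] = df['Fatalities'].str.extract(r'(\d+)', expand=False)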
df

Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Aboard,Fatalities,Ground_Fatalities,Summary,Passengers_Aboard,Crew_Aboard,Passengers_Fatalities,Crew_Fatalities,Aboard_Aircraft,Aboard_Fatalities
0,1908-09-17,17:18:00,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly...",1,1,1,0,2,1
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...,0,1,0,0,1,1
2,1912-07-12,06:30:00,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...,0,5,0,5,5,5
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...,0,1,0,1,1,1
4,1913-09-09,18:30:00,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...,,,,,20,14
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5058,2022-07-16,22:47:00,"Eleftheroupolis, Greece",Meridian,Nis- Amman,Antonov An-12,8 (passengers:0 crew:8),8 (passengers:0 crew:8),0,The cargo plane carrying eight crew members an...,0,8,0,8,8,8
5059,2022-11-06,08:53:00,"Bukoba, Tanzania",Precision Air,Dar es-Salaam -Bukoba,ATR 42-500,43 (passengers:39 crew:39),19 (passengers:17 crew:2),0,"While on final approach to Bukoba Airport, the...",39,39,17,2,43,19
5060,2022-11-18,15:11:00,"Lima, Peru",LATAM,Lima - Juliaca,Airbus 320-271N,108 (passengers:102 crew:6),0 (passengers:0 crew:0),2,The Airbus A320 collided with a fire truck whi...,102,6,0,0,108,0
5061,2022-11-21,10:15:00,"Medellín, Colombia",AeroPaca SAS,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,8 (passengers:6 crew:2),8 (passengers:6 crew:2),0,The plane was chartered to carry a team of six...,6,2,6,2,8,8
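
As an alternative sketch (not the approach used in the cell above), one regex with named groups can pull total, passengers, and crew in a single pass and convert them to nullable integers. It also makes it easy to flag rows where passengers plus crew do not add up to the total, as in row 5059 (`43 (passengers:39 crew:39)`). The names `pattern`, `parts`, and `mismatch` are illustrative.

```python
import pandas as pd

# Three strings copied from the output above.
aboard = pd.Series([
    "2 (passengers:1 crew:1)",
    "20 (passengers:? crew:?)",
    "43 (passengers:39 crew:39)",
])

# Each group accepts either digits or the "?" placeholder.
pattern = (
    r"(?P<total>\d+|\?)\s*"
    r"\(passengers:(?P<passengers>\d+|\?)\s+crew:(?P<crew>\d+|\?)\)"
)
parts = aboard.str.extract(pattern)

# "?" becomes NaN, then the nullable Int64 dtype keeps whole numbers with <NA>.
parts = parts.apply(pd.to_numeric, errors="coerce").astype("Int64")

# Flag rows whose parts are known but do not sum to the stated total.
mismatch = (parts["passengers"] + parts["crew"]).ne(parts["total"]).fillna(False)
print(parts.assign(mismatch=mismatch))
```

Rows with unknown parts are not flagged, because the comparison yields `<NA>` and is filled with `False`.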


### total fatalities column

In [60]:
# since str values can't be summed and NaN would otherwise need to be filled with 0, create two temporary numeric Series and sum them; this way unreliable data is not added to the main dataframe
dftotal1 =  pd.to_numeric(df['Ground_Fatalities'], errors='coerce')
dftotal2 =  pd.to_numeric(df['Aboard_Fatalities'], errors='coerce')

df['Total_Fatalites'] = dftotal1 + dftotal2
df



Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Aboard,Fatalities,Ground_Fatalities,Summary,Passengers_Aboard,Crew_Aboard,Passengers_Fatalities,Crew_Fatalities,Aboard_Aircraft,Aboard_Fatalities,Total_Fatalites
0,1908-09-17,17:18:00,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly...",1,1,1,0,2,1,1.0
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...,0,1,0,0,1,1,1.0
2,1912-07-12,06:30:00,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...,0,5,0,5,5,5,5.0
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...,0,1,0,1,1,1,1.0
4,1913-09-09,18:30:00,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...,,,,,20,14,14.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5058,2022-07-16,22:47:00,"Eleftheroupolis, Greece",Meridian,Nis- Amman,Antonov An-12,8 (passengers:0 crew:8),8 (passengers:0 crew:8),0,The cargo plane carrying eight crew members an...,0,8,0,8,8,8,8.0
5059,2022-11-06,08:53:00,"Bukoba, Tanzania",Precision Air,Dar es-Salaam -Bukoba,ATR 42-500,43 (passengers:39 crew:39),19 (passengers:17 crew:2),0,"While on final approach to Bukoba Airport, the...",39,39,17,2,43,19,19.0
5060,2022-11-18,15:11:00,"Lima, Peru",LATAM,Lima - Juliaca,Airbus 320-271N,108 (passengers:102 crew:6),0 (passengers:0 crew:0),2,The Airbus A320 collided with a fire truck whi...,102,6,0,0,108,0,2.0
5061,2022-11-21,10:15:00,"Medellín, Colombia",AeroPaca SAS,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,8 (passengers:6 crew:2),8 (passengers:6 crew:2),0,The plane was chartered to carry a team of six...,6,2,6,2,8,8,8.0
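
Note that `+` on two Series propagates NaN: if either the ground or the aboard count is missing, the total is missing too. That matches the stated goal of keeping unreliable data out. If a more lenient total were ever wanted, `Series.add` with `fill_value=0` is one option. The sketch below contrasts the two on made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up counts: one complete row, two half-known rows, one fully unknown row.
ground = pd.Series([0, np.nan, 2, np.nan])
aboard = pd.Series([1, 14, np.nan, np.nan])

# Strict: any missing operand makes the total missing.
strict_total = ground + aboard

# Lenient: a missing operand counts as 0, unless both are missing.
lenient_total = ground.add(aboard, fill_value=0)

print(pd.DataFrame({"strict": strict_total, "lenient": lenient_total}))
```

The strict version understates nothing but leaves gaps; the lenient version fills gaps at the risk of undercounting.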


### create a column survivors

In [61]:
#same thing as above
dftotal3 =  pd.to_numeric(df['Aboard_Aircraft'], errors='coerce')
dftotal4 =  pd.to_numeric(df['Aboard_Fatalities'], errors='coerce')
df['Survivors'] = dftotal3 - dftotal4
df

Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Aboard,Fatalities,Ground_Fatalities,Summary,Passengers_Aboard,Crew_Aboard,Passengers_Fatalities,Crew_Fatalities,Aboard_Aircraft,Aboard_Fatalities,Total_Fatalites,Survivors
0,1908-09-17,17:18:00,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,2 (passengers:1 crew:1),1 (passengers:1 crew:0),0,"During a demonstration flight, a U.S. Army fly...",1,1,1,0,2,1,1.0,1.0
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,1 (passengers:0 crew:1),1 (passengers:0 crew:0),0,Eugene Lefebvre was the first pilot to ever be...,0,1,0,0,1,1,1.0,0.0
2,1912-07-12,06:30:00,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,5 (passengers:0 crew:5),5 (passengers:0 crew:5),0,First U.S. dirigible Akron exploded just offsh...,0,5,0,5,5,5,5.0,0.0
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,1 (passengers:0 crew:1),1 (passengers:0 crew:1),0,The first fatal airplane accident in Canada oc...,0,1,0,1,1,1,1.0,0.0
4,1913-09-09,18:30:00,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),20 (passengers:? crew:?),14 (passengers:? crew:?),0,The airship flew into a thunderstorm and encou...,,,,,20,14,14.0,6.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5058,2022-07-16,22:47:00,"Eleftheroupolis, Greece",Meridian,Nis- Amman,Antonov An-12,8 (passengers:0 crew:8),8 (passengers:0 crew:8),0,The cargo plane carrying eight crew members an...,0,8,0,8,8,8,8.0,0.0
5059,2022-11-06,08:53:00,"Bukoba, Tanzania",Precision Air,Dar es-Salaam -Bukoba,ATR 42-500,43 (passengers:39 crew:39),19 (passengers:17 crew:2),0,"While on final approach to Bukoba Airport, the...",39,39,17,2,43,19,19.0,24.0
5060,2022-11-18,15:11:00,"Lima, Peru",LATAM,Lima - Juliaca,Airbus 320-271N,108 (passengers:102 crew:6),0 (passengers:0 crew:0),2,The Airbus A320 collided with a fire truck whi...,102,6,0,0,108,0,2.0,108.0
5061,2022-11-21,10:15:00,"Medellín, Colombia",AeroPaca SAS,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,8 (passengers:6 crew:2),8 (passengers:6 crew:2),0,The plane was chartered to carry a team of six...,6,2,6,2,8,8,8.0,0.0


### reorder the columns for the final format

In [62]:
df.drop(columns=["Aboard", 'Fatalities'], inplace=True)

In [63]:
df = df[['Date', "Time", "Location", "Operator", "Route", "AC_Type", "Summary", "Passengers_Aboard", "Crew_Aboard", "Aboard_Aircraft", "Passengers_Fatalities", "Crew_Fatalities", "Aboard_Fatalities", "Ground_Fatalities", "Total_Fatalites", "Survivors"]]
df

Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Summary,Passengers_Aboard,Crew_Aboard,Aboard_Aircraft,Passengers_Fatalities,Crew_Fatalities,Aboard_Fatalities,Ground_Fatalities,Total_Fatalites,Survivors
0,1908-09-17,17:18:00,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,"During a demonstration flight, a U.S. Army fly...",1,1,2,1,0,1,0,1.0,1.0
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,Eugene Lefebvre was the first pilot to ever be...,0,1,1,0,0,1,0,1.0,0.0
2,1912-07-12,06:30:00,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,First U.S. dirigible Akron exploded just offsh...,0,5,5,0,5,5,0,5.0,0.0
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,The first fatal airplane accident in Canada oc...,0,1,1,0,1,1,0,1.0,0.0
4,1913-09-09,18:30:00,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),The airship flew into a thunderstorm and encou...,,,20,,,14,0,14.0,6.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5058,2022-07-16,22:47:00,"Eleftheroupolis, Greece",Meridian,Nis- Amman,Antonov An-12,The cargo plane carrying eight crew members an...,0,8,8,0,8,8,0,8.0,0.0
5059,2022-11-06,08:53:00,"Bukoba, Tanzania",Precision Air,Dar es-Salaam -Bukoba,ATR 42-500,"While on final approach to Bukoba Airport, the...",39,39,43,17,2,19,0,19.0,24.0
5060,2022-11-18,15:11:00,"Lima, Peru",LATAM,Lima - Juliaca,Airbus 320-271N,The Airbus A320 collided with a fire truck whi...,102,6,108,0,0,0,2,2.0,108.0
5061,2022-11-21,10:15:00,"Medellín, Colombia",AeroPaca SAS,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,The plane was chartered to carry a team of six...,6,2,8,6,2,8,0,8.0,0.0


In [64]:
df['year'] = pd.DatetimeIndex(df['Date']).year
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['year'] = pd.DatetimeIndex(df['Date']).year


Unnamed: 0,Date,Time,Location,Operator,Route,AC_Type,Summary,Passengers_Aboard,Crew_Aboard,Aboard_Aircraft,Passengers_Fatalities,Crew_Fatalities,Aboard_Fatalities,Ground_Fatalities,Total_Fatalites,Survivors,year
0,1908-09-17,17:18:00,"Fort Myer, Virginia",Military - U.S. Army,Demonstration,Wright Flyer III,"During a demonstration flight, a U.S. Army fly...",1,1,2,1,0,1,0,1.0,1.0,1908
1,1909-09-07,,"Juvisy-sur-Orge, France",,Air show,Wright Byplane,Eugene Lefebvre was the first pilot to ever be...,0,1,1,0,0,1,0,1.0,0.0,1909
2,1912-07-12,06:30:00,"Atlantic City, New Jersey",Military - U.S. Navy,Test flight,Dirigible,First U.S. dirigible Akron exploded just offsh...,0,5,5,0,5,5,0,5.0,0.0,1912
3,1913-08-06,,"Victoria, British Columbia, Canada",Private,,Curtiss seaplane,The first fatal airplane accident in Canada oc...,0,1,1,0,1,1,0,1.0,0.0,1913
4,1913-09-09,18:30:00,Over the North Sea,Military - German Navy,,Zeppelin L-1 (airship),The airship flew into a thunderstorm and encou...,,,20,,,14,0,14.0,6.0,1913
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5058,2022-07-16,22:47:00,"Eleftheroupolis, Greece",Meridian,Nis- Amman,Antonov An-12,The cargo plane carrying eight crew members an...,0,8,8,0,8,8,0,8.0,0.0,2022
5059,2022-11-06,08:53:00,"Bukoba, Tanzania",Precision Air,Dar es-Salaam -Bukoba,ATR 42-500,"While on final approach to Bukoba Airport, the...",39,39,43,17,2,19,0,19.0,24.0,2022
5060,2022-11-18,15:11:00,"Lima, Peru",LATAM,Lima - Juliaca,Airbus 320-271N,The Airbus A320 collided with a fire truck whi...,102,6,108,0,0,0,2,2.0,108.0,2022
5061,2022-11-21,10:15:00,"Medellín, Colombia",AeroPaca SAS,Medellín - Pizarro,Piper PA-31-350 Navajo Chieftain,The plane was chartered to carry a team of six...,6,2,8,6,2,8,0,8.0,0.0,2022


In [65]:
text_cols = ['Time', 'Location', 'Operator', 'Route', 'AC_Type', 'Summary']
df[text_cols] = df[text_cols].fillna('Unknown')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]
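
The `SettingWithCopyWarning` messages in cells 64 and 65 most likely arise because the column selection in cell 63 returns an object that pandas treats as a possible view of the original frame, and the later assignments write into it. The assignments still take effect here, as the output shows. A minimal sketch of one common way to avoid the warning, taking an explicit `.copy()` after selecting columns, on a made-up two-row frame:

```python
import pandas as pd

# Toy frame; the "Extra" column stands in for columns that get dropped.
df = pd.DataFrame({
    "Date": pd.to_datetime(["1908-09-17", "2022-11-21"]),
    "Operator": ["Military - U.S. Army", None],
    "Extra": [1, 2],
})

# .copy() makes the selection an independent frame, so later
# assignments are unambiguous and do not trigger the warning.
subset = df[["Date", "Operator"]].copy()

subset["year"] = subset["Date"].dt.year
subset["Operator"] = subset["Operator"].fillna("Unknown")
print(subset)
```

The original `df` is left untouched, which is usually what is intended after a reorder step.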


In [66]:
percent_missing = df.isnull().sum() * 100 / len(df)
percent_missing.sort_values(ascending=False)

Crew_Fatalities          4.700770
Passengers_Fatalities    4.700770
Passengers_Aboard        4.404503
Crew_Aboard              4.365001
Total_Fatalites          0.888801
Ground_Fatalities        0.888801
Aboard_Aircraft          0.335769
Survivors                0.335769
Aboard_Fatalities        0.158009
Date                     0.000000
Time                     0.000000
Summary                  0.000000
AC_Type                  0.000000
Route                    0.000000
Operator                 0.000000
Location                 0.000000
year                     0.000000
dtype: float64

In [67]:
df.to_excel('Data/plane_crash_info_cleaned.xlsx', index=False)