1. BUSINESS UNDERSTANDING

This project is for an aviation company that is expanding into new industries. Its principal business is purchasing and operating airplanes for commercial and private enterprises, and it is in the process of purchasing a new aircraft. What the company's new aviation division does not yet understand is the potential risk associated with each aircraft it may consider buying.

The purpose of this project is to identify the aircraft that carries the lowest risk for the company, and to produce actionable insights that help the aviation division decide which aircraft to purchase.

2. DATA UNDERSTANDING

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('flight.csv')

In [3]:
df #This is to check that the code above has read the data correctly before doing anything else

Unnamed: 0.1,Unnamed: 0,acc.date,type,reg,operator,fat,location,dmg
0,0,3 Jan 2022,British Aerospace 4121 Jetstream 41,ZS-NRJ,SA Airlink,0,near Venetia Mine Airport,sub
1,1,4 Jan 2022,British Aerospace 3101 Jetstream 31,HR-AYY,LANHSA - Línea Aérea Nacional de Honduras S.A,0,Roatán-Juan Manuel Gálvez International Airpor...,sub
2,2,5 Jan 2022,Boeing 737-4H6,EP-CAP,Caspian Airlines,0,Isfahan-Shahid Beheshti Airport (IFN),sub
3,3,8 Jan 2022,Tupolev Tu-204-100C,RA-64032,"Cainiao, opb Aviastar-TU",0,Hangzhou Xiaoshan International Airport (HGH),w/o
4,4,12 Jan 2022,Beechcraft 200 Super King Air,,private,0,"Machakilha, Toledo District, Grahem Creek area",w/o
...,...,...,...,...,...,...,...,...
2495,1245,20 Dec 2018,Cessna 560 Citation V,N188CW,Chen Aircrafts LLC,4,"2 km NE of Atlanta-Fulton County Airport, GA (...",w/o
2496,1246,22 Dec 2018,PZL-Mielec M28 Skytruck,GNB-96107,Guardia Nacional Bolivariana de Venezuela - GNBV,0,Kamarata Airport (KTV),sub
2497,1247,24 Dec 2018,Antonov An-26B,9T-TAB,Air Force of the Democratic Republic of the Congo,0,Beni Airport (BNC),w/o
2498,1248,31 Dec 2018,Boeing 757-2B7 (WL),N938UW,American Airlines,0,"Charlotte-Douglas International Airport, NC (C...",sub


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  2500 non-null   int64 
 1   acc.date    2500 non-null   object
 2   type        2500 non-null   object
 3   reg         2408 non-null   object
 4   operator    2486 non-null   object
 5   fat         2488 non-null   object
 6   location    2500 non-null   object
 7   dmg         2500 non-null   object
dtypes: int64(1), object(7)
memory usage: 156.4+ KB


In [5]:
df.describe()  #Descriptive statistics, to get a sneak peek at a few characteristics of the data


Unnamed: 0.1,Unnamed: 0
count,2500.0
mean,624.5
std,360.915993
min,0.0
25%,312.0
50%,624.5
75%,937.0
max,1249.0


In [6]:
df.isna().sum()
#This gives the number of missing values in each column of the dataframe

Unnamed: 0     0
acc.date       0
type           0
reg           92
operator      14
fat           12
location       0
dmg            0
dtype: int64

In [7]:
df.isna().any() #This gives a different view of 'df.isna().sum()': it shows only whether a column has any missing values,
                #not how many there are


Unnamed: 0    False
acc.date      False
type          False
reg            True
operator       True
fat            True
location      False
dmg           False
dtype: bool

The data comes from a CSV file published on Kaggle and sourced from the National Transportation Safety Board (NTSB) of the United States. It covers aircraft accidents that occurred between 2018 and 2022. The file has 2,500 rows, and its columns range from the accident date and location to the type of aircraft involved. For this analysis, the aircraft type, accident date and damage columns are the most relevant for reaching a conclusion that lets the company make an informed decision.
The data is suitable for the business problem because it spans five years of accidents (2018 to 2022) across many aircraft types, which gives the company's aviation division a reasonable sample on which to base its decision.
The National Transportation Safety Board is an independent body tasked with investigating major transportation accidents, with the aim of making transportation safer. This makes the data more credible, as it comes from a reputable source.

3. DATA PREPARATION

The purpose of this section is to prepare the data for analysis. This mainly involves cleaning the data until it is fit for analysis.

In [8]:
df.head()

Unnamed: 0.1,Unnamed: 0,acc.date,type,reg,operator,fat,location,dmg
0,0,3 Jan 2022,British Aerospace 4121 Jetstream 41,ZS-NRJ,SA Airlink,0,near Venetia Mine Airport,sub
1,1,4 Jan 2022,British Aerospace 3101 Jetstream 31,HR-AYY,LANHSA - Línea Aérea Nacional de Honduras S.A,0,Roatán-Juan Manuel Gálvez International Airpor...,sub
2,2,5 Jan 2022,Boeing 737-4H6,EP-CAP,Caspian Airlines,0,Isfahan-Shahid Beheshti Airport (IFN),sub
3,3,8 Jan 2022,Tupolev Tu-204-100C,RA-64032,"Cainiao, opb Aviastar-TU",0,Hangzhou Xiaoshan International Airport (HGH),w/o
4,4,12 Jan 2022,Beechcraft 200 Super King Air,,private,0,"Machakilha, Toledo District, Grahem Creek area",w/o


In [9]:
df['acc.date'] = df['acc.date'].str.strip()
df['acc.date']
#This strips any extra spaces in the date column (assigning the result back so the change is kept) before the dates are converted to the right format

0           3 Jan 2022
1           4 Jan 2022
2           5 Jan 2022
3           8 Jan 2022
4          12 Jan 2022
             ...      
2495       20 Dec 2018
2496       22 Dec 2018
2497       24 Dec 2018
2498       31 Dec 2018
2499    unk. date 2018
Name: acc.date, Length: 2500, dtype: object

In [10]:
df['acc.date'].unique() #This gives a sneak peek into the data to see whether any values are in an incorrect format

array(['3 Jan 2022', '4 Jan 2022', '5 Jan 2022', '8 Jan 2022',
       '12 Jan 2022', '16 Jan 2022', '19 Jan 2022', '22 Jan 2022',
       '27 Jan 2022', '31 Jan 2022', '2 Feb 2022', '5 Feb 2022',
       '7 Feb 2022', '8 Feb 2022', '11 Feb 2022', '14 Feb 2022',
       '15 Feb 2022', '16 Feb 2022', '18 Feb 2022', '21 Feb 2022',
       '23 Feb 2022', '24 Feb 2022', '26 Feb 2022', '27 Feb 2022',
       '28 Feb 2022', '1 Mar 2022', '2 Mar 2022', '3 Mar 2022',
       '5 Mar 2022', '6 Mar 2022', '7 Mar 2022', '8 Mar 2022',
       '9 Mar 2022', '12 Mar 2022', '17 Mar 2022', '21 Mar 2022',
       '26 Mar 2022', '30 Mar 2022', '1 Apr 2022', '2 Apr 2022',
       '7 Apr 2022', '8 Apr 2022', '11 Apr 2022', '13 Apr 2022',
       '14 Apr 2022', '17 Apr 2022', '22 Apr 2022', '26 Apr 2022',
       '30 Apr 2022', '1 May 2022', '3 May 2022', '6 May 2022',
       '10 May 2022', '11 May 2022', '12 May 2022', '20 May 2022',
       '21 May 2022', '24 May 2022', '25 May 2022', '27 May 2022',
       '28 May 202

In [11]:
df['acc.date'] = pd.to_datetime(df['acc.date'], errors='coerce', dayfirst=True)
#This converts the date column into a proper datetime format; values that cannot be parsed, such as 'unk. date 2018', become NaT

In [12]:
df['acc.date'] #This confirms that the code above has worked and that the date column has been converted correctly.

0      2022-01-03
1      2022-01-04
2      2022-01-05
3      2022-01-08
4      2022-01-12
          ...    
2495   2018-12-20
2496   2018-12-22
2497   2018-12-24
2498   2018-12-31
2499          NaT
Name: acc.date, Length: 2500, dtype: datetime64[ns]
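
To make the effect of errors='coerce' easier to see, here is a minimal sketch on a hand-made Series. The three sample values are copied from the column preview above; the names sample and parsed are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy sample: two parseable dates and the one unparseable value seen in the column
sample = pd.Series(['3 Jan 2022', '31 Dec 2018', 'unk. date 2018'])

# Same call as in the notebook: anything that cannot be parsed becomes NaT
parsed = pd.to_datetime(sample, errors='coerce', dayfirst=True)

print(parsed)
print('Unparseable dates:', parsed.isna().sum())
```

Counting the NaT values right after the conversion is a quick way to see how many rows lost their date, which matters later when rows without an accident date are dropped.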

In [13]:
df

Unnamed: 0.1,Unnamed: 0,acc.date,type,reg,operator,fat,location,dmg
0,0,2022-01-03,British Aerospace 4121 Jetstream 41,ZS-NRJ,SA Airlink,0,near Venetia Mine Airport,sub
1,1,2022-01-04,British Aerospace 3101 Jetstream 31,HR-AYY,LANHSA - Línea Aérea Nacional de Honduras S.A,0,Roatán-Juan Manuel Gálvez International Airpor...,sub
2,2,2022-01-05,Boeing 737-4H6,EP-CAP,Caspian Airlines,0,Isfahan-Shahid Beheshti Airport (IFN),sub
3,3,2022-01-08,Tupolev Tu-204-100C,RA-64032,"Cainiao, opb Aviastar-TU",0,Hangzhou Xiaoshan International Airport (HGH),w/o
4,4,2022-01-12,Beechcraft 200 Super King Air,,private,0,"Machakilha, Toledo District, Grahem Creek area",w/o
...,...,...,...,...,...,...,...,...
2495,1245,2018-12-20,Cessna 560 Citation V,N188CW,Chen Aircrafts LLC,4,"2 km NE of Atlanta-Fulton County Airport, GA (...",w/o
2496,1246,2018-12-22,PZL-Mielec M28 Skytruck,GNB-96107,Guardia Nacional Bolivariana de Venezuela - GNBV,0,Kamarata Airport (KTV),sub
2497,1247,2018-12-24,Antonov An-26B,9T-TAB,Air Force of the Democratic Republic of the Congo,0,Beni Airport (BNC),w/o
2498,1248,2018-12-31,Boeing 757-2B7 (WL),N938UW,American Airlines,0,"Charlotte-Douglas International Airport, NC (C...",sub


In [14]:
df['type'].unique()

array(['British Aerospace 4121 Jetstream 41',
       'British Aerospace 3101 Jetstream 31', 'Boeing 737-4H6',
       'Tupolev Tu-204-100C', 'Beechcraft 200 Super King Air',
       'Airbus A320-214 (WL)', 'Cessna 208B Grand Caravan EX',
       'Airbus A320-232', 'Bombardier CL-600-2B16 Challenger 604',
       'Beechcraft B300 King Air 350', 'Hawker 1000',
       'Cessna 208B Grand Caravan', 'Embraer ERJ-190-100LR',
       'Antonov An-26', 'Cessna 501 Citation I/SP', 'Antonov An-2R',
       'Let L-410UVP-E3', 'ATR 42-500', 'Britten-Norman BN-2A-9 Islander',
       'Swearingen SA226-AT Merlin IV', 'Embraer EMB-500 Phenom 100E',
       'de Havilland Canada DHC-3T Texas Turbine Otter',
       'Raytheon Hawker 800XP', 'Antonov An-2', 'Antonov An-22A',
       'Antonov An-26-100', 'Antonov An-74T', 'Cessna 208B Supervan 900',
       'Antonov An-124-100', 'Antonov An-225',
       'Embraer ERJ 170-200 LR (ERJ-175LR)', 'Shaanxi Y-8Q',
       'Embraer EMB-500 Phenom 100', 'Boeing 737-8AS (WL)',
  

In [15]:
df['type'].str.lower() #Preview of the aircraft types in lower case, useful for spotting names that differ only by case. The result is not assigned back, so df['type'] keeps its original casing

0       british aerospace 4121 jetstream 41
1       british aerospace 3101 jetstream 31
2                            boeing 737-4h6
3                       tupolev tu-204-100c
4             beechcraft 200 super king air
                       ...                 
2495                  cessna 560 citation v
2496                pzl-mielec m28 skytruck
2497                         antonov an-26b
2498                    boeing 757-2b7 (wl)
2499                 rockwell sabreliner 80
Name: type, Length: 2500, dtype: object

In [16]:
df['type'] = df['type'].str.strip() #This removes leading and trailing spaces in the data, if any; the result is assigned back so the change is kept
df['type']

0       British Aerospace 4121 Jetstream 41
1       British Aerospace 3101 Jetstream 31
2                            Boeing 737-4H6
3                       Tupolev Tu-204-100C
4             Beechcraft 200 Super King Air
                       ...                 
2495                  Cessna 560 Citation V
2496                PZL-Mielec M28 Skytruck
2497                         Antonov An-26B
2498                    Boeing 757-2B7 (WL)
2499                 Rockwell Sabreliner 80
Name: type, Length: 2500, dtype: object

Next we turn to the 'fat' column, which represents fatalities, and ensure that its data is in integer or float format.

In [17]:
df['fat'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 2500 entries, 0 to 2499
Series name: fat
Non-Null Count  Dtype 
--------------  ----- 
2488 non-null   object
dtypes: object(1)
memory usage: 19.7+ KB


In [18]:
df['fat'].unique() #This is meant to find the unique values in the 'fat' column.

array(['0', '2', nan, '5', '14', '11', '132', '1', '22', '6', '4', '8',
       '0+2', '10', '3', '0+1', '19', '5+1', '62', '7', '12', '50+3',
       '28', '16', '9', '1+1', '18', '176', '97+1', '21', '26', '15',
       '157', '1+2', '13', '41', '1+5', '5+14', '21+6', '38', '71', '66',
       '39', '51', '257', '112', '20', '189'], dtype=object)

In [19]:
df['fat'] = df['fat'].replace({'0+2': '2', '0+1': '1', '5+1': '6', '50+3': '53', '1+1': '2',
                               '97+1': '98', '1+2': '3', '1+5': '6', '5+14': '19', '21+6': '27'})

#This replaces the values recorded in 'a+b' form with their totals so that every entry in the 'fat' column can be converted to float/int.
#The result is assigned back rather than using inplace=True on the column, which may not update df in newer pandas versions.
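
As an alternative to listing every 'a+b' value by hand, the totals could be computed programmatically. The sketch below is only an illustration on a toy Series: the helper name sum_plus_parts is hypothetical, and the sample values are taken from the unique values shown above.

```python
import pandas as pd

def sum_plus_parts(value):
    """Return the total of an 'a+b' string as a float; missing values stay missing."""
    if pd.isna(value):
        return float('nan')
    return float(sum(int(part) for part in str(value).split('+')))

# Toy sample using values that appear in df['fat'].unique()
sample = pd.Series(['0', '0+2', '50+3', '5+14', None])
totals = sample.map(sum_plus_parts)

print(totals.tolist())
```

The advantage of this approach is that a new combination such as '3+4' would be handled without editing a mapping, at the cost of assuming every non-missing entry is made up of whole numbers joined by '+'.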

In [20]:
df['fat'].unique()

array(['0', '2', nan, '5', '14', '11', '132', '1', '22', '6', '4', '8',
       '10', '3', '19', '62', '7', '12', '53', '28', '16', '9', '18',
       '176', '98', '21', '26', '15', '157', '13', '41', '27', '38', '71',
       '66', '39', '51', '257', '112', '20', '189'], dtype=object)

In [21]:
df['fat'] = df['fat'].astype(float) #This converts the data in the 'fat' column into float format.

In [22]:
df['fat'].info() #This is to see if the code above has worked correctly in converting the 'fat' column into float format.

<class 'pandas.core.series.Series'>
RangeIndex: 2500 entries, 0 to 2499
Series name: fat
Non-Null Count  Dtype  
--------------  -----  
2488 non-null   float64
dtypes: float64(1)
memory usage: 19.7 KB


In [23]:
df['fat'].unique() #This shows all the unique values in the 'fat' column

array([  0.,   2.,  nan,   5.,  14.,  11., 132.,   1.,  22.,   6.,   4.,
         8.,  10.,   3.,  19.,  62.,   7.,  12.,  53.,  28.,  16.,   9.,
        18., 176.,  98.,  21.,  26.,  15., 157.,  13.,  41.,  27.,  38.,
        71.,  66.,  39.,  51., 257., 112.,  20., 189.])

In [24]:
df['fat'].isna().sum() #This shows the number of missing values in the 'fat' column

12

In [25]:
df

Unnamed: 0.1,Unnamed: 0,acc.date,type,reg,operator,fat,location,dmg
0,0,2022-01-03,British Aerospace 4121 Jetstream 41,ZS-NRJ,SA Airlink,0.0,near Venetia Mine Airport,sub
1,1,2022-01-04,British Aerospace 3101 Jetstream 31,HR-AYY,LANHSA - Línea Aérea Nacional de Honduras S.A,0.0,Roatán-Juan Manuel Gálvez International Airpor...,sub
2,2,2022-01-05,Boeing 737-4H6,EP-CAP,Caspian Airlines,0.0,Isfahan-Shahid Beheshti Airport (IFN),sub
3,3,2022-01-08,Tupolev Tu-204-100C,RA-64032,"Cainiao, opb Aviastar-TU",0.0,Hangzhou Xiaoshan International Airport (HGH),w/o
4,4,2022-01-12,Beechcraft 200 Super King Air,,private,0.0,"Machakilha, Toledo District, Grahem Creek area",w/o
...,...,...,...,...,...,...,...,...
2495,1245,2018-12-20,Cessna 560 Citation V,N188CW,Chen Aircrafts LLC,4.0,"2 km NE of Atlanta-Fulton County Airport, GA (...",w/o
2496,1246,2018-12-22,PZL-Mielec M28 Skytruck,GNB-96107,Guardia Nacional Bolivariana de Venezuela - GNBV,0.0,Kamarata Airport (KTV),sub
2497,1247,2018-12-24,Antonov An-26B,9T-TAB,Air Force of the Democratic Republic of the Congo,0.0,Beni Airport (BNC),w/o
2498,1248,2018-12-31,Boeing 757-2B7 (WL),N938UW,American Airlines,0.0,"Charlotte-Douglas International Airport, NC (C...",sub


In [26]:
df['dmg'].unique() #This brings out the unique values in the damage column. The data in this column looks consistent,
                    #so no further cleaning is needed here, at least as far as formatting is concerned.

array(['sub', 'w/o', 'non', 'min', 'unk', 'mis'], dtype=object)

In [27]:
df

Unnamed: 0.1,Unnamed: 0,acc.date,type,reg,operator,fat,location,dmg
0,0,2022-01-03,British Aerospace 4121 Jetstream 41,ZS-NRJ,SA Airlink,0.0,near Venetia Mine Airport,sub
1,1,2022-01-04,British Aerospace 3101 Jetstream 31,HR-AYY,LANHSA - Línea Aérea Nacional de Honduras S.A,0.0,Roatán-Juan Manuel Gálvez International Airpor...,sub
2,2,2022-01-05,Boeing 737-4H6,EP-CAP,Caspian Airlines,0.0,Isfahan-Shahid Beheshti Airport (IFN),sub
3,3,2022-01-08,Tupolev Tu-204-100C,RA-64032,"Cainiao, opb Aviastar-TU",0.0,Hangzhou Xiaoshan International Airport (HGH),w/o
4,4,2022-01-12,Beechcraft 200 Super King Air,,private,0.0,"Machakilha, Toledo District, Grahem Creek area",w/o
...,...,...,...,...,...,...,...,...
2495,1245,2018-12-20,Cessna 560 Citation V,N188CW,Chen Aircrafts LLC,4.0,"2 km NE of Atlanta-Fulton County Airport, GA (...",w/o
2496,1246,2018-12-22,PZL-Mielec M28 Skytruck,GNB-96107,Guardia Nacional Bolivariana de Venezuela - GNBV,0.0,Kamarata Airport (KTV),sub
2497,1247,2018-12-24,Antonov An-26B,9T-TAB,Air Force of the Democratic Republic of the Congo,0.0,Beni Airport (BNC),w/o
2498,1248,2018-12-31,Boeing 757-2B7 (WL),N938UW,American Airlines,0.0,"Charlotte-Douglas International Airport, NC (C...",sub


The 'Unnamed: 0' column runs from 0 to 1249, then restarts at 0 and runs to 1249 again. This indicates that the data is repeated: rows 1250 to 2499 duplicate rows 0 to 1249. To keep only unique records, we drop the rows from index 1250 to the end. This is done below.

In [28]:
df = df.drop(df.loc[1250:2499].index)
df.info()

#The repeated rows have been dropped, leaving 1,250 unique rows to work with.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Unnamed: 0  1250 non-null   int64         
 1   acc.date    1247 non-null   datetime64[ns]
 2   type        1250 non-null   object        
 3   reg         1204 non-null   object        
 4   operator    1243 non-null   object        
 5   fat         1244 non-null   float64       
 6   location    1250 non-null   object        
 7   dmg         1250 non-null   object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(5)
memory usage: 78.3+ KB
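
Dropping by position relies on the repeated rows sitting exactly at index 1250 onward. A complementary check, sketched here on a toy frame rather than on the real data, is to let pandas flag the repeats with duplicated() and remove them with drop_duplicates(). The frame and the names base, toy and deduped are illustrative assumptions.

```python
import pandas as pd

# Toy frame standing in for the accident data: three records, each repeated once
base = pd.DataFrame({
    'acc.date': ['3 Jan 2022', '4 Jan 2022', '5 Jan 2022'],
    'type': ['British Aerospace 4121 Jetstream 41',
             'British Aerospace 3101 Jetstream 31',
             'Boeing 737-4H6'],
    'fat': ['0', '0', '0'],
})
toy = pd.concat([base, base], ignore_index=True)

# duplicated() flags every row that repeats an earlier row
print('Repeated rows:', toy.duplicated().sum())

# drop_duplicates() keeps the first copy of each record
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(toy), '->', len(deduped))
```

On the real dataframe the check would need to ignore the index-like columns, for example by passing the data columns through the subset parameter of duplicated().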


The 'Unnamed: 0' column is also not relevant to the analysis, as it only contains row indices, which pandas already provides automatically. This column is dropped below.

In [29]:
df = df.drop(columns='Unnamed: 0')

In [30]:
df

Unnamed: 0,acc.date,type,reg,operator,fat,location,dmg
0,2022-01-03,British Aerospace 4121 Jetstream 41,ZS-NRJ,SA Airlink,0.0,near Venetia Mine Airport,sub
1,2022-01-04,British Aerospace 3101 Jetstream 31,HR-AYY,LANHSA - Línea Aérea Nacional de Honduras S.A,0.0,Roatán-Juan Manuel Gálvez International Airpor...,sub
2,2022-01-05,Boeing 737-4H6,EP-CAP,Caspian Airlines,0.0,Isfahan-Shahid Beheshti Airport (IFN),sub
3,2022-01-08,Tupolev Tu-204-100C,RA-64032,"Cainiao, opb Aviastar-TU",0.0,Hangzhou Xiaoshan International Airport (HGH),w/o
4,2022-01-12,Beechcraft 200 Super King Air,,private,0.0,"Machakilha, Toledo District, Grahem Creek area",w/o
...,...,...,...,...,...,...,...
1245,2018-12-20,Cessna 560 Citation V,N188CW,Chen Aircrafts LLC,4.0,"2 km NE of Atlanta-Fulton County Airport, GA (...",w/o
1246,2018-12-22,PZL-Mielec M28 Skytruck,GNB-96107,Guardia Nacional Bolivariana de Venezuela - GNBV,0.0,Kamarata Airport (KTV),sub
1247,2018-12-24,Antonov An-26B,9T-TAB,Air Force of the Democratic Republic of the Congo,0.0,Beni Airport (BNC),w/o
1248,2018-12-31,Boeing 757-2B7 (WL),N938UW,American Airlines,0.0,"Charlotte-Douglas International Airport, NC (C...",sub


In [31]:
df.isna().any()

acc.date     True
type        False
reg          True
operator     True
fat          True
location    False
dmg         False
dtype: bool

At this point, four columns have missing values: acc.date, reg (registration number), operator and fat (fatalities). Next we count the missing values in each column and decide which rows to drop before finalizing the dataset.

In [32]:
df.isna().sum()

acc.date     3
type         0
reg         46
operator     7
fat          6
location     0
dmg          0
dtype: int64

Of the columns with missing values, only acc.date and fat (fatalities) are relevant to our analysis, so rows with gaps in reg or operator do not need to be dropped. Below, I drop the rows that are missing an accident date or a fatality count.

In [39]:
df = df.dropna(subset=['acc.date'])

In [40]:
df = df.dropna(subset=['fat'])

In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1241 entries, 0 to 1248
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   acc.date  1241 non-null   datetime64[ns]
 1   type      1241 non-null   object        
 2   reg       1198 non-null   object        
 3   operator  1234 non-null   object        
 4   fat       1241 non-null   float64       
 5   location  1241 non-null   object        
 6   dmg       1241 non-null   object        
dtypes: datetime64[ns](1), float64(1), object(5)
memory usage: 77.6+ KB


In [42]:
#Exporting the cleaned data to a new csv file. index=False keeps the row index from being written as an extra unnamed column.
df.to_csv('cleaned_flight_data.csv', index=False)
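
One detail worth checking when exporting: by default to_csv also writes the row index, which shows up as an 'Unnamed: 0' column when the file is read back. The sketch below uses a toy frame and an in-memory buffer in place of a file on disk (both are assumptions for illustration) to show a round trip that avoids this and restores the date type on reload.

```python
import io
import pandas as pd

# Toy frame with the same kinds of columns as the cleaned data
toy = pd.DataFrame({
    'acc.date': pd.to_datetime(['2022-01-03', '2018-12-31']),
    'type': ['Boeing 737-4H6', 'Antonov An-26B'],
    'fat': [0.0, 0.0],
})

buffer = io.StringIO()            # in-memory stand-in for a csv file
toy.to_csv(buffer, index=False)   # index=False: do not write the row index
buffer.seek(0)

# parse_dates restores the datetime type, which a csv file cannot store
reloaded = pd.read_csv(buffer, parse_dates=['acc.date'])
print(reloaded.columns.tolist())
print(reloaded.dtypes)
```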

4. DATA ANALYSIS

The data analysis was mainly done in Tableau before moving on to visualization. The perspectives taken and the resulting findings are set out below.

4.1 ANALYSIS AND FINDINGS

1. I looked at the number of accidents per year across the whole dataset. 2021 had the fewest accidents and 2019 had the most. This makes 2021 a good reference year for management's decision, as flights in 2021 can be considered the safest in the dataset.
2. I also looked at the aircraft types with the fewest accidents in the dataset and narrowed them down to the bottom 5, as the company will most likely have to choose just one aircraft from the suggestions in this project. The findings are presented in the accompanying visualization.
3. I also considered the aircraft that had the fewest fatalities even after being involved in accidents, which reflects an aircraft's ability to come through challenging occurrences while preserving life. This was also narrowed down to the 5 aircraft with the fewest fatalities.
4. Finally, I looked at the aircraft with the most accidents in the period covered. This shows stakeholders the spread between the fewest and the most accidents, and therefore how much management's choice matters in reaching the end goal of a low-risk aircraft.


These findings support the recommendations below, as they help management make the best decision in choosing the lowest-risk aircraft to purchase. Low risk here is viewed from the perspective of the fewest accidents and the fewest fatalities associated with an aircraft.
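
The analysis itself was done in Tableau, so the notebook contains no code for it. As a rough pandas equivalent of the four perspectives above, the sketch below uses made-up rows (not figures from the dataset) with the same column names as the cleaned data. The aircraft names 'Type A' to 'Type C' and all variable names are placeholders.

```python
import pandas as pd

# Made-up rows for illustration only; these are not results from the dataset
toy = pd.DataFrame({
    'acc.date': pd.to_datetime(['2019-03-01', '2019-07-15', '2019-11-02',
                                '2020-05-20', '2020-08-09', '2021-09-30']),
    'type': ['Type A', 'Type B', 'Type A', 'Type C', 'Type A', 'Type B'],
    'fat': [0.0, 2.0, 0.0, 5.0, 1.0, 0.0],
})

# Finding 1: accidents per calendar year
per_year = toy.groupby(toy['acc.date'].dt.year).size()

# Findings 2 and 4: accidents per aircraft type, fewest first
per_type = toy['type'].value_counts().sort_values()

# Finding 3: total fatalities per aircraft type, fewest first
fat_per_type = toy.groupby('type')['fat'].sum().sort_values()

print(per_year)
print(per_type.head(5))      # types with the fewest accidents
print(per_type.tail(5))      # types with the most accidents
print(fat_per_type.head(5))  # types with the fewest fatalities
```

Running the same groupings on the cleaned dataframe would give a way to cross-check the Tableau figures.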

4.2 RECOMMENDATIONS

1. I recommend that management use 2021 as the basis for the decision, as it had the fewest accidents, meaning that the aircraft flown in that year were the safest.
2. I recommend that management use discretion in selecting among the aircraft with the fewest accidents. This is because the 5 lowest-ranked options found in the project each had 1 accident in the period covered by the dataset. Management needs to consider other factors such as cost, availability of spare parts, maintenance and running costs, and availability of maintenance expertise in making this decision.
3. An aircraft's ability to preserve life when an accident does occur is essential to this decision. All other factors held constant, management should therefore put emphasis on the aircraft that have had the fewest fatalities.