# Aviation Safety Risk Analysis  

## Business Understanding

### Stakeholder
The primary stakeholder for this project is the Head of the new Aviation Division, who is responsible for making aircraft purchasing and operational decisions.

### Business Problem
As the company expands into the aviation industry, it faces significant safety and financial risks associated with aircraft operations. The organization lacks historical insight into which aircraft types and manufacturers have lower accident risk, making it difficult to make informed purchasing decisions.

### Project Goal
The goal of this project is to analyze historical aviation accident data to identify aircraft with lower operational risk. This analysis will support data-driven decisions about which aircraft the company should prioritize when entering the aviation market.

### Key Business Questions
- Which aircraft makes and models have historically been involved in fewer and less severe accidents?
- Are there trends in accident frequency or severity over time?
- Which aircraft characteristics are associated with lower overall risk?

### Success Criteria
This project will be considered successful if it produces clear, data-backed insights that lead to three actionable recommendations for selecting lower-risk aircraft.

## Data Understanding & Initial Exploration

### Business Context
Our company is evaluating entry into the aviation industry.  
To minimize operational risk, we analyze historical aviation accident data to understand trends, data quality, and relevant risk indicators.

### Objective of This Step
- Load the dataset successfully
- Understand the dataset structure
- Identify relevant variables
- Understand the time range of the data
- Detect missing or inconsistent data
- Inform data cleaning and analysis decisions


**IMPORT LIBRARIES**

In [31]:
import os  # file-system checks (used below to list the data folder)

import matplotlib.pyplot as plt  # plotting
import numpy as np  # numerical handling
import pandas as pd  # data manipulation
import seaborn as sns  # statistical visualizations

# Show every column when displaying DataFrames
pd.set_option("display.max_columns", None)

In [32]:
os.listdir("../data")

['AviationData.csv']

**LOAD DATA**

In [33]:
data_path = "../data/AviationData.csv"
df = pd.read_csv(data_path, encoding="latin1")



In [34]:
df.shape # This shows how many rows and columns we have in our dataset

(88889, 31)

**Preview the Data**

In [35]:
df.head() # shows the first five rows

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
0,20001218X45444,Accident,SEA87LA080,1948-10-24,"MOOSE CREEK, ID",United States,,,,,Fatal(2),Destroyed,,NC6404,Stinson,108-3,No,1.0,Reciprocating,,,Personal,,2.0,0.0,0.0,0.0,UNK,Cruise,Probable Cause,
1,20001218X45447,Accident,LAX94LA336,1962-07-19,"BRIDGEPORT, CA",United States,,,,,Fatal(4),Destroyed,,N5069P,Piper,PA24-180,No,1.0,Reciprocating,,,Personal,,4.0,0.0,0.0,0.0,UNK,Unknown,Probable Cause,19-09-1996
2,20061025X01555,Accident,NYC07LA005,1974-08-30,"Saltville, VA",United States,36.922223,-81.878056,,,Fatal(3),Destroyed,,N5142R,Cessna,172M,No,1.0,Reciprocating,,,Personal,,3.0,,,,IMC,Cruise,Probable Cause,26-02-2007
3,20001218X45448,Accident,LAX96LA321,1977-06-19,"EUREKA, CA",United States,,,,,Fatal(2),Destroyed,,N1168J,Rockwell,112,No,1.0,Reciprocating,,,Personal,,2.0,0.0,0.0,0.0,IMC,Cruise,Probable Cause,12-09-2000
4,20041105X01764,Accident,CHI79FA064,1979-08-02,"Canton, OH",United States,,,,,Fatal(1),Destroyed,,N15NY,Cessna,501,No,,,,,Personal,,1.0,2.0,,0.0,VMC,Approach,Probable Cause,16-04-1980


In [36]:
df.tail() # shows the last five rows

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
88884,20221227106491,Accident,ERA23LA093,2022-12-26,"Annapolis, MD",United States,,,,,Minor,,,N1867H,PIPER,PA-28-151,No,,,91.0,,Personal,,0.0,1.0,0.0,0.0,,,,29-12-2022
88885,20221227106494,Accident,ERA23LA095,2022-12-26,"Hampton, NH",United States,,,,,,,,N2895Z,BELLANCA,7ECA,No,,,,,,,0.0,0.0,0.0,0.0,,,,
88886,20221227106497,Accident,WPR23LA075,2022-12-26,"Payson, AZ",United States,341525N,1112021W,PAN,PAYSON,Non-Fatal,Substantial,Airplane,N749PJ,AMERICAN CHAMPION AIRCRAFT,8GCBC,No,1.0,,91.0,,Personal,,0.0,0.0,0.0,1.0,VMC,,,27-12-2022
88887,20221227106498,Accident,WPR23LA076,2022-12-26,"Morgan, UT",United States,,,,,,,,N210CU,CESSNA,210N,No,,,91.0,,Personal,MC CESSNA 210N LLC,0.0,0.0,0.0,0.0,,,,
88888,20221230106513,Accident,ERA23LA097,2022-12-29,"Athens, GA",United States,,,,,Minor,,,N9026P,PIPER,PA-24-260,No,,,91.0,,Personal,,0.0,1.0,0.0,1.0,,,,30-12-2022


**Understand Column Structure**

In [37]:
df.info()  # look for column names, data types, and missing values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88889 entries, 0 to 88888
Data columns (total 31 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Event.Id                88889 non-null  object 
 1   Investigation.Type      88889 non-null  object 
 2   Accident.Number         88889 non-null  object 
 3   Event.Date              88889 non-null  object 
 4   Location                88837 non-null  object 
 5   Country                 88663 non-null  object 
 6   Latitude                34382 non-null  object 
 7   Longitude               34373 non-null  object 
 8   Airport.Code            50132 non-null  object 
 9   Airport.Name            52704 non-null  object 
 10  Injury.Severity         87889 non-null  object 
 11  Aircraft.damage         85695 non-null  object 
 12  Aircraft.Category       32287 non-null  object 
 13  Registration.Number     87507 non-null  object 
 14  Make                    88826 non-null

**Summary Statistics**

In [38]:
df.describe(include="all")

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date
count,88889,88889,88889,88889,88837,88663,34382,34373,50132,52704,87889,85695,32287,87507,88826,88797.0,88787,82805.0,81793,32023.0,12582,82697,16648,77488.0,76379.0,76956.0,82977.0,84397,61724,82505,75118
unique,87951,2,88863,14782,27758,219,25592,27156,10374,24870,109,4,15,79104,8237,12318.0,2,,12,31.0,3,26,13590,,,,,4,12,17074,2924
top,20001212X19172,Accident,CEN22LA149,1984-06-30,"ANCHORAGE, AK",United States,332739N,0112457W,NONE,Private,Non-Fatal,Substantial,Airplane,NONE,Cessna,152.0,No,,Reciprocating,91.0,NSCH,Personal,Pilot,,,,,VMC,Landing,Probable Cause,25-09-2020
freq,3,85015,2,25,434,82248,19,24,1488,240,67357,64148,27617,344,22227,2367.0,80312,,69530,18221.0,4474,49448,258,,,,,77303,15428,61754,17019
mean,,,,,,,,,,,,,,,,,,1.146585,,,,,,0.647855,0.279881,0.357061,5.32544,,,,
std,,,,,,,,,,,,,,,,,,0.44651,,,,,,5.48596,1.544084,2.235625,27.913634,,,,
min,,,,,,,,,,,,,,,,,,0.0,,,,,,0.0,0.0,0.0,0.0,,,,
25%,,,,,,,,,,,,,,,,,,1.0,,,,,,0.0,0.0,0.0,0.0,,,,
50%,,,,,,,,,,,,,,,,,,1.0,,,,,,0.0,0.0,0.0,1.0,,,,
75%,,,,,,,,,,,,,,,,,,1.0,,,,,,0.0,0.0,0.0,2.0,,,,


**Check Missing Values (Very Important)**

In [39]:
df.isnull().sum().sort_values(ascending=False).head(15)

Schedule                  76307
Air.carrier               72241
FAR.Description           56866
Aircraft.Category         56602
Longitude                 54516
Latitude                  54507
Airport.Code              38757
Airport.Name              36185
Broad.phase.of.flight     27165
Publication.Date          13771
Total.Serious.Injuries    12510
Total.Minor.Injuries      11933
Total.Fatal.Injuries      11401
Engine.Type                7096
Report.Status              6384
dtype: int64
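
Raw counts are easier to judge as a share of all rows. The sketch below is one possible way to report both; the helper name `missing_summary` and the tiny `sample` frame are hypothetical and use made-up values, not rows from `AviationData.csv`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame(
        {
            "missing": counts,
            "pct_missing": (counts / len(frame) * 100).round(1),
        }
    )
    return summary.sort_values("missing", ascending=False)


# Illustrative data only (hypothetical values)
sample = pd.DataFrame(
    {
        "Make": ["Cessna", "PIPER", None, "Cessna"],
        "Schedule": [None, None, None, "NSCH"],
        "Total.Fatal.Injuries": [2.0, np.nan, 0.0, 1.0],
    }
)

summary = missing_summary(sample)
print(summary)
```

On the real data, `missing_summary(df).head(15)` would list the same columns as the cell above, with a percentage next to each count.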

## Initial Observations

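- The dataset contains 88,889 aviation event records in 31 columns, spanning multiple years.
- Several columns are mostly empty: `Schedule` (76,307 missing), `Air.carrier` (72,241), `FAR.Description` (56,866), and `Aircraft.Category` (56,602) are each missing in more than 60% of rows.
- Injury-related fields (`Total.Fatal.Injuries`, `Total.Serious.Injuries`, `Total.Minor.Injuries`) and aircraft fields (`Make`, `Model`, `Engine.Type`) appear relevant for risk analysis. Each injury count is missing in roughly 13 to 14% of rows.
- Some categorical fields will require standardization during cleaning: `Make` appears in mixed case (`Cessna` vs. `CESSNA`), and `Injury.Severity` embeds counts in labels such as `Fatal(2)`.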
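
As a rough illustration of the standardization point, the sketch below collapses case and whitespace variants of a manufacturer name. The `sample` frame and the `Make_clean` column name are hypothetical, and upper-casing is only one possible convention; the actual cleaning step may choose differently.

```python
import pandas as pd

# Illustrative data only (hypothetical values)
sample = pd.DataFrame({"Make": ["Cessna", "CESSNA", " Piper ", "PIPER", None]})

# Trim stray whitespace and upper-case so spelling variants collapse into one label;
# missing values stay missing
sample["Make_clean"] = sample["Make"].str.strip().str.upper()

make_counts = sample["Make_clean"].value_counts()
print(make_counts)
```

Without this step, counts per manufacturer would be split across variants such as `Cessna` and `CESSNA`, which would understate each manufacturer's total.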


**TIME COVERAGE**

In [40]:
df["Event.Date"] = pd.to_datetime(df["Event.Date"], errors="coerce")

df["Event.Date"].min(), df["Event.Date"].max()

(Timestamp('1948-10-24 00:00:00'), Timestamp('2022-12-29 00:00:00'))

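This output shows:

- Earliest event date: 1948-10-24
- Latest event date: 2022-12-29
- The dates parse into valid timestamps at both ends of the range. Because `errors="coerce"` silently converts unparseable values to `NaT`, the `min`/`max` check alone does not rule out parse failures.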
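
One way to check for silent parse failures is to compare missing values before and after conversion. This is a minimal sketch on a hypothetical `raw_dates` series, assuming ISO-formatted dates (`YYYY-MM-DD`) like those in the preview above.

```python
import pandas as pd

# Illustrative data only: two valid dates, one bad string, one missing value
raw_dates = pd.Series(["1948-10-24", "2022-12-29", "not a date", None])

# errors="coerce" turns anything that does not match the format into NaT
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# Values that were present in the raw data but became NaT failed to parse
n_missing_before = int(raw_dates.isna().sum())
n_failed = int(parsed.isna().sum()) - n_missing_before

first_year = int(parsed.dt.year.min())
last_year = int(parsed.dt.year.max())
print(f"failed to parse: {n_failed}, range: {first_year} to {last_year}")
```

Applied to the notebook's data, the same comparison would need to run on the raw `Event.Date` column before it is overwritten by `pd.to_datetime`.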