# Phase 1 Project 

## Project Overview
For this project, I use data cleaning, imputation, analysis, and visualization on historical aviation accident data to generate insights for a business stakeholder.


# **1. Business Understanding**

## **Business Context**
The company is establishing a new **aviation division** focused on expanding operations through the purchase and management of aircraft. As part of this initiative, leadership seeks to ensure that investment and operational decisions are guided by **evidence-based safety insights** derived from historical data on aircraft accidents and fatalities.

## **Business Problem**
Aircraft procurement and operational planning carry significant financial and safety risks. Without data-driven insights, decisions about which aircraft types, models, or flight purposes to invest in may expose the company to **avoidable operational hazards**, higher insurance costs, or reputational damage resulting from safety incidents.

The primary business challenge is to **identify which aircraft types, makes, and operational categories have historically demonstrated lower accident and fatality risks**, so that purchasing and operational policies can be optimized for safety, cost efficiency, and long-term sustainability.

## **Business Objectives**
The main objectives of this analysis are to:
1. Assess historical aviation accident patterns across aircraft makes, models, and purposes of flight.  
2. Identify high-risk versus low-risk aircraft types and flight purposes based on recorded fatalities and incident severity.  
3. Provide actionable insights that inform aircraft procurement, operational planning, and safety management policies.  
4. Develop a foundation for future **risk-based decision-making**, where accident rates are normalized against exposure data (e.g., flight hours or fleet size).
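
The normalization described in objective 4 can be sketched in a few lines. This is a minimal illustration only: the aircraft labels, accident counts, and flight hours are invented, the exposure figures are assumed to come from a separate source, and the per-100,000-flight-hours scale is an assumed reporting convention rather than something this project prescribes.

```python
import pandas as pd

# Hypothetical figures for illustration only; not taken from the dataset.
exposure = pd.DataFrame({
    "aircraft_type": ["Type A", "Type B"],
    "accidents": [120, 30],
    "flight_hours": [6_000_000, 500_000],
})

# Accidents per 100,000 flight hours (assumed reporting scale).
exposure["rate_per_100k_hours"] = (
    exposure["accidents"] / exposure["flight_hours"] * 100_000
)

# Lowest rate first: the ranking can differ from a ranking by raw counts.
print(exposure.sort_values("rate_per_100k_hours"))
```

In this made-up example, Type A has four times as many accidents as Type B but a rate three times lower, which is why raw counts alone can mislead.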

## **Key Business Questions**
- Which aircraft makes and models have historically recorded the **fewest fatal incidents**?  
- How does **purpose of flight** (e.g., personal, instructional, commercial, aerial application) influence accident severity?  
- What are the **historical trends** in accident frequency and fatality rates, and what do they imply about safety improvements or degradation over time?  
- Based on this analysis, what **purchase and operational priorities** should guide the aviation division’s strategy?
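
As a rough sketch of how the purpose-of-flight question might be approached, the snippet below groups accidents by purpose of flight and computes the share that were fatal. It uses the `Purpose of Flight` and `Total Fatal Injuries` column names and the five rows shown in the dataset preview in Section 2.2. Treating a blank fatality count as zero is an assumption that should be checked against the data before it is relied on.

```python
import numpy as np
import pandas as pd

# Five rows copied from the dataset preview; NaN marks a blank cell.
sample = pd.DataFrame({
    "Purpose of Flight": ["Instructional", "Unknown", "Personal",
                          "Personal", "Other Work Use"],
    "Total Fatal Injuries": [np.nan, np.nan, np.nan, 1.0, 3.0],
})

# Assumption: a blank fatality count means no fatalities were recorded.
fatal_count = sample["Total Fatal Injuries"].fillna(0)
sample["is_fatal"] = fatal_count > 0

# Per purpose: number of accidents, number of fatal ones, and their share.
summary = (
    sample.groupby("Purpose of Flight")
    .agg(accidents=("is_fatal", "size"),
         fatal_accidents=("is_fatal", "sum"),
         fatal_share=("is_fatal", "mean"))
    .sort_values("fatal_share", ascending=False)
)
print(summary)
```

With only five rows the shares carry no statistical weight; the point is the shape of the calculation, which would run unchanged on the full DataFrame.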

# **2. Data Understanding**

## **Dataset Overview**
The dataset, titled **"Airline Accidents"**, contains historical records of aircraft accidents, including details such as the event date, location, country, aircraft type, purpose of flight, air carrier, injury counts (fatal, serious, minor, and uninjured), weather condition, and phase of flight. The data spans multiple decades and supports analysis of aviation safety patterns across different aircraft and flight types.

This dataset will be used to explore trends in aviation accidents and identify factors associated with higher or lower accident severity, with the goal of improving safety-focused business decisions.

## **Data Source**
The dataset was obtained from a publicly available repository containing **historical aircraft accident records**.  
Each record represents a single aviation incident and includes both numerical and categorical data points relevant to understanding the event.

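The data includes the following key fields, named here as they appear in the dataset preview in Section 2.2 where the column is visible:
- **Event Date** – The date the accident occurred.  
- **Location** and **Country** – The geographical location of the accident.  
- **Air Carrier** – The airline or operator of the aircraft involved.  
- **Aircraft Type** – The make or model of the aircraft (not visible in the truncated preview).  
- **Purpose of Flight** – The intended operation (e.g., personal, instructional, commercial, or military).  
- **Total Fatal Injuries** – Number of fatalities from the accident.  
- **Total Serious Injuries**, **Total Minor Injuries**, **Total Uninjured** – Counts of the remaining people involved, by injury outcome.  
- **Weather Condition** – Weather at the time of the event (e.g., VMC, visual meteorological conditions).  
- **Broad Phase of Flight** – The phase in which the event occurred (e.g., takeoff, landing, maneuvering).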

## 2.1 Import Libraries

In [1]:
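import pandas as pd               # data loading and manipulation
import numpy as np                # numerical helpers
import matplotlib.pyplot as plt   # plotting
import seaborn as sns             # statistical visualization
import warnings

# Silence library warnings to keep the notebook output readable.
# Note: this hides all warnings, including deprecation notices.
warnings.filterwarnings('ignore')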

## 2.2 Load Dataset

In [5]:
# Machine-specific absolute path; adjust it to wherever the CSV lives locally.
data_path = r'C:\Users\geoff\OneDrive\Desktop\final_project_phase_1\final_Airline_Accidents_Phase1_Project\data\airline_accidents.csv'
df = pd.read_csv(data_path)  # Read the CSV file into a DataFrame named df
df.head()

Unnamed: 0,Event Id,Investigation Type,Accident Number,Event Date,Location,Country,Latitude,Longitude,Airport Code,Airport Name,...,Purpose of Flight,Air Carrier,Total Fatal Injuries,Total Serious Injuries,Total Minor Injuries,Total Uninjured,Weather Condition,Broad Phase of Flight,Report Publication Date,Unnamed: 30
0,20080125X00106,Accident,SEA08CA056,12/31/2007,"Santa Ana, CA",United States,33.675556,-117.868056,SNA,John Wayne - Orange County,...,Instructional,,,,,2.0,VMC,LANDING,02/28/2008,
1,20080206X00141,Accident,CHI08WA075,12/31/2007,"Guernsey, United Kingdom",United Kingdom,49.435,-2.600278,,,...,Unknown,,,,,1.0,,,02/06/2008,
2,20080129X00122,Accident,CHI08CA057,12/30/2007,"Alexandria, MN",United States,45.866111,-95.394444,AXN,Chandler Field Airport,...,Personal,,,,,1.0,VMC,TAKEOFF,02/28/2008,
3,20080114X00045,Accident,LAX08FA043,12/30/2007,"Paso Robles, CA",United States,35.542222,-120.522778,PRB,Paso Robles Airport,...,Personal,,1.0,,,,VMC,MANEUVERING,06/20/2014,
4,20080109X00032,Accident,NYC08FA071,12/30/2007,"Cherokee, AL",United States,34.688611,-87.92,,,...,Other Work Use,,3.0,0.0,0.0,0.0,VMC,MANEUVERING,01/15/2009,
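
The preview suggests a few first-pass checks: the `Unnamed: 0` and `Unnamed: 30` columns look like CSV export leftovers, `Event Date` is stored as text, and several injury counts are blank. The sketch below shows one way to handle these, assuming the `Unnamed` columns carry no information and that dates follow the month/day/year pattern seen in the preview. It rebuilds a handful of preview columns by hand so it runs without the CSV file; the names `sample`, `cleaned`, and `missing` are illustrative.

```python
import numpy as np
import pandas as pd

# A few columns from the preview above, rebuilt by hand so the sketch runs
# without the CSV file; with the real data, start from `df` instead.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Event Date": ["12/31/2007", "12/31/2007", "12/30/2007",
                   "12/30/2007", "12/30/2007"],
    "Purpose of Flight": ["Instructional", "Unknown", "Personal",
                          "Personal", "Other Work Use"],
    "Total Fatal Injuries": [np.nan, np.nan, np.nan, 1.0, 3.0],
    "Total Uninjured": [2.0, 1.0, 1.0, np.nan, 0.0],
    "Unnamed: 30": [np.nan] * 5,
})

# Assumption: "Unnamed: N" columns are export leftovers and can be dropped.
cleaned = sample.loc[:, ~sample.columns.str.startswith("Unnamed")].copy()

# Parse dates explicitly; the preview suggests month/day/year.
cleaned["Event Date"] = pd.to_datetime(cleaned["Event Date"], format="%m/%d/%Y")

# Count missing values per column, most incomplete first.
missing = cleaned.isna().sum().sort_values(ascending=False)

print(cleaned.dtypes)
print(missing)
```

On the full dataset, the same missing-value count is a natural starting point for deciding which columns need imputation and which are too sparse to use.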
