# Aviation Accidents Analysis

You are part of a consulting firm tasked with analyzing the safety of commercial and passenger jet airlines. The client (an airline/airplane insurer) wants to know which types of aircraft (makes/models) exhibit low rates of total destruction and a low likelihood of fatal or serious passenger injuries in the event of an accident. They are also interested in any general variables/conditions that might be at play. Your analysis will be based on aviation accident data accumulated from 1948 to 2023.

Our client is only interested in airplane makes/models that are professional builds and could potentially still be active. Assume a maximum service lifetime of 40 years before a make/model is retired, and filter your data accordingly (i.e. from 1983 onwards). They would also like separate recommendations for small aircraft vs. larger passenger models. **In addition, make sure that any claims you make are statistically robust and that you have enough samples when making comparisons between groups.**
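The filtering described above might look like the following minimal sketch. It assumes the `Event.Date` and `Amateur.Built` columns shown in the data preview further down, with `Amateur.Built` holding "Yes"/"No" strings. The toy rows and the helper name `filter_active_professional` are made up for illustration, and the cleaned file may already have this filter applied.

```python
import pandas as pd

# Toy rows for illustration only; column names follow the dataset preview.
raw = pd.DataFrame({
    "Event.Date": ["1979-05-01", "1983-03-20", "1999-07-04", "2010-11-30"],
    "Amateur.Built": ["No", "No", "Yes", "No"],
    "Make": ["CESSNA", "PIPER", "VANS", "BOEING"],
})

def filter_active_professional(df, first_year=1983):
    """Keep professionally built aircraft with events from first_year onward."""
    out = df.copy()
    # Unparseable dates become NaT and are dropped by the year comparison.
    out["Event.Date"] = pd.to_datetime(out["Event.Date"], errors="coerce")
    recent = out["Event.Date"].dt.year >= first_year
    professional = out["Amateur.Built"].str.strip().str.lower() == "no"
    return out[recent & professional].reset_index(drop=True)

filtered = filter_active_professional(raw)
print(filtered)
```

Here the event date stands in for the age of the make/model, which is the simplification the brief itself suggests.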


In this summative assessment you will demonstrate your ability to:
- Use Pandas to load, inspect, and clean the dataset appropriately. 
- Transform relevant columns to create measures that address the problem at hand.
- **Conduct EDA: use visualization and statistical measures to understand the structure of the data.**
- **Recommend a set of manufacturers to consider as well as specific airplanes conforming to the client's request.**
- **Discuss the relationship between serious injuries/airplane damage incurred and at least *two* factors at play in the incident. You must provide supporting evidence (visuals, summary statistics, tables) for each claim you make.**

In [1]:
# loading relevant packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', None)

## Exploratory Data Analysis  
- Load in the cleaned data

In [2]:
df = pd.read_csv("aviation_cleaned.csv")

df.shape
df.head()

Unnamed: 0,Event.Id,Investigation.Type,Accident.Number,Event.Date,Location,Country,Latitude,Longitude,Airport.Code,Airport.Name,Make.and.Model,Injury.Severity,Aircraft.damage,Aircraft.Category,Registration.Number,Make,Model,Amateur.Built,Number.of.Engines,Engine.Type,FAR.Description,Schedule,Purpose.of.flight,Air.carrier,Total.Fatal.Injuries,Total.Serious.Injuries,Total.Minor.Injuries,Total.Uninjured,Weather.Condition,Broad.phase.of.flight,Report.Status,Publication.Date,Total.Occupants,Aircraft.Size,Destroyed.Flag
0,20001214X42331,Accident,ATL83FA140,1983-03-20,"CROSSVILLE, TN",United States,,,,,PIPER PA-28-140,Fatal(1),DESTROYED,Airplane,N9600W,PIPER,PA-28-140,No,1.0,RECIPROCATING,Part 91: General Aviation,,PERSONAL,,1.0,1.0,0.0,0.0,IMC,CRUISE,Probable Cause,02-05-2011,2.0,Small,1
1,20001214X40407,Accident,MKC84FA197,1984-07-03,"WRIGHT, AR",United States,,,,,PIPER PA-18-150,Fatal(1),DESTROYED,Airplane,N4025,PIPER,PA-18-150,No,1.0,RECIPROCATING,Part 137: Agricultural,UNK,AERIAL APPLICATION,,1.0,0.0,0.0,0.0,VMC,MANEUVERING,Probable Cause,15-12-2009,1.0,Small,1
2,20001214X41706,Accident,ATL85FA072,1984-12-30,"DUBLIN, VA",United States,,,PSK,NEW RIVER VALLEY,CESSNA 182A,Fatal(1),DESTROYED,Airplane,N4963D,CESSNA,182A,No,1.0,RECIPROCATING,Part 91: General Aviation,,SKYDIVING,,1.0,0.0,0.0,0.0,VMC,MANEUVERING,Probable Cause,17-10-2016,1.0,Small,1
3,20001214X35509,Accident,DEN85LA064,1985-01-14,"WAPITI, WY",United States,,,,,CESSNA 182Q,Non-Fatal,DESTROYED,Airplane,N759WB,CESSNA,182Q,No,1.0,RECIPROCATING,Part 91: General Aviation,,PERSONAL,,0.0,1.0,1.0,0.0,VMC,MANEUVERING,Probable Cause,12-01-2016,2.0,Small,1
4,20001214X36887,Accident,NYC85FA145B,1985-06-11,"BELMAR, NJ",United States,,,BLM,BELMAR MONMOUTH CO.,CESSNA 152,Fatal(1),DESTROYED,Airplane,N4956B,CESSNA,152,No,1.0,RECIPROCATING,Part 91: General Aviation,,INSTRUCTIONAL,,1.0,1.0,4.0,0.0,VMC,TAKEOFF,Probable Cause,08-04-2013,6.0,Small,1


## Explore safety metrics across models/makes
- Remember that the client is interested in separate recommendations for smaller airplanes and larger airplanes. Choose a passenger threshold of 20 and separate the plane types. 

In [6]:
df["Aircraft.Size"].value_counts()

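# Per-model summary within each size group.
# Note: despite their names, avg_fatal_rate and avg_serious_rate are mean
# injury counts per accident, not fractions of occupants.
aircraft_summary = (df.groupby(["Aircraft.Size", "Make.and.Model"])
    .agg(
        accidents=("Destroyed.Flag", "count"),        # non-null accident records
        destroy_rate=("Destroyed.Flag", "mean"),      # share of accidents with aircraft destroyed
        avg_fatal_rate=("Total.Fatal.Injuries", "mean"),
        avg_serious_rate=("Total.Serious.Injuries", "mean")
    )
    .reset_index()
)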

aircraft_summary

small_aircraft = aircraft_summary[aircraft_summary["Aircraft.Size"] == "Small"].sort_values(
    ["destroy_rate", "avg_serious_rate", "avg_fatal_rate"]
)

large_aircraft = aircraft_summary[aircraft_summary["Aircraft.Size"] == "Large"].sort_values(
    ["destroy_rate", "avg_serious_rate", "avg_fatal_rate"]
)

print(large_aircraft)
print(small_aircraft)

  Aircraft.Size Make.and.Model  accidents  destroy_rate  avg_fatal_rate  avg_serious_rate
0         Large     BOEING 737         97      0.113402       13.824742          3.061856
    Aircraft.Size    Make.and.Model  accidents  destroy_rate  avg_fatal_rate  \
50          Small        CESSNA 195         37      0.000000        0.000000   
25          Small       CESSNA 170A         35      0.000000        0.028571   
43          Small       CESSNA 180H         34      0.000000        0.000000   
77          Small       LUSCOMBE 8A         60      0.000000        0.083333   
82          Small         PIPER J3C         34      0.000000        0.176471   
..            ...               ...        ...           ...             ...   
74          Small       CIRRUS SR22         90      0.266667        0.577778   
114         Small       PIPER PA32R         37      0.270270        0.837838   
55          Small       CESSNA 310R         32      0.281250        1.156250   
105  

#### Analyzing Makes

Explore the human injury risk profile for small and larger Makes:
- choose the 15 makes in each group with the lowest mean fatal/seriously injured fraction
- plot the mean fatal/seriously injured fraction for each of these subgroups side-by-side
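The selection step above can be sketched as follows. This is one possible approach, not a prescribed solution: it defines the injured fraction as (fatal + serious) / total occupants per accident, and the helper names, the `injured_frac` column, and the `min_accidents` cutoff are assumptions introduced here. The toy rows are invented.

```python
import pandas as pd

def injured_fraction(df):
    """Fraction of occupants fatally or seriously injured in each accident."""
    injured = (df["Total.Fatal.Injuries"].fillna(0)
               + df["Total.Serious.Injuries"].fillna(0))
    # Zero occupants would divide by zero, so treat those rows as missing.
    occupants = df["Total.Occupants"].where(df["Total.Occupants"] > 0)
    return injured / occupants

def lowest_injury_makes(df, n=15, min_accidents=10):
    """Per size group, the n makes with the lowest mean injured fraction."""
    work = df.assign(injured_frac=injured_fraction(df))
    summary = (
        work.groupby(["Aircraft.Size", "Make"])["injured_frac"]
        .agg(accidents="count", mean_injured_frac="mean")
        .reset_index()
    )
    # Drop makes with too few accidents to support a stable mean.
    summary = summary[summary["accidents"] >= min_accidents]
    return (
        summary.sort_values(["Aircraft.Size", "mean_injured_frac"])
        .groupby("Aircraft.Size")
        .head(n)
        .reset_index(drop=True)
    )

# Toy data for illustration only.
toy = pd.DataFrame({
    "Aircraft.Size": ["Small"] * 4 + ["Large"] * 2,
    "Make": ["CESSNA", "CESSNA", "PIPER", "PIPER", "BOEING", "BOEING"],
    "Total.Fatal.Injuries": [0, 1, 2, 1, 0, 10],
    "Total.Serious.Injuries": [0, 0, 0, 1, 0, 10],
    "Total.Occupants": [2, 2, 2, 2, 100, 100],
})
result = lowest_injury_makes(toy, n=15, min_accidents=2)
print(result)
```

The resulting table has one row per make, so each size group can be passed to its own bar plot on a shared figure for the side-by-side comparison.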

**Distribution of injury rates: small makes**

Use a violinplot to look at the distribution of the fraction of passengers seriously/fatally injured for small airplane makes. Display only the makes with the ten lowest mean serious/fatal injury rates.

**Distribution of injury rates: large makes**

Use a stripplot to look at the distribution of the fraction of passengers seriously/fatally injured for large airplane makes. Display only the makes with the ten lowest mean serious/fatal injury rates.
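A sketch of how the two distribution plots might be produced. It assumes the DataFrame already carries a per-accident `injured_frac` column (a hypothetical name, not part of the dataset) and splits the work into a data-selection helper and a plotting helper, both invented here. The plotting imports sit inside the function so the selection step can run without a plotting backend.

```python
import pandas as pd

def rows_for_lowest_makes(df, size, n=10):
    """Accident rows for the n makes in `size` with the lowest mean injured_frac."""
    subset = df[df["Aircraft.Size"] == size]
    keep = subset.groupby("Make")["injured_frac"].mean().nsmallest(n).index
    return subset[subset["Make"].isin(keep)]

def plot_injury_distribution(df, size, kind="violin", n=10):
    """Draw a violin or strip plot of injured_frac by make and return the axes."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    data = rows_for_lowest_makes(df, size, n)
    fig, ax = plt.subplots(figsize=(10, 5))
    if kind == "violin":
        # cut=0 stops the density estimate from extending past observed values
        sns.violinplot(data=data, x="Make", y="injured_frac", cut=0, ax=ax)
    else:
        sns.stripplot(data=data, x="Make", y="injured_frac", alpha=0.5, ax=ax)
    ax.set_ylabel("Fraction fatally/seriously injured")
    ax.set_title(f"{size} aircraft: {n} makes with lowest mean injury fraction")
    ax.tick_params(axis="x", rotation=45)
    return ax

# Toy data for illustration only.
toy = pd.DataFrame({
    "Aircraft.Size": ["Small"] * 6,
    "Make": ["A", "A", "B", "B", "C", "C"],
    "injured_frac": [0.0, 0.2, 0.5, 0.5, 1.0, 0.8],
})
selected = rows_for_lowest_makes(toy, "Small", n=2)
print(selected)
```

With the real data, `plot_injury_distribution(df, "Small", kind="violin")` and `plot_injury_distribution(df, "Large", kind="strip")` would correspond to the two requested plots. A strip plot suits the large group because it shows every accident, which matters when a make has few records.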

**Evaluate the rate of aircraft destruction for both small and large aircraft by Make.** 

Sort your results and keep the lowest 15.
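One way to make the destroyed-rate ranking more robust is to attach a confidence interval to each rate. The sketch below uses the Wilson score interval, which is a choice made here rather than something the brief prescribes. The function name, the `min_accidents` cutoff, and the toy data are likewise assumptions.

```python
import numpy as np
import pandas as pd

def destroy_rate_by_make(df, n=15, min_accidents=10, z=1.96):
    """Lowest-n destroyed rates per size group, with Wilson 95% intervals."""
    summary = (
        df.groupby(["Aircraft.Size", "Make"])["Destroyed.Flag"]
        .agg(accidents="count", destroyed="sum")
        .reset_index()
    )
    summary = summary[summary["accidents"] >= min_accidents].copy()
    m = summary["accidents"]
    p = summary["destroyed"] / m
    # Wilson score interval for a binomial proportion
    denom = 1 + z**2 / m
    centre = (p + z**2 / (2 * m)) / denom
    half = z * np.sqrt(p * (1 - p) / m + z**2 / (4 * m**2)) / denom
    summary["destroy_rate"] = p
    summary["ci_low"] = centre - half
    summary["ci_high"] = centre + half
    return (
        summary.sort_values(["Aircraft.Size", "destroy_rate"])
        .groupby("Aircraft.Size")
        .head(n)
        .reset_index(drop=True)
    )

# Toy data: make C has too few accidents and is dropped.
toy = pd.DataFrame({
    "Aircraft.Size": ["Small"] * 23,
    "Make": ["A"] * 10 + ["B"] * 10 + ["C"] * 3,
    "Destroyed.Flag": [0] * 10 + [1] * 5 + [0] * 5 + [0] * 3,
})
rates = destroy_rate_by_make(toy)
print(rates)
```

The interval shows why sample size matters: a group with 0 destroyed aircraft in 37 accidents (the counts CESSNA 195 shows in the model-level output above) still has a 95% Wilson upper bound of roughly 9%, so an observed rate of zero does not mean zero risk.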

#### Provide a short discussion on your findings for your summary statistics and plots:
- Make any recommendations for Makes here based on the destroyed fraction and the fraction fatally/seriously injured
- Comment on the calculated statistics and any corresponding distributions you have visualized.

### Analyze plane types
- plot the mean fatal/seriously injured fraction for both small and larger planes 
- also provide a distributional plot of your choice for the fatal/seriously injured fraction by airplane type (stripplot, violin, etc)  
- filter ensuring that you have at least ten individual examples in each model/make to average over
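The minimum-sample filter in the last bullet could be applied at the row level, so that both the mean plot and the distribution plot only see well-sampled models. A minimal sketch, assuming a per-accident `injured_frac` column (hypothetical name) and using invented toy rows:

```python
import pandas as pd

def keep_well_sampled_models(df, min_accidents=10):
    """Keep only rows whose Make.and.Model has at least min_accidents records."""
    counts = df.groupby("Make.and.Model")["Make.and.Model"].transform("size")
    return df[counts >= min_accidents]

# Toy data for illustration only; the threshold is lowered to 2 to fit it.
toy = pd.DataFrame({
    "Aircraft.Size": ["Small"] * 4,
    "Make.and.Model": ["CESSNA 152"] * 3 + ["PIPER J3C"],
    "injured_frac": [0.0, 0.5, 1.0, 0.0],
})
kept = keep_well_sampled_models(toy, min_accidents=2)

# Mean injured fraction per model, lowest first.
model_means = (
    kept.groupby(["Aircraft.Size", "Make.and.Model"])["injured_frac"]
    .mean()
    .sort_values()
)
print(model_means)
```

Filtering rows with `transform` rather than filtering the summary table keeps the original accident records available for strip or violin plots.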

**Larger planes**

**Smaller planes**
- for smaller planes, limit your plotted results to the makes with the 10 lowest mean serious/fatal injury fractions

### Discussion of Specific Airplane Types
- Discuss what you have found above regarding the fraction of passengers seriously/fatally injured for both small and large airplane models.

### Exploring Other Variables
- Investigate how other variables affect aircraft damage and injury. You must choose **two** factors out of the following but are free to analyze more:

- Weather Condition
- Engine Type
- Number of Engines
- Phase of Flight
- Purpose of Flight

For each factor provide a discussion explaining your analysis with appropriate visualization / data summaries and interpreting your findings.
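A factor analysis of this kind might start from a per-category summary table plus a simple test of association. The sketch below uses `Weather.Condition` as the example factor, assumes a per-accident `injured_frac` column (hypothetical name), and computes the Pearson chi-square statistic by hand with NumPy. The function names and toy data are invented for illustration.

```python
import numpy as np
import pandas as pd

def rate_by_factor(df, factor, min_accidents=10):
    """Accident count, destroyed rate and mean injured fraction per category."""
    table = df.dropna(subset=[factor]).groupby(factor).agg(
        accidents=("Destroyed.Flag", "count"),
        destroy_rate=("Destroyed.Flag", "mean"),
        mean_injured_frac=("injured_frac", "mean"),
    )
    return table[table["accidents"] >= min_accidents].sort_values("destroy_rate")

def chi_square_statistic(df, factor, outcome="Destroyed.Flag"):
    """Pearson chi-square statistic and degrees of freedom for factor vs outcome."""
    observed = pd.crosstab(df[factor], df[outcome]).to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / observed.sum()
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, dof

# Toy data for illustration only.
toy = pd.DataFrame({
    "Weather.Condition": ["VMC"] * 10 + ["IMC"] * 10,
    "Destroyed.Flag": [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2,
    "injured_frac": [0.0] * 8 + [1.0] * 2 + [1.0] * 6 + [0.0] * 4,
})
table = rate_by_factor(toy, "Weather.Condition")
stat, dof = chi_square_statistic(toy, "Weather.Condition")
print(table)
print(stat, dof)
```

With one degree of freedom, a statistic above about 3.84 indicates an association at the 5% level. The test is only reliable when expected cell counts are not too small, which is another reason to drop sparse categories first.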