
# DSC Phase 1 Project: Aviation Data Analysis

## 1. Business Understanding
The goal of this analysis is to investigate aviation accident data to uncover trends, identify risk factors, and propose recommendations for improving aviation safety in the United States.

## 2. Data Understanding
We'll begin by loading and inspecting the aviation data and U.S. state codes.

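```python
import pandas as pd

# Read the accident records as ISO-8859-1 (Latin-1) instead of the default UTF-8
aviation_df = pd.read_csv("AviationData.csv", encoding="ISO-8859-1")
# Lookup table of U.S. state codes
state_codes_df = pd.read_csv("USState_Codes.csv")

aviation_df.head()
```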
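
Before cleaning, it can help to quantify how much of each column is missing. The following is a minimal sketch on a small made-up frame; the helper name `missing_summary` and the toy values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Toy stand-in for aviation_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "Event.Date": ["2001-05-01", None, "2003-07-15", "2004-01-20"],
    "Location": ["ANCHORAGE, AK", "MIAMI, FL", None, None],
    "Country": ["United States"] * 4,
})
summary = missing_summary(toy_df)
print(summary)
```

In the notebook itself, `missing_summary(aviation_df)` would give the same table for the real data.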

## 3. Data Cleaning
We'll parse event dates, extract two-letter state abbreviations from the `Location` field, restrict the data to U.S. records, and drop rows that are missing a date or a state.

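```python
# Parse event dates; unparseable values become NaT
aviation_df['Event.Date'] = pd.to_datetime(aviation_df['Event.Date'], errors='coerce')
# Take the trailing two-letter abbreviation from "CITY, ST" locations
aviation_df['State'] = aviation_df['Location'].str.extract(r',\s*([A-Z]{2})\s*$', expand=False)
# Keep U.S. records only; .copy() avoids SettingWithCopyWarning on later edits
aviation_df = aviation_df[aviation_df['Country'] == 'United States'].copy()
# Drop rows without a usable date or state, then renumber the index
aviation_df = aviation_df.dropna(subset=['Event.Date', 'State']).reset_index(drop=True)
```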
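
A two-letter pattern alone will accept any pair of capitals, so the extracted codes can be checked against a list of known abbreviations. The sketch below assumes a hypothetical helper `extract_state` and a hand-written set of valid codes; in the notebook the set would presumably come from `state_codes_df`, whose column name (`Abbreviation` here) is an assumption.

```python
import pandas as pd

def extract_state(locations: pd.Series, valid_codes: set) -> pd.Series:
    """Pull a trailing two-letter code from 'CITY, ST' strings; keep only known codes."""
    # Anchor at the end of the string so longer words (e.g. 'FRANCE') do not match
    codes = locations.str.extract(r",\s*([A-Z]{2})\s*$", expand=False)
    # Replace codes that are not in the lookup set with NaN
    return codes.where(codes.isin(valid_codes))

# Hand-written stand-in for the lookup table; in the notebook this might be
# set(state_codes_df["Abbreviation"])  # column name is an assumption
valid = {"AK", "FL", "TX"}

locations = pd.Series([
    "ANCHORAGE, AK",
    "MIAMI , FL",
    "PARIS, FRANCE",
    "Unknown",
    None,
    "SPRINGFIELD, ZZ",
])
states = extract_state(locations, valid)
print(states)
```

Rows that come back as NaN can then be dropped by the same `dropna` step used in the cleaning cell.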

## 4. Exploratory Data Analysis
### Accidents per Year

```python
import matplotlib.pyplot as plt

# Count records per calendar year, in chronological order
aviation_df['Year'] = aviation_df['Event.Date'].dt.year
accidents_per_year = aviation_df['Year'].value_counts().sort_index()

plt.figure(figsize=(12, 6))
accidents_per_year.plot(kind='bar')
plt.title("Accidents Per Year")
plt.xlabel("Year")
plt.ylabel("Number of Accidents")
plt.tight_layout()
plt.show()
```
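
`value_counts()` only returns years that appear in the data, so a year with no records is silently absent from the bar chart. A minimal sketch of one way to make such gaps visible, using a hypothetical helper `counts_per_year` and made-up dates:

```python
import pandas as pd

def counts_per_year(dates: pd.Series) -> pd.Series:
    """Count records per year, inserting zeros for years with no records."""
    years = dates.dropna().dt.year
    counts = years.value_counts().sort_index()
    if counts.empty:
        return counts
    # Build the full span of years so missing ones show up as explicit zeros
    full_range = list(range(int(counts.index.min()), int(counts.index.max()) + 1))
    return counts.reindex(full_range, fill_value=0)

# Toy dates (made up); 2002 has no records and one value is missing
toy_dates = pd.to_datetime(
    pd.Series(["2001-03-01", "2001-09-12", "2003-06-30", None]),
    errors="coerce",
)
per_year = counts_per_year(toy_dates)
print(per_year)
```

Plotting `counts_per_year(aviation_df['Event.Date'])` instead of the raw counts would keep the x-axis evenly spaced in time.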

### Most Common States for Accidents

```python
# Ten states with the most records
state_counts = aviation_df['State'].value_counts().head(10)
state_counts.plot(kind='bar', color='orange')
plt.title("Top 10 States with Most Accidents")
plt.xlabel("State")
plt.ylabel("Number of Accidents")
plt.tight_layout()
plt.show()
```

### Fatal Injuries Distribution

```python
# Coerce to numeric so non-numeric entries become NaN and are skipped by hist()
aviation_df['Total.Fatal.Injuries'] = pd.to_numeric(aviation_df['Total.Fatal.Injuries'], errors='coerce')
aviation_df['Total.Fatal.Injuries'].hist(bins=20, figsize=(10, 5))
plt.title("Distribution of Fatal Injuries")
plt.xlabel("Number of Fatal Injuries")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()
```
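
A histogram of a heavily skewed count can be hard to read, so it may help to report the share of records in a few fatality buckets alongside it. This is a sketch only: the helper `fatality_shares`, the bucket boundaries, and the toy values are assumptions chosen for illustration.

```python
import pandas as pd

def fatality_shares(fatalities: pd.Series) -> pd.Series:
    """Share of records with 0, 1-2, and 3+ fatalities (missing values excluded)."""
    values = pd.to_numeric(fatalities, errors="coerce").dropna()
    n = len(values)
    if n == 0:
        return pd.Series({"0": float("nan"), "1-2": float("nan"), "3+": float("nan")})
    return pd.Series({
        "0": (values == 0).sum() / n,
        "1-2": values.between(1, 2).sum() / n,  # inclusive on both ends
        "3+": (values >= 3).sum() / n,
    })

# Toy column (made up); None and "abc" are coerced to NaN and excluded
toy_fatalities = pd.Series(["0", "0", "1", "4", None, "0", "2", "abc"])
shares = fatality_shares(toy_fatalities)
print(shares)
```

Applied to `aviation_df['Total.Fatal.Injuries']`, the first bucket gives the fraction of records with no fatalities.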

## 5. Key Insights
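- Accident counts vary by year and by state; the bar charts above show where they concentrate.
- Most accidents result in few or no fatalities, so the fatal-injury distribution is heavily right-skewed.
- Weather conditions and phase of flight were not analyzed here; both are candidates for testing correlation with accident severity.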
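
As a starting point for the weather and flight-phase follow-up, one option is to compare the share of fatal records across categories. The sketch below is hedged on several assumptions: the helper `fatal_rate_by` is hypothetical, the column name `Weather.Condition` and the category labels are assumed rather than confirmed by this notebook, and missing fatality counts are treated as zero.

```python
import pandas as pd

def fatal_rate_by(df: pd.DataFrame, column: str,
                  fatal_col: str = "Total.Fatal.Injuries") -> pd.DataFrame:
    """Per category: number of records and share with at least one fatality."""
    # Assumption: a missing fatality count is treated as zero fatalities
    is_fatal = pd.to_numeric(df[fatal_col], errors="coerce").fillna(0) > 0
    grouped = is_fatal.groupby(df[column].fillna("Unknown"))
    result = pd.DataFrame({
        "records": grouped.size(),
        "fatal_rate": grouped.mean(),
    })
    return result.sort_values("fatal_rate", ascending=False)

# Toy frame (made-up values and assumed column names)
toy_df = pd.DataFrame({
    "Weather.Condition": ["VMC", "VMC", "IMC", "IMC", None, "VMC"],
    "Total.Fatal.Injuries": [0, 1, 2, 0, 0, 0],
})
rates = fatal_rate_by(toy_df, "Weather.Condition")
print(rates)
```

The same call with a flight-phase column would give the equivalent comparison by phase. A difference in rates is descriptive only and does not by itself establish that a condition causes more severe accidents.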

## 6. Recommendations
- Prioritize pilot training in states with high accident counts, keeping in mind that raw counts do not account for how much flying each state sees.
- Monitor high-risk phases of flight (e.g., takeoff, landing).
- Enhance weather forecasting and decision-making support.

## 7. Conclusion
This project demonstrates how aviation accident data can be used to identify patterns and inform safety improvements.
