# Bird Strikes in Aviation
`Jordan Burmylo-Magrann`

The dataset studied throughout this project covers bird strikes: collisions between aircraft and birds in flight. Though the topic may seem mundane, there is a lot of information about how common these collisions are. If you have flown before, chances are a flight you were on struck a bird at some point without you knowing. On occasion, these strikes cause far more damage than one might expect.

Throughout this project, the similarities and differences among bird strike statistics will be explored. Whether it is plane size, aircraft type, cost or type of damage, bird species, weather, effect on the flight, or the total number of birds struck, this project will compare and contrast how these variables relate to one another.

## Motivation

This topic stood out to me because it is unusual and interesting. Not only did I not know how common bird strikes are, I also did not know how much damage they can cause or what impact they can have. Air travel is a very common way to get around, yet not enough people know how regularly planes strike birds.

## Methods

### `Software hygiene`
Keeping data clean and readable so that it can be viewed and analyzed more efficiently
### `Cleaning dataframe`
Renamed columns to a consistent snake_case style so they are easier to access and refer to; certain columns, such as `flight_date`, also needed editing for plotting purposes
### `Plotting` (Line, Scatter, Heatmap, Boxplot)
These plot types were the most fitting for this data. They gave the clearest views for visualization and analysis, and they showed the trends best


In [3]:
# Basic imports to begin
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
import hvplot.pandas
# Make pan the only active tool so the scroll wheel does not zoom the plots
hv.plotting.bokeh.element.ElementPlot.active_tools = ['pan']

# Begin by creating a dataframe of the bird strike information
birdstrike_data = pd.read_csv('Data/Bird_strikes.csv')
birdstrike_data.head()



In [16]:
# Determine what information is given to us and if we need to edit column names for efficiency  
birdstrike_data.columns

Index(['RecordID', 'AircraftType', 'AirportName', 'AltitudeBin', 'MakeModel',
       'NumberStruck', 'NumberStruckActual', 'Effect', 'FlightDate', 'Damage',
       'Engines', 'Operator', 'OriginState', 'FlightPhase',
       'ConditionsPrecipitation', 'RemainsCollected?',
       'RemainsSentToSmithsonian', 'Remarks', 'WildlifeSize', 'ConditionsSky',
       'WildlifeSpecies', 'PilotWarned', 'Cost', 'Altitude', 'PeopleInjured',
       'IsAircraftLarge?'],
      dtype='object')

In [17]:
# Clean columns: lowercase every name, then add underscores between words
birdstrike_data.columns = [col.lower() for col in birdstrike_data.columns]
birdstrike_data.rename({'recordid': 'record_id', 'aircrafttype': 'aircraft_type', 
                       'airportname': 'airport_name', 'altitudebin': 'altitude_bin', 
                       'makemodel': 'make_model', 'numberstruck': 'number_struck', 
                       'numberstruckactual': 'number_struck_actual', 'flightdate': 'flight_date',
                       'originstate': 'origin_state', 'flightphase': 'flight_phase',
                       'conditionsprecipitation': 'conditions_precipitation', 'remainscollected?': 'remains_collected',
                       'remainssenttosmithsonian': 'remains_sent_to_smithsonian', 'wildlifesize': 'wildlife_size',
                       'conditionssky': 'conditions_sky', 'wildlifespecies': 'wildlife_species',
                       'pilotwarned': 'pilot_warned', 'peopleinjured': 'people_injured', 
                       'isaircraftlarge?': 'is_aircraft_large'}, axis=1, inplace=True)
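The explicit mapping above works, but it has to be kept in sync with the CSV by hand. A minimal sketch of an alternative, assuming every raw header follows the CamelCase pattern printed by `birdstrike_data.columns` (optionally ending in `?`): the hypothetical helper `to_snake_case` inserts an underscore at each lowercase-to-uppercase boundary. It is meant to be applied to the raw headers, so it would replace both the lowercasing step and the `rename` call.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert a raw header such as 'NumberStruckActual' or 'IsAircraftLarge?' to snake_case."""
    name = name.rstrip("?")  # drop the trailing '?' used by some headers
    # Insert '_' wherever a lowercase letter or digit is followed by an uppercase letter
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.lower()


# Hypothetical usage on the raw (not yet lowercased) headers:
# birdstrike_data = birdstrike_data.rename(columns=to_snake_case)
raw = pd.DataFrame(columns=["RecordID", "RemainsCollected?", "IsAircraftLarge?", "NumberStruckActual"])
print(list(raw.rename(columns=to_snake_case).columns))
# ['record_id', 'remains_collected', 'is_aircraft_large', 'number_struck_actual']
```

Deriving names from a rule keeps the cleaning step short, at the cost of relying on the headers staying consistently CamelCase.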


In [18]:
birdstrike_data.columns

Index(['record_id', 'aircraft_type', 'airport_name', 'altitude_bin',
       'make_model', 'number_struck', 'number_struck_actual', 'effect',
       'flight_date', 'damage', 'engines', 'operator', 'origin_state',
       'flight_phase', 'conditions_precipitation', 'remains_collected',
       'remains_sent_to_smithsonian', 'remarks', 'wildlife_size',
       'conditions_sky', 'wildlife_species', 'pilot_warned', 'cost',
       'altitude', 'people_injured', 'is_aircraft_large'],
      dtype='object')

In [90]:
# Begin analyzing
birdstrike_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25429 entries, 0 to 25428
Data columns (total 27 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   record_id                    25429 non-null  int64 
 1   aircraft_type                25429 non-null  object
 2   airport_name                 25429 non-null  object
 3   altitude_bin                 25429 non-null  object
 4   make_model                   25429 non-null  object
 5   number_struck                25429 non-null  object
 6   number_struck_actual         25429 non-null  int64 
 7   effect                       2078 non-null   object
 8   flight_date                  25429 non-null  object
 9   damage                       25429 non-null  object
 10  engines                      25195 non-null  object
 11  operator                     25429 non-null  object
 12  origin_state                 24980 non-null  object
 13  flight_phase                 25

In [91]:
birdstrike_data.describe()

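|       | record_id     | number_struck_actual | altitude    | people_injured |
|-------|---------------|----------------------|-------------|----------------|
| count | 25429.0       | 25429.0              | 25429.0     | 25429.0        |
| mean  | 253800.148767 | 2.699634             | 799.028432  | 0.000826       |
| std   | 38472.800499  | 12.825804            | 1740.079843 | 0.047339       |
| min   | 1195.0        | 1.0                  | 0.0         | 0.0            |
| 25%   | 225742.0      | 1.0                  | 0.0         | 0.0            |
| 50%   | 248609.0      | 1.0                  | 50.0        | 0.0            |
| 75%   | 269044.0      | 1.0                  | 700.0       | 0.0            |
| max   | 321909.0      | 942.0                | 18000.0     | 6.0            |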


In [92]:
# Missing data/information?
birdstrike_data.isnull().sum()

record_id                          0
aircraft_type                      0
airport_name                       0
altitude_bin                       0
make_model                         0
number_struck                      0
number_struck_actual               0
effect                         23351
flight_date                        0
damage                             0
engines                          234
operator                           0
origin_state                     449
flight_phase                       0
conditions_precipitation       23414
remains_collected                  0
remains_sent_to_smithsonian        0
remarks                         4761
wildlife_size                      0
conditions_sky                     0
wildlife_species                   0
pilot_warned                       0
cost                               0
altitude                           0
people_injured                     0
is_aircraft_large                  0
years                              0
dtype: int64
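Raw counts are easier to judge as shares of the 25,429 records: `effect` (23,351 missing) and `conditions_precipitation` (23,414 missing) are each about 92% empty. A short sketch of that calculation, using a hypothetical helper `missing_report` and a small made-up frame in place of `birdstrike_data`:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, most incomplete first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


# Made-up stand-in for birdstrike_data; on the real data call missing_report(birdstrike_data)
toy = pd.DataFrame({
    "effect": [np.nan, np.nan, np.nan, "Other"],
    "engines": [2, np.nan, 2, 1],
    "cost": [0, 0, 1500, 0],
})
print(missing_report(toy))
```

Columns that are mostly empty, like `effect`, are better treated as "recorded only sometimes" than used directly in comparisons across all strikes.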

In [153]:
# Convert flight_date to datetime
birdstrike_data['flight_date'] = pd.to_datetime(birdstrike_data['flight_date'])
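If any `flight_date` value cannot be parsed, the call above raises an error and stops the notebook. One option, sketched below on made-up strings, is `errors='coerce'`, which turns unparseable entries into `NaT` so they can be counted and inspected before plotting. Whether the real CSV contains such values is not shown in this notebook.

```python
import pandas as pd

# Made-up sample; on the real data use birdstrike_data['flight_date']
raw_dates = pd.Series(["2000-11-23", "2001-07-25", "not a date"])

# errors='coerce' replaces values that cannot be parsed with NaT instead of raising
parsed = pd.to_datetime(raw_dates, errors="coerce")
print(parsed.isna().sum(), "unparseable value(s)")
```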


In [155]:
import Modules.BirdStrikePlotter as mbsv

# BirdStrikePlotter lives inside the imported module, so reference it through the alias
plotter = mbsv.BirdStrikePlotter(birdstrike_data)
dashboard = plotter.create_dashboard()
dashboard


In [123]:
# Create a line plot of bird strikes over time
time_series = birdstrike_data.groupby('flight_date').size().reset_index(name='count')
birdstrike_time_lineplot = time_series.hvplot.line(
    x='flight_date', 
    y='count', 
    title='Bird Strikes Over Time', 
    xlabel='Flight Date',
    ylabel='Bird Strike Count',
    width=800, 
    height=400
).opts(tools=['hover'])
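Grouping by `flight_date` gives one point per calendar day, which is noisy; the later claim that strikes increased over the years is easier to check on monthly or yearly totals. A sketch on a handful of made-up dates, using only pandas:

```python
import pandas as pd

# Made-up sample; on the real data use birdstrike_data
sample = pd.DataFrame({
    "flight_date": pd.to_datetime(
        ["2009-08-01", "2009-08-15", "2009-09-02", "2010-08-03", "2010-08-20", "2010-08-30"]
    )
})

# One row per calendar month instead of one per day
monthly = sample.groupby(sample["flight_date"].dt.to_period("M")).size()
# One row per year, which also smooths out the seasonal cycle
yearly = sample.groupby(sample["flight_date"].dt.year).size()
print(monthly)
print(yearly)
```

On the real data, a yearly Series like this could be passed to `hvplot.line` in place of the daily counts; that substitution is a suggestion, not what the notebook does.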

In [124]:
# Create a scatter plot of altitude vs. number struck
altitude_numberstruck_scatter = birdstrike_data.hvplot.scatter(
    x='altitude', 
    y='number_struck_actual', 
    title='Altitude vs. Number of Birds Struck', 
    xlabel='Altitude',
    ylabel='Number of birds struck',
    width=600, 
    height=400,
    color='wildlife_size',
    size=8,
    alpha=0.6
).opts(tools=['hover'])

In [125]:
# Extract month and year from flight date
birdstrike_data['month'] = birdstrike_data['flight_date'].dt.month
birdstrike_data['year'] = birdstrike_data['flight_date'].dt.year

# Create heatmap of strikes by month and year
month_year_strikes_heatmap = birdstrike_data.groupby(['year', 'month']).size().unstack().hvplot.heatmap(
    title='Bird Strikes by Month and Year',
    width=800,
    height=400,
    cmap='YlOrRd',
    xlabel='Month',
    ylabel='Year'
).opts(tools=['hover'])
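The heatmap above relies on `groupby(...).size().unstack()`, which only creates columns for months that appear in the data and leaves `NaN` where a year has no strikes in a given month. A sketch of an equivalent year-by-month table built with `pd.crosstab`, reindexed so all twelve months are present and empty cells are zero; the sample rows are made up:

```python
import pandas as pd

# Made-up sample; on the real data use birdstrike_data[['year', 'month']]
sample = pd.DataFrame({
    "year": [2009, 2009, 2009, 2010, 2010],
    "month": [7, 8, 8, 8, 10],
})

# Rows = year, columns = month, cells = number of strikes
table = pd.crosstab(sample["year"], sample["month"])
# Make sure every month 1-12 is a column, with 0 where nothing was recorded
table = table.reindex(columns=range(1, 13), fill_value=0)
print(table)
```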

In [127]:
# Create a box plot of altitude by wildlife size
altitude_wildlifesize_boxplot = birdstrike_data.hvplot.box(
    y='altitude',
    by='wildlife_size',
    title='Altitude Distribution by Wildlife Size',
    xlabel='Wildlife Sizes',
    ylabel='Altitude',
    width=600,
    height=400
).opts(tools=['hover'])
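The box plot's reading (small birds struck mostly at very low altitudes, medium and large birds across a wider range) can also be checked numerically. A sketch that summarizes `altitude` per `wildlife_size` with a few quantiles; the sample values are made up, and the labels `Small`, `Medium` and `Large` are assumed to match those in the real `wildlife_size` column:

```python
import pandas as pd

# Made-up sample; on the real data use birdstrike_data
sample = pd.DataFrame({
    "wildlife_size": ["Small", "Small", "Small", "Medium", "Medium", "Large", "Large"],
    "altitude": [0, 50, 100, 0, 3000, 200, 5000],
})

# 25th, 50th and 75th percentile of altitude for each wildlife size
summary = (
    sample.groupby("wildlife_size")["altitude"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)
```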

In [129]:
# Combine the four plots into a two-column layout
combined_plot = (birdstrike_time_lineplot + altitude_numberstruck_scatter + 
                 month_year_strikes_heatmap + altitude_wildlifesize_boxplot).cols(2)
combined_plot

In [156]:
# Import panel for dashboard
import panel as pn

In [157]:
# Create a dashboard
dashboard = pn.Column(
    pn.Row(birdstrike_time_lineplot, altitude_numberstruck_scatter),
    pn.Row(month_year_strikes_heatmap, altitude_wildlifesize_boxplot)
)

## Main Results

In [159]:
dashboard.show()


Overall, bird strikes generally increased over the years. Most strikes happen at lower altitudes, aside from some outliers struck at high altitudes. An interesting finding was that the largest number of bird strikes occurs from July to October, possibly because of weather and migration patterns. The number of birds struck spiked noticeably in August of both 2009 and 2010. Finally, small birds are mostly struck at very low altitudes, while medium and large birds are struck across a wider range, from low to fairly high altitudes.
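The seasonal claims above (most strikes from July to October, with August the busiest month) reduce to counting strikes per calendar month. A sketch using a hypothetical helper `seasonal_summary`, shown on made-up dates rather than the real results:

```python
import pandas as pd


def seasonal_summary(dates: pd.Series, months=(7, 8, 9, 10)) -> tuple:
    """Return the busiest calendar month and the share of strikes falling in `months`."""
    by_month = dates.dt.month.value_counts()
    busiest = int(by_month.idxmax())
    share = dates.dt.month.isin(months).mean()
    return busiest, share


# Made-up sample; on the real data pass birdstrike_data['flight_date']
dates = pd.to_datetime(pd.Series(["2009-08-02", "2009-08-19", "2010-08-07", "2010-09-11", "2010-03-30"]))
busiest, share = seasonal_summary(dates)
print(f"busiest month: {busiest}, July-October share: {share:.0%}")
# busiest month: 8, July-October share: 80%
```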

## Conclusion

Overall, this dataset produced several notable findings. Most birds are struck at lower altitudes, though there are outliers, particularly medium-sized birds, which are struck at high altitudes more often. Strikes peak from July to October, with August the month with the most strikes. Migration and other factors may help explain these patterns. Going forward, we need to find more ways to protect bird life and reduce the number of strikes, since a large number of birds are killed annually.