# Wildlife Strike Analysis

*This notebook should be used to record all of your analysis.*

## Imports

In [1]:
import pandas as pd

from redshift_connector import connect

import altair as alt

from scipy.stats import zscore

### Making a Connection

In [2]:
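import os


def get_db_connection():
    """Open a connection to the Redshift data warehouse."""
    # Credentials are read from environment variables rather than hard-coded;
    # the variable names DB_USER and DB_PASSWORD are assumptions.
    return connect(host="c17-redshift-cluster.cdq12ms5gjyk.eu-west-2.redshift.amazonaws.com",
                   database="dw_air_travel",
                   user=os.environ["DB_USER"],
                   password=os.environ["DB_PASSWORD"],
                   port=5439)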


conn = get_db_connection()
curs = conn.cursor()

**How significant a problem are wildlife strikes?**


1) Select the columns from the `wildlife_strike` table needed to answer this question and load them into a pandas DataFrame

In [73]:
query = """SELECT incident_date, cost_repairs, indicated_damage,
nr_injuries, nr_fatalities
FROM wildlife_strike"""
curs.execute(query)
strikes_significance = curs.fetch_dataframe()

2) Clean the data

In [82]:
strikes_significance['incident_date'] = pd.to_datetime(strikes_significance['incident_date']).dt.year

In [84]:
# Keep only rows from 2010 onwards (2010 itself is included)
strikes_significance = strikes_significance[strikes_significance['incident_date'] >= 2010]
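
The year extraction and filter above can be checked on a small frame. This is a minimal sketch; the dates in `toy` are invented for illustration and are not taken from the `wildlife_strike` table.

```python
import pandas as pd

# Hypothetical sample dates, one before the 2010 cut-off and two after it.
toy = pd.DataFrame({"incident_date": ["2009-12-31", "2010-01-01", "2024-06-15"]})

# Reduce each date to its year, as the notebook does.
toy["incident_date"] = pd.to_datetime(toy["incident_date"]).dt.year

# Keep 2010 and later; the comparison is inclusive.
recent = toy[toy["incident_date"] >= 2010]
print(recent["incident_date"].tolist())
```

Note that after this step `incident_date` holds a year, not a full date, so any later month or season analysis would need the original column.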

In [76]:
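def convert_str_to_ints(df, column_name):
    """Replace blank strings in a column with 0 and cast the column to int."""
    df = df.copy()  # work on a copy so a filtered slice is not modified in place
    df[column_name] = df[column_name].replace('', 0)
    df[column_name] = pd.to_numeric(df[column_name]).astype(int)
    return df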

In [None]:
strikes_significance = convert_str_to_ints(strikes_significance, 'nr_injuries')

strikes_significance = convert_str_to_ints(strikes_significance, 'nr_fatalities')

strikes_significance = convert_str_to_ints(strikes_significance, 'cost_repairs')

strikes_significance = convert_str_to_ints(strikes_significance, 'indicated_damage')
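
The blank-to-zero conversion can be sanity-checked on a toy column. This is a sketch only: `blanks_to_int` is a hypothetical stand-in that mirrors `convert_str_to_ints`, and the sample values are invented. It assumes that an empty string means "none recorded", which treats missing reports as zero.

```python
import pandas as pd

# Hypothetical sample: counts stored as strings, with '' where nothing was recorded.
sample = pd.DataFrame({"nr_injuries": ["", "2", "0", ""]})


def blanks_to_int(df, column_name):
    """Return a copy of df with blanks in column_name set to 0 and the column cast to int."""
    out = df.copy()
    out[column_name] = pd.to_numeric(out[column_name].replace("", 0)).astype(int)
    return out


cleaned = blanks_to_int(sample, "nr_injuries")
print(cleaned["nr_injuries"].tolist())
```

Because blanks become zeros, sums are unaffected but averages would be pulled down; that is worth keeping in mind when interpreting the charts.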

In [85]:
strikes_significance.head()

| | incident_date | cost_repairs | indicated_damage | nr_injuries | nr_fatalities |
|---|---|---|---|---|---|
| 87761 | 2010 | 0 | 0 | 0 | 0 |
| 87762 | 2010 | 0 | 0 | 0 | 0 |
| 87763 | 2010 | 0 | 0 | 0 | 0 |
| 87764 | 2010 | 0 | 0 | 0 | 0 |
| 87765 | 2010 | 0 | 0 | 0 | 0 |


3) Make the Visualisations

_Annual trend chart_

In [93]:
annual_strikes = strikes_significance.groupby('incident_date').size().reset_index(name='Total Incidents')
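
As a quick illustration of what the `groupby(...).size()` step produces, here is a minimal sketch on invented data; the years in `toy` are hypothetical.

```python
import pandas as pd

# Hypothetical sample: one row per strike, labelled by year.
toy = pd.DataFrame({"incident_date": [2010, 2010, 2011, 2012, 2012, 2012]})

# size() counts rows per group; reset_index turns the result into a two-column frame.
annual = toy.groupby("incident_date").size().reset_index(name="Total Incidents")
print(annual)
```

Each row of the result is one year with its number of recorded strikes, which is the shape the line chart expects.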

In [98]:
yearly_trends = alt.Chart(annual_strikes, title='Yearly Trend of Wildlife Strikes (2010-2024)').mark_line().encode(
    # Treat the year as an ordinal value so the axis shows 2010 rather than 2,010
    x=alt.X('incident_date:O', title='Year of Incident'),
    y='Total Incidents',
    tooltip=['incident_date', 'Total Incidents']
)
yearly_trends

_Damage, injury and cost of repairs charts_

In [113]:
strikes_significance['indicated_damage'].value_counts()
damages = pd.DataFrame(columns=['Damage Reported', 'Count'], data=[['Yes', 9303], ['No', 193929]])

In [138]:
strikes_significance[strikes_significance['incident_date'] == 2024]['indicated_damage'].value_counts()
damages_2024 = pd.DataFrame(columns=['Damage Reported', 'Count'],
                            data=[['Yes', 347], ['No', 7310]])
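
The two damage tables could also be built directly from `value_counts()` so the totals update if the data changes. This is a sketch under an assumption: that `indicated_damage` is 1 where damage was reported and 0 otherwise after cleaning. `damage_counts` is a hypothetical helper and the sample data is invented.

```python
import pandas as pd


def damage_counts(df, column="indicated_damage"):
    """Build a Yes/No count table, assuming 1 means damage reported and 0 means none."""
    counts = df[column].value_counts()
    return pd.DataFrame({
        "Damage Reported": ["Yes", "No"],
        # .get returns 0 when a category is absent from the data
        "Count": [int(counts.get(1, 0)), int(counts.get(0, 0))],
    })


# Hypothetical sample: two strikes with damage, three without.
toy = pd.DataFrame({"indicated_damage": [0, 0, 1, 0, 1]})
damages_toy = damage_counts(toy)
print(damages_toy)
```

The same helper could be applied to a single year by filtering the frame first, for example to rows where `incident_date` equals 2024.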

In [129]:
damages_chart = alt.Chart(damages, title = 'Proportion of Reported Damage in Wildlife Strikes').mark_arc().encode(
    theta='Count',
    color= alt.Color('Damage Reported').scale(scheme='bluegreen'),
    tooltip=['Count', 'Damage Reported'],
)
damages_chart

In [141]:
damages_2024_chart = alt.Chart(damages_2024, title='Proportion of Reported Damage in Wildlife Strikes 2024').mark_arc().encode(
    theta='Count',
    color=alt.Color('Damage Reported').scale(scheme='bluegreen'),
    tooltip=['Count', 'Damage Reported'],
)
damages_2024_chart

In [149]:
injuries = strikes_significance.groupby('incident_date')['nr_injuries'].sum().reset_index()
fatalities = strikes_significance.groupby('incident_date')['nr_fatalities'].sum().reset_index()

In [154]:
# Join the yearly totals on the shared year column
injury_fatalities = injuries.merge(fatalities, on='incident_date')
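
The merged frame is in wide form, with one column per outcome. Altair generally works best with long-form data, so one possible next step is to reshape it with `melt`. This is a sketch on invented totals; the column names `Outcome` and `People` are hypothetical choices, not part of the original analysis.

```python
import pandas as pd

# Hypothetical yearly totals standing in for the injuries and fatalities frames.
injuries_toy = pd.DataFrame({"incident_date": [2010, 2011], "nr_injuries": [3, 1]})
fatalities_toy = pd.DataFrame({"incident_date": [2010, 2011], "nr_fatalities": [0, 2]})

# Wide form: one row per year, one column per outcome.
merged = injuries_toy.merge(fatalities_toy, on="incident_date")

# Long form: one row per (year, outcome) pair.
long_form = merged.melt(id_vars="incident_date", var_name="Outcome", value_name="People")
print(long_form)
```

A frame shaped like `long_form` could then be passed to a chart with `Outcome` mapped to colour, giving one line or bar series per outcome.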


In [158]:
cost_repairs_by_year = strikes_significance.groupby('incident_date')['cost_repairs'].sum().reset_index()

In [None]:
cost_repairs_chart = alt.Chart(cost_repairs_by_year, title='Total Cost of Repairs Per Year').mark_bar().encode(
    x=alt.X('cost_repairs:Q', title='Cost of Repairs'),
    y=alt.Y('incident_date:O', title='Year'),
    tooltip=['cost_repairs', 'incident_date'],
    color=alt.Color('cost_repairs:Q')
)
cost_repairs_chart


**Are strikes by particular animals more likely/dangerous than others?**



**When and in what conditions are strikes most likely?**


**Which airlines/airports/states would be likely potential customers for any of this technology?**
