# Dataset Description Questions
* Provide a brief description of the dataset.
* Provide a description of the collection process executed to build the dataset. Discuss the implications of the data collection method on the generated conclusions and insights. Note that you may need to look at relevant sources related to the dataset to be able to provide the necessary information for this part of the project.
* Describe the structure of the dataset file. In the dataset file, what does each row and column represent? How many observations are there in the dataset? How many variables are there in the dataset? If the dataset is composed of different files that you will combine in the succeeding steps, describe the structure and the contents of each file.

* Discuss the variables in each dataset file. What does each variable represent? In this section, all variables, even those which are not used for the study, should be described to the reader. The purpose of each variable in the dataset should be clear to the reader of the notebook without having to go through an external link.


# Dataset Description
This dataset covers aviation-related incidents from 1908 to 2009. It contains information on over 5,000 airplane crashes around the world, including the date and time of each incident, where it happened, how many people were involved, and the type of aircraft and its operator. Analyzing such data is important for the advancement of aviation safety.

The dataset was obtained from the Data Society. ## Add methodology

### Rows:
- Each row represents an individual record of an incident involving an aircraft between 1908 and 2009.

### Columns:
- **index**: index of the row
- **Date**: date of the incident
- **Time**: time of the incident
- **Location**: location of the incident
- **Operator**: operator of the aircraft
- **Flight #**: flight number of the aircraft
- **Route**: route of the aircraft
- **Type**: type of aircraft
- **Registration**: registration of the aircraft
- **cn/In**: construction number/serial number of the aircraft
- **Aboard**: number of people on board the aircraft
- **Fatalities**: number of fatalities in the incident
- **Ground**: number of people on the ground who were killed
- **Summary**: summary of the incident
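
The number of observations, the number of variables, and the amount of missing data can be read directly from the DataFrame. The following is a minimal sketch, assuming a small hand-built `sample_df` (a hypothetical stand-in that copies a few columns from the first five rows of the dataset); in the notebook, the same calls would be applied to the full DataFrame loaded from the CSV.

```python
import pandas as pd

# Hypothetical five-row sample mirroring a few columns of the dataset.
# In the notebook, use the full DataFrame loaded from the CSV instead.
sample_df = pd.DataFrame({
    "Date": ["09/17/1908", "07/12/1912", "08/06/1913", "09/09/1913", "10/17/1913"],
    "Time": ["17:18", "06:30", None, "18:30", "10:30"],
    "Operator": ["Military - U.S. Army", "Military - U.S. Navy", "Private",
                 "Military - German Navy", "Military - German Navy"],
    "Aboard": [2.0, 5.0, 1.0, 20.0, 30.0],
    "Fatalities": [1.0, 5.0, 1.0, 14.0, 30.0],
    "Ground": [0.0, 0.0, 0.0, 0.0, 0.0],
})

# shape is (number of observations, number of variables)
n_rows, n_cols = sample_df.shape

# Count of missing values in each column
missing_per_column = sample_df.isna().sum()

# Data type pandas inferred for each column
column_types = sample_df.dtypes

print(f"{n_rows} observations, {n_cols} variables")
print(missing_per_column)
print(column_types)
```

Reporting the missing-value counts next to the column descriptions makes it clear which variables are sparsely populated before any analysis relies on them.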

# Exploratory Data Analysis
* Identify three (3) exploratory data analysis questions. Properly state the questions in the notebook.
* Answer the EDA questions using both:
    * Numerical Summaries – measures of central tendency, measures of dispersion, and correlation
    * Visualization – Appropriate visualization should be used. Each visualization should be accompanied by a brief explanation.

To emphasize, both a numerical summary and a visualization should be presented for each question. The whole process should be supported with verbose textual descriptions of your procedures and findings.
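
The numerical summaries named above can be sketched as follows. This is an illustrative example only, assuming a hypothetical `sample_df` holding the `Aboard` and `Fatalities` values of the first five rows of the dataset; the actual EDA questions would use the full DataFrame.

```python
import pandas as pd

# Hypothetical sample: Aboard and Fatalities from the first five rows.
sample_df = pd.DataFrame({
    "Aboard": [2.0, 5.0, 1.0, 20.0, 30.0],
    "Fatalities": [1.0, 5.0, 1.0, 14.0, 30.0],
})

# Measures of central tendency
central = sample_df.agg(["mean", "median"])

# Measures of dispersion (std is the sample standard deviation, ddof=1)
dispersion = sample_df.agg(["std", "min", "max"])

# Pearson correlation between the two variables
correlation = sample_df["Aboard"].corr(sample_df["Fatalities"])

print(central)
print(dispersion)
print(f"Pearson correlation: {correlation:.3f}")
```

Because the mean is sensitive to a few very large crashes, reporting the median alongside it gives a fairer picture of a typical incident.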

In [2]:
# Import libraries
import numpy as np  # linear algebra
import pandas as pd  # data processing
import matplotlib.pyplot as plt  # plotting
from scipy.stats import chi2_contingency  # chi-square test of independence

In [3]:
airplane_df = pd.read_csv('Airplane_Crashes_and_Fatalities_Since_1908.csv')
airplane_df.head()

Unnamed: 0,index,Date,Time,Location,Operator,Flight #,Route,Type,Registration,cn/In,Aboard,Fatalities,Ground,Summary
0,0,09/17/1908,17:18,"Fort Myer, Virginia",Military - U.S. Army,,Demonstration,Wright Flyer III,,1.0,2.0,1.0,0.0,"During a demonstration flight, a U.S. Army fly..."
1,1,07/12/1912,06:30,"AtlantiCity, New Jersey",Military - U.S. Navy,,Test flight,Dirigible,,,5.0,5.0,0.0,First U.S. dirigible Akron exploded just offsh...
2,2,08/06/1913,,"Victoria, British Columbia, Canada",Private,-,,Curtiss seaplane,,,1.0,1.0,0.0,The first fatal airplane accident in Canada oc...
3,3,09/09/1913,18:30,Over the North Sea,Military - German Navy,,,Zeppelin L-1 (airship),,,20.0,14.0,0.0,The airship flew into a thunderstorm and encou...
4,4,10/17/1913,10:30,"Near Johannisthal, Germany",Military - German Navy,,,Zeppelin L-2 (airship),,,30.0,30.0,0.0,Hydrogen gas which was being vented was sucked...
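
The `Date` and `Time` columns are read as plain strings, so time-based questions need them parsed first. The following is a minimal sketch, assuming the `MM/DD/YYYY` and `HH:MM` formats seen in the rows above hold for the whole file; `sample_df`, `Year`, and `Hour` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical sample: Date and Time from the first five rows shown above.
sample_df = pd.DataFrame({
    "Date": ["09/17/1908", "07/12/1912", "08/06/1913", "09/09/1913", "10/17/1913"],
    "Time": ["17:18", "06:30", None, "18:30", "10:30"],
})

# Parse the date strings (month/day/year) into datetime values
sample_df["Date"] = pd.to_datetime(sample_df["Date"], format="%m/%d/%Y")
sample_df["Year"] = sample_df["Date"].dt.year

# Parse the time strings; missing or malformed values become NaT,
# so the extracted hour is NaN for those rows
parsed_time = pd.to_datetime(sample_df["Time"], format="%H:%M", errors="coerce")
sample_df["Hour"] = parsed_time.dt.hour

# Number of recorded incidents per year, in chronological order
crashes_per_year = sample_df["Year"].value_counts().sort_index()
print(crashes_per_year)
```

Using `errors="coerce"` for `Time` is a deliberate choice here: the column has missing entries, and coercing keeps those rows instead of raising an error.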


In [4]:
airplane_df['Route'].value_counts()

Training                      81
Sightseeing                   29
Test flight                   17
Test                           6
Sao Paulo - Rio de Janeiro     5
                              ..
Manila - Lapu Lapu             1
Saint Denis - Paris            1
Cork - London                  1
Peoria, IL - St. Louis, MO     1
Mechuka for Jorhat             1
Name: Route, Length: 3244, dtype: int64

In [5]:
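# Work on a copy so the raw data stays untouched
airplane2_df = airplane_df.copy()

# Merge the "Test" label into "Test flight" so both are counted together
airplane2_df['Route'] = airplane2_df['Route'].replace({"Test": "Test flight"})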

In [6]:
airplane2_df['Route'].value_counts()

Training                      81
Sightseeing                   29
Test flight                   23
Sao Paulo - Rio de Janeiro     5
Bogota - Barranquilla          4
                              ..
Manila - Lapu Lapu             1
Saint Denis - Paris            1
Cork - London                  1
Peoria, IL - St. Louis, MO     1
Mechuka for Jorhat             1
Name: Route, Length: 3243, dtype: int64

In [7]:
airplane2_df['Route'].unique()

array(['Demonstration', 'Test flight', nan, ..., 'Rio de Janeiro - Paris',
       'Lourdes de BlanSablon - Port Hope Simpson', 'Mechuka for Jorhat'],
      dtype=object)
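
The same cleanup can be generalized to more than one variant label. The sketch below is illustrative only: `routes` is a hypothetical sample Series and `ROUTE_ALIASES` is a hypothetical mapping that would need to be extended after inspecting the real `Route` values.

```python
import pandas as pd

# Hypothetical sample of Route values, including a padded variant and a missing value
routes = pd.Series(
    ["Training", "Test", "Test flight", " Test flight ", None, "Cork - London"]
)

# Hypothetical mapping from variant labels to one canonical label
ROUTE_ALIASES = {"Test": "Test flight"}

# Strip surrounding whitespace, then map whole-cell matches to the canonical label
cleaned = routes.str.strip().replace(ROUTE_ALIASES)

counts = cleaned.value_counts()
print(counts)
```

`Series.replace` with a dictionary matches entire cell values, so an existing `"Test flight"` entry is left as it is rather than being partially rewritten.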

In [10]:
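# Drop rows with a missing aircraft type. Filter airplane2_df (not airplane_df)
# so the Route cleanup from In [5] is preserved.
airplane2_df = airplane2_df[airplane2_df['Type'].notna()]

# airplane2_df.plot.bar(y="Fatalities", x="Type", figsize=(50,50))

# Drop rows with a missing operator, starting from the cleaned frame above,
# so airplane3_df has both a known Type and a known Operator.
airplane3_df = airplane2_df[airplane2_df['Operator'].notna()]

airplane3_df.head()

# sortedList = sorted(airplane3_df['Operator'].unique())
# print(sortedList)

# ndarray.sort() sorts in place and returns None, so np.sort() is used instead:
# print(np.sort(airplane3_df['Operator'].unique()))

# airplane2_df.duplicated()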



Unnamed: 0,index,Date,Time,Location,Operator,Flight #,Route,Type,Registration,cn/In,Aboard,Fatalities,Ground,Summary
0,0,09/17/1908,17:18,"Fort Myer, Virginia",Military - U.S. Army,,Demonstration,Wright Flyer III,,1.0,2.0,1.0,0.0,"During a demonstration flight, a U.S. Army fly..."
1,1,07/12/1912,06:30,"AtlantiCity, New Jersey",Military - U.S. Navy,,Test flight,Dirigible,,,5.0,5.0,0.0,First U.S. dirigible Akron exploded just offsh...
2,2,08/06/1913,,"Victoria, British Columbia, Canada",Private,-,,Curtiss seaplane,,,1.0,1.0,0.0,The first fatal airplane accident in Canada oc...
3,3,09/09/1913,18:30,Over the North Sea,Military - German Navy,,,Zeppelin L-1 (airship),,,20.0,14.0,0.0,The airship flew into a thunderstorm and encou...
4,4,10/17/1913,10:30,"Near Johannisthal, Germany",Military - German Navy,,,Zeppelin L-2 (airship),,,30.0,30.0,0.0,Hydrogen gas which was being vented was sucked...
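
Plotting one bar per row, as in the commented-out `plot.bar` call, would be unreadable with thousands of incidents. One possible alternative is to aggregate fatalities per aircraft type first and keep only the largest groups. The sketch below assumes a hypothetical `sample_df` built from the five rows above, plus one invented row with a missing `Type` to show the filtering.

```python
import pandas as pd

# Hypothetical sample: the five rows above plus one invented row with no Type
sample_df = pd.DataFrame({
    "Type": ["Wright Flyer III", "Dirigible", "Curtiss seaplane",
             "Zeppelin L-1 (airship)", "Zeppelin L-2 (airship)", None],
    "Fatalities": [1.0, 5.0, 1.0, 14.0, 30.0, 3.0],
})

# Total fatalities per aircraft type, keeping the three largest totals
fatalities_by_type = (
    sample_df.dropna(subset=["Type"])   # ignore rows with an unknown type
    .groupby("Type")["Fatalities"]
    .sum()
    .nlargest(3)
)
print(fatalities_by_type)
```

The resulting Series is small enough to plot directly, for example with `fatalities_by_type.plot.barh()`.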


# Research Question
Come up with one (1) research question to answer using the dataset. Here are some requirements:

* The research question should arise from the exploratory data analysis. There should be an explanation regarding the connection of the research question to the answers obtained from performing exploratory data analysis.
* The research question should be within the scope of the dataset.
* The research question should be answerable by either performing data mining techniques or any domain-specific data modelling technique (i.e., techniques in modelling text, timeseries, graph, or image data) taught in class.
* Make sure to indicate the importance and significance of the research question.