# Exploratory Data Analysis in Action - EDA: Airplanes



In this section we explore the [_Aerial Bombing Data Set_](https://www.kaggle.com/usaf/world-war-ii) and apply techniques referred to as __Exploratory Data Analysis__.

**Import statements**



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

**Global settings**

In [None]:
pd.options.display.max_rows = 999
pd.options.display.max_columns = 100
#pd.set_option('display.max_colwidth', -1)
plt.rcParams["figure.figsize"] = [15,6]

**Load data set**

In [None]:
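import pickle

# load each pickled object inside a context manager so the file handle is closed
with open("../data/gdf_europe.p", "rb") as f:
    gdf_europe = pickle.load(f)
with open("../data/europe.p", "rb") as f:
    europe = pickle.load(f)
with open("../data/germany.p", "rb") as f:
    germany = pickle.load(f)
with open("../data/gdf_germany.p", "rb") as f:
    gdf_germany = pickle.load(f)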

## Research questions 

__@Airplanes__
- Q1: Which airplane type was engaged most often?
- Q2: At what height do airplanes operate? At what height do the 10 most common airplane types operate?
- Q3: Which type of airplane carried the heaviest bombs? Which were the 10 most dangerous airplane types with respect to carried explosives?
- Q4: Which Allied Force uses which airplane when and where?

In [None]:
df_airpl = gdf_europe.copy()

In [None]:
df_airpl.columns

> **Q1: Which airplane type was engaged most often?**

In [None]:
## your code here

In [None]:
print("Unique airplanes:\n")
df_airpl["Aircraft Series"].unique()

In [None]:
print("Most engaged airplanes:\n")
df_airpl["Aircraft Series"].value_counts()

In [None]:
df_airpl["Aircraft Series"].value_counts().plot.bar(rot=0);
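
Raw counts answer Q1, but relative shares are often easier to compare, and a bar chart of every series can get crowded. The following is a minimal sketch on toy data, not the notebook's data set: the values in `series` and the cutoff `top_n` are assumptions made purely for illustration.

```python
import pandas as pd

# Toy stand-in for df_airpl["Aircraft Series"]; the values are illustrative only.
series = pd.Series(
    ["B17", "B24", "B17", "B25", "B17", "B24", "A20", "B26"],
    name="Aircraft Series",
)

# Relative frequency of each aircraft series (the shares sum to 1).
shares = series.value_counts(normalize=True)

# Keep only the most frequent series so a bar chart stays readable.
top_n = 3  # hypothetical cutoff
top_shares = shares.head(top_n)
print(top_shares)
```

Applied to the real column, `top_shares.plot.bar(rot=0)` would then draw only the most frequent series.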

> **Q2: At what height do airplanes operate? At what height do the 10 most common airplane types operate?**

In [None]:
## your code here

In [None]:
# get operating height
print("Operating height for each aircraft:\n")
df_airpl.groupby("Aircraft Series")["Altitude (meters)"].agg(["mean", "min", "max"]).dropna()

In [None]:
# compute 10 most common airplane types
list_ten_most_common = df_airpl["Aircraft Series"].value_counts()[:10].index
print("10 most common airplane types:\n")
list(list_ten_most_common)


In [None]:
ten_most_common = df_airpl.loc[df_airpl["Aircraft Series"].isin(list_ten_most_common)]
print(ten_most_common.shape)

In [None]:
fig, ax = plt.subplots(3,1, figsize=(16,18))

df_airpl.groupby("Aircraft Series")["Altitude (meters)"].mean().dropna().sort_values(ascending=False).plot.bar(rot=0, ax=ax[0])
ax[0].set_ylabel("Mean altitude (meters)")  # label the bar chart's own axes rather than the current (last) axes
sns.boxplot(x="Aircraft Series", y="Altitude (meters)", data=ten_most_common, ax=ax[1])
sns.violinplot(x="Aircraft Series", y="Altitude (meters)", hue="Country", split=True, data=ten_most_common, ax=ax[2]);
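
Box plots are easier to read when the categories are sorted by the quantity being compared. The sketch below, on toy data with made-up altitudes, shows one way to derive such an ordering from the median altitude per aircraft series. The frame `toy` and the name `plot_order` are assumptions introduced for illustration.

```python
import pandas as pd

# Toy stand-in for ten_most_common; the altitudes are made up for illustration.
toy = pd.DataFrame({
    "Aircraft Series": ["B17", "B17", "B24", "B24", "A20", "A20"],
    "Altitude (meters)": [7000.0, 8000.0, 6000.0, 6500.0, 3000.0, None],
})

# Median altitude per series (missing values are skipped), highest first.
median_altitude = (
    toy.groupby("Aircraft Series")["Altitude (meters)"]
    .median()
    .dropna()
    .sort_values(ascending=False)
)

# Category order that could be handed to a plotting call.
plot_order = list(median_altitude.index)
print(plot_order)
```

With seaborn, such a list can be passed through the `order` argument, for example `sns.boxplot(..., order=plot_order)`.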

> **Q3: Which type of airplane carried the heaviest bombs? Which were the 10 most dangerous airplane types with respect to carried explosives?**

In [None]:
## your code here

In [None]:
fig, ax = plt.subplots(2,1, figsize=(16,12))
(df_airpl.groupby('Aircraft Series')['High Explosives Weight (Tons)'].
 max().
 dropna().
 sort_values(ascending=False).
 plot.bar(ax=ax[0]))
ax[0].set_title("Aircraft carrying the heaviest high-explosive weights")

# compute the ten aircraft series with the heaviest single loads
list_ten_dangerous = (df_airpl.groupby('Aircraft Series')['High Explosives Weight (Tons)'].
                      max().sort_values(ascending=False).
                      dropna()[:10].index)
ten_dangerous = df_airpl.loc[df_airpl["Aircraft Series"].isin(list_ten_dangerous)]

sns.boxplot(x="Aircraft Series", y="High Explosives Weight (Tons)", data=ten_dangerous, ax=ax[1]);
plt.tight_layout()
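
The ranking above uses the maximum load recorded for each series. "Most dangerous" could also be read as total tonnage dropped, and the two readings can produce different rankings. The sketch below compares them on toy data. The weights and the series label `LANC` are hypothetical, and this is only one possible interpretation of the question.

```python
import pandas as pd

col = "High Explosives Weight (Tons)"

# Toy stand-in for df_airpl; the weights and the "LANC" label are made up.
toy = pd.DataFrame({
    "Aircraft Series": ["B17", "B17", "B17", "LANC", "LANC", "A20"],
    col: [10.0, 12.0, 11.0, 20.0, None, 2.0],
})

# Heaviest single load, total tonnage, and number of non-missing records per series.
summary = toy.groupby("Aircraft Series")[col].agg(["max", "sum", "count"])

# Two candidate rankings: by heaviest single load and by total tonnage.
by_max = summary.sort_values("max", ascending=False)
by_total = summary.sort_values("sum", ascending=False)
print(by_max)
print(by_total)
```

In this toy example the series with the heaviest single load is not the one with the largest total, which is why it helps to state which reading a plot uses.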

> **Q4: Which Allied Force uses which airplane when and where?**   
_This is a broad question. We suggest writing a function (or script) that plots, for each year, the Allied attacks over Europe for any specified airplane type._

In [None]:
df_airpl['Aircraft Series'].unique()
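
One way to start on Q4 is to separate data preparation from plotting: first select the rows for one aircraft series, then split them by year. The sketch below does only this first step on toy data. The helper name `attacks_per_year` and the date column name `"Mission Date"` are assumptions (check `df_airpl.columns` for the actual name), and this is not the reference solution mentioned in this notebook.

```python
import pandas as pd

def attacks_per_year(df, airplane, date_col="Mission Date",
                     series_col="Aircraft Series"):
    """Return a dict mapping each year to the rows for one aircraft series.

    date_col is an assumed column name; adjust it to the data at hand.
    """
    subset = df.loc[df[series_col] == airplane].copy()
    # Derive the calendar year from the (assumed) date column.
    subset["year"] = pd.to_datetime(subset[date_col]).dt.year
    return {int(year): group for year, group in subset.groupby("year")}

# Toy data, for illustration only.
toy = pd.DataFrame({
    "Aircraft Series": ["B17", "B17", "B24", "B17"],
    "Mission Date": ["1943-08-17", "1944-02-20", "1944-03-06", "1944-06-06"],
})

per_year = attacks_per_year(toy, "B17")
for year, rows in sorted(per_year.items()):
    print(year, len(rows))
```

Each yearly group could then be drawn on its own subplot, for instance on top of the `europe` base map loaded earlier.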

In [None]:
## your code here

_Note: If you get stuck, take a look at our implementation of this problem. Uncomment and run the cell below, then apply the_ `plot_airplane_type_over_europe` _function._

In [None]:
# %load ../src/_solutions/plot_airplane_type_over_europe.py

In [None]:
#plot_airplane_type_over_europe(df_airpl, airplane="B17", kdp=False);

***