# Exploratory Data Analysis in Action - EDA: Targets

In this section we explore the [_Aerial Bombing Data Set_](https://www.kaggle.com/usaf/world-war-ii) and apply techniques known as __Exploratory Data Analysis__ (EDA).

**Import statements**



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

**Global settings**

In [None]:
pd.options.display.max_rows = 999
pd.options.display.max_columns = 100
plt.rcParams["figure.figsize"] = [15,6]

**Load data set**

In [None]:
import pickle
# "with" closes each file once the object has been read
with open("../data/gdf_europe.p", "rb") as f:
    gdf_europe = pickle.load(f)
with open("../data/europe.p", "rb") as f:
    europe = pickle.load(f)

## Research questions 

__@Targets__
- Q1: Which cities were the 15 most frequent targets?
- Q2: How many tons of high explosives were dropped on the 15 most frequent targets?
- Q3: How did the aerial attacks change over time for the 15 most frequently targeted cities?


In [None]:
df_tar = gdf_europe.copy()

In [None]:
df_tar.columns
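
The three questions below rely on only three columns: `Target City`, `High Explosives Weight (Tons)` and `Mission Date`. A minimal sanity check, sketched here on a small made-up frame rather than the real data, can confirm these columns exist and that `Mission Date` is a datetime column before any resampling is attempted. The helper name `check_target_columns` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Columns the Q1-Q3 cells depend on (names taken from this notebook)
REQUIRED_COLUMNS = ["Target City", "High Explosives Weight (Tons)", "Mission Date"]


def check_target_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in df.columns]
    # resample() in Q3 needs a DatetimeIndex, so the date column must be datetime64
    if "Mission Date" in df.columns and not pd.api.types.is_datetime64_any_dtype(df["Mission Date"]):
        problems.append("'Mission Date' is not a datetime column")
    return problems


# Made-up example rows, not taken from the real data set
toy = pd.DataFrame({
    "Target City": ["BERLIN", "HAMBURG"],
    "High Explosives Weight (Tons)": [12.5, 8.0],
    "Mission Date": ["1944-03-06", "1944-03-07"],
})
print(check_target_columns(toy))  # the date column is still a string here
toy["Mission Date"] = pd.to_datetime(toy["Mission Date"])
print(check_target_columns(toy))
```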

> **Q1: Which cities were the 15 most frequent targets?**

In [None]:
## your code here ...

In [None]:
print("Number of unique cities in the data set:\n", df_tar['Target City'].nunique())
print("---------------------------------------")
# value_counts() already sorts in descending order, so head(15) gives the top 15
most_frequent_cities = df_tar['Target City'].value_counts().head(15)
print("Most frequent cities:\n", most_frequent_cities)
print("---------------------------------------")
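
Because `value_counts()` returns counts in descending order, `head(n)` and `nlargest(n)` yield the same top-n ranking, and `nunique()` gives the number of distinct targets. The toy series below uses made-up city labels (not the real data) with distinct counts, so that tie ordering does not come into play.

```python
import pandas as pd

# Made-up target list: each entry stands for one recorded mission against that city
targets = pd.Series(["A", "B", "A", "C", "A", "B", "D", "B", "A", "C"])

counts = targets.value_counts()        # already sorted in descending order
top3_head = counts.head(3)             # first three rows of the sorted counts
top3_nlargest = counts.nlargest(3)     # explicit top-n selection

print(targets.nunique())  # number of distinct target cities
print(top3_head)
```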

> **Q2: How many tons of high explosives were dropped on the 15 most frequent targets?**

In [None]:
## your code here ...

In [None]:
list_most_frequent_cities = most_frequent_cities.index
df_cities = df_tar.loc[df_tar["Target City"].isin(list_most_frequent_cities)]
print("Summed high explosives (in tons) per city:\n") 
df_cities.groupby("Target City")["High Explosives Weight (Tons)"].sum().sort_values(ascending=False)
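
The pattern used for Q2, restricting rows with `isin()` and then summing one column per group, can be checked on a few made-up rows where the totals are easy to verify by hand. The column names mirror the notebook; the values and the `top_cities` list are invented.

```python
import pandas as pd

# Invented mission records; the column names follow the notebook's data set
missions = pd.DataFrame({
    "Target City": ["A", "B", "A", "C", "B", "A"],
    "High Explosives Weight (Tons)": [10.0, 5.0, 2.5, 7.0, 1.0, 0.5],
})
top_cities = ["A", "B"]  # stands in for most_frequent_cities.index

# Keep only rows whose target is one of the selected cities
subset = missions.loc[missions["Target City"].isin(top_cities)]

# Total tonnage per city, largest first
tons_per_city = (
    subset.groupby("Target City")["High Explosives Weight (Tons)"]
    .sum()
    .sort_values(ascending=False)
)
print(tons_per_city)
```

Rows with a missing tonnage value contribute nothing to these totals: `sum()` skips NaN by default, so a city whose missions all lack a tonnage entry would show 0 rather than NaN.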

In [None]:
# plot
df_cities.groupby("Target City")["High Explosives Weight (Tons)"].sum().sort_values(ascending=False).plot.bar()
plt.ylabel("High Explosives in tons", size=12);

> **Q3: How did the aerial attacks change over time for the 15 most frequently targeted cities?**

In [None]:
## your code here ...

In [None]:
# Common daily date range spanning all attacks on the selected cities
daily_index = pd.date_range(start=df_cities["Mission Date"].min(), end=df_cities["Mission Date"].max(), freq="D")
# Index by date without modifying df_cities in place, so the cell can be re-run
df_cities_ts = df_cities.set_index("Mission Date")

n_cities = len(list_most_frequent_cities)
fig, ax = plt.subplots(n_cities, 1, sharey=True, figsize=(10, 32))
for e, city in enumerate(list_most_frequent_cities):
    # Daily tonnage for this city; days without an attack count as 0
    s = df_cities_ts.loc[df_cities_ts["Target City"] == city, "High Explosives Weight (Tons)"].resample("D").sum()
    s = s.reindex(daily_index, fill_value=0)
    s.cumsum().plot(ax=ax[e])
    ax[e].set_title(city.title())
plt.tight_layout()
plt.suptitle(f"Cumulative high explosives weight (in tons) from aerial attacks\non the {n_cities} most frequently targeted cities in Germany", size=18)
plt.subplots_adjust(top=0.95)
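
Reindexing a per-city daily series onto the common date range inserts NaN for the days before that city's first and after its last recorded attack. `cumsum()` leaves those entries as NaN, so the curve is simply not drawn there; passing `fill_value=0` to `reindex()` instead makes the cumulative total start at zero and stay flat after the last attack. The small made-up series below shows the difference.

```python
import pandas as pd

# Made-up daily tonnage for one city, attacked only on 2 and 4 January
attacks = pd.Series(
    [3.0, 2.0],
    index=pd.to_datetime(["1944-01-02", "1944-01-04"]),
)
# Common daily range shared by all cities (here 1 to 5 January)
daily_index = pd.date_range("1944-01-01", "1944-01-05", freq="D")

# resample("D").sum() fills the gap on 3 January with 0 inside the city's own range
daily = attacks.resample("D").sum()

with_nan = daily.reindex(daily_index).cumsum()                 # NaN on 1 and 5 January
with_zero = daily.reindex(daily_index, fill_value=0).cumsum()  # 0 on 1 January, flat on 5 January

print(with_nan.tolist())
print(with_zero.tolist())
```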

***