# Drone strike dataset

**Challenge dataset**

---

This dataset, located in the `drone_strikes` folder, has a variety of information about drone strikes.

I don't describe the columns; it is up to you as a group to infer what they mean.

This dataset is challenging for a variety of reasons. It is not cleaned up: there are missing cells. Relationships between variables are more complicated. Many of the variables are stored as strings and, if you are interested in them, may need to be "recoded" as binary 1 vs. 0 in a new column.

**If you choose this dataset, it is as much a data cleaning lab as an EDA lab. Buyer beware.**
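
As a rough illustration of the "recoding" mentioned above, here is a minimal sketch on a toy frame. The column name `Pakistani approval` and the value `Likely` appear in the data; the value `Unlikely` and the new column name `approval_likely` are made up for the example.

```python
import pandas as pd

# Toy stand-in for the real data; 'Unlikely' is a hypothetical value for illustration.
toy = pd.DataFrame({'Pakistani approval': ['Likely', None, 'Likely', 'Unlikely']})

# 1 where the string equals 'Likely', 0 everywhere else (including missing cells).
toy['approval_likely'] = (toy['Pakistani approval'] == 'Likely').astype(int)
print(toy)
```

Note that this treats a missing cell the same as a "no". Whether that is appropriate depends on the question you are asking.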

---

### Requirements

As a group you should:

1. Load and clean the data with pandas. You will probably want to remove the variables you are not interested in first so that cleaning is easier.
2. Identify the variables and subsets of the data you are interested in as a group.
3. Describe the data and investigate any outliers for those variables.
4. Explore relationships between variables.
5. Visualize at least three variables of your choice with appropriate visualizations. They should be understandable.
6. Visualize subsets of the variables you chose, subsetted conditional on some other variable. For example, number of civilians killed by area.
7. Write a brief report on at least 5 things you found interesting about the data or, if it doesn't interest you at all, things you found out and why they are boring.
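
The "civilians killed by area" example in the list could be approached with a `groupby`. The sketch below uses a small toy frame with made-up numbers, not the real data; the column names match the dataset.

```python
import pandas as pd

# Toy rows with made-up counts; None stands in for a missing cell.
toy = pd.DataFrame({
    'Area': ['South Waziristan', 'North Waziristan', 'North Waziristan', 'South Waziristan'],
    'Maximum civilians killed': [2.0, 0.0, 8.0, None],
})

# Sum the column within each area; missing values are skipped by default.
by_area = toy.groupby('Area')['Maximum civilians killed'].sum()
print(by_area)
```

Calling `by_area.plot(kind='bar')` on the result would be one way to turn the summary into a chart.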

In [54]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set_style('darkgrid')

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

drone_file = 'drones.csv'
drone = pd.read_csv(drone_file)

In [55]:
drone.isnull().sum()

Strike ID                     0
Bureau ID                     0
Date                          0
Time                        304
Location                      0
Area                          0
Target                      316
Target Group                257
Westerners involved         379
Minimum Total Killed          0
Mean Total Killed            98
Maximum Total Killed          0
Number of deaths              8
AQ/TB Killed                340
Minimum civilians killed    208
Maximum civilians killed    208
Civilians Killed            114
Min injured                  76
Max injured                  76
Injured                      49
Minimum children killed     312
Max children killed         313
Children Killed             281
Pakistani approval          362
Short Summary                 1
Related ID                  334
Notes                       378
dtype: int64
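
Raw counts of missing cells are easier to compare as a share of rows. A minimal sketch, using a toy frame rather than the real file:

```python
import pandas as pd

# Toy frame: 'Target' is missing in three of four rows.
toy = pd.DataFrame({
    'Area': ['a', 'b', 'c', 'd'],
    'Target': ['x', None, None, None],
})

# The mean of a boolean mask is the fraction of True values, i.e. the share missing.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Columns that are mostly empty are candidates for dropping before any further cleaning.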

In [56]:
drone.head(2)

Unnamed: 0,Strike ID,Bureau ID,Date,Time,Location,Area,Target,Target Group,Westerners involved,Minimum Total Killed,...,Min injured,Max injured,Injured,Minimum children killed,Max children killed,Children Killed,Pakistani approval,Short Summary,Related ID,Notes
0,3,B1,6/17/04,1/0/00,Wana,South Waziristan,Nek Mohammed,,,6,...,1.0,1.0,At least 1,2.0,2.0,2.0,Likely,"First known drone strike in Pakistan kills 7, ...",,
1,4,B2,5/8/05,1/0/00,Toorikhel,North Waziristan,Haitham al-Yemeni,,,2,...,,,,,,,Likely,"Two killed, including al Qaeda operative, near...",,


In [57]:
drone.drop(['Strike ID','Bureau ID','Time','Related ID','Notes','Mean Total Killed','Number of deaths'],axis=1,inplace=True)

In [58]:
drone.head(2)

Unnamed: 0,Date,Location,Area,Target,Target Group,Westerners involved,Minimum Total Killed,Maximum Total Killed,AQ/TB Killed,Minimum civilians killed,Maximum civilians killed,Civilians Killed,Min injured,Max injured,Injured,Minimum children killed,Max children killed,Children Killed,Pakistani approval,Short Summary
0,6/17/04,Wana,South Waziristan,Nek Mohammed,,,6,8,5,2.0,2.0,2.0,1.0,1.0,At least 1,2.0,2.0,2.0,Likely,"First known drone strike in Pakistan kills 7, ..."
1,5/8/05,Toorikhel,North Waziristan,Haitham al-Yemeni,,,2,2,1 al-Qaeda,,,,,,,,,,Likely,"Two killed, including al Qaeda operative, near..."
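
The `Date` column is still stored as strings such as `6/17/04`. If you want to group or plot by year, one option is to parse it. This sketch assumes month/day/two-digit-year order, which is consistent with the two rows shown above but has not been checked against the whole file.

```python
import pandas as pd

# The two date strings visible in the head() output above.
dates = pd.Series(['6/17/04', '5/8/05'])

# Assumed format: month/day/two-digit year.
parsed = pd.to_datetime(dates, format='%m/%d/%y')
print(parsed.dt.year)
```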


In [120]:
# Rows that have a 'Civilians Killed' estimate but lack a minimum or maximum bound.
civ_mask = drone['Civilians Killed'].notnull() & (
    drone['Minimum civilians killed'].isnull() | drone['Maximum civilians killed'].isnull())
# print(drone.loc[civ_mask, ['Minimum civilians killed', 'Maximum civilians killed', 'Civilians Killed']])
# Coerce the count columns to numbers; entries that are not numeric become NaN.
drone['Minimum Total Killed'] = pd.to_numeric(drone['Minimum Total Killed'], errors='coerce')
drone['Maximum Total Killed'] = pd.to_numeric(drone['Maximum Total Killed'], errors='coerce')
drone['Civilians Killed'] = pd.to_numeric(drone['Civilians Killed'], errors='coerce')
# Take an explicit copy so the assignments below do not raise SettingWithCopyWarning.
# Note: only civ_sub is filled in; drone itself is left unchanged.
civ_sub = drone[civ_mask].copy()
civ_sub.reset_index(inplace=True)
# Use the single estimate as both the lower and the upper bound.
civ_sub['Minimum civilians killed'] = civ_sub['Civilians Killed']
civ_sub['Maximum civilians killed'] = civ_sub['Civilians Killed']


Unnamed: 0,index,Date,Location,Area,Target,Target Group,Westerners involved,Minimum Total Killed,Maximum Total Killed,AQ/TB Killed,...,Maximum civilians killed,Civilians Killed,Min injured,Max injured,Injured,Minimum children killed,Max children killed,Children Killed,Pakistani approval,Short Summary
0,19,8/27/08,Ganki Khel,South Waziristan,,AQ,,0,0,0.0,...,0.0,0.0,4.0,4.0,4.0,,,,,"Attack injures four. Ganki Khel, South Waziris..."
1,160,4/16/10,Toorikhel,North Waziristan,,,,3,4,,...,0.0,0.0,2.0,2.0,0.0,,,,,6 alleged militants and rescuers killed in att...
2,187,10/2/10,Inzarkas,North Waziristan,,,,7,14,,...,0.0,0.0,2.0,2.0,2.0,,,,,"8-9 people, possibly militants, killed in atta..."
3,229,6/15/11,Karez,South Waziristan,,,,0,0,,...,0.0,0.0,0.0,0.0,0.0,,,,,None killed as car occupants escape drone stri...
4,319,5/23/12,Datta Khel Kalai,North Waziristan,,,,4,5,,...,0.0,0.0,2.0,2.0,,,,0.0,,4-5 alleged militants killed and several injur...
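
The cell above fills the bounds only in the subset `civ_sub`. If the goal is to fill them in the full frame, one possible approach is `fillna` with the estimate column. This is a sketch on toy data, not the method used above: it touches only the bounds that are missing and leaves existing values alone. The value `'Possibly'` is made up to stand in for a non-numeric entry.

```python
import numpy as np
import pandas as pd

# Toy rows: a complete row, a row missing both bounds, and a non-numeric estimate.
toy = pd.DataFrame({
    'Minimum civilians killed': [2.0, np.nan, np.nan],
    'Maximum civilians killed': [2.0, np.nan, 8.0],
    'Civilians Killed': ['2', '0', 'Possibly'],
})

# Non-numeric estimates become NaN and therefore fill nothing.
toy['Civilians Killed'] = pd.to_numeric(toy['Civilians Killed'], errors='coerce')

# Fill each missing bound from the estimate, row by row (aligned on the index).
for bound in ['Minimum civilians killed', 'Maximum civilians killed']:
    toy[bound] = toy[bound].fillna(toy['Civilians Killed'])
print(toy)
```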


In [111]:
# Rows that have an 'Injured' estimate but lack a minimum or maximum bound.
inj_mask = drone['Injured'].notnull() & (
    drone['Min injured'].isnull() | drone['Max injured'].isnull())
# Coerce the injury columns to numbers; entries such as 'At least 1' become NaN.
drone['Min injured'] = pd.to_numeric(drone['Min injured'], errors='coerce')
drone['Max injured'] = pd.to_numeric(drone['Max injured'], errors='coerce')
drone['Injured'] = pd.to_numeric(drone['Injured'], errors='coerce')
print(drone.loc[inj_mask, ['Min injured', 'Max injured', 'Injured']])
# The printed output shows only row 323, so set its bounds directly with .loc
# (a single indexing step, which avoids SettingWithCopyWarning).
drone.loc[323, ['Min injured', 'Max injured']] = 0.0

     Min injured  Max injured  Injured
323          NaN          NaN      0.0
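
Coercing with `errors='coerce'` discards entries such as `At least 1`, which do carry a number. If you would rather keep it, one option is to pull out the first run of digits before converting. The sketch below is an assumption about how such strings might be handled; `'Several'` is a made-up value for illustration.

```python
import pandas as pd

# Toy values; 'At least 1' appears in the data, 'Several' is hypothetical.
injured = pd.Series(['At least 1', '4', None, 'Several'])

# Extract the first run of digits from each entry; entries without digits give NaN.
digits = injured.astype(str).str.extract(r'(\d+)', expand=False)
injured_num = pd.to_numeric(digits, errors='coerce')
print(injured_num)
```

A value recovered from "At least 1" is a lower bound rather than an exact count, so it may fit better in a minimum column than in a point estimate.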


In [119]:
# Rows that have a 'Children Killed' estimate but lack a minimum or maximum bound.
child_mask = drone['Children Killed'].notnull() & (
    drone['Minimum children killed'].isnull() | drone['Max children killed'].isnull())
# print(drone.loc[child_mask, ['Minimum children killed', 'Max children killed', 'Children Killed']])
# Coerce the estimate to a number; entries that are not numeric become NaN.
drone['Children Killed'] = pd.to_numeric(drone['Children Killed'], errors='coerce')
# Take an explicit copy so the assignments below do not raise SettingWithCopyWarning.
# Note: only child_sub is filled in; drone itself is left unchanged.
child_sub = drone[child_mask].copy()
child_sub.reset_index(inplace=True)
# Use the single estimate as both the lower and the upper bound.
child_sub['Minimum children killed'] = child_sub['Children Killed']
child_sub['Max children killed'] = child_sub['Children Killed']


Unnamed: 0,index,Date,Location,Area,Target,Target Group,Westerners involved,Minimum Total Killed,Maximum Total Killed,AQ/TB Killed,...,Maximum civilians killed,Civilians Killed,Min injured,Max injured,Injured,Minimum children killed,Max children killed,Children Killed,Pakistani approval,Short Summary
0,318,5/5/12,Shawal,North Waziristan,,,,8,10,,...,0.0,,1.0,3.0,1.0,0.0,0.0,0.0,,Up to 10 killed and one injured in a strike. D...
1,319,5/23/12,Datta Khel Kalai,North Waziristan,,,,4,5,,...,,0.0,2.0,2.0,,0.0,0.0,0.0,,4-5 alleged militants killed and several injur...
2,320,5/24/12,Khassokhel near Mir Ali,North Waziristan,,,,8,12,,...,8.0,,3.0,4.0,,0.0,0.0,0.0,,"Up tp 12 killed and 3 injured, mostly civilian..."
3,321,5/26/12,Miranshah,North Waziristan,,,,3,4,,...,,0.0,2.0,2.0,2.0,0.0,0.0,0.0,,3-4 alleged militants killed and 2 injured on ...
4,322,5/28/12,Khassokhel near Mir Ali,North Waziristan,,,,5,10,,...,,0.0,4.0,4.0,4.0,0.0,0.0,0.0,,Up to 10 alleged militants killed and 4 injure...
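
After filling bounds like this, a quick sanity check is to look for rows where the lower bound exceeds the upper bound. A minimal sketch on toy numbers (made up for the example, not taken from the data):

```python
import numpy as np
import pandas as pd

# Toy rows: consistent, inconsistent (min > max), and one with a missing bound.
toy = pd.DataFrame({
    'Minimum children killed': [0.0, 2.0, np.nan],
    'Max children killed': [0.0, 1.0, 3.0],
})

# Comparisons against NaN evaluate to False, so rows with a missing bound are not flagged.
bad = toy[toy['Minimum children killed'] > toy['Max children killed']]
print(bad)
```

The same comparison could be repeated for the civilian and injured columns.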
