# Analyzing Data

## Prison Helicopter Escapes

We begin by importing the helper functions defined in the ```helper.py``` file provided by DataQuest.

In [2]:
from helper import *

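The import fails here with ```ImportError: lxml not found, please install it```, which means the ```lxml``` package is missing. Install it (for example with ```pip install lxml```) and rerun the cell before continuing.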

### Get the Data

Now, let's get the data from the [List of helicopter prison escapes](https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes) Wikipedia article.

In [None]:
url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"

In [None]:
data = data_from_url(url)

Remove the lengthy description stored as the last element of each row.

In [None]:
for row in data:
    # list.pop() removes the last element (the description) in place
    row.pop()
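Why a bare ```row.pop()``` is enough: ```list.pop()``` removes the last element in place and returns it, so assigning its return value back to ```row``` only rebinds the loop variable. The sketch below uses made-up placeholder rows, not the Wikipedia data, to show that the lists inside ```data``` are modified directly.

```python
# Placeholder rows (hypothetical), each ending with a long description field
data = [
    ["1971", "Country A", "Yes", "long description ..."],
    ["1973", "Country B", "No", "another long description ..."],
]

for row in data:
    row.pop()  # mutates the list stored in data; the returned value is discarded

print(data)
# [['1971', 'Country A', 'Yes'], ['1973', 'Country B', 'No']]
```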

Let's print the first three rows.

In [None]:
for row in data[0:3]:
    print(row)

Modify the date column so that it shows only the year.

In [None]:
for row in data:
    row[0] = fetch_year(row[0])
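The implementation of ```fetch_year``` lives in ```helper.py``` and is not shown here. As a rough idea of what such a function might do, here is a hypothetical stand-in, ```extract_year```, that pulls the first four-digit year out of a date string with a regular expression. It assumes dates look like "August 19, 1971" and returns an ```int```, since the later ```range(min_year, max_year + 1)``` call needs integer years.

```python
import re


def extract_year(date_text):
    """Return the first 19xx/20xx year found in date_text as an int.

    Hypothetical stand-in for helper.fetch_year; assumes each date
    string contains exactly one four-digit year.
    """
    match = re.search(r"\b(19|20)\d{2}\b", date_text)
    if match is None:
        raise ValueError(f"no year found in {date_text!r}")
    return int(match.group(0))


print(extract_year("August 19, 1971"))  # 1971
print(extract_year("31 October 1973"))  # 1973
```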

Print the first three rows again to confirm the change.

In [None]:
for row in data[0:3]:
    print(row)

### Attempts by Year

Retrieve the earliest and latest years in the data.

In [None]:
# Compare rows by their year (first element), then take the year from the winning row
min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]

In [None]:
print(min_year)
print(max_year)
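The ```key``` argument tells ```min```/```max``` to compare whole rows by their first element, and the trailing ```[0]``` extracts the year from the selected row. A minimal sketch on hypothetical rows (not the real data) shows that this gives the same result as taking ```min```/```max``` over the years directly:

```python
# Hypothetical rows after the year conversion: [year, country, succeeded]
data = [
    [1986, "Country A", "Yes"],
    [1971, "Country B", "No"],
    [2009, "Country A", "Yes"],
]

# Pick the row with the smallest/largest year, then read its year
min_year = min(data, key=lambda row: row[0])[0]
max_year = max(data, key=lambda row: row[0])[0]

# Equivalent: min/max over a generator of the years themselves
assert min_year == min(row[0] for row in data)
assert max_year == max(row[0] for row in data)

print(min_year, max_year)  # 1971 2009
```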

Store every year from ```min_year``` to ```max_year``` (inclusive), including years with no recorded attempts.

In [None]:
# range() excludes its end value, hence max_year + 1
years = list(range(min_year, max_year + 1))

In [None]:
print(years)

Build a list of attempt counters, one ```[year, 0]``` pair per year, starting at zero.

In [None]:
# One [year, count] pair per year; counts start at 0
attempts_per_year = [[year, 0] for year in years]

In [None]:
print(attempts_per_year)

For each year between ```min_year``` and ```max_year``` (inclusive), count how many times that year appears in the data.

In [None]:
for row in data:
    for year_attempt in attempts_per_year:
        year = year_attempt[0]
        if row[0] == year:
            year_attempt[1] += 1

print(attempts_per_year)
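The nested loop above scans every year for every row. An alternative sketch, using ```collections.Counter``` from the standard library in place of the nested loop, counts the years in a single pass and still produces the same ```[year, count]``` format, with zero for years that have no attempts. The rows here are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical rows (not the real data): [year, country, succeeded]
data = [
    [1971, "Country A", "Yes"],
    [1973, "Country B", "No"],
    [1973, "Country A", "Yes"],
]

min_year = min(row[0] for row in data)
max_year = max(row[0] for row in data)

# One pass over the rows: year -> number of attempts
counts = Counter(row[0] for row in data)

# Counter returns 0 for missing keys, so empty years still appear as [year, 0]
attempts_per_year = [[year, counts[year]] for year in range(min_year, max_year + 1)]

print(attempts_per_year)  # [[1971, 1], [1972, 0], [1973, 2]]
```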


Visualize the number of attempts per year as a bar chart.

In [None]:
%matplotlib inline
barplot(attempts_per_year)

**The highest number of helicopter prison break attempts in a single year was 3, reached in four different years: 1986, 2001, 2007, and 2009.**
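Rather than reading the peak years off the chart, they can be found directly from ```attempts_per_year```: take the highest count, then keep every year that reaches it, since ties are possible. The pairs below are toy values in the same shape, not taken from the Wikipedia table.

```python
# Toy [year, count] pairs in the same shape as attempts_per_year
attempts_per_year = [[2000, 1], [2001, 3], [2002, 0], [2003, 3]]

# Highest count across all years
peak = max(count for _, count in attempts_per_year)

# Every year that reaches that count (there may be several)
peak_years = [year for year, count in attempts_per_year if count == peak]

print(peak, peak_years)  # 3 [2001, 2003]
```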

### Attempts by Country

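Count the number of attempts in each country. The ```df``` DataFrame used here is not created in this notebook; it presumably comes from ```helper.py``` via ```from helper import *```.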

In [None]:
countries_frequency = df["Country"].value_counts()

In [None]:
print_pretty_table(countries_frequency)

Visualize the attempts per country as a horizontal bar chart.

In [None]:
%matplotlib inline
ax = countries_frequency.plot(kind="barh")
# barh draws the first (most frequent) entry at the bottom; invert so it appears at the top
ax.invert_yaxis()

**France had the highest number of helicopter prison break attempts, with 15, followed by the United States with 8.**

Calculate the total number of attempts.

In [None]:
len(data)

**The list contains 48 attempts to break out of prison by helicopter.**

In [None]:
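# 15 and 8 are the France and United States counts from the table above;
# 48 is the total returned by len(data)
france = 15 / 48 * 100
united_states = 8 / 48 * 100
print(france)
print(united_states)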

**About 31% of all attempts took place in France and 17% in the United States.**
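To avoid hard-coding each percentage, the shares can be computed from a mapping of counts. This sketch uses the figures stated above (France 15, United States 8, 48 attempts in total), lumping the remaining 25 attempts into a single placeholder entry; in the notebook the counts would come from ```countries_frequency``` instead.

```python
# Counts from the text above; "All other countries" is the remainder (48 - 15 - 8)
country_counts = {"France": 15, "United States": 8, "All other countries": 25}

total = sum(country_counts.values())  # plays the role of len(data)

# Share of all attempts per country, rounded to whole percent
shares = {country: round(count / total * 100) for country, count in country_counts.items()}

print(total, shares)
# 48 {'France': 31, 'United States': 17, 'All other countries': 52}
```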

### Helicopter Prison Escape Success Rate

In [None]:
succeeded = df["Succeeded"].value_counts()
print(succeeded)

In [None]:
success_rate = succeeded["Yes"] / sum(succeeded) * 100
print(success_rate)

**In this dataset, 70.83% of helicopter prison escape attempts succeeded.**
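The success rate is the number of "Yes" entries divided by the total number of entries. The sketch below reproduces the calculation without pandas, assuming the 48 attempts split into 34 "Yes" and 14 "No", which is the split that 70.83% of 48 implies; the real column could contain other labels.

```python
from collections import Counter

# Assumed "Succeeded" column: 34 successes and 14 failures out of 48 attempts
succeeded_column = ["Yes"] * 34 + ["No"] * 14

counts = Counter(succeeded_column)

# Share of "Yes" among all entries, in percent
success_rate = counts["Yes"] / sum(counts.values()) * 100

print(f"{success_rate:.2f}%")  # 70.83%
```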