# Challenge - Cleaning Data
The Avengers are a well-known and widely loved team of superheroes in the Marvel universe, originally introduced in a 1960s comic book series. The recent Disney movies re-popularized them as part of the new Marvel Cinematic Universe.

Because the writers killed off and revived many of the superheroes, the team at FiveThirtyEight was curious to explore data from the Marvel Wikia site further. To learn how they collected their data, which is available in their [GitHub repository](https://github.com/fivethirtyeight/data/tree/master/avengers), read the write-up they published on the [FiveThirtyEight website](https://fivethirtyeight.com/features/avengers-death-comics-age-of-ultron/).

```
import pandas as pd

# Load the FiveThirtyEight Avengers dataset and take a first look at it
avengers = pd.read_csv("avengers.csv")
print(avengers.head(5))
avengers.info()
```

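- Filtering bad data: removing entries for Avengers with a `Year` of 1960 or earlier
```
import matplotlib.pyplot as plt

# Plot the distribution of introduction years to spot implausible values
avengers['Year'].hist()
plt.show()

# Keep only rows with Year after 1960; .copy() avoids SettingWithCopyWarning
# when columns are added to true_avengers later
true_avengers = avengers[avengers['Year'] > 1960].copy()
true_avengers.info()
```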
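
To see the filter's effect in isolation, here is a minimal sketch on a small made-up DataFrame. The names and years are hypothetical, not taken from `avengers.csv`, and the sketch assumes only that `Year` is numeric.

```python
import pandas as pd

# Hypothetical sample rows; not real data from avengers.csv
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [1900, 1963, 1988, 2012],
})

# A boolean mask keeps only the rows where the condition holds
mask = sample["Year"] > 1960

# .copy() returns an independent DataFrame, so adding columns to it later
# does not trigger pandas' SettingWithCopyWarning
filtered = sample[mask].copy()

print(len(sample), "rows before,", len(filtered), "rows after")
print(filtered["Year"].min())
```

Comparing the row counts before and after is a quick way to confirm how many entries the filter dropped.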

- Calculating total number of deaths
`Solution 1` Complicated
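```
def clean_deaths(row):
    """Count how many of the Death1..Death5 columns hold 'YES' for one row."""
    num_deaths = 0
    columns = ['Death1', 'Death2', 'Death3', 'Death4', 'Death5']

    for c in columns:
        death = row[c]
        # Missing values and 'NO' both mean no death was recorded
        if pd.isnull(death) or death == 'NO':
            continue
        elif death == 'YES':
            num_deaths += 1
    return num_deaths

# Apply the function to every row (axis=1) and store the per-Avenger total
true_avengers['Deaths'] = true_avengers.apply(clean_deaths, axis=1)
```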
`Solution 2` Simple
```
# Check how the death columns are stored before comparing against 'YES'
print(true_avengers['Death1'].dtype)

death_columns = ['Death1', 'Death2', 'Death3', 'Death4', 'Death5']

# Comparing with 'YES' gives True for a death and False for 'NO' or NaN
dead_avengers = true_avengers[death_columns] == 'YES'

# True counts as 1, so a row-wise sum is the number of deaths
true_avengers['Deaths'] = dead_avengers.sum(axis=1)
print(true_avengers['Deaths'].head(20))
```
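
As a sanity check, the two approaches can be compared on a tiny made-up table. This is only a sketch: the rows are hypothetical, and it uses three death columns instead of five to stay short.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; not real data from avengers.csv
sample = pd.DataFrame({
    "Death1": ["YES", "NO", "YES"],
    "Death2": ["YES", np.nan, "NO"],
    "Death3": [np.nan, np.nan, "YES"],
})
death_columns = ["Death1", "Death2", "Death3"]

# Row-wise approach: count the 'YES' entries in each row
row_wise = sample[death_columns].apply(lambda row: (row == "YES").sum(), axis=1)

# Vectorized approach: compare the whole block at once, then sum the booleans
vectorized = (sample[death_columns] == "YES").sum(axis=1)

print(vectorized.tolist())
print(row_wise.tolist() == vectorized.tolist())
```

Both approaches treat `NaN` and `'NO'` as "no death", so they should agree on every row.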

- Verifying Years since Joining
We want to verify that the `Years since joining` field accurately reflects the `Year` column. For example, if an Avenger was introduced in 1960, is the `Years since joining` value for that Avenger 55?

We will calculate the number of rows where `Years since joining` is accurate:
- Because this data was created in 2015, use that as the reference year.
- Count the rows where `Years since joining` equals 2015 minus the `Year` value.
- Assign that count, as an integer, to `joined_accuracy_count`.

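```
# Check that both columns are numeric before doing arithmetic on them
print(true_avengers['Year'].dtype)
print(true_avengers['Years since joining'].dtype)

# Work on a copy so the new columns do not trigger SettingWithCopyWarning
joining = true_avengers[['Year', 'Years since joining']].copy()

# Expected value, using 2015 as the reference year
joining['length'] = 2015 - joining['Year']
print(joining['length'].head(20))

# True where the recorded value matches the expected one
joining['accuracy'] = joining['length'] == joining['Years since joining']
print(joining['accuracy'].head(20))

# Summing booleans counts the True rows; int() gives a plain Python integer
joined_accuracy_count = int(joining['accuracy'].sum())
print(joined_accuracy_count)
```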
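
Beyond the count, it can help to look at the rows that fail the check. The sketch below does this on made-up data; the values and the `REFERENCE_YEAR` name are hypothetical, and the last row is deliberately wrong.

```python
import pandas as pd

REFERENCE_YEAR = 2015  # the year the data was created

# Hypothetical sample rows; not real data from avengers.csv
sample = pd.DataFrame({
    "Year": [1963, 1988, 2012],
    "Years since joining": [52, 27, 4],  # last value is deliberately wrong
})

# Recompute the expected value and compare it with the stored one
expected = REFERENCE_YEAR - sample["Year"]
is_accurate = expected == sample["Years since joining"]

# Number of accurate rows, as a plain Python int
accurate_count = int(is_accurate.sum())

# Rows where the stored value disagrees with the recomputed one
mismatches = sample[~is_accurate]

print(accurate_count)
print(mismatches)
```

Inverting the boolean mask with `~` selects exactly the rows that need a closer look.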
