![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

# Pandas DataFrame exercises


In [296]:
# Import the numpy package under the name np
import numpy as np

# Import the pandas package under the name pd
import pandas as pd

# Import the matplotlib package under the name plt
import matplotlib.pyplot as plt
%matplotlib inline

# Print the pandas version
print(pd.__version__)

1.5.3


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame creation

### Create an empty pandas DataFrame


In [297]:
# your code goes here
# with no arguments the DataFrame has no rows and no columns
df = pd.DataFrame()

In [299]:
assert df is not None
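With no arguments, `pd.DataFrame()` has zero rows and zero columns, and `DataFrame.empty` reports `True`. If the schema is known in advance, you can declare the column labels up front. This sketch reuses the labels from later in the notebook purely for illustration:

```python
import pandas as pd

# No arguments: zero rows, zero columns
empty_df = pd.DataFrame()
print(empty_df.shape)   # (0, 0)
print(empty_df.empty)   # True

# Column labels declared up front, still no rows
empty_with_cols = pd.DataFrame(columns=['name', 'sex', 'first_appearance'])
print(empty_with_cols.shape)          # (0, 3)
print(list(empty_with_cols.columns))  # ['name', 'sex', 'first_appearance']
```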

<img width=400 src="https://cdn.dribbble.com/users/4678/screenshots/1986600/avengers.png"></img>

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)


### Create a `marvel_df` pandas DataFrame with the given marvel data


In [300]:
marvel_data = [
    ['Spider-Man', 'male', 1962],
    ['Captain America', 'male', 1941],
    ['Wolverine', 'male', 1974],
    ['Iron Man', 'male', 1963],
    ['Thor', 'male', 1963],
    ['Thing', 'male', 1961],
    ['Mister Fantastic', 'male', 1961],
    ['Hulk', 'male', 1962],
    ['Beast', 'male', 1963],
    ['Invisible Woman', 'female', 1961],
    ['Storm', 'female', 1975],
    ['Namor', 'male', 1939],
    ['Hawkeye', 'male', 1964],
    ['Daredevil', 'male', 1964],
    ['Doctor Strange', 'male', 1963],
    ['Hank Pym', 'male', 1962],
    ['Scarlet Witch', 'female', 1964],
    ['Wasp', 'female', 1963],
    ['Black Widow', 'female', 1964],
    ['Vision', 'male', 1968]
]

In [301]:
# your code goes here
marvel_df = pd.DataFrame(marvel_data, columns=['Name', 'Gender', 'Date'])

In [302]:
assert (marvel_df.values == marvel_data).any()
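pandas also accepts column-oriented input: a dict that maps each column label to its values. The sketch below uses three rows taken from `marvel_data`, builds the same frame both ways, and compares the results with `DataFrame.equals`:

```python
import pandas as pd

rows = [
    ['Spider-Man', 'male', 1962],
    ['Captain America', 'male', 1941],
    ['Storm', 'female', 1975],
]

# Row-oriented: one inner list per row, column labels passed separately
from_rows = pd.DataFrame(rows, columns=['name', 'sex', 'first_appearance'])

# Column-oriented: dict keys become the column labels, in insertion order
from_cols = pd.DataFrame({
    'name': ['Spider-Man', 'Captain America', 'Storm'],
    'sex': ['male', 'male', 'female'],
    'first_appearance': [1962, 1941, 1975],
})

print(from_rows.equals(from_cols))  # True
```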

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add column names to the `marvel_df`
 

In [303]:
# your code goes here
col_names = ['name', 'sex', 'first_appearance']
marvel_df.columns = col_names

In [304]:
assert (marvel_df.columns == col_names).all()
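You do not need to rebuild the frame to relabel its columns. This sketch, on a one-row frame, shows two common options: assign a complete list to `.columns`, or call `rename` with an old-to-new mapping to change only some labels. The label `debut_year` is a hypothetical name used only for illustration:

```python
import pandas as pd

df = pd.DataFrame([['Spider-Man', 'male', 1962]],
                  columns=['Name', 'Gender', 'Date'])

# Option 1: overwrite every label at once (length must match the column count)
df.columns = ['name', 'sex', 'first_appearance']

# Option 2: rename selected labels via a mapping; returns a new DataFrame
renamed = df.rename(columns={'first_appearance': 'debut_year'})

print(list(df.columns))       # ['name', 'sex', 'first_appearance']
print(list(renamed.columns))  # ['name', 'sex', 'debut_year']
```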

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add index names to the `marvel_df` (use the character name as index)


In [305]:
# use the values of the 'name' column as the row labels
marvel_df.index = marvel_df['name']

In [306]:
assert (marvel_df.index == marvel_df['name']).all()
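`DataFrame.set_index` does this step and the next one in a single call: by default it moves the column into the index and removes it from the columns. Here is a small sketch with two characters from the data:

```python
import pandas as pd

df = pd.DataFrame(
    [['Spider-Man', 'male', 1962], ['Storm', 'female', 1975]],
    columns=['name', 'sex', 'first_appearance'],
)

# 'name' becomes the index and is dropped from the columns (drop=True by default)
indexed = df.set_index('name')

print(indexed.index.tolist())                     # ['Spider-Man', 'Storm']
print(list(indexed.columns))                      # ['sex', 'first_appearance']
print(indexed.loc['Storm', 'first_appearance'])   # 1975
```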

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Drop the name column as it's now the index

In [307]:
marvel_df = marvel_df.drop(columns='name')

In [308]:
assert (marvel_df.columns == col_names[1:]).all()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Drop 'Namor' and 'Hank Pym' rows


In [309]:
marvel_df = marvel_df.drop(['Namor', 'Hank Pym'], axis=0)

In [310]:
# check that neither row is left in the index
assert not marvel_df.index.isin(['Namor', 'Hank Pym']).any()
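`drop` also accepts the keywords `index=` and `columns=`, so you do not have to remember which axis is 0 and which is 1. Passing `errors='ignore'` skips labels that are not present instead of raising a `KeyError`. In this sketch, `'Thanos'` is a hypothetical label that is deliberately absent from the data:

```python
import pandas as pd

df = pd.DataFrame(
    {'sex': ['male', 'male', 'male'], 'first_appearance': [1939, 1962, 1962]},
    index=['Namor', 'Hank Pym', 'Hulk'],
)

# Keyword forms instead of axis=0 / axis=1
no_rows = df.drop(index=['Namor', 'Hank Pym'])
no_col = df.drop(columns='sex')

# Unknown labels are silently skipped with errors='ignore'
safe = df.drop(index=['Namor', 'Thanos'], errors='ignore')

print(no_rows.index.tolist())  # ['Hulk']
print(list(no_col.columns))    # ['first_appearance']
print(safe.index.tolist())     # ['Hank Pym', 'Hulk']
```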

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame selection, slicing and indexation

### Show the first 5 elements of `marvel_df`


In [311]:
marvel_df.head()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the last 5 elements of `marvel_df`


In [312]:
last_five = marvel_df.tail()

In [313]:
assert (last_five.index == ['Doctor Strange', 'Scarlet Witch', 'Wasp', 'Black Widow', 'Vision']).all()
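Both `head` and `tail` take an optional row count `n` (default 5). A negative `n` passed to `head` returns every row except the last `|n|`. Here is a sketch on the first six characters:

```python
import pandas as pd

df = pd.DataFrame({'first_appearance': [1962, 1941, 1974, 1963, 1963, 1961]},
                  index=['Spider-Man', 'Captain America', 'Wolverine',
                         'Iron Man', 'Thor', 'Thing'])

print(df.head(2).index.tolist())   # ['Spider-Man', 'Captain America']
print(df.tail(2).index.tolist())   # ['Thor', 'Thing']

# Negative n: all rows except the last 4
print(df.head(-4).index.tolist())  # ['Spider-Man', 'Captain America']
```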

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show just the sex of the first 5 elements of `marvel_df`

In [314]:
gender = marvel_df[['sex']].head()

In [315]:
assert np.unique(gender.values) == 'male'
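The double brackets in `marvel_df[['sex']]` matter. A single label returns a `Series`, while a list of labels (even a list with one entry) returns a `DataFrame`. Here is a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'female'], 'first_appearance': [1962, 1975]},
                  index=['Spider-Man', 'Storm'])

as_series = df['sex']    # one label -> Series
as_frame = df[['sex']]   # list of labels -> DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```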

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the `first_appearance` of all middle elements of `marvel_df` (every row except the first and the last)

In [316]:
value = marvel_df['first_appearance'].iloc[1:-1]

In [317]:
assert (value.index == ['Captain America', 'Wolverine', 'Iron Man', 'Thor', 'Thing',
       'Mister Fantastic', 'Hulk', 'Beast', 'Invisible Woman', 'Storm',
       'Hawkeye', 'Daredevil', 'Doctor Strange', 'Scarlet Witch', 'Wasp',
       'Black Widow']).all()
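`iloc` slices by position and excludes the end position, like Python lists. `loc` slices by label and includes both endpoints. Here is a sketch on four characters:

```python
import pandas as pd

s = pd.Series([1962, 1941, 1974, 1963],
              index=['Spider-Man', 'Captain America', 'Wolverine', 'Iron Man'],
              name='first_appearance')

# Position-based: position 3 is excluded
print(s.iloc[1:3].index.tolist())
# ['Captain America', 'Wolverine']

# Label-based: 'Iron Man' is included
print(s.loc['Captain America':'Iron Man'].index.tolist())
# ['Captain America', 'Wolverine', 'Iron Man']
```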

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the first and last elements of `marvel_df`


In [318]:
# your code goes here
first_last = marvel_df.iloc[[0, -1]]

In [319]:
assert (first_last.index == ["Spider-Man", "Vision"]).all()

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame manipulation and operations

### Modify the `first_appearance` of 'Vision' to year 1964

In [320]:
marvel_df.loc['Vision', 'first_appearance'] = 1964

In [321]:
assert marvel_df.loc['Vision', 'first_appearance'] == 1964
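When you read or write exactly one cell, `.at` is a scalar-only counterpart of `.loc`. Avoid chained indexing such as `df['first_appearance']['Vision'] = 1964`: the first `[]` may return a copy, so the assignment is not guaranteed to reach the original frame. Here is a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'first_appearance': [1962, 1968]},
                  index=['Spider-Man', 'Vision'])

# .at addresses a single cell by row label and column label
df.at['Vision', 'first_appearance'] = 1964
print(df.at['Vision', 'first_appearance'])  # 1964

# Not recommended: df['first_appearance']['Vision'] = 1964
# (chained indexing; with Copy-on-Write enabled it never updates df)
```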

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add a new column to `marvel_df` called 'years_since' with the years since `first_appearance`


In [322]:
marvel_df['years_since'] = 2023 - marvel_df['first_appearance']
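`DataFrame.assign` is a non-mutating alternative to `marvel_df['years_since'] = ...`. It returns a new frame with the extra column and leaves the original untouched. Subtracting a Series from a scalar is element-wise, so no loop is needed. This sketch uses three characters and assumes the same 2023 reference year as the cell above:

```python
import pandas as pd

df = pd.DataFrame({'first_appearance': [1962, 1941, 1975]},
                  index=['Spider-Man', 'Captain America', 'Storm'])

REFERENCE_YEAR = 2023  # assumption: same fixed year as the notebook cell above

# Scalar minus Series is computed row by row; df itself is not modified
with_years = df.assign(years_since=REFERENCE_YEAR - df['first_appearance'])
print(with_years['years_since'].tolist())  # [61, 82, 48]
```

To count from the current year instead, you could replace the constant with `pd.Timestamp.now().year`.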

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the male characters


In [323]:
all_man = marvel_df[marvel_df['sex'] == 'male']

In [324]:
assert all_man['sex'].unique() == "male"

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the characters with `first_appearance` after 1970


In [325]:
# your code goes here
mask = marvel_df['first_appearance'] > 1970
after = marvel_df[mask]

In [326]:
assert after['first_appearance'].min() > 1970

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the female characters with `first_appearance` after 1970

In [327]:
# your code goes here
result = marvel_df.loc[(marvel_df['sex'] == 'female') & (marvel_df['first_appearance'] > 1970)]

In [328]:
assert result['sex'].unique() == 'female' and result['first_appearance'].min() > 1970
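Boolean masks combine with `&` (and), `|` (or) and `~` (not), not with Python's `and`/`or`/`not`. When you write the comparisons inline, wrap each one in parentheses, as in the cell above. `DataFrame.query` expresses the same filter as a string. Here is a sketch on four characters:

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'female', 'female', 'male'],
                   'first_appearance': [1974, 1975, 1961, 1941]},
                  index=['Wolverine', 'Storm', 'Invisible Woman', 'Captain America'])

is_female = df['sex'] == 'female'
is_recent = df['first_appearance'] > 1970

both = df[is_female & is_recent]      # female AND after 1970
either = df[is_female | is_recent]    # female OR after 1970
not_female = df[~is_female]           # NOT female

# Same filter as `both`, written as a query string
both_q = df.query("sex == 'female' and first_appearance > 1970")

print(both.index.tolist())        # ['Storm']
print(either.index.tolist())      # ['Wolverine', 'Storm', 'Invisible Woman']
print(not_female.index.tolist())  # ['Wolverine', 'Captain America']
print(both.equals(both_q))        # True
```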

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame summary statistics

### Show basic statistics of `marvel_df`

In [329]:
marvel_df.describe()
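By default, `describe()` summarizes only the numeric columns, so `sex` does not appear in its output. Passing `include='all'` adds `count`/`unique`/`top`/`freq` rows for non-numeric columns, and `value_counts` gives the full frequency table. Here is a sketch on three characters:

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'male', 'female'],
                   'first_appearance': [1962, 1941, 1975]})

# Numeric columns only
print(list(df.describe().columns))  # ['first_appearance']

# Non-numeric columns get unique/top/freq statistics
print(df.describe(include='all').loc['unique', 'sex'])  # 2

# Frequency of each value, most common first
print(df['sex'].value_counts().to_dict())  # {'male': 2, 'female': 1}
```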

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, show the mean value of `first_appearance`

In [330]:
# your code goes here
mean = marvel_df['first_appearance'].mean()

In [331]:
assert np.round(mean) == 1963

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, show the min value of `first_appearance`


In [333]:
# your code goes here
# named min_value to avoid shadowing the built-in min()
min_value = marvel_df['first_appearance'].min()

In [334]:
assert min_value == 1941
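To get several statistics at once, `Series.agg` accepts a list of reduction names and returns a Series keyed by those names. Here is a sketch on four of the years:

```python
import pandas as pd

years = pd.Series([1962, 1941, 1974, 1975], name='first_appearance')

# One call, three reductions; result index is ['min', 'mean', 'max']
summary = years.agg(['min', 'mean', 'max'])
print(summary)
# min is 1941, max is 1975, mean is (1962 + 1941 + 1974 + 1975) / 4 = 1963
```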

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the characters with the min value of `first_appearance`

In [335]:
# your code goes here
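# a boolean mask keeps every character tied at the minimum (idxmin would return only the first)
min_first_appearance = marvel_df['first_appearance'].min()
characters_min_first_appearance = marvel_df[marvel_df['first_appearance'] == min_first_appearance]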
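`idxmin` returns a single label: the first one that holds the minimum. In the full data only Captain America debuts in 1941, so this makes no difference there. It does matter when there are ties, as with the three 1961 debuts in this sketch:

```python
import pandas as pd

df = pd.DataFrame({'first_appearance': [1962, 1961, 1961, 1961]},
                  index=['Hulk', 'Thing', 'Mister Fantastic', 'Invisible Woman'])
col = df['first_appearance']

# Only the first label at the minimum
print(col.idxmin())  # Thing

# Every row tied at the minimum
earliest = df[col == col.min()]
print(earliest.index.tolist())  # ['Thing', 'Mister Fantastic', 'Invisible Woman']
```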

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Basic DataFrame plotting

### Reset the index of `marvel_df`


In [341]:
marvel_df = marvel_df.reset_index()

marvel_df
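`reset_index` turns the current index into a regular column. The new column takes the index's name, or `'index'` if the index has no name. With `drop=True`, the old labels are discarded instead. Here is a sketch with a named index:

```python
import pandas as pd

df = pd.DataFrame({'first_appearance': [1962, 1975]},
                  index=pd.Index(['Spider-Man', 'Storm'], name='name'))

# Old index becomes the first column, new index is 0..n-1
restored = df.reset_index()
print(list(restored.columns))  # ['name', 'first_appearance']

# Old index is thrown away
dropped = df.reset_index(drop=True)
print(list(dropped.columns))   # ['first_appearance']
print(dropped.index.tolist())  # [0, 1]
```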

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Plot the values of `first_appearance`


In [342]:
marvel_df['first_appearance'].plot(kind='line', color='red')
plt.title('First Appearance of Marvel Characters')
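plt.xlabel('Character')
# label each x position with the character name (the first column after reset_index)
plt.xticks(range(len(marvel_df)), marvel_df.iloc[:, 0], rotation=90)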
plt.ylabel('Year')
plt.show()
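A line plot joins unrelated characters with segments, which suggests a trend that is not there. One alternative is a dot per character, with the names on the x axis. Matplotlib treats a list of strings as categories, one tick per name. This is a sketch on four characters, not the notebook's prescribed solution:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'name': ['Spider-Man', 'Captain America', 'Wolverine', 'Storm'],
                   'first_appearance': [1962, 1941, 1974, 1975]})

fig, ax = plt.subplots(figsize=(6, 3))
# String x values become categorical ticks; 'o' draws markers without connecting lines
ax.plot(df['name'].tolist(), df['first_appearance'].tolist(), 'o', color='red')
ax.set_title('First Appearance of Marvel Characters')
ax.set_xlabel('Character')
ax.set_ylabel('Year')
ax.tick_params(axis='x', labelrotation=90)
fig.tight_layout()
```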

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Plot a histogram (`plot.hist`) of the values of `first_appearance`


In [343]:
marvel_df['first_appearance'].plot.hist()
plt.show()
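`plot.hist` uses 10 equal-width bins by default. You can pass explicit bin edges through `bins`, here one bin per decade. This sketch uses the 18 `first_appearance` values left after dropping Namor and Hank Pym and setting Vision to 1964. `np.histogram` returns the same counts without drawing anything:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

years = pd.Series([1962, 1941, 1974, 1963, 1963, 1961, 1961, 1962, 1963, 1961,
                   1975, 1964, 1964, 1963, 1964, 1963, 1964, 1964],
                  name='first_appearance')

edges = np.arange(1930, 1990, 10)  # 1930, 1940, ..., 1980 -> five decade bins

fig, ax = plt.subplots()
years.plot.hist(bins=edges, edgecolor='black', ax=ax)
ax.set_xlabel('First appearance (year)')
ax.set_ylabel('Number of characters')

# Counts per decade, computed without plotting
counts, _ = np.histogram(years, bins=edges)
print(counts.tolist())  # [0, 1, 0, 15, 2]
```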

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
