# Thinking about the question at hand

#### EXERCISE:
Since you are given life expectancy data by country and year, you could ask how much the average life expectancy changes from one year to the next.

Before continuing, however, it's important to make sure that the following assumptions about the data are true:

* <code>'Life expectancy'</code> is the first column (index <code>0</code>) of the DataFrame.
* The other columns contain either null or numeric values.
* The numeric values are all greater than or equal to 0.
* There is only one instance of each country.

You can write a function and apply it over the entire DataFrame to verify some of these assumptions. The time spent writing such a script pays off when you work with other datasets as well.
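
To make the four assumptions concrete, here is a minimal sketch that reports which of them hold for a small made-up table. The helper name `summarize_assumptions`, its `key_col` parameter, and the toy data are all hypothetical and not part of the exercise; it is one possible way to generalize the checks, not the exercise's solution.

```python
import pandas as pd


def summarize_assumptions(df, key_col):
    """Return a dict of booleans, one per assumption (hypothetical helper)."""
    values = df.iloc[:, 1:]
    # errors="coerce" turns entries that are not numeric into NaN,
    # so any NaN that was not already missing marks a non-numeric entry.
    coerced = values.apply(pd.to_numeric, errors="coerce")
    return {
        "key_is_first_column": bool(df.columns[0] == key_col),
        "null_or_numeric": bool((coerced.isna() == values.isna()).all().all()),
        # Missing values are filled with 0 so they never count as negative.
        "non_negative": bool((coerced.fillna(0) >= 0).all().all()),
        "unique_keys": bool(df[key_col].is_unique),
    }


# Made-up data with three deliberate problems:
# a text entry, a negative value, and a repeated country.
toy = pd.DataFrame({
    "Life expectancy": ["Country A", "Country B", "Country B"],
    "1800": [35.2, None, 28.0],
    "1801": [35.4, "n/a", -1.0],
})

report = summarize_assumptions(toy, "Life expectancy")
print(report)
```

Returning a report instead of asserting lets you see every violated assumption at once, which can be handy when you first explore an unfamiliar dataset.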

#### INSTRUCTIONS:
* Define a function called <code>check_null_or_valid()</code> that takes in one argument: <code>row_data</code>.
* Inside the function, convert <code>no_na</code> to a numeric data type using <code>pd.to_numeric()</code>.
* Write an assert statement to make sure the first column (index <code>0</code>) of the <code>g1800s</code> DataFrame is <code>'Life expectancy'</code>.
* Write an assert statement to test that all the values are valid for the <code>g1800s</code> DataFrame. Use the <code>check_null_or_valid()</code> function placed inside the <code>.apply()</code> method for this. Note that because you're applying it over the entire DataFrame, and not just one column, you'll have to chain the <code>.all()</code> method twice, and remember that you don't have to use <code>()</code> for functions placed inside <code>.apply()</code>.
* Write an assert statement to make sure that each country occurs only once in the data. Use the <code>.value_counts()</code> method on the <code>'Life expectancy'</code> column for this. Because <code>.value_counts()</code> sorts its result in descending order by default, the entry at position <code>0</code> holds the count of the most frequently occurring value. If this is equal to <code>1</code> for the <code>'Life expectancy'</code> column, then you can be certain that no country appears more than once in the data.
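
The two less obvious points in these instructions, chaining <code>.all()</code> twice and reading position <code>0</code> of <code>.value_counts()</code>, can be sketched on a small made-up table. The toy data and the function `null_or_non_negative` are assumptions for illustration only; the function is a simplified stand-in for <code>check_null_or_valid()</code>, not the exercise's solution.

```python
import pandas as pd

# Made-up stand-in for the g1800s DataFrame
toy = pd.DataFrame({
    "Life expectancy": ["Country A", "Country B", "Country C"],
    "1800": [35.2, None, 28.0],
    "1801": [35.4, 30.1, 27.5],
})


def null_or_non_negative(row):
    """Mark each value in the row as valid if it is missing or >= 0."""
    return row.isna() | (row >= 0)


# The function is passed by name, without (), so .apply() can call it per row.
mask = toy.iloc[:, 1:].apply(null_or_non_negative, axis=1)

# mask is a DataFrame of booleans, so one .all() only reduces it
# to a Series with one boolean per column ...
per_column = mask.all()
# ... and the second .all() reduces that Series to a single boolean.
overall = per_column.all()

# value_counts() sorts counts in descending order by default,
# so position 0 holds the count of the most frequent country.
counts = toy["Life expectancy"].value_counts()
top_count = counts.iloc[0]

print(per_column)
print(overall, top_count)
```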

#### SCRIPT.PY:

import pandas as pd

g1800s = pd.read_csv("g1800.csv")


def check_null_or_valid(row_data):
    """Take a row of data, drop all missing values,
    and flag which remaining values are greater than or equal to 0.
    """
    no_na = row_data.dropna()
    numeric = pd.to_numeric(no_na)
    ge0 = numeric >= 0
    return ge0

# Check whether the first column is 'Life expectancy'
assert g1800s.columns[0] == "Life expectancy"

# Check whether the values in the row are valid
assert g1800s.iloc[:, 1:].apply(check_null_or_valid, axis=1).all().all()

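# Check that there is only one instance of each country
# (.iloc[0] selects the first count by position; recent pandas versions
# deprecate a bare [0] on a Series whose index holds labels such as country names)
assert g1800s['Life expectancy'].value_counts().iloc[0] == 1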
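
An assertion only tells you that a check failed, not which rows caused it. As a possible follow-up, the sketch below locates the offending rows in a small made-up table. The toy data and the variable names are hypothetical; with the real data you would run the same selections on <code>g1800s</code>.

```python
import pandas as pd

# Made-up stand-in for g1800s with two deliberate problems:
# a negative value and a repeated country.
toy = pd.DataFrame({
    "Life expectancy": ["Country A", "Country B", "Country B"],
    "1800": [35.2, None, 28.0],
    "1801": [35.4, 30.1, -1.0],
})

values = toy.iloc[:, 1:].apply(pd.to_numeric)

# Rows with at least one negative value (missing values compare as False)
negative_rows = toy[(values < 0).any(axis=1)]

# Every row whose country appears more than once
duplicate_rows = toy[toy.duplicated(subset="Life expectancy", keep=False)]

print(negative_rows)
print(duplicate_rows)
```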
