## In this task I read data from the file `data.csv` in the `data` folder, which contains two columns: `name` and `age`

In [14]:
import pandas as pd

In [39]:
data = pd.read_csv("./data/data.csv")

## There are potential cases that need proper handling, including:
- Missing values in either column.
- Wrong type of object in the column `age` (floating-point numbers, strings, ...)
- Illogical values in the column `age` (negative or too large)

To deal with missing values in either column, I drop the rows that contain them.

In [40]:
data = data.dropna(axis=0)
data

Unnamed: 0,name,age
0,John,25
1,Alice,30
2,Bob,22
3,Eva,28
4,Nguyen,abc
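The missing-value step above can be checked on a small frame. This is only a sketch: the rows in `sample` are made up for illustration and are not the contents of `data.csv`, and the `subset` argument is my addition to make the affected columns explicit.

```python
import pandas as pd

# Hypothetical rows for illustration; not taken from data.csv
sample = pd.DataFrame({"name": ["John", None, "Bob"], "age": [25, 30, None]})

# Count missing values per column before dropping anything
missing_per_column = sample.isna().sum()

# Drop every row that has a missing value in `name` or `age`
cleaned = sample.dropna(axis=0, subset=["name", "age"])
```

Counting the missing values first makes it easy to see how many rows the drop will remove.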


The values in `age` have to be of a proper type (in this case, an integer). I perform a coerce operation here, and rows with inappropriate values are removed.

In [41]:
# Keep only rows whose age can be parsed as a number
data = data[pd.to_numeric(data['age'], errors='coerce').notna()]

In [42]:
data = data.astype({'name': pd.StringDtype(), 'age': pd.Int16Dtype()})
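To illustrate what coercion does to each kind of bad value, here is a minimal sketch on made-up strings (the values in `raw_age` are assumptions, not rows from `data.csv`). It also shows one possible way to reject fractional ages before casting to `Int16`, which the cells above do not do.

```python
import pandas as pd

# Hypothetical sample values; not taken from data.csv
raw_age = pd.Series(["25", "30", "abc", "27.5", "-4"])

# Values that cannot be parsed become NaN instead of raising an error
numeric_age = pd.to_numeric(raw_age, errors="coerce")

# Keep only values that parsed and have no fractional part
is_whole = numeric_age.notna() & (numeric_age % 1 == 0)

# Whole-number floats can be cast safely to the nullable Int16 type
clean_age = numeric_age[is_whole].astype("Int16")
```

Here `"abc"` turns into `NaN` and `"27.5"` is filtered out by the whole-number check, while `"-4"` survives because it is still a valid integer; negative values are a separate concern.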

For illogical values of `age`, I opt to remove negative values.

In [43]:
data.dtypes

name    string[python]
age              Int16
dtype: object

In [44]:
# Keep only rows with a non-negative age
data = data[data['age'] >= 0]
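The filter above only handles negative ages. If values that are too large should also be removed, a range check is one option. The sketch below assumes an upper bound `MAX_AGE = 120`, which is a hypothetical choice of mine and not part of the original task, and uses made-up rows.

```python
import pandas as pd

MAX_AGE = 120  # hypothetical upper bound, not from the original notebook

# Hypothetical rows for illustration; not taken from data.csv
sample = pd.DataFrame(
    {"name": ["John", "Alice", "Bob"], "age": [25, -3, 400]}
).astype({"age": "Int16"})

# between() is inclusive on both ends by default
in_range = sample["age"].between(0, MAX_AGE)
plausible = sample[in_range]
```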

Then I calculate the average age of the individuals in the data.

In [47]:
data['age'].mean()

26.25
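The steps in this notebook can also be chained into one function. This is a sketch, not a definitive implementation: the name `average_age` is hypothetical, the input rows are rebuilt by hand from the table shown earlier, and it assumes every parseable age is a whole number.

```python
import pandas as pd


def average_age(frame: pd.DataFrame) -> float:
    """Hypothetical helper chaining the cleaning steps from this notebook."""
    # Drop rows with a missing name or age
    cleaned = frame.dropna(subset=["name", "age"])
    # Drop rows whose age cannot be parsed as a number
    numeric = pd.to_numeric(cleaned["age"], errors="coerce")
    cleaned = cleaned[numeric.notna()]
    # Cast to a nullable integer type (assumes whole-number ages)
    ages = pd.to_numeric(cleaned["age"]).astype("Int16")
    # Drop negative ages, then average
    ages = ages[ages >= 0]
    return float(ages.mean())


# Rows rebuilt by hand from the table displayed earlier in the notebook
rows = pd.DataFrame(
    {
        "name": ["John", "Alice", "Bob", "Eva", "Nguyen"],
        "age": ["25", "30", "22", "28", "abc"],
    }
)
result = average_age(rows)
```

On these rows the row with `abc` is dropped and the remaining four ages average to (25 + 30 + 22 + 28) / 4.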