# Johnny's Diaper Data 

In [174]:
import pandas as pd

First, I load the diaper data from a CSV file hosted on GitHub.

In [175]:
diaperdata = pd.read_csv('https://raw.githubusercontent.com/johnnymango/IS362stuff/master/diaperdata.csv')
diaperdata.head()

Unnamed: 0,Status,Diaper Leaked,Date Time,Age,Note
0,BM+Wet,No,11/29 13:30,2 months 20 days old,
1,Wet,No,11/29 10:00,2 months 20 days old,
2,Wet,No,11/29 08:00,2 months 20 days old,
3,BM+Wet,No,11/28 20:00,2 months 19 days old,
4,BM+Wet,No,11/28 17:00,2 months 19 days old,


The Note column is empty and will not be used, so it is dropped.

In [176]:
diaperdata = diaperdata.drop(['Note'], axis=1)
diaperdata.head()

Unnamed: 0,Status,Diaper Leaked,Date Time,Age
0,BM+Wet,No,11/29 13:30,2 months 20 days old
1,Wet,No,11/29 10:00,2 months 20 days old
2,Wet,No,11/29 08:00,2 months 20 days old
3,BM+Wet,No,11/28 20:00,2 months 19 days old
4,BM+Wet,No,11/28 17:00,2 months 19 days old
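
Before dropping a column like this, it can be worth confirming that it really holds no values. The following is a minimal sketch on a small hand-built frame; `sample` and its contents are hypothetical stand-ins, not the actual diaper data.

```python
import pandas as pd

# Hypothetical stand-in for the diaper data; the real frame comes from read_csv.
sample = pd.DataFrame({
    'Status': ['BM+Wet', 'Wet'],
    'Diaper Leaked': ['No', 'No'],
    'Note': [None, None],
})

# True only if every value in Note is missing.
note_is_empty = sample['Note'].isna().all()

# Drop every column that contains nothing but missing values.
cleaned = sample.dropna(axis=1, how='all')
print(note_is_empty, list(cleaned.columns))
```

Using `dropna(axis=1, how='all')` removes only columns that are entirely empty, so a column with even one real note would be kept.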


I'd like to perform calculations on Age, but it is stored as a string. I need to extract the months and days and convert them to numeric values. I start by extracting the month portion of the string.

In [177]:
diaperdata['age_month'] = diaperdata.Age.str.split(' month').str.get(0)

In [179]:
diaperdata.head()

Unnamed: 0,Status,Diaper Leaked,Date Time,Age,age_month
0,BM+Wet,No,11/29 13:30,2 months 20 days old,2
1,Wet,No,11/29 10:00,2 months 20 days old,2
2,Wet,No,11/29 08:00,2 months 20 days old,2
3,BM+Wet,No,11/28 20:00,2 months 19 days old,2
4,BM+Wet,No,11/28 17:00,2 months 19 days old,2


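This string manipulation mostly works, but the Age format is not uniform. Rows where the baby is less than a month old (for example "29 days old") contain no "month" text, so the split returns the whole string and leaves bad cells at the tail of my age_month column.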

In [180]:
diaperdata.tail(20)

Unnamed: 0,Status,Diaper Leaked,Date Time,Age,age_month
233,Wet,No,10/12 08:10,1 month 3 days old,1
234,BM+Wet,No,10/12 01:42,1 month 3 days old,1
235,BM+Wet,No,10/11 21:39,1 month 2 days old,1
236,Wet,No,10/11 18:30,1 month 2 days old,1
237,BM+Wet,No,10/11 17:15,1 month 2 days old,1
238,BM+Wet,No,10/11 16:50,1 month 2 days old,1
239,Wet,No,10/11 14:43,1 month 2 days old,1
240,BM+Wet,No,10/11 09:51,1 month 2 days old,1
241,BM+Wet,Yes,10/11 07:21,1 month 2 days old,1
242,BM+Wet,No,10/08 18:11,29 days old,29 days old
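
The bad cell in row 242 can be reproduced in isolation. This small sketch uses two Age strings in the formats shown above; the Series name `ages` is only for illustration.

```python
import pandas as pd

# Two Age strings in the formats seen in the data above.
ages = pd.Series(['1 month 2 days old', '29 days old'])

# When ' month' is absent, split returns a one-element list holding the
# whole string, so .str.get(0) yields the full text rather than a number.
first_piece = ages.str.split(' month').str.get(0)
print(first_piece.tolist())  # ['1', '29 days old']
```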


To fix the issue, I select the rows and update their values to zero.

In [182]:
# Label-based slice (both endpoints included) covering the rows under one month old
diaperdata.loc[242:253, ['age_month']] = 0
diaperdata.tail(20)

Unnamed: 0,Status,Diaper Leaked,Date Time,Age,age_month
233,Wet,No,10/12 08:10,1 month 3 days old,1
234,BM+Wet,No,10/12 01:42,1 month 3 days old,1
235,BM+Wet,No,10/11 21:39,1 month 2 days old,1
236,Wet,No,10/11 18:30,1 month 2 days old,1
237,BM+Wet,No,10/11 17:15,1 month 2 days old,1
238,BM+Wet,No,10/11 16:50,1 month 2 days old,1
239,Wet,No,10/11 14:43,1 month 2 days old,1
240,BM+Wet,No,10/11 09:51,1 month 2 days old,1
241,BM+Wet,Yes,10/11 07:21,1 month 2 days old,1
242,BM+Wet,No,10/08 18:11,29 days old,0
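
Hard-coded row labels stop working if the data is reloaded with more rows. One possible alternative, sketched here on a hypothetical three-row frame, is to select the affected rows with a boolean mask instead. The sketch writes the string '0' rather than the integer 0 so the column stays uniformly text until it is converted later.

```python
import pandas as pd

# Hypothetical sample in the same Age formats as the real data.
sample = pd.DataFrame({'Age': ['1 month 2 days old', '29 days old', '28 days old']})
sample['age_month'] = sample.Age.str.split(' month').str.get(0)

# Rows whose Age text never mentions a month are younger than one month.
under_one_month = ~sample.Age.str.contains('month')

# Overwrite only those rows; '0' keeps the column as text for a later astype(int).
sample.loc[under_one_month, 'age_month'] = '0'
print(sample['age_month'].tolist())  # ['1', '0', '0']
```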


I apply the same kind of string manipulation to extract the age in days into a new column. This mostly works too, but it also leaves bad cells in the column.

In [183]:
diaperdata['age_days'] = diaperdata.Age.str.split(' ').str.get(2)
diaperdata.tail(20)

Unnamed: 0,Status,Diaper Leaked,Date Time,Age,age_month,age_days
233,Wet,No,10/12 08:10,1 month 3 days old,1,3
234,BM+Wet,No,10/12 01:42,1 month 3 days old,1,3
235,BM+Wet,No,10/11 21:39,1 month 2 days old,1,2
236,Wet,No,10/11 18:30,1 month 2 days old,1,2
237,BM+Wet,No,10/11 17:15,1 month 2 days old,1,2
238,BM+Wet,No,10/11 16:50,1 month 2 days old,1,2
239,Wet,No,10/11 14:43,1 month 2 days old,1,2
240,BM+Wet,No,10/11 09:51,1 month 2 days old,1,2
241,BM+Wet,Yes,10/11 07:21,1 month 2 days old,1,2
242,BM+Wet,No,10/08 18:11,29 days old,0,old


The rows are updated to reflect the correct values.

In [185]:
# Overwrite the bad cells by row label (.loc slices include both endpoints)
diaperdata.loc[242:246, ['age_days']] = 29
diaperdata.loc[247:251, ['age_days']] = 28
diaperdata.loc[252, ['age_days']] = 27
diaperdata.loc[89:91, ['age_days']] = 0
diaperdata.tail(20)

Unnamed: 0,Status,Diaper Leaked,Date Time,Age,age_month,age_days
233,Wet,No,10/12 08:10,1 month 3 days old,1,3
234,BM+Wet,No,10/12 01:42,1 month 3 days old,1,3
235,BM+Wet,No,10/11 21:39,1 month 2 days old,1,2
236,Wet,No,10/11 18:30,1 month 2 days old,1,2
237,BM+Wet,No,10/11 17:15,1 month 2 days old,1,2
238,BM+Wet,No,10/11 16:50,1 month 2 days old,1,2
239,Wet,No,10/11 14:43,1 month 2 days old,1,2
240,BM+Wet,No,10/11 09:51,1 month 2 days old,1,2
241,BM+Wet,Yes,10/11 07:21,1 month 2 days old,1,2
242,BM+Wet,No,10/08 18:11,29 days old,0,29
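
The manual fixes for both columns could perhaps be avoided by pulling the numbers out with regular expressions. This is a sketch, not the method used in this notebook. It assumes the only Age formats are the ones shown above plus a month-only form such as "2 months old"; that last form is a guess and is not shown in the output above.

```python
import pandas as pd

# Sample Age strings; the last format ('2 months old') is an assumption.
ages = pd.Series(['2 months 20 days old', '1 month 3 days old',
                  '29 days old', '2 months old'])

# Capture the digits directly before 'month' / 'day'; no match gives NaN.
months = ages.str.extract(r'(\d+) month', expand=False)
days = ages.str.extract(r'(\d+) day', expand=False)

# A missing part means zero months or zero days.
parsed = pd.DataFrame({
    'age_month': months.fillna('0').astype(int),
    'age_days': days.fillna('0').astype(int),
})
print(parsed)
```

Because each pattern looks for its own unit, the result does not depend on the position of the number within the string, which is what broke the split-based approach.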


The Age column that held the original string is no longer needed, so I drop it.

In [186]:
diaperdata = diaperdata.drop(['Age'], axis=1)
diaperdata.head()

Unnamed: 0,Status,Diaper Leaked,Date Time,age_month,age_days
0,BM+Wet,No,11/29 13:30,2,20
1,Wet,No,11/29 10:00,2,20
2,Wet,No,11/29 08:00,2,20
3,BM+Wet,No,11/28 20:00,2,19
4,BM+Wet,No,11/28 17:00,2,19


To perform math on the age columns, I need to convert them to numeric data types.

In [187]:
diaperdata['age_month'] = diaperdata.age_month.astype(int)
diaperdata['age_days'] = diaperdata.age_days.astype(int)
diaperdata.dtypes

Status           object
Diaper Leaked    object
Date Time        object
age_month         int32
age_days          int32
dtype: object
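
Note that `astype(int)` raises an error if any bad cell is still present. A gentler check, sketched below on a hypothetical column, is `pd.to_numeric` with `errors='coerce'`, which marks unparseable cells as NaN so they can be inspected first.

```python
import pandas as pd

# Hypothetical column with one leftover bad cell.
raw = pd.Series(['2', '1', 'old'])

# errors='coerce' turns unparseable text into NaN instead of raising.
numeric = pd.to_numeric(raw, errors='coerce')

# Show the original text of every cell that failed to parse.
bad_rows = raw[numeric.isna()]
print(bad_rows.tolist())  # ['old']
```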