### We have sleep data and need to answer one question: if you have an exam tomorrow, should you get a good night's sleep? Would it matter?

In [1]:
!ls

BBC Results .csv                     README.md
Honing My Data Analysis Skills.ipynb


In [2]:
# Import pandas for data analysis and numpy for numerical analysis
import pandas as pd
import numpy as np

In [3]:
# Read the BBC sleep data into a dataframe
sleep_data = pd.read_csv('BBC Results .csv')
sleep_data

Unnamed: 0,Timestamp,How many hours did you sleep last night?,Recognition Score,Temporal Memory Score
0,12/11/2012 18:16:31,7.0,91,86
1,12/13/2012 14:31:16,6.5,95,78
2,12/13/2012 14:31:30,7.0,95,56
3,12/13/2012 14:32:01,5.0,91,81
4,12/13/2012 14:34:07,8.5,100,75
...,...,...,...,...
9151,3/31/2015 19:22:14,8.0,0,0
9152,1/11/2016 6:18:34,8.0,,
9153,5/20/2016 13:48:07,1.0,ee,ee
9154,9/17/2016 5:37:22,5.5,,


In [4]:
# Shape of the data
sleep_data.shape

# 9,156 rows/observations and 4 columns

(9156, 4)

Which of the following conclusions is most likely?

1. The less you sleep, the better your memory.

2. The more you sleep, the better your memory.

3. People always get the same memory score.

4. There is no relationship between sleep and memory.

In [5]:
# Info of the dataframe 
sleep_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9156 entries, 0 to 9155
Data columns (total 4 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   Timestamp                                 9156 non-null   object 
 1   How many hours did you sleep last night?  9154 non-null   float64
 2   Recognition Score                         9097 non-null   object 
 3   Temporal Memory Score                     9083 non-null   object 
dtypes: float64(1), object(3)
memory usage: 286.2+ KB


The Timestamp, Recognition Score and Temporal Memory Score columns are stored as objects. For efficient analysis, Timestamp has to be converted to a proper datetime format and the two score columns to a numeric type.
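
As a minimal sketch of the two conversions described above, the snippet below uses a small toy dataframe rather than the BBC file. The values and the explicit `format` string are assumptions for illustration; the real columns may contain entries that need more careful handling.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the BBC file)
toy = pd.DataFrame({
    "Timestamp": ["12/11/2012 18:16:31", "12/13/2012 14:31:16"],
    "Recognition Score": ["91", "95%"],
})

# Parse the timestamp strings into datetime values
# (the month/day/year format is assumed from the sample rows)
toy["Timestamp"] = pd.to_datetime(toy["Timestamp"], format="%m/%d/%Y %H:%M:%S")

# Strip a literal percent sign, then parse the strings as numbers
toy["Recognition Score"] = pd.to_numeric(
    toy["Recognition Score"].str.replace("%", "", regex=False)
)

print(toy.dtypes)
```

After this, the toy `Timestamp` column holds datetime values and the score column holds numbers, so both can be used for date arithmetic and summary statistics.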

In [6]:
# Convert the Timestamp to datetime
sleep_data['Timestamp'] = pd.to_datetime(sleep_data['Timestamp'])

In [7]:
sleep_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9156 entries, 0 to 9155
Data columns (total 4 columns):
 #   Column                                    Non-Null Count  Dtype         
---  ------                                    --------------  -----         
 0   Timestamp                                 9156 non-null   datetime64[ns]
 1   How many hours did you sleep last night?  9154 non-null   float64       
 2   Recognition Score                         9097 non-null   object        
 3   Temporal Memory Score                     9083 non-null   object        
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 286.2+ KB


In [8]:
# Checking for null values
sleep_data.isnull().sum()

Timestamp                                    0
How many hours did you sleep last night?     2
Recognition Score                           59
Temporal Memory Score                       73
dtype: int64

In [9]:
# Handle missing/null/NaN values and stray characters
# Some score values contain a '%' sign, which makes them difficult to convert to numbers
sleep_data['Recognition Score'] = sleep_data['Recognition Score'].astype(str).str.replace('%', '')

In [10]:
sleep_data['Temporal Memory Score'] = sleep_data['Temporal Memory Score'].astype(str).str.replace('%', '')

In [11]:
# Convert object to numeric; errors='coerce' turns values that cannot be parsed into NaN
sleep_data['Recognition Score'] = pd.to_numeric(sleep_data['Recognition Score'], errors='coerce')

In [12]:
sleep_data['Temporal Memory Score'] = pd.to_numeric(sleep_data['Temporal Memory Score'], errors='coerce')

In [13]:
sleep_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9156 entries, 0 to 9155
Data columns (total 4 columns):
 #   Column                                    Non-Null Count  Dtype         
---  ------                                    --------------  -----         
 0   Timestamp                                 9156 non-null   datetime64[ns]
 1   How many hours did you sleep last night?  9154 non-null   float64       
 2   Recognition Score                         9092 non-null   float64       
 3   Temporal Memory Score                     9077 non-null   float64       
dtypes: datetime64[ns](1), float64(3)
memory usage: 286.2 KB


In [14]:
# Checking for null values
sleep_data.isnull().sum()

Timestamp                                    0
How many hours did you sleep last night?     2
Recognition Score                           64
Temporal Memory Score                       79
dtype: int64
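
The null counts for the two score columns are higher than before the conversion (64 vs. 59 and 79 vs. 73). This is consistent with `errors='coerce'` replacing entries that cannot be parsed, such as the `ee` seen in row 9153, with NaN. The sketch below shows one way such entries could be listed before they are lost; it runs on a hypothetical toy column, not on the real data.

```python
import pandas as pd

# Toy column (hypothetical values) mixing valid scores, a percent sign,
# a missing value and a non-numeric entry
raw = pd.Series(["91", "95%", None, "ee", "100"])

# Same cleaning steps as in the notebook: drop '%', then coerce to numbers
cleaned = raw.str.replace("%", "", regex=False)
numeric = pd.to_numeric(cleaned, errors="coerce")

# Entries that were present in the raw data but could not be parsed
lost = raw[raw.notna() & numeric.isna()]
print(lost)
```

Inspecting `lost` before overwriting the column makes it easier to decide whether the coerced entries are genuine junk or values worth repairing by hand.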

In [15]:
sleep_data

Unnamed: 0,Timestamp,How many hours did you sleep last night?,Recognition Score,Temporal Memory Score
0,2012-12-11 18:16:31,7.0,91.0,86.0
1,2012-12-13 14:31:16,6.5,95.0,78.0
2,2012-12-13 14:31:30,7.0,95.0,56.0
3,2012-12-13 14:32:01,5.0,91.0,81.0
4,2012-12-13 14:34:07,8.5,100.0,75.0
...,...,...,...,...
9151,2015-03-31 19:22:14,8.0,0.0,0.0
9152,2016-01-11 06:18:34,8.0,,
9153,2016-05-20 13:48:07,1.0,,
9154,2016-09-17 05:37:22,5.5,,


In [16]:
sleep_data.describe()

Unnamed: 0,How many hours did you sleep last night?,Recognition Score,Temporal Memory Score
count,9154.0,9092.0,9077.0
mean,6.696308,92.174456,78.470823
std,2.27737,20.50372,15.473231
min,0.0,0.0,0.0
25%,6.0,87.0,70.0
50%,7.0,92.0,80.0
75%,8.0,100.0,87.0
max,23.5,200.0,789.0
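
The summary shows maximum scores of 200 and 789, which look implausible if the scores are percentages. As a rough sketch, and assuming valid scores lie between 0 and 100 (an assumption, not something the data source states here), out-of-range rows could be flagged like this on a toy dataframe with hypothetical rows:

```python
import pandas as pd

# Toy frame (hypothetical rows) with the same column names as sleep_data
toy = pd.DataFrame({
    "How many hours did you sleep last night?": [7.0, 23.5, 6.5],
    "Recognition Score": [91.0, 200.0, 95.0],
    "Temporal Memory Score": [86.0, 789.0, 78.0],
})

score_cols = ["Recognition Score", "Temporal Memory Score"]

# Assumption: valid scores are percentages between 0 and 100.
# Note that between() is False for NaN, so rows with a missing
# score are also excluded by this mask.
in_range = toy[score_cols].apply(lambda s: s.between(0, 100)).all(axis=1)

plausible = toy[in_range]
print(len(toy) - len(plausible), "row(s) outside the 0-100 range")
```

Whether to drop, cap, or keep such rows is a judgment call that depends on how the scores were recorded.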
