## Sleep Health and Lifestyle Dataset

Dataset Columns Description:

* Person ID: An identifier for each individual.

* Gender: The gender of the person (Male/Female).

* Age: The age of the person in years.

* Occupation: The occupation or profession of the person.

* Sleep Duration (hours): The number of hours the person sleeps per day.

* Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.

* Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.

* Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.

* BMI Category: The BMI category of the person (e.g., Underweight, Normal, Overweight).

* Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.

* Heart Rate (bpm): The resting heart rate of the person in beats per minute.

* Daily Steps: The number of steps the person takes per day.

* Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

* Details about Sleep Disorder Column:
    
  * None: The individual does not exhibit any specific sleep disorder.
    
  * Insomnia: The individual experiences difficulty falling asleep or staying asleep, leading to inadequate or poor-quality sleep.

  * Sleep Apnea: The individual suffers from pauses in breathing during sleep, resulting in disrupted sleep patterns and potential health risks.


In [59]:
import pandas as pd
import numpy as np

In [60]:
df=pd.read_csv("Sleep_health_and_lifestyle_dataset.csv")

In [61]:
df

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea


In [62]:
df.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [63]:
df.tail()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
373,374,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea


In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB


In [65]:
df.ndim

2

In [66]:
df.columns

Index(['Person ID', 'Gender', 'Age', 'Occupation', 'Sleep Duration',
       'Quality of Sleep', 'Physical Activity Level', 'Stress Level',
       'BMI Category', 'Blood Pressure', 'Heart Rate', 'Daily Steps',
       'Sleep Disorder'],
      dtype='object')

In [67]:
df.shape

(374, 13)

In [68]:
df.size

4862

In [69]:
df.nunique()

Person ID                  374
Gender                       2
Age                         31
Occupation                  11
Sleep Duration              27
Quality of Sleep             6
Physical Activity Level     16
Stress Level                 6
BMI Category                 4
Blood Pressure              25
Heart Rate                  19
Daily Steps                 20
Sleep Disorder               2
dtype: int64

In [70]:
df.isnull().sum()

Person ID                    0
Gender                       0
Age                          0
Occupation                   0
Sleep Duration               0
Quality of Sleep             0
Physical Activity Level      0
Stress Level                 0
BMI Category                 0
Blood Pressure               0
Heart Rate                   0
Daily Steps                  0
Sleep Disorder             219
dtype: int64
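The 219 missing values in Sleep Disorder (374 rows minus 155 non-null) appear to be the people with no disorder: the column description lists None as a category, yet only two distinct values are counted. A likely cause, stated here as an assumption, is that recent pandas versions treat the literal text None as a missing-value marker when reading a CSV. A minimal sketch of restoring an explicit label, using a small stand-in column rather than the real file:

```python
import numpy as np
import pandas as pd

# Small stand-in for the Sleep Disorder column; NaN marks "no disorder"
sleep = pd.DataFrame({"Sleep Disorder": [np.nan, "Sleep Apnea", np.nan, "Insomnia"]})

# Replace the missing entries with an explicit "None" label
sleep["Sleep Disorder"] = sleep["Sleep Disorder"].fillna("None")

print(sleep["Sleep Disorder"].value_counts())
```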

### description of the dataset

In [71]:
df.describe()

Unnamed: 0,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
count,374.0,374.0,374.0,374.0,374.0,374.0,374.0,374.0
mean,187.5,42.184492,7.132086,7.312834,59.171123,5.385027,70.165775,6816.84492
std,108.108742,8.673133,0.795657,1.196956,20.830804,1.774526,4.135676,1617.915679
min,1.0,27.0,5.8,4.0,30.0,3.0,65.0,3000.0
25%,94.25,35.25,6.4,6.0,45.0,4.0,68.0,5600.0
50%,187.5,43.0,7.2,7.0,60.0,5.0,70.0,7000.0
75%,280.75,50.0,7.8,8.0,75.0,7.0,72.0,8000.0
max,374.0,59.0,8.5,9.0,90.0,8.0,86.0,10000.0


In [72]:
df.describe(include='object')

Unnamed: 0,Gender,Occupation,BMI Category,Blood Pressure,Sleep Disorder
count,374,374,374,374,155
unique,2,11,4,25,2
top,Male,Nurse,Normal,130/85,Sleep Apnea
freq,189,73,195,99,78


In [73]:
df.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Person ID,374.0,,,,187.5,108.108742,1.0,94.25,187.5,280.75,374.0
Gender,374.0,2.0,Male,189.0,,,,,,,
Age,374.0,,,,42.184492,8.673133,27.0,35.25,43.0,50.0,59.0
Occupation,374.0,11.0,Nurse,73.0,,,,,,,
Sleep Duration,374.0,,,,7.132086,0.795657,5.8,6.4,7.2,7.8,8.5
Quality of Sleep,374.0,,,,7.312834,1.196956,4.0,6.0,7.0,8.0,9.0
Physical Activity Level,374.0,,,,59.171123,20.830804,30.0,45.0,60.0,75.0,90.0
Stress Level,374.0,,,,5.385027,1.774526,3.0,4.0,5.0,7.0,8.0
BMI Category,374.0,4.0,Normal,195.0,,,,,,,
Blood Pressure,374.0,25.0,130/85,99.0,,,,,,,


### checking duplicates

In [74]:
df.duplicated().sum()

0

In [75]:
df.duplicated(subset=['Person ID']).sum()

0
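Both checks return 0 because Person ID is unique on every row. The preview of the data nevertheless shows rows that match in every other column (for example Person ID 2 and 3). A minimal sketch of counting such repeats by leaving Person ID out of the comparison; the frame `people` is a stand-in with only a few of the columns:

```python
import pandas as pd

# Stand-in frame: the rows for Person ID 2 and 3 differ only in the ID
people = pd.DataFrame({
    "Person ID": [1, 2, 3],
    "Occupation": ["Software Engineer", "Doctor", "Doctor"],
    "Sleep Duration": [6.1, 6.2, 6.2],
})

# Compare on every column except the identifier
cols = [c for c in people.columns if c != "Person ID"]
repeats = people.duplicated(subset=cols).sum()

print(people.duplicated().sum())  # full-row check, IDs included
print(repeats)                    # check that ignores Person ID
```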

In [76]:
df.dtypes

Person ID                    int64
Gender                      object
Age                          int64
Occupation                  object
Sleep Duration             float64
Quality of Sleep             int64
Physical Activity Level      int64
Stress Level                 int64
BMI Category                object
Blood Pressure              object
Heart Rate                   int64
Daily Steps                  int64
Sleep Disorder              object
dtype: object
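Blood Pressure is listed as object because each value is a string such as 126/83, so it cannot be used numerically as-is. A minimal sketch of one way to split it into two integer columns, using a few values of the same shape; the names `Systolic` and `Diastolic` are illustrative choices, not columns of the dataset:

```python
import pandas as pd

# A few blood-pressure strings in the "systolic/diastolic" format
bp = pd.DataFrame({"Blood Pressure": ["126/83", "125/80", "140/90"]})

# Split on "/" into two columns, then convert the text to integers.
# "Systolic" and "Diastolic" are hypothetical names chosen for this sketch.
parts = bp["Blood Pressure"].str.split("/", expand=True).astype(int)
bp["Systolic"] = parts[0]
bp["Diastolic"] = parts[1]

print(bp)
```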

In [77]:
df['Occupation'].unique()

array(['Software Engineer', 'Doctor', 'Sales Representative', 'Teacher',
       'Nurse', 'Engineer', 'Accountant', 'Scientist', 'Lawyer',
       'Salesperson', 'Manager'], dtype=object)

In [78]:
doctors=df[df['Occupation']=='Doctor']
doctors

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
7,8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
8,9,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
9,10,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
266,267,Male,48,Doctor,7.3,7,65,5,Obese,142/92,83,3500,Insomnia
276,277,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
277,278,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
341,342,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,


In [79]:
Software_Engineer=df[df['Occupation']=='Software Engineer']
Software_Engineer

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
84,85,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,
92,93,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,


## concat()
* The concat() function in pandas concatenates (combines) two or more DataFrames or Series along a particular axis (row-wise or column-wise).
* It’s commonly used to stack data vertically (one DataFrame below another) or horizontally (one DataFrame beside another).
* When concatenating along rows, the DataFrames are stacked on top of each other. The columns should ideally be the same; if they are not, the default join='outer' keeps the union of the columns and fills the missing entries with NaN.
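The two directions described above can be sketched with a pair of tiny frames. The names `left` and `right` and their contents are made up for this illustration:

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
right = pd.DataFrame({"A": ["A2", "A3"], "C": ["C2", "C3"]})

# axis=0 (the default): stack rows; the columns become the union A, B, C
rows = pd.concat([left, right])

# axis=1: place the frames side by side, aligned on the row index
cols = pd.concat([left, right], axis=1)

print(rows.shape)              # 4 rows, 3 columns
print(cols.shape)              # 2 rows, 4 columns
print(rows["C"].isna().sum())  # the 2 rows from `left` have no C value
```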

In [80]:
df1=pd.concat([doctors,Software_Engineer])
df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
7,8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
8,9,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
9,10,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
342,343,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
84,85,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,
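When the pieces come from the same DataFrame, a single filter can select the same rows as filtering twice and concatenating. The difference is row order: concat() places all doctors first, while the filter keeps the original order. A sketch using `isin`, with a small stand-in frame named `jobs`:

```python
import pandas as pd

jobs = pd.DataFrame({
    "Person ID": [1, 2, 3, 4],
    "Occupation": ["Software Engineer", "Doctor", "Nurse", "Doctor"],
})

# Two filters stacked with concat(): doctors first, then software engineers
stacked = pd.concat([
    jobs[jobs["Occupation"] == "Doctor"],
    jobs[jobs["Occupation"] == "Software Engineer"],
])

# One filter with isin(): same rows, kept in their original order
selected = jobs[jobs["Occupation"].isin(["Doctor", "Software Engineer"])]

print(stacked.index.tolist())
print(selected.index.tolist())
```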


In [81]:
#### if the columns are not the same

In [82]:
# Sample DataFrame 1: columns A, B, C
data1 = {
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3']
}
df1 = pd.DataFrame(data1)

# Sample DataFrame 2: shares A and B with df1, but has D instead of C
data2 = {
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}
df2 = pd.DataFrame(data2)

In [90]:
df1


Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3


In [91]:
df2

Unnamed: 0,A,B,D
0,A4,B4,D4
1,A5,B5,D5
2,A6,B6,D6
3,A7,B7,D7


In [92]:
df3=pd.concat([df1,df2],axis=1)

In [93]:
df3

Unnamed: 0,A,B,C,A,B,D
0,A0,B0,C0,A4,B4,D4
1,A1,B1,C1,A5,B5,D5
2,A2,B2,C2,A6,B6,D6
3,A3,B3,C3,A7,B7,D7


### join='inner'

In [96]:
# Sample DataFrame 1
data1 = {
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3']
}
df11 = pd.DataFrame(data1)

# Sample DataFrame 2
data2 = {
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}
df22 = pd.DataFrame(data2)

In [97]:
df11

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
3,A3,B3,C3


In [98]:
df22

Unnamed: 0,A,B,D
0,A4,B4,D4
1,A5,B5,D5
2,A6,B6,D6
3,A7,B7,D7


In [99]:
# join='inner' keeps only the columns shared by both frames (A and B)
df33=pd.concat([df11,df22],join='inner').reset_index(drop=True)
df33

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3
4,A4,B4
5,A5,B5
6,A6,B6
7,A7,B7


In [100]:
# join='outer' (the default) keeps the union of the columns and fills gaps with NaN
df33=pd.concat([df11,df22],join='outer').reset_index(drop=True)
df33

Unnamed: 0,A,B,C,D
0,A0,B0,C0,
1,A1,B1,C1,
2,A2,B2,C2,
3,A3,B3,C3,
4,A4,B4,,D4
5,A5,B5,,D5
6,A6,B6,,D6
7,A7,B7,,D7
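The .reset_index(drop=True) call renumbers the rows after stacking. concat() also accepts ignore_index=True, which should give the same numbering in one step. A small sketch with two stand-in frames (`top` and `bottom` are names chosen for this example):

```python
import pandas as pd

top = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
bottom = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Without renumbering, each frame keeps its own 0..1 index
raw = pd.concat([top, bottom])

# Two ways to get a fresh 0..3 index
via_reset = pd.concat([top, bottom]).reset_index(drop=True)
via_flag = pd.concat([top, bottom], ignore_index=True)

print(raw.index.tolist())
print(via_flag.index.tolist())
```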


## merge()

In [101]:
# Sample DataFrames
df111 = pd.DataFrame({
    'Key': ['A', 'B', 'C'],
    'Value1': [1, 2, 3]
})

df222 = pd.DataFrame({
    'Key': ['B', 'C', 'D'],
    'Value2': [4, 5, 6]
})

In [102]:
df111

Unnamed: 0,Key,Value1
0,A,1
1,B,2
2,C,3


In [103]:
df222

Unnamed: 0,Key,Value2
0,B,4
1,C,5
2,D,6


In [108]:
df=pd.merge(df111,df222,on='Key',how="inner")
df

Unnamed: 0,Key,Value1,Value2
0,B,2,4
1,C,3,5


In [110]:
df1=pd.merge(df111,df222,on='Key',how="outer")
df1

Unnamed: 0,Key,Value1,Value2
0,A,1.0,
1,B,2.0,4.0
2,C,3.0,5.0
3,D,,6.0


In [112]:
df2=pd.merge(df111,df222,on='Key',how="right")
df2

Unnamed: 0,Key,Value1,Value2
0,B,2.0,4
1,C,3.0,5
2,D,,6


In [114]:
df3=pd.merge(df111,df222,on='Key',how="left")
df3

Unnamed: 0,Key,Value1,Value2
0,A,1,
1,B,2,4.0
2,C,3,5.0


### to save a dataset as a CSV file

In [115]:
df3.to_csv('key.csv')

In [116]:
pd.read_csv('key.csv')

Unnamed: 0.1,Unnamed: 0,Key,Value1,Value2
0,0,A,1,
1,1,B,2,4.0
2,2,C,3,5.0
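The extra Unnamed: 0 column appears because to_csv() writes the row index as the first column by default, and read_csv() then reads it back as ordinary data. A sketch of two ways around this, writing to an in-memory buffer instead of key.csv so that no file is created; `frame` is a stand-in DataFrame:

```python
import io
import pandas as pd

frame = pd.DataFrame({"Key": ["A", "B", "C"], "Value1": [1, 2, 3]})

# Option 1: do not write the index at all
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
no_index = pd.read_csv(buffer)

# Option 2: write the index, then tell read_csv which column holds it
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

print(no_index.columns.tolist())
print(with_index.columns.tolist())
```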
