# **Introduction**
This file presents an exploratory data analysis (EDA) of a dataset taken from **kaggle.com**.

Source data on kaggle.com: https://www.kaggle.com/datasets/adilshamim8/student-depression-dataset?resource=download \
Link to the dataset on Google Drive: https://drive.google.com/drive/folders/10_vT1Pt0w413sym9fv7I4KWLi_7YApd0?dmr=1&ec=wgc-drive-globalnav-goto

The dataset contains the results of a student survey on depression, collected as part of a study of factors affecting students' mental health. It is a table with 18 features:

1. id - record number;
2. Gender - gender;
3. Age - age;
4. City - city;
5. Profession - profession;
6. Academic Pressure - academic pressure;
7. Work Pressure - work pressure;
8. CGPA - cumulative grade point average;
9. Study Satisfaction - satisfaction with studies;
10. Job Satisfaction - satisfaction with job;
11. Sleep Duration - sleep duration;
12. Dietary Habits - dietary habits;
13. Degree - degree (qualification);
14. Have you ever had suicidal thoughts ? - whether the respondent has ever had suicidal thoughts;
15. Work/Study Hours - hours of work or study;
16. Financial Stress - financial stress;
17. Family History of Mental Illness - mental illness among relatives;
18. Depression - depression.
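
The feature list above can be turned into a quick structural check once the data is loaded. The sketch below is only an illustration: the names `EXPECTED_COLUMNS` and `compare_columns` are hypothetical helpers, not part of the original notebook, and the toy column list is made up to show the behaviour.

```python
# The 18 feature names exactly as they appear in the list above.
EXPECTED_COLUMNS = [
    "id", "Gender", "Age", "City", "Profession", "Academic Pressure",
    "Work Pressure", "CGPA", "Study Satisfaction", "Job Satisfaction",
    "Sleep Duration", "Dietary Habits", "Degree",
    "Have you ever had suicidal thoughts ?", "Work/Study Hours",
    "Financial Stress", "Family History of Mental Illness", "Depression",
]


def compare_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to `expected`."""
    actual = list(actual_columns)
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected


# Toy column list for illustration: one listed feature is absent
# and one extra column is present.
toy_columns = [name for name in EXPECTED_COLUMNS if name != "CGPA"] + ["Unnamed: 0"]
missing, unexpected = compare_columns(toy_columns)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a loaded DataFrame, the same helper could be called as `compare_columns(df.columns)`.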

# **Analysis goal**
1. Determine how the presence (or absence) of depression depends on features such as Academic Pressure, Work Pressure, CGPA, Sleep Duration, and Work/Study Hours, following the requirements listed below:
- assessment of structure;
- assessment of integrity and completeness;
- assessment of outliers and anomalies;
- data quality metrics;
- conclusions for each section.
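
As a rough illustration of what the structure, completeness, and outlier checks listed above might look like in Pandas, here is a minimal sketch. It assumes the common 1.5 × IQR rule for outliers, which is a choice made for this example and not necessarily the method used later in the analysis. The helper `quality_summary` and the toy data are hypothetical.

```python
import pandas as pd


def quality_summary(frame, numeric_cols):
    """Per-column dtype, share of missing values and IQR-based outlier count."""
    rows = []
    for col in frame.columns:
        series = frame[col]
        n_outliers = None  # only computed for the columns named in numeric_cols
        if col in numeric_cols:
            q1, q3 = series.quantile(0.25), series.quantile(0.75)
            iqr = q3 - q1
            # Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers.
            mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
            n_outliers = int(mask.sum())
        rows.append({
            "column": col,
            "dtype": str(series.dtype),
            "missing_share": float(series.isna().mean()),
            "iqr_outliers": n_outliers,
        })
    return pd.DataFrame(rows).set_index("column")


# Toy data invented for illustration; these values are not from the dataset.
toy = pd.DataFrame({
    "CGPA": [7.0, 7.5, 8.0, 8.5, 9.0, 0.5],
    "Sleep Duration": ["5-6 hours", None, "7-8 hours",
                       "7-8 hours", "5-6 hours", "5-6 hours"],
})
summary = quality_summary(toy, numeric_cols=["CGPA"])
print(summary)
```

Collecting the three checks in one table makes it easier to write the per-section conclusions, since each column's type, missing share, and outlier count sit side by side.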


# 1. Loading the dataset

The data must be imported from Google Drive so that it can be processed further with Pandas.

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

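# Google Drive file ID of the dataset CSV
file_id = "1qkzC64D8GnLRwpQFl6vhD1C7tLgM6LM6"
# Direct-download URL built from the file ID
file_url = f"https://drive.google.com/uc?id={file_id}"
# Read the CSV straight from the URL into a DataFrame
df = pd.read_csv(file_url)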

Let us display the first and last 10 rows of the dataset.

In [4]:
df.head(10)

Unnamed: 0,id,Gender,Age,City,Profession,Academic Pressure,Work Pressure,CGPA,Study Satisfaction,Job Satisfaction,Sleep Duration,Dietary Habits,Degree,Have you ever had suicidal thoughts ?,Work/Study Hours,Financial Stress,Family History of Mental Illness,Depression
0,2,Male,33.0,Visakhapatnam,Student,5.0,0.0,8.97,2.0,0.0,'5-6 hours',Healthy,B.Pharm,Yes,3.0,1.0,No,1
1,8,Female,24.0,Bangalore,Student,2.0,0.0,5.9,5.0,0.0,'5-6 hours',Moderate,BSc,No,3.0,2.0,Yes,0
2,26,Male,31.0,Srinagar,Student,3.0,0.0,7.03,5.0,0.0,'Less than 5 hours',Healthy,BA,No,9.0,1.0,Yes,0
3,30,Female,28.0,Varanasi,Student,3.0,0.0,5.59,2.0,0.0,'7-8 hours',Moderate,BCA,Yes,4.0,5.0,Yes,1
4,32,Female,25.0,Jaipur,Student,4.0,0.0,8.13,3.0,0.0,'5-6 hours',Moderate,M.Tech,Yes,1.0,1.0,No,0
5,33,Male,29.0,Pune,Student,2.0,0.0,5.7,3.0,0.0,'Less than 5 hours',Healthy,PhD,No,4.0,1.0,No,0
6,52,Male,30.0,Thane,Student,3.0,0.0,9.54,4.0,0.0,'7-8 hours',Healthy,BSc,No,1.0,2.0,No,0
7,56,Female,30.0,Chennai,Student,2.0,0.0,8.04,4.0,0.0,'Less than 5 hours',Unhealthy,'Class 12',No,0.0,1.0,Yes,0
8,59,Male,28.0,Nagpur,Student,3.0,0.0,9.79,1.0,0.0,'7-8 hours',Moderate,B.Ed,Yes,12.0,3.0,No,1
9,62,Male,31.0,Nashik,Student,2.0,0.0,8.38,3.0,0.0,'Less than 5 hours',Moderate,LLB,Yes,2.0,5.0,No,1
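
In the output above, values that contain spaces (for example in Sleep Duration and Degree) appear wrapped in single quotes. If those quotes are literally part of the stored strings, they would make category labels awkward to compare. The sketch below shows one possible way to strip them. It uses a small in-memory CSV with values copied from the rows above instead of the real file, and the variable names are illustrative only.

```python
import io

import pandas as pd

# Two rows copied from the head() output above (subset of columns).
sample_csv = io.StringIO(
    "id,Sleep Duration,Degree\n"
    "2,'5-6 hours',B.Pharm\n"
    "56,'Less than 5 hours','Class 12'\n"
)
# By default read_csv treats only double quotes as quoting characters,
# so the single quotes stay inside the values.
sample = pd.read_csv(sample_csv)

# Remove leading and trailing single quotes from the affected text columns.
for col in ["Sleep Duration", "Degree"]:
    sample[col] = sample[col].str.strip("'")

print(sample)
```

An alternative, under the same assumption, would be to pass `quotechar="'"` to `pd.read_csv` so that the quotes are removed while parsing.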


In [5]:
df.tail(10)

Unnamed: 0,id,Gender,Age,City,Profession,Academic Pressure,Work Pressure,CGPA,Study Satisfaction,Job Satisfaction,Sleep Duration,Dietary Habits,Degree,Have you ever had suicidal thoughts ?,Work/Study Hours,Financial Stress,Family History of Mental Illness,Depression
27891,140645,Female,28.0,Thane,Student,4.0,0.0,7.77,3.0,0.0,'Less than 5 hours',Unhealthy,MSc,No,2.0,5.0,No,1
27892,140669,Female,20.0,Indore,Student,3.0,0.0,7.72,5.0,0.0,'Less than 5 hours',Moderate,'Class 12',Yes,8.0,1.0,No,0
27893,140672,Female,24.0,Hyderabad,Student,3.0,0.0,6.02,2.0,0.0,'7-8 hours',Moderate,B.Arch,No,8.0,2.0,No,0
27894,140681,Male,23.0,Srinagar,Student,3.0,0.0,6.0,2.0,0.0,'More than 8 hours',Healthy,MBBS,Yes,12.0,4.0,No,0
27895,140684,Male,31.0,Lucknow,Student,2.0,0.0,7.27,5.0,0.0,'7-8 hours',Moderate,B.Com,Yes,6.0,1.0,Yes,0
27896,140685,Female,27.0,Surat,Student,5.0,0.0,5.75,5.0,0.0,'5-6 hours',Unhealthy,'Class 12',Yes,7.0,1.0,Yes,0
27897,140686,Male,27.0,Ludhiana,Student,2.0,0.0,9.4,3.0,0.0,'Less than 5 hours',Healthy,MSc,No,0.0,3.0,Yes,0
27898,140689,Male,31.0,Faridabad,Student,3.0,0.0,6.61,4.0,0.0,'5-6 hours',Unhealthy,MD,No,12.0,2.0,No,0
27899,140690,Female,18.0,Ludhiana,Student,5.0,0.0,6.88,2.0,0.0,'Less than 5 hours',Healthy,'Class 12',Yes,10.0,5.0,No,1
27900,140699,Male,27.0,Patna,Student,4.0,0.0,9.24,1.0,0.0,'Less than 5 hours',Healthy,BCA,Yes,2.0,3.0,Yes,1
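
In the 20 rows shown above, Profession is always Student, and Work Pressure and Job Satisfaction are always 0.0. Whether this holds for the whole dataset cannot be judged from these rows alone, so the following sketch shows one way to test for columns with no variation. The helper `constant_columns` is hypothetical, and the sample frame reuses three rows from the `tail()` output only to keep the example self-contained.

```python
import pandas as pd


def constant_columns(frame):
    """Return names of columns with at most one distinct non-missing value."""
    return [col for col in frame.columns if frame[col].nunique(dropna=True) <= 1]


# Three rows copied from the tail() output above (subset of columns).
sample = pd.DataFrame({
    "Profession": ["Student", "Student", "Student"],
    "Work Pressure": [0.0, 0.0, 0.0],
    "CGPA": [7.77, 7.72, 6.02],
    "Job Satisfaction": [0.0, 0.0, 0.0],
})
print(constant_columns(sample))
```

Applied to the full data as `constant_columns(df)`, a non-empty result would suggest that those features carry little or no information about the target.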
