<a href="https://colab.research.google.com/github/fortune-uwha/eating-disorder-dataset/blob/main/Eating_disorder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction

A common misconception holds that eating disorders are a lifestyle choice. In fact, they are serious and often fatal illnesses associated with severe disturbances in people’s eating behaviors and related thoughts and emotions. Preoccupation with food, body weight, and shape may also signal an eating disorder. Common eating disorders include anorexia nervosa, bulimia nervosa, and binge-eating disorder (source: **National Institute of Mental Health (NIMH)**).

**Data Source:** The data was obtained from Our World in Data, which sources it from the Institute for Health Metrics and Evaluation, Global Burden of Disease (2019).

In [1]:
import numpy as np
import pandas as pd
# from jupyter_datatables import init_datatables_mode

In [2]:
#init_datatables_mode()

In [3]:
eating_disorder_df = pd.read_csv('/content/prevalence-of-eating-disorders-by-age.csv')
eating_disorder_df.head()

Unnamed: 0,Entity,Code,Year,Prevalence - Eating disorders - Sex: Both - Age: 20 to 24 (Percent),Prevalence - Eating disorders - Sex: Both - Age: 10 to 14 (Percent),Prevalence - Eating disorders - Sex: Both - Age: All Ages (Percent),Prevalence - Eating disorders - Sex: Both - Age: 30 to 34 (Percent),Prevalence - Eating disorders - Sex: Both - Age: 25 to 29 (Percent),Prevalence - Eating disorders - Sex: Both - Age: 5-14 years (Percent),Prevalence - Eating disorders - Sex: Both - Age: 50-69 years (Percent),Prevalence - Eating disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Eating disorders - Sex: Both - Age: 70+ years (Percent),Prevalence - Eating disorders - Sex: Both - Age: 15 to 19 (Percent)
0,Afghanistan,AFG,1990,0.31,0.1,0.12,0.3,0.31,0.05,0,0.13,0,0.28
1,Afghanistan,AFG,1991,0.3,0.1,0.12,0.29,0.29,0.04,0,0.13,0,0.27
2,Afghanistan,AFG,1992,0.29,0.09,0.12,0.28,0.28,0.04,0,0.12,0,0.26
3,Afghanistan,AFG,1993,0.28,0.09,0.12,0.27,0.27,0.04,0,0.12,0,0.25
4,Afghanistan,AFG,1994,0.27,0.09,0.12,0.26,0.26,0.04,0,0.11,0,0.25


In [4]:
def clean_column_headers(df: pd.DataFrame) -> pd.DataFrame:
    """
    Cleans column headers by removing the string "Prevalence - Eating disorders - Sex: Both - Age: "
    from the beginning of each header and removing the string "(Percent)" from the end of each header.

    Args:
        df (pandas.DataFrame): The DataFrame to clean.

    Returns:
        pandas.DataFrame: The same DataFrame, with its column headers cleaned in place.
    """
    # Both patterns are literal text, so disable regex matching to avoid escaping issues
    df.columns = df.columns.str.replace("Prevalence - Eating disorders - Sex: Both - Age: ", "", regex=False)
    df.columns = df.columns.str.replace("(Percent)", "", regex=False)

    # Strip whitespace from column names
    df.columns = df.columns.str.strip()
    return df
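
The function above reassigns `df.columns`, so the original `eating_disorder_df` is modified as well. As a minimal sketch of a non-mutating alternative, the hypothetical helper `clean_headers_copy` below (not part of the original notebook) uses `DataFrame.rename` with a callable and returns a new DataFrame. The toy frame is illustrative only.

```python
import pandas as pd

PREFIX = "Prevalence - Eating disorders - Sex: Both - Age: "
SUFFIX = "(Percent)"


def clean_headers_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with the prefix and suffix stripped from its headers."""

    def _clean(name: str) -> str:
        # Plain string replacements (no regex), then trim leftover whitespace
        return name.replace(PREFIX, "").replace(SUFFIX, "").strip()

    # rename() with a callable returns a new object; the input is left untouched
    return df.rename(columns=_clean)


# Toy frame with one long header, mirroring the raw CSV layout
toy_df = pd.DataFrame({
    "Entity": ["Afghanistan"],
    "Year": [1990],
    PREFIX + "20 to 24 (Percent)": [0.31],
})
cleaned_toy_df = clean_headers_copy(toy_df)
print(list(cleaned_toy_df.columns))
```

Keeping the raw frame unchanged makes it easier to re-run later cells without reloading the CSV.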

In [5]:
cleaned_eating_disorder_df = clean_column_headers(eating_disorder_df)
cleaned_eating_disorder_df

Unnamed: 0,Entity,Code,Year,20 to 24,10 to 14,All Ages,30 to 34,25 to 29,5-14 years,50-69 years,Age-standardized,70+ years,15 to 19
0,Afghanistan,AFG,1990,0.31,0.10,0.12,0.30,0.31,0.05,0,0.13,0,0.28
1,Afghanistan,AFG,1991,0.30,0.10,0.12,0.29,0.29,0.04,0,0.13,0,0.27
2,Afghanistan,AFG,1992,0.29,0.09,0.12,0.28,0.28,0.04,0,0.12,0,0.26
3,Afghanistan,AFG,1993,0.28,0.09,0.12,0.27,0.27,0.04,0,0.12,0,0.25
4,Afghanistan,AFG,1994,0.27,0.09,0.12,0.26,0.26,0.04,0,0.11,0,0.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6835,Zimbabwe,ZWE,2015,0.25,0.07,0.11,0.23,0.24,0.04,0,0.10,0,0.20
6836,Zimbabwe,ZWE,2016,0.25,0.07,0.11,0.23,0.24,0.04,0,0.10,0,0.20
6837,Zimbabwe,ZWE,2017,0.25,0.08,0.11,0.23,0.24,0.04,0,0.10,0,0.20
6838,Zimbabwe,ZWE,2018,0.25,0.08,0.11,0.23,0.24,0.04,0,0.10,0,0.20


In [6]:
# Reshape from wide to long: one row per entity, year, and age range
transformed_eating_disorder_df = pd.melt(
    cleaned_eating_disorder_df,
    id_vars=["Entity", "Code", "Year"],
    value_vars=[
        "20 to 24", "10 to 14", "All Ages", "30 to 34", "25 to 29",
        "5-14 years", "50-69 years", "Age-standardized", "70+ years", "15 to 19",
    ],
    var_name="Age Range",
    value_name="Prevalence Range",
)

# Display the transformed DataFrame
transformed_eating_disorder_df

Unnamed: 0,Entity,Code,Year,Age Range,Prevalence Range
0,Afghanistan,AFG,1990,20 to 24,0.31
1,Afghanistan,AFG,1991,20 to 24,0.30
2,Afghanistan,AFG,1992,20 to 24,0.29
3,Afghanistan,AFG,1993,20 to 24,0.28
4,Afghanistan,AFG,1994,20 to 24,0.27
...,...,...,...,...,...
68395,Zimbabwe,ZWE,2015,15 to 19,0.20
68396,Zimbabwe,ZWE,2016,15 to 19,0.20
68397,Zimbabwe,ZWE,2017,15 to 19,0.20
68398,Zimbabwe,ZWE,2018,15 to 19,0.20
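
To make the reshaping step concrete, here is a small sketch of `melt` on a two-row, two-age-group slice of the Afghanistan values shown earlier. The names `wide_df` and `long_df` are illustrative and do not appear in the notebook. Each age-range column is stacked into rows, so the row count is multiplied by the number of melted columns.

```python
import pandas as pd

# Two years and two age groups taken from the Afghanistan rows above
wide_df = pd.DataFrame({
    "Entity": ["Afghanistan", "Afghanistan"],
    "Code": ["AFG", "AFG"],
    "Year": [1990, 1991],
    "20 to 24": [0.31, 0.30],
    "15 to 19": [0.28, 0.27],
})

# value_vars is omitted here, so every non-id column is melted
long_df = wide_df.melt(
    id_vars=["Entity", "Code", "Year"],
    var_name="Age Range",
    value_name="Prevalence Range",
)
print(long_df.shape)  # 2 rows x 2 age columns -> 4 rows, 5 columns
```

Listing `value_vars` explicitly, as the notebook does, is the safer choice when the frame may contain extra columns that should not be melted.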


In [None]:
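# Pivot back to wide format with one column per year
transformed_eating_disorder_df = transformed_eating_disorder_df.pivot(
    index=["Entity", "Code", "Age Range"], columns="Year", values="Prevalence Range"
)
# Reset the index so Entity, Code, and Age Range become regular columns
transformed_eating_disorder_df.reset_index(inplace=True)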

In [10]:
transformed_eating_disorder_df.head()

Year,Entity,Code,Age Range,1990,1991,1992,1993,1994,1995,1996,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,Afghanistan,AFG,10 to 14,0.1,0.1,0.09,0.09,0.09,0.09,0.09,...,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.1,0.1,0.1
1,Afghanistan,AFG,15 to 19,0.28,0.27,0.26,0.25,0.25,0.24,0.24,...,0.24,0.24,0.25,0.25,0.26,0.26,0.27,0.26,0.27,0.27
2,Afghanistan,AFG,20 to 24,0.31,0.3,0.29,0.28,0.27,0.26,0.26,...,0.27,0.27,0.28,0.28,0.29,0.29,0.29,0.29,0.3,0.3
3,Afghanistan,AFG,25 to 29,0.31,0.29,0.28,0.27,0.26,0.25,0.25,...,0.25,0.26,0.26,0.27,0.27,0.28,0.28,0.28,0.28,0.28
4,Afghanistan,AFG,30 to 34,0.3,0.29,0.28,0.27,0.26,0.25,0.24,...,0.24,0.25,0.25,0.26,0.26,0.27,0.27,0.27,0.28,0.27
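
The `Year` label in the header above is the name of the column axis left behind by `pivot`. The sketch below repeats the pivot on a four-row toy frame and clears that name. It assumes a pandas version in which `pivot` accepts a list for `index` (1.1 or later), and the variable names are illustrative.

```python
import pandas as pd

# Long-format toy data using values from the Afghanistan rows above
long_df = pd.DataFrame({
    "Entity": ["Afghanistan"] * 4,
    "Code": ["AFG"] * 4,
    "Year": [1990, 1991, 1990, 1991],
    "Age Range": ["20 to 24", "20 to 24", "15 to 19", "15 to 19"],
    "Prevalence Range": [0.31, 0.30, 0.28, 0.27],
})

wide_by_year = long_df.pivot(
    index=["Entity", "Code", "Age Range"],
    columns="Year",
    values="Prevalence Range",
).reset_index()

# Remove the leftover axis name so the header row shows only column labels
wide_by_year.columns.name = None
print(wide_by_year)
```

Note that `pivot` raises an error if any combination of index and column values occurs more than once, which acts as a useful check that each entity, age range, and year appears only once.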


### Adding population data 

The population data is sourced from the **World Bank** (databank.worldbank.org). It will be used to estimate the number of people who are clinically diagnosed with an eating disorder.
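
As a rough sketch of how such an estimate could be computed, the prevalence percentage can be multiplied by the total population and divided by 100. The frames and the `Population` and `Estimated Cases` column names below are assumptions made for illustration. The two numbers are the Afghanistan 1990 values that appear in this notebook (All Ages prevalence and total population).

```python
import pandas as pd

# All Ages prevalence (percent) for Afghanistan in 1990
prevalence = pd.DataFrame({
    "Entity": ["Afghanistan"],
    "Year": [1990],
    "All Ages": [0.12],
})

# Total population for the same country and year (hypothetical long layout)
population = pd.DataFrame({
    "Entity": ["Afghanistan"],
    "Year": [1990],
    "Population": [10694796.0],
})

# Inner merge keeps only entity/year pairs present in both tables
merged = prevalence.merge(population, on=["Entity", "Year"], how="inner")

# Prevalence is a percentage, so divide by 100 to get a headcount
merged["Estimated Cases"] = merged["Population"] * merged["All Ages"] / 100
print(merged)
```

For this row the estimate comes to roughly 12,800 people. The `All Ages` column is used because the population figure covers the whole population rather than a single age band.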



In [None]:
population_df = pd.read_excel(r'/content/population_data.xlsx')
population_df.tail(20)

Unnamed: 0,Unnamed: 1,1990,1999,2000,2012,Unnamed: 5,2013,2014,2015,2016,2017,2018,2019
247,Middle East & North Africa,256203998.0,314472124.0,321037453.0,414117603.0,,422790409.0,431664579.0,440506473.0,448917409.0,456885486.0,465073490.0,473201775.0
248,Middle East & North Africa (excluding high inc...,228846359.0,278356861.0,283899110.0,356239722.0,,363310225.0,370756356.0,378137339.0,385054538.0,391607422.0,398375344.0,405259403.0
249,Middle East & North Africa (IDA & IBRD countries),226868111.0,275508430.0,280976957.0,352259724.0,,359233517.0,366582958.0,373867247.0,380687450.0,387152617.0,393806257.0,400574097.0
250,Middle income,3944260596.0,4553317122.0,4617751027.0,5354644874.0,,5421202357.0,5487659096.0,5552696141.0,5616265003.0,5679375318.0,5740015870.0,5797827540.0
251,North America,277373464.0,309502571.0,312909974.0,348656682.0,,351207902.0,353888902.0,356507139.0,359245796.0,361731237.0,363967201.0,365995094.0
252,Not classified,..,..,..,..,,..,..,..,..,..,..,..
253,OECD members,1104791543.0,1191524686.0,1200179492.0,1305615104.0,,1314072465.0,1323005691.0,1331743060.0,1340532342.0,1348646085.0,1356241793.0,1362922537.0
254,Other small states,17806417.0,21041578.0,21437978.0,27617274.0,,28225516.0,28907278.0,29620768.0,30309723.0,30940039.0,31494217.0,32005203.0
255,Pacific island small states,1776243.0,2006341.0,2035672.0,2353058.0,,2379069.0,2405308.0,2431426.0,2457814.0,2484263.0,2510226.0,2536070.0
256,Post-demographic dividend,964048985.0,1015710581.0,1020793420.0,1082488958.0,,1087230525.0,1092179620.0,1097061415.0,1102020063.0,1106214534.0,1110126741.0,1113310738.0
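
The `Not classified` row above holds `..` placeholders instead of numbers, which would normally leave the affected columns stored as text. A minimal sketch of one way to handle this, assuming string column labels and illustrative variable names, is to coerce the values with `pd.to_numeric` so the placeholders become `NaN`.

```python
import pandas as pd

# Toy slice of the population table, including the ".." placeholders
raw_population = pd.DataFrame({
    "Unnamed: 1": ["North America", "Not classified"],
    "1990": ["277373464.0", ".."],
    "2019": [365995094.0, ".."],
})

# Use the name column as the index, then coerce every year column to numbers;
# anything that cannot be parsed (such as "..") becomes NaN
numeric_population = (
    raw_population
    .set_index("Unnamed: 1")
    .apply(pd.to_numeric, errors="coerce")
)
print(numeric_population.dtypes)
```

Passing `na_values=".."` to `pd.read_excel` when loading the file is another option that may achieve the same result at read time.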


#### Dropping the last row and the NaN column




In [None]:
# Drop the last row, then the empty "Unnamed: 5" column
population_df = population_df.drop(population_df.index[-1])
population_df = population_df.drop("Unnamed: 5", axis=1)

In [None]:
population_df

Unnamed: 0,Unnamed: 1,1990,1999,2000,2012,2013,2014,2015,2016,2017,2018,2019
0,Afghanistan,10694796.0,19262847.0,19542982.0,30466479.0,31541209.0,32716210.0,33753499.0,34636207.0,35643418.0,36686784.0,37769499.0
1,Albania,3286542.0,3108778.0,3089027.0,2900401.0,2895092.0,2889104.0,2880703.0,2876101.0,2873457.0,2866376.0,2854191.0
2,Algeria,25518074.0,30346083.0,30774621.0,37260563.0,38000626.0,38760168.0,39543154.0,40339329.0,41136546.0,41927007.0,42705368.0
3,American Samoa,47818.0,57594.0,58230.0,53691.0,52995.0,52217.0,51368.0,50448.0,49463.0,48424.0,47321.0
4,Andorra,53569.0,65655.0,66097.0,71013.0,71367.0,71621.0,71746.0,72540.0,73837.0,75013.0,76343.0
...,...,...,...,...,...,...,...,...,...,...,...,...
257,Pre-demographic dividend,417144782.0,537918075.0,553089230.0,784822780.0,808542070.0,832469778.0,856420253.0,880891473.0,905987912.0,931467165.0,957503246.0
258,Small states,25640496.0,29583318.0,30055791.0,37059328.0,37740469.0,38493630.0,39276796.0,40032809.0,40727936.0,41379093.0,41965375.0
259,South Asia,1141434379.0,1379867738.0,1406946728.0,1708114582.0,1731137145.0,1753568847.0,1775178483.0,1796850154.0,1818868706.0,1840534093.0,1861598514.0
260,South Asia (IDA & IBRD),1141434379.0,1379867738.0,1406946728.0,1708114582.0,1731137145.0,1753568847.0,1775178483.0,1796850154.0,1818868706.0,1840534093.0,1861598514.0
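
As an alternative sketch to dropping the empty column by name, `dropna` can remove any column that contains only `NaN`, and `rename` can give the name column the same `Entity` label used in the prevalence table. This assumes the `Unnamed: 5` column is entirely empty, as it appears in the rows shown. The toy frame and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy slice of the population table with the empty spacer column
toy_population = pd.DataFrame({
    "Unnamed: 1": ["Afghanistan", "Albania"],
    "2012": [30466479.0, 2900401.0],
    "Unnamed: 5": [np.nan, np.nan],
    "2013": [31541209.0, 2895092.0],
})

tidy_population = (
    toy_population
    .dropna(axis="columns", how="all")          # drop columns that are entirely NaN
    .rename(columns={"Unnamed: 1": "Entity"})   # match the prevalence table's key
)
print(list(tidy_population.columns))
```

A shared `Entity` column makes a later merge with the prevalence data more direct. Regional aggregates such as `South Asia` would still need to be checked against the entity names in that table.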
