# **Data Preprocessing**

In [None]:
import pandas as pd
import numpy as np

## Mental Health Practitioners Data

The dataset we are using is a crowdsourced list of non-judgemental mental health professionals in India, collected by iCALL. It is organised by state and city, and so far covers Delhi NCR, Mumbai, Chennai, Kolkata, Bangalore, Assam, Hyderabad, Punjab, Madhya Pradesh, Maharashtra, Meghalaya, Rajasthan and Tamil Nadu.


---



The sheets in the Excel workbook are named after the state or city that their entries belong to.

The list below contains the names of all the sheets in the file.

In [None]:
cities = ['Mumbai','Chhatisgarh','Delhi NCR','Goa','Chennai & T.N.','Kolkata & W.B.','Bangalore','Assam','Hyderabad','Kerala','Punjab','Madhya Pradesh','Maharashtra','Rajasthan','Gujarat','Uttarakhand','Andhra Pradesh','Jharkhand','Bihar']

The person submitting each professional's details was asked a number of questions, but we keep only the ones relevant to us, such as age group and gender identity.

The dictionary below decides which questions to keep and what to rename them to.

In [None]:
columns_to_keep = {'What is their age range?' : 'age',
                   'Which gender do they identify with?' : 'gender',
       'What is the title of the mental health professional that you are recommending? ' : 'title',
       'Which language(s) are they conversant in? ' : 'languages',
       'What are their professional qualifications?' : 'qualifications',
       'What is their contact number?' : 'contact',
       'What is the address of the clinic/ hospital/ organization where they practice?' : 'address'}

The function below does the following:
- Creates a dataframe for each sheet in the Excel file.
- Transposes it so that the questions/categories become columns and the professionals' names become rows.
- Removes all columns except the ones we intend to use.
- Renames those columns from full questions to more usable keywords.
- Deletes the rows that came from merged cells, where two or more people had suggested the same professional. This avoids duplicating that professional's details.
- At this stage the professional's name (originally a column header) is the dataframe index, so the function moves the names into a new 'name' column and resets the index to the usual integers.
- Adds a 'location' column (city/state) so that city-wise data can still be accessed after all the sheets are combined.
- Finally, concatenates all the city-wise dataframes into one dataframe holding the details of mental health professionals from across the country.

In [None]:
def create_prof_data(cities,columns):

  file_path = "/content/drive/MyDrive/Sem-3/Data Preprocessing/project/iCALL's crowdsourced list of Mental Health Professionals We Can Trust (23rd April 2021).xlsx"
  city_details = []

  for city in cities:

    # create dataframe of each city
    df_city = pd.read_excel(file_path, sheet_name = city, index_col = 0)

    # row column transpose
    df_city_T = df_city.T

    # delete all columns except required ones
    for col in df_city_T.columns:
      if col not in columns:
        del df_city_T[col]

    # rename kept columns
    df_city_T.rename(columns = columns, inplace = True)

    # remove rows which had originally been merged columns and hence renamed as 'Unnamed: 0/1/2...'
    for row in df_city_T.index:
      if row.startswith("Unnamed"):
        df_city_T.drop(row, inplace = True)

    # take the professional's name from index label and make it a new column 'name'
    df_city_T.reset_index(names = 'name', inplace = True)

    # add location as a column
    df_city_T['location'] = city

    city_details.append(df_city_T)

  df = pd.concat(city_details)
  return df
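
The per-sheet steps can be tried without the Excel file on a small in-memory table. This is only a sketch: the professionals, answers and the `toy_columns` mapping are made up for illustration, and the column selection and row filter are written in a vectorised style rather than with the loops used above.

```python
import pandas as pd

# Toy stand-in for one sheet: questions are rows, professionals are columns.
raw = pd.DataFrame(
    {
        "Dr A": ["30-39", "Male", "ignored answer"],
        "Unnamed: 2": [None, None, None],  # left over from a merged header cell
        "Dr B": ["40-49", "Female", "ignored answer"],
    },
    index=[
        "What is their age range?",
        "Which gender do they identify with?",
        "Some other question",
    ],
)
toy_columns = {
    "What is their age range?": "age",
    "Which gender do they identify with?": "gender",
}

sheet = raw.T                                                   # professionals become rows
sheet = sheet[[c for c in sheet.columns if c in toy_columns]]   # keep the wanted questions
sheet = sheet.rename(columns=toy_columns)                       # short column names
keep = ~sheet.index.astype(str).str.startswith("Unnamed")       # rows from merged cells
sheet = sheet[keep]
sheet = sheet.reset_index(names="name")                         # index label -> 'name' column
sheet["location"] = "Toy City"
print(sheet)
```

Note that `reset_index(names=...)` needs pandas 1.5 or newer.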

In [None]:
mental_health_prof_data = create_prof_data(cities, columns_to_keep)

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
0,Gaurav Kulkarni,30-39,Male,Psychiatrist,"English, Marathi",MBBS MD,9987545314,,Mumbai
1,Dr Bharat Shah,40-49,Male,Psychiatrist,"English, Hindi, Gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"English, Hindi and Marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai
3,Pushpa Venkatraman,40-49,Female,Counsellor/Psychotherapist,"English, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai
4,Dr Manjiri Deshpande Shenoy,30-39,Female,Psychiatrist,"English, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai
...,...,...,...,...,...,...,...,...,...
8,Rajul Jagdish.1,20-29,Female,Counsellor/Psychotherapist,"English, Hindi, Gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat
0,Dr. Akhil Chopra,30-39,Male,Consultant Psychiatrist,"English, Hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand
0,Dr. Y.V. Ramana,60-69,Male,Psychiatrist,"English, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh
0,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand


The combined dataframe still carries the index labels of the individual city-wise dataframes, so the same labels repeat across cities. We reset the index so that every row has a unique label for later access.

In [None]:
mental_health_prof_data.reset_index(drop = True, inplace = True)
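
The effect of the reset is easiest to see on two tiny dataframes (the names and cities here are placeholders). The last line shows an alternative, `ignore_index=True`, which renumbers during concatenation instead of afterwards.

```python
import pandas as pd

a = pd.DataFrame({"name": ["P1", "P2"], "location": "City X"})
b = pd.DataFrame({"name": ["P3"], "location": "City Y"})

combined = pd.concat([a, b])
print(list(combined.index))      # labels repeat: [0, 1, 0]

renumbered = combined.reset_index(drop=True)
print(list(renumbered.index))    # unique labels: [0, 1, 2]

# Alternative: let concat renumber the rows in one step
same = pd.concat([a, b], ignore_index=True)
```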

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
0,Gaurav Kulkarni,30-39,Male,Psychiatrist,"English, Marathi",MBBS MD,9987545314,,Mumbai
1,Dr Bharat Shah,40-49,Male,Psychiatrist,"English, Hindi, Gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"English, Hindi and Marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai
3,Pushpa Venkatraman,40-49,Female,Counsellor/Psychotherapist,"English, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai
4,Dr Manjiri Deshpande Shenoy,30-39,Female,Psychiatrist,"English, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai
...,...,...,...,...,...,...,...,...,...
407,Rajul Jagdish.1,20-29,Female,Counsellor/Psychotherapist,"English, Hindi, Gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat
408,Dr. Akhil Chopra,30-39,Male,Consultant Psychiatrist,"English, Hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand
409,Dr. Y.V. Ramana,60-69,Male,Psychiatrist,"English, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh
410,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand


In [None]:
mental_health_prof_data.isna().sum()

name               0
age                2
gender             9
title             20
languages         20
qualifications    56
contact           14
address            9
location           0
dtype: int64

As we can see below, the age or gender of some professionals was left unanswered. Instead of dropping rows with null values in either column, we keep them and suggest them to users who have no preference in these categories.

In [None]:
mental_health_prof_data[mental_health_prof_data.age.isna() | mental_health_prof_data.gender.isna()]

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
25,Samriti Makkar Midha,30-39,,Counsellor/Psychotherapist,"English, Hindi","M.Sc. Psychology (Clinical), B.A. Psychology (...",96198 97851,Lower Parel,Mumbai
29,Veena Fernandis,40-49,,Counsellor/Psychotherapist,Hindi english,,981-971-4640,"Near matralaya, south bombay",Mumbai
39,Tejaswi Shetty,20-29,,Counsellor/Psychotherapist,"English, Hindi",M.A in Psychology; International Diploma in Me...,9819610714,10 am to 8:30 pm,Mumbai
13,Manju,40-49,,Counsellor/Psychotherapist,English,,91 80 25298686,"Parivarthan, Indiranagar",Bangalore
25,Manju.1,40-49,,Counsellor/Psychotherapist,English,,80 25298686,"Parivarthan, Indiranagar",Bangalore
32,Mahima Mallya,,Female,Counsellor/Psychotherapist,"English, Hindi",,9742643792,"Tattva Counselling, #606, 80 Feet Rd, Koramang...",Bangalore
10,Varsha Vemula,,,Counsellor/Psychotherapist,,,9490708947,Pause for Perspective,Hyderabad
12,Dr. Mahima Sukhwal,30-39,,Clinical Psychologist,"Hindi, English","MPhil, PhD in Clinical Psychology",9490708947,"Pause for Perspective, Kundan Bagh, Methodist ...",Hyderabad
1,Seema Gaikwad,30-39,,Counsellor/Psychotherapist,"English, Hindi and Marathi",,9850401528,"Near Kamla Nehru Park, Bhandarkar Rd.",Maharashtra
2,Dr. Anamika Papriwal,Don't know,,Counsellor/Psychotherapist,English & Hindi,M.A. PHD. in Psychology,9783944484,"5/261 Malviya Nagar, Jaipur, Rajasthan",Rajasthan
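
One way the "no preference" idea could work later on is sketched below. `filter_by_preference` is a hypothetical helper, not part of this notebook, and the toy rows are invented.

```python
import numpy as np
import pandas as pd

def filter_by_preference(df, column, preference=None):
    """Hypothetical helper: keep rows matching `preference`.

    With no preference every row is returned, including rows where
    the value is missing.
    """
    if preference is None:
        return df
    return df[df[column] == preference]

toy = pd.DataFrame({
    "name": ["P1", "P2", "P3"],
    "gender": ["female", np.nan, "male"],
})

no_pref = filter_by_preference(toy, "gender")                # all three rows
female_only = filter_by_preference(toy, "gender", "female")  # only P1
```

Because a comparison with NaN is never true, rows with a missing value only show up when the user states no preference.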


As we can see here, some people did not know, or did not have, the contact details of the professionals they were suggesting. We will remove answers such as "don't know" or "don't have", as well as missing values.

We can find these values by checking where the answer starts with "don't" or is NaN.

In [None]:
mental_health_prof_data.contact.unique()

array([9987545314, 9821074495, 9987142188, 919869332886, 9820330802,
       9819733002, '98200 32178', 8082028808, 9833378163, 9860512131,
       '98213 68543', 9833533074, 9769267499, '25360935, 25450970',
       '98 67 399559', 9820323769, 8879746976, 2223755866, 912239539149,
       9930332514, 9833149322, 9833890973, 9833943934, 9920731799,
       919920049000, '96198 97851', '99 87 705530', 7303135717,
       9029011626, '981-971-4640', 9769239207, 9819154153, 919029077645,
       9930360163, 9930160068, 9869691723, 7666114082, 9987042534,
       9167972910, 9819610714, 7208109965, 9821425051, 9820969329,
       9821004720, 7506498345, 9819768243, 7303858435, nan, '7738166706',
       '99 20 370462', 9967345522, 8291709317, 7878388880, 9029069694,
       9820033095, 9029537954, 9833730555, 9920023841, 9137340466,
       '99207 95419', '022266009847 / 02226607217', 7738457481,
       7045664615, 8452960463, 9619046437, 8879565156, 9930746648,
       9920699868, '22 2867 9396', '022

In [None]:
mental_health_prof_data = mental_health_prof_data[~(mental_health_prof_data['contact'].str.lower().str.startswith("don't") | mental_health_prof_data['contact'].isna())]
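
The 'contact' column mixes integers and strings, and the `.str` methods return NaN for entries that are not strings. The sketch below, on an invented series, assumes the goal is to drop both placeholder answers and missing values; passing `na=False` makes `startswith` return a plain boolean mask so the two conditions combine predictably.

```python
import numpy as np
import pandas as pd

contact = pd.Series(
    [9987545314, "Don't know", np.nan, "98200 32178"], dtype=object
)

# Non-string entries would give NaN here; na=False turns them into False
is_placeholder = contact.str.lower().str.startswith("don't", na=False)
is_missing = contact.isna()

kept = contact[~(is_placeholder | is_missing)]
print(kept.tolist())
```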

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
0,Gaurav Kulkarni,30-39,Male,Psychiatrist,"English, Marathi",MBBS MD,9987545314,,Mumbai
1,Dr Bharat Shah,40-49,Male,Psychiatrist,"English, Hindi, Gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"English, Hindi and Marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai
3,Pushpa Venkatraman,40-49,Female,Counsellor/Psychotherapist,"English, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai
4,Dr Manjiri Deshpande Shenoy,30-39,Female,Psychiatrist,"English, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai
...,...,...,...,...,...,...,...,...,...
407,Rajul Jagdish.1,20-29,Female,Counsellor/Psychotherapist,"English, Hindi, Gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat
408,Dr. Akhil Chopra,30-39,Male,Consultant Psychiatrist,"English, Hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand
409,Dr. Y.V. Ramana,60-69,Male,Psychiatrist,"English, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh
410,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand


In [None]:
mental_health_prof_data.address.unique()

array([nan, 'Shushrut and Lilavati Hospitals',
       'INHS Asvini hospital. Colaba', 'Private clinic, mulund',
       'Indlas Child Guidance Clinic',
       'Nityanand Nursing Home & Hiranandani Hospital',
       'Nath hospital, Thane',
       'KLS, Vile Parle (W) & Asha Parekh, Santacruz (W)',
       'Chicken ghar kalyan west',
       'Brahman Seva Sangh, Borivali (E) & Harmony Training and Counseling Centre, Nerul',
       "Indla's Child Guidance Clinic (ICGC), Saki Naka, Andheri (East)",
       'Pali Naka Bandra (W) Mumbai-50',
       'National hospital. 3 petrol pump. Naupada. Thane',
       'Krizalyz : The Journey\n Within. Counselling and Mental Health Services -  301, Shiv Om CHS, Panch Marg, off Versova- Yari Road,  Andheri West, Mumbai--61.',
       'Institute for Exceptional Children, 2A Chandra Niwas, Church Road, Andheri East, Mumbai',
       'Bandra and ghatkopar',
       'Address:\n  Heart To Heart Counselling Centre\n  10 Jerbai Baug, Ambedkar Road,\n  Near Gloria Churc

Some rows may have neither a contact number nor an address. We cannot suggest such a professional to a user, so we remove these rows.

In [None]:
mental_health_prof_data[mental_health_prof_data.contact.isna() & mental_health_prof_data.address.isna()]

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
278,Pallavi Tomar,30-39,Female,Clinical Psychologist,English and Hindi,MPhil Clinical Psychology,,,Bangalore


In [None]:
mental_health_prof_data.dropna(subset=['contact', 'address'], how='all', inplace = True)
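
With `how='all'`, a row is dropped only when every column listed in `subset` is missing. A small invented example:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["P1", "P2", "P3"],
    "contact": ["98200 32178", np.nan, np.nan],
    "address": [np.nan, "Clinic Road", np.nan],
})

# P1 has a contact, P2 has an address, P3 has neither and is dropped
reachable = toy.dropna(subset=["contact", "address"], how="all")
print(reachable)
```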

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
0,Gaurav Kulkarni,30-39,Male,Psychiatrist,"English, Marathi",MBBS MD,9987545314,,Mumbai
1,Dr Bharat Shah,40-49,Male,Psychiatrist,"English, Hindi, Gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"English, Hindi and Marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai
3,Pushpa Venkatraman,40-49,Female,Counsellor/Psychotherapist,"English, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai
4,Dr Manjiri Deshpande Shenoy,30-39,Female,Psychiatrist,"English, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai
...,...,...,...,...,...,...,...,...,...
407,Rajul Jagdish.1,20-29,Female,Counsellor/Psychotherapist,"English, Hindi, Gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat
408,Dr. Akhil Chopra,30-39,Male,Consultant Psychiatrist,"English, Hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand
409,Dr. Y.V. Ramana,60-69,Male,Psychiatrist,"English, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh
410,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand


Again, some people did not know or were unsure of the professional's qualifications or title. We will replace answers such as "I don't know", "not sure" and "nil" with missing values.

In [None]:
mental_health_prof_data.qualifications.unique()

array(['MBBS MD', 'M.D.',
       'B.A. Psychology, M.A. Psychology and M.Phil. Clinical Psychology (RCI registered)',
       'Masters in psychology, EMDR trained therapist', 'MBBS, DNB, PDF',
       nan, 'M.A. counseling psychology', 'Pg psychology',
       'MA - Clinical Psychology', 'M.A. Clinical Psychology',
       'Masters in Medical and Psychiatric Social Work', 'Mbbs. Dpm',
       'MA in Medical and Psychiatric Social Work, TISS, PhD in Clinical Social Work (TISS)',
       'Masters in Psychology', 'MD diploma and UK qualifications', 'PhD',
       'Neuropsychiatrist',
       'Bachelors in Psychology, Masters in Counselling, Matrix Reimprinting Practitioner, Clinical Hypnotherapy, EFT Practitioner',
       'MBBS, MD Psychiatry', 'MA Counselling',
       'Both are Post Graduates in Psychology and Manali applied for PhD too.',
       'Ma clinical psychology',
       'Masters in Clinical Psychology + Advanced degree in REBT from Albert Ellis, NY',
       'M.Sc. Psychology (Clinical),

In [None]:
mental_health_prof_data[mental_health_prof_data.qualifications.isin(['not sure', "I don't know", "I don't know "])]

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
77,Aleka Kumar,40-49,Female,Counsellor/Psychotherapist,english,not sure,,mostly sees clients at her home office. she wi...,Mumbai
79,Tanvi Thakkar,20-29,Female,Counsellor/Psychotherapist,"English, Hindi",I don't know,98195 49088,Brainwaves Assessment & Therapy,Mumbai
80,Kalyani Sohoni,20-29,Female,Clinical Psychologist,Marathi. Hindi. English,I don't know,9029030054,Nirmal polyclinic. Bandra east.,Mumbai
360,Suniti Barbara,20-29,Female,Counsellor/Psychotherapist,English and Hindi,I don't know,8390025415,Iscah wellness,Maharashtra


In [None]:
mental_health_prof_data['qualifications'] = mental_health_prof_data['qualifications'].replace(['not sure', "I don't know", "I don't know "], np.nan)
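
Listing every spelling of a placeholder (including the one with a trailing space) can get fragile. A possible alternative, sketched on invented values, is to compare a stripped, lowercased copy of the column against a small set of placeholders; the `placeholders` set here is an assumption and would need to match the real answers.

```python
import numpy as np
import pandas as pd

placeholders = {"not sure", "i don't know", "nil"}  # assumed list, extend as needed
quals = pd.Series(["MBBS MD", "I don't know ", "Not sure", np.nan, "PhD"])

# Compare on a normalised copy, but keep the original text for real answers
normalised = quals.str.strip().str.lower()
cleaned = quals.mask(normalised.isin(placeholders))
print(cleaned)
```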

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
0,Gaurav Kulkarni,30-39,Male,Psychiatrist,"English, Marathi",MBBS MD,9987545314,,Mumbai
1,Dr Bharat Shah,40-49,Male,Psychiatrist,"English, Hindi, Gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"English, Hindi and Marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai
3,Pushpa Venkatraman,40-49,Female,Counsellor/Psychotherapist,"English, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai
4,Dr Manjiri Deshpande Shenoy,30-39,Female,Psychiatrist,"English, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai
...,...,...,...,...,...,...,...,...,...
407,Rajul Jagdish.1,20-29,Female,Counsellor/Psychotherapist,"English, Hindi, Gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat
408,Dr. Akhil Chopra,30-39,Male,Consultant Psychiatrist,"English, Hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand
409,Dr. Y.V. Ramana,60-69,Male,Psychiatrist,"English, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh
410,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand


In [None]:
mental_health_prof_data.isna().sum()

name               0
age                2
gender             9
title             20
languages         20
qualifications    60
contact           13
address            8
location           0
dtype: int64

In [None]:
mental_health_prof_data[mental_health_prof_data.qualifications.isin(['not sure', "I don't know", "I don't know "])]

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location


In [None]:
mental_health_prof_data.title.unique()

array(['Psychiatrist', 'Clinical Psychologist',
       'Counsellor/Psychotherapist',
       'Counseling Psychologist/Psychotherapist',
       'Creative art therapist', 'BHMS, CCAH', 'Counselling Psychologist',
       'Experiential Counsellor', 'Neuropsychologist',
       'Relationship, communication and empowerment Coach',
       'REBT, CBT, Play Therapist and Family Therapist', "I don't know",
       nan, 'psychiatrist', 'Art therapist',
       'Psychiatrist and Psychotherapist', 'Psychologist',
       'Counselling Psychologist/Therapist',
       'Arts-Based Therapy Practitioner',
       'Counseling Psychologist and Career coach',
       'Psychoanalytic Psychotherapist',
       'Clinical Psychologist and Psychotherapist',
       'Expressive and Creative Arts therapist', 'Peer supporter',
       'Alternative therapy', 'Rehabilitation Psychologist',
       'Cognitive Behavioural Therapy',
       'Secretary of Schizophrenia Awareness Association (SAA) working towards raising awareness fo

In [None]:
mental_health_prof_data[mental_health_prof_data.title == "I don't know"]

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location
131,Anjana Chabbria,30-39,Female,I don't know,English and Hindi,,8375837520,"Juhu, mumbai",Mumbai


In [None]:
mental_health_prof_data['title'] = mental_health_prof_data['title'].replace("I don't know", np.nan)

In [None]:
mental_health_prof_data.isna().sum()

name               0
age                2
gender             9
title             21
languages         20
qualifications    60
contact           13
address            8
location           0
dtype: int64

In [None]:
mental_health_prof_data[mental_health_prof_data.title == "I don't know"]

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location


In [None]:
mental_health_prof_data['age'] = mental_health_prof_data['age'].replace("Don't know", np.nan)

Next, we extract the languages spoken by the professionals and create a new column for each one.

In [None]:
mental_health_prof_data.languages = mental_health_prof_data.languages.str.lower()

In [None]:
mental_health_prof_data.languages.unique()

array(['english, marathi', 'english, hindi, gujarati',
       'english, hindi and marathi', 'english, hindi',
       'english, marathi, hindi', 'english, hindi, marathi',
       'english marathi hindi', 'english, hindi, marathi, bengali',
       'marathi, hindi. english', 'english, hindi, punjabi, marathi',
       'hindi english marathi gujarati',
       'hindi, english, marathi, maybe punjabi', 'hindi, english',
       'english, hindi, bengali', 'hindi, english and marathi',
       'english, hindi, malayalam',
       'english, marathi, hindi (all fluently)',
       'english, hindi, gujrati, marathi, french', 'english and hindi',
       'english,hindi,marathi', 'hindi marathi english', 'hindi english',
       'english, hindi, marathi, gujarati', 'english hindi',
       'english hindi marathi gujarathi',
       'english, hindi, little in marathi and gujarathi',
       'english/hindi/gujarathi', 'english , hindi , marathi',
       'english, maybe hindi/marathi', 'english, hindi, gujrati'

In [None]:
languages = ['english','hindi','punjabi','telegu','tamil','malayalam','kannada','gujarati','sindhi','bengali','marathi','marwari','rajasthani','urdu','konkani','assamese','oriya','spanish','french']

In [None]:
mental_health_prof_data.reset_index(drop = True, inplace = True)

In [None]:
for language in languages:
  mental_health_prof_data[language] = mental_health_prof_data['languages'].apply(lambda x: 1 if isinstance(x, str) and language in x else 0)
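
The same indicator columns can also be built with `str.contains`, shown here on an invented frame. Like the loop above, this is plain substring matching, so a language counts as spoken whenever its name appears anywhere in the free-text answer.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "languages": ["english, hindi", "hindi marathi english", np.nan]
})

for language in ["english", "hindi", "marathi"]:
    # regex=False: literal substring match; na=False: missing answers count as 0
    toy[language] = (
        toy["languages"].str.contains(language, regex=False, na=False).astype(int)
    )

print(toy)
```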

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location,english,...,bengali,marathi,marwari,rajasthani,urdu,konkani,assamese,oriya,spanish,french
0,Gaurav Kulkarni,30-39,Male,Psychiatrist,"english, marathi",MBBS MD,9987545314,,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
1,Dr Bharat Shah,40-49,Male,Psychiatrist,"english, hindi, gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai,1,...,0,0,0,0,0,0,0,0,0,0
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"english, hindi and marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
3,Pushpa Venkatraman,40-49,Female,Counsellor/Psychotherapist,"english, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai,1,...,0,0,0,0,0,0,0,0,0,0
4,Dr Manjiri Deshpande Shenoy,30-39,Female,Psychiatrist,"english, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
404,Rajul Jagdish.1,20-29,Female,Counsellor/Psychotherapist,"english, hindi, gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat,1,...,0,0,0,0,0,0,0,0,0,0
405,Dr. Akhil Chopra,30-39,Male,Consultant Psychiatrist,"english, hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand,1,...,0,0,0,0,0,0,0,0,0,0
406,Dr. Y.V. Ramana,60-69,Male,Psychiatrist,"english, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh,1,...,0,0,0,0,0,0,0,0,0,0
407,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand,0,...,1,0,0,0,0,0,0,0,0,0


As we can see, some people spelled the languages differently. To avoid losing this information, we map these alternative spellings to the columns we have already created.

In [None]:
for i in mental_health_prof_data.index:
  languages_value = mental_health_prof_data.loc[i, 'languages']
  if pd.notna(languages_value) and isinstance(languages_value, str):
    if 'eng' in languages_value:
      mental_health_prof_data.loc[i, 'english'] = 1
    if 'gujrati' in languages_value or 'gujarathi' in languages_value:
      mental_health_prof_data.loc[i, 'gujarati'] = 1
    if 'telgu' in languages_value or 'telugu' in languages_value:
      mental_health_prof_data.loc[i, 'telegu'] = 1
    if 'malyalam' in languages_value:
      mental_health_prof_data.loc[i, 'malayalam'] = 1
    if 'marati' in languages_value:
      mental_health_prof_data.loc[i, 'marathi'] = 1
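
If more spellings turn up, the chain of `if` statements could be replaced by a lookup table. The sketch below uses the same spellings as the loop above, applied to an invented frame; the `aliases` dictionary is an illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Column name -> alternative spellings seen in the answers
aliases = {
    "english": ["eng"],
    "gujarati": ["gujrati", "gujarathi"],
    "telegu": ["telgu", "telugu"],
    "malayalam": ["malyalam"],
    "marathi": ["marati"],
}

toy = pd.DataFrame({
    "languages": ["hindi, eng, bengali", "english, telugu", np.nan]
})

for column, spellings in aliases.items():
    if column not in toy:
        toy[column] = 0                    # start with "not spoken"
    for spelling in spellings:
        hit = toy["languages"].str.contains(spelling, regex=False, na=False)
        toy.loc[hit, column] = 1           # only ever switch 0 -> 1

print(toy)
```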

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location,english,...,bengali,marathi,marwari,rajasthani,urdu,konkani,assamese,oriya,spanish,french
0,Gaurav Kulkarni,30-39,male,Psychiatrist,"english, marathi",MBBS MD,9987545314,,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
1,Dr Bharat Shah,40-49,male,Psychiatrist,"english, hindi, gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai,1,...,0,0,0,0,0,0,0,0,0,0
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"english, hindi and marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
3,Pushpa Venkatraman,40-49,female,Counsellor/Psychotherapist,"english, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai,1,...,0,0,0,0,0,0,0,0,0,0
4,Dr Manjiri Deshpande Shenoy,30-39,female,Psychiatrist,"english, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
404,Rajul Jagdish.1,20-29,female,Counsellor/Psychotherapist,"english, hindi, gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat,1,...,0,0,0,0,0,0,0,0,0,0
405,Dr. Akhil Chopra,30-39,male,Consultant Psychiatrist,"english, hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand,1,...,0,0,0,0,0,0,0,0,0,0
406,Dr. Y.V. Ramana,60-69,male,Psychiatrist,"english, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh,1,...,0,0,0,0,0,0,0,0,0,0
407,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand,1,...,1,0,0,0,0,0,0,0,0,0


Lastly, we convert the 'gender' column to lowercase and then map its values to 'male', 'female' or 'other' so that they are easier to work with.

In [None]:
mental_health_prof_data.gender = mental_health_prof_data.gender.str.lower()

In [None]:
mental_health_prof_data.gender.unique()

array(['male', 'female', nan, 'both', 'cis-female', 'she/her/female ',
       'woman',
       'sunil is male. but caters equally well to patients from all 3 genders.',
       'she/her', 'i have not asked them this question.', 'man',
       'male & female', 'she/ her', 'female ', 'female.',
       'female assigned at birth, genderfluid', 'gender fluid',
       'female, demi/bisexual', 'both male and female', 'all'],
      dtype=object)
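
When mapping these answers with substring checks, the order of the checks matters: 'female' contains 'male' and 'woman' contains 'man'. The function below is a hypothetical helper that illustrates one safe ordering on a few of the answers listed above; it returns `None` for the "not asked" answer, which pandas treats as missing.

```python
def normalise_gender(value):
    """Hypothetical helper: map a free-text answer to 'male', 'female' or 'other'."""
    if not isinstance(value, str):
        return value                       # leave NaN untouched
    text = value.strip().lower()
    if text == "i have not asked them this question.":
        return None
    if "&" in text or "and" in text or "both" in text:
        return "other"
    # Test the longer words first: 'female' contains 'male', 'woman' contains 'man'
    if "female" in text or "woman" in text or "she" in text:
        return "female"
    if "male" in text or "man" in text:
        return "male"
    return "other"

print("male" in "female", "man" in "woman")  # True True
for answer in ["Female", "woman", "man", "male & female", "gender fluid", "she/ her"]:
    print(answer, "->", normalise_gender(answer))
```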

In [None]:
for i in mental_health_prof_data.index:
  gender = mental_health_prof_data.loc[i,'gender']
  if pd.notna(gender):
    if '&' in gender or 'and' in gender or 'both' in gender:
      mental_health_prof_data.loc[i,'gender'] = 'other'
    # check 'female'/'woman' before 'male'/'man', because
    # 'female' contains 'male' and 'woman' contains 'man'
    elif 'female' in gender or 'she' in gender or 'woman' in gender:
      mental_health_prof_data.loc[i,'gender'] = 'female'
    elif 'male' in gender or 'man' in gender:
      mental_health_prof_data.loc[i,'gender'] = 'male'
    elif gender == 'i have not asked them this question.':
      mental_health_prof_data.loc[i,'gender'] = np.nan
    else:
      mental_health_prof_data.loc[i,'gender'] = 'other'

In [None]:
mental_health_prof_data.gender.unique()

array(['male', 'female', nan, 'other'], dtype=object)

In [None]:
mental_health_prof_data

Unnamed: 0,name,age,gender,title,languages,qualifications,contact,address,location,english,...,bengali,marathi,marwari,rajasthani,urdu,konkani,assamese,oriya,spanish,french
0,Gaurav Kulkarni,30-39,male,Psychiatrist,"english, marathi",MBBS MD,9987545314,,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
1,Dr Bharat Shah,40-49,male,Psychiatrist,"english, hindi, gujarati",M.D.,9821074495,Shushrut and Lilavati Hospitals,Mumbai,1,...,0,0,0,0,0,0,0,0,0,0
2,Manik Bhadkamkar,20-29,female,Clinical Psychologist,"english, hindi and marathi","B.A. Psychology, M.A. Psychology and M.Phil. C...",9987142188,INHS Asvini hospital. Colaba,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
3,Pushpa Venkatraman,40-49,female,Counsellor/Psychotherapist,"english, hindi","Masters in psychology, EMDR trained therapist",919869332886,"Private clinic, mulund",Mumbai,1,...,0,0,0,0,0,0,0,0,0,0
4,Dr Manjiri Deshpande Shenoy,30-39,female,Psychiatrist,"english, marathi, hindi","MBBS, DNB, PDF",9820330802,Indlas Child Guidance Clinic,Mumbai,1,...,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
404,Rajul Jagdish.1,20-29,female,Counsellor/Psychotherapist,"english, hindi, gujarati",M.Sc. in Psychological Counseling,8431455791,"Room, The Mindcare Space : 401, Shikhavali Apa...",Gujarat,1,...,0,0,0,0,0,0,0,0,0,0
405,Dr. Akhil Chopra,30-39,male,Consultant Psychiatrist,"english, hindi","MBBS, MD Psychiatry",7060967850,Mind Heal - A Neuro-Psychiatry & De-addiction ...,Uttarakhand,1,...,0,0,0,0,0,0,0,0,0,0
406,Dr. Y.V. Ramana,60-69,male,Psychiatrist,"english, telugu",,0863 225 6353,"19, door no# 12 opp bajaj service center, 89, ...",Andhra Pradesh,1,...,0,0,0,0,0,0,0,0,0,0
407,Dr Dipak Giri,40-49,male,Psychiatrist,"hindi, eng, bengali",Md Psychiatry,93342 38982,"Sadar Hospital, Jamshedpur",Jharkhand,1,...,1,0,0,0,0,0,0,0,0,0


We can now save this preprocessed dataset as a CSV file.

In [None]:
mental_health_prof_data.to_csv('preprocessed data of mental health practitioners.csv')
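
By default `to_csv` also writes the row index as an unnamed first column, which comes back as 'Unnamed: 0' when the file is read again. If that column is not wanted, `index=False` leaves it out. The round trip below uses an in-memory buffer and invented rows instead of a real file.

```python
import io
import pandas as pd

toy = pd.DataFrame({"name": ["P1", "P2"], "location": ["City X", "City Y"]})

# Default: the index is written as an extra, unnamed column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```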

## Google Form Data

We created this dataset ourselves by circulating a Google Form that asked about people's music listening habits and self-reported mental health problems.



---



In [None]:
df = pd.read_excel('/content/drive/MyDrive/Sem-3/Data Preprocessing/project/Survey on Effects of Music on Mental Health (Responses).xlsx')

In [None]:
df

Unnamed: 0,Timestamp,What is your age?,What is your primary music streaming platform?,How many hours a day (approximately) do you spend listening to music?,Do you listen to music while working?,What is your favorite genre of music?,Do you frequently explore different genres of music?,What other genres do you listen to? \n(You can select multiple genres. Kindly avoid selecting the one you answered as your favorite),Which other languages do you listen to songs in? \n(Kindly skip the question if you don't listen to songs in languages other than Hindi or English),"On a scale from 0 to 10, how would you rate the severity of your anxiety?","On a scale from 0 to 10, how would you rate the severity of your depression?","On a scale from 0 to 10, how would you rate the severity of your insomnia?","On a scale from 0 to 10, how would you rate the severity of your OCD (Obsessive-Compulsive Disorder) symptoms?",How does music affect your mental health?,"We appreciate your participation in this survey. Your responses are valuable for our research. Rest assured, we have not collected any personally identifiable information, such as your name or email address. Your privacy is important to us, and your data will be used solely for research purposes."
0,2023-09-02 20:21:53.727,18,Spotify,3.0,Yes,Pop,Sometimes,"Pop, Bollywood, Hip-Hop/Rap, R&B (Rhythm and B...",Punjabi,2,0,1,3.0,Improves,I understand.
1,2023-09-02 20:22:14.115,19,Apple Music,3.0,Yes,Hip-Hop/Rap,Sometimes,"Pop, Bollywood, Electronic/Dance",Punjabi,0,0,0,0.0,Improves,I understand.
2,2023-09-02 20:23:07.010,18,Spotify,1.0,No,Pop,Sometimes,"Bollywood, Classical",,4,1,1,6.0,No effect,I understand.
3,2023-09-02 20:23:17.846,19,Spotify,4.0,No,R&B (Rhythm and Blues),Yes,"Pop, Bollywood, Electronic/Dance, Classical","Punjabi, Korean",7,5,6,3.0,Improves,I understand.
4,2023-09-02 20:23:22.521,18,Apple Music,11.0,Yes,Bollywood,Yes,"Pop, Hip-Hop/Rap, Rock, Folk","Punjabi, Spanish, Korean",7,1,5,8.0,Improves,I understand.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
500,2023-09-08 15:52:56.863,52,Spotify,5.0,Yes,Bollywood,Yes,Bollywood,Spanish,5,5,5,5.0,Improves,I understand.
501,2023-09-08 23:50:16.708,21,Spotify,1.5,Yes,Electronic/Dance,Yes,"Pop, Hip-Hop/Rap, Electronic/Dance",,2,0,0,3.0,Improves,I understand.
502,2023-09-09 12:41:42.699,48,YouTube,0.5,No,Bollywood,No,"Classical, Devotional, Folk, Instrumental","Punjabi, Tamil",5,0,0,0.0,Improves,I understand.
503,2023-09-09 19:57:28.071,49,YouTube,0.5,No,Bollywood,Yes,"Hip-Hop/Rap, Classical, Devotional",,1,0,0,0.0,Improves,I understand.


First, we rename the columns from the full question text to shorter names that are easier to work with.

In [None]:
df.rename(columns = {'What is your age?':'age',
       'What is your primary music streaming platform?':'primary_streaming_platform',
       'How many hours a day (approximately) do you spend listening to music?':'hours_spent',
       'Do you listen to music while working?':'music_while_working',
       'What is your favorite genre of music?':'favorite_genre',
       'Do you frequently explore different genres of music?':'explore_genres',
       "What other genres do you listen to? \n(You can select multiple genres. Kindly avoid selecting the one you answered as your favorite)":'other_genres',
       "Which other languages do you listen to songs in? \n(Kindly skip the question if you don't listen to songs in languages other than Hindi or English)":'other_languages',
       'On a scale from 0 to 10, how would you rate the severity of your anxiety?':'anxiety_level',
       'On a scale from 0 to 10, how would you rate the severity of your depression?':'depression_level',
       'On a scale from 0 to 10, how would you rate the severity of your insomnia?':'insomnia_level',
       'On a scale from 0 to 10, how would you rate the severity of your OCD (Obsessive-Compulsive Disorder) symptoms?':'ocd_level',
       'How does music affect your mental health?':'music_effects',
       'We appreciate your participation in this survey. Your responses are valuable for our research. Rest assured, we have not collected any personally identifiable information, such as your name or email address. Your privacy is important to us, and your data will be used solely for research purposes.':'permissions'},
         inplace=True)

Next, we handle troll responses, which show up in two columns:
- age
- hours a day spent listening to music

Age cannot be negative, and verified human lifespans top out at about 120 years (the record is 122), so we treat ages below 0 or above 120 as invalid and drop those rows.

In [None]:
age_criteria = df[ (df['age']>120) | (df['age']<0) ].index
df.drop(age_criteria, inplace=True)
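
The same filter can also be written as a single boolean mask. The sketch below uses a small made-up frame (`toy` is a hypothetical stand-in for `df`) and `Series.between`, which is inclusive on both ends by default. One difference to be aware of: a missing age fails the `between` test and would be dropped here, whereas the cell above keeps it.

```python
import pandas as pd

# Toy data standing in for the survey responses (values are illustrative).
toy = pd.DataFrame({'age': [18, -3, 52, 150, 120]})

# Keep only rows whose age lies in the closed interval [0, 120].
valid = toy[toy['age'].between(0, 120)]
print(valid)
```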

The code below treats any value of 10 or more in the hours column as suspect. Assuming that people who entered values such as 30, 25 or 40 meant minutes rather than hours, we convert such values (multiples of 5) to hours and drop the others, such as 24 or 18.

In [None]:
def min_hr(mins):
  hr = mins/60
  return hr

In [None]:
for i in df.index:
  hours = df.loc[i,'hours_spent']
  if hours >= 10:
    if (hours%5==0):
      df.loc[i,'hours_spent'] = min_hr(hours)
    else:
      df.drop(i, inplace=True)
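
For larger datasets, the row-by-row loop can be replaced with boolean masks. The following is a minimal sketch of the same rule on a made-up frame; `clean_hours`, `toy` and the `threshold` parameter are hypothetical names introduced for illustration, not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for the survey responses (values are illustrative).
toy = pd.DataFrame({'hours_spent': [3.0, 0.5, 30.0, 24.0, 11.0, 45.0]})

def clean_hours(frame, col='hours_spent', threshold=10):
    """Convert suspect values that look like minutes; drop the other suspect rows."""
    hours = frame[col]
    suspicious = hours >= threshold                      # too large to be hours per day
    looks_like_minutes = suspicious & (hours % 5 == 0)   # e.g. 25, 30, 40
    to_drop = suspicious & ~looks_like_minutes           # e.g. 18, 24

    out = frame.loc[~to_drop].copy()
    mask = looks_like_minutes[~to_drop]
    out.loc[mask, col] = out.loc[mask, col] / 60         # minutes -> hours
    return out

cleaned = clean_hours(toy)
print(cleaned)
```

The function returns a new frame instead of modifying its input, so the original responses remain available for comparison.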

Some respondents entered the same genre with different spellings, while others entered a sub-genre of a broader category.

Here, we will be mapping all such values to their respective categories.

In [None]:
genre_map = {
    'indie':['Indie ','Indie'],
    'mixed':['All depending on mood','Depends on mood', 'My own play list','No particular genre ... ',
             'Depends upon mood','Depends on my mood','Anything sounding indian','Mixed, depend on mood',
             'Bollywood, Devotional, Meditational, Lo-fi','None ','Classical And Devotional And Folk',
             'Soothing with good lyrics, Hindi','Mix'],
    'ghazal':['Ghazal & Bhajan ','GHAZAL','Gajal '],
    'metal':['Metal ','Metal','Almost all kind of Metal'],
    'Bollywood':['Old hit Hindi films songs','Old Bollywood songs','Panjabi ','Hindi and tamil film songs'],
    'Rock':['Rock and pop rock','alternate rock','Alternative Rock'],
    'Hip-Hop/Rap':['Brazilian phonk/hardstyle/rap/uptempo.','Phonk'],
    'Pop':['K-pop and lofi ','K-pop','English Top charts '],
    'Classical':['Semi classical , light music ','Soothing Retro Songs']
}

In [None]:
for genre in genre_map:
  for i in df.index:
    if df.loc[i,'favorite_genre'] in genre_map[genre]:
      df.loc[i,'favorite_genre'] = genre
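
An alternative to the nested loop is to invert the mapping once, so that each raw spelling points to its category, and then apply it with `Series.replace`. The sketch below uses a small hypothetical excerpt of the mapping (`example_map`) and a made-up series rather than the real responses.

```python
import pandas as pd

# Small, hypothetical excerpt of a category -> raw spellings mapping.
example_map = {
    'indie': ['Indie ', 'Indie'],
    'metal': ['Metal ', 'Metal'],
    'Pop': ['K-pop'],
}

# Invert to raw spelling -> category so each value needs a single lookup.
raw_to_category = {raw: category
                   for category, raws in example_map.items()
                   for raw in raws}

toy = pd.Series(['Indie ', 'Rock', 'K-pop', 'Metal', 'Pop'])
# Series.replace leaves values that are not keys of the dict untouched.
normalised = toy.replace(raw_to_category)
print(normalised.tolist())
```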

We assume that people who left the 'other genres' question unanswered do not listen to any genre apart from their favorite.

Similarly, we assume that people who left the 'other languages' question unanswered do not listen to songs in languages other than Hindi or English.

In [None]:
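# Assign the result back instead of calling fillna(inplace=True) on a column,
# which newer pandas versions warn about and may not apply to df.
df['other_genres'] = df['other_genres'].fillna('No other genres')
df['other_languages'] = df['other_languages'].fillna('No other language')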

Since respondents could select more than one option in the 'other genres' and 'other languages' questions, we create an indicator column (1 or 0) for each genre/language. This lets us count how many people listen to each of them.

In [None]:
genres_list = ['Pop', 'Bollywood', 'Hip-Hop/Rap','Rock','Devotional','Classical','Instrumental',
               'R&B (Rhythm and Blues)','Folk', 'Electronic/Dance','No other genres']

In [None]:
for genre in genres_list:
  for i in df.index:
    if genre in df.loc[i,'other_genres']:
      df.loc[i,genre] = 1
    else:
      df.loc[i,genre] = 0

In [None]:
df.head()

Unnamed: 0,Timestamp,age,primary_streaming_platform,hours_spent,music_while_working,favorite_genre,explore_genres,other_genres,other_languages,anxiety_level,...,Bollywood,Hip-Hop/Rap,Rock,Devotional,Classical,Instrumental,R&B (Rhythm and Blues),Folk,Electronic/Dance,No other genres
0,2023-09-02 20:21:53.727,18,Spotify,3.0,Yes,Pop,Sometimes,"Pop, Bollywood, Hip-Hop/Rap, R&B (Rhythm and B...",Punjabi,2,...,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0
1,2023-09-02 20:22:14.115,19,Apple Music,3.0,Yes,Hip-Hop/Rap,Sometimes,"Pop, Bollywood, Electronic/Dance",Punjabi,0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,2023-09-02 20:23:07.010,18,Spotify,1.0,No,Pop,Sometimes,"Bollywood, Classical",No other language,4,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,2023-09-02 20:23:17.846,19,Spotify,4.0,No,R&B (Rhythm and Blues),Yes,"Pop, Bollywood, Electronic/Dance, Classical","Punjabi, Korean",7,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
5,2023-09-02 20:24:09.217,18,Spotify,2.0,No,indie,Sometimes,"Bollywood, Hip-Hop/Rap",Spanish,1,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
langs_list = ['Punjabi','Spanish','Korean','Tamil','Telegu',
              'Malayalam','French','Portugese','Chinese','No other language']

In [None]:
for lang in langs_list:
  for i in df.index:
    if lang in df.loc[i,'other_languages']:
      df.loc[i,lang] = 1
    else:
      df.loc[i,lang] = 0

In [None]:
df.head()

Unnamed: 0,Timestamp,age,primary_streaming_platform,hours_spent,music_while_working,favorite_genre,explore_genres,other_genres,other_languages,anxiety_level,...,Punjabi,Spanish,Korean,Tamil,Telegu,Malayalam,French,Portugese,Chinese,No other language
0,2023-09-02 20:21:53.727,18,Spotify,3.0,Yes,Pop,Sometimes,"Pop, Bollywood, Hip-Hop/Rap, R&B (Rhythm and B...",Punjabi,2,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2023-09-02 20:22:14.115,19,Apple Music,3.0,Yes,Hip-Hop/Rap,Sometimes,"Pop, Bollywood, Electronic/Dance",Punjabi,0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2023-09-02 20:23:07.010,18,Spotify,1.0,No,Pop,Sometimes,"Bollywood, Classical",No other language,4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,2023-09-02 20:23:17.846,19,Spotify,4.0,No,R&B (Rhythm and Blues),Yes,"Pop, Bollywood, Electronic/Dance, Classical","Punjabi, Korean",7,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,2023-09-02 20:24:09.217,18,Spotify,2.0,No,indie,Sometimes,"Bollywood, Hip-Hop/Rap",Spanish,1,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
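
The indicator columns can also be built without an inner loop over rows by using the vectorised `Series.str.contains`. This is a minimal sketch on made-up data (`toy` and `options` are hypothetical names); it assumes missing answers have already been filled, as in the notebook, and it produces integer 0/1 columns rather than the floats shown above.

```python
import pandas as pd

toy = pd.DataFrame({'other_languages': ['Punjabi', 'Punjabi, Korean',
                                        'No other language', 'Spanish']})
options = ['Punjabi', 'Spanish', 'Korean', 'No other language']

for option in options:
    # regex=False treats the option as a literal substring, which matters
    # for labels containing characters such as '(' or '/'.
    toy[option] = (toy['other_languages']
                   .str.contains(option, regex=False)
                   .astype(int))

print(toy[options])
```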


One remaining issue is that some people selected their favorite genre again in the 'other genres' question, which makes the data redundant.

So, for each response, we clear the indicator column of the genre that was already chosen as the favorite. If no other genre remains after that, we mark the response as 'No other genres'.

In [None]:
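for i in df.index:
  fav = df.loc[i,'favorite_genre']
  other = df.loc[i,'other_genres']
  # Clear the indicator for the favorite genre if it was selected again
  if fav in other:
    df.loc[i,fav] = 0
  # If no genre indicator is left, flag the row as 'No other genres'
  if not df.loc[i,genres_list[:-1]].any():
    df.loc[i,genres_list[-1]] = 1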

In [None]:
df.favorite_genre.unique()

array(['Pop', 'Hip-Hop/Rap', 'R&B (Rhythm and Blues)', 'indie',
       'Bollywood', 'Electronic/Dance', 'Rock', 'Devotional', 'mixed',
       'Classical', 'metal', 'Instrumental', 'Folk', 'ghazal'],
      dtype=object)

As with the genres, we map the streaming platforms entered by respondents to a smaller set of broader categories.

In [None]:
df.primary_streaming_platform.value_counts()

Spotify                           233
YouTube                           178
Downloaded Songs                   28
Gaana                              14
Apple Music                        11
Radio                               3
Amazon music                        2
Wynk                                2
Amazon                              2
Saavan                              2
Amazon Music                        2
Youtube and Spotify                 1
jiosavaan                           1
Devotional                          1
Wink                                1
JioSaavn                            1
Hindi old song                      1
Amazone Music                       1
SoundCloud                          1
Wynk, Jio Saavn                     1
Carva ,stored Binaca Geetmala       1
None                                1
Savvan                              1
Pandora                             1
Amazon music                        1
Friend                              1
music system

In [None]:
streaming_platform_mapping = {
    'YouTube': ['YouTube','Youtube'],
    'Spotify': ['Spotify','Youtube and Spotify'],
    'Gaana': ['Gaana'],
    'Apple Music': ['Apple Music','Jio Saavn or Apple Music'],
    'Downloaded Songs': ['Downloaded Songs'],
    'Amazon Music': ['Amazon Music','Amazon music ','Amazon music','Prime music','Amazon','Prime Music ','Amazone Music '],
    'JioSaavn' : ['JioSaavn','Jio Saavan','Savvan ','Saavan','jiosavaan'],
    'Others' : ['Innertune','Pandora','Carva ,stored Binaca Geetmala ','None','None ','Devotional','Hindi old song','SoundCloud ','Friend','music system '],
    'Radio' : ['Radio'],
    'Wynk' : ['Wynk','Wink','Wynk, Jio Saavn']
}

In [None]:
for sp in streaming_platform_mapping:
  for i in df.index:
    if df.loc[i,'primary_streaming_platform'] in streaming_platform_mapping[sp]:
      df.loc[i,'primary_streaming_platform'] = sp

In [None]:
df.primary_streaming_platform.value_counts()

Spotify             234
YouTube             178
Downloaded Songs     28
Gaana                14
Apple Music          12
Amazon Music         10
Others                9
JioSaavn              5
Wynk                  4
Radio                 3
Name: primary_streaming_platform, dtype: int64
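
Many of the variants listed in the mapping differ only in capitalisation or trailing spaces. One way to shorten such a mapping is to normalise the text before the lookup. The sketch below is an illustration on made-up values; the `canonical` dictionary is hypothetical and much smaller than the real mapping. Unlike the loop above, which leaves unmapped values unchanged, this version sends anything unrecognised to 'Others'.

```python
import pandas as pd

# Hypothetical canonical names keyed by lower-cased, whitespace-stripped spellings.
canonical = {
    'youtube': 'YouTube',
    'amazon music': 'Amazon Music',
    'amazone music': 'Amazon Music',
    'wynk': 'Wynk',
    'wink': 'Wynk',
}

toy = pd.Series(['Youtube', 'Amazon music ', 'Amazone Music ', 'Wink', 'Pandora'])
keys = toy.str.strip().str.lower()
# Anything without a canonical entry falls into the catch-all bucket.
platforms = keys.map(canonical).fillna('Others')
print(platforms.tolist())
```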

Since the anxiety and depression levels were self-reported, we relabel them with everyday terms that describe the most common symptoms of each condition. We also convert the integer scores into categorical levels.

In [None]:
for i in df.index:
  level = df.loc[i,'anxiety_level']
  new = 'persistent_state_of_worry_panic_fear'
  if level == 0:
    df.loc[i,new] = 'none'
  elif level >= 1 and level <= 3:
    df.loc[i,new] = 'mild'
  elif level >= 4 and level <= 6:
    df.loc[i,new] = 'moderate'
  elif level >= 7 and level <= 9:
    df.loc[i,new] = 'severe'
  else:
    df.loc[i,new] = 'extreme'

In [None]:
for i in df.index:
  level = df.loc[i,'depression_level']
  new = 'persistent_sadness_tiredness_loss_of_interest'
  if level >= 0 and level <= 1.6:
    df.loc[i,new] = 'normal'
  elif level > 1.6 and level <= 2.4:
    df.loc[i,new] = 'mild'
  elif level > 2.4 and level <= 2.9:
    df.loc[i,new] = 'borderline'
  elif level > 2.9 and level <= 4.4:
    df.loc[i,new] = 'moderate'
  elif level > 4.4 and level <= 6.0:
    df.loc[i,new] = 'severe'
  else:
    df.loc[i,new] = 'extreme'

In [None]:
df[['anxiety_level','persistent_state_of_worry_panic_fear','depression_level','persistent_sadness_tiredness_loss_of_interest']].sample(10)

Unnamed: 0,anxiety_level,persistent_state_of_worry_panic_fear,depression_level,persistent_sadness_tiredness_loss_of_interest
178,2,mild,0,normal
66,3,mild,1,normal
468,7,severe,7,extreme
203,2,mild,0,normal
168,4,moderate,0,normal
212,7,severe,5,severe
348,2,mild,0,normal
240,6,moderate,1,normal
207,3,mild,1,normal
347,6,moderate,0,normal
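
The anxiety binning can also be expressed with `pd.cut`, which assigns each score to a labelled interval. This is a sketch on made-up scores using the same cut-offs as the anxiety cell above; `scores`, `labels` and `levels` are names introduced here. Note one difference: the loop labels anything outside 0 to 9 (including missing values) as 'extreme', whereas `pd.cut` returns NaN for values outside the given bins.

```python
import pandas as pd

scores = pd.Series([0, 1, 3, 4, 6, 7, 9, 10])

# Bin edges are right-inclusive: (-1, 0], (0, 3], (3, 6], (6, 9], (9, 10].
labels = ['none', 'mild', 'moderate', 'severe', 'extreme']
levels = pd.cut(scores, bins=[-1, 0, 3, 6, 9, 10], labels=labels)
print(levels.tolist())
```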


In [None]:
df.to_csv('preprocessed google form responses.csv')