# Simple Preprocessing of the Raw Results
This notebook applies basic preprocessing to the raw survey results, mainly:
- Converting the JSON files to CSV, a format that is more common and convenient for data analysis.
- Decoding the question codes and answer values into natural-language categories for use in later analysis.

**Let's GO!**

### Imports

In [210]:
import pandas as pd 
import json

### Read the JSON Files

In [211]:
with open('./../results.json') as f:
    data = json.load(f)
SOD_Morocco = pd.DataFrame(data['results'])
SOD_Morocco.head()

Unnamed: 0,userId,community-q-6,profile-q-0,education-q-6,work-q-0,profile-q-6,education-q-3,tech-q-5,work-q-6,community-q-1,...,startTime,work-q-2,work-q-10,lastSubmit,profile-q-5,education-q-1,community-q-3,community-q-5,tech-q-6,work-q-12
0,01BV48HMOShv1AYfUrsb8yObpdK2,0.0,0.0,"[0, 3, 9]",1.0,0.0,"[0, 2, 3]",[],5.0,[0],...,1637612000000.0,4.0,4.0,1637613000000.0,5.0,0.0,9.0,1.0,[],2.0
1,01CNPyUuqBXvSlbwMeuoSOoh2Sg1,2.0,0.0,"[0, 2, 3, 4, 6, 8]",1.0,0.0,"[0, 2, 3, 4]","[5, 6, 18]",,[2],...,1636550000000.0,4.0,3.0,1636551000000.0,3.0,1.0,7.0,0.0,[],2.0
2,02855LEtvnh8H6o6xeU8pRbER2s1,,,,,,,,,,...,1638973000000.0,,,,,,,,,
3,032il0r4EzeudubezmM1kFr6SLj2,1.0,0.0,"[0, 1, 2, 3, 4, 5, 8]",6.0,0.0,"[0, 2, 3]","[1, 4]",,[2],...,1636925000000.0,,3.0,1636926000000.0,1.0,0.0,9.0,0.0,[5],
4,09vEQWutBeYMv1MjCJgLl0vJmLT2,,0.0,"[0, 1, 8]",,0.0,"[0, 3]",,,,...,1638035000000.0,,,1638035000000.0,0.0,0.0,,,,


In [212]:
SOD_Morocco_qst = pd.read_json("./../questions.json")
SOD_Morocco_qst.head()

Unnamed: 0,profile-q-0,profile-q-1,profile-q-2,profile-q-3,profile-q-4,profile-q-5,profile-q-6,profile-q-7,profile-q-8,profile-q-9,...,tech-q-14,tech-q-15,community-q-0,community-q-1,community-q-2,community-q-3,community-q-4,community-q-5,community-q-6,community-q-7
multiple,False,,,,,,,,,,...,True,True,False,True,,False,,,,
label,What is your gender?,What is your age?,Where are you currently located?,What is your occupation?,What is your highest diploma?,How many years have you been coding profession...,Is coding a hobby for you?,Do you have any plans to work outside Morocco?,"If you are working abroad, do you have any pla...",What do you usually drink while coding?,...,What cloud platform are you using?,The production database you or your team are u...,How far are you involved in any local develope...,Did you ever participate in an open-source pro...,Did you get the chance to write blog posts?,"When it comes to IT, what is your primary soci...",How many IT events did you attend in 2020/2021?,How many talks did you participate in as a spe...,Do you prefer?,How do you evaluate the Moroccan Tech Community?
required,True,True,True,True,True,True,True,True,,True,...,,,True,True,True,True,True,True,True,True
choices,"[Male, Female]","[Younger than 18 years, 18 to 24 years, 25 to ...","[Rabat-Salé-Kénitra, Casablanca-Settat, Marrak...","[Back-end developer, Full-stack developer, Fro...","[Self-taught, Bac +2/+3, Bachelor’s degree (B....",[I don't have any professional coding experien...,"[Yes, No]","[Yes, in the next 12 months, Yes, in the next ...","[Yes, Still hesitating, No]","[Tea, Coffee, Water, Energy drinks, None, Other]",...,"[GCP, AWS, Azure, Oracle Cloud, IBM Cloud, Dig...","[Oracle Database, Mysql/MariaDB, PostgreSQL, S...","[No, I am not interested, I know some, but not...",[I don’t have an account on Github (or alterna...,"[No, I am not interested., Still thinking abou...","[Facebook, Instagram, WhatsApp, Twitter, Youtu...","[0, 1-3, More than 3]","[0, 1 - 3 talks, More than 3]","[Live/online events, In-person events?, Li ja ...","[Bad, Not Bad, Good, Excellent]"


### Translate the survey codes and abbreviations

In [213]:
SOD_qst_dict = {
    "profile-q-0":"Gender",
    "profile-q-1":"Age", 
    "profile-q-2":"Country",
    "profile-q-3":"Job_Title",
    "profile-q-4":"Education",
    "profile-q-5":"Coding_Experience",
    "profile-q-6":"Coding_as_Hobby",
    "profile-q-7":"Immigration_Plans",
    "profile-q-8":"Reentery_Plans",
    "profile-q-9":"Fav_Coding_Drink",
    "education-q-0":"Coding_Education",
    "education-q-1":"Teaching_Jobs_Gap",
    "education-q-2":"Teaching_Problems_POV",
    "education-q-3":"Read_Write_Languages",
    "education-q-4":"English_Barrier",
    "education-q-5":"Local_Dialect_Content",
    "education-q-6":"Learning_Platforms",
    "tech-q-0":"Fav_Programming_Languages",
    "tech-q-1":"Wanted_Programming_Languages",
    "tech-q-2":"Daily_Programming_Languages",
    "tech-q-3":"Daily_FE_Web_Frameworks",
    "tech-q-4":"Wanted_FE_Web_Frameworks",
    "tech-q-5":"Daily_BE_Web_Frameworks",
    "tech-q-6":"Wanted_BE_Web_Frameworks",
    "tech-q-7":"Daily_Paltforms",
    "tech-q-8":"Wanted_Platforms",
    "tech-q-9":"Primary_OS",
    "tech-q-10":"Daily_IDE",
    "tech-q-11":"Continuous_Learning_Frequency",
    "tech-q-12":"Help_Sources",
    "tech-q-13":"Production_Env_Used",
    "tech-q-14":"Cloud_Platform_Used",
    "tech-q-15":"Production_DB_Used",
    "work-q-0":"Employment_Status",
    "work-q-1":"Overtime_Frequency",
    "work-q-2":"Job_Satisfaction",
    "work-q-3":"Side_Projects",
    "work-q-4":"Graduation_Employment_Gap",
    "work-q-5":"CDI_Morocco_Salary_Range_DH",
    "work-q-6":"Freelancer_Morocco_Salary_Range_DH",
    "work-q-7":"CDI_Abroad_Salary_Range_USD",
    "work-q-8":"Freelancer_Abroad_Salary_Range_USD",
    "work-q-9":"Preferred_Job_Criteria",
    "work-q-10":"Preferred_Company_Type",
    "work-q-11":"Agile_Methodology",
    "work-q-12":"Provided_Work_Model",
    "work-q-13":"Prefrred_Work_Model",
    "community-q-0":"Local_Community_Membership",
    "community-q-1":"Open_Source_Participation",
    "community-q-2":"Blog_Posting",
    "community-q-3":"IT_Primary_Social_Network",
    "community-q-4":"Event_Attendence_20_21",
    "community-q-5":"Talks_Participation_20_21",
    "community-q-6":"Online_InPerson_Preferrence",
    "community-q-7":"Moroccan_Community_AutoEval",
}

In [214]:
# Save the Survey question codes as a new row
temp_df = pd.DataFrame(SOD_Morocco_qst.columns, columns=["Survey_Code"],
             index=SOD_Morocco_qst.columns).transpose()
SOD_Morocco_qst = pd.concat([SOD_Morocco_qst, temp_df])
del temp_df
# Replace the titles of columns in the Questions DataFrame
SOD_Morocco_qst.columns = SOD_Morocco_qst.columns.to_series().map(SOD_qst_dict)
SOD_Morocco_qst.head()

Unnamed: 0,Gender,Age,Country,Job_Title,Education,Coding_Experience,Coding_as_Hobby,Immigration_Plans,Reentery_Plans,Fav_Coding_Drink,...,Cloud_Platform_Used,Production_DB_Used,Local_Community_Membership,Open_Source_Participation,Blog_Posting,IT_Primary_Social_Network,Event_Attendence_20_21,Talks_Participation_20_21,Online_InPerson_Preferrence,Moroccan_Community_AutoEval
multiple,False,,,,,,,,,,...,True,True,False,True,,False,,,,
label,What is your gender?,What is your age?,Where are you currently located?,What is your occupation?,What is your highest diploma?,How many years have you been coding profession...,Is coding a hobby for you?,Do you have any plans to work outside Morocco?,"If you are working abroad, do you have any pla...",What do you usually drink while coding?,...,What cloud platform are you using?,The production database you or your team are u...,How far are you involved in any local develope...,Did you ever participate in an open-source pro...,Did you get the chance to write blog posts?,"When it comes to IT, what is your primary soci...",How many IT events did you attend in 2020/2021?,How many talks did you participate in as a spe...,Do you prefer?,How do you evaluate the Moroccan Tech Community?
required,True,True,True,True,True,True,True,True,,True,...,,,True,True,True,True,True,True,True,True
choices,"[Male, Female]","[Younger than 18 years, 18 to 24 years, 25 to ...","[Rabat-Salé-Kénitra, Casablanca-Settat, Marrak...","[Back-end developer, Full-stack developer, Fro...","[Self-taught, Bac +2/+3, Bachelor’s degree (B....",[I don't have any professional coding experien...,"[Yes, No]","[Yes, in the next 12 months, Yes, in the next ...","[Yes, Still hesitating, No]","[Tea, Coffee, Water, Energy drinks, None, Other]",...,"[GCP, AWS, Azure, Oracle Cloud, IBM Cloud, Dig...","[Oracle Database, Mysql/MariaDB, PostgreSQL, S...","[No, I am not interested, I know some, but not...",[I don’t have an account on Github (or alterna...,"[No, I am not interested., Still thinking abou...","[Facebook, Instagram, WhatsApp, Twitter, Youtu...","[0, 1-3, More than 3]","[0, 1 - 3 talks, More than 3]","[Live/online events, In-person events?, Li ja ...","[Bad, Not Bad, Good, Excellent]"
Survey_Code,profile-q-0,profile-q-1,profile-q-2,profile-q-3,profile-q-4,profile-q-5,profile-q-6,profile-q-7,profile-q-8,profile-q-9,...,tech-q-14,tech-q-15,community-q-0,community-q-1,community-q-2,community-q-3,community-q-4,community-q-5,community-q-6,community-q-7
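
The cell above keeps the original survey codes as an extra row before renaming the columns. A minimal sketch of the same idea on a toy questions table follows; the two-question frame and the `questions` and `code_to_name` names are made up for illustration, and row assignment through `.loc` is shown as one possible alternative to `pd.concat`, not as the notebook's method.

```python
import pandas as pd

# Toy stand-in for the questions DataFrame (hypothetical data)
questions = pd.DataFrame(
    {
        "profile-q-0": ["What is your gender?", ["Male", "Female"]],
        "profile-q-6": ["Is coding a hobby for you?", ["Yes", "No"]],
    },
    index=["label", "choices"],
)
code_to_name = {"profile-q-0": "Gender", "profile-q-6": "Coding_as_Hobby"}

# Store the original codes as a new row, then rename the columns
questions.loc["Survey_Code"] = list(questions.columns)
questions = questions.rename(columns=code_to_name)

print(questions.loc["Survey_Code"])
```

Keeping the codes in a row means the readable column names can always be traced back to the keys used in the raw JSON.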


In [215]:
# Replace the column titles in the Answers DataFrame
# ".map" returns NaN for every column name that is missing from the dictionary
# ".fillna" restores the original column name wherever the mapping fails
SOD_Morocco.columns = SOD_Morocco.columns.to_series().map(SOD_qst_dict).fillna(SOD_Morocco.columns.to_series())
SOD_Morocco.head()

Unnamed: 0,userId,Online_InPerson_Preferrence,Gender,Learning_Platforms,Employment_Status,Coding_as_Hobby,Read_Write_Languages,Daily_BE_Web_Frameworks,Freelancer_Morocco_Salary_Range_DH,Open_Source_Participation,...,startTime,Job_Satisfaction,Preferred_Company_Type,lastSubmit,Coding_Experience,Teaching_Jobs_Gap,IT_Primary_Social_Network,Talks_Participation_20_21,Wanted_BE_Web_Frameworks,Provided_Work_Model
0,01BV48HMOShv1AYfUrsb8yObpdK2,0.0,0.0,"[0, 3, 9]",1.0,0.0,"[0, 2, 3]",[],5.0,[0],...,1637612000000.0,4.0,4.0,1637613000000.0,5.0,0.0,9.0,1.0,[],2.0
1,01CNPyUuqBXvSlbwMeuoSOoh2Sg1,2.0,0.0,"[0, 2, 3, 4, 6, 8]",1.0,0.0,"[0, 2, 3, 4]","[5, 6, 18]",,[2],...,1636550000000.0,4.0,3.0,1636551000000.0,3.0,1.0,7.0,0.0,[],2.0
2,02855LEtvnh8H6o6xeU8pRbER2s1,,,,,,,,,,...,1638973000000.0,,,,,,,,,
3,032il0r4EzeudubezmM1kFr6SLj2,1.0,0.0,"[0, 1, 2, 3, 4, 5, 8]",6.0,0.0,"[0, 2, 3]","[1, 4]",,[2],...,1636925000000.0,,3.0,1636926000000.0,1.0,0.0,9.0,0.0,[5],
4,09vEQWutBeYMv1MjCJgLl0vJmLT2,,0.0,"[0, 1, 8]",,0.0,"[0, 3]",,,,...,1638035000000.0,,,1638035000000.0,0.0,0.0,,,,
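
To illustrate why the `.fillna` step matters, here is a small sketch on a toy answers table. The data and the `answers` and `code_to_name` names are hypothetical. It also shows `DataFrame.rename`, which leaves unmapped columns untouched and could serve as a shorter alternative.

```python
import pandas as pd

# Toy stand-in for the answers DataFrame (hypothetical data)
answers = pd.DataFrame(
    {"userId": ["u1", "u2"], "profile-q-0": [0.0, 1.0], "work-q-0": [1.0, None]}
)
code_to_name = {"profile-q-0": "Gender", "work-q-0": "Employment_Status"}

# Approach used in the notebook: map, then fall back to the original name
cols = answers.columns.to_series()
mapped = cols.map(code_to_name).fillna(cols)
renamed_a = answers.copy()
renamed_a.columns = mapped.tolist()

# Alternative: rename only touches the columns found in the dictionary
renamed_b = answers.rename(columns=code_to_name)

print(list(renamed_a.columns))
print(list(renamed_b.columns))
```

Without the fallback, metadata columns such as `userId` would end up with a NaN column name.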


### Translate the answer values

In [216]:
from math import nan

# Map the answer codes to the original choices in the questions DataFrame
for col in SOD_Morocco_qst.columns:
    # The position of each choice in the list is the code used in the answers
    temp_dict = dict(enumerate(SOD_Morocco_qst.loc["choices", col]))
    # Separate single-choice from multiple-choice answers
    if SOD_Morocco[col].dtype != object:
        # Single choice: the column holds numeric codes
        SOD_Morocco[col] = SOD_Morocco[col].map(temp_dict)
    else:
        # Multiple choice: each cell holds a list of codes
        temp_list = []
        for i in SOD_Morocco.index:
            # Some NaN values exist; keep them as NaN instead of decoding
            if isinstance(SOD_Morocco.loc[i, col], list):
                temp_list.append([temp_dict[x] for x in SOD_Morocco.loc[i, col]])
            else:
                temp_list.append(nan)
        SOD_Morocco[col] = pd.Series(temp_list, dtype=object, index=SOD_Morocco.index)

In [217]:
SOD_Morocco.head()

Unnamed: 0,userId,Online_InPerson_Preferrence,Gender,Learning_Platforms,Employment_Status,Coding_as_Hobby,Read_Write_Languages,Daily_BE_Web_Frameworks,Freelancer_Morocco_Salary_Range_DH,Open_Source_Participation,...,startTime,Job_Satisfaction,Preferred_Company_Type,lastSubmit,Coding_Experience,Teaching_Jobs_Gap,IT_Primary_Social_Network,Talks_Participation_20_21,Wanted_BE_Web_Frameworks,Provided_Work_Model
0,01BV48HMOShv1AYfUrsb8yObpdK2,Live/online events,Male,"[YouTube, Official documentation, Others]",Employed full-time,Yes,"[Arabic, French, English]",[],3 000 - 4 000,[I don’t have an account on Github (or alterna...,...,1637612000000.0,Very satisfied,Company with >100 employees,1637613000000.0,7-10 years,Not enough,,1 - 3 talks,[],Hybrid
1,01CNPyUuqBXvSlbwMeuoSOoh2Sg1,Li ja bismillah,Male,"[YouTube, Books, Official documentation, Blog ...",Employed full-time,Yes,"[Arabic, French, English, Others]","[Django, Flask, Other]",,[I am maintaining my own project.],...,1636550000000.0,Very satisfied,Company with >30 and <100 employees,1636551000000.0,3-4 years,Enough to start,Slack,0,[],Hybrid
2,02855LEtvnh8H6o6xeU8pRbER2s1,,,,,,,,,,...,1638973000000.0,,,,,,,,,
3,032il0r4EzeudubezmM1kFr6SLj2,In-person events?,Male,"[YouTube, Paid platforms, Books, Official docu...",Student,Yes,"[Arabic, French, English]","[Spring Ecosystem, Jakarta EE / Java EE]",,[I am maintaining my own project.],...,1636925000000.0,,Company with >30 and <100 employees,1636926000000.0,Less than a year,Not enough,,0,[Django],
4,09vEQWutBeYMv1MjCJgLl0vJmLT2,,Male,"[YouTube, Paid platforms, Online Communities (...",,Yes,"[Arabic, English]",,,,...,1638035000000.0,,,1638035000000.0,I don't have any professional coding experience,Not enough,,,,
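
The decoding step can also be sketched as a small helper that handles single-choice and multiple-choice columns in one place. This is an illustrative variant, not the notebook's code: `decode_answers` is a hypothetical name, the sample data is made up, and out-of-range codes are turned into NaN (inside lists as well), which is an assumption of this sketch.

```python
import pandas as pd

def decode_answers(series, choices):
    """Replace choice indices (or lists of indices) with their labels."""
    lookup = dict(enumerate(choices))
    missing = float("nan")

    def decode(value):
        if isinstance(value, list):
            # Multiple choice: decode every index in the list
            return [lookup.get(i, missing) for i in value]
        if pd.isna(value):
            # Unanswered question: keep it missing
            return missing
        # Single choice: codes are stored as floats such as 1.0
        return lookup.get(int(value), missing)

    return series.map(decode)

single = pd.Series([0.0, 1.0, float("nan"), 9.0])
decoded_single = decode_answers(single, ["Male", "Female"])

multi = pd.Series([[0, 2], [], float("nan")], dtype=object)
decoded_multi = decode_answers(multi, ["Arabic", "French", "English"])

print(decoded_single.tolist())
print(decoded_multi.tolist())
```

A code with no matching choice, like `9.0` in the toy data, ends up as NaN, so it is worth counting missing values before and after decoding to spot such cases.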


In [218]:
SOD_Morocco_qst.head()

Unnamed: 0,Gender,Age,Country,Job_Title,Education,Coding_Experience,Coding_as_Hobby,Immigration_Plans,Reentery_Plans,Fav_Coding_Drink,...,Cloud_Platform_Used,Production_DB_Used,Local_Community_Membership,Open_Source_Participation,Blog_Posting,IT_Primary_Social_Network,Event_Attendence_20_21,Talks_Participation_20_21,Online_InPerson_Preferrence,Moroccan_Community_AutoEval
multiple,False,,,,,,,,,,...,True,True,False,True,,False,,,,
label,What is your gender?,What is your age?,Where are you currently located?,What is your occupation?,What is your highest diploma?,How many years have you been coding profession...,Is coding a hobby for you?,Do you have any plans to work outside Morocco?,"If you are working abroad, do you have any pla...",What do you usually drink while coding?,...,What cloud platform are you using?,The production database you or your team are u...,How far are you involved in any local develope...,Did you ever participate in an open-source pro...,Did you get the chance to write blog posts?,"When it comes to IT, what is your primary soci...",How many IT events did you attend in 2020/2021?,How many talks did you participate in as a spe...,Do you prefer?,How do you evaluate the Moroccan Tech Community?
required,True,True,True,True,True,True,True,True,,True,...,,,True,True,True,True,True,True,True,True
choices,"[Male, Female]","[Younger than 18 years, 18 to 24 years, 25 to ...","[Rabat-Salé-Kénitra, Casablanca-Settat, Marrak...","[Back-end developer, Full-stack developer, Fro...","[Self-taught, Bac +2/+3, Bachelor’s degree (B....",[I don't have any professional coding experien...,"[Yes, No]","[Yes, in the next 12 months, Yes, in the next ...","[Yes, Still hesitating, No]","[Tea, Coffee, Water, Energy drinks, None, Other]",...,"[GCP, AWS, Azure, Oracle Cloud, IBM Cloud, Dig...","[Oracle Database, Mysql/MariaDB, PostgreSQL, S...","[No, I am not interested, I know some, but not...",[I don’t have an account on Github (or alterna...,"[No, I am not interested., Still thinking abou...","[Facebook, Instagram, WhatsApp, Twitter, Youtu...","[0, 1-3, More than 3]","[0, 1 - 3 talks, More than 3]","[Live/online events, In-person events?, Li ja ...","[Bad, Not Bad, Good, Excellent]"
Survey_Code,profile-q-0,profile-q-1,profile-q-2,profile-q-3,profile-q-4,profile-q-5,profile-q-6,profile-q-7,profile-q-8,profile-q-9,...,tech-q-14,tech-q-15,community-q-0,community-q-1,community-q-2,community-q-3,community-q-4,community-q-5,community-q-6,community-q-7


### Remove Empty Users

In [219]:
# The data needs cleaning: some submissions contain no answers at all
# "Gender" is a required question, so a missing value marks an empty submission
# The rows below list the userIds with missing data (253 users)
SOD_Morocco[SOD_Morocco['Gender'].isnull()]

Unnamed: 0,userId,Online_InPerson_Preferrence,Gender,Learning_Platforms,Employment_Status,Coding_as_Hobby,Read_Write_Languages,Daily_BE_Web_Frameworks,Freelancer_Morocco_Salary_Range_DH,Open_Source_Participation,...,startTime,Job_Satisfaction,Preferred_Company_Type,lastSubmit,Coding_Experience,Teaching_Jobs_Gap,IT_Primary_Social_Network,Talks_Participation_20_21,Wanted_BE_Web_Frameworks,Provided_Work_Model
2,02855LEtvnh8H6o6xeU8pRbER2s1,,,,,,,,,,...,1.638973e+12,,,,,,,,,
7,0IF3vGA0AwbxtOXPCT1c3rGRW093,,,,,,,,,,...,1.636989e+12,,,,,,,,,
18,1PIk6rQT81aLjG5HhhOnQE8OfF12,,,,,,,,,,...,1.636471e+12,,,,,,,,,
21,1X6tf7t29Ga5TaFfKr4yyIlQUOz1,,,,,,,,,,...,1.636101e+12,,,,,,,,,
23,1bYuMhEx7QSUy99wHhikDyL10CJ3,,,,,,,,,,...,1.636424e+12,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1334,z7ZbvpwG4SQHo2nq4cXGOEdokh93,,,,,,,,,,...,1.638893e+12,,,,,,,,,
1336,zAVATV2egGTMVjSmFK2HuE22dIF2,,,,,,,,,,...,1.636064e+12,,,,,,,,,
1340,zOXPV1cYwQbKQM7zpY8vS6KByR93,,,,,,,,,,...,1.636483e+12,,,,,,,,,
1346,zkqghvYFn6RiOKA5iU5wybeyRf12,,,,,,,,,,...,1.636674e+12,,,,,,,,,


In [220]:
# Number of submitters before cleaning the data
SOD_Morocco.shape[0]

1351

In [221]:
# Remove them from the dataset
SOD_Morocco = SOD_Morocco[SOD_Morocco['Gender'].notna()]

In [222]:
# Number of submitters after cleaning the data
SOD_Morocco.shape[0]

1098
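
The filter above relies on `Gender` being a required question. As a hedged sketch, the same cleanup can be expressed with `dropna`, either on that one column or on all answer columns at once. The toy data and the choice of `userId` and `startTime` as the metadata columns are assumptions made for this example.

```python
import pandas as pd

# Toy stand-in for the decoded answers (hypothetical data)
answers = pd.DataFrame(
    {
        "userId": ["u1", "u2", "u3"],
        "Gender": ["Male", None, "Female"],
        "Coding_as_Hobby": ["Yes", None, None],
        "startTime": [1.0, 2.0, 3.0],
    }
)

# Same rule as the notebook: keep the rows where Gender is present
by_gender = answers[answers["Gender"].notna()]

# Alternative: drop the rows where every answer column is empty
answer_cols = list(answers.columns.difference(["userId", "startTime"]))
by_all = answers.dropna(subset=answer_cols, how="all")

print(len(answers), len(by_gender), len(by_all))
```

The second form does not depend on a single question, at the cost of having to list the metadata columns explicitly.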

### Save to CSV

In [223]:
SOD_Morocco.to_csv("results_preprocessed.csv", index=False)
# Keep the index of the Questions DataFrame (multiple, label, required, choices, Survey_Code) as a "Keys" column
SOD_Morocco_qst.to_csv("questions_preprocessed.csv", index_label="Keys")
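
One thing to keep in mind when reading the CSV back: multiple-choice cells are Python lists, and CSV stores them as their string representation. A minimal sketch of a round trip follows, using an in-memory buffer and made-up data; `parse_list` is a hypothetical helper, and `ast.literal_eval` is one possible way to restore the lists.

```python
import ast
import io

import pandas as pd

# Toy stand-in for a decoded multiple-choice column (hypothetical data)
df = pd.DataFrame(
    {
        "userId": ["u1", "u2"],
        "Read_Write_Languages": [["Arabic", "French"], float("nan")],
    }
)

# Round trip through CSV, using a buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# After loading, the list is a plain string
raw_cell = loaded.loc[0, "Read_Write_Languages"]

def parse_list(cell):
    """Turn the string form of a list back into a list; leave NaN as is."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

loaded["Read_Write_Languages"] = loaded["Read_Write_Languages"].map(parse_list)
print(loaded.loc[0, "Read_Write_Languages"])
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of parsing.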