# Simple Pre-Processing of the Raw Results
This notebook applies some basic preprocessing to the raw survey results, mainly:
- Converting the JSON files to CSV, a format that is friendlier and more common for data analysis tasks.
- Decoding the question codes and answer values into natural-language categories for use in later analysis.

**Let's GO!**

### Imports

In [8]:
import pandas as pd 
import json

### Read the JSON Files

In [9]:
with open('./../results.json') as f:
    data = json.load(f)
SOD_Morocco = pd.DataFrame(data['results'])
SOD_Morocco.head()

Unnamed: 0,userId,community-q-7,community-q-3,tech-q-8,profile-q-3,profile-q-7,community-q-1,profile-q-2,profile-q-4,startTime,...,education-q-4,tech-q-11,education-q-5,tech-q-0,tech-q-5,profile-q-5,education-q-3,tech-q-16,work-q-6,profile-q-1
0,034cxKbcScSkMKWBkhu9xC6ubRE2,2.0,4.0,"[0, 1, 7, 9, 10, 12, 17]",4.0,1.0,[2],3.0,1.0,1672917044607,...,1.0,0.0,0.0,"[0, 1, 3, 4, 5, 9, 10]",,0.0,"[0, 2, 3, 4]",2.0,,1.0
1,04TsCbQGWxg9S2cudIT86e0CMH03,3.0,4.0,"[1, 9]",1.0,0.0,[0],3.0,1.0,1671819917339,...,1.0,1.0,0.0,"[0, 1, 2, 3, 4, 5, 6, 7, 10]","[7, 12]",0.0,"[0, 2, 3]",0.0,7.0,2.0
2,05THu8aCmBUOnxCCGXESKeqlcVQ2,3.0,0.0,[19],10.0,4.0,[0],1.0,1.0,1672216451960,...,1.0,3.0,2.0,[1],,0.0,"[0, 3]",2.0,,1.0
3,06zWHRXkimdBlINCzJ3warn55V53,,,,13.0,4.0,,5.0,1.0,1671789215475,...,1.0,,0.0,,,1.0,"[0, 2, 3]",,,1.0
4,08A0JzKQixVjdjCE49QboELB1do1,,,,2.0,3.0,,2.0,2.0,1672386895995,...,,,,,,3.0,,,,1.0
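
The loading step above can be sketched on a miniature input. The notebook only shows that the answers live under a `results` key, so the layout below (one dictionary per respondent) and the `u1`/`u2` identifiers are assumptions made for illustration.

```python
import json

import pandas as pd

# Hypothetical miniature of results.json: a "results" key holding one record per respondent
raw = (
    '{"results": ['
    '{"userId": "u1", "profile-q-0": 0, "tech-q-0": [0, 1]},'
    '{"userId": "u2", "profile-q-0": 1}'
    "]}"
)

data = json.loads(raw)
toy = pd.DataFrame(data["results"])

# Questions a respondent skipped become NaN, and list answers give the column dtype "object"
print(toy.dtypes)
```

The distinction between numeric columns and `object` columns matters later, because the decoding step uses it to tell single-choice answers from multi-choice ones.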


In [10]:
SOD_Morocco_qst = pd.read_json("./../questions.json")
SOD_Morocco_qst.head()

Unnamed: 0,profile-q-0,profile-q-1,profile-q-2,profile-q-3,profile-q-4,profile-q-5,profile-q-6,profile-q-7,profile-q-8,profile-q-9,...,tech-q-15,tech-q-16,community-q-0,community-q-1,community-q-2,community-q-3,community-q-4,community-q-5,community-q-6,community-q-7
multiple,False,,,,,,,,,,...,True,,False,True,,False,,,,
label,What is your gender?,What is your age?,Where are you currently located?,What is your occupation?,What is your highest diploma?,How many years have you been coding profession...,Is coding a hobby for you?,Do you have any plans to work outside Morocco?,"If you are working abroad, do you have any pla...",What do you usually drink while coding?,...,The production database you or your team are u...,How often do you use AI tools for Dev (Github ...,How far are you involved in any local develope...,Did you ever participate in an open-source pro...,Did you get the chance to write blog posts?,"When it comes to IT, what is your primary soci...",How many IT events did you attend in the last ...,How many talks did you participate in the last...,Do you prefer?,How do you evaluate the Moroccan Tech Community?
required,True,True,True,True,True,True,True,True,,True,...,,True,True,True,True,True,True,True,True,True
choices,"[Male, Female]","[Younger than 18 years, 18 to 24 years, 25 to ...","[Rabat-Salé-Kénitra, Casablanca-Settat, Marrak...","[Back-end developer, Full-stack developer, Fro...","[Self-taught, Bac +2/+3, Bachelor’s degree (B....",[I don't have any professional coding experien...,"[Yes, No]","[Yes, in the next 12 months, Yes, in the next ...","[Yes, Still hesitating, No]","[Tea, Coffee, Water, Energy drinks, None, Other]",...,"[Oracle Database, Mysql/MariaDB, PostgreSQL, S...","[Daily, Tried them but not interested, Never]","[No, I am not interested, I know some, but not...",[I don’t have an account on Github (or alterna...,"[No, I am not interested., Still thinking abou...","[Facebook, Instagram, WhatsApp, Twitter, Youtu...","[0, 1-3, More than 3]","[0, 1 - 3 talks, More than 3]","[Live/online events, In-person events?, Li ja ...","[Bad, Not Bad, Good, Excellent]"


### Translate the survey codes and abbreviations

In [11]:
SOD_qst_dict = {
    "profile-q-0":"Gender",
    "profile-q-1":"Age", 
    "profile-q-2":"Country",
    "profile-q-3":"Job_Title",
    "profile-q-4":"Education",
    "profile-q-5":"Coding_Experience",
    "profile-q-6":"Coding_as_Hobby",
    "profile-q-7":"Immigration_Plans",
    "profile-q-8":"Reentery_Plans",
    "profile-q-9":"Fav_Coding_Drink",
    "education-q-0":"Coding_Education",
    "education-q-1":"Teaching_Jobs_Gap",
    "education-q-2":"Teaching_Problems_POV",
    "education-q-3":"Read_Write_Languages",
    "education-q-4":"English_Barrier",
    "education-q-5":"Local_Dialect_Content",
    "education-q-6":"Learning_Platforms",
    "tech-q-0":"Fav_Programming_Languages",
    "tech-q-1":"Wanted_Programming_Languages",
    "tech-q-2":"Daily_Programming_Languages",
    "tech-q-3":"Daily_FE_Web_Frameworks",
    "tech-q-4":"Wanted_FE_Web_Frameworks",
    "tech-q-5":"Daily_BE_Web_Frameworks",
    "tech-q-6":"Wanted_BE_Web_Frameworks",
    "tech-q-7":"Daily_Paltforms",
    "tech-q-8":"Wanted_Platforms",
    "tech-q-9":"Primary_OS",
    "tech-q-10":"Daily_IDE",
    "tech-q-11":"Continuous_Learning_Frequency",
    "tech-q-12":"Help_Sources",
    "tech-q-13":"Production_Env_Used",
    "tech-q-14":"Cloud_Platform_Used",
    "tech-q-15":"Production_DB_Used",
    "tech-q-16":"AI_Tools_Usage_Frequency",
    "work-q-0":"Employment_Status",
    "work-q-1":"Overtime_Frequency",
    "work-q-2":"Job_Satisfaction",
    "work-q-3":"Side_Projects",
    "work-q-4":"Graduation_Employment_Gap",
    "work-q-5":"CDI_Morocco_Salary_Range_DH",
    "work-q-6":"Freelancer_Morocco_Salary_Range_DH",
    "work-q-7":"CDI_Abroad_Salary_Range_USD",
    "work-q-8":"Freelancer_Abroad_Salary_Range_USD",
    "work-q-9":"Preferred_Job_Criteria",
    "work-q-10":"Preferred_Company_Type",
    "work-q-11":"Agile_Methodology",
    "work-q-12":"Provided_Work_Model",
    "work-q-13":"Prefrred_Work_Model",
    "community-q-0":"Local_Community_Membership",
    "community-q-1":"Open_Source_Participation",
    "community-q-2":"Blog_Posting",
    "community-q-3":"IT_Primary_Social_Network",
    "community-q-4":"Event_Attendence_20_21",
    "community-q-5":"Talks_Participation_20_21",
    "community-q-6":"Online_InPerson_Preferrence",
    "community-q-7":"Moroccan_Community_AutoEval",
}
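
Since the dictionary is written by hand, it may be worth checking which column names it does not cover before renaming anything. The helper below is a minimal sketch, and `unmapped_codes` and the sample values are hypothetical names introduced for the illustration.

```python
def unmapped_codes(codes, mapping):
    """Return the codes that have no entry in the mapping, in their original order."""
    return [code for code in codes if code not in mapping]


# Hypothetical sample: two codes from the dictionary plus two bookkeeping columns
sample_mapping = {"profile-q-0": "Gender", "profile-q-1": "Age"}
sample_columns = ["userId", "profile-q-0", "startTime", "profile-q-1"]

leftover = unmapped_codes(sample_columns, sample_mapping)
print(leftover)
```

In the notebook, the same call could be made with `SOD_Morocco.columns` and `SOD_qst_dict`; bookkeeping columns such as `userId` and `startTime` are expected to show up, while a leftover question code would point to a missing dictionary entry.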

In [12]:
# Save the Survey question codes as a new row
temp_df = pd.DataFrame(SOD_Morocco_qst.columns, columns=["Survey_Code"],
             index=SOD_Morocco_qst.columns).transpose()
SOD_Morocco_qst = pd.concat([SOD_Morocco_qst, temp_df])
del temp_df
# Replace the titles of columns in the Questions DataFrame
SOD_Morocco_qst.columns = SOD_Morocco_qst.columns.to_series().map(SOD_qst_dict)
SOD_Morocco_qst.head()

Unnamed: 0,Gender,Age,Country,Job_Title,Education,Coding_Experience,Coding_as_Hobby,Immigration_Plans,Reentery_Plans,Fav_Coding_Drink,...,Production_DB_Used,AI_Tools_Usage_Frequency,Local_Community_Membership,Open_Source_Participation,Blog_Posting,IT_Primary_Social_Network,Event_Attendence_20_21,Talks_Participation_20_21,Online_InPerson_Preferrence,Moroccan_Community_AutoEval
multiple,False,,,,,,,,,,...,True,,False,True,,False,,,,
label,What is your gender?,What is your age?,Where are you currently located?,What is your occupation?,What is your highest diploma?,How many years have you been coding profession...,Is coding a hobby for you?,Do you have any plans to work outside Morocco?,"If you are working abroad, do you have any pla...",What do you usually drink while coding?,...,The production database you or your team are u...,How often do you use AI tools for Dev (Github ...,How far are you involved in any local develope...,Did you ever participate in an open-source pro...,Did you get the chance to write blog posts?,"When it comes to IT, what is your primary soci...",How many IT events did you attend in the last ...,How many talks did you participate in the last...,Do you prefer?,How do you evaluate the Moroccan Tech Community?
required,True,True,True,True,True,True,True,True,,True,...,,True,True,True,True,True,True,True,True,True
choices,"[Male, Female]","[Younger than 18 years, 18 to 24 years, 25 to ...","[Rabat-Salé-Kénitra, Casablanca-Settat, Marrak...","[Back-end developer, Full-stack developer, Fro...","[Self-taught, Bac +2/+3, Bachelor’s degree (B....",[I don't have any professional coding experien...,"[Yes, No]","[Yes, in the next 12 months, Yes, in the next ...","[Yes, Still hesitating, No]","[Tea, Coffee, Water, Energy drinks, None, Other]",...,"[Oracle Database, Mysql/MariaDB, PostgreSQL, S...","[Daily, Tried them but not interested, Never]","[No, I am not interested, I know some, but not...",[I don’t have an account on Github (or alterna...,"[No, I am not interested., Still thinking abou...","[Facebook, Instagram, WhatsApp, Twitter, Youtu...","[0, 1-3, More than 3]","[0, 1 - 3 talks, More than 3]","[Live/online events, In-person events?, Li ja ...","[Bad, Not Bad, Good, Excellent]"
Survey_Code,profile-q-0,profile-q-1,profile-q-2,profile-q-3,profile-q-4,profile-q-5,profile-q-6,profile-q-7,profile-q-8,profile-q-9,...,tech-q-15,tech-q-16,community-q-0,community-q-1,community-q-2,community-q-3,community-q-4,community-q-5,community-q-6,community-q-7
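
The idea of the cell above, keeping the original codes as an extra row before renaming, can be reproduced on a two-question toy table. The sketch uses `rename(columns=...)` instead of assigning to `.columns`; that is a swapped-in alternative, not the notebook's exact approach, and the variable names are illustrative.

```python
import pandas as pd

# Toy questions table: columns are survey codes, rows are question attributes
questions = pd.DataFrame(
    {
        "profile-q-0": {"label": "What is your gender?", "choices": ["Male", "Female"]},
        "profile-q-6": {"label": "Is coding a hobby for you?", "choices": ["Yes", "No"]},
    }
)
names = {"profile-q-0": "Gender", "profile-q-6": "Coding_as_Hobby"}

# Store the codes in a one-row frame so they survive the renaming
codes = pd.DataFrame(
    [list(questions.columns)], index=["Survey_Code"], columns=questions.columns
)
questions = pd.concat([questions, codes]).rename(columns=names)

print(questions.loc["Survey_Code"])
```

Unlike `.map`, `rename` leaves any column that is missing from the dictionary unchanged, so no NaN column names can appear.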


In [13]:
# Replace the titles of columns in the Answers DataFrame
# ".map" returns NaN for every column name that is missing from the dictionary (such as userId and startTime)
# Adding ".fillna" keeps the original column name wherever the mapping fails
SOD_Morocco.columns = SOD_Morocco.columns.to_series().map(SOD_qst_dict).fillna(SOD_Morocco.columns.to_series())
SOD_Morocco.head()

Unnamed: 0,userId,Moroccan_Community_AutoEval,IT_Primary_Social_Network,Wanted_Platforms,Job_Title,Immigration_Plans,Open_Source_Participation,Country,Education,startTime,...,English_Barrier,Continuous_Learning_Frequency,Local_Dialect_Content,Fav_Programming_Languages,Daily_BE_Web_Frameworks,Coding_Experience,Read_Write_Languages,AI_Tools_Usage_Frequency,Freelancer_Morocco_Salary_Range_DH,Age
0,034cxKbcScSkMKWBkhu9xC6ubRE2,2.0,4.0,"[0, 1, 7, 9, 10, 12, 17]",4.0,1.0,[2],3.0,1.0,1672917044607,...,1.0,0.0,0.0,"[0, 1, 3, 4, 5, 9, 10]",,0.0,"[0, 2, 3, 4]",2.0,,1.0
1,04TsCbQGWxg9S2cudIT86e0CMH03,3.0,4.0,"[1, 9]",1.0,0.0,[0],3.0,1.0,1671819917339,...,1.0,1.0,0.0,"[0, 1, 2, 3, 4, 5, 6, 7, 10]","[7, 12]",0.0,"[0, 2, 3]",0.0,7.0,2.0
2,05THu8aCmBUOnxCCGXESKeqlcVQ2,3.0,0.0,[19],10.0,4.0,[0],1.0,1.0,1672216451960,...,1.0,3.0,2.0,[1],,0.0,"[0, 3]",2.0,,1.0
3,06zWHRXkimdBlINCzJ3warn55V53,,,,13.0,4.0,,5.0,1.0,1671789215475,...,1.0,,0.0,,,1.0,"[0, 2, 3]",,,1.0
4,08A0JzKQixVjdjCE49QboELB1do1,,,,2.0,3.0,,2.0,2.0,1672386895995,...,,,,,,3.0,,,,1.0
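
The effect of `.map` followed by `.fillna` is easiest to see on a frame with one mapped and two unmapped columns. This is a small sketch with made-up values; only the column names come from the notebook.

```python
import pandas as pd

answers = pd.DataFrame(
    {"userId": ["u1"], "profile-q-0": [0.0], "startTime": [1672917044607]}
)
names = {"profile-q-0": "Gender"}

cols = answers.columns.to_series()
mapped = cols.map(names)  # NaN for userId and startTime
answers.columns = mapped.fillna(cols)  # fall back to the original names

print(list(answers.columns))
```

Without the `.fillna` call, the `userId` and `startTime` columns would both end up with NaN as their name.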


### Translate the answer values

In [14]:
from math import nan
# Map the coded answer values to the original choices in the questions DataFrame
for col in SOD_Morocco_qst.columns:
    # Build an {index: choice label} lookup for this question
    temp_dict = dict(enumerate(SOD_Morocco_qst.loc["choices", col]))
    # Separate single-choice from multi-choice answers
    if SOD_Morocco[col].dtype != object:
        SOD_Morocco[col] = SOD_Morocco[col].map(temp_dict)
    else:
        temp_list = []
        for i in SOD_Morocco.index:
            # Some NaN values exist; keep them as NaN instead of decoding them
            if isinstance(SOD_Morocco.loc[i, col], list):
                temp_list.append([temp_dict[x] for x in SOD_Morocco.loc[i, col]])
            else:
                temp_list.append(nan)
        # Reuse the DataFrame index so the decoded values align row by row
        SOD_Morocco[col] = pd.Series(temp_list, index=SOD_Morocco.index, dtype=object)
            

In [15]:
SOD_Morocco.head()

Unnamed: 0,userId,Moroccan_Community_AutoEval,IT_Primary_Social_Network,Wanted_Platforms,Job_Title,Immigration_Plans,Open_Source_Participation,Country,Education,startTime,...,English_Barrier,Continuous_Learning_Frequency,Local_Dialect_Content,Fav_Programming_Languages,Daily_BE_Web_Frameworks,Coding_Experience,Read_Write_Languages,AI_Tools_Usage_Frequency,Freelancer_Morocco_Salary_Range_DH,Age
0,034cxKbcScSkMKWBkhu9xC6ubRE2,Good,Youtube,"[Node.js, .NET, Docker, React Native, Unity 3D...",Mobile developer,"Yes, in the next 24 months",[I am maintaining my own project.],Tanger-Tétouan-Al Hoceïma,"Bac +2/+3, Bachelor’s degree (B.A., B.S., B.En...",1672917044607,...,No,Every few months,Spoken,"[JavaScript, HTML/CSS 😉, Python, Java, Bash/Sh...",,I don't have any professional coding experience,"[Arabic, French, English, Others]",Never,,18 to 24 years
1,04TsCbQGWxg9S2cudIT86e0CMH03,Excellent,Youtube,"[.NET, React Native]",Full-stack developer,"Yes, in the next 12 months",[I don’t have an account on Github (or alterna...,Tanger-Tétouan-Al Hoceïma,"Bac +2/+3, Bachelor’s degree (B.A., B.S., B.En...",1671819917339,...,No,Once a year,Spoken,"[JavaScript, HTML/CSS 😉, SQL, Python, Java, Ba...","[Laravel, Express.js]",I don't have any professional coding experience,"[Arabic, French, English]",Daily,> 5 000,25 to 34 years
2,05THu8aCmBUOnxCCGXESKeqlcVQ2,Excellent,Facebook,[Other],Designer,No,[I don’t have an account on Github (or alterna...,Casablanca-Settat,"Bac +2/+3, Bachelor’s degree (B.A., B.S., B.En...",1672216451960,...,No,Once a decade,Both,[HTML/CSS 😉],,I don't have any professional coding experience,"[Arabic, English]",Never,,18 to 24 years
3,06zWHRXkimdBlINCzJ3warn55V53,,,,Data scientist or machine learning specialist,No,,Fès-Meknès,"Bac +2/+3, Bachelor’s degree (B.A., B.S., B.En...",1671789215475,...,No,,Spoken,,,Less than a year,"[Arabic, French, English]",,,18 to 24 years
4,08A0JzKQixVjdjCE49QboELB1do1,,,,Front-end developer,Currently working outside Morocco,,Marrakech-Safi,"Bac +5, Master’s degree (M.A., M.S., M.Eng., M...",1672386895995,...,,,,,,3-4 years,,,,18 to 24 years
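
The decoding rule used above (a number becomes one label, a list becomes a list of labels, a missing answer stays missing) can also be written as a single function. `decode_answer` is a hypothetical helper, not part of the notebook. The choice lists are the ones shown in the questions table, and treating the drinks question as multi-choice is an assumption made only for the example.

```python
import math


def decode_answer(value, choices):
    """Translate one coded answer into its label(s); missing answers stay NaN."""
    if isinstance(value, list):
        return [choices[int(i)] for i in value]
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    # Unlike Series.map, an out-of-range code raises IndexError instead of giving NaN
    return choices[int(value)]


evaluation = ["Bad", "Not Bad", "Good", "Excellent"]
drinks = ["Tea", "Coffee", "Water", "Energy drinks", "None", "Other"]

single = decode_answer(2.0, evaluation)
multi = decode_answer([0, 2], drinks)
missing = decode_answer(math.nan, evaluation)
print(single, multi, missing)
```

Such a helper could be applied column by column with `Series.apply`, which avoids checking the column dtype at the cost of one Python call per cell.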


In [16]:
SOD_Morocco_qst.head()

Unnamed: 0,Gender,Age,Country,Job_Title,Education,Coding_Experience,Coding_as_Hobby,Immigration_Plans,Reentery_Plans,Fav_Coding_Drink,...,Production_DB_Used,AI_Tools_Usage_Frequency,Local_Community_Membership,Open_Source_Participation,Blog_Posting,IT_Primary_Social_Network,Event_Attendence_20_21,Talks_Participation_20_21,Online_InPerson_Preferrence,Moroccan_Community_AutoEval
multiple,False,,,,,,,,,,...,True,,False,True,,False,,,,
label,What is your gender?,What is your age?,Where are you currently located?,What is your occupation?,What is your highest diploma?,How many years have you been coding profession...,Is coding a hobby for you?,Do you have any plans to work outside Morocco?,"If you are working abroad, do you have any pla...",What do you usually drink while coding?,...,The production database you or your team are u...,How often do you use AI tools for Dev (Github ...,How far are you involved in any local develope...,Did you ever participate in an open-source pro...,Did you get the chance to write blog posts?,"When it comes to IT, what is your primary soci...",How many IT events did you attend in the last ...,How many talks did you participate in the last...,Do you prefer?,How do you evaluate the Moroccan Tech Community?
required,True,True,True,True,True,True,True,True,,True,...,,True,True,True,True,True,True,True,True,True
choices,"[Male, Female]","[Younger than 18 years, 18 to 24 years, 25 to ...","[Rabat-Salé-Kénitra, Casablanca-Settat, Marrak...","[Back-end developer, Full-stack developer, Fro...","[Self-taught, Bac +2/+3, Bachelor’s degree (B....",[I don't have any professional coding experien...,"[Yes, No]","[Yes, in the next 12 months, Yes, in the next ...","[Yes, Still hesitating, No]","[Tea, Coffee, Water, Energy drinks, None, Other]",...,"[Oracle Database, Mysql/MariaDB, PostgreSQL, S...","[Daily, Tried them but not interested, Never]","[No, I am not interested, I know some, but not...",[I don’t have an account on Github (or alterna...,"[No, I am not interested., Still thinking abou...","[Facebook, Instagram, WhatsApp, Twitter, Youtu...","[0, 1-3, More than 3]","[0, 1 - 3 talks, More than 3]","[Live/online events, In-person events?, Li ja ...","[Bad, Not Bad, Good, Excellent]"
Survey_Code,profile-q-0,profile-q-1,profile-q-2,profile-q-3,profile-q-4,profile-q-5,profile-q-6,profile-q-7,profile-q-8,profile-q-9,...,tech-q-15,tech-q-16,community-q-0,community-q-1,community-q-2,community-q-3,community-q-4,community-q-5,community-q-6,community-q-7


### Remove empty users

In [17]:
# The data needs to be cleaned: some submissions are empty
# The rows below have no answer to the Gender question
# (367 users, i.e. 1984 - 1617 according to the row counts before and after cleaning)
SOD_Morocco[SOD_Morocco['Gender'].isnull()]

Unnamed: 0,userId,Moroccan_Community_AutoEval,IT_Primary_Social_Network,Wanted_Platforms,Job_Title,Immigration_Plans,Open_Source_Participation,Country,Education,startTime,...,English_Barrier,Continuous_Learning_Frequency,Local_Dialect_Content,Fav_Programming_Languages,Daily_BE_Web_Frameworks,Coding_Experience,Read_Write_Languages,AI_Tools_Usage_Frequency,Freelancer_Morocco_Salary_Range_DH,Age
11,0LMR7zjT1BNr2AD0sPPkUvWKjh82,,,,,,,,,1672826958615,...,,,,,,,,,,
54,1lMPJiLYinO6kOC7rck2bq6Pgwq1,,,,,,,,,1671471802233,...,,,,,,,,,,
57,1vtKBZU40ZfOphG4sGs1b9xjDL33,,,,,,,,,1671967743464,...,,,,,,,,,,
58,1y5p0vwqbCRJAWllKaqN5ZFUbHy2,,,,,,,,,1672937471013,...,,,,,,,,,,
60,21LIp341cNM90idrvtkDWSAzrQm1,,,,,,,,,1671653412868,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1952,z7xb8CTUWBWAcgAboIB5reKNiRD2,,,,,,,,,1674000001001,...,,,,,,,,,,
1954,zC7P3u7c92ZnhLeroR8KdzNwVDF2,,,,,,,,,1672518532681,...,,,,,,,,,,
1959,zQj0iQFsMjPC4QX5LJklNXDlBI93,,,,,,,,,1672255214838,...,,,,,,,,,,
1974,zpRBRZZOCrasGDRRvTK7kc0cyHp2,,,,,,,,,1671781877524,...,,,,,,,,,,


In [18]:
# Number of submitters before cleaning the data
SOD_Morocco.shape[0]

1984

In [19]:
# Remove them from the dataset
SOD_Morocco = SOD_Morocco[SOD_Morocco['Gender'].notna()]

In [20]:
# Number of submitters after cleaning the data
SOD_Morocco.shape[0]

1617
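
The same filtering can be expressed with `dropna`, which also makes it easy to report how many rows were dropped. This is a sketch on three made-up respondents; using `Gender` as the marker for an empty submission follows the notebook, while `reset_index` is an optional extra.

```python
import pandas as pd

answers = pd.DataFrame(
    {
        "userId": ["a", "b", "c"],
        "Gender": ["Male", None, "Female"],
        "Age": ["18 to 24 years", None, None],
    }
)

before = len(answers)
# Drop every row without a Gender answer and renumber the remaining rows
cleaned = answers.dropna(subset=["Gender"]).reset_index(drop=True)
removed = before - len(cleaned)

print(f"{removed} of {before} rows removed")
```

Respondent `c` is kept even though `Age` is missing, because only the `Gender` column decides whether a submission counts as empty.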

### Save to CSV

In [21]:
SOD_Morocco.to_csv("results_preprocessed.csv", index = False)
# Keep the index for the Questions DataFrame, since it holds the row keys (multiple, label, required, choices, Survey_Code)
SOD_Morocco_qst.to_csv("questions_preprocessed.csv", index_label ="Keys" )
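
One thing to keep in mind when reading these files back: CSV has no list type, so multi-choice answers are written as their string representation. The sketch below shows the round trip on two made-up rows and restores the lists with `ast.literal_eval`. `parse_list` is a hypothetical helper, and an in-memory buffer stands in for the file.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame(
    {
        "userId": ["u1", "u2"],
        "Read_Write_Languages": [["Arabic", "English"], float("nan")],
    }
)

# Write to an in-memory buffer instead of a file, then read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)


def parse_list(cell):
    """Turn a stringified list back into a list; leave missing values untouched."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


as_text = loaded.loc[0, "Read_Write_Languages"]  # still a plain string here
loaded["Read_Write_Languages"] = loaded["Read_Write_Languages"].apply(parse_list)
print(type(as_text).__name__, loaded.loc[0, "Read_Write_Languages"])
```

Single-choice columns are unaffected, because their decoded values are already plain strings.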