# Simple Pre-Processing of the Raw Results
This notebook applies some basic pre-processing to the raw survey results, mainly:
- Converting the JSON files into CSV, a more common and convenient format for data analysis tasks.
- Decoding the question codes and answer values into natural-language categories for later analysis.

**Let's GO!**

### Imports

In [31]:
import pandas as pd 
import json

### Read the JSON Files

In [32]:
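# The answers are stored under the "results" key; the file contains
# non-ASCII characters (e.g. ’), so the encoding is made explicit
with open('./../results.json', encoding='utf-8') as f:
    data = json.load(f)
SOD_Morocco = pd.DataFrame(data['results'])
SOD_Morocco.head()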

|  | userId | community-q-4 | lastSubmit | tech-q-1 | profile-q-6 | profile-q-10 | profile-q-15 | tech-q-11 | community-q-1 | tech-q-3 | ... | community-q-3 | profile-q-8 | tech-q-6 | work-q-2 | work-q-6 | work-q-0 | profile-q-11 | profile-q-12 | profile-q-1 | `__collections__` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01jKBLnYhGTy3xztjuhVIf2jass1 | 0.0 | 1603211692917 | [0] | 0 | [3] | 3 | [5] | [4] | [7] | ... | 0.0 | 0 | [0, 1] | 2.0 | [0, 1, 3, 4, 7] | 0.0 | [0, 1, 2, 3] | 1 | 2 |  |
| 1 | 020TufnjwQTTT0C7jzE0BYz39j53 |  | 1604010020407 |  | 0 | [1] | 2 |  |  |  | ... |  | 0 |  |  |  |  | [0] | 0 | 2 |  |
| 2 | 03RLF3jf97R8Hb49KJCOAWoipiB2 | 0.0 | 1603210798458 | [6, 8, 11] | 0 | [0, 1, 2] | 2 | [1, 2, 4] | [4] | [17] | ... | 1.0 | 1 | [1, 5] | 3.0 | [0, 1, 2, 3, 7] | 0.0 | [0, 2, 3] | 1 | 2 |  |
| 3 | 03iq2uFIwNOqgNw01FF2AViQr9B3 |  | 1603449369739 |  | 0 | [1, 2, 3] | 1 |  |  |  | ... |  | 0 |  |  | [1, 2, 4, 7] | 4.0 | [0, 2, 3] | 1 | 2 |  |
| 4 | 03lbWrhd7nQ0FFjsaap7q062bQ13 | 1.0 | 1603892118793 | [12, 17] | 0 | [1, 2, 3] | 0 | [5] | [1] | [0, 17] | ... | 1.0 | 1 | [3, 7] | 2.0 | [1, 7] | 1.0 | [0, 2, 3] | 1 | 2 |  |
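The table above suggests that each respondent is stored as one flat record keyed by question code, and that respondents who skipped questions simply have no key for them. A minimal sketch with made-up records shaped like these rows (an assumption about the layout of `results.json`, not a copy of it) shows how `pd.DataFrame` fills the gaps with NaN. That is why a single-choice column with missing answers, such as `community-q-4`, shows floats (0.0, 1.0), while multi-choice columns stay object columns holding lists:

```python
import pandas as pd

# Made-up records imitating data['results']: question codes as keys,
# one index for single-choice answers, a list of indices for multi-choice ones
records = [
    {"userId": "user-a", "profile-q-6": 0, "community-q-4": 1, "tech-q-1": [0, 2]},
    {"userId": "user-b", "profile-q-6": 1},  # skipped the later questions
]

df = pd.DataFrame(records)

# Missing keys become NaN, which turns "community-q-4" into a float column
print(df.dtypes)
print(df)
```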


In [33]:
SOD_Morocco_qst = pd.read_json("./../questions.json")
SOD_Morocco_qst.head()

|  | profile-q-0 | profile-q-1 | profile-q-2 | profile-q-3 | profile-q-4 | profile-q-5 | profile-q-6 | profile-q-7 | profile-q-8 | profile-q-9 | ... | work-q-8 | work-q-9 | community-q-0 | community-q-1 | community-q-2 | community-q-3 | community-q-4 | community-q-5 | community-q-6 | community-q-7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| label | What is your gender? | What is your age? | Where are you currently located? | What is your current role? | What is your highest diploma? | How many years have you been coding profession... | Is coding a hobby for you? | Do you think school gives you the most importa... | Do you have any plans to work outside Morocco ... | If you are working abroad, do you have any pla... | ... | What is the impact of COVID-19 Pandemic in rem... | Which of the following sentence describes your... | Are you part of any local developer’s community? | Did you ever participate in an open-source pro... | Did you have the chance to write blog posts? | How many IT events did you attend in 2019/2020? | How many talks did you participate in as a spe... | Do you prefer ? | Are you part of one of the Moroccan Facebook D... | How do you evaluate the Moroccan Tech Community? |
| required | True | True | True | True | True |  | True | True | True |  | ... |  |  | True | True | True | True | True | True | True | True |
| multiple | False |  |  |  |  |  |  |  |  |  | ... |  |  | False | True |  |  |  |  |  |  |
| choices | [Male, Female] | [Younger than 15 years, 15 to 19 years, 20 to ... | [Morocco, Europe, US, Others] | [Developer, back-end, Developer, full-stack, D... | [Self-taught, Bac +2, Bac +3, Bac +5, Bac +8] | [Less than 1 year, 1-3 years, 3-5 years, 5-10 ... | [Yes, No] | [Not enough, Enough to start, Everything I nee... | [Yes, No] | [Yes, No] | ... | [We were doing remote before, The company star... | [The current Pandemic impacted very negatively... | [Yes, NO] | [Yes, few PRs in multiple projects, I am maint... | [Yes more than 10 blog posts, Yes less than 1... | [0, 1-3, More than 3] | [0, 1 - 3 talks, More than 3] | [Live/online events, In-person events?, Li ja ... | [Yes, NO, I don’t know Facebook Developer circ... | [Bad, Not Bad, Good, Excellent] |
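The output above suggests that `questions.json` is an object keyed by question code, where each value holds a `label`, optional `required`/`multiple` flags and a `choices` list. With its default `orient="columns"`, `pd.read_json` turns the codes into columns and those fields into the row index. A small sketch with a hypothetical two-question excerpt (made up to mimic that layout) illustrates this, including the NaN that appears when a question omits a field:

```python
from io import StringIO

import pandas as pd

# Hypothetical excerpt mimicking the layout of questions.json
questions_json = """
{
  "profile-q-0": {"label": "What is your gender?", "required": true,
                  "multiple": false, "choices": ["Male", "Female"]},
  "profile-q-5": {"label": "How many years have you been coding?",
                  "choices": ["Less than 1 year", "1-3 years"]}
}
"""

# Outer keys -> columns, inner keys (label, required, ...) -> row index
qst = pd.read_json(StringIO(questions_json))
print(qst)
```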


### Translate the survey codes and abbreviations

In [34]:
SOD_qst_dict = {"profile-q-1":"Age",
                "profile-q-0":"Gender",
                "profile-q-2":"Country",
                "profile-q-4":"Education",
                "profile-q-3":"Job_Title",
                "profile-q-5":"Coding_Experience",
                "profile-q-6":"Coding_as_Hobby",
                "profile-q-7":"Teaching_Jobs_Gap",
                "profile-q-8":"Immigration_Plans",
                "profile-q-9":"Reentery_Plans",
                "profile-q-10":"Teaching_Problems_POV",
                "profile-q-11":"Read_Write_Languages",
                "profile-q-12":"English_Barrier",
                "profile-q-13":"Local_Dialect_Content",
                "profile-q-14":"Learning_Platforms",
                "profile-q-15":"Fav_Coding_Drink",
                "tech-q-0":"Fav_Programming_Languages",
                "tech-q-1":"Wanted_Programming_Languages",
                "tech-q-2":"Daily_Programming_Languages",
                "tech-q-3":"Daily_Web_Frameworks",
                "tech-q-4":"Wanted_Web_Frameworks",
                "tech-q-5":"Daily_Paltforms",
                "tech-q-6":"Wanted_Platforms",
                "tech-q-7":"Primary_OS",
                "tech-q-8":"Daily_IDE",
                "tech-q-9":"Continuous_Learning_Frequency",
                "tech-q-10":"Help_Sources",
                "tech-q-11":"Professional_Prod_Env",
                "work-q-0":"Employment_Status",
                "work-q-1":"Overtime_Frequency",
                "work-q-2":"Job_Satisfaction",
                "work-q-3":"Side_Projects",
                "work-q-4":"Graduation_Employment_Gap",
                "work-q-5":"Salary_Range_DH",
                "work-q-6":"Preferred_Job_Criteria",
                "work-q-7":"Preferred_Company_Type",
                "work-q-8":"Covid_Remote_Adoption",
                "work-q-9":"Covid_Productivity",
                "community-q-0":"Local_Community_Membership",
                "community-q-1":"Open_Source_Participation",
                "community-q-2":"Blog_Posting",
                "community-q-3":"Event_Attendence_19_20",
                "community-q-4":"Talks_Participation_19_20",
                "community-q-5":"Online_InPerson_Preferrence",
                "community-q-6":"Moroccan_DevC_Membership",
                "community-q-7":"Moroccan_Community_AutoEval",
                }
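Every later step relies on `SOD_qst_dict`, so it may be worth checking that it covers every question code and that no two codes share the same readable name. A minimal sketch, using a hypothetical helper `check_code_mapping` and a toy subset of codes:

```python
def check_code_mapping(codes, mapping):
    """Hypothetical helper: list codes without a readable name and names used twice."""
    missing = [code for code in codes if code not in mapping]
    names = [mapping[code] for code in codes if code in mapping]
    duplicated = sorted({name for name in names if names.count(name) > 1})
    return missing, duplicated

# Toy subset: "tech-q-12" has no entry, and two codes share the name "Age"
toy_mapping = {"profile-q-0": "Gender", "profile-q-1": "Age", "profile-q-2": "Age"}
missing, duplicated = check_code_mapping(
    ["profile-q-0", "profile-q-1", "profile-q-2", "tech-q-12"], toy_mapping
)
print(missing)     # ['tech-q-12']
print(duplicated)  # ['Age']
```

On the notebook's own objects, the call would be `check_code_mapping(SOD_Morocco_qst.columns, SOD_qst_dict)`, run before the columns are renamed; two empty lists mean the mapping is complete and unambiguous.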

In [35]:
# Keep the original survey codes as an extra "Survey_Code" row, so every
# renamed column can still be traced back to its question code
SOD_Morocco_qst.loc["Survey_Code"] = list(SOD_Morocco_qst.columns)
# Replace the column titles of the Questions DataFrame with readable names;
# "rename" leaves any code missing from SOD_qst_dict unchanged
SOD_Morocco_qst = SOD_Morocco_qst.rename(columns=SOD_qst_dict)
SOD_Morocco_qst.head()

|  | Gender | Age | Country | Job_Title | Education | Coding_Experience | Coding_as_Hobby | Teaching_Jobs_Gap | Immigration_Plans | Reentery_Plans | ... | Covid_Remote_Adoption | Covid_Productivity | Local_Community_Membership | Open_Source_Participation | Blog_Posting | Event_Attendence_19_20 | Talks_Participation_19_20 | Online_InPerson_Preferrence | Moroccan_DevC_Membership | Moroccan_Community_AutoEval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| label | What is your gender? | What is your age? | Where are you currently located? | What is your current role? | What is your highest diploma? | How many years have you been coding profession... | Is coding a hobby for you? | Do you think school gives you the most importa... | Do you have any plans to work outside Morocco ... | If you are working abroad, do you have any pla... | ... | What is the impact of COVID-19 Pandemic in rem... | Which of the following sentence describes your... | Are you part of any local developer’s community? | Did you ever participate in an open-source pro... | Did you have the chance to write blog posts? | How many IT events did you attend in 2019/2020? | How many talks did you participate in as a spe... | Do you prefer ? | Are you part of one of the Moroccan Facebook D... | How do you evaluate the Moroccan Tech Community? |
| required | True | True | True | True | True |  | True | True | True |  | ... |  |  | True | True | True | True | True | True | True | True |
| multiple | False |  |  |  |  |  |  |  |  |  | ... |  |  | False | True |  |  |  |  |  |  |
| choices | [Male, Female] | [Younger than 15 years, 15 to 19 years, 20 to ... | [Morocco, Europe, US, Others] | [Developer, back-end, Developer, full-stack, D... | [Self-taught, Bac +2, Bac +3, Bac +5, Bac +8] | [Less than 1 year, 1-3 years, 3-5 years, 5-10 ... | [Yes, No] | [Not enough, Enough to start, Everything I nee... | [Yes, No] | [Yes, No] | ... | [We were doing remote before, The company star... | [The current Pandemic impacted very negatively... | [Yes, NO] | [Yes, few PRs in multiple projects, I am maint... | [Yes more than 10 blog posts, Yes less than 1... | [0, 1-3, More than 3] | [0, 1 - 3 talks, More than 3] | [Live/online events, In-person events?, Li ja ... | [Yes, NO, I don’t know Facebook Developer circ... | [Bad, Not Bad, Good, Excellent] |
| Survey_Code | profile-q-0 | profile-q-1 | profile-q-2 | profile-q-3 | profile-q-4 | profile-q-5 | profile-q-6 | profile-q-7 | profile-q-8 | profile-q-9 | ... | work-q-8 | work-q-9 | community-q-0 | community-q-1 | community-q-2 | community-q-3 | community-q-4 | community-q-5 | community-q-6 | community-q-7 |


In [36]:
# Replace the question codes in the Answers DataFrame with readable column names;
# "rename" leaves the columns missing from SOD_qst_dict (userId, lastSubmit, ...) unchanged
SOD_Morocco = SOD_Morocco.rename(columns=SOD_qst_dict)
# Drop the extra "__collections__" column, which is empty in the raw output above
SOD_Morocco = SOD_Morocco.drop(columns=["__collections__"])
SOD_Morocco.head()

|  | userId | Talks_Participation_19_20 | lastSubmit | Wanted_Programming_Languages | Coding_as_Hobby | Teaching_Problems_POV | Fav_Coding_Drink | Professional_Prod_Env | Open_Source_Participation | Daily_Web_Frameworks | ... | Continuous_Learning_Frequency | Event_Attendence_19_20 | Immigration_Plans | Wanted_Platforms | Job_Satisfaction | Preferred_Job_Criteria | Employment_Status | Read_Write_Languages | English_Barrier | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01jKBLnYhGTy3xztjuhVIf2jass1 | 0.0 | 1603211692917 | [0] | 0 | [3] | 3 | [5] | [4] | [7] | ... | 0.0 | 0.0 | 0 | [0, 1] | 2.0 | [0, 1, 3, 4, 7] | 0.0 | [0, 1, 2, 3] | 1 | 2 |
| 1 | 020TufnjwQTTT0C7jzE0BYz39j53 |  | 1604010020407 |  | 0 | [1] | 2 |  |  |  | ... |  |  | 0 |  |  |  |  | [0] | 0 | 2 |
| 2 | 03RLF3jf97R8Hb49KJCOAWoipiB2 | 0.0 | 1603210798458 | [6, 8, 11] | 0 | [0, 1, 2] | 2 | [1, 2, 4] | [4] | [17] | ... | 1.0 | 1.0 | 1 | [1, 5] | 3.0 | [0, 1, 2, 3, 7] | 0.0 | [0, 2, 3] | 1 | 2 |
| 3 | 03iq2uFIwNOqgNw01FF2AViQr9B3 |  | 1603449369739 |  | 0 | [1, 2, 3] | 1 |  |  |  | ... |  |  | 0 |  |  | [1, 2, 4, 7] | 4.0 | [0, 2, 3] | 1 | 2 |
| 4 | 03lbWrhd7nQ0FFjsaap7q062bQ13 | 1.0 | 1603892118793 | [12, 17] | 0 | [1, 2, 3] | 0 | [5] | [1] | [0, 17] | ... | 0.0 | 1.0 | 1 | [3, 7] | 2.0 | [1, 7] | 1.0 | [0, 2, 3] | 1 | 2 |
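Two pandas idioms can rename only the columns listed in a dict. `Index.map` (here through `to_series().map`) returns NaN for labels absent from the dict, so the original names have to be restored with `fillna`; `DataFrame.rename(columns=...)` leaves unmatched labels untouched. A small sketch on a toy frame (made-up column values) shows both routes producing the same names:

```python
import pandas as pd

mapping = {"profile-q-1": "Age"}
df = pd.DataFrame({"userId": ["u1"], "profile-q-1": [2]})

# Series.map with a dict: "userId" is not in the dict, so it becomes NaN
mapped = df.columns.to_series().map(mapping)
print(mapped.tolist())  # [nan, 'Age']

# Route 1: fill the NaN back in with the original labels (aligned on the index)
via_map = mapped.fillna(df.columns.to_series())
# Route 2: rename keeps unmapped labels as they are
via_rename = df.rename(columns=mapping).columns

print(list(via_map))     # ['userId', 'Age']
print(list(via_rename))  # ['userId', 'Age']
```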


### Translate the answer values

In [37]:
from math import nan

def decode_answer(value, choices):
    """Translate one raw answer (an index, a list of indices, or NaN) into choice labels."""
    # Multi-choice answers are stored as lists of indices
    if isinstance(value, list):
        return [choices[x] for x in value]
    # Unanswered questions stay missing
    if pd.isna(value):
        return nan
    # Single-choice answers are one index (a float when the column has NaNs);
    # as with Series.map(dict), a code with no matching choice becomes NaN
    return choices.get(int(value), nan)

# Map the answer values to the original choices in the Questions DataFrame
for col in SOD_Morocco_qst.columns:
    # Choice index -> label, e.g. {0: "Male", 1: "Female"}
    choices = dict(enumerate(SOD_Morocco_qst.loc["choices", col]))
    SOD_Morocco[col] = SOD_Morocco[col].map(lambda v: decode_answer(v, choices))

In [38]:
SOD_Morocco.head()

|  | userId | Talks_Participation_19_20 | lastSubmit | Wanted_Programming_Languages | Coding_as_Hobby | Teaching_Problems_POV | Fav_Coding_Drink | Professional_Prod_Env | Open_Source_Participation | Daily_Web_Frameworks | ... | Continuous_Learning_Frequency | Event_Attendence_19_20 | Immigration_Plans | Wanted_Platforms | Job_Satisfaction | Preferred_Job_Criteria | Employment_Status | Read_Write_Languages | English_Barrier | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01jKBLnYhGTy3xztjuhVIf2jass1 | 0 | 1603211692917 | [JavaScript] | Yes | [Companies engagement with academia] |  | [Not Sure] | [I don’t have an account on Github (or alterna... | [Spring Ecosystem] | ... | Every few months | 0 | Yes | [Node.js, .NET] | Neither satisfied nor dissatisfied | [Languages, frameworks, and other technologies... | Employed full-time | [Arabic, Amazigh, French, English] | No | 20 to 24 years |
| 1 | 020TufnjwQTTT0C7jzE0BYz39j53 |  | 1604010020407 |  | Yes | [Teachers] | Water |  |  |  | ... |  |  | Yes |  |  |  |  | [Arabic] | Yes | 20 to 24 years |
| 2 | 03RLF3jf97R8Hb49KJCOAWoipiB2 | 0 | 1603210798458 | [C#, TypeScript, Go] | Yes | [Students, Teachers, Academic defined subjects... | Water | [on-premise, Hybrid Cloud, Managed PaaS (herok... | [I don’t have an account on Github (or alterna... | [Other] | ... | Once a year | 1-3 | No | [.NET, Unity 3D] | Slightly satisfied | [Languages, frameworks, and other technologies... | Employed full-time | [Arabic, French, English] | No | 20 to 24 years |
| 3 | 03iq2uFIwNOqgNw01FF2AViQr9B3 |  | 1603449369739 |  | Yes | [Teachers, Academic defined subjects (system),... | Coffee |  |  |  | ... |  |  | Yes |  |  | [Office environment or company culture, Flex t... | Student | [Arabic, French, English] | No | 20 to 24 years |
| 4 | 03lbWrhd7nQ0FFjsaap7q062bQ13 | 1 - 3 talks | 1603892118793 | [Kotlin, R] | Yes | [Teachers, Academic defined subjects (system),... | Tea | [Not Sure] | [I am maintaining my own project] | [jQuery, Other] | ... | Every few months | 1-3 | No | [TensorFlow, Flutter] | Neither satisfied nor dissatisfied | [Office environment or company culture, Salary] | Freelancer, or self-employed | [Arabic, French, English] | No | 20 to 24 years |
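The decoding rule can be checked on a toy example. This sketch uses made-up choices and answers, and `decode_answer` is a hypothetical helper written for illustration. It covers the three shapes a raw answer takes: a single index, a list of indices, and a missing value. Like `Series.map` with a dict, it turns a code that has no matching choice into NaN. That would explain why `Fav_Coding_Drink` is empty in row 0 although its raw code was 3, if that question only has three choices (an assumption, since the full choices list is truncated above).

```python
from math import nan

import pandas as pd

def decode_answer(value, choices):
    """Hypothetical helper: turn a raw answer code into its choice label(s)."""
    if isinstance(value, list):          # multi-choice: list of indices
        return [choices[i] for i in value]
    if pd.isna(value):                   # unanswered question
        return nan
    return choices.get(int(value), nan)  # single choice; unknown code -> NaN

# Made-up choices and raw answers in the same shape as the survey export
drink_choices = dict(enumerate(["Water", "Coffee", "Tea"]))
lang_choices = dict(enumerate(["JavaScript", "Python", "Go"]))

drinks = pd.Series([0, 2, 3]).map(lambda v: decode_answer(v, drink_choices))
langs = pd.Series([[0, 2], nan]).map(lambda v: decode_answer(v, lang_choices))

print(drinks.tolist())  # ['Water', 'Tea', nan]
print(langs.tolist())   # [['JavaScript', 'Go'], nan]
```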


In [39]:
SOD_Morocco_qst.head()

|  | Gender | Age | Country | Job_Title | Education | Coding_Experience | Coding_as_Hobby | Teaching_Jobs_Gap | Immigration_Plans | Reentery_Plans | ... | Covid_Remote_Adoption | Covid_Productivity | Local_Community_Membership | Open_Source_Participation | Blog_Posting | Event_Attendence_19_20 | Talks_Participation_19_20 | Online_InPerson_Preferrence | Moroccan_DevC_Membership | Moroccan_Community_AutoEval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| label | What is your gender? | What is your age? | Where are you currently located? | What is your current role? | What is your highest diploma? | How many years have you been coding profession... | Is coding a hobby for you? | Do you think school gives you the most importa... | Do you have any plans to work outside Morocco ... | If you are working abroad, do you have any pla... | ... | What is the impact of COVID-19 Pandemic in rem... | Which of the following sentence describes your... | Are you part of any local developer’s community? | Did you ever participate in an open-source pro... | Did you have the chance to write blog posts? | How many IT events did you attend in 2019/2020? | How many talks did you participate in as a spe... | Do you prefer ? | Are you part of one of the Moroccan Facebook D... | How do you evaluate the Moroccan Tech Community? |
| required | True | True | True | True | True |  | True | True | True |  | ... |  |  | True | True | True | True | True | True | True | True |
| multiple | False |  |  |  |  |  |  |  |  |  | ... |  |  | False | True |  |  |  |  |  |  |
| choices | [Male, Female] | [Younger than 15 years, 15 to 19 years, 20 to ... | [Morocco, Europe, US, Others] | [Developer, back-end, Developer, full-stack, D... | [Self-taught, Bac +2, Bac +3, Bac +5, Bac +8] | [Less than 1 year, 1-3 years, 3-5 years, 5-10 ... | [Yes, No] | [Not enough, Enough to start, Everything I nee... | [Yes, No] | [Yes, No] | ... | [We were doing remote before, The company star... | [The current Pandemic impacted very negatively... | [Yes, NO] | [Yes, few PRs in multiple projects, I am maint... | [Yes more than 10 blog posts, Yes less than 1... | [0, 1-3, More than 3] | [0, 1 - 3 talks, More than 3] | [Live/online events, In-person events?, Li ja ... | [Yes, NO, I don’t know Facebook Developer circ... | [Bad, Not Bad, Good, Excellent] |
| Survey_Code | profile-q-0 | profile-q-1 | profile-q-2 | profile-q-3 | profile-q-4 | profile-q-5 | profile-q-6 | profile-q-7 | profile-q-8 | profile-q-9 | ... | work-q-8 | work-q-9 | community-q-0 | community-q-1 | community-q-2 | community-q-3 | community-q-4 | community-q-5 | community-q-6 | community-q-7 |


### Save to CSV

In [40]:
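# The Answers DataFrame only has a plain 0..n-1 index, so it is not saved
SOD_Morocco.to_csv("results_preprocessed.csv", index=False)
# Keep the index (label, required, multiple, choices, Survey_Code) of the
# Questions DataFrame, saved under a "Keys" header
SOD_Morocco_qst.to_csv("questions_preprocessed.csv", index_label="Keys")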
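One consequence of saving to CSV is that the multi-choice columns, which hold Python lists, are written as their text representation (for example `['Arabic', 'French', 'English']`). A minimal sketch of how a later notebook might read them back, assuming pandas' default `to_csv`/`read_csv` behavior and using `ast.literal_eval` to parse the lists; the column names come from the tables above, the data is made up:

```python
import ast
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    "Age": ["20 to 24 years", "20 to 24 years"],
    "Read_Write_Languages": [["Arabic", "French", "English"], None],
})

# Write and read back through an in-memory CSV "file"
buffer = StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The lists come back as plain strings
print(type(restored.loc[0, "Read_Write_Languages"]))  # <class 'str'>

# Parse them back into lists, leaving missing answers (NaN) as they are
restored["Read_Write_Languages"] = restored["Read_Write_Languages"].map(
    lambda s: ast.literal_eval(s) if isinstance(s, str) else s
)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of round trip.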