# I. Project Team Members

| Prepared by | Email | Prepared for |
| :-: | :-: | :-: |
| **Hardefa Rogonondo** | hardefarogonondo@gmail.com | **Tech Talent Recommendation Engine** |

# II. Notebook Target Definition

This notebook outlines the data preparation steps for the Tech Talent Recommendation Engine Project. We begin by loading the public results of the Stack Overflow Annual Developer Survey, which describe tech professionals' technical skills, workplace preferences, and the technologies they use or want to use. We then remove redundant and unused columns and drop respondents whose remaining columns, other than `ResponseId`, `MainBranch`, and `Age`, are all null. Shape and preview checks after each step confirm the result. The final product of this notebook is a streamlined, well-structured dataset ready for the subsequent recommendation tasks.

# III. Notebook Setup

## III.A. Import Libraries

In [1]:
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## III.B. Import Data

In [2]:
df = pd.read_csv('../../data/raw/survey_results_public.csv')
df.head()

Unnamed: 0,ResponseId,Q120,MainBranch,Age,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,DevType,OrgSize,PurchaseInfluence,TechList,BuyNewTool,Country,Currency,CompTotal,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,SOAI,AISelect,AISent,AIAcc,AIBen,AIToolInterested in Using,AIToolCurrently Using,AIToolNot interested in Using,AINextVery different,AINextNeither different nor similar,AINextSomewhat similar,AINextVery similar,AINextSomewhat different,TBranch,ICorPM,WorkExp,Knowledge_1,Knowledge_2,Knowledge_3,Knowledge_4,Knowledge_5,Knowledge_6,Knowledge_7,Knowledge_8,Frequency_1,Frequency_2,Frequency_3,TimeSearching,TimeAnswering,ProfessionalTech,Industry,SurveyLength,SurveyEase,ConvertedCompYearly
0,1,I agree,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,I agree,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,Hobby;Contribute to open-source projects;Boots...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Friend or fam...,Formal documentation provided by the owner of ...,Other,18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",2 to 9 employees,I have a great deal of influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,285000.0,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),Asana;Basecamp;GitHub Discussions;Jira;Linear;...,GitHub Discussions;Linear;Notion;Trello,Cisco Webex Teams;Discord;Google Chat;Google M...,Discord;Signal;Slack;Zoom,ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per month or weekly,"Yes, definitely","I don't think it's super necessary, but I thin...",Yes,Indifferent,Other (please explain),Somewhat distrust,Learning about a codebase;Writing code;Debuggi...,Writing code;Committing and reviewing code,,,,,,,Yes,People manager,10.0,Strongly agree,Agree,Strongly agree,Agree,Agree,Agree,Agree,Strongly agree,1-2 times a week,10+ times a week,Never,15-30 minutes a day,15-30 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,285000.0
2,3,I agree,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Professional development or self-paced l...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Formal documentation provided by the owner of ...,,27.0,23.0,"Developer, back-end","5,000 to 9,999 employees",I have some influence,Given a list,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,250000.0,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,Markdown File;Stack Overflow for Teams,Markdown File,Microsoft Teams;Slack;Zoom,Slack;Zoom,,,,,Stack Overflow;Stack Exchange;Stack Overflow f...,A few times per month or weekly,Yes,Less than once per month or monthly,Neutral,,"No, and I don't plan to",,,,,,,,,,,,Yes,Individual contributor,23.0,Strongly agree,Neither agree nor disagree,Agree,Agree,Agree,Agree,Agree,Agree,6-10 times a week,6-10 times a week,3-5 times a week,30-60 minutes a day,30-60 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,250000.0
3,4,I agree,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Colleague;Friend or family member;Other online...,Formal documentation provided by the owner of ...,,12.0,7.0,"Developer, front-end",100 to 499 employees,I have some influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,156000.0,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,Jira,Jira,Discord;Google Meet;Microsoft Teams;Slack;Zoom,Discord;Google Meet;Slack;Zoom,,,,,Stack Overflow;Stack Exchange,A few times per week,Yes,Less than once per month or monthly,"No, not really",I'm wearing of Stack Overflow using AI.,"No, and I don't plan to",,,,,,,,,,,,Yes,Individual contributor,7.0,Strongly agree,Strongly disagree,Strongly agree,Strongly agree,Agree,Neither agree nor disagree,Agree,Agree,1-2 times a week,10+ times a week,1-2 times a week,15-30 minutes a day,30-60 minutes a day,Automated testing;Continuous integration (CI) ...,,Appropriate in length,Easy,156000.0
4,5,I agree,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,Hobby;Contribute to open-source projects;Profe...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Online Courses or Certi...,Formal documentation provided by the owner of ...,Other;Codecademy;edX,6.0,4.0,"Developer, full-stack",20 to 99 employees,I have some influence,Investigate,Start a free trial;Ask developers I know/work ...,Philippines,PHP\tPhilippine peso,1320000.0,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,Confluence;Jira;Notion,Confluence;Jira;Notion,Discord;Google Meet;Slack;Zoom,Discord;Google Meet;Slack;Zoom,ChatGPT,ChatGPT,,,Stack Overflow;Stack Exchange,A few times per week,No,,Neutral,Using AI to suggest better answer to my questi...,Yes,Very favorable,Increase productivity;Greater efficiency;Speed...,Somewhat trust,Project planning;Testing code;Committing and r...,Learning about a codebase;Writing code;Documen...,,,,,,,Yes,Individual contributor,6.0,Agree,Strongly agree,Agree,Agree,Neither agree nor disagree,Agree,Strongly agree,Agree,1-2 times a week,1-2 times a week,3-5 times a week,60-120 minutes a day,30-60 minutes a day,Microservices;Automated testing;Observability ...,Other,Appropriate in length,Neither easy nor difficult,23456.0


# IV. Data Preparation

## IV.A. Data Shape Inspection

In [3]:
df.shape

(89184, 84)

## IV.B. Data Information Inspection

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89184 entries, 0 to 89183
Data columns (total 84 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   ResponseId                           89184 non-null  int64  
 1   Q120                                 89184 non-null  object 
 2   MainBranch                           89184 non-null  object 
 3   Age                                  89184 non-null  object 
 4   Employment                           87898 non-null  object 
 5   RemoteWork                           73810 non-null  object 
 6   CodingActivities                     73764 non-null  object 
 7   EdLevel                              87973 non-null  object 
 8   LearnCode                            87663 non-null  object 
 9   LearnCodeOnline                      70084 non-null  object 
 10  LearnCodeCoursesCert                 37076 non-null  object 
 11  YearsCode                   

## IV.C. Data Definition

| Variables | Columns Definition |
| :-: | :-: |
| ResponseId | Unique identifier for the respondent. |
| Q120 | Respondent's consent agreement for their data to be made public. |
| MainBranch | Professional status of the respondent. |
| Age | Age range of the respondent. |
| Employment | Employment status of the respondent. |
| RemoteWork | Remote work preference/status. |
| CodingActivities | Types of coding activities participated in. |
| EdLevel | Highest educational level attained. |
| LearnCode | Resources used to learn coding. |
| LearnCodeOnline | Online platforms/resources for learning to code. |
| LearnCodeCoursesCert | Online course or certification platforms used to learn to code. |
| YearsCode | Number of years the respondent has been coding. |
| YearsCodePro | Number of years the respondent has been coding professionally. |
| DevType | Types of developer roles respondent identifies with. |
| OrgSize | Size of the organization where respondent works. |
| PurchaseInfluence | Degree of influence in purchasing decisions at work. |
| TechList | Typical approach when considering new technology purchases at their organization. |
| BuyNewTool | Methods or considerations for acquiring new tools. |
| Country | Respondent's country of residence. |
| Currency | Currency used in the respondent's country. |
| CompTotal | Total compensation of the respondent. |
| LanguageHaveWorkedWith | Programming languages the respondent has worked with. |
| LanguageWantToWorkWith | Programming languages the respondent wishes to work with. |
| DatabaseHaveWorkedWith | Databases the respondent has worked with. |
| DatabaseWantToWorkWith | Databases the respondent wishes to work with. |
| PlatformHaveWorkedWith | Platforms the respondent has worked with. |
| PlatformWantToWorkWith | Platforms the respondent wishes to work with. |
| WebframeHaveWorkedWith | Web frameworks the respondent has worked with. |
| WebframeWantToWorkWith | Web frameworks the respondent wishes to work with. |
| MiscTechHaveWorkedWith | Miscellaneous technologies the respondent has worked with. |
| MiscTechWantToWorkWith | Miscellaneous technologies the respondent wishes to work with. |
| ToolsTechHaveWorkedWith | Tools/technologies the respondent has worked with. |
| ToolsTechWantToWorkWith | Tools/technologies the respondent wishes to work with. |
| NEWCollabToolsHaveWorkedWith | Development environment the respondent has worked with. |
| NEWCollabToolsWantToWorkWith | Development environment the respondent wishes to work with. |
| OpSysPersonal use | Operating system used personally by the respondent. |
| OpSysProfessional use | Operating system used professionally by the respondent. |
| OfficeStackAsyncHaveWorkedWith | Collaboration tools the respondent has worked with. |
| OfficeStackAsyncWantToWorkWith | Collaboration tools the respondent wishes to work with. |
| OfficeStackSyncHaveWorkedWith | Communication tools the respondent has worked with. |
| OfficeStackSyncWantToWorkWith | Communication tools the respondent wishes to work with. |
| AISearchHaveWorkedWith | AI search technologies respondent has worked with. |
| AISearchWantToWorkWith | AI search technologies respondent wishes to work with. |
| AIDevHaveWorkedWith | AI development tools/technologies respondent has worked with. |
| AIDevWantToWorkWith | AI development tools/technologies respondent wishes to work with. |
| NEWSOSites | Stack Overflow sites/topics respondent has engaged with. |
| SOVisitFreq | Frequency of visiting Stack Overflow. |
| SOAccount | Whether the respondent has a Stack Overflow account. |
| SOPartFreq | Participation frequency on Stack Overflow. |
| SOComm | Whether the respondent feels a sense of belonging to the Stack Overflow community. |
| SOAI | Respondent's opinion on Stack Overflow using AI tools. |
| AISelect | Whether the respondent currently uses AI tools in their development process. |
| AISent | The respondent's stance on using AI tools. |
| AIAcc | Benefits the respondent hopes to achieve by using AI tools. |
| AIBen | Level of trust the respondent has in the accuracy of the output from AI tools used. |
| AIToolInterested in Using | AI tools respondent is interested in using. |
| AIToolCurrently Using | AI tools respondent is currently using. |
| AIToolNot interested in Using | AI tools respondent is not interested in using. |
| AINextVery different | Perception of AI's future being very different to now. |
| AINextNeither different nor similar | Neutral perception of AI's future. |
| AINextSomewhat similar | Perception of AI's future being somewhat similar to now. |
| AINextVery similar | Perception of AI's future being very similar to now. |
| AINextSomewhat different | Perception of AI's future being somewhat different. |
| TBranch | Whether the respondent is participating in the Professional Developer Series. |
| ICorPM | Individual Contributor or People Manager status. |
| WorkExp | Number of years the respondent has been working. |
| Knowledge_1 | Respondent's level of agreement that they interact with people outside of their immediate team. |
| Knowledge_2 | Degree to which the respondent feels that knowledge silos within the organization hinder idea dissemination. |
| Knowledge_3 | Respondent's ability to find up-to-date information within the organization to aid in their job. |
| Knowledge_4 | Efficiency with which the respondent can find answers using existing tools and resources. |
| Knowledge_5 | Respondent's certainty about which system or resource to use for finding specific information or answers. |
| Knowledge_6 | Frequency with which the respondent finds themselves answering repetitive questions. |
| Knowledge_7 | Impact of waiting for answers on the respondent's workflow and productivity. |
| Knowledge_8 | Respondent's confidence in having the necessary tools and resources to understand and work on any part of the company's code/system/platform. |
| Frequency_1 | Frequency with which the respondent requires help from individuals outside of their immediate team. |
| Frequency_2 | Frequency of the respondent's interactions with people outside of their immediate team. |
| Frequency_3 | Frequency at which the respondent encounters knowledge silos within the organization. |
| TimeSearching | Time spent searching for information/solutions. |
| TimeAnswering | Time spent answering queries. |
| ProfessionalTech | Features or initiatives present within the respondent's company. |
| Industry | Industry respondent works in. |
| SurveyLength | Perception of the survey length. |
| SurveyEase | How easy the respondent found the survey. |
| ConvertedCompYearly | Converted yearly compensation. |
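
Several of the columns defined above, such as `LanguageHaveWorkedWith`, store multiple answers in one string separated by semicolons, as the `df.head()` preview shows. The sketch below illustrates one way such a column could be split for later use. It runs on a small toy frame rather than the survey file, and the names `toy`, `LanguageList`, `long_form`, and `language_counts` are assumptions made for illustration, not part of the project code.

```python
import pandas as pd

# Toy stand-in for the survey data; values copied from the preview above.
toy = pd.DataFrame({
    "ResponseId": [2, 3, 1],
    "LanguageHaveWorkedWith": [
        "HTML/CSS;JavaScript;Python",
        "Bash/Shell (all shells);Go",
        None,  # respondent who skipped the question
    ],
})

# Split each answer on ";" into a Python list; missing answers stay missing.
toy["LanguageList"] = toy["LanguageHaveWorkedWith"].str.split(";")

# One row per (respondent, language) pair, dropping respondents with no answer.
long_form = toy.explode("LanguageList").dropna(subset=["LanguageList"])

# Count how many respondents listed each language.
language_counts = long_form["LanguageList"].value_counts()
```

Whether the long form or the list form is more convenient depends on the downstream recommendation step, which this notebook does not specify.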

## IV.D. Redundant and Unused Column Removal

In [5]:
def unused_column_removal(df, cols_to_remove):
    """Drop cols_to_remove from df in place and return the same DataFrame."""
    df.drop(columns=cols_to_remove, inplace=True)
    return df

In [6]:
cols_to_remove = ["Q120", "CodingActivities", "LearnCode", "LearnCodeOnline", "LearnCodeCoursesCert", "PurchaseInfluence", "TechList", "BuyNewTool", "NEWSOSites", "SOVisitFreq", "SOAccount",
                  "SOPartFreq", "SOComm", "SOAI", "AISelect", "AISent", "AIAcc", "AIBen", "AIToolInterested in Using", "AIToolCurrently Using", "AIToolNot interested in Using",
                  "AINextVery different", "AINextNeither different nor similar", "AINextSomewhat similar", "AINextVery similar", "AINextSomewhat different", "TBranch", "ICorPM", "WorkExp",
                  "Knowledge_1", "Knowledge_2", "Knowledge_3", "Knowledge_4", "Knowledge_5", "Knowledge_6", "Knowledge_7", "Knowledge_8", "Frequency_1", "Frequency_2", "Frequency_3",
                  "TimeSearching", "TimeAnswering", "ProfessionalTech", "Industry", "SurveyLength", "SurveyEase", "ConvertedCompYearly"]

In [7]:
unused_column_removal(df, cols_to_remove)
df.shape

(89184, 37)
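
The column count falls from 84 to 37 because `cols_to_remove` lists 47 names. By default `DataFrame.drop` raises a `KeyError` if any listed column is absent, and the helper above also mutates its input. The sketch below shows a non-mutating variant with an explicit check. `drop_unused_columns` is a hypothetical name and the toy frame is an assumption for illustration only.

```python
import pandas as pd


def drop_unused_columns(frame, cols_to_remove):
    """Return a new DataFrame without cols_to_remove (hypothetical helper)."""
    # Report every missing name at once instead of failing inside drop().
    missing = [col for col in cols_to_remove if col not in frame.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    # Without inplace=True, drop() leaves the input frame untouched.
    return frame.drop(columns=cols_to_remove)


# Toy stand-in for the survey data.
toy = pd.DataFrame({
    "ResponseId": [1, 2],
    "Q120": ["I agree", "I agree"],
    "Age": ["18-24 years old", "25-34 years old"],
})

trimmed = drop_unused_columns(toy, ["Q120"])
```

Returning a new frame keeps the raw data available for comparison, at the cost of holding two copies in memory.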

In [8]:
df.head()

Unnamed: 0,ResponseId,MainBranch,Age,Employment,RemoteWork,EdLevel,YearsCode,YearsCodePro,DevType,OrgSize,Country,Currency,CompTotal,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith
0,1,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",2 to 9 employees,United States of America,USD\tUnited States dollar,285000.0,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),Asana;Basecamp;GitHub Discussions;Jira;Linear;...,GitHub Discussions;Linear;Notion;Trello,Cisco Webex Teams;Discord;Google Chat;Google M...,Discord;Signal;Slack;Zoom,ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot
2,3,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",27.0,23.0,"Developer, back-end","5,000 to 9,999 employees",United States of America,USD\tUnited States dollar,250000.0,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,Markdown File;Stack Overflow for Teams,Markdown File,Microsoft Teams;Slack;Zoom,Slack;Zoom,,,,
3,4,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",12.0,7.0,"Developer, front-end",100 to 499 employees,United States of America,USD\tUnited States dollar,156000.0,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,Jira,Jira,Discord;Google Meet;Microsoft Teams;Slack;Zoom,Discord;Google Meet;Slack;Zoom,,,,
4,5,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",6.0,4.0,"Developer, full-stack",20 to 99 employees,Philippines,PHP\tPhilippine peso,1320000.0,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,Confluence;Jira;Notion,Confluence;Jira;Notion,Discord;Google Meet;Slack;Zoom,Discord;Google Meet;Slack;Zoom,ChatGPT,ChatGPT,,


## IV.E. Data Cleaning

In [9]:
cols_to_check = df.columns.difference(["ResponseId", "MainBranch", "Age"])
df = df.dropna(subset=cols_to_check, how='all')
df.shape

(87973, 37)
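
This step removes 89184 - 87973 = 1211 rows. With `how='all'`, `dropna` discards a row only when every column in `subset` is null, so respondents who answered at least one of the checked questions are kept. A minimal sketch of that behavior on a toy frame (the frame and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in: respondent 1 answered nothing beyond the identifier columns.
toy = pd.DataFrame({
    "ResponseId": [1, 2, 3],
    "MainBranch": [
        "None of these",
        "I am a developer by profession",
        "I am a developer by profession",
    ],
    "Age": ["18-24 years old", "25-34 years old", "45-54 years old"],
    "Employment": [np.nan, "Employed, full-time", np.nan],
    "Country": [np.nan, "United States of America", "Philippines"],
})

# Check every column except the three that are always populated.
cols_to_check = toy.columns.difference(["ResponseId", "MainBranch", "Age"])

# Respondent 3 has one null among the checked columns but is still kept.
cleaned = toy.dropna(subset=cols_to_check, how="all")
```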

In [10]:
df.head()

Unnamed: 0,ResponseId,MainBranch,Age,Employment,RemoteWork,EdLevel,YearsCode,YearsCodePro,DevType,OrgSize,Country,Currency,CompTotal,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith
1,2,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",18,9,"Senior Executive (C-Suite, VP, etc.)",2 to 9 employees,United States of America,USD\tUnited States dollar,285000.0,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),Asana;Basecamp;GitHub Discussions;Jira;Linear;...,GitHub Discussions;Linear;Notion;Trello,Cisco Webex Teams;Discord;Google Chat;Google M...,Discord;Signal;Slack;Zoom,ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot
2,3,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",27,23,"Developer, back-end","5,000 to 9,999 employees",United States of America,USD\tUnited States dollar,250000.0,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,Markdown File;Stack Overflow for Teams,Markdown File,Microsoft Teams;Slack;Zoom,Slack;Zoom,,,,
3,4,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",12,7,"Developer, front-end",100 to 499 employees,United States of America,USD\tUnited States dollar,156000.0,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,Jira,Jira,Discord;Google Meet;Microsoft Teams;Slack;Zoom,Discord;Google Meet;Slack;Zoom,,,,
4,5,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",6,4,"Developer, full-stack",20 to 99 employees,Philippines,PHP\tPhilippine peso,1320000.0,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,Confluence;Jira;Notion,Confluence;Jira;Notion,Discord;Google Meet;Slack;Zoom,Discord;Google Meet;Slack;Zoom,ChatGPT,ChatGPT,,
5,6,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Some college/university study without earning ...,21,21,"Developer, back-end",100 to 499 employees,United Kingdom of Great Britain and Northern I...,GBP\tPound sterling,78000.0,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,Go;Rust,BigQuery;Cloud Firestore;PostgreSQL;Redis,,Amazon Web Services (AWS);Cloudflare;Google Cloud,,Angular;Express;NestJS;Node.js,,,,Docker;Homebrew;Kubernetes;npm;pnpm;Terraform,Cargo,Helix;Neovim,,Other (Please Specify):,MacOS,Jira;Markdown File;Notion;Stack Overflow for T...,,Google Meet;Microsoft Teams;Slack;Zoom,,ChatGPT;Google Bard AI;Neeva AI,,GitHub Copilot;Tabnine,


## IV.F. Export Data

In [11]:
df.to_pickle('../../data/processed/df.pkl')
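
A pickle file preserves dtypes and the index, so the next notebook can reload the frame exactly as it was saved. The sketch below shows a round-trip check, assuming a toy frame and a temporary directory rather than the project's `data/processed` path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the processed survey data.
toy = pd.DataFrame({"ResponseId": [2, 3], "YearsCode": ["18", "27"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "df.pkl"
    toy.to_pickle(pkl_path)              # write, as in the export cell
    restored = pd.read_pickle(pkl_path)  # read back for comparison

# Raises AssertionError if values, dtypes, or index differ.
pd.testing.assert_frame_equal(toy, restored)
```

Pickle files can execute arbitrary code when loaded, so they should only be read from trusted sources.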