# Introduction
This notebook loads and inspects the Stack Overflow Developer Survey public dataset.
- Load survey data and schema
- Inspect structure and basic metadata to plan downstream analysis

In [46]:
# Import pandas and set display options for notebook output
import pandas as pd

# Make wide tables easier to read in the notebook output
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 50)

In [47]:
# Read the main survey CSV and the schema file.
# Note: pandas may emit a DtypeWarning for mixed-type columns; this is expected for large survey exports.
stackOverflow_survey = pd.read_csv('data/survey_results_public.csv')
# The schema file maps columns to questions that were asked
stackOverflow_survey_schema = pd.read_csv('data/survey_results_schema.csv')
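
If the DtypeWarning becomes a nuisance, one common approach is to read the affected columns as text and convert them explicitly afterwards (passing `low_memory=False` to `pd.read_csv` is another option). The sketch below is a minimal illustration on a hypothetical three-row CSV held in memory; the sample values and the names `csv_text`, `as_text`, and `work_exp` are assumptions, not part of the survey files.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real survey export
csv_text = "ResponseId,WorkExp\n1,8\n2,ten\n3,\n"

# Reading every column as text sidesteps mixed-type inference
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Convert one column to numbers afterwards; values that cannot be parsed become NaN
work_exp = pd.to_numeric(as_text["WorkExp"], errors="coerce")

print(work_exp.tolist())
```

Converting after loading keeps the decision about bad values explicit instead of leaving it to type inference.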


## Preview: survey dataframe
The cell below displays the loaded survey DataFrame. pandas truncates the output to the first and last rows and columns, which is enough to confirm that the data loaded as expected.

In [48]:
# Display the DataFrame (long output is truncated automatically)
stackOverflow_survey

Unnamed: 0,ResponseId,MainBranch,Age,EdLevel,Employment,EmploymentAddl,WorkExp,LearnCodeChoose,LearnCode,LearnCodeAI,AILearnHow,YearsCode,DevType,OrgSize,ICorPM,RemoteWork,PurchaseInfluence,TechEndorseIntro,TechEndorse_1,TechEndorse_2,TechEndorse_3,TechEndorse_4,TechEndorse_5,TechEndorse_6,TechEndorse_7,...,AIAgentChange,AIAgent_Uses,AgentUsesGeneral,AIAgentImpactSomewhat agree,AIAgentImpactNeutral,AIAgentImpactSomewhat disagree,AIAgentImpactStrongly agree,AIAgentImpactStrongly disagree,AIAgentChallengesNeutral,AIAgentChallengesSomewhat disagree,AIAgentChallengesStrongly agree,AIAgentChallengesSomewhat agree,AIAgentChallengesStrongly disagree,AIAgentKnowledge,AIAgentKnowWrite,AIAgentOrchestration,AIAgentOrchWrite,AIAgentObserveSecure,AIAgentObsWrite,AIAgentExternal,AIAgentExtWrite,AIHuman,AIOpen,ConvertedCompYearly,JobSat
0,1,I am a developer by profession,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed,"Caring for dependents (children, elderly, etc.)",8.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",AI CodeGen tools or AI-enabled apps,14.0,"Developer, mobile",20 to 99 employees,People manager,Remote,"Yes, I influenced the purchase of a substantia...",Work,10.0,7.0,9.0,6.0,3.0,11.0,12.0,...,Not at all or minimally,Software engineering,,AI agents have increased my productivity.;AI a...,AI agents have helped me automate repetitive t...,,,,I am concerned about the accuracy of the infor...,Integrating AI agents with my existing tools a...,The cost of using certain AI agent platforms i...,,,,,Vertex AI,,,,ChatGPT,,When I don’t trust AI’s answers,"Troubleshooting, profiling, debugging",61256.0,10.0
1,2,I am a developer by profession,25-34 years old,"Associate degree (A.A., A.S., etc.)",Employed,,2.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",AI CodeGen tools or AI-enabled apps,10.0,"Developer, back-end",500 to 999 employees,Individual contributor,"Hybrid (some in-person, leans heavy to flexibi...",No,Personal Project,13.0,1.0,2.0,9.0,4.0,3.0,12.0,...,Not at all or minimally,,,,,,,,It takes significant time and effort to learn ...,,I am concerned about the accuracy of the infor...,Integrating AI agents with my existing tools a...,,,,,,,,,,When I don’t trust AI’s answers;When I want to...,All skills. AI is a flop.,104413.0,9.0
2,3,I am a developer by profession,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Independent contractor, freelancer, or self-em...",None of the above,10.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",AI CodeGen tools or AI-enabled apps;Technical ...,12.0,"Developer, front-end",,,,No,Work,12.0,2.0,3.0,7.0,5.0,10.0,13.0,...,"Yes, somewhat",Software engineering,Multi-platform search enablement,AI agents have increased my productivity.;AI a...,AI agents have improved the quality of my code...,AI agents have improved collaboration within m...,,,It takes significant time and effort to learn ...,My company's IT and/or InfoSec teams have stri...,,I am concerned about the accuracy of the infor...,,Redis,,,,,,ChatGPT;Claude Code;GitHub Copilot;Google Gemini,,When I don’t trust AI’s answers;When I want to...,"Understand how things actually work, problem s...",53061.0,8.0
3,4,I am a developer by profession,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Employed,None of the above,4.0,"Yes, I am not new to coding but am learning ne...","Other online resources (e.g. standard search, ...","Yes, I learned how to use AI-enabled tools for...",AI CodeGen tools or AI-enabled apps;Videos (no...,5.0,"Developer, back-end","10,000 or more employees",Individual contributor,Remote,No,Personal Project,2.0,12.0,6.0,5.0,13.0,3.0,8.0,...,Not at all or minimally,Software engineering,Language processing,AI agents have accelerated my learning about n...,AI agents have increased my productivity.;AI a...,AI agents have helped me automate repetitive t...,,,It takes significant time and effort to learn ...,,I am concerned about the accuracy of the infor...,Integrating AI agents with my existing tools a...,,,,,,,,ChatGPT;Claude Code,,When I don’t trust AI’s answers;When I want to...,,36197.0,6.0
4,5,I am a developer by profession,35-44 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...","Caring for dependents (children, elderly, etc.)",21.0,"No, I am not new to coding and did not learn n...",,"Yes, I learned how to use AI-enabled tools for...",Technical documentation (is generated for/by t...,22.0,Engineering manager,,,,"Yes, I endorsed a tool that was open-source an...",Work,6.0,3.0,1.0,9.0,10.0,8.0,7.0,...,"Yes, to a great extent",,,,,,,,Integrating AI agents with my existing tools a...,,I am concerned about the accuracy of the infor...,It takes significant time and effort to learn ...,,,,,,,,,,When I don’t trust AI’s answers,"critical thinking, the skill to define the tas...",60000.0,7.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49186,49187,I code primarily as a hobby,18-24 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Employed,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,
49187,49188,I am a developer by profession,45-54 years old,Some college/university study without earning ...,Employed,"Caring for dependents (children, elderly, etc.)",25.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools req...",,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,
49188,49189,I am a developer by profession,35-44 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed,None of the above,17.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",AI CodeGen tools or AI-enabled apps;Videos (no...,31.0,"Architect, software or solutions","10,000 or more employees",People manager,"Hybrid (some in-person, leans heavy to flexibi...","Yes, I endorsed a tool that was open-source an...",Personal Project,9.0,2.0,1.0,5.0,3.0,7.0,8.0,...,Not at all or minimally,,,,,,,AI agents have increased my productivity.,,,,,,,,,,,,,,,,,9.0
49189,49190,I am a developer by profession,25-34 years old,"Professional degree (JD, MD, Ph.D, Ed.D, etc.)","Independent contractor, freelancer, or self-em...",None of the above,2.0,"Yes, I am not new to coding but am learning ne...",Videos (not associated with specific online co...,"Yes, I learned how to use AI-enabled tools for...",,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,


## Preview: Schema mapping
The schema maps each survey column name (`qname`) to its question ID (`qid`, for example `QID18`) and to the full question text. This helps when interpreting column names from the main survey file.

In [49]:
# Show the schema mapping to understand question IDs and labels
stackOverflow_survey_schema

Unnamed: 0,qid,qname,question,type,sub,sq_id
0,QID18,TechEndorse_1,What attracts you to a technology or causes yo...,RO,AI integration or AI Agent capabilities,1.0
1,QID18,TechEndorse_2,What attracts you to a technology or causes yo...,RO,Easy-to-use API,2.0
2,QID18,TechEndorse_3,What attracts you to a technology or causes yo...,RO,Robust and complete API,3.0
3,QID18,TechEndorse_4,What attracts you to a technology or causes yo...,RO,Customizable and manageable codebase,4.0
4,QID18,TechEndorse_5,What attracts you to a technology or causes yo...,RO,Reputation for quality,5.0
...,...,...,...,...,...,...
134,QID103,AIAgentObsWrite,Was the tool or tools for AI agent observabili...,TE,,
135,QID92,AIAgentExternal,You indicated you use or develop AI agents as ...,MC,,
136,QID104,AIAgentExtWrite,"Was the out-of-the-box agents, copilots or ass...",TE,,
137,QID100,AIHuman,"In the future, if AI can do most coding tasks,...",MC,,
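
The lookup that the schema enables can be sketched as follows. This is a minimal illustration on a hypothetical two-row stand-in for the schema: the `qid` and `qname` pairs come from the output above, but the question strings are placeholders and the name `question_by_column` is an assumption.

```python
import pandas as pd

# Hypothetical two-row stand-in for the schema file (question text is a placeholder)
schema = pd.DataFrame({
    "qid": ["QID18", "QID100"],
    "qname": ["TechEndorse_1", "AIHuman"],
    "question": ["Example question text A", "Example question text B"],
})

# Index the question text by column name so a survey column can be looked up directly
question_by_column = schema.set_index("qname")["question"]

print(question_by_column["AIHuman"])
```

With the real schema loaded, the same pattern should work on `stackOverflow_survey_schema`, provided the `qname` values are unique.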


In [50]:
# Print a concise summary of the DataFrame: row count, dtype counts, memory usage.
# With more than 100 columns, pandas prints only this short summary; pass
# verbose=True and show_counts=True to list per-column non-null counts.
# info() prints directly and returns None, so wrapping it in print() is unnecessary.
stackOverflow_survey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49191 entries, 0 to 49190
Columns: 172 entries, ResponseId to JobSat
dtypes: float64(52), int64(1), object(119)
memory usage: 64.6+ MB
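
Because the short summary above does not list per-column null counts, a direct way to spot sparse columns is to compute the share of missing values per column. The sketch below uses a hypothetical four-row frame with illustrative values; only the column names are borrowed from the survey.

```python
import pandas as pd

# Hypothetical miniature frame; the values are illustrative, not survey data
df = pd.DataFrame({
    "Age": ["25-34 years old", "35-44 years old", "18-24 years old", "25-34 years old"],
    "WorkExp": [8.0, None, None, 4.0],
    "JobSat": [10.0, 9.0, None, 6.0],
})

# Fraction of missing values per column, largest first
missing_share = df.isna().mean().sort_values(ascending=False)

print(missing_share)
```

Applied to the survey DataFrame, the same expression would rank all 172 columns by how sparsely they were answered.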


In [51]:
# Print schema DataFrame info to confirm the mapping structure.
# info() prints directly and returns None, so wrapping it in print() is unnecessary.
stackOverflow_survey_schema.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 139 entries, 0 to 138
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   qid       139 non-null    object 
 1   qname     139 non-null    object 
 2   question  139 non-null    object 
 3   type      139 non-null    object 
 4   sub       49 non-null     object 
 5   sq_id     49 non-null     float64
dtypes: float64(1), object(5)
memory usage: 6.6+ KB


In [52]:
# Report the number of rows and columns in the survey and schema DataFrames.
# `DataFrame.shape` is a (rows, columns) tuple.

print(f"Survey has {stackOverflow_survey.shape[0]} rows and {stackOverflow_survey.shape[1]} columns.")
print(f"Schema has {stackOverflow_survey_schema.shape[0]} rows and {stackOverflow_survey_schema.shape[1]} columns.")

Survey has 49191 rows and 172 columns.
Schema has 139 rows and 6 columns.


## Understanding Columns and Rows

In [None]:
# List all column names to inspect available fields (Index object)
# You can convert to a plain list with `list(stackOverflow_survey.columns)` if needed
stackOverflow_survey.columns

Index(['ResponseId', 'MainBranch', 'Age', 'EdLevel', 'Employment',
       'EmploymentAddl', 'WorkExp', 'LearnCodeChoose', 'LearnCode',
       'LearnCodeAI',
       ...
       'AIAgentOrchestration', 'AIAgentOrchWrite', 'AIAgentObserveSecure',
       'AIAgentObsWrite', 'AIAgentExternal', 'AIAgentExtWrite', 'AIHuman',
       'AIOpen', 'ConvertedCompYearly', 'JobSat'],
      dtype='object', length=172)
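
With 172 columns, it can help to pull out a family of related columns by name prefix instead of scanning the full list. A minimal sketch, assuming a hypothetical empty frame that reuses a few of the survey's column names:

```python
import pandas as pd

# Hypothetical empty frame that borrows a few column names from the survey
df = pd.DataFrame(columns=["ResponseId", "TechEndorse_1", "TechEndorse_2", "AIHuman"])

# Iterating over the Index yields plain strings, so ordinary string methods apply
tech_cols = [name for name in df.columns if name.startswith("TechEndorse_")]

print(tech_cols)
```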

In [61]:
# Select multiple columns with a list of names: the result is a DataFrame
print(stackOverflow_survey[['Age', 'EdLevel', 'Employment']])
# Confirm the object type is DataFrame
print("Type of the data is:", type(stackOverflow_survey[['Age', 'EdLevel', 'Employment']]))

                   Age                                            EdLevel  \
0      25-34 years old    Master’s degree (M.A., M.S., M.Eng., MBA, etc.)   
1      25-34 years old                Associate degree (A.A., A.S., etc.)   
2      35-44 years old       Bachelor’s degree (B.A., B.S., B.Eng., etc.)   
3      35-44 years old       Bachelor’s degree (B.A., B.S., B.Eng., etc.)   
4      35-44 years old    Master’s degree (M.A., M.S., M.Eng., MBA, etc.)   
...                ...                                                ...   
49186  18-24 years old       Bachelor’s degree (B.A., B.S., B.Eng., etc.)   
49187  45-54 years old  Some college/university study without earning ...   
49188  35-44 years old    Master’s degree (M.A., M.S., M.Eng., MBA, etc.)   
49189  25-34 years old     Professional degree (JD, MD, Ph.D, Ed.D, etc.)   
49190  18-24 years old  Some college/university study without earning ...   

                                              Employment  
0               

In [None]:
# Selecting a single column returns a Series (one-dimensional)
print(stackOverflow_survey['Age'])
print("Type of the data is:", type(stackOverflow_survey['Age']))

0        25-34 years old
1        25-34 years old
2        35-44 years old
3        35-44 years old
4        35-44 years old
              ...       
49186    18-24 years old
49187    45-54 years old
49188    35-44 years old
49189    25-34 years old
49190    18-24 years old
Name: Age, Length: 49191, dtype: object
Type of the data is: <class 'pandas.core.series.Series'>
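
The difference between the two selection styles comes down to the brackets: a single label returns a Series, while a list of labels returns a DataFrame even when the list holds one name. A small sketch on a hypothetical two-row frame (the values are illustrative):

```python
import pandas as pd

# Hypothetical two-row frame; the values are illustrative
df = pd.DataFrame({"Age": ["25-34 years old", "35-44 years old"], "JobSat": [10.0, 8.0]})

as_series = df["Age"]    # single label: one-dimensional Series
as_frame = df[["Age"]]   # list with one label: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```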


In [None]:
# Row selection using .iloc (integer-location based): returns the first row as a Series
stackOverflow_survey.iloc[0]

ResponseId                                                           1
MainBranch                              I am a developer by profession
Age                                                    25-34 years old
EdLevel                Master’s degree (M.A., M.S., M.Eng., MBA, etc.)
Employment                                                    Employed
                                            ...                       
AIAgentExtWrite                                                    NaN
AIHuman                                When I don’t trust AI’s answers
AIOpen                           Troubleshooting, profiling, debugging
ConvertedCompYearly                                            61256.0
JobSat                                                            10.0
Name: 0, Length: 172, dtype: object

In [None]:
# Select several rows by integer position by passing a list to .iloc
print(stackOverflow_survey.iloc[[0, 1, 2, 3, 4]])

   ResponseId                      MainBranch              Age  \
0           1  I am a developer by profession  25-34 years old   
1           2  I am a developer by profession  25-34 years old   
2           3  I am a developer by profession  35-44 years old   
3           4  I am a developer by profession  35-44 years old   
4           5  I am a developer by profession  35-44 years old   

                                           EdLevel  \
0  Master’s degree (M.A., M.S., M.Eng., MBA, etc.)   
1              Associate degree (A.A., A.S., etc.)   
2     Bachelor’s degree (B.A., B.S., B.Eng., etc.)   
3     Bachelor’s degree (B.A., B.S., B.Eng., etc.)   
4  Master’s degree (M.A., M.S., M.Eng., MBA, etc.)   

                                          Employment  \
0                                           Employed   
1                                           Employed   
2  Independent contractor, freelancer, or self-em...   
3                                           Employed  

In [None]:
# Label-based selection with .loc: rows specified by index labels and columns by name
stackOverflow_survey.loc[[0,1,4,8,9,10], ['Age', 'EdLevel', 'Employment']]

Unnamed: 0,Age,EdLevel,Employment
0,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed
1,25-34 years old,"Associate degree (A.A., A.S., etc.)",Employed
4,35-44 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em..."
8,25-34 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Employed
9,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed
10,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed


In [None]:
# Slice rows and columns by labels with .loc (endpoints are inclusive)
stackOverflow_survey.loc[0:6, 'Age':'WorkExp']

Unnamed: 0,Age,EdLevel,Employment,EmploymentAddl,WorkExp
0,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed,"Caring for dependents (children, elderly, etc.)",8.0
1,25-34 years old,"Associate degree (A.A., A.S., etc.)",Employed,,2.0
2,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Independent contractor, freelancer, or self-em...",None of the above,10.0
3,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Employed,None of the above,4.0
4,35-44 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...","Caring for dependents (children, elderly, etc.)",21.0
5,45-54 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...","Caring for dependents (children, elderly, etc....",15.0
6,25-34 years old,Some college/university study without earning ...,"Independent contractor, freelancer, or self-em...",None of the above,9.0


In [None]:
# Slice rows and columns by integer positions with .iloc (end index is exclusive)
stackOverflow_survey.iloc[0:8, 2:7]

Unnamed: 0,Age,EdLevel,Employment,EmploymentAddl,WorkExp
0,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed,"Caring for dependents (children, elderly, etc.)",8.0
1,25-34 years old,"Associate degree (A.A., A.S., etc.)",Employed,,2.0
2,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Independent contractor, freelancer, or self-em...",None of the above,10.0
3,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Employed,None of the above,4.0
4,35-44 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...","Caring for dependents (children, elderly, etc.)",21.0
5,45-54 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...","Caring for dependents (children, elderly, etc....",15.0
6,25-34 years old,Some college/university study without earning ...,"Independent contractor, freelancer, or self-em...",None of the above,9.0
7,35-44 years old,"Professional degree (JD, MD, Ph.D, Ed.D, etc.)",Employed,Engaged in paid work (20-29 hours per week);Tr...,22.0
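
The inclusive and exclusive endpoint rules are easy to mix up, so here is a minimal side-by-side sketch on a hypothetical four-row frame with the default integer index. The same slice bounds return a different number of rows depending on the accessor.

```python
import pandas as pd

# Hypothetical four-row frame with the default 0..3 index; the values are illustrative
df = pd.DataFrame({"WorkExp": [8.0, 2.0, 10.0, 4.0]})

by_label = df.loc[0:2]      # labels 0, 1, 2: the end label is included
by_position = df.iloc[0:2]  # positions 0, 1: the end position is excluded

print(len(by_label), len(by_position))
```

This matches the cells above, where `.loc[0:6]` returned seven rows and `.iloc[0:8]` returned eight.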


In [67]:
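# Percentage of all respondents in each age group, rounded to one decimal place.
# len() replaces the hard-coded row count so the cell still works if the data changes.
round(stackOverflow_survey['Age'].value_counts() * 100 / len(stackOverflow_survey), 1)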

Age
25-34 years old      33.6
35-44 years old      26.9
18-24 years old      18.7
45-54 years old      12.8
55-64 years old       5.3
65 years or older     1.9
Prefer not to say     0.8
Name: count, dtype: float64
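
An alternative to dividing by the row count is `value_counts(normalize=True)`, which returns proportions directly. One caveat worth seeing in a sketch: by default it divides by the number of non-missing answers, not by the number of rows, so the two approaches differ when a column contains missing values. The four-element Series below is hypothetical.

```python
import pandas as pd

# Hypothetical answers with one missing value
ages = pd.Series(
    ["25-34 years old", "25-34 years old", "35-44 years old", None], name="Age"
)

# Default: the denominator is the count of non-missing answers (3 here)
share_of_answers = (ages.value_counts(normalize=True) * 100).round(1)

# dropna=False keeps missing values as their own category, so the denominator is all rows (4)
share_of_rows = (ages.value_counts(normalize=True, dropna=False) * 100).round(1)

print(share_of_answers)
print(share_of_rows)
```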