# EDA - Stack Overflow Developers Survey 2024


Brief Description of the Dataset:

The Stack Overflow Developer Survey 2024 collected responses from over 65,000 developers across 185 countries between May 19 and June 20, 2024. The survey covered demographics, education, technology preferences, work environment, and perspectives on artificial intelligence. This notebook pursues four objectives:

- *Analyze developer demographics and distribution.*
- *Identify trends in programming language and technology usage.*
- *Examine employment patterns and work preferences.*
- *Explore attitudes toward artificial intelligence.*

#### 1. Importing Libraries


For data handling and analysis

In [6]:
import pandas as pd  # Data manipulation
import numpy as np  # Numerical operations


For data visualization

In [7]:
import matplotlib.pyplot as plt  # Basic plotting
import seaborn as sns  # Statistical visualizations
import plotly.express as px  # Interactive visualizations
import plotly.graph_objects as go  # More control over Plotly visuals


#### 2. Data Loading and Initial Inspection

In [15]:
file_path = r"D:\Work\code\EDA\survey_results_public.csv"
survey_data = pd.read_csv(file_path)
schema_file = pd.read_csv("survey_results_schema.csv")

In [17]:
survey_data


Unnamed: 0,ResponseId,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
0,1,I am a developer by profession,Under 18 years old,"Employed, full-time",Remote,Apples,Hobby,Primary/elementary school,Books / Physical media,,...,,,,,,,,,,
1,2,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,
2,3,I am a developer by profession,45-54 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,...,,,,,,,Appropriate in length,Easy,,
3,4,I am learning to code,18-24 years old,"Student, full-time",,Apples,,Some college/university study without earning ...,"Other online resources (e.g., videos, blogs, f...",Stack Overflow;How-to videos;Interactive tutorial,...,,,,,,,Too long,Easy,,
4,5,I am a developer by profession,18-24 years old,"Student, full-time",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,...,,,,,,,Too short,Easy,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65432,65433,I am a developer by profession,18-24 years old,"Employed, full-time",Remote,Apples,Hobby;School or academic work,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","On the job training;School (i.e., University, ...",,...,,,,,,,,,,
65433,65434,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects,,,,...,,,,,,,,,,
65434,65435,I am a developer by profession,25-34 years old,"Employed, full-time",In-person,Apples,Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Other online resources (e.g., videos, blogs, f...",Technical documentation;Stack Overflow;Social ...,...,,,,,,,,,,
65435,65436,I am a developer by profession,18-24 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Apples,Hobby;Contribute to open-source projects;Profe...,"Secondary school (e.g. American high school, G...",On the job training;Other online resources (e....,Technical documentation;Blogs;Written Tutorial...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,


In [36]:
print("\n".join(survey_data.columns))


ResponseId
MainBranch
Age
Employment
RemoteWork
Check
CodingActivities
EdLevel
LearnCode
LearnCodeOnline
TechDoc
YearsCode
YearsCodePro
DevType
OrgSize
PurchaseInfluence
BuyNewTool
BuildvsBuy
TechEndorse
Country
Currency
CompTotal
LanguageHaveWorkedWith
LanguageWantToWorkWith
LanguageAdmired
DatabaseHaveWorkedWith
DatabaseWantToWorkWith
DatabaseAdmired
PlatformHaveWorkedWith
PlatformWantToWorkWith
PlatformAdmired
WebframeHaveWorkedWith
WebframeWantToWorkWith
WebframeAdmired
EmbeddedHaveWorkedWith
EmbeddedWantToWorkWith
EmbeddedAdmired
MiscTechHaveWorkedWith
MiscTechWantToWorkWith
MiscTechAdmired
ToolsTechHaveWorkedWith
ToolsTechWantToWorkWith
ToolsTechAdmired
NEWCollabToolsHaveWorkedWith
NEWCollabToolsWantToWorkWith
NEWCollabToolsAdmired
OpSysPersonal use
OpSysProfessional use
OfficeStackAsyncHaveWorkedWith
OfficeStackAsyncWantToWorkWith
OfficeStackAsyncAdmired
OfficeStackSyncHaveWorkedWith
OfficeStackSyncWantToWorkWith
OfficeStackSyncAdmired
AISearchDevHaveWorkedWith
AISearchDevWantTo

In [21]:
# schema_file is already a DataFrame (loaded above), so it can be displayed directly
schema_file

| | qid | qname | question | force_resp | type | selector |
|---|---|---|---|---|---|---|
| 0 | QID2 | MainBranch | Which of the following options best describes ... | True | MC | SAVR |
| 1 | QID127 | Age | What is your age?* | True | MC | SAVR |
| 2 | QID296 | Employment | Which of the following best describes your cur... | True | MC | MAVR |
| 3 | QID308 | RemoteWork | Which best describes your current work situation? | False | MC | SAVR |
| 4 | QID341 | Check | Just checking to make sure you are paying atte... | True | MC | SAVR |
| ... | ... | ... | ... | ... | ... | ... |
| 82 | QID337 | JobSatPoints_7 | Learning and using new technology, including p... | | MC | MAVR |
| 83 | QID337 | JobSatPoints_8 | Designing and building environments, databases... | | MC | MAVR |
| 84 | QID337 | JobSatPoints_9 | Being a power user of a tool, developer langua... | | MC | MAVR |
| 85 | QID337 | JobSatPoints_10 | Working with new and/or top-quality hardware | | MC | MAVR |


In [30]:
schema_subset = schema_file[['qname', 'question']]
schema_subset.head()


| | qname | question |
|---|---|---|
| 0 | MainBranch | Which of the following options best describes ... |
| 1 | Age | What is your age?* |
| 2 | Employment | Which of the following best describes your cur... |
| 3 | RemoteWork | Which best describes your current work situation? |
| 4 | Check | Just checking to make sure you are paying atte... |
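
The `qname`/`question` pairs make it easy to check what a column actually measures before interpreting it. The sketch below is one possible helper; `describe_column` and `schema_demo` are hypothetical names, and the toy frame only mimics the two schema columns shown above (question text is copied as displayed, including truncation).

```python
import pandas as pd

# Toy stand-in for the schema; on the real data, pass schema_file instead.
schema_demo = pd.DataFrame({
    "qname": ["MainBranch", "Age", "RemoteWork"],
    "question": [
        "Which of the following options best describes ...",
        "What is your age?*",
        "Which best describes your current work situation?",
    ],
})

def describe_column(schema: pd.DataFrame, column: str) -> str:
    """Return the survey question behind a column, or a placeholder if absent."""
    lookup = schema.drop_duplicates("qname").set_index("qname")["question"]
    return lookup.get(column, "(no schema entry)")

print(describe_column(schema_demo, "RemoteWork"))
print(describe_column(schema_demo, "ConvertedCompYearly"))
```

Looking up `RemoteWork` this way shows that the column records a current work situation, which matters when the results are interpreted later.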


#### 3. Data Cleaning and Preprocessing

- Missing Values: Identify and handle missing data.
- Data Types: Ensure each column is in the correct format.
- Outlier Detection: Identify outliers that may affect the analysis.
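
The outlier step in this list is not implemented in the cells that follow. As a minimal sketch of one common approach, the 1.5 × IQR rule, the helper below flags values far outside the interquartile range. The function name `iqr_outlier_mask` and the salary figures are made up for illustration; on the real data the input would be a numeric column such as `ConvertedCompYearly` before any imputation.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Toy yearly compensation values (hypothetical), with one extreme entry and one gap.
comp_demo = pd.Series([40_000, 55_000, 65_000, 70_000, 90_000, 16_000_000, None])
mask = iqr_outlier_mask(comp_demo)
print(comp_demo[mask])
```

Flagging rather than deleting keeps the decision about extreme values explicit and reversible.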

In [106]:
# Objective 1: Developer Demographics and Global Distribution
demographics_cols = ['ResponseId', 'MainBranch', 'Age', 'EdLevel']

# Objective 2: Trends in Programming Language and Technology Usage
tech_usage_cols = [
    'CodingActivities', 'LearnCode', 'LearnCodeOnline',
    'LanguageHaveWorkedWith', 'LanguageWantToWorkWith',
    'DatabaseHaveWorkedWith', 'DatabaseWantToWorkWith'
]

# Objective 3: Employment Patterns and Work Preferences
employment_cols = ['Employment', 'RemoteWork', 'WorkExp', 'ConvertedCompYearly', 'JobSat']

# Objective 4: Attitudes Toward Artificial Intelligence
ai_cols = ['AISelect', 'AISent', 'AIBen', 'AIAcc', 'AIComplex', 'AIChallenges', 'AIThreat', 'AIEthics']

# Combine all necessary columns (using set to avoid duplicates)
required_columns = list(set(demographics_cols + tech_usage_cols + employment_cols + ai_cols))
print("Selected columns for analysis:", required_columns)

# Create a new DataFrame with only these columns
survey_data_subset = survey_data[required_columns]

# Display the first few rows of the subset
survey_data_subset.head()


Selected columns for analysis: ['DatabaseWantToWorkWith', 'LanguageWantToWorkWith', 'ResponseId', 'AISelect', 'EdLevel', 'LearnCodeOnline', 'Age', 'AIEthics', 'LearnCode', 'AIThreat', 'MainBranch', 'AISent', 'LanguageHaveWorkedWith', 'AIChallenges', 'AIComplex', 'WorkExp', 'JobSat', 'DatabaseHaveWorkedWith', 'AIAcc', 'CodingActivities', 'AIBen', 'Employment', 'ConvertedCompYearly', 'RemoteWork']


Unnamed: 0,DatabaseWantToWorkWith,LanguageWantToWorkWith,ResponseId,AISelect,EdLevel,LearnCodeOnline,Age,AIEthics,LearnCode,AIThreat,...,AIComplex,WorkExp,JobSat,DatabaseHaveWorkedWith,AIAcc,CodingActivities,AIBen,Employment,ConvertedCompYearly,RemoteWork
0,,,1,Yes,Primary/elementary school,,Under 18 years old,,Books / Physical media,,...,,,,,,Hobby,Increase productivity,"Employed, full-time",,Remote
1,PostgreSQL,Bash/Shell (all shells);Go;HTML/CSS;Java;JavaS...,2,"No, and I don't plan to","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Technical documentation;Blogs;Books;Written Tu...,35-44 years old,,Books / Physical media;Colleague;On the job tr...,,...,,17.0,,Dynamodb;MongoDB;PostgreSQL,,Hobby;Contribute to open-source projects;Other...,,"Employed, full-time",,Remote
2,Firebase Realtime Database,C#,3,"No, and I don't plan to","Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Technical documentation;Blogs;Books;Written Tu...,45-54 years old,,Books / Physical media;Colleague;On the job tr...,,...,,,,Firebase Realtime Database,,Hobby;Contribute to open-source projects;Other...,,"Employed, full-time",,Remote
3,MongoDB;MySQL;PostgreSQL,HTML/CSS;Java;JavaScript;PowerShell;Python;SQL...,4,Yes,Some college/university study without earning ...,Stack Overflow;How-to videos;Interactive tutorial,18-24 years old,Circulating misinformation or disinformation;M...,"Other online resources (e.g., videos, blogs, f...",No,...,Bad at handling complex tasks,,,MongoDB;MySQL;PostgreSQL;SQLite,Somewhat trust,,Increase productivity;Greater efficiency;Impro...,"Student, full-time",,
4,PostgreSQL;SQLite,C++;HTML/CSS;JavaScript;Lua;Python,5,"No, and I don't plan to","Secondary school (e.g. American high school, G...",Technical documentation;Blogs;Written Tutorial...,18-24 years old,,"Other online resources (e.g., videos, blogs, f...",,...,,,,PostgreSQL;SQLite,,,,"Student, full-time",,


In [107]:
# Check missing values in the subset
missing_counts = survey_data_subset.isnull().sum()
print("Missing values per column:\n", missing_counts)

# Calculate percentage of missing values for each column
missing_percent = (survey_data_subset.isnull().mean() * 100).round(2)
print("Percentage of missing values:\n", missing_percent)


Missing values per column:
 DatabaseWantToWorkWith    22879
LanguageWantToWorkWith     9685
ResponseId                    0
AISelect                   4530
EdLevel                    4653
LearnCodeOnline           16200
Age                           0
AIEthics                  23889
LearnCode                  4949
AIThreat                  20748
MainBranch                    0
AISent                    19564
LanguageHaveWorkedWith     5692
AIChallenges              27906
AIComplex                 28416
WorkExp                   35779
JobSat                    36311
DatabaseHaveWorkedWith    15183
AIAcc                     28135
CodingActivities          10971
AIBen                     28543
Employment                    0
ConvertedCompYearly       42002
RemoteWork                10631
dtype: int64
Percentage of missing values:
 DatabaseWantToWorkWith    34.96
LanguageWantToWorkWith    14.80
ResponseId                 0.00
AISelect                   6.92
EdLevel                    7.11
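
Counts and percentages are easier to compare side by side. The following sketch combines them into one table sorted by the number of missing values; `missing_summary` is a hypothetical helper and the small frame is toy data, not survey output.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    summary = pd.DataFrame({
        "missing": df.isnull().sum(),
        "percent": (df.isnull().mean() * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)

# Toy frame (made up); on the real data, pass survey_data_subset.
demo = pd.DataFrame({
    "Age": ["18-24 years old", "25-34 years old", "35-44 years old", "25-34 years old"],
    "WorkExp": [np.nan, 5.0, np.nan, np.nan],
    "JobSat": [np.nan, 8.0, 7.0, 9.0],
})
summary = missing_summary(demo)
print(summary)
```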


In [108]:
# 1. Drop the AI columns with more than 40% missing values, plus LearnCodeOnline (not used below)
cols_to_drop = ['AIComplex', 'AIBen', 'AIChallenges', 'LearnCodeOnline']
survey_data_subset = survey_data_subset.drop(columns=cols_to_drop, errors='ignore')

# 2. Fill missing values for AI-related categorical columns
survey_data_subset['AIEthics'] = survey_data_subset['AIEthics'].fillna("No opinion")
survey_data_subset['AIThreat'] = survey_data_subset['AIThreat'].fillna("Not specified")
survey_data_subset['AISent'] = survey_data_subset['AISent'].fillna("Neutral")

# 3. Fill missing values for workplace & education categorical columns
categorical_cols = [
    'EdLevel', 'Employment', 'RemoteWork', 
    'LanguageHaveWorkedWith', 'LanguageWantToWorkWith', 
    'DatabaseHaveWorkedWith', 'DatabaseWantToWorkWith'
]

for col in categorical_cols:
    if col in survey_data_subset.columns:
        survey_data_subset[col] = survey_data_subset[col].astype('category')
        
        # Check if "Unknown" already exists before adding
        if "Unknown" not in survey_data_subset[col].cat.categories:
            survey_data_subset[col] = survey_data_subset[col].cat.add_categories(["Unknown"])
        
        survey_data_subset[col] = survey_data_subset[col].fillna("Unknown")

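# 4. Fill missing values for numerical columns with median
# Caution: WorkExp (~55% missing) and ConvertedCompYearly (~64% missing) end up concentrated at the median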
numerical_cols = ['WorkExp', 'ConvertedCompYearly']
for col in numerical_cols:
    if col in survey_data_subset.columns:
        survey_data_subset[col] = survey_data_subset[col].fillna(survey_data_subset[col].median())

# 5. Convert Age column from category labels to numeric values
age_mapping = {
    "Under 18 years old": 17,
    "18-24 years old": 21,
    "25-34 years old": 29,
    "35-44 years old": 39,
    "45-54 years old": 49,
    "55-64 years old": 59,
    "65 years or older": 70,
    "Prefer not to say": None  # Or use np.nan
}

# Remap Age
survey_data_subset['Age'] = survey_data_subset['Age'].map(age_mapping)

# Ensure it's numeric
survey_data_subset['Age'] = pd.to_numeric(survey_data_subset['Age'], errors='coerce')

# 6. Remove Duplicates
survey_data_subset = survey_data_subset.drop_duplicates(subset=['ResponseId'], keep='first')

# 7. Check missing values after processing
missing_percent_after = (survey_data_subset.isnull().mean() * 100).round(2)
print("Percentage of missing values after cleaning:\n", missing_percent_after)


Percentage of missing values after cleaning:
 DatabaseWantToWorkWith     0.00
LanguageWantToWorkWith     0.00
ResponseId                 0.00
AISelect                   6.92
EdLevel                    0.00
Age                        0.49
AIEthics                   0.00
LearnCode                  7.56
AIThreat                   0.00
MainBranch                 0.00
AISent                     0.00
LanguageHaveWorkedWith     0.00
WorkExp                    0.00
JobSat                    55.49
DatabaseHaveWorkedWith     0.00
AIAcc                     43.00
CodingActivities          16.77
Employment                 0.00
ConvertedCompYearly        0.00
RemoteWork                 0.00
dtype: float64
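
Median imputation removes the missing values, but when more than half of a column is missing it also reshapes the distribution. The toy example below (made-up numbers, not survey data) shows how the quartiles collapse onto the median, and sketches one alternative under the assumption that downstream plots can tolerate `NaN`: keep the gaps and record a missing-value flag.

```python
import numpy as np
import pandas as pd

# Four observed values and six missing ones, i.e. 60% missing.
work_exp = pd.Series([1, 3, 9, 20] + [np.nan] * 6, name="WorkExp")

# Median imputation: every missing entry becomes the median of the observed values.
imputed = work_exp.fillna(work_exp.median())

# Alternative: leave NaN in place and remember which rows had no answer.
was_missing = work_exp.isna()

print("quartiles after imputation:", imputed.quantile([0.25, 0.5, 0.75]).tolist())
print("quartiles of observed data:", work_exp.quantile([0.25, 0.5, 0.75]).tolist())
```

After imputation all three quartiles equal the median, the same pattern that appears for `WorkExp` and `ConvertedCompYearly` in the summary statistics of section 4.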


In [110]:
# Drop 'AIAcc' and 'JobSat' due to high missing values
survey_data_subset = survey_data_subset.drop(columns=['AIAcc', 'JobSat'], errors='ignore')

# Fill missing values for remaining categorical columns
survey_data_subset['CodingActivities'] = survey_data_subset['CodingActivities'].fillna("Unknown")
survey_data_subset['LearnCode'] = survey_data_subset['LearnCode'].fillna("Unknown")

# 'LanguageWantToWorkWith' and 'DatabaseWantToWorkWith' need no further handling:
# their missing values were already filled with "Unknown" in the previous cell.

# Final missing values check
missing_percent_after = (survey_data_subset.isnull().mean() * 100).round(2)
print("Percentage of missing values after final cleaning:\n", missing_percent_after)


Percentage of missing values after final cleaning:
 DatabaseWantToWorkWith    0.00
LanguageWantToWorkWith    0.00
ResponseId                0.00
AISelect                  6.92
EdLevel                   0.00
Age                       0.49
AIEthics                  0.00
LearnCode                 0.00
AIThreat                  0.00
MainBranch                0.00
AISent                    0.00
LanguageHaveWorkedWith    0.00
WorkExp                   0.00
DatabaseHaveWorkedWith    0.00
CodingActivities          0.00
Employment                0.00
ConvertedCompYearly       0.00
RemoteWork                0.00
dtype: float64


In [112]:
survey_data_subset.columns

Index(['DatabaseWantToWorkWith', 'LanguageWantToWorkWith', 'ResponseId',
       'AISelect', 'EdLevel', 'Age', 'AIEthics', 'LearnCode', 'AIThreat',
       'MainBranch', 'AISent', 'LanguageHaveWorkedWith', 'WorkExp',
       'DatabaseHaveWorkedWith', 'CodingActivities', 'Employment',
       'ConvertedCompYearly', 'RemoteWork'],
      dtype='object')

In [113]:
print(survey_data_subset.dtypes)


DatabaseWantToWorkWith    category
LanguageWantToWorkWith    category
ResponseId                   int64
AISelect                    object
EdLevel                   category
Age                        float64
AIEthics                    object
LearnCode                   object
AIThreat                    object
MainBranch                  object
AISent                      object
LanguageHaveWorkedWith    category
WorkExp                    float64
DatabaseHaveWorkedWith    category
CodingActivities            object
Employment                category
ConvertedCompYearly        float64
RemoteWork                category
dtype: object


#### 4. EDA (Exploratory Data Analysis)

In [114]:
# Summary statistics for numerical columns
print(survey_data_subset.describe())

# Summary statistics for categorical columns
print(survey_data_subset.describe(include="category"))


         ResponseId           Age       WorkExp  ConvertedCompYearly
count  65437.000000  65115.000000  65437.000000         6.543700e+04
mean   32719.000000     32.681210     10.118098         7.257636e+04
std    18890.179119     11.083389      6.293516         1.122207e+05
min        1.000000     17.000000      0.000000         1.000000e+00
25%    16360.000000     21.000000      9.000000         6.500000e+04
50%    32719.000000     29.000000      9.000000         6.500000e+04
75%    49078.000000     39.000000      9.000000         6.500000e+04
max    65437.000000     70.000000     50.000000         1.625660e+07
       DatabaseWantToWorkWith LanguageWantToWorkWith  \
count                   65437                  65437   
unique                   8479                  22770   
top                   Unknown                Unknown   
freq                    22879                   9685   

                                             EdLevel LanguageHaveWorkedWith  \
count              

### 4.1: Developer Demographics & Distribution

In [115]:
# 1. Age Distribution
fig = px.histogram(
    survey_data_subset, 
    x="Age", 
    nbins=20, 
    title="Age Distribution of Developers", 
    labels={"Age": "Age"},
    color_discrete_sequence=["#636EFA"]  # Custom color
)

fig.update_layout(
    title_font=dict(size=18, family="Arial", color="black"),
    xaxis_title="Age",
    yaxis_title="Count",
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False),
    plot_bgcolor="rgba(0,0,0,0)",
    paper_bgcolor="white"
)
fig.show()

# 2. Education Level Distribution
edlevel_counts = survey_data_subset["EdLevel"].value_counts().reset_index()
edlevel_counts.columns = ["EdLevel", "Count"]

fig = px.bar(
    edlevel_counts, 
    x="Count", 
    y="EdLevel", 
    orientation="h", 
    title="Education Level of Developers", 
    labels={"EdLevel": "Education Level", "Count": "Count"},
    color="Count",
    color_continuous_scale="viridis"
)

fig.update_layout(
    title_font=dict(size=18, family="Arial", color="black"),
    xaxis_title="Number of Developers",
    yaxis_title="",
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False),
    plot_bgcolor="rgba(0,0,0,0)",
    paper_bgcolor="white"
)
fig.show()

# 3. MainBranch Distribution
mainbranch_counts = survey_data_subset["MainBranch"].value_counts().reset_index()
mainbranch_counts.columns = ["MainBranch", "Count"]

fig = px.bar(
    mainbranch_counts, 
    x="Count", 
    y="MainBranch", 
    orientation="h", 
    title="Types of Developers (MainBranch)", 
    labels={"MainBranch": "Developer Type", "Count": "Count"},
    color="Count",
    color_continuous_scale="magma"
)

fig.update_layout(
    title_font=dict(size=18, family="Arial", color="black"),
    xaxis_title="Number of Developers",
    yaxis_title="",
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False),
    plot_bgcolor="rgba(0,0,0,0)",
    paper_bgcolor="white"
)
fig.show()


#### Insights 

##### 4.1.1 Age Distribution of Developers  
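- **General Trend:** The most common value is **29**, the midpoint assigned to the 25-34 bracket, with **21** (the 18-24 bracket) also heavily populated. Because `Age` was mapped from brackets to midpoints, only the values 17, 21, 29, 39, 49, 59 and 70 can occur.  
- **Outliers:** The value **70** stands for the "65 years or older" bracket, so it marks the oldest group rather than an exact age.  
- **Skewness:** The data is **skewed towards younger developers**, with fewer entries for older age groups.  
- **Key Observations:** The empty stretches between bars, for example between 29 and 39, come from the midpoint coding and do not indicate a gap in responses.  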
---
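
Since the survey records age as brackets, counting the original labels in their natural order avoids the gaps that a histogram of midpoints produces. This is a sketch under the assumption that the untouched labels are still available in `survey_data["Age"]`; `AGE_ORDER` and `age_bracket_counts` are hypothetical names, and the demo labels are made up.

```python
import pandas as pd

# Bracket labels in age order, matching the keys of age_mapping.
AGE_ORDER = [
    "Under 18 years old", "18-24 years old", "25-34 years old",
    "35-44 years old", "45-54 years old", "55-64 years old",
    "65 years or older", "Prefer not to say",
]

def age_bracket_counts(age_labels: pd.Series) -> pd.DataFrame:
    """Count respondents per bracket, keeping brackets in age order."""
    ordered = pd.Categorical(age_labels, categories=AGE_ORDER, ordered=True)
    counts = pd.Series(ordered).value_counts(sort=False)
    return counts.rename_axis("Age").reset_index(name="count")

# Toy labels; on the real data, pass survey_data["Age"].
demo = pd.Series(["25-34 years old", "18-24 years old", "25-34 years old", "65 years or older"])
result = age_bracket_counts(demo)
print(result)
```

The resulting frame can be handed to a bar chart with `Age` on one axis and `count` on the other, so each bracket gets exactly one bar.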

##### 4.1.2 Education Level of Developers  
- **Most Common Degree:** **Bachelor’s degree** is the most common, with **around 24k respondents** holding one.  
- **Higher Education vs. No Degree:** The majority have at least a **bachelor’s degree**, but alternative education paths exist.  
---

##### 4.1.3 Developer Type (MainBranch Distribution)  
- **Most Common Developer Type:** **Professional developers** dominate, with **50k+** respondents.  
- **Workforce Distribution:** The second-largest group is those who **write code as part of their work or studies**, at **6,511**—a massive disparity.  
- **Any Surprises:** The large gap between professional developers and other groups most likely reflects who responds to a Stack Overflow survey; on its own it does not measure industry demand for coding roles.  

### 4.2: Trends in Programming Language and Technology Usage


In [126]:
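# Function to count occurrences of multiple-choice answers
def count_occurrences(column):
    answers = survey_data_subset[column].dropna().astype(str)
    answers = answers[answers != "Unknown"]  # skip the placeholder used for missing answers
    all_values = answers.str.split(';').explode()
    # rename_axis/reset_index(name=...) yields the same column names on any pandas version
    return all_values.value_counts().rename_axis(column).reset_index(name='count')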

# 1. Most Used Programming Languages
lang_used_counts = count_occurrences('LanguageHaveWorkedWith')
fig1 = px.bar(lang_used_counts, x='count', y='LanguageHaveWorkedWith', orientation='h',
              title='Most Used Programming Languages', labels={'LanguageHaveWorkedWith': 'Programming Language'})
fig1.show()

# 2. Most Desired Programming Languages
lang_want_counts = count_occurrences('LanguageWantToWorkWith')
fig2 = px.bar(lang_want_counts, x='count', y='LanguageWantToWorkWith', orientation='h',
              title='Most Desired Programming Languages', labels={'LanguageWantToWorkWith': 'Programming Language'})
fig2.show()

# 3. Most Used Databases
db_used_counts = count_occurrences('DatabaseHaveWorkedWith')
fig3 = px.bar(db_used_counts, x='count', y='DatabaseHaveWorkedWith', orientation='h',
              title='Most Used Databases', labels={'DatabaseHaveWorkedWith': 'Database'})
fig3.show()

# 4. Most Desired Databases
db_want_counts = count_occurrences('DatabaseWantToWorkWith')
fig4 = px.bar(db_want_counts, x='count', y='DatabaseWantToWorkWith', orientation='h',
              title='Most Desired Databases', labels={'DatabaseWantToWorkWith': 'Database'})
fig4.show()
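
The counting step relies on splitting semicolon-separated answers and giving each selected option its own row. The worked example below uses made-up answers to show the intermediate result and adds a share column; the `"Unknown"` filter assumes missing answers were replaced by that placeholder during cleaning.

```python
import pandas as pd

answers = pd.Series(
    ["Python;SQL", "JavaScript;Python", "Unknown", "SQL"],
    name="LanguageHaveWorkedWith",
)

# One row per selected option: "Python;SQL" becomes two rows.
exploded = answers.str.split(";").explode()

# Exclude the placeholder so it is not ranked as if it were a language.
exploded = exploded[exploded != "Unknown"]

counts = exploded.value_counts().rename_axis(answers.name).reset_index(name="count")

# Share of answering respondents who selected each option.
n_answered = int((answers != "Unknown").sum())
counts["share"] = (counts["count"] / n_answered).round(2)
print(counts)
```

Shares can add up to more than 1 because each respondent may select several options.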

#### Insights 

##### 4.2.1 Most Used Programming Languages  
- **General Trend:** A small group of languages, led by **JavaScript, HTML/CSS, Python and SQL**, dominates usage, indicating widespread adoption across development domains.  
- **Outliers:** Some niche or specialized languages appear in the lower ranks but with minimal overall adoption.  
- **Skewness:** The distribution is highly skewed toward a few dominant languages, suggesting that most developers rely on these core languages for their daily work.  
- **Key Observations:** JavaScript’s continued dominance underscores its role in web development, while Python's popularity highlights its versatility in data science, automation, and general scripting.

---

##### 4.2.2 Most Desired Programming Languages  
- **General Trend:** There is growing interest in modern languages such as **TypeScript and Rust**, alongside sustained desire for **Python**.  
- **Outliers:** A few specialized languages might attract significant interest from a smaller subset of developers, standing out as potential emerging trends.  
- **Skewness:** The desire is concentrated around languages that promise better productivity, safety, and performance than older alternatives.
---

##### 4.2.3 Most Used Databases  
- **General Trend:** Traditional relational databases such as **MySQL and PostgreSQL** remain prevalent, with NoSQL databases like **MongoDB** also featuring prominently.  
- **Outliers:** Some less conventional or newer database systems appear but in very limited numbers.  
- **Skewness:** Usage is highly concentrated on a few well-established database systems, reflecting industry reliability and standardization.  
- **Key Observations:** The reliance on relational databases highlights their robustness in handling structured data, while the adoption of NoSQL solutions points to an increasing need for flexibility and scalability in data management.

---

##### 4.2.4 Most Desired Databases  
- **General Trend:** Developers tend to prefer databases that offer both performance and scalability, with **PostgreSQL** frequently topping the list.  
- **Outliers:** A few niche or emerging database technologies are desired by a small subset of respondents, indicating potential shifts in technology preference.  
- **Skewness:** The desire for databases is focused on modern, cloud-friendly technologies that can handle large-scale data efficiently.  
- **Key Observations:** The popularity of PostgreSQL suggests it is viewed as a balanced choice offering advanced features and reliable performance, while rising interest in NoSQL databases reflects evolving requirements in handling unstructured or semi-structured data.
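
Comparing "used" and "desired" is easier when both counts sit in one table. The sketch below aligns the two on the option name and computes a simple gap; `option_counts` is a hypothetical helper, the rows are toy data, and "wanted minus used" is only one of several reasonable ways to express the difference.

```python
import pandas as pd

def option_counts(answers: pd.Series) -> pd.Series:
    """Count each option in a semicolon-separated multi-select column."""
    return answers.dropna().str.split(";").explode().value_counts()

demo = pd.DataFrame({
    "DatabaseHaveWorkedWith": ["MySQL;PostgreSQL", "MySQL", "MongoDB", None],
    "DatabaseWantToWorkWith": ["PostgreSQL", "PostgreSQL;SQLite", "PostgreSQL", "MongoDB"],
})

# Outer-align the two count series; options missing on one side count as 0.
comparison = pd.concat(
    {
        "used": option_counts(demo["DatabaseHaveWorkedWith"]),
        "wanted": option_counts(demo["DatabaseWantToWorkWith"]),
    },
    axis=1,
).fillna(0).astype(int)
comparison["gap"] = comparison["wanted"] - comparison["used"]
print(comparison.sort_values("gap", ascending=False))
```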



### 4.3: Employment Patterns & Work Preferences

In [133]:
# 4.3.1 Remote Work Distribution
remote_counts = survey_data_subset['RemoteWork'].value_counts().reset_index()
remote_counts.columns = ['RemoteWork', 'count']

fig_remote = px.bar(remote_counts, 
                    x='count', 
                    y='RemoteWork', 
                    orientation='h',
                    title='Current Work Arrangement',
                    labels={'RemoteWork': 'Work Arrangement', 'count': 'Number of Respondents'},
                    color='count', 
                    color_continuous_scale='Greens')
fig_remote.update_layout(
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12,
    font=dict(size=10),
    yaxis=dict(categoryorder='total ascending')
)
fig_remote.show()

# 4.3.2 Work Experience Distribution
fig_workexp = px.histogram(survey_data_subset, 
                           x='WorkExp', 
                           nbins=20,
                           title='Work Experience Distribution',
                           labels={'WorkExp': 'Years of Work Experience', 'count': 'Number of Respondents'},
                           color_discrete_sequence=['#636EFA'])
fig_workexp.update_layout(
    xaxis_title="Years of Work Experience",
    yaxis_title="Number of Respondents",
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='white',
    title_font_size=14,
    xaxis=dict(title_font=dict(size=12)),
    yaxis=dict(title_font=dict(size=12)),
    font=dict(size=10)
)
fig_workexp.show()


#### Insights 

##### 4.3.1 Remote Work Distribution  
- **General Trend:** A significant proportion of developers report working **remotely or in a hybrid arrangement**, indicating broad acceptance of flexible work. `RemoteWork` records the respondent's current work situation, so the chart shows arrangements in practice rather than stated preferences.  
- **Outliers:** A smaller segment of respondents works exclusively **in person**.  
- **Skewness:** The distribution leans clearly towards **remote and hybrid work** over traditional on-site work. The "Unknown" bar covers the 10,631 respondents with no answer and should be read separately.  
- **Key Observations:** This pattern points to a shift in working arrangements, likely influenced by recent global events and the evolution of digital workplaces.
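
Percentages among respondents who answered are often more informative than raw counts that include the placeholder. A minimal sketch with made-up rows, assuming missing answers were filled with `"Unknown"`:

```python
import pandas as pd

remote = pd.Series(
    ["Remote", "Hybrid (some remote, some in-person)", "In-person", "Unknown",
     "Remote", "Hybrid (some remote, some in-person)", "Unknown", "Remote"],
    name="RemoteWork",
)

# astype(str) also handles category columns, whose unused categories would otherwise show up with a zero share.
answered = remote.astype(str)
answered = answered[answered != "Unknown"]

shares = (answered.value_counts(normalize=True) * 100).round(1)
print(shares)
```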

---

##### 4.3.2 Work Experience Distribution  
- **General Trend:** The histogram peaks sharply at **9 years**. Much of that peak is an artifact: `WorkExp` was missing for 35,779 of 65,437 respondents (about 55%), and those rows were filled with the median of 9, which is why the 25th, 50th and 75th percentiles in the summary statistics all equal 9.  
- **Outliers:** Some respondents report far more experience, up to 50 years, which likely indicates veteran developers.  
- **Skewness:** The distribution is **right-skewed**, with a long tail towards higher experience levels.  
- **Key Observations:** Conclusions about career stage should be drawn from the respondents who actually answered; the imputed rows add no information about experience.
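
Skewness can also be checked numerically rather than read off the histogram. The sketch below uses pandas' sample skewness on made-up experience values; a positive result indicates a right-skewed distribution. On the real data it would be sensible to compute this on the observed (non-imputed) values only.

```python
import pandas as pd

# Toy years of experience (hypothetical): many low values, a few high ones.
work_exp_demo = pd.Series([1, 2, 2, 3, 4, 5, 6, 8, 12, 30], name="WorkExp")

skewness = work_exp_demo.skew()
print(f"skewness = {skewness:.2f}")
print("mean > median:", work_exp_demo.mean() > work_exp_demo.median())
```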

---

### 4.4: AI & Machine Learning Usage Analysis

In [142]:
# 4.4.1 AI Usage Among Developers
ai_usage_counts = survey_data_subset['AISelect'].value_counts().reset_index()
ai_usage_counts.columns = ['AI Usage', 'count']

fig_ai_usage = px.bar(ai_usage_counts, 
                      x='count', 
                      y='AI Usage', 
                      orientation='h',
                      title='AI Usage Among Developers',
                      labels={'AI Usage': 'AI Tool Usage', 'count': 'Number of Respondents'},
                      color='count', 
                      color_continuous_scale='Blues')

fig_ai_usage.update_layout(yaxis=dict(categoryorder='total ascending'))
fig_ai_usage.show()


# 4.4.2 Sentiment Toward AI
ai_sent_counts = survey_data_subset['AISent'].value_counts().reset_index()
ai_sent_counts.columns = ['AISent', 'count']

fig_ai_sent = px.bar(ai_sent_counts, 
                     x='count', 
                     y='AISent', 
                     orientation='h',
                     title='Sentiment Toward AI',
                     labels={'AISent': 'Sentiment', 'count': 'Number of Respondents'},
                     color='count', 
                     color_continuous_scale='Purples')

fig_ai_sent.update_layout(yaxis=dict(categoryorder='total ascending'))
fig_ai_sent.show()


# 4.4.3 AI as a Threat
ai_threat_counts = survey_data_subset['AIThreat'].value_counts().reset_index()
ai_threat_counts.columns = ['AIThreat', 'count']

fig_ai_threat = px.bar(ai_threat_counts, 
                       x='count', 
                       y='AIThreat', 
                       orientation='h',
                       title='Do Developers See AI as a Threat?',
                       labels={'AIThreat': 'Threat Perception', 'count': 'Number of Respondents'},
                       color='count', 
                       color_continuous_scale='Reds')

fig_ai_threat.update_layout(yaxis=dict(categoryorder='total ascending'))
fig_ai_threat.show()


#### Insights

##### 4.4.1 AI Usage Among Developers  
- **General Trend:** A significant number of developers use AI tools in their workflow, indicating substantial reliance on AI-powered solutions.  
- **Outliers:** Some responses indicate **little to no use of AI**, suggesting that traditional development practices are still prevalent for certain roles.  
- **Skewness:** The responses are **weighted toward AI adoption**; a single survey year shows the current level of use, not how it has changed over time.  
- **Key Observations:** AI is becoming a common tool in software development. About 7% of respondents (4,530) left `AISelect` blank and are not shown in the chart.

---

##### 4.4.2 Sentiment Toward AI  
- **General Trend:** The majority of responses fall in the **neutral to favorable** range. The "Neutral" bar contains the 19,564 respondents (about 30%) whose missing `AISent` answer was filled with "Neutral" during cleaning, so it overstates genuinely neutral opinions.  
- **Outliers:** Some developers hold **strongly unfavorable** views on AI, possibly due to concerns over job security or ethical considerations.  
- **Skewness:** Among respondents who answered, sentiment skews towards **positive views on AI**.  
- **Key Observations:** AI is widely accepted, but concerns remain among a small portion of developers.

---

##### 4.4.3 AI as a Threat  
- **General Trend:** A large portion of developers **do not see AI as an immediate threat**, though some remain uncertain about its long-term impact.  
- **Outliers:** A subset of respondents **does see AI as a threat**, possibly due to automation concerns.  
- **Skewness:** The distribution leans towards **AI being seen as non-threatening**, but **uncertainty is notable**. The "Not specified" bar covers the 20,748 respondents with no answer and does not represent an expressed opinion.  
- **Key Observations:** While AI adoption is increasing, there is still debate over its implications, especially concerning job security and ethical issues.
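
The three AI charts look at each question separately. A cross-tabulation can show whether threat perception differs between users and non-users of AI tools. The rows below are made up for illustration, and the `"Not specified"` filter assumes that placeholder was used for missing answers during cleaning.

```python
import pandas as pd

demo = pd.DataFrame({
    "AISelect": ["Yes", "Yes", "Yes", "No, and I don't plan to", "No, and I don't plan to"],
    "AIThreat": ["No", "No", "Yes", "Yes", "Not specified"],
})

# Keep only rows with an actual answer to the threat question.
answered = demo[demo["AIThreat"] != "Not specified"]

# normalize="index" turns each row into shares that sum to 1.
table = pd.crosstab(answered["AISelect"], answered["AIThreat"], normalize="index")
print(table.round(2))
```

Row-wise normalization makes groups of very different sizes comparable.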

---


# **Summary**  

This analysis of the Stack Overflow Developer Survey 2024 provides key insights into developer preferences, work trends, and AI adoption.  

- **Programming & Databases:** Python and JavaScript remain dominant, with PostgreSQL and MySQL as top database choices.  
- **Employment & Remote Work:** Most respondents who answered report working in remote or hybrid setups.  
- **AI Trends:** Sentiment towards AI is mostly positive, though concerns about AI as a threat exist.  

The findings highlight shifting industry trends, the growing role of AI, and evolving work preferences among developers. 🚀  
