# Stack Overflow Developer Survey 2020

This is inspired by Corey Schafer's video [Python Data Science Tutorial: Analyzing the 2019 Stack Overflow Developer Survey](https://www.youtube.com/watch?v=_P7X8tMplsw) where he performed basic analysis of Stack Overflow survey results without using `pandas`.

In this notebook, we will answer the question, **Who are VBA Developers?**

According to [Stack Overflow Developer Survey 2020](https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-dreaded), 80.4% of developers who are developing with Visual Basic for Applications (VBA) have not expressed interest in continuing to do so. This makes VBA the Most Dreaded Language for two years in a row.
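One way to pin the statistic down, using notation introduced here only for illustration: let $W_L$ be the set of respondents who worked with language $L$ in the past year and $D_L$ the set who want to work with it next year. The dreaded share is the fraction of $W_L$ that is missing from $D_L$:

$$
\text{dreaded}(L) = \frac{\lvert W_L \setminus D_L \rvert}{\lvert W_L \rvert}
$$

This is the ratio the Q0 check below recomputes for $L = \text{VBA}$.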

In particular, I'm interested in knowing the following:

1. Other than VBA, what languages have they worked with in the past year (`LanguageWorkedWith`)?
1. What languages do they desire to work with next year (`LanguageDesireNextYear`)?
1. How frequently do VBA developers learn a new language or framework (`NEWLearn`)?
1. What is their primary operating system (`OpSys`)?
1. What kind of developer are they (`DevType`)?
1. How do they best describe themselves (`MainBranch`)?
1. Where do they live (`Country`)?
1. How do they describe their employment status (`Employment`)?
1. How many years have they been coding in total (`YearsCode`)?
1. What is their primary field of study (`UndergradMajor`)?


In [None]:
import csv
from dataclasses import dataclass
from collections import Counter

In [None]:
@dataclass
class Respondent:
    """One survey respondent, holding only the columns analyzed in this notebook."""
    language_workedwith: str
    language_desirenextyear: str
    new_learn: str
    os_primary: str
    dev_type: str
    main_branch: str
    country: str
    employment_status: str
    years_coding: str
    undergrad_major: str

In [None]:
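survey_results_public = '../input/stack-overflow-developer-survey-2020/developer_survey_2020/survey_results_public.csv'
LANG = 'VBA'

# newline='' is what the csv module expects; an explicit encoding avoids platform defaults.
with open(survey_results_public, newline='', encoding='utf-8') as file:
    results = csv.DictReader(file)

    # Keep respondents who worked with LANG. Multi-select answers are ';'-separated,
    # so compare whole answers instead of substrings ('Java' would match 'JavaScript').
    VBA_Developers = [
        Respondent(
            response['LanguageWorkedWith'],
            response['LanguageDesireNextYear'],
            response['NEWLearn'],
            response['OpSys'],
            response['DevType'],
            response['MainBranch'],
            response['Country'],
            response['Employment'],
            response['YearsCode'],
            response['UndergradMajor'],
        )
        for response in results
        if LANG in response['LanguageWorkedWith'].split(';')
    ]

total_devs = len(VBA_Developers)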
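Multi-select answers such as `LanguageWorkedWith` are stored as a single `;`-separated string, so a plain substring test (`LANG in some_answer`) can over-count once `LANG` is changed: `'Java'` is a substring of `'JavaScript'`, and `'C'` of `'C#'` and `'C++'`. Splitting on `;` first compares whole answers only. A minimal sketch with a hypothetical helper, `uses_language`, and a made-up answer string:

```python
def uses_language(answer: str, language: str) -> bool:
    """Return True if `language` is one of the ';'-separated choices in `answer`.

    `uses_language` is a hypothetical helper for illustration; it assumes the
    survey's convention of joining multi-select answers with ';'.
    """
    return language in answer.split(';')


answer = 'HTML/CSS;JavaScript;SQL'            # made-up example answer
print('Java' in answer)                       # substring test: True (false positive)
print(uses_language(answer, 'Java'))          # exact match: False
print(uses_language(answer, 'JavaScript'))    # exact match: True
```

For `VBA` both tests agree, because no other language name in the survey contains that string.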

## Q0: What percentage of VBA developers did not express interest in continuing with VBA?

This question is not listed above.

The purpose of this question is to check whether we read the data from the CSV file correctly. Stack Overflow has already published the answer (80.4%), so we should arrive at the same number.

So far, we have read the source data, stored each response in a `dataclass`, and collected all responses in a `list`.


In [None]:
# Count LANG developers who did not list LANG among the languages they want next year.
devs_dread_vba = 0

for dev in VBA_Developers:
    if LANG not in dev.language_desirenextyear.split(';'):
        devs_dread_vba += 1

print(f'There are {total_devs:,} {LANG} developers who answered the survey.')
print(f'{devs_dread_vba:,} of them do not want to continue with {LANG}! :(')
print(f"That's {devs_dread_vba / total_devs:.1%}!")
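The Q0 count treats every VBA developer whose `LanguageDesireNextYear` does not include VBA as dreading it, including respondents who left that question blank (recorded as `NA`, the same placeholder the Data or business analyst section relies on). Whether Stack Overflow's published 80.4% excludes such blank answers is not established here, so a quick sensitivity check is to compute the share both ways. A minimal sketch, assuming plain answer strings as input; `dreaded_share`, `exclude_na`, and the sample answers are hypothetical:

```python
from typing import Iterable


def dreaded_share(desire_answers: Iterable[str], language: str,
                  exclude_na: bool = False) -> float:
    """Share of respondents (all of whom used `language`) who do not list it
    in their ';'-separated LanguageDesireNextYear answer.

    Hypothetical helper. 'NA' marks an unanswered question; with
    exclude_na=True those answers are dropped from numerator and denominator.
    """
    answers = [a for a in desire_answers if not (exclude_na and a == 'NA')]
    if not answers:
        return 0.0
    # Summing booleans counts the respondents who did not pick `language`.
    dreading = sum(language not in a.split(';') for a in answers)
    return dreading / len(answers)


# Made-up answers from four VBA users, not survey data.
sample = ['VBA;Python', 'Python', 'NA', 'SQL']
print(f"{dreaded_share(sample, 'VBA'):.1%}")
print(f"{dreaded_share(sample, 'VBA', exclude_na=True):.1%}")
```

In the notebook the input would be `(dev.language_desirenextyear for dev in VBA_Developers)` together with `LANG`.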

## Report: Who Are VBA Developers?

In [None]:
def create_counter(data_attribute, source):
    """
    Return a Counter of the answers stored in `data_attribute` across `source`.
    Multi-select answers are split on ';' so each choice is counted separately.
    """
    responses = []
    for item in source:
        responses += getattr(item, data_attribute).split(";")
    return Counter(responses)
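`create_counter` flattens every `;`-separated answer into one list before counting. `Counter.update()` can consume the split answers directly, which is also a natural place to drop placeholder values such as `NA` if they should not appear in the breakdowns. A sketch of that variant; `count_answers`, its `skip` parameter, and the `Row` stand-in class are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


def count_answers(attribute: str, source: Iterable, skip=frozenset({'NA'})) -> Counter:
    """Count ';'-separated answers stored in `attribute`, ignoring values in `skip`."""
    counter = Counter()
    for item in source:
        # update() with a non-mapping iterable adds one per element.
        counter.update(answer for answer in getattr(item, attribute).split(';')
                       if answer not in skip)
    return counter


@dataclass
class Row:  # stand-in for Respondent, for the demo only
    os_primary: str


rows = [Row('Windows'), Row('Windows'), Row('MacOS'), Row('NA')]
print(count_answers('os_primary', rows))
```

Passing `skip=frozenset()` keeps every answer, which reproduces what `create_counter` counts.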

In [None]:
from typing import Optional

def print_counter_results(counter: Counter, top: Optional[int] = None, total: Optional[int] = None):
    """
    Print the `top` most common items of `counter` as: item | count | share.
    The share is relative to `total` respondents (default: `total_devs`).
    Labels longer than 34 characters continue on extra lines.
    """
    total = total_devs if total is None else total
    for item, result in counter.most_common(top):
        rate = f"{result/total:.2%}"
        # Cut the label into 34-character pieces; the first shares a line with the numbers.
        chunks = [item[i:i + 34] for i in range(0, len(item), 34)] or ['']
        print(f'{chunks[0]:<35} | {result:>8,} | {rate}')
        for chunk in chunks[1:]:
            print(f'{chunk:<35} |          |')

In [None]:
def display_results():
    """Print the 10 most common answers for every Respondent attribute among LANG developers."""
    for feature in Respondent.__annotations__.keys():
        print(f"Breakdown of {LANG} developers based on {feature}\n")
        feature_counter = create_counter(feature, VBA_Developers)
        print_counter_results(feature_counter, top=10)
        print('\n')

display_results()

## Reddit request

[/u/sancarn requested:](https://www.reddit.com/r/vba/comments/l21vmr/i_made_a_jupyter_notebook_which_breaks_down_some/gkczo5f?utm_source=share&utm_medium=web2x&context=3)

> I think the most interesting questions would be, using the same logic:
> 1. What is the most dreaded language for "Data or business analyst"
> 2. What is the most dreaded language for "I am not primarily a developer, but I write code sometimes as part of my work"?
>
> The reason I ask these questions is to me it is totally obvious that a full stack/back end/desktop application/front end developer wouldn't want to use VBA. They have much better tools at their disposal. The important question really is, how is VBA rated where it's most used, imo.



In [None]:
with open(survey_results_public, newline='', encoding='utf-8') as file:
    results = csv.DictReader(file)

    # Same fields as before, but keep respondents whose DevType includes
    # 'Data or business analyst' as a whole ';'-separated answer.
    DATA_ANALYSTS = [
        Respondent(
            response['LanguageWorkedWith'],
            response['LanguageDesireNextYear'],
            response['NEWLearn'],
            response['OpSys'],
            response['DevType'],
            response['MainBranch'],
            response['Country'],
            response['Employment'],
            response['YearsCode'],
            response['UndergradMajor'],
        )
        for response in results
        if 'Data or business analyst' in response['DevType'].split(';')
    ]
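This cell repeats the `Respondent(...)` construction from the VBA cell, changing only the filter. One way to avoid the duplication is to pass the filter in as a function. The sketch below is hypothetical (`load_rows`, `Person`, and the two-column toy CSV are illustrations, not the survey schema) and reads from an in-memory file so it runs on its own; with the real survey you would pass the opened file and build `Respondent` objects instead.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, TextIO


@dataclass
class Person:  # stand-in for Respondent
    dev_type: str
    language_workedwith: str


def load_rows(file: TextIO, keep: Callable[[Dict[str, str]], bool]) -> List[Person]:
    """Build a Person for every CSV row for which keep(row) is True."""
    return [
        Person(row['DevType'], row['LanguageWorkedWith'])
        for row in csv.DictReader(file)
        if keep(row)
    ]


# Made-up rows, not survey data.
TOY_CSV = (
    'DevType,LanguageWorkedWith\n'
    'Data or business analyst,SQL;VBA\n'
    'Academic researcher,Python\n'
    'Data or business analyst;Data scientist or machine learning specialist,Python;R\n'
)

analysts = load_rows(io.StringIO(TOY_CSV),
                     lambda row: 'Data or business analyst' in row['DevType'].split(';'))
vba_users = load_rows(io.StringIO(TOY_CSV),
                      lambda row: 'VBA' in row['LanguageWorkedWith'].split(';'))
print(len(analysts), len(vba_users))
```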
        

In [None]:
TOTAL_DA = len(DATA_ANALYSTS)
print(f'There are {TOTAL_DA:,} Data or business analysts in the overall survey.')

In [None]:
da_lang_workedwith = create_counter('language_workedwith', DATA_ANALYSTS)
da_lang_desire = create_counter('language_desirenextyear', DATA_ANALYSTS)

In [None]:
# For each language, compare how many analysts worked with it this year (count)
# with how many want to work with it next year (da_lang_desire[lang]).
hell_yes = {}
hell_no = {}

for lang, count in da_lang_workedwith.most_common():
    if lang == 'NA':  # 'NA' marks an unanswered question, not a language
        continue
    rate = (count, da_lang_desire[lang])
    if count < da_lang_desire[lang]:
        hell_yes[lang] = rate
    else:
        hell_no[lang] = rate

print("Languages with increased interest.")
for lang, num in hell_yes.items():
    print(f'More Data or business analysts want to use {lang:<10} : {num[1] - num[0]:>3} increase from {num[0]:>5,}.')

print("\nLanguages with decreased interest.")
for lang, num in hell_no.items():
    print(f'Data or business analysts want to drop {lang:<23} : {num[0] - num[1]:>3} decrease from {num[0]:>5,}.')

print('')
print(f"{da_lang_workedwith['NA']:,} did not specify any language they worked with.")
print(f"{da_lang_desire['NA']:,} did not specify any language they desire next year.")
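The comparison above contrasts two aggregate counts: how many analysts used a language versus how many want it next year. That is not quite the "same logic" as Q0, which follows individual respondents: of those who used a language, what share did not list it again? A per-respondent version, sketched with hypothetical names (`dreaded_by_language`, `min_users`) and made-up data, ranks languages by that share. The `min_users` cut-off is an assumption meant to keep rarely used languages from topping the list on a handful of answers.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dreaded_by_language(pairs: Iterable[Tuple[str, str]],
                        min_users: int = 1) -> List[Tuple[str, int, float]]:
    """Per-language dreaded share, using the same rule as Q0.

    `pairs` holds one (LanguageWorkedWith, LanguageDesireNextYear) pair of
    ';'-separated strings per respondent. Returns (language, users, share)
    tuples, highest share first; languages used by fewer than `min_users`
    respondents are left out, and 'NA' is never treated as a language.
    """
    users, dreading = Counter(), Counter()
    for worked, desired in pairs:
        wanted = set(desired.split(';'))
        for lang in set(worked.split(';')) - {'NA'}:
            users[lang] += 1
            if lang not in wanted:
                dreading[lang] += 1
    rows = [(lang, n, dreading[lang] / n)
            for lang, n in users.items() if n >= min_users]
    # Sort by share (descending), then by name so ties print in a fixed order.
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Made-up (worked with, desire next year) answers, not survey data.
sample = [
    ('VBA;SQL', 'Python;SQL'),
    ('VBA;Python', 'Python'),
    ('SQL;Python', 'SQL;Python'),
    ('VBA', 'VBA'),
]
for lang, n, share in dreaded_by_language(sample):
    print(f'{lang:<10} {n:>5,} {share:.1%}')
```

On the notebook's data the input would be `((d.language_workedwith, d.language_desirenextyear) for d in DATA_ANALYSTS)`, with a `min_users` threshold of your choosing. For the second question in the Reddit request, a group could be loaded like `DATA_ANALYSTS` but filtered on `response['MainBranch']`, assuming the quoted wording matches the CSV value exactly.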

## Where do we go from here?

As an accountant who works with VBA a LOT but rarely collaborates with other coders, I find it personally meaningful to get an idea of what other VBA coders are like. To be honest, I am not surprised that VBA is the most dreaded language.

Some final thoughts for anyone who would like to improve this notebook:

- The survey contains other variables that weren't taken into account in this notebook.
    * To add other data, add attributes to the `Respondent` dataclass (and pass the matching CSV column wherever a `Respondent` is created), or try `setattr()`.
- This notebook could also use some visualization.
- Set `LANG` to a different language to run the same analysis for that language.
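Adding a variable currently means editing the dataclass and every `Respondent(...)` call. A hedged alternative is to keep a single mapping from attribute name to CSV column and generate the dataclass from it with `dataclasses.make_dataclass`. `COLUMNS`, `Record`, and `record_from_row` are hypothetical names, and `Age` is used as an example extra column on the assumption that it exists in the survey file:

```python
import csv
import io
from dataclasses import make_dataclass

# Attribute name -> CSV column. Add a new variable by adding one entry here.
COLUMNS = {
    'country': 'Country',
    'years_coding': 'YearsCode',
    'age': 'Age',  # assumed column name; check the survey schema
}

# All fields are strings, matching what csv.DictReader returns.
Record = make_dataclass('Record', [(name, str) for name in COLUMNS])


def record_from_row(row):
    """Build a Record from one csv.DictReader row using the COLUMNS mapping."""
    return Record(**{name: row[column] for name, column in COLUMNS.items()})


# Made-up row; columns not listed in COLUMNS (here 'Extra') are ignored.
toy = io.StringIO('Country,YearsCode,Age,Extra\nCanada,5,30,x\n')
records = [record_from_row(row) for row in csv.DictReader(toy)]
print(records[0])
```

Because `make_dataclass` also fills in `__annotations__`, a loop like `display_results()` that iterates over `Respondent.__annotations__` would pick up new fields automatically.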

## Functions
- Function `create_counter()`: the first argument must be a string naming an attribute of the `Respondent` dataclass; the second must be an iterable of `Respondent` objects. Answers are split on `;` before counting.
- Function `print_counter_results()`: pass a `Counter` object as the first argument; the optional argument `top` limits the output to the given number of most common items. This function could definitely use some improvement, especially in how it formats and prints strings.
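As one possible improvement along those lines, `textwrap.wrap` can break long labels at word boundaries instead of at fixed character positions, and returning the lines instead of printing them makes the output easy to test. A sketch; `format_counter_rows`, its `total` and `width` parameters, and the demo counts are choices made here, not part of the notebook:

```python
import textwrap
from collections import Counter
from typing import List, Optional


def format_counter_rows(counter: Counter, total: int, top: Optional[int] = None,
                        width: int = 35) -> List[str]:
    """Return table lines 'label | count | share', wrapping long labels at word boundaries."""
    lines = []
    for label, count in counter.most_common(top):
        # wrap() returns [] for an empty string, so fall back to one blank chunk.
        chunks = textwrap.wrap(label, width) or ['']
        lines.append(f'{chunks[0]:<{width}} | {count:>8,} | {count / total:.2%}')
        for chunk in chunks[1:]:
            lines.append(f'{chunk:<{width}} | {"":>8} |')
    return lines


# Made-up counts for two MainBranch-style labels.
demo = Counter({
    'Windows': 3,
    'I am not primarily a developer, but I write code sometimes as part of my work': 1,
})
print('\n'.join(format_counter_rows(demo, total=4)))
```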