Analyzing Stack Overflow Annual Developer Survey Data with MariaDB

⚠️ [UNMAINTAINED] This repository has been moved and is currently maintained here. ⚠️

This repository provides data and information that will enable you to transform, import and analyze the raw Stack Overflow Annual Developer Survey data with MariaDB ColumnStore.

Starting at Step 2, Parse and transform the data, this repository will walk you through the process of preparing, importing and, ultimately, being able to analyze the raw Stack Overflow Annual Developer Survey data for 2020 (which is included in the developer_survey_2020 folder).

Note: This repository will be updated to include the raw 2021 data once it becomes available here.

SkySQL is the first and only database-as-a-service (DBaaS) to bring the full power of MariaDB Platform to the cloud, including its support for transactional, analytical and hybrid workloads. Built on Kubernetes, and optimized for cloud infrastructure and services, SkySQL combines ease of use and self-service with enterprise reliability and world-class support – everything needed to safely run mission-critical databases in the cloud, and with enterprise governance.

Get started with SkySQL!

IMPORTANT: Once you've registered for MariaDB SkySQL you will need to create a new analytics service so that you can take advantage of the MariaDB columnar storage engine, ColumnStore. For more information on how to do this check out this walk-through, or check out this short video on launching a new SkySQL service - don't worry it only takes a couple of minutes!

Create the schema

The survey result data contained in the newly created answers.csv file will need to be imported into MariaDB. To accommodate that you will need to create a new database, survey_data, that contains a single table, answers.

To create the new database and table you can either copy and execute the following code within a database client of your choice.

DROP DATABASE IF EXISTS survey_data;
CREATE DATABASE survey_data;

CREATE TABLE survey_data.answers (
    respondent_id INT unsigned NOT NULL, 
    question_id VARCHAR(25) NOT NULL,
    answer VARCHAR(65) NOT NULL
) ENGINE=ColumnStore DEFAULT CHARSET=utf8;

or use the MariaDB Client to execute the schema.sql script contained within this repository.

For example:

Locally

$ mariadb --host 127.0.0.1 --user root -pPassword123! < schema.sql

SkySQL

mariadb --host analytics-1.mdb0001265.db.skysql.net --port 5001 --user DB00004537 -p --ssl-ca ~/Downloads/skysql_chain.pem < schema.sql

Note: Remember to update the command above with your database location, user and SSL information accordingly!

Import the data

After you've created the new schema, you can import the answers.csv data using the MariaDB Client.

For example:

Locally

mariadb --host 127.0.0.1 --port 3306 --user root -pPassword123! -e "LOAD DATA LOCAL INFILE 'answers.csv' INTO TABLE answers FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n'" survey_data

SkySQL

mariadb --host analytics-1.mdb0001265.db.skysql.net --port 5001 --user DB00004537 -p --ssl-ca ~/Downloads/skysql_chain.pem -e "LOAD DATA LOCAL INFILE 'answers.csv' INTO TABLE answers FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n'" survey_data

Note: Remember to update the command above with your database location, user and SSL information accordingly!
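Because the answers table declares respondent_id as an unsigned INT and caps question_id at 25 and answer at 65 characters, it can be worth checking answers.csv before loading it. The following is a minimal sketch, assuming the file has exactly three comma-separated columns in the order respondent_id, question_id, answer and no header row (an assumption inferred from the LOAD DATA command above, not verified against parse_and_transform.py). The function name find_invalid_rows is hypothetical.

```python
import csv
import io

# Column limits copied from the CREATE TABLE statement in "Create the schema".
MAX_UNSIGNED_INT = 4294967295
MAX_QUESTION_ID_LEN = 25
MAX_ANSWER_LEN = 65


def find_invalid_rows(csv_file):
    """Return (row_number, reason) pairs for rows that would not fit the schema.

    Assumes three columns per row: respondent_id, question_id, answer.
    """
    problems = []
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        if len(row) != 3:
            problems.append((row_number, "expected 3 columns, got %d" % len(row)))
            continue
        respondent_id, question_id, answer = row
        # Only plain ASCII digits within the unsigned INT range are accepted.
        is_number = respondent_id.isascii() and respondent_id.isdigit()
        if not is_number or int(respondent_id) > MAX_UNSIGNED_INT:
            problems.append((row_number, "respondent_id is not an unsigned INT"))
        if len(question_id) > MAX_QUESTION_ID_LEN:
            problems.append((row_number, "question_id exceeds 25 characters"))
        if len(answer) > MAX_ANSWER_LEN:
            problems.append((row_number, "answer exceeds 65 characters"))
    return problems


# Made-up sample rows; the second one has an invalid respondent_id.
sample = io.StringIO('1,MainBranch,"I am a developer by profession"\n-5,Age,25\n')
print(find_invalid_rows(sample))
```

To check the real file, pass an open file handle instead of the sample, for example open("answers.csv", newline="", encoding="utf-8").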

Analyze the data

Once the data has been successfully imported into MariaDB there are many ways that you can use the data!

SQL

You can use a database client to execute SQL queries directly on the results data contained in the answers table.

For example, using the MariaDB Client:

Start by connecting to your MariaDB database instance.

$ mariadb --host 127.0.0.1 --user root -pPassword123!

Execute the following query to SELECT the top 10 programming languages that have been used by respondents who have also used MariaDB.

SELECT
	answer, COUNT(answer) AS respondent_count
FROM
	survey_data.answers
WHERE 
	question_id = 'LanguageWorkedWith' AND 
	respondent_id IN (SELECT respondent_id FROM survey_data.answers WHERE question_id = 'DatabaseWorkedWith' AND answer = 'MariaDB')
GROUP BY
	answer
ORDER BY
	COUNT(answer) DESC
LIMIT 10;

+-----------------------+------------------+
| answer                | respondent_count |
+-----------------------+------------------+
| JavaScript            |             6878 |
| HTML/CSS              |             6597 |
| SQL                   |             6239 |
| PHP                   |             5149 |
| Python                |             4204 |
| Java                  |             4028 |
| Bash/Shell/PowerShell |             3746 |
| TypeScript            |             2558 |
| C#                    |             2396 |
| C++                   |             2277 |
+-----------------------+------------------+
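To make the shape of that query concrete: the subquery collects the respondents who reported MariaDB, and the outer query counts language answers only for those respondents. The sketch below expresses the same logic in plain Python over a handful of made-up rows. The sample data and the function name top_answers_for_cohort are illustrative only, and the sketch assumes, as the query does, that each selected answer is stored as its own row.

```python
from collections import Counter


def top_answers_for_cohort(rows, cohort_question, cohort_answer,
                           target_question, limit=10):
    """Count target_question answers among respondents who gave cohort_answer."""
    # Step 1 (the subquery): respondents who gave the cohort answer.
    cohort = {
        respondent_id
        for respondent_id, question_id, answer in rows
        if question_id == cohort_question and answer == cohort_answer
    }
    # Step 2 (the outer WHERE and GROUP BY): count target answers in the cohort.
    counts = Counter(
        answer
        for respondent_id, question_id, answer in rows
        if question_id == target_question and respondent_id in cohort
    )
    # Step 3 (ORDER BY ... DESC LIMIT n): keep the most frequent answers.
    return counts.most_common(limit)


# Made-up rows in (respondent_id, question_id, answer) form, not survey results.
rows = [
    (1, "DatabaseWorkedWith", "MariaDB"),
    (1, "LanguageWorkedWith", "Python"),
    (1, "LanguageWorkedWith", "SQL"),
    (2, "DatabaseWorkedWith", "PostgreSQL"),
    (2, "LanguageWorkedWith", "Python"),
    (3, "DatabaseWorkedWith", "MariaDB"),
    (3, "LanguageWorkedWith", "SQL"),
]
print(top_answers_for_cohort(rows, "DatabaseWorkedWith", "MariaDB",
                             "LanguageWorkedWith"))
```

Respondent 2 is excluded because they did not report MariaDB, so their Python answer does not contribute to the counts.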

Python & Jupyter Lab

You can also use modern data analysis and visualization tools like Jupyter Lab, in combination with MariaDB Connector/Python and Python libraries like Plotly and Pandas.

For more information on how you can do this please check out the following resources:

Using Data Analysis and Visualization with MariaDB Connector/Python (GitHub Repository)
Deep dive: Taking advantage of MariaDB Connector for Python (Webinar)
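The resources above cover MariaDB Connector/Python in depth. As a rough sketch of a first step, the function below runs a parameterized count query through a Python DB-API connection and returns plain tuples that could then be handed to a Pandas DataFrame. It is demonstrated against an in-memory SQLite database purely so that it runs without a server. Passing a connection opened with MariaDB Connector/Python (with survey_data selected as the default database) is the intended use, but that path is an assumption and is not exercised here. The function name count_answers is hypothetical.

```python
import sqlite3


def count_answers(connection, question_id, limit=10):
    """Return (answer, respondent_count) tuples for one survey question.

    Assumes the connection's default database contains the answers table
    and that the driver accepts "?" placeholders.
    """
    query = (
        "SELECT answer, COUNT(answer) AS respondent_count "
        "FROM answers WHERE question_id = ? "
        "GROUP BY answer ORDER BY respondent_count DESC, answer "
        # The limit is coerced to int before being placed in the statement.
        "LIMIT %d" % int(limit)
    )
    cursor = connection.cursor()
    cursor.execute(query, (question_id,))
    rows = cursor.fetchall()
    cursor.close()
    return rows


# Stand-in database with made-up rows so the sketch runs without MariaDB.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE answers (respondent_id INTEGER, question_id TEXT, answer TEXT)"
)
demo.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        (1, "LanguageWorkedWith", "Python"),
        (2, "LanguageWorkedWith", "Python"),
        (2, "LanguageWorkedWith", "SQL"),
        (2, "DatabaseWorkedWith", "MariaDB"),
    ],
)
print(count_answers(demo, "LanguageWorkedWith"))
```

Keeping the helper independent of any one driver means the same function can be tried locally first and pointed at a real service later.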

Support and Contribution

Please feel free to submit PRs, issues or requests to this project or to projects within the official MariaDB Corporation GitHub organization.

If you have any other questions or comments, or are looking for more information on MariaDB, please check out:

Or reach out to us directly via:

Repository: mariadb-corporation/dev-example-columnstore-developer-survey-data

Folders and files

Latest commit

History

Repository files navigation

Analyzing Stack Overflow Annual Developer Survey Data with MariaDB

Table of Contents

Requirements

Parsing and transforming the data

Preparing the database

Docker Container

MariaDB SkySQL

Create the schema

Import the data

Analyze the data

SQL

Python & Jupyter Lab

Support and Contribution

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages