Analysis of Stack Overflow 2018 Developer Survey data to explore the differences between data scientists and non-data scientists.
DS Survey Analysis.ipynb Added summary and expanded analysis in notebook. Sep 23, 2018
README.md Added blog link to README.md Sep 23, 2018


Data Scientist Survey Analysis - README

Installation

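The code was written in Python 3 and requires the following third-party packages: pandas, NumPy, Matplotlib, Seaborn and SciPy. It also uses the standard-library modules collections and warnings, which ship with Python and need no separate installation.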

Project Motivation

The motivation behind this analysis is to explore how data scientists compare with other software developers ("non-data scientists") in terms of demographics, programming languages used, coding experience and job satisfaction. To that end, I set out to answer the following questions using data collected by Stack Overflow as part of its 2018 Annual Developer Survey:

  1. How does the demographic profile of data scientists differ from that of non-data scientists?
  2. What programming languages do data scientists favour and how do they differ from those used by non-data scientists?
  3. How much coding experience do data scientists have compared to non-data scientists?
  4. Are data scientists more satisfied with their jobs/careers than non-data scientists?
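Answering these questions starts with splitting respondents into the two groups. The sketch below is not taken from the notebook; it assumes that the 2018 survey's `DevType` and `LanguageWorkedWith` columns hold semicolon-separated answers and that data scientists are identified by the role "Data scientist or machine learning specialist". The helper names `split_by_role` and `language_shares` are hypothetical, and the small `survey` DataFrame is toy data for illustration only.

```python
import pandas as pd

# Assumption: DevType lists roles separated by ";" and this is the data-science role label.
DS_LABEL = "Data scientist or machine learning specialist"


def split_by_role(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True for respondents who list the data-science role.

    Respondents with no DevType answer (NaN) are counted as non-data scientists here;
    the notebook may treat them differently.
    """
    roles = df["DevType"].fillna("").str.split(";")
    return roles.apply(lambda rs: DS_LABEL in rs).astype(bool)


def language_shares(df: pd.DataFrame) -> pd.Series:
    """Share of respondents (among those who answered) reporting each language."""
    answered = df["LanguageWorkedWith"].dropna()
    langs = answered.str.split(";").explode()
    return langs.value_counts() / len(answered)


# Hypothetical toy data, not real survey responses.
survey = pd.DataFrame({
    "DevType": [
        "Data scientist or machine learning specialist;Back-end developer",
        "Front-end developer",
        None,
    ],
    "LanguageWorkedWith": ["Python;SQL", "JavaScript;HTML", "Python"],
})

is_ds = split_by_role(survey)
print(language_shares(survey[is_ds]))   # data scientists
print(language_shares(survey[~is_ds]))  # non-data scientists
```

Because a respondent can select several roles, a data scientist who also lists "Back-end developer" lands only in the data-scientist group under this rule; other groupings are equally defensible.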

File Descriptions

All analysis is contained in the Jupyter notebook DS Survey Analysis.ipynb.

To run this code, first download the 2018 Stack Overflow Developer Survey dataset from https://insights.stackoverflow.com/survey. Then save the folder containing this data (developer_survey_2018) inside a folder named "Data" in the current working directory, so that the data sits at Data/developer_survey_2018.
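A minimal loading sketch for that folder layout follows. It assumes the extracted download contains `survey_results_public.csv` (one row per respondent) and `survey_results_schema.csv` (column names and question text); if your copy uses different file names, adjust the paths. The function name `load_survey` is hypothetical and is not taken from the notebook.

```python
from pathlib import Path

import pandas as pd


def load_survey(data_dir="Data"):
    """Load survey responses and schema from <data_dir>/developer_survey_2018.

    Assumption: the folder holds survey_results_public.csv and survey_results_schema.csv.
    """
    folder = Path(data_dir) / "developer_survey_2018"
    # low_memory=False reads the file in one pass so pandas infers a single dtype per column.
    responses = pd.read_csv(folder / "survey_results_public.csv", low_memory=False)
    schema = pd.read_csv(folder / "survey_results_schema.csv")
    return responses, schema
```

Calling `responses, schema = load_survey()` from the repository root then gives one DataFrame of answers and one describing each question.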

Results

The main findings of this analysis are summarised in a blog post available here.

Licensing, Authors, Acknowledgements

The dataset used in this analysis was created by Stack Overflow and made available for download under the Open Database License (ODbL).

The code contained in this repository may be used freely with acknowledgement.