# Jupyter Notebook 2015 UX Survey Results

<div class="alert alert-info"><b>Note:</b> This notebook contains the same information as <a href="report_dashboard.ipynb">report_dashboard.ipynb</a> but includes more interactive widgets from the jupyter-incubator/declarativewidgets project. Many thanks to <a href="https://github.com/peller">@peller</a> and <a href="https://github.com/aluu317">@aluu317</a> for improving the experience.</div>

## Executive Summary

The following are the key takeaways from our analysis of the survey response data. The remainder of this notebook gives evidence for these points.

**Experienced Users**

People who use Jupyter frequently (daily/weekly), who have been using it for more than a year, and who have been in their job role for 2+ years are most heavily represented in the survey data. The majority of respondents self-identify as data scientists, students, scientists, and researchers.

**Like the Notebook Concept**

Unsurprisingly, respondents state that they like the ability to quickly analyze, visualize, explore, and describe data in rich notebook documents that they can share. As such, the interactive computing paradigm should remain central to the user experience and continue to improve over time.

**Need More than Notebooks**

Respondents find that the Jupyter Notebook lacks features they require to complete their data analytics workflow. To address these gaps, the respondents request a wide variety of features and integrations. The most common requests are the following:

* Version control (via git in particular)
* Robust text and code editing (like in Emacs, Vim, Sublime, PyCharm)
* Advanced code development tools (debugging, profiling, variable watching, code modularization)
* Simpler export and deployment options (one-click transformations to slides, scripts, reports) 

The project roadmap should incorporate these suggestions to ensure that Jupyter continues to address the needs of end users.

**Need More Help**

Respondents state that installing Jupyter Notebook should be easier to perform and understand, both for single users and for groups of users. Respondents commonly cite documentation, tutorials, and help as ways to improve the situation.

## About the Survey

In late 2015, members of the Jupyter Community conducted a 16-question survey about the Jupyter Notebook user experience. The survey ran on [SurveyGizmo](https://www.surveygizmo.com/) from December 21, 2015 until January 15, 2016. Posts on the [Project Jupyter Google Group](https://groups.google.com/forum/#!topic/jupyter/XCzJ02Rzj0Y), on the [Jupyter blog](http://blog.jupyter.org/2015/12/22/jupyter-notebook-user-experience-survey/), and from the [@ProjectJupyter Twitter account](https://twitter.com/ProjectJupyter/status/684096608166776832) were used to share the survey link and solicit responses. At the conclusion of the survey, the raw response data were [posted to GitHub](https://github.com/jupyter/design/tree/master/surveys/2015-notebook-ux) along with a description of the data format.

## About our Analysis

Starting with the raw response data, we ([@parente](https://github.com/parente), [@jtyberg](https://github.com/jtyberg)):

1. Toyed with ways to [extract the salient points from the free text responses](prep/1_ux_survey_review.ipynb)
2. Made the data [more pandas-friendly](prep/2_clean_survey.ipynb)
3. Tried a few different approaches to [identifying keyword patterns and aggregating them into themes](prep/3a_hinderance_themes.ipynb)
4. Repeated our keyword pattern search and theme annotation approach (see below) across all remaining free-text questions (e.g., [integration](prep/3b_integration_themes.ipynb), [needs addressed](prep/3c_needs_addressed_themes.ipynb), [needs not addressed](prep/3d_needs_not_addressed_themes.ipynb), etc.)
5. Summarized our findings in this notebook.

Step #4 constituted the bulk of the analysis work. The notebooks in the [prep](prep/) folder have all the details. The next section includes a brief summary of key terms from our methodology that are used in this report.

## About this Notebook

In this notebook, we present evidence for the claims made in the executive summary above. We include written observations, plots, and interactive widgets for exploring the results.

You may:

1. View the [report_dashboard.ipynb](report_dashboard.ipynb) notebook as a static webpage on [NBViewer](https://nbviewer.jupyter.org/github/jupyter/design/blob/master/surveys/2015-notebook-ux/analysis/report_dashboard.ipynb) or [GitHub](https://github.com/jupyter/design/blob/master/surveys/2015-notebook-ux/analysis/report_dashboard.ipynb). (No interactivity, one plot per section)
2. Interact with the widgets in this notebook on a temporary notebook server provided by a [tmpnb site](http://jupyter.cloudet.xyz/notebooks/2015-notebook-ux-survey/analysis/report_dashboard_decl_widgets.ipynb). (Plot keywords vs themes, group by respondent categories, show counts or percentages within groups, modify the code yourself, click a bar to see a random sample of responses, click *View &rarr; Dashboard Preview* to see just the output)
3. Interact with a standalone dashboard application bundled and deployed from this notebook on a [tmpnb site](http://jupyter.cloudet.xyz/files/2015-notebook-ux-survey/analysis/deployed_dashboard/index.html).
4. Clone the [jupyter/design](https://github.com/jupyter/design) repository from GitHub and put the notebooks / data on a Jupyter server of your choosing.

<a name="key-terms"></a>
<div class="alert alert-info"><b>Key Terms</b>
<p>To best understand and interact with the plots below, keep the following in mind:</p>

<ul>
<li><b>Respondents</b> did not necessarily answer every survey question. We considered the available responses, regardless of whether the user answered other questions or not. As a result, the total response count varies across the plots below, especially when grouping by other response categories.</li>
<li><b>Keyword patterns</b> are the common regular expression patterns we identified by sampling the responses. They represent terms and phrases that actually appear in the free-text responses. We optimized these for recall more than precision.</li>
<li><b>Annotator themes</b> are our subjective groupings of keyword patterns. They are an attempt to identify common topics within the responses.</li>
</ul>
</div>
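
To make the distinction between keyword patterns and annotator themes concrete, here is a minimal sketch of the idea. The patterns, theme names, and the `tag_response` helper are all hypothetical; the real patterns and groupings live in the prep notebooks. Each response is matched against every regular expression, and a theme is recorded whenever at least one of its patterns matches. Broad patterns such as `\binstall` (which also catches "installing" and "installation") illustrate the recall-over-precision trade-off described above.

```python
import re

import pandas as pd

# Hypothetical patterns and themes for illustration only; the real ones are
# defined in the prep/ notebooks.
THEMES = {
    "version control": [r"\bgit\b", r"version\s*control", r"\bdiff"],
    "editing": [r"\bvim?\b", r"\bemacs\b", r"\bsublime\b"],
    "installation": [r"\binstall"],  # also matches "installing", "installation"
}


def tag_response(text):
    """Return (matched keyword patterns, matched themes) for one free-text answer."""
    if not isinstance(text, str):
        return [], []  # question skipped: no keywords, no themes
    keywords, themes = [], []
    for theme, patterns in THEMES.items():
        hits = [p for p in patterns if re.search(p, text, flags=re.IGNORECASE)]
        if hits:
            keywords.extend(hits)
            themes.append(theme)
    return keywords, themes


responses = pd.Series([
    "Hard to use with git; diffs are unreadable",
    "I miss my Vim keybindings",
    None,
    "Installing kernels was confusing",
])
tagged = responses.map(tag_response)
themes_per_response = tagged.map(lambda pair: pair[1])
# A response counts once per theme, even if several of that theme's patterns match
theme_counts = themes_per_response.explode().dropna().value_counts()
print(theme_counts)
```

Because a single response can match several themes, theme totals in the plots below can exceed the number of respondents who answered a question.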

### Notebook Setup

If you want to run this notebook in your own environment, ensure you have all of the following installed:

```
numpy==1.10.4
matplotlib==1.5.1
seaborn==0.7.0
ipywidgets==4.1.1
jupyter_cms==0.4.0
pandas==0.18.0
jupyter-declarativewidgets==0.6.0
jupyter-dashboards==0.4.2  # optional
```

In [None]:
!pip install jupyter_cms

In [None]:
import warnings
import declarativewidgets as widgets

warnings.simplefilter('ignore')
widgets.init()

Import declarative widgets used in this notebook, including the custom survey explorer widget:

```
prep/widgets/survey-explorer/survey-explorer.html
```

In [None]:
%%html
<link rel='import' href='urth_components/paper-toggle-button/paper-toggle-button.html'
        is='urth-core-import' package='PolymerElements/paper-toggle-button'>
<link rel="import" href="urth_components/paper-menu/paper-menu.html"
        is='urth-core-import' package='PolymerElements/paper-menu'>
<link rel="import" href="urth_components/paper-item/paper-item.html" 
        is='urth-core-import' package='PolymerElements/paper-item'>
<link rel="import" href="urth_components/paper-dropdown-menu/paper-dropdown-menu.html"
        is='urth-core-import' package='PolymerElements/paper-dropdown-menu'>
<link rel="import" href="urth_components/urth-core-function/urth-core-function.html" is='urth-core-import'>
<link rel='import' href='urth_components/urth-viz-bar/urth-viz-bar.html' is='urth-core-import'>
<link rel='import' href='urth_components/urth-viz-table/urth-viz-table.html' is='urth-core-import'>
<link rel='import' href='prep/widgets/survey-explorer/survey-explorer.html'>

In [None]:
import os
import glob
import pandas as pd

In [None]:
%load_ext jupyter_cms

Read the responses with shortened column names.

In [None]:
df = pd.read_csv('./prep/survey_short_columns.csv', index_col=0)

Merge in the themes and keywords for the free-text questions.

In [None]:
files = glob.glob('./prep/*_themes.csv')
files.append('./prep/roles.csv')
for fn in files:
    theme_df = pd.read_csv(fn, sep=';', index_col=0)
    df = df.merge(theme_df, left_index=True, right_index=True, how='left')
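
The loop above uses a left merge on the index so that adding theme columns never changes the set of respondents. A minimal sketch with toy frames shows the effect; the respondent IDs and values are made up, and it assumes, as `index_col=0` implies, that every CSV shares the same respondent index.

```python
import pandas as pd

# Toy stand-ins for survey_short_columns.csv and one prep/*_themes.csv file.
# Both are indexed by a hypothetical respondent ID, as index_col=0 implies.
survey = pd.DataFrame(
    {"how_often": ["Daily", "Weekly", "Monthly"]},
    index=pd.Index([101, 102, 103], name="respondent_id"),
)
themes = pd.DataFrame(
    {"hinderances_themes": ["version control", "editing"]},
    index=pd.Index([101, 103], name="respondent_id"),
)

# how='left' keeps every survey row; respondent 102 has no themes row, so it gets NaN
merged = survey.merge(themes, left_index=True, right_index=True, how="left")

# For contrast, an inner merge silently drops respondent 102
inner = survey.merge(themes, left_index=True, right_index=True, how="inner")
print(merged)
```

The left join matches the approach described in the Key Terms: respondents who skipped a free-text question stay in the data for every other question.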

Cut down the number of primary roles for better plotting: keep the top 8 and bin the rest as "other".

In [None]:
top_roles = df.roles_primary.value_counts()[:8].index

In [None]:
df.loc[~df.roles_primary.isnull() & ~df.roles_primary.isin(top_roles), 'roles_primary'] = 'other'

In [None]:
df.roles_primary.value_counts()
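
The two cells above can be read as one "top-N, else other" binning step. Here is a minimal sketch of it as a reusable function; the `bin_top_n` name and the toy roles are hypothetical, not taken from the prep notebooks.

```python
import pandas as pd


def bin_top_n(s, n, other="other"):
    """Keep the n most frequent values of s and relabel the rest as `other`.

    Missing values stay missing, like the ~isnull() guard in the notebook cell.
    """
    top = s.value_counts().index[:n]
    out = s.copy()
    out[out.notna() & ~out.isin(top)] = other
    return out


# Toy roles; labels are illustrative, not survey data
roles = pd.Series(["data scientist", "student", "student", "researcher",
                   None, "astronomer", "student", "data scientist"])
binned = bin_top_n(roles, 2)
print(binned.value_counts(dropna=False))
```

Ties at the cutoff are resolved by whatever order `value_counts` returns them in, so the last kept slot can be somewhat arbitrary when counts tie.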

Read the utils notebook which has all the plotting and widget routines. (We didn't want them all in here cluttering the static report.)

In [None]:
utils = load_notebook('./prep/5_utils.ipynb')
# Make functions in utils available here for widgets
exploreDataFrame = utils.exploreDataFrame
getSample = utils.getSample
# Make the full DataFrame available in utils for widgets
utils.set_survey_df(df)
# When we group responses by primary role, order the legend from the largest cadre to the smallest.
utils.group_orders['roles_primary'] = top_roles.tolist()

## About the Survey Respondents

We start with information about the respondents themselves. Most of the questions in this section have a fixed set of possible responses; only one allows free-text responses.

### How often do you use Jupyter Notebook?

The majority of respondents use the notebook daily or weekly.

In [None]:
how_often_counts = df.how_often.value_counts().reindex(utils.group_orders['how_often'])
how_often_dataframe = pd.DataFrame(how_often_counts).reset_index()
how_often_dataframe

In [None]:
%%html
<template is="dom-bind">
    <urth-core-dataframe ref="how_often_dataframe" value="{{df}}" auto></urth-core-dataframe>
    <urth-viz-bar xlabel="Notebook Usage" ylabel="Respondents" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>
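
The counting cells in this section follow one pattern: `value_counts()` followed by `reindex()` with a fixed answer order, so bars appear in a logical order rather than sorted by frequency. A minimal sketch with a hypothetical answer order and toy answers shows one side effect: an option nobody picked becomes `NaN`.

```python
import pandas as pd

# Hypothetical answer order; the notebook reads its real orders from utils.group_orders.
ORDER = ["Daily", "Weekly", "Monthly", "Rarely"]

answers = pd.Series(["Weekly", "Daily", "Daily", None, "Monthly", "Daily"])

# reindex sorts the counts into ORDER instead of by frequency; an option nobody
# picked ("Rarely") becomes NaN, which also turns the counts into floats
counts = answers.value_counts().reindex(ORDER)

# fill_value=0 keeps unpicked options as 0 and the dtype as integers
counts_filled = answers.value_counts().reindex(ORDER, fill_value=0)
print(counts_filled)
```

Whether an empty category should plot as 0 or be left missing is a judgment call; `fill_value=0` makes the difference explicit.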

### Roughly how long have you been using Jupyter Notebook?

The majority of respondents have been Jupyter Notebook users for longer than a year.

In [None]:
how_long_counts = df.how_long_used.value_counts().reindex(utils.group_orders['how_long_used'])
how_long_dataframe = pd.DataFrame(how_long_counts).reset_index()
how_long_dataframe

In [None]:
%%html
<template is="dom-bind">
    <urth-core-dataframe ref="how_long_dataframe" value="{{df}}" auto></urth-core-dataframe>
    <urth-viz-bar xlabel="Notebook Experience" ylabel="Respondents" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>

The distribution of frequency of use is skewed toward daily/weekly in both the more and less experienced user groups.

In [None]:
long_x_often = pd.crosstab(df.how_long_used, df.how_often)
long_x_often = long_x_often.reindex(index=utils.group_orders['how_long_used'], columns=utils.group_orders['how_often'])
long_x_often.columns.name = 'Notebook Usage'
long_x_often_dataframe = pd.DataFrame(long_x_often).reset_index()
long_x_often_dataframe

In [None]:
%%html
<template is="dom-bind">
    <urth-core-dataframe ref="long_x_often_dataframe" value="{{df}}" auto></urth-core-dataframe>
    <urth-viz-bar xlabel="Notebook Experience" ylabel="Respondents" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>
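
The claim that usage skews toward daily/weekly "in both groups" is easiest to check with within-group percentages, since the two experience groups differ in size. A minimal sketch with toy responses and hypothetical category labels uses `pd.crosstab` for raw counts and its `normalize="index"` option for row shares.

```python
import pandas as pd

# Toy responses; labels and orders are illustrative only.
toy = pd.DataFrame({
    "how_long_used": ["< 1 year", "> 1 year", "> 1 year", "< 1 year", "> 1 year"],
    "how_often": ["Daily", "Daily", "Weekly", "Weekly", "Daily"],
})
long_order = ["< 1 year", "> 1 year"]
often_order = ["Daily", "Weekly", "Monthly"]

# Raw counts in a fixed row/column order; combinations nobody picked become 0
counts = pd.crosstab(toy.how_long_used, toy.how_often).reindex(
    index=long_order, columns=often_order, fill_value=0)

# Share of each experience group at each frequency (each row sums to 1), which
# compares groups of different sizes on an equal footing
shares = pd.crosstab(toy.how_long_used, toy.how_often, normalize="index").reindex(
    index=long_order, columns=often_order, fill_value=0)

print(counts)
print(shares.round(2))
```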

### What is your primary role when using Jupyter Notebook (e.g., student, astrophysicist, financial modeler, business manager, etc.)?

Most respondents name a single job role (e.g., data scientist). Some state more than one (e.g., student, practicing statistician). Data scientist, student, scientist, researcher, and developer are the most common.

In [None]:
%%html
<survey-explorer series='["roles_roles", "roles_keywords"]' sample-source='["roles_roles"]'  labels='["Annotator themes", "Keyword patterns"]'/>

### How many years have you been in this role?

Most respondents have 2+ years of experience in their job role.

In [None]:
counts = df.years_in_role.value_counts()
counts = counts.reindex(utils.group_orders['years_in_role'])
yearcounts_dataframe = pd.DataFrame(counts).reset_index()
yearcounts_dataframe

In [None]:
%%html
<template is="dom-bind">
    <urth-core-dataframe ref="yearcounts_dataframe" value="{{df}}" auto></urth-core-dataframe>
    <urth-viz-bar xlabel="Years in Job Role" ylabel="Respondents" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>

### How do you run the Jupyter Notebook?

The vast majority of respondents run Jupyter as a standalone application.

In [None]:
counts = df.how_run.value_counts()
counts = counts.reindex(utils.group_orders['how_run'])
howrun_counts_dataframe = pd.DataFrame(counts).reset_index()
howrun_counts_dataframe

In [None]:
%%html
<template is="dom-bind">
    <urth-core-dataframe ref="howrun_counts_dataframe" value="{{df}}" auto></urth-core-dataframe>
    <urth-viz-bar xlabel="Notebook Environment" ylabel="Respondents" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>

Write-in responses vary widely. Many are more specific versions of "as a standalone application."

In [None]:
df.how_run_other.dropna().sample(15).to_frame('Random sample of 15 write-in responses').reset_index(drop=True)
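
The cell above draws a different random sample each time it runs. If you want a rerun to show the same responses (for example, when discussing specific answers), one option is to pass a fixed `random_state`. This is a sketch on toy placeholder data; the trade-off is that readers always see the same handful of answers.

```python
import pandas as pd

# Toy write-ins with gaps, standing in for df.how_run_other
write_ins = pd.Series(["answer A", None, "answer B", "answer C", None, "answer D", "answer E"])

# random_state pins the draw, so rerunning the cell shows the same responses
sample_a = write_ins.dropna().sample(3, random_state=0)
sample_b = write_ins.dropna().sample(3, random_state=0)
print(sample_a.to_frame("Random sample of 3 write-in responses").reset_index(drop=True))
```

Note that `sample(n)` raises a `ValueError` if fewer than `n` non-null responses exist, since it samples without replacement by default.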

### How many people typically see and/or interact with the results of your work in Jupyter Notebook? 

*Consider people who view your notebooks on nbviewer, colleagues who rerun your notebooks, developers who star your notebook repos on GitHub, audiences who see your notebooks as slideshows, etc.*

Most of the respondents create notebooks for tens or hundreds of users. Few write for thousands of users or more.

In [None]:
counts = df.audience_size.value_counts()
counts = counts.reindex(utils.group_orders['audience_size'])
audiencesize_dataframe = pd.DataFrame(counts).reset_index()
audiencesize_dataframe

In [None]:
%%html
<template is="dom-bind">
    <urth-core-dataframe ref="audiencesize_dataframe" value="{{df}}" auto></urth-core-dataframe>
    <urth-viz-bar xlabel="Typical Notebook Audience" ylabel="Respondents" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>

## What the Respondents Said

Now we look at responses to questions about the notebook. All of the questions in this section have free-text responses. Revisit the blue **Key Terms** section at the top for help reading the plots.

### What needs in your workflow does Jupyter Notebook address?

Respondents frequently indicate that Jupyter Notebook addresses their need to analyze, explore, visualize, interact with, and document data quickly and easily.

In [None]:
%%html
<survey-explorer series='["workflow_needs_addressed_keywords", "workflow_needs_addressed_themes"]' 
    sample-source='["workflow_needs_addressed_1", "workflow_needs_addressed_2", "workflow_needs_addressed_3"]'/>

### What aspects of Jupyter Notebook make it pleasant to use in your workflow?

Ease of use, the tight interactive workflow, and the ability to combine inline markup with rich media top the list of pleasant aspects noted by respondents.

In [None]:
%%html
<survey-explorer series='["pleasant_aspects_keywords", "pleasant_aspects_themes"]' 
    sample-source='["pleasant_aspects_1", "pleasant_aspects_2", "pleasant_aspects_3"]'/>

### What, if anything, hinders you from making Jupyter Notebook an even more regular part of your workflow?

Respondents cite the need for better development tools as the main blocker to more regular use of the notebook. More robust text editing, version control, cleaner integration with languages and tools, and better documentation top the list.

In [None]:
%%html
<survey-explorer series='["hinderances_keywords", "hinderances_themes"]' sample-source='["hinderances"]'/>

### What needs in your workflow does Jupyter Notebook not address?

As expected, nearly the same themes and keywords that top the hindrances list also top the list of workflow needs that the Jupyter Notebook does not address today.

In [None]:
%%html
<survey-explorer series='["workflow_needs_not_addressed_keywords", "workflow_needs_not_addressed_themes"]' 
    sample-source='["workflow_needs_not_addressed_1", "workflow_needs_not_addressed_2", "workflow_needs_not_addressed_3"]'/>

### What aspects of Jupyter Notebook make it difficult to use in your workflow?

Respondents most often report difficulty with code editing, installing the application, working in the browser, version controlling notebooks, and working without the support of a full IDE.

In [None]:
%%html
<survey-explorer series='["difficult_aspects_keywords", "difficult_aspects_themes"]' 
    sample-source='["difficult_aspects_1", "difficult_aspects_2", "difficult_aspects_3"]'/>

### What tools and applications, if any, would you like to see more tightly integrated with Jupyter Notebook?

Git is by far the most frequently requested integration. Specific tools like Vim, d3, matplotlib, R, and Spark, as well as general concepts like interactivity, debugging, better editors, and version control, are also mentioned frequently.

In [None]:
%%html
<survey-explorer series='["integrations_keywords", "integrations_themes"]' 
    sample-source='["integrations_1", "integrations_2", "integrations_3"]'/>

### What new features or changes would you like to see in Jupyter Notebook?

*Please list anything that comes to mind that helps you in your workflow, big or small.*

Version control with Git also tops the list of requested features and changes. Adding a workspace/IDE to the user experience, supporting custom styles (both visual and functional, such as keymaps), embedding ancillary tools (e.g., a file browser sidebar), and behaving more like major text editors (e.g., Vim, Atom, Sublime) are also common requests.

In [None]:
%%html
<survey-explorer series='["features_changes_keywords", "features_changes_themes"]' 
    sample-source='["features_changes_1", "features_changes_2", "features_changes_3"]'/>

### Thinking back to when you first started using Jupyter Notebook, what enhancements would have made your initial experience better?

Respondents most frequently name installation improvements, better documentation, a better keyboard map, and tutorials as enhancements that would have improved their first experience.

In [None]:
%%html
<survey-explorer series='["first_experience_enhancements_keywords", "first_experience_enhancements_themes"]' 
    sample-source='["first_experience_enhancements_1", "first_experience_enhancements_2", "first_experience_enhancements_3"]'/>

### Select all the words that best describe Jupyter Notebook.

Respondents select positive words to describe Jupyter Notebook far more often than negative words. The convenience of the notebook stands out.

In [None]:
df['keywords_csv'] = df.keywords.str.replace(';',',')

In [None]:
%%html
<survey-explorer series='["keywords_csv"]' sample-source='["keywords", "keywords_other"]' labels='["Fixed keywords"]'/>
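
The `keywords` column stores each respondent's multi-select answer as a `;`-joined string; the replace above suggests the survey-explorer widget expects comma-separated values instead (an inference, not documented here). To count selections in plain pandas, split and explode the lists. A minimal sketch with toy answers:

```python
import pandas as pd

# Toy multi-select answers stored as ';'-joined strings (the format the notebook
# assumes for df.keywords); the words are illustrative
keywords = pd.Series(["Convenient;Powerful", None, "Convenient;Fast", "Convenient"])

# The notebook's transform: same selections, comma-separated
keywords_csv = keywords.str.replace(";", ",", regex=False)

# Count each selected word once per respondent who picked it
per_word = keywords.str.split(";").explode().dropna().value_counts()
print(per_word)
```

Because respondents may pick several words, the per-word totals can sum to more than the number of respondents.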

*Powerful* tops the list of write-in responses. The long tail varies quite a bit, with both positive and negative sentiments expressed.

In [None]:
df.keywords_other.str.lower().value_counts().head(10).to_frame('Respondents')
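
Lowercasing merges write-ins that differ only in case, but not ones that differ in surrounding whitespace. We have not checked whether the raw write-ins contain such padded variants; if they do, a sketch like the following (on toy data) would fold them together before counting.

```python
import pandas as pd

# Toy write-ins that differ only in case and stray whitespace
write_ins = pd.Series(["Powerful", "powerful ", " POWERFUL", "Fun", None])

# Lowercasing alone leaves the padded variants as separate entries
lower_only = write_ins.str.lower().value_counts()

# Stripping whitespace too folds all three into one "powerful" entry
normalized = write_ins.str.strip().str.lower().value_counts()
print(normalized.head(10).to_frame("Respondents"))
```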

## Conclusion and Next Steps

From this survey we learned that experienced users like the notebook concept for its simplicity in quickly exploring, visualizing, and describing data, but they need more than notebooks in their analytics workflow, and they suggest all users need more help getting started. Going forward, we recommend that the Jupyter community:

* Continually ensure that the Jupyter Notebook retains what users value today
* Work solutions to the most common hindrances and feature requests into the Jupyter roadmap
* Run more specific user experience studies / surveys in the future about major changes and additions 
    * For example, if a first pass of git integration lands on jupyter/notebook master, stand up a Binder with that feature enabled, and invite users to try it before landing it in a stable Jupyter release.