# Jupyter Notebook 2015 UX Survey - Summary

## Key Findings

The following are the key takeaways from our analysis of the survey response data. The remainder of this notebook gives evidence for these points.

**Experienced Users**

People who use Jupyter frequently (daily/weekly), who have been using it for more than a year, and who have been in their job role for 2+ years are most heavily represented in the survey data. The majority of respondents self-identify as data scientists, students, scientists, and researchers.

**Like the Notebook Concept**

Unsurprisingly, respondents state that they like the ability to quickly analyze, visualize, explore, and describe data in rich notebook documents that they can share. As such, the interactive computing paradigm should remain central to the user experience and continue to improve over time.

**Need More than Notebooks**

Respondents find that the Jupyter Notebook lacks features they require to complete their data analytics workflow. To address these gaps, the respondents request a wide variety of features and integrations. The most common requests are the following:

* Version control (via git in particular)
* Robust text and code editing (like in Emacs, Vim, Sublime, PyCharm)
* Advanced code development tools (debugging, profiling, variable watching, code modularization)
* Simpler export and deployment options (one-click transformations to slides, scripts, reports) 

The project roadmap should incorporate these suggestions to ensure that Jupyter continues to address the needs of end users.

**Need More Help**

Respondents state that installation of Jupyter Notebook should be easier to perform and understand, both for single users and groups of users. Respondents commonly cite documentation, tutorials, and help as ways to improve the situation.

## About the Survey

In late 2015, members of the Jupyter Community conducted a 16-question survey about the Jupyter Notebook user experience. The survey ran on [SurveyGizmo](https://www.surveygizmo.com/) from December 21, 2015 until January 15, 2016. Posts on the [Project Jupyter Google Group](https://groups.google.com/forum/#!topic/jupyter/XCzJ02Rzj0Y), on the [Jupyter blog](http://blog.jupyter.org/2015/12/22/jupyter-notebook-user-experience-survey/), and from the [@ProjectJupyter Twitter account](https://twitter.com/ProjectJupyter/status/684096608166776832) were used to share the survey link and solicit responses. At the conclusion of the survey, the raw response data were [posted to GitHub](https://github.com/jupyter/design/tree/master/surveys/2015-notebook-ux) along with a description of the data format.

## About our Analysis

Starting with the raw response data, we ([@parente]() and [@jtyberg](), members of the IBM Emerging Technology Group):

1. Toyed with ways to [extract the salient points from the free text responses](1_ux_survey_review.ipynb)
2. Made the data [more pandas-friendly](2_clean_survey.ipynb)
3. Tried a few different approaches to [identifying keyword patterns and aggregating them into themes](3a_hinderance_themes.ipynb)
4. Repeated our keyword pattern search and more subjective annotator theme grouping approach across all remaining free-text questions (e.g., [integration](3b_integration_themes.ipynb), [needs addressed](3c_needs_addressed_themes.ipynb), [needs not addressed](3d_needs_not_addressed_themes.ipynb), etc.)
5. Summarized our findings in this notebook.
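
As a rough illustration of the keyword-pattern step in item 4, the sketch below tags a free-text response with keywords using regular expressions and then rolls those keywords up into themes. The patterns, the theme names, and the `tag_response` helper are hypothetical stand-ins for illustration only. The ones actually used are in the linked theme notebooks.

```python
import re

# Hypothetical keyword patterns (assumed for this sketch, not the real ones).
KEYWORD_PATTERNS = {
    'git': re.compile(r'\bgit(hub)?\b', re.IGNORECASE),
    'version control': re.compile(r'\bversion(ing| control)\b', re.IGNORECASE),
    'debugging': re.compile(r'\bdebug(ger|ging)?\b', re.IGNORECASE),
}

# Hypothetical mapping from each keyword to a broader theme.
KEYWORD_THEMES = {
    'git': 'version control',
    'version control': 'version control',
    'debugging': 'development tools',
}

def tag_response(text):
    """Return (keywords, themes) found in one free-text response."""
    keywords = sorted(k for k, pat in KEYWORD_PATTERNS.items() if pat.search(text))
    themes = sorted({KEYWORD_THEMES[k] for k in keywords})
    return keywords, themes

keywords, themes = tag_response('Better Git integration and a real debugger')
```

Keeping the keyword step separate from the theme step means the more subjective theme grouping can be revised without rerunning the pattern search.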

TODO: more about #4

## About this Notebook

In this notebook, we provide evidence for the claims made in the Key Findings section at the top of this notebook. We include written observations, plots, and interactive widgets for exploring the results.

You may:

1. View this notebook as a static webpage on [NBViewer]() or [GitHub](). (No interactivity, one plot per section).
2. Interact with the widgets in this notebook in a dashboard or notebook view in your own [Binder]() instance.
3. Clone the [jupyter/design](https://github.com/jupyter/design) repository from GitHub and put the notebooks / data on a Jupyter server of your choosing.

<div class="alert alert-info"><b>Keep in mind</b> that respondents did not necessarily answer every survey question. We considered the available responses to a question, regardless of whether the user answered other questions or not. As a result, you will see the total responses vary across the plots below, especially if you are interacting with the plots as widgets and choosing to group responses.</div>
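
To make the note above concrete, here is a minimal sketch of why totals differ from plot to plot. It uses a toy frame, not the survey data. The column names mirror two of the shortened column names used later in this notebook, while the answer values are made up.

```python
import numpy as np
import pandas as pd

# Toy responses: NaN marks a question the respondent skipped.
toy = pd.DataFrame({
    'how_often': ['Daily', 'Weekly', np.nan, 'Daily'],
    'how_long_used': ['More than a year', np.nan, np.nan, 'Less than a year'],
})

# count() ignores NaN, so each question gets its own response total.
responses_per_question = toy.count()
```

Here `how_often` has three responses and `how_long_used` has two, even though the frame holds four respondents.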

### Notebook Setup

If you want to run this notebook yourself in your own environment, ensure you have all of the following:

```
numpy==1.10.4
matplotlib==1.5.1
pandas==0.17.0
seaborn==0.7.0
ipywidgets==4.1.1
jupyter_cms==0.4.0
```

In [None]:
import warnings
warnings.simplefilter('ignore')

In [None]:
%matplotlib inline

In [None]:
import os
import glob
import pandas as pd
import seaborn as sns

In [None]:
%load_ext jupyter_cms

In [None]:
utils = load_notebook('./5_utils.ipynb')

In [None]:
sns.set_palette('deep', n_colors=16)
sns.set(rc={"figure.figsize": (14,6)})

Read the responses with shortened column names.

In [None]:
df = pd.read_csv('survey_short_columns.csv', index_col=0)

Merge in the themes and keywords for the free-text questions.

In [None]:
files = glob.glob('./*_themes.csv')
files.append('./roles.csv')
for fn in files:
    theme_df = pd.read_csv(fn, sep=';', index_col=0)
    df = df.merge(theme_df, left_index=True, right_index=True, how='left')
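
The left merge on the index keeps every respondent, even those with no themed free-text answer. A small sketch with toy data shows the effect. The values are invented, and `hinderances_themes` is used only because that column name appears later in this notebook.

```python
import pandas as pd

# Three toy respondents, indexed by respondent id.
responses = pd.DataFrame({'how_often': ['Daily', 'Weekly', 'Monthly']},
                         index=[1, 2, 3])

# Only respondents 1 and 3 gave a free-text answer that was themed.
themes = pd.DataFrame({'hinderances_themes': ['version control', 'editing']},
                      index=[1, 3])

# how='left' keeps respondent 2 and fills the missing theme with NaN.
merged = responses.merge(themes, left_index=True, right_index=True, how='left')
```

With an inner merge, respondent 2 would be dropped, which would shrink the totals for the fixed-response questions as well.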

## About the Survey Respondents

We start with information about the respondents themselves. Most of the questions in this section have a fixed set of possible responses. Only one allows for free text responses.

#### Q1: How often do you use Jupyter Notebook?

The majority of respondents use the notebook on a daily or weekly basis.

In [None]:
how_often_counts = df.how_often.value_counts().reindex(utils.group_orders['how_often'])
ax = sns.barplot(how_often_counts.index, how_often_counts.values)
ax.set_ylabel('Respondents')
_ = ax.set_xlabel('Notebook Usage')

#### Q3: Roughly how long have you been using Jupyter Notebook?

The majority of respondents have been using Jupyter for longer than a year.

In [None]:
how_long_counts = df.how_long_used.value_counts().reindex(utils.group_orders['how_long_used'])
ax = sns.barplot(how_long_counts.index, how_long_counts.values)
ax.set_ylabel('Respondents')
_ = ax.set_xlabel('Notebook Experience')

The distribution of frequency of use is skewed toward daily/weekly in both the more and less experienced user groups.

In [None]:
long_x_often = pd.crosstab(df.how_long_used, df.how_often)
long_x_often = long_x_often.reindex(index=utils.group_orders['how_long_used'], columns=utils.group_orders['how_often'])
long_x_often.columns.name = 'Notebook Usage'
ax = long_x_often.plot(kind='bar')
ax.set_ylabel('Respondents')
ax.set_xlabel('Notebook Experience')
_ = ax.set_xticklabels(ax.get_xmajorticklabels(), rotation=25, ha='right')
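
For readers unfamiliar with the cross-tabulation used here, the following sketch reproduces the idea on toy data. The category labels are invented for illustration and are not the survey's actual answer options.

```python
import pandas as pd

toy = pd.DataFrame({
    'how_long_used': ['<1 year', '1+ years', '1+ years', '<1 year', '1+ years'],
    'how_often': ['Daily', 'Daily', 'Weekly', 'Weekly', 'Daily'],
})

# Count respondents for every (experience, usage) combination.
table = pd.crosstab(toy.how_long_used, toy.how_often)

# crosstab sorts labels alphabetically; reindex restores a meaningful order.
table = table.reindex(index=['<1 year', '1+ years'], columns=['Daily', 'Weekly'])
```

Each row of `table` then becomes one cluster of bars when plotted with `kind='bar'`.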

#### Q13: What is your primary role when using Jupyter Notebook (e.g., student, astrophysicist, financial modeler, business manager, etc.)?

Most respondents name a single job role (e.g., data scientist). Some state more than one (e.g., student, practicing statistician). Data scientist, student, scientist, researcher, and developer are the most common.

In [None]:
utils.explore(df, [df.roles_roles, df.roles_keywords], utils.default_labels[::-1])

#### Q14: How many years have you been in this role?

Most respondents have 2+ years of experience in their job role.

In [None]:
counts = df.years_in_role.value_counts()
counts = counts.reindex(utils.group_orders['years_in_role'])
ax = sns.barplot(counts.index, counts.values)
ax.set_ylabel('Respondents')
_ = ax.set_xlabel('Years in Job Role')

#### Q5: How do you run the Jupyter Notebook?

The vast majority of respondents run Jupyter as a standalone application.

In [None]:
counts = df.how_run.value_counts()
counts = counts.reindex(utils.group_orders['how_run'])
ax = sns.barplot(counts.index, counts.values)
ax.set_ylabel('Respondents')
_ = ax.set_xlabel('Notebook Environment')

Write-in responses vary widely. Many are more specific versions of "as a standalone application."

In [None]:
df.how_run_other.dropna().sample(20)

#### Q16: How many people typically see and/or interact with the results of your work in Jupyter Notebook? (Consider people who view your notebooks on nbviewer, colleagues who rerun your notebooks, developers who star your notebook repos on GitHub, audiences who see your notebooks as slideshows, etc.)

Most of the respondents create notebooks for tens or hundreds of users. Few write for thousands of users or more.

In [None]:
counts = df.audience_size.value_counts()
counts = counts.reindex(utils.group_orders['audience_size'])
ax = sns.barplot(counts.index, counts.values)
ax.set_ylabel('Respondents')
_ = ax.set_xlabel('Typical Notebook Audience')

## What the Respondents Said

Now we look at responses to questions about the notebook. All of the questions in this section have free-text responses.

TODO: reminder about themes, keywords by linking to methodology

#### Q6: What needs in your workflow does Jupyter Notebook address?

Respondents frequently indicate that Jupyter Notebook addresses their need to analyze, explore, visualize, interact with, and document data quickly and easily.

In [None]:
utils.explore(df, [df.workflow_needs_addressed_keywords], 
              labels=['Keyword patterns'])

#### Q8: What aspects of Jupyter Notebook make it pleasant to use in your workflow?

Ease of use, the tight interactive workflow, and the ability to combine inline markup with rich media top the list of pleasant aspects noted by respondents.

In [None]:
utils.explore(df, [df.pleasant_aspects_keywords, df.pleasant_aspects_themes])

#### Q2: What, if anything, hinders you from making Jupyter Notebook an even more regular part of your workflow?

Respondents cite the need for better development tools as the main blocker to more regular use of the notebook. More robust text editing capability, version control, cleaner integration with languages and tools, and better documentation all top the list.

In [None]:
utils.explore(df, [df.hinderances_keywords, df.hinderances_themes])

#### Q7: What needs in your workflow does Jupyter Notebook not address?

As expected, nearly the same themes and keywords that top the hindrances list also top the list of workflow needs not addressed by the Jupyter Notebook today.

In [None]:
utils.explore(df, [df.workflow_needs_not_addressed_keywords, df.workflow_needs_not_addressed_themes])

#### Q9: What aspects of Jupyter Notebook make it difficult to use in your workflow?

Respondents most often report difficulty with code editing, installing the application, working in the browser, version controlling notebooks, and working without the support of a full IDE.

In [None]:
utils.explore(df, [df.difficult_aspects_keywords, df.difficult_aspects_themes])

#### Q4: What tools and applications, if any, would you like to see more tightly integrated with Jupyter Notebook?

Git is far and away the most often requested integration. Specific tools like vim, d3, matplotlib, R, and Spark, as well as general concepts like interactivity, debugging, better editors, and version control, are also mentioned frequently.

In [None]:
utils.explore(df, [df.integrations_keywords, df.integrations_themes])

#### Q10: What new features or changes would you like to see in Jupyter Notebook? (Please list anything that comes to mind that helps you in your workflow, big or small.)

In [None]:
tmp = df.dropna(subset=['features_changes_keywords'])

In [None]:
tmp[tmp.features_changes_keywords.str.contains('sidebar')][['features_changes_1', 'features_changes_2', 'features_changes_3']]

Version control with Git also tops the list of requested features and changes. Adding a workspace/IDE to the user experience, supporting custom styles (both visual and functional, à la keymaps), embedding ancillary tools (e.g., a file browser sidebar), and behaving more like major text editors (e.g., Vim, Atom, Sublime) are also common requests.

In [None]:
utils.explore(df, [df.features_changes_keywords, df.features_changes_themes])

#### Q11: Thinking back to when you first started using Jupyter Notebook, what enhancements would have made your initial experience better?

Respondents most frequently name installation improvements, better documentation, a better keyboard map, and tutorials as enhancements that would have improved their first experience.

In [None]:
utils.explore(df, [df.first_experience_enhancements_keywords, df.first_experience_enhancements_themes])

#### Q12: Select all the words that best describe Jupyter Notebook.

Respondents select positive words to describe Jupyter Notebook far more often than negative words. The convenience of the notebook stands out.

In [None]:
utils.explore(df, [df.keywords.str.replace(';',',')], ['Fixed keywords'])
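
Because each respondent can select several words, a tally has to split every answer before counting. The sketch below shows one way to do that with the standard library on made-up answers. We have not confirmed that `utils.explore` counts this way internally, so treat it as an illustration only.

```python
from collections import Counter

# Toy multi-select answers, one ';'-delimited string per respondent.
answers = ['Convenient;Powerful', 'Convenient', 'Powerful;Flexible;Convenient']

# Split each answer into words and count every word once per respondent.
counts = Counter(
    word.strip()
    for answer in answers
    for word in answer.split(';')
    if word.strip()
)

# The single most frequently selected word and its count.
top = counts.most_common(1)
```

Note that the counts can sum to more than the number of respondents, since one respondent contributes to several words.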

*Powerful* tops the list of write-in responses. The long tail varies quite a bit, with both positive and negative sentiments expressed.

In [None]:
df.keywords_other.str.lower().value_counts().head(15)

## Next Steps

TODO: more specific questionnaires about workflow and features

## Footnotes

1. pandas 0.17.1 has a bug that causes it to ignore styles when making bar charts. I installed the latest pandas master branch to get around using eye-bleed-blue everywhere.

    ```
    conda install -y cython
    pip install git+https://github.com/pydata/pandas.git
    ```