**NOTE** This notebook is a work in progress.

# Interactive exploration of current errors in pandas docstrings

*DISCLAIMER: This notebook is based on the one uploaded by @dujm [here](https://github.com/python-sprints/pandas-mentoring/blob/master/notebooks/docstring_error_interactive.ipynb)*


This notebook helps you find errors that are still present in pandas docstrings, so that you can pick one, fix it, and submit a PR to the [pandas repository](https://github.com/pandas-dev/pandas).

**IMPORTANT!** Before you start fixing an error, check that nobody is already working on it by searching the issues and PRs in the pandas repository. If nobody is, open an issue to let others know you will be fixing that docstring.

This script currently supports pandas version >= 0.25.0

Let's start by importing the necessary packages:

In [2]:
import os

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import ipywidgets as widgets
import qgrid

ModuleNotFoundError: No module named 'qgrid'

## *Static exploration*

## 1. Generate a .json containing all current errors

This step is done automatically if you are running this notebook from Binder. Keep in mind that the .json file is only regenerated every 15 minutes, so it might be outdated. When you select an error to work on, double-check that nobody has already opened an issue to work on it.

If you want to generate the .json file locally, simply run the following command from your pandas clone:

`./scripts/validate_docstrings.py --format=json > /path/to/json/pandas_docstring_errors.json`

## 2. Plot a table describing the errors

We will display a table listing the pandas functions that still have errors in their docstrings, together with the code and description of each error.

In [2]:
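file = 'pandas_docstring_errors.json'
df = (
    pd.read_json(file)
    .transpose()                                    # one row per function
    .filter(items=['errors', 'file', 'file_line'])
    .explode('errors')                              # one row per (function, error) pair
    .dropna()                                       # drop rows with missing values, e.g. no errors
    .reset_index()
    .rename(columns={'index': 'function'})
)
# Each entry in `errors` is a [code, description] pair; split it into two columns.
df[['error_code', 'error_description']] = pd.DataFrame(df['errors'].tolist())
df = df.drop(columns='errors')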

df

| | function | file | file_line | error_code | error_description |
|---|---|---|---|---|---|
| 0 | pandas.tseries.offsets.DateOffset.apply | /Users/galih.sahid/Documents/personal/pandas/p... | 270 | GL08 | The object does not have a docstring |
| 1 | pandas.tseries.offsets.DateOffset.isAnchored | /Users/galih.sahid/Documents/personal/pandas/p... | 381 | GL08 | The object does not have a docstring |
| 2 | pandas.tseries.offsets.DateOffset.onOffset | /Users/galih.sahid/Documents/personal/pandas/p... | 373 | GL08 | The object does not have a docstring |
| 3 | pandas.tseries.offsets.DateOffset.is_anchored | /Users/galih.sahid/Documents/personal/pandas/p... | 368 | GL08 | The object does not have a docstring |
| 4 | pandas.tseries.offsets.DateOffset.is_on_offset | /Users/galih.sahid/Documents/personal/pandas/p... | 439 | GL08 | The object does not have a docstring |
| ... | ... | ... | ... | ... | ... |
| 3076 | pandas.DataFrame.to_gbq | /Users/galih.sahid/Documents/personal/pandas/p... | 1446 | EX01 | No examples section found |
| 3077 | pandas.DataFrame.to_records | /Users/galih.sahid/Documents/personal/pandas/p... | 1684 | PR07 | Parameter "column_dtypes" has no description |
| 3078 | pandas.DataFrame.to_records | /Users/galih.sahid/Documents/personal/pandas/p... | 1684 | PR07 | Parameter "index_dtypes" has no description |
| 3079 | pandas.DataFrame.to_string | /Users/galih.sahid/Documents/personal/pandas/p... | 744 | ES01 | No extended summary found |
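
To make the reshaping in this step easier to follow, here is a minimal sketch on made-up data. It assumes the .json file maps each function name to a record with an `errors` list of `[code, description]` pairs plus `file` and `file_line`, which is what the cell above relies on. The function names and paths in `toy` are hypothetical.

```python
import pandas as pd

# Made-up records mimicking the assumed layout of pandas_docstring_errors.json:
# one entry per function, each with a list of [error_code, description] pairs.
toy = {
    "pandas.example.func_a": {
        "errors": [["ES01", "No extended summary found"],
                   ["EX01", "No examples section found"]],
        "file": "pandas/example.py",
        "file_line": 10,
    },
    "pandas.example.func_b": {
        "errors": [],  # no errors: this function should drop out
        "file": "pandas/example.py",
        "file_line": 42,
    },
}

toy_df = (
    pd.DataFrame.from_dict(toy, orient="index")
    .explode("errors")   # one row per (function, error) pair
    .dropna()            # an empty error list explodes to NaN and is removed
    .reset_index()
    .rename(columns={"index": "function"})
)
# Split each [code, description] pair into two columns.
toy_df[["error_code", "error_description"]] = pd.DataFrame(
    toy_df["errors"].tolist(), index=toy_df.index
)
toy_df = toy_df.drop(columns="errors")
print(toy_df)
```

A function with two errors ends up on two rows, while a function with an empty error list disappears from the table.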


## 3. Count number of errors per error type

In [3]:
df_code = df['error_code'].value_counts().reset_index()
df_code.columns = ['error_code','counts']

df_code

| | error_code | counts |
|---|---|---|
| 0 | SA01 | 380 |
| 1 | EX01 | 379 |
| 2 | ES01 | 362 |
| 3 | RT03 | 306 |
| 4 | SA04 | 280 |
| 5 | PR07 | 242 |
| 6 | GL08 | 242 |
| 7 | PR01 | 231 |
| 8 | EX02 | 167 |
| 9 | EX03 | 144 |
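
Because the table has one row per (function, error) pair, `value_counts()` tallies rows. A function can carry the same code more than once (for example, `pandas.DataFrame.to_records` has two PR07 entries), so the number of errors and the number of affected functions can differ. The sketch below shows one possible way to get both figures, using a made-up `toy_df` in place of the real table.

```python
import pandas as pd

# Toy stand-in for the exploded error table (values are made up).
toy_df = pd.DataFrame({
    "function": ["f.a", "f.a", "f.b", "f.c"],
    "error_code": ["PR07", "PR07", "PR07", "EX01"],
})

summary = (
    toy_df.groupby("error_code")["function"]
    .agg(errors="count", functions="nunique")  # rows vs. distinct functions
    .sort_values("errors", ascending=False)
    .reset_index()
)
print(summary)
```

Here PR07 has three errors spread over two distinct functions.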


## 4. Count number of errors per function

In [4]:
df_function = df['function'].value_counts().reset_index()
df_function.columns = ['function','counts']

df_function

| | function | counts |
|---|---|---|
| 0 | pandas.PeriodIndex | 13 |
| 1 | pandas.core.groupby.DataFrameGroupBy.boxplot | 13 |
| 2 | pandas.HDFStore.append | 13 |
| 3 | pandas.CategoricalIndex.remove_unused_categories | 11 |
| 4 | pandas.Series.cat.remove_unused_categories | 11 |
| ... | ... | ... |
| 1068 | pandas.DataFrame.query | 1 |
| 1069 | pandas.tseries.offsets.FY5253Quarter.is_on_offset | 1 |
| 1070 | pandas.tseries.offsets.Second.is_anchored | 1 |
| 1071 | pandas.api.extensions.ExtensionArray.searchsorted | 1 |


## *Interactive exploration* 

Select an error from the following dropdown menu. The goal is to show a complete description and an example of it, along with the number of errors of that type in pandas (printing the description is still a TODO in the cell below):

In [5]:
# Create dropdown widget
def unique_sorted_values(array):
    unique = array.unique().tolist()
    unique.sort()
    return unique

dropdown_widget = widgets.Dropdown(options=unique_sorted_values(df_code.error_code),
                                   description='Error:')
dropdown_widget

## TODO: Use observe to print error description

Dropdown(description='Error:', options=('ES01', 'EX01', 'EX02', 'EX03', 'GL01', 'GL08', 'PR01', 'PR02', 'PR06'…
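
One way the TODO above might be approached is sketched below. `describe_error` and `on_change` are hypothetical helpers, not part of the notebook, and `toy_df` is made-up data standing in for `df`. The sketch relies on the ipywidgets convention that a handler registered with `observe(..., names="value")` receives a dict-like `change` whose `"new"` entry holds the selected value; the registration itself is left as a comment so the block runs without a live widget.

```python
import pandas as pd

# Toy stand-in for the exploded error table (values are made up).
toy_df = pd.DataFrame({
    "function": ["f.a", "f.b", "f.c"],
    "error_code": ["EX01", "EX01", "ES01"],
    "error_description": ["No examples section found",
                          "No examples section found",
                          "No extended summary found"],
})

def describe_error(table, code):
    """Return a one-line summary for `code` (hypothetical helper)."""
    rows = table[table["error_code"] == code]
    if rows.empty:
        return f"{code}: no occurrences"
    # Use the first description found; some codes vary per row.
    description = rows["error_description"].iloc[0]
    return f"{code}: {description} ({len(rows)} occurrences)"

def on_change(change):
    # `change["new"]` holds the newly selected dropdown value.
    print(describe_error(toy_df, change["new"]))

# In the notebook, the handler would be registered with:
# dropdown_widget.observe(on_change, names="value")
on_change({"new": "EX01"})
```

Depending on the front end, output printed from a callback may not appear under the cell, so wrapping the `print` in an `ipywidgets.Output` widget is likely needed in practice. Descriptions for codes such as PR07 name a specific parameter, so the first description is only a representative example.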

You can filter the following table by the error code you want to work on, or by function name.

In [6]:
# Create qgrid widget
qgrid_widget = qgrid.show_grid(df, grid_options={'forceFitColumns': True})
qgrid_widget

AttributeError: module 'pandas.core' has no attribute 'index'
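
If qgrid cannot be imported or fails as above, the same filtering can be approximated with plain pandas. The sketch below is a fallback under that assumption, not a replacement for the interactive grid; `filter_errors` is a hypothetical helper and `toy_df` is made-up data standing in for `df`.

```python
import pandas as pd

def filter_errors(table, error_code=None, function_contains=None):
    """Filter the error table by exact code and/or function-name substring."""
    mask = pd.Series(True, index=table.index)
    if error_code is not None:
        mask &= table["error_code"] == error_code
    if function_contains is not None:
        # regex=False treats the text literally, so dots match only dots.
        mask &= table["function"].str.contains(function_contains, regex=False)
    return table[mask]

# Toy stand-in for the exploded error table (values are made up).
toy_df = pd.DataFrame({
    "function": ["pandas.DataFrame.a", "pandas.Series.b", "pandas.DataFrame.c"],
    "error_code": ["EX01", "EX01", "ES01"],
})

print(filter_errors(toy_df, error_code="EX01", function_contains="DataFrame"))
```

In the notebook, the call would take `df` instead of `toy_df`, for example `filter_errors(df, error_code="GL08")`.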