docs(eda): enrich parameters in report
jinglinpeng committed Nov 25, 2021
1 parent 9e6f8e8 commit 3d0a148
Showing 2 changed files with 25 additions and 7 deletions.
26 changes: 22 additions & 4 deletions docs/source/user_guide/eda/create_report.rst
@@ -46,11 +46,25 @@ Or we want to open the report in browser::

Or just save the report to a local file::

-    report.save(filename='report_01', to='~/Desktop')
+    report.save(filename='report.html')


You can see the full report :download:`here <../../_static/images/create_report/titanic_dp.html>`


Enable/Disable sections
=======================
Computing every section is time-consuming. You can enable or disable sections in either of the following two ways.

1. Show only a few sections by setting the `display` argument. For example, run the following code to show only the overview and variables sections::

    report = create_report(df, display=['Overview', 'Variables'])

2. Disable a few sections by setting their `enable` option to False. For example, run the following code to disable the interactions section::

    report = create_report(df, config={'interactions.enable': False})


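When several sections need to be switched off, the config dictionary from the second approach can be built programmatically. The following is a minimal sketch, not part of DataPrep: `section_config` is a hypothetical helper, and it assumes that other sections follow the same ``<section>.enable`` key pattern as ``interactions.enable`` shown above.

```python
from typing import Any, Dict, Iterable


def section_config(disabled: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical helper: build a config dict that disables the named sections.

    Assumes every section uses a '<section>.enable' key, as 'interactions.enable' does.
    """
    return {f"{name}.enable": False for name in disabled}


config = section_config(["interactions"])
print(config)  # {'interactions.enable': False}

# With DataPrep installed and a DataFrame `df` loaded, the dict is passed as usual:
# from dataprep.eda import create_report
# report = create_report(df, config=config)
```

Building the dictionary separately keeps the list of disabled sections in one place, which is convenient when the same settings are reused across several reports.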
`Overview` section
==================

@@ -82,19 +96,23 @@ For datetime variable, the report shows line chart
`Interactions` section
======================

-In this section, the report will show an interactive plot, user can use the dropdown menu above the plot to select which two variables user wants to compare.
+In this section, the report shows an interactive plot. Use the dropdown menu above the plot to select the two variables to compare.

-The plot has scatter plot and the regression line regarding to the two variabes.
+By default, it shows scatter plots for all numerical columns.

.. raw:: html

    <iframe src="../../_static/images/create_report/interactions.html" height="625" width="70%" style="border: 0"></iframe>

You can also enable categorical variables by setting `interactions.cat_enable` to True. It will add categorical-categorical and categorical-numerical interactions::

    report = create_report(df, config={'interactions.cat_enable': True})


`Correlations` section
======================

-In this section, we can see the correlations bewteen variables in Spearman, Pearson and Kendall matrices.
+In this section, we can see the correlations between variables in Spearman, Pearson and Kendall matrices.

.. raw:: html

6 changes: 3 additions & 3 deletions docs/source/user_guide/eda/parameter_configurations.ipynb
@@ -107,7 +107,7 @@
"| `line.unit`| str | \"auto\" | Defines the time unit to group values over for a datetime column. It can be \"year\", \"quarter\", \"month\", \"week\", \"day\", \"hour\", \"minute\", \"second\". With default value \"auto\", it will use the time unit such that the resulting number of groups is closest to 15 in the `Line Chart` |\n",
"| `line.agg`| str | \"mean\" | Specify the aggregate to use when aggregating over a numeric column in the `Line Chart` |\n",
"| `scatter.sample_size`| int | 1000 | Number of points to randomly sample per partition in the `Scatter Plot` |\n",
-"| `scatter.sample_rate`| float | \"None\" | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with sample_size. Set it to 1.0 for no sampling |\n",
+"| `scatter.sample_rate`| float | None | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with `scatter.sample_size`. Set it to 1.0 for no sampling |\n",
"| `hexbin.tile_size` | float | \"auto\" | The size of the tile in the hexbin plot. Measured from the middle of a hexagon to its left or right corner in the `Hexbin Plot`.|\n",
"| `nested.ngroups`| int | 10 | Maximum number of most frequent values from the first column to display in the `Nested Bar Chart` |\n",
"| `nested.nsubgroups`| int | 5 | Maximum number of most frequent values from the second column to display (computed on the filtered data consisting of the most frequent values from the first column) in the `Nested Bar Chart` |\n",
@@ -153,8 +153,8 @@
"\n",
"| Local Parameter | Type |Default | Description |\n",
"| --- | --- | --- | --- |\n",
-"| `scatter.sample_size`| int | 1000 | Number of points to randomly sample per partition in the `Scatter Plot` in `plot_correlation(df, x, y)`|",
-"| `scatter.sample_rate`| float | \"None\" | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with sample_size. Set it to 1.0 for no sampling |\n",
+"| `scatter.sample_size`| int | 1000 | Number of points to randomly sample per partition in the `Scatter Plot` in `plot_correlation(df, x, y)`|\n",
+"| `scatter.sample_rate`| float | None | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with `scatter.sample_size`. Set it to 1.0 for no sampling |"
]
},
{
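
The table rows changed above state that `scatter.sample_rate` cannot be used together with `scatter.sample_size`. A minimal sketch of how a caller might enforce that before building a config follows. `scatter_sampling` is a hypothetical helper, not a DataPrep function, and the `0 < rate <= 1` range check is an assumption based on the note that 1.0 means no sampling.

```python
from typing import Any, Dict, Optional


def scatter_sampling(sample_size: Optional[int] = None,
                     sample_rate: Optional[float] = None) -> Dict[str, Any]:
    """Hypothetical helper: return a config dict with at most one scatter sampling option."""
    if sample_size is not None and sample_rate is not None:
        raise ValueError("set either scatter.sample_size or scatter.sample_rate, not both")
    if sample_rate is not None:
        # Assumed valid range: a fraction of the rows, where 1.0 means no sampling.
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("scatter.sample_rate must be in (0, 1]")
        return {"scatter.sample_rate": sample_rate}
    if sample_size is not None:
        return {"scatter.sample_size": sample_size}
    return {}  # fall back to the library defaults


print(scatter_sampling(sample_rate=1.0))  # {'scatter.sample_rate': 1.0}

# Assuming plot_correlation accepts a `config` dict like create_report does:
# plot_correlation(df, x, y, config=scatter_sampling(sample_rate=1.0))
```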