docs(eda): enrich parameters in report

sfu-db · Nov 25, 2021 · commit 3d0a148
1 parent 9e6f8e8

Showing 2 changed files with 25 additions and 7 deletions.
diff --git a/docs/source/user_guide/eda/create_report.rst b/docs/source/user_guide/eda/create_report.rst
@@ -46,11 +46,25 @@ Or we want to open the report in browser::
 
 Or just save the report to local::
 
-    report.save(filename='report_01', to='~/Desktop')
+    report.save(filename='report.html')
 
 
 You can see the full report :download:`here <../../_static/images/create_report/titanic_dp.html>`
 
+
+Enable/Disable sections
+=======================
+Computing all the sections is time-consuming. You can enable or disable sections using either of the following two approaches.
+
+1. Show only a few sections by setting the `display` argument. E.g., run the following code to show only the overview section and the variables section::
+
+    report = create_report(df, display=['Overview', 'Variables'])
+
+2. Disable a few sections by setting `enable` to False. E.g., run the following code to disable the interactions section::
+
+    report = create_report(df, config={'interactions.enable': False})
+
+
 `Overview` section
 ==================
 
@@ -82,19 +96,23 @@ For datetime variable, the report shows line chart
 `Interactions` section
 ======================
 
-In this section, the report will show an interactive plot, user can use the dropdown menu above the plot to select which two variables user wants to compare.
+In this section, the report shows an interactive plot. Use the dropdown menu above the plot to select the two variables you want to compare.
 
-The plot has scatter plot and the regression line regarding to the two variabes.
+By default, it shows scatter plots for all numerical columns.
 
 .. raw:: html
 
     <iframe src="../../_static/images/create_report/interactions.html" height="625" width="70%" style="border: 0"></iframe>
 
+You can also enable categorical variables by setting `interactions.cat_enable` to True. This adds categorical-categorical and categorical-numerical interactions::
+
+    report = create_report(df, config={'interactions.cat_enable': True})
+
 
 `Correlations` section
 ======================
 
-In this section, we can see the correlations bewteen variables in Spearman, Pearson and Kendall matrices.
+In this section, we can see the correlations between variables in Spearman, Pearson and Kendall matrices.
 
 .. raw:: html
 

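Reviewer note on create_report.rst: the options documented in the hunks above can be gathered in one place before calling `create_report`. The following is a minimal sketch and not part of the commit. `build_report_kwargs` is a hypothetical helper. Only the `display` argument and the `interactions.enable` and `interactions.cat_enable` config keys are taken from the diff. Passing `display` and `config` in the same call is an assumption.

```python
from typing import Any, Dict, Optional, Sequence


def build_report_kwargs(
    display: Optional[Sequence[str]] = None,
    interactions: bool = True,
    cat_interactions: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for create_report."""
    config: Dict[str, Any] = {}
    if not interactions:
        # Key taken from the diff: disables the Interactions section.
        config["interactions.enable"] = False
    if cat_interactions:
        # Key taken from the diff: adds categorical-categorical and
        # categorical-numerical interactions.
        config["interactions.cat_enable"] = True

    kwargs: Dict[str, Any] = {}
    if display is not None:
        # Section names as used in the diff, e.g. 'Overview', 'Variables'.
        kwargs["display"] = list(display)
    if config:
        kwargs["config"] = config
    return kwargs


# Approach 1 from the docs: show only two sections.
only_two = build_report_kwargs(display=["Overview", "Variables"])

# Approach 2 from the docs: keep all sections except Interactions.
no_interactions = build_report_kwargs(interactions=False)

print(only_two)
print(no_interactions)

# With dataprep installed and a DataFrame `df` loaded (not run here):
# from dataprep.eda import create_report
# report = create_report(df, **only_two)
# report.save(filename='report.html')
```

Building the arguments as a plain dict keeps the choice of sections separate from the expensive report computation, so the settings can be checked before any data is processed.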
diff --git a/docs/source/user_guide/eda/parameter_configurations.ipynb b/docs/source/user_guide/eda/parameter_configurations.ipynb
@@ -107,7 +107,7 @@
     "| `line.unit`| str | \"auto\" | Defines the time unit to group values over for a datetime column. It can be \"year\", \"quarter\", \"month\", \"week\", \"day\", \"hour\", \"minute\", \"second\". With default value \"auto\", it will use the time unit such that the resulting number of groups is closest to 15 in the `Line Chart` |\n",
     "| `line.agg`| str | \"mean\" | Specify the aggregate to use when aggregating over a numeric column in the `Line Chart` |\n",
     "| `scatter.sample_size`| int | 1000 | Number of points to randomly sample per partition in the `Scatter Plot` |\n",
-    "| `scatter.sample_rate`| float | \"None\" | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with sample_size. Set it to 1.0 for no sampling |\n",
+    "| `scatter.sample_rate`| float | None | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with `scatter.sample_size`. Set it to 1.0 for no sampling |\n",
     "| `hexbin.tile_size` | float | \"auto\" | The size of the tile in the hexbin plot. Measured from the middle of a hexagon to its left or right corner in the `Hexbin Plot`.|\n",
     "| `nested.ngroups`| int | 10 | Maximum number of most frequent values from the first column to display in the `Nested Bar Chart` |\n",
     "| `nested.nsubgroups`| int | 5 | Maximum number of most frequent values from the second column to display (computed on the filtered data consisting of the most frequent values from the first column) in the `Nested Bar Chart` |\n",
@@ -153,8 +153,8 @@
     "\n",
     "| Local Parameter | Type |Default | Description |\n",
     "| --- | --- | --- | --- |\n",
-    "| `scatter.sample_size`| int | 1000 | Number of points to randomly sample per partition in the `Scatter Plot` in `plot_correlation(df, x, y)`|",
-    "| `scatter.sample_rate`| float | \"None\" | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with sample_size. Set it to 1.0 for no sampling |\n",
+    "| `scatter.sample_size`| int | 1000 | Number of points to randomly sample per partition in the `Scatter Plot` in `plot_correlation(df, x, y)`|\n",
+    "| `scatter.sample_rate`| float | None | Defines the sample rate per partition in the `Scatter Plot`. Cannot be used with `scatter.sample_size`. Set it to 1.0 for no sampling |"
    ]
   },
   {