Commit

Add custom preprocessing tutorial to Documentation
sergioburdisso committed May 24, 2020
1 parent 0b8e970 commit e28466a
Showing 4 changed files with 15 additions and 16 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -159,6 +159,7 @@ Further Readings
tutorials/topic-categorization
tutorials/movie-review
tutorials/extract-insight
+tutorials/custom-preprocessing

.. toctree::
:maxdepth: 3
2 changes: 2 additions & 0 deletions docs/user_guide/getting-started.rst
@@ -47,6 +47,8 @@ Tutorials

* :ref:`extract-insight`

+* :ref:`custom-preprocessing`
+

API Documentation
-----------------
3 changes: 3 additions & 0 deletions examples/README.md
@@ -8,6 +8,8 @@ This folder contains files related to tutorials, for instance, the following tut

:page_facing_up: [Getting the text fragments involved in the classification decision](https://pyss3.readthedocs.io/en/latest/tutorials/extract-insight.html)

+:page_facing_up: [Working with custom preprocessing methods](https://pyss3.readthedocs.io/en/latest/tutorials/custom-preprocessing.html)
+

# Jupyter Notebook

@@ -17,3 +19,4 @@ Open any of the Jupyter Notebook files contained in this folder on Binder, an on
* [extract_insight.ipynb](https://mybinder.org/v2/gh/sergioburdisso/pyss3/master?filepath=examples/extract_insight.ipynb)
* [topic_categorization.ipynb](https://mybinder.org/v2/gh/sergioburdisso/pyss3/master?filepath=examples/topic_categorization.ipynb)
* [movie_review.ipynb](https://mybinder.org/v2/gh/sergioburdisso/pyss3/master?filepath=examples/movie_review.ipynb)
+* [custom_preprocessing.ipynb](https://mybinder.org/v2/gh/sergioburdisso/pyss3/master?filepath=examples/custom_preprocessing.ipynb)
Original file line number Diff line number Diff line change
@@ -4,11 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# Working with custom user-defined preprocessing methods.\n",
+"# Working with custom preprocessing methods\n",
"<br>\n",
-"<div style=\"text-align:right\"><i>To open and run this notebook <b>online</b>, click here: <a href=\"https://mybinder.org/v2/gh/sergioburdisso/pyss3/master?filepath=examples/using_custom_preprocessing.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" style=\"display: inline\"></a></i></div>\n",
+"<div style=\"text-align:right\"><i>To open and run this notebook <b>online</b>, click here: <a href=\"https://mybinder.org/v2/gh/sergioburdisso/pyss3/master?filepath=examples/custom_preprocessing.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" style=\"display: inline\"></a></i></div>\n",
"\n",
-"In this notebook, we will see an example showing how we can use a custom preprocessing method and then visualizing it in the Live Test Tool.\n",
+"In this notebook, we will see an example showing how we can define and use our own custom preprocessing methods in PySS3 and also how we can tell the Live Test Tool to use them as well.\n",
"\n",
"Let's begin by importing the needed modules:\n",
"\n",
@@ -107,8 +107,8 @@
"outputs": [],
"source": [
"# In the \"Hyperparameter Optimization\" section at the bottom,\n",
-"# it is shown how we obtained these hyperparemter values: s=.44, l=.48, p=0.5\n",
-"clf = SS3(s=.44, l=.48, p=0.5)\n",
+"# it is shown how we obtained these hyperparemter values: s=.44, l=.48, p=.5\n",
+"clf = SS3(s=.44, l=.48, p=.5)\n",
"\n",
"# Let the training begin!\n",
"clf.train(x_train_prep, y_train, n_grams=3, prep=False)"
@@ -141,7 +141,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Not bad. Note: better performance perhaps could be obtained by performing hyperparameter optimization with our new preprocessed dataset, since the hyperparameter values we've used (s=.44, l=.48, p=0.5) were selected using the default preprocessing (but we're keeping this notebook as simple as possible).\n",
+"Not bad. Note: better performance perhaps could be obtained by performing hyperparameter optimization with our new preprocessed dataset, since the hyperparameter values we've used (``s=0.44, l=0.48, p=0.5``) were selected using the default preprocessing (but we're keeping this notebook as simple as possible).\n",
"\n",
"OK, suppose we now want to visualize what our classifier is learning and how he's carrying out the classification process, we could just use the live test as usual but this time using our preprocessed test documents (``x_test_prep``) and again disabling the default preprocessing process (``prep=False``), as follows:"
]
@@ -242,15 +242,8 @@
"print(\"Smoothness(s):\", best_s)\n",
"print(\"Significance(l):\", best_l)\n",
"print(\"Sanction(p):\", best_p)\n",
-"print(\"Alpha(a):\", best_a)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
+"print(\"Alpha(a):\", best_a)\n",
+"\n",
"Evaluation.plot()"
]
},
@@ -301,7 +294,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"The accuracy improved! it went from .828 to .853 :)"
+"The accuracy improved! it went from 0.828 to 0.853 :)"
]
}
],
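
For readers of this commit, the pattern the notebook hunks above rely on can be sketched as follows: documents are first passed through a user-defined function, and the classifier is then trained with `prep=False` so that PySS3's default preprocessing is skipped. This is a minimal sketch under stated assumptions, not the notebook's actual code: `my_preprocess` and the two toy documents are hypothetical, while the `SS3(...)` and `clf.train(..., n_grams=3, prep=False)` calls mirror the ones visible in the diff. The PySS3 part only runs if the package is installed.

```python
import re


def my_preprocess(text):
    """Hypothetical custom preprocessing (not taken from the notebook)."""
    text = text.lower()                         # normalize case
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep only letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


# Toy data, assumed purely for illustration.
x_train = ["I LOVED this movie!!! http://example.com",
           "Terrible plot, awful acting..."]
y_train = ["pos", "neg"]

# Apply the custom preprocessing ourselves, before training.
x_train_prep = [my_preprocess(doc) for doc in x_train]
print(x_train_prep)

try:
    from pyss3 import SS3
except ImportError:
    SS3 = None  # PySS3 not installed; skip the training step

if SS3 is not None:
    # Same hyperparameter values as in the diff above.
    clf = SS3(s=.44, l=.48, p=.5)
    # prep=False disables the default preprocessing, since the
    # documents have already been preprocessed by my_preprocess.
    clf.train(x_train_prep, y_train, n_grams=3, prep=False)
```

Keeping the preprocessing in one reusable function means the same transformation can be applied to any documents that are later passed with `prep=False`, which avoids a mismatch between training and test inputs.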
