update docs

internaut · Apr 14, 2023 · 3f1c861
1 parent 590f8bd
commit 3f1c861
Showing 3 changed files with 19 additions and 0 deletions.
diff --git a/README.rst b/README.rst
@@ -61,6 +61,7 @@ The tmtoolkit package offers several text preprocessing and text mining methods,
   `document and token attributes as dataframes <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Accessing-tokens-and-token-attributes>`_
 - calculating and `visualizing corpus summary statistics <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Visualizing-corpus-summary-statistics>`_
 - finding out and joining `collocations <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Identifying-and-joining-token-collocations>`_
+- calculating `token cooccurrences <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Token-cooccurrence-matrices>`_
 - `splitting and sampling corpora <https://tmtoolkit.readthedocs.io/en/latest/text_corpora.html#Corpus-functions-for-document-management>`_
 - generating `n-grams <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Generating-n-grams>`_ and using
   `N-gram models <https://tmtoolkit.readthedocs.io/en/latest/api.html#module-tmtoolkit.ngrammodels>`_

diff --git a/doc/source/intro.rst b/doc/source/intro.rst
@@ -59,6 +59,7 @@ The tmtoolkit package offers several text preprocessing and text mining methods,
   `document and token attributes as dataframes <preprocessing.ipynb#Accessing-tokens-and-token-attributes>`_
 - calculating and `visualizing corpus summary statistics <preprocessing.ipynb#Visualizing-corpus-summary-statistics>`_
 - finding out and joining `collocations <preprocessing.ipynb#Identifying-and-joining-token-collocations>`_
+- calculating `token cooccurrences <preprocessing.ipynb#Token-cooccurrence-matrices>`_
 - `splitting and sampling corpora <text_corpora.ipynb#Corpus-functions-for-document-management>`_
 - generating `n-grams <preprocessing.ipynb#Generating-n-grams>`_ and using
   `N-gram models <api.rst#module-tmtoolkit.ngrammodels>`_

diff --git a/doc/source/version_history.rst b/doc/source/version_history.rst
@@ -3,6 +3,23 @@
 Version history
 ===============
 
+0.12.0 - 2023-XX-XX
+-------------------
+
+- added optional interoperability functions for data exchange with R
+- added ``token_cooccurrence`` function for calculating a token cooccurrence matrix for a corpus
+- added common ``by_attr`` argument for many text processing/mining functions to operate only on a certain token
+  attribute
+- added new function ``token_collocation_matrix`` for calculating a token collocation matrix based on bigrams
+- added PPMI measure (``ppmi`` function)
+- added ``NGramModel`` class for N-gram models
+- added ``NaiveBayesClassifier`` class for Naive Bayes classification models
+- added `Health News in Twitter Data Set <https://archive.ics.uci.edu/ml/datasets/Health+News+in+Twitter>`_
+- added 5 new languages now supported by SpaCy (Croatian, Finnish, Korean, Swedish, Ukrainian)
+- fix: don't store parallelization worker-related attributes on pickling
+- updated dependencies (only SpaCy 3.3 or higher is now supported)
+- compatibility with Python 3.11
+
 0.11.2 - 2022-03-11
 -------------------