update docs
internaut committed Apr 14, 2023
1 parent 590f8bd commit 3f1c861
Showing 3 changed files with 19 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.rst
@@ -61,6 +61,7 @@ The tmtoolkit package offers several text preprocessing and text mining methods,
`document and token attributes as dataframes <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Accessing-tokens-and-token-attributes>`_
- calculating and `visualizing corpus summary statistics <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Visualizing-corpus-summary-statistics>`_
- finding out and joining `collocations <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Identifying-and-joining-token-collocations>`_
- calculating `token cooccurrences <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Token-cooccurrence-matrices>`_
- `splitting and sampling corpora <https://tmtoolkit.readthedocs.io/en/latest/text_corpora.html#Corpus-functions-for-document-management>`_
- generating `n-grams <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Generating-n-grams>`_ and using
`N-gram models <https://tmtoolkit.readthedocs.io/en/latest/api.html#module-tmtoolkit.ngrammodels>`_
1 change: 1 addition & 0 deletions doc/source/intro.rst
@@ -59,6 +59,7 @@ The tmtoolkit package offers several text preprocessing and text mining methods,
`document and token attributes as dataframes <preprocessing.ipynb#Accessing-tokens-and-token-attributes>`_
- calculating and `visualizing corpus summary statistics <preprocessing.ipynb#Visualizing-corpus-summary-statistics>`_
- finding out and joining `collocations <preprocessing.ipynb#Identifying-and-joining-token-collocations>`_
- calculating `token cooccurrences <preprocessing.ipynb#Token-cooccurrence-matrices>`_
- `splitting and sampling corpora <text_corpora.ipynb#Corpus-functions-for-document-management>`_
- generating `n-grams <preprocessing.ipynb#Generating-n-grams>`_ and using
`N-gram models <api.rst#module-tmtoolkit.ngrammodels>`_
17 changes: 17 additions & 0 deletions doc/source/version_history.rst
@@ -3,6 +3,23 @@
Version history
===============

0.12.0 - 2023-XX-XX
-------------------

- added optional interoperability functions for data exchange with R
- added ``token_cooccurrence`` function for calculating a token cooccurrence matrix for a corpus
- added common ``by_attr`` argument for many text processing/mining functions to operate only on a certain token
attribute
- added new function ``token_collocation_matrix`` for calculating a token collocation matrix based on bigrams
- added PPMI measure (``ppmi`` function)
- added ``NGramModel`` class for N-gram models
- added ``NaiveBayesClassifier`` class for Naive Bayes classification models
- added `Health News in Twitter Data Set <https://archive.ics.uci.edu/ml/datasets/Health+News+in+Twitter>`_
- added 5 new languages now supported by SpaCy (Croatian, Finnish, Korean, Swedish, Ukrainian)
- fix: don't store parallelization worker related attributes on pickling
- updated dependencies (only SpaCy 3.3 or higher is now supported)
- compat. with Python 3.11

0.11.2 - 2022-03-11
-------------------
