Merge branch 'release0.12.0' into develop
internaut committed May 3, 2023
2 parents 066c235 + 576db3c commit 366371a
Showing 54 changed files with 532 additions and 383 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,30 @@
name: publish new tmtoolkit release to PyPI
on: push

jobs:
  build-and-publish-test:
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags')
    environment:
      #name: pypi-test
      name: pypi
      url: https://pypi.org/p/tmtoolkit
    permissions:
      id-token: write
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install build dependencies
        run: python -m pip install -U setuptools wheel build
      - name: Build
        run: python -m build .
#      - name: Publish package distributions to TestPyPI
#        uses: pypa/gh-action-pypi-publish@release/v1
#        with:
#          repository-url: https://test.pypi.org/legacy/
      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
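
The release workflow triggers on every push, but its `if:` condition only lets the job run when the pushed ref starts with `refs/tags`. A minimal Python sketch of that gate follows. The function name `should_publish` and the sample refs are illustrative assumptions and not part of the workflow. The comparison is lowercased because the `startsWith` expression function in GitHub Actions is documented as not case sensitive.

```python
def should_publish(github_ref: str) -> bool:
    """Mimic the workflow condition startsWith(github.ref, 'refs/tags').

    The ref is lowercased first, since the GitHub Actions startsWith
    function ignores case.
    """
    return github_ref.lower().startswith("refs/tags")


# Hypothetical refs: one branch push and one tag push.
for ref in ("refs/heads/develop", "refs/tags/v0.12.0"):
    print(ref, "->", "publish" if should_publish(ref) else "skip")
```

With this gate, ordinary branch pushes skip the build-and-publish job, and only tag pushes reach the PyPI upload step.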
10 changes: 5 additions & 5 deletions .github/workflows/runtests.yml
@@ -1,8 +1,8 @@
# GitHub actions workflow for testing tmtoolkit
# Runs tests on Ubuntu, MacOS and Windows with Python versions 3.8, 3.9 and 3.10 each, which means 9 jobs are spawned.
# Runs tests on Ubuntu, MacOS and Windows with Python versions 3.8, 3.9, 3.10, 3.11.
# Tests are run using tox (https://tox.wiki/).
#
# author: Markus Konrad <markus.konrad@wzb.eu>
# author: Markus Konrad <post@mkonrad.net>

name: run tests

@@ -19,12 +19,12 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.8", "3.9", "3.10", "3.11"]
testsuite: ["minimal", "full"]
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: set up python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
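
The old header comment counted 9 jobs for three operating systems and three Python versions. With the updated matrix, the job count is the product of all three axes. The sketch below works this out in Python, assuming the matrix has no `include` or `exclude` entries outside the hunk shown here; the variable names are illustrative.

```python
from itertools import product

# Values copied from the updated matrix in runtests.yml.
os_list = ["ubuntu-latest", "macos-latest", "windows-latest"]
python_versions = ["3.8", "3.9", "3.10", "3.11"]
testsuites = ["minimal", "full"]

# A matrix strategy spawns one job per combination of axis values.
jobs = list(product(os_list, python_versions, testsuites))
print(len(jobs))  # 3 * 4 * 2 = 24
```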
3 changes: 2 additions & 1 deletion .gitignore
@@ -17,6 +17,7 @@ examples/data/*.pickle
.tox/
.Rhistory
doc/source/data/corpus_norm.pickle
.coverage
.coverage*
examples/data/aclImdb_v1.tar.gz
venv
examples/data/topicmod_evaluate_*.png
2 changes: 1 addition & 1 deletion AUTHORS.md
@@ -2,7 +2,7 @@

## Maintainer / main developer

[Markus Konrad](https://github.com/internaut) @ [WZB](https://github.com/WZBSocialScienceCenter/)
[Markus Konrad](https://github.com/internaut)

## Contributors

33 changes: 16 additions & 17 deletions README.rst
@@ -12,21 +12,16 @@ operations (via NumPy) and parallel computation (using Python's *multiprocessing

The documentation for tmtoolkit is available on `tmtoolkit.readthedocs.org <https://tmtoolkit.readthedocs.org>`_ and
the GitHub code repository is on
`github.com/WZBSocialScienceCenter/tmtoolkit <https://github.com/WZBSocialScienceCenter/tmtoolkit>`_.

**Upgrade note:**

Since Feb 8 2022, the newest version 0.11.0 of tmtoolkit is available on PyPI. This version features a new API
for text processing and mining which is incompatible with prior versions. It's advisable to first read the
first three chapters of the `tutorial <https://tmtoolkit.readthedocs.io/en/latest/getting_started.html>`_
to get used to the new API. You should also re-install tmtoolkit in a new virtual environment or completely
remove the old version prior to upgrading. See the
`installation instructions <https://tmtoolkit.readthedocs.io/en/latest/install.html>`_.
`github.com/internaut/tmtoolkit <https://github.com/internaut/tmtoolkit>`_.

Requirements and installation
-----------------------------

**tmtoolkit works with Python 3.8 or newer (tested up to Python 3.10).**
**tmtoolkit works with Python 3.8 or newer (tested up to Python 3.11).**

.. note:: There are two dependencies, that don't work with Python 3.11 so far: *lda* and *wordcloud*. If you want to
          do topic modeling via LDA and/or want to use word cloud visualizations, you must use Python 3.8 to 3.10 or
          wait until lda and wordcloud receive updates that make them work under Python 3.11.
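
The note above names Python 3.8 to 3.10 as the range in which *lda* and *wordcloud* work. A script that relies on these optional features could check the interpreter version up front. The following is only a sketch: the helper `optional_features_supported` is a hypothetical name and not part of tmtoolkit.

```python
import sys


def optional_features_supported(version_info=sys.version_info) -> bool:
    """Return True if the Python version lies in the 3.8 to 3.10 range
    that the README note names for lda and wordcloud."""
    return (3, 8) <= tuple(version_info[:2]) <= (3, 10)


if not optional_features_supported():
    print("lda and wordcloud may not install on this Python version; "
          "LDA topic modeling and word clouds could be unavailable.")
```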

The tmtoolkit package is highly modular and tries to install as few dependencies as possible. For requirements and
installation procedures, please have a look at the
@@ -66,8 +61,10 @@ The tmtoolkit package offers several text preprocessing and text mining methods,
`document and token attributes as dataframes <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Accessing-tokens-and-token-attributes>`_
- calculating and `visualizing corpus summary statistics <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Visualizing-corpus-summary-statistics>`_
- finding out and joining `collocations <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Identifying-and-joining-token-collocations>`_
- calculating `token cooccurrences <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Token-cooccurrence-matrices>`_
- `splitting and sampling corpora <https://tmtoolkit.readthedocs.io/en/latest/text_corpora.html#Corpus-functions-for-document-management>`_
- generating `n-grams <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Generating-n-grams>`_
- generating `n-grams <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Generating-n-grams>`_ and using
`N-gram models <https://tmtoolkit.readthedocs.io/en/latest/api.html#module-tmtoolkit.ngrammodels>`_
- generating `sparse document-term matrices <https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Generating-a-sparse-document-term-matrix-(DTM)>`_

Wherever possible and useful, these methods can operate in parallel to speed up computations with large datasets.
@@ -110,6 +107,8 @@ Other features
`text files, tabular files (CSV or Excel), ZIP files or folders <https://tmtoolkit.readthedocs.io/en/latest/text_corpora.html#Loading-text-data>`_
- `splitting and joining documents <https://tmtoolkit.readthedocs.io/en/latest/text_corpora.html#Corpus-functions-for-document-management>`_
- `common statistics and transformations for document-term matrices <https://tmtoolkit.readthedocs.io/en/latest/bow.html>`_ like word cooccurrence and *tf-idf*
- `interoperability with R <https://tmtoolkit.readthedocs.io/en/latest/rinterop.html>`_


Limits
------
@@ -129,7 +128,7 @@ License
-------

Code licensed under `Apache License 2.0 <https://www.apache.org/licenses/LICENSE-2.0>`_.
See `LICENSE <https://github.com/WZBSocialScienceCenter/tmtoolkit/blob/master/LICENSE>`_ file.
See `LICENSE <https://github.com/internaut/tmtoolkit/blob/master/LICENSE>`_ file.

.. |pypi| image:: https://badge.fury.io/py/tmtoolkit.svg
:target: https://badge.fury.io/py/tmtoolkit
@@ -139,12 +138,12 @@ See `LICENSE <https://github.com/WZBSocialScienceCenter/tmtoolkit/blob/master/LI
:target: https://pypi.org/project/tmtoolkit/
:alt: Downloads from PyPI

.. |runtests| image:: https://github.com/WZBSocialScienceCenter/tmtoolkit/actions/workflows/runtests.yml/badge.svg
:target: https://github.com/WZBSocialScienceCenter/tmtoolkit/actions/workflows/runtests.yml
.. |runtests| image:: https://github.com/internaut/tmtoolkit/actions/workflows/runtests.yml/badge.svg
:target: https://github.com/internaut/tmtoolkit/actions/workflows/runtests.yml
:alt: GitHub Actions CI Build Status

.. |coverage| image:: https://raw.githubusercontent.com/WZBSocialScienceCenter/tmtoolkit/master/coverage.svg?sanitize=true
:target: https://github.com/WZBSocialScienceCenter/tmtoolkit/tree/master/tests
.. |coverage| image:: https://raw.githubusercontent.com/internaut/tmtoolkit/master/coverage.svg?sanitize=true
:target: https://github.com/internaut/tmtoolkit/tree/master/tests
:alt: Coverage status

.. |rtd| image:: https://readthedocs.org/projects/tmtoolkit/badge/?version=latest
2 changes: 1 addition & 1 deletion conftest.py
@@ -1,7 +1,7 @@
"""
Configuration for tests with pytest
.. codeauthor:: Markus Konrad <markus.konrad@wzb.eu>
.. codeauthor:: Markus Konrad <post@mkonrad.net>
"""

from hypothesis import settings, HealthCheck
4 changes: 2 additions & 2 deletions coverage.svg