Fixed references to base classes in docs
Base classes of dynamically created Reader, Writer, Reader.Setup and
Document classes were not correctly referring to the actual base class
in the docs. Fixed references by correctly setting __qualname__ and
__module__ on the generated classes.

Also added some tricks for the docstrings, so we only use class docstrings
that are set explicitly, not ones that get inherited.
markgw committed Aug 7, 2020
1 parent f5acb2b commit 125b762
Showing 16 changed files with 51 additions and 28 deletions.
6 changes: 6 additions & 0 deletions docs/commands/recover.rst
@@ -10,6 +10,12 @@ recover
When a document map module gets killed forcibly, sometimes it doesn't have time to
save its execution state, meaning that it can't pick up from where it left off.

+.. todo::
+
+   This has not been updated for the Pimarc internal storage format,
+   so still assumes that tar files are used. It will be updated in
+   future, if there is a need for it.
+
This command tries to fix the state so that execution can be resumed. It counts
the documents in the output corpora and checks what the last written document was.
It then updates the state to mark the module as partially executed, so that it
14 changes: 8 additions & 6 deletions docs/commands/reset.rst
@@ -13,7 +13,7 @@ Usage:

::

-    pimlico.sh [...] reset [modules [modules ...]] [-h] [-n]
+    pimlico.sh [...] reset [modules [modules ...]] [-h] [-n] [-f]


Positional arguments
@@ -28,9 +28,11 @@ Positional arguments
Options
=======

-+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| Option | Description |
-+=======================+=============================================================================================================================================================+
-| ``-n``, ``--no-deps`` | Only reset the state of this module, even if it has dependent modules in an executed state, which could be invalidated by resetting and re-running this one |
-+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
++--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Option | Description |
++==========================+========================================================================================================================================================================================+
+| ``-n``, ``--no-deps`` | Only reset the state of this module, even if it has dependent modules in an executed state, which could be invalidated by resetting and re-running this one |
++--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ``-f``, ``--force-deps`` | Reset the state of this module and any dependent modules in an executed state, which could be invalidated by resetting and re-running this one. Do not ask for confirmation to do this |
++--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.concat.rst
@@ -16,8 +16,6 @@ They must have the same data point type, or one must be a subtype of the other.

This is a filter module. It is not executable, so won't appear in a pipeline's list of modules that can be run. It produces its output for the next module on the fly when the next module needs it.

-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.format.rst
@@ -20,8 +20,6 @@ formatting operations are designed for display, this is generally only useful to
consumption.


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.group.rst
@@ -32,8 +32,6 @@ and the grouping will be preserved as the corpus passes through the pipeline.

This is a filter module. It is not executable, so won't appear in a pipeline's list of modules that can be run. It produces its output for the next module on the fly when the next module needs it.

-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.interleave.rst
@@ -25,8 +25,6 @@ not currently implemented and may not be worth the trouble. Perhaps we will add

This is a filter module. It is not executable, so won't appear in a pipeline's list of modules that can be run. It produces its output for the next module on the fly when the next module needs it.

-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.list_filter.rst
@@ -13,8 +13,6 @@ Similar to :mod:`~pimlico.modules.corpora.split`, but instead of taking a random
according to a given list of documents, putting those in the list in one set and the rest in another.


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.shuffle.rst
@@ -39,8 +39,6 @@ comes from a filter module, its documents cannot be randomly accessed.
by ``StoredIterableCorpus``: `https://github.com/markgw/pimlico/issues/24`_


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.shuffle_linear.rst
@@ -40,8 +40,6 @@ are small.
module type.


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.split.rst
@@ -23,8 +23,6 @@ e.g. in a training-test split, store only the test document list, as the trainin
a case, just put the smaller set first and don't request the optional output `doc_list2`.


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.store.rst
@@ -19,8 +19,6 @@ produced corpus for further use, rather than always running the filters/readers
each time the corpus' documents are needed.


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.subsample.rst
@@ -12,8 +12,6 @@ Random subsample
Randomly subsample documents of a corpus at a given rate to create a smaller corpus.


-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

2 changes: 0 additions & 2 deletions docs/modules/pimlico.modules.corpora.subset.rst
@@ -23,8 +23,6 @@ over the data to count them up.

This is a filter module. It is not executable, so won't appear in a pipeline's list of modules that can be run. It produces its output for the next module on the fly when the next module needs it.

-*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*
-
Inputs
======

4 changes: 4 additions & 0 deletions docs/modules/pimlico.modules.embeddings.store_word2vec.rst
@@ -23,6 +23,10 @@ format, and a ``vocab`` file, containing the vocabulary and word counts.

Uses the Gensim implementation of the storage, so depends on Gensim.

+Does not support Python 2, since we depend on Gensim.
+
+
+*This module does not support Python 2, so can only be used when Pimlico is being run under Python 3*

Inputs
======
25 changes: 25 additions & 0 deletions src/python/pimlico/datatypes/base.py
@@ -100,6 +100,14 @@ def _get_reader_cls(cls):
     del my_dict["__dict__"]
 if "__weakref__" in my_dict:
     del my_dict["__weakref__"]
+# Set the reader's __qualname__ so it's properly treated as a nested class of the datatype
+my_dict["__qualname__"] = "{}.Reader".format(cls.__qualname__)
+my_dict["__module__"] = cls.__module__
+
+# If no new documentation is provided, we don't want to inherit the
+# superclass' docstring, but instead let the reader follow the link to see it
+if my_dict["__doc__"] is None:
+    my_dict["__doc__"] = "Reader class for {}".format(cls.__qualname__)

 reader_cls = PimlicoDatatypeReaderMeta("Reader", (parent_reader,), my_dict)
 setattr(cls, _cache_name, reader_cls)
@@ -157,6 +165,14 @@ def _get_some_writer_cls(cls):
     del new_cls_dict["__dict__"]
 if "__weakref__" in new_cls_dict:
     del new_cls_dict["__weakref__"]
+# Set the writer's __qualname__ so it's properly treated as a nested class of the datatype
+new_cls_dict["__qualname__"] = "{}.Writer".format(cls.__qualname__)
+new_cls_dict["__module__"] = cls.__module__
+
+# If no new documentation is provided, we don't want to inherit the
+# superclass' docstring, but instead let the reader follow the link to see it
+if new_cls_dict["__doc__"] is None:
+    new_cls_dict["__doc__"] = "Writer class for {}".format(cls.__qualname__)

 # Perform subclassing so that a new Writer is created that is a subclass of the parent's writer
 writer_cls = type("Writer", (parent_writer,), new_cls_dict)
@@ -216,6 +232,15 @@ def _get_setup_cls(cls):
     del my_dict["__dict__"]
 if "__weakref__" in my_dict:
     del my_dict["__weakref__"]
+# Set the reader setup's __qualname__ so it's properly treated as a nested class of the datatype's reader
+my_dict["__qualname__"] = "{}.Setup".format(cls.__qualname__)
+my_dict["__module__"] = cls.__module__
+
+if my_setup is parent_setup or my_dict["__doc__"] is None:
+    # If setup was not overridden: don't use the base class' doc
+    # If no new documentation is provided, we don't want to inherit the
+    # superclass' docstring, but instead let the reader follow the link to see it
+    my_dict["__doc__"] = "Setup class for {}".format(cls.__qualname__)

 setup_cls = type("Setup", (parent_setup,), my_dict)
 setattr(cls, _cache_name, setup_cls)
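
The effect of the `__qualname__` and `__module__` assignments in the hunks above can be seen in isolation. The following is a minimal sketch, not Pimlico's code: `BaseDatatype`, `MyDatatype` and `make_reader_cls` are hypothetical stand-ins. It relies only on the standard behaviour of `type()`, which reads both entries from the namespace dict it is given.

```python
class BaseDatatype:
    """Stand-in for a datatype base class (hypothetical, not Pimlico's)."""

    class Reader:
        """Reader for BaseDatatype."""


class MyDatatype(BaseDatatype):
    """Stand-in for a datatype subclass."""


def make_reader_cls(datatype_cls, parent_reader):
    """Create a Reader subclass that presents itself as nested in datatype_cls."""
    namespace = {
        # type() takes the new class's qualified name from this entry
        "__qualname__": "{}.Reader".format(datatype_cls.__qualname__),
        # Without this entry, __module__ is the module that calls type()
        "__module__": datatype_cls.__module__,
    }
    return type("Reader", (parent_reader,), namespace)


# Created without the extra entries: the qualified name is just "Reader"
plain = type("Reader", (BaseDatatype.Reader,), {})
# Created with them: the class reports itself as MyDatatype.Reader
named = make_reader_cls(MyDatatype, BaseDatatype.Reader)

print(plain.__qualname__)
print(named.__qualname__)
```

Documentation generators that build cross-references from `__module__` and `__qualname__` can then link the generated class to the datatype it belongs to, which is presumably what the fix in this commit depends on.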
8 changes: 8 additions & 0 deletions src/python/pimlico/datatypes/corpora/data_points.py
@@ -56,6 +56,14 @@ def _get_document_cls(cls):
     del new_dict["__dict__"]
 if "__weakref__" in new_dict:
     del new_dict["__weakref__"]
+# Set the document class' __qualname__ so it's properly treated as a nested class of the data point type
+new_dict["__qualname__"] = "{}.Document".format(cls.__qualname__)
+new_dict["__module__"] = cls.__module__
+
+# If no new documentation is provided, then we don't want to inherit the
+# superclass' docstring, but instead let the reader follow the link to see it
+if "__doc__" not in new_dict or new_dict["__doc__"] is None:
+    new_dict["__doc__"] = "Document class for {}".format(cls.__name__)

 # Perform subclassing so that a new Document is created that is a subclass of the parent's document
 cls.__document_type = type("Document", (parent_doc_cls,), new_dict)
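
The docstring handling in these hunks can be sketched the same way. This is an illustration under assumptions, not the project's implementation: `ParentDocument`, `MyType` and `make_document_cls` are hypothetical names. A class created without a docstring has `__doc__` set to `None`, but helpers such as `inspect.getdoc` fall back to the base class's docstring, so a placeholder is filled in whenever no explicit docstring was given.

```python
import inspect


class ParentDocument:
    """Documentation written for the parent document class."""


class MyType:
    """Stand-in for a data point type (hypothetical)."""


def make_document_cls(owner_cls, parent_doc_cls, namespace):
    """Subclass parent_doc_cls, keeping only an explicitly set docstring."""
    new_dict = dict(namespace)
    # A missing or None __doc__ means no docstring was written for this class
    if new_dict.get("__doc__") is None:
        new_dict["__doc__"] = "Document class for {}".format(owner_cls.__name__)
    return type("Document", (parent_doc_cls,), new_dict)


# No placeholder: __doc__ is None, but inspect.getdoc() reports the parent's text
plain = type("Document", (ParentDocument,), {})
print(plain.__doc__)
print(inspect.getdoc(plain))

# With the placeholder logic, each class carries its own docstring
undocumented = make_document_cls(MyType, ParentDocument, {})
documented = make_document_cls(MyType, ParentDocument, {"__doc__": "Explicit docs."})
print(undocumented.__doc__)
print(documented.__doc__)
```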
