
Fix docstrings for gensim.models.hdpmodel, gensim.models.lda_worker & gensim.models.lda_dispatcher (#1667) #1912

Merged
merged 17 commits into piskvorky:develop on Apr 2, 2018

Conversation

gyanesh-m
Contributor

This PR fixes the docstrings for lda_worker.py in accordance with numpy-style. There are still some files which need to be fixed and that will be done in later PRs.

(Fixes #1667 )
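For reference, a minimal numpy-style docstring of the kind this PR introduces looks like this (a hypothetical function, not the actual gensim code):

```python
def request_job(timeout=10):
    """Request a new job chunk from the dispatcher.

    Parameters
    ----------
    timeout : int, optional
        Number of seconds to wait for a job before giving up.

    Returns
    -------
    list of (int, float)
        The next document chunk in BoW format, or an empty list if no job
        arrived within `timeout` seconds.

    """
    return []  # placeholder body; only the docstring layout matters here
```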

Contributor

@menshikh-iv menshikh-iv left a comment


Good start! Please fix my comments + make similar changes for lda_dispatcher.py too.

on every node in your cluster. If you wish, you may even run it multiple times \
on a single machine, to make better use of multiple cores (just beware that \
memory footprint increases accordingly).
"""Worker ("slave") process used in computing distributed LDA.
Contributor


First of all, please fix the PEP8 problems (mostly leading-space issues); look at the Travis log https://travis-ci.org/RaRe-Technologies/gensim/jobs/342495787#L511

Contributor Author


Ok. Also, should I add a section for module-level attributes such as HUGE_TIMEOUT, MAX_JOBS_QUEUE, etc. in lda_dispatcher.py?
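If such a section were added, module-level constants in numpydoc style would sit in the module docstring; a sketch with invented values (the real gensim values may differ):

```python
"""Hypothetical module docstring for a dispatcher-like module.

Attributes
----------
MAX_JOBS_QUEUE : int
    Maximum number of jobs kept pre-fetched in the dispatcher's queue.
HUGE_TIMEOUT : int
    Effectively infinite network timeout, in seconds.

"""
MAX_JOBS_QUEUE = 10  # invented value, for illustration only
HUGE_TIMEOUT = 365 * 24 * 60 * 60  # one year, in seconds
```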


Run this script on every node in your cluster. If you wish, you may even
run it multiple times on a single machine, to make better use of multiple
cores (just beware that memory footprint increases accordingly).
Contributor


Please look at #1892; this is a really good way to document distributed stuff (instructions for running, showing the script's arguments automatically, etc.).


Attributes
----------
model : :obj: of :class:`~gensim.models.ldamodel.LdaModel`
Contributor


no need to write :obj: (here and everywhere)

@menshikh-iv
Contributor

@gyanesh-m documentation build failed, please have a look: https://circleci.com/gh/RaRe-Technologies/gensim/399?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link. You can also build the documentation locally with tox -e docs to reproduce the error.

@gyanesh-m
Contributor Author

@menshikh-iv Is there a need to mention module-level attributes such as HUGE_TIMEOUT, MAX_JOBS_QUEUE, etc. in the docstrings of lda_dispatcher.py and lda_worker? I didn't do it, as I couldn't find it in the already documented files.

@gyanesh-m gyanesh-m changed the title Fix docstrings for gensim.models.lda_worker (#1667) Fix docstrings for gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Feb 20, 2018
@gyanesh-m
Contributor Author

gyanesh-m commented Feb 21, 2018

@menshikh-iv Also, if hdpmodel.py is not taken, I would like to add documentation for it.

Contributor

@menshikh-iv menshikh-iv left a comment


Looks good

default=True, const=False
)
parser.add_argument("--hmac", help="Nameserver hmac key (default: %(default)s)", default=None)
"--no-broadcast", help="Disable broadcast \
Contributor


Why the reformatting? We use a 120-character line limit.

Contributor Author


Oh, I ran flake8 and it was giving errors for lines above 79 characters. Anyway, I will change it then.

Contributor


you should use our flake config: tox -e flake8

@@ -141,7 +260,8 @@ def main():
"port": args.port,
"hmac_key": args.hmac
}
utils.pyro_daemon(LDA_WORKER_PREFIX, Worker(), random_suffix=True, ns_conf=ns_conf)
utils.pyro_daemon(LDA_WORKER_PREFIX, Worker(),
Contributor


no vertical indents (only hanging), here and everywhere.
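For reference, the two indentation styles the reviewer contrasts, using a stand-in function (the real code wraps a call to utils.pyro_daemon):

```python
def pyro_daemon_stub(prefix, worker, random_suffix=False, ns_conf=None):
    """Stand-in for the real call, just echoing its arguments."""
    return (prefix, worker, random_suffix, ns_conf)

# Vertical indent (continuation aligned with the opening parenthesis) -- avoided:
a = pyro_daemon_stub("LDA_WORKER_PREFIX", None,
                     random_suffix=True, ns_conf={})

# Hanging indent (continuation on its own indented lines) -- preferred:
b = pyro_daemon_stub(
    "LDA_WORKER_PREFIX", None,
    random_suffix=True, ns_conf={},
)

assert a == b  # both styles produce the same call, only the layout differs
```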

@menshikh-iv
Contributor

@gyanesh-m OK, please ping me when you have finished with HDP (and don't forget to fix my comments).
After that I'll clean up the PR and merge (and we'll continue the process in a separate PR).

@gyanesh-m gyanesh-m changed the title Fix docstrings for gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Fix docstrings forgensim.models.hdpworker, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Feb 23, 2018
@gyanesh-m
Contributor Author

@menshikh-iv Hi, I am done with hdpmodel.py. Please review it.

@gyanesh-m gyanesh-m changed the title Fix docstrings forgensim.models.hdpworker, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Fix docstrings forgensim.models.hdpmodel, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Feb 25, 2018
@gyanesh-m
Contributor Author

@menshikh-iv Hi, this is a reminder: please review hdpmodel.py soon.

@menshikh-iv
Contributor

@gyanesh-m don't worry, I remember, but you will have to wait, sorry.

@gyanesh-m
Contributor Author

@menshikh-iv Ok, np. So is it fine if I start solving another issue?

@menshikh-iv
Contributor

@gyanesh-m yeah, help the guys with #1901; this is not really hard, but critical now.

@menshikh-iv menshikh-iv added the RFM label Mar 5, 2018
@menshikh-iv
Contributor

@gyanesh-m I fixed all distributed stuff, please fix hdpmodel.py too

@menshikh-iv menshikh-iv removed the RFM label Mar 12, 2018
Contributor

@menshikh-iv menshikh-iv left a comment


Hello @gyanesh-m, please look at my comments & changes and fix the suggested comments for hdpmodel.py.

kappa : float, optional
Learning rate
tau : float, optional
Slow down parameter
Contributor


What does this mean? Can you describe it in more detail? If something isn't clear, it's a bad description.

Contributor Author


is this fine?

kappa : float, optional
    Learning parameter which acts as an exponential decay factor, influencing the extent of learning from each batch.
tau : float, optional
    Learning parameter which down-weights early iterations of documents.

Contributor


@gyanesh-m sounds better than current description 👍


Parameters
----------
bow : sequence of list of tuple of ints; [ (int,int) ]
Contributor


Use iterable of list of (int, float), here and everywhere, for a corpus in BoW format.
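Concretely, a corpus in BoW format is an iterable of documents, each a list of (token_id, weight) pairs; weights become floats after transformations such as TF-IDF, which is why (int, float) is the safer type to document. Toy data, for illustration only:

```python
# Toy corpus in BoW format: two documents over a four-token vocabulary.
corpus = [
    [(0, 1.0), (2, 3.0)],            # document 1: token 0 once, token 2 three times
    [(1, 0.5), (2, 1.0), (3, 2.0)],  # document 2: fractional weights, e.g. after TF-IDF
]

# Every entry is an (int, float) pair, matching the suggested docstring type.
ok = all(isinstance(t, int) and isinstance(w, float)
         for doc in corpus for t, w in doc)
```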


Returns
-------
topic distribution for the given document `bow`, as a list of `(topic_id, topic_probability)` 2-tuples.
Contributor


missing type, should be list of (int, float)

Returns
-------
numpy.ndarray
Gamma value.
Contributor


What is gamma in this case?

Contributor Author

@gyanesh-m gyanesh-m Mar 12, 2018


This is the first level concentration. It is mentioned under the parameters section. Do I need to mention it here again?

Contributor


@gyanesh-m I think yes

single document.
outputdir : str, optional
Stores topic and options information in the specified directory.
random_state : :class:`~np.random.RandomState`, optional
Contributor


are you sure about type?

Contributor Author


Actually, the parameter's type is {None, int, array_like}, but the attribute type is the one I mentioned. I got it from here. Should I go with the parameter's type?

Contributor


You can mention all of these (the 3 mentioned + the current one).
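Merging the three parameter types with the attribute type, the docstring line might read as follows (a sketch on a hypothetical constructor, not the final wording):

```python
def make_model(random_state=None):
    """Hypothetical constructor illustrating the combined type annotation.

    Parameters
    ----------
    random_state : {None, int, array_like, numpy.random.RandomState}, optional
        Seed or state used to initialize reproducible random number generation.

    """
    return random_state  # placeholder body; only the docstring matters here
```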

topn : int, optional
Number of most probable words to show from given `topic_id`.
log : bool, optional
Logs a message with level INFO on the logger object.
Contributor


If True ...

Returns:
np.ndarray: `num_topics` x `vocabulary_size` array of floats which represents
the term topic matrix learned during inference.
"""Returns the term topic matrix learned during inference.
Contributor


Better to use "Get" instead of "Return" in the first line.

"""legacy method; use `self.save()` instead"""
"""Saves all the topics discovered.

.. note:: This is a legacy method; use `self.save()` instead.
Contributor


In numpy-style, this should look like

Notes
-----
.....

here and everywhere
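Applied to the legacy save method quoted above, the numpy-style form would look like this (a sketch on a free-standing function):

```python
def save_topics():
    """Save all the topics discovered.

    Notes
    -----
    This is a legacy method; use `self.save()` instead.

    """
    # placeholder body; only the docstring layout is being illustrated
    return None
```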

@@ -571,9 +850,34 @@ def evaluate_test_corpus(self, corpus):


class HdpTopicFormatter(object):
"""Helper class to format the output of topics and most probable words for display."""
Contributor


Helper for what class? (missing reference)

return self.show_topics(num_topics, num_words, True)

def show_topics(self, num_topics=10, num_words=10, log=False, formatted=True):
"""Gives the most probable `num_words` words from `num_topics` topics.
Contributor


Use "Give", "Print" instead of "Gives", "Prints" in the first line of the docstring (here and everywhere).

@menshikh-iv
Contributor

@gyanesh-m When do you plan to finish this? I can already merge the distributed stuff, and I also see that there is still a lot of work to do on the HDP model.

We have two variants:

  1. Revert the HDP change, merge the distributed part, and you continue with HDP in a new PR.
  2. Fix HDP in the current PR (if you can make it fast).

What do you think?

@gyanesh-m
Contributor Author

@menshikh-iv Thanks for the minor fixes. I think I will be able to fix hdpmodel.py completely in around 3 hours. I will get started on it right away.

@menshikh-iv
Contributor

@gyanesh-m 3 hours, including the general description of how the model works? Wow, sounds fantastic, good luck!

@menshikh-iv
Contributor

Hey @gyanesh-m, how is it going?

@gyanesh-m
Contributor Author

@menshikh-iv Hi, I am currently on page 3. I was having some trouble understanding it, so I thought of going through the basics first. I have gone through the following tutorials as of now

@menshikh-iv
Contributor

@gyanesh-m nice work 🥇 I need to clean up & merge this, thanks for your work!

@gyanesh-m
Contributor Author

@menshikh-iv You're welcome! Happy to help. Also, thank you for your support and guidance.

@menshikh-iv menshikh-iv merged commit 1611f3a into piskvorky:develop Apr 2, 2018
@menshikh-iv menshikh-iv added this to To Do in Documentation via automation Apr 3, 2018
@menshikh-iv menshikh-iv moved this from To Do to Done in Documentation Apr 3, 2018
@piskvorky piskvorky mentioned this pull request Jun 25, 2018
Successfully merging this pull request may close these issues.

Refactor API reference gensim.sklearn_api