[WIP] Fix backward incompatibility due to `random_state` #1327

chinmayapancholi13 · 2017-05-16T01:07:50Z

This PR fixes issue #1082.

tmylk · 2017-05-16T16:58:50Z

Please add unit tests

chinmayapancholi13 · 2017-05-16T17:00:27Z

@tmylk Yes. Working on adding tests.

chinmayapancholi13 · 2017-05-16T19:37:35Z

@tmylk @menshikh-iv I have added unit tests for ensuring that backward compatibility is not broken due to id2word and random_state in the first 2 commits in this PR.

In the following 2 commits, I also had to add the checks isinstance(self.alpha, np.ndarray) and isinstance(self.eta, np.ndarray). Without them, an error was raised whenever self.alpha or self.eta was of type float, since the code checks len(self.alpha.shape) and len(self.eta.shape). In particular, this happened when I tried to save the pre-0.13.2 model again after loading it.

The last commit is failing on Python 3.5 because the test testPasses in gensim.test.test_ldamodel.TestLdaMulticore raises an assertion error while comparing two floating point numbers. I haven't changed any code related to this in the PR. Should I do something about this error?
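
For readers following along, the alpha/eta guard mentioned in this comment can be sketched roughly as below. This is only an illustration of the isinstance-before-shape pattern, not the code from the PR: the helper name safe_ndim and the sample values are hypothetical.

```python
import numpy as np


def safe_ndim(value):
    """Return len(value.shape) for arrays and 0 for plain scalars (hypothetical helper)."""
    # A Python float has no .shape attribute, so check the type before touching it.
    if isinstance(value, np.ndarray):
        return len(value.shape)
    return 0


alpha = 0.5             # scalar prior, the case that raised the error
eta = np.full(10, 0.1)  # 1-D array prior, the case that already worked
print(safe_ndim(alpha), safe_ndim(eta))
```

The same guard can also be written inline as `isinstance(x, np.ndarray) and len(x.shape) != 1`, which short-circuits before the attribute access.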

tmylk

Logic changes needed

tmylk · 2017-05-16T23:46:38Z

gensim/models/ldamodel.py

@@ -1004,6 +1004,11 @@ def save(self, fname, ignore=['state', 'dispatcher'], separately=None, *args, **
        """
        if self.state is not None:
            self.state.save(utils.smart_extension(fname, '.state'), *args, **kwargs)
+
+        # Save 'random_state' separately


what is the purpose of saving it separately?

@tmylk I think it's better for the case where the file doesn't exist (and it's more flexible).

tmylk · 2017-05-16T23:54:07Z

gensim/models/ldamodel.py

        else:
-            result.id2word = None
+            logging.warning("random_state not stored on disk so using default value")


If random_state is not stored, that means it is a new version of the model and random_state is going to be loaded in the main pickle load. Please change the logic.

@tmylk Your suggestion is a bit different from the earlier discussion on the issue. Hence, I want to make sure I understand the desired solution before making the changes.

If I understand it correctly, what you are suggesting is:

If there is indeed a file with the extension .random_state present on disk, this means that the model was saved using a pre-0.13.2 version of Gensim. So we use this file to set result.random_state at the time of loading.

However, if there is no such file present on disk, then this means that the model was saved using a post-0.13.2 version of Gensim and thus result.random_state got set at the time of the main pickle load. So in this case, we don't need to do anything else.

But for models saved using a pre-0.13.2 version of Gensim, there was no .random_state file created at the time of saving the model. So while loading such a model from disk, where would the .random_state file come from in this case? Is the user responsible for creating this file explicitly in such a case? If this is true, then I believe we don't need to make any changes in the save function for LdaModel at all and we just need to change the load function.

Please correct me if I am wrong or missing something here. Otherwise, if this is indeed what we need, I could go ahead and make the appropriate changes.

Please add a check that this indeed happened, i.e. that result.random_state got set at the time of the main pickle load.

@tmylk I believe I have understood the solution that we want. However, I have a minor doubt about where the .random_state file would come from. Is it that the user is responsible for creating it explicitly always and we (i.e. from within the save function) need not create it ever?
If this is true, then in case we are loading a pre-0.13.2 model and no .random_state file exists on disk, then should we set result.random_state using a default value like get_random_state(None) with a logger.warning?

@tmylk Could you please respond to this query?

> Is it that the user is responsible for creating it explicitly always

I think the user should not have to think about additional files; they just save the whole model.

> If this is true, then in case we are loading a pre-0.13.2 model and no .random_state file exists on disk, then should we set result.random_state using a default value like get_random_state(None) with a logger.warning?

I think that's correct.

The simple way to do this is to check whether random_state was loaded; if not, set result.random_state to a default value like get_random_state(None) and emit a logger.warning.

@menshikh-iv Then there is no .random_state file involved at all, correct? To summarize, the solution is:

First, load the entire model.

Then check whether result.random_state was set. For the newer (post-0.13.2) models, it would have been set through the main pickle load. For the older (pre-0.13.2) models, result.random_state would not have been set through the main pickle load, so we set result.random_state to get_random_state(None).

In this solution, we don't need to make any changes in the save function, just the load function.
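
The load-time fallback agreed on in this thread could look roughly like the sketch below. It is a toy, self-contained illustration and not gensim's implementation: ToyModel, its dumps/loads methods, and the warning text are hypothetical, and random.Random() is only a stand-in for get_random_state(None).

```python
import logging
import pickle
import random

logger = logging.getLogger("toy_model")


class ToyModel:
    """Hypothetical stand-in for a model class; this is not gensim's LdaModel."""

    def __init__(self, num_topics, random_state=None):
        self.num_topics = num_topics
        self.random_state = random.Random() if random_state is None else random_state

    def dumps(self):
        # Serialise the attribute dict, much as a pickled object would store it.
        return pickle.dumps(self.__dict__)

    @classmethod
    def loads(cls, data):
        # Step 1: the main pickle load restores whatever attributes were saved.
        result = cls.__new__(cls)
        result.__dict__.update(pickle.loads(data))
        # Step 2: older saves carry no random_state, so fall back to a default.
        if not hasattr(result, "random_state"):
            # Assumption: random.Random() stands in for get_random_state(None).
            result.random_state = random.Random()
            logger.warning("random_state not set so using default value")
        return result


# Simulate an "old" save by serialising a state dict without random_state.
old_bytes = pickle.dumps({"num_topics": 2})
restored_old = ToyModel.loads(old_bytes)

# A "new" save already contains random_state, so load leaves it untouched.
new_bytes = ToyModel(num_topics=2, random_state=random.Random(0)).dumps()
restored_new = ToyModel.loads(new_bytes)
```

Nothing changes on the save side in this sketch, which mirrors the conclusion that only the load function needs to change.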

tmylk · 2017-05-16T23:57:39Z

gensim/test/test_ldamodel.py

+            self.assertTrue(isinstance(i[1], six.string_types))
+
+    def testRandomStateBackwardCompatibility(self):
+        # load a model saved using a pre-0.13.2 version of Gensim


This test is identical to the previous one. A single test that checks all the fields is enough.

Sure. Since these were two different issues, I thought we'd want separate tests to verify that both are resolved. I'll make the update according to your suggestion and remove the earlier test here.

Better to avoid code duplication
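
One way to fold the two checks into a single test, as suggested in this thread, is to loop over the fields of interest. The sketch below is an assumption-laden illustration rather than the test from the PR: load_old_model is a hypothetical stub standing in for loading the pre-0.13.2 fixture, and the field list is inferred from the discussion (id2word and random_state).

```python
import unittest
from types import SimpleNamespace


def load_old_model():
    """Hypothetical stub standing in for loading a model saved by an older version."""
    return SimpleNamespace(id2word={0: "word"}, random_state=object())


class TestBackwardCompatibility(unittest.TestCase):
    def testBackwardCompatibility(self):
        model = load_old_model()
        # One test covers every field that must survive loading an old model.
        for field in ("id2word", "random_state"):
            with self.subTest(field=field):
                self.assertTrue(hasattr(model, field))
                self.assertIsNotNone(getattr(model, field))
```

Using subTest keeps the failures for each field separate in the report while the test body itself is not duplicated.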

tmylk · 2017-05-17T00:03:15Z

could you please test that this exact test fails in the version of the code prior to your changes?
Please create a commit with only the test/test_data and zero changes.

chinmayapancholi13 · 2017-05-17T00:24:18Z

@tmylk The last commit now shows that all checks have passed. I guess it's sorted for now then.
I am making the remaining changes as per your suggestions above.

tmylk · 2017-05-17T11:18:04Z

To validate this as a fix we need to see what changed. This is how Test Driven Development works for any issue:

Write new tests. See them fail. You have already written the tests, so please create a commit with only the test/test_data and zero code changes.
Write new code. See the tests pass. This is what you have done in the last commit.

Could you please add these 2 commits to this PR?

chinmayapancholi13 · 2017-05-17T18:33:11Z

@tmylk Got it. Thanks a lot for the comprehensive feedback. :)
I am adding these 2 commits to the PR.

Edit: I have added the commit containing only the unit tests and test data, which (as it should) fails the test that checks random_state backward compatibility. Could you please respond to my query on your review comments above before I make the remaining changes in the code?


chinmayapancholi13 · 2017-05-23T08:54:04Z

@tmylk @menshikh-iv I have made changes to the code as well. One of the tests is failing for Python 2.7 in the last commit because of a "Cannot allocate memory" error.

Edit: The last commit now passes all the tests.


fixed backward incompatiblity due to random_state

43d3f00

chinmayapancholi13 changed the title ~~Fix backward incompatibility due to random_state~~ [WIP] Fix backward incompatibility due to random_state May 16, 2017

removed space before semi_colon

addb8c7

chinmayapancholi13 added 4 commits May 16, 2017 10:27

added unit tests for checking id2word and random_state compatibility

0edd447

added space after # in comments

45152e9

added check for numpy.ndarray for alpha and eta

ba5bfb8

fixed typo in previous commit

81d3b3f

tmylk suggested changes May 16, 2017


only test and test data for random_state without new code changes

562f959

chinmayapancholi13 added 3 commits May 22, 2017 22:27

changed load function to fix backward incompatibility due to random_S…

8687b17

…tate

added checks for numpy.ndarray before accessing shape attribute

129bd8e

added warning when using default value for random_state

2010559

jayantj mentioned this pull request May 23, 2017

support both old and new fastText model #1319

Merged

updated test to include saving and loading using new Gensim version

c7194c9

menshikh-iv merged commit 7414b60 into piskvorky:develop May 25, 2017

chinmayapancholi13 mentioned this pull request May 25, 2017

Added comments explaining logic for changes in PR #1327 #1365

Merged

menshikh-iv mentioned this pull request May 26, 2017

RandomState (Fix to issue #113) breaks backwards compatibility with old LDA models #1082

Closed

menshikh-iv added a commit that referenced this pull request May 26, 2017

Merge pull request #1365 from chinmayapancholi13/random_state_pr_comm…

370646b

…ents1 Added comments explaining logic for changes in PR #1327
