Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-17950: Make Schema pickleable #440

Merged
merged 3 commits into from Mar 9, 2019
Merged

DM-17950: Make Schema pickleable #440

merged 3 commits into from Mar 9, 2019

Conversation

parejkoj
Copy link
Contributor

@parejkoj parejkoj commented Mar 1, 2019

No description provided.

Copy link
Contributor

@r-owen r-owen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. A few small requested changes.

// Pickle support
clsField.def("__reduce__", [clsField](Field<T> const &self) {
return py::make_tuple(clsField, py::make_tuple(self.getName(), self.getDoc(), self.getUnits()));
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not using py::pickle here? Please explain in the code comment, since others will be wondering when they get to this bit.

Likewise for clsSchemaItem below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were following your suggestions from last year in the slack pybind11 channel. Do you think that advice no longer applies?

https://lsstc.slack.com/archives/C31M66U9L/p1531928376000442

Copy link
Contributor

@r-owen r-owen Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pybind11 documentation clearly says to use py::pickle to implement pickle support in C++. I thought we had concluded it was much safer to do that then use __reduce__ in C++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I was just going off of the discussion I linked above from slack, which implied that reduce was the better choice. I'll try to convert this to py::pickle.

return (makeSchemaMapper, (self.getInputSchema(), self.getOutputSchema(), mappings))


def makeSchemaMapper(input, output, mappings):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function more generally useful? If so, consider listing it in __all__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not: you have to get the mappings in order first. Because of that, I don't think it would save any code, since you'd have to make the mappings dict.

mapper2 = self._makeMapper()
# input schema should still be equal
self.assertTrue(_checkSchemaIdentical(mapper1.getInputSchema(), mapper2.getInputSchema()))
self.assertNotEqual(mapper1, mapper2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what these two tests are doing. The setup part looks basically identical yet one tests the input schema and one tests the output schema and it's not clear if _makeMapper does something fundamentally different in the two tests. If _makeMapper is doing basically the same thing for both tests then I suggest combining the tests to make the code less confusing. If _makeMapper is doing something different then please document what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the test docstrings were pretty clear. These are all separate tests because I needed to keep track of which ones passed and which failed as I was developing it.

Would it help if I renamed the parameters going into _makeMapper?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is: do the two calls to _makeMapper do something fundamentally different that is relevant to the one kind of test or the other? Or are you just making different mappers for the heck of it? _makeMapper has no docs so I haven't a clue.

If there is no practical difference between the mappers for the two tests then I personally think the tests would be easier to read combined into one. If one fails the traceback will show which bit failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our habit of putting all the tests in the same method can result in difficult development and debugging, because you can flip-flop between fixing one test and breaking another without knowing it, because the first failure masks all the others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, but once the tests are written they should pass, and clarity becomes paramount. Somebody reading the tests should understand what's going on. When I read these tests I was confused.

@@ -291,6 +296,82 @@ def testJoin2(self):
self.assertEqual(s1.join("a", "b", "c"), "a_b_c")
self.assertEqual(s1.join("a", "b", "c", "d"), "a_b_c_d")

def _makeMapper(self, name1="a", name2="bb", name3="ccc", name4="dddd"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a doc string and preferably make this a public method (name it makeMapper instead of _makeMapper)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What advantage does making a non-test method in a test suite provide? Nothing will import it, so the only things calling it are module-local, so "public" vs. "private" shouldn't really matter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I guess it's a matter of taste. I tend to make these things public because I might end find I want to share them among tests (by subclassing or moving the code). But private works too.

I guess the important thing is that a user that is trying to understand the test will want to know what it does. So private or public, it deserves a full doc string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the docstring I pushed help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thanks!

@parejkoj parejkoj force-pushed the tickets/DM-17950 branch 2 times, most recently from 0f415aa to 01e6aa1 Compare March 8, 2019 23:39
Add pickle support for Key and SchemaItem, including Flags

It was much easier to implement == at the python level for SchemaMapper,
and it probably isn't needed in C++.
@parejkoj parejkoj merged commit 41c8d5d into master Mar 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants