New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-17950: Make Schema pickleable #440
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. A few small requested changes.
// Pickle support | ||
clsField.def("__reduce__", [clsField](Field<T> const &self) { | ||
return py::make_tuple(clsField, py::make_tuple(self.getName(), self.getDoc(), self.getUnits())); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you not using py::pickle
here? Please explain in the code comment, since others will be wondering when they get to this bit.
Likewise for clsSchemaItem
below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were following your suggestions from last year in the slack pybind11 channel. Do you think that advice no longer applies?
https://lsstc.slack.com/archives/C31M66U9L/p1531928376000442
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pybind11 documentation clearly says to use py::pickle to implement pickle support in C++. I thought we had concluded it was much safer to do that then use __reduce__
in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I was just going off of the discussion I linked above from slack, which implied that reduce was the better choice. I'll try to convert this to py::pickle
.
return (makeSchemaMapper, (self.getInputSchema(), self.getOutputSchema(), mappings)) | ||
|
||
|
||
def makeSchemaMapper(input, output, mappings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function more generally useful? If so, consider listing it in __all__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not: you have to get the mappings in order first. Because of that, I don't think it would save any code, since you'd have to make the mappings dict.
mapper2 = self._makeMapper() | ||
# input schema should still be equal | ||
self.assertTrue(_checkSchemaIdentical(mapper1.getInputSchema(), mapper2.getInputSchema())) | ||
self.assertNotEqual(mapper1, mapper2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me what these two tests are doing. The setup part looks basically identical yet one tests the input schema and one tests the output schema and it's not clear if _makeMapper
does something fundamentally different in the two tests. If _makeMapper
is doing basically the same thing for both tests then I suggest combining the tests to make the code less confusing. If _makeMapper is doing something different then please document what is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the test docstrings were pretty clear. These are all separate tests because I needed to keep track of which ones passed and which failed as I was developing it.
Would it help if I renamed the parameters going into _makeMapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is: do the two calls to _makeMapper do something fundamentally different that is relevant to the one kind of test or the other? Or are you just making different mappers for the heck of it? _makeMapper has no docs so I haven't a clue.
If there is no practical difference between the mappers for the two tests then I personally think the tests would be easier to read combined into one. If one fails the traceback will show which bit failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our habit of putting all the tests in the same method can result in difficult development and debugging, because you can flip-flop between fixing one test and breaking another without knowing it, because the first failure masks all the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, but once the tests are written they should pass, and clarity becomes paramount. Somebody reading the tests should understand what's going on. When I read these tests I was confused.
@@ -291,6 +296,82 @@ def testJoin2(self): | |||
self.assertEqual(s1.join("a", "b", "c"), "a_b_c") | |||
self.assertEqual(s1.join("a", "b", "c", "d"), "a_b_c_d") | |||
|
|||
def _makeMapper(self, name1="a", name2="bb", name3="ccc", name4="dddd"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a doc string and preferably make this a public method (name it makeMapper
instead of _makeMapper
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What advantage does making a non-test
method in a test suite provide? Nothing will import it, so the only things calling it are module-local, so "public" vs. "private" shouldn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I guess it's a matter of taste. I tend to make these things public because I might end find I want to share them among tests (by subclassing or moving the code). But private works too.
I guess the important thing is that a user that is trying to understand the test will want to know what it does. So private or public, it deserves a full doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the docstring I pushed help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Thanks!
0f415aa
to
01e6aa1
Compare
Add pickle support for Key and SchemaItem, including Flags It was much easier to implement == at the python level for SchemaMapper, and it probably isn't needed in C++.
01e6aa1
to
dac8bbd
Compare
No description provided.