Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13507: Add SHA1 hash interface to SkyMap. #13

Merged
merged 2 commits into from
Feb 9, 2018
Merged

Conversation

TallJimbo
Copy link
Member

No description provided.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay to me although, given the discussion, I actually expected this to be using __hash__ to define uniqueness of skymaps, and possibly internally hash(). Are you not doing this because you explicitly need SHA1 to work and skymaps are not readonly?

self.config.patchBorder,
self.config.tractOverlap,
self.config.pixelScale,
bytes(str(self.config.projection), encoding='latin1'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why latin1 here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These strings are required by the config to be one of a few 3-letter FITS WCS projection names, so I know latin1 is sufficient for representing - until FITS supports unicode in headers, that is :-)

That said, I don't actually know whether knowing it's sufficient means I should use it over something else. I suspect it just doesn't matter, but I'd love to get any feedback you have on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect ascii for a projection code.

Copy link
Member Author

@TallJimbo TallJimbo Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out this doesn't work anyway, because bytes() doesn't take an encoding argument in Python 2 (because it's just str()). I've switched to just using future.utils.tobytes instead, which I think just uses latin1 under the hood.

@PaulPrice
Copy link
Contributor

hash is randomly salted in py3, so not repeatable: https://docs.python.org/3/reference/datamodel.html#object.__hash__

@timj
Copy link
Member

timj commented Feb 8, 2018

Can't we still set __hash__ though by using the SHA1 we calculate?

@timj
Copy link
Member

timj commented Feb 8, 2018

Maybe the SHA1 calculation is too slow for us to reuse it for __hash__?

@TallJimbo
Copy link
Member Author

TallJimbo commented Feb 8, 2018

Yeah, I had thought through not using hash but I hadn't really thought about implementing the __hash__ interface with sha1. I don't think I can do that directly, because I think __hash__ wants an int, not a bytes of length 20. I don't think we need all those bytes for this particular comparison; a non-salted int64 hash that emphasized collision avoidance over performance would be fine - I just don't know of one.

I also think that if I implement __hash__ I need to implement __eq__ to be consistent with it (which might be a good idea anyway, though it's not something I'm eager to spend time on), and perhaps emphasize that all subclasses should immutable, so they can be used as keys in dicts and sets. I'm not sure how strongly implementing __hash__ implies all of the rest.

@timj
Copy link
Member

timj commented Feb 8, 2018

would it work to define __hash__ as hash(self.getSha1())?

@TallJimbo
Copy link
Member Author

TallJimbo commented Feb 8, 2018

would it work to define __hash__ as hash(self.getSha1())?

Yeah, I think that's probably a good idea, and it'd be easy to implement __eq__ on top of getSha1() as well for consistency.

@TallJimbo
Copy link
Member Author

Two new commits: SHA1 values are now computed on first use and then cached, and I've implemented __hash__ and __eq__ on top of that.

@@ -166,6 +170,18 @@ def __iter__(self):
def __len__(self):
return len(self._tractInfoList)

def __hash__(self):
return hash(self._sha1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below shouldn't it be getSha1() that is called. Otherwise how do you know it's been cached?

for tractOverlap in (0.0, 0.01, 0.1): # degrees
config = self.getConfig()
config.tractOverlap = tractOverlap
skyMap = self.getSkyMap(config)
for tractInfo in skyMap:
self.assertAlmostEqual(tractInfo.getTractOverlap().asDegrees(), tractOverlap)
self.assertEqual(len(skyMap), self._NumTracts)
self.assertNotEqual(skyMap.getSha1(), defaultSkyMap.getSha1())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For at least some of these tests, can we now switch them to using the __eq__ implementation?

This comes with a doc note that SkyMaps must be conceptually
immutable, which all current classes already are.
@TallJimbo TallJimbo merged commit 20413be into master Feb 9, 2018
@ktlim ktlim deleted the tickets/DM-13507 branch August 25, 2018 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants