Skip to content
This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

[#396] New processor to merge trial and record identifiers #61

Merged
merged 2 commits into from Sep 29, 2016

Conversation

vitorbaptista
Copy link
Contributor

@nightsh
Copy link
Member

nightsh commented Sep 29, 2016

I don't know how to check if it's picking up all the required rows for update, but by reading the query I can only assume that. So I'm OK with that.

But... After I run it once against the API DB dump I have, I can still run it again and have it pick up a few trials again and update them. Same story.

Even more: it's never ending. If I run it a 3rd time, it outputs this. Same on a subsequent 4th, 5th and so on:

INFO:processors.merge_trial_identifiers.processor:[1] Trial 011ba7e60ea14087ab9b18afd1632811 was updated
INFO:processors.merge_trial_identifiers.processor:[2] Trial 50f12b8a73de4379bea53de56633e9f4 was updated
INFO:processors.merge_trial_identifiers.processor:[3] Trial fcfa0dd7587c40c19f9f5dcdb81291de was updated
INFO:processors.merge_trial_identifiers.processor:3 trials updated
INFO:processors.merge_trial_identifiers.processor:0 trials failed

The reason it's failing is because, somehow, the dashes are removed from the ID. Of course, the update should fail. But it's not!

Random pieces of the puzzle:

  • our ES index can return a trial even if you request an ID without dashes. It's fuzzy. As far as I'm aware, our Pg doesn't do this.
  • why doesn't raise an exception if it fails? I'm almost sure it does, but...

🐰hole again?

@pwalsh
Copy link
Member

pwalsh commented Sep 29, 2016

@nightsh completely out of context, but reading your last comment: uuids with or without dashes are "the same" as far as a UUID object is concerned in python or postgres, and probably elasticsearch too. i.e.: it is not a string, and the dashes don't matter this way or that.

@nightsh
Copy link
Member

nightsh commented Sep 29, 2016

@pwalsh wow, that's an interesting information, thanks!
(I'm now revisiting the code to see if this changes anything)

@pwalsh
Copy link
Member

pwalsh commented Sep 29, 2016

import uuid

out1 = uuid.uuid4()
nodash = out1.hex
out2 = uuid.UUID(nodash, version=4)
out2 == out1  # returns True

@nightsh
Copy link
Member

nightsh commented Sep 29, 2016

@vitorbaptista following Paul's comment (which was new info for me) I'd dare to suggest going for my original PR for this, but running the update function until there are no more left? Or something around those lines. We're circling around this issue for a while now.

By the way @vitorbaptista, have you run my PR against your DB? Maybe it's just my host (or my local containers) acting weird, so far nobody bothered to confirm it in other environments.

@vitorbaptista
Copy link
Contributor Author

As we talked, you found an actual bug. Just fixed it on d6e4e43.

I ran your PR against my local database, but I didn't have a production dump at the time, so couldn't reproduce the error. I got the dump now, but it took a while and I wrote this in the meantime. This is good to go and tested, so I don't think it's worthy for us to spend time looking for the bug in your code (sounds like something for a long and lazy Sunday 😄 )

@vitorbaptista
Copy link
Contributor Author

@nightsh Apart from the error you found, is there anything else blocking this to be merged?

@nightsh
Copy link
Member

nightsh commented Sep 29, 2016

Nope, go for it! 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants