Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13835: Cannot ingest empty data #33

Merged
merged 5 commits into from May 31, 2018
Merged

DM-13835: Cannot ingest empty data #33

merged 5 commits into from May 31, 2018

Conversation

kfindeisen
Copy link
Member

This PR changes the behavior of DatasetIngestTask's private methods from crashing to raising a specific exception. It also adds a unit test that could not be implemented because of this bug (at least, not while we were still using testdata_decam).

Having a parser quit the program makes sense when the user provides
bad arguments to the main program, but is dangerous when it happens as
an implementation detail of a submodule.
@kfindeisen kfindeisen requested a review from parejkoj May 30, 2018 22:22
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handful of changes requested, but I don't need to see it again. Thanks for documenting the exceptions.

Raises
------
RuntimeError
Raised if there are no files to ingest.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this. We need more docstrings that call out known exceptions.

"""
if not dataFiles:
raise RuntimeError("No raw files to ingest.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe be a little more explicit here, to help with debugging: "No raw files to ingest; dataFiles is %s."%dataFiles

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if statement guarantees that won't display anything interesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataFiles could be None, False, [], (), {}, ''. It might be informative to know which of those it was.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, weakly typed languages. 😞 I'll write it with %r, then.

"""
if not calibDataFiles:
raise RuntimeError("No calib files to ingest.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, please include calibDataFiles in the raised string.

with tarfile.open(defectTarball, "r") as opened:
if opened.getNames():
if not os.path.isdir(defectDir):
os.mkdir(defectDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use pathlib here in place of if exist: mkdir, to avoid os.mkdir race conditions on parallel ingest:

pathlib.Path(defectDir).mkdir(parents=True, exist_ok=True)

https://stackoverflow.com/a/14364249
https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure parallel ingestion is possible, but your solution is more elegant.

butler = self._rawButler()
for datum in IngestionTestSuite.rawData:
dataId = {'visit': datum['visit']}
self.assertEqual(butler.datasetExists('raw', dataId), datum['file'] not in badFiles)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooog. I'm not too keen on this syntax (it was hard for me to parse), but I'm not sure of a better approach given the butler methods that exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do the approach of if in badFiles: self.assertFalse(...) else: self.assertTrue(...), but I'd find that ickier.

Attempts to ingest empty lists of files now raise an exception. Unit
tests have been updated to reflect the new behavior.
Path.mkdir provides a clean postcondition on the directory's
existence, unlike os.mkdir.
This factoring makes it easier to share the information among
test cases.
@kfindeisen kfindeisen merged commit 3af2236 into master May 31, 2018
@kfindeisen kfindeisen deleted the tickets/DM-13835 branch November 30, 2018 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants