Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-21849: adapt to collections changes, implement RFC-663. #44

Merged
merged 6 commits into from Mar 25, 2020

Conversation

TallJimbo
Copy link
Member

No description provided.

exists: bool


@dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a dataclass? It seems like a normal class with a bunch of code in it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be; I just tend to start prototyping things as dataclasses because I like to think about data structures before methods. These classes were the last thing I was working on before I had to step away from it, and they just never got cleaned up (or documented, as I'm sure you noticed).

This commit probably should have been part of DM-21246.
if args.replace_run:
replaced, _ = self.output.chain[0]
inputs = self.output.chain[1:]
_LOG.debug(f"Simulating collection search in '{self.output.name}' "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer to defer the string interpolation in log messages (especially for debug) so should not use f-strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to believe that could possibly ever matter in Python (I think it's a dubious habit we've gotten into because it does make sense in C++), but it is at least precedent. Will change.

Copy link
Collaborator

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK as far as I can understand all new stuff. Few minor comments.

dest = getattr(namespace, self.dest, {})
# In case default is set to a dict (empty or not) we want to use
dest = getattr(namespace, self.dest, [])
# In case default is set to a list (empty or not) we want to use
# new deep copy of that dictionary as initial value to avoid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# new deep copy of that dictionary as initial value to avoid
# new copy of that list as initial value to avoid

"Comma-separated names of the input collection(s). Any entry "
"includes a colon (:), the first string is a dataset type name "
"that restricts the search in that collection. "
"May be passed multiple times (all arguments are concatenated). "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"May be passed multiple times (all arguments are concatenated). "
"May be passed multiple times (all arguments are concatenated)."

help=(
"Instead of creating a new RUN collection, insert datasets into "
"either the one given by --output-run (if provided) or the first "
"child collection of --output (which must be of type RUN). "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"child collection of --output (which must be of type RUN). "
"child collection of --output (which must be of type RUN)."

help=(
"Delete the datasets in the collection replaced by --replace-run, "
"either just from the datastore ('unstore') or by removing them "
"and the RUN completely ('purge'). Requires --replace-run."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use these values with --replace-run option instead of using a separate option? (Like --replace-run, --replace-run=unstore, --replace-run=purge)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find arguments that only sometimes take a value a little unusual.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not uncommon, but implementing it with Python argparse may not be trivial. I'm ok with whatever you decide.

if args.extend_run:
runName, _ = self.output.chain[0]
else:
runName = "{}/{:%Y%m%dT%Hh%Mm%Ss}".format(self.output, datetime.datetime.now())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make time format be more ISO-compatible? If someone wants to parse that string it will be easier to do it without "hms" in the string (but then I don't know if anyone would want to parse it).
Is this supposed to produce unique string or just a random one (I'm thinking about collisions in shared registry)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to remove the hms if you think a version with no time delimiter is better; I was worried it would be unreadable with nothing, but I didn't want to use : in a directory name.

It should be unique, but all the scenarios I can imagine in which there might be a collision would be indicative of a higher-level logic problem:

  • if we start parallel jobs that are supposed to write to the same run collection, and they race to create it and get to pick their own timestamps, it's a problem if they end up with different run names - any code doing that should create the run itself first and pass that down to the parallel jobs;
  • if different people start distinct jobs with the same chained collection name, they're either not coordinating a joint processing campaign well or one of them is writing to someone else's namespace, and there should be an error.

I believe the fact that --extend-run or --replace-run are valid if and only if the run already exists guards against both cases, even if it's only the loser of the race that sees the error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I myself am used to ISO format so I have no problem reading that. Colons are certainly better be avoided in paths. If it is just a random/unique name and no one is supposed to convert it back to timestamp then I guess it does not matter, should be ok to keep hms.

For uniqueness my worry was about use case when the same person starts multiple jobs which should write to unique output run collections, that could cause a collision with time-based unique names. I guess for that use case it's reasonable to require that --output option be unique?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would milliseconds being included in the date be enough?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Femtoseconds would probably work better 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything I'd be more inclined to round to the nearest 5 or 10 seconds, because making the names more likely to be unique seems more likely to mask higher-level bugs - in other words, I think it's much more likely for a user to start multiple jobs at the same time that they expect to write to the same run (e.g. by launching multiple processes themselves instead of using our tools to launch multiple processes), and it's better for that to fail than for it to write to different runs. But 5 seconds is probably too long for quick try-fail-repeat debugging work, and it's not like relying on time collisions is a reliable way to avoid that problem anyway. But I really don't think there is a problem with 1s resolution.

self.assertEqual(args.input, {})

def assertExpressionsEquivalent(a, b):
"""Test that input collection expressions are requivalent after
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requivalent

TallJimbo and others added 5 commits March 21, 2020 20:45
This passes (unit) tests, but it's not as complete as I'd like it to
be:

 - it removes tests for removed command-line arguments, but doesn't
   add tests for new command-line arguments;

 - the --skip-existing argument feels like it now needs more
   flexibility in defining what "existing" means (see code comment);

 - the new helper dataclasses in cmdLineFwk.py lack API docs (they're
   sufficiently internal that they probably shouldn't land in HTML,
   but they still ought to have docs in the code);

 - I'm not certain it was a good idea to retain the
   "datasetType:collection" syntax for input collections, given that
   it's really now more like a mapping that's oriented the other way
   (whether that was a good idea or not is another question - my
   attitude is that it's an edge-case feature either way, and this way
   lets ctrl_mpexec leverage existing daf_butler features better).

And, the big one: this probably should not be assumed to actually be
working until ci_hsc_gen3 also passes, and I haven't tried that yet.
@TallJimbo TallJimbo merged commit 0418758 into master Mar 25, 2020
@TallJimbo TallJimbo deleted the tickets/DM-21849 branch March 25, 2020 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants