Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-15614: Allow dynamic hierarchy delimiters for configs #84

Merged
merged 33 commits into from Sep 7, 2018

Conversation

timj
Copy link
Member

@timj timj commented Aug 31, 2018

The delimiter for hierarchies no longer defaults to a .. The delimiter is now a class variable and can be overridden by starting the string with the delimiter you wish to use.

@timj timj requested review from TallJimbo and ktlim August 31, 2018 19:27
@timj timj force-pushed the tickets/DM-15614 branch 4 times, most recently from 6659e57 to 190d07d Compare September 4, 2018 20:58
Copy link
Contributor

@ktlim ktlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only got through nameTuples() for now. Still need to look at names() and test_config.py.

"""Implements a datatype that is used by `Butler` for configuration
parameters.

It is essentially a `dict` with key/value pairs, including nested dicts
(as values). In fact, it can be initialized with a `dict`. The only caveat
is that keys may **not** contain dots (``.``). This is explained next:
is that keys may **not** contain the delimiter. This is explained next:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is backwards. You intend to allow any keys whatsoever; it's the accessor who must ensure that the delimiter does not appear in any keys. Instead, this reads like an admonishment to the key creator not to use an unknown delimiter.

string, such that ``foo["a", "b", "c"]`` is equivalent to
``foo[".a.b.c"]``. Finally, the delimiter can be escaped if it should
not be used in part of the string: ``foo[r".a.b\.c"]`` results in a two
element hierarchy of ``a`` and ``b.c``. For hard-coded strings it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you escape an initial non-alphanumeric character? Looking at the code, the answer is no. Maybe you should be able to, or else you should make clear that you can't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I want to support escaping of the delimiter at the start of the key. We can revisit that if it becomes an issue (I'm not convinced people will want to be escaping delimiters at all).

as the delimeter resulting in a single key of ``a.b.c`` being accessed.
Using ``foo[":a:b:c"]`` is therefore equivalent to ``foo[".a.b.c"]``.
This requires that keys in `Config` instances do not themselves start with
non-alphanumeric characters. If the hierarchy is already available in a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can access non-alphanumeric-starting keys by selecting a different delimiter: ":.a.b.c" should work, no?

@@ -140,6 +141,9 @@ class Config(_ConfigBase):
If `None` is provided an empty `Config` will be created.
"""

_D = "→"
"""Default internal delimiter to use for components in the hierarchy"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should say when this default is used.

else:
return [key, ]
# Allow escaping of the delimiter: .a.foo\.bar -> a foo.bar
escaped = f"\\{d}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe fr"\{d}"? Or that may be too fancy. What you have is fine.

temp = None
if escaped in key:
# Replace with a character that won't be in the string
temp = "\r"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hacky. Perhaps you should just use re.split(fr'(?<=[^\\])\{d}', key)? (The lookbehind should be safe because there should always be at least one non-delimiter character, but you should probably check for that, too: you wouldn't want to allow a key of "::a:b\:c".)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but I just realized you need to replace the quoted delimiters with the actual delimiter afterwards, which my re.split doesn't do. Sorry about that.

# Override the split for the simple case where there is an exact
# match. This allows `Config.items()` to work since `UserDict`
# accesses Config.data directly to obtain the keys and every top
# level key should always retrieve the top level values.
if name in data:
keys = (name,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems kind of wasteful to do another in and not just return the value here, but I guess you don't want to repeat the Config and _D code at the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it only goes once around the loop anyhow.

return data

def __setitem__(self, name, value):
keys = name.split(".")
keys = self._splitKeys(name)
last = keys.pop()
if isinstance(value, Config):
value = copy.deepcopy(value.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below this, I'm a little worried about the setdefault(key, {}). What if there's supposed to be a sequence in there? You'd start getting "3" keys instead of 3 indexes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config[".a.b.1"] = 5 works so long as /a/b is a sequence. Adding support for sequences is fairly new and has some rough edges. You can't say config[".a.b.10.c"] = 4 if /a/b/ is not a sequence or if /a/b is a sequence but only has 5 elements. If we started to want to support that there'd have to be some more deep thinking and we'd have to consider a ConfigSequence object I think. I'd like to defer going down that road until we need to.

if isinstance(d, collections.Sequence):
theseKeys = range(len(d))
else:
theseKeys = d.keys()
for key in theseKeys:
val = d[key]
levelKey = f"{base}.{key}" if base is not None else key
levelKey = base + (key,) if base is not None else (key,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tuples may include integers, not just strings. Do you need to stringify the numbers above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are tests for this and it's not an issue (names() does for stringification).

Copy link
Contributor

@ktlim ktlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have to check the tests.

return keys

def names(self, topLevelOnly=False, delimiter=None):
"""Get the delimited name of all the keys in the hierarchy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "a delimited name" instead of "the delimited name" since there isn't one canonical name.

If True, only the top level are returned.
delimiter : `str`, optional
Delimiter to use when forming the keys. The delimiter must
not be present in any of the keys. If `None` given the delimiter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, "a delimiter", not "the delimiter".


Returns
-------
names : `list`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list of str?


if delimiter is not None:
if delimiter in combined:
# For better error message we need to report the clashing key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping the backslash-quoting would allow you to use the provided delimiter even in this case, by quoting it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yes. You are right. I could escape the delimiter in the keys in this case.

for n in names:
for s in n:
if searchStr in s:
return s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to return s or n?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. n is more explicit.

delimiter = self._D
ntries = 0
while delimiter in combined:
log.debug(f"Trying delimiter '{delimiter}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message here isn't quite accurate; delimiter was already tried and found wanting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I had meant to remove that debug message anyhow.

@@ -586,7 +713,7 @@ def __init__(self, other=None, validate=True, mergeDefaults=True, searchPaths=No
# component.component (since the included files can themselves
# include the component name)
if self.component is not None:
doubled = "{0}.{0}".format(self.component)
doubled = (self.component, ) * 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less tricky to say (self.component, self.component) ?

The delimiter is now set as Config.D in a single location.
Furthermore, the code to process the string supplied by
the caller when addressing a config hierarchy is now
in a single location.  The logic has been expanded
to allow an iterable such as a list as well as a string.
Additionally, if the first character of the string is
not an alphanumeric character then that character
is assumed to be the delimiter for the given string.
Either using Config.D as a delimiter when building strings
dynamically, or by using a leading "." to allow the lookup
to be as close as possible to the previous form.
Starting with the delimiter used means that the string
will work even in the unlikely event that the default
delimiter is changed.
If the string being used to access the config entry is being
constructed from a variable, it is more efficient and safer
to pass in the components as a tuple() rather than having
to worry about a delimiter.
It was being used in four places and that seemed like three too
many places.
This is a clearer name and was requested in an earlier ticket
review but was not implemented at the time.
We weren't making use of this and it added metaclass
complication.
The delimiter is not allowed to clash with any of the keys.

If no delimiter is specified the default internal delimiter
will be used but if that is present in a key then a new delimiter
will be used.
No longer use delimiters in the code since it is always safer
to be explicit with a tuple at the expense of two more characters.
@@ -98,30 +99,53 @@ def testBadConfig(self):
with self.assertRaises(RuntimeError):
Config(badArg)

def testBasics(self):
c = Config({1: 2, 3: 4, "key3": 6})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem pretty minimal; can't you assert the actual strings or at least more of them or perhaps do a round-trip of the repr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do a round trip. I didn't really want to get bogged down in exact stringification form.

names = c.names()
nameTuples = c.nameTuples()
self.assertEqual(len(names), len(nameTuples))
self.assertGreater(len(names), 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why assertGreater here? Don't you know what the length should be?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to use Equal.

self.assertGreater(len(names), 5)
self.assertGreater(len(nameTuples), 5)

with self.assertRaises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would go away if we can do the quoted-delimiter thing.

@timj timj merged commit 389b180 into master Sep 7, 2018
@timj timj deleted the tickets/DM-15614 branch September 7, 2018 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants