New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15614: Allow dynamic hierarchy delimiters for configs #84
Conversation
6659e57
to
190d07d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only got through nameTuples()
for now. Still need to look at names()
and test_config.py
.
"""Implements a datatype that is used by `Butler` for configuration | ||
parameters. | ||
|
||
It is essentially a `dict` with key/value pairs, including nested dicts | ||
(as values). In fact, it can be initialized with a `dict`. The only caveat | ||
is that keys may **not** contain dots (``.``). This is explained next: | ||
is that keys may **not** contain the delimiter. This is explained next: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is backwards. You intend to allow any keys whatsoever; it's the accessor who must ensure that the delimiter does not appear in any keys. Instead, this reads like an admonishment to the key creator not to use an unknown delimiter.
string, such that ``foo["a", "b", "c"]`` is equivalent to | ||
``foo[".a.b.c"]``. Finally, the delimiter can be escaped if it should | ||
not be used in part of the string: ``foo[r".a.b\.c"]`` results in a two | ||
element hierarchy of ``a`` and ``b.c``. For hard-coded strings it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you escape an initial non-alphanumeric character? Looking at the code, the answer is no. Maybe you should be able to, or else you should make clear that you can't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I want to support escaping of the delimiter at the start of the key. We can revisit that if it becomes an issue (I'm not convinced people will want to be escaping delimiters at all).
as the delimeter resulting in a single key of ``a.b.c`` being accessed. | ||
Using ``foo[":a:b:c"]`` is therefore equivalent to ``foo[".a.b.c"]``. | ||
This requires that keys in `Config` instances do not themselves start with | ||
non-alphanumeric characters. If the hierarchy is already available in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can access non-alphanumeric-starting keys by selecting a different delimiter: ":.a.b.c"
should work, no?
@@ -140,6 +141,9 @@ class Config(_ConfigBase): | |||
If `None` is provided an empty `Config` will be created. | |||
""" | |||
|
|||
_D = "→" | |||
"""Default internal delimiter to use for components in the hierarchy""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should say when this default is used.
else: | ||
return [key, ] | ||
# Allow escaping of the delimiter: .a.foo\.bar -> a foo.bar | ||
escaped = f"\\{d}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe fr"\{d}"
? Or that may be too fancy. What you have is fine.
temp = None | ||
if escaped in key: | ||
# Replace with a character that won't be in the string | ||
temp = "\r" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hacky. Perhaps you should just use re.split(fr'(?<=[^\\])\{d}', key)
? (The lookbehind should be safe because there should always be at least one non-delimiter character, but you should probably check for that, too: you wouldn't want to allow a key of "::a:b\:c"
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but I just realized you need to replace the quoted delimiters with the actual delimiter afterwards, which my re.split
doesn't do. Sorry about that.
# Override the split for the simple case where there is an exact | ||
# match. This allows `Config.items()` to work since `UserDict` | ||
# accesses Config.data directly to obtain the keys and every top | ||
# level key should always retrieve the top level values. | ||
if name in data: | ||
keys = (name,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems kind of wasteful to do another in
and not just return the value here, but I guess you don't want to repeat the Config
and _D
code at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it only goes once around the loop anyhow.
return data | ||
|
||
def __setitem__(self, name, value): | ||
keys = name.split(".") | ||
keys = self._splitKeys(name) | ||
last = keys.pop() | ||
if isinstance(value, Config): | ||
value = copy.deepcopy(value.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below this, I'm a little worried about the setdefault(key, {})
. What if there's supposed to be a sequence in there? You'd start getting "3"
keys instead of 3
indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config[".a.b.1"] = 5
works so long as /a/b
is a sequence. Adding support for sequences is fairly new and has some rough edges. You can't say config[".a.b.10.c"] = 4
if /a/b/
is not a sequence or if /a/b
is a sequence but only has 5 elements. If we started to want to support that there'd have to be some more deep thinking and we'd have to consider a ConfigSequence object I think. I'd like to defer going down that road until we need to.
if isinstance(d, collections.Sequence): | ||
theseKeys = range(len(d)) | ||
else: | ||
theseKeys = d.keys() | ||
for key in theseKeys: | ||
val = d[key] | ||
levelKey = f"{base}.{key}" if base is not None else key | ||
levelKey = base + (key,) if base is not None else (key,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tuples may include integers, not just strings. Do you need to stringify the numbers above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are tests for this and it's not an issue (names()
does for stringification).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have to check the tests.
return keys | ||
|
||
def names(self, topLevelOnly=False, delimiter=None): | ||
"""Get the delimited name of all the keys in the hierarchy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "a delimited name" instead of "the delimited name" since there isn't one canonical name.
If True, only the top level are returned. | ||
delimiter : `str`, optional | ||
Delimiter to use when forming the keys. The delimiter must | ||
not be present in any of the keys. If `None` given the delimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, "a delimiter", not "the delimiter".
|
||
Returns | ||
------- | ||
names : `list` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list of str
?
|
||
if delimiter is not None: | ||
if delimiter in combined: | ||
# For better error message we need to report the clashing key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping the backslash-quoting would allow you to use the provided delimiter even in this case, by quoting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Yes. You are right. I could escape the delimiter in the keys in this case.
for n in names: | ||
for s in n: | ||
if searchStr in s: | ||
return s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to return s
or n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. n
is more explicit.
delimiter = self._D | ||
ntries = 0 | ||
while delimiter in combined: | ||
log.debug(f"Trying delimiter '{delimiter}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message here isn't quite accurate; delimiter
was already tried and found wanting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I had meant to remove that debug message anyhow.
@@ -586,7 +713,7 @@ def __init__(self, other=None, validate=True, mergeDefaults=True, searchPaths=No | |||
# component.component (since the included files can themselves | |||
# include the component name) | |||
if self.component is not None: | |||
doubled = "{0}.{0}".format(self.component) | |||
doubled = (self.component, ) * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less tricky to say (self.component, self.component)
?
The delimiter is now set as Config.D in a single location. Furthermore, the code to process the string supplied by the caller when addressing a config hierarchy is now in a single location. The logic has been expanded to allow an iterable such as a list as well as a string. Additionally, if the first character of the string is not an alphanumeric character then that character is assumed to be the delimiter for the given string.
Either using Config.D as a delimiter when building strings dynamically, or by using a leading "." to allow the lookup to be as close as possible to the previous form.
Starting with the delimiter used means that the string will work even in the unlikely event that the default delimiter is changed.
If the string being used to access the config entry is being constructed from a variable, it is more efficient and safer to pass in the components as a tuple() rather than having to worry about a delimiter.
It was being used in four places and that seemed like three too many places.
This is a clearer name and was requested in an earlier ticket review but was not implemented at the time.
We weren't making use of this and it added metaclass complication.
The delimiter is not allowed to clash with any of the keys. If no delimiter is specified the default internal delimiter will be used but if that is present in a key then a new delimiter will be used.
No longer use delimiters in the code since it is always safer to be explicit with a tuple at the expense of two more characters.
6a330a4
to
b05e83d
Compare
tests/test_config.py
Outdated
@@ -98,30 +99,53 @@ def testBadConfig(self): | |||
with self.assertRaises(RuntimeError): | |||
Config(badArg) | |||
|
|||
def testBasics(self): | |||
c = Config({1: 2, 3: 4, "key3": 6}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seem pretty minimal; can't you assert the actual strings or at least more of them or perhaps do a round-trip of the repr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do a round trip. I didn't really want to get bogged down in exact stringification form.
tests/test_config.py
Outdated
names = c.names() | ||
nameTuples = c.nameTuples() | ||
self.assertEqual(len(names), len(nameTuples)) | ||
self.assertGreater(len(names), 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why assertGreater
here? Don't you know what the length should be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to use Equal.
tests/test_config.py
Outdated
self.assertGreater(len(names), 5) | ||
self.assertGreater(len(nameTuples), 5) | ||
|
||
with self.assertRaises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would go away if we can do the quoted-delimiter thing.
Now the delimiter in a key is escaped.
The delimiter for hierarchies no longer defaults to a
.
. The delimiter is now a class variable and can be overridden by starting the string with the delimiter you wish to use.