configparser support for reading from strings and dictionaries #53697

ambv · 2010-08-02T00:31:15Z

BPO	9452
Nosy	@freddrake, @ezio-melotti, @merwok, @briancurtin, @ambv
Files	issue9452.diff: Patch that introduces read_string and read_dict + related bugfixes

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2010-08-09.12:53:41.249>
created_at = <Date 2010-08-02.00:31:14.969>
labels = ['type-feature', 'library']
title = 'configparser support for reading from strings and dictionaries'
updated_at = <Date 2010-08-09.12:53:41.247>
user = 'https://github.com/ambv'

bugs.python.org fields:

activity = <Date 2010-08-09.12:53:41.247>
actor = 'fdrake'
assignee = 'none'
closed = True
closed_date = <Date 2010-08-09.12:53:41.249>
closer = 'fdrake'
components = ['Library (Lib)']
creation = <Date 2010-08-02.00:31:14.969>
creator = 'lukasz.langa'
dependencies = []
files = ['18440']
hgrepos = []
issue_num = 9452
keywords = ['patch']
message_count = 19.0
messages = ['112409', '112410', '112413', '112416', '112429', '112510', '112520', '112614', '112617', '112720', '112736', '112799', '112949', '113057', '113058', '113059', '113075', '113298', '113411']
nosy_count = 5.0
nosy_names = ['fdrake', 'ezio.melotti', 'eric.araujo', 'brian.curtin', 'lukasz.langa']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue9452'
versions = ['Python 3.2']

ambv · 2010-08-02T00:31:12Z

Overview
--------

It's a fairly common need in configuration parsing to take configuration from a string or a Python data structure (most commonly, a dictionary). The attached patch introduces two new methods to RawConfigParser that serve this purpose: readstring and readdict. In the process, two behavioral bugs were detected and fixed.

Detailed information about the patch
------------------------------------
Source changes:

added readstring and readdict methods
fixed a bug in SafeConfigParser.set when setting a None value would raise an exception (the code tried to remove interpolation escapes from None)
added a new exception DuplicateOptionError, raised when during parsing a single file, string or a dictionary the same option occurs twice. This catches mispellings and case-sensitivity errors (parsers are by default case-insensitive in regard to options).
added checking for option duplicates described above in _read

Test changes:

self.fromstring is using readstring now
self.cf removed because it was bad design, now config parser instances are explicit everywhere
** changed definition of get_error and parse_error
split test_basic into two parts: config parser building and the actual test
** introduced config parser built from the dictionary
added a duplicate option checking case for string and dictionary based reading

Documentation changes:

documented readstring and readdict
documented DuplicateOptionError
explicit remark about the case-insensitivity
corrected remark about leading whitespace being removed from values (also trailing whitespace is, and for keys as well)

ambv · 2010-08-02T00:35:24Z

There goes the patch.

ambv · 2010-08-02T00:58:48Z

Patch updated after review by Ezio Melotti.

To answer a common question that came up in the review: all atypical names and implementation details are there due to consistency with existing configparser code, e.g.:

readstring ~= readfp (no _ between words)
DuplicateOptionError ~= DuplicateSectionError (not Duplicated)
all exceptions use old style BaseClass.__init__ and not super()

API won't change so this has to remain that way. Exceptions may be refactored in one go at a later stage.

briancurtin · 2010-08-02T02:38:42Z

Although you say this is fairly common, I haven't heard of anyone using or requesting this type of feature. Do you have any real-world use cases for this? Before we start adding more read methods I think we should know who wants them and why.

I'm not sure duplicates should raise exceptions. To me, the current behavior of using the last read section/option is fine. It's predictable and it works. Halting a program's operation due to duplicate sections/options seems a bit harsh to me.

ambv · 2010-08-02T08:46:04Z

Good questions, thanks! The answers will come useful for documentation and later hype :)

READING CONFIGURATION FROM A DATA STRUCTURE
-------------------------------------------

This is all about templating a decent set of default values. The major use case I'm using this for (with a homebrew SafeConfigParser subclass at the moment) is to provide in one place a set of defaults for the whole configuration. The so-called defaults= that we have at the moment don't fit this space well enough because they provide values that can (and will) jump into every section. This made them useless for me twice:

when configuring access to external components in a fairly complex system; abstracting out the useless details the template I was looking for was

[name-server]
port=
protocol=
verbose=

[workflow-manager]
port=
protocol=
verbose=

[legacy-integration]
port=
protocol=
verbose=

# there were about 15 of these

second case was a legacy CSV translation system (don't ask!). An abstract of a config with conflicting keys:

[company1-report]
delimiter=,
amount_column=
amount_type=
description_column=
description_type=
ignore_first_line=True

[company2-report]
delimiter=;
amount_column=
amount_type=
description_column=
description_type=
ignore_first_line=False

# and so on for ~10 entries

As you can see, in both examples defaults= couldn't be a good enough template. The reason I wanted these default values to be specified in the program was two-fold:

to be able to use the configuration without worrying about NoOptionErrors or fallback values on each get() invocation
to handle customers with existing configuration files which didn't include specific sections; if they didn't need customization they could simply use the defaults provided

I personally like the dictionary reading method but this is a matter of taste. Plus, .fromstring() is already used in unit tests :)

DUPLICATE OPTION VALIDATION
---------------------------

Firstly, I'd like to stress that this validation does NOT mean that we cannot update keys once they appear in configuration. Duplicate option detection works only while parsing a single file, string or dictionary. In this case duplicates are a configuration error and should be notified to the user.

You are right that for a programmer accepting the last value provided is acceptable. In this case the impact should be on the user who might not feel the same. If his configuration is ambiguous, it's best to use the Zen: "In the face of ambiguity, refuse the temptation to guess."

This is very much the case for large configuration files (take /etc/ssh/sshd_config or any real life ftpd config, etc.) when users might miss the fact that one option is uncommented in the body or thrown in at the end of the file by another admin or even the user himself.

Users might also be unaware of the case-insensitivity.

These two problems are even more likely to cause headaches for the dictionary reading algorithm where there actually isn't an order in the keys within a section and you can specify a couple of values that represent the same key because of the case-insensitivity. Plus, this is going to be even more visible once we introduce mapping protocol access when you can add a whole section with keys using the dictionary syntax.

Another argument is that there is already section duplicate validation but it doesn't work when reading from files. This means that the user might add two sections of the same name with contradicting options.

SUMMARY
-------
Reading from strings or dictionaries provides an additional way to feed the parser with values. Judging from the light complexity of both methods I would argue that it's beneficial to configparser users to have well factored unit tested methods for these tasks so they don't have to reimplement them over and over again when the need arises.

In terms of validation, after you remark and thinking about it for a while, I think that the best path may be to let programmers choose during parser initialization whether they want validation or not. This would be also a good place to include section duplicate validation during file reading. Should I provide an updated patch?

After a couple of years of experience with external customers configuring software I find it better for the software to aid me in customer support. This is the best solution when users can help themselves. And customers (and we ourselves, too!) do stupid things all the time. And so, specifying a default set of sane values AND checking for duplicates within the same section helps with that.

freddrake · 2010-08-02T18:41:49Z

Reading from a string is certainly fairly common, though I'm pretty happy with using an io.StringIO seems reasonable and straightforward.

I've never stumbled over the need to "read" from dictionaries as described.

ambv · 2010-08-02T19:25:20Z

Corrected a simple mistake in the patch.

ambv · 2010-08-03T14:38:09Z

Updated patch after discussion on #python-dev:

PEP-8 compliant names used: read_file, read_string, read_dict. readfp has been PendingDeprecated
documentation updates
option validation is now optional with the use of strict= argument in the parser's init
new unit tests introduced to check for the new behaviour

ambv · 2010-08-03T14:47:57Z

FTR, some people questioned the purpose of read_dict(). Let me summarize this very briefly here:

the API is using dictionaries similar to those in defaults= but has one level of depth more (sections)
initializing a parser with a dictionary produces syntax that is more natural in code
having a single method implementing reading a dictionary with unit tests, support for proper duplicate handling, etc. frees users from writing their own
we need that anyway for upcoming mapping protocol access discussed in bpo-5412
more detailed use cases in msg112429

ambv · 2010-08-03T23:04:39Z

Rietveld review link: http://codereview.appspot.com/1924042/edit

freddrake · 2010-08-04T01:44:55Z

I agree that the existing defaults={...} should never have been added to the stdlib. It made sense in the originating application, but should have been implemented differently to keep application-specific behavior out of what eventually was added to the stdlib.

Will think further about the rest of this when I'm on my own computer and can read & play with the patch in a more usable environment.

ambv · 2010-08-04T13:09:32Z

Patch updated after review by Ezio Melotti and Éric Araujo. Thanks guys.

freddrake · 2010-08-05T07:10:04Z

(Apparently I don't have the right permissions on Rietveld.)

Docstrings should be written in the standard PEP-8 way (single line
summary + additional explanation as needed following a blank line).
read_sting and read_dict should still take a filename argument for
use in messages, with and something like <data in ...> (with
the caller's file being filled in if not provided).
Indentation in the last read_dict of
test.test_cfgparser.BasicTestCase.test_basic_from_dict is
incconsistent with the previous read_dict in the same test.
Lines over 79 characters should be shortened. Most of these are in
docstrings, so just re-wrapping should be sufficient for most.
Changing the test structure to avoid self.cf may have been convenient,
but is unrelated to the actual functionality changes. In the future
such refactorings should be performed in separate patches. (Ordering
dependencies are fine, so long as they're noted in the relevant
issues.)
DuplicateOptionError is missing from __all__.
Changing the constructor to use keyword-only arguments carries some
backward-compatibility risk. That can be avvoided by removing that
change and adding strict=False at the end of the parameter list.

It's unlikely that this is a significant risk, since these parameters
generally lend themselves to keyword usage.

I think this should have been several separate patches:

refactoring (the self.cf changes in the tests)
addition of the DuplicateOptionError
the read_* methods (including the readfp deprecation)
the new "strict" option

Don't change that at this point, but please consider smaller chunks in
the future.

ambv · 2010-08-05T22:16:05Z

Updated patch after review by Fred Drake. Thanks, it was terrific!

Status:

Docstrings should be written in the standard PEP-8 way (single line
summary + additional explanation as needed following a blank line).

Corrected where applicable. Is it OK if the one-sentence summary is occasionally longer than one line? Check out DuplicateSectionError, could it have a summary as complete as this that would fit a single line?

On a similar note, an inconsistency of configparser.py is that sometimes a blank line is placed after the docstring and sometimes there is none. How would you want this to look vi in the end?

read_sting and read_dict should still take a filename argument for
use in messages, with and something like <data in ...> (with
the caller's file being filled in if not provided).

Are you sure about that? read_string might have some remote use for an argument like that, but I'd call it source= anyway. As for read_dict, the implementation does not even use _read(), so I don't know where could this potential filename= be used.

Indentation in the last read_dict of
test.test_cfgparser.BasicTestCase.test_basic_from_dict is
incconsistent with the previous read_dict in the same test.

Updated. All in all, some major unit test lipsticking should be done as a separate patch.

Lines over 79 characters should be shortened. Most of these are in
docstrings, so just re-wrapping should be sufficient for most.

Corrected, even made my Vim show me these kinds of formatting problems. I also corrected a couple of these which were there before the change.

Changing the test structure to avoid self.cf may have been convenient,
but is unrelated to the actual functionality changes. In the future
such refactorings should be performed in separate patches. (Ordering
dependencies are fine, so long as they're noted in the relevant
issues.)

Good point, thanks.

DuplicateOptionError is missing from __all__.

Corrected.

Changing the constructor to use keyword-only arguments carries some
backward-compatibility risk. That can be avvoided by removing that
change and adding strict=False at the end of the parameter list.

All of these arguments are new in trunk so there is no backwards compatibility here to think about. The arguments that are not new (defaults and dict_type) are placed before the asterisk.

Don't change that at this point, but please consider smaller chunks in
the future.

I will, thanks.

ambv · 2010-08-05T22:17:00Z

Ah, forgot to remind you that I don't have commit privileges yet.

merwok · 2010-08-05T22:18:45Z

> Docstrings should be written in the standard PEP-8 way (single line
> summary + additional explanation as needed following a blank line).
Corrected where applicable. Is it OK if the one-sentence summary is
occasionally longer than one line?

It’s a one-line summary, not one sentence. PEP-257 has all the details you’re looking for, and more.

freddrake · 2010-08-06T05:51:02Z

Summmary lines in docstrings are one line, as Éric points out.
They're summaries, so need not be complete. Use elaboration text as
needed, and omit anything that's not relevant in context. An
alternate wording to consider:

"""Raised by strict parsers for options repeated in an input source."""

In particular, there's no mention of what the kinds of sources are,
since that's not relevant for the exception itself.
I like the blank line before the ending """, others don't. More
important is consistency within the module, so feel free to coerce it
either way.
Perhaps read_* should all call the extra argument source= instead of
filename=. readfp should take filename and pass it as source
for compatibility. This still makes sense for read_file since the
"file" part of the name refers to the kind of object that's passed in,
not that the file represents a simple file on the filesystem.
Since the Duplicate*Error exceptions can now be raised when "reading"
(_read itself is an implementation detail, so that's irrelevant), they
should grow filename, lineno, line constructor arguments. These
should be filled in as much as possible when those exceptions are
raised.
Adding a source attribute to exceptions that have filename
attribute is reasonable; they should have the same value. (One should
be a property that mirrors the other.) The filename parameter and
attribute name must remain for compatibility.
There's at least one way to make vim show the too-long text with a
violent red background. Appropriate for code, sucks when reading
logfiles. But I'm not a vim user. Whatever tool makes it show up for
you is fine.
The constructors in Python 3.2 should be compatible with those from
Python 2.7 (IMO). They're currently not: allow_no_value should be
allowed positionally and come right after dict_type. (This problem
existed before your patch; it should be corrected first.)
In the docs, "Parse configuration from a dictionary" should probably
be "Load configuration from a dictionary"; most of the parsing effort
is skipped in this case, so this reads a little oddly.
Indentation in the DuplicateOptionError constructor is a bit weird in
the superclass constructor call. Something like this would be better:

    def __init__(self, section, option):
        Error.__init__(
            self,
            "Option %r in section %r already exists" % (option, section))
        self.section = section
        self.option = option
        self.args = (section, option)

ambv · 2010-08-08T19:02:44Z

Patch updated.

All docstrings now have a one-line summary.

All multiline docstrings now have a newline character before the closing """.

No method docstrings now include any additional newlines between them and the
code. Most of them were okay, a couple were edited for consistency.

All read_* methods now have a source= attribute. read_file defaults to <???>,
read_string to <string>, read_dict to <dict>. Didn't provide any additional
introspection because it's not trivial and I want this patch to be committed at
last. This might be an additional advantage because a generic <string> or <dict>
name may motivate programmers to specify a more context-specific source name of
their own.

As for Duplicate*Error, I've added source= and lineno= to both. Didn't add line=
because it's useless, the error is very specific on what is wrong. Reading from
a dictionary does not pass the lineno for obvious reasons. Reading from files
and strings does.

The filename' attribute and __init__ argument in ParsingError were PendingDeprecated and a source' attribute was introduced for consistency.

allow_no_value was moved to the 3rd place in the argument list for backwards
compatibility with Python 2.7. I didn't notice your change made to Python
2.7 as well, all other new arguments were added by me. This ensures there's no
backwards compatibility involved in their case. That's why I left all of the
new arguments as keyword only.

Documentation and unit tests updated.

BTW, if allow_no_value made it to 2.7 this means it has a bug in
SafeConfigParser.set. Try to set a valueless option, line 694 will raise an
exception. Should I provide you with a separate patch only to fix this for
2.7.1?

PS. I made Vim show me too long lines with a delicate red background. Way better
than the violent alternative ;) Plus, I only set it for the Python filetype.

freddrake · 2010-08-09T12:53:41Z

Patch committed on py3k branch in r83889.

ambv added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Aug 2, 2010

freddrake closed this as completed Aug 9, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

configparser support for reading from strings and dictionaries #53697

configparser support for reading from strings and dictionaries #53697

ambv commented Aug 2, 2010

ambv commented Aug 2, 2010

ambv commented Aug 2, 2010

ambv commented Aug 2, 2010

briancurtin commented Aug 2, 2010

ambv commented Aug 2, 2010

freddrake commented Aug 2, 2010

ambv commented Aug 2, 2010

ambv commented Aug 3, 2010

ambv commented Aug 3, 2010

ambv commented Aug 3, 2010

freddrake commented Aug 4, 2010

ambv commented Aug 4, 2010

freddrake commented Aug 5, 2010

ambv commented Aug 5, 2010

ambv commented Aug 5, 2010

merwok commented Aug 5, 2010

freddrake commented Aug 6, 2010

ambv commented Aug 8, 2010

freddrake commented Aug 9, 2010

configparser support for reading from strings and dictionaries #53697

configparser support for reading from strings and dictionaries #53697

Comments

ambv commented Aug 2, 2010

ambv commented Aug 2, 2010

ambv commented Aug 2, 2010

ambv commented Aug 2, 2010

briancurtin commented Aug 2, 2010

ambv commented Aug 2, 2010

freddrake commented Aug 2, 2010

ambv commented Aug 2, 2010

ambv commented Aug 3, 2010

ambv commented Aug 3, 2010

ambv commented Aug 3, 2010

freddrake commented Aug 4, 2010

ambv commented Aug 4, 2010

freddrake commented Aug 5, 2010

ambv commented Aug 5, 2010

ambv commented Aug 5, 2010

merwok commented Aug 5, 2010

freddrake commented Aug 6, 2010

ambv commented Aug 8, 2010

freddrake commented Aug 9, 2010