Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configparser support for reading from strings and dictionaries #53697

Closed
ambv opened this issue Aug 2, 2010 · 19 comments
Closed

configparser support for reading from strings and dictionaries #53697

ambv opened this issue Aug 2, 2010 · 19 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ambv
Copy link
Contributor

ambv commented Aug 2, 2010

BPO 9452
Nosy @freddrake, @ezio-melotti, @merwok, @briancurtin, @ambv
Files
  • issue9452.diff: Patch that introduces read_string and read_dict + related bugfixes
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-08-09.12:53:41.249>
    created_at = <Date 2010-08-02.00:31:14.969>
    labels = ['type-feature', 'library']
    title = 'configparser support for reading from strings and dictionaries'
    updated_at = <Date 2010-08-09.12:53:41.247>
    user = 'https://github.com/ambv'

    bugs.python.org fields:

    activity = <Date 2010-08-09.12:53:41.247>
    actor = 'fdrake'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-08-09.12:53:41.249>
    closer = 'fdrake'
    components = ['Library (Lib)']
    creation = <Date 2010-08-02.00:31:14.969>
    creator = 'lukasz.langa'
    dependencies = []
    files = ['18440']
    hgrepos = []
    issue_num = 9452
    keywords = ['patch']
    message_count = 19.0
    messages = ['112409', '112410', '112413', '112416', '112429', '112510', '112520', '112614', '112617', '112720', '112736', '112799', '112949', '113057', '113058', '113059', '113075', '113298', '113411']
    nosy_count = 5.0
    nosy_names = ['fdrake', 'ezio.melotti', 'eric.araujo', 'brian.curtin', 'lukasz.langa']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9452'
    versions = ['Python 3.2']

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 2, 2010

    Overview
    --------

    It's a fairly common need in configuration parsing to take configuration from a string or a Python data structure (most commonly, a dictionary). The attached patch introduces two new methods to RawConfigParser that serve this purpose: readstring and readdict. In the process, two behavioral bugs were detected and fixed.

    Detailed information about the patch
    ------------------------------------
    Source changes:

    • added readstring and readdict methods
    • fixed a bug in SafeConfigParser.set when setting a None value would raise an exception (the code tried to remove interpolation escapes from None)
    • added a new exception DuplicateOptionError, raised when during parsing a single file, string or a dictionary the same option occurs twice. This catches mispellings and case-sensitivity errors (parsers are by default case-insensitive in regard to options).
    • added checking for option duplicates described above in _read

    Test changes:

    • self.fromstring is using readstring now
    • self.cf removed because it was bad design, now config parser instances are explicit everywhere
      ** changed definition of get_error and parse_error
    • split test_basic into two parts: config parser building and the actual test
      ** introduced config parser built from the dictionary
    • added a duplicate option checking case for string and dictionary based reading

    Documentation changes:

    • documented readstring and readdict
    • documented DuplicateOptionError
    • explicit remark about the case-insensitivity
    • corrected remark about leading whitespace being removed from values (also trailing whitespace is, and for keys as well)

    @ambv ambv added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Aug 2, 2010
    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 2, 2010

    There goes the patch.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 2, 2010

    Patch updated after review by Ezio Melotti.

    To answer a common question that came up in the review: all atypical names and implementation details are there due to consistency with existing configparser code, e.g.:

    • readstring ~= readfp (no _ between words)
    • DuplicateOptionError ~= DuplicateSectionError (not Duplicated)
    • all exceptions use old style BaseClass.__init__ and not super()

    API won't change so this has to remain that way. Exceptions may be refactored in one go at a later stage.

    @briancurtin
    Copy link
    Member

    Although you say this is fairly common, I haven't heard of anyone using or requesting this type of feature. Do you have any real-world use cases for this? Before we start adding more read methods I think we should know who wants them and why.

    I'm not sure duplicates should raise exceptions. To me, the current behavior of using the last read section/option is fine. It's predictable and it works. Halting a program's operation due to duplicate sections/options seems a bit harsh to me.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 2, 2010

    Good questions, thanks! The answers will come useful for documentation and later hype :)

    READING CONFIGURATION FROM A DATA STRUCTURE
    -------------------------------------------

    This is all about templating a decent set of default values. The major use case I'm using this for (with a homebrew SafeConfigParser subclass at the moment) is to provide in one place a set of defaults for the whole configuration. The so-called defaults= that we have at the moment don't fit this space well enough because they provide values that can (and will) jump into every section. This made them useless for me twice:

    • when configuring access to external components in a fairly complex system; abstracting out the useless details the template I was looking for was

    [name-server]
    port=
    protocol=
    verbose=

    [workflow-manager]
    port=
    protocol=
    verbose=

    [legacy-integration]
    port=
    protocol=
    verbose=

    # there were about 15 of these

    • second case was a legacy CSV translation system (don't ask!). An abstract of a config with conflicting keys:

    [company1-report]
    delimiter=,
    amount_column=
    amount_type=
    description_column=
    description_type=
    ignore_first_line=True

    [company2-report]
    delimiter=;
    amount_column=
    amount_type=
    description_column=
    description_type=
    ignore_first_line=False

    # and so on for ~10 entries

    As you can see, in both examples defaults= couldn't be a good enough template. The reason I wanted these default values to be specified in the program was two-fold:

    1. to be able to use the configuration without worrying about NoOptionErrors or fallback values on each get() invocation

    2. to handle customers with existing configuration files which didn't include specific sections; if they didn't need customization they could simply use the defaults provided

    I personally like the dictionary reading method but this is a matter of taste. Plus, .fromstring() is already used in unit tests :)

    DUPLICATE OPTION VALIDATION
    ---------------------------

    Firstly, I'd like to stress that this validation does NOT mean that we cannot update keys once they appear in configuration. Duplicate option detection works only while parsing a single file, string or dictionary. In this case duplicates are a configuration error and should be notified to the user.

    You are right that for a programmer accepting the last value provided is acceptable. In this case the impact should be on the user who might not feel the same. If his configuration is ambiguous, it's best to use the Zen: "In the face of ambiguity, refuse the temptation to guess."

    This is very much the case for large configuration files (take /etc/ssh/sshd_config or any real life ftpd config, etc.) when users might miss the fact that one option is uncommented in the body or thrown in at the end of the file by another admin or even the user himself.

    Users might also be unaware of the case-insensitivity.

    These two problems are even more likely to cause headaches for the dictionary reading algorithm where there actually isn't an order in the keys within a section and you can specify a couple of values that represent the same key because of the case-insensitivity. Plus, this is going to be even more visible once we introduce mapping protocol access when you can add a whole section with keys using the dictionary syntax.

    Another argument is that there is already section duplicate validation but it doesn't work when reading from files. This means that the user might add two sections of the same name with contradicting options.

    SUMMARY
    -------
    Reading from strings or dictionaries provides an additional way to feed the parser with values. Judging from the light complexity of both methods I would argue that it's beneficial to configparser users to have well factored unit tested methods for these tasks so they don't have to reimplement them over and over again when the need arises.

    In terms of validation, after you remark and thinking about it for a while, I think that the best path may be to let programmers choose during parser initialization whether they want validation or not. This would be also a good place to include section duplicate validation during file reading. Should I provide an updated patch?

    After a couple of years of experience with external customers configuring software I find it better for the software to aid me in customer support. This is the best solution when users can help themselves. And customers (and we ourselves, too!) do stupid things all the time. And so, specifying a default set of sane values AND checking for duplicates within the same section helps with that.

    @freddrake
    Copy link
    Member

    Reading from a string is certainly fairly common, though I'm pretty happy with using an io.StringIO seems reasonable and straightforward.

    I've never stumbled over the need to "read" from dictionaries as described.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 2, 2010

    Corrected a simple mistake in the patch.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 3, 2010

    Updated patch after discussion on #python-dev:

    • PEP-8 compliant names used: read_file, read_string, read_dict. readfp has been PendingDeprecated
    • documentation updates
    • option validation is now optional with the use of strict= argument in the parser's init
    • new unit tests introduced to check for the new behaviour

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 3, 2010

    FTR, some people questioned the purpose of read_dict(). Let me summarize this very briefly here:

    • the API is using dictionaries similar to those in defaults= but has one level of depth more (sections)
    • initializing a parser with a dictionary produces syntax that is more natural in code
    • having a single method implementing reading a dictionary with unit tests, support for proper duplicate handling, etc. frees users from writing their own
    • we need that anyway for upcoming mapping protocol access discussed in bpo-5412
    • more detailed use cases in msg112429

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 3, 2010

    Rietveld review link: http://codereview.appspot.com/1924042/edit

    @freddrake
    Copy link
    Member

    I agree that the existing defaults={...} should never have been added to the stdlib. It made sense in the originating application, but should have been implemented differently to keep application-specific behavior out of what eventually was added to the stdlib.

    Will think further about the rest of this when I'm on my own computer and can read & play with the patch in a more usable environment.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 4, 2010

    Patch updated after review by Ezio Melotti and Éric Araujo. Thanks guys.

    @freddrake
    Copy link
    Member

    (Apparently I don't have the right permissions on Rietveld.)

    • Docstrings should be written in the standard PEP-8 way (single line
      summary + additional explanation as needed following a blank line).

    • read_sting and read_dict should still take a filename argument for
      use in messages, with and something like <data in ...> (with
      the caller's file being filled in if not provided).

    • Indentation in the last read_dict of
      test.test_cfgparser.BasicTestCase.test_basic_from_dict is
      incconsistent with the previous read_dict in the same test.

    • Lines over 79 characters should be shortened. Most of these are in
      docstrings, so just re-wrapping should be sufficient for most.

    • Changing the test structure to avoid self.cf may have been convenient,
      but is unrelated to the actual functionality changes. In the future
      such refactorings should be performed in separate patches. (Ordering
      dependencies are fine, so long as they're noted in the relevant
      issues.)

    • DuplicateOptionError is missing from __all__.

    • Changing the constructor to use keyword-only arguments carries some
      backward-compatibility risk. That can be avvoided by removing that
      change and adding strict=False at the end of the parameter list.

      It's unlikely that this is a significant risk, since these parameters
      generally lend themselves to keyword usage.

    I think this should have been several separate patches:

    • refactoring (the self.cf changes in the tests)

    • addition of the DuplicateOptionError

    • the read_* methods (including the readfp deprecation)

    • the new "strict" option

    Don't change that at this point, but please consider smaller chunks in
    the future.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 5, 2010

    Updated patch after review by Fred Drake. Thanks, it was terrific!

    Status:

    Docstrings should be written in the standard PEP-8 way (single line
    summary + additional explanation as needed following a blank line).

    Corrected where applicable. Is it OK if the one-sentence summary is occasionally longer than one line? Check out DuplicateSectionError, could it have a summary as complete as this that would fit a single line?

    On a similar note, an inconsistency of configparser.py is that sometimes a blank line is placed after the docstring and sometimes there is none. How would you want this to look vi in the end?

    read_sting and read_dict should still take a filename argument for
    use in messages, with and something like <data in ...> (with
    the caller's file being filled in if not provided).

    Are you sure about that? read_string might have some remote use for an argument like that, but I'd call it source= anyway. As for read_dict, the implementation does not even use _read(), so I don't know where could this potential filename= be used.

    Indentation in the last read_dict of
    test.test_cfgparser.BasicTestCase.test_basic_from_dict is
    incconsistent with the previous read_dict in the same test.

    Updated. All in all, some major unit test lipsticking should be done as a separate patch.

    Lines over 79 characters should be shortened. Most of these are in
    docstrings, so just re-wrapping should be sufficient for most.

    Corrected, even made my Vim show me these kinds of formatting problems. I also corrected a couple of these which were there before the change.

    Changing the test structure to avoid self.cf may have been convenient,
    but is unrelated to the actual functionality changes. In the future
    such refactorings should be performed in separate patches. (Ordering
    dependencies are fine, so long as they're noted in the relevant
    issues.)

    Good point, thanks.

    DuplicateOptionError is missing from __all__.

    Corrected.

    Changing the constructor to use keyword-only arguments carries some
    backward-compatibility risk. That can be avvoided by removing that
    change and adding strict=False at the end of the parameter list.

    All of these arguments are new in trunk so there is no backwards compatibility here to think about. The arguments that are not new (defaults and dict_type) are placed before the asterisk.

    Don't change that at this point, but please consider smaller chunks in
    the future.

    I will, thanks.

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 5, 2010

    Ah, forgot to remind you that I don't have commit privileges yet.

    @merwok
    Copy link
    Member

    merwok commented Aug 5, 2010

    > Docstrings should be written in the standard PEP-8 way (single line
    > summary + additional explanation as needed following a blank line).
    Corrected where applicable. Is it OK if the one-sentence summary is
    occasionally longer than one line?

    It’s a one-line summary, not one sentence. PEP-257 has all the details you’re looking for, and more.

    @freddrake
    Copy link
    Member

    • Summmary lines in docstrings are one line, as Éric points out.
      They're summaries, so need not be complete. Use elaboration text as
      needed, and omit anything that's not relevant in context. An
      alternate wording to consider:

      """Raised by strict parsers for options repeated in an input source."""

      In particular, there's no mention of what the kinds of sources are,
      since that's not relevant for the exception itself.

    • I like the blank line before the ending """, others don't. More
      important is consistency within the module, so feel free to coerce it
      either way.

    • Perhaps read_* should all call the extra argument source= instead of
      filename=. readfp should take filename and pass it as source
      for compatibility. This still makes sense for read_file since the
      "file" part of the name refers to the kind of object that's passed in,
      not that the file represents a simple file on the filesystem.

    • Since the Duplicate*Error exceptions can now be raised when "reading"
      (_read itself is an implementation detail, so that's irrelevant), they
      should grow filename, lineno, line constructor arguments. These
      should be filled in as much as possible when those exceptions are
      raised.

    • Adding a source attribute to exceptions that have filename
      attribute is reasonable; they should have the same value. (One should
      be a property that mirrors the other.) The filename parameter and
      attribute name must remain for compatibility.

    • There's at least one way to make vim show the too-long text with a
      violent red background. Appropriate for code, sucks when reading
      logfiles. But I'm not a vim user. Whatever tool makes it show up for
      you is fine.

    • The constructors in Python 3.2 should be compatible with those from
      Python 2.7 (IMO). They're currently not: allow_no_value should be
      allowed positionally and come right after dict_type. (This problem
      existed before your patch; it should be corrected first.)

    • In the docs, "Parse configuration from a dictionary" should probably
      be "Load configuration from a dictionary"; most of the parsing effort
      is skipped in this case, so this reads a little oddly.

    • Indentation in the DuplicateOptionError constructor is a bit weird in
      the superclass constructor call. Something like this would be better:

        def __init__(self, section, option):
            Error.__init__(
                self,
                "Option %r in section %r already exists" % (option, section))
            self.section = section
            self.option = option
            self.args = (section, option)

    @ambv
    Copy link
    Contributor Author

    ambv commented Aug 8, 2010

    Patch updated.

    All docstrings now have a one-line summary.

    All multiline docstrings now have a newline character before the closing """.

    No method docstrings now include any additional newlines between them and the
    code. Most of them were okay, a couple were edited for consistency.

    All read_* methods now have a source= attribute. read_file defaults to <???>,
    read_string to <string>, read_dict to <dict>. Didn't provide any additional
    introspection because it's not trivial and I want this patch to be committed at
    last. This might be an additional advantage because a generic <string> or <dict>
    name may motivate programmers to specify a more context-specific source name of
    their own.

    As for Duplicate*Error, I've added source= and lineno= to both. Didn't add line=
    because it's useless, the error is very specific on what is wrong. Reading from
    a dictionary does not pass the lineno for obvious reasons. Reading from files
    and strings does.

    The filename' attribute and __init__ argument in ParsingError were PendingDeprecated and a source' attribute was introduced for consistency.

    allow_no_value was moved to the 3rd place in the argument list for backwards
    compatibility with Python 2.7. I didn't notice your change made to Python
    2.7 as well, all other new arguments were added by me. This ensures there's no
    backwards compatibility involved in their case. That's why I left all of the
    new arguments as keyword only.

    Documentation and unit tests updated.

    BTW, if allow_no_value made it to 2.7 this means it has a bug in
    SafeConfigParser.set. Try to set a valueless option, line 694 will raise an
    exception. Should I provide you with a separate patch only to fix this for
    2.7.1?

    PS. I made Vim show me too long lines with a delicate red background. Way better
    than the violent alternative ;) Plus, I only set it for the Python filetype.

    @freddrake
    Copy link
    Member

    Patch committed on py3k branch in r83889.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants