Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Object/structseq.c to match namedtuple and tuple api #46145

Open
tiran opened this issue Jan 14, 2008 · 45 comments
Open

Enhance Object/structseq.c to match namedtuple and tuple api #46145

tiran opened this issue Jan 14, 2008 · 45 comments
Labels
3.13 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@tiran
Copy link
Member

tiran commented Jan 14, 2008

BPO 1820
Nosy @malemburg, @birkenfeld, @rhettinger, @terryjreedy, @amauryfa, @abalkin, @salty-horse, @jackdied, @ericvsmith, @giampaolo, @tiran, @benjaminp, @merwok, @ericsnowcurrently
Files
  • structseq_subclasses_tuple.diff: patch which makes structseq a subclass of tuple
  • structseq.diff: Tuple subclass patch with tests.
  • structseq_1.patch: Add _fields, _replace, _asdict and eval(repr(s))
  • structseq_fields.patch: Add only _fields
  • structseq_repr_issue.diff: repr() issue with unnamed fields
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2008-01-14.04:20:27.181>
    labels = ['extension-modules', 'easy', 'type-feature', '3.9']
    title = 'Enhance Object/structseq.c to match namedtuple and tuple api'
    updated_at = <Date 2020-01-15.21:54:55.660>
    user = 'https://github.com/tiran'

    bugs.python.org fields:

    activity = <Date 2020-01-15.21:54:55.660>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2008-01-14.04:20:27.181>
    creator = 'christian.heimes'
    dependencies = []
    files = ['9159', '9170', '32525', '35367', '35802']
    hgrepos = []
    issue_num = 1820
    keywords = ['patch', 'easy']
    message_count = 42.0
    messages = ['59888', '59891', '59949', '59955', '59956', '59957', '62830', '62831', '81627', '85097', '90461', '90472', '90506', '90953', '104559', '104562', '109503', '113451', '113453', '113456', '133351', '133355', '133364', '133367', '191872', '192849', '198291', '202330', '202333', '207994', '217301', '217337', '217365', '217468', '217512', '219037', '219150', '219297', '219381', '220315', '220316', '221902']
    nosy_count = 21.0
    nosy_names = ['lemburg', 'georg.brandl', 'rhettinger', 'terry.reedy', 'amaury.forgeotdarc', 'gjb1002', 'belopolsky', 'salty-horse', 'jackdied', 'eric.smith', 'giampaolo.rodola', 'christian.heimes', 'benjamin.peterson', 'adlaiff6', 'eric.araujo', 'Arfrever', 'santoso.wijaya', 'eric.snow', 'sunfinite', 'abarnert', 'Philip Dye']
    pr_nums = []
    priority = 'low'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1820'
    versions = ['Python 3.9']

    Linked PRs

    @tiran
    Copy link
    Member Author

    tiran commented Jan 14, 2008

    Raymond Hettinger wrote:

    Here's a couple more if you want to proceed down that path:

    1. Have structseq subclass from PyTupleObject so that isinstance(s,
      tuple) returns True. This makes the object usable whenever
      tuples are needed.

    2. Add _fields, _asdict, and _replace to match the API in
      collections.namedtuple(). The _fields tuple should only include the
      visible positional fields while _asdict() and _replace() should include
      all of the fields whether visible or accessible only by
      attribute access.

    3. Change the constructor to accept keyword args so that eval(repr(s))
      == s works.

    NOTE:
    I've marked the task as easy but it's not a task for a total newbie.
    It's a feasible yet challenging task for somebody who likes to get into
    CPython core programming. Basic C knowledge is required!

    @tiran tiran added interpreter-core (Objects, Python, Grammar, and Parser dirs) easy type-feature A feature request or enhancement labels Jan 14, 2008
    @adlaiff6
    Copy link
    Mannequin

    adlaiff6 mannequin commented Jan 14, 2008

    Here is a patch for #1. I ran make test, and nothing was broken that
    seemed to be my fault, so I assume it's okay.

    Yes, it's small, it's my first one here. I'll get to the other two
    tomorrow.

    @rhettinger rhettinger self-assigned this Jan 15, 2008
    @rhettinger
    Copy link
    Contributor

    Thanks for the patch. I removed the whitespace changes and added some
    tests to make sure structseq now works with the % formatting operator
    and isinstance(t,tuple).

    Am getting a sporadic segfault in test_zipimport when running "make
    test", so holding-off on applying:

    test_zipimport
    ~/py26/Lib/test/test_zipimport.py:91: ImportWarning: Not importing
    directory '/home/rhettinger/py26/Modules/zlib': missing __init__.py
    ["__dummy__"])
    make: *** [test] Segmentation fault

    @adlaiff6
    Copy link
    Mannequin

    adlaiff6 mannequin commented Jan 15, 2008

    I just svn upped (it updated zipimport) and applied your patch, and
    './python Lib/test/regrtest.py test_zipimport.py' says it's okay, so I
    would go ahead and commit it.

    @rhettinger
    Copy link
    Contributor

    Run, "make test" a few times to make sure it doesn't bomb.

    The problem may be due to needing a deferred_type instead of assigning
    &PyTupleObject directly. Will look it more later.

    @rhettinger
    Copy link
    Contributor

    It worked fine on a fresh check-out. Committed in revision 59967.

    @birkenfeld
    Copy link
    Member

    Is there something else to be done for this to be closed?

    @rhettinger
    Copy link
    Contributor

    All three items are still open. The second one is the easiest.

    @rhettinger
    Copy link
    Contributor

    See also bpo-2308

    @rhettinger
    Copy link
    Contributor

    Jack, do you have any interest in putting this one over the goal line?

    @rhettinger rhettinger assigned jackdied and unassigned rhettinger Apr 1, 2009
    @rhettinger
    Copy link
    Contributor

    In Py3.x, this fails:
    "%s.%s.%s-%s-%s" % sys.version_info

    The reason is that PyUnicode_Format() expects a real tuple, not a tuple
    lookalike. The fix is to either have structseq inherit from tuple or to
    modify PyUnicode_Format() to handle structseq:

       if (PyCheck_StructSeq(args)) {
          newargs = PyTuple_FromSequence(args);
          if (newargs == NULL)
              return NULL;
          result = PyUncode_Format(format, newargs);
          Py_DECREF(newargs);
          return result;
       }

    @malemburg
    Copy link
    Member

    Raymond Hettinger wrote:

    Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:

    In Py3.x, this fails:
    "%s.%s.%s-%s-%s" % sys.version_info

    The reason is that PyUnicode_Format() expects a real tuple, not a tuple
    lookalike. The fix is to either have structseq inherit from tuple or to
    modify PyUnicode_Format() to handle structseq:

    if (PyCheck_StructSeq(args)) {
    newargs = PyTuple_FromSequence(args);
    if (newargs == NULL)
    return NULL;
    result = PyUncode_Format(format, newargs);
    Py_DECREF(newargs);
    return result;
    }

    -1

    The special-casing of tuples vs. non-tuples for % is already
    bad enough. Adding structseq as another special case doesn't
    make that any better.

    What's so hard about writing

    "%s.%s.%s-%s-%s" % tuple(sys.version_info)

    anyway ?

    @malemburg malemburg changed the title Enhance Object/structseq.c to match namedtuple and tuple api Enhance Object/structseq.c to match namedtuple and tuple api Jul 13, 2009
    @birkenfeld
    Copy link
    Member

    ISTM that structseq should have been a tuple subclass anyway. Isn't it
    advertised as some sort of "tuple with attribute access"?

    @salty-horse
    Copy link
    Mannequin

    salty-horse mannequin commented Jul 26, 2009

    For those who missed it, the patch that was committed in r59967 was
    quickly reverted in r59970 with the comment:

    "Temporarily revert 59967 until GC can be added."

    Raymond, can you please explain what was missing from the patch?

    @ericvsmith
    Copy link
    Member

    See also bpo-8413, which would be addressed by this change.

    @rhettinger
    Copy link
    Contributor

    I agree that the priority is higher now that we have a demonstrable regression.

    Getting structseq to subclass from tuple will take some effort (tuples have many methods that would need to be overriden).

    @benjaminp
    Copy link
    Contributor

    structseq now does subclass tuple, so if there's any interest in adding namedtuple APIs, now it should be easier.

    @terryjreedy
    Copy link
    Member

    Would whatever remains of this be deferred by the PEP-3003 moratorium?

    @ericvsmith
    Copy link
    Member

    I don't think it would be covered by the moratorium, since it's not a language change. The change to make structseq derive from tuple was not subject to the moratorium, for example.

    @rhettinger
    Copy link
    Contributor

    This is definitely not covered by the language moratorium. Guido has requested this change and it needs to go forward.

    @amauryfa
    Copy link
    Member

    amauryfa commented Apr 8, 2011

    bpo-5907 would benefit of this change.
    Unfortunately, structseq constructors already have keyword arguments; they are equivalent to "def __new__(cls, sequence, dict=NULL)".
    OTOH these keywords arguments are not documented anywhere.

    I suggest to change the constructor to something equivalent to:
    "def __new__(cls, sequence=NULL, dict=NULL, *, field1=NULL, field2=NULL)"

    @rhettinger
    Copy link
    Contributor

    Also see bpo-11698

    @sunfinite
    Copy link
    Mannequin

    sunfinite mannequin commented Nov 7, 2013

    New patch for 3.4 adds the following:

    1. _fields
    2. _replace()
    3. _asdict()
    4. eval(repr(s)) == s

    Now the issues:

    1. _asdict() returns a normal dictionary. I don't know if this is what
      is required.

    2. Both _asdict() and _replace() assume that unnamed visible fields are
      at the end of the visible sequence. A comment at the beginning says
      they are allowed for indices < n_visible_fields.

      Is there another way to map members to values? Because tp->members
      is (visible named fields + invisible named fields) whereas values
      array is (visible named fields + visible unnamed fields + invisible
      named fields)

    3. The mismatch mentioned above is present in the current
      implementation of repr:

      In os.stat_result, the last three visible fields are unnamed
      (i.e integer a_time, m_time and c_time). However they are present
      in the repr output with keys which are the first three keys from the
      invisible part(float a_time, m_time and c_time).
      Was this intentional?

      Also, the above logic causes duplicate keys when invisible fields
      are included in the repr output as per bpo-11629.

      In my patch for that issue, i put invisible fields under the 'dict'
      keyword argument. This output format utilises code already present
      in structseq_new and makes eval work as expected when invisible
      fields are present in repr. Also, it sidesteps the question of
      duplicated keys because they are present inside a dict.

    4. Raymond stated that _fields should contain only the visible
      positional fields. So _fields of os.stat_result contains only 7
      fields because of the three unnamed fields. Is this the expected
      implementation?

    5. Is there a way to declare a member function in C that accepts only
      keyword arguments(like _replace())? I could not find one.

    This is my first real C patch. So some of the issues might just be a
    lack of understanding. Apologies.

    @sunfinite
    Copy link
    Mannequin

    sunfinite mannequin commented Nov 7, 2013

    Oops, the correct issue for improving the repr is bpo-11698.

    @abarnert
    Copy link
    Mannequin

    abarnert mannequin commented Jan 13, 2014

    See bpo-20230, which includes a patch implementing just the first part of part 2 (adding _fields, but not _asdict or _replace).

    Since this one has been open for 5+ years, if it's not going to be done soon, any chance of the easy (_fields) change being committed and putting off the harder parts until later?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 27, 2014

    > 1. _asdict() returns a normal dictionary. I don't know if this is what
    > is required.

    Good question. I don't think we can import OrderedDict from collections
    because of the impact on startup time (_collections_abc was created to
    avoid the issue for MutableMapping).

    @rhettinger
    Copy link
    Contributor

    any chance of the easy (_fields) change being committed
    and putting off the harder parts until later?

    Yes, it is not an all-or-nothing exercise.

    > 1. _asdict() returns a normal dictionary. I don't know if this is what
    > is required.

    A regular dict would work just fine for now (there is a patch to introduce an OrderedDict in C, but that transition could be done later.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 28, 2014

    Is accessing _fields a common operation? Personally I'd use a
    PyGetSetDef and generate the tuple on access (perhaps cache the
    result).

    @rhettinger
    Copy link
    Contributor

    Is accessing _fields a common operation?

    Not common at all -- it is used for introspection.

    That said, there is nearly zero savings from generating the tuple upon look-up. I would say that using a PyGetSetDef is a false optimization.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 29, 2014

    Okay, if no one else wants this, I'll go ahead with the _fields part.

    Andrew, could you sign a contributor agreement?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 24, 2014

    Hi Andrew. Did you by any chance sign the contributor agreement?

    [It's perfectly okay if you don't want to, but then we cannot use
    the patch from bpo-20230.]

    @sunfinite
    Copy link
    Mannequin

    sunfinite mannequin commented May 26, 2014

    Hi Stefan,

    I've added a new patch which only adds _fields, combining parts from my earlier patch and Andrew's (his patch does not account for visible unnamed fields).

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 28, 2014

    Sunny, is there a definition of "visible positional fields"? Currently,
    it seems to me that in os.stat_result we have the opposite problem, namely
    "visible non-positional" fields:

    For example, st_atime_ns is visible but not indexable:

    os.stat_result(st_mode=33188, st_ino=524840, st_dev=64513, st_nlink=1, st_uid=0, st_gid=0, st_size=2113, st_atime=1401225421, st_mtime=1398786163, st_ctime=1398786163)
    >>> x[9]
    1398786163
    >>> x.st_atime_ns
    1401225421887116783
    >>> x[10] 
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: tuple index out of range

    @sunfinite
    Copy link
    Mannequin

    sunfinite mannequin commented May 30, 2014

    Hi Stefan,

    There is a comment at the top in structseq.c

    /* Fields with this name have only a field index, not a field name.
       They are only allowed for indices < n_visible_fields. */
    char *PyStructSequence_UnnamedField = "unnamed field";

    This is the definition of os.stat_result in posixmodule.c:

    static PyStructSequence_Field stat_result_fields[] = {
    {"st_mode", "protection bits"},
    {"st_ino", "inode"},
    {"st_dev", "device"},
    {"st_nlink", "number of hard links"},
    {"st_uid", "user ID of owner"},
    {"st_gid", "group ID of owner"},
    {"st_size", "total size, in bytes"},
    /* The NULL is replaced with PyStructSequence_UnnamedField later. */
    {NULL, "integer time of last access"},
    {NULL, "integer time of last modification"},
    {NULL, "integer time of last change"},
    {"st_atime", "time of last access"},
    {"st_mtime", "time of last modification"},
    {"st_ctime", "time of last change"},
    {"st_atime_ns", "time of last access in nanoseconds"},
    {"st_mtime_ns", "time of last modification in nanoseconds"},
    {"st_ctime_ns", "time of last change in nanoseconds"},
    ...

    By visible i mean the values present in the repr of the object and by-extension accessible by position.

    I talked about my problems with os.stat_result in points 3 and 4 of msg202333 i.e repr of os.stat_result takes the first three keys from the attribute-access only fields (the invisible part) and uses them for the last three values in the displayed structseq.

    With my current patch, _fields for os.stat_result only has 7 values:

    >>> x = os.stat('.')
    >>> x._fields
    ('st_mode', 'st_ino', 'st_dev', 'st_nlink', 'st_uid', 'st_gid', 'st_size')
    >>>

    Is this what you expect?

    @abarnert
    Copy link
    Mannequin

    abarnert mannequin commented Jun 11, 2014

    Hi, Stephan. Sorry, for some reason Yahoo was sending updates from the tracker to spam again, so I missed this. I'd be glad to sign a contributor agreement if it's still relevant, but it looks like there's a later patch that does what mine did and more?

    @abarnert
    Copy link
    Mannequin

    abarnert mannequin commented Jun 11, 2014

    Sorry, Stefan, not Stephan. Anyway, I've signed the agreement.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 29, 2014

    Andrew, thanks for signing the agreement!

    [Sunny]

    Is this what you expect?

    I find the initialization in os.stat_result somewhat strange. Also,
    a certain use of unnamed fields that worked in 2.5 is now broken,
    which we should sort out before proceeding any further:

    Apply structseq_repr_issue.diff, then:

    >>> from _testcapi import mk_structseq
    >>> s = mk_structseq("unnamed_end")
    >>> s[3]
    3
    >>> tuple(s)
    (0, 1, 2, 3, 4)
    >>> s
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    SystemError: In structseq_repr(), member 3 name is NULL for type _testcapi.struct_unnamed_end

    Perhaps we should just add the missing field names, like namedtuple
    does with rename=True:

    (a=1, b=2, c=3, _4=4, _5=5)

    In any case, in 2.5 the entire tuple was printed as the repr,
    regardless of unnamed fields.

    @skrah skrah mannequin changed the title Enhance Object/structseq.c to match namedtuple and tuple api Enhance Object/structseq.c to match namedtuple and tuple api Jun 29, 2014
    @abalkin abalkin added the 3.7 (EOL) end of life label Oct 3, 2016
    @rhettinger rhettinger added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Jan 2, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kesmit13
    Copy link

    kesmit13 commented Aug 1, 2023

    I just wondered if this project still had life in it. I came across an issue with structseqs and pandas. pandas will automatically apply column names based on the names of fields in a sequence of tuples, however, it uses the _fields attribute to get that information. I support a database connector that uses structseqs because of the performance over namedtuples, but it does make our integration with pandas take a bit more work because of the missing _fields attribute.

    @FFY00
    Copy link
    Member

    FFY00 commented Sep 1, 2023

    There's an implementation for _fields, _field_defaults, and _asdict in GH-108648, based on the discussion in the duplicated issue GH-108647.

    @serhiy-storchaka
    Copy link
    Member

    #108751 will make adding _replace unneeded. Instead we should add __replace__.

    asdict could be generalized if there is a list of fields. Unfortunately, the semantic of _fields in named tuples and __dataclass_fields__ in dataclasses are very different.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.13 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests