Enhance Object/structseq.c to match namedtuple and tuple api #46145

tiran · 2008-01-14T04:20:27Z

BPO	1820
Nosy	@malemburg, @birkenfeld, @rhettinger, @terryjreedy, @amauryfa, @abalkin, @salty-horse, @jackdied, @ericvsmith, @giampaolo, @tiran, @benjaminp, @merwok, @ericsnowcurrently
Files	structseq_subclasses_tuple.diff: patch which makes structseq a subclass of tuple structseq.diff: Tuple subclass patch with tests. structseq_1.patch: Add _fields, _replace, _asdict and eval(repr(s)) structseq_fields.patch: Add only _fields structseq_repr_issue.diff: repr() issue with unnamed fields

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2008-01-14.04:20:27.181>
labels = ['extension-modules', 'easy', 'type-feature', '3.9']
title = 'Enhance Object/structseq.c to match namedtuple and tuple api'
updated_at = <Date 2020-01-15.21:54:55.660>
user = 'https://github.com/tiran'

bugs.python.org fields:

activity = <Date 2020-01-15.21:54:55.660>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation = <Date 2008-01-14.04:20:27.181>
creator = 'christian.heimes'
dependencies = []
files = ['9159', '9170', '32525', '35367', '35802']
hgrepos = []
issue_num = 1820
keywords = ['patch', 'easy']
message_count = 42.0
messages = ['59888', '59891', '59949', '59955', '59956', '59957', '62830', '62831', '81627', '85097', '90461', '90472', '90506', '90953', '104559', '104562', '109503', '113451', '113453', '113456', '133351', '133355', '133364', '133367', '191872', '192849', '198291', '202330', '202333', '207994', '217301', '217337', '217365', '217468', '217512', '219037', '219150', '219297', '219381', '220315', '220316', '221902']
nosy_count = 21.0
nosy_names = ['lemburg', 'georg.brandl', 'rhettinger', 'terry.reedy', 'amaury.forgeotdarc', 'gjb1002', 'belopolsky', 'salty-horse', 'jackdied', 'eric.smith', 'giampaolo.rodola', 'christian.heimes', 'benjamin.peterson', 'adlaiff6', 'eric.araujo', 'Arfrever', 'santoso.wijaya', 'eric.snow', 'sunfinite', 'abarnert', 'Philip Dye']
pr_nums = []
priority = 'low'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1820'
versions = ['Python 3.9']

Linked PRs

GH-46145: add _fields, _field_defaults, and _asdict to PyStructSequence #108648

The text was updated successfully, but these errors were encountered:

tiran · 2008-01-14T04:20:26Z

Raymond Hettinger wrote:

Here's a couple more if you want to proceed down that path:

Have structseq subclass from PyTupleObject so that isinstance(s,
tuple) returns True. This makes the object usable whenever
tuples are needed.
Add _fields, _asdict, and _replace to match the API in
collections.namedtuple(). The _fields tuple should only include the
visible positional fields while _asdict() and _replace() should include
all of the fields whether visible or accessible only by
attribute access.
Change the constructor to accept keyword args so that eval(repr(s))
== s works.

NOTE:
I've marked the task as easy but it's not a task for a total newbie.
It's a feasible yet challenging task for somebody who likes to get into
CPython core programming. Basic C knowledge is required!

adlaiff6 · 2008-01-14T07:03:20Z

Here is a patch for #1. I ran make test, and nothing was broken that
seemed to be my fault, so I assume it's okay.

Yes, it's small, it's my first one here. I'll get to the other two
tomorrow.

rhettinger · 2008-01-15T01:21:34Z

Thanks for the patch. I removed the whitespace changes and added some
tests to make sure structseq now works with the % formatting operator
and isinstance(t,tuple).

Am getting a sporadic segfault in test_zipimport when running "make
test", so holding-off on applying:

test_zipimport
~/py26/Lib/test/test_zipimport.py:91: ImportWarning: Not importing
directory '/home/rhettinger/py26/Modules/zlib': missing __init__.py
["__dummy__"])
make: *** [test] Segmentation fault

adlaiff6 · 2008-01-15T02:01:04Z

I just svn upped (it updated zipimport) and applied your patch, and
'./python Lib/test/regrtest.py test_zipimport.py' says it's okay, so I
would go ahead and commit it.

rhettinger · 2008-01-15T02:05:38Z

Run, "make test" a few times to make sure it doesn't bomb.

The problem may be due to needing a deferred_type instead of assigning
&PyTupleObject directly. Will look it more later.

rhettinger · 2008-01-15T03:03:57Z

It worked fine on a fresh check-out. Committed in revision 59967.

birkenfeld · 2008-02-23T22:49:46Z

Is there something else to be done for this to be closed?

rhettinger · 2008-02-23T22:54:02Z

All three items are still open. The second one is the easiest.

rhettinger · 2009-02-11T04:46:19Z

See also bpo-2308

rhettinger · 2009-04-01T21:16:27Z

Jack, do you have any interest in putting this one over the goal line?

rhettinger · 2009-07-12T22:48:16Z

In Py3.x, this fails:
"%s.%s.%s-%s-%s" % sys.version_info

The reason is that PyUnicode_Format() expects a real tuple, not a tuple
lookalike. The fix is to either have structseq inherit from tuple or to
modify PyUnicode_Format() to handle structseq:

   if (PyCheck_StructSeq(args)) {
      newargs = PyTuple_FromSequence(args);
      if (newargs == NULL)
          return NULL;
      result = PyUncode_Format(format, newargs);
      Py_DECREF(newargs);
      return result;
   }

malemburg · 2009-07-13T08:56:27Z

Raymond Hettinger wrote:

Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:

In Py3.x, this fails:
"%s.%s.%s-%s-%s" % sys.version_info

The reason is that PyUnicode_Format() expects a real tuple, not a tuple
lookalike. The fix is to either have structseq inherit from tuple or to
modify PyUnicode_Format() to handle structseq:

if (PyCheck_StructSeq(args)) {
newargs = PyTuple_FromSequence(args);
if (newargs == NULL)
return NULL;
result = PyUncode_Format(format, newargs);
Py_DECREF(newargs);
return result;
}

-1

The special-casing of tuples vs. non-tuples for % is already
bad enough. Adding structseq as another special case doesn't
make that any better.

What's so hard about writing

"%s.%s.%s-%s-%s" % tuple(sys.version_info)

anyway ?

birkenfeld · 2009-07-13T22:01:07Z

ISTM that structseq should have been a tuple subclass anyway. Isn't it
advertised as some sort of "tuple with attribute access"?

salty-horse · 2009-07-26T18:03:05Z

For those who missed it, the patch that was committed in r59967 was
quickly reverted in r59970 with the comment:

"Temporarily revert 59967 until GC can be added."

Raymond, can you please explain what was missing from the patch?

ericvsmith · 2010-04-29T18:21:47Z

See also bpo-8413, which would be addressed by this change.

rhettinger · 2010-04-29T18:34:17Z

I agree that the priority is higher now that we have a demonstrable regression.

Getting structseq to subclass from tuple will take some effort (tuples have many methods that would need to be overriden).

benjaminp · 2010-07-07T21:31:36Z

structseq now does subclass tuple, so if there's any interest in adding namedtuple APIs, now it should be easier.

terryjreedy · 2010-08-09T18:50:49Z

Would whatever remains of this be deferred by the PEP-3003 moratorium?

ericvsmith · 2010-08-09T18:55:12Z

I don't think it would be covered by the moratorium, since it's not a language change. The change to make structseq derive from tuple was not subject to the moratorium, for example.

rhettinger · 2010-08-09T19:02:01Z

This is definitely not covered by the language moratorium. Guido has requested this change and it needs to go forward.

amauryfa · 2011-04-08T23:02:13Z

bpo-5907 would benefit of this change.
Unfortunately, structseq constructors already have keyword arguments; they are equivalent to "def __new__(cls, sequence, dict=NULL)".
OTOH these keywords arguments are not documented anywhere.

I suggest to change the constructor to something equivalent to:
"def __new__(cls, sequence=NULL, dict=NULL, *, field1=NULL, field2=NULL)"

rhettinger · 2011-04-08T23:23:56Z

Also see bpo-11698

sunfinite · 2013-11-07T12:40:48Z

New patch for 3.4 adds the following:

_fields
_replace()
_asdict()
eval(repr(s)) == s

Now the issues:

_asdict() returns a normal dictionary. I don't know if this is what
is required.
Both _asdict() and _replace() assume that unnamed visible fields are
at the end of the visible sequence. A comment at the beginning says
they are allowed for indices < n_visible_fields.

Is there another way to map members to values? Because tp->members
is (visible named fields + invisible named fields) whereas values
array is (visible named fields + visible unnamed fields + invisible
named fields)
The mismatch mentioned above is present in the current
implementation of repr:

In os.stat_result, the last three visible fields are unnamed
(i.e integer a_time, m_time and c_time). However they are present
in the repr output with keys which are the first three keys from the
invisible part(float a_time, m_time and c_time).
Was this intentional?

Also, the above logic causes duplicate keys when invisible fields
are included in the repr output as per bpo-11629.

In my patch for that issue, i put invisible fields under the 'dict'
keyword argument. This output format utilises code already present
in structseq_new and makes eval work as expected when invisible
fields are present in repr. Also, it sidesteps the question of
duplicated keys because they are present inside a dict.
Raymond stated that _fields should contain only the visible
positional fields. So _fields of os.stat_result contains only 7
fields because of the three unnamed fields. Is this the expected
implementation?
Is there a way to declare a member function in C that accepts only
keyword arguments(like _replace())? I could not find one.

This is my first real C patch. So some of the issues might just be a
lack of understanding. Apologies.

sunfinite · 2013-11-07T12:49:00Z

Oops, the correct issue for improving the repr is bpo-11698.

abarnert · 2014-01-13T01:31:59Z

See bpo-20230, which includes a patch implementing just the first part of part 2 (adding _fields, but not _asdict or _replace).

Since this one has been open for 5+ years, if it's not going to be done soon, any chance of the easy (_fields) change being committed and putting off the harder parts until later?

skrah · 2014-04-27T18:14:46Z

> 1. _asdict() returns a normal dictionary. I don't know if this is what
> is required.

Good question. I don't think we can import OrderedDict from collections
because of the impact on startup time (_collections_abc was created to
avoid the issue for MutableMapping).

rhettinger · 2014-04-28T02:17:25Z

any chance of the easy (_fields) change being committed
and putting off the harder parts until later?

Yes, it is not an all-or-nothing exercise.

> 1. _asdict() returns a normal dictionary. I don't know if this is what
> is required.

A regular dict would work just fine for now (there is a patch to introduce an OrderedDict in C, but that transition could be done later.

skrah · 2014-04-28T11:50:53Z

Is accessing _fields a common operation? Personally I'd use a
PyGetSetDef and generate the tuple on access (perhaps cache the
result).

rhettinger · 2014-04-29T05:23:13Z

Is accessing _fields a common operation?

Not common at all -- it is used for introspection.

That said, there is nearly zero savings from generating the tuple upon look-up. I would say that using a PyGetSetDef is a false optimization.

skrah · 2014-04-29T11:55:14Z

Okay, if no one else wants this, I'll go ahead with the _fields part.

Andrew, could you sign a contributor agreement?

skrah · 2014-05-24T12:19:52Z

Hi Andrew. Did you by any chance sign the contributor agreement?

[It's perfectly okay if you don't want to, but then we cannot use
the patch from bpo-20230.]

sunfinite · 2014-05-26T10:17:34Z

Hi Stefan,

I've added a new patch which only adds _fields, combining parts from my earlier patch and Andrew's (his patch does not account for visible unnamed fields).

skrah · 2014-05-28T20:36:08Z

Sunny, is there a definition of "visible positional fields"? Currently,
it seems to me that in os.stat_result we have the opposite problem, namely
"visible non-positional" fields:

For example, st_atime_ns is visible but not indexable:

os.stat_result(st_mode=33188, st_ino=524840, st_dev=64513, st_nlink=1, st_uid=0, st_gid=0, st_size=2113, st_atime=1401225421, st_mtime=1398786163, st_ctime=1398786163)
>>> x[9]
1398786163
>>> x.st_atime_ns
1401225421887116783
>>> x[10] 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range

sunfinite · 2014-05-30T09:38:43Z

Hi Stefan,

There is a comment at the top in structseq.c

/* Fields with this name have only a field index, not a field name.
   They are only allowed for indices < n_visible_fields. */
char *PyStructSequence_UnnamedField = "unnamed field";

This is the definition of os.stat_result in posixmodule.c:

static PyStructSequence_Field stat_result_fields[] = {
{"st_mode", "protection bits"},
{"st_ino", "inode"},
{"st_dev", "device"},
{"st_nlink", "number of hard links"},
{"st_uid", "user ID of owner"},
{"st_gid", "group ID of owner"},
{"st_size", "total size, in bytes"},
/* The NULL is replaced with PyStructSequence_UnnamedField later. */
{NULL, "integer time of last access"},
{NULL, "integer time of last modification"},
{NULL, "integer time of last change"},
{"st_atime", "time of last access"},
{"st_mtime", "time of last modification"},
{"st_ctime", "time of last change"},
{"st_atime_ns", "time of last access in nanoseconds"},
{"st_mtime_ns", "time of last modification in nanoseconds"},
{"st_ctime_ns", "time of last change in nanoseconds"},
...

By visible i mean the values present in the repr of the object and by-extension accessible by position.

I talked about my problems with os.stat_result in points 3 and 4 of msg202333 i.e repr of os.stat_result takes the first three keys from the attribute-access only fields (the invisible part) and uses them for the last three values in the displayed structseq.

With my current patch, _fields for os.stat_result only has 7 values:

>>> x = os.stat('.')
>>> x._fields
('st_mode', 'st_ino', 'st_dev', 'st_nlink', 'st_uid', 'st_gid', 'st_size')
>>>

Is this what you expect?

abarnert · 2014-06-11T22:30:03Z

Hi, Stephan. Sorry, for some reason Yahoo was sending updates from the tracker to spam again, so I missed this. I'd be glad to sign a contributor agreement if it's still relevant, but it looks like there's a later patch that does what mine did and more?

abarnert · 2014-06-11T22:32:59Z

Sorry, Stefan, not Stephan. Anyway, I've signed the agreement.

skrah · 2014-06-29T21:13:17Z

Andrew, thanks for signing the agreement!

[Sunny]

Is this what you expect?

I find the initialization in os.stat_result somewhat strange. Also,
a certain use of unnamed fields that worked in 2.5 is now broken,
which we should sort out before proceeding any further:

Apply structseq_repr_issue.diff, then:

>>> from _testcapi import mk_structseq
>>> s = mk_structseq("unnamed_end")
>>> s[3]
3
>>> tuple(s)
(0, 1, 2, 3, 4)
>>> s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: In structseq_repr(), member 3 name is NULL for type _testcapi.struct_unnamed_end

Perhaps we should just add the missing field names, like namedtuple
does with rename=True:

(a=1, b=2, c=3, _4=4, _5=5)

In any case, in 2.5 the entire tuple was printed as the repr,
regardless of unnamed fields.

kesmit13 · 2023-08-01T15:56:14Z

I just wondered if this project still had life in it. I came across an issue with structseqs and pandas. pandas will automatically apply column names based on the names of fields in a sequence of tuples, however, it uses the _fields attribute to get that information. I support a database connector that uses structseqs because of the performance over namedtuples, but it does make our integration with pandas take a bit more work because of the missing _fields attribute.

FFY00 · 2023-09-01T03:55:33Z

There's an implementation for _fields, _field_defaults, and _asdict in GH-108648, based on the discussion in the duplicated issue GH-108647.

serhiy-storchaka · 2023-09-01T11:29:07Z

#108751 will make adding _replace unneeded. Instead we should add __replace__.

asdict could be generalized if there is a list of fields. Unfortunately, the semantic of _fields in named tuples and __dataclass_fields__ in dataclasses are very different.

tiran added interpreter-core (Objects, Python, Grammar, and Parser dirs) easy type-feature A feature request or enhancement labels Jan 14, 2008

rhettinger self-assigned this Jan 15, 2008

rhettinger assigned jackdied and unassigned rhettinger Apr 1, 2009

malemburg changed the title ~~Enhance Object/structseq.c to match namedtuple and tuple api~~ Enhance Object/structseq.c to match namedtuple and tuple api Jul 13, 2009

rhettinger unassigned jackdied Apr 8, 2011

skrah mannequin changed the title ~~Enhance Object/structseq.c to match namedtuple and tuple api~~ Enhance Object/structseq.c to match namedtuple and tuple api Jun 29, 2014

abalkin added the 3.7 (EOL) end of life label Oct 3, 2016

rhettinger added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Jan 2, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

merwok added 3.13 bugs and security fixes and removed 3.9 only security fixes labels Aug 1, 2023

serhiy-storchaka mentioned this issue Aug 30, 2023

Increase PyStructSequence compatibility with collections.namedtuple #108647

Closed

bedevere-bot mentioned this issue Sep 1, 2023

GH-46145: add _fields, _field_defaults, and _asdict to PyStructSequence #108648

Open

hugovk removed the easy label Sep 8, 2023

This was referenced Dec 4, 2023

[pytree] support PyStructSequence types for Python pytree pytorch/pytorch#113258

Open

refactor: reorder cases for namedtuple and PyStructSequence types metaopt/optree#111

Merged

Enhance Object/structseq.c to match namedtuple and tuple api #46145

Enhance Object/structseq.c to match namedtuple and tuple api #46145

Comments

tiran commented Jan 14, 2008 • edited by bedevere-bot Loading

Linked PRs

tiran commented Jan 14, 2008

adlaiff6 mannequin commented Jan 14, 2008

rhettinger commented Jan 15, 2008

adlaiff6 mannequin commented Jan 15, 2008

rhettinger commented Jan 15, 2008

rhettinger commented Jan 15, 2008

birkenfeld commented Feb 23, 2008

rhettinger commented Feb 23, 2008

rhettinger commented Feb 11, 2009

rhettinger commented Apr 1, 2009

rhettinger commented Jul 12, 2009

malemburg commented Jul 13, 2009

birkenfeld commented Jul 13, 2009

salty-horse mannequin commented Jul 26, 2009

ericvsmith commented Apr 29, 2010

rhettinger commented Apr 29, 2010

benjaminp commented Jul 7, 2010

terryjreedy commented Aug 9, 2010

ericvsmith commented Aug 9, 2010

rhettinger commented Aug 9, 2010

amauryfa commented Apr 8, 2011

rhettinger commented Apr 8, 2011

sunfinite mannequin commented Nov 7, 2013

sunfinite mannequin commented Nov 7, 2013

abarnert mannequin commented Jan 13, 2014

skrah mannequin commented Apr 27, 2014

rhettinger commented Apr 28, 2014

skrah mannequin commented Apr 28, 2014

rhettinger commented Apr 29, 2014

skrah mannequin commented Apr 29, 2014

skrah mannequin commented May 24, 2014

sunfinite mannequin commented May 26, 2014

skrah mannequin commented May 28, 2014

sunfinite mannequin commented May 30, 2014

abarnert mannequin commented Jun 11, 2014

abarnert mannequin commented Jun 11, 2014

skrah mannequin commented Jun 29, 2014

kesmit13 commented Aug 1, 2023

FFY00 commented Sep 1, 2023

serhiy-storchaka commented Sep 1, 2023

tiran commented Jan 14, 2008 •

edited by bedevere-bot

Loading