Auto generate excessive long parametrized ids #7254

symonk · 2020-05-24T20:25:15Z

Evening maintainers, I've opened this PR to get some feedback on this approach. I'm not entirely sure on the best area to solve this problem, using parameterised with very large datasets gets automatically appended onto the generated/resolved ids. This poses a problem later when trying to update the PYTEST_CURRENT_TEST environment variable on a windows based system, doing so will raise a ValueError from setitem. This attempt at patching it checks for a windows based operating system and enforces a 100 char limit, exceeding the limit for an individual id will rewrite using uuid4 based auto-generated template name.

Looking some maintainer feedback on the following:

Is this the right area to solve the problem?
I wanted to make an opt for overriding the limit here but struggled with guaranteeing MetaFunc has access to the pytest config consistently
is this even an appropriate kind of solution / is there a preferred way of auto generating instead of the one outlined here
This area of the system seems massively used, I'm sure there are tons of edge cases I haven't considered here

The code base is very big, so I'm relying solely on local test failure(s) to detect anything chaotic, but I would appreciate any tips / feedback for a new comer here.

…ter than 100 on windows

RonnyPfannschmidt · 2020-05-24T20:30:05Z

src/_pytest/python.py

@@ -1230,6 +1231,14 @@ def idmaker(
        _idvalset(valindex, parameterset, argnames, idfn, ids, config=config, item=item)
        for valindex, parameterset in enumerate(parametersets)
    ]
+    # rewrite parametrized ids greater than a 100 limit


we have to do this for all ids and we should use the auto-id mechanism for that + warn

it has to happen on all platforms, else users get different test ids on different platforms thus destroying reasonably simple aggregation

Thanks @RonnyPfannschmidt for your time, i'm struggling to warn here without causing an error and failing to collect tests

at first glance i would propose to use the node.warn apis if applicable, and to move the limit into the _idval helper function in order to get a normal numbered id like other tests with non-repressentable ids

@RonnyPfannschmidt any advice here? i am struggling to warn here with warnings.warn() / item/warn() ? node.warn I am not too familiar with. Attempting to warning is just erroring, I assume due to collection/interpretation.

ill at best be able to look into this mid week as im currently preoccupied

symonk · 2020-05-25T13:48:49Z

struggling to warn here using warnings.warn / item.warn inside _idva, think its because this is happening during collection. Also unsure what to do with ids=[] value(s) regarding this

symonk · 2020-05-25T13:50:19Z

testing/python/metafunc.py

-                "δοκ.ιμή@παράδειγμα.δοκιμή",
-                "\\u03b4\\u03bf\\u03ba.\\u03b9\\u03bc\\u03ae@\\u03c0\\u03b1\\u03c1\\u03ac\\u03b4\\u03b5\\u03b9\\u03b3"
-                "\\u03bc\\u03b1.\\u03b4\\u03bf\\u03ba\\u03b9\\u03bc\\u03ae",
+                "δοκ.ιμή@δοκιμή",


converted bytes now exceed the 100 limit, unsure of the data set so shortened it, not sure if acceptable

this one makes me wonder if a "character" limit of 100 might be better than a bytes limit of 100

as far as i can guess the source data is a most interesting mail address and the shortened one isnt

I'm curious why we even escape ID names to ASCII? Is there a reason not to allow Unicode IDs?

OK found the answer: disable_test_id_escaping_and_forfeit_all_rights_to_community_support hehe.

bluetech

I think it's a good idea to avoid excessively long IDs.

But, I think that if the user explicitly provides a long ID, we should not forfeit it, but the code as written does.

bluetech · 2020-05-30T12:24:37Z

src/_pytest/python.py

@@ -1181,19 +1181,25 @@ def _idval(
        if hook_id:
            return hook_id

+    modified_val = None  # empty strings are valid as val


I think a better name for this variable would be id. Because I see val as the input value, and the modified_val here is actually the intended ID string for that value.

thanks @bluetech will revisit this during this week. - Any other thoughts on a more appropriate name? I wouldnt want to shadow the id() builtin here

bluetech · 2020-05-30T12:30:25Z

src/_pytest/python.py

+    # val does not always enter the isinstance checks due to its type
+    # when it does we will check the string length returned and use auto generation of ids when over 100 chars
+    # on types which do not match isinstance checks pytest uses auto generate of ids automatically anyway


I suggest to reword this a bit more succinctly:

Suggested change

# val does not always enter the isinstance checks due to its type

# when it does we will check the string length returned and use auto generation of ids when over 100 chars

# on types which do not match isinstance checks pytest uses auto generate of ids automatically anyway

# If val is not of one of the types we support above, or if it ends up too long to

# be useful, use a generated name based on the argument name and index instead.

bluetech · 2020-05-30T12:33:15Z

testing/python/metafunc.py

-                "δοκ.ιμή@παράδειγμα.δοκιμή",
-                "\\u03b4\\u03bf\\u03ba.\\u03b9\\u03bc\\u03ae@\\u03c0\\u03b1\\u03c1\\u03ac\\u03b4\\u03b5\\u03b9\\u03b3"
-                "\\u03bc\\u03b1.\\u03b4\\u03bf\\u03ba\\u03b9\\u03bc\\u03ae",
+                "δοκ.ιμή@δοκιμή",


I'm curious why we even escape ID names to ASCII? Is there a reason not to allow Unicode IDs?

OK found the answer: disable_test_id_escaping_and_forfeit_all_rights_to_community_support hehe.

bluetech · 2020-05-30T12:37:44Z

@symonk I updated the PR description to add Fixes #6881.

symonk · 2020-06-01T12:21:58Z

I think it's a good idea to avoid excessively long IDs.

But, I think that if the user explicitly provides a long ID, we should not forfeit it, but the code as written does.

we are already doing it with certain types of data. I am not sure how to overcome this - should it perhaps be a configurable option that users can choose to enable? Given that PYTEST_CURRENT_TEST is rewritten, maybe even a larger default before rewriting would be sensible, to perhaps rewrite when the data is getting extremely big? The user can hack around the windows limits (manually), but should pytest expect them to do so?

nicoddemus · 2021-10-05T19:59:47Z

Gentle ping. 👍

kurtmckee · 2023-03-17T03:44:17Z

Another gentle ping.

I'm encountering this while migrating the feedparser test suite to pytest. It's possible to overcome this by using an explicit ids= parameter but it is unexpected to encounter a ValueError from pytest.

I'm very willing to help drive this to closure. If help or updates are needed, please reply to this PR or to #6881. I'm subscribing to both.

RonnyPfannschmidt · 2023-03-17T04:52:41Z

I'm wondering if we should just fall back to autogenerated ids or for long values +warn

kurtmckee · 2023-03-17T11:32:23Z

@RonnyPfannschmidt, that would solve my use case. Do you want a new PR that accomplishes that?

RonnyPfannschmidt · 2023-03-17T12:33:27Z

That could be a starting point, i wonder if it would make sense to use content hashes for long content

kurtmckee · 2023-03-17T13:33:08Z

i wonder if it would make sense to use content hashes

That may be helpful, if uniqueness is a requirement (or even just desirable). My use case involves multiplying a string by a large number to test a performance optimization that only tries decoding n bytes of input to determine the character set. I imagine that others may encounter similar situations.

I won't be able to pick up this work immediately due to work and life, but I'll take a stab at it and post a PR as I'm able.

RonnyPfannschmidt · 2023-03-17T20:51:35Z

@kurtmckee note that for specific tests that just multiply a string to go for long, choosing a actual id is the best way to go

obestwalter · 2024-06-20T11:51:02Z

It looks like a different approach is preferred, so we should close this one in favour of an upcoming PR by @kurtmckee (if you are up for that :)).

kurtmckee · 2024-06-20T12:02:34Z

@obestwalter I can give it a shot! To confirm, the strategy here is:

Detect long IDs (cross-platform for consistency)
Hash the IDs to avoid platform-specific length restrictions
Issue a warning suggesting ways to choose IDs independently

Does that summary of my understanding align with what's been discussed here?

obestwalter · 2024-06-20T14:56:54Z

Proposing to close this and move the discussion back to #6681 - thanks @symonk to get the ball rolling on this one.

symonk added 4 commits May 24, 2020 20:59

monkeypatch os.name for the test in CI, rewrite parametrized IDs grea…

da10019

…ter than 100 on windows

tidy up comments

33284ff

remove print statements from tests

e5ee3fa

add accidentally removed comments

9f4f5ae

RonnyPfannschmidt reviewed May 24, 2020

View reviewed changes

symonk added 2 commits May 24, 2020 21:33

skip if not running on windows based systems

89bbabe

change the solution to be platform agnostic for consistency

5f78d4c

symonk changed the title ~~Auto generating parametrized ids for excessive sizes on windows~~ Auto generate excessive long parametrized ids May 24, 2020

symonk added 2 commits May 25, 2020 14:45

refactor auto id rewriting to use the inbuilt argname-idx capabilities

70b0774

use a more appropriate name for the post-checked val

b2462d8

symonk commented May 25, 2020

View reviewed changes

make code comments a bit clearer

ce48153

bluetech requested changes May 30, 2020

View reviewed changes

Base automatically changed from master to main March 9, 2021 20:40

obestwalter added the status: needs information reporter needs to provide more information; can be closed after 2 or more weeks of inactivity label Jun 20, 2024

obestwalter mentioned this pull request Jun 20, 2024

Long parametrized test_input on Windows: ValueError: the environment variable is longer than 32767 characters #6881

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto generate excessive long parametrized ids #7254

Auto generate excessive long parametrized ids #7254

symonk commented May 24, 2020 •

edited by bluetech

Loading

RonnyPfannschmidt May 24, 2020

symonk May 24, 2020 •

edited

Loading

RonnyPfannschmidt May 24, 2020

symonk Jun 1, 2020

RonnyPfannschmidt Jun 1, 2020

symonk commented May 25, 2020 •

edited

Loading

symonk May 25, 2020

RonnyPfannschmidt May 25, 2020

bluetech May 30, 2020

bluetech left a comment

bluetech May 30, 2020

symonk Jun 1, 2020

bluetech May 30, 2020

bluetech May 30, 2020

bluetech commented May 30, 2020

symonk commented Jun 1, 2020

nicoddemus commented Oct 5, 2021

kurtmckee commented Mar 17, 2023

RonnyPfannschmidt commented Mar 17, 2023

kurtmckee commented Mar 17, 2023

RonnyPfannschmidt commented Mar 17, 2023

kurtmckee commented Mar 17, 2023

RonnyPfannschmidt commented Mar 17, 2023

obestwalter commented Jun 20, 2024

kurtmckee commented Jun 20, 2024

obestwalter commented Jun 20, 2024

Auto generate excessive long parametrized ids #7254

Are you sure you want to change the base?

Auto generate excessive long parametrized ids #7254

Conversation

symonk commented May 24, 2020 • edited by bluetech Loading

Choose a reason for hiding this comment

symonk May 24, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

symonk commented May 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bluetech left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bluetech commented May 30, 2020

symonk commented Jun 1, 2020

nicoddemus commented Oct 5, 2021

kurtmckee commented Mar 17, 2023

RonnyPfannschmidt commented Mar 17, 2023

kurtmckee commented Mar 17, 2023

RonnyPfannschmidt commented Mar 17, 2023

kurtmckee commented Mar 17, 2023

RonnyPfannschmidt commented Mar 17, 2023

obestwalter commented Jun 20, 2024

kurtmckee commented Jun 20, 2024

obestwalter commented Jun 20, 2024

symonk commented May 24, 2020 •

edited by bluetech

Loading

symonk May 24, 2020 •

edited

Loading

symonk commented May 25, 2020 •

edited

Loading