Long parametrized test_input on Windows: ValueError: the environment variable is longer than 32767 characters #6881

Open
4 tasks done
hugovk opened this issue Mar 8, 2020 · 14 comments · May be fixed by #7254
Labels
platform: windows windows platform-specific problem type: bug problem that needs to be addressed

Comments

@hugovk
Member

hugovk commented Mar 8, 2020

  • a detailed description of the bug or suggestion
  • output of pip list from the virtual environment you are using
Package        Version    
-------------- -----------
atomicwrites   1.3.0      
attrs          19.3.0     
colorama       0.4.3      
more-itertools 8.2.0      
packaging      20.3       
pip            20.0.2     
pluggy         0.13.1     
py             1.8.1      
pyparsing      2.4.6      
pytest         5.3.5      
setuptools     41.2.0     
six            1.14.0     
ujson          1.36.dev102
wcwidth        0.1.8  
  • pytest and operating system versions
    pytest 5.3.5, Windows Server 2019 on GitHub Actions
    Python 3.5-3.8
  • minimal example if possible

Include long test input in a parametrize test:

import pytest


@pytest.mark.parametrize(
    "test_input",
    [
        "1",
        "2",
        "[" * (1024 * 1024),
        "{" * (1024 * 1024),
    ],
)
def test_long_input(test_input):
    # Do something with test_input
    pass

Expected

Tests pass

Actual

Tests fail with ValueError: the environment variable is longer than 32767 characters:

2020-03-08T21:44:22.8972828Z key = 'PYTEST_CURRENT_TEST'
2020-03-08T21:44:22.8973045Z value = 'tests/test_ujson.py::test_long_input[{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{...{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{] (teardown)'
2020-03-08T21:44:22.8973138Z 
2020-03-08T21:44:22.8973265Z     def __setitem__(self, key, value):
2020-03-08T21:44:22.8973624Z         key = self.encodekey(key)
2020-03-08T21:44:22.8973756Z         value = self.encodevalue(value)
2020-03-08T21:44:22.8973879Z >       self.putenv(key, value)
2020-03-08T21:44:22.8974020Z E       ValueError: the environment variable is longer than 32767 characters
2020-03-08T21:44:22.8974117Z 
2020-03-08T21:44:22.8974254Z c:\hostedtoolcache\windows\python\3.8.2\x64\lib\os.py:681: ValueError
2020-03-08T21:44:22.8974408Z ======================== 138 passed, 4 errors in 2.66s ========================
2020-03-08T21:44:22.9037130Z ##[error]Process completed with exit code 1.

https://github.com/hugovk/ultrajson/runs/493853892?check_suite_focus=true
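For context, a rough sketch of why the long parameter trips the limit. It assumes, going by the traceback above, that the value stored in PYTEST_CURRENT_TEST has the shape "<node id> (<phase>)"; the helper names and the constant are illustrative only and are not part of pytest's API.

```python
# Limit quoted in the ValueError message above.
WINDOWS_ENV_VALUE_LIMIT = 32767


def current_test_value(nodeid: str, when: str) -> str:
    """Build a value shaped like the one in the traceback: '<nodeid> (<phase>)'."""
    return f"{nodeid} ({when})"


def exceeds_windows_limit(value: str) -> bool:
    """True if the value is longer than the limit named in the error message."""
    return len(value) > WINDOWS_ENV_VALUE_LIMIT


# A short parameter yields a short node ID.
short = current_test_value("tests/test_ujson.py::test_long_input[1]", "teardown")

# The 1 MiB parameter is embedded verbatim in the node ID.
long = current_test_value(
    "tests/test_ujson.py::test_long_input[" + "{" * (1024 * 1024) + "]",
    "teardown",
)

print(exceeds_windows_limit(short))  # False
print(exceeds_windows_limit(long))   # True
```

Because the default ID of a string parameter is the string itself, the parameter's length feeds directly into the environment variable's length.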

Passes on Ubuntu 16.04 and 18.04 and on macOS Catalina 10.15 with Python 2.7 and 3.5-3.8. On all operating systems, though, the test name is also really long because it includes the parameterised values, which clutters the logs.

Can be worked around by splitting the long test input into its own test function:

@pytest.mark.parametrize(
    "test_input",
    [
        "1",
        "2",
    ],
)
def test_long_input(test_input):
    # Do something with test_input
    pass

@pytest.mark.parametrize(
    "test_input",
    [
        "[",
        "{",
    ],
)
def test_long_input_repeated(test_input):
    # Renamed so it does not shadow test_long_input above.
    # Do something with test_input * (1024 * 1024), instead of test_input
    pass

@Zac-HD Zac-HD added platform: windows windows platform-specific problem type: bug problem that needs to be addressed labels Mar 9, 2020
@Zac-HD
Member

Zac-HD commented Mar 9, 2020

Re: long name in logs, passing ids=... to parametrize would allow you to customize it to a shorter form.

(which might also work around the env issue? unclear, but we should fix that anyway.)

@hugovk
Member Author

hugovk commented Mar 9, 2020

Thanks, passing ids=... works around the env issue too:

@pytest.mark.parametrize(
    "test_input",
    [
        "1",
        "2",
        "[" * (1024 * 1024),
        "{" * (1024 * 1024),
    ],
    ids=["a", "b", "c", "d"]
)
def test_long_input(test_input):
    pass

(Although there are 30 params in the real test, so I'd opt for the method splitting workaround in this case.)

@The-Compiler
Member

(Although there are 30 params in the real test, so I'd opt for the method splitting workaround in this case.)

Note that you can pass a function to ids instead (docs). Thus, you could do something like ids=lambda s: s[:10] (assuming that your IDs are still unique after that).
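A minimal sketch of the callable form, assuming the first ten characters are enough to keep the IDs unique for these inputs. The helper name short_id and its limit parameter are made up for illustration.

```python
import pytest


def short_id(value, limit=10):
    """Return at most `limit` characters of the value for use as a test ID."""
    return str(value)[:limit]


@pytest.mark.parametrize(
    "test_input",
    ["1", "2", "[" * (1024 * 1024), "{" * (1024 * 1024)],
    ids=short_id,  # called once per parameter value
)
def test_long_input(test_input):
    # The ID is truncated; the parameter value itself is passed through unchanged.
    assert len(short_id(test_input)) <= 10
```

A named function behaves the same as the lambda; it is just easier to reuse across several parametrized tests.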

@RonnyPfannschmidt
Member

I believe this is one of the cases where pytest should not consider the string a valid id; it should hint at passing an explicit name and autogenerate the test id instead.

In other words, "{" * 10000 should result in a warning and an autogenerated test id.
The suggestion should be to use pytest.param("{" * 10000, id="intent-of-the-input").
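For illustration, the explicit form applied to the example from the top of this issue might look like the sketch below. The id strings are placeholders chosen here, not names from the real test suite.

```python
import pytest


@pytest.mark.parametrize(
    "test_input",
    [
        "1",
        "2",
        # Explicit ids keep the huge strings out of the test name.
        pytest.param("[" * (1024 * 1024), id="many-open-brackets"),
        pytest.param("{" * (1024 * 1024), id="many-open-braces"),
    ],
)
def test_long_input(test_input):
    assert test_input
```

Only the long values need pytest.param; the short ones can keep their default IDs.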

@symonk
Member

symonk commented May 24, 2020

@RonnyPfannschmidt / @nicoddemus any recommendations on where the fix should live, and where we should draw the line? A few questions from my initial investigation:

  • _pytest/python.py: add some checks around validating ids here? Could you recommend where this issue should typically be solved?
  • Do we auto-generate a random name beyond a certain length? Checking the length of the nodeid plus the parametrized data (plus bolt-ons such as '[] teardown', to account for the worst case when updating PYTEST_CURRENT_TEST in environ) is problematic.
  • Do we say: if your value is over 'X' in length, we will auto-generate a shorter one for you (only on Windows) and alert you to the fact? I'm not sure how breaking that would be for current Windows tests, as some may be using slightly longer values. My initial thought was to rewrite the data if it is > 1024, but from what I gather the entire environment path on Windows cannot exceed the 32.7K limit, so part of me feels the fix is futile: with enough shorter values, the same problem will persist.

@RonnyPfannschmidt
Member

I would draw the line around 100, maybe less
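A rough sketch of what such a cut-off could look like, under the assumption that over-long string values fall back to an "<argname><index>" style name. The function make_id and the constant MAX_ID_LENGTH are hypothetical and do not exist in pytest.

```python
MAX_ID_LENGTH = 100  # threshold suggested above; not a pytest setting


def make_id(value, argname, index, max_length=MAX_ID_LENGTH):
    """Use the string itself as the ID unless it is too long.

    Over-long values fall back to '<argname><index>'.
    """
    text = str(value)
    if len(text) <= max_length:
        return text
    return f"{argname}{index}"


values = ["1", "2", "[" * (1024 * 1024), "{" * (1024 * 1024)]
ids = [make_id(v, "test_input", i) for i, v in enumerate(values)]
print(ids)  # ['1', '2', 'test_input2', 'test_input3']
```

A fixed threshold keeps every single ID short, but it does not by itself address the concern about the total size of the environment raised earlier.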

@symonk
Member

symonk commented May 24, 2020

OK, I will mock something up and discuss via PR. I think there are a few areas where we should maybe consider a --windows based flag that does a couple of things; 2-3 issues I've seen are similar in nature to this one but in different parts of the system, relating to file lengths, env var lengths, etc. :)

@ItsDrike

ItsDrike commented Feb 18, 2023

Any progress on this issue? We were still able to replicate it with the latest pytest (3 years later). Was this just forgotten about, or is it a wontfix?

Even worse, it seems that doing:

import platform
import pytest

@pytest.mark.skipif(platform.system() == "Windows", reason="environment variable limit on Windows")
@pytest.mark.parametrize("string", ["a" * 32768])
def test_write_utf_limit(string):
    ...

causes the test to be reported as skipped but also as failed. I suspect that's because the error occurs during parametrization, so the run fails even though the test function never executes. See the commit that caused this, along with the corresponding failure in its CI run.

Although the suggested fix by setting ids does work, it seems like something that should be addressed, if possible.
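For reference, a sketch of the ids workaround applied to this test. The id string is a placeholder, and the skipif marker is kept only to mirror the snippet above.

```python
import platform

import pytest


@pytest.mark.skipif(
    platform.system() == "Windows",
    reason="environment variable limit on Windows",
)
@pytest.mark.parametrize(
    "string",
    ["a" * 32768],
    ids=["32768-chars"],  # short explicit ID instead of the 32768-character value
)
def test_write_utf_limit(string):
    assert len(string) == 32768
```

With an explicit ID the node name stays short, so the test name no longer depends on the size of the parameter.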

@nicoddemus
Member

Nobody took the time to fix it, but a pull request would be certainly welcome.

@RonnyPfannschmidt
Member

The open pull request went stale it seems

@obestwalter
Member

@symonk - the discussion here points to following a different approach, so I think it's ok to close this one then, right?

@obestwalter obestwalter added the status: needs information reporter needs to provide more information; can be closed after 2 or more weeks of inactivity label Jun 20, 2024
@kurtmckee

@obestwalter I opened this issue, and @symonk opened a PR linked to this issue.

Is "close this one" referring to the PR linked to this issue, or is it referring to this issue?

@obestwalter
Member

@kurtmckee yes it's about the PR that is not the preferred solution anymore. I actually should have written it there. Thanks for clearing that up.

@obestwalter obestwalter removed the status: needs information reporter needs to provide more information; can be closed after 2 or more weeks of inactivity label Jun 20, 2024
@obestwalter
Member

Thanks @kurtmckee for picking this up. Let's move the discussion back here, as the first PR to address this is likely to be closed soon.

You wrote:

  1. Detect long IDs (cross-platform for consistency)
  2. Hash the IDs to avoid platform-specific length restrictions
  3. Issue a warning suggesting ways to choose IDs independently

My head is spinning from going through tons of issues and PRs over the last few days after not really having followed the developments for a long time. But from my shallow understanding, doing it like this fixes the problem and lets the user know that something potentially surprising has been done to address it, giving them the info needed to handle it differently if they so wish. So it sounds like a good plan to me.
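The three steps quoted above could look roughly like the sketch below. It is not taken from any pull request: the threshold, the choice of SHA-256, the 16-character digest and the name safe_id are all assumptions made for illustration.

```python
import hashlib
import warnings

MAX_ID_LENGTH = 100  # hypothetical threshold, applied on every platform


def safe_id(raw_id: str, max_length: int = MAX_ID_LENGTH) -> str:
    """Return raw_id unchanged if short enough, otherwise a short hash of it."""
    # Step 1: detect long IDs.
    if len(raw_id) <= max_length:
        return raw_id
    # Step 2: hash the ID so its length no longer depends on the input.
    digest = hashlib.sha256(raw_id.encode("utf-8")).hexdigest()[:16]
    # Step 3: warn and point at ways to choose the ID explicitly.
    warnings.warn(
        f"Test ID of {len(raw_id)} characters replaced by hash {digest!r}; "
        "pass ids=... or pytest.param(..., id=...) to choose your own.",
        UserWarning,
        stacklevel=2,
    )
    return digest
```

Hashing is deterministic, so the same parameter maps to the same ID on every run, which keeps test selection by node ID stable.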


9 participants