Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for specifying environment variable with concurrency slot #10297

Merged
merged 4 commits into from Jul 9, 2020

Conversation

gshuflin
Copy link
Contributor

@gshuflin gshuflin commented Jul 8, 2020

Problem

In some scenarios, pants users might want their tests to be aware of the amount of concurrency those tests are being run with. For instance, database-heavy Django tests might want to be able to reuse a limited number of databases, instead of spawning and destroying a new test database instance for every single concurrent test, which is resource-intensive.

Solution

This PR introduces the ability to optionally create a variable in the environment of a Process that has a numerical ID corresponding to the "slot" that this Process is run under by the BoundedCommandRunner system. The name of this environment variable is controlled by a new pants option --process-execution-slot-environment-variable, and will not be present if this option is the empty string.

The ID presented to processes via this variable comes from the BoundedCommandRunner, and will always be a small numerical value greater than 0 and less than or equal to the maximum amount of concurrency allowed by that BoundedCommandRunner instance (which in turn is set by the --process-execution-{local,remote}-parallelism options).

@gshuflin gshuflin requested review from stuhood and benjyw July 9, 2020 00:41
@gshuflin gshuflin marked this pull request as ready for review July 9, 2020 00:41
@stuhood
Copy link
Sponsor Member

stuhood commented Jul 9, 2020

Interesting... is there a link to an explanation of how it's used in django? I'm not quite understanding.

Also, I'm curious how this might relate to #9964 ... in that case the variable would be a hint to the process about how parallel it should try to be.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 9, 2020

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 9, 2020

The Django settings.py can read the env var to pick a database name for the currently running process, and create the database if it doesn't exist. That way, for concurrency N, you only need N databases, and they will be reused across a much larger number of tests.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 9, 2020

Note that this doesn't make tests aware of the amount of concurrency those tests are being run with, since they still know nothing about how many slots there are (although that would be easy to add purely in Python, I think). And that's fine for now!

@gshuflin
Copy link
Contributor Author

gshuflin commented Jul 9, 2020

Note that this doesn't make tests aware of the amount of concurrency those tests are being run with, since they still know nothing about how many slots there are (although that would be easy to add purely in Python, I think). And that's fine for now!

If we need this we could add another option to expose it under another named environment variable, in a similar way to how --process-execution-slot-environment-variable toggles the presence of a variable.

@stuhood
Copy link
Sponsor Member

stuhood commented Jul 9, 2020

If we need this we could add another option to expose it under another named environment variable, in a similar way to how --process-execution-slot-environment-variable toggles the presence of a variable.

@gshuflin : This probably shouldn't be a global toggle... rather, a setting on the Process struct that pytest can optionally set for the django case? It won't be possible to set it correctly globally, I don't think.

@gshuflin
Copy link
Contributor Author

gshuflin commented Jul 9, 2020

If we need this we could add another option to expose it under another named environment variable, in a similar way to how --process-execution-slot-environment-variable toggles the presence of a variable.

@gshuflin : This probably shouldn't be a global toggle... rather, a setting on the Process struct that pytest can optionally set for the django case? It won't be possible to set it correctly globally, I don't think.

That makes sense, but then there would need to be some way to tell tests to turn on this setting on Process. Either that or just always have every kind of test enable this variable, and call it something generic like PANTS_CONCURRENCY_SLOT and document that name so people could write their django tests against it.

@stuhood
Copy link
Sponsor Member

stuhood commented Jul 9, 2020

If we need this we could add another option to expose it under another named environment variable, in a similar way to how --process-execution-slot-environment-variable toggles the presence of a variable.

@gshuflin : This probably shouldn't be a global toggle... rather, a setting on the Process struct that pytest can optionally set for the django case? It won't be possible to set it correctly globally, I don't think.

That makes sense, but then there would need to be some way to tell tests to turn on this setting on Process. Either that or just always have every kind of test enable this variable, and call it something generic like PANTS_CONCURRENCY_SLOT and document that name so people could write their django tests against it.

Could make it an argument to the python_tests target? cc @Eric-Arellano

@Eric-Arellano
Copy link
Contributor

Or an option on Pytest? I'm not sure if you need to set this value many times, or only once gloablly.

I agree with Stu's intuition to try to limit the scope to not be a global option.

@gshuflin
Copy link
Contributor Author

gshuflin commented Jul 9, 2020

Or an option on Pytest? I'm not sure if you need to set this value many times, or only once gloablly.

I agree with Stu's intuition to try to limit the scope to not be a global option.

I think in practice pants users are going to want to set this once, and write their own tests with the one variable in mind. So that to me suggests that putting this as a pytest-scoped option and made available to the engine via a flag on the Process type is the right way to go. @benjyw ?

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 9, 2020

Hmm, yes, setting this on every process is overkill, so an option on pytest is fine, and we only set this for invocations of pytest, but we do so for every pytest invocation if the option is set.

@gshuflin
Copy link
Contributor Author

gshuflin commented Jul 9, 2020

Updated this to pass the option scoped under pytest @stuhood @Eric-Arellano

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

jdk_home: None,
target_platform,
is_nailgunnable,
..original_req
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh snap. I frequently forget that this syntax is a thing: thanks!

Comment on lines +2830 to +2832
let mut req = Process::new(owned_string_vec(&["/bin/echo", "meoooow"]));
req.timeout = one_second();
req.description = "unleash a roaring meow".to_string();
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had started to add the builder pattern to Process for this usecase but... what you're doing here is probably just as easy?

///
/// Constructs a Process with default values for most fields, after which the builder pattern can
/// be used to set values.
///
/// We use the more ergonomic (but possibly slightly slower) "move self for each builder method"
/// pattern, so this method is only enabled for test usage: production usage should construct the
/// Process struct wholesale. We can reconsider this if we end up with more production callsites
/// that require partial options.
///

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had started to add the builder pattern to Process for this usecase but... what you're doing here is probably just as easy?

///
/// Constructs a Process with default values for most fields, after which the builder pattern can
/// be used to set values.
///
/// We use the more ergonomic (but possibly slightly slower) "move self for each builder method"
/// pattern, so this method is only enabled for test usage: production usage should construct the
/// Process struct wholesale. We can reconsider this if we end up with more production callsites
/// that require partial options.
///

It looks like there's a half-complete builder pattern in the codebase currently. It would be nice to either extend that to every option we care about, or get rid of it and just mutate member variables on Process directly, but either way, it's out of the scope of this PR.

type=str,
default="",
advanced=True,
help="If a non-empty string, the process execution slot id (an integer) will be exposed to tests under this environment variable name.",
Copy link
Sponsor Member

@stuhood stuhood Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be a bit cleaner for this to default to None (Optional[str]) here and in the Process constructor, even if inside of Process it gets converted into the empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the options type kwarg know how to interpret modern Python typing type annotations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You keep type=str and set default=None. The type will end up being Optional[str].

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I only looked at Python code)

type=str,
default="",
advanced=True,
help="If a non-empty string, the process execution slot id (an integer) will be exposed to tests under this environment variable name.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this might be longer than 100 characters. Appreciated if you can split this up into multiple lines. I like the style of wrapping in parentheses:

help=(
    "blah blah"
    "blah"
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be caught by the linter if it was?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. We turned off the check for Flake8 because it doesn't work well with Black. Black is careful when formatting strings and doesn't touch them generally.

@@ -48,6 +49,7 @@ def __init__(
timeout_seconds: Optional[Union[int, float]] = None,
jdk_home: Optional[str] = None,
is_nailgunnable: bool = False,
execution_slot_variable: str = "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to make this Optional[str]. It expresses intent better. See jdk_home as an example.

@gshuflin gshuflin merged commit 62751f9 into pantsbuild:master Jul 9, 2020
@gshuflin gshuflin deleted the concurrency_slot branch July 9, 2020 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants