Skip to content

Conversation

@encukou
Copy link
Member

@encukou encukou commented Dec 2, 2025

This reads/writes data as available, making the CGI application responsible for managing any timeouts when receiving data from clients.

Data is read in chunks of bounded size, and passed on immediately (except stderr, which is combined into a single message as before).

This does need 3 threads. (As does process.communicate.)

This reads/writes data as available, making the CGI application
responsible for managing any timeouts when receiving data from
clients.

Data is read in chunks of bounded size, and passed on immediately
(except stderr, which is combined into a single message as before).
This does need 3 threads. (As does process.communicate.)
@encukou encukou changed the title [3.14] gh-142176: Read/write CGI data using worker threads [3.14] gh-119452: Read/write CGI data using worker threads Dec 2, 2025
@encukou
Copy link
Member Author

encukou commented Dec 2, 2025

@serhiy-storchaka See here.
Do you think it's better than loading all request data to memory, then sending it to the process all at once?

@serhiy-storchaka
Copy link
Member

This approach has great advantages, I like it, but creating a thread is not cheap. This is also much bigger change, so we should evaluate all the pros and cons.

Can the same select() loop be used for input and output? Can it be ran in the original thread?

@encukou
Copy link
Member Author

encukou commented Dec 2, 2025

creating a thread is not cheap

Sure, but can you make this cheaper? Note that Popen.communicate also creates 3 threads (generally: one for each pipe, except if there's only one pipe).

Can the same select() loop be used for input and output?

On Windows, select only works on sockets.

# already closed?
pass
if self.command.lower() == "post" and nbytes > 0:
data = self.rfile.read(nbytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was only around the memory allocation of an untrusted nbytes value here. introducing a thread seems unnecessary. There wasn't a resource exhaustion problem with this blocking for the issue at hand.

The memory-address-space-DoS consumption point is to only allocate an amount close in magnitude to the total data that actually arrives rather than the untrusted number up front. A simple loop reading in chunk increments as the earlier PR #119455 did, but without a select at all would handle the issue fine. (ie: just remove the select from the earlier pr entirely)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants