New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could I use a single schema instance on concurrent request? #783

Closed
yupeng0921 opened this Issue Apr 25, 2018 · 7 comments

Comments

Projects
None yet
5 participants
@yupeng0921

yupeng0921 commented Apr 25, 2018

For example, I have an AlbumSchema class, then I create a global variable 'g_schema' from it:
g_schema = AlbumSchema()
Could I use the same g_schema on the concurrent http requests? Is it safe?

@lafrech

This comment has been minimized.

Member

lafrech commented Apr 26, 2018

Funny you ask because that very same day we've been facing issues that look quite like concurrency issues.

We use marshmallow + webargs and we pass webargs's use_args a Schema instance, so the instance is shared over all requests.

Some requests receive an error message that obviously corresponds to another request that occurred concurrently.

This does not occur when passing a schema class.

We're investigating this. The root cause could be the fact that the Unmarshaller is created at init time:

        #: Callable unmarshalling object
        self._unmarshal = marshalling.Unmarshaller()

(same issue with self._marshal and perhaps other attributes)

The Unmarshaller instance is then shared among all requests and called several times during the loading process, which does not seem thread safe.

I've checked in webargs code and it is meant to receive either a class or an instance, so this looks like a bug.

If this is really the case, then it is a serious show-stopper.

I'm not sure how to isolate/reproduce/fix/test. Help very much welcome.

And if I'm wrong, well... great, what a relief, sorry for the noise.

@lafrech

This comment has been minimized.

Member

lafrech commented Apr 26, 2018

From a quick analysis, I think the consequences are "limited" to spurious error messages.

Take the following with care, I have no certainty at this point:

  • When all goes fine, data is not mixed up between two concurrent requests.

  • When an error occurs, another concurrent requests can receive the error message meant for the other request.

  • A wrong request can not pass due to another concurrent passing request.

So you'd have false positives, but no false negatives.

This would mitigate the issue a bit.

@lafrech

This comment has been minimized.

Member

lafrech commented Apr 26, 2018

It should be noted that this happens only if two concurrent requests occur and one of them fails validation. In the case of a loaded server, as long as all requests are valid, there is no problem.

Also, we managed to see and reproduce this because we're using marshmallow to parse pretty fat inputs. In the general case of normal size inputs, the race condition is much less likely to occur.

@sloria sloria closed this May 11, 2018

@taion

This comment has been minimized.

Contributor

taion commented May 15, 2018

Looking at the code, the fix in #799 #785 is not quite sufficient. Schema._update_fields has some amount of shared state, and can cause issues with stomping on other updates if field types need to be inferred.

@deckar01

This comment has been minimized.

Member

deckar01 commented May 15, 2018

I'm not sure #799 is the right reference in that comment. Do you mean #785?

@lafrech

This comment has been minimized.

Member

lafrech commented May 15, 2018

I think @taion meant the fix in #785. And he provides an example of thread-safety issue in #809.

@taion

This comment has been minimized.

Contributor

taion commented May 15, 2018

oops, sorry – i did mean #785.

and i have a fix in #810 that would address this issue, though it's worth noting that it's a broader change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment