New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StringType() allows casting from int to str #315
Comments
I am not aware of the reason, probably early contributors can chip in @schematics/owners |
Blame points to me 😁 I honestly can’t remember. I agree that it is weird. |
I think this is an acceptable compromise, because it's such a common thing. First, consider that all Python ints can be represented as strings. Second, consider a common serialization format: YAML. You can tag values to be of a specific type, but that's pretty rare. Instead, the default mode for YAML parsers is to infer a value's type based on regular expression. See how it handles ints, for example. Now consider this YAML document: name: Joe Smith
address: 123 Any Street.
zip: 12345 YAML would parse the zip code as an int, even though semantically it's a string. Other examples include USPS tracking numbers and phone numbers without the dashes. And YAML's not alone in representing int-looking strings as integers:
In any of those cases, we'd expect a Model with a StringType While I certainly get that it's not philosophically pure to implicitly cast ints to strings, it's so common (especially in the domains were Schematics is likely to be used) that I think it's a reasonable default behavior. |
Just bumped into this. I wasn't really expecting the behavior, and couldn't find mentioning about it from the docs. Would it make sense to add something like following to
StringType also has |
By and large, Schematics has a lax approach to type casting. I've been wanting to make it less so, or at least make the behavior configurable. This has already been addressed w.r.t numbers; see my monologue and the resulting patch in #364. In short, I ended up making One thing I'd love to hear people's thoughts on is the relative importance of controlling type strictness at data import time as opposed to field definition time. Should we have both? If yes, how should the settings interact? |
If you just want to validate the input type you could skip the conversion step. Maybe split an identity test step (currently done in the conversion, eg. *edited for clarity |
Just adding here: schematics is about importing data from possibly untrusted sources and making it fit into a schema, by doing the necessary lossless conversions when possible. If you want to enforce a specific input datatype, that is a subset of what the default schematics types do, and not supported by default. You can always subclass and do your own |
This is not really supported usage. Validator functions are not required to do any form of type-checking, so if the type is wrong, all bets are off — see #338. This is why |
Yes, you are right, sorry for not being clear. I mean that maybe splitting the current conversion into [conversion] + [identity test] would be a way to cleanly skip the conversion step when just type checking of input is needed. Edit: the logical order is identity test first and then conversion, with an exception raised in the identity test if strict type checking is requested. |
Okay I get it. Well, it's true that everything can be solved by subclassing, but the default behavior should at least be reasonable. As @tuukkamustonen pointed out, |
Is the issue about
For (1), [see my next message] schematics/schematics/types/base.py Line 386 in 879b770
For (2), |
Just looked again, I didn't match the if/else clause correctly for (1), the problem is that: The solution would be to replace isinstance with a strict |
I would just special-case For example, consider this class from class Constant(int):
def __new__(cls, name, value):
return int.__new__(cls, value)
def __init__(self, name, value):
self.name = name
int.__init__(self)
def __repr__(self):
return self.name It's an |
Do you mean something like: if isinstance(value, self.allow_casts):
if isinstance(value, int):
if type(value) != int: raise ConversionError(...) I agree it's a simple fix for this case. Not sure how complete. |
I only speak from a "customer's" perspective now: I am using schematics to validate incoming request data in a Flask application. I want to be strict with data - a string needs to be a string, an int needs to be an int, and so on. This is both to avoid accidentally introducing potential security problems and to keep the API as straightforward and simple as possible. I don't want to run in those scenarios were client accidentally sends us a number, and then we process that as a string, resulting in something else than what client expected. I do see it now that I should have read the project description better - maybe schematics just isn't intended for this use? Anyhow, I feel the documentation is a bit lacking in this sense - the docs do hint about type coercion, but an example or two (or the table from #364) would do better to enlighten reader's awareness of the data casting.
You put it nicely there, @lkraider. I would love to see this under the project description.
I would say this is correct, if the input was
Can you clarify the difference between data import and field definition time? I've been also following https://github.com/sloria/webargs and https://github.com/marshmallow-code/marshmallow lately. If the focus of schematics is not in API validation, would you encourage to rather use webargs for input validation in an API and leave schematics for YAML parsing (and similar)? |
First, Schematics is very much intended for API validation — it's even mentioned as a use case on the project's front page. YAML was brought up only as an example of a serialization format that doesn't automatically distinguish between integers and conceptual strings that look like integers, and I think it's a good point to keep in mind. Like I said earlier, the library seems to have a history of coercing datatypes at almost any cost. There's something to be said for this approach (https://en.wikipedia.org/wiki/Robustness_principle), but I've felt the behavior has been too loose, so I've been trying to weed out the questionable cases: @jvantuyl's comment in #324 mentioned giving users better control over the coercion logic. I agree this would be good. But I think having saner defaults should be the first step. Well-thought-out behavior out of the box will cover the vast majority of use cases. There are basically three ways a string field could work:
It's easy to see all of these can make sense, depending on circumstances.
I mean class Address(Model):
zipcode = StringType(strict=False) # strictness fixed at definition time
a = Address(any_data) vs class Address(Model):
zipcode = StringType()
a1 = Address(typed_data, strict_types=True) # strictness specified at invocation time
a2 = Address(yaml_data, strict_types=False) @lkraider Yeah, that's the gist of what I was suggesting, at least to fix the immediate issue. Alternatively, we could special-case |
One more thing: if we're coercing integers, shouldn't we be coercing floats too? What's the logic behind accepting one and rejecting the other? A deserialization layer that will incorrectly infer |
I agree, making sane defaults should be the first goal. Let me take back my thoughts a bit. Now that I realize (this is not in the docs, afaik) that the library does these data conversions, and after thinking about it, it doesn't feel so bad. It's actually something AWS does with their CloudFormation templates - you can give So, if I document that a field is of type Conversion for the other datatypes cannot be so loose, however. I think what you did in #364 is indeed the way to go. The old mode is confusing (silently converting For me, configurability in the field definitions would suffice. Whatever the style, all interfaces should function the same, so either I would enable the strict mode or not. Then of course, it would be rather boring to define Just one more thing, from #364:
It kinda reads that the default mode is now "sensible and silly"... :D |
|
Hi,
I noticed that StringType defines allow_casts = (str, int).
I wonder what the motivation is for this. I was quite surprised that
a StringType() field silently casts int input to str.
I can always make a subclass that overrides the allow_casts class variable though.
Thanks for your help!
The text was updated successfully, but these errors were encountered: