New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arbitrary-precision numerics support #416
Conversation
@dtolnay The main outstanding issue is the following test failure when the feature is on. (All tests pass when the feature is off.) ---- src/value/mod.rs - value::Value::is_f64 (line 598) stdout ----
thread 'rustc' panicked at 'test executable failed:
thread 'main' panicked at 'assertion failed: !v["b"].is_f64()', src/value/mod.rs:11:1
note: Run with `RUST_BACKTRACE=1` for a backtrace.
', librustdoc/test.rs:330:17
note: Run with `RUST_BACKTRACE=1` for a backtrace. This is of course due to the fact that the old logic for Once this is done, and I can do a quick rebase on master, I think we're ready for a proper full review. |
8435ba8
to
20d55dd
Compare
Okay, I decided to resolve that in a backwards-compatible manner, in the simplest way I thought possible. Ready for review now, @dtolnay! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I began looking through this and everything looks great so far. I continue to be on board with this approach. Will try to get through the rest by ~midweek.
Could you check on why the Rust 1.15.0 build is failing in Travis?
I agree with the decision to conservatively support |
20d55dd
to
182d211
Compare
Okay, glad to hear! The Rust 1.15 failure was due to a currently stable feature not being stable back then ( The clippy lint causing the failure for the nightly build is confusing me. What do you want to do about that? |
This one?
We should follow what it says and use The 1.15.0 build is going to need another look. |
src/number.rs
Outdated
{ | ||
for c in self.n.chars() { | ||
if c == '.' || c == 'e' || c == 'E' { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also check that parse::<f64>()
is finite. For example 1e999
should not be considered representable as f64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/number.rs
Outdated
#[cfg(feature = "arbitrary_precision")] | ||
{ | ||
let mut buf = Vec::new(); | ||
dtoa::write(&mut buf, f).ok()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be an unwrap()
because Vec<u8> as io::Write
never fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point.
src/number.rs
Outdated
{ | ||
let mut buf = Vec::new(); | ||
dtoa::write(&mut buf, f).ok()?; | ||
String::from_utf8(buf).ok()? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be an unwrap()
because the dtoa representation of a float is always UTF-8.
src/number.rs
Outdated
where | ||
V: de::Visitor<'de>, | ||
{ | ||
visitor.visit_str(&self.n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an intentional behavior change or required somewhere else in the PR? The following fails without arbitrary_precision
but gives you a string with arbitrary_precision
.
println!("{:?}", String::deserialize(Number::from(0)).unwrap());
We have tended to stay away from such implicit type conversions because they make other functionality fail in surprising ways, for example deserializing to the following untagged enum would produce the string variant even though the input is a number.
#[derive(Deserialize)]
#[serde(untagged)]
enum E {
S(String),
N(i64),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. It was so long ago I originally wrote this, I don't fully remember my motivations, but I think it was mainly for convenience. Your point is well taken though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the methods that are resulting in a behavior change.
src/number.rs
Outdated
} | ||
} | ||
|
||
impl<'de, 'a> Deserializer<'de> for &'a Number { | ||
impl<'de, 'a: 'de> Deserializer<'de> for &'a Number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required? Looks like a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly yes. This is the most liberal bound that allows for it to compile. We could make the bound dependent on the feature, I suppose, but that might cause confusion for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supposing that we cannot make this breaking change (whether or not it depends on arbitrary_precision), is there any other way to write this impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to need to be fixed. We cannot release with this breaking change.
Oh, I misunderstood that hint. It shouldn't say "remove closure", but rather "change closure to"... anything, that's an issue for the clippy people. |
Okay, I've pushed a new revision, which should resolve your above concerns. I suspect everything is in order now, but I'm getting some strange errors on testing now, even with the old version (previously confirmed working). |
The "unnecessary parentheses" warnings should go away with a |
9276d09
to
9811b02
Compare
Okay, that worked. (Had to clobber the Thanks for the review. LGTM? 😀 (And maybe roll a new release soon, if you think things are in a good place?) |
src/number.rs
Outdated
{ | ||
let value = visitor.next_key::<NumberKey>()?; | ||
if value.is_none() { | ||
return Err(de::Error::custom("number key was not found")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the map being visited does not have the right key, I think we need to assume not that "they probably misspelled the $__serde_private_number
field name" but that "they must be deserializing something that is not a JSON number." As such, would it be possible to show exactly the same error as they would have gotten without the arbitrary_precision feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sounds like a good idea.
src/number.rs
Outdated
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Error> | ||
where V: Visitor<'de> | ||
{ | ||
visitor.visit_map(NumberDeserializer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be somewhat more cautious here to avoid breaking existing code. For example in the untagged enum case which relies on deserialize_any:
#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum E {
N(i64),
}
fn main() {
println!("{:?}", E::deserialize(Number::from(0)).unwrap());
}
One approach would be to cascade a sequence of checks: if self.n
can parse as u64, use visit_u64; if self.n
can parse as i64, use visit_i64; if self.n
parsed to f64 and then to_string'd gives you the same string as self.n
, use visit_f64; only otherwise use visit_map.
That way for the most part only code that relies on large numbers (which previously failed to deserialize into serde_json::Number) would be affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is there something special about the way untagged enums work that causes this behaviour? It doesn't seem like the most elegant solution, but I'll have a go at this fix.
src/number.rs
Outdated
} | ||
} | ||
|
||
impl<'de, 'a> Deserializer<'de> for &'a Number { | ||
impl<'de, 'a: 'de> Deserializer<'de> for &'a Number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to need to be fixed. We cannot release with this breaking change.
src/number.rs
Outdated
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Error> | ||
where V: Visitor<'de> | ||
{ | ||
visitor.visit_map(NumberDeserializer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as <Number as Deserializer<'de>>::deserialize_any
.
src/number.rs
Outdated
where | ||
V: de::Visitor<'de>, | ||
{ | ||
visitor.visit_str(&self.n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the methods that are resulting in a behavior change.
src/number.rs
Outdated
} else { | ||
Number { n: N::PosInt(i as u64) } | ||
} | ||
impl FromPrimitive for Number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The num-traits crate is not stable yet so we cannot have these trait impls in the serde_json public API. Please find a different way to implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just hide the impl
s then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not have these impls in the public API even if we hide them. Is it possible to implement a different way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, although FromPrimitive
/ToPrimitive
does the heavy lifting for us, which is quite nice. I also don't see what's "unstable" about them... is the fact the version is 0.2 really that bad? Serde was 0.x not so long ago, after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't understand what heavy lifting you are referring to. Just looking at the current implementation of the From
impls it looks like making them work in the arbitrary_precision case should be not much more than 1 line of code: Number { n: i.to_string() }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has not been addressed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment in the main thread.
src/number.rs
Outdated
} | ||
} | ||
|
||
impl ToPrimitive for Number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto num-traits is not stable yet.
I don't see any way to get rid of the |
Yes, the NumberDeserializer struct should not need a lifetime in the first place because the implementation eventually calls to_owned() to make an owned string anyway. If you implement |
Ah, fair point that. I presume you're suggesting |
@dtolnay All should be in order now, except for the issue with |
src/number.rs
Outdated
where | ||
V: de::DeserializeSeed<'de>, | ||
{ | ||
seed.deserialize(self.number.to_owned().take().unwrap().into_deserializer()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may or may not be doing what you meant -- the to_owned()
call resolves to impl<T> ToOwned for T where T: Clone
so basically this is equivalent to self.number.clone().take().unwrap().into_deserializer()
. Either the clone
should not be there or the take
should not be there. Seems likely that you intended to take
but not clone
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the to_owned()
call was just a relic from the old code using a Cow
value.
src/number.rs
Outdated
// Not public API. Should be pub(crate). | ||
#[doc(hidden)] | ||
pub struct NumberDeserializer { | ||
pub visited: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that self.number
is an Option, this visited flag is redundant. The next_key_seed
method can check self.number.is_some()
instead.
src/value/de.rs
Outdated
V: Visitor<'de>, | ||
{ | ||
match self { | ||
Value::Number(n) => n.deserialize_str(visitor), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in the impl Deserializer for Number
case, let's not implement these implicit conversions from number to string.
src/value/de.rs
Outdated
V: Visitor<'de>, | ||
{ | ||
match *self { | ||
Value::Number(ref n) => n.deserialize_str(visitor), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto comment about not implementing implicit conversions from number to string.
src/number.rs
Outdated
} | ||
|
||
macro_rules! from_unsigned { | ||
($($unsigned_ty:ident)*) => { | ||
impl ToPrimitive for Number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ToPrimitive impl needed by any other code in the PR or is it here for symmetry with FromPrimitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly for symmetry, and because I think it would be nice for consumers. I really don't see the problem with implementing FromPrimitive
and ToPrimitive
... people who use num-traits
will appreciate it, in fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
People who use num-traits 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or 0.9 may not appreciate it as much as you think. Look at how much people love the Serde 0.8 impls in the url
crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, fair point. But we'll still have the num-traits
dependency, as before. Its usage just won't be exposed. Are you okay with that?
src/value/de.rs
Outdated
visitor: V) -> Result<V::Value, Error> | ||
where V: Visitor<'de>, | ||
{ | ||
if name == SERDE_STRUCT_NAME && fields == &[SERDE_STRUCT_FIELD_NAME] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this case here because the Deserialize impls for Number and Value both call deserialize_any, not deserialize_struct (which is fine). AIUI nothing would ever hit this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has not been addressed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/test.rs
Outdated
); | ||
]); | ||
|
||
if cfg!(not(feature = "arbitrary_precision")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are also going to want to add some test cases that test the arbitrary precision behavior, i.e. tests that pass only if arbitrary_precision is enabled. What use cases do you think would be valuable to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has not been addressed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, will add these. A few integers and floats that are too big for u64/i64/f64 should be good enough, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It essentially has been: there is a test_parse_arbitrary_precision_number
fn in tests
, it just has code commented out for the time being. Hope it looks alright to you.
src/de.rs
Outdated
visitor: V | ||
) -> Result<V::Value> | ||
where | ||
V: de::Visitor<'de>, | ||
{ | ||
#[cfg(feature = "arbitrary_precision")] | ||
{ | ||
if name == SERDE_STRUCT_NAME && fields == &[SERDE_STRUCT_FIELD_NAME] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is reachable because nothing calls deserialize_struct with this name and field names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure, but I think you're right. Removed now.
src/de.rs
Outdated
@@ -184,7 +198,7 @@ impl<'de, R: Read<'de>> Deserializer<R> { | |||
} | |||
|
|||
#[cold] | |||
fn peek_invalid_type(&mut self, exp: &Expected) -> Error { | |||
fn peek_invalid_type(&mut self, exp: &Expected, prim: bool) -> Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am reading this correctly, there are 8 calls to peek_invalid_type and all of them pass prim=false. So do we need the new parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, an oversight of mine. Gone now.
src/de.rs
Outdated
@@ -240,7 +254,7 @@ impl<'de, R: Read<'de>> Deserializer<R> { | |||
self.fix_position(err) | |||
} | |||
|
|||
fn deserialize_number<V>(&mut self, visitor: V) -> Result<V::Value> | |||
fn deserialize_number<V>(&mut self, visitor: V, prim: bool) -> Result<V::Value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method is only called when deserializing a primitive so it shouldn't need the new prim parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wasn't true before, but now that I've removed the block you suggested above in deserialize_struct
, it should be fine to get rid of.
Cargo.toml
Outdated
@@ -37,3 +37,5 @@ default = [] | |||
# This allows data to be read into a Value and written back to a JSON string | |||
# while preserving the order of map keys in the input. | |||
preserve_order = ["linked-hash-map"] | |||
|
|||
arbitrary_precision = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment explaining what is (and maybe what is not) guaranteed when you enable this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like the following?
# Use `String` to represent numbers internally, thus allowing the serialization
# and deserialization of numbers of arbitrary size/precision, instead of just
# the primitive integers and floats.
# This feature does not provide provide the ability to do arithmetic on
# arbitrary-precision numbers; this is left to the discretion of the consumer.
@dtolnay The above review should be resolved now, modulo one comment. |
You can't fool me. 😜 There are 4 comments still visible in https://github.com/serde-rs/json/pull/416/files that all need to be addressed. I will respond to them individually. You need to expand all the files and search for dtolnay. |
I still need to review the serialization side -- src/ser.rs and src/value/ser.rs -- as well as the |
src/de.rs
Outdated
fn parse_any_number( | ||
&mut self, | ||
pos: bool, | ||
prim: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the callers of this function pass a boolean literal value for prim
. I think this would be simpler if we remove the prim
argument and callers that need a primitive value call parse_integer
directly instead of calling parse_any_number
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Since the changes in the code during the review, this makes more sense now.
src/value/ser.rs
Outdated
@@ -247,23 +266,33 @@ impl serde::Serializer for Serializer { | |||
} | |||
} | |||
|
|||
/// Not public API. Should be pub(crate). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this and the following few similar comments are meaningful because these types are effectively already pub(crate)
. They are not accessible from outside of serde_json. Please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I can get rid of the doc(hidden)
attributes too...
where | ||
T: Serialize, | ||
{ | ||
Err(key_must_be_a_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended to be invalid_number()
like the other serializer methods here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. Not 100% sure, since I can't remember my motivation for ever writing that... or if it was sloppy copy & paste coding, perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/value/ser.rs
Outdated
} | ||
} | ||
|
||
impl serde::ser::SerializeStructVariant for SerializeStructVariant { | ||
type Ok = Value; | ||
type Error = Error; | ||
|
||
fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Error> | ||
fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Self::Error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to make this change in this PR. The existing code is inconsistent but let's leave it to a later PR to pick one usage and standardize everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sure. I'll leave that to someone else. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this and similar unnecessary formatting changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I thought this was in code I introduced for some reason. Reverted, and left the inconsistency as-is.
src/value/ser.rs
Outdated
fn serialize_str(self, value: &str) -> Result<Self::Ok, Self::Error> { | ||
let n = Number::from_string_unchecked(value.to_owned()); | ||
*self.0 = Some(Value::Number(n)); | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be clearer if we use the Self::Ok
return value to hand back the resulting Number. The caller can take care of placing that number where they want it.
Ok(Number::from_string...)
So-called "out parameters" like &'a mut Option<Value>
are not idiomatic in Rust when we can help it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Not sure why I did it this way originally... possibly my knowledge of Rust/Serde wasn't very good when I first wrote this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I believe I still need to use an "out value" internally for impl serde::ser::SerializeStruct for SerializeMap
.
src/value/ser.rs
Outdated
} | ||
|
||
fn serialize_str(self, value: &str) -> Result<Self::Ok, Self::Error> { | ||
let n = Number::from_string_unchecked(value.to_owned()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this from_string_unchecked
method was copied from the specialization-based implementation. But in the specialization case it was not necessary to check the string because we could know it came from a number. Here that is not the case and checking would be sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would also make sense to check in the NumberFromString
Visitor
in the number
module?
Anyway, how do you recommend doing this check? I think we should add a FromStr
implementation for Number
to be honest, which does the checking... perhaps using the scan_integer
function? Although this would necessitate the construction of a Deserializer
value. If that doesn't seem to "heavy" an approach for you, I can give it a go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it seems the use of from_string_unchecked
is okay here. Wouldn't test_parse_number_errors
fail otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of checking would be to uphold the invariant that in safe code you cannot get a Number holding anything other than what we consider a JSON number. This allows other code in serde_json to assume that the string inside of a Number contains something that is a number.
#[macro_use]
extern crate serde_derive;
extern crate serde_json;
#[derive(Serialize)]
#[serde(rename = "$__serde_private_Number")]
struct S {
#[serde(rename = "$__serde_private_number")]
f: &'static str,
}
fn main() {
println!("{:#?}", serde_json::to_value(&S { f: "serde" }).unwrap());
}
A rename
attribute is the simplest way to illustrate the failure mode; a more realistic example would be transcoding from RON to JSON in which malicious RON input fed by an attacker is able to trigger exploitable undefined behavior by violating assumptions in downstream code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtolnay What were your thoughts on my above comment by the way?
Anyway, how do you recommend doing this check? I think we should add a FromStr implementation for Number to be honest, which does the checking... perhaps using the scan_integer function? Although this would necessitate the construction of a Deserializer value. If that doesn't seem to "heavy" an approach for you, I can give it a go.
I could even make it simpler and just do value::from_str
and extract the Number
from that... but again, is that too heavy? Would appreciate your thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not consider it heavy to construct a Deserializer. Implementing FromStr for Number in terms of scan seems like it would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll give that a go, ta.
src/value/ser.rs
Outdated
next_key: None, | ||
}, | ||
) | ||
Ok(SerializeMap::Map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert unnecessary whitespace changes like this one. We can run the new rustfmt after the PR lands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure how that happened!
src/value/ser.rs
Outdated
Ok(()) | ||
}, | ||
#[cfg(feature = "arbitrary_precision")] | ||
SerializeMap::Number { .. } => panic!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reachable? A malicious input should not be able to trigger a panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not reachable AFAIK. I should probably change to unreachable!()
.
src/ser.rs
Outdated
@@ -346,24 +361,20 @@ where | |||
.end_array(&mut self.writer) | |||
.map_err(Error::io) | |||
); | |||
Ok( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert these formatting changes. If rustfmt is formatting this code differently these days, we can run rustfmt separately from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/value/ser.rs
Outdated
} | ||
} | ||
|
||
impl serde::ser::SerializeStruct for SerializeMap { | ||
type Ok = Value; | ||
type Error = Error; | ||
|
||
fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Error> | ||
fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<(), Self::Error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert anywhere that you changed Error
to Self::Error
or vice versa. We can leave it to a later PR to pick one usage and standardize everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I'm having a hard time spotting where I did this, so I think you'll just have to continue pointing them out, I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/value/ser.rs
Outdated
@@ -404,16 +631,186 @@ impl serde::ser::SerializeStructVariant for SerializeStructVariant { | |||
where | |||
T: Serialize, | |||
{ | |||
self.map | |||
.insert(String::from(key), try!(to_value(&value))); | |||
self.map.insert(String::from(key), try!(to_value(&value))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this formatting change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/test.rs
Outdated
@@ -568,6 +569,19 @@ fn test_write_newtype_struct() { | |||
test_encode_ok(&[(outer, r#"{"outer":{"inner":123}}"#)]); | |||
} | |||
|
|||
#[test] | |||
fn test_deserialize_number_to_untagged_enum() { | |||
use serde_json::Number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already imported on line 42.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/value/ser.rs
Outdated
} | ||
|
||
fn end(self) -> Result<Value, Error> { | ||
serde::ser::SerializeMap::end(self) | ||
fn end(self) -> Result<Self::Ok, Self::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert the change in this method signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/value/ser.rs
Outdated
Ok(()) | ||
} | ||
|
||
fn end(self) -> Result<Value, Error> { | ||
fn end(self) -> Result<Self::Ok, Self::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Added test for malicious numbers.
7153c82
to
2785540
Compare
@dtolnay Okay, I believe all outstanding issues are in order now! I think we're very close to being able to merge. Please tell me I'm right. 😉 |
type Err = Error; | ||
|
||
fn from_str(s: &str) -> Result<Self, Self::Err> { | ||
super::de::Deserializer::from_str(s).parse_any_signed_number().map(|n| n.into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this act as much as possible like the FromStr for primitive types. For example the leading whitespace behavior should be the same. Compare:
extern crate serde_json;
fn main() {
println!("{}", " 0".parse::<serde_json::Number>().unwrap());
println!("{}", " 0".parse::<u64>().unwrap_err());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll give it a go. Unfortunately, the ui
tests are failing again now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. As for the primitive types, leading and trailing whitespace now cause errors.
src/number.rs
Outdated
formatter.write_str("string containing a number") | ||
} | ||
|
||
fn visit_string<E>(self, s: String) -> Result<NumberFromString, E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the Visitor::visit_string
documentation:
It is never correct to implement
visit_string
without implementingvisit_str
.
In this case I think you want just visit_str
because the implementation does not benefit from taking ownership of the String data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point.
@dtolnay Question: should we implement the experimental |
@dtolnay How are we looking now? |
src/ser.rs
Outdated
@@ -1304,6 +1554,14 @@ pub trait Formatter { | |||
dtoa::write(writer, value).map(|_| ()) | |||
} | |||
|
|||
/// Writes a number that has already been rendered to a string. | |||
#[inline] | |||
fn write_number_str<W: ?Sized>(&mut self, writer: &mut W, value: &str) -> Result<()> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned Result type should be io::Result instead of serde_json::Result to be consistent with the rest of the Formatter methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/ser.rs
Outdated
fn serialize_some<T: ?Sized>(self, _value: &T) -> Result<Self::Ok> | ||
where T: Serialize | ||
{ | ||
Err(key_must_be_a_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we saw this somewhere else before, the error here is inconsistent with the invalid_number()
in all the other methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, copy & paste coding. :-P
I would not do this yet until someone has a compelling real use case that benefits from TryFrom. |
@dtolnay Fair enough. |
Fix the two review comments first. :-)
…On Tue, Mar 20, 2018 at 8:10 AM, Alexander Regueiro < ***@***.***> wrote:
@dtolnay <https://github.com/dtolnay> Fair enough. How are we doing now;
ready to merge? :-)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#416 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AB2cCnhQM0G8TTa2r4DSrAd7YVQ6jzsSks5tgRvPgaJpZM4SZYRa>
.
|
Serves me right for replying to your comment before I open up the issue. (GitHub notifies me for normal replies, but not review comments.) All in order now... I think! |
You may need to check the box for "Pull Request reviews" on https://github.com/settings/notifications. |
@dtolnay Yep. I think I turned it off because I was getting spam from reviews on another project. Seems it can't be enabled on a per-PR/repo basis. Anyway, I think we're coming to the end of this review now. Let me know if there's anything else to do before we can get this merged, and I'll try to fix it all tonight! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for sticking with this.
No problem, @dtolnay. Thanks for your patience, and guiding me through. A review for such a large PR was never going to be simple. Glad to see you already made a new release, as well! |
Added support for arbitrary-precision numerics, in a similar way that the toml crate does for date-times (using an internal special struct).