Orjson in Pydantic V2 #6388
Replies: 11 comments 27 replies
-
No, using orjson is not necessary.
-
According to the Migration Guide, the json_loads and json_dumps config settings have been removed; however, no alternative is suggested, and there is no explanation of why, or to what effect. It seems like a very strange decision, and one that probably puts people off upgrading from V1.
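For context, the V1 pattern that was removed looked roughly like this (a minimal sketch; User is just an example model, and the wrapper is needed because orjson.dumps returns bytes while V1 expects a str):

import orjson
from pydantic import BaseModel  # Pydantic V1 (or the pydantic.v1 shim on V2)

def orjson_dumps(v, *, default):
    # orjson.dumps returns bytes; V1's .json() expects json_dumps to return str
    return orjson.dumps(v, default=default).decode()

class User(BaseModel):
    id: int
    name: str

    class Config:
        json_loads = orjson.loads
        json_dumps = orjson_dumps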
-
As a library user in Python, it doesn't matter to me what the library is written in - Rust, C++, whatever - I want speed. Your Rust is the same speed as Python's json. So what does it mean for me as a user, beyond the fact that it's written in Rust? That Rust is cool? No.
-
Any heavily loaded professional project drops pydantic because it is slow. What is not clear about that?
-
@samuelcolvin Python's built-in json still wins in my tests:

Approach 1: 15 seconds (combined total over 210 server responses): result = MyModel[MyProps](**json.loads(body_bytes))

Approach 2: 22 seconds (combined total over the same 210 server responses): result = MyModel[MyProps].model_validate_json(body_bytes)

Both results are better than V1; however, I'm struggling to see the promised speed advantage of model_validate_json.
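For anyone wanting to reproduce a comparison like this, here is a minimal self-contained sketch of the two approaches (MyModel, MyProps and body_bytes are hypothetical stand-ins, not the commenter's actual models or payloads):

import json
import timeit
from typing import Generic, TypeVar
from pydantic import BaseModel

P = TypeVar('P')

class MyProps(BaseModel):
    name: str
    value: int

class MyModel(BaseModel, Generic[P]):
    id: int
    props: P

body_bytes = b'{"id": 1, "props": {"name": "a", "value": 2}}'
n = 210  # mirroring the 210 responses mentioned above

# approach 1: parse with json.loads, then construct/validate via __init__
t1 = timeit.timeit(lambda: MyModel[MyProps](**json.loads(body_bytes)), number=n)
# approach 2: parse and validate in one step with the built-in parser
t2 = timeit.timeit(lambda: MyModel[MyProps].model_validate_json(body_bytes), number=n)
print(f'json.loads + __init__: {t1:.4f}s')
print(f'model_validate_json:   {t2:.4f}s')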
-
Just out of interest - what's the latest benchmarking here? Is native pydantic (de)serialization significantly slower than orjson?
-
There is another point which is kind of discussed here, but not directly: custom encoders. I have a project which for the past decade used vanilla json with a custom encoder. Here are several examples of the encoder rules:

There are a couple more rules, but you get the idea. I understand that there is now a concept of Serialization, but that isn't quite the same as before, when I could re-use the well-established serialization rules. I guess for now I will have to keep using the old encoder. I don't know if I have any specific questions right now, but I just wanted to point out that there is more to the discussion than performance considerations.
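For what it's worth, one way such encoder rules can be expressed in V2 is with Annotated serializers; a minimal sketch with a made-up rule (serialize Decimal as a string), not the rules described above:

from datetime import date
from decimal import Decimal
from typing import Annotated
from pydantic import BaseModel, PlainSerializer

# hypothetical rule: Decimal -> str in JSON output (dates are ISO 8601 by default in V2)
DecimalAsStr = Annotated[Decimal, PlainSerializer(lambda v: str(v), return_type=str, when_used='json')]

class Invoice(BaseModel):
    total: DecimalAsStr
    issued: date

print(Invoice(total=Decimal('10.50'), issued=date(2024, 1, 1)).model_dump_json())
# > {"total":"10.50","issued":"2024-01-01"}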
-
Could you please provide the full code snippet used to analyze the differing performance? I'm doing a presentation on performance with pydantic, and want to collect the latest metrics for tests like yours :)
-
How did you solve it? Are you still on Pydantic V1 with orjson? I have a similar case, except that I use msgspec.
-
Any latest update?
-
As discussed above, this is not required - pydantic 2 has an extremely fast JSON parser (jiter) built in, so orjson isn't needed for parsing. Here's an example:

# /// script
# requires-python = ">=3.13"
# dependencies = [
# "orjson==3.11.1",
# "pydantic==2.11.7",
# ]
# ///
import json
import random
import timeit
import uuid
from datetime import date
from typing import Generic, TypeVar
import orjson
import pydantic_core
from pydantic import BaseModel
class PModel(BaseModel):
field1: str
field2: str
field3: int | None = None
field4: int | None = None
field5: int | None = None
field6: int | None = None
field7: int | None = None
field8: uuid.UUID | None = None
field9: uuid.UUID | None = None
field10: bool | None = None
field11: int | None = None
field12: str | None = None
field13: str | None = None
field14: bool | None = None
# field15: IdModel
id: str
uuid: uuid.UUID
field16: str | None = None
field17: float | None = None
field18: date | None = None
P = TypeVar('P')
class Result(BaseModel, Generic[P]):
id: int
uuid: uuid.UUID
properties: P
class Response(BaseModel, Generic[P]):
results: list[Result[P]]
def rand_str():
return ''.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(random.randint(1, 1000)))
def rand_num():
return random.randint(1, 1000)
def gen_p():
return {
'field1': rand_str(),
'field2': rand_str(),
'field3': rand_num(),
# skip half of the optional fields
'field5': rand_num(),
'field7': rand_num(),
'field9': str(uuid.uuid4()),
'field10': True,
'field12': rand_str(),
'field14': False,
'id': rand_str(),
'uuid': str(uuid.uuid4()),
}
def gen_result():
return {
'id': rand_num(),
'uuid': str(uuid.uuid4()),
'properties': gen_p(),
}
def gen_response():
return {'results': [gen_result() for _ in range(1000)]}
json_data = json.dumps(gen_response()).encode()
repeat_count = 1_000
timer = timeit.Timer(
'model_validate_json(json_data)',
setup='model_validate_json = Response[PModel].model_validate_json',
globals={'Response': Response, 'PModel': PModel, 'json_data': json_data},
)
timings = timer.repeat(repeat=2, number=repeat_count)
print(f'model_validate_json(...): {timings[-1] / repeat_count * 1_000:.2f}ms')
timer = timeit.Timer(
'model_validate(json_loads(json_data))',
setup='model_validate = Response[PModel].model_validate',
globals={'json_data': json_data, 'json_loads': json.loads, 'Response': Response, 'PModel': PModel},
)
timings = timer.repeat(repeat=2, number=repeat_count)
print(f'model_validate(json.loads(...)): {timings[-1] / repeat_count * 1_000:.2f}ms')
timer = timeit.Timer(
'model_validate(orjson_loads(json_data))',
setup='model_validate = Response[PModel].model_validate',
globals={'json_data': json_data, 'orjson_loads': orjson.loads, 'Response': Response, 'PModel': PModel},
)
timings = timer.repeat(repeat=2, number=repeat_count)
print(f'model_validate(orjson.loads(...)): {timings[-1] / repeat_count * 1_000:.2f}ms')
timer = timeit.Timer('json_loads(json_data)', globals={'json_data': json_data, 'json_loads': json.loads})
timings = timer.repeat(repeat=2, number=repeat_count)
print(f'json.loads(...): {timings[-1] / repeat_count * 1_000:.2f}ms')
timer = timeit.Timer(
'pydantic_core_from_json(json_data)',
globals={'json_data': json_data, 'pydantic_core_from_json': pydantic_core.from_json},
)
timings = timer.repeat(repeat=2, number=repeat_count)
print(f'pydantic_core.from_json(...): {timings[-1] / repeat_count * 1_000:.2f}ms')
timer = timeit.Timer('orjson_loads(json_data)', globals={'json_data': json_data, 'orjson_loads': orjson.loads})
timings = timer.repeat(repeat=2, number=repeat_count)
print(f'orjson.loads(...): {timings[-1] / repeat_count * 1_000:.2f}ms')

In addition, orjson has some shortcomings which pydantic_core's JSON parser (jiter) doesn't suffer from - for example, orjson parses large ints as floats, which loses detail and can break validation:

# /// script
# requires-python = ">=3.13"
# dependencies = [
# "orjson==3.11.1",
# "pydantic==2.11.7",
# ]
# ///
import json
import orjson
import pydantic_core
from pydantic import BaseModel
class IntModel(BaseModel):
value: int
input_json = json.dumps({'value': 2**65})
print('input_json:', input_json)
# > input_json: {"value": 36893488147419103232}
print('IntModel.model_validate_json:', IntModel.model_validate_json(input_json))
# > IntModel.model_validate_json: value=36893488147419103232
print('pydantic_core.from_json:', pydantic_core.from_json(input_json))
# > pydantic_core.from_json: {'value': 36893488147419103232}
print('json.loads:', json.loads(input_json))
# > json.loads: {'value': 36893488147419103232}
print('orjson.loads:', orjson.loads(input_json))
# parses 2**65 as a float!
# > orjson.loads: {'value': 3.6893488147419103e+19}
print('IntModel.model_validate(json.loads(input_json)):', IntModel.model_validate(json.loads(input_json)))
# > IntModel.model_validate(json.loads(input_json)): value=36893488147419103232
print('IntModel.model_validate(orjson.loads(input_json)):', IntModel.model_validate(orjson.loads(input_json)))
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for IntModel
value
Unable to parse input string as an integer, exceeded maximum size [type=int_parsing_size, input_value=3.6893488147419103e+19, input_type=float]
For further information visit https://errors.pydantic.dev/2.11/v/int_parsing_size
""" |
-
In Pydantic V1, I use orjson for my json_dumps and json_loads. I am having a little trouble figuring out how to do this in Pydantic V2, or whether it is even necessary. With pydantic-core rewritten in Rust and the performance improvements that come with it, is using orjson even needed? Or would the configuration look more like implementing a SchemaSerializer? I'm thinking of something like this in V2, but wondering what I would give up in Pydantic V2 with this somewhat ham-fisted approach:
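As a rough sketch only - not the poster's snippet - the V2 direction suggested in the replies above might look like this (Item and the raw payload are hypothetical):

import orjson
from pydantic import BaseModel

class Item(BaseModel):  # hypothetical example model
    id: int
    name: str

raw = b'{"id": 1, "name": "widget"}'

# V2: parse and validate in one step with the built-in jiter parser
item = Item.model_validate_json(raw)

# or keep orjson for parsing and validate the resulting dict
item_via_orjson = Item.model_validate(orjson.loads(raw))

# V2 serialization: model_dump_json is handled by pydantic-core, no custom hook needed
print(item.model_dump_json())
# > {"id":1,"name":"widget"}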