Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaks in ujson #4

Closed
sadovnychyi opened this issue Jun 3, 2019 · 3 comments · Fixed by #5
Closed

Memory leaks in ujson #4

sadovnychyi opened this issue Jun 3, 2019 · 3 comments · Fixed by #5
Labels
bug Something isn't working

Comments

@sadovnychyi
Copy link
Contributor

We've replaced the usage of json with srsly.ujson a while ago for that free performance boost since we are doing lots of JSON encoding/decoding and we already have it installed as part of spacy, but now we had to move back because of some terrible memory leaks:

import json
import random
import string
import psutil
from srsly import ujson as json

sample = lambda x: ''.join(
  random.choice(string.ascii_uppercase + string.digits) for _ in range(x))

process = psutil.Process()

for i in range(10):
  data = json.dumps({sample(99): sample(100000) for k in range(50)})
  json.loads(data)
  print(process.memory_info())

Output with ujson:

pmem(rss=24203264, vms=4400664576, pfaults=19049, pageins=0)
pmem(rss=29409280, vms=4414173184, pfaults=33898, pageins=0)
pmem(rss=34557952, vms=4419309568, pfaults=47960, pageins=0)
pmem(rss=39714816, vms=4424429568, pfaults=62855, pageins=0)
pmem(rss=44838912, vms=4429549568, pfaults=77571, pageins=0)
pmem(rss=50081792, vms=4434751488, pfaults=92307, pageins=0)
pmem(rss=55312384, vms=4439973888, pfaults=107288, pageins=0)
pmem(rss=60440576, vms=4445093888, pfaults=122711, pageins=0)
pmem(rss=65806336, vms=4451422208, pfaults=137578, pageins=0)
pmem(rss=70934528, vms=4456542208, pfaults=151382, pageins=0)

Output with stdlib json:

pmem(rss=17154048, vms=4385366016, pfaults=17692, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=32047, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=46541, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=61035, pageins=0)
pmem(rss=17358848, vms=4403191808, pfaults=75539, pageins=0)
pmem(rss=17383424, vms=4403191808, pfaults=90039, pageins=0)
pmem(rss=17383424, vms=4403191808, pfaults=104533, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=119036, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=133530, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=148024, pageins=0)

Benchmark ran on python3.7 on macos, but same leak exists on Debian with python27.
You can increase the range from 10 and you will eventually run of our memory.

Days were spent on this issue, because I would never suspect the JSON library to be at fault, but it is. I don't know if it affects Spacy in any way.

Could be related: ultrajson/ultrajson#270

@honnibal
Copy link
Member

honnibal commented Jun 3, 2019

Oh wow, thanks! This could explain a lot actually. Do you have time to port their patch over?

I have to say, I feel pretty sheepish about this. The fact that ujson had that PR open for a while was one of the things that made us think that vendoring our own fork might not be such a bad idea. But...then I got started doing srsly, and I forgot all about that PR :(. So I never ended up actually integrating the patch!

@sadovnychyi
Copy link
Contributor Author

Same benchmark with patch applied:

pmem(rss=28975104, vms=4545011712, pfaults=22611, pageins=9)
pmem(rss=29028352, vms=4554448896, pfaults=34844, pageins=9)
pmem(rss=29151232, vms=4572274688, pfaults=46294, pageins=9)
pmem(rss=29184000, vms=4572274688, pfaults=57722, pageins=9)
pmem(rss=29184000, vms=4572274688, pfaults=69142, pageins=9)
pmem(rss=29184000, vms=4572274688, pfaults=80562, pageins=9)
pmem(rss=29220864, vms=4572274688, pfaults=91991, pageins=9)
pmem(rss=29270016, vms=4581711872, pfaults=103423, pageins=9)
pmem(rss=29270016, vms=4581711872, pfaults=114843, pageins=9)
pmem(rss=29306880, vms=4581711872, pfaults=126272, pageins=9)

@honnibal
Copy link
Member

honnibal commented Jun 7, 2019

Should be live now 🎉 . Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants