Use requests_futures to sync #2
Using requests_futures makes it possible to request blocks in parallel from a thread pool. This speeds up sync, especially for large numbers of sensors.
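The idea behind the change can be sketched with the standard library's concurrent.futures, whose ThreadPoolExecutor is what requests_futures builds on. This is a minimal sketch, not the code in this pull request: fetch_block and fetch_all are hypothetical names, and fetch_block only simulates the latency of one HTTP request for a block instead of talking to a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_block(sid, bid):
    """Hypothetical stand-in for one HTTP request that downloads a block.

    Sleeps to simulate network latency, then returns fake block data.
    """
    time.sleep(0.05)
    return {"sid": sid, "bid": bid, "data": [bid * 10]}


def fetch_all(requests, workers=16):
    """Fetch (sid, bid) pairs concurrently on a thread pool.

    The threads spend most of their time waiting on I/O, so running the
    requests in parallel overlaps that waiting instead of paying it serially.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the (sid, bid) pair it is fetching.
        futures = {pool.submit(fetch_block, sid, bid): (sid, bid)
                   for sid, bid in requests}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    reqs = [(sid, bid) for sid in ("s1", "s2", "s3", "s4") for bid in range(4)]
    start = time.perf_counter()
    blocks = fetch_all(reqs, workers=16)
    print(len(blocks), "blocks in", round(time.perf_counter() - start, 2), "s")
```

With 16 simulated requests and 16 workers, the total time is roughly that of a single request rather than the sum of all of them; the gain with a real server depends on its latency and how many concurrent requests it tolerates.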
First try at speeding up sync for large numbers of sensors. Requires requests_futures:
Obvious improvements I can think of right now:
I have been using this pull request to sync data for a couple of weeks. It seemed to work, but I can't benchmark the speed. Then I got this exception (and a warning; I don't know whether they are related):
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no
JSONDecodeError Traceback (most recent call last)
/home/roel/data/work/opengrid/code/tmpo-py/tmpo.pyc in sync(self, *sids)
/home/roel/data/work/opengrid/code/tmpo-py/tmpo.pyc in _rqsync(self, sid, rid, lvl, bid)
/usr/local/lib/python2.7/dist-packages/requests/models.pyc in json(self, **kwargs)
/usr/lib/python2.7/dist-packages/simplejson/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
/usr/lib/python2.7/dist-packages/simplejson/decoder.pyc in decode(self, s, _w, _PY3)
/usr/lib/python2.7/dist-packages/simplejson/decoder.pyc in raw_decode(self, s, idx, _w, _PY3)
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The same thing happens on a new sync, after some data has been downloaded first. So I think the data does get synced and the exception occurs somewhere near the end?
Oops, I checked out a previous commit and got the same exception. I have now also tested several other previous versions, and I always get the same exception. It could be linked to a specific sensor (the last one printed before the exception). I'll try removing it and syncing again.
OK, I also get it after removing that last printed sensor ID, so it does not seem to be linked to a specific sensor.
As far as I can tell, the security warning and the exception are not related. It seems that one of the responses contains invalid JSON. That won't be the sensor printed last, but the first one that was not printed, which makes it a bit hard to debug :-S Can you add a print of 'sid' just before the 'for t in r.json()' loop (line 275)? That prints the ID of the sensor whose JSON is about to be parsed. Also, the problem doesn't seem to be in the code of this pull request; can you try the code on master? I expect the problem to be present there too.
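A variation on the suggested print is to catch the parse error and re-raise it with the sensor ID and the start of the response body attached. The sketch below assumes the raw body is available as a string (for a requests response that would be r.text); parse_response is a hypothetical helper, not part of tmpo. An empty body or an HTML error page fails with exactly the "Expecting value: line 1 column 1 (char 0)" message from the traceback above, which is why showing the body's first characters helps.

```python
import json


def parse_response(sid, body):
    """Parse a JSON response body, naming the sensor if it is not valid JSON.

    `sid` and `body` are hypothetical stand-ins for the sensor ID and the raw
    response text (e.g. `r.text` for a requests response).
    """
    try:
        return json.loads(body)
    except ValueError as exc:  # json's JSONDecodeError subclasses ValueError
        snippet = body[:80]
        raise ValueError(
            "invalid JSON for sensor %s: %s; body starts with %r"
            % (sid, exc, snippet)
        )
```

Unlike a bare print, this only produces output for the failing response, so it stays quiet during a normal sync of many sensors.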
I agree it is not linked to this pull request, so if I encounter it again, I'll open a new issue.
Tmpo-py now defaults to 16 threads in the pool. Specify the "workers" named parameter when creating the tmpo Session to size the pool differently.
Works great! One possible further optimization would be to use a ProcessPoolExecutor to sync multiple sensors in parallel processes. Each of these could then use a ThreadPoolExecutor to issue its HTTP requests and write the results to the SQLite DB. The latter locks the whole database for writes, so SQLite I/O might become a bottleneck in this scenario.
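The thread-pool half of that suggestion can be sketched as follows, leaving out the ProcessPoolExecutor layer. Worker threads only do the (simulated) downloads, and all inserts go through one sqlite3 connection in the calling thread: SQLite allows only one writer at a time, and a Python sqlite3 connection is by default usable only from the thread that created it. This is a sketch under assumptions: fetch_block, sync and the readings table are hypothetical and not tmpo's actual code or schema; workers only mirrors the parameter name mentioned above.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_block(sid, bid):
    """Hypothetical stand-in for downloading one block over HTTP."""
    return [(sid, bid * 100 + i, float(i)) for i in range(3)]


def sync(conn, blocks, workers=16):
    """Download blocks on a thread pool, but write them from one thread.

    The worker threads only do network I/O; this thread does all inserts,
    so there is never more than one writer on the database.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sid TEXT, ts INTEGER, value REAL)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order as each download finishes.
        for rows in pool.map(lambda sb: fetch_block(*sb), blocks):
            conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
sync(conn, [("s1", 0), ("s1", 1), ("s2", 0)], workers=4)
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 9
```

Adding a ProcessPoolExecutor on top would give each process its own connection to the same database file, and SQLite's locking would then serialize their writes, which is the bottleneck the comment anticipates.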