Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use requests_futures to sync #2

Merged
merged 1 commit into from
Dec 16, 2014
Merged

Use requests_futures to sync #2

merged 1 commit into from
Dec 16, 2014

Conversation

wouterh
Copy link
Contributor

@wouterh wouterh commented Dec 3, 2014

Using requests_futures allows to use a thread pool to request blocks in
parallel. This speeds up sync, especially for large numbers of sensors.

Using requests_futures allows to use a thread pool to request blocks in
parallel. This speeds up sync, especially for large numbers of sensors.
@wouterh
Copy link
Contributor Author

wouterh commented Dec 3, 2014

First try at speeding up sync for large numbers of sensors. Requires requests_futures:
https://github.com/ross/requests-futures
pip install requests-futures

@wouterh
Copy link
Contributor Author

wouterh commented Dec 4, 2014

Obvious improvements I can think of right now:

  • put the import of requests_futures in a try/catch and fallback to normal requests if not available
  • make the number of threads in the threadpool an optional parameter to the constructor

@saroele
Copy link
Contributor

saroele commented Dec 14, 2014

I tried this pull request for syncing data since a couple weeks. It seemed to work but I can't benchmark the speed. Then I got this exception (and some warning, dunno if they are related):

/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py:251: SecurityWarning: Certificate has no subjectAltName, falling back to check for a commonName for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See urllib3/urllib3#497 for details.)
SecurityWarning


JSONDecodeError Traceback (most recent call last)
in ()
----> 1 tmpos.sync()

/home/roel/data/work/opengrid/code/tmpo-py/tmpo.pyc in sync(self, *sids)
176 else:
177 rid, lvl, bid = 0, 0, 0
--> 178 self._rqsync(sid, rid, lvl, bid)
179
180 def list(self, *sids):

/home/roel/data/work/opengrid/code/tmpo-py/tmpo.pyc in _rqsync(self, sid, rid, lvl, bid)
273 r = f.result()
274 fs = []
--> 275 for t in r.json():
276 fs.append((t,self._rqblock(sid, token, t["rid"], t["lvl"], t["bid"], t["ext"])))
277 for (t,f) in fs:

/usr/local/lib/python2.7/dist-packages/requests/models.pyc in json(self, *_kwargs)
795 # used.
796 pass
--> 797 return json.loads(self.text, *_kwargs)
798
799 @Property

/usr/lib/python2.7/dist-packages/simplejson/init.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
486 parse_constant is None and object_pairs_hook is None
487 and not use_decimal and not kw):
--> 488 return _default_decoder.decode(s)
489 if cls is None:
490 cls = JSONDecoder

/usr/lib/python2.7/dist-packages/simplejson/decoder.pyc in decode(self, s, _w, _PY3)
368 if _PY3 and isinstance(s, binary_type):
369 s = s.decode(self.encoding)
--> 370 obj, end = self.raw_decode(s)
371 end = _w(s, end).end()
372 if end != len(s):

/usr/lib/python2.7/dist-packages/simplejson/decoder.pyc in raw_decode(self, s, idx, _w, _PY3)
387 if _PY3 and not isinstance(s, text_type):
388 raise TypeError("Input string must be text, not bytes")
--> 389 return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

@saroele
Copy link
Contributor

saroele commented Dec 14, 2014

Same thing happens on a new sync, after downloading some data first. So I think the data gets synced, and the exception occurs somewhere in the end?

@saroele
Copy link
Contributor

saroele commented Dec 14, 2014

Oops, checked out a previous commit and I get the same exception. And now I have tested also different previous versions, and always the same exception.

It could be linked to a specific sensor (the last I get before the exception). I'll try to remove it and sync again.

@saroele
Copy link
Contributor

saroele commented Dec 14, 2014

ok, so I get it also after removing that last printed sensor ID, so it seems not linked to a specific sensor.
And I can confirm that syncing seems to work up to the exception, cause my graphs contain recent data :-)

@wouterh
Copy link
Contributor Author

wouterh commented Dec 15, 2014

as far as I can tell, the security warning and the exception are not related.

It seems like one of the responses contains invalid JSON. It won't be the last one printed, but the first one not printed. That's a bit hard to debug :-S

Can you add a print of 'sid' before the 'for t in r.json()' (line 275) ? That should print the sensor id that is about to parse some JSON for.

Also, the problem doesn't seem to be in the code of this pull request, can you try the code on master? I expect the problem to be present there too.

icarus75 added a commit that referenced this pull request Dec 16, 2014
Use requests_futures to sync
@icarus75 icarus75 merged commit 987038f into flukso:master Dec 16, 2014
@saroele
Copy link
Contributor

saroele commented Dec 17, 2014

I agree it was not linked to the pull request, so if I encounter it again, I'll create a new issue

@icarus75
Copy link
Member

Tmpo-py now defaults to 16 threads in the pool. Specify the "workers" named parameter when creating the tmpo Session to size the pool differently. Works great!

One possible further optimization would be the use of a ProcessPoolExecutor to handle the syncing of multiple sensors in parallel processes. Each of these might then turn to a ThreadPoolExecutor to issue its HTTP requests and sink them to the SQLite DB. The latter uses table-level locking IIRC, so SQLite I/O might become a bottleneck in this scenario.

icarus75 pushed a commit that referenced this pull request Mar 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants