For usage statistics, see Site usage.
These are historical notes; some have been addressed, some not yet.
The software is CPU-bound. There are 4 cores on the current WDTK server; the load graphs cover only the first two. Some cron jobs run with processor affinity set, which is why CPU0 has a higher load than CPU1 (and than the other cores, not shown).
Why is it CPU-bound? There may be some performance snafus here, since some of the processes chewing up CPU cycles are performing tasks one would not expect to be computationally intensive (e.g. sending out reminder emails). There are some details below, but more work is needed to understand the causes.
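One way to check whether a given task really is CPU-bound is to compare its CPU time with its wall-clock time. The following is a minimal sketch using Ruby's standard `Benchmark` module; the helper name `cpu_profile` and the two sample workloads are made up for illustration and are not part of Alaveteli.

```ruby
require "benchmark"

# Runs the block and returns CPU seconds, wall-clock seconds and their ratio.
# A ratio near 1.0 suggests the work is CPU-bound; a ratio near 0 suggests
# it mostly waits (I/O, network, sleep). Child processes are not counted.
def cpu_profile
  tms = Benchmark.measure { yield }
  cpu = tms.utime + tms.stime
  ratio = tms.real.zero? ? 0.0 : cpu / tms.real
  { cpu: cpu, real: tms.real, ratio: ratio }
end

# Hypothetical stand-ins for real tasks such as sending reminder emails.
busy = cpu_profile { 2_000_000.times { |i| Math.sqrt(i) } }
idle = cpu_profile { sleep 0.2 }

puts format("busy: cpu=%.3fs real=%.3fs ratio=%.2f", busy[:cpu], busy[:real], busy[:ratio])
puts format("idle: cpu=%.3fs real=%.3fs ratio=%.2f", idle[:cpu], idle[:real], idle[:ratio])
```

Wrapping a suspect cron task in something like this would show whether it burns CPU itself or is merely slow because it waits on something else.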
Use AWS or similar for high-intensity operations, e.g. https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started
Specifically, use the DocumentCloud service for document conversion and hosting.
Reduce the number of bogus post redirects stored for visitors that aren't people.
Receiving email can be a resource drain, because an app instance is started for each message. Use a daemon instead.
Cache /feed/list/successful
Cache /body/list/a
Cache parts of /body/xxxxx
Cache parts of /user/xxxxx
Finish the migration to Ruby 1.9, which seems to be twice as fast for uncached requests.
Regular expression library: change to a faster one. Oniguruma isn't enough. This shows the slowness:

    e = InfoRequestEvent.find(213700)
    text = e.incoming_message.get_main_body_text  # XXX alter to call internal, not cache
    IncomingMessage.remove_quoted_sections(text, "")
wvWare sometimes loops: https://github.com/mysociety/alaveteli/issues/299
pdftk sometimes loops: http://www.whatdotheyknow.com/request/87534/response/234022/attach/7/HC15.pdf
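A possible guard against converters that loop is to run them with a hard time limit and kill them when it expires. This is only a sketch under assumptions: `run_with_timeout` is a hypothetical helper, not existing Alaveteli code, and the demo spawns a Ruby subprocess in place of wvWare or pdftk.

```ruby
require "rbconfig"
require "timeout"

# Runs an external command (given as an argv array) and kills it if it has
# not finished within `limit` seconds. Returns :ok, :failed or :timed_out.
def run_with_timeout(argv, limit)
  pid = Process.spawn(*argv, out: File::NULL, err: File::NULL)
  begin
    Timeout.timeout(limit) do
      _, status = Process.wait2(pid)
      return status.success? ? :ok : :failed
    end
  rescue Timeout::Error
    begin
      Process.kill("KILL", pid)
      Process.wait(pid) # reap the child so it does not linger as a zombie
    rescue Errno::ESRCH, Errno::ECHILD
      # the child exited and was reaped just as the timeout fired
    end
    :timed_out
  end
end

# Demo: a child that sleeps for 5 seconds stands in for a looping converter.
looping = [RbConfig.ruby, "-e", "sleep 5"]
puts run_with_timeout(looping, 0.5)
```

The caller can then treat `:timed_out` like any other conversion failure, for example by skipping the attachment text rather than blocking the request.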
Some requests whose memory use still needs lowering:
PID: 676 CONSUME MEMORY: 16968 KB Now: 102604 KB http://www.whatdotheyknow.com/request/parking_ticket_data_81
PID: 2036 CONSUME MEMORY: 129368 KB Now: 179652 KB http://www.whatdotheyknow.com/request/14186/response/33740
- Search engines shouldn't be requesting those URLs. And do they really need to unpack so much? Could use a snippet cache.
URLs that bots should somehow be stopped from crawling:
/request/13683/response?internal_review=1
/request/febrile_neutropenia_154?unfold=1
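Both URLs above differ from a normal request page only by a query parameter, so one option is to flag such requests and mark their responses as not to be indexed (for example with a robots meta tag or an `X-Robots-Tag: noindex` header). A minimal sketch follows; the `crawlable?` helper and the parameter list are assumptions drawn from these two examples only.

```ruby
require "uri"

# Query parameters that, by assumption, only produce alternate views of a
# page that search engines do not need to fetch.
NOCRAWL_PARAMS = %w[internal_review unfold].freeze

# Returns false when the URL carries any of the parameters listed above.
def crawlable?(url)
  query = URI.parse(url).query
  return true if query.nil?

  keys = URI.decode_www_form(query).map(&:first)
  (keys & NOCRAWL_PARAMS).empty?
end

crawlable?("/request/13683/response?internal_review=1")   # => false
crawlable?("/request/febrile_neutropenia_154?unfold=1")   # => false
crawlable?("/request/parking_ticket_data_81")             # => true
```

Matching on parameter names rather than full paths keeps the rule short, at the cost of also covering any other page that happens to accept the same parameters.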
Renaming a body, or changing its domain, should clear the cached bubbles of all requests to that body.
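Clearing by body is easiest if the cache records which body each cached bubble belongs to. The sketch below is an in-memory illustration only: `BubbleCache` and its methods are hypothetical names, and the real cache may be stored and keyed quite differently.

```ruby
# In-memory stand-in for a cache of rendered request "bubbles".
class BubbleCache
  def initialize
    @html = {}                                # request_id => cached HTML
    @by_body = Hash.new { |h, k| h[k] = [] }  # body_id => [request_id, ...]
  end

  # Returns the cached HTML for the request, rendering it via the block
  # (and remembering which body it belongs to) on a cache miss.
  def fetch(request_id, body_id)
    @html.fetch(request_id) do
      @by_body[body_id] << request_id
      @html[request_id] = yield
    end
  end

  # Drops every cached bubble for requests made to the given body.
  def expire_body(body_id)
    @by_body.delete(body_id).to_a.each { |id| @html.delete(id) }
  end
end

cache = BubbleCache.new
renders = 0
cache.fetch(1, 42) { renders += 1; "<div>Old Name</div>" }
cache.fetch(1, 42) { renders += 1; "<div>Old Name</div>" } # served from cache
cache.expire_body(42)                                      # body 42 renamed
cache.fetch(1, 42) { renders += 1; "<div>New Name</div>" } # rendered again
puts renders
```

Calling `expire_body` from whatever code saves a body's new name or domain would keep the bubbles from showing stale details.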