Fulltext search does not scale properly #345
@holta After a first investigation, the "system overload error" message does not seem to be generated by kiwix-serve.
Correct. The exact wording of the error message is something I'll try to obtain from one of the teachers in coming days. (Hopefully we can also get a photographed screenshot sent to us from the field, using WhatsApp?)
@holta Yes, a screenshot would be even better.
@holta You can also start kiwix-serve with more threads; see the `--threads` option.
@kelson42 here are the flags used by IIAB:
Perhaps we should also try...
PS: I expect screenshot(s) from the Mexican teachers by the end of the week, hopefully.
@holta Yes, I would definitely try to use --threads > 4
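For reference, the IIAB service unit quoted later in this thread does not pass a thread count, so kiwix-serve falls back to its default of 4 worker threads. A minimal sketch of launching it with more threads (paths and flags copied from the IIAB setup; the value 8 is an illustrative guess, not a recommendation from this thread):

```
/opt/iiab/kiwix/bin/kiwix-serve \
    --port 3000 \
    --threads 8 \
    --nolibrarybutton \
    --library /library/zims/library.xml \
    --urlRootLocation=/kiwix/
```

As noted further down, more threads mainly lets more requests run concurrently; it does not make an individual search faster.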
@kelson42 here is the error/warning that almost every teacher/student is repeatedly running into (in Oaxaca State, Mexico):
This results from their using the search box (textfield) at the top right of the "maxi" Spanish Wikipedia page:
@holta Thank you for completing your ticket. The error message displayed in the picture is generated by the Apache reverse proxy, not by kiwix-serve. Of course, that does not mean the root of the problem is not in kiwix-serve, which sits behind the proxy; but it does somewhat mask what happens at the kiwix-serve level. For the moment, my best theory is that, considering that you start kiwix-serve with the default number of threads, it cannot keep up with the load. Please just try to increase the number of threads and let me know if it works better.
The last error looks pretty similar to the first one: kiwix-serve does not answer.
So far it would appear these serious problems (502, 503, 504 errors) arise regardless of whether Apache or NGINX is proxying.
@holta Simplify the problem: try to deal with kiwix-serve directly, without the reverse proxy in between.
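One way to follow this suggestion, assuming the IIAB setup described in this thread (kiwix-serve on port 3000 with `--urlRootLocation=/kiwix/`, proxy on port 80), is to time the same search both through the proxy and directly. The hostnames and ZIM name below are taken from elsewhere in this thread; adjust for your machine:

```
# Through the reverse proxy:
time curl -s -o /dev/null -w '%{http_code}\n' \
  'http://localhost/kiwix/search?content=wikipedia_es_all_mini_2019-09&pattern=mexico+'

# Directly against kiwix-serve, bypassing the proxy:
time curl -s -o /dev/null -w '%{http_code}\n' \
  'http://localhost:3000/kiwix/search?content=wikipedia_es_all_mini_2019-09&pattern=mexico+'
```

If the direct request succeeds while the proxied one returns 502/503/504, the proxy is timing out before kiwix-serve answers.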
I wrote a puppeteer script to test the kiwix load on an rpi 4 with 2 GB of RAM. With kiwix proxied through nginx, I start to get gateway timeouts about halfway through the run. If I change the requests to go directly to port 3000, I get no 504 errors. The 4 kiwix threads are pinned at nearly 100%, but tail off before all the results have been sent, even directly from port 3000. When proxied, it took about 40 seconds before any results appeared, and the whole run took 2 minutes and 10 seconds. Each thread took 8 to 10% of memory, so I think it is not memory bound. I tried 8 threads and they ran at about 50% cpu, so I think there is no benefit.
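The 504s described above are nginx giving up before kiwix-serve answers. A hedged nginx sketch of lengthening the proxy's patience (directive values are illustrative guesses, not IIAB's actual configuration); this only hides the slowness rather than fixing it:

```
# Illustrative nginx location block; values are assumptions, not IIAB's config.
location /kiwix/ {
    proxy_pass http://127.0.0.1:3000;
    # nginx's default proxy_read_timeout is 60s; slow fulltext
    # searches (40s+ on a cold cache, per this thread) can exceed it.
    proxy_read_timeout 300s;
    proxy_connect_timeout 10s;
}
```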
@tim-moody Could you please share your exact testing procedure, so that I can attempt to reproduce it?
Here is the nodejs code:
kiwix is running as a service:

```
ExecStart=/opt/iiab/kiwix/bin/kiwix-serve --daemon --port 3000 --nolibrarybutton --library /library/zims/library.xml --urlRootLocation=/kiwix/
```
After investigating, there are several things:

Puppeteer is a full browser. When it requests a search page, it also interprets the content and makes sub-requests if needed (images, css, js). Handling this and doing the layout takes time. When measuring the total time needed by all requests to render one page, kiwix-serve takes around 1.3s to handle all requests (9 of them). But when measuring the puppeteer request, it took more than 2s per page. This overhead is not on kiwix-serve and we cannot do a lot about it. All sub-requests are due to the search bar. If we launch kiwix-serve with …

Then came the interesting part. It seems that most of the time is spent not doing the search, but getting the article content to generate the snippet used in the result html page. Sounds good? Wait, there is more. Trying to not use …

Here is my test script if you want to test on your side:

```javascript
const request = require('request');

const host = 'http://localhost:8080';
const baseUrl = host + '/kiwix/search?content=wikipedia_es_all_mini_2019-09&pattern=';
const testUrls = [
    'mexico+',
    'futbol+',
    'amlo+',
    'gobierno+',
    'pais+',
    'nuevo+',
    'juego+',
    'oaxaca+',
    'gastronomia+',
    'pueblo+'
];

process.setMaxListeners(Infinity); // avoid max listeners warning

function main() {
    for (let i = 0; i < 6; i++) {
        loadSet(i);
    }
}

function loadSet(index) {
    for (let i = 0; i < testUrls.length; i++) {
        loadPage(testUrls[i], index);
    }
}

function loadPage(url, index) {
    const options = {
        url: baseUrl + url,
        // Some custom headers to track the requests on kiwix-serve side.
        // headers: {
        //     'x-debuginfo': url + index
        // }
    };
    console.time(url + index);
    request(options, (err, res, body) => {
        if (err)
            console.log(err);
        else
            console.timeEnd(url + index);
    });
}

main();
```
@tim-moody can you give me your setup? (Raspberry Pi version, kiwix-serve version, file on the sdcard or an external drive?)
If it helps, in Mexican classrooms in early Dec 2019 where these issues were uncovered:
CLARIF:
Storage is an sdcard. I think kiwix was, as @holta says, 3.0.1-8. I started with request; I used puppeteer to simulate a real request with subrequests.
So, I've played a bit with kiwix-serve on a RPi3. There are a few things that impact kiwix-serve search performance.

**FS cache.** First requests on kiwix-serve (RPi just started) indeed take around 40s to finish. But request times slowly decrease to around 10s (in less than 10 requests). After that, all requests take around 10s to finish. If you clean the cache (…

There is nothing we can really do here (except not shut down the RPi). And this cache is useful in a test context (where we have only one zim file), but in a real case the effect is probably reduced.

**Generating the snippet.** To generate the snippet we have to read the article's content (so uncompress the cluster; most of the time is spent here), parse it to remove the html, and give the raw text to xapian to generate a text corresponding to the request. If we don't include snippets in the result page, request times drop to less than a second.

I've tried to increase the internal libzim cache. It has some interesting effect, but only with a large cache (128 cached clusters instead of 64) and with very few different urls (around 3s per request instead of 10s). With a lot of different requests, a large cache actually increases the request time. There may be a bug here, but increasing the internal cache is not a solution for now.

The number of requests has little impact on performance in itself. Requests are simply queued, and kiwix-serve has to finish previous ones first.

**What can we do?** An easy workaround to implement would be to add an option to kiwix-serve to generate the search page without snippets. A more reliable solution is to rethink how we generate the snippet. This probably needs another issue to track it, so I will be succinct here. We can also think about our cache system. We cache clusters, but should we cache articles instead? Should we compress them (lz4 is really fast: https://lz4.github.io/lz4/) to be able to cache more entries? Rewrite the cache to be more efficient in a multi-thread environment? ...

We could also "simply" improve the decompression time itself by using another compression algorithm than lzma (zstd seems to have hugely faster decompression, about x8 for the same compression speed and about the same ratio (+2%)).
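The "cache the article instead of the cluster" idea above can be sketched as a small LRU cache keyed by article path. This is an illustrative model in the thread's own nodejs, not libzim's actual (C++) cache code; the class name, `capacity` value, and eviction policy are assumptions for the sketch:

```javascript
// Minimal LRU cache sketch for decompressed article text.
// Illustrative only: libzim's real cache operates on clusters.
class ArticleLruCache {
    constructor(capacity) {
        this.capacity = capacity;
        this.map = new Map(); // Map preserves insertion order -> cheap LRU
    }
    get(path) {
        if (!this.map.has(path)) return undefined;
        const value = this.map.get(path);
        this.map.delete(path); // re-insert to mark as most recently used
        this.map.set(path, value);
        return value;
    }
    put(path, text) {
        if (this.map.has(path)) this.map.delete(path);
        this.map.set(path, text);
        if (this.map.size > this.capacity) {
            // evict the least recently used entry (first key in order)
            this.map.delete(this.map.keys().next().value);
        }
    }
}

// Usage: the snippet generator checks the cache before uncompressing
// a cluster again for a repeated search result.
const cache = new ArticleLruCache(2);
cache.put('A/mexico', '<decompressed html>');
cache.put('A/oaxaca', '<decompressed html>');
cache.get('A/mexico');         // touch: 'A/mexico' is now most recent
cache.put('A/futbol', '...');  // evicts 'A/oaxaca', the LRU entry
```

Caching whole articles trades memory for skipping the cluster decompression that dominates snippet generation; compressing cached entries with lz4, as suggested above, would let the same memory hold more of them.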
@mgautierfr To conclude: the first few fulltext searches take around 40s, and then times slowly come down to around 10s? And all of them complete successfully?
@mgautierfr @holta @tim-moody Thank you very much to all of you for participating in this ticket. Here are my conclusions:
Considering that there is no bug and nothing more we can do here, I propose to close this ticket.
Tested: Kiwix Serve 1.2.1 has the same limitation; it is even slower.
Kiwix load-testing stats published, thanks to @tim-moody: |
Feedback from @holta
This happened in a few rural Mexican schools after we left, so we will not be able to reproduce this easily. But here's the background if it helps:
Teachers are using (trying to use) the most recent "maxi" Spanish Wikipedia ZIM file.
It only takes a couple of kids accessing Wikipedia (generally title search, very occasionally fulltext search) to lock out all other students.
All kids are blocked when this happens, and teachers become extremely frustrated, as they can no longer depend on Wikipedia for any small group or classroom lessons going forward.
Nobody can use Wikipedia in any way for about 2 minutes as a result of this new "system overload error", which did not occur in the past. (The RPi Internet-in-a-Box server is not actually overloaded, as all of its other services continue to work during these times when kiwix-serve fails.)
There is a Kiwix error message they find very alarming, something to the effect of "system is overloaded", shown even when only a couple of students are using it, i.e. preventing the use of Wikipedia in group/classroom situations.
The wording of the error message is very alarming to teachers, who believe everything is broken. The message should be changed to explain that access is generally restored after waiting a few minutes.