Random generator init is very slow #57
Comments
@patrikaxelsson Thanks for taking the time to look at this in so much detail. It's difficult to track this code, as the random stuff has changed through various OpenSSL versions. However, it seems https://github.com/jens-maus/amissl/blob/4.10/openssl/crypto/rand/rand_unix.c#L144 probably explains why @theantony chose to use only 1/4 of the data. I also see that @theantony switched from UNIT_MICROHZ to UNIT_VBLANK to reduce overhead. But, in your example, 128 bytes means 26 iterations and 416 (26 * 16) DoIO() calls. I guess you took the 128 byte RAND_seed() from the included examples. I need to check, but strictly speaking you probably don't actually need to call this at all. IIRC, it should get seeded on init, but I could be wrong - the random stuff is hard to follow, and it has changed again for OpenSSL 3.0. That said, I probably already did it, but it'll be worth investigating how often OpenSSL reseeds itself automatically, in case it is a bottleneck. The question is how can it be improved - it hasn't really changed since AmiSSL v3. |
You are correct that AmiSSL will reseed itself even without calling RAND_seed(). I did some experimenting/benchmarking with the AmiGemini source, and the total time needed to complete a request is exactly the same without RAND_seed(); the extra 2.2 seconds (in PAL) is instead spent around SSL_CTX_new(), if I am not mistaken. However, it was more convenient for benchmarking to use RAND_init(), which causes the exact same thing. I took the init procedure from the included Amiga https.c example. 128 calls of 1 microsecond TR_ADDREQUEST to timer.device/UNIT_VBLANK take the predicted ~2.56s on all Amigas running PAL, which suggests the number of iterations in AmiSSL should be no more than 2.2/(1.0/50)=110. Regarding improvements, changing back to UNIT_MICROHZ would be a given. C= does document UNIT_VBLANK as the most efficient and stable device for delaying long periods. However, I am not sure how that efficiency would be a problem for a low number of iterations (~110). To be blunt, it doesn’t sound like the change was really tested, as it totally ruined the performance of the entropy generation. Going by the documentation, UNIT_MICROHZ sounds like the perfect match for this use case:
Good for short requests (which this surely is at minimum time possible) and potential for variation in accuracy would be good for entropy. Regarding throwing away 3/4 of the data, I don’t see how not making use of all the data and then just making more measurements really helps entropy. |
I think it is more than 128 calls though, as it fills the buffer (20 bytes) 1 byte per TR_ADDREQUEST and only uses 5 bytes from the buffer, as you noticed. I'll stick some debug output in to check this out though. First, I also want to check your tests with AmiSSL v3, just to make sure I didn't mess something up due to the modifications required by the OpenSSL 1.1.x changes. I know parameters changed from bits to bytes and vice versa at various times. I probably did check it all at the time, but I can't remember! |
Ok, so I'll do some more testing to see what I can deduce. I'm currently working on porting OpenSSL 3.0 for AmiSSL v5, but if the random seeding can be improved, I'll certainly hope to get this fixed in another v4 release first (random seeding changed for OpenSSL 3.0 again too). |
I don't remember any context on this change, but found a message from @Futaura in my mail archive from October 10, 2005. Assuming it corresponds to the same code path it may explain why the change was made: (Initial quote and the last part are from Oliver, the "something wrong" part in the middle is from me)
I meant to put some debug output in there, and now I have :) Turns out |
@theantony Wow - thanks! I obviously completely forgot about that :-) I've now found your replies on this in my THOR archive (predates my switch to Gmail) and have refreshed my memory. @patrikaxelsson In summary, I built AmiSSL v3 with SAS/C profiling enabled, to find slowdown bottlenecks in AmiSSL v3 (before the 68k version was released), which I was experiencing with IBrowse on my A1200. This highlighted the bn stuff as being a bottleneck for DH connections, which has since been partly addressed recently with some asm optimisations for 68k. The other was that RAND_poll() took 10 seconds to complete with UNIT_MICROHZ and just over 1 second with UNIT_VBLANK, I concluded:
To confirm this, I've just now tried changing the code back to UNIT_MICROHZ and running your test on my A1200 (60Hz vblank). All that said, I'm still going to re-test this. I just want to make sure the code is behaving as intended (compared to other platforms) and I didn't break a calculation somewhere when porting over OpenSSL changes in this area. I'll also think about how this code can be improved - I mean 10 seconds to 2 seconds was a great speed boost at the time, but perhaps we can do better still. |
It is not reasonable that UNIT_MICROHZ would have a higher minimal time than UNIT_VBLANK because of overhead. That would mean that the calculations it is doing to handle the delay would need more CPU time than the time of a screen refresh, which imho would be an insane amount of crunching for such a task. Also, such an inefficiency would not align with the C= documentation, where they describe UNIT_MICROHZ as suitable for short-burst timing - actually, in case of such inefficiency, its entire existence would be pointless. Did a test of the time for 112 1 microsecond delays for all the relevant timer.device units. A3000 030@25MHz:
UAE (very fast):
Something else must be happening. UNIT_MICROHZ does not manage 1 microsecond delays, which I did not expect it to, as the context switch alone most likely takes more than 1 microsecond, but at 1/38 of the UNIT_VBLANK minimum delay on the 030 it is way faster. Attaching code, with this new test included. |
Damn...
That's on my A1200 BPPC 68060/50MHz (VBLANK 60Hz) and my A1XE for the latter, just for comparison (OS4/PPC has no UNIT_VBLANK - it uses UNIT_MICROHZ instead). Sorry @theantony - seems I was wrong 15 years ago, and nobody noticed until now. Obviously, something in the timer.device C rewrite was suboptimal on 68k, before development switched exclusively to PPC after 50.22. I never imagined the changes could have made such a big difference. It was rewritten again since then too. I have been using 50.22 on my main A1200 boot since 2002, along with lots of other OS4 bits (I don't want to lose anti-aliased text rendering in IBrowse, for example). I've switched out my timer.device for the 46.1 version now and I really must look at the other components too at some point. I will switch it back to UNIT_MICROHZ for AmiSSL 4.11. I will still do some further checking on the rand code to make sure everything is in order though. |
Good find! Looking forward to AmiSSL 4.11 :). |
Ok, so you were right that the I've reworked the code a little to make its function clearer, so it makes more sense when reading it. I also took the opportunity to switch from SHA-1 to SHA-256, which did not appear to slow Other than that, I'm happy with the implementation - it is very similar to implementations for other platforms and generic code elsewhere in OpenSSL, so happy to leave the rest as-is as there doesn't seem to be a better way on OS3. |
Sounds great! Do you have any possibility to test it on a more modest machine? |
With the 68k version of AmiSSL (4.10 tested), the random generator init is very slow, apparently regardless of how fast a processor you have. You can however vary its speed somewhat by changing the native screenmode - the higher the refresh rate, the faster the random generator init runs.
First three runs on an A3000 030@25Mhz, with varying refresh rates from 50-70Hz (PAL, NTSC, Euro72 without VGAOnly). AmiSSL has already been pre-initialized with a run of InitAmiSSLTest, so the first test would not have a distractingly large OpenAmiSSL(). The initialization is taken from the https.c example and the function of interest is RAND_seed():
If the same thing is run on a comparatively blistering fast UAE in the same manner, the RAND_seed() times are still the same:
Have tested on several different machines with the same result for RAND_seed().
I think the issue is a combination of these two things:
Imagine using AmiSSL in a command line tool: regardless of how fast an Amiga you have, it would still have a minimal execution time of ~2s.
AmiSSLTest.zip