Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading from stdin extremely slow #3281

Closed
Sp1l opened this issue Apr 22, 2017 · 5 comments
Closed

Reading from stdin extremely slow #3281

Sp1l opened this issue Apr 22, 2017 · 5 comments

Comments

@Sp1l
Copy link
Contributor

Sp1l commented Apr 22, 2017

Hi,

A user reported performance issues on the FreeBSD port of OpenSSL 1.1.0e using openssl enc. Investigating the issue, I found that reading stdin was extremely slow compared to using -in. This poor performance is not experienced with the 1.0.2 branch.

For reference and build options

$ /usr/local/bin/openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 10886156 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 2878605 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 256 size blocks: 727872 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 183097 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 22814 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 11369 aes-256 cbc's in 3.00s
OpenSSL 1.1.0e  16 Feb 2017
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/openssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc      58059.50k    61250.73k    62111.74k    62497.11k    62297.43k    62089.90k
$  uname -a
FreeBSD build.brnrd.eu 11.0-RELEASE-p9 FreeBSD 11.0-RELEASE-p9 #0 r316824: Fri Apr 14 21:01:03 CEST 2017     bernard@build.brnrd.eu:/usr/obj/usr/src/sys/BEASTIE110  amd64
$  clang -v
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
Target: x86_64-unknown-freebsd11.0
Thread model: posix
InstalledDir: /usr/bin

Created a 512M test file from /dev/zero named test.zero

$ dd if=/dev/zero bs=8m count=64 of=test.zero
64+0 records in
64+0 records out
536870912 bytes transferred in 0.363638 secs (1476389339 bytes/sec)

Using openssl internal read/write

$ time /usr/local/bin/openssl enc -e -aes-256-cbc -pass pass:fred -in test.zero -out /dev/null
1.29s user 0.17s system 99% cpu 1.457 total

Using internal read and stdout

$ time /usr/local/bin/openssl enc -e -aes-256-cbc -pass pass:fred -in test.zero > /dev/null
1.15s user 0.20s system 99% cpu 1.353 total

Using stdin and internal write (using dd so we have some statistics)

$ dd if=/dev/zero bs=8m count=64 | /usr/local/bin/openssl enc -e -chacha20 -pass pass:fred -out /dev/null
load: 1.32 cmd: openssl 14732 [running] 49.12r 8.31u 40.75s 100% 3632k
^C24+0 records in
22+2 records out
192937984 bytes transferred in 141.940640 secs (1359286 bytes/sec)
Killed after 140 seconds

Tested with -chacha20 as well with similar results.

Thanks for a great product!
Bernard Spil
Maintainer OpenSSL ports in FreeBSD

@richsalz
Copy link
Contributor

any chance you can get more detailed timing (strace syscall whatever) to see where the time is being spent?

thanks for maintaining the port!

@Sp1l
Copy link
Contributor Author

Sp1l commented Apr 23, 2017

Just ran ktrace on the command. It reads one byte at a time when using stdin.

dd if=/dev/zero bs=8m count=64 | /usr/local/bin/openssl enc -e -aes-256-cbc -pass pass:fred -out /dev/null

kdump shows repetition of

56198 openssl CALL read(0,0x801630677,0x1)
56198 openssl GIO fd 0 read 1 byte
"\0"
56198 openssl RET read 1

Inspecting the trace file

$ kdump | wc -l
11480965
kdump | grep -c 'read(0,0x801630677,0x1)'
2848491
$ bc -e 'scale=5; 11480965/2848491'
4.03054

So there's little activity apart from the single-byte reads.

 56198 openssl  CALL  write(0x3,0x801a46000,0x1000)
 56198 openssl  GIO   fd 3 wrote 4096 bytes
       0x0000 eda1 3f77 40dd 3df5 c7bc cee2 510b 9c3c 67da 00ba 8f5b 1832 6def fbeb 0349 a7e9 f15a  |..?w@.=.....Q..<g....[.2m....I...Z|
<snip>
       0x0ff0 cca7 4609 d752 76cf f2ef 6c39 22ff e1a2                                               |..F..Rv...l9"...|

 56198 openssl  RET   write 4096/0x1000
 56198 openssl  CALL  write(0x3,0x801a46000,0x1000)
 56198 openssl  GIO   fd 3 wrote 4096 bytes
       0x0000 d370 503d 86d7 7087 5db9 08cd 0c69 61a7 f15b 4af6 bd50 abe4 8029 bc0f d0df b315 3458  |.pP=..p.]....ia..[J..P...)......4X|
<snip>
       0x0fce d331 4c86 a0b6 ad2d 866f 94ea cf90 c14d 6b2f 8cad 2785 c832 99a4 b38b ebae a9dc 8af4  |.1L....-.o.....Mk/..'..2..........|
       0x0ff0 dd30 dcff 8bbd e1ef 5e4e 6013 92e1 383e                                               |.0......^N`...8>|

 56198 openssl  RET   write 4096/0x1000
 56198 openssl  CALL  read(0,0x801630677,0x1)
 56198 openssl  GIO   fd 0 read 1 byte
       "\0"
 56198 openssl  RET   read 1

Timing
First read after a write (line 18795)

56198 openssl 0.106893 CALL read(0,0x801630677,0x1)

First write (line 51562)

56198 openssl 0.296872 CALL write(0x3,0x801a46000,0x1000)

That's 0.189979 seconds for reading 8192 bytes, writing the 8k takes (0.106793 - 0.106883) 0.00009 seconds

Comparing to using -in

ktrace /usr/local/bin/openssl enc -e -aes-256-cbc -pass pass:fred -out /dev/null -in test.zero

The ktrace overhead is noticable but still completes in seconds

 56535 openssl  0.006865 CALL  read(0x3,0x801aadc80,0x20000)
 56535 openssl  0.006997 GIO   fd 3 read 4096 bytes
       "\0\0\0\0
<snip>
       0\0\0\0"
 56535 openssl  0.007017 RET   read 131072/0x20000
 56535 openssl  0.007038 CALL  write(0x4,0x801a46000,0x1000)
 56535 openssl  0.007054 GIO   fd 4 wrote 4096 bytes
       0x0000 5361 6c74 6564 5f5f 5b9f 56b7 4f4d e319 ba01 d2d7 a055 b0f1 b040 e778 1b4d 1e57 7652 870c ee7a 879f d788 9706 d32f a4da 4447 c804 ab00  |Salted__[.V.OM.......U...@.x.M.WvR...z......./..DG....|
<snip>
 56535 openssl  0.007071 RET   write 4096/0x1000

Reading 4k is 0.099876
Writing 4k is 0.000033

Adding -bufsize 4096 on the comandline does not change the speed.

@levitte
Copy link
Member

levitte commented Apr 23, 2017

Ah, I think the issue is, exactly, here. What this does is to setbuf() stdin with a NULL buffer, which leads to exact reads instead of having stdio doing its normal buffering job, and therefore the effect that you observe.

I don't know why we unbuffer stdin... or at least unconditionally. If was only done conditionally (if -bufsize was specified) before commit 7e1b748, but quite frankly, I'm not sure why we unbuffered stdin before then either.

That's not the whole explanation, though. The full explanation lies here. As you may notice, we ask fread to read outl one byte items. Some (all?) fread implementations are done in such a way that size items till be read nmemb, and since we have set the size to 1 (because we're interested in knowing the amount of bytes actually read), it means one-byte reads. With a buffered stdio, this is not a problem, but when unbuffered... well, you've seen the results for yourself.

The solution would really be not to unbuffer stdin...

@Sp1l
Copy link
Contributor Author

Sp1l commented Apr 24, 2017

Hi Richard,

Thanks! Confirmed...

--- apps/apps.c.orig    2017-02-16 12:58:20.000000000 +0100
+++ apps/apps.c 2017-04-24 11:14:20.846079000 +0200
@@ -2427,7 +2427,7 @@ void unbuffer(FILE *fp)
 # pragma environment save
 # pragma message disable maylosedata2
 #endif
-    setbuf(fp, NULL);
+    setvbuf(fp, NULL, _IOFBF, (int)8192);
 #if defined(OPENSSL_SYS_VMS) && defined(__DECC)
 # pragma environment restore
 #endif

$ dd if=/dev/zero bs=8m count=64 | apps/openssl enc -e -aes-256-cbc -pass pass:fred -out /dev/null
64+0 records in
64+0 records out
536870912 bytes transferred in 1.587700 secs (338143885 bytes/sec)

Doesn't make sense to call this void unbuffer(FILE *fp). I can provide a pull request if you want, let me know how you would like to progress.

@levitte
Copy link
Member

levitte commented Apr 24, 2017

I'd say that we should simply skip the call in apps/enc.c. The -bufsize argument isn't used for the stdio buffer size anyway... unbuffer is used in other places where it does make sense (see reading a key from stdin, and the input as well).

levitte pushed a commit that referenced this issue Apr 25, 2017
 - unbuffer causes single-byte reads from stdin and poor performance

Fixes #3281
CLA: trivial

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from #3299)
(cherry picked from commit 65d6248)
pracj3am pushed a commit to cdn77/openssl that referenced this issue Aug 22, 2017
 - unbuffer causes single-byte reads from stdin and poor performance

Fixes openssl#3281
CLA: trivial

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from openssl#3299)
(cherry picked from commit 65d6248)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants