Allow single frame writes #62

Closed
nhooyr opened this issue Apr 21, 2019 · 13 comments

nhooyr commented Apr 21, 2019

See #57

Some WebSocket implementations cannot handle fragmented messages, and the current API can only produce fragmented messages: you write all your data to a writer first and then call Close, which writes a final continuation frame (a frame with the FIN bit set).

It may also be good for performance.

I'm not going to expose an API for this right now; I'm opening this issue to gauge how badly it's wanted.


kenshaw commented Apr 22, 2019

I would greatly appreciate it if this could be implemented, as we're benchmarking the different WebSocket implementations for use with the chromedp package. Unfortunately, I can't fix Chrome's implementation, so without this ability we're unable to evaluate this package at all. I understand the issue with the frames, but I haven't dug into this package's implementation. I'd be glad to implement it, given some guidance. Thanks in advance!


nhooyr commented Apr 22, 2019

If you want pure performance, you definitely want to go with gobwas/ws; this library will not beat it.

I do not want to compromise this library's API early on without knowing how widespread this issue is, or without benchmarks showing the speed improvement.

You can use the https://github.com/nhooyr/websocket/tree/exp-single branch if you still want to use it.


nhooyr commented Apr 25, 2019

In terms of performance, it seems to be decently faster, but I'll analyze whether it's worth it later.

+ go test --vet=off '--run=^$' -bench=. -cpuprofile=profs/cpu -memprofile=profs/mem -blockprofile=profs/block -mutexprofile=profs/mutex .
goos: linux
goarch: amd64
pkg: nhooyr.io/websocket
BenchmarkConn/stream/32-4      	   20000	     64684 ns/op	   0.49 MB/s
BenchmarkConn/stream/128-4     	   20000	     70596 ns/op	   1.81 MB/s
BenchmarkConn/stream/512-4     	   20000	     68876 ns/op	   7.43 MB/s
BenchmarkConn/stream/1024-4    	   20000	     72584 ns/op	  14.11 MB/s
BenchmarkConn/stream/4096-4    	   10000	    107517 ns/op	  38.10 MB/s
BenchmarkConn/stream/16384-4   	   10000	    110108 ns/op	 148.80 MB/s
BenchmarkConn/stream/65536-4   	   10000	    196606 ns/op	 333.34 MB/s
BenchmarkConn/stream/131072-4  	    5000	    300750 ns/op	 435.82 MB/s
BenchmarkConn/buffered/32-4    	   10000	    114083 ns/op	   0.28 MB/s
BenchmarkConn/buffered/128-4   	   20000	     94734 ns/op	   1.35 MB/s
BenchmarkConn/buffered/512-4   	   10000	    117539 ns/op	   4.36 MB/s
BenchmarkConn/buffered/1024-4  	   20000	     97250 ns/op	  10.53 MB/s
BenchmarkConn/buffered/4096-4  	   10000	    109905 ns/op	  37.27 MB/s
BenchmarkConn/buffered/16384-4 	   10000	    125022 ns/op	 131.05 MB/s
BenchmarkConn/buffered/65536-4 	   10000	    163505 ns/op	 400.82 MB/s
BenchmarkConn/buffered/131072-4         	   10000	    213125 ns/op	 615.00 MB/s
PASS
ok  	nhooyr.io/websocket	28.581s
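For reference, the MB/s column in output like the above comes from `b.SetBytes`, which tells the benchmark framework how many bytes each iteration moves. A rough stdlib-only sketch of such a benchmark; the pipe-based echo setup is an assumption for illustration, not the library's actual harness:

```go
package main

import (
	"fmt"
	"io"
	"testing"
)

// benchmarkEcho writes b.N messages of msgSize bytes through a pipe,
// with a goroutine draining the other end. b.SetBytes is what makes
// `go test -bench` print the MB/s column seen above.
func benchmarkEcho(b *testing.B, msgSize int) {
	r, w := io.Pipe()
	go func() {
		buf := make([]byte, msgSize)
		for {
			if _, err := io.ReadFull(r, buf); err != nil {
				return
			}
		}
	}()
	defer w.Close()

	msg := make([]byte, msgSize)
	b.SetBytes(int64(msgSize))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := w.Write(msg); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`.
	res := testing.Benchmark(func(b *testing.B) { benchmarkEcho(b, 32) })
	fmt.Println(res.N > 0, res.Bytes == 32)
}
```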


nhooyr commented Apr 26, 2019

It's very confusing that at 4096 bytes the buffered implementation is actually several thousand nanoseconds slower than the stream.


nhooyr commented Apr 27, 2019

That was just my MacBook's CPU throttling, lol.

Here are the results on a GCP VM:

nhooyr@anmol:~/go/src/nhooyr.io/websocket$ go test -bench=. -run=xxx
+ go test -bench=. -run=xxx
go: downloading github.com/google/go-cmp v0.2.0
go: downloading golang.org/x/time v0.0.0-20190308202827-9d24e82272b4
go: extracting github.com/google/go-cmp v0.2.0
go: extracting golang.org/x/time v0.0.0-20190308202827-9d24e82272b4
goos: linux
goarch: amd64
pkg: nhooyr.io/websocket
BenchmarkConn/buffered/32-2      	  100000	     22014 ns/op	   1.45 MB/s
BenchmarkConn/buffered/128-2     	  100000	     22281 ns/op	   5.74 MB/s
BenchmarkConn/buffered/512-2     	  100000	     22602 ns/op	  22.65 MB/s
BenchmarkConn/buffered/1024-2    	  100000	     22659 ns/op	  45.19 MB/s
BenchmarkConn/buffered/4096-2    	   30000	     45452 ns/op	  90.12 MB/s
BenchmarkConn/buffered/16384-2   	   30000	     51056 ns/op	 320.90 MB/s
BenchmarkConn/buffered/65536-2   	   20000	     81324 ns/op	 805.86 MB/s
BenchmarkConn/buffered/131072-2  	   20000	     94991 ns/op	1379.82 MB/s
BenchmarkConn/stream/32-2        	   50000	     36525 ns/op	   0.88 MB/s
BenchmarkConn/stream/128-2       	   50000	     37903 ns/op	   3.38 MB/s
BenchmarkConn/stream/512-2       	   50000	     35850 ns/op	  14.28 MB/s
BenchmarkConn/stream/1024-2      	   50000	     36173 ns/op	  28.31 MB/s
BenchmarkConn/stream/4096-2      	   30000	     48555 ns/op	  84.36 MB/s
BenchmarkConn/stream/16384-2     	   20000	     62526 ns/op	 262.03 MB/s
BenchmarkConn/stream/65536-2     	   10000	    102465 ns/op	 639.59 MB/s
BenchmarkConn/stream/131072-2    	   10000	    169573 ns/op	 772.95 MB/s
PASS
ok  	nhooyr.io/websocket	34.457s


nhooyr commented Apr 27, 2019

Given the significant difference even for 32 byte messages (the stream path is roughly 65% slower), I think it makes sense to offer this API. I'll expose it tomorrow.

@nhooyr nhooyr added this to the v0.2.0 milestone Apr 27, 2019

nhooyr commented Apr 27, 2019

I am confused as to why, though; there shouldn't be such a huge difference at such a small message size 🤔

@nhooyr nhooyr modified the milestones: v0.2.0, v0.3.0 Apr 27, 2019

nhooyr commented Apr 27, 2019

OK, my benchmarks sucked. Here are better ones:

nhooyr@anmol:~/websocket$ go test -bench=BenchmarkConn -run=xxx
goos: linux
goarch: amd64
pkg: nhooyr.io/websocket
BenchmarkConn/buffered/32-2      	  300000	      7823 ns/op	   4.09 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/buffered/128-2     	  200000	      8323 ns/op	  15.38 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/buffered/512-2     	  200000	      8339 ns/op	  61.39 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/buffered/1024-2    	  200000	     10306 ns/op	  99.35 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/buffered/4096-2    	  100000	     16132 ns/op	 253.89 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/buffered/16384-2   	  100000	     18691 ns/op	 876.55 MB/s	      64 B/op	       2 allocs/op
BenchmarkConn/buffered/65536-2   	   20000	     64167 ns/op	1021.32 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/buffered/131072-2  	   10000	    126063 ns/op	1039.73 MB/s	      64 B/op	       3 allocs/op
BenchmarkConn/stream/32-2        	  200000	      9874 ns/op	   3.24 MB/s	     128 B/op	       6 allocs/op
BenchmarkConn/stream/128-2       	  200000	     10530 ns/op	  12.16 MB/s	     128 B/op	       6 allocs/op
BenchmarkConn/stream/512-2       	  100000	     11944 ns/op	  42.86 MB/s	     128 B/op	       6 allocs/op
BenchmarkConn/stream/1024-2      	  200000	     12139 ns/op	  84.36 MB/s	     128 B/op	       6 allocs/op
BenchmarkConn/stream/4096-2      	  100000	     18846 ns/op	 217.34 MB/s	     128 B/op	       6 allocs/op
BenchmarkConn/stream/16384-2     	   50000	     23377 ns/op	 700.84 MB/s	     128 B/op	       6 allocs/op
BenchmarkConn/stream/65536-2     	   20000	     66540 ns/op	 984.90 MB/s	     128 B/op	       5 allocs/op
BenchmarkConn/stream/131072-2    	   10000	    129499 ns/op	1012.15 MB/s	     128 B/op	       6 allocs/op
PASS
ok  	nhooyr.io/websocket	30.153s

So it looks like at larger message sizes the difference is negligible, but at smaller sizes it's very significant, because each Writer/Reader call has to allocate the reader/writer: they escape to the heap when placed inside an interface value. That's why the stream is allocating so much more every op.

It's still not a large enough difference to warrant adding this for performance reasons alone; 2000 nanoseconds isn't that much. The allocation overhead might matter, but I'd like to wait and see other people's thoughts.


nhooyr commented Apr 27, 2019

I've also updated https://github.com/nhooyr/websocket/tree/exp-single


kenshaw commented Apr 28, 2019

@nhooyr cool, appreciate the effort, even if it's not something you plan on putting in the public API! We'll benchmark internally with chromedp and see how it works against the gorilla and gobwas implementations.


nhooyr commented May 30, 2019

I've decided against bringing this into the library. It's an unfortunate situation with Chrome, but I believe it's an outlier. For almost every language there is a robust WebSocket library available; see https://github.com/facundofarias/awesome-websockets or https://github.com/crossbario/autobahn-testsuite#users

A cursory Google search for "websocket fragmentation not supported" only brings up Chrome.

So for now, I would recommend you use gorilla or gobwas.

If someone else runs into another stack in the wild that also does not support fragmentation, or that has performance issues, I'll reconsider this.

@nhooyr nhooyr closed this as completed May 30, 2019
nhooyr added a commit that referenced this issue May 30, 2019
- Closes #1 (Ping API)
- Closes #62 (Read/Write convenience methods)
- Closes #83 (SetReadLimit)

nhooyr commented May 30, 2019

Argh, so I thought about this some more and I think it's a good idea to include a Write and Read method anyway, not because of Chrome but to reduce ceremony when using the library with a protocol that isn't JSON/protobuf.

If you're just experimenting, it's frustrating to have to create a writer or a reader just to write or read a simple message.

The API is very minimal regardless, so I think the trade-off for the convenience is worth it, as the godoc still reads well.
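A sketch of what such convenience methods typically look like, layered over the streaming API. Here `conn`, `Writer`, and `Reader` are stand-ins for illustration, not this library's exact types or signatures:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// conn stands in for the library's Conn type; its Writer/Reader
// methods mirror the streaming API discussed in this thread, backed
// by an in-memory buffer so the example is self-contained.
type conn struct{ buf bytes.Buffer }

type nopCloser struct{ io.Writer }

func (nopCloser) Close() error { return nil }

func (c *conn) Writer() (io.WriteCloser, error) { return nopCloser{&c.buf}, nil }
func (c *conn) Reader() (io.Reader, error)      { return &c.buf, nil }

// Write is the convenience method: open a writer, write the whole
// message, then Close to finish it, all in one call.
func (c *conn) Write(p []byte) error {
	w, err := c.Writer()
	if err != nil {
		return err
	}
	if _, err := w.Write(p); err != nil {
		return err
	}
	return w.Close()
}

// Read is the counterpart: read one whole message into memory.
func (c *conn) Read() ([]byte, error) {
	r, err := c.Reader()
	if err != nil {
		return nil, err
	}
	return io.ReadAll(r)
}

func main() {
	var c conn
	if err := c.Write([]byte("ping")); err != nil {
		panic(err)
	}
	msg, _ := c.Read()
	fmt.Println(string(msg))
}
```

This keeps the streaming API as the primitive while removing the writer/reader ceremony for small messages.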

@nhooyr nhooyr reopened this May 30, 2019

kenshaw commented May 30, 2019

OK.
