
i thought encoding/json is slow #2

Closed
ReiiSky opened this issue Oct 31, 2020 · 36 comments
@ReiiSky

ReiiSky commented Oct 31, 2020

Would you mind changing the default JSON codec to json-iter and checking the performance again?

Sorry for my bad English.

@lesismal
Owner

lesismal commented Oct 31, 2020

Would you mind changing the default JSON codec to json-iter and checking the performance again?

Sorry for my bad English.

I agree that encoding/json is slower, but using it means fewer dependencies, and on some embedded devices with limited hardware, the smaller the binary the better.
So far arpc imports only the standard library and its own sub-packages, so I don't even use go mod.
Many features such as the network protocol and middleware can be customized, and it's easy for users to plug in a custom codec using json-iter, as below:

  • server
package main

import (
	"log"

	jsoniter "github.com/json-iterator/go"
	"github.com/lesismal/arpc"
)

func main() {
	server := arpc.NewServer()
	server.Codec = jsoniter.ConfigCompatibleWithStandardLibrary

	// register router
	server.Handler.Handle("/echo", func(ctx *arpc.Context) {
		str := ""
		err := ctx.Bind(&str)
		ctx.Write(str)
		log.Printf("/echo: \"%v\", error: %v", str, err)
	})

	server.Run(":8888")
}
  • client
package main

import (
	"log"
	"net"
	"time"

	jsoniter "github.com/json-iterator/go"
	"github.com/lesismal/arpc"
)

func main() {
	client, err := arpc.NewClient(func() (net.Conn, error) {
		return net.DialTimeout("tcp", "localhost:8888", time.Second*3)
	})
	if err != nil {
		panic(err)
	}
	client.Codec = jsoniter.ConfigCompatibleWithStandardLibrary
	defer client.Stop()

	req := "hello"
	rsp := ""
	err = client.Call("/echo", &req, &rsp, time.Second*5)
	if err != nil {
		log.Fatalf("Call /echo failed: %v", err)
	} else {
		log.Printf("Call /echo Response: \"%v\"", rsp)
	}
}

Hope you like it, and sorry for my bad English too.

@lesismal
Owner

lesismal commented Oct 31, 2020

I have committed some benchmarks here to compare with other popular RPC frameworks, using protobuf as the others do.

I compared encoding/json and json-iter with this arpc-benchmark code on my computer and got these results:

  • encoding/json
2020/11/01 00:19:46 arpc_client.go:39: INFO: concurrency: 100
requests per client: 2000

2020/11/01 00:19:46 arpc_client.go:47: INFO: message size: 581 bytes

2020/11/01 00:20:05 stats.go:15: INFO: took 19000 ms for 200000 requests
2020/11/01 00:20:05 stats.go:36: INFO: sent     requests    : 200000
2020/11/01 00:20:05 stats.go:37: INFO: received requests    : 200000
2020/11/01 00:20:05 stats.go:38: INFO: received requests_OK : 200000
2020/11/01 00:20:05 stats.go:42: INFO: throughput  (TPS)    : 10526

2020/11/01 00:20:05 stats.go:45: INFO: mean: 471201 ns, median: 0 ns, max: 27998100 ns, min: 0 ns, p99.5: 11003000 ns
2020/11/01 00:20:05 stats.go:46: INFO: mean: 0 ms, median: 0 ms, max: 27 ms, min: 0 ms, p99.5: 11 ms
  • json-iter, nearly 2x faster than encoding/json
requests per client: 2000

2020/11/01 00:13:05 arpc_client.go:47: INFO: message size: 581 bytes

2020/11/01 00:13:24 stats.go:15: INFO: took 19000 ms for 200000 requests
2020/11/01 00:13:24 stats.go:36: INFO: sent     requests    : 200000
2020/11/01 00:13:24 stats.go:37: INFO: received requests    : 200000
2020/11/01 00:13:24 stats.go:38: INFO: received requests_OK : 200000
2020/11/01 00:13:24 stats.go:42: INFO: throughput  (TPS)    : 10526

2020/11/01 00:13:24 stats.go:45: INFO: mean: 242450 ns, median: 0 ns, max: 19001900 ns, min: 0 ns, p99.5: 7999100 ns
2020/11/01 00:13:24 stats.go:46: INFO: mean: 0 ms, median: 0 ms, max: 19 ms, min: 0 ms, p99.5: 7 ms

Hope that helps!

@ReiiSky
Author

ReiiSky commented Oct 31, 2020

I see, thank you for your time. I enjoy your work.

@ReiiSky ReiiSky closed this as completed Oct 31, 2020
@lesismal
Owner

lesismal commented Nov 1, 2020

I see, thank you for your time. I enjoy your work.

You're very welcome, and thank you for your feedback! 😋😋😋

@lesismal
Owner

Hi, I noticed that you starred achan, thank you very much!
I have already moved it into arpc/extension/pubsub; here is an example, you can use it instead.
I recently added some new feature extensions and examples, such as service registry/discovery and opentracing. They live in the extension dir and are not required dependencies of arpc, so I still don't use go mod. Hope you like them.

@ReiiSky
Author

ReiiSky commented Dec 16, 2020

@lesismal well, your project seems related to my service. By the way, I use arpc and found many errors; it looks like some directories were moved, so I should adapt my project to the current changes. That's okay, just a small change, haha. Thank you.

@ReiiSky
Author

ReiiSky commented Dec 16, 2020

Well, if you want a roadmap, I would suggest features like horizontal scalability support for websocket or pubsub. It would be awesome, thank you.

@ReiiSky
Author

ReiiSky commented Dec 16, 2020

You may like this ws benchmark though.

@lesismal
Owner

@lesismal well, your project seems related to my service. By the way, I use arpc and found many errors; it looks like some directories were moved, so I should adapt my project to the current changes. That's okay, just a small change, haha. Thank you.

Yes, I changed the directory structure of some sub-packages, sorry about that. You can use go mod in your project; v1.1.0 is preferred at the moment.

@lesismal
Owner

Well, if you want a roadmap, I would suggest features like horizontal scalability support for websocket or pubsub. It would be awesome, thank you.

A client pool, or service registry and discovery, may meet your horizontal-scalability needs. For websocket, just use the websocket Listener for arpc.Server.Serve and the websocket Conn for arpc.Client, like the websocket example.

@lesismal
Owner

You may like this ws benchmark though.

I saw blog posts about this package but hadn't read the code before.
As far as I know, goroutines are not native threads, and when epoll is used inside goroutines, the thread and CPU affinity of the epoll goroutines cannot be guaranteed, which sometimes causes CPU spikes.
I took a look at the package just now, but I'm not sure whether it has head-of-line blocking or other problems. I'll check it further and run some tests; we may implement a new arpc-websocket extension on top of this epoll-websocket if possible.

@lesismal
Owner

lesismal commented Dec 17, 2020

You may like this ws benchmark though.

I tried a TCP tunnel: when the tunnel receives a piece of the client's data, it forwards it to the server in two writes, sleeping for 1 second in between to simulate TCP packet fragmentation ("sticky packets"). I found that wsutil.ReadClientData(conn) then takes 1 second, so there is a head-of-line blocking problem in the read loop.
This epoll-websocket does not set the fd to non-blocking mode, so while one conn is blocked on a read, all the other conns handled by the same loop are blocked too. We should not use this epoll-websocket for high-concurrency services.

See the following test code.

@lesismal
Owner

lesismal commented Dec 17, 2020

Replace 1m-go-websockets' server.go with this server.go; see step 1 / step 2.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"syscall"
	"time"

	"github.com/gobwas/ws"
	"github.com/gobwas/ws/wsutil"
)

var epoller *epoll

func wsHandler(w http.ResponseWriter, r *http.Request) {
	// Upgrade connection
	conn, _, _, err := ws.UpgradeHTTP(r, w)
	if err != nil {
		return
	}
	if err := epoller.Add(conn); err != nil {
		log.Printf("Failed to add connection %v", err)
		conn.Close()
	}
}

func main() {
	// Increase resources limitations
	var rLimit syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
		panic(err)
	}
	rLimit.Cur = rLimit.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
		panic(err)
	}

	// Enable pprof hooks
	go func() {
		if err := http.ListenAndServe("localhost:6060", nil); err != nil {
			log.Fatalf("pprof failed: %v", err)
		}
	}()

	// Start epoll
	var err error
	epoller, err = MkEpoll()
	if err != nil {
		panic(err)
	}

	go Start()

	http.HandleFunc("/", wsHandler)
	if err := http.ListenAndServe("0.0.0.0:8000", nil); err != nil {
		log.Fatal(err)
	}
}

func Start() {
	for {
		connections, err := epoller.Wait()
		if err != nil {
			log.Printf("Failed to epoll wait %v", err)
			continue
		}
		for _, conn := range connections {
			if conn == nil {
				break
			}
			// step 1: record read start time
			t := time.Now()
			if msg, op, err := wsutil.ReadClientData(conn); err != nil {
				if err := epoller.Remove(conn); err != nil {
					log.Printf("Failed to remove %v", err)
				}
				conn.Close()
			} else {
				// step 2: log read time of whole packet
				log.Printf("wsutil.ReadClientData time used: %v\n", time.Since(t).Seconds())

				// set arpc response cmd (not important for this test)
				msg[5] = 2
				err = wsutil.WriteServerMessage(conn, op, msg)
				if err != nil {
					log.Printf("write error: %v\n", err)
				}
			}
		}
	}
}

@lesismal
Owner

lesismal commented Dec 17, 2020

Run this client.go; see step 1 / step 2 / step 3.

package main

import (
	"fmt"
	"log"
	"net"
	"time"

	"github.com/lesismal/arpc"
	"github.com/lesismal/arpc/extension/protocol/websocket"
)

func newConnTunnel(clientConn *net.TCPConn, serverAddr string) {
	serverConn, dialErr := net.Dial("tcp", serverAddr)
	fmt.Println("Dial:", dialErr)

	if dialErr == nil {
		c2sCor := func() {
			defer func() {
				_ = recover()
			}()

			var nread int
			var nwrite int
			var err error
			var buf = make([]byte, 1024)
			for {
				nread, err = clientConn.Read(buf)
				if err != nil {
					fmt.Println("clientConn.Read: ", err)
					clientConn.Close()
					serverConn.Close()
					break
				}

				{
					// step 1: write half
					nwrite, err = serverConn.Write(buf[:nread/2])
					if nwrite != nread/2 || err != nil {
						fmt.Println("serverConn.Write 111: ", nread, nwrite, err)
						clientConn.Close()
						serverConn.Close()
						break
					}

					// step 2: sleep
					time.Sleep(time.Second)

					// step 3: write another half
					nwrite, err = serverConn.Write(buf[nread/2 : nread])
					if nwrite != nread-nread/2 || err != nil {
						fmt.Println("serverConn.Write 222: ", nread, nwrite, err)
						clientConn.Close()
						serverConn.Close()
						break
					}
				}
			}
		}

		s2cCor := func() {
			defer func() {
				_ = recover()
			}()

			var nread int
			var nwrite int
			var err error
			var buf = make([]byte, 1024)
			for {
				nread, err = serverConn.Read(buf)
				if err != nil {
					fmt.Println("serverConn.Read: ", err)
					clientConn.Close()
					serverConn.Close()
					break
				}

				nwrite, err = clientConn.Write(buf[:nread])
				if nwrite != nread || err != nil {
					fmt.Println("clientConn.Write: ", err)
					clientConn.Close()
					serverConn.Close()
					break
				}
			}
		}

		go c2sCor()
		go s2cCor()
	} else {
		clientConn.Close()
	}
}

func runTunnel(listenAddr string, serverAddr string) {
	tcpAddr, err := net.ResolveTCPAddr("tcp4", listenAddr)
	if err != nil {
		fmt.Println("ResolveTCPAddr Error: ", err)
		return
	}

	listener, err2 := net.ListenTCP("tcp", tcpAddr)
	if err2 != nil {
		fmt.Println("ListenTCP Error: ", err2)
		return
	}

	defer listener.Close()

	fmt.Printf("Agent Start Running on: Agent(%s) -> Server(%s)!\n", listenAddr, serverAddr)
	for {
		conn, err := listener.AcceptTCP()

		if err != nil {
			fmt.Println("AcceptTCP Error: ", err)
		} else {
			go newConnTunnel(conn, serverAddr)
		}
	}
}

func main() {
	go runTunnel(":8001", "127.0.0.1:8000")

	time.Sleep(time.Second / 10)

	client, err := arpc.NewClient(func() (net.Conn, error) {
		return websocket.Dial("ws://localhost:8001/ws")
	})
	if err != nil {
		panic(err)
	}
	defer client.Stop()

	req := "hello"
	rsp := ""
	err = client.Call("/call/echo", &req, &rsp, time.Second*500)
	if err != nil {
		log.Fatalf("Call failed: %v", err)
	} else {
		log.Printf("Call Response: \"%v\"", rsp)
	}
}

@lesismal
Owner

lesismal commented Dec 17, 2020

You may like this ws benchmark though.

Usually when we use an epoll event loop in Go, we should set the fds to non-blocking, otherwise head-of-line blocking is hard to deal with when there are many connections. But in Go, asynchronous parsing of websocket, TLS, etc. is a huge project that requires experts who are very familiar with those protocols. Until that exists, I would not use this epoll event-loop mode for services such as http/tls.
Custom long-connection protocols, such as those in games, can be considered, but the CPU-spike problem should also be kept in mind.
Besides, high-concurrency services with good horizontal scaling are enough to serve 1m or more online users, so there is no need to chase ultra-high capacity on a single machine.

@ReiiSky
Author

ReiiSky commented Dec 17, 2020

Wow, thanks for all the effort explaining how it works. I think this repo will suit you for reducing memory allocations.

Run this client.go; see step 1 / step 2 / step 3.

I've never used this before, but I'll try it.

@ReiiSky
Author

ReiiSky commented Dec 17, 2020

New insight for me, haha, since I've rarely touched low-level networking.
Well, I hope your framework keeps improving, since I'll be using it for my project.
Maybe I will send some pull requests later.
Thank you.

@lesismal
Owner

lesismal commented Dec 17, 2020

Wow, thanks for all the effort explaining how it works. I think this repo will suit you for reducing memory allocations.

Run this client.go; see step 1 / step 2 / step 3.

I've never used this before, but I'll try it.

bytebufferpool is a good repo, and I respect most of this author's repos, such as fasthttp!

I had considered using sync.Pool before, but did not use it for the following reasons:

  1. Go's GC performs much better since 1.8, so the benefit of sync.Pool in regular business scenarios may not be obvious.
  2. arpc.Client: usually there are not many arpc.Clients; 10k, for example, is not large, and the clients stay active most of the time.
  3. arpc.Context: because it supports asynchronous Write, it is difficult to determine when its life cycle ends. Managing it with sync.Pool would require exposing extra methods such as Retain/Release for users to call, which increases complexity at the user layer.
  4. arpc.rpcSession: its usage is complicated. A rpcSession is usually held by the client side during a Call, puts no pressure on the server side, and when the network is healthy its count stays small.
  5. arpc.Message: it usually costs the most memory. In the early days sync.Pool was considered for arpc.Message memory management, but it was abandoned in favor of more flexible and convenient function support, roughly because:
     1. arpc does not send messages by calling net.Conn.Write directly like other frameworks, where mutex contention becomes obvious under highly concurrent calls and hurts performance; instead a separate goroutine handles message sending, with writev/batch-write support to reduce the number of syscalls.
     2. arpc supports broadcast: a message's life cycle does not end when it has been written to one Conn, but only after it has been written to all the Conns, successfully or not. That requires reference counting, a distinguished broadcast message type, or other support before sync.Pool can be used.
     3. arpc supports Message-encoding middleware, which processes one Message into a new one. With sync.Pool, the user layer would also need to extend the encoding middleware with pool operations.
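To illustrate the broadcast point above, here is a hypothetical stdlib-only sketch (the Message, Retain, and Release names are mine, not arpc's API) of why broadcast forces reference counting before a pooled message can be recycled:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Message is a hypothetical pooled broadcast message; refs counts how many
// writers (connections) still hold it.
type Message struct {
	refs int32
	Buf  []byte
}

var msgPool = sync.Pool{New: func() interface{} { return &Message{} }}

// NewMessage takes a message from the pool with one reference (the owner).
func NewMessage(payload []byte) *Message {
	m := msgPool.Get().(*Message)
	m.refs = 1
	m.Buf = append(m.Buf[:0], payload...)
	return m
}

// Retain must be called once per extra connection the message is broadcast to.
func (m *Message) Retain() { atomic.AddInt32(&m.refs, 1) }

// Release drops one reference; the message returns to the pool only when the
// last writer is done. It reports whether the message was recycled.
func (m *Message) Release() bool {
	if atomic.AddInt32(&m.refs, -1) == 0 {
		msgPool.Put(m)
		return true
	}
	return false
}

func main() {
	m := NewMessage([]byte("broadcast"))
	m.Retain()               // conn A starts writing
	m.Retain()               // conn B starts writing
	fmt.Println(m.Release()) // conn A done: false, still in use
	fmt.Println(m.Release()) // conn B done: false, owner still holds it
	fmt.Println(m.Release()) // owner done: true, recycled to the pool
}
```

This is exactly the extra user-facing complexity the comment describes: every broadcast path must pair Retain/Release correctly, or messages either leak or get recycled while still in flight.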

Each of these reasons (and perhaps others I no longer remember clearly) would add code complexity, and in normal business scenarios the cost of that extra complexity would outweigh the benefit of sync.Pool.

I tried a pool version on this branch in the early days, but finally gave it up.

😄 😄

@lesismal
Owner

New insight for me, haha, since I've rarely touched low-level networking.
Well, I hope your framework keeps improving, since I'll be using it for my project.
Maybe I will send some pull requests later.
Thank you.

You're welcome, and I'm looking forward to your PR! 🥰🥰

@lesismal
Owner

lesismal commented Dec 17, 2020

Wow, thanks for all the effort explaining how it works. I think this repo will suit you for reducing memory allocations.

Run this client.go; see step 1 / step 2 / step 3.

I've never used this before, but I'll try it.

BTW, the performance of this function of bytebufferpool can be optimized as shown here.

@lesismal
Owner

Sorry for my poor English again 😂😂

@ReiiSky
Author

ReiiSky commented Dec 17, 2020

I tried a pool version on this branch in the early days, but finally gave it up.

Ahh I see, another impressive explanation.

@ReiiSky
Author

ReiiSky commented Dec 17, 2020

BTW, the performance of this function of bytebufferpool can be optimized as shown here.

I hadn't noticed it before. I've been trying this pool to replace the wsutil.ReadClientData(conn) function, which allocates a new byte slice per incoming message, but I haven't benchmarked it yet.

@ReiiSky
Author

ReiiSky commented Dec 17, 2020

Sorry for my poor English again.

Nooo, that's okay, I understand your intent. You don't need perfect English to talk to me. Thank you @lesismal

@lesismal
Owner

Sorry for my poor English again.

Nooo, that's okay, I understand your intent. You don't need perfect English to talk to me. Thank you @lesismal

Happy to chat with you!

@lesismal
Owner

You may like this ws benchmark though.

Hi, I've written another non-blocking framework; support for a real 1m websocket connections is basically finished. You can check the examples here:
https://github.com/lesismal/nbio/tree/master/examples/websocket_1m

I will do more tests and improve the details 😄😄

@ReiiSky
Author

ReiiSky commented Apr 29, 2021

You may like this ws benchmark though.

Hi, I've written another non-blocking framework; support for a real 1m websocket connections is basically finished. You can check the examples here:
https://github.com/lesismal/nbio/tree/master/examples/websocket_1m

I will do more tests and improve the details.

Sorry for the late reply; I'll take a look soon. Thank you.

@lesismal
Owner

lesismal commented Apr 29, 2021

I checked the gobwas example and opened issues/18; it has the same problem as https://github.com/eranyanay/1m-go-websockets 😅😅

nbio is a real 1m-connections http/ws solution.

@ReiiSky
Author

ReiiSky commented Apr 29, 2021

nbio is a real 1m-connections http/ws solution.

May I ask for the test results of both frameworks (if any), such as memory and runtime?

@lesismal
Owner

lesismal commented Apr 30, 2021

nbio is a real 1m-connections http/ws solution.

May I ask for the test results of both frameworks (if any), such as memory and runtime?

For gobwas, because of its service-unavailability problem, the test results would be meaningless.

For nbio, on my 4c8t VM, the server costs about 270 MB of memory and 180% CPU (about 1.8 cores) for 100k conns at 50-100k qps.
Different software/hardware environments and test code lead to different results, so please check the examples here: the server listens on multiple ports so it can accept more than 65535 conns from localhost. If you want to test 1m clients across multiple machines, virtual networks, or docker nodes connecting to a single server port, please build your own environment and customize the code.
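The multi-port trick exists because a single client IP has at most roughly 64k ephemeral ports toward one server ip:port, so a localhost benchmark cannot exceed that against a single listener. A minimal stdlib sketch (illustrative, not nbio's code) of opening several listeners:

```go
package main

import (
	"fmt"
	"net"
)

// listenMany opens one TCP listener per requested port so that clients can
// spread their connections (and thus their 4-tuples) across several
// server ports.
func listenMany(host string, ports []int) ([]net.Listener, error) {
	var lns []net.Listener
	for _, p := range ports {
		ln, err := net.Listen("tcp", fmt.Sprintf("%s:%d", host, p))
		if err != nil {
			// Close what we already opened before bailing out.
			for _, l := range lns {
				l.Close()
			}
			return nil, err
		}
		lns = append(lns, ln)
	}
	return lns, nil
}

func main() {
	// Port 0 asks the kernel for any free port, keeping the demo portable.
	lns, err := listenMany("127.0.0.1", []int{0, 0, 0})
	if err != nil {
		panic(err)
	}
	for _, ln := range lns {
		fmt.Println("listening on", ln.Addr())
		ln.Close()
	}
}
```

A real server would run an accept loop per listener and hand every accepted conn to the same handler.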

@ReiiSky
Author

ReiiSky commented May 2, 2021

nbio is a real 1m-connections http/ws solution.

May I ask for the test results of both frameworks (if any), such as memory and runtime?

For gobwas, because of its service-unavailability problem, the test results would be meaningless.

For nbio, on my 4c8t VM, the server costs about 270 MB of memory and 180% CPU (about 1.8 cores) for 100k conns at 50-100k qps.
Different software/hardware environments and test code lead to different results, so please check the examples here: the server listens on multiple ports so it can accept more than 65535 conns from localhost. If you want to test 1m clients across multiple machines, virtual networks, or docker nodes connecting to a single server port, please build your own environment and customize the code.

Ahh I see, that would be a nice improvement. But it would probably be overkill for my hobby project; I'll use your idea when I need it in the future. Thank you.

@lesismal
Owner

lesismal commented May 3, 2021

overkill

OK.
When the number of connections is small, a poller framework performs even worse than std-based servers or fasthttp; it only gains an advantage when the number of connections is large. So most projects do not need a poller framework, but we should know which frameworks should not be used in commercial projects.

@ReiiSky
Author

ReiiSky commented May 3, 2021

overkill

OK.
When the number of connections is small, a poller framework performs even worse than std-based servers or fasthttp; it only gains an advantage when the number of connections is large. So most projects do not need a poller framework, but we should know which frameworks should not be used in commercial projects.

Yep, I've seen the example you attached before, but I haven't had much time to read it completely; I'm still building my project on arpc though. Thank you.

@lesismal
Owner

lesismal commented May 3, 2021

overkill

OK.
When the number of connections is small, a poller framework performs even worse than std-based servers or fasthttp; it only gains an advantage when the number of connections is large. So most projects do not need a poller framework, but we should know which frameworks should not be used in commercial projects.

Yep, I've seen the example you attached before, but I haven't had much time to read it completely; I'm still building my project on arpc though. Thank you.

That's OK!
Actually, I'm not trying to push you toward nbio; most projects should use frameworks based on the std lib.
I just saw that you had starred 1m-go-ws, gobwas, and some other high-concurrency open-source libraries, and I worried that using gobwas' netpoll solution, or other frameworks with basic problems, might lead you into trouble.
Enjoy your work 😄

@ReiiSky
Author

ReiiSky commented May 22, 2021

May I ask something: what makes you interested in websocket-related projects? Is it related to your job, or just a hobby?
I ask because I've recently been building an aws-like service; maybe your job or hobby is related to mine.

@lesismal
Owner

lesismal commented May 23, 2021

May I ask something: what makes you interested in websocket-related projects? Is it related to your job, or just a hobby?

It is both my job and my hobby.

Most of my projects use tcp/http/websocket-based protocols, sometimes udp, and I always work on framework-like things.

Go is better for server-side programming than any other language I have used; I have loved it since the first line.

Traditional web frameworks are http/rpc-like, which is not efficient for network communication compared with, e.g., arpc's performance, server-side Notify, and support for more transport-layer protocols. So I wrote arpc.

Go has the problem that each connection costs at least one goroutine, due to the lack of application-layer poller support; that costs a lot of memory and runtime scheduling in high-concurrency services. To reduce the number of goroutines, and with it memory usage and scheduling, I wrote nbio.

I'm curious what kind of service you are working on; feel free to talk about it if you like 😄.
