Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency issue in home-message-list (obj not instance of ByteArrayOutputStream) #2

Closed
brdloush opened this issue Jun 13, 2021 · 28 comments

Comments

@brdloush
Copy link

Hi,

first let me say that this is a very interesting example project! Looking forward to see what else is possible with bb and scittle.

I tried running ab benchmark to see what performance characteristics project like this can offer when it comes to API response rate and found out that it actually returns quite a lot of non-200 responses. They all seem to be caused by object is not an instance of java.io.ByteArrayOutputStream error.

Steps to reproduce:

  1. clone the fresh copy of this repo

  2. insert some sample message

curl 'http://localhost:8083/message' \
  -H 'Accept: application/transit+json' \
  -H 'Content-Type: application/transit+json' \
  --data-raw '["^ ","~:name","sample name","~:message","sample message"]'
  1. check that the message is there
➜  babashka-scittle-guestbook git:(main) ✗ curl 'http://localhost:8083/messages' \
  -H 'Accept: application/transit+json'  
["^ ","~:messages",[["^ ","~:name","sample name","~:message","sample message","~:timestamp","~m1623573268846"]]]%   
  1. perform apache benchmark
➜  babashka-scittle-guestbook git:(main) ✗ ab -c 8 -n 10000 http://localhost:8083/messages
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        http-kit
Server Hostname:        localhost
Server Port:            8083

Document Path:          /messages
Document Length:        112 bytes

Concurrency Level:      8
Time taken for tests:   0.714 seconds
Complete requests:      10000
Failed requests:        19
   (Connect: 0, Receive: 0, Length: 19, Exceptions: 0)
Non-2xx responses:      19
Total transferred:      2468437 bytes
HTML transferred:       1118855 bytes
Requests per second:    14005.39 [#/sec] (mean)
Time per request:       0.571 [ms] (mean)
Time per request:       0.071 [ms] (mean, across all concurrent requests)
Transfer rate:          3376.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     0    0   0.2      0       5
Waiting:        0    0   0.2      0       5
Total:          0    1   0.2      1       5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%      5 (longest request)

There's following error written in server output

Sun Jun 13 10:35:49 CEST 2021 [worker-4] ERROR - GET /messages
clojure.lang.ExceptionInfo: object is not an instance of java.io.ByteArrayOutputStream {:type :sci/error, :line 40, :column 3, :message "object is not an instance of java.io.ByteArrayOutputStream", :sci.impl/callstack #object[clojure.lang.Delay 0x43fcb833 {:status :pending, :val nil}], :file nil, :locals {_ {:remote-addr "0:0:0:0:0:0:0:1", :headers {"accept" "*/*", "host" "localhost:8083", "user-agent" "ApacheBench/2.3"}, :async-channel #object[org.httpkit.server.AsyncChannel 0x5ef3203c "0.0.0.0/0.0.0.0:8083<->null"], :server-port 8083, :content-length 0, :websocket? false, :content-type nil, :character-encoding "utf8", :uri "/messages", :server-name "localhost", :query-string nil, :body nil, :scheme :http, :request-method :get}, out #object[java.io.ByteArrayOutputStream 0x1d000a48 "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573268846\"]]]"], writer #object[cognitect.transit.Writer 0x3b95124c "cognitect.transit.Writer@3b95124c"]}}
	at sci.impl.utils$rethrow_with_location_of_node.invokeStatic(utils.cljc:115)
	at sci.impl.evaluator$eval_call.invokeStatic(evaluator.cljc:370)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:406)
	at sci.impl.evaluator$eval.invoke(evaluator.cljc:386)
	at sci.impl.evaluator$eval_let.invokeStatic(evaluator.cljc:79)
	at sci.impl.analyzer$expand_let_STAR_$fn__18308.invoke(analyzer.cljc:338)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:392)
	at sci.impl.evaluator$eval.invoke(evaluator.cljc:386)
	at sci.impl.fns$fun$arity_1__16647.invoke(fns.cljc:141)
	at sci.impl.vars.SciVar.invoke(vars.cljc:321)
	at sci.impl.analyzer$return_call$fn__18582.invoke(analyzer.cljc:811)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:392)
	at sci.impl.analyzer$return_call$fn__18592.invoke(analyzer.cljc:811)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:392)
	at sci.impl.analyzer$analyze_map$fn__18641.invoke(analyzer.cljc:981)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:392)
	at sci.impl.evaluator$eval.invoke(evaluator.cljc:386)
	at sci.impl.evaluator$eval_case.invokeStatic(evaluator.cljc:122)
	at sci.impl.evaluator$eval_special_call.invokeStatic(evaluator.cljc:331)
	at sci.impl.evaluator$eval_call.invokeStatic(evaluator.cljc:362)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:406)
	at sci.impl.evaluator$eval.invoke(evaluator.cljc:386)
	at sci.impl.evaluator$eval_let.invokeStatic(evaluator.cljc:79)
	at sci.impl.analyzer$expand_let_STAR_$fn__18308.invoke(analyzer.cljc:338)
	at sci.impl.evaluator$eval.invokeStatic(evaluator.cljc:392)
	at sci.impl.evaluator$eval.invoke(evaluator.cljc:386)
	at sci.impl.fns$fun$arity_1__16647.invoke(fns.cljc:141)
	at org.httpkit.server.HttpHandler.run(RingHandler.java:117)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(Thread.java:834)
	at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:519)
	at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192)
Caused by: java.lang.IllegalArgumentException: object is not an instance of java.io.ByteArrayOutputStream
	at java.lang.reflect.AccessibleObject.canAccess(AccessibleObject.java:463)
	at java.lang.invoke.MethodHandle.invokeBasic(MethodHandle.java:153)
	at java.lang.invoke.MethodHandle.invokeBasic(MethodHandle.java:0)
	at java.lang.invoke.Invokers$Holder.invoke_MT(Invokers$Holder)
	at sci.impl.Reflector.canAccess(Reflector.java:54)
	at sci.impl.Reflector.invokeMatchingMethod(Reflector.java:161)
	at sci.impl.interop$invoke_instance_method.invokeStatic(interop.cljc:41)
	at sci.impl.evaluator$eval_instance_method_invocation.invokeStatic(evaluator.cljc:192)
	at sci.impl.evaluator$eval_special_call.invokeStatic(evaluator.cljc:331)
	at sci.impl.evaluator$eval_call.invokeStatic(evaluator.cljc:362)
	... 33 more

When I use -c 1, the error doesn't show. With -c 2 and greater, it does. So it may be caused by some non-thread-safe code.

BTW I now also tried removing messages.txt completely (and restarted the server) and the error is still there.

There seems to be something non-thread-safe in home-message-list. I tried following expressions in babashka nrepl session bb --nrepl-server:

;; -----------------------------------------------------------------------------------
;; 1) single execution works:
(home-message-list nil)
;; => "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"


;; -----------------------------------------------------------------------------------
;; 2) a few concurrently-executed executions work
(->> (range 10)
     (pmap (fn [_]
             (home-message-list nil)))
     doall)
;; => ("[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]"
;;     "[\"^ \",\"~:messages\",[[\"^ \",\"~:name\",\"sample name\",\"~:message\",\"sample message\",\"~:timestamp\",\"~m1623573905657\"]]]")

;; -----------------------------------------------------------------------------------
;; 3) many few concurrently-executed executions fail
(->> (range 1000)
     (pmap (fn [_]
             (home-message-list nil)))
     doall)
;; => clojure.lang.ExceptionInfo: object is not an instance of java.io.ByteArrayOutputStream
@brdloush
Copy link
Author

I also tried adding a very naive locking with no success. Interestingly, it made things worse. Now many different errors show up:

(def home-message-lock "")

(defn home-message-list [_]
  (locking home-message-lock
    (let [out (ByteArrayOutputStream. 4096)
          writer (transit/writer out :json)]    
      (transit/write writer {:messages (db-get-messages)})
      (.toString out))))

(let [x (->> (range 10000)
             (pmap (fn [_]
                     (home-message-list nil)))
             doall)]
  nil)
;; => clojure.lang.ExceptionInfo: null object for public boolean java.io.File.exists()
;; => clojure.lang.ExceptionInfo: non-null object for public static java.lang.Object borkdude.graal.LockFix.lock(java.lang.Object,clojure.lang.IFn)
;; => clojure.lang.ExceptionInfo: null object for public synchronized java.lang.String java.io.ByteArrayOutputStream.toString()

Strange. Perhaps some non-thread-safe/mutable context under the hood or something? But if so, where? httpkit, babashka?

@kloimhardt
Copy link
Owner

@brdloush Thank you for your very interesting and detailed analysis. I cannot answer directly where the problem is, but:

This example project is an outflow of an old (year 2020, pre-scittle) project of mine. Not having made any analysis, it was clear to me that a richer back-end is desirable. So I made a fork called babashka-web, worked to get ring and reitit+muuntaja Graal compilable and made a back-end more in line with Luminus. My guess is that the result is thread-safe.
One caveat: The fork is old and only a MS-Windows executable is provided. There sure is some work involved to compile it for other platforms, let alone to extend it. In any case, I'd like to keep babashka-scittle-guestbook a two-file project for the off-the-shelf babashka, as it is now.

@brdloush
Copy link
Author

I just tried running the server in regular clojure environment instead of babashka. My previous example with (range 10000) works without any issues. Similarly, when I used your original guestbook.clj code (ie. without any additional locking etc) and executed ab benchmark against it, 100% of the requests succeeded - no matter which concurrency level was used.

So it seems that the problem only occurs when executed via babashka. What I find insteresting is that when I use babashka, pass :thread 8 to httpkit server and benchmark it via ab -c 8 -n 10000, I get 10 failed requests of 10000 and performance of 14k req/s (!).

On the other hand, when I start the server with regular clojure, same ab command results in only ~2.7k req/s. Which is pretty much the single-threaded performance. So perhaps the CLJ execution somehow performs a lock somewhere, while babashka doesn't?

Perhaps @borkdude might have some idea what might be the cause of this? 🙏 (btw I'm using babashka v0.4.5 on Ubuntu 21.04 x86_64).

@brdloush
Copy link
Author

btw @borkdude the same issue also happens in @holyjak's http server.

https://gist.github.com/holyjak/36c6284c047ffb7573e8a34399de27d8. ab -c 1 -n 10000 http://localhost:8090/ works fine, concurrency level of 2 and higher results in errors.

@kloimhardt
Copy link
Owner

@brdloush In the Readme, I added a caveat that the code is not useful for production.

@borkdude
Copy link

@brdloush Can you make a very minimal repro? Then I can take a closer look. Preferably you would also make a GraalVM-native project with only httpkit and without sci, to see if this gives similar issues or not. Feel free to make an issue at the babashka repo for this.

@borkdude
Copy link

@brdloush On my laptop with this example:

(require '[org.httpkit.server :as srv])

(srv/run-server (fn [_req] {:body "hello"}) {:port 8083})

@(promise)

and ab -c8 -n 10000 http://localhost:8083/ I get no failed requests and 823.76 request per second.

@borkdude
Copy link

Same with this guestbook actually, can't reproduce the issue. 0 request failed.

@borkdude
Copy link

borkdude commented Jun 13, 2021

With the file server I can reproduce the issue, I'm going to see if I can reproduce this somehow with a smaller example.

@brdloush
Copy link
Author

I'm starting to think that (.exists (io/file filename) might be causing the problem, but still investigating.

@borkdude
Copy link

The problem might be related to interop in general it looks like.

@borkdude
Copy link

It would be interesting to check if the problem also occurs with 0.4.4 since I made some changes related to interop in bb in 0.4.5.

@borkdude
Copy link

Minimal repro:

(require '[org.httpkit.server :as srv]
         '[babashka.fs :as fs])

(import '[java.net URLDecoder])

(srv/run-server (fn [{:keys [:uri]}]
                  (URLDecoder/decode uri)
                  (.relativize (fs/path ".") (fs/path "."))
                  {:body "hello"})
                {:port 8083})

@(promise)

Doing two interop on different kinds of objects in a function that is executed in parallel results in these kinds of errors. Very interesting.

@brdloush
Copy link
Author

Here's my minimal case, but the one you just posted seems even more minimal :-)

(require '[clojure.java.io :as io]
         '[org.httpkit.server :as srv])

(import 'java.io.ByteArrayOutputStream)

(def port 8083)

(defn core-http-server []
  (srv/run-server
   (fn [_req]
     {:headers {"Content-type" "application/transit+json"}
      :body (let [out (ByteArrayOutputStream. 4096)]
              (.exists (io/file "non-existing-file.txt"))
              (.toString out)
              "response body")
      :status 200})
   {:port port
    :thread 8}))

(core-http-server)
@(promise)
➜  simplehttp ab -c8 -n 10000 http://localhost:8083/messages 2>/dev/null | grep Failed
Failed requests:        54

➜  simplehttp ab -c8 -n 10000 http://localhost:8083/messages 2>/dev/null | grep Failed
Failed requests:        47

➜  simplehttp ab -c1 -n 10000 http://localhost:8083/messages 2>/dev/null | grep Failed
Failed requests:        0

➜  simplehttp ab -c1 -n 10000 http://localhost:8083/messages 2>/dev/null | grep Failed
Failed requests:        0

➜  simplehttp curl http://localhost:8083                                              
response body%    

@brdloush
Copy link
Author

Btw we can maybe even remove the httpkit out of your minimal case, can't we? Something like (pmap) with your " interop on different kinds of objects in a function that is executed in parallel" migh be just enough.

@borkdude
Copy link

Yes. More minimal repro:

(import '[java.net URLDecoder])

(def f (fn [uri]
         (URLDecoder/decode uri)
         (.length uri)))

(println (pmap f (map str (range 10000))))

@borkdude
Copy link

Tracking the issue here: babashka/babashka#884

@borkdude
Copy link

When compiling babashka with GraalVM 21.1 instead if 21.0 it started working. Pfffff.

@brdloush Would you try out this new binary? What's your OS?

@brdloush
Copy link
Author

@borkdude Awesome 👍 🤞

Sure, I'll try. I'm on linux x64 ubuntu 21.04

@brdloush
Copy link
Author

I just compiled current babashka master v0.4.6-SNAPSHOT with graalvm 21.1 and your minimal test case seems to work. I'll try the other versions you enclosed.

@brdloush
Copy link
Author

Both enclosed versions are working fine 👏

What's more interesting is that it's also perfoming insanely well:

➜  babashka-scittle-guestbook git:(main) ab -c 8 -n 10000 http://localhost:8083/messages
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        http-kit
Server Hostname:        localhost
Server Port:            8083

Document Path:          /messages
Document Length:        112 bytes

Concurrency Level:      8
Time taken for tests:   0.690 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2470000 bytes
HTML transferred:       1120000 bytes
Requests per second:    14491.85 [#/sec] (mean)
Time per request:       0.552 [ms] (mean)
Time per request:       0.069 [ms] (mean, across all concurrent requests)
Transfer rate:          3495.59 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     0    0   0.1      0       2
Waiting:        0    0   0.1      0       2
Total:          0    1   0.1      1       2

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%      2 (longest request)

@borkdude
Copy link

@brdloush Excellent. I will release a new bb.

@brdloush
Copy link
Author

I wonder what the real issue with that older GraalVM 21.0 was and which of the fixes in 21.1 actually fixed this specific problem. Hopefully they'll not re-introduce that issue in the future :-)

@borkdude
Copy link

@brdloush I have asked about this in the graal channel, I am also interested in finding this out. To make sure to prevent a regression, I am adding a test now.

@borkdude
Copy link

@brdloush @kloimhardt Released 0.4.6: https://github.com/babashka/babashka/blob/master/CHANGELOG.md#046

@kloimhardt I think you can remove the warning in the README now and change it to a recommendation to use 0.4.6.

@borkdude
Copy link

@brdloush Btw, thanks for discovering this! I'll be looking forward to discover which commit(s) fixed this in GraalVM but it looks like searching for a needle in a haystack.

@kloimhardt
Copy link
Owner

Impressive seeing the masters at work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants