Oathkeeper (v0.14.2_oryOS.10) returning empty reply on slow/long distance database calls #178
Describe the bug
When running the following curl in the same k8s cluster as oathkeeper, I’m seeing the following
[oathkeeper-api-7f4cc5cb9f-dbmr6] time="2019-04-24T18:50:58Z" level=info msg="started handling request" method=GET remote="10.39.0.47:60820" request="/rules?limit=50000&offset=0"
Running what I believe to be the same query from the postgres logs and the code (https://github.com/ory/oathkeeper/blob/master/rule/manager_sql.go#L79) from the same pod issuing the curl requests returns within a few ms.
There are currently around 50 roles configured. When I poll with numbers less than 20 (specifically, 19), it works. Once I try 20, it times out after just over 10 seconds.
I tried to up the max_conns but didn't see any positive results. I dug through the code and it doesn't look like that'd help a ton anyway.
Once work starts on the issue referenced, I may be able to dedicate some time to helping out with it.
I can keep poking around and add some logging when I get a chance. I assume that part of it is that it's making multiple queries, and 70ms × N adds up. Rewriting those as one nested query would probably be faster overall, but...
#177 is definitely a big step in the right direction. Localized reads stored in memory will be a lot faster than constant db reads. Things would get a little trickier with hot reloads because you'd have to read repeatedly, but the odds of a KV store being faster than DB lookups are pretty strong.
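For illustration, the "localized reads stored in memory" idea could look roughly like the Go sketch below. This is not the design from #177; `Rule`, `RuleCache`, and the `load` callback are hypothetical names, and the sketch assumes the whole rule set fits in memory and is reloaded by calling `Refresh` (for example from a timer).

```go
package main

import "sync"

// Rule is a hypothetical, simplified stand-in for an access rule.
type Rule struct {
	ID  string
	URL string
}

// RuleCache keeps a full snapshot of the rules in memory so that list
// requests never touch the database.
type RuleCache struct {
	mu    sync.RWMutex
	rules []Rule
	load  func() ([]Rule, error) // assumed loader, e.g. one database read
}

// NewRuleCache returns an empty cache that uses load to fetch rules.
func NewRuleCache(load func() ([]Rule, error)) *RuleCache {
	return &RuleCache{load: load}
}

// Refresh reloads all rules. If loading fails, the previous snapshot is
// kept so readers continue to see the last known rule set.
func (c *RuleCache) Refresh() error {
	rules, err := c.load()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.rules = rules
	c.mu.Unlock()
	return nil
}

// List returns a copy of one page of the cached rules.
func (c *RuleCache) List(limit, offset int) []Rule {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if offset < 0 {
		offset = 0
	}
	if limit <= 0 || offset >= len(c.rules) {
		return nil
	}
	if limit > len(c.rules)-offset {
		limit = len(c.rules) - offset
	}
	out := make([]Rule, limit)
	copy(out, c.rules[offset:offset+limit])
	return out
}
```

The hot-reload concern shows up here as the question of how often to call `Refresh`: every call still pays the slow database read, but requests served by `List` in between do not.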
I have some other thoughts about #177 that probably aren't appropriate for here. If I get time, I'll hop on discord to rant.
I was having the same problem with 15 rules, caused by the same cross-region issue (europe-east and asia-south), and traced the cause. In my case, it was happening because of the sequential queries in the ListRules logic.
So I changed the query to use SQL joins, which worked for me. We needed the fix because we were running Oathkeeper in production. It may work as a workaround for you, since this should no longer be an issue after #177.
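A rough sketch of what the join approach can look like in Go is below. It is not the actual patch: the table and column names in `joinedQuery` are made up, and `JoinedRow`, `RuleWithMethods`, and `groupRows` are hypothetical. The idea is to fetch rules and their related rows in one round trip and regroup the flattened result in memory.

```go
package main

// joinedQuery is a hypothetical single query over made-up tables. It
// replaces one query per rule with one round trip for the whole list.
const joinedQuery = `
SELECT r.id, r.upstream_url, m.method
FROM rules r
LEFT JOIN rule_methods m ON m.rule_id = r.id
ORDER BY r.id`

// JoinedRow is one flattened row of the joined result. Method is empty
// when the LEFT JOIN found no matching row for the rule.
type JoinedRow struct {
	RuleID      string
	UpstreamURL string
	Method      string
}

// RuleWithMethods is a rule together with its related rows.
type RuleWithMethods struct {
	ID          string
	UpstreamURL string
	Methods     []string
}

// groupRows folds the flattened join rows back into one entry per rule,
// keeping rules in the order in which they first appear.
func groupRows(rows []JoinedRow) []RuleWithMethods {
	var out []RuleWithMethods
	index := make(map[string]int) // rule ID -> position in out
	for _, row := range rows {
		i, seen := index[row.RuleID]
		if !seen {
			i = len(out)
			index[row.RuleID] = i
			out = append(out, RuleWithMethods{ID: row.RuleID, UpstreamURL: row.UpstreamURL})
		}
		if row.Method != "" {
			out[i].Methods = append(out[i].Methods, row.Method)
		}
	}
	return out
}
```

In a real setup the rows would come from scanning the result of `joinedQuery` with `database/sql`; here they are passed in as a slice so the grouping step can be read on its own.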