Potential memory leak in 6.2 #1865
Comments
Hello @fmoessbauer
Done. The current usage when executing the statement (according to cgroup) is nearly 2G (with a 2G limit).
@fmoessbauer would you be able to repeat the query in 30-60 minutes on the same instance, so that we can have a diff?
Right after executing the command, the DB crashed and restarted (no memory stats output). After that (running at ~250MB memory):
Thanks for all the info. Can you please include the table definition? You can copy it to the clipboard in the web console.
Could you also share logs from the time of the crash?
Table definition (tables and columns renamed due to IP):
syslog at time of OOM:
One more question: do the distinct values of the symbol column increase over time? If yes, how fast?
No, we use just 2 fixed values there (3 and 14 characters each).
Did you prepare the table definition by copying from the web console? By the way, with memory limited to 2GB you might be getting much more frequent crashes if the JVM tries to use more memory. I think the default 1G heap limit comes from ergonomics and doesn't include non-heap areas like metaspace or thread stacks. 2G doesn't leave a lot of room for mmap-ed files.
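To make the heap-vs-container arithmetic concrete, here is a minimal, hypothetical sketch (not part of this thread): it prints the JVM's effective max heap next to an assumed 2 GiB cgroup limit; whatever remains has to cover metaspace, thread stacks, direct buffers and mmap-ed files.

public class MemoryBudget {
    public static void main(String[] args) {
        // -Xmx, or the ergonomic default when no explicit heap size is given
        long maxHeap = Runtime.getRuntime().maxMemory();
        // hypothetical container limit, mirroring the 2G mem_limit discussed above
        long containerLimit = 2L * 1024 * 1024 * 1024;
        System.out.printf("max heap: %d MiB%n", maxHeap / (1024 * 1024));
        System.out.printf("left for metaspace, thread stacks and mmap-ed files: %d MiB%n",
                (containerLimit - maxHeap) / (1024 * 1024));
    }
}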
Yes, and it is actually partitioned by hour on disk (dir names similar to
Yes, that was the intention: I wanted to make the OOM appear faster. But in fact you can choose an arbitrary size here (tested with 4GB, 8GB, 16GB); only the time until OOM is longer.
Could you go to our public slack channel -> https://slack.questdb.io/? It will be faster to troubleshoot this way.
After checking smaps, OS stats & JVM heap stats, it looks like the following are the potential root causes:
@fmoessbauer I'm on the public slack channel if you'd like to continue troubleshooting.
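For readers following along, the smaps check mentioned above amounts to reading the resident-set counters of the QuestDB process. A minimal sketch, assuming a Linux kernel that exposes /proc/<pid>/smaps_rollup (the class name and argument handling are illustrative only):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class SmapsRss {
    public static void main(String[] args) throws Exception {
        // pass the QuestDB PID as the first argument, or default to the current process
        String pid = args.length > 0 ? args[0] : "self";
        try (Stream<String> lines = Files.lines(Paths.get("/proc/" + pid + "/smaps_rollup"))) {
            // the Rss line is the total resident memory across all mappings
            lines.filter(l -> l.startsWith("Rss:"))
                 .forEach(l -> System.out.println("resident set: " + l.substring(4).trim()));
        }
    }
}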
I tried to reproduce the issue by randomly sending data for 3 tables with your schema, at 14 kHz over a single write connection:

CREATE TABLE 'table_a' (a DOUBLE, b DOUBLE, c DOUBLE, d DOUBLE, e DOUBLE, f DOUBLE, timestamp TIMESTAMP) timestamp (timestamp) PARTITION BY HOUR;
CREATE TABLE 'table_b' (a SYMBOL capacity 256 CACHE, b LONG, c LONG, d LONG, timestamp TIMESTAMP) timestamp (timestamp) PARTITION BY HOUR;
CREATE TABLE 'table_c' (a DOUBLE, b DOUBLE, c DOUBLE, d DOUBLE, timestamp TIMESTAMP) timestamp (timestamp) PARTITION BY HOUR;

The sender (imports assumed from the QuestDB source tree):

import io.questdb.cutlass.line.LineTcpSender;
import io.questdb.network.Net;
import io.questdb.std.Os;
import io.questdb.std.Rnd;

public class LineTCPSenderMain {
    public static void main(String[] args) {
        long count = 0;
        String hostIPv4 = "127.0.0.1";
        int port = 9009;
        int bufferCapacity = 4 * 1024;
        final Rnd rnd = new Rnd();
        long start = System.nanoTime();
        try (LineTcpSender sender = new LineTcpSender(Net.parseIPv4(hostIPv4), port, bufferCapacity)) {
            while (true) {
                // throttle: once the average rate exceeds ~14k rows/s, flush and back off
                if (count * 1_000_000_000L / (System.nanoTime() - start) > 14_000) {
                    sender.flush();
                    Os.sleep(10);
                } else {
                    // pick one of the three tables at random
                    int metric = (int) (rnd.nextLong() % 3);
                    switch (metric) {
                        case 0:
                            sender.metric("table_a")
                                    .field("a", rnd.nextDouble())
                                    .field("b", rnd.nextDouble())
                                    .field("c", rnd.nextDouble())
                                    .field("d", rnd.nextDouble())
                                    .field("e", rnd.nextDouble())
                                    .field("f", rnd.nextDouble());
                            break;
                        case 1:
                            sender.metric("table_b")
                                    .tag("a", rnd.nextBoolean() ? "a" : "asldfkjasldalkdjf")
                                    .field("b", rnd.nextLong())
                                    .field("c", rnd.nextLong())
                                    .field("d", rnd.nextLong());
                            break;
                        case 2:
                            sender.metric("table_c")
                                    .field("a", rnd.nextDouble())
                                    .field("b", rnd.nextDouble())
                                    .field("c", rnd.nextDouble())
                                    .field("d", rnd.nextDouble());
                            break;
                        default:
                            // nextLong() % 3 can be negative, so skip anything outside 0..2
                            continue;
                    }
                    sender.$(Os.currentTimeNanos());
                    count++;
                }
            }
        }
    }
}
After 20 mins, memory was stable:
Here are a few questions to help simulate your usage:
I did another check with Felix and things seemed stable with the memory limit set to 4GB, ongoing data ingestion, and while querying big tables.
Hi, after ~5 hours of insertion, we are now at 5GB RSS.
dump_memory_usage()
QuestDB config
I just sent the smaps, GC and heap stats via slack. Regarding the questions from above:
Should be fixed in #1901.
Hi @fmoessbauer, in our latest release 6.2.1 this issue is fixed. Could you give it a try and let us know if it's working on your side? Thanks!
@pswu11 Looks like that fixed the issue. We no longer observe a monotonic growth of the RSS.
Describe the bug
We are running QuestDB 6.2 (container) and ingesting data at 14kHz (via ILP, single writer) for a long period. During this period (a couple of days), we do not run a single query.
We observed that the memory usage of QuestDB rises over time, finally allocating all available memory. When limiting the memory using cgroups (technically via the docker-compose mem_limit setting), we observe that the process is periodically OOM-killed and restarted. Further, we observed the following:
time between OOMs (restarts)
docker-compose.yml
QuestDB output after startup
To reproduce
No response
Expected Behavior
A fixed upper bound on allocated memory. At least for the containerized version, this limit should be read from the cgroup.
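As an illustration of what "read from the cgroup" could mean, here is a minimal, hypothetical sketch (not QuestDB code) that reads the container memory limit from the standard cgroup v2 and v1 kernel interfaces:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupMemoryLimit {
    public static void main(String[] args) throws Exception {
        // cgroup v2 exposes memory.max, cgroup v1 exposes memory.limit_in_bytes
        Path v2 = Paths.get("/sys/fs/cgroup/memory.max");
        Path v1 = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");
        Path p = Files.exists(v2) ? v2 : v1;
        String raw = Files.readString(p).trim();
        long limit = "max".equals(raw) ? Long.MAX_VALUE : Long.parseLong(raw);
        System.out.println("cgroup memory limit: " + limit + " bytes");
    }
}

A process could use such a value, rather than the physical RAM size, when sizing its own caches and mmap budget inside a container.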
Environment
Additional context
Without memory limitation: