Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

questdb.jar is referenced in questdb-6.4.3-rt-linux-amd64 questdb.sh script - questdb.sh script seemingly not working on RHEL 8.4 #2348

Closed
CJCShadowsan opened this issue Jul 25, 2022 · 23 comments · Fixed by #2352
Labels
Bug Incorrect or unexpected behavior

Comments

@CJCShadowsan
Copy link

CJCShadowsan commented Jul 25, 2022

Describe the bug

Downloaded latest release to mistakenly think questdb.jar file is missing:

/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64
#> find . -name questdb.jar

This is because questdb.sh references questdb.jar, and the first step in looking when the script failed to run led me to that point.

To reproduce

  1. Download tar file
  2. Extract tar file
  3. Try run ./questdb.sh
  4. Check open ports - Nothing relevant to questdb
  5. Looked in questdb.sh that references questdb.jar
  6. Looked in bin directory of extracted tar

Expected Behavior

questdb.jar should... Work out of the box for RHEL 8.4.

Environment

- **QuestDB version**:6.4.3
- **OS**:RHEL8.4
- **Browser**:

Additional context

No response

@CJCShadowsan
Copy link
Author

And even if it's meant to be uncompiled in the libs dir... It doesn't actually do anything. It logs:

cat /cache/questdb/log/stdout-2022-07-25T23-24-56.txt
2022-07-25T23:24:56.734678Z A server-main QuestDB server 6.4.3. Copyright (C) 2014-2022, all rights reserved.
2022-07-25T23:24:56.908019Z I server-main web console is up to date
2022-07-25T23:24:56.908628Z A server-main Server config: /cache/questdb/conf/server.conf
2022-07-25T23:24:56.962257Z I i.q.c.t.t.InputFormatConfiguration loading input format config [resource=/text_loader.json]
2022-07-25T23:24:56.980850Z A server-main Config changes applied:
2022-07-25T23:24:56.980887Z A server-main   http.enabled : true
2022-07-25T23:24:56.980916Z A server-main   tcp.enabled  : true
2022-07-25T23:24:56.980940Z A server-main   pg.enabled   : true
2022-07-25T23:24:56.980960Z A server-main open database [id=6073491747499511394.-4686445700594837497]
2022-07-25T23:24:56.981005Z A server-main platform [bit=64]
2022-07-25T23:24:56.981030Z A server-main OS/Arch: linux/amd64 [AVX512,10]
2022-07-25T23:24:56.983734Z A server-main available CPUs: 48
2022-07-25T23:24:56.983839Z A server-main db root: /cache/questdb/db
2022-07-25T23:24:56.983865Z A server-main backup root: null
2022-07-25T23:24:56.983958Z A server-main db file system magic: 0xef53 [ext4] SUPPORTED
2022-07-25T23:24:56.984699Z A server-main SQL JIT compiler mode: on
2022-07-25T23:24:56.986168Z I i.q.g.FunctionFactoryCache loading functions [test=false]
2022-07-25T23:24:57.120244Z I i.q.c.m.EngineMigration table structures are up to date
2022-07-25T23:24:57.225889Z I i.q.c.p.ReaderPool open 'telemetry_config' [at=0:0]
2022-07-25T23:24:57.235471Z I i.q.TelemetryJob Failed to alter telemetry table [table=telemetry_config,error=column 'version' already exists]
2022-07-25T23:24:57.235701Z I i.q.TelemetryJob Failed to alter telemetry table [table=telemetry_config,error=column 'os' already exists]
2022-07-25T23:24:57.235844Z I i.q.TelemetryJob Failed to alter telemetry table [table=telemetry_config,error=column 'package' already exists]
2022-07-25T23:24:57.235949Z I i.q.c.p.WriterPool open [table=`telemetry`, thread=1]
2022-07-25T23:24:57.245166Z I i.q.c.TableWriter open 'telemetry'
2022-07-25T23:24:57.248333Z I i.q.c.TableWriter switched partition [path='/cache/questdb/db/telemetry/default']
2022-07-25T23:24:57.248701Z I i.q.c.p.WriterPool >> [table=`telemetry`, thread=1]
2022-07-25T23:24:57.248760Z I i.q.c.p.WriterPool open [table=`telemetry_config`, thread=1]
2022-07-25T23:24:57.248854Z I i.q.c.TableWriter open 'telemetry_config'
2022-07-25T23:24:57.250710Z I i.q.c.TableWriter switched partition [path='/cache/questdb/db/telemetry_config/default']
2022-07-25T23:24:57.251097Z I i.q.c.p.WriterPool >> [table=`telemetry_config`, thread=1]
2022-07-25T23:24:57.252488Z I i.q.g.SqlCompiler plan [q=`select-choose id, enabled, version, os, package from (select [id, enabled, version, os, package] from telemetry_config) limit -(1)`, fd=-1]
2022-07-25T23:24:57.256685Z I i.q.c.TableReader open partition /cache/questdb/db/telemetry_config/default [rowCount=1, partitionNameTxn=-1, transientRowCount=1, partitionIndex=0, partitionCount=1]
2022-07-25T23:24:57.257004Z A i.q.TelemetryJob instance [id=0x05e4a89928e5820025e01c1a5fc9fa, enabled=true]
2022-07-25T23:24:57.267918Z I i.q.c.p.WriterPool open [table=`sys.column_versions_purge_log`, thread=1]
2022-07-25T23:24:57.268068Z I i.q.c.TableWriter open 'sys.column_versions_purge_log'
2022-07-25T23:24:57.269275Z I i.q.c.TableWriter purging non attached partitions [path=/cache/questdb/db/sys.column_versions_purge_log]
2022-07-25T23:24:57.269548Z I i.q.c.p.WriterPool >> [table=`sys.column_versions_purge_log`, thread=1]
2022-07-25T23:24:57.270428Z I i.q.c.p.ReaderPool open 'sys.column_versions_purge_log' [at=0:0]
2022-07-25T23:24:57.271824Z I i.q.g.SqlCompiler plan [q=`select-choose ts, table_name, column_name, table_id, truncate_version, columnType, table_partition_by, updated_txn, column_version, partition_timestamp, partition_name_txn, completed from (select [ts, table_name, column_name, table_id, truncate_version, columnType, table_partition_by, updated_txn, column_version, partition_timestamp, partition_name_txn, completed] from sys.column_versions_purge_log timestamp (ts) where ts > dateadd('d',-(31),now()) and completed = null)`, fd=-1]
2022-07-25T23:24:57.278019Z I i.q.g.SqlCodeGenerator JIT enabled for (sub)query [tableName=sys.column_versions_purge_log, fd=-1]
2022-07-25T23:24:57.293596Z A http-server listening on 0.0.0.0:9000 [fd=89 backlog=256]
2022-07-25T23:24:57.380040Z A server-main Min health server is starting. Health check endpoint will not consider unhandled errors when metrics are disabled.
2022-07-25T23:24:57.381485Z I server-main started
2022-07-25T23:24:57.381568Z I server-main os scheduled [name=questdb-worker-2]
2022-07-25T23:24:57.383595Z A pg-server listening on 0.0.0.0:8812 [fd=93 backlog=10]
2022-07-25T23:24:57.380360Z A http-min-server listening on 0.0.0.0:9003 [fd=91 backlog=4]

And then nothing happens. No process running, nothing.

@CJCShadowsan
Copy link
Author

So either this is extremely poorly documented, or the binary release flat-out doesn't work.

@CJCShadowsan
Copy link
Author

Before anyone says "Why can't you just pull the docker container?"

The node is airgapped from the internet, and if you provide a binary? It should work!

@jerrinot
Copy link
Contributor

jerrinot commented Jul 26, 2022

hi @CJCShadowsan,

the RT version of QuestDB distributions does not contain questdb.jar. That's by design. If you inspect the archive, you will see a file ./lib/moduleswhich has approx. 20MB. This is a Java module image which contains QuestDB classes.

The logs you posted indicate QuestDB was started. QuestDB itself writes the logs.
Can you please clarify this step in your reproducer?

Try run ./questdb.sh - Wonder why nothing works

How did you establish that nothing works? I just tried it on Ubuntu@x64 and it works just fine for me. Have you tried to connect to localhost:9000? There should be a web server listening on that port.

@bluestreak01
Copy link
Member

RT is assembled by jlink documentation.

if you need Jar, download no-jre version

@CJCShadowsan
Copy link
Author

hi @CJCShadowsan,

the RT version of QuestDB distributions does not contain questdb.jar. That's by design. If you inspect the archive, you will see a file ./lib/moduleswhich has approx. 20MB. This is a Java module image which contains QuestDB classes.

The logs you posted indicate QuestDB was started. QuestDB itself writes the logs. Can you please clarify this step in your reproducer?

Try run ./questdb.sh - Wonder why nothing works

How did you establish that nothing works? I just tried it on Ubuntu@x64 and it works just fine for me. Have you tried to connect to localhost:9000? There should be a web server listening on that port.

Because literally... Nothing works:

#> [as1][ 2022-07-26 11:24:18 ] root@test:/root/questdb-6.4.3
#> md5sum questdb-6.4.3-rt-linux-amd64.tar.gz
1bf6b29da74bbd6e93fbec1132bd4220  questdb-6.4.3-rt-linux-amd64.tar.gz

#> [as1][ 2022-07-26 11:24:28 ] root@test:/root/questdb-6.4.3
#> tar -xf questdb-6.4.3-rt-linux-amd64.tar.gz

#> [as1][ 2022-07-26 11:24:35 ] root@test:/root/questdb-6.4.3
#> cd questdb-6.4.3-rt-linux-amd64

#> [as1][ 2022-07-26 11:24:45 ] root@test:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64
#> ls
bin  conf  legal  lib  release

#> [as1][ 2022-07-26 11:24:48 ] root@test:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64
#> cd bin

#> [as1][ 2022-07-26 11:24:51 ] root@as1-insight1:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64/bin
#> ls
java  keytool  questdb.sh

#> [as1][ 2022-07-26 11:24:52 ] root@test:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64/bin
#> ./questdb.sh start -d /cache/questdb -t testing

  ___                  _   ____  ____
 / _ \ _   _  ___  ___| |_|  _ \| __ )
| | | | | | |/ _ \/ __| __| | | |  _ \
| |_| | |_| |  __/\__ \ |_| |_| | |_) |
 \__\_\\__,_|\___||___/\__|____/|____/
                        www.questdb.io

JAVA: ./java

#> [as1][ 2022-07-26 11:25:15 ] root@test:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64/bin
#> QuestDB server 6.4.3
Copyright (C) 2014-2022, all rights reserved.

Reading log configuration from /cache/questdb/conf/log.conf
Log configuration loaded from: /cache/questdb/conf/log.conf
Created log directory: log
Log configuration loaded from default internal file.


#> [as1][ 2022-07-26 11:25:23 ] root@test:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64/bin
#> netstat -anp | grep 9000

#> [as1][ 2022-07-26 11:25:34 ] root@test:/root/questdb-6.4.3/questdb-6.4.3-rt-linux-amd64/bin
#>

@CJCShadowsan
Copy link
Author

hi @CJCShadowsan,

the RT version of QuestDB distributions does not contain questdb.jar. That's by design. If you inspect the archive, you will see a file ./lib/moduleswhich has approx. 20MB. This is a Java module image which contains QuestDB classes.

This is a fair enough point - I wouldn't have gone looking for the words questdb.jar in the questdb.sh script if it had started and ran - red-herring perhaps and a misleading title. I'll update.

@CJCShadowsan CJCShadowsan changed the title questdb.jar is missing in questdb-6.4.3-rt-linux-amd64 questdb.jar is referenced in questdb-6.4.3-rt-linux-amd64 questdb.sh script - questdb.sh script seemingly not working on RHEL 8.4 Jul 26, 2022
@CJCShadowsan
Copy link
Author

Unfortunately in those logs is as far as it goes... It stops doing anything at that point. No process running, no ports listening, the status option says not running.

No further logs to indicate what's happened either.

@jerrinot
Copy link
Contributor

jerrinot commented Jul 26, 2022

hi @CJCShadowsan, that's weird. I just tried it on

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)

and it's starting fine:

$ curl "http://localhost:9000/exec?query=select%20now();"
{"query":"select now();","columns":[{"name":"now","type":"TIMESTAMP"}],"dataset":[["2022-07-26T12:11:05.601440Z"]],"count":1}

Is there anything in a system log? Perhaps kernel OOM killer kicking in? What are the specs of the machine you are using? Is there any other server running on the same box? Are you using the default configuration?

@CJCShadowsan
Copy link
Author

hi @CJCShadowsan, that's weird. I just tried it on

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)

and it's starting fine:

$ curl "http://localhost:9000/exec?query=select%20now();"
{"query":"select now();","columns":[{"name":"now","type":"TIMESTAMP"}],"dataset":[["2022-07-26T12:11:05.601440Z"]],"count":1}

Is there anything in a system log? Perhaps kernel OOM killer kicking in? What are the specs of the machine you are using? Is there any other server running on the same box? Are you using the default configuration?

No OOM, There are other servers running on the same box (Influx, KairosDB, Cassandra, Grafana, Prometheus) but none using the ports requested.

System is a 48-core system with 256GB Ram.

Literally using the default configuration - nothing changed, just attempting to get it to start up with the exact commands listed.

@bluestreak01
Copy link
Member

bluestreak01 commented Jul 26, 2022

This is a beefy box. Problem is likely to be very simple.

There should be file /cache/questdb/log/stdout-*. Could you cat it please. Sometime there are port conflict and instance does not start because that. All errors can be found in that log file (latest file if there multiple).

Also, any error QuestDB gets from the OS is logger with errno, sometimes in [], such as [5] (this would be permission denied). This is just an example.

Right, just reading this thread from top - there is log, but no errors and process exited ?

@CJCShadowsan
Copy link
Author

Exactly. Nothing in the log other than the output I posted... It just exits seemingly.

@CJCShadowsan
Copy link
Author

That last log entry:

2022-07-25T23:24:57.380360Z A http-min-server listening on 0.0.0.0:9003 [fd=91 backlog=4]

That's the last it says.

@bluestreak01
Copy link
Member

what is the exact CPU model of this system and the type of the file system ?

@bluestreak01
Copy link
Member

ah, ok, i think know what this is

@bluestreak01
Copy link
Member

edit the conf/server.conf file to add this line:

line.udp.enabled=false

@bluestreak01
Copy link
Member

it looks like you don't have network interfaces with multicast enabled and UDP receiver tumbles over. Let us know if disabling it helps.

@CJCShadowsan
Copy link
Author

it looks like you don't have network interfaces with multicast enabled and UDP receiver tumbles over. Let us know if disabling it helps.

Will do

@bluestreak01
Copy link
Member

I could reproduce the same symptoms after shutting down network interfaces with BROADCAST flags. We'll fix this.

@pswu11 pswu11 added the Bug Incorrect or unexpected behavior label Jul 26, 2022
@bluestreak01
Copy link
Member

When server cannot join multicast group, it should log something like:

2022-07-26T15:32:08.865316Z E server-main [errno=19] cannot join group [fd=92, bind=0.0.0.0, group=232.1.2.3]

Issue is that logger is async and on fast system process exits before logger can flush the log line. The problem is fixed in this PR: #2345

@CJCShadowsan
Copy link
Author

I could reproduce the same symptoms after shutting down network interfaces with BROADCAST flags. We'll fix this.

Funnily enough, broadcast is enabled, but the service sits behind a keepalived instance, which listens on a floating IP (high-availability was setup for InfluxDB this way, with cross-replication between both instances)

The moment I set:

line.udp.unicast=true

It began to work:

2022-07-26T15:46:46.563476Z I i.q.c.l.u.AbstractLineProtoUdpReceiver receiving unicast on 0.0.0.0:9009 [fd=94, commitRate=1000000]
2022-07-26T15:46:46.565196Z I i.q.c.l.t.LineTcpReceiver using default context
2022-07-26T15:46:46.565775Z A tcp-line-server listening on 0.0.0.0:9009 [fd=95 backlog=10]
2022-07-26T15:46:46.581735Z I server-main os scheduled [name=questdb-ilpio-3]
2022-07-26T15:46:46.581873Z I server-main os scheduled [name=questdb-ilpio-4]
2022-07-26T15:46:46.581989Z I server-main os scheduled [name=questdb-ilpio-5]
2022-07-26T15:46:46.582338Z I server-main os scheduled [name=questdb-ilpio-6]
2022-07-26T15:46:46.582390Z I server-main os scheduled [name=questdb-ilpio-7]
2022-07-26T15:46:46.582436Z I server-main started
2022-07-26T15:46:46.582773Z I server-main os scheduled [name=questdb-ilpio-8]
2022-07-26T15:46:46.582813Z I server-main started
2022-07-26T15:46:46.582888Z I server-main os scheduled [name=questdb-ilpwriter-9]
2022-07-26T15:46:46.582997Z I server-main os scheduled [name=questdb-worker-10]
2022-07-26T15:46:46.583072Z I server-main os scheduled [name=questdb-worker-11]
2022-07-26T15:46:46.583424Z I server-main os scheduled [name=questdb-worker-12]
2022-07-26T15:46:46.583568Z I server-main os scheduled [name=questdb-worker-13]
2022-07-26T15:46:46.584577Z I server-main os scheduled [name=questdb-worker-14]
2022-07-26T15:46:46.584611Z I server-main os scheduled [name=questdb-worker-15]
2022-07-26T15:46:46.585518Z I server-main os scheduled [name=questdb-worker-17]
2022-07-26T15:46:46.585674Z I server-main os scheduled [name=questdb-worker-18]
2022-07-26T15:46:46.586294Z I server-main os scheduled [name=questdb-worker-16]
2022-07-26T15:46:46.586500Z I server-main os scheduled [name=questdb-worker-21]
2022-07-26T15:46:46.586701Z I server-main os scheduled [name=questdb-worker-19]
2022-07-26T15:46:46.586712Z I server-main os scheduled [name=questdb-worker-20]
2022-07-26T15:46:46.586715Z I server-main os scheduled [name=questdb-worker-22]
2022-07-26T15:46:46.587160Z I server-main started
2022-07-26T15:46:46.587190Z I server-main web console URL(s):

        http://10.96.200.100:9000
        http://10.96.0.3:9000
        http://127.0.0.1:9000


2022-07-26T15:46:46.587662Z I server-main os scheduled [name=questdb-worker-23]
2022-07-26T15:46:47.151098Z A server-main enjoy

@CJCShadowsan
Copy link
Author

For reference:

2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ac:1f:6b:0c:87:54 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.3/16 brd 10.96.255.255 scope global noprefixroute eno1
       valid_lft forever preferred_lft forever
    inet 10.96.200.100/16 scope global secondary eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fe0c:8754/64 scope link
       valid_lft forever preferred_lft forever

And keepalived:

#> cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

vrrp_script chk_nginx {
    script "pidof nginx"
    interval 2
}

vrrp_script chk_snmpreceiver {
    script "pidof snmpreceiver"
    interval 2
}

vrrp_script chk_di-rest {
    script "pidof direst"
    interval 2
}
vrrp_instance VI_1 {
    priority 105
    interface eno1
    virtual_router_id 33
    advert_int 1
    virtual_ipaddress {
        10.96.200.100/16
    }
    track_script {
        chk_snmpreceiver
        chk_nginx
        chk_di-rest
    }
    authentication {
        auth_type PASS
        auth_pass ********
    }
}

@bluestreak01
Copy link
Member

We make network call and it is refused for whatever reason. Could be Linux core or an intermediary. Bummer is that we didn't report the error. Correctly logger error brings error codes that let diagnosing this issue further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Incorrect or unexpected behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants