ERROR: http response code 404 from gpfdist #30

Closed
skazka064 opened this issue Nov 8, 2022 · 19 comments

@skazka064

Hello,
I am getting an error while running the script.
Could you help me figure it out?
Sincerely, Sergey Berezin.

stop gpfdist on all ports
92962
killing 93075
killing 93128
93166
killing 93204
280453
killing 280505
killing 280543
killing 280581
killing 280670
killing 290935
killing 291026
killing 291069
killing 291107
killing 291145
executing on vitrina02 ./start_gpfdist.sh 4001 /data1/primary/gpseg0/pivotalguru
Started gpfdist on port 4001
executing on vitrina02 ./start_gpfdist.sh 4002 /data1/primary/gpseg1/pivotalguru
Started gpfdist on port 4002
executing on vitrina02 ./start_gpfdist.sh 4003 /data1/primary/gpseg2/pivotalguru
Started gpfdist on port 4003
executing on vitrina02 ./start_gpfdist.sh 4004 /data1/primary/gpseg3/pivotalguru
Started gpfdist on port 4004
executing on vitrina02 ./start_gpfdist.sh 4005 /data1/primary/gpseg4/pivotalguru
Started gpfdist on port 4005
executing on vitrina03 ./start_gpfdist.sh 4001 /data1/primary/gpseg5/pivotalguru
Started gpfdist on port 4001
executing on vitrina03 ./start_gpfdist.sh 4002 /data1/primary/gpseg6/pivotalguru
Started gpfdist on port 4002
executing on vitrina03 ./start_gpfdist.sh 4003 /data1/primary/gpseg7/pivotalguru
Started gpfdist on port 4003
executing on vitrina03 ./start_gpfdist.sh 4004 /data1/primary/gpseg8/pivotalguru
Started gpfdist on port 4004
executing on vitrina03 ./start_gpfdist.sh 4005 /data1/primary/gpseg9/pivotalguru
Started gpfdist on port 4005
executing on vitrina04 ./start_gpfdist.sh 4001 /data1/primary/gpseg10/pivotalguru
Started gpfdist on port 4001
executing on vitrina04 ./start_gpfdist.sh 4002 /data1/primary/gpseg11/pivotalguru
Started gpfdist on port 4002
executing on vitrina04 ./start_gpfdist.sh 4003 /data1/primary/gpseg12/pivotalguru
Started gpfdist on port 4003
executing on vitrina04 ./start_gpfdist.sh 4004 /data1/primary/gpseg13/pivotalguru
Started gpfdist on port 4004
executing on vitrina04 ./start_gpfdist.sh 4005 /data1/primary/gpseg14/pivotalguru
Started gpfdist on port 4005
psql -v ON_ERROR_STOP=1 -f /pivotalguru/TPC-DS/04_load/001.gpdb.time_dim.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/pivotalguru/TPC-DS/04_load/001.gpdb.time_dim.sql:1: ERROR: http response code 404 from gpfdist (gpfdist://vitrina03:4002/time_dim_[0-9]_[0-9].dat): HTTP/1.0 404 file not found (seg6 slice1 192.168.11.24:10001 pid=97255)

@RunningJon
Owner

ssh to vitrina03 and run ps -ef | grep gpfdist to see if it is running.
You can view the detailed logs in the home directory of gpadmin on that host too.
Refer to this:
https://github.com/RunningJon/TPC-DS/blob/master/04_load/start_gpfdist.sh#L8
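
A minimal sketch of that check, assuming `ps -ef` prints the command starting at the eighth column (the usual Linux layout). The helper name `list_gpfdist` is made up for illustration. It prints the port and directory of each running gpfdist so they can be compared with what the load step expects.

```shell
#!/bin/sh
# list_gpfdist: read `ps -ef` output on stdin and print "port directory"
# for every gpfdist process. The grep process itself is skipped because
# only the command name in column 8 is inspected.
list_gpfdist() {
    awk '{
        n = split($8, parts, "/")          # tolerate a full path to the binary
        if (parts[n] != "gpfdist") next
        port = ""; dir = ""
        for (i = 9; i < NF; i++) {
            if ($i == "-p") port = $(i + 1)
            if ($i == "-d") dir = $(i + 1)
        }
        print port, dir
    }'
}

# Usage on a segment host:
#   ps -ef | list_gpfdist
```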

@skazka064
Author

Hello.
Thanks for the answer.
Here is the output of the command, along with the program's log.

Sincerely, Sergey Berezin.

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
[root@vitrina03 ~]# ps -ef | grep gpfdist
gpadmin 81087 1 0 16:57 ? 00:00:00 gpfdist -p 4001 -d /data1/primary/gpseg5/pivotalguru
gpadmin 81125 1 0 16:57 ? 00:00:00 gpfdist -p 4002 -d /data1/primary/gpseg6/pivotalguru
gpadmin 81163 1 0 16:57 ? 00:00:00 gpfdist -p 4003 -d /data1/primary/gpseg7/pivotalguru
gpadmin 81254 1 0 16:57 ? 00:00:00 gpfdist -p 4004 -d /data1/primary/gpseg8/pivotalguru
gpadmin 81297 1 0 16:57 ? 00:00:00 gpfdist -p 4005 -d /data1/primary/gpseg9/pivotalguru
root 101134 101007 0 17:23 pts/0 00:00:00 grep --color=auto gpfdist
[root@vitrina03 ~]#

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

2022-11-08 16:57:03 81125 INFO Before opening listening sockets - following listening sockets are available:
2022-11-08 16:57:03 81125 INFO IPV6 socket: [::]:4002
2022-11-08 16:57:03 81125 INFO IPV4 socket: 0.0.0.0:4002
2022-11-08 16:57:03 81125 INFO Trying to open listening socket:
2022-11-08 16:57:03 81125 INFO IPV6 socket: [::]:4002
2022-11-08 16:57:03 81125 INFO Opening listening socket succeeded
2022-11-08 16:57:03 81125 INFO Trying to open listening socket:
2022-11-08 16:57:03 81125 INFO IPV4 socket: 0.0.0.0:4002
2022-11-08 16:57:03 81125 INFO Opening listening socket succeeded
Serving HTTP on port 4002, directory /data1/primary/gpseg6/pivotalguru
2022-11-08 16:57:13 81125 INFO [0:1:0:8] 192.168.11.24 requests /time_dim_[0-9]_[0-9].dat
2022-11-08 16:57:13 81125 INFO [0:1:0:8] got a request at port 30656:
GET /time_dim_[0-9]_[0-9].dat HTTP/1.1
2022-11-08 16:57:13 81125 INFO [0:1:0:8] request headers:
2022-11-08 16:57:13 81125 INFO [0:1:0:8] Host:192.168.11.24:4002
2022-11-08 16:57:13 81125 INFO [0:1:0:8] Accept:/
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-XID:1667912973-0000000268
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-CID:2
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-SN:0
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-SEGMENT-ID:6
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-SEGMENT-COUNT:15
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-LINE-DELIM-LENGTH:-1
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-PROTO:1
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-MASTER_HOST:192.168.11.63
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-MASTER_PORT:5432
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-CSVOPT:m0x 92q 0n0h0
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP_SEG_PG_CONF:/data1/primary/gpseg6/postgresql.conf
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP_SEG_DATADIR:/data1/primary/gpseg6
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-DATABASE:adb
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-USER:gpadmin
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-SEG-PORT:10001
2022-11-08 16:57:13 81125 INFO [0:1:0:8] X-GP-SESSION-ID:722
2022-11-08 16:57:13 81125 INFO remove sessions
2022-11-08 16:57:13 81125 INFO [0:1:6:8] r->path /data1/primary/gpseg6/pivotalguru/time_dim_[0-9]_[0-9].dat
2022-11-08 16:57:13 81125 INFO [0:1:6:8] new session trying to open the data stream
gfile stat /data1/primary/gpseg6/pivotalguru/time_dim_[0-9]_[0-9].dat failure: No such file or directory
fstream unable to open file /data1/primary/gpseg6/pivotalguru/time_dim_[0-9]_[0-9].dat
2022-11-08 16:57:13 81125 WARN [0:1:6:8] reject request from 192.168.11.24, path /data1/primary/gpseg6/pivotalguru/time_dim_[0-9]_[0-9].dat
2022-11-08 16:57:13 81125 WARN [0:1:6:8] HTTP ERROR: 192.168.11.24 - 404 file not found

2022-11-08 16:57:13 81125 INFO [0:1:6:8] request end
2022-11-08 16:57:13 81125 INFO [0:1:6:8] detach segment request from session
2022-11-08 16:57:13 81125 INFO [0:1:6:8] successfully shutdown socket
2022-11-08 16:57:13 81125 INFO [0:1:6:8] peer closed after gpfdist shutdown
2022-11-08 16:57:13 81125 INFO [0:1:6:8] unsent bytes: 0 (-1 means not supported)
2022-11-08 16:57:13 81125 INFO [0:1:6:8] successfully closed socket

@RunningJon
Owner

This means gpfdist is working properly and simply couldn't find any files.

Did the generate data step finish?

Look for the log file on that host named generate_data.x.log. Refer to this:
https://github.com/RunningJon/TPC-DS/blob/master/04_load/start_gpfdist.sh#L8
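
One way to confirm that diagnosis is to count the data files in the directory gpfdist is serving. This is only a sketch: `count_data_files` is a hypothetical helper, and the pattern should be whatever the external table actually requests (the example uses the pattern as it appears in the log above).

```shell
#!/bin/sh
# count_data_files DIR PATTERN
# Prints how many regular files directly inside DIR match the glob PATTERN.
count_data_files() (
    dir=$1
    pattern=$2
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # -name uses shell-style globbing, so bracket expressions work here.
    find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' '
)

# Usage on the segment host:
#   count_data_files /data1/primary/gpseg6/pivotalguru 'time_dim_[0-9]_[0-9].dat'
```

A result of 0 matches the 404 in the gpfdist log and points at the data generation step rather than at gpfdist.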

@skazka064
Author

Here is the content of the log.
Sincerely, Sergey Berezin.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GEN_DATA_SCALE: 10000
CHILD: 1
PARALLEL: 15
GEN_DATA_PATH: /data1/primary/gpseg5/pivotalguru
./generate_data.sh: line 32: /home/gpadmin/dsdgen: No such file or directory

@RunningJon
Owner

This means the dsdgen binary didn't get copied to the host.

https://github.com/RunningJon/TPC-DS/blob/master/00_compile_tpcds/rollout.sh#L33

Do you see this file? 00_compile_tpcds/tools/dsqgen

If so, does segment_hosts.txt have all of the correct segment hosts in it including this host?
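
A rough way to check both points at once is to loop over segment_hosts.txt and test for the binary on each host. The sketch below assumes passwordless ssh from the master as gpadmin; `check_binary_on_hosts` is a hypothetical helper, not part of the benchmark.

```shell
#!/bin/sh
# check_binary_on_hosts HOSTS_FILE REMOTE_PATH
# Prints "<host> ok" or "<host> MISSING" for each host listed in HOSTS_FILE.
# Returns non-zero if the file is missing on any host.
check_binary_on_hosts() (
    hosts_file=$1
    remote_path=$2
    status=0
    # Read hosts on fd 3 so that ssh cannot swallow the host list from stdin.
    while read -r host <&3 || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        if ssh "$host" "test -f '$remote_path'" </dev/null; then
            echo "$host ok"
        else
            echo "$host MISSING"
            status=1
        fi
    done 3<"$hosts_file"
    return $status
)

# Usage on the master, as gpadmin:
#   check_binary_on_hosts /pivotalguru/TPC-DS/segment_hosts.txt /home/gpadmin/dsdgen
```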

@skazka064
Author

Yes, the dsqgen file is in place.
The hostnames are spelled correctly.

Sincerely, Sergey Berezin.

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

-rwxrwxr-x 1 gpadmin gpadmin 455416 Oct 31 15:53 dsdgen
-rwxrwxr-x 1 gpadmin gpadmin 286416 Oct 31 15:53 dsqgen

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

[root@vitrina01 ~]# find / -name segment_hosts.txt
find: ‘/proc/255094’: No such file or directory
/pivotalguru/TPC-DS/segment_hosts.txt
[root@vitrina01 ~]# cat /pivotalguru/TPC-DS/segment_hosts.txt
vitrina03
vitrina04
vitrina02
[root@vitrina01 ~]# ping vitrina03
PING vitrina03 (192.168.11.24) 56(84) bytes of data.
64 bytes from vitrina03 (192.168.11.24): icmp_seq=1 ttl=64 time=0.198 ms
64 bytes from vitrina03 (192.168.11.24): icmp_seq=2 ttl=64 time=0.180 ms
^C
--- vitrina03 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.180/0.189/0.198/0.009 ms
[root@vitrina01 ~]# ping vitrina04
PING vitrina04 (192.168.11.25) 56(84) bytes of data.
64 bytes from vitrina04 (192.168.11.25): icmp_seq=1 ttl=64 time=0.250 ms
64 bytes from vitrina04 (192.168.11.25): icmp_seq=2 ttl=64 time=0.184 ms
^C
--- vitrina04 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.184/0.217/0.250/0.033 ms
[root@vitrina01 ~]# ping vitrina02
PING vitrina02 (192.168.11.49) 56(84) bytes of data.
64 bytes from vitrina02 (192.168.11.49): icmp_seq=1 ttl=64 time=0.268 ms
64 bytes from vitrina02 (192.168.11.49): icmp_seq=2 ttl=64 time=0.242 ms
^C
--- vitrina02 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.242/0.255/0.268/0.013 ms
[root@vitrina01 ~]#

@RunningJon
Owner

10TB with only 15 segments on 3 nodes will take a very long time to complete.

The error was on vitrina03:
./generate_data.sh: line 32: /home/gpadmin/dsdgen: No such file or directory

But you said it is in place. Correct? ls -la /home/gpadmin/dsdgen
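
For a sense of scale behind the first remark: GEN_DATA_SCALE is 10000 in the log above, which corresponds to roughly 10000 GB of raw data. A back-of-the-envelope split, using integer division and a made-up helper name, looks like this:

```shell
#!/bin/sh
# per_unit_gb TOTAL_GB UNITS
# Integer number of gigabytes each unit (segment or host) has to handle.
per_unit_gb() {
    echo $(( $1 / $2 ))
}

per_unit_gb 10000 15   # per segment: prints 666
per_unit_gb 10000 3    # per host:    prints 3333
```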

@skazka064
Author

The file is in a different location.

Sincerely, Sergey Berezin.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

[root@vitrina01 tools]# ls -la /home/gpadmin/dsdgen
ls: cannot access /home/gpadmin/dsdgen: No such file or directory
[root@vitrina01 tools]# pwd
/pivotalguru/TPC-DS/00_compile_tpcds/tools

@RunningJon
Owner

dsdgen is the program that is compiled on vitrina01 (master) in /pivotalguru/TPC-DS/00_compile_tpcds/tools and copied to every segment host (vitrina02, vitrina03, vitrina04). The error message said that the file didn't exist on vitrina03.

@skazka064
Author

Sincerely, Sergey Berezin.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
[gpadmin@vitrina03 ~]$ ls -la /home/gpadmin/dsdgen
ls: cannot access /home/gpadmin/dsdgen: No such file or directory
[gpadmin@vitrina03 ~]$ pwd
/home/gpadmin
[gpadmin@vitrina03 ~]$

@RunningJon
Owner

https://github.com/RunningJon/TPC-DS/blob/master/00_compile_tpcds/rollout.sh#L33

Are you able to ssh between the nodes as gpadmin? Can you scp files from the master node to every segment host?
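
A sketch of a non-interactive connectivity check, meant to be run as gpadmin on the master. `BatchMode=yes` makes ssh fail instead of prompting for a password, which is closer to how a script experiences the connection. The helper name `check_ssh_hosts` and the 5-second timeout are assumptions for illustration.

```shell
#!/bin/sh
# check_ssh_hosts HOSTS_FILE
# Prints "<host> reachable" or "<host> FAILED" for each host.
# Returns non-zero if any host cannot be reached without a password prompt.
check_ssh_hosts() (
    hosts_file=$1
    failed=0
    while read -r host <&3 || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "$host reachable"
        else
            echo "$host FAILED"
            failed=$((failed + 1))
        fi
    done 3<"$hosts_file"
    [ "$failed" -eq 0 ]
)

# Usage on the master, as gpadmin:
#   check_ssh_hosts /pivotalguru/TPC-DS/segment_hosts.txt
```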

@skazka064
Author

Yes, there is a connection.
Sincerely, Sergey Berezin.

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

[root@vitrina01 TPC-DS-master]# scp tpcds.log vitrina02:/home/gpadmin/
tpcds.log 100% 197KB 33.4MB/s 00:00
[root@vitrina01 TPC-DS-master]#

@RunningJon
Owner

I would set RUN_COMPILE_TPCDS to "true" and RUN_GEN_DATA to "true" in tpcds_variables.sh and then run the benchmark again. When it ran the compile the first time, it did not copy the binary to the segment hosts.
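
If you prefer to script that change, something like the following could work. It assumes tpcds_variables.sh holds one `NAME="value"` assignment per line, which may not match your copy exactly, so check the file first. `set_flag` is a hypothetical helper.

```shell
#!/bin/sh
# set_flag FILE NAME VALUE
# Rewrites the line starting with NAME= so that it reads NAME="VALUE".
# Other lines are left untouched.
set_flag() (
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    if sed "s|^${name}=.*|${name}=\"${value}\"|" "$file" >"$tmp"; then
        cat "$tmp" >"$file"    # overwrite in place, keeping file permissions
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
)

# Usage:
#   set_flag tpcds_variables.sh RUN_COMPILE_TPCDS true
#   set_flag tpcds_variables.sh RUN_GEN_DATA true
```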

@skazka064
Author

Now the log is showing:
copy tpcds binaries to vitrina03:/home/gpadmin
scp: /home/gpadmin//dsdgen: Text file busy

@RunningJon
Owner

You need to kill all of the dsdgen processes on the segment hosts and make sure the benchmark isn't running.
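
"Text file busy" appears because scp is trying to overwrite an executable that is still running. A sketch of the cleanup, to be run on each segment host, using the standard `pgrep` and `pkill` tools; the helper name `stop_processes` and the 2-second grace period are assumptions.

```shell
#!/bin/sh
# stop_processes NAME
# Sends SIGTERM to every process whose name is exactly NAME, waits briefly,
# and falls back to SIGKILL if any are still alive.
stop_processes() (
    name=$1
    if ! pgrep -x "$name" >/dev/null; then
        echo "no $name processes running"
        return 0
    fi
    pkill -x "$name"
    sleep 2
    if pgrep -x "$name" >/dev/null; then
        echo "$name still running, sending SIGKILL"
        pkill -9 -x "$name"
    else
        echo "$name stopped"
    fi
)

# Usage on each segment host:
#   stop_processes dsdgen
```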

@skazka064
Author

Hello.
Everything seems to have started.
Thank you.
Also, where is the number of threads configured?

The log shows "Starting analyze with 5 workers...". Is there a variable I can change for this, if that is possible?

Sincerely, Sergey Berezin.

@RunningJon
Owner

It is running analyzedb, which started the 5 workers. The command does allow you to run more workers, but that isn't exposed in this benchmark utility.

@skazka064
Author

Could you tell me how to run more or fewer than five worker processes?

Sincerely, Sergey Berezin.

@RunningJon
Owner

For analyze? You could hard-code it in the script, but it really won't help much. That isn't what is taking a long time.
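
For reference, analyzedb takes the worker count through its `-p` option (an integer from 1 to 10, default 5). The sketch below only builds the command line. The schema name `tpcds` and the other flags are assumptions about how the benchmark calls analyzedb, so compare them with the script before editing it. `analyzedb_cmd` is a hypothetical helper.

```shell
#!/bin/sh
# analyzedb_cmd DBNAME SCHEMA WORKERS
# Prints an analyzedb command line with the requested parallel level.
# Rejects values outside the 1..10 range that analyzedb accepts.
analyzedb_cmd() (
    db=$1
    schema=$2
    workers=$3
    case $workers in
        ''|*[!0-9]*) echo "workers must be an integer" >&2; return 1 ;;
    esac
    if [ "$workers" -lt 1 ] || [ "$workers" -gt 10 ]; then
        echo "workers must be between 1 and 10" >&2
        return 1
    fi
    echo "analyzedb -d $db -s $schema --full -a -p $workers"
)

# Usage:
#   analyzedb_cmd adb tpcds 8
```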
