This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

PoET not accepting any requests after getting too many requests error 429 #2384

Open
wejdeneHaouari opened this issue Jul 12, 2021 · 10 comments

Comments

@wejdeneHaouari

1. Issue
I am running a heavy workload on a Sawtooth network for testing purposes. When I run the network with PBFT or Raft consensus, I get "too many requests" errors but the network continues to accept requests. With PoET, however, the network stops accepting any requests after the first 429 error.

2. System information

This is the validator configuration for PoET

validator-0:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-0
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-0 || true && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-0/ && \
        while [ ! -f /poet-shared/poet-enclave-measurement ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet-enclave-basename ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet.batch ]; do sleep 1; done && \
        cp /poet-shared/poet.batch / && \
        sawset genesis \
          -k /etc/sawtooth/keys/validator.priv \
          -o config-genesis.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
          sawtooth.consensus.algorithm.name=PoET \
          sawtooth.consensus.algorithm.version=0.1 \
          sawtooth.poet.report_public_key_pem=\
          \\\"$$(cat /poet-shared/simulator_rk_pub.pem)\\\" \
          sawtooth.poet.valid_enclave_measurements=$$(cat /poet-shared/poet-enclave-measurement) \
          sawtooth.poet.valid_enclave_basenames=$$(cat /poet-shared/poet-enclave-basename) \
          -o config.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
             sawtooth.poet.target_wait_time=5 \
             sawtooth.poet.initial_wait_time=25 \
             sawtooth.publisher.max_batches_per_block=100 \
          -o poet-settings.batch && \
        sawadm genesis \
          config-genesis.batch config.batch poet.batch poet-settings.batch && \
        sawtooth-validator -v \
          --bind network:tcp://eth0:8800 \
          --bind component:tcp://eth0:4004 \
          --bind consensus:tcp://eth0:5050 \
          --peering static \
          --endpoint tcp://validator-0:8800 \
          --scheduler parallel \
          --maximum-peer-connectivity 10000
    \""
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

I am running the Sawtooth network on an AWS VM with 8 GB of RAM and 2 CPUs, running Ubuntu 18.04.

3. Question
Any idea how to solve this issue?
Is there a way to disable the back pressure mechanism?
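[Editor's note] There is no documented switch to turn back pressure off; a client-side mitigation is to treat 429 as a retry signal rather than a failure. A minimal sketch, assuming the default REST API `/batches` endpoint and an already-serialized batch (the URL and retry numbers are illustrative, not from this setup):

```python
import itertools
import random
import time
import urllib.error
import urllib.request

def backoff_delays(base=0.5, cap=30.0, factor=2.0):
    """Yield capped exponential backoff delays: 0.5, 1, 2, ... up to `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

def submit_with_backoff(batch_bytes, url="http://rest-api-0:8008/batches",
                        max_retries=8):
    """POST a serialized BatchList to the REST API, retrying on HTTP 429
    (back pressure) instead of giving up on the first rejection."""
    for delay in itertools.islice(backoff_delays(), max_retries):
        request = urllib.request.Request(
            url, data=batch_bytes,
            headers={"Content-Type": "application/octet-stream"})
        try:
            with urllib.request.urlopen(request) as response:
                return response.read()  # accepted: body holds the status link
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise                   # some other failure: surface it
            time.sleep(delay + random.uniform(0, delay))  # jittered wait
    raise RuntimeError("validator still applying back pressure after retries")
```

This only helps while the validator eventually drains its queue; it does not address the case where the queue never drains.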

@rowaisi

rowaisi commented Jul 18, 2021

Hi,

There is a problem with PoET: after we reach 10 concurrent users it stops working with a "too many requests" error, usually within about 5 minutes of operation. This might be due to the back pressure module in Sawtooth, but the same back pressure applies to both Raft and PBFT, and we only see the issue with PoET consensus. As it stands, PoET is unusable for us: we keep getting "too many requests" errors and the system stalls, while both PBFT and Raft keep working.

@rowaisi

rowaisi commented Jul 18, 2021

Hi @agunde406 @vaporos and @rbuysse,

I need your help, as I am going to use Hyperledger Sawtooth for a large project at our company. We see this issue with PoET but not with PBFT or Raft. After 5 to 10 minutes at most, once users start sending more requests and transactions, the PoET network fails with 429 errors and the whole system stops. With PBFT and Raft we also see the error under load, but it goes away once the load drops; with PoET the network never recovers. Could you please look into this bug in PoET? Otherwise we will have to avoid PoET and stick with Raft or PBFT, but we would prefer PoET, as this is a large-scale project in Canada.

I appreciate your kind consideration and look forward to hearing from you.

@peterschwarz
Contributor

peterschwarz commented Jul 19, 2021

What is the transaction rate that you are submitting against your validators? Are you spreading them across the network or firing them at a single node? How many blocks deep are you when this occurs?
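[Editor's note] Block depth can be checked against the REST API's `/blocks` endpoint, which returns blocks newest-first. A small sketch (response shape abbreviated to the fields used; `block_num` may arrive as a string, which `int()` handles either way):

```python
import json
import urllib.request

def block_num_from_response(payload):
    """Extract the newest block number from a /blocks?limit=1 response body."""
    return int(payload["data"][0]["header"]["block_num"])

def chain_head_num(rest_api="http://rest-api-0:8008"):
    """Fetch the current chain head height from one REST API instance."""
    with urllib.request.urlopen(f"{rest_api}/blocks?limit=1") as response:
        return block_num_from_response(json.load(response))
```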

@wejdeneHaouari
Author

wejdeneHaouari commented Jul 19, 2021

  1. The transaction rate reaches a maximum of 100 transactions per 10 seconds. The graph below shows the transaction rate and the error rate; after 450 seconds the network stops working.
    [graph: transaction rate vs. error rate]

  2. We are using the default PoET network with 5 validators and spreading the transactions randomly among them. Please find the complete docker-compose file below.

  3. Only 33 blocks are created before this error occurs.

Thank you in advance @peterschwarz @agunde406 @vaporos and @rbuysse


version: "2.1"

volumes:
  poet-shared:

services:
  shell:
    image: hyperledger/sawtooth-shell:chime
    container_name: sawtooth-shell-default
    entrypoint: "bash -c \"\
        sawtooth keygen && \
        tail -f /dev/null \
        \""

  validator-0:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-0
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-0 || true && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-0/ && \
        while [ ! -f /poet-shared/poet-enclave-measurement ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet-enclave-basename ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet.batch ]; do sleep 1; done && \
        cp /poet-shared/poet.batch / && \
        sawset genesis \
          -k /etc/sawtooth/keys/validator.priv \
          -o config-genesis.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
          sawtooth.consensus.algorithm.name=PoET \
          sawtooth.consensus.algorithm.version=0.1 \
          sawtooth.poet.report_public_key_pem=\
          \\\"$$(cat /poet-shared/simulator_rk_pub.pem)\\\" \
          sawtooth.poet.valid_enclave_measurements=$$(cat /poet-shared/poet-enclave-measurement) \
          sawtooth.poet.valid_enclave_basenames=$$(cat /poet-shared/poet-enclave-basename) \
          -o config.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
             sawtooth.poet.target_wait_time=5 \
             sawtooth.poet.initial_wait_time=25 \
             sawtooth.publisher.max_batches_per_block=100 \
          -o poet-settings.batch && \
        sawadm genesis \
          config-genesis.batch config.batch poet.batch poet-settings.batch && \
        sawtooth-validator -v \
          --bind network:tcp://eth0:8800 \
          --bind component:tcp://eth0:4004 \
          --bind consensus:tcp://eth0:5050 \
          --peering static \
          --endpoint tcp://validator-0:8800 \
          --scheduler parallel \
          --network-auth trust
    \""
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-1:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-1
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-1 || true && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-1/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-1:8800 \
            --peers tcp://validator-0:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-2:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-2
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-2 && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-2/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-2:8800 \
            --peers tcp://validator-0:8800,tcp://validator-1:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-3:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-3
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-3 && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-3/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-3:8800 \
            --peers tcp://validator-0:8800,tcp://validator-1:8800,tcp://validator-2:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-4:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-4
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-4 && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-4/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-4:8800 \
            --peers tcp://validator-0:8800,tcp://validator-1:8800,tcp://validator-2:8800,tcp://validator-3:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  rest-api-0:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-0
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-0:4004 \
          --bind rest-api-0:8008
      "
    stop_signal: SIGKILL

  rest-api-1:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-1
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-1:4004 \
          --bind rest-api-1:8008
      "
    stop_signal: SIGKILL

  rest-api-2:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-2
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-2:4004 \
          --bind rest-api-2:8008
      "
    stop_signal: SIGKILL

  rest-api-3:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-3
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-3:4004 \
          --bind rest-api-3:8008
      "
    stop_signal: SIGKILL

  rest-api-4:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-4
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-4:4004 \
          --bind rest-api-4:8008
      "
    stop_signal: SIGKILL



  kvstore-processor-0:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-0
    container_name: kvstore-processor-0
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
        chmod +x /project/sawtooth-kvstore/bin/build_kvstore && \
        /project/sawtooth-kvstore/bin/build_kvstore && \
        chmod +x /project/sawtooth-kvstore/bin/kvstore && \
        /project/sawtooth-kvstore/bin/kvstore tcp://validator-0:4004
      "

  kvstore-processor-1:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-1
    container_name: kvstore-processor-1
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
        chmod +x /project/sawtooth-kvstore/bin/build_kvstore && \
        /project/sawtooth-kvstore/bin/build_kvstore && \
        chmod +x /project/sawtooth-kvstore/bin/kvstore && \
        /project/sawtooth-kvstore/bin/kvstore tcp://validator-1:4004
      "

  kvstore-processor-2:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-2
    container_name: kvstore-processor-2
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
        chmod +x /project/sawtooth-kvstore/bin/build_kvstore && \
        /project/sawtooth-kvstore/bin/build_kvstore && \
        chmod +x /project/sawtooth-kvstore/bin/kvstore && \
        /project/sawtooth-kvstore/bin/kvstore tcp://validator-2:4004
      "

  kvstore-processor-3:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-3
    container_name: kvstore-processor-3
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
        chmod +x /project/sawtooth-kvstore/bin/build_kvstore && \
        /project/sawtooth-kvstore/bin/build_kvstore && \
        chmod +x /project/sawtooth-kvstore/bin/kvstore && \
        /project/sawtooth-kvstore/bin/kvstore tcp://validator-3:4004
      "

  kvstore-processor-4:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-4
    container_name: kvstore-processor-4
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
        chmod +x /project/sawtooth-kvstore/bin/build_kvstore && \
        /project/sawtooth-kvstore/bin/build_kvstore && \
        chmod +x /project/sawtooth-kvstore/bin/kvstore && \
        /project/sawtooth-kvstore/bin/kvstore tcp://validator-4:4004
      "





  settings-tp-0:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-0
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-0:4004
    stop_signal: SIGKILL

  settings-tp-1:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-1
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-1:4004
    stop_signal: SIGKILL

  settings-tp-2:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-2
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-2:4004
    stop_signal: SIGKILL

  settings-tp-3:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-3
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-3:4004
    stop_signal: SIGKILL

  settings-tp-4:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-4
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-4:4004
    stop_signal: SIGKILL

  poet-engine-0:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-0
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        if [ ! -f /poet-shared/poet-enclave-measurement ]; then \
            poet enclave measurement >> /poet-shared/poet-enclave-measurement; \
        fi && \
        if [ ! -f /poet-shared/poet-enclave-basename ]; then \
            poet enclave basename >> /poet-shared/poet-enclave-basename; \
        fi && \
        if [ ! -f /poet-shared/simulator_rk_pub.pem ]; then \
            cp /etc/sawtooth/simulator_rk_pub.pem /poet-shared; \
        fi && \
        while [ ! -f /poet-shared/validator-0/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-0/keys /etc/sawtooth && \
        poet registration create -k /etc/sawtooth/keys/validator.priv -o /poet-shared/poet.batch && \
        poet-engine -C tcp://validator-0:5050 --component tcp://validator-0:4004 \
    \""

  poet-engine-1:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-1
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-1/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-1/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-1:5050 --component tcp://validator-1:4004 \
    \""

  poet-engine-2:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-2
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-2/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-2/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-2:5050 --component tcp://validator-2:4004 \
    \""

  poet-engine-3:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-3
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-3/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-3/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-3:5050 --component tcp://validator-3:4004 \
    \""

  poet-engine-4:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-4
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-4/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-4/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-4:5050 --component tcp://validator-4:4004 \
    \""

  poet-validator-registry-tp-0:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-0
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-0:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-1:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-1
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-1:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-2:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-2
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-2:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-3:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-3
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-3:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-4:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-4
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-4:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  # --------------- block server subscriber & transaction server ----------------#
  intkey-rest-api:
    build:
      context: .
      dockerfile: rest_api/Dockerfile
    image: intkey-rest-api
    container_name: intkey-rest-api
    volumes:
      - ./:/project/sawtooth_sdk_python
    ports:
      - '3000:8000'
    command: |
      bash -c "
      chmod +x /project/sawtooth_sdk_python/bin/rest-api
      rest-api \
          -b intkey-rest-api:8000 \
          --keyfile /root/.sawtooth/keys/root.priv \
          --url rest-api-0:8008
      "
  block-server-subscriber:
    build:
      context: .
      dockerfile: block_server_subscriber/Dockerfile
    image: block-server-subscriber
    container_name: block-server-subscriber
    volumes:
      - ./:/project/sawtooth_sdk_python
    ports:
      - '9002:9002'
    depends_on:
      - validator-0
    command: |
      sh -c "
      chmod +x /project/sawtooth_sdk_python/bin/block-server-subscriber
      block-server-subscriber \
          -C tcp://validator-0:4004 \
          --url rest-api-0:8008 \
          --uri  mongodb://root:password@bb:27017/ \
          -vv
      "
  block-server-rest-api:
    build:
      context: .
      dockerfile: block_server_api/Dockerfile
    image: block-server-rest-api
    container_name: block-server-rest-api
    volumes:
      - ./:/project/sawtooth_sdk_python
    ports:
      - '9001:9001'
    command: |
      sh -c "
      chmod +x /project/sawtooth_sdk_python/bin/block-server-api
      block-server-api \
           -b block-server-rest-api:9001 \
           --uri  mongodb://root:password@bb:27017/ \
           -vv
      "
  # ------------- database for off-chain data --------
  bb:
    image: mongo:3-xenial
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: password
    restart: always
    ports:
      - 27018:27017

  bbmanager:
    image: mongo-express
    links:
      - bb
    ports:
      - 27019:8081
    restart: always
    environment:
      ME_CONFIG_MONGODB_SERVER: bb
      ME_CONFIG_MONGODB_ADMINUSERNAME: root
      ME_CONFIG_MONGODB_ADMINPASSWORD: password

@rowaisi

rowaisi commented Jul 20, 2021

Dear @arsulegai,

could you please check the bug described above? We have no problem with either Raft or PBFT in Hyperledger Sawtooth, but with PoET, 5 to 10 minutes after we start sending transactions on a 5-node network, the system fails with 429 "too many requests" errors. Do you have any suggestions? Since you work with this repository, you may have some ideas on how to cope with such a problem.

Thanks for your kind consideration.

@arsulegai
Contributor

@rowaisi it was long ago, but sure. I need logs from the PoET engine and the validator to analyze this in detail.

The following is one possibility; only the logs can confirm it.

The sawtooth-poet-engine:chime tag on Docker Hub appears to be from 2 years ago, and I am not sure it has all the fixes. I have a debugger tool to check whether the state machine has become corrupted. Please run the tool https://github.com/arsulegai/state-checker on the PoET engine logs, using the configuration in https://github.com/arsulegai/state-checker#use-case-1-simple-state-transition-with-pattern-matching, and see if you observe any corruption; the tool was originally written to analyze state corruption in PoET logs.

I observed behavior like what you are facing once, a long time ago; hyperledger-archives/sawtooth-poet@754766d was the fix for that issue.

@wejdeneHaouari
Author

Hello @arsulegai @peterschwarz @agunde406 @vaporos and @rbuysse,

these are the logs from the sawtooth-validator-default-0 container when I got the error; the other validators show the same thing:

[2021-07-21 16:58:46.699 INFO     (unknown file)] [src/journal/publisher.rs: 172] Now building on top of block, Block(id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0, block_num: 48, state_root_hash: 86159582893d41fa3d7896bd07372bfd8e8dfe77f51f858656166b1c17c9ebbc, previous_block_id: 83419debbcc8ffba1459127ef0552c32fe63f3e4d16bcd95f6393d02b39ea9c10fe417e9938981860529e0c6fd9cce708de22a9ed0343847d6e75e12b1bd0495)
[2021-07-21 16:58:56.666 INFO     (unknown file)] [src/journal/block_validator.rs: 265] Block 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634 passed validation
[2021-07-21 16:58:56.804 INFO     (unknown file)] [src/journal/chain.rs: 206] Building fork resolution for chain head 'Block(id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0, block_num: 48, state_root_hash: 86159582893d41fa3d7896bd07372bfd8e8dfe77f51f858656166b1c17c9ebbc, previous_block_id: 83419debbcc8ffba1459127ef0552c32fe63f3e4d16bcd95f6393d02b39ea9c10fe417e9938981860529e0c6fd9cce708de22a9ed0343847d6e75e12b1bd0495)' against new block 'Block(id: 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634, block_num: 49, state_root_hash: 22183ae451264b6c6d6a0897f9ecba4966ad2a9859941406a190ffd909353a87, previous_block_id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0)'
[2021-07-21 16:58:56.810 INFO     (unknown file)] [src/journal/chain.rs: 791] Chain head updated to Block(id: 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634, block_num: 49, state_root_hash: 22183ae451264b6c6d6a0897f9ecba4966ad2a9859941406a190ffd909353a87, previous_block_id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0)
[2021-07-21 16:58:56.820 INFO     (unknown file)] [src/journal/publisher.rs: 172] Now building on top of block, Block(id: 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634, block_num: 49, state_root_hash: 22183ae451264b6c6d6a0897f9ecba4966ad2a9859941406a190ffd909353a87, previous_block_id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0)
[2021-07-21 16:59:25.719 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 16:59:35.720 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 16:59:45.722 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 16:59:55.723 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 17:00:05.725 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 17:00:10.431 INFO     back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 421, limit: 420
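[Editor's note] The last two log lines tell the story: consensus notifications are timing out, so no new blocks commit, while the pending batch pool sits just over its limit. Sawtooth's admission control is more involved than this, but a toy model of a depth-based gate (all names and numbers here are illustrative, not the validator's actual code) shows why the 429s never stop once the queue cannot drain:

```python
class BackPressureGate:
    """Toy model of depth-based admission control: reject new batches once
    the pending pool exceeds a limit, and re-admit only after the pool has
    drained below a lower watermark. Simplified; not Sawtooth's exact logic."""

    def __init__(self, limit=420, resume_at=None):
        self.limit = limit
        self.resume_at = resume_at if resume_at is not None else limit // 2
        self.depth = 0          # batches pending (submitted, not yet in a block)
        self.throttled = False  # True while shedding load (HTTP 429)

    def try_submit(self):
        """Return True if a new batch is admitted, False if rejected (429)."""
        if self.throttled:
            if self.depth > self.resume_at:
                return False    # still above the drain watermark
            self.throttled = False
        if self.depth >= self.limit:
            self.throttled = True
            return False
        self.depth += 1
        return True

    def on_batch_committed(self, n=1):
        """A block landed: its batches leave the pending pool."""
        self.depth = max(0, self.depth - n)
```

With PBFT or Raft, commits keep firing, the depth falls back under the watermark, and submissions resume; in the PoET runs above, blocks stopped committing, so the depth stayed pinned above the limit and every submission kept returning 429.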


@rowaisi

rowaisi commented Jul 21, 2021

Dear members: @arsulegai @vaporos @peterschwarz @buysse and @isabeltomb,

We are very interested in using Hyperledger Sawtooth for a Canadian supply chain project, and this PoET issue is blocking us. I would be glad if someone could run the PoET image and explore the issue. The back pressure causes PoET to stop working within 5 to 10 minutes once more users start using it, and it does not recover afterwards. The issue does not happen with either Raft or PBFT in Sawtooth.

We would very much appreciate it if somebody could take a look at the logs and images.

Your consideration is highly appreciated.

@agunde406
Contributor

Hello, I was not able to replicate this issue locally.

The commit that @arsulegai mentioned above has not been released but should be available in the nightly build.

I would suggest the following:

  1. Try with the PoET nightly image. This can be done just by swapping in hyperledger/sawtooth-poet-engine:nightly and hyperledger/sawtooth-poet-validator-registry-tp:nightly for the chime equivalent images. If that seems to fix your issues we can work towards getting a new release out.
  2. If not, you may need to double-check that you are not hitting a hash mismatch somewhere. This could be caused by some kind of non-determinism in your transaction processor.
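[Editor's note] The image swap in step 1 is a one-line change per PoET service in the compose file, for example:

```yaml
  poet-engine-0:
    image: hyperledger/sawtooth-poet-engine:nightly   # was :chime

  poet-validator-registry-tp-0:
    image: hyperledger/sawtooth-poet-validator-registry-tp:nightly   # was :chime
```

Repeat for each of the five poet-engine and poet-validator-registry-tp services.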

Any other errors that show up would also be helpful in figuring out what may be wrong, especially errors from the poet-engine. For example, when you say it crashes, is there an error message?
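[Editor's note] On the hash-mismatch point: a classic source of non-determinism in a transaction processor is serializing state without a canonical ordering, so two validators store different bytes for the same logical state. A minimal illustration in generic Python (not taken from the kvstore processor itself):

```python
import hashlib
import json

# Two logically identical states built with different key insertion order,
# as can happen when different nodes assemble the dict along different paths.
state_a = {"owner": "alice", "value": 10}
state_b = {"value": 10, "owner": "alice"}

def naive_bytes(state):
    """Serialization that preserves insertion order: nodes that build the
    dict in different orders will write different bytes to state."""
    return json.dumps(state).encode()

def canonical_bytes(state):
    """Deterministic serialization: sorted keys, fixed separators."""
    return json.dumps(state, sort_keys=True, separators=(",", ":")).encode()

# The naive encodings hash differently, so state root hashes diverge
# across validators; the canonical encodings are byte-identical.
print(hashlib.sha256(naive_bytes(state_a)).hexdigest() ==
      hashlib.sha256(naive_bytes(state_b)).hexdigest())       # False
print(canonical_bytes(state_a) == canonical_bytes(state_b))   # True
```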

@rowaisi

rowaisi commented Aug 5, 2021

Dear @vaporos,

could you please help us with this issue? Please take a look at the images we use to run our Sawtooth network with PoET: after 5 to 10 minutes of running the network and sending transactions, the system is saturated with "too many requests" errors, and even after we reduce the transaction rate it does not recover. The issue does not exist with either PBFT or Raft.

I hope you can guide us and let us know how we can get past this issue.
