fix(test-tooling): failing fabric AIO container launch #320

Closed
petermetz opened this issue Oct 19, 2020 · 4 comments · Fixed by #1300
Labels: bug, dependencies, Developer_Experience, Fabric

@petermetz
Member

Describe the bug

CI tests are failing in the test tooling package that verifies that the Fabric all in one (AIO) image works as expected.

To Reproduce

Appears to be flaky.
Run the CI script to attempt to reproduce: npm run run-ci

Expected behavior

The test should consistently either pass or fail.

Logs/Stack traces

https://travis-ci.org/github/hyperledger/cactus/jobs/735204836

Cloud provider or hardware configuration:

Travis CI virtual machine

Operating system name, version, build:

Details are in the linked logs above.

Hyperledger Cactus release version or commit (git rev-parse --short HEAD):

a74a7ed

Hyperledger Cactus Plugins/Connectors Used

Fabric

Additional context

The issue appears to be caused by ports that are already allocated. The Fabric AIO image has a limitation: its published ports cannot be assigned randomly, because we have not yet finished the Docker-in-Docker (DinD) support that the peer containers need. Without that support we must bind to specific host ports instead of random ones, and for now that is just how we do it.

@petermetz added the bug and Fabric labels Oct 19, 2020
@petermetz added this to the v1.0.0 milestone Oct 19, 2020
@petermetz
Member Author

After a little more investigation, I think this is most likely due to the fixed host ports we use. The CI VM runs the CI script twice, once against NodeJS 12 and once against NodeJS 14, so one run gets knocked out when it tries to launch the AIO container while the other run is doing the same thing and is already sitting on the host port they both want.

Meaning that this issue will most likely be fixed by #279 or anything else that gives us PublishAllPorts: true capabilities for the Fabric AIO images.
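
To illustrate the difference between fixed host ports and `PublishAllPorts: true`, here is a minimal sketch of the request body one might send when creating the container. The field names (`ExposedPorts`, `HostConfig`, `PortBindings`, `PublishAllPorts`) are from the Docker Engine API; the helper `buildCreateOptions`, the image name and the port numbers are assumptions made up for this example and are not taken from the Cactus code base.

```typescript
// Hypothetical helper: builds the body of a Docker Engine API
// "create container" request. Only the field names are real API fields;
// the function, image name and ports are illustrative assumptions.
interface PortBinding {
  HostPort: string;
}

interface CreateOptions {
  Image: string;
  ExposedPorts: Record<string, object>;
  HostConfig: {
    PublishAllPorts?: boolean;
    PortBindings?: Record<string, PortBinding[]>;
  };
}

function buildCreateOptions(
  image: string,
  containerPorts: number[],
  randomHostPorts: boolean,
): CreateOptions {
  const exposed: Record<string, object> = {};
  const bindings: Record<string, PortBinding[]> = {};
  for (const port of containerPorts) {
    const key = `${port}/tcp`;
    exposed[key] = {};
    // Fixed mapping: host port equals container port, so two containers
    // started on the same host compete for the same host port.
    bindings[key] = [{ HostPort: String(port) }];
  }
  return {
    Image: image,
    ExposedPorts: exposed,
    // With PublishAllPorts the daemon picks a free host port for every
    // exposed port, so parallel launches do not collide.
    HostConfig: randomHostPorts
      ? { PublishAllPorts: true }
      : { PortBindings: bindings },
  };
}

// Illustrative usage with made-up values.
const fixedPorts = buildCreateOptions("example/fabric-aio", [7050, 7051], false);
const randomPorts = buildCreateOptions("example/fabric-aio", [7050, 7051], true);
```

With the fixed variant, a second container launched on the same host asks for the same host ports and fails; with the random variant the caller has to look up the assigned host ports after the container starts.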

@petermetz
Member Author

Related: MiniFabric might provide a solution idea that we can re-use or just flat out make our own Fabric AIO image inherit from the MiniFabric image... Not sure yet, but I've made some inquiries here: hyperledger-labs/minifabric#105

@petermetz
Member Author

Unfortunately, MiniFabric does not support pulling up multiple ledgers, only multiple channels within the same ledger, which is not a good fit for our tests. It is still worth evaluating as a workaround if we cannot make DinD work within a reasonable time.

@petermetz
Member Author

This is pretty unbounded in complexity because the Fabric NodeJS SDK is missing a feature we need: support for discovery on non-standard (customizable) ports. One option would be to monkey patch the Fabric SDK; if that works, this turns out to be pretty easy to solve. We haven't done the exploration to check this yet, though.
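
For readers unfamiliar with the technique, this is roughly what monkey patching looks like in TypeScript. It is only a generic sketch: `DiscoveryClient`, `resolveEndpoint`, the host name and the port numbers are hypothetical stand-ins and do not correspond to the real Fabric SDK API, which would still need to be explored.

```typescript
// Hypothetical stand-in for a third-party class with a hard-coded port.
// This is NOT the Fabric SDK; it only illustrates the patching technique.
class DiscoveryClient {
  resolveEndpoint(host: string): string {
    return `${host}:7051`;
  }
}

// Hypothetical table: host name -> host port actually published for it.
const portOverrides = new Map<string, number>([["peer0.example.com", 32768]]);

// Keep a reference to the original method, then replace it on the
// prototype so every instance picks up the patched behavior.
const originalResolve = DiscoveryClient.prototype.resolveEndpoint;
DiscoveryClient.prototype.resolveEndpoint = function (
  this: DiscoveryClient,
  host: string,
): string {
  const override = portOverrides.get(host);
  return override === undefined
    ? originalResolve.call(this, host) // fall back to library behavior
    : `${host}:${override}`;
};

const client = new DiscoveryClient();
const patched = client.resolveEndpoint("peer0.example.com");
const untouched = client.resolveEndpoint("orderer.example.com");
```

The trade-off is that a patch like this depends on library internals and can break silently on an SDK upgrade.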

@petermetz self-assigned this Sep 2, 2021
petermetz added a commit to petermetz/cacti that referenced this issue Sep 3, 2021
Epic facepalm once again. Turns out the default restart try
count of supervisord is too low which leads to race conditions.
Increasing the retry count from 4 to 20 should do it, this way
the fabric-network process (see supervisord.conf file) should
be 5 times as "patient" waiting for the docker daemon to launch
within the AIO container.

What was happening before is that the fabric-network script
tried launching itself in parallel with the docker daemon, but
it would time out before the docker daemon could come online.

Published these images as
ghcr.io/hyperledger/cactus-fabric2-all-in-one:2021-09-02--fix-876-supervisord-retries
and
ghcr.io/hyperledger/cactus-fabric-all-in-one:2021-09-02--fix-876-supervisord-retries

Fixes hyperledger#718
Fixes hyperledger#876
Fixes hyperledger#320
Fixes hyperledger#319

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
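
The race described in the commit message above can be modelled with a small sketch. This is a simplified model and not supervisord's actual implementation: `startWithRetries` and the "ready by the 10th attempt" figure are assumptions chosen for illustration; only the retry counts 4 and 20 come from the commit message.

```typescript
// Simplified model: a process is started up to `maxAttempts` times and
// is given up on for good once the attempts are used up.
function startWithRetries(
  tryStart: (attempt: number) => boolean,
  maxAttempts: number,
): { started: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (tryStart(attempt)) {
      return { started: true, attempts: attempt };
    }
  }
  return { started: false, attempts: maxAttempts };
}

// Assumption for illustration: the dependency (standing in for the Docker
// daemon inside the container) is only reachable from the 10th attempt on.
const daemonReadyAtAttempt = 10;
const tryStartNetwork = (attempt: number): boolean =>
  attempt >= daemonReadyAtAttempt;

// With 4 attempts the dependent process gives up before the daemon is up.
const impatient = startWithRetries(tryStartNetwork, 4);
// With 20 attempts it is still retrying when the daemon becomes ready.
const patient = startWithRetries(tryStartNetwork, 20);
```

Raising the retry count does not remove the race; it only widens the window in which the slower process is allowed to come online.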
petermetz added a commit that referenced this issue Sep 7, 2021
RafaelAPB pushed a commit to RafaelAPB/blockchain-integration-framework that referenced this issue Mar 9, 2022
ryjones pushed a commit that referenced this issue Feb 1, 2023