Bollard is reading partial json data in create_image #190

Closed
Veetaha opened this issue Jan 25, 2022 · 10 comments · Fixed by #196

Comments

@Veetaha

Veetaha commented Jan 25, 2022

When using create_image, we stumbled upon the following error on our CI. It appears randomly due to the nature of this bug. It looks like the response stream that bollard receives from Docker may be read in chunks that are not aligned with the JSON object boundaries.

Here is the log from our test that failed due to this bug:

  Jan 25 16:27:41.489 DEBUG bollard::read: Decoding JSON line from stream: {"status":"Extracting","progressDetail":{"current":33980416,"total":102266715}

  Jan 25 16:27:41.489 DEBUG bollard::read: Decoding JSON line from stream: ,"progress":"[================\u003e                                  ]  33.98MB/102.3MB","id":"14d349bd5978"}


thread 'smoke::local_smoke_test' panicked at 'called `Result::unwrap()` on an `Err` value: DockerSdk(Raw(JsonSerdeError { err: Error("expected value", line: 1, column: 1) }))',

The first read ended up with a chunk in the middle of the JSON object, so is_eof() returned true here:

Err(e) if e.is_eof() => Ok(None),

But the next line resulted in a JSON parsing error right at the start, so the error was returned.

The invocation of create_image() in our test looks like this:

self.create_image(
    Some(bollard::image::CreateImageOptions {
        from_image: "amazon/dynamodb-local:1.18.0",
        ..Default::default()
    }),
    /* root_fs */ None,
    /* credentials */ None,
)
.try_for_each(|info| {
    debug!(
        image,
        status = info.status.as_deref(),
        progress = info.progress.as_deref(),
        "Pulling..."
    );
    future::ok(())
})
.await;
@fussybeaver
Owner

Thanks for reporting. We added the is_eof call recently; I'll take another look.

@Veetaha
Author

Veetaha commented Jan 25, 2022

@fussybeaver, to be clear, the chunks that were read were the following:

{"status":"Extracting","progressDetail":{"current":33980416,"total":102266715}

and

,"progress":"[================\u003e                                  ]  33.98MB/102.3MB","id":"14d349bd5978"}

If you take the first chunk and run serde_json::from_str() on it, it will return an EOF error. Here is the link to the playground.

But the next chunk failed to parse not with an EOF error, but with a syntax error.

This means the first chunk was silently skipped, while the rest of the object caused a hard error.
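For context, a common way to avoid this class of bug is to frame the stream before parsing: Docker's pull progress stream is newline-delimited JSON, so a decoder can buffer incoming bytes and only hand complete lines to the parser. This is a simplified, std-only sketch of that idea, not bollard's actual fix (which also has to handle newlines embedded inside the JSON payload):

```rust
// Hypothetical line-framing buffer: partial trailing data stays buffered
// until the next network chunk arrives, so JSON objects are never split.
struct LineFramer {
    buf: Vec<u8>,
}

impl LineFramer {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    // Feed a network chunk; return every complete, newline-terminated
    // line accumulated so far, with the trailing '\n' stripped.
    fn feed(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            lines.push(String::from_utf8_lossy(&line[..line.len() - 1]).into_owned());
        }
        lines
    }
}

fn main() {
    let mut framer = LineFramer::new();
    // Simulate the two misaligned chunks from the log above.
    let got = framer.feed(br#"{"status":"Extracting","progressDetail":{"current":1}"#);
    assert!(got.is_empty()); // partial object: nothing emitted yet
    let got = framer.feed(b",\"id\":\"14d349bd5978\"}\n");
    assert_eq!(got.len(), 1); // the reassembled, complete object
    assert!(got[0].ends_with('}'));
}
```

With this framing in place, the parser only ever sees complete JSON documents, so is_eof() no longer needs to paper over split reads.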

fussybeaver added a commit that referenced this issue Feb 8, 2022
Improves the json parser to handle these scenarios:
 - the buffer ends in a '}' and is not a parseable json
 - newlines inside the json payload

Closes #190
@fussybeaver
Owner

Hi @Veetaha, can you check if this is now fixed?

@Veetaha
Author

Veetaha commented Feb 9, 2022

@fussybeaver great, thank you!
I don't think I will be able to test it thoroughly, but our CI runs every day, and we currently just retry this error, as we do for any network operation.
If I notice it again, I'll make sure to create an issue!

@Veetaha
Author

Veetaha commented Feb 9, 2022

@fussybeaver I would be grateful if you could make a patch release for this; then I'll be able to upgrade us to the new version.

@fussybeaver
Owner

Since this is a change in some 'hot' code, I'd like to get this tested, perhaps also in some upstream projects that are used more heavily, before releasing a new version.

@Veetaha
Author

Veetaha commented Feb 14, 2022

@fussybeaver I don't think we will be able to debug it on our CI, given how much flaky CI costs us. We've already added retries for errors from bollard there, so we won't notice whether this was fixed or not.
However, I've left the following test running overnight:

use futures::prelude::*;

#[tokio::test]
async fn luck_testing() {
    env_logger::init();

    let docker = bollard::Docker::connect_with_local_defaults().unwrap();

    let image = "amazon/dynamodb-local:1.18.0";
    let timeout = std::time::Duration::from_secs(6 * 60 * 60);
    let start = std::time::Instant::now();

    for i in 0.. {
        eprintln!("[#{i}] Pulling image...");
        docker
            .create_image(
                Some(bollard::image::CreateImageOptions {
                    from_image: image,
                    ..Default::default()
                }),
                /* root_fs */ None,
                /* credentials */ None,
            )
            .for_each(|info| {
                if let Err(err) = info {
                    eprintln!("[#{i}]: Pulling failed! {err:#?}");
                }
                future::ready(())
            })
            .await;

        docker.remove_image(image, None, None).await.unwrap();

        if start.elapsed() >= timeout {
            eprintln!("[#{i}] Timeout reached, exiting...");
            break;
        }
    }
}

It failed on an unrelated error in 1h 10m 59s with this:

[2022-02-14T06:50:11Z DEBUG bollard::read] Decoding JSON line from stream: {"errorDetail":{"message":"error pulling image configuration: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/39/39d290d0ed044e20481d8a02dfb84509e48426e99a7175d1412299db837818eb/data?verify=1644824401-2%2BNoBGk55KVtN7yebBVVT%2B%2Fmd1A%3D\": dial tcp: lookup production.cloudflare.docker.com: Temporary failure in name resolution"},"error":"error pulling image configuration: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/39/39d290d0ed044e20481d8a02dfb84509e48426e99a7175d1412299db837818eb/data?verify=1644824401-2%2BNoBGk55KVtN7yebBVVT%2B%2Fmd1A%3D\": dial tcp: lookup production.cloudflare.docker.com: Temporary failure in name resolution"}
[2022-02-14T06:50:11Z DEBUG bollard::uri] Parsing uri: unix://2f7661722f72756e2f646f636b65722e736f636b/images/amazon/dynamodb-local:1.18.0, client_type: Unix, socket: /var/run/docker.sock
[2022-02-14T06:50:11Z DEBUG bollard::docker] unix://2f7661722f72756e2f646f636b65722e736f636b/images/amazon/dynamodb-local:1.18.0
[2022-02-14T06:50:11Z DEBUG bollard::docker] request: Request { method: DELETE, uri: unix://2f7661722f72756e2f646f636b65722e736f636b/images/amazon/dynamodb-local:1.18.0, version: HTTP/1.1, headers: {"x-registry-auth": "eyJ1c2VybmFtZSI6bnVsbCwicGFzc3dvcmQiOm51bGwsImF1dGgiOm51bGwsImVtYWlsIjpudWxsLCJzZXJ2ZXJhZGRyZXNzIjpudWxsLCJpZGVudGl0eXRva2VuIjpudWxsLCJyZWdpc3RyeXRva2VuIjpudWxsfQ==", "content-type": "application/json"}, body: Body(Empty) }
thread 'bollard::luck_testing' panicked at 'called `Result::unwrap()` on an `Err` value: DockerResponseNotFoundError { message: "{\"message\":\"No such image: amazon/dynamodb-local:1.18.0\"}\n" }', crates/foo/src/bollard.rs:32:54

This looks like an unrelated error, so I'm going to open a separate issue.
I'll add an ignore for this kind of error to the test and try it one more time anyway.

@fussybeaver
Owner

Thanks for your efforts.

@Veetaha
Author

Veetaha commented Feb 15, 2022

@fussybeaver Today I corrected that test not to exit early on remove_image() errors and ran it for 6 hours; it didn't fail a single time, finishing at 1702 iterations.

[#1702] Timeout reached, exiting...

I ran the test with

~/junk/rust-sandbox $ RUST_LOG=debug cargo test --release --lib -- luck_testing --nocapture 2>&1 | dd bs=1 count=40000000000 > log.log
63910839+0 records in
63910839+0 records out
63910839 bytes (64 MB, 61 MiB) copied, 21610.1 s, 3.0 kB/s
Took 6h 0m 10s                                         
                                                                                                                                 
~/junk/rust-sandbox $ cat log.log | grep fail                         
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 6 filtered out; finished in 21605.39s

~/junk/rust-sandbox $ du -h log.log             
61M     log.log

This makes me more confident that the bug has been fixed.

@fussybeaver
Owner

Excellent. That sounds promising.

tot0 added a commit to tot0/bollard that referenced this issue Aug 24, 2022
Improves the json parser to handler scenarios:
 - the buffer ends in a '}' and is not a parseable json
 - newlines inside the json payload

Closes fussybeaver#190