CANNOT_READ_ALL_DATA when calling end() #109

Open
riccardogabellone opened this issue Jun 18, 2024 · 3 comments


riccardogabellone commented Jun 18, 2024

Hi! I'm running into this error, which someone else has already encountered:

ERROR bad response: Code: 33. DB::Exception: Cannot read all data. Bytes read: 38. Bytes expected: 116.: (at row 1) : While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 24.2.2.16288 (official build))

I cannot figure out what is going wrong...

Here is my Rust code:

use std::{
    net::{Ipv4Addr, Ipv6Addr},
    u64,
};
use serde_repr::{Deserialize_repr, Serialize_repr};
use time::OffsetDateTime;

#[derive(Debug, serde::Serialize, serde::Deserialize, clickhouse::Row)]
pub struct MyLogMessage {
    #[serde(with = "clickhouse::serde::time::datetime64::millis")]
    pub timestamp: OffsetDateTime,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub api_key: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    #[serde(with = "clickhouse::serde::ipv4::option")]
    pub ip4: Option<Ipv4Addr>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub ip6: Option<Ipv6Addr>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub auth: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub user_agent: Option<String>,
    pub req_size: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub req_content_type: Option<String>,
    pub content_length: u64,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub content_type: Option<String>,
    pub method: HttpMethod,
    pub host: String,
    pub service: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub path: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub query: Option<String>,
    pub status_code: u16,
    pub execution_time_ms: u32,
    pub version: String,
}

#[derive(Debug, Serialize_repr, Deserialize_repr)]
#[repr(u8)]
pub enum HttpMethod {
    _UN = 0,
    GET = 1,
    HEAD = 2,
    POST = 3,
    PUT = 4,
    DELETE = 5,
    CONNECT = 6,
    OPTIONS = 7,
    TRACE = 8,
    PATCH = 9,
}

// `clickhouse` is a `clickhouse::Client` and `consumer` is a Kafka consumer; their setup is omitted.
#[tokio::main]
async fn main() {
    let mut inserter = clickhouse
        .inserter::<MyLogMessage>("my_log")
        .unwrap();

    let mut messages = vec![MyLogMessage { ... }, ...];
    // `pop()` yields owned rows; `write` takes each one by reference.
    while let Some(msg) = messages.pop() {
        inserter.write(&msg).await.unwrap();
    }

    match inserter.end().await {
        Ok(_) => {
            consumer.commit_consumer_state(CommitMode::Sync).unwrap();
            tracing::info!("ALL MSGs SAVED.");
        }
        Err(e) => {
            tracing::error!("{e}"); // This is where the CANNOT_READ_ALL_DATA DB::Exception shows up
        }
    }
}

And here is the ClickHouse table:

CREATE TABLE my_log
(
    `timestamp` DateTime64(3),
    `api_key` String,
    `ip4` IPv4,
    `ip6` IPv6,
    `version` String,
    `auth` Bool,
    `user_agent` String,
    `req_size` UInt32,
    `req_content_type` String,
    `content_length` UInt64,
    `content_type` String,
    `method` Enum8('_' = 0, 'GET' = 1, 'HEAD' = 2, 'POST' = 3, 'PUT' = 4, 'DELETE' = 5, 'CONNECT' = 6, 'OPTIONS' = 7, 'TRACE' = 8, 'PATCH' = 9),
    `host` String,
    `service` String,
    `path` String,
    `query` String,
    `status_code` UInt16,
    `execution_time_ms` UInt32
)
ENGINE = MergeTree;
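The struct and the table have to agree byte for byte: the error mentions BinaryRowInputFormat, i.e. rows arrive in ClickHouse's RowBinary format, which carries no column names, delimiters, or per-field markers, so the server decodes each column strictly by position and type. The hand-written, std-only sketch below assumes the documented RowBinary layout (String = LEB128 length + bytes, UInt32 = 4 little-endian bytes); the helpers `put_string`, `put_u32`, and `encode_row` are hypothetical and are not the clickhouse crate's serializer. It shows what `skip_serializing_if` does to such a stream: an omitted field writes nothing, so every later column shifts.

```rust
// Hypothetical, std-only sketch of RowBinary encoding (not the clickhouse crate).

/// String: unsigned LEB128 length prefix, then the raw bytes.
fn put_string(buf: &mut Vec<u8>, s: &str) {
    let mut len = s.len() as u64;
    loop {
        let byte = (len & 0x7f) as u8;
        len >>= 7;
        if len == 0 {
            buf.push(byte);
            break;
        }
        buf.push(byte | 0x80); // more length bytes follow
    }
    buf.extend_from_slice(s.as_bytes());
}

/// UInt32: 4 bytes, little-endian.
fn put_u32(buf: &mut Vec<u8>, v: u32) {
    buf.extend_from_slice(&v.to_le_bytes());
}

/// Encodes one row for the columns (`api_key` String, `req_size` UInt32).
/// `skip_missing` mimics `#[serde(skip_serializing_if = "Option::is_none")]`.
fn encode_row(api_key: Option<&str>, req_size: u32, skip_missing: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    match api_key {
        Some(k) => put_string(&mut buf, k),
        None if skip_missing => {}        // nothing written: later columns shift left
        None => put_string(&mut buf, ""), // empty string: 1 byte (length 0)
    }
    put_u32(&mut buf, req_size);
    buf
}

fn main() {
    let full = encode_row(None, 5, false);
    let skipped = encode_row(None, 5, true);
    println!("{full:?}");    // [0, 5, 0, 0, 0]
    println!("{skipped:?}"); // [5, 0, 0, 0]
    // Reading `skipped`, the server takes 5 as the String length but only
    // 3 bytes remain. In a real batch the shifted bytes spill into the next
    // row, so the failure can surface later, e.g. as CANNOT_READ_ALL_DATA.
}
```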

Maybe I missed something from docs?

Could it be one of the Option<_> fields that have no other annotation (although I assumed skip_serializing_if would be enough)?

Or do I need to make each of these columns Nullable to match the Rust struct? In my tests, ClickHouse fills in default values anyway, so as long as those fields are skipped there shouldn't be any exception, right?

Or is it better to use serde defaults without the Option<_> wrappers?

EDIT: I also tried the wa-37420 feature flag, with the same result.
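On the Nullable question: the clickhouse crate maps `Option<T>` to `Nullable(T)`, and in RowBinary a Nullable value starts with a flag byte (1 = NULL with no payload, 0 = a value follows). A plain `String` column has no such byte, so an `Option<String>` field cannot feed a `String` column even when it is not skipped. As far as I understand, ClickHouse fills defaults only for columns left out of the INSERT column list, not for bytes missing inside a RowBinary row. The sketch below is hand-written and std-only; `put_string` and `put_nullable_string` are hypothetical helpers, and lengths are assumed to be under 128 bytes so the LEB128 prefix is a single byte.

```rust
// Hypothetical, std-only sketch: String vs Nullable(String) in RowBinary.

/// String: 1-byte length (valid only for lengths < 128), then the bytes.
fn put_string(buf: &mut Vec<u8>, s: &str) {
    assert!(s.len() < 128, "sketch only handles 1-byte LEB128 lengths");
    buf.push(s.len() as u8);
    buf.extend_from_slice(s.as_bytes());
}

/// Nullable(String): flag byte 1 = NULL (nothing follows), 0 = value follows.
fn put_nullable_string(buf: &mut Vec<u8>, v: Option<&str>) {
    match v {
        None => buf.push(1),
        Some(s) => {
            buf.push(0);
            put_string(buf, s);
        }
    }
}

fn main() {
    let mut plain = Vec::new();
    put_string(&mut plain, "abc");
    let mut nullable = Vec::new();
    put_nullable_string(&mut nullable, Some("abc"));
    println!("String:           {plain:?}");    // [3, 97, 98, 99]
    println!("Nullable(String): {nullable:?}"); // [0, 3, 97, 98, 99]
}
```

A `String` column reading `[0, 3, 97, 98, 99]` sees an empty string (length 0) and leaves `3, 97, 98, 99` to be misread as the next column. The two consistent pairings are therefore `Option<T>` fields, serialized unconditionally, with `Nullable(T)` columns, or plain fields with plain columns.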

@riccardogabellone
Author

I tested some MyLogMessage values built from raw JSON data:

  • This example raises CANNOT_READ_ALL_DATA:
{
    "timestamp":1718746919678,
    "ip4":2130706433,
    "user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "req_size":0,
    "content_length":0,
    "content_type":"image/x-icon",
    "method":1,
    "host":"localhost",
    "service":"favicon.ico",
    "path":"/",
    "status_code":200,
    "execution_time_ms":0,
    "version":"HTTP/1.1"
}
  • This example raises ATTEMPT_TO_READ_AFTER_EOF:
{
    "timestamp":1718788113783,
    "api_key":"<my-key>",
    "ip4":2130706433,
    "auth":true,
    "user_agent":"insomnia/9.2.0",
    "req_size":0,
    "req_content_type":"application/json",
    "content_length":4242,
    "content_type":"application/json",
    "method":1,
    "host":"localhost",
    "service":"my-service",
    "path":"/",
    "query":"f1=true&f2=true",
    "status_code":200,
    "execution_time_ms":77,
    "version":"HTTP/1.1"
}
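For reference, the `ip4` value 2130706433 in both examples is 127.0.0.1 packed into a u32 (network byte order). ClickHouse's IPv4 type is a UInt32, which RowBinary writes little-endian, so the bytes on the wire are the reverse of `Ipv4Addr::octets()`. My understanding is that this is why the `ip4` field goes through `clickhouse::serde::ipv4::option` instead of serde's default `Ipv4Addr` encoding. A std-only check of the byte orders:

```rust
use std::net::Ipv4Addr;

fn main() {
    // `ip4` value from the JSON examples above.
    let raw: u32 = 2130706433;
    let addr = Ipv4Addr::from(raw);
    println!("{addr}");                  // 127.0.0.1
    println!("{:?}", addr.octets());     // [127, 0, 0, 1]  (network byte order)
    println!("{:?}", raw.to_le_bytes()); // [1, 0, 0, 127]  (UInt32 as RowBinary writes it)
}
```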

@riccardogabellone
Author

Or, is it better to put serde defaults without Option<_> wrappers?

OK, that's the only way I managed to fix it! The other approaches don't seem to work.
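The fix above presumably means dropping `Option<_>` and `skip_serializing_if` and marking the possibly-absent fields with `#[serde(default)]`. Every row then contains every column, and a missing value is just the type's default ("" for String, false for Bool), which matches the non-Nullable columns. The std-only sketch below illustrates the resulting layout; `LogRow` is a hypothetical, trimmed-down stand-in for MyLogMessage, and `encode` is a hand-written RowBinary-style encoder, not the crate's.

```rust
// Hypothetical stand-in for MyLogMessage with three plain (non-Option) fields.
#[derive(Debug, Default)]
struct LogRow {
    api_key: String, // String column; "" when absent from the input
    auth: bool,      // Bool column; false when absent
    req_size: u32,   // UInt32 column
}

impl LogRow {
    /// RowBinary-style encoding in column order (strings under 128 bytes only).
    fn encode(&self) -> Vec<u8> {
        assert!(self.api_key.len() < 128, "sketch only handles 1-byte LEB128 lengths");
        let mut buf = Vec::new();
        buf.push(self.api_key.len() as u8); // String length prefix
        buf.extend_from_slice(self.api_key.as_bytes());
        buf.push(self.auth as u8); // Bool: 1 byte, 0 or 1
        buf.extend_from_slice(&self.req_size.to_le_bytes()); // UInt32, little-endian
        buf
    }
}

fn main() {
    // Missing fields fall back to their defaults, much like `#[serde(default)]`
    // does when the JSON input lacks a key.
    let row = LogRow { req_size: 5, ..Default::default() };
    println!("{:?}", row.encode()); // [0, 0, 5, 0, 0, 0]
}
```

Whatever the input contains, each encoded row has the same column sequence, which is exactly what a positional format requires.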

@riccardogabellone
Author

Maybe it's better to close the issue only after you've had a chance to check it.
