Azure.Messaging.EventHubs Checkpointing Implementation Changed as of v5.11.0 Causing Incorrect Behavior In Event Hub scaler #5574

Closed
windy1 opened this issue Mar 5, 2024 · 7 comments · Fixed by #5600
Labels
bug Something isn't working

Comments

@windy1
Contributor

windy1 commented Mar 5, 2024

Report

Related: Azure/azure-sdk-for-net#42409

As of Azure.Messaging.EventHubs v5.11.0, the format of checkpoints written to blob storage has changed. Notably, the value of offset is null. In our case, this caused KEDA to over-scale a service. Downgrading the SDK to v5.10.0 resolved the issue.

In response to my original issue, a Microsoft contributor asserted that this change is intentional and that the implementation of checkpoints is not to be relied upon.

Please refer to the original issue for more in-depth details and reproduction steps.

Expected Behavior

I expected KEDA to scale my service appropriately.

Actual Behavior

KEDA over-scaled the service until the Azure SDK was downgraded and the old checkpoint format restored.

Steps to Reproduce the Problem

private async Task ProcessEventAsync(ProcessEventArgs eventArgs)
{
    await eventArgs.UpdateCheckpointAsync(); // breakpoint here and examine Offset in memory
    log.LogInformation("checkpoint created"); // breakpoint here and examine offset in checkpoint
}

Logs from KEDA operator

No response

KEDA Version

2.12.1

Kubernetes Version

None

Platform

Microsoft Azure

Scaler Details

Azure Event Hubs

Anything else?

No response

windy1 added the bug label Mar 5, 2024
@JorTurFer
Member

Hello @windy1 ,
I can reproduce the issue, thanks for reporting it!

After checking the code, it looks like we still use the offset field in some places, although we don't use it for any calculation (just for some if statements). For example here and here.

Are you willing to open a PR including another check for the sequence? I guess that we cannot get rid of the offset field as it's used by Az functions AFAIK, but we should check if offset AND sequence are empty before returning an error
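
The check suggested above could be sketched roughly as follows. This is a minimal illustration, not KEDA's actual code: the `checkpoint` type, its field names, and `isMissing` are hypothetical, and the sequence number is kept as a string only so that "absent" can be modeled as an empty value.

```go
package main

import "fmt"

// checkpoint is a hypothetical, simplified view of the metadata stored with a
// checkpoint blob. It is NOT the type used by the KEDA scaler.
type checkpoint struct {
	Offset         string // reported as null/empty for checkpoints written by SDK v5.11.0
	SequenceNumber string // kept as a string here so "absent" can be modeled as ""
}

// isMissing treats a checkpoint as unusable only when BOTH the offset and the
// sequence number are empty, instead of looking at the offset alone.
func isMissing(c checkpoint) bool {
	return c.Offset == "" && c.SequenceNumber == ""
}

func main() {
	// New-format checkpoint: no offset, but a valid sequence number.
	newFormat := checkpoint{Offset: "", SequenceNumber: "1234"}
	fmt.Println(isMissing(newFormat)) // false

	// Nothing recorded at all.
	fmt.Println(isMissing(checkpoint{})) // true
}
```

With a check on the offset alone, the new-format checkpoint in this sketch would be treated as missing even though its sequence number is present.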

@windy1
Contributor Author

windy1 commented Mar 11, 2024

Hi @JorTurFer I decided I would take a look, but I haven't been able to build the dev container successfully on my machine:

[2024-03-11T18:42:59.417Z] ERROR: failed to solve: process "/bin/sh -c apt-get update     && apt-get -y install --no-install-recommends apt-utils dialog unzip 2>&1     && apt-get -y install git iproute2 procps lsb-release     && go get -x -d github.com/stamblerre/gocode 2>&1     && go build -o gocode-gomod github.com/stamblerre/gocode     && mv gocode-gomod $GOPATH/bin/     && go get -u -v         github.com/mdempsky/gocode         github.com/uudashr/gopkgs/cmd/gopkgs         github.com/ramya-rao-a/go-outline         github.com/acroca/go-symbols         github.com/godoctor/godoctor         golang.org/x/tools/cmd/guru         golang.org/x/tools/cmd/gorename         github.com/rogpeppe/godef         github.com/zmb3/gogetdoc         github.com/haya14busa/goplay/cmd/goplay         github.com/sqs/goreturns         github.com/josharian/impl         github.com/davidrjenni/reftools/cmd/fillstruct         github.com/fatih/gomodifytags         github.com/cweill/gotests/...         golang.org/x/tools/cmd/goimports         golang.org/x/lint/golint
[2024-03-11T18:42:59.417Z]          github.com/alecthomas/gometalinter 2>&1         github.com/mgechev/revive         github.com/derekparker/delve/cmd/dlv 2>&1     && go install honnef.co/go/tools/cmd/staticcheck@latest     && go install golang.org/x/tools/gopls@latest     && PROTOC_VERSION=21.9     && if [ $(dpkg --print-architecture) = \"amd64\" ]; then PROTOC_ARCH=\"x86_64\"; else PROTOC_ARCH=\"aarch_64\" ; fi     && curl -LO \"https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-$PROTOC_ARCH.zip\"     && unzip \"protoc-${PROTOC_VERSION}-linux-$PROTOC_ARCH.zip\" -d $HOME/.local     && mv $HOME/.local/bin/protoc /usr/local/bin/protoc     && mv $HOME/.local/include/ /usr/local/bin/include/     && protoc --version     && curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.55.2     && groupadd --gid $USER_GID $USERNAME     && useradd -s /bin/bash --uid $USER_UID --gid $USER_GID -m $USERNAME     && apt-get in
[2024-03-11T18:42:59.417Z] stall -y sudo     && echo $USERNAME ALL=\\(root\\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME     && chmod 0440 /etc/sudoers.d/$USERNAME     && sudo install -m 0755 -d /etc/apt/keyrings     && curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg     && sudo chmod a+r /etc/apt/keyrings/docker.gpg     && echo       \"deb [arch=\"$(dpkg --print-architecture)\" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian       \"$(. /etc/os-release && echo \"$VERSION_CODENAME\")\" stable\" |       sudo tee /etc/apt/sources.list.d/docker.list > /dev/null     && sudo apt-get update     && apt-get install -y docker-ce-cli     && apt-get -y install python3-pip     && python3 -m pip install --no-cache-dir --break-system-packages pre-commit     && apt-get autoremove -y     && apt-get clean -y     && rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 1
[2024-03-11T18:42:59.423Z] Stop (43323 ms): Run: docker buildx build --load --build-arg BUILDKIT_INLINE_CACHE=1 -f /var/folders/bg/dth_vb4s44g88qnk9g42r2vr0000gp/T/devcontainercli/container-features/0.56.2-1710182536098/Dockerfile-with-features -t vsc-keda-bde1e7825acddf40d31270b78aad0daad7b61f69f004dcf7a3d5ac01177433e8 --target dev_containers_target_stage --build-arg _DEV_CONTAINERS_BASE_IMAGE=dev_container_auto_added_stage_label /Users/wzs02/code/atrius/keda/.devcontainer
[2024-03-11T18:42:59.424Z] Error: Command failed: docker buildx build --load --build-arg BUILDKIT_INLINE_CACHE=1 -f /var/folders/bg/dth_vb4s44g88qnk9g42r2vr0000gp/T/devcontainercli/container-features/0.56.2-1710182536098/Dockerfile-with-features -t vsc-keda-bde1e7825acddf40d31270b78aad0daad7b61f69f004dcf7a3d5ac01177433e8 --target dev_containers_target_stage --build-arg _DEV_CONTAINERS_BASE_IMAGE=dev_container_auto_added_stage_label /Users/wzs02/code/atrius/keda/.devcontainer
[2024-03-11T18:42:59.425Z]     at BtA (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:465:1933)
[2024-03-11T18:42:59.425Z]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[2024-03-11T18:42:59.425Z]     at async K0 (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:464:1841)
[2024-03-11T18:42:59.425Z]     at async yH (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:464:610)
[2024-03-11T18:42:59.425Z]     at async StA (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:481:3660)
[2024-03-11T18:42:59.425Z]     at async ZC (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:481:4775)
[2024-03-11T18:42:59.425Z]     at async trA (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:614:11269)
[2024-03-11T18:42:59.425Z]     at async erA (/Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js:614:11010)
[2024-03-11T18:42:59.429Z] Stop (44293 ms): Run: /Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper (Plugin).app/Contents/MacOS/Code Helper (Plugin) /Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js up --user-data-folder /Users/wzs02/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-containers/data --container-session-data-folder /tmp/devcontainers-9e07ae93-2d2b-4cef-9981-98e1be114e9a1710182534355 --workspace-folder /Users/wzs02/code/atrius/keda --workspace-mount-consistency cached --id-label devcontainer.local_folder=/Users/wzs02/code/atrius/keda --id-label devcontainer.config_file=/Users/wzs02/code/atrius/keda/.devcontainer/devcontainer.json --log-level debug --log-format json --config /Users/wzs02/code/atrius/keda/.devcontainer/devcontainer.json --default-user-env-probe loginInteractiveShell --mount type=volume,source=vscode,target=/vscode,external=true --skip-post-create --update-remote-user-uid-default on --mount-workspace-git-root
[2024-03-11T18:42:59.429Z] Exit code 1
[2024-03-11T18:42:59.432Z] Command failed: /Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper (Plugin).app/Contents/MacOS/Code Helper (Plugin) /Users/wzs02/.vscode/extensions/ms-vscode-remote.remote-containers-0.348.0/dist/spec-node/devContainersSpecCLI.js up --user-data-folder /Users/wzs02/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-containers/data --container-session-data-folder /tmp/devcontainers-9e07ae93-2d2b-4cef-9981-98e1be114e9a1710182534355 --workspace-folder /Users/wzs02/code/atrius/keda --workspace-mount-consistency cached --id-label devcontainer.local_folder=/Users/wzs02/code/atrius/keda --id-label devcontainer.config_file=/Users/wzs02/code/atrius/keda/.devcontainer/devcontainer.json --log-level debug --log-format json --config /Users/wzs02/code/atrius/keda/.devcontainer/devcontainer.json --default-user-env-probe loginInteractiveShell --mount type=volume,source=vscode,target=/vscode,external=true --skip-post-create --update-remote-user-uid-default on --mount-workspace-git-root
[2024-03-11T18:42:59.432Z] Exit code 1

OS: macOS Sonoma 14.3.1
Docker: latest

@JorTurFer
Member

Nice catch! I've drafted a PR with a fix for the devcontainer image. Could you apply that change? Basically, you have to remove the line golang.org/x/tools/cmd/guru because it's deprecated.

@windy1
Contributor Author

windy1 commented Mar 13, 2024

> Are you willing to open a PR including another check for the sequence? I guess that we cannot get rid of the offset field as it's used by Az functions AFAIK, but we should check if offset AND sequence are empty before returning an error

Regarding this bit, I was toying around with this today and eventually came to the conclusion that checking whether the offset, or the sequence number, is empty is wholly redundant, at least in a dotnet / Azure SDK context.

Unless I am misunderstanding here, there shouldn't ever exist checkpoints where the sequence number is empty, and as the code is written right now, this would fail anyway as it is expected to be an integer. Checkpoints for partitions that have never been checkpointed would simply not exist; thus, initialized checkpoints will always have a sequence number.

It's very possible I am missing information about the checkpointing implementations in other contexts, such as Azure Functions, as you mentioned, but as far as I can tell, this is an impossible scenario if you are creating checkpoints through the Azure SDK.
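
To illustrate the "this would fail anyway" point: if the sequence number is converted to an integer, an empty value is already rejected by the conversion itself, so a separate emptiness check adds nothing. A minimal sketch, assuming the raw value arrives as a string; `parseSequenceNumber` is a hypothetical helper, not a function from KEDA or the Azure SDK.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseSequenceNumber is a hypothetical helper that converts the raw
// sequence number value into an integer.
func parseSequenceNumber(raw string) (int64, error) {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		// An empty string ends up here too: ParseInt reports a syntax error.
		return 0, fmt.Errorf("invalid sequence number %q: %w", raw, err)
	}
	return n, nil
}

func main() {
	if _, err := parseSequenceNumber(""); err != nil {
		fmt.Println("empty value rejected:", err)
	}

	n, err := parseSequenceNumber("1234")
	fmt.Println(n, err)
}
```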

@JorTurFer
Member

Yeah, that's the point I meant. The sequence number is the property used for all the calculations, but although the offset isn't used anywhere for the calculations, there are some if statements where the offset is used instead of the sequence number. We have to get rid of the offset usage in those if statements in favor of the sequence number.
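
As a rough illustration of a calculation that depends only on sequence numbers, the backlog of a partition can be estimated as the distance between the newest enqueued event and the checkpointed one. This is a simplified sketch under stated assumptions, not the scaler's real implementation: `unprocessedEvents` and its parameters are hypothetical names, and sequence number wrap-around is ignored.

```go
package main

import "fmt"

// unprocessedEvents estimates the backlog of one partition from sequence
// numbers only; no offset is involved.
//
// lastEnqueued is the sequence number of the newest event in the partition,
// checkpointed is the sequence number stored in the checkpoint.
// Both names are hypothetical, and wrap-around is not handled.
func unprocessedEvents(lastEnqueued, checkpointed int64) int64 {
	if checkpointed >= lastEnqueued {
		return 0 // consumer has caught up
	}
	return lastEnqueued - checkpointed
}

func main() {
	fmt.Println(unprocessedEvents(1500, 1234)) // 266
	fmt.Println(unprocessedEvents(1234, 1234)) // 0
}
```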

@windy1
Contributor Author

windy1 commented Mar 13, 2024

What I'm saying is that I think the section you linked before can be removed completely and not replaced by some alternative method.

@JorTurFer
Member

I get your point, and I think you're right xD

Projects
Status: Ready To Ship

2 participants