errored merging output after 273.873702ms: The XML you provided was not well-formed or did not validate against our published schema. #4466

Open
benwbooth opened this issue Jan 3, 2020 · 7 comments

@benwbooth

benwbooth commented Jan 3, 2020

What happened?:

I'm running Pachyderm on-premises using a Rook v1.2 / Ceph v14.2.5 object store.

The job got stuck in the merge step, with the following errors repeating over and over in the worker logs:

{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v2-bt5qv","ts":"2020-01-03T15:45:55.453110590Z","message":"errored merging output after 273.873702ms: The XML you provided was not well-formed or did not validate against our published schema."}

{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v2-bt5qv","ts":"2020-01-03T15:45:55.453378447Z","message":"worker: watch closed or error running the worker process: acquire/process/merge datums for job 39917eaab6a640eab82d4312f0fac93b exited with err: The XML you provided was not well-formed or did not validate against our published schema.; retrying in 11.981423969s"}

{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v2-bt5qv","ts":"2020-01-03T15:46:07.438545002Z","message":"skipping job 8c8c05031708420fa2ebb7926cea5aa8 as it is already in state JOB_KILLED"}

{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v2-bt5qv","ts":"2020-01-03T15:46:07.477194040Z","message":"processing job 39917eaab6a640eab82d4312f0fac93b"}

{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v2-bt5qv","ts":"2020-01-03T15:46:07.520008023Z","message":"starting to merge output"}

What you expected to happen?:

The job completes successfully.

How to reproduce it (as minimally and precisely as possible)?:

Anything else we need to know?:

Environment?:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • Pachyderm CLI and pachd server version (use pachctl version):
COMPONENT           VERSION
pachctl             1.9.10
pachd               1.9.10
  • Cloud provider (e.g. aws, azure, gke) or local deployment (e.g. minikube vs dockerized k8s): on-premises
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Others:

gz#43

@benwbooth
Author

I had to use --isS3V2 to configure the Ceph object store as a MinIO store to get Pachyderm working. Could this be related to #4001? Is MinIO support currently broken?

@benwbooth
Author

My guess is that some code path is making an S3 V4 call instead of a V2 call.
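For context on the V2/V4 distinction guessed at here, the sketch below shows how the two AWS signature schemes derive a request signature, following the publicly documented AWS algorithms. It is not Pachyderm's, minio-go's, or Ceph's code. The function names are hypothetical, and building the canonical string-to-sign for each scheme is assumed to happen elsewhere.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """Signature Version 2: Base64(HMAC-SHA1(secret, StringToSign))."""
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def signing_key_v4(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Signature Version 4: derive a scoped key via chained HMAC-SHA256.

    date is YYYYMMDD; the key is only valid for that date, region, and service.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode(), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_v4(secret_key: str, date: str, region: str, string_to_sign: str,
            service: str = "s3") -> str:
    """Signature Version 4: hex(HMAC-SHA256(signing_key, StringToSign))."""
    key = signing_key_v4(secret_key, date, region, service)
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()
```

The practical difference is that V2 signs directly with the secret key using SHA-1, whereas V4 first derives a key scoped to a date, region, and service and then signs with SHA-256, so a client and server must agree on which scheme is in use.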

@benwbooth
Author

I attempted to work around the problem by removing the --isS3V2 option, but doing so causes pachd to fail to start:

2020-01-03T16:28:01Z INFO no Jaeger collector found (JAEGER_COLLECTOR_SERVICE_HOST not set)

2020-01-03T16:28:07Z INFO started setting up Internal Block API GRPC Server

2020-01-03T16:28:07Z WARNING TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory

2020-01-03T16:28:07Z INFO started setting up External PFS API GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External PFS API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External PPS API GRPC Server

2020-01-03T16:28:07Z WARNING s3gateway TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory

2020-01-03T16:28:07Z INFO validating kubernetes access returned no errors

2020-01-03T16:28:07Z INFO finished setting up External PPS API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Auth API GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Auth API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Transaction API GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Transaction API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Enterprise API GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Enterprise API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Admin API GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Admin API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Health GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Health GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Version API GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Version API GRPC Server

2020-01-03T16:28:07Z INFO started setting up External Debug GRPC Server

2020-01-03T16:28:07Z INFO finished setting up External Debug GRPC Server

2020-01-03T16:28:37Z INFO error starting githook server context deadline exceeded

2020-01-03T16:28:51Z INFO pps.API.ListPipeline {"request":{}} []

2020-01-03T16:28:54Z INFO errored setting up Internal Block API GRPC Server

@brycemcanally
Contributor

Closing since this issue is closely related to #4432. Feel free to reopen with logs from a release >=1.9.12.

@pappasilenus
Contributor

Reopening, as we're seeing this with another user.

@pappasilenus pappasilenus reopened this May 12, 2020
@pappasilenus
Contributor

Agent comment from John Karabaic in Zendesk ticket #43:

John,

Is this still an issue for us to work on this week?

jk

@pappasilenus
Contributor

Agent comment from John in Zendesk ticket #43:

Nope, solved by changing to s3v4.

