Clean runTimeouts when deleting a run #409

Closed
agoldis opened this issue Jul 30, 2021 · 6 comments · Fixed by #427
Labels
bug Something isn't working

Comments

@agoldis
Collaborator

agoldis commented Jul 30, 2021

Summary

runTimeouts is not cleaned up when a run is deleted, which causes error messages in the director logs.
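
A minimal sketch of the behaviour the title asks for, using plain in-memory arrays rather than the real MongoDB collections. The `deleteRuns` function and the document shape (a `runId` field on both runs and timeouts) are assumptions made for illustration; this is not the fix that was merged.

```javascript
// Hypothetical in-memory model: `runs` and `runTimeouts` stand in for the
// MongoDB collections; each entry is assumed to carry a `runId` field.
function deleteRuns(store, runIds) {
  const doomed = new Set(runIds);
  return {
    // Drop the runs themselves...
    runs: store.runs.filter((run) => !doomed.has(run.runId)),
    // ...and also drop their pending timeouts, so a timeout check
    // never looks up a run that no longer exists.
    runTimeouts: store.runTimeouts.filter((t) => !doomed.has(t.runId)),
  };
}

const before = {
  runs: [{ runId: 'a' }, { runId: 'b' }],
  runTimeouts: [{ runId: 'a' }, { runId: 'b' }],
};
const after = deleteRuns(before, ['a']);
console.log(after.runTimeouts.length); // 1
```

The point of the sketch is only that both deletions happen together; without the second filter, the timeout for run `a` would outlive the run.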

@agoldis agoldis self-assigned this Jul 30, 2021
@agoldis agoldis added the bug Something isn't working label Jul 30, 2021
@tico24
Contributor

tico24 commented Jul 30, 2021

In case anyone stumbles across this issue: the symptom is that the director constantly logs errors saying it can't check run timeouts. Here's a sample:

[run-timeout] Error checking run timeout for runId: 460e57b7401e9038af2297320636704d, task id: 60f57d1283c2b7633c962c58
AppError: AppError
    at allRunSpecsCompleted (/app/packages/director/dist/execution/mongo/runs/run.controller.js:181:11)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async maybeSetRunCompleted (/app/packages/director/dist/execution/mongo/runCompletion/runCompletion.js:17:7)
    at async checkRunCompletionOnTimeout (/app/packages/director/dist/execution/mongo/runCompletion/runCompletion.js:31:7)
    at async /app/packages/director/dist/execution/mongo/runCompletion/runCompletion.js:56:7 {
  code: 'RUN_NOT_EXISTS'
}

The workaround for now is to run this against the db:
mongo <mongodb_uri> --eval 'db.runTimeouts.deleteMany({})'
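
The command above removes every pending timeout, including those for runs that still exist. A narrower variant could target only the orphaned entries. The sketch below builds such a filter; the helper name `orphanedTimeoutFilter` is hypothetical, and it assumes that documents in `runTimeouts` and in a `runs` collection both carry a `runId` field, which this thread does not confirm.

```javascript
// Hypothetical helper: builds a deleteMany filter that matches only the
// timeouts whose run no longer exists.
// Assumption: `runTimeouts` and `runs` documents both have a `runId` field.
function orphanedTimeoutFilter(existingRunIds) {
  return { runId: { $nin: existingRunIds } };
}

// Possible use inside the mongo shell (not executed here):
//   db.runTimeouts.deleteMany(orphanedTimeoutFilter(db.runs.distinct('runId')))

console.log(JSON.stringify(orphanedTimeoutFilter(['run-1'])));
// prints {"runId":{"$nin":["run-1"]}}
```

With a very large number of runs, the list of IDs may become too big to pass around this way, in which case the blanket `deleteMany({})` is the simpler option.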

@everton-nasc

@tico24 - Thanks for sharing the workaround!

Not sure what's going on, but I'm still getting the errors after running the workaround command.

We set up sorry-cypress with the Helm charts to spin up the services, and this morning, after deleting and recreating everything, this error started popping up.

Currently we have 3 mongodb services:

  • mongodb-0 (master, I guess :P)
  • mongodb-1 (slave)
  • mongodb-arbiter-0

Installed version: sorry-cypress v1.0.3

I tried running mongo <our-mongodb_uri> --eval 'db.runTimeouts.deleteMany({})' on mongodb-0:

MongoDB server version: 4.4.6
{ "acknowledged" : true, "deletedCount" : 0 }

Weird... it didn't delete anything, even though the director is logging a bunch of errors:

[run-timeout] Error checking run timeout for runId: b26788a178d75552779886c0905a26dc, task id: 6109794f7fc1664b04cb0da0
AppError: AppError
    at allRunSpecsCompleted (/app/packages/director/dist/execution/mongo/runs/run.controller.js:181:11)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async maybeSetRunCompleted (/app/packages/director/dist/execution/mongo/runCompletion/runCompletion.js:17:7)
    at async checkRunCompletionOnTimeout (/app/packages/director/dist/execution/mongo/runCompletion/runCompletion.js:31:7)
    at async /app/packages/director/dist/execution/mongo/runCompletion/runCompletion.js:56:7 {
  code: 'RUN_NOT_EXISTS'
}

MongoError: not master and slaveOk=false
    at MessageStream.messageHandler2 (/app/packages/mongo/dist/index.js:17055:24)
    at MessageStream.emit (events.js:376:20)
    at MessageStream.emit (domain.js:470:12)
    at processIncomingData (/app/packages/mongo/dist/index.js:16773:16)
    at MessageStream._write (/app/packages/mongo/dist/index.js:16704:9)
    at writeOrBuffer (internal/streams/writable.js:358:12)
    at MessageStream.Writable.write (internal/streams/writable.js:303:10)
    at Socket.ondata (internal/streams/readable.js:745:22)
    at Socket.emit (events.js:376:20)
    at Socket.emit (domain.js:470:12) {
  topologyVersion: { processId: 61096dfc8a9b8936e9aceb3d, counter: 3 },
  operationTime: Timestamp2 { _bsontype: 'Timestamp', low_: 1, high_: 1628020667 },
  ok: 0,
  code: 13435,
  codeName: 'NotPrimaryNoSecondaryOk',
  '$clusterTime': {
    clusterTime: Timestamp2 { _bsontype: 'Timestamp', low_: 1, high_: 1628020667 },
    signature: { hash: [Binary2], keyId: 0 }
  },
  [Symbol(errorLabels)]: Set(1) { 'RetryableWriteError' }
}
[run-timeout] Checking run timeouts.

I also tried running mongo <our-mongodb_uri> --eval 'db.runTimeouts.deleteMany({})' on mongodb-1 (the slave?!):

MongoDB server version: 4.4.6
uncaught exception: WriteCommandError({
	"topologyVersion" : {
		"processId" : ObjectId("61096dfc8a9b8936e9aceb3d"),
		"counter" : NumberLong(3)
	},
	"operationTime" : Timestamp(1628020187, 1),
	"ok" : 0,
	"errmsg" : "not master",
	"code" : 10107,
	"codeName" : "NotWritablePrimary",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1628020187, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}) :
WriteCommandError({
	"topologyVersion" : {
		"processId" : ObjectId("61096dfc8a9b8936e9aceb3d"),
		"counter" : NumberLong(3)
	},
	"operationTime" : Timestamp(1628020187, 1),
	"ok" : 0,
	"errmsg" : "not master",
	"code" : 10107,
	"codeName" : "NotWritablePrimary",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1628020187, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
})
WriteCommandError@src/mongo/shell/bulk_api.js:417:48
executeBatch@src/mongo/shell/bulk_api.js:915:23
Bulk/this.execute@src/mongo/shell/bulk_api.js:1163:21
DBCollection.prototype.deleteMany@src/mongo/shell/crud_api.js:432:17

Do you have any other thoughts or suggestions?

Thanks!

@tim-sendible
Contributor

Your uri is most likely wrong. Make sure you're using the correct database.
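
To illustrate the point about the database: a URI that names no database makes the legacy `mongo` shell operate on its default database, `test`, which would be consistent with a `deletedCount` of 0. The sketch below shows which database a URI selects. The helper `databaseFromUri` is hypothetical, it handles single-host URIs only, and the host and the database name `sorry-cypress` are assumed example values rather than details confirmed in this thread.

```javascript
// Hypothetical helper (not part of sorry-cypress or the MongoDB tooling):
// reports which database a single-host MongoDB URI selects.
// Multi-host seed lists (host1,host2,...) are not handled by this sketch.
function databaseFromUri(uri) {
  const { pathname } = new URL(uri);
  // Strip the leading "/"; an empty result means the URI names no database.
  const name = decodeURIComponent(pathname.replace(/^\//, ''));
  // The legacy mongo shell uses "test" when no database is given.
  return name === '' ? 'test' : name;
}

// Assumed example values: localhost and the database name "sorry-cypress".
console.log(databaseFromUri('mongodb://localhost:27017'));               // test
console.log(databaseFromUri('mongodb://localhost:27017/sorry-cypress')); // sorry-cypress
```

If the first form is what gets passed to `mongo --eval`, the `deleteMany` runs against an empty `test` database and leaves the director's data untouched.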

@everton-nasc

everton-nasc commented Aug 3, 2021

@tim-sendible
The database uri seems to be correct:

@mongodb-0:/$ mongo --host mongodb://mongodb-headless.cypress.svc.cluster.local:27017
MongoDB shell version v4.4.6
connecting to: mongodb://mongodb-headless.cypress.svc.cluster.local:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("9b4b2843-ce71-4189-9739-ad86408ed6b1") }
MongoDB server version: 4.4.6
---
The server generated these startup warnings when booting:
        2021-08-03T16:24:56.821+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
---
---
        Enable MongoDB's free cloud-based monitoring service, which will then receive and display
        metrics about your deployment (disk utilization, CPU, operation statistics, etc).

        The monitoring data will be available on a MongoDB website with a unique URL accessible to you
        and anyone you share the URL with. MongoDB may use this information to make product
        improvements and to suggest MongoDB products and deployment options to you.

        To enable free monitoring, run the following command: db.enableFreeMonitoring()
        To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
rs0:PRIMARY> exit
bye

Another quick test:

I have no name!@mongodb-0:/$ mongo mongodb://mongodb-headless.cypress.svc.cluster.local:27017 --eval 'db.getCollectionNames()'
MongoDB shell version v4.4.6
connecting to: mongodb://mongodb-headless.cypress.svc.cluster.local:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("0f5709bf-ce2c-4d22-be0e-41b17076e8da") }
MongoDB server version: 4.4.6
[ ]
I have no name!@mongodb-0:/$

I tried deleting all the mongo services and recreating them, but I'm getting a bunch of MongoDB errors on the director side:

[run-timeout] Checking run timeouts...
(node:1) UnhandledPromiseRejectionWarning: MongoError: not master and slaveOk=false
    at MessageStream.messageHandler2 (/app/packages/mongo/dist/index.js:17055:24)
    at MessageStream.emit (events.js:376:20)
    at MessageStream.emit (domain.js:470:12)
    at processIncomingData (/app/packages/mongo/dist/index.js:16773:16)
    at MessageStream._write (/app/packages/mongo/dist/index.js:16704:9)
    at writeOrBuffer (internal/streams/writable.js:358:12)
    at MessageStream.Writable.write (internal/streams/writable.js:303:10)
    at Socket.ondata (internal/streams/readable.js:745:22)
    at Socket.emit (events.js:376:20)
    at Socket.emit (domain.js:470:12)
(node:1) Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch().

This morning we got a weird error on the mongo replica set that was blocking MongoDB from starting up. After deleting the replica set and recreating it, MongoDB at least started to "work", though not 100%. Running out of options 🤣

@agoldis
Collaborator Author

agoldis commented Aug 4, 2021

@everton-nasc I don't really know how to help here. The director is just trying to connect using the MongoDB credentials you've provided, and judging by the error, it looks like a Mongo configuration problem.

I would try the following steps:

  • make sure the director gets the correct MONGODB_URI when running
  • try running mongo in single-master mode, without slaves, and make sure it is up, running, and accessible before letting the director connect to it

@everton-nasc

@agoldis
Thanks for your thoughts. That would certainly be feasible and less complex :D

I got it fixed by deleting all the mongo StatefulSets along with their volumes and re-applying the templates. At least the master and slave have started.

There's another error showing up, where the groups parameter is not working properly: it is splitting the execution of the same test spec across the build IDs. I'll send a message on Slack about it.

Thanks!

@agoldis agoldis mentioned this issue Aug 26, 2021

4 participants