
Bug: Resuming on relate uses wrong index name #196

Closed
jodevsa opened this issue Mar 29, 2019 · 49 comments

@jodevsa
Contributor

jodevsa commented Mar 29, 2019

Hi @rwynn ,
Thanks for your work, really amazing.

When resuming using the following config, events in the change stream get indexed to the wrong index name:
{"index":{"_index":"5c976492cf09a2266a829a76_db.students","_id":"5c9e4143f37e264e131c84be","

version: 4.16.0

resume=true
[[relate]]
  namespace="db.students"
  with-namespace="db.studentsView"
  src-field="refId"
  match-field="_id"

[[mapping]]
  namespace="db.studentsView"
  index="studentsView"
  type="studentsView"

This only happens on resuming; it works fine when listening to new events in the change stream.

Thanks!

@rwynn
Owner

rwynn commented Mar 29, 2019

Hi @jodevsa

Thanks. Do you have any other configuration specified (excluding the connection and auth info)? If you have direct-read-namespaces, are they targeting the view instead of the collection?

@rwynn
Owner

rwynn commented Mar 29, 2019

I'm not even sure how it's possible for the index name to start with (an id?) appended with an underscore like this: 5c976492cf09a2266a829a76_db.students. I'm not aware of any code in monstache that would format the index like this. Are you using any plugins?

@rwynn
Owner

rwynn commented Mar 29, 2019

Is it possible that there is some erroneous data in the oplog (possibly from a previous bad insert) such that the ns field in the oplog is set to 5c976492cf09a2266a829a76_db.students?

@rwynn
Owner

rwynn commented Mar 29, 2019

If so, maybe this would help:

namespace-regex = '^db\.students'

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

Here is the rest of the config; direct-read-namespaces is commented out:

# additional settings

# compress requests to Elasticsearch
gzip = true
# generate indexing statistics
stats = true
# index statistics into Elasticsearch
index-stats = false
# use the following PEM file for connections to MongoDB

# disable PEM validation
mongo-validate-pem-file = false
# use 2 go routines concurrently pushing documents to Elasticsearch
#elasticsearch-max-bytes = 400000
elasticsearch-max-conns = 2
# validate connections to Elasticsearch
elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
# override the name under which resume state is saved
resume-name = "default"
# exclude documents whose namespace matches the following pattern
# turn on indexing of GridFS file content
index-files = false
# turn on search result highlighting of GridFS content
file-highlighting = false
# print detailed information including request traces
verbose = true

# enable clustering mode
#cluster-name = 'apollo'
# do not exit after full-sync, rather continue tailing the oplog
exit-after-direct-reads = false

direct-read-namespaces is commented out, and yeah, I searched the monstache code trying to figure out what happened; nothing adds an ID. I've also checked the related collections, and I don't have any document with that ID.

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

I tried to reproduce this several times; it looks like the prefix ID is always "5c976492cf09a2266a829a76".

@rwynn
Owner

rwynn commented Mar 29, 2019

It might have been a mistake that got deleted later but was recorded in the oplog. Anyway, it seems to be saying that (maybe at one point) you have/had a db named 5c976492cf09a2266a829a76_db in MongoDB. It would have been recorded in local.oplog.rs as a record with an ns field = 5c976492cf09a2266a829a76_db.students.

@rwynn
Owner

rwynn commented Mar 29, 2019

It should be ignored if you explicitly set the allowed namespaces:

namespace-regex = '^db\.students'

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

I'm interested in more than one collection, so I changed the regex to
namespace-regex = '^db'

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

Now resuming has stopped working; live change stream updates are still working fine.

@rwynn
Copy link
Owner

rwynn commented Mar 29, 2019

Hmm. Maybe try

namespace-regex= '^db.+'

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

Same result; I tried both
namespace-regex = '^db.+'
and
namespace-regex = '^db.students'

@rwynn
Owner

rwynn commented Mar 29, 2019

I actually tested the MatchString method in Go, and you should only need ^db.
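For illustration, here is a Node sketch of the same anchored match (Go's MatchString looks for a match anywhere in the string, so the ^ anchor is what pins it to the start):

```javascript
// '^db' matches any namespace that starts with "db", so the normal
// namespaces pass and the id-prefixed ones are filtered out.
const re = /^db/;

console.log(re.test('db.students'));                          // true
console.log(re.test('db.studentsView'));                      // true
console.log(re.test('5c976492cf09a2266a829a76_db.students')); // false
```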

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

{ _id:
   { _data:
      '825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
  operationType: 'update',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
  ns: { db: 'db', coll: 'student' },
  documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
  updateDescription:
   { updatedFields:
      { description: '<p>123<br/></p>',
        updatedAt: 2019-03-29T17:11:42.892Z },
     removedFields: [] } } 

I tried listening to the change stream; the above is the output.

@rwynn
Owner

rwynn commented Mar 29, 2019

Monstache gets the timestamp to resume from the monstache.monstache collection by default. There should be a doc in there from a previous run with an _id matching your resume-name config.

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

{
    "_id" : "default",
    "ts" : Timestamp(1553879502, 4)
}

?

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

Updates and resumes are working fine; the problem is that indexing happens against the real collection, not the view, and the index name in Elasticsearch is not right.

@rwynn
Owner

rwynn commented Mar 29, 2019

If you are on MongoDB 4.0+ you can enable using the new change streams API with

change-stream-namespaces = [ "db.students" ]

Otherwise Monstache defaults to the old method of tailing the oplog (from before the API became available).

@rwynn
Owner

rwynn commented Mar 29, 2019

If you want to listen to the whole database use

change-stream-namespaces = [ "db" ]

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

Sorry, I forgot to include that line too; I'm using change-stream-namespaces.

@rwynn
Owner

rwynn commented Mar 29, 2019

Another thing you can do to work around it is to add another mapping. I'm not entirely sure why it's needed, though.

[[relate]]
  namespace="db.students"
  with-namespace="db.studentsView"
  src-field="refId"
  match-field="_id"
  keep-src=false

[[mapping]]
  namespace="db.students"
  index="studentsView"
  type="studentsView"

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

actual change stream event

{ _id:
   { _data:
      '825C9E51CE0000000429295A1004E037522509E94139BEA8D7316E78898D46645F696400645C9E48F9F37E264E131C84F10004' },
  operationType: 'update',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 4, high_: 1553879502 },
  ns: { db: 'db', coll: 'student' },
  documentKey: { _id: 5c9e48f9f37e264e131c84f1 },
  updateDescription:
   { updatedFields:
      { description: '<p>123<br/></p>',
        updatedAt: 2019-03-29T17:11:42.892Z },
     removedFields: [] } } 

monstache log
{"index":{"_index":"5c976492cf09a2266a829a76_db.studnet","_id":"5c9e48f9f37e264e131c84f1","_type":"_doc","version":6673861643014766596,"version_type":"external"}}

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

@rwynn same issue

@rwynn
Owner

rwynn commented Mar 29, 2019

I don't know where that prefix could be getting added. Here is the code that sets the namespace attribute.

https://github.com/rwynn/gtm/blob/development/gtm.go#L168

It should just be joining the 2 fields with a dot.

This later flows into Monstache but I don't see how it could be getting the prefix there.
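To make the failure mode concrete, here is a minimal sketch of that join (the function name is illustrative, not monstache's actual identifier):

```javascript
// The namespace is just the change event's ns.db and ns.coll joined
// with a dot, so a prefixed db name flows straight into the index name.
function namespaceOf(ns) {
  return `${ns.db}.${ns.coll}`;
}

console.log(namespaceOf({ db: 'db', coll: 'student' }));
// → 'db.student'
console.log(namespaceOf({ db: '5c976492cf09a2266a829a76_db', coll: 'student' }));
// → '5c976492cf09a2266a829a76_db.student'
```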

@rwynn
Owner

rwynn commented Mar 29, 2019

I just did a quick test on my end in a VM and couldn't get the prefix added with a resume set. You mentioned this was a shared instance of MongoDB? Could this be related to that? Maybe some sort of tenant namespacing?

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

Really weird. I'm using a shared MongoDB cluster on Atlas.

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

But listening to the stream using a Node.js script outputs the correct namespace:

ns: { db: 'db', coll: 'student' },

@rwynn
Owner

rwynn commented Mar 29, 2019

Yeah, I'm not sure why it's working differently, but it seems to be consistently that same ID.

@rwynn
Owner

rwynn commented Mar 29, 2019

Are you able to do a find on the underlying local.oplog.rs collection? You might not be able to in a shared cluster on Atlas.

@jodevsa
Contributor Author

jodevsa commented Mar 29, 2019

@rwynn nope

@rwynn
Owner

rwynn commented Mar 29, 2019

You could probably work around by adding another relate with the prefix added to the namespace.
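For example, something like this (a sketch only; it assumes the prefixed namespace seen in your logs is what actually arrives in the event):

```toml
[[relate]]
  namespace="5c976492cf09a2266a829a76_db.students"
  with-namespace="db.studentsView"
  src-field="refId"
  match-field="_id"
```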

@rwynn
Owner

rwynn commented Mar 29, 2019

Looks like you can only access the oplog on an M10 or greater in Atlas.

@rwynn
Owner

rwynn commented Mar 29, 2019

@jodevsa I created a ticket for this in the MongoDB Atlas support. Will report back if I learn anything.
Please let me know if you can recreate this outside of that environment.

@rwynn
Owner

rwynn commented Mar 29, 2019

@jodevsa By the way, since you said this only triggers with resume on: in that case the change stream is started with the startAtOperationTime option set so that it can read previous entries. Maybe this bug only happens when that option is set?

https://docs.mongodb.com/manual/reference/method/db.collection.watch/
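A sketch of the distinction (the helper is hypothetical; the option name comes from the Node driver's watch() options): a live tail passes no resume option, while a resume passes the saved timestamp.

```javascript
// Hypothetical helper: build the watch() options a resume would use.
// A live tail passes no resume option; a resume passes the timestamp
// saved in monstache.monstache as startAtOperationTime (MongoDB 4.0+).
function watchOptions(savedTimestamp) {
  return savedTimestamp ? { startAtOperationTime: savedTimestamp } : {};
}

console.log(watchOptions(null));                    // {}
console.log(watchOptions({ t: 1553879502, i: 4 })); // { startAtOperationTime: { t: 1553879502, i: 4 } }
```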

@jodevsa
Contributor Author

jodevsa commented Mar 30, 2019

@rwynn you are right! When using the resumeAfter option, the namespace changes to '5c976492cf09a2266a829a76_db':

ns:
   { db: '5c976492cf09a2266a829a76_db', coll: 'student' },

@jodevsa
Contributor Author

jodevsa commented Mar 30, 2019

Good news: this only occurs on shared clusters!

@rwynn
Owner

rwynn commented Mar 30, 2019

Would you mind writing up some instructions for reproducing this with the mongo client? I have a ticket open now, and they suspect it might be related to the shared-tier cluster namespacing. They asked for a code sample that reproduces it, so one using the mongo client would be great to give them.

@jodevsa
Contributor Author

jodevsa commented Mar 30, 2019

@rwynn yeah sure

@jodevsa
Contributor Author

jodevsa commented Mar 30, 2019

// npm install mongodb
const MongoClient = require('mongodb').MongoClient;
const dbName = "db";
const url = `mongodb+srv://username:password@***********-i5mih.mongodb.net/${dbName}?retryWrites=true`;
const collectionName = "event";

const sleep = function(ms) {
    return new Promise((resolve) => {
        setTimeout(() => {
            resolve(ms);
        }, ms);
    });
};

async function main() {
    const client = await MongoClient.connect(url, {
        useNewUrlParser: true
    });
    const db = client.db(dbName);

    // First stream: no options; record the resume token of the first event.
    const firstStream = await db.collection(collectionName).watch();
    let firstID;
    firstStream.on('change', function(change) {
        console.log("first event ns", change.ns);
        firstID = change._id;
    });
    await sleep(1000);
    await db.collection(collectionName).insertOne({
        name: "first event"
    });
    await sleep(500);
    await firstStream.close();

    // Second stream: resume after the first event's token. Its event should
    // report the same ns, but on a shared Atlas cluster it comes back prefixed.
    await db.collection(collectionName).insertOne({
        name: "second event"
    });
    const secondStream = await db.collection(collectionName).watch({
        resumeAfter: firstID
    });
    secondStream.on('change', async function(change) {
        console.log("second event ns", change.ns);
        await secondStream.close();
        await client.close();
    });
}

main();

sample output:


first event ns { db: 'db', coll: 'event' }
second event ns { db: '5c976492cf09a2266a829a76_db', coll: 'event' }

@jodevsa
Contributor Author

jodevsa commented Mar 30, 2019

The code above reproduces the issue: it prints the ns of two events from two different change streams. The first stream doesn't have any options specified; the second one uses resumeAfter with the token taken from the first event.

@rwynn
Owner

rwynn commented Mar 30, 2019

Thanks that’s perfect. I’ll attach it to the ticket.

Great find on this issue!

@jodevsa
Contributor Author

jodevsa commented Mar 30, 2019

Awesome

@rwynn
Owner

rwynn commented Apr 2, 2019

@jodevsa got an update today that a fix for this on the Atlas side is planned to land in the next couple weeks. They've been able to replicate with your script and it's on their radar.

@jodevsa
Contributor Author

jodevsa commented Apr 3, 2019

@rwynn
Thanks for the update, that's awesome. I really appreciate your time on this, and knowing that it's not a bug/issue with the tool.

@jodevsa jodevsa closed this as completed Apr 3, 2019
@jodevsa
Contributor Author

jodevsa commented Jun 9, 2019

@rwynn Atlas has fixed the issue in the latest maintenance update, 4.0.10.

@rwynn
Owner

rwynn commented Jun 9, 2019

That’s good news. Thanks for the update.

@sbeaupre

@jodevsa @rwynn has this issue resurfaced? I am testing a setup now with replay and resume on, and in ES an index is created with an id prefix. When doing an update in Mongo after all previous documents were synced, another index with the correct name is created for the newly updated docs.

What seems absurd is that I don't think I had this issue a few weeks ago when I ran a first test. I don't remember having seen an index with a prefixed id at that time.

I am running monstache in Docker (image version 6.3.1), and I am testing with MongoDB Atlas M0 (noticed they are at version 4.0.13 now) and ES Cloud 7.0.1.

All options are configured via commandline params given to the container.

@rwynn
Owner

rwynn commented Nov 11, 2019

@sbeaupre you may want to try to replicate again using just the node code that @jodevsa previously posted to this issue. If you find that the issue has come back it would be helpful if you created a ticket with MongoDB.

For reference, the previous ticket that I created with them on March 29th was given case #00558022. You might state that it was fixed in 4.0.10 and seems to have come back now in 4.0.13.

@sbeaupre

I can confirm that it is an issue (again). I'll contact MongoDB support with your reference number.
