
Too much indexed docs - drop database #133

Closed
lukaszpy opened this issue Sep 24, 2013 · 13 comments

@lukaszpy

my versions:

  • elasticsearch 0.90.2
    plugins:
    -- mapper attachments 1.4.0
    -- river jdbc 2.2.1 (driver: postgresql-9.2-1002.jdbc4)
    -- elasticsearch-river-mongodb-1.6.11

Problem:
When I create an index for the PostgreSQL database, everything works fine; in the head plugin for ES I see:
structure - name of index
size: 1mb (1mb)
docs: 3587 (3587)
But when I create an index on the MongoDB database I get:
type - index name
size: 642.6kb (642.6kb)
docs: 10495 (10495)

The docs field shows the wrong number of documents, because my database contains only 3936 docs. This problem exists for every index on MongoDB: the count of indexed docs does not match the count of docs in the database.

I'm creating the index with (this is the Windows version of the command, hence the escaped quotes):
curl -XPUT "http://localhost:9200/_river/body/_meta" -d "{ \"type\": \"mongodb\", \"mongodb\": { \"servers\": [{ \"host\": \"localhost\", \"port\": 27017 }], \"options\": { \"secondary_read_preference\": true }, \"credentials\": [{ \"db\": \"fis-bps\", \"user\": \"guest\", \"password\": \"guest\" }], \"db\": \"fis-bps\", \"collection\": \"body\", \"gridfs\": false }, \"index\": { \"name\": \"body\", \"throttle_size\": 2000 } }"
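
For readability, the same river definition pretty-printed (the body of the _meta document only, without the Windows quoting):

```
{
  "type": "mongodb",
  "mongodb": {
    "servers": [{ "host": "localhost", "port": 27017 }],
    "options": { "secondary_read_preference": true },
    "credentials": [{ "db": "fis-bps", "user": "guest", "password": "guest" }],
    "db": "fis-bps",
    "collection": "body",
    "gridfs": false
  },
  "index": { "name": "body", "throttle_size": 2000 }
}
```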

This problem only exists on Windows; on an Ubuntu system the problem doesn't occur.

I noticed one more thing: when I dump my DB, remove all data files (from the data directory for both primary and secondary), then create the databases, create the index, and restore the database from the dump, I get the correct count of indexed docs.

It looks like Elasticsearch reads deeper into MongoDB than the collections themselves: a normal drop and recreate of the DB still leaves some data behind that Elasticsearch uses when building the index.

@richardwilly98
Owner

The river gets the data from oplog.rs, not directly from the collection.

Did you by any chance drop the collection?
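
A minimal way to inspect what the river actually reads, assuming a replica set (oplog.rs only exists in the local database of replica-set members); the namespace filter here follows the db/collection names from the command above:

```
use local
// newest oplog entries for the indexed namespace
db.oplog.rs.find({ ns: "fis-bps.body" }).sort({ $natural: -1 }).limit(5)
```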

@lukaszpy
Author

I dropped the oplog.rs collection on the PRIMARY but not on the secondary (replica). Is that a mistake?

@richardwilly98
Owner

In that case you should use the options/drop_collection parameter (for more details see [1]).

[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/wiki#configuration
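
A minimal sketch of a river definition with that flag set (Unix-style quoting; the db, collection, and index names follow the example earlier in this thread):

```
curl -XPUT "http://localhost:9200/_river/body/_meta" -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "fis-bps",
    "collection": "body",
    "options": { "drop_collection": true }
  },
  "index": { "name": "body" }
}'
```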

@lukaszpy
Author

OK, but I'm not sure, so correct me if I'm wrong: the drop_collection flag doesn't cover dropping the whole DB, right?

So if I drop the whole DB, recreate it, and then restore the data, I should end up with more indexed docs than exist in my DB? Because the collection itself is never dropped (the whole DB is dropped), ES will treat the restored docs as new ones when reading the oplog.

I just checked that case on my Windows workstation. I think it's a bug.

@richardwilly98
Owner

options/drop_collection will work with a collection drop, but probably not with a database drop.

@lukaszpy
Author

So I think it's a bug and should be corrected, because it leaves the index in an inconsistent state.

@richardwilly98
Owner

Can you please clarify which MongoDB commands you use to drop the database or the collection? I believe dropping a database or a collection does not usually happen in a production environment.

@lukaszpy
Author

To drop the db:

  1. use test-db
  2. db.dropDatabase()

To drop a collection:

  1. use test-db
  2. db.getCollection("test-collection").drop()

(getCollection is needed here because the hyphen in the name breaks the db.test-collection shorthand.)
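
Both commands are recorded in the oplog as command entries ("op": "c"); no per-document delete entries are written for the dropped data. A quick way to see those entries, assuming a replica set:

```
use local
// newest command-type oplog entries (collection drops, dropDatabase, ...)
db.oplog.rs.find({ op: "c" }).sort({ $natural: -1 }).limit(5)
```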

@lukaszpy
Author

I believe this bug could occur in a production environment. For example, we have machine1 and machine2 (running the same application; the databases are duplicated too, each machine has its own MongoDB). For some reason we want to move the data from machine1 to machine2. We connect to machine1 and make a dump of the database. Then we go to machine2, drop the whole DB, and restore the dump made on machine1.
The index state on machine2 will be inconsistent, because ES will get old data from the oplog (even though the collections are empty) plus the new data from the restore. A sketch of the scenario is below.
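
The same scenario with the standard tools (paths are illustrative):

```
# on machine1: dump the database
mongodump --db fis-bps --out /tmp/dump

# on machine2: drop the database in the mongo shell...
#   use fis-bps
#   db.dropDatabase()
# ...then restore the dump
mongorestore --db fis-bps /tmp/dump/fis-bps
```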

@richardwilly98
Owner

@lukaszpy

I will create a new feature request to support drop_database. For reference, this is how the dropDatabase operation shows up in the oplog:

```
{
        "ts" : Timestamp(1380107544, 1),
        "h" : NumberLong("4469577380503976492"),
        "v" : 2,
        "op" : "c",
        "ns" : "mydb97.$cmd",
        "o" : {
                "dropDatabase" : 1
        }
}
```

@richardwilly98
Owner

@lukaszpy I will postpone this feature to release 1.7.2.

  • The coming release 1.7.1 uses a different technique for the initial import, reading the collection data directly (see Suggestion: Initial sync #47).
  • That could be a good workaround for this issue: before restoring the data on machine2, drop the index and the river in ES, then recreate the river once the restore has completed (see the sketch after this list).

Please provide feedback.
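
A sketch of that workaround in curl (river and index names follow the example earlier in this thread):

```
# before the restore on machine2: drop the river and the index
curl -XDELETE "http://localhost:9200/_river/body"
curl -XDELETE "http://localhost:9200/body"

# ...restore the MongoDB dump on machine2...

# after the restore completes: recreate the river with the same _meta document as before
curl -XPUT "http://localhost:9200/_river/body/_meta" -d "{ ... }"
```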

@mahnunchik

Is this way like in mongodb? After large node downtime.

@richardwilly98
Owner

@mahnunchik can you please clarify?

richardwilly98 added a commit that referenced this issue Nov 3, 2013
- with ```options/drop_collection``` the river will also track the ```dropDatabase``` operation