Add oplog metricset to mongodb module #7604

Merged
merged 83 commits into from Aug 20, 2018

7 participants
@a3dho3yn
Contributor

a3dho3yn commented Jul 15, 2018

Oplog size and window are two important metrics for replication health.
This metricset reports the following replication information:

{"mongodb": {
  "oplog": {
    "size": {
      "allocated": 2605587456,
      "used": 2616684138
    },
    "first": {
      "ts": 6515806468564845000
    },
    "last": {
      "ts": 6578335797915681000
    },
    "window": 62529329350836220
  }
}}
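
For readers unfamiliar with oplog timestamps, here is a minimal Go sketch of how a window in seconds could be derived from two raw `ts` values. It assumes each `ts` is a 64-bit BSON timestamp whose upper 32 bits hold seconds since the Unix epoch and whose lower 32 bits hold an ordinal counter. The helper names `tsSeconds` and `oplogWindow` are hypothetical and are not taken from this PR, and the sample values are illustrative.

```go
package main

import "fmt"

// tsSeconds returns the seconds-since-epoch part of a raw 64-bit BSON
// timestamp (assumed layout: high 32 bits seconds, low 32 bits ordinal).
func tsSeconds(ts int64) int64 {
	return ts >> 32
}

// oplogWindow returns the span in seconds between the first and last
// oplog entries, ignoring the ordinal part of each timestamp.
func oplogWindow(first, last int64) int64 {
	return tsSeconds(last) - tsSeconds(first)
}

func main() {
	// Illustrative raw timestamps: seconds in the high bits, a small
	// ordinal in the low bits.
	first := int64(1534161289)<<32 | 1
	last := int64(1534165281)<<32 | 7
	fmt.Println("window (s):", oplogWindow(first, last))
}
```

Subtracting the raw 64-bit values directly would mix the ordinal bits into the result, which is why the sketch extracts the seconds first.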
@elasticmachine

elasticmachine Jul 15, 2018

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

a3dho3yn added some commits Jul 15, 2018

@kvch

kvch Jul 16, 2018

Contributor

jenkins test this

@ruflin ruflin requested a review from jsoriano Jul 16, 2018

@ruflin

ruflin Jul 16, 2018

Collaborator

Could you add a changelog entry?

a3dho3yn added some commits Jul 16, 2018

@jsoriano

Thanks for working on this! :) It looks quite good. I have added some comments; the only serious issue is the failing test.

@a3dho3yn

a3dho3yn Jul 16, 2018

Contributor

@jsoriano Thanks for your comments. I've fixed these issues and pushed them to my branch.

@jsoriano

jsoriano Jul 17, 2018

Member

@a3dho3yn Unfortunately I cannot see the fixes. Could you check that you pushed them to the branch used for this PR?

@a3dho3yn

a3dho3yn Aug 13, 2018

Contributor

@jsoriano We just finished our work, but I have a problem with the integration tests.
When I run make test-module on my machine, everything passes:

=== RUN   TestFetch
time="2018-08-13T17:31:23+04:30" level=info msg="[0/35] [mongodb]: Starting "
time="2018-08-13T17:31:23+04:30" level=warning msg="Error while reading .dockerignore (/home/ho3yn/go/src/github.com/elastic/beats/metricbeat/module/mongodb/_meta/.dockerignore) : open /home/ho3yn/go/src/github.com/elastic/beats/metricbeat/module/mongodb/_meta/.dockerignore: no such file or directory"
time="2018-08-13T17:31:23+04:30" level=info msg="Building metricbeat_mongodb..."
time="2018-08-13T17:31:23+04:30" level=info msg="Recreating mongodb"
time="2018-08-13T17:31:23+04:30" level=info msg="[1/35] [mongodb]: Started "
--- PASS: TestFetch (3.76s)
        replstatus_integration_test.go:49: mongodb/replstatus event: {"headroom":{"max":null,"min":null},"lag":{"max":null,"min":null},"members":{"arbiter":{"count":0,"hosts":null},"down":{"count":0,"hosts":null},"primary":{"host":"9c951bf3e380:27017","optime":1534165281},"recovering":{"count":0,"hosts":null},"rollback":{"count":0,"hosts":null},"secondary":{"count":0,"hosts":null,"optimes":null},"startup2":{"count":0,"hosts":null},"unhealthy":{"count":0,"hosts":null},"unknown":{"count":0,"hosts":null}},"oplog":{"first":{"timestamp":1534161289},"last":{"timestamp":1534165281},"size":{"allocated":1038090240,"used":38484},"window":3992},"optimes":{"applied":1534165281,"durable":1534165281,"last_committed":1534165281},"server_date":"2018-08-13T17:31:27.055+04:30","set_name":"beats"}
=== RUN   TestData
--- SKIP: TestData (1.13s)
        data_generator.go:44: skip data generation tests
PASS
ok      github.com/elastic/beats/metricbeat/module/mongodb/replstatus   (cached)

But the tests are failing in CI because no servers are reachable :(

@jsoriano

jsoriano Aug 14, 2018

Member

I have been testing this: the tests fail if they run just after the container starts, and pass if the container was already running (or was started in a previous execution). So this can probably be solved by improving the healthcheck, which currently only checks whether the port is open.

On the other hand, I have also seen that the tests only fail in the replstatus metricset, the only one that sets the session mode to strong with mongoSession.SetMode(mgo.Strong, true). I wonder if this is really needed. If this line is removed, the tests pass too.
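
One possible shape for a stricter readiness check, sketched in Go under stated assumptions: instead of only probing the port, ask the server for its replica set status and treat it as ready once `myState` is 1 (PRIMARY in MongoDB's replica set state numbering). The `commandRunner` interface and `isPrimary` helper are hypothetical; the interface is modeled on the `Run(cmd, result)` method of mgo sessions, and the fake runner exists only so the sketch runs without a server. Whether this condition alone is sufficient for the tests is not established here.

```go
package main

import (
	"errors"
	"fmt"
)

// commandRunner is a hypothetical abstraction over a MongoDB session,
// shaped after the Run method of mgo sessions (an assumption).
type commandRunner interface {
	Run(cmd interface{}, result interface{}) error
}

// statePrimary is the replica set member state number for PRIMARY.
const statePrimary = 1

// isPrimary runs replSetGetStatus and reports whether myState is PRIMARY.
func isPrimary(r commandRunner) (bool, error) {
	status := map[string]interface{}{}
	if err := r.Run("replSetGetStatus", &status); err != nil {
		return false, err
	}
	// Drivers may decode numbers into different Go types, so accept the
	// common ones instead of asserting a single type.
	switch v := status["myState"].(type) {
	case int:
		return v == statePrimary, nil
	case int32:
		return v == statePrimary, nil
	case int64:
		return v == statePrimary, nil
	case float64:
		return v == statePrimary, nil
	default:
		return false, errors.New("replSetGetStatus: myState missing or not numeric")
	}
}

// fakeRunner stands in for a real session and always reports one state.
type fakeRunner struct{ state interface{} }

func (f fakeRunner) Run(cmd interface{}, result interface{}) error {
	out, ok := result.(*map[string]interface{})
	if !ok {
		return errors.New("unsupported result type")
	}
	(*out)["myState"] = f.state
	return nil
}

func main() {
	for _, s := range []interface{}{5, 2, 1} {
		ready, err := isPrimary(fakeRunner{state: s})
		fmt.Println(s, ready, err)
	}
}
```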

@a3dho3yn

a3dho3yn Aug 14, 2018

Contributor

a3dho3yn added some commits Aug 17, 2018

@jsoriano

jsoriano Aug 17, 2018

Member

jenkins, test this please

@jsoriano

It is looking good, I see it being merged soon 🙂
Only some small comments left.

@@ -0,0 +1,30 @@
{

@jsoriano

jsoriano Aug 17, 2018

Member

Could this file be updated?

myState, ok := status["myState"].(int)
t.Logf("Mongodb state is %d", myState)
if ok && myState == 1 {
	time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable

@jsoriano

jsoriano Aug 17, 2018

Member

Is there any way to detect this stability? 🙂

We can leave the sleep in for now in any case and revisit it later.

if ok && myState == 1 {
	time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable
	break
}

@jsoriano

jsoriano Aug 17, 2018

Member

Could we add a sleep after every retry?

@a3dho3yn

a3dho3yn Aug 17, 2018

Contributor

Yes, we can, but I see no reason to do this.
As you mentioned before, we should wait for some condition instead of sleeping. At first I thought we should wait for a primary node, and I expected that condition to be sufficient for running the test. But in practice I found that it needs something more than a node in the primary state. As I didn't find any condition to wait for, I used this sleep expression.

If you're worried about the loop burning CPU, I should note that the state changes very quickly (like 5 3 2 2 1), so it doesn't seem to be an issue.

@jsoriano

jsoriano Aug 20, 2018

Member

Ok, let's leave it like this for now.

a3dho3yn added some commits Aug 17, 2018

@jsoriano

jsoriano Aug 20, 2018

Member

jenkins, test this

@jsoriano jsoriano merged commit ca8f56b into elastic:master Aug 20, 2018

6 checks passed

CLA Commit author has signed the CLA
Hound No violations found. Woof!
beats-ci Build finished.
codecov/patch 77.89% of diff hit (target 64.79%)
codecov/project Absolute coverage decreased by -0.25% but relative coverage increased by +13.1% compared to 65ef265
continuous-integration/travis-ci/pr The Travis CI build passed

jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 20, 2018

Add oplog metricset to mongodb module (elastic#7604)
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)

@jsoriano jsoriano added the v6.5.0 label Aug 20, 2018

jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 23, 2018

Add oplog metricset to mongodb module (elastic#7604)
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)

jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 23, 2018

Add oplog metricset to mongodb module (elastic#7604)
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)

ruflin added a commit that referenced this pull request Aug 24, 2018

Add oplog metricset to mongodb module (#7604) (#8019)
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)