Add oplog metricset to mongodb module #7604

Merged
jsoriano merged 83 commits into elastic:master from appson:dev/add-oplog-metricset Aug 20, 2018

Conversation

@a3dho3yn
Contributor

a3dho3yn commented Jul 15, 2018

Oplog size and oplog window are two important indicators of replication health.
This metricset reports the following replication information:

{"mongodb": {
  "oplog": {
    "size": {
      "allocated": 2605587456,
      "used": 2616684138
    },
    "first": {
       "ts": 6515806468564845000
    },
    "last": {
       "ts": 6578335797915681000
    },
    "window": 62529329350836220
  }
}}
a3dho3yn and others added 5 commits Jul 12, 2018
Hossein Taleghani
@elasticmachine

Collaborator

elasticmachine commented Jul 15, 2018

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

used := event["size"].(common.MapStr)["used"].(int64)
assert.True(t, used > 0)

first_ts := event["first"].(common.MapStr)["ts"].(int64)

@houndci-bot

houndci-bot Jul 15, 2018

don't use underscores in Go names; var first_ts should be firstTs


// get first and last items in the oplog
oplog_iter := collection.Find(nil).Sort("$natural").Iter()
oplog_reverse_iter := collection.Find(nil).Sort("-$natural").Iter()

@houndci-bot

houndci-bot Jul 15, 2018

don't use underscores in Go names; var oplog_reverse_iter should be oplogReverseIter

used := int64(oplogStatus["size"].(float64))

// get first and last items in the oplog
oplog_iter := collection.Find(nil).Sort("$natural").Iter()

@houndci-bot

houndci-bot Jul 15, 2018

don't use underscores in Go names; var oplog_iter should be oplogIter

return false
}

func New(base mb.BaseMetricSet) (mb.MetricSet, error) {

@houndci-bot

houndci-bot Jul 15, 2018

exported function New should have comment or be unexported

mb.DefaultMetricSet())
}

type MetricSet struct {

@houndci-bot

houndci-bot Jul 15, 2018

exported type MetricSet should have comment or be unexported

"gopkg.in/mgo.v2/bson"
)

const oplog_col = "oplog.rs"

@houndci-bot

houndci-bot Jul 15, 2018

don't use underscores in Go names; const oplog_col should be oplogCol

Hossein Taleghani added 5 commits Jul 15, 2018
Hossein Taleghani
Hossein Taleghani
Hossein Taleghani
Hossein Taleghani
Hossein Taleghani
@a3dho3yn a3dho3yn force-pushed the appson:dev/add-oplog-metricset branch from 3638aec to b533b5d Jul 15, 2018
@kvch

Contributor

kvch commented Jul 16, 2018

jenkins test this

@ruflin ruflin requested a review from jsoriano Jul 16, 2018
@ruflin

Collaborator

ruflin commented Jul 16, 2018

Could you add a changelog entry?

Hossein Taleghani and others added 2 commits Jul 16, 2018
Hossein Taleghani
Member

jsoriano left a comment

Thanks for working on this! :) It looks quite good. I have added some comments; the only serious issue is the failing test.

var debugf = logp.MakeDebug("mongodb.oplog")

func init() {
logp.Info("initializing oplog")

@jsoriano

jsoriano Jul 16, 2018

Member

We don't usually log initializations.

--
*`mongodb.oplog.last.ts`*::

@jsoriano

jsoriano Jul 16, 2018

Member

I wouldn't abbreviate timestamp field names. What about first.time or first.timestamp?

}

firstTs := int64(first.(bson.M)["ts"].(bson.MongoTimestamp))
lastTs := int64(last.(bson.M)["ts"].(bson.MongoTimestamp))

@jsoriano

jsoriano Jul 16, 2018

Member

Add checks for type conversions here
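
An unchecked assertion such as first.(bson.M)["ts"].(bson.MongoTimestamp) panics when the value has a different type. A minimal sketch of the checked form, using the two-value "comma ok" assertion so a mismatch becomes an error instead. The types M and MongoTimestamp are local stand-ins for bson.M and bson.MongoTimestamp so the example compiles without the mgo dependency, and timestampOf is a hypothetical helper, not the code that was eventually merged:

```go
package main

import "fmt"

// M and MongoTimestamp are local stand-ins for bson.M and
// bson.MongoTimestamp (assumption: same underlying types).
type M map[string]interface{}
type MongoTimestamp int64

// timestampOf extracts the "ts" field from a decoded oplog entry.
// Each type assertion uses the two-value form, so an unexpected type
// yields an error rather than a runtime panic.
func timestampOf(entry interface{}) (int64, error) {
	doc, ok := entry.(M)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T for oplog entry", entry)
	}
	ts, ok := doc["ts"].(MongoTimestamp)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T for ts field", doc["ts"])
	}
	return int64(ts), nil
}

func main() {
	var good interface{} = M{"ts": MongoTimestamp(42)}
	var bad interface{} = M{"ts": "not a timestamp"}

	for _, entry := range []interface{}{good, bad} {
		ts, err := timestampOf(entry)
		fmt.Println(ts, err)
	}
}
```

Returning an error lets the metricset report a failed fetch for that cycle instead of crashing the whole process on an unexpected document shape.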

// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

@jsoriano

jsoriano Jul 16, 2018

Member

Add this selector so this is executed only when running integration tests:

// +build integration

I think this is the reason why CI builds are failing.

@a3dho3yn

a3dho3yn Jul 16, 2018

Author Contributor

Oh yes! My bad :(

// New creates a new instance of the MetricSet
// Part of new is also setting up the configuration by processing additional
// configuration entries if needed.
func New(base mb.BaseMetricSet) (mb.MetricSet, error) {

@jsoriano

jsoriano Jul 16, 2018

Member

Oh, also add the experimental warning here, something like:

cfgwarn.Experimental("The mongodb oplog metricset is experimental.")
a3dho3yn added 6 commits Jul 16, 2018
@a3dho3yn

Contributor Author

a3dho3yn commented Jul 16, 2018

@jsoriano Thanks for your comments. I've fixed these issues and pushed them to my branch.

bahar-tgv and others added 11 commits Aug 11, 2018
@a3dho3yn

Contributor Author

a3dho3yn commented Aug 13, 2018

@jsoriano We just finished our work, but I have a problem with the integration tests.
When I run make test-module on my machine, everything goes well:

=== RUN   TestFetch
time="2018-08-13T17:31:23+04:30" level=info msg="[0/35] [mongodb]: Starting "
time="2018-08-13T17:31:23+04:30" level=warning msg="Error while reading .dockerignore (/home/ho3yn/go/src/github.com/elastic/beats/metricbeat/module/mongodb/_meta/.dockerignore) : open /home/ho3yn/go/src/github.com/elastic/beats/metricbeat/module/mongodb/_meta/.dockerignore: no such file or directory"
time="2018-08-13T17:31:23+04:30" level=info msg="Building metricbeat_mongodb..."
time="2018-08-13T17:31:23+04:30" level=info msg="Recreating mongodb"
time="2018-08-13T17:31:23+04:30" level=info msg="[1/35] [mongodb]: Started "
--- PASS: TestFetch (3.76s)
        replstatus_integration_test.go:49: mongodb/replstatus event: {"headroom":{"max":null,"min":null},"lag":{"max":null,"min":null},"members":{"arbiter":{"count":0,"hosts":null},"down":{"count":0,"hosts":null},"primary":{"host":"9c951bf3e380:27017","optime":1534165281},"recovering":{"count":0,"hosts":null},"rollback":{"count":0,"hosts":null},"secondary":{"count":0,"hosts":null,"optimes":null},"startup2":{"count":0,"hosts":null},"unhealthy":{"count":0,"hosts":null},"unknown":{"count":0,"hosts":null}},"oplog":{"first":{"timestamp":1534161289},"last":{"timestamp":1534165281},"size":{"allocated":1038090240,"used":38484},"window":3992},"optimes":{"applied":1534165281,"durable":1534165281,"last_committed":1534165281},"server_date":"2018-08-13T17:31:27.055+04:30","set_name":"beats"}
=== RUN   TestData
--- SKIP: TestData (1.13s)
        data_generator.go:44: skip data generation tests
PASS
ok      github.com/elastic/beats/metricbeat/module/mongodb/replstatus   (cached)

But the tests fail in CI with "no reachable servers" :(

@jsoriano

Member

jsoriano commented Aug 14, 2018

I have been trying this out: the tests fail if they run just after the container starts, and they pass if the container was already started beforehand (or in a previous execution). So this can probably be solved by improving the healthcheck, which currently only checks whether the port is open.

On the other hand, I have also seen that the tests only fail in the replstatus metricset, the only one that sets the session mode to strong with mongoSession.SetMode(mgo.Strong, true). I wonder if this is really needed. If this line is removed, the tests pass too.

@a3dho3yn

Contributor Author

a3dho3yn commented Aug 14, 2018

a3dho3yn added 2 commits Aug 17, 2018
@jsoriano

Member

jsoriano commented Aug 17, 2018

jenkins, test this please

Member

jsoriano left a comment

It is looking good, I see it being merged soon 🙂
Only some small comments left.

@@ -0,0 +1,30 @@
{

@jsoriano

jsoriano Aug 17, 2018

Member

Could this file be updated?

func init() {
mb.Registry.MustAddMetricSet("mongodb", "replstatus", New,
mb.WithHostParser(mongodb.ParseURL),
mb.DefaultMetricSet())

@jsoriano

jsoriano Aug 17, 2018

Member

If this requires a replica set to work, maybe it'd be better to make it non-default.

myState, ok := status["myState"].(int)
t.Logf("Mongodb state is %d", myState)
if ok && myState == 1 {
time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable

@jsoriano

jsoriano Aug 17, 2018

Member

Is there any way to detect this stability? 🙂

We can leave it with the sleep for now in any case and revisit it later.

if ok && myState == 1 {
time.Sleep(5 * time.Second) // hack, wait more for replica set to become stable
break
}

@jsoriano

jsoriano Aug 17, 2018

Member

Could we add a sleep after every retry?

@a3dho3yn

a3dho3yn Aug 17, 2018

Author Contributor

Yes we can, but I see no reason to do this.
As you mentioned before, we should wait for some condition instead of sleeping. At first I thought we should wait for a primary node, and I expected that condition to be sufficient for running the test. But in practice I found that it needs something more than a node in the primary state. As I didn't find any condition to wait for, I used this sleep expression.

If you're worried about this loop burning CPU, I should note that the state changes very fast (like 5 3 2 2 1), so it doesn't seem to be an issue.

@jsoriano

jsoriano Aug 20, 2018

Member

Ok, let's leave it like this for now.

a3dho3yn added 2 commits Aug 17, 2018
@jsoriano

Member

jsoriano commented Aug 20, 2018

jenkins, test this

@jsoriano jsoriano merged commit ca8f56b into elastic:master Aug 20, 2018
6 checks passed
CLA Commit author has signed the CLA
Hound No violations found. Woof!
beats-ci Build finished.
codecov/patch 77.89% of diff hit (target 64.79%)
codecov/project Absolute coverage decreased by -0.25% but relative coverage increased by +13.1% compared to 65ef265
continuous-integration/travis-ci/pr The Travis CI build passed
jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 20, 2018
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)
@jsoriano jsoriano added the v6.5.0 label Aug 20, 2018
jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 23, 2018
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)
jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 23, 2018
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)
ruflin added a commit that referenced this pull request Aug 24, 2018
Add metrics about replication health.

Co-authored-by: Hossein Taleghani <a3dho3yn@users.noreply.github.com>
Co-authored-by: Bahar Taghavi <bahareh.t@fanap.plus>
(cherry picked from commit ca8f56b)
7 participants