Heartbeat event data structure #3406

Closed
ruflin opened this issue Jan 19, 2017 · 18 comments
Labels
discuss, enhancement, Heartbeat


@ruflin
Member

ruflin commented Jan 19, 2017

This issue is to discuss the heartbeat event data structure. The current structure (with error) looks as follows:

http

{
  "@timestamp": "2017-01-19T08:14:26.256Z",
  "beat": {
    "hostname": "ruflin",
    "name": "ruflin",
    "version": "6.0.0-alpha1"
  },
  "duration": {
    "us": 4955
  },
  "error": {
    "message": "Get http://localhost:80: dial tcp 127.0.0.1:80: getsockopt: connection refused",
    "type": "io"
  },
  "host": "localhost",
  "ip": "127.0.0.1",
  "monitor": "http@http://localhost:80",
  "port": 80,
  "resolve_rtt": {
    "us": 3177
  },
  "scheme": "http",
  "tcp_connect_rtt": {
    "us": 1546
  },
  "type": "http",
  "up": false,
  "url": "http://localhost:80"
}

tcp

{
  "@timestamp": "2017-01-19T08:14:26.256Z",
  "beat": {
    "hostname": "ruflin",
    "name": "ruflin",
    "version": "6.0.0-alpha1"
  },
  "duration": {
    "us": 4969
  },
  "error": {
    "message": "dial tcp 127.0.0.1:12345: getsockopt: connection refused",
    "type": "io"
  },
  "host": "localhost",
  "ip": "127.0.0.1",
  "monitor": "tcp-plain@localhost:12345",
  "port": "12345",
  "resolve_rtt": {
    "us": 3192
  },
  "scheme": "tcp",
  "tcp_connect_rtt": {
    "us": 1651
  },
  "type": "tcp",
  "up": false
}

icmp

{
  "@timestamp": "2017-01-19T08:16:45.003Z",
  "beat": {
    "hostname": "ruflin",
    "name": "ruflin",
    "version": "6.0.0-alpha1"
  },
  "duration": {
    "us": 2145
  },
  "host": "localhost",
  "icmp_rtt": {
    "us": 98
  },
  "ip": "127.0.0.1",
  "monitor": "icmp-host-ip@localhost",
  "resolve_rtt": {
    "us": 1925
  },
  "type": "icmp",
  "up": true
}
ruflin added the discuss and Heartbeat labels Jan 19, 2017
@tsg
Contributor

tsg commented Feb 4, 2017

For what it's worth, the boolean type for the "up" field is kind of a pain in Kibana. I suggest making it (also) a string.

@urso

urso commented Mar 29, 2017

how about changing "up" to "status" with values "up", "down" for now?

@monicasarbu
Contributor

@urso That's a good idea to have status: up or status: down.
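
The rename discussed above could be sketched roughly like this. It is only an illustration in Python, not Heartbeat code, and the helper name `up_to_status` is made up for the example:

```python
def up_to_status(event: dict) -> dict:
    """Return a copy of the event with the boolean "up" replaced by a string "status"."""
    # Copy every field except the old boolean.
    converted = {key: value for key, value in event.items() if key != "up"}
    if "up" in event:
        # Map True/False to the string values suggested in the thread.
        converted["status"] = "up" if event["up"] else "down"
    return converted


print(up_to_status({"type": "icmp", "up": True}))
```

A string value like this can be filtered and aggregated in Kibana like any other keyword field.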

@monicasarbu
Contributor

monicasarbu commented Apr 6, 2017

I suggest combining some fields, for example:

Before:

  "host": "localhost",
  "ip": "127.0.0.1",
  "monitor": "http@http://localhost:80",
  "port": 80,
  "scheme": "http",

After:

"monitor": {
   "scheme": "http",
   "host": "localhost",
   "ip": "127.0.0.1",
   "port": 80,
   "url": "http://localhost:80",
   "id": "http@http://localhost:80" // maybe the name `id` is not the best here
}

I would export either url or the combination of scheme, host, ip and port, but not all of them. I am in favor of not exporting url by default.
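
To make the regrouping concrete, here is a minimal sketch in Python. The function name `nest_monitor_fields` and the `include_url` switch are assumptions for illustration only; nothing like this exists in Heartbeat under these names:

```python
MONITOR_FIELDS = ("scheme", "host", "ip", "port", "url")


def nest_monitor_fields(event: dict, include_url: bool = False) -> dict:
    """Move the connection fields of a flat event under a "monitor" object."""
    flat = dict(event)  # shallow copy so the input is not modified
    monitor = {}
    for name in MONITOR_FIELDS:
        if name in flat:
            value = flat.pop(name)
            if name == "url" and not include_url:
                continue  # url is dropped unless explicitly requested
            monitor[name] = value
    if "monitor" in flat:
        # The old top-level "monitor" string becomes monitor.id.
        monitor["id"] = flat.pop("monitor")
    flat["monitor"] = monitor
    return flat


before = {
    "host": "localhost",
    "ip": "127.0.0.1",
    "monitor": "http@http://localhost:80",
    "port": 80,
    "scheme": "http",
    "url": "http://localhost:80",
}
print(nest_monitor_fields(before))
```

With `include_url=False` the url is left out, which matches the "not exporting url by default" preference; passing `include_url=True` keeps it.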

@monicasarbu
Contributor

I would also suggest combining the RTT-related times. A possible option:

Before:

  "duration": {
    "us": 4955
  },
  "resolve_rtt": {
    "us": 3177
  },
  "tcp_connect_rtt": {
    "us": 1546
  },

After:

"rtt": {
  "total": {
    "us": 4955
  },
  "resolve": {
    "us": 3177
  },
  "connect": {
    "us": 1546
  }
}
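
As a rough sketch of this grouping (Python, for illustration only; `RTT_RENAMES` and `group_rtt` are hypothetical names and the mapping simply mirrors the before/after above):

```python
# Old top-level field name -> new name under "rtt", as in the example above.
RTT_RENAMES = {
    "duration": "total",
    "resolve_rtt": "resolve",
    "tcp_connect_rtt": "connect",
}


def group_rtt(event: dict) -> dict:
    """Collect the timing fields of a flat event under a single "rtt" object."""
    grouped = {}
    rtt = {}
    for key, value in event.items():
        if key in RTT_RENAMES:
            rtt[RTT_RENAMES[key]] = value
        else:
            grouped[key] = value
    if rtt:
        grouped["rtt"] = rtt
    return grouped


before = {
    "duration": {"us": 4955},
    "resolve_rtt": {"us": 3177},
    "tcp_connect_rtt": {"us": 1546},
    "type": "http",
}
print(group_rtt(before))
```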

@monicasarbu
Contributor

For HTTP, I'm not sure it makes sense to export both monitor and url; one of them should be sufficient.

@ruflin
Member Author

ruflin commented Apr 6, 2017

I like the idea of having the "common" fields under the monitor namespace. We should organise the other fields the same way we do for metricbeat and filebeat modules, except that there seems to be only "module" and nothing like "fileset/metricset". So all special http fields would be under http and icmp under icmp. @urso Do you see an additional nesting level in the future with some potential features?

We will also need monitor.type to store which monitor was used.

@andrewkroh
Member

andrewkroh commented Apr 6, 2017

I would keep the url in http events and change the Kibana type to "url" so that the value becomes a hyperlink. It's handy to have a clickable link directly in Kibana. The url is also useful when generating alerts from Watcher, since you can include a clickable link in an email or Slack message.

[Image: hyperlink]

@urso

urso commented Apr 7, 2017

+1 on keeping URL. users can pass additional GET parameters right in the URL.

duration is not related to rtt, duration is total runtime of the monitor to produce an event.

Note, monitor.id is basically redundant. It's somewhat nice for filtering in Kibana, just having one field to filter on, but it's not really required.

The status fields will be "up" and "down" by default. Maybe we want to add some more status types here: check_fail (connect OK, but user-configurable checks failed), ...

Or have status more detailed like (reason is optional and only added if "status.type"=="down"):

"status": {
  "type": "up", // "down"
  "reason": "TLS handshake" // "DNS resolve", "Check X failed", ...
}

This way, there is some overlap between error and status. The difference is that status.reason will be curated by us (broader categories), while error.message will contain the more detailed information.
For example, for an EOF during a TLS handshake failure (e.g. the remote closed the connection due to failed client authentication) we might see:

"error": {
  "type": "io",
  "message": "http: TLS handshake error from 127.0.0.1:5253: EOF"
},
"status": {
  "type": "down",
  "reason": "TLS handshake failed"
}
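
A minimal sketch of how the two objects could be produced together, assuming a hypothetical helper `build_status` (Python, for illustration only). The curated reason is passed in by the caller, while the error object is kept untouched:

```python
from typing import Optional


def build_status(error: Optional[dict], reason: Optional[str] = None) -> dict:
    """Derive the status object; "reason" is only added when the status is "down"."""
    if error is None:
        return {"type": "up"}
    status = {"type": "down"}
    if reason is not None:
        # Curated, broad category; the detailed text stays in the error object.
        status["reason"] = reason
    return status


error = {
    "type": "io",
    "message": "http: TLS handshake error from 127.0.0.1:5253: EOF",
}
event = {"error": error, "status": build_status(error, "TLS handshake failed")}
print(event)
```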

New event structure:

{
  "@timestamp": "2017-01-19T08:14:26.256Z",
  "beat": {
    "hostname": "hostname",
    "name": "hostname",
    "version": "6.0.0-alpha1"
  },
  "monitor": {
     "name": "apache",
     "scheme": "https",
     "host": "localhost",
     "ip": "127.0.0.1",
     "port": 80,
     "url": "https://localhost:80",
     "id": "apache@https://localhost:80"
  },
  "duration": { "us": 12345 }, // duration monitor was active (includes waiting due to configured scheduler limits)
  "status": { "type": "up" },
  "rtt": {
     "resolve": {
        "us": 3177
     },
     "tcp_connect": {
        "us": 1546
     }
  }
}

Currently available rtt types: rtt.icmp, rtt.tls_handshake, rtt.tcp_connect, rtt.socks5_connect, rtt.validate.

@ruflin
Member Author

ruflin commented Apr 7, 2017

As status and duration seem to exist for all monitors, I would also put them under the monitor namespace: monitor.status.type: up and monitor.duration.us: 123

The rtt fields seem to me the (only?) ones that are monitor specific, so they should be in the namespace of the monitor, meaning: tcp.connect.ms, http.rtt.resolve.ms, icmp.rtt.resolve.ms. In case these are also shared, they should also go under monitor.
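
For readers less used to the dotted notation: a dotted field name is just a path into the nested JSON event. A small Python sketch (the helper `expand_dotted` is hypothetical and only meant as illustration):

```python
def expand_dotted(fields: dict) -> dict:
    """Turn {"a.b.c": value} style keys into nested dictionaries."""
    nested = {}
    for dotted_key, value in fields.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on demand. This sketch assumes no key
            # is both a leaf and a parent (e.g. "a" and "a.b" together).
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


print(expand_dotted({"monitor.status.type": "up", "monitor.duration.us": 123}))
```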

@urso

urso commented Apr 7, 2017

Including 'nesting' by protocol, I'd remove the rtt namespace and introduce a per protocol namespace (different protocols/monitors might use same namespace).

some samples:

"resolve": {
  "rtt": { "us": 12345 }
}

"icmp": {
  "rtt": { "us": 12345 },
  "requests": 5 // total number of Echo Requests
}

"tcp": {
  "rtt": {
    "connect": ...,
    "validate": ...
  }
}

"tls": {
  "rtt": { "handshake": ... },
  "status": {
    "version": "TLS 3.0",
    "cipher": ...
  },
  "cert": {
    "expires": ...,
    ... // more certificate meta data?
  }
}

"http": {
  "rtt": { "validate": ... },
  "request": {
    "method": "GET",
    ...  // request metadata ?
  },
  "response": {
    "code": 200,
    ... // response metadata (e.g. report selected headers, body size, ...)
  }
}

@urso

urso commented Apr 7, 2017

we can report protocol info either by monitor, or top-level.

@ruflin
Member Author

ruflin commented Apr 7, 2017

Seems like we are mostly on the same page :-) +1 on per monitor.

@urso

urso commented Apr 7, 2017

with nesting and moving monitor specific fields, http event will look like:

{
  "@timestamp": "2017-01-19T08:14:26.256Z",
  "beat": {
    "hostname": "hostname",
    "name": "hostname",
    "version": "6.0.0-alpha1"
  },
  "monitor": {
     "name": "apache",
     "host": "localhost",
     "ip": "127.0.0.1",

     "duration": { "us": 12345 },
     "status": { "type": "up" }
  },
  "http": {
    "scheme": "http",
    "url": "https://localhost:80",
    "rtt": {
      "validate": { "us": 12345 }
    },
    "request": {
      "method": "GET"
    },
    "response": {
      "status": {
        "code": 200,
        "message": "OK"
      }
    }
  },
  "tcp": {
    "port": 80,
    "rtt": {
      "connect": { "us": 12345 }
    }
  },
  "resolve": {
    "host": "localhost",
    "ip": "127.0.0.1",
    "rtt": { "us": 12345 }
  }
}
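
To show how the original flat http event from the top of this issue would map onto the nested layout, here is a rough Python sketch. The function `restructure_http_event` is hypothetical; it only covers fields present in the old event, so `monitor.name`, `http.rtt.validate` and the request/response objects are left out because the old event has no source for them:

```python
def restructure_http_event(old: dict) -> dict:
    """Map a flat (old-style) http event onto the nested per-protocol layout."""
    new = {
        "@timestamp": old["@timestamp"],
        "beat": old["beat"],
        "monitor": {
            "host": old["host"],
            "ip": old["ip"],
            "duration": old["duration"],
            "status": {"type": "up" if old["up"] else "down"},
        },
        "http": {"scheme": old["scheme"], "url": old["url"]},
        "tcp": {"port": old["port"]},
        "resolve": {"host": old["host"], "ip": old["ip"]},
    }
    # The rtt values are optional: a failed check may not have measured them.
    if "tcp_connect_rtt" in old:
        new["tcp"]["rtt"] = {"connect": old["tcp_connect_rtt"]}
    if "resolve_rtt" in old:
        new["resolve"]["rtt"] = old["resolve_rtt"]
    if "error" in old:
        new["error"] = old["error"]
    return new


old_event = {
    "@timestamp": "2017-01-19T08:14:26.256Z",
    "beat": {"hostname": "ruflin", "name": "ruflin", "version": "6.0.0-alpha1"},
    "duration": {"us": 4955},
    "error": {
        "message": "Get http://localhost:80: dial tcp 127.0.0.1:80: getsockopt: connection refused",
        "type": "io",
    },
    "host": "localhost",
    "ip": "127.0.0.1",
    "monitor": "http@http://localhost:80",
    "port": 80,
    "resolve_rtt": {"us": 3177},
    "scheme": "http",
    "tcp_connect_rtt": {"us": 1546},
    "type": "http",
    "up": False,
    "url": "http://localhost:80",
}
new_event = restructure_http_event(old_event)
print(new_event["monitor"])
```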

@urso urso mentioned this issue Apr 7, 2017
@ruflin
Member Author

ruflin commented Apr 7, 2017

👍 on the proposal above. resolve namespace we can deal with as soon as we start using it.

@monicasarbu
Contributor

monicasarbu commented Apr 7, 2017

👍 on the above proposal, but I suggest exporting only a minimal set of fields by default. For example, I would not export http.request and http.response by default, and I would not export status.reason if error.message is present anyway.
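
A sketch of what such a default trimming step might look like, assuming the nested layout proposed above with status under monitor (Python, illustration only; `minimal_event` is a made-up name):

```python
import copy


def minimal_event(event: dict) -> dict:
    """Return a trimmed copy of a nested event with only the default fields."""
    slim = copy.deepcopy(event)  # never modify the caller's event
    http = slim.get("http", {})
    # Request/response details are not exported by default.
    http.pop("request", None)
    http.pop("response", None)
    # status.reason is redundant when a detailed error message is present.
    if "message" in slim.get("error", {}):
        slim.get("monitor", {}).get("status", {}).pop("reason", None)
    return slim


full = {
    "monitor": {"status": {"type": "down", "reason": "TLS handshake failed"}},
    "http": {"url": "https://localhost:80", "request": {"method": "GET"}},
    "error": {"type": "io", "message": "http: TLS handshake error from 127.0.0.1:5253: EOF"},
}
print(minimal_event(full))
```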

@ruflin
Member Author

ruflin commented Apr 7, 2017

In general it looks like the data structure will compress quite well as the only values that change for each event are probably rtt and timestamp.
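
A quick way to get a feeling for this is to compress a batch of synthetic events in which only the timestamp and one rtt value change. The sketch below uses Python's `zlib` purely as a stand-in; it says nothing about how the events are actually stored or shipped, and the event contents are invented for the test:

```python
import json
import zlib


def make_event(i: int) -> dict:
    """Build a synthetic event; only the timestamp and the rtt vary with i."""
    return {
        "@timestamp": f"2017-01-19T08:{(i // 60) % 60:02d}:{i % 60:02d}.000Z",
        "monitor": {
            "name": "apache",
            "host": "localhost",
            "ip": "127.0.0.1",
            "status": {"type": "up"},
        },
        "tcp": {"port": 80, "rtt": {"connect": {"us": 1000 + (i * 37) % 500}}},
    }


events = [make_event(i) for i in range(500)]
# One JSON document per line, similar to a bulk payload.
raw = "\n".join(json.dumps(e, sort_keys=True) for e in events).encode("utf-8")
compressed = zlib.compress(raw)
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.2f}")
```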

ruflin added a commit to ruflin/beats that referenced this issue Apr 19, 2017
This is a first take on changing the event structure to elastic#3406 (comment)

* move up to status -> up / down. In the example it is status.type: up? Reason?
This was referenced Apr 21, 2017
@urso

urso commented May 18, 2017

This has been implemented in #4091.


5 participants