Expose metrics through http endpoint #3717

Merged
urso merged 1 commit into elastic:master from ruflin:stats-endpoint on Apr 4, 2017

@ruflin
Collaborator

commented Mar 3, 2017

This PR exposes Beats metrics through a configurable http endpoint. When enabled, it gives insight into a running beat. For security reasons, the endpoint is off by default.

Configuration

The configuration options are in the http namespace. This config naming is borrowed from Logstash. By default, the http endpoint is disabled. If enabled, the metrics are exposed only on localhost, on port 5066.

http.enabled: false
http.host: localhost
http.port: 5066
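
As a rough illustration of how these three settings and their defaults could be represented in Go, here is a minimal sketch. The names `httpConfig`, `defaultHTTPConfig`, and `addr` are hypothetical and chosen for this example; they are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// httpConfig mirrors the three settings of the http namespace.
// The type and field names are hypothetical, chosen for this sketch.
type httpConfig struct {
	Enabled bool
	Host    string
	Port    int
}

// defaultHTTPConfig returns the defaults described above:
// endpoint disabled, bound to localhost on port 5066.
func defaultHTTPConfig() httpConfig {
	return httpConfig{Enabled: false, Host: "localhost", Port: 5066}
}

// addr joins host and port into a listen address.
func (c httpConfig) addr() string {
	return net.JoinHostPort(c.Host, strconv.Itoa(c.Port))
}

func main() {
	cfg := defaultHTTPConfig()
	// Prints the enabled flag and the listen address.
	fmt.Println(cfg.Enabled, cfg.addr())
}
```

Binding to localhost by default means the endpoint is not reachable from other machines even when it is switched on.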

-httpprof ?

The http endpoint can also be enabled in production if needed. The additional httpprof endpoint, which can be enabled through -httpprof, still exists but is recommended only for debugging purposes. The httpprof endpoint exposes many more metrics and much more runtime data than the metrics endpoint.

Endpoints

The current implementation has two endpoints:

  • /: The standard endpoint exposes info about the beat
  • /stats: Stats exposes all metrics collected by monitoring

The output is JSON. The ?pretty flag can be used to get formatted JSON, which is more human readable. Below is an example of each endpoint.

/

{
  "beat": "metricbeat",
  "hostname": "ruflin",
  "name": "ruflin",
  "uuid": "9d6e0c3a-1677-424c-aead-097e597e09f9",
  "version": "6.0.0-alpha1"
}

/stats

{
  "beat": {
    "memstats": {
      "gc_next": 6262080,
      "memory_alloc": 3879968,
      "memory_total": 479284520
    }
  },
  "libbeat": {
    "config": {
      "module": {
        "running": 0,
        "starts": 0,
        "stops": 0
      },
      "reloads": 0
    }
  },
  "metricbeat": {
    "system": {
      "cpu": {
        "events": 7,
        "failures": 0,
        "success": 7
      },
      "filesystem": {
        "events": 28,
        "failures": 0,
        "success": 7
      },
      "fsstat": {
        "events": 7,
        "failures": 0,
        "success": 7
      },
      "load": {
        "events": 7,
        "failures": 0,
        "success": 7
      },
      "memory": {
        "events": 7,
        "failures": 0,
        "success": 7
      },
      "network": {
        "events": 70,
        "failures": 0,
        "success": 7
      },
      "process": {
        "events": 1324,
        "failures": 0,
        "success": 7
      }
    }
  },
  "output": {
    "elasticsearch": {
      "events": {
        "acked": 1445,
        "not_acked": 0
      },
      "publishEvents": {
        "call": {
          "count": 34
        }
      },
      "read": {
        "bytes": 17213,
        "errors": 0
      },
      "write": {
        "bytes": 1185991,
        "errors": 0
      }
    },
    "events": {
      "acked": 1445
    },
    "kafka": {
      "events": {
        "acked": 0,
        "not_acked": 0
      },
      "publishEvents": {
        "call": {
          "count": 0
        }
      }
    },
    "logstash": {
      "events": {
        "acked": 0,
        "not_acked": 0
      },
      "publishEvents": {
        "call": {
          "count": 0
        }
      },
      "read": {
        "bytes": 0,
        "errors": 0
      },
      "write": {
        "bytes": 0,
        "errors": 0
      }
    },
    "messages": {
      "dropped": 0
    },
    "redis": {
      "events": {
        "acked": 0,
        "not_acked": 0
      },
      "read": {
        "bytes": 0,
        "errors": 0
      },
      "write": {
        "bytes": 0,
        "errors": 0
      }
    },
    "write": {
      "bytes": 1185991,
      "errors": 0
    }
  },
  "publisher": {
    "events": {
      "count": 1450
    },
    "queue": {
      "messages": {
        "count": 1450
      }
    }
  }
}

Questions

  • Are these good endpoint paths?
  • What should our default port be?

@ruflin ruflin force-pushed the ruflin:stats-endpoint branch from dc960bf to e0c926f Mar 6, 2017

@urso

Collaborator

commented Mar 14, 2017

  • we should not start a second http server but see how we can unify this with the server started via -httpprof
  • don't use the Do interface, but check out the CollectX functions in the monitoring package. See #3739 for flat and nested snapshot support.
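
To illustrate the difference between a nested and a flat snapshot mentioned in this comment, the sketch below flattens a nested map such as the /stats output into dotted keys. The `flatten` helper is hypothetical and written for this example only; it is not the monitoring package's API.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten copies a nested map into out, joining nested keys with dots.
// Hypothetical helper for illustration, not part of libbeat.
func flatten(prefix string, in map[string]interface{}, out map[string]interface{}) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		// Recurse into sub-maps; store every other value as a leaf.
		if sub, ok := v.(map[string]interface{}); ok {
			flatten(key, sub, out)
			continue
		}
		out[key] = v
	}
}

func main() {
	nested := map[string]interface{}{
		"output": map[string]interface{}{
			"write": map[string]interface{}{"bytes": 1185991, "errors": 0},
		},
		"publisher": map[string]interface{}{
			"events": map[string]interface{}{"count": 1450},
		},
	}
	flat := map[string]interface{}{}
	flatten("", nested, flat)

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, flat[k])
	}
}
```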
@ruflin

Collaborator Author

commented Mar 14, 2017

@urso Regarding the second http server: that is only the case if both -httpprof and the stats endpoint are enabled. In all other cases there are only 0 or 1 http servers, I think. As the two servers serve different purposes from my point of view, I'm OK with that. Enabling -httpprof is only for debugging purposes.

@ruflin ruflin force-pushed the ruflin:stats-endpoint branch 2 times, most recently from 9637d7c to ca6f528 Mar 16, 2017

@ruflin ruflin added review and removed discuss in progress labels Mar 27, 2017

@ruflin ruflin force-pushed the ruflin:stats-endpoint branch 2 times, most recently from 8bd0974 to 8328170 Apr 3, 2017

@@ -6,7 +6,7 @@ coverage:
 default:
 # basic
 target: auto
-threshold: null
+threshold: 0.1

@urso

urso Apr 3, 2017

Collaborator

uhm... correct PR?

@ruflin

ruflin Apr 3, 2017

Author Collaborator

kind of, because otherwise the PR would fail :-) I will keep this one in ...

@@ -938,6 +938,19 @@ output.elasticsearch:
# dashboards and index pattern. Example: testbeat-*
#dashboards.index:

#================================ HTTP Endpoint ======================================
# Each beat can expose internally collected metrics through a http endpoint. For security

@urso

urso Apr 3, 2017

Collaborator

can we get rid of mentioning 'metrics' here?

@ruflin

ruflin Apr 3, 2017

Author Collaborator

done

}()
}

func rootHandler(w http.ResponseWriter, r *http.Request, info common.BeatInfo) {

@urso

urso Apr 3, 2017

Collaborator

alternatively turn rootHandler into a httpHandler function via:

func rootHandler(info common.BeatInfo) func(http.ResponseWriter, *http.Request) {
   ...
}

@ruflin

ruflin Apr 3, 2017

Author Collaborator

done

Expose metrics through http endpoint

@ruflin ruflin force-pushed the ruflin:stats-endpoint branch from 8328170 to e20da6a Apr 3, 2017

@urso urso merged commit f214666 into elastic:master Apr 4, 2017

4 of 5 checks passed

codecov/patch: 0% of diff hit (target 65.3%)
CLA: Commit author has signed the CLA
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
default: Build finished.

ruflin added a commit to ruflin/beats that referenced this pull request Apr 4, 2017

Remove metrics endpoint in winlogbeat
The metrics endpoint is replaced by the http endpoint in libbeat. See elastic#3717

andrewkroh added a commit that referenced this pull request Apr 4, 2017

Remove metrics endpoint in winlogbeat (#3901)
* Remove metrics endpoint in winlogbeat

The metrics endpoint is replaced by the http endpoint in libbeat. See #3717

@ruflin ruflin deleted the ruflin:stats-endpoint branch Apr 4, 2017

@ruflin ruflin added the v6.0.0-alpha1 label Apr 4, 2017

ruflin added a commit to ruflin/beats that referenced this pull request Apr 28, 2017

Deprecate the metrics endpoint in Winlogbeat
The metrics endpoint is replaced by the http endpoint for all beats in 6.0. See elastic#3717

ruflin added a commit to ruflin/beats that referenced this pull request Apr 28, 2017

Deprecate the metrics endpoint in Winlogbeat
The metrics endpoint is replaced by the http endpoint for all beats in 6.0. See elastic#3717

andrewkroh added a commit that referenced this pull request May 1, 2017

Deprecate the metrics endpoint in Winlogbeat (#4145)
The metrics endpoint is replaced by the http endpoint for all beats in 6.0. See #3717

monicasarbu added a commit that referenced this pull request Jun 6, 2017

Automatic merge from 5.4 to 5.x branch (#4449)
* Bump version in the 5.x branch to 5.5.0 (#4007)

* Backport codecov file to 5.x (#4040)

This will make sure builds do not go red on 5.x because of some small diffs in coverage.

* Properly shut down crawler in case one prospector is misconfigured (#4037) (#4048)

If one prospector had already started sending data and a second one was misconfigured, the beat panicked during shutdown. This is now prevented by properly shutting down the crawler on error as well.

Closes #3917
(cherry picked from commit 95195cc)

* Fix link to the MacOSX SDK tarball (#4120) (#4122)

The original download was temporarily down and then it came back up
with a different sha1. Switching to what seems to be a link closer to the
source.

This will require backporting in all branches that need to be built.
(cherry picked from commit 7d15bf3)

* Deprecate the metrics endpoint in Winlogbeat (#4145)

The metrics endpoint is replaced by the http endpoint for all beats in 6.0. See #3717

* elasticsearch: set _type=doc (#3757) (#4191)

The `_type` field is deprecated per
elastic/elasticsearch#15613
(cherry picked from commit bec7603)

* Reduce the number of notifications from travis CI. (#4210) (#4214)

- Disable PR notifications.
- Send failed build notifications.
- Send an update when build transitions from red -> green.
(cherry picked from commit 1844718)

* Fix MongoDB dbstats fields mapping (#4258)

(cherry picked from commit 6b0b077)

* Deprecate document_type in filebeat 5.5 (#4225)

`_type` is removed in elasticsearch 6.0 and `document_type` is removed in filebeat 6.0. We recommend using `fields` instead.

* Ignore permission errors in Metricbeat’s TestFileSystemList (#3562)

The test can fail if some calls to statfs fail due to permission errors. For example:

`stat("/var/lib/docker/aufs/mnt/50d0d5f599f0f19450e7649f73a0e23da1f172048e555df2b1cb78b3fefa355b", 0x7ffd2e5b8ed0) = -1 EACCES (Permission denied)`

* Miscellaneous test fixes

- Fix and enable the python smoke test for heartbeat
- Remove fmt.Printf from metricbeat ceph module
- Fix Windows path issues in libbeat/paths tests
- Fix ioutil.TempDir usage in Packetbeat tests (it broke windows)

* Using single quotes around Windows paths

The thrift test config used double quotes around Windows path separators and this was interpreted incorrectly in YAML parsing.

* Rename TestBadCondition to TestConditions

This test doesn’t actually test any bad conditions. Plus there is another test in the same directory with the name TestBadCondition.

* Remove OS specific error message check from mockbeat

The error message “no such file or directory” is an OS specific error message. There is a different error message on Windows. Simply checking for “error loading config file” should be sufficient.

* Use shorter filename in Filebeat test for Windows

The test was failing on Windows when `os.rename` failed with `[Error 3] The system cannot find the path specified`. The root cause of the failure was that the path was ~260 characters on Jenkins which is greater than the `MAX_PATH` value in Windows. This PR shortens the test log’s name to resolve the issue.

The other changes to normalize the filepath are nice to have for Windows, but not strictly required.

* Add filesystem name to test error message

Errors that are logged by the system/filesystem test case don’t have enough context to debug them. This adds the filesystem that caused the error to the message.

* Less strict error matching in Winlogbeat config_test

Error string testing is brittle. The PR makes the test less stringent by not checking the full error message that includes the Golang stdlib error.

* Use logp.Beta or logp.Experimental in metricsets

And in system tests, centralize the logic for asserting that there are no ERR or WARN in logs.

Filter out errors about “The service process could not connect to the service controller” that occur when testing on Jenkins where Jenkins itself is running as a service. This confuses the Beat because it thinks that it is running as service, but it’s not.

* Fix Winlogbeat test by checking full hostname (#3942) (#4304)

The `computer_name` field in events is the full hostname, but the `win32api.GetComputerName` was returning the shortened netbios name. So the test fails on machines with longer hostnames.
(cherry picked from commit 5c6e623)

* Clean geoip.paths before using the path (#4306)

Use filepath.Clean on the configured paths to fix any invalid OS path separators.

Skip the geoip test with symlinks on Windows (`os.symlink` isn’t supported on Windows).

* Use .go-version to specify the Go version for all CI builds (#4303) (#4307)

Having a simple file that requires no parsing to retrieve the Go version
provides us a standard portable way to know what Go version to use for builds.
It's basically the least common denominator for builds across CI systems
(Jenkins, AppVeyor, Travis) and operating systems.

Also changed AppVeyor to invalidate the cached Go version only when the
.go-version file changes instead of when the .appveyor.yml changes.

* Fix testing env in the 5.x branch. (#4412)

It was set on 5.4.0 BC, which got removed in the meantime.

* Cherry-pick #4378 to 5.x: Fix parsing of interface options with _ (#4334) (#4411)

* Fix parsing of interface options with _ (#4334) (#4378)

In commit 5547060 linting issues
were addressed and variables containing _ were renamed. This broke
the config parsing.

In packetbeat this seems to affect the with_vlans, bpf_filter and
the buffer_size_mb options. Correct it by adding tags for all
documented variables.
(cherry picked from commit 813466d)

* Allow string characters in browser patch version (#4418)

Both for NGINX and Apache logs

(cherry picked from commit a10c1b7)

* Fix type for HAProxy health.last field (#4410) (#4425)

Fixes #4407. Also adds docs for two fields where docs were missing.
(cherry picked from commit a2ea586)

@tsg tsg referenced this pull request Jul 24, 2017

Closed

Document breaking changes in 6.0 #4737

28 of 28 tasks complete

athom added a commit to athom/beats that referenced this pull request Jan 25, 2018

Expose metrics through http endpoint (elastic#3717)

athom added a commit to athom/beats that referenced this pull request Jan 25, 2018

Remove metrics endpoint in winlogbeat (elastic#3901)
* Remove metrics endpoint in winlogbeat

The metrics endpoint is replaced by the http endpoint in libbeat. See elastic#3717

@tsg tsg referenced this pull request Jan 31, 2018

Closed

Beats central monitoring Phase 1 #3422

10 of 10 tasks complete

@russorat russorat referenced this pull request Mar 23, 2018

Open

adding inputs.filebeat plugin #3751

3 of 3 tasks complete