[heartbeat] States and Improved Errors #30632

Merged
merged 129 commits into from Sep 13, 2022
Changes from 92 commits
Commits
dee25ff
Moar intervals
andrewvc Jul 4, 2019
a3a7cbd
Checkpoint
andrewvc Jul 8, 2019
72fab36
Checkpoint
andrewvc Jul 9, 2019
22e7c12
Checkpoint
andrewvc Jul 11, 2019
aab5d8a
Merge remote-tracking branch 'origin/master' into intervals
andrewvc Sep 13, 2019
b46f81d
[Heartbeat] Report next_run info per event
andrewvc Sep 13, 2019
e90a046
Add changelog
andrewvc Sep 13, 2019
cdeac64
Incorporate PR feedback
andrewvc Sep 18, 2019
077be58
Checkpoint
andrewvc Oct 28, 2019
b06323f
Checkpoint
andrewvc Oct 28, 2019
a371f6e
Just report the timespan
andrewvc Nov 26, 2019
9c8dbe5
Merge remote-tracking branch 'origin/master' into next-run-range
andrewvc Nov 26, 2019
3e4e8ea
fix tests
andrewvc Nov 26, 2019
c9b39cb
fix relnote
andrewvc Nov 26, 2019
a00aa0f
Fmt
andrewvc Nov 26, 2019
e1d0c65
Tweaks
andrewvc Nov 26, 2019
da1fde1
Factor timeout into timespans
andrewvc Nov 26, 2019
4e08b3d
fmt
andrewvc Nov 26, 2019
51bffaa
Merge remote-tracking branch 'origin/master' into next-run-range
andrewvc Dec 8, 2019
f74c85d
Merge remote-tracking branch 'origin/master' into next-run-range
andrewvc Dec 11, 2019
1a7c1f4
Merge remote-tracking branch 'origin/master' into next-run-range
andrewvc Dec 12, 2019
0150408
Don't require docs on date_range sub-keys
andrewvc Dec 12, 2019
95acb3c
Remove print
andrewvc Dec 12, 2019
d565042
Merge remote-tracking branch 'origin/master' into next-run-range
andrewvc Dec 13, 2019
7b2162a
FMT
andrewvc Dec 13, 2019
0b28b6f
Merge remote-tracking branch 'origin/master' into intervals
andrewvc Dec 16, 2019
8feb085
Checkpoint
andrewvc Dec 16, 2019
9232921
Merge remote-tracking branch 'origin/master' into next-run-range
andrewvc Dec 16, 2019
7bf22b5
Merge branch 'next-run-range' into intervals
andrewvc Dec 16, 2019
89563ea
Merge remote-tracking branch 'origin/master' into intervals
andrewvc Dec 16, 2019
9cadda9
Merge remote-tracking branch 'origin/master' into intervals
andrewvc Aug 11, 2020
e744853
Merge commit '9cadda912e93fa0a21bf02e1164dcf96e0b9c606' into tv2
andrewvc Mar 2, 2022
67a4191
More
andrewvc Mar 10, 2022
5a15d3c
[DOCS] Removed reference to the Stack GS (#32083)
debadair Jun 23, 2022
a55ae58
[DOCS] Removed reference to the Stack GS (#32119)
debadair Jun 27, 2022
9cbb78e
Merge branch 'main', remote-tracking branch 'origin' into tv2
andrewvc Jul 6, 2022
de4f4de
Unblock blocked monitors test
andrewvc Jul 6, 2022
1814865
Works for browsers
andrewvc Jul 6, 2022
e0c8bc5
Almost working
andrewvc Jul 7, 2022
8bf1b05
more is working
andrewvc Jul 7, 2022
6751721
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Aug 8, 2022
83e26b6
Clean-up initialization of ES client
andrewvc Aug 8, 2022
844460c
Many cleanups
andrewvc Aug 8, 2022
3a1d963
Checkpoint for flapping refactor
andrewvc Aug 8, 2022
21d6062
Checkpoint
andrewvc Aug 9, 2022
26eac4e
Fix tracker, add basic tracker tests, plus type for loading past state.
andrewvc Aug 10, 2022
9af04c3
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Aug 11, 2022
5fdba6c
Checkpoint
andrewvc Aug 11, 2022
4da0d7e
Checkpoint
andrewvc Aug 11, 2022
6f4c20a
ES kinda works in testing.
andrewvc Aug 11, 2022
4e4af45
checkpoint
andrewvc Aug 11, 2022
ec133a5
Cleanup and refactor
andrewvc Aug 12, 2022
5b7c268
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Aug 12, 2022
bed5fbf
Changelog
andrewvc Aug 12, 2022
d42b383
Cleanup dev tools usage
andrewvc Aug 12, 2022
8ac7d40
Group imports
andrewvc Aug 12, 2022
d6ed347
Cleanups
andrewvc Aug 12, 2022
79ac24a
Cleanups
andrewvc Aug 12, 2022
102db33
Cleanups
andrewvc Aug 12, 2022
c7178cc
Cleanups
andrewvc Aug 12, 2022
a7ff0e2
Cleanups
andrewvc Aug 12, 2022
1ab5ff5
Cleanups
andrewvc Aug 12, 2022
1163d53
Cleanups
andrewvc Aug 12, 2022
cf99c89
Update and add run xml
andrewvc Aug 12, 2022
007b607
Cleanups
andrewvc Aug 12, 2022
701f48c
More inclusive index pattern
andrewvc Aug 12, 2022
00199f8
More inclusive index pattern
andrewvc Aug 12, 2022
0247783
Tweaks
andrewvc Aug 12, 2022
55ae272
Tweaks
andrewvc Aug 12, 2022
ce39e3f
Tweaks for flapping
andrewvc Aug 12, 2022
cfb33c0
Fix infinite storage growth
andrewvc Aug 12, 2022
6a2e430
Fix infinite storage growth
andrewvc Aug 12, 2022
372281d
Add tests for transitions
andrewvc Aug 12, 2022
4eff88d
FMT
andrewvc Aug 12, 2022
4262352
FMT
andrewvc Aug 12, 2022
9723532
Update and refine ECS types / checks to more precisely test ECS errors
andrewvc Aug 15, 2022
f22b067
Fix state ends
andrewvc Aug 15, 2022
c480df1
Fix test failures
andrewvc Aug 15, 2022
7565406
Use error codes for most test situations
andrewvc Aug 15, 2022
56a0cd2
Fixup connection errors in HTTP tests
andrewvc Aug 15, 2022
2af1a0a
Fix broken HTTP errors
andrewvc Aug 15, 2022
8e67dd9
Make linter happy
andrewvc Aug 16, 2022
107a52b
Make linter happy, remove runner XML
andrewvc Aug 17, 2022
bff7ac8
Make linter happy
andrewvc Aug 17, 2022
55cbff0
Make linter happy
andrewvc Aug 17, 2022
363a6f8
Fix integration test targeting
andrewvc Aug 17, 2022
8500f22
Make linter happy
andrewvc Aug 17, 2022
131e187
Integration tests fixed in non-xpack heartbeat
andrewvc Aug 17, 2022
c446705
Add empty pythonIntegTest mage targets to make CI happy
andrewvc Aug 17, 2022
43dd0fc
Remove flap history field from state.ends
andrewvc Aug 17, 2022
a05c1fd
Fix nesting of state.ends
andrewvc Aug 17, 2022
c48d71a
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Aug 17, 2022
e298ac6
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Aug 29, 2022
54c7705
Add geo config
andrewvc Aug 29, 2022
c8f8a82
Initial tests and functionality for geo per monitor
andrewvc Aug 29, 2022
0211a38
ES loader now supports locations
andrewvc Aug 29, 2022
73b435d
FMT
andrewvc Aug 29, 2022
315b1cf
Disable flapping by default, add tests for this
andrewvc Aug 30, 2022
305c837
Fix broken integ tests, refactor integ framework
andrewvc Aug 30, 2022
a27c7ec
Initial work for scenario tests with ES
andrewvc Aug 30, 2022
7f010b1
Format
andrewvc Aug 30, 2022
eb0de76
Checkpoint
andrewvc Aug 30, 2022
3e285df
Fix stateloader scenario tests
andrewvc Aug 30, 2022
3eede87
Cleanup framework
andrewvc Aug 30, 2022
8ff8267
Make linter happy
andrewvc Aug 30, 2022
fdc4507
Make linter happy
andrewvc Aug 30, 2022
2fbe2e0
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Aug 31, 2022
27435f0
Fix windows targetting
andrewvc Aug 31, 2022
3739f24
Restrict all browser source from win builds
andrewvc Aug 31, 2022
e9386a4
Fix win deps
andrewvc Aug 31, 2022
1194b06
Merge branch 'main' into tv2
andrewvc Sep 1, 2022
2e952f1
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Sep 6, 2022
793eb39
Incorporate PR feedback
andrewvc Sep 7, 2022
aa5e7dc
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Sep 7, 2022
b9f7590
Apply suggestions from code review
andrewvc Sep 7, 2022
cb8bce3
Merge remote-tracking branch 'andrewvc/tv2' into tv2
andrewvc Sep 7, 2022
70bd444
Update heartbeat/monitors/wrappers/monitorstate/tracker.go
andrewvc Sep 7, 2022
c11982f
Incorporate PR feedback
andrewvc Sep 7, 2022
903c841
Merge remote-tracking branch 'andrewvc/tv2' into tv2
andrewvc Sep 7, 2022
ee08dc7
Remove unnecessary state loader assignment
andrewvc Sep 8, 2022
eed6a89
Remove browser from win tests
andrewvc Sep 8, 2022
6ecac3e
Fix state loader to only use ES state loader with ES output
andrewvc Sep 9, 2022
bc39d5a
Don't run integ tests on windows
andrewvc Sep 9, 2022
045969b
Revert "ci: enable windows for testing heartbeat (#32937)"
andrewvc Sep 12, 2022
779da1b
Rename monitor.location to monitor.run_from and add tests for observe…
andrewvc Sep 12, 2022
04d29e5
Merge branch 'main' into tv2
andrewvc Sep 13, 2022
a404afa
Incorporate PR feedback
andrewvc Sep 13, 2022
816734d
Merge remote-tracking branch 'origin/main' into tv2
andrewvc Sep 13, 2022
34783dc
Merge remote-tracking branch 'andrewvc/tv2' into tv2
andrewvc Sep 13, 2022
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
Expand Up @@ -136,6 +136,7 @@ https://github.com/elastic/beats/compare/v8.2.0\...main[Check the HEAD diff]


*Heartbeat*
- Add new states field for internal use by new synthetics app. {pull}30632[30632]


*Metricbeat*
Expand Down
69 changes: 67 additions & 2 deletions heartbeat/_meta/fields.common.yml
Expand Up @@ -65,7 +65,6 @@
type: keyword
description: >
Indicator if monitor could validate the service to be available.

- name: check_group
type: keyword
description: >
Expand All @@ -89,11 +88,77 @@
- name: id
type: keyword
description: Project ID
fields:
- name: name
type: keyword
description: Project name

- key: state
title: "Monitor state"
description: state related fields
fields:
- name: state
type: group
description: "Present in the last event emitted during a check. If a monitor checks multiple endpoints, as is the case with `mode: all`."
fields:
- name: id
type: keyword
description: >
ID of this state
- name: started_at
type: date
description: >
First time state with this ID was seen
- name: duration_ms
type: long
description: >
Length of time this state has existed in millis
- name: status
type: keyword
description: >
The current status: "up", "down", or "flapping".
Any state can change into flapping.
- name: checks
type: integer
description: total checks run
- name: up
type: integer
description: total up checks run
- name: down
type: integer
description: total down checks run
- name: flap_history
enabled: false
- name: ends
type: object
description: the state that was ended by this state
fields:
- name: id
type: integer
description: >
ID of this state
- name: started_at
type: date
description: >
First time state with this ID was seen
- name: duration_ms
type: long
description: >
Length of time this state has existed in millis
- name: status
type: keyword
description: >
The current status: "up", "down", or "flapping".
Any state can change into flapping.
- name: checks
type: integer
description: total checks run
- name: up
type: integer
description: total up checks run
- name: down
type: integer
description: total down checks run

- key: summary
title: "Monitor summary"
description:
Expand Down
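The `state` group above can be read as a small record that is replaced whenever a monitor's status changes, with `ends` carrying the state that was just closed. The following is a minimal Go sketch of that reading, not code from this PR: the `State` type, the `record` and `replay` helpers, the ID format, and the choice to report `ends` only on the transition itself are all assumptions made for illustration, and flapping is not modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// State mirrors the field names of the `state` group in fields.common.yml.
// The Go type and JSON tags are hypothetical, not the PR's own definitions.
type State struct {
	ID         string    `json:"id"`
	StartedAt  time.Time `json:"started_at"`
	DurationMs int64     `json:"duration_ms"`
	Status     string    `json:"status"`
	Checks     int       `json:"checks"`
	Up         int       `json:"up"`
	Down       int       `json:"down"`
	Ends       *State    `json:"ends,omitempty"`
}

// record applies one check result observed at time now.
// If the status is unchanged, the previous state continues and its counters grow.
// If the status changed (or there is no previous state), a new state starts and
// Ends holds a one-level copy of the state it replaced.
func record(prev *State, status string, now time.Time) State {
	var cur State
	if prev != nil && prev.Status == status {
		cur = *prev
		cur.Ends = nil // assumption: Ends is only reported on the transition itself
	} else {
		// Hypothetical ID format; the real ID scheme is not shown in this diff.
		cur = State{ID: fmt.Sprintf("%s-%d", status, now.UnixMilli()), StartedAt: now, Status: status}
		if prev != nil {
			ended := *prev
			ended.Ends = nil // keep a single level so documents stay bounded in size
			cur.Ends = &ended
		}
	}
	cur.Checks++
	if status == "up" {
		cur.Up++
	} else {
		cur.Down++
	}
	cur.DurationMs = now.Sub(cur.StartedAt).Milliseconds()
	return cur
}

// replay feeds a sequence of statuses through record, one second apart,
// starting at the Unix epoch, and returns the final state.
func replay(statuses []string) State {
	var cur *State
	now := time.Unix(0, 0).UTC()
	for _, st := range statuses {
		next := record(cur, st, now)
		cur = &next
		now = now.Add(time.Second)
	}
	if cur == nil {
		return State{}
	}
	return *cur
}

func main() {
	s := replay([]string{"up", "up", "down"})
	out, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In this sketch the third check starts a new "down" state whose `ends` field describes the two-check "up" state it replaced, which is one plausible way to interpret "the state that was ended by this state".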
66 changes: 53 additions & 13 deletions heartbeat/beater/heartbeat.go
Expand Up @@ -20,26 +20,27 @@ package beater
import (
"errors"
"fmt"

"syscall"
"time"

"github.com/elastic/beats/v7/libbeat/publisher/pipeline"

conf "github.com/elastic/elastic-agent-libs/config"
"github.com/elastic/elastic-agent-libs/logp"
"github.com/elastic/go-elasticsearch/v8"

"github.com/elastic/beats/v7/heartbeat/config"
"github.com/elastic/beats/v7/heartbeat/hbregistry"
"github.com/elastic/beats/v7/heartbeat/monitors"
"github.com/elastic/beats/v7/heartbeat/monitors/plugin"
"github.com/elastic/beats/v7/heartbeat/monitors/wrappers/monitorstate"
"github.com/elastic/beats/v7/heartbeat/scheduler"
_ "github.com/elastic/beats/v7/heartbeat/security"
"github.com/elastic/beats/v7/libbeat/autodiscover"
"github.com/elastic/beats/v7/libbeat/beat"
"github.com/elastic/beats/v7/libbeat/cfgfile"
"github.com/elastic/beats/v7/libbeat/common/reload"
"github.com/elastic/beats/v7/libbeat/management"

_ "github.com/elastic/beats/v7/heartbeat/security"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline"
)

// Heartbeat represents the root datastructure of this beat.
Expand All @@ -49,12 +50,29 @@ type Heartbeat struct {
config config.Config
scheduler *scheduler.Scheduler
monitorReloader *cfgfile.Reloader
dynamicFactory *monitors.RunnerFactory
monitorFactory *monitors.RunnerFactory
autodiscover *autodiscover.Autodiscover
}

type EsConfig struct {
Hosts []string `config:"hosts"`
Username string `config:"username"`
Password string `config:"password"`
}

// New creates a new heartbeat.
func New(b *beat.Beat, rawConfig *conf.C) (beat.Beater, error) {
esc, err := getESClient(b.Config.Output.Config())
if err != nil {
return nil, err
}
var stateLoader monitorstate.StateLoader
if esc != nil {
stateLoader = monitorstate.MakeESLoader(esc, "synthetics-*,heartbeat-*")
} else {
stateLoader = monitorstate.NilStateLoader
}

parsedConfig := config.DefaultConfig
if err := rawConfig.Unpack(&parsedConfig); err != nil {
return nil, fmt.Errorf("error reading config file: %w", err)
Expand Down Expand Up @@ -89,8 +107,9 @@ func New(b *beat.Beat, rawConfig *conf.C) (beat.Beater, error) {
done: make(chan struct{}),
config: parsedConfig,
scheduler: sched,
// dynamicFactory is the factory used for dynamic configs, e.g. autodiscover / reload
dynamicFactory: monitors.NewFactory(b.Info, sched.Add, plugin.GlobalPluginsReg, pipelineClientFactory),
// monitorFactory is the factory used for creating all monitor instances,
// wiring them up to everything needed to actually execute.
monitorFactory: monitors.NewFactory(b.Info, sched.Add, stateLoader, plugin.GlobalPluginsReg, pipelineClientFactory),
}
return bt, nil
}
Expand Down Expand Up @@ -158,7 +177,7 @@ func (bt *Heartbeat) Run(b *beat.Beat) error {
func (bt *Heartbeat) RunStaticMonitors(b *beat.Beat) (stop func(), err error) {
runners := make([]cfgfile.Runner, 0, len(bt.config.Monitors))
for _, cfg := range bt.config.Monitors {
created, err := bt.dynamicFactory.Create(b.Publisher, cfg)
created, err := bt.monitorFactory.Create(b.Publisher, cfg)
if err != nil {
if errors.Is(err, monitors.ErrMonitorDisabled) {
logp.L().Infof("skipping disabled monitor: %s", err)
Expand All @@ -182,21 +201,21 @@ func (bt *Heartbeat) RunStaticMonitors(b *beat.Beat) (stop func(), err error) {

// RunCentralMgmtMonitors loads any central management configured configs.
func (bt *Heartbeat) RunCentralMgmtMonitors(b *beat.Beat) {
mons := cfgfile.NewRunnerList(management.DebugK, bt.dynamicFactory, b.Publisher)
mons := cfgfile.NewRunnerList(management.DebugK, bt.monitorFactory, b.Publisher)
reload.Register.MustRegisterList(b.Info.Beat+".monitors", mons)
inputs := cfgfile.NewRunnerList(management.DebugK, bt.dynamicFactory, b.Publisher)
inputs := cfgfile.NewRunnerList(management.DebugK, bt.monitorFactory, b.Publisher)
reload.Register.MustRegisterList("inputs", inputs)
}

// RunReloadableMonitors runs the `heartbeat.config.monitors` portion of the yaml config if present.
func (bt *Heartbeat) RunReloadableMonitors() (err error) {
// Check monitor configs
if err := bt.monitorReloader.Check(bt.dynamicFactory); err != nil {
if err := bt.monitorReloader.Check(bt.monitorFactory); err != nil {
logp.Error(fmt.Errorf("error loading reloadable monitors: %w", err))
}

// Execute the monitor
go bt.monitorReloader.Run(bt.dynamicFactory)
go bt.monitorReloader.Run(bt.monitorFactory)

return nil
}
Expand All @@ -206,7 +225,7 @@ func (bt *Heartbeat) makeAutodiscover(b *beat.Beat) (*autodiscover.Autodiscover,
ad, err := autodiscover.NewAutodiscover(
"heartbeat",
b.Publisher,
bt.dynamicFactory,
bt.monitorFactory,
autodiscover.QueryConfig(),
bt.config.Autodiscover,
b.Keystore,
Expand All @@ -221,3 +240,24 @@ func (bt *Heartbeat) makeAutodiscover(b *beat.Beat) (*autodiscover.Autodiscover,
func (bt *Heartbeat) Stop() {
close(bt.done)
}

// getESClient returns an ES client if one is configured. Will return nil, nil, if none is configured.
func getESClient(outputConfig *conf.C) (esc *elasticsearch.Client, err error) {
esConfig := EsConfig{}
err = outputConfig.Unpack(&esConfig)
if err != nil {
logp.L().Infof("output is not elasticsearch, error / state tracking will not be enabled: %v", err)
return nil, nil
}
esc, err = elasticsearch.NewClient(elasticsearch.Config{
Addresses: esConfig.Hosts,
Username: esConfig.Username,
Password: esConfig.Password,
})
if err != nil {
return nil, fmt.Errorf("could not initialize elasticsearch client: %w", err)
}
logp.L().Infof("successfully connected to elasticsearch for error / state tracking: %v", esConfig.Hosts)

return esc, nil
}
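The `New` function above picks an Elasticsearch-backed state loader when a client is available and falls back to `monitorstate.NilStateLoader` otherwise. The diff does not show how `monitorstate.StateLoader` is declared, so the sketch below assumes it is a function type and uses a plain map as a stand-in for Elasticsearch. Every name in it (`Status`, `StateLoader`, `store`, `chooseLoader`, and so on) is hypothetical and only illustrates the fallback pattern.

```go
package main

import "fmt"

// Status is a minimal stand-in for a persisted monitor state.
type Status struct {
	MonitorID string
	Status    string
}

// StateLoader fetches the last known state for a monitor.
// Assumed shape: the real monitorstate.StateLoader is not shown in this diff.
type StateLoader func(monitorID string) (*Status, error)

// nilStateLoader is the fallback used when no backend is configured.
// It reports "no prior state" without raising an error.
func nilStateLoader(string) (*Status, error) { return nil, nil }

// store is a hypothetical in-memory backend standing in for Elasticsearch.
type store map[string]Status

// makeStoreLoader returns a loader that looks states up in the given store.
func makeStoreLoader(s store) StateLoader {
	return func(id string) (*Status, error) {
		st, ok := s[id]
		if !ok {
			return nil, nil // unknown monitor: treated as no prior state
		}
		return &st, nil
	}
}

// chooseLoader mirrors the branch in New(): use the backend loader when a
// backend exists, otherwise fall back to the no-op loader.
func chooseLoader(s store) StateLoader {
	if s != nil {
		return makeStoreLoader(s)
	}
	return nilStateLoader
}

func main() {
	backends := []store{nil, {"http-check": {MonitorID: "http-check", Status: "down"}}}
	for _, backend := range backends {
		st, err := chooseLoader(backend)("http-check")
		fmt.Printf("state=%+v err=%v\n", st, err)
	}
}
```

Callers in this pattern never need to check whether a backend exists. They always invoke the loader and treat a nil state as "start fresh", which keeps the monitor factory's code path identical for Elasticsearch and non-Elasticsearch outputs.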
26 changes: 3 additions & 23 deletions heartbeat/docker-compose.yml
@@ -1,39 +1,19 @@
version: '2.3'
services:
beat:
build: ${PWD}/.
depends_on:
- proxy_dep
environment:
- REDIS_HOST=redis
- REDIS_PORT=6379
- ES_HOST=elasticsearch
- ES_USER=heartbeat_user
- ES_PASS=testing
- ES_PORT=9200
working_dir: /go/src/github.com/elastic/beats/heartbeat
volumes:
- ${PWD}/..:/go/src/github.com/elastic/beats/
# We launch docker containers to test docker autodiscover:
- /var/run/docker.sock:/var/run/docker.sock
command: make

# This is a proxy used to block beats until all services are healthy.
# See: https://github.com/docker/compose/issues/4369
proxy_dep:
image: busybox
depends_on:
elasticsearch: { condition: service_healthy }
redis: { condition: service_healthy }

elasticsearch:
extends:
file: ${ES_BEATS}/testing/environments/${TESTING_ENVIRONMENT}.yml
file: ${ES_BEATS}/testing/environments/${STACK_ENVIRONMENT}.yml
service: elasticsearch
healthcheck:
test: ["CMD-SHELL", "curl -u admin:testing -s http://localhost:9200/_cat/health?h=status | grep -q green"]
retries: 300
interval: 1s

redis:
build: ${PWD}/tests/docker_support/redis
ports:
- 9200:9200