Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GeoIpDownloader startup during rolling upgrade #84000

Merged
merged 4 commits into from
Feb 16, 2022

Conversation

probakowski
Copy link
Contributor

If rolling upgrade was used from version prior GeoIPv2 (<7.14) then geoip downloader wouldn't be started so no new databases were downloaded. This is especially troubling in 8.x as we no longer provide default databases inside ES so after upgrade no geoip enrichment can take place until downloader is started with workaround (setting ingest.geoip.downloader.enabled to false and true again).
This is because logic that was used to lower number of requests / cluster update listeners at the startup was too optimistic about order of actions / who can be elected master at what time.
This change fixes that and also cleans up logs when there are some ignorable errors and adds debug logging on start and stop of the task to ease up troubleshooting.
It also adds rolling upgrade test to make sure the fix works.

@probakowski probakowski added >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v7.17.1 v8.2.0 v8.1.1 labels Feb 15, 2022
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 15, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Copy link
Collaborator

Hi @probakowski, I've created a changelog YAML for you.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

clusterService.removeListener(this);
if (event.localNodeMaster()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this if check is safe here. Now any node will attempt to start the downloader task if it hasn't started all ready (the start task call is executed on elected master node). This addresses the problem that we was reported, since each time an upgraded node joins the cluster, the geoip downloader task is attempted to be started. So it is unlikely that geoip downloader doesn't exist if cluster has been upgraded.

@probakowski probakowski added v8.0.1 auto-backport-and-merge Automatically create backport pull requests and merge when ready auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Feb 16, 2022
@elasticsearchmachine elasticsearchmachine merged commit 01eba38 into elastic:master Feb 16, 2022
@probakowski probakowski deleted the fix-geoip-start branch February 16, 2022 13:38
probakowski added a commit to probakowski/elasticsearch that referenced this pull request Feb 16, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
probakowski added a commit to probakowski/elasticsearch that referenced this pull request Feb 16, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.17
8.0
8.1

probakowski added a commit to probakowski/elasticsearch that referenced this pull request Feb 16, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
elasticsearchmachine pushed a commit that referenced this pull request Feb 16, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
elasticsearchmachine pushed a commit that referenced this pull request Feb 16, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
elasticsearchmachine pushed a commit that referenced this pull request Feb 16, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Feb 16, 2022
…ijun/elasticsearch into fix-none-tsdb-index-dimension-tests

* 'fix-none-tsdb-index-dimension-tests' of github.com:weizijun/elasticsearch: (37 commits)
  [docs] Mention JDK 17 in the Contributing docs (elastic#84018)
  Fix GeoIpDownloader startup during rolling upgrade (elastic#84000)
  Script: Fields API for Dense Vector (elastic#83550)
  Move InferenceConfigUpdate under VersionedNamedWriteable (elastic#84022)
  [ML] Fix license feature test cleanup (elastic#84020)
  Replace deprecated api in artifact transforms (elastic#84015)
  QL: Add leniency option to SQL CLI (elastic#83795)
  [Stack Monitoring] add kibana_stats version alias to -mb template (elastic#83930)
  Optimize spliterator for ImmutableOpenMap (elastic#83899)
  Feature usage actions for archive (elastic#83931)
  Use latch to speedup multi feature migration test (elastic#84007)
  Make action names available in NodeClient (elastic#83919)
  [DOCS] Re-add HTTP proxy setings from elastic#82737 (elastic#84001)
  Add CI matrix configuration for snapshot BWC versions (elastic#83990)
  Update YAML Rest tests to check for product header on all responses (elastic#83290)
  TSDB: Add time series aggs cancellation (elastic#83492)
  [DOCS] Fix percolate query headings (elastic#83988)
  [DOCS] Move tip for percolate query example (elastic#83972)
  Simplify LocalExporter cleaner function to fix failing tests (elastic#83812)
  [GCE Discovery] Correcly handle large zones with 500 or more instances (elastic#83785)
  ...
probakowski added a commit to probakowski/elasticsearch that referenced this pull request Feb 23, 2022
If rolling upgrade was used from version prior GeoIPv2 (<`7.14`) then
geoip downloader wouldn't be started so no new databases were
downloaded. This is especially troubling in `8.x` as we no longer
provide default databases inside ES so after upgrade no geoip enrichment
can take place until downloader is started with workaround (setting
`ingest.geoip.downloader.enabled` to `false` and `true` again). This is
because logic that was used to lower number of requests / cluster update
listeners at the startup was too optimistic about order of actions / who
can be elected master at what time.  This change fixes that and also
cleans up logs when there are some ignorable errors and adds debug
logging on start and stop of the task to ease up troubleshooting. It
also adds rolling upgrade test to make sure the fix works.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport-and-merge Automatically create backport pull requests and merge when ready auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v7.17.1 v8.0.1 v8.1.1 v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants