Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not run sysctl for vm.max_map_count when its already set #31285

Conversation

majormoses
Copy link
Contributor

@majormoses majormoses commented Jun 13, 2018

Context: This fails in a docker container otherwise. The change was inspired by issues getting it working and after researching I found a similar change applied to another project: spujadas/elk-docker#106 the major difference here is that we include equality in the condition rather than only if its greater than.

This was the error I encountered:

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    service[elasticsearch] (/opt/kitchen/cache/cookbooks/elasticsearch/libraries/provider_service.rb line 132) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
    ---- Begin output of /etc/init.d/elasticsearch start ----
    STDOUT: * Starting Elasticsearch Server
       ...fail!
    STDERR: /etc/init.d/elasticsearch: line 122: ulimit: max locked memory: cannot modify limit: Operation not permitted
    sysctl: setting key "vm.max_map_count": Read-only file system
    ---- End output of /etc/init.d/elasticsearch start ----
    Ran /etc/init.d/elasticsearch start returned 1

After applying the patch I was able to see that error go away (other errors still but one at a time):

root@dokken:/# service elasticsearch start
 * Starting Elasticsearch Server                                            /etc/init.d/elasticsearch: line 122: ulimit: max locked memory: cannot modify limit: Operation not permitted
                                                                     [fail]

Signed-off-by: Ben Abrams me@benabrams.it

@jasontedor
Copy link
Member

I don't think this is needed. Rather, I think that you can mask systemd-sysctl.service (systemctl mask systemd-sysctl.service; install Elasticsearch; systemctl unmask systemd-sysctl.service). Would you try?

@jasontedor jasontedor added feedback_needed :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts labels Jun 13, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@majormoses
Copy link
Contributor Author

majormoses commented Jun 13, 2018

@jasontedor This is for a sysv-init not a systemd setup and as such I don't have systemctl in the container:

root@dokken:/# which systemctl
root@dokken:/# 

@jasontedor
Copy link
Member

Oh, I missed that; sorry.

Would you add a packaging test for this please?

See for example:

@test "[SYSTEMD] masking systemd-sysctl" {
clean_before_test
systemctl mask systemd-sysctl.service
install_package
systemctl unmask systemd-sysctl.service
}

and check the contributing docs for how to run these tests. You can also push such a change to the PR and we can run the tests in our CI, but that might be painful to iterate on.

Effectively we would want a test here that shows that if the vm.max_map_count exceeds 262144 then it is still that value after Elasticsearch is installed.

@majormoses
Copy link
Contributor Author

Ya I can take a stab at it, I see two options regarding testing:

  1. set vm.max_map_count to some ridiculously low number and then verify that after starting the service it has been changed. The advantage to this test is that if the default values change in the future the test will keep on working without needing a tweak. As the trend is that these numbers are going up rather than down I think this is more future proof.
  2. set MAX_MAP_COUNT in /etc/default/elasticsearch or /etc/sysconfig/elasticsearch depending on the distribution. As this has more complexity and could need updating if we bump the default number again in the future I would prefer to not use this method if the above works for you.

@jasontedor
Copy link
Member

Thank you for this PR, and thank you for agreeing to try contributing a test.

I think that a test along the lines of what you describe in option one is good. I do think that there should be a test though for when the current value already exceeds what we set it to since the point of your change is that we do not do anything in that case?

@majormoses
Copy link
Contributor Author

majormoses commented Jun 13, 2018

I do think that there should be a test though for when the current value already exceeds what we set it to since the point of your change is that we do not do anything in that case?

That makes sense, I will try look through it and figure out what's the easiest way to test that depending on distribution. If you happen to know if there is some helper function that would allow me to specify it in an distribution agnostic method that would save me some trouble.

I will work on the first test as its easier and demonstrates this is not a breaking change and will then work on the test to ensure that it does what I expect.

@jasontedor
Copy link
Member

That makes sense, I will try look through it and figure out what's the easiest way to test that depending on distribution. If you happen to know if there is some helper function that would allow me to specify it in an distribution agnostic method that would save me some trouble.

It is okay; I think you can call sysctl to set the value of vm.max_map_count before the test starts to some pre-defined value rather than jumping through hoops to know what the defaults are on all of the boxes that we test on.

@majormoses
Copy link
Contributor Author

majormoses commented Jun 13, 2018

@jasontedor any chance you can give me your username in IRC so I can hit you up? If you prefer to hit me up I a share my github and irc username. I am having difficulty getting all the dependencies in place locally. While I could just push and have you run it on your CI I'd rather learn to fish.

@majormoses majormoses force-pushed the feature/init-dont-run-sysctl-max_map_count-if-set branch from ce8a670 to 5c95fe7 Compare June 13, 2018 03:40
@jasontedor
Copy link
Member

@majormoses I appreciate your dedication here.

I would be happy to help yet it’s quite late in my timezone. Could we pick this up tomorrow? If you send to me an email at my first name at elastic.co we can connect.

@majormoses majormoses force-pushed the feature/init-dont-run-sysctl-max_map_count-if-set branch 3 times, most recently from 8ebcfc7 to 73c50fc Compare June 13, 2018 03:58
@alpar-t
Copy link
Contributor

alpar-t commented Jun 13, 2018

elastic bot: run all packaging tests

@majormoses
Copy link
Contributor Author

I am working on trying to get them running locally but the failure seems to be a timeout rather than a failed spec.

Build timed out (after 400 minutes). Marking the build as failed.

@majormoses majormoses force-pushed the feature/init-dont-run-sysctl-max_map_count-if-set branch 2 times, most recently from c8898d5 to c51f136 Compare June 13, 2018 23:43
…e same or greater value.

Context: This fails in a docker container otherwise.

Signed-off-by: Ben Abrams <me@benabrams.it>
@majormoses majormoses force-pushed the feature/init-dont-run-sysctl-max_map_count-if-set branch from c51f136 to 9d82a67 Compare June 13, 2018 23:55
@majormoses
Copy link
Contributor Author

I got the tests working locally:

./gradlew packagingTest -Pvagrant.boxes=centos-6,ubuntu-1404
# intentionally truncated
BUILD SUCCESSFUL in 54m 48s
396 actionable tasks: 97 executed, 299 up-to-date

I will create a gist and link it here.

@majormoses
Copy link
Contributor Author

@majormoses
Copy link
Contributor Author

elastic bot: run all packaging tests

Not sure if its open to the public but it was worth a try.

@jasontedor
Copy link
Member

It is not otherwise that would be a vehicle for remote code execution on our infrastructure.

@majormoses
Copy link
Contributor Author

majormoses commented Jun 14, 2018

It is not otherwise that would be a vehicle for remote code execution on our infrastructure.

makes sense, if it was ephemeral it probably would not matter beyond people using it to do something like spinning up crypto miners but better to be safe than sorry.

@jasontedor
Copy link
Member

@elasticmachine test this please

* elastic/master: (29 commits)
  [DOC] Extend SQL docs
  Immediately flush channel after writing to buffer (elastic#31301)
  [DOCS] Shortens ML API intros
  Use quotes in the call invocation (elastic#31249)
  move security ingest processors to a sub ingest directory (elastic#31306)
  Add 5.6.11 version constant.
  Fix version detection.
  SQL: Whitelist SQL utility class for better scripting (elastic#30681)
  [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (elastic#31299)
  CCS: don't proxy requests for already connected node (elastic#31273)
  Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap
  [test] opensuse packaging turn up debug logging
  Add unreleased version 6.3.1
  Removes experimental tag from scripted_metric aggregation (elastic#31298)
  [Rollup] Metric config parser must use builder so validation runs (elastic#31159)
  [ML] Check licence when datafeeds use cross cluster search  (elastic#31247)
  Add notion of internal index settings (elastic#31286)
  Test: Remove broken yml test feature (elastic#31255)
  REST hl client: cluster health to default to cluster level (elastic#31268)
  [ML] Update test thresholds to account for changes to memory control (elastic#31289)
  ...
@jasontedor
Copy link
Member

@elasticmachine test this please

@jasontedor
Copy link
Member

I merged master into your branch and started a new build because the current one was not going to succeed on the BWC aspect of the build given that we bumped versions a few hours ago.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and the tests look good; thanks so much for writing them and getting them working. 💯

I left one comment.

max_map_count=$(sysctl -n vm.max_map_count)
stop_elasticsearch_service

[ $max_map_count != 100 ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be okay with us asserting that this is 262144.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally do not think it would be better as if the defaults change this is one more test to update and if its not 100 and the previous command did not error (which I assume would trigger a failed build) it should essentially be the same but if you feel otherwise I can certainly change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, today we want to ensure that the value is set to what we specify: 262144. It's not much overhead to have to change this if we were to make a change to the default here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, will update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

* elastic/master:
  More detailed tracing when writing metadata (elastic#31319)
  [Tests] Mutualize fixtures code in BaseHttpFixture (elastic#31210)
  Remove RestGetAllAliasesAction (elastic#31308)
  Temporary fix for broken build
  Reenable Checkstyle's unused import rule (elastic#31270)
  Remove remaining unused imports before merging elastic#31270
  Fix non-REST doc snippet
@jasontedor
Copy link
Member

@elasticmachine test this please

@jasontedor
Copy link
Member

@elasticmachine test this please

@jasontedor
Copy link
Member

@elasticmachine run all packaging tests

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jasontedor jasontedor merged commit 87a676e into elastic:master Jun 15, 2018
jasontedor pushed a commit that referenced this pull request Jun 15, 2018
This commit modifies the Sys V init startup scripts to only modify
vm.max_map_count if needed. In this case, needed means that the current
value is less than our default value of 262144 maps.
jasontedor pushed a commit that referenced this pull request Jun 15, 2018
This commit modifies the Sys V init startup scripts to only modify
vm.max_map_count if needed. In this case, needed means that the current
value is less than our default value of 262144 maps.
jasontedor pushed a commit that referenced this pull request Jun 15, 2018
This commit modifies the Sys V init startup scripts to only modify
vm.max_map_count if needed. In this case, needed means that the current
value is less than our default value of 262144 maps.
@jasontedor
Copy link
Member

Thanks @majormoses.

tlrx added a commit that referenced this pull request Jun 15, 2018
* master:
  992c788 Uncouple persistent task state and status (#31031)
  8c6ee7d Describe how to add a plugin in Dockerfile (#31340)
  1c5cec0 Remove http status code maps (#31350)
  87a676e Do not set vm.max_map_count when unnecessary (#31285)
  e5b7137 TEST: getCapturedRequestsAndClear should be atomic (#31312)
  0324103 Painless: Fix bug for static method calls on interfaces (#31348)
  d6d0727 QA: Fix resolution of default distribution (#31351)
  fcf1e41 Extract common http logic to server (#31311)
  6dd81ea Build: Fix the license in the pom zip and tar (#31336)
  8f886cd Treat ack timeout more like a publish timeout (#31303)
  9b29327 [ML] Add description to ML filters (#31330)
  f7a0caf SQL: Fix build on Java 10
  375d09c [TEST] Fix RemoteClusterClientTests#testEnsureWeReconnect
  4877cec More detailed tracing when writing metadata (#31319)
  bbfe1ec [Tests] Mutualize fixtures code in BaseHttpFixture (#31210)
tlrx added a commit that referenced this pull request Jun 15, 2018
* 6.x:
  6d711fa (origin/6.x, 6.x) Uncouple persistent task state and status (#31031)
  f0f16b7 [TEST] Make SSL restrictions update atomic (#31050)
  652193f Describe how to add a plugin in Dockerfile (#31340)
  bb568ab TEST: getCapturedRequestsAndClear should be atomic (#31312)
  6f94914 Do not set vm.max_map_count when unnecessary (#31285)
  c12f3c7 Painless: Fix bug for static method calls on interfaces (#31348)
  21128e2 QA: Fix resolution of default distribution (#31351)
  df17a83 Build: Fix the license in the pom zip and tar (#31336)
  3e76b15 Treat ack timeout more like a publish timeout (#31303)
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts Team:Delivery Meta label for Delivery team v5.6.11 v6.3.1 v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants