Elasticsearch as master_job_cache throws critical #23125

Closed
bemeyert opened this issue Apr 28, 2015 · 32 comments
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt P1 Priority 1 Returners severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around

@bemeyert

Hi all,
Since the local job cache is just too slow (even for only 50 minions), I wanted to use Elasticsearch as master_job_cache, but I can't get it working. The configuration and the error message from running salt \* test.ping can be found here. Salt version is 2014.7.5, the OS is CentOS 6, and elasticsearch-py is version 1.4.
This might be a duplicate of #20826

Cheers

@jfindlay jfindlay added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Returners P2 Priority 2 labels Apr 28, 2015
@jfindlay
Contributor

@bemeyert, thanks for the report. This is not a duplicate of #20826. It seems that something wrong is happening with the loader. Would you mind copying your gist contents into a comment here?

@jfindlay jfindlay added this to the Approved milestone Apr 28, 2015
@bemeyert
Author

@jfindlay Here we go

My master configuration:

master_job_cache: elasticsearch
elasticsearch:
  host: 'eshost:9200'
  index: 'salt_test'

The part from the logfile when I run a salt \* test.ping with DEBUG enabled:

2015-04-27 15:15:48,553 [salt.config ][DEBUG ] Reading configuration from /etc/salt/master
2015-04-27 15:15:48,559 [salt.config ][DEBUG ] Using cached minion ID from /etc/salt/minion_id: 10.0.0.1
2015-04-27 15:15:48,561 [salt.config ][DEBUG ] Missing configuration file: ~/.saltrc
2015-04-27 15:15:48,562 [salt.utils.event ][DEBUG ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
2015-04-27 15:15:48,562 [salt.utils.event ][DEBUG ] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
2015-04-27 15:15:48,564 [salt.master ][INFO ] Clear payload received with command publish
2015-04-27 15:15:48,573 [salt.master ][CRITICAL] Unexpected Error in Mworker
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/salt/master.py", line 599, in __bind
ret = self.serial.dumps(self._handle_payload(payload))
File "/usr/lib/python2.6/site-packages/salt/master.py", line 651, in _handle_payload
'clear': self._handle_clear}[key](load)
File "/usr/lib/python2.6/site-packages/salt/master.py", line 664, in _handle_clear
return getattr(self.clear_funcs, load['cmd'])(load)
File "/usr/lib/python2.6/site-packages/salt/master.py", line 2275, in publish
clear_load['jid'] = self.mminion.returners[fstr](nocache=extra.get('nocache', False),
File "/usr/lib/python2.6/site-packages/salt/loader.py", line 1357, in __getitem__
self._load(key)
File "/usr/lib/python2.6/site-packages/salt/loader.py", line 1334, in _load
return self._dict[key]
KeyError: 'elasticsearch.prep_jid'

@jfindlay jfindlay added Core relates to code central or existential to Salt and removed Core relates to code central or existential to Salt labels May 26, 2015
@msteed
Contributor

msteed commented Aug 11, 2015

@bemeyert: could you please test whether #24422 fixes the problem for you?

@msteed msteed added the info-needed waiting for more info label Aug 11, 2015
@bemeyert
Author

@msteed I will test this as soon as I can and update the issue accordingly.

@bemeyert
Author

@msteed Well, at last. Sorry for the delay.

I used the docs from http://docs.saltstack.com/en/develop/ref/modules/all/salt.modules.elasticsearch.html#module-salt.modules.elasticsearch
& http://docs.saltstack.com/en/develop/ref/returners/all/salt.returners.elasticsearch_return.html

Installed versions:
elasticsearch-1.6.0
salt 2015.8.0rc3 (Beryllium)

Master configuration:

master_job_cache: elasticsearch
elasticsearch:
  host: 'eshost:9200'

And a simple salt \* test.ping gave the following output in the master log:

2015-08-25 16:22:52,085 [salt.master      ][CRITICAL][31711] The specified returner used for the master job cache "elasticsearch" does not have a save_load function!
2015-08-25 16:22:52,101 [salt.client      ][WARNING ][32729] Returner unavailable: 'elasticsearch.get_load'

I hope I did something wrong. Otherwise it still doesn't work.

@msteed
Contributor

msteed commented Aug 25, 2015

@bemeyert: Sorry, you are right that the elasticsearch returner is not currently set up for use as the master job cache. It is missing two functions, save_load() and get_load(); see
http://docs.saltstack.com/en/develop/ref/returners/index.html#master-job-cache-support.

I think this would not be too difficult to get working. I will try to get to it this week, or if you are interested in taking it on we will gladly accept a pull request.
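For readers following along, here is a minimal sketch of what the two missing functions are meant to do. It is not the code from the Salt repository: the in-memory `_JOB_LOADS` dict is a hypothetical stand-in for an Elasticsearch index, and the `save_load(jid, load)` / `get_load(jid)` signatures are assumed from the function names mentioned above.

```python
# Hypothetical in-memory stand-in for a job-cache backend. A real returner
# would write to and read from Elasticsearch instead of this dict.
_JOB_LOADS = {}


def save_load(jid, load):
    """Store the job's load (the published command data) under its jid."""
    _JOB_LOADS[jid] = dict(load)


def get_load(jid):
    """Return the load saved for jid, or an empty dict if none was saved."""
    return dict(_JOB_LOADS.get(jid, {}))


if __name__ == "__main__":
    save_load("20150825162252085000", {"fun": "test.ping", "tgt": "*"})
    print(get_load("20150825162252085000"))
```

The point of the sketch is only the contract: whatever `save_load` stores for a jid, `get_load` must be able to hand back later.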

@msteed msteed removed the info-needed waiting for more info label Aug 25, 2015
@msteed msteed self-assigned this Aug 25, 2015
@bemeyert
Author

@msteed: I really would like to help. But neither my time nor my (little to non-existent) Python allows me to fix this. I'm sorry. I wouldn't even know where to begin with the tests.

@msteed
Contributor

msteed commented Aug 26, 2015

As I looked at this I quickly reached the limits of my elasticsearch knowledge, and I'm not sure I can do a decent job of it.

@bechtoldt: would you be willing to take a look at making the elasticsearch returner work as a master job cache? See my comment above.

@msteed
Contributor

msteed commented Sep 7, 2015

@bemeyert: please give this pr a spin and let me know if it works for you.

@bemeyert
Author

@msteed I'm sorry, but my Git(Hub)-fu isn't good enough. How can I test your changes?

@bemeyert
Author

@msteed Gods dammit, I just realized that you want to merge this into 2015.8, which is supposed to be released soon. And I'm sitting here twiddling my thumbs and having no clue how to test this. I'm really sorry about that. So any pointers on how I can test this would be much appreciated. As soon as I know how to proceed I'll get on it.

@bemeyert
Author

@msteed: I just saw that it was merged into saltstack:2015.8. Not my day...
I will try it today.

@bemeyert
Author

@msteed Did it. Installed Salt 2015.8 from Git and elasticsearch 1.6 via "pip".

Master config:

master_job_cache: elasticsearch
elasticsearch:
  host: eshost:9200'

Some tests:

[root@localhost ~]# salt \* test.ping      
[WARNING ] Returner unavailable: expected string or buffer
10.0.2.15:
    True

[root@localhost ~]# salt-run jobs.list_jobs
Exception occurred in runner jobs.list_jobs: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/salt/client/mixins.py", line 337, in low
    data['return'] = self.functions[fun](*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/salt/runners/jobs.py", line 266, in list_jobs
    ret = mminion.returners['{0}.get_jids'.format(returner)]()
  File "/usr/lib/python2.7/site-packages/salt/utils/lazy.py", line 93, in __getitem__
    raise KeyError(key)
KeyError: 'elasticsearch.get_jids'

So the jobs runner (at least "list_jobs") doesn't seem to work. But there are indices in our Elasticsearch. I found the following:

  • salt-master-job-cache-v1
  • salt-mine_update-v1
  • salt-runner_jobs_active-v1
  • salt-runner_jobs_list_jobs-v1
  • salt-test_ping-v1

Their content looks good to me

Do you need more information?

Cheers

@msteed
Contributor

msteed commented Sep 15, 2015

@bemeyert: Thanks for the test results, and my apologies for not getting back to you earlier about testing.

I wonder if the message expected string or buffer is due to the missing opening quote on the host value?

The get_jids() routine is indeed not implemented in the elasticsearch returner. This is one of the routines required for external_job_cache support (but not for the master_job_cache). I'd like to finish making elasticsearch a first-class returner but it will probably be a few days before I can get back to it.
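To illustrate why the runner fails, here is a rough sketch of the lookup pattern visible in the traceback (`mminion.returners['{0}.get_jids'.format(returner)]`). A plain dict stands in for Salt's lazy loader, and the `lookup` helper is hypothetical; it only shows how a missing function surfaces as a `KeyError` and how a caller could guard against it.

```python
def lookup(returners, returner, func):
    """Build the '<returner>.<func>' key and fetch the callable, or None."""
    key = "{0}.{1}".format(returner, func)
    try:
        return returners[key]
    except KeyError:
        # The returner module does not define this function.
        return None


# Plain dict standing in for Salt's lazy loader (an assumption for this sketch).
returners = {
    "elasticsearch.returner": lambda ret: None,
    "elasticsearch.save_load": lambda jid, load: None,
    "elasticsearch.get_load": lambda jid: {},
}

get_jids = lookup(returners, "elasticsearch", "get_jids")
if get_jids is None:
    print("elasticsearch.get_jids is not available")
```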

@bemeyert
Author

@msteed No harm done ;) It was just my weekend panicking...

Yeah, about the quotes. That was my mistake while changing the hostname "manually". To be sure I ran the test again. First I installed elasticsearch via pip. Then:

[root@localhost ~]# bash install_salt.sh -M git 2015.8
[...]
[root@localhost ~]# salt --version
salt 2015.8.0-210-g65c59ec (Beryllium)
[root@localhost ~]# cd /etc/salt
[root@localhost salt]# cat minion
master: localhost
[root@localhost salt]# cat master
master_job_cache: elasticsearch
elasticsearch:
  host: 'eshost:9200'
# starting both and accepting key
[root@localhost salt]# salt \* test.ping
[WARNING ] Returner unavailable: expected string or buffer
10.0.2.15:
    True
[root@localhost salt]# salt-run jobs.list_jobs
Exception occurred in runner jobs.list_jobs: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/salt/client/mixins.py", line 337, in low
    data['return'] = self.functions[fun](*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/salt/runners/jobs.py", line 266, in list_jobs
    ret = mminion.returners['{0}.get_jids'.format(returner)]()
  File "/usr/lib/python2.7/site-packages/salt/utils/lazy.py", line 93, in __getitem__
    raise KeyError(key)
KeyError: 'elasticsearch.get_jids'

Here the test.ping debug output from the master's log:

2015-09-17 13:59:25,815 [salt.config      ][DEBUG   ][5822] Reading configuration from /etc/salt/master
2015-09-17 13:59:25,816 [salt.config      ][DEBUG   ][5822] Using cached minion ID from /etc/salt/minion_id: 10.0.2.15
2015-09-17 13:59:25,817 [salt.config      ][DEBUG   ][5822] Missing configuration file: /root/.saltrc
2015-09-17 13:59:25,818 [salt.utils.event ][DEBUG   ][5822] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
2015-09-17 13:59:25,818 [salt.utils.event ][DEBUG   ][5822] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
2015-09-17 13:59:25,836 [salt.transport.zeromq][DEBUG   ][5822] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', '10.0.2.15_master', 'tcp://127.0.0.1:4506', 'clear')
2015-09-17 13:59:25,891 [salt.utils.lazy  ][DEBUG   ][5822] LazyLoaded elasticsearch.get_load
2015-09-17 13:59:25,895 [salt.utils.lazy  ][DEBUG   ][5822] LazyLoaded config.option
2015-09-17 13:59:25,911 [salt.utils.lazy  ][DEBUG   ][5822] LazyLoaded elasticsearch.document_get
2015-09-17 13:59:25,912 [urllib3.util.retry][DEBUG   ][5822] Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2015-09-17 13:59:25,912 [urllib3.connectionpool][INFO    ][5822] Starting new HTTP connection (1): eshost
2015-09-17 13:59:25,918 [urllib3.connectionpool][DEBUG   ][5822] "HEAD / HTTP/1.1" 200 0
2015-09-17 13:59:25,919 [urllib3.util.retry][DEBUG   ][5822] Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2015-09-17 13:59:25,922 [urllib3.connectionpool][DEBUG   ][5822] "GET /salt-master-job-cache/default/20150917135925839607 HTTP/1.1" 200 383
2015-09-17 13:59:25,923 [salt.client      ][WARNING ][5822] Returner unavailable: expected string or buffer
2015-09-17 13:59:25,923 [salt.client      ][DEBUG   ][5822] get_iter_returns for jid 20150917135925839607 sent to set(['10.0.2.15']) will timeout at 13:59:30.923171
2015-09-17 13:59:25,950 [salt.client      ][DEBUG   ][5822] jid 20150917135925839607 return from 10.0.2.15
2015-09-17 13:59:25,951 [salt.utils.lazy  ][DEBUG   ][5822] LazyLoaded nested.output
2015-09-17 13:59:25,952 [salt.client      ][DEBUG   ][5822] jid 20150917135925839607 found all minions set(['10.0.2.15'])

@bemeyert
Author

@msteed @bechtoldt Any news on this one?

@ZhangZhenhua

Having exactly the same issue. Is there any plan to fix it? Maybe in the next release? Thanks very much.

root@base:~# salt 'base' test.ping
[WARNING ] Returner unavailable: expected string or buffer
base:
    True
root@base:~# salt-run jobs.list_jobs
Exception occurred in runner jobs.list_jobs: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/client/mixins.py", line 337, in low
    data['return'] = self.functions[fun](*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/runners/jobs.py", line 266, in list_jobs
    ret = mminion.returners['{0}.get_jids'.format(returner)]()
  File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 93, in __getitem__
    raise KeyError(key)
KeyError: 'elasticsearch.get_jids'
root@base:~# salt --version
salt 2015.8.1 (Beryllium)

@arnisoph
Contributor

Sorry, I don't have free capacity to work on this at the moment.

@Tanoti
Contributor

Tanoti commented Mar 4, 2016

+1 to this. I would love to use elasticsearch as an external master job cache but can't.

@jfindlay jfindlay added P1 Priority 1 and removed P2 Priority 2 labels Mar 4, 2016
@cmarzullo

Anyone have any pointers where to start looking to fix this? It's killing salt for us that we can't visualize our salt jobs.

@jfindlay
Contributor

Ping @msteed.

@msteed
Contributor

msteed commented May 18, 2016

My understanding is that for complete external job cache support, the following routines need to be added to salt/returners/elasticsearch_return.py:

  • get_jid()
  • get_fun()
  • get_jids()
  • get_minions()

See https://docs.saltstack.com/en/develop/ref/returners/#external-job-cache-support for details.
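As a rough way to check a returner against that list, a sketch like the following reports which of the four routines a module lacks. The `missing_functions` helper and the `fake_returner` module are hypothetical and built only for illustration; the function names come from the list above.

```python
import types

# Routines listed above as needed for external job cache support.
EXT_JOB_CACHE_FUNCS = ("get_jid", "get_fun", "get_jids", "get_minions")


def missing_functions(module, required):
    """Return the names in `required` that `module` does not define as callables."""
    return [name for name in required
            if not callable(getattr(module, name, None))]


# Hypothetical returner module that only implements part of the interface.
fake_returner = types.ModuleType("fake_returner")
fake_returner.returner = lambda ret: None
fake_returner.get_jid = lambda jid: {}

if __name__ == "__main__":
    print(missing_functions(fake_returner, EXT_JOB_CACHE_FUNCS))
```

Pointing the same helper at an imported returner module would show at a glance how far it is from the interface described in the linked documentation.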

@Lamabu

Lamabu commented Aug 11, 2016

I still don't understand exactly how this works. Configuring master_job_cache allows me to send data to ES (even though I get the warning Returner unavailable: expected string or buffer).
ext_job_cache, though, is not working at all: it says that elasticsearch.save_load is not callable. Can you please give me a clue? I'm really new to this (just an intern at a cloud company, with a lot to learn).

@clark42

clark42 commented Apr 11, 2017

Hi here!
Is this working? I see additional documentation in the develop branch, but I don't know if I can use elasticsearch for the external job cache.

@bemeyert
Author

@msteed, @jfindlay: What is the status here? I see that in v2016.3.6 the functions get_load and save_load exist, but also that develop is quite different. What is the difference between the branches? I'm a bit puzzled. Can you please help me?

@Tanoti
Contributor

Tanoti commented Jun 19, 2017

@bemeyert I raised a couple of PRs around November/January to fix some config loading issues and add extra functionality around a unified index. These may be what you are referring to? They were all about posting job data to elasticsearch, not about using it as an external job cache.

@bemeyert
Author

@Tanoti I guess you are referring to the diff between develop and v2016.3.6?

@Tanoti
Contributor

Tanoti commented Jun 19, 2017

@bemeyert Correct, it looks like my code was not pulled into a branch until the 2017.7 version. As it happens, I am about to pick up this work again, as we are looking once more at elasticsearch, but it is not going to be the external job cache side, as we've not decided how we will implement that yet.

@bemeyert
Author

@Tanoti Thanks for letting me know.

@getSurreal

@Tanoti You mentioned getting job data into elasticsearch but not using it as a job cache. Could you point me in the right direction to read more about implementing that? The title of this article says storing job data, but then only talks about the job cache option.

@Tanoti
Contributor

Tanoti commented Sep 11, 2018

@getSurreal I'm sorry, I've not been following this. We dropped using elasticsearch as a job data store/cache a year ago and are implementing a DynamoDB system using an event bus engine instead.

@sagetherage
Contributor

Closing this in favor of the newer issue mentioned.
