Wrong checking of #23981 bug #24283

Closed
antdavidl opened this issue Apr 24, 2017 · 17 comments

@antdavidl

Elasticsearch version: 5.3.1
JVM version: openjdk-1.8.0.131-2.b11.el7_3
OS version: CentOS 7.3.1611 (Core)

Description of the problem including expected versus actual behavior:
I had just updated from 5.3.0 to 5.3.1, and the node then refused to start. The logs indicate that there are indexes in the default folder, where there should not be any. However, this is exactly the folder where I have always kept my indexes!

I think the misconfiguration check added in 5.3.1 as part of the fix for bug #23981 is implemented incorrectly for the case where the default data folder is explicitly defined in the configuration file.

I had the following lines in my configuration file:

    path:
        data: /var/lib/elasticsearch
        logs: /var/log/elasticsearch

I was using the default data folder but configuring it explicitly anyway (since it can be configured by other means: Ansible and so on).

To bring the Elasticsearch service up I had to comment out the path.data line, and then it works fine:

    path:
    #    data: /var/lib/elasticsearch
        logs: /var/log/elasticsearch

I think the problem is that the new code checks whether path.data is set and then checks whether the default data folder contains indexes. But what if the default data folder is also a valid data folder included in the configuration?
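
The suspected logic can be sketched as follows. This is only an illustration of the behavior described in this report, not the actual Elasticsearch code: the function names and the `has_index_data` callback are hypothetical, and the second function shows one assumed way such a check could skip the case where the default folder is itself a configured data path.

```python
import os


def normalize(path):
    # Compare paths in a canonical form so a trailing slash does not matter.
    return os.path.normpath(path)


def naive_check_fails(path_data, default_path_data, has_index_data):
    """Hypothetical reading of the 5.3.1 behavior described above.

    Fails whenever path.data is set explicitly and the default data
    folder contains indices, even if both settings name the same folder.
    """
    if not path_data:
        return False  # path.data is not set, so the default folder is in use
    return has_index_data(default_path_data)


def stricter_check_fails(path_data, default_path_data, has_index_data):
    """Same check, but skipped when the default folder is itself one of
    the configured data paths (an assumption about what a fix could do)."""
    configured = {normalize(p) for p in path_data}
    if normalize(default_path_data) in configured:
        return False
    return naive_check_fails(path_data, default_path_data, has_index_data)
```

With `path_data=["/var/lib/elasticsearch"]` and the same folder as the default, the naive check reports a failure while the stricter one does not, which matches the situation reported here.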

Thanks!

Steps to reproduce:

  1. Update to 5.3.1 from 5.3.0 having the default data folder configured explicitly
  2. Try to start the Elasticsearch service

Provide logs (if relevant):
[2017-04-24T11:45:12,480][INFO ][o.e.n.Node ] [automation] initializing ...
[2017-04-24T11:45:12,557][INFO ][o.e.e.NodeEnvironment ] [automation] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/mapper/vg_data-lv_elastic)]], net usable_space [39gb], net total_space [58.9gb], spins? [possibly], types [ext4]
[2017-04-24T11:45:12,557][INFO ][o.e.e.NodeEnvironment ] [automation] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-04-24T11:45:13,691][ERROR][o.e.n.Node ] [automation] detected index data in default.path.data [/var/lib/elasticsearch/nodes/0/indices] where there should not be any

@elzozz

elzozz commented Apr 24, 2017

Same issue here.
Upgraded from 5.3.0 to 5.3.1 with an explicitly configured data path.
Unfortunately I don't have the data on default path, so I'm one node down until a patch is out

@antdavidl
Author

antdavidl commented Apr 24, 2017

Well @elzozz,
according to your comment, it could be that the 5.3.0 bug really did affect you and saved some indexes in the wrong (default) folder. In that case, it would be a different scenario from the one I described (in my case, it was a clear false positive).

If this is your case, you have to fix it manually by moving these indexes to the real data folder.

Check https://www.elastic.co/blog/multi-data-path-bug-in-elasticsearch-5-3-0 for further details (basically, you just need to move the folder content ...).
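
As a rough sketch of what "moving the folder content" could look like, assuming the node is stopped first and that the blog post above remains the authoritative procedure: the function name is hypothetical, and the two arguments stand for the `nodes/0/indices` folders under the default path and under the real data path.

```python
import shutil
from pathlib import Path


def move_index_folders(default_indices, target_indices):
    """Move every index folder from default_indices into target_indices.

    Never overwrites: a folder that already exists at the target is
    skipped and reported so it can be inspected by hand.
    """
    src = Path(default_indices)
    dst = Path(target_indices)
    dst.mkdir(parents=True, exist_ok=True)  # assumption: create target if missing
    moved, skipped = [], []
    for folder in sorted(p for p in src.iterdir() if p.is_dir()):
        target = dst / folder.name
        if target.exists():
            skipped.append(folder.name)
            continue
        shutil.move(str(folder), str(target))
        moved.append(folder.name)
    return moved, skipped
```

Returning the skipped names instead of merging folders is a deliberate choice here: an index folder present in both places needs a manual decision.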

@jasontedor
Member

@elzozz I think that @antdavidl is correct here.

@jasontedor
Member

@antdavidl Thanks for reporting, we will get this fixed in the next patch release (5.3.2).

@elzozz

elzozz commented Apr 24, 2017

@antdavidl
Well the paths are the same in the error logs, so I think it's the same issue ;)
[2017-04-24T15:12:01,365][ERROR][o.e.n.Node] [elastic-master-01] detected index data in default.path.data [/elastic-data/nodes/0/indices] where there should not be any
[2017-04-24T15:12:01,366][INFO ][o.e.n.Node] [elastic-master-01] index folder [eSpOBkgsSB24W1M_z5gbCA] in default.path.data [/elastic-data/nodes/0/indices] must be moved to any of [/elastic-data/nodes/0/indices]
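
A quick way to spot this situation in a log is to check whether the folder to move from is also listed as a move target. The sketch below is inferred from the single INFO line quoted above, not from the Elasticsearch source; the function name is hypothetical, and the comma-separated format for several target paths is an assumption.

```python
import re

# Pattern inferred from the one INFO message quoted in this comment.
MOVE_HINT = re.compile(
    r"index folder \[(?P<index>[^\]]+)\] in default\.path\.data "
    r"\[(?P<default>[^\]]+)\] must be moved to any of \[(?P<targets>[^\]]+)\]"
)


def default_equals_target(log_line):
    """Return True if the source folder is also a move target, False if
    it is not, and None if the line does not contain the message."""
    match = MOVE_HINT.search(log_line)
    if match is None:
        return None
    # Assumption: multiple target paths would be separated by commas.
    targets = [t.strip() for t in match.group("targets").split(",")]
    return match.group("default") in targets
```

A `True` result means the message asks for data to be moved onto itself, which points to identical path.data and default.path.data values rather than misplaced indexes.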

@elzozz

elzozz commented Apr 24, 2017

As a workaround, if you don't have the data at the default path '/var/lib/elasticsearch', just create a symlink to the default path and comment out the explicit path.data definition in the config file.

@jasontedor
Member

I think there's some confusion here. You say:

Unfortunately I don't have the data on default path, so I'm one node down until a patch is out

If you have default.path.data and path.data set to the same thing, as it appears that you do, then the above statement can not be true and you see the issue that you're seeing. If there's data in path.data, then since it appears you have default.path.data set to the same path, there must be data in default.path.data.

@jasontedor
Member

jasontedor commented Apr 24, 2017

@elzozz Can you please describe your precise situation? It appears that you have path.data and default.path.data both explicitly configured to the same data path, and that you are not using the default default.path.data configuration of /var/lib/elasticsearch from the packaging?

@elzozz

elzozz commented Apr 24, 2017

@jasontedor Sorry, it's Monday and I can't seem to compose a normal sentence ;)
In my configuration I have only defined path.data: /elastic-data, nothing else related to paths.
By default path, I meant /var/lib/elasticsearch, as configured by default by the package manager.

@jasontedor
Member

Then I do not understand how you're running into this issue if default.path.data is the default /var/lib/elasticsearch from the packaging and path.data is explicitly configured elsewhere, and I don't understand the log message that you provided. If default.path.data indeed has the default value from the packaging, the log message that you shared above would not say that default.path.data has the value /elastic-data/nodes/0/indices.

@splitice

I'm seeing this too, with a fresh install of an els 5.3.1 cluster that was performed using the elastic ansible file.

In my elasticsearch config, path.data is set to /var/lib/elasticsearch/els13-els13.
systemd starts elasticsearch with /usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR}
The value of DATA_DIR as set in /etc/defaults is DATA_DIR=/var/lib/elasticsearch/els13-els13

Is the ansible script doing something wrong, or is this an ELS bug? It prevents the restart of nodes without deletion of that node's index data and is hence quite critical if multiple nodes were to fail at the same time.

@elzozz

elzozz commented Apr 24, 2017

Here are the config files; maybe they provide more info.

elk-master-01.zip

@jasontedor
Member

Is the ansible script doing something wrong, or is this an ELS bug?

@splitice I peeked at the Ansible role; it appears they are setting the configured data paths in the service definition and in the configuration file. I wouldn't necessarily consider this a bug, although I don't think they should be doing this, but it does mean that every user of the Ansible role is going to be snagged by this until 5.3.2 is out.

@jasontedor
Member

@elzozz I don't see default.path.data configured in the configuration files that you shared. Can you also share /etc/default/elasticsearch or /etc/sysconfig/elasticsearch (the former for a Debian-based system, the latter for an RPM-based system) and maybe /usr/lib/systemd/system/elasticsearch.service (if you're on systemd, and otherwise /etc/init.d/elasticsearch)?

@camarigor

camarigor commented Apr 24, 2017

I found this topic and it solved the problem for me.
ELK is running again.

I removed the "-Edefault.path.data=${DATA_DIR} " parameter from the startup command and everything works fine again.

The data path (path.data) is defined in the elasticsearch.yml file.

https://discuss.elastic.co/t/elasticsearch-restart-failed/83357/7?u=igormarqs

@elzozz

elzozz commented Apr 25, 2017

@jasontedor Here are the config files, and I think I found the issue in my case.
In /etc/sysconfig/elasticsearch, DATA_DIR was set to /elastic-data, the same path as the path.data parameter in elasticsearch.yml. As mentioned by @igormarqs, the startup script then adds -Edefault.path.data=${DATA_DIR} to the command line. So default.path.data is indeed defined, but not in the elasticsearch.yml configuration file.
Commenting out DATA_DIR in /etc/sysconfig/elasticsearch solved the issue after the upgrade.
elk-sysconfig.zip
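
For anyone who wants to check their own node for the same overlap, here is a minimal sketch that compares DATA_DIR from the sysconfig/default file with path.data from elasticsearch.yml. The helper names are hypothetical, the functions take the file contents as text, and the YAML handling is deliberately naive: it only understands a single value in the flat `path.data:` form or the nested `path:` / `data:` form.

```python
import os
import re


def read_data_dir(env_text):
    """Return the last uncommented DATA_DIR=... value, or None."""
    value = None
    for line in env_text.splitlines():
        match = re.match(r"\s*DATA_DIR=(.*)$", line)
        if match:
            value = match.group(1).strip().strip("'\"")
    return value


def read_path_data(yml_text):
    """Return path.data from the flat or nested form, or None.
    This is not a YAML parser; it handles single values only."""
    in_path_block = False
    for line in yml_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        flat = re.match(r"path\.data:\s*(\S+)", line)
        if flat:
            return flat.group(1)
        if re.match(r"path:\s*$", line):
            in_path_block = True
            continue
        if in_path_block:
            nested = re.match(r"\s+data:\s*(\S+)", line)
            if nested:
                return nested.group(1)
            if not line.startswith((" ", "\t")):
                in_path_block = False  # left the indented path: block
    return None


def same_data_path(env_text, yml_text):
    """True if DATA_DIR and path.data are both set and name the same folder."""
    data_dir = read_data_dir(env_text)
    path_data = read_path_data(yml_text)
    if data_dir is None or path_data is None:
        return False
    return os.path.normpath(data_dir) == os.path.normpath(path_data)
```

If `same_data_path` returns True, the startup script passes a default.path.data equal to path.data, which is the combination discussed in this issue.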

@jasontedor
Member

@elzozz Indeed, you have both path.data and default.path.data configured to the same path (which differs from the default provided by the packaging). That explains why you are also impacted by this issue.
