This repository has been archived by the owner on Jun 28, 2022. It is now read-only.

Settings option is not applied #69

Open
lukas-vlcek opened this issue Jan 20, 2017 · 1 comment

Comments

@lukas-vlcek

It seems that the --settings option is not applied. The following is a repro script for the wiki use case.

$ ./stream2es --version
2017-01-20T13:03:47.629+0000 INFO  stream2es 20161020121123fe262bd

$ export ESURL=http://10.40.2.198:9200
$ curl ${ESURL}
{
  "name" : "Mikey",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "5vscFAPyRAqiX75Gwz5n9Q",
  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}

# Starting with empty cluster
$ curl ${ESURL}/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size

# Let's start indexing wiki. Stop the task after 10 seconds.
nohup \
./stream2es wiki \
   --target ${ESURL}/wiki \
   --clobber true \
   --settings '{ "settings": { "index": { "number_of_shards": 5, "number_of_replicas": 1 }}}' \
>/dev/null 2>&1 &
sleep 10
kill $!

# Index "wiki" has 2 shards and no replicas, not the requested 5 shards and 1 replica. Why?
$ curl ${ESURL}/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size 
green  open   wiki    2   0        193            0      3.5mb          3.5mb

# Create "test" index manually using the same settings
$ curl -X PUT ${ESURL}/test/ -d '{ "settings": { "index": { "number_of_shards": 5, "number_of_replicas": 1 }}}'
{"acknowledged":true}

# Compare "wiki" vs "test"
$ curl ${ESURL}/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   test    5   1          0            0       260b           260b 
green  open   wiki    2   0        193            0      3.5mb          3.5mb
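
The comparison above can also be checked programmatically. This is a minimal sketch, assuming the `_cat/indices?v` column layout shown in this issue (Elasticsearch 2.4.1) and that every listed index is open, so no column is blank. `parse_cat_indices` is a hypothetical helper, not part of stream2es or Elasticsearch.

```python
def parse_cat_indices(text):
    """Parse `_cat/indices?v` output into {index: {"pri": int, "rep": int}}."""
    # Split every non-empty line into whitespace-separated columns.
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return {}
    header, rows = lines[0], lines[1:]
    # Locate columns by header name rather than by fixed position.
    name_col = header.index("index")
    pri_col = header.index("pri")
    rep_col = header.index("rep")
    return {
        row[name_col]: {"pri": int(row[pri_col]), "rep": int(row[rep_col])}
        for row in rows
    }


# Output copied from the comparison in this issue.
CAT_OUTPUT = """\
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test    5   1          0            0       260b           260b
green  open   wiki    2   0        193            0      3.5mb          3.5mb
"""

# Shard and replica counts requested via --settings in the repro script.
requested = {"pri": 5, "rep": 1}

for name, actual in sorted(parse_cat_indices(CAT_OUTPUT).items()):
    status = "matches" if actual == requested else "differs from"
    print(f"{name}: {actual} {status} requested {requested}")
```

Looking columns up by header name keeps the sketch working if the column order differs, but a closed index (which leaves some columns empty) would need extra handling.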

Relevant server log:

[2017-01-20 14:17:59,572][INFO ][cluster.metadata         ] [Mikey] [wiki] creating index, cause [api], templates [], shards [2]/[0], mappings [_default_]
[2017-01-20 14:17:59,860][INFO ][cluster.routing.allocation] [Mikey] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[wiki][0], [wiki][0]] ...]).
[2017-01-20 14:18:08,795][INFO ][cluster.metadata         ] [Mikey] [wiki] create_mapping [redirect]
[2017-01-20 14:18:08,922][INFO ][cluster.metadata         ] [Mikey] [wiki] create_mapping [page]
[2017-01-20 14:18:09,581][INFO ][cluster.metadata         ] [Mikey] [wiki] create_mapping [disambiguation]
[2017-01-20 14:18:16,685][INFO ][cluster.metadata         ] [Mikey] [wiki] update_mapping [disambiguation]
[2017-01-20 14:18:23,178][INFO ][cluster.metadata         ] [Mikey] [wiki] update_mapping [redirect]

fupolarbear commented Apr 12, 2017

The problem is that the author gives a wrong (or outdated?) settings example in the readme, so the settings and mappings are parsed incorrectly and have no effect on ES.
If you follow the author's example, you will find that your ES index ends up with the settings nested inside themselves, like this:

curl -XGET 'http://localhost:9200/_all/_settings?pretty'
...
  "wiki" : {
    "settings" : {
      "index" : {
        "settings" : {
          "index" : {
            "analysis" : {
              "analyzer" : {
...

That is wrong. Instead, pass the index settings directly, without the outer "settings" and "index" wrappers, like this:

java -DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 -Xmx2g -jar stream2es wiki --log debug --source 'enwiki-20170401-pages-articles.xml.bz2' --settings '
{
	"number_of_shards" : 1,
	"analysis" : {
		"analyzer" : {
			"default":{
				"type" : "snowball",
				"language" : "English"
			}
		}
	}
}'

Those strange JVM options are for a separate problem, issue 65.
Hope this helps.
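
To apply the comment's advice to an existing Elasticsearch-style settings body, the outer wrappers can be stripped before passing the result to `--settings`. This is a minimal sketch based only on the diagnosis in the comment above, which has not been verified against the stream2es source. `flatten_index_settings` is a hypothetical helper, not part of stream2es.

```python
import json


def flatten_index_settings(body):
    """Strip the outer "settings" and "index" wrappers, if present."""
    settings = json.loads(body)
    for wrapper in ("settings", "index"):
        # Unwrap only when the wrapper is the sole top-level key,
        # so already-flat settings pass through unchanged.
        if (
            isinstance(settings, dict)
            and list(settings) == [wrapper]
            and isinstance(settings[wrapper], dict)
        ):
            settings = settings[wrapper]
    return json.dumps(settings)


# The --settings value from the repro script in this issue.
wrapped = '{ "settings": { "index": { "number_of_shards": 5, "number_of_replicas": 1 }}}'

# Assumption: the flat form is what stream2es expects, per the comment above.
print(flatten_index_settings(wrapped))
```

The printed JSON could then replace the `--settings` argument in the original repro command. Whether that yields 5 shards and 1 replica still needs to be confirmed against a running cluster.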
