[0.9.3-rc2] Unable to query influxdb #3607

Closed
shilpisharma opened this issue Aug 10, 2015 · 43 comments · Fixed by #3857

@shilpisharma

Hi,

We are doing feasibility testing.

We inserted 35 million records in one of the measurements. But now influx is not responding when we query the table.

Can someone please suggest what can be done?

Shilpi

@desa
Contributor

desa commented Aug 10, 2015

@shilpisharma what version of InfluxDB are you running? And what does the query you tried to run look like?

@shilpisharma
Author

InfluxDb version: 0.9.1

select count(value) from <measurement>
select * from <measurement>
select * from measurement where time > <starttime> and time < <endtime>

@otoolep
Contributor

otoolep commented Aug 10, 2015

@shilpisharma -- what is coming back when you issue these queries?

@shilpisharma
Author

No response. The DB just hangs.

@beckettsean
Contributor

@shilpisharma can you provide log output and a sample write statement? What exactly do you mean by "the DB just hangs"? Is the process up and not responsive? Is it issuing 500s? Will the database process writes?

@shilpisharma
Author

Process remains up, but no response. Where will I find the logs for this?

@beckettsean
Contributor

Look in /var/log/influxdb/influxdb.log if you aren't explicitly redirecting your log output.

The process is running according to a process list, but submissions to the write and query endpoints return nothing? Can you paste the results of curl -Gv 'http://localhost:8086/query?db=<database_name>' --data-urlencode "q=SHOW MEASUREMENTS", replacing <database_name> with the appropriate value, and updating the URL as required if you are not running it on the InfluxDB server itself.
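
For anyone scripting this check instead of using curl, a rough Python equivalent is sketched below. It uses only the standard library. The helper names `build_query_url` and `run_query` and the 10-second timeout are assumptions made for illustration, not part of InfluxDB. The point of the timeout is that a hung query endpoint surfaces as an exception rather than a stalled terminal.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, database, query):
    """Build a /query URL with db and q as URL-encoded parameters."""
    params = urllib.parse.urlencode({"db": database, "q": query})
    return f"{base_url}/query?{params}"


def run_query(base_url, database, query, timeout_s=10.0):
    """Send the query and parse the JSON body (hypothetical helper).

    An exception is raised if the server does not answer within timeout_s.
    """
    url = build_query_url(base_url, database, query)
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_query("http://localhost:8086", "influx_1", "SHOW MEASUREMENTS")` should either return the decoded response or fail after the timeout, which helps separate "slow" from "never answers".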

@shilpisharma
Author

I checked logs, they don't display the query at all.

[retention] 2015/08/10 13:47:48 retention policy shard deletion check commencing
[http] 2015/08/10 13:55:05 127.0.0.1 - - [10/Aug/2015:13:55:05 -0400] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.9.1 ee6475c0-3f88-11e5-8001-000000000000 326.54µs
[http] 2015/08/10 13:55:57 127.0.0.1 - - [10/Aug/2015:13:55:57 -0400] GET /query?db=&q=show+databases HTTP/1.1 200 105 - InfluxDBShell/0.9.1 0da3087d-3f89-11e5-8002-000000000000 63.174633ms
[http] 2015/08/10 13:56:12 127.0.0.1 - - [10/Aug/2015:13:56:12 -0400] GET /query?db=influxtest&q=show+measurements HTTP/1.1 200 40 - InfluxDBShell/0.9.1 1652e372-3f89-11e5-8003-000000000000 974.99µs
[http] 2015/08/10 13:56:27 127.0.0.1 - - [10/Aug/2015:13:56:27 -0400] GET /query?db=influx_1&q=show+measurements HTTP/1.1 200 105 - InfluxDBShell/0.9.1 1f88e203-3f89-11e5-8004-000000000000 30.490928ms
[retention] 2015/08/10 13:57:48 retention policy enforcement check commencing
[retention] 2015/08/10 13:57:48 retention policy shard deletion check commencing
[retention] 2015/08/10 14:07:49 retention policy enforcement check commencing
[retention] 2015/08/10 14:07:49 retention policy shard deletion check commencing

RAM usage by InfluxDB was high enough that the server itself became slow. After show measurements, I executed the following command: select * from energyconsumption

administrator@administrator-IdeaCentre-Q190:/var/log/influxdb$ curl -Gv 'http://localhost:8086/query?db=influx_1' --data-urlencode "q=SHOW MEASUREMENTS"
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8086 (#0)
> GET /query?db=influx_1&q=SHOW%20MEASUREMENTS HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8086
> Accept: */*
>

@shilpisharma
Author

Any suggestions?

@beckettsean
Contributor

I would suggest waiting for the 0.9.3 release and upgrading. Nothing in your logs is indicative of a problem, and I can't understand why the query endpoint stops responding.

Can you provide a sample of your write statements?

@shilpisharma
Author

Data was written using the REST API through Python:

for ts in ts_data:
    # One line per point: measurement + tags, the value field, then the timestamp
    insert_data += table+flatten_tag+' value='+str(float(ts[2]))+' '+ts[0]+'\n'
r = requests.post(url, data=insert_data)

@beckettsean
Contributor

Thanks, @shilpisharma but that doesn't give me the exact syntax of the writes. Is there a way you can paste in the result of that code running, and give an example of the actual write made?

@shilpisharma
Author

Logs during the write:

[http] 2015/08/06 14:37:47 128.2.109.83 - - [06/Aug/2015:14:37:46 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3b9be4f4-3c6a-11e5-8914-000000000000 173.431512ms
[http] 2015/08/06 14:37:49 128.2.109.83 - - [06/Aug/2015:14:37:47 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3bd0c3f5-3c6a-11e5-8915-000000000000 2.372260081s
[http] 2015/08/06 14:37:50 128.2.109.83 - - [06/Aug/2015:14:37:49 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3d53d0cf-3c6a-11e5-8916-000000000000 691.012019ms
[http] 2015/08/06 14:37:50 128.2.109.83 - - [06/Aug/2015:14:37:50 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3dd8050d-3c6a-11e5-8917-000000000000 226.922529ms
[http] 2015/08/06 14:37:51 128.2.109.83 - - [06/Aug/2015:14:37:51 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3e101721-3c6a-11e5-8918-000000000000 322.323207ms
[http] 2015/08/06 14:37:51 128.2.109.83 - - [06/Aug/2015:14:37:51 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3e53d6ab-3c6a-11e5-8919-000000000000 164.651593ms
[http] 2015/08/06 14:37:51 128.2.109.83 - - [06/Aug/2015:14:37:51 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3e8169a5-3c6a-11e5-891a-000000000000 163.465269ms
[http] 2015/08/06 14:37:52 128.2.109.83 - - [06/Aug/2015:14:37:52 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3eada326-3c6a-11e5-891b-000000000000 366.620601ms
[http] 2015/08/06 14:37:52 128.2.109.83 - - [06/Aug/2015:14:37:52 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3f009c6a-3c6a-11e5-891c-000000000000 155.372336ms
[http] 2015/08/06 14:37:53 128.2.109.83 - - [06/Aug/2015:14:37:52 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3f319fb7-3c6a-11e5-891d-000000000000 170.720889ms
[http] 2015/08/06 14:37:53 128.2.109.83 - - [06/Aug/2015:14:37:53 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3f62dd53-3c6a-11e5-891e-000000000000 172.081764ms
[http] 2015/08/06 14:37:53 128.2.109.83 - - [06/Aug/2015:14:37:53 -0400] POST /write?db=influx_1&precision=s HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Windows/7 3f93c836-3c6a-11e5-891f-000000000000 189.379733ms
[shard] 2015/08/06 14:37:57 flush 41936 points in 12.362s
[shard] 2015/08/06 14:38:07 flush 42326 points in 8.239s
[shard] 2015/08/06 14:38:18 flush 42077 points in 9.311s

Records inserted are something like this: Example of one record.

[{"measurement": "energyconsumption_5","tags": {"accumulationBehaviour": "4","commodity": "1","dataQualifier": "12","defaultQuality": "0","flowDirection": "1","intervalLength": "900","kind": "12","phase": "769","powerOfTenMultiplier": "0","timeAttribute": "0","uom": "72"},"time": "2014-11-19T17:30:00Z","fields": {"value": 67256.0}}]

@beckettsean
Contributor

@shilpisharma it looks from your write that you are using the deprecated JSON protocol for writes. I would strongly suggest updating your code to use the line protocol, as the JSON protocol will be removed in an upcoming release. It is known to cause performance issues, especially if you aren't batching writes.
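
As a minimal sketch of what batched line-protocol writes could look like, the Python below formats points as line-protocol text and groups them into batches. The function names `to_line` and `batches` and the batch size of 5000 are assumptions for illustration. The single `value` field and second-precision timestamps mirror the sample record and the `precision=s` writes shown in the logs above.

```python
def _escape(text):
    """Escape commas, equals signs and spaces in tag keys and values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, value, timestamp_s):
    """Format one point as a line-protocol line with a single 'value' field.

    Assumes the measurement name needs no escaping and the timestamp is
    in seconds (to be written with precision=s).
    """
    tag_part = "".join(
        f",{_escape(key)}={_escape(val)}" for key, val in sorted(tags.items())
    )
    return f"{measurement}{tag_part} value={float(value)} {int(timestamp_s)}"


def batches(lines, size=5000):
    """Yield newline-joined request bodies of at most `size` lines each."""
    for start in range(0, len(lines), size):
        yield "\n".join(lines[start:start + size])


# 2014-11-19T17:30:00Z expressed as epoch seconds.
line = to_line("energyconsumption_5", {"uom": "72", "kind": "12"}, 67256.0, 1416418200)
print(line)
```

Each string produced by `batches` would then be sent as the body of one POST to the `/write` endpoint, so the number of HTTP requests is the number of batches rather than the number of points.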

@beckettsean
Contributor

Check out https://github.com/influxdb/influxdb-python for writes; I believe it has been updated for the latest 0.9 release.

@ccutrer

ccutrer commented Aug 25, 2015

I'm also seeing this problem. When on 0.9.3pre (current master), it hangs on the query endpoint (I do see a ping that gets a 204 when I launch the CLI) with no logging. But when I go back to 0.9.2pre I get symptoms similar to #3632. The workaround there is no help.

2015/08/24 20:53:46 InfluxDB starting, version 0.9.2pre, commit 7b815fe8aa4f10aca7f7269c6c925c84bb6586c3
2015/08/24 20:53:46 GOMAXPROCS set to 4
[metastore] 2015/08/24 20:53:47 [WARN] raft: Heartbeat timeout reached, starting election
[metastore] 2015/08/24 20:53:47 [INFO] raft: Node at 127.0.0.1:8088 [Candidate] entering Candidate state
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Votes needed: 1
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Vote granted. Tally: 1
[metastore] 2015/08/24 20:53:47 [INFO] raft: Election won. Tally: 1
[metastore] 2015/08/24 20:53:47 [INFO] raft: Node at 127.0.0.1:8088 [Leader] entering Leader state
[metastore] 2015/08/24 20:53:47 [INFO] raft: Disabling EnableSingleNode (bootstrap)
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
2015/08/24 20:53:47 Sending anonymous usage statistics to m.influxdb.com
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:47 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:53:48 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:00 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:00 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [localhost:8088]
[metastore] 2015/08/24 20:54:05 [INFO] raft: Added peer localhost:8088, starting replication
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [localhost:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [INFO] raft: Removed peer localhost:8088, stopping replication (Index: 2496)
2015/08/24 20:54:05 [DEBUG] raft-net: 127.0.0.1:8088 accepted connection from: 127.0.0.1:60560
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [localhost:8088]
[metastore] 2015/08/24 20:54:05 [INFO] raft: Added peer localhost:8088, starting replication
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [localhost:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [localhost:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [audiopi.cutrer.local:
2015/08/24 20:54:05 [DEBUG] raft-net: 127.0.0.1:8088 accepted connection from: 127.0.0.1:60561
[metastore] 2015/08/24 20:54:05 [INFO] raft: Added peer audiopi.cutrer.local:8088, starting replication
[metastore] 2015/08/24 20:54:05 [INFO] raft: Removed peer localhost:8088, stopping replication (Index: 2501)
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [audiopi.cutrer.local:
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
2015/08/24 20:54:05 [DEBUG] raft-net: 127.0.0.1:8088 accepted connection from: 127.0.0.1:60562
[metastore] 2015/08/24 20:54:05 [INFO] raft: Removed peer audiopi.cutrer.local:8088, stopping replication (Ind
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [localhost:8088]
[metastore] 2015/08/24 20:54:05 [INFO] raft: Added peer localhost:8088, starting replication
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [INFO] raft: Removed peer localhost:8088, stopping replication (Index: 2522)
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
2015/08/24 20:54:05 [DEBUG] raft-net: 127.0.0.1:8088 accepted connection from: 127.0.0.1:60563
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/24 20:54:05 [INFO] raft: Node at 127.0.0.1:8088 [Follower] entering Follower state
[metastore] 2015/08/24 20:54:05 [INFO] raft: pipelining replication to peer localhost:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: pipelining replication to peer localhost:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: aborting pipeline replication to peer localhost:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: pipelining replication to peer audiopi.cutrer.local:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: pipelining replication to peer localhost:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: aborting pipeline replication to peer localhost:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: aborting pipeline replication to peer audiopi.cutrer.local:8088
[metastore] 2015/08/24 20:54:05 [INFO] raft: aborting pipeline replication to peer localhost:8088
[metastore] 2015/08/24 20:54:07 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.

The workaround in the other issue (explicitly setting the hostname, to either 127.0.0.1, or localhost) is no help. After that any write returns a 500, but I can query data. Halp!

@beckettsean beckettsean changed the title Unable to query influxdb. [0.9.1] Unable to query influxdb Aug 25, 2015
@ccutrer

ccutrer commented Aug 25, 2015

I just reproed on the 0.9.3-rc2 build. What can I provide to help debug this? My log looks like:

[metastore] 2015/08/25 09:14:23 Using data dir: /var/opt/influxdb/meta
[metastore] 2015/08/25 09:14:23 Node at localhost:8088 [Follower]
[metastore] 2015/08/25 09:14:23 Skipping cluster join: already member of cluster: nodeId=1 raftEnabled=false peers=[127.0.0.1:8088]
[store] 2015/08/25 09:14:23 Using data dir: /var/opt/influxdb/data
[handoff] 2015/08/25 09:14:24 Starting hinted handoff service
[handoff] 2015/08/25 09:14:24 Using data dir: /var/opt/influxdb/hh
[tcp] 2015/08/25 09:14:24 Starting cluster service
[shard-precreation] 2015/08/25 09:14:24 Starting precreation service with check interval of 10m0s, advance period of 30m0s
[snapshot] 2015/08/25 09:14:24 Starting snapshot service
[admin] 2015/08/25 09:14:24 Starting admin service
[admin] 2015/08/25 09:14:24 Listening on HTTP: [::]:8083
[continuous_querier] 2015/08/25 09:14:24 Starting continuous query service
[httpd] 2015/08/25 09:14:24 Starting HTTP service
[httpd] 2015/08/25 09:14:24 Authentication enabled: true
[httpd] 2015/08/25 09:14:24 Listening on HTTP: [::]:8086
[retention] 2015/08/25 09:14:24 Starting rentention policy enforcement service
2015/08/25 09:14:24 InfluxDB starting, version 0.9.3-rc2, branch HEAD, commit d0646a8871a583f8342afd0dc1a253946565d2cd
2015/08/25 09:14:24 GOMAXPROCS set to 4
[run] 2015/08/25 09:14:24 Listening for signals
[metastore] 2015/08/25 09:14:24 Node at localhost:8088 [Leader]. peers=[localhost:8088]
2015/08/25 09:14:24 Sending anonymous usage statistics to m.influxdb.com
[http] 2015/08/25 09:15:17 127.0.0.1 - - [25/Aug/2015:09:15:17 -0600] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.9.3-rc2 17bbc677-4b3c-11e5-8003-000000000000 597.031µs

After that ping in the log, I tried a show databases from the shell, and it just hangs.

@beckettsean
Contributor

@ccutrer I think you are seeing a different problem than @shilpisharma. I would recommend deleting your /meta directory to reset the clustering, as that seems to be where the system is confused, based on your prior log.

@ccutrer

ccutrer commented Aug 25, 2015

@beckettsean I stopped influx, deleted the meta dir (I didn't know that was safe!?), and started influx. Same symptoms:

2015/08/25 12:17:10 InfluxDB starting, version 0.9.3-rc2, branch HEAD, commit d0646a8871a583f8342afd0dc1a253946565d2cd
2015/08/25 12:17:10 GOMAXPROCS set to 4
[run] 2015/08/25 12:17:10 Listening for signals
2015/08/25 12:17:10 Sending anonymous usage statistics to m.influxdb.com
[metastore] 2015/08/25 12:17:10 Node at localhost:8088 [Leader]. peers=[localhost:8088]
[http] 2015/08/25 12:18:00 127.0.0.1 - - [25/Aug/2015:12:18:00 -0600] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.9.3-rc2 9e4d53c5-4b55-11e5-8005-000000000000 493.073µs

@beckettsean
Contributor

@ccutrer in a single-node cluster that fully settled before shutdown, deleting /meta is partially safe. You will need to explicitly recreate the databases and continuous queries in order for them to be referenced.

I have no theories why the service isn't starting cleanly.

@shilpisharma any updates from you?

@beckettsean beckettsean changed the title [0.9.1] Unable to query influxdb [0.9.3-rc2] Unable to query influxdb Aug 25, 2015
@ccutrer

ccutrer commented Aug 25, 2015

So um, yeah, deleting /meta seems to not be safe at all. It loses my users and my database - all my data. This is happening on 0.9.1, 0.9.2-pre, and 0.9.3-rc2. But it does boot correctly under all those versions when I do it. So obviously it is something with my meta directory. Any other tips to get influx running on any of these versions?

@beckettsean
Contributor

@ccutrer My apologies for the /meta deletion. Totally my fault. I had something else in mind entirely, and when I tested on my end I didn't notice the metadata was gone until a restart. The raw data points are still present; if you run create database <previous_database_name> the data should be accessible again. Users are gone. Again, my apologies not only for the bad advice but for doubling down on my own bad advice.

Given that the deletion fixed the issue, and that we cannot reproduce this here, any ideas how to reproduce the error you were seeing?

@jwilder
Contributor

jwilder commented Aug 25, 2015

#3836 may help the slow query issue. You may have a lot of series based on the number of points you've written if each point also has 10 tags and a lot of different values. The slow startup is likely due to the shard index loading. You can monitor IO w/ iostat 1 while it's loading to see if your disks are very busy. Startup can be slow if there is a lot of data to index.
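
To make the series point concrete: a series is one unique combination of measurement and tag values, so the number of possible series grows with the product of the distinct values per tag. The sketch below is illustrative only. The helper names are hypothetical and the distinct-value counts are made up, since the thread does not say how many values each tag takes.

```python
from math import prod


def max_series(distinct_values_per_tag):
    """Upper bound on series in one measurement: product of distinct values per tag."""
    return prod(distinct_values_per_tag.values())


def count_series(points):
    """Count the series actually present in an iterable of (measurement, tags) pairs."""
    return len({(name, tuple(sorted(tags.items()))) for name, tags in points})


# Hypothetical distinct-value counts for three of the tags in the sample record.
example_counts = {"phase": 3, "uom": 4, "intervalLength": 2}
print(max_series(example_counts))
```

Running `count_series` over a sample of the points being written gives the observed figure, which can be far below the upper bound when tag values are correlated.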

@ccutrer

ccutrer commented Aug 25, 2015

A small tidbit - with my original meta directory, and 0.9.3-rc2, and enabling [meta].cluster-tracing=true, my log looks like

2015/08/25 16:14:55 GOMAXPROCS set to 4
[run] 2015/08/25 16:14:55 Listening for signals
[metastore] 2015/08/25 16:14:56 [WARN] raft: Heartbeat timeout reached, starting election
[metastore] 2015/08/25 16:14:56 [INFO] raft: Node at localhost:8088 [Candidate] entering Candidate state
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Votes needed: 1
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Vote granted. Tally: 1
[metastore] 2015/08/25 16:14:56 [INFO] raft: Election won. Tally: 1
[metastore] 2015/08/25 16:14:56 [INFO] raft: Node at localhost:8088 [Leader] entering Leader state
[metastore] 2015/08/25 16:14:56 Node at localhost:8088 [Leader]. peers=[localhost:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [INFO] raft: Added peer 127.0.0.1:8088, starting replication
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
2015/08/25 16:14:56 Sending anonymous usage statistics to m.influxdb.com
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:14:56 [DEBUG] raft: Node localhost:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/25 16:15:06 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:15:16 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:15:26 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:15:36 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[http] 2015/08/25 16:15:36 127.0.0.1 - - [25/Aug/2015:16:15:36 -0600] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.9.3-rc2 cf9293cf-4b76-11e5-8003-000000000000 484.115µs
[metastore] 2015/08/25 16:15:46 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:15:56 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:16:06 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:16:17 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:16:27 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:16:39 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout
[metastore] 2015/08/25 16:16:51 [ERR] raft: Failed to AppendEntries to 127.0.0.1:8088: read tcp 127.0.0.1:8088: i/o timeout

also, to @jwilder - my iostat output mostly looks like

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk0           0.00         0.00         0.00          0          0

@jwilder
Contributor

jwilder commented Aug 25, 2015

Ok. So loading the index is not the issue.

What does your meta/peers.json show? You may need to manually update it to ["localhost:8088"] if it says ["127.0.0.1:8088"].

@ccutrer

ccutrer commented Aug 25, 2015

@beckettsean it's okay about the meta. I've got a full backup of /var/opt/influxdb from 0.9.2pre that I keep copying over. I was last able to write data to it about 48 hours ago (I've got my incoming data going to flat files of the line protocol data for now).

Anyhow, after I delete meta, and create database ccutrer_home, I can see all of my measurements/series, but none of them have any data.

For @jwilder, I have a total of four measurements and not too many series:

> show series
name: energy
------------
_key                      circuit
energy,circuit=a_c        a_c
energy,circuit=computer   computer
energy,circuit=dryer      dryer
energy,circuit=freezer    freezer
energy,circuit=fridge     fridge
energy,circuit=furnace    furnace
energy,circuit=hot_tub    hot_tub
energy,circuit=outside    outside
energy,circuit=shed       shed
energy,circuit=solar      solar
energy,circuit=stove      stove
energy,circuit=solaredge  solaredge


name: power
-----------
_key                     circuit
power,circuit=a_c        a_c
power,circuit=computer   computer
power,circuit=dryer      dryer
power,circuit=freezer    freezer
power,circuit=fridge     fridge
power,circuit=furnace    furnace
power,circuit=hot_tub    hot_tub
power,circuit=outside    outside
power,circuit=shed       shed
power,circuit=solar      solar
power,circuit=stove      stove
power,circuit=solaredge  solaredge


name: temperature
-----------------
_key                        probe
temperature,probe=inverter  inverter


name: voltage
-------------
_key                               circuit    form
voltage,circuit=solaredge,form=dc  solaredge  dc

The power,circuit=outside series has one point per second. All other series are one point per minute. It's been going for a month or two. meta/peers.json is ["127.0.0.1:8088"] in my restored backup, and ["localhost:8088"] if I delete meta. If I change the peers.json to be ["127.0.0.1:8088"] in the restored backup, it still hangs on any query.

@jwilder
Contributor

jwilder commented Aug 25, 2015

@ccutrer Don't delete meta dir. You will lose the ability to query it. I would suggest:

  1. Restore your backup
  2. Update meta/peers.json to be ["localhost:8088"]
  3. Start the server using the latest 0.9.3 build.

For 0.9.2, the meta/peers.json always had numeric IP addresses. In 0.9.3, it defaults to hostnames and localhost.

In 0.9.3, you can also try starting the server with influxd -hostname 127.0.0.1 to see if that allows you to boot.
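
For anyone scripting step 2, here is a minimal Go sketch of the peers.json rewrite. It assumes the file is a JSON array of host:port strings, as the values quoted in this thread suggest. The rewritePeers function and its parameters are hypothetical names for illustration, not part of InfluxDB, and reading and writing the file itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// rewritePeers is a hypothetical helper (not part of InfluxDB). It takes the
// contents of a peers.json file, assumed to be a JSON array of "host:port"
// strings, and replaces the host oldHost with newHost, keeping each port.
func rewritePeers(data []byte, oldHost, newHost string) ([]byte, error) {
	var peers []string
	if err := json.Unmarshal(data, &peers); err != nil {
		return nil, err
	}
	for i, p := range peers {
		host, port, err := net.SplitHostPort(p)
		if err != nil {
			return nil, fmt.Errorf("peer %q: %w", p, err)
		}
		if host == oldHost {
			peers[i] = net.JoinHostPort(newHost, port)
		}
	}
	return json.Marshal(peers)
}

func main() {
	// Example input taken from the value quoted in this thread.
	out, err := rewritePeers([]byte(`["127.0.0.1:8088"]`), "127.0.0.1", "localhost")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Swapping the last two arguments converts the entries back to the numeric form that 0.9.2 used.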

@ccutrer

ccutrer commented Aug 25, 2015

@jwilder: nope, it still hangs with 0.9.3-rc2 after a fresh restore, whether I change meta/peers.json to ["localhost:8088"] or use the -hostname command line option.

@jwilder
Contributor

jwilder commented Aug 26, 2015

One other idea to try:

  1. Restore backup
  2. Update meta/peers.json to be ["127.0.0.1:8088"]
  3. Start the latest 0.9.3-rc3 release with -hostname 127.0.0.1

That should start 0.9.3 the way 0.9.2 was before.

@ccutrer

ccutrer commented Aug 26, 2015

Nope, still hangs.

[metastore] 2015/08/26 10:06:13 Using data dir: /var/opt/influxdb/meta
[metastore] 2015/08/26 10:06:13 Skipping cluster join: already member of cluster: nodeId=1 raftEnabled=true peers=[127.0.0.1:8088]
[store] 2015/08/26 10:06:13 Using data dir: /var/opt/influxdb/data
[metastore] 2015/08/26 10:06:13 Node at 127.0.0.1:8088 [Follower]
[metastore] 2015/08/26 10:06:13 [INFO] raft: Node at 127.0.0.1:8088 [Follower] entering Follower state
[handoff] 2015/08/26 10:06:15 Starting hinted handoff service
[handoff] 2015/08/26 10:06:15 Using data dir: /var/opt/influxdb/hh
[tcp] 2015/08/26 10:06:15 Starting cluster service
[shard-precreation] 2015/08/26 10:06:15 Starting precreation service with check interval of 10m0s, advance period of 30m0s
[snapshot] 2015/08/26 10:06:15 Starting snapshot service
[admin] 2015/08/26 10:06:15 Starting admin service
[admin] 2015/08/26 10:06:15 Listening on HTTP: [::]:8083
[continuous_querier] 2015/08/26 10:06:15 Starting continuous query service
[httpd] 2015/08/26 10:06:15 Starting HTTP service
[httpd] 2015/08/26 10:06:15 Authentication enabled: false
[httpd] 2015/08/26 10:06:15 Listening on HTTP: [::]:8086
[retention] 2015/08/26 10:06:15 Starting rentention policy enforcement service
2015/08/26 10:06:15 InfluxDB starting, version 0.9.3-rc2, branch HEAD, commit d0646a8871a583f8342afd0dc1a253946565d2cd
2015/08/26 10:06:15 GOMAXPROCS set to 4
[run] 2015/08/26 10:06:15 Listening for signals
[metastore] 2015/08/26 10:06:15 [WARN] raft: Heartbeat timeout reached, starting election
[metastore] 2015/08/26 10:06:15 [INFO] raft: Node at 127.0.0.1:8088 [Candidate] entering Candidate state
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Votes needed: 1
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Vote granted. Tally: 1
[metastore] 2015/08/26 10:06:15 [INFO] raft: Election won. Tally: 1
[metastore] 2015/08/26 10:06:15 [INFO] raft: Node at 127.0.0.1:8088 [Leader] entering Leader state
[metastore] 2015/08/26 10:06:15 Node at 127.0.0.1:8088 [Leader]. peers=[127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
2015/08/26 10:06:15 Sending anonymous usage statistics to m.influxdb.com
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 Updated node id=1 hostname=127.0.0.1:8088
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:15 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[metastore] 2015/08/26 10:06:16 [DEBUG] raft: Node 127.0.0.1:8088 updated peer set (2): [127.0.0.1:8088]
[http] 2015/08/26 10:06:34 127.0.0.1 - - [26/Aug/2015:10:06:34 -0600] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.9.3-rc2 6c49d8f4-4c0c-11e5-8001-000000000000 6.000572ms

@jwilder
Contributor

jwilder commented Aug 26, 2015

@ccutrer That log looks better. What is hanging? The /ping call returned successfully, so the server is responding to requests.

@ccutrer

ccutrer commented Aug 26, 2015

cody@audiopi:~$ /opt/influxdb/influx
Connected to http://localhost:8086 version 0.9.3-rc2
InfluxDB shell 0.9.3-rc2
> show databases

The show databases call (or any other query) hangs.

@jwilder
Contributor

jwilder commented Aug 26, 2015

When the server is hung, can you send a SIGQUIT to it and post the stack dump? It looks like you might be hitting some kind of deadlock, which is very strange.

On Linux, kill -SIGQUIT $(pidof influxd) should do it; on OS X, use kill -s QUIT $(pgrep influxd).

The full trace in a gist would be very helpful.

@ccutrer

ccutrer commented Aug 26, 2015

@jwilder
Contributor

jwilder commented Aug 26, 2015

Definitely a deadlock in the metastore. Can you try disabling the Precreation service to see if that resolves it?

In your config, set

[shard-precreation]
  enabled = false

@ccutrer

ccutrer commented Aug 26, 2015

Nope, still hangs: https://gist.github.com/ccutrer/4c225d07535291ad41ce

@otoolep
Contributor

otoolep commented Aug 26, 2015

@ccutrer - this is a single-node system, correct?

@ccutrer

ccutrer commented Aug 26, 2015

Correct.

@jwilder
Contributor

jwilder commented Aug 26, 2015

@ccutrer Would you be able to send your /meta and /data dirs as well as your config to me? Email to jason@influxdb.com if that works.

@jwilder
Contributor

jwilder commented Aug 26, 2015

@ccutrer Can you try disabling continuous queries?

[continuous_queries]
  enabled = true

@ccutrer

ccutrer commented Aug 26, 2015

You mean enabled = false. And yes!! I can query!!! And I can write data! And it's not doubling things (a recent bug fix that I tried to update for, which is what caused this whole issue)!

@jwilder
Contributor

jwilder commented Aug 26, 2015

Yes... sorry. enabled = false. Great!

@jwilder jwilder self-assigned this Aug 26, 2015
jwilder added a commit that referenced this issue Aug 26, 2015
The interaction of continuous query service, the meta-store loading
and initializing raft state, and syncing node info could cause a
deadlock in some instances.  There was an extra read-lock taken by isLeader()
when it already had a read-lock.  Removing this extra lock fixes the startup
deadlock.

Fixes #3607
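
The commit message above describes a nested read-lock. As a rough illustration of that class of bug (this is not the actual InfluxDB code; the store type and its methods are made up for this sketch), Go's sync.RWMutex does not support recursive read-locking: once a writer is waiting in Lock, further RLock calls block, so a goroutine that already holds a read lock and calls RLock again can wait forever. One common way to avoid this is to have methods that already hold the lock call a helper that does no locking of its own:

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the metastore. The field and method
// names are illustrative and are not taken from the InfluxDB source.
type store struct {
	mu     sync.RWMutex
	leader string
	self   string
}

// isLeaderLocked assumes the caller already holds mu (read or write).
func (s *store) isLeaderLocked() bool {
	return s.leader == s.self
}

// IsLeader is the public entry point; it takes the read lock exactly once.
func (s *store) IsLeader() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.isLeaderLocked()
}

// Status already holds the read lock, so it calls the *Locked helper.
// Calling IsLeader here would take RLock a second time. If another goroutine
// called Lock between the two RLock calls, the second RLock would wait for
// the writer while the writer waits for the first read lock to be released.
func (s *store) Status() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.isLeaderLocked() {
		return "leader"
	}
	return "follower"
}

// SetLeader takes the write lock.
func (s *store) SetLeader(addr string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.leader = addr
}

func main() {
	s := &store{self: "127.0.0.1:8088"}
	var wg sync.WaitGroup
	// Mix writers and readers; with a single RLock per call path this
	// finishes instead of hanging.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s.SetLeader("127.0.0.1:8088")
				_ = s.Status()
			}
		}()
	}
	wg.Wait()
	fmt.Println(s.Status())
}
```

The nested variant fails only when a writer arrives between the two read-lock calls, which would fit the hang showing up on some installations and not others.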
@ccutrer

ccutrer commented Aug 26, 2015

Confirmed that with 0.9.3 final I don't need to disable continuous queries to avoid deadlocks. Thanks!

jonseymour added a commit to ninjasphere/influxdb that referenced this issue Nov 19, 2015
The interaction of continuous query service, the meta-store loading
and initializing raft state, and syncing node info could cause a
deadlock in some instances.  There was an extra read-lock taken by isLeader()
when it already had a read-lock.  Removing this extra lock fixes the startup
deadlock.

Fixes influxdata#3607

(Cherry-picked from 0286a3e)