InfluxDB Losing Database and Users After Service Restart #6

Closed
falnemer opened this issue Sep 20, 2018 · 38 comments
@falnemer

Problem/Motivation

When I restart my InfluxDB addon service, it removes the users and databases it created.

Expected behavior

I expect my database and users to remain.

Actual behavior

The databases and users are removed.

Steps to reproduce

When I log into Hass.IO and go to the addons, I open the InfluxDB addon and click restart.

I did hear back from a reddit thread that another user was having this issue. It was due to the config section having "auth": true

Here is the link to the reddit thread/comment

Here is my configuration and it now works by changing auth to false:

{
  "log_level": "info",
  "username": "##remark##",
  "password": "##remark##",
  "auth": false,
  "ssl": false,
  "certfile": "fullchain.pem",
  "keyfile": "privkey.pem",
  "ipv6": true
}

Proposed changes

Update InfluxDB to allow for authentication.

@addons-assistant

👋 Thanks for opening your first issue here! If you're reporting a 🐛 bug, please make sure you include steps to reproduce it. Also, logs, error messages and information about your hardware might be useful.

@tjorim
Contributor

tjorim commented Sep 20, 2018

Hm, just noticed my influxdb is empty as well. Missing the database and all users gone, don't remember restarting it tho. Can confirm I do have auth enabled (default) as well.

@frenck
Member

frenck commented Sep 23, 2018

I'm running the add-on with auth: true myself and cannot reproduce this error. I've tried on multiple test systems and the data is still retained across restarts.

@petrfaitl

Same issue here. I've tried switching all the flags on/off (auth, ssl, ip6) to no avail.

System Log reports
18-09-24 04:03:07 INFO (MainThread) [hassio.api.security] /host/info access from a0d7b954_influxdb
18-09-24 04:03:12 WARNING (MainThread) [hassio.api.security] /addons/a0d7b954_influxdb/info no role for a0d7b954_influxdb

And log from InfluxDB add-on

2018/09/24 16:08:34 Using configuration at: /etc/kapacitor/kapacitor.conf
ts=2018-09-24T16:08:35.509+12:00 lvl=error msg="encountered error" service=run err="open server: open service *influxdb.Service: failed to link subscription on startup: authorization failed"
run: open server: open service *influxdb.Service: failed to link subscription on startup: authorization failed
INFO: Starting the Kapacitor

@frenck
Member

frenck commented Sep 24, 2018

@petrfaitl
The first error is expected since that is related to a new feature of Hassio. This is currently a warning and has no effect on the add-on. The implementation of this new feature will be done soon.

The second error is perfectly normal/fine. Kapacitor starts quicker than InfluxDB, so it fails to connect the first time. It will eventually pick up as soon as InfluxDB is started.

So both of those errors are not related, known and kinda expected.

@frenck
Member

frenck commented Sep 24, 2018

OK, somebody actually handed over some big logs that showed the issue.

And the thing is: InfluxDB uses the Hassio Add-on API token as an internal password.
Recently Hassio implemented a new feature: rotation of API token on startup of the add-on (before it was fixed per add-on instance for the lifetime of the add-on).

This causes issues now.

Good news: Your data isn't gone. It is just Chronograf & Kapacitor who can't access the data.
If you have external users configured (Home Assistant or Grafana), they should be able to connect (based on first initial review of the situation).

I'm tagging this issue high priority and will provide updates ASAP.
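
The failure mode described in this comment can be sketched with a small simulation. This is only an illustration under stated assumptions: `ToyInflux`, `start_addon`, the `chronograf` user name and the `reset_internal_users` flag are hypothetical names made up for the example, not the add-on's actual code or the InfluxDB API. It shows why a password that was set once from the API token goes stale when the token rotates, while the stored data itself is untouched.

```python
import secrets


class ToyInflux:
    """Stand-in for a database with password-protected users (not the real InfluxDB API)."""

    def __init__(self):
        self.users = {}         # username -> password
        self.databases = set()  # database names

    def set_password(self, user, password):
        self.users[user] = password

    def login(self, user, password):
        return self.users.get(user) == password


def start_addon(db, reset_internal_users):
    """Simulate one add-on start; a fresh API token is issued on every start."""
    token = secrets.token_hex(16)  # hypothetical rotated token
    if reset_internal_users or "chronograf" not in db.users:
        # First start, or the assumed fix: re-sync the internal password with the token.
        db.set_password("chronograf", token)
    # Internal services always authenticate with the current token.
    return db.login("chronograf", token)


db = ToyInflux()
db.databases.add("homeassistant")

first = start_addon(db, reset_internal_users=False)   # first start: password matches
broken = start_addon(db, reset_internal_users=False)  # restart: stored password is stale
fixed = start_addon(db, reset_internal_users=True)    # restart with re-sync: works again

print(first, broken, fixed, sorted(db.databases))
```

In this toy model the second start fails to authenticate even though the `homeassistant` database is still present, which matches the observation that the data is not gone and only the internal services lose access.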

@pyrosmiley

Thanks so much for jumping on this @frenck -- really appreciate the work you put in for these add-ons.

frenck added a commit that referenced this issue Sep 25, 2018
@frenck
Member

frenck commented Sep 25, 2018

I've added a fix for this issue on the development branch; it still needs testing.
But since I'm now able to reproduce this, that will be done ASAP.

I currently do not have access to a slow device (Raspberry Pi), which is a cause of this problem as well.
I'll report back in a couple of hours.

@frenck
Member

frenck commented Sep 25, 2018

OK, so reproducing fails on my test setups (I guess the size of the database matters in this case...). I did, however, restart the add-on like 50 times or so.

This release improves on many levels, so I'll do some final testing on the new Hassio API security stuff and will release it tonight.

In this case, I'm going to assume this fixes it. (It sure won't break it...)

I'll leave this issue open and hope someone is willing to report back on it after upgrading.

@frenck
Member

frenck commented Sep 25, 2018

🎉 Released v1.1.0

Please give me some feedback on this 🙏

@smbunn

smbunn commented Sep 25, 2018

Hi Frenck, checked this morning (my time in New Zealand) and V1.1.0 is not coming up in Hass.io on my Raspberry PI. I still have 1.0.1 installed and no 'update' option. Can I force an update? Can I uninstall and reinstall without losing my data?

oh...saw it was updated only 11 minutes ago, I am probably the first user in the world awake right now to use it :-)

@matthew73210

Hey,

Found a quick fix for my machine, remove this from configuration.yaml

http:
  api_password: xxx

Works now for me.

@matthew73210

I've got the update too, but it still failed until I removed http...

@frenck
Member

frenck commented Sep 25, 2018

@matthew73210 That cannot possibly be related at all!
The add-on does not communicate with Home Assistant... so disabling or enabling the password and/or http section would not do anything.

@frenck
Member

frenck commented Sep 25, 2018

@smbunn In the Hass.io add-on store there is a reload button in the top right. Hit it!

@smbunn

smbunn commented Sep 25, 2018

OK, that allowed me to see the update which I am installing now.

@matthew73210

@frenck Okay maybe a glitch for 'my' system, because that's the only thing I changed and it started working. Thanks for the rapid update.

@smbunn

smbunn commented Sep 25, 2018

I have completed the update and InfluxDB will not start:

[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-banner.sh: executing... 
-----------------------------------------------------------
 Hass.io Add-on: InfluxDB v1.1.0
 Scalable datastore for metrics, events, and real-time analytics
 From: Community Hass.io Add-ons
 By: Franck Nijhof <frenck@addons.community>
-----------------------------------------------------------
 armhf / HassOS 1.11 / HA 0.78.3 / SU 131 / stable
-----------------------------------------------------------
[cont-init.d] 00-banner.sh: exited 0.
[cont-init.d] 01-log-level.sh: executing... 
Log level is set to INFO
[cont-init.d] 01-log-level.sh: exited 0.
[cont-init.d] 02-updates.sh: executing... 
INFO: You are running the latest version of this add-on
[cont-init.d] 02-updates.sh: exited 0.
[cont-init.d] 10-requirements.sh: executing... 
INFO: Password is NOT in the Have I Been Pwned database! Nice!
[cont-init.d] 10-requirements.sh: exited 0.
[cont-init.d] 11-nginx.sh: executing... 
Adding password for user admin
[cont-init.d] 11-nginx.sh: exited 0.
[cont-init.d] 20-system-users.sh: executing... 
/var/run/s6/etc/cont-init.d/20-system-users.sh: line 17: hassio.log.info: command not found
[cont-init.d] 20-system-users.sh: exited 127.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] 99-message.sh: executing... 
-----------------------------------------------------------
                Oops! Something went wrong.
 
 We are so sorry, but something went terribly wrong when
 starting or running this add-on.
 
 Be sure to check the log above, line by line, for hints.
-----------------------------------------------------------
[cont-finish.d] 99-message.sh: exited 0.
[cont-finish.d] done.
[s6-finish] syncing disks.
[s6-finish] sending all processes the TERM signal.

@matthew73210

Did you restart hassio?

@frenck
Member

frenck commented Sep 25, 2018

@smbunn Thanks! That is my bad!
The good news: You are hitting the part I expected to be the issue.

Going to create a hotfix right now! 🚑

@smbunn

smbunn commented Sep 25, 2018

About to do that now

@frenck
Member

frenck commented Sep 25, 2018

🚑 Created patch, building an edge release right now. As soon as that one finishes, I'll tag a v1.1.1 release.

Update: Tagged release v1.1.1, awaiting release builds to finish...

@frenck
Member

frenck commented Sep 25, 2018

🎉 Released v1.1.1

Please give me some feedback on this 🙏

@smbunn

smbunn commented Sep 25, 2018

Installing now

@smbunn

smbunn commented Sep 25, 2018

It seems to be in some sort of loop.

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-banner.sh: executing... 
-----------------------------------------------------------
 Hass.io Add-on: InfluxDB v1.1.1
 Scalable datastore for metrics, events, and real-time analytics
 From: Community Hass.io Add-ons
 By: Franck Nijhof <frenck@addons.community>
-----------------------------------------------------------
 armhf / HassOS 1.11 / HA 0.78.3 / SU 131 / stable
-----------------------------------------------------------
[cont-init.d] 00-banner.sh: exited 0.
[cont-init.d] 01-log-level.sh: executing... 
Log level is set to INFO
[cont-init.d] 01-log-level.sh: exited 0.
[cont-init.d] 02-updates.sh: executing... 
INFO: You are running the latest version of this add-on
[cont-init.d] 02-updates.sh: exited 0.
[cont-init.d] 10-requirements.sh: executing... 
INFO: Password is NOT in the Have I Been Pwned database! Nice!
[cont-init.d] 10-requirements.sh: exited 0.
[cont-init.d] 11-nginx.sh: executing... 
Adding password for user admin
[cont-init.d] 11-nginx.sh: exited 0.
[cont-init.d] 20-system-users.sh: executing... 
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...

@smbunn

smbunn commented Sep 25, 2018

has moved on

[cont-init.d] 20-system-users.sh: exited 0.
[cont-init.d] 21-auth.sh: executing... 
[cont-init.d] 21-auth.sh: exited 0.
[cont-init.d] 30-kapacitor.sh: executing... 
[cont-init.d] 30-kapacitor.sh: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
INFO: Starting the InfluxDB
INFO: Starting Chronograf
INFO: Starting the Kapacitor
time="2018-09-26T09:23:59+12:00" level=info msg="Moving from version 1.6.1" 
time="2018-09-26T09:23:59+12:00" level=info msg="Moving to version 1.6.2" 
time="2018-09-26T09:23:59+12:00" level=info msg="Successfully created /data/backup/chronograf.db.1.6.1" 
time="2018-09-26T09:24:00+12:00" level=info msg="Serving chronograf at http://127.0.0.1:8889" component=server 
time="2018-09-26T09:24:00+12:00" level=info msg="Reporting usage stats" component=usage freq=24h reporting_addr="https://usage.influxdata.com" stats="os,arch,version,cluster_id,uptime" 
'##:::'##::::'###::::'########:::::'###:::::'######::'####:'########::'#######::'########::
 ##::'##::::'## ##::: ##.... ##:::'## ##:::'##... ##:. ##::... ##..::'##.... ##: ##.... ##:
 ##:'##::::'##:. ##:: ##:::: ##::'##:. ##:: ##:::..::: ##::::: ##:::: ##:::: ##: ##:::: ##:
 #####::::'##:::. ##: ########::'##:::. ##: ##:::::::: ##::::: ##:::: ##:::: ##: ########::
 ##. ##::: #########: ##.....::: #########: ##:::::::: ##::::: ##:::: ##:::: ##: ##.. ##:::
 ##:. ##:: ##.... ##: ##:::::::: ##.... ##: ##::: ##:: ##::::: ##:::: ##:::: ##: ##::. ##::
 ##::. ##: ##:::: ##: ##:::::::: ##:::: ##:. ######::'####:::: ##::::. #######:: ##:::. ##:
..::::..::..:::::..::..:::::::::..:::::..:::......:::....:::::..::::::.......:::..:::::..::
2018/09/26 09:24:01 Using configuration at: /etc/kapacitor/kapacitor.conf
ts=2018-09-26T09:24:01.869+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:02.577+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:03.564+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:04.936+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:06.215+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:09.909+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:12.830+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"
ts=2018-09-26T09:24:16.044+12:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get http://localhost:8086/ping: dial tcp 127.0.0.1:8086: connect: connection refused"

@frenck
Member

frenck commented Sep 25, 2018

@smbunn No, that is just reporting in... it will do that for a maximum of 60 seconds, after which it will stop.

What has changed is that it now waits for InfluxDB to be started before actually trying to fix the users. It checks every 2 seconds, which is why you see the message repeated so often.

It will pass now.
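
The waiting behaviour described here (poll until InfluxDB answers, give up after a fixed time) can be sketched roughly as follows. This is not the add-on's script, which is a shell script and is not shown in this thread. The function names `influx_ping` and `wait_until_ready` are hypothetical. The 60 second limit and 2 second interval come from this comment, the log strings are copied from the logs in this thread, and the `/ping` URL is the one Kapacitor reports trying.

```python
import time
import urllib.request


def influx_ping(url="http://localhost:8086/ping"):
    """Return True if the ping endpoint answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers connection refused, timeouts and HTTP errors
        return False


def wait_until_ready(probe, timeout=60.0, interval=2.0, sleep=time.sleep, log=print):
    """Poll probe() every `interval` seconds until it succeeds or `timeout` seconds were waited."""
    waited = 0.0
    while not probe():
        if waited >= timeout:
            log("FATAL: InfluxDB init process failed.")
            return False
        log("INFO: InfluxDB init process in progress...")
        sleep(interval)
        waited += interval
    return True


# Demo with a fake probe that becomes ready on the 4th check, without real sleeping.
answers = iter([False, False, False, True])
messages = []
ready = wait_until_ready(lambda: next(answers), sleep=lambda s: None, log=messages.append)
print(ready, len(messages))
```

Against a real instance the call would be `wait_until_ready(influx_ping)`. Passing the probe and the sleep function as parameters is a design choice for the sketch so the loop can be exercised without a running database; a slow device would simply need a larger `timeout`.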

@smbunn

smbunn commented Sep 25, 2018

It is running! My databases are back. My users are back!

Thanks Frenck, you are a star!

@frenck
Member

frenck commented Sep 25, 2018

Thank you so much for helping me out on this @smbunn! ❤️

I've created issue #7 to make some more improvements and tune down the warnings in the logs.

@frenck
Member

frenck commented Sep 25, 2018

I call this resolved for now. If someone still has issues with this, please re-open this issue or just add a comment to it. (This issue will remain unlocked for 30 days).

@frenck frenck closed this as completed Sep 25, 2018
@rpeders

rpeders commented Sep 26, 2018

INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
FATAL: InfluxDB init process failed.
[cont-init.d] 20-system-users.sh: exited 1.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] 99-message.sh: executing... 
-----------------------------------------------------------
                Oops! Something went wrong.
 
 We are so sorry, but something went terribly wrong when
 starting or running this add-on.
 
 Be sure to check the log above, line by line, for hints.
-----------------------------------------------------------
[cont-finish.d] 99-message.sh: exited 0.
[cont-finish.d] done.
[s6-finish] syncing disks.
[s6-finish] sending all processes the TERM signal.

I get this now and InfluxDB won't start... (I had it on auto-update and it must have updated to 1.1.1 during the night.)

@frenck
Member

frenck commented Sep 26, 2018

Hmmm... seems like 60 seconds is not enough wait time for your system @rpeders...
Reopening ticket to tackle this.

@Pteranodon

Pteranodon commented Sep 29, 2018

My influxdb also won't start but I'm always only getting INFO: InfluxDB init process in progress... 5 times before [cont-init.d] 20-system-users.sh: exited 1.

I tried changing to debug and trace levels and this added a line above [cont-init.d] 20-system-users.sh: exited 1.:
/var/run/s6/etc/cont-init.d/20-system-users.sh: line 9: 979 Killed influxd
s6-nuke: fatal: unable to kill: No such process

Full log:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-banner.sh: executing... 
-----------------------------------------------------------
 Hass.io Add-on: InfluxDB v1.1.1
 Scalable datastore for metrics, events, and real-time analytics
 From: Community Hass.io Add-ons
 By: Franck Nijhof <frenck@addons.community>
-----------------------------------------------------------
 armhf / HassOS 1.11 / HA 0.79.0 / SU 131 / stable
-----------------------------------------------------------
[cont-init.d] 00-banner.sh: exited 0.
[cont-init.d] 01-log-level.sh: executing... 
Log level is set to INFO
[cont-init.d] 01-log-level.sh: exited 0.
[cont-init.d] 02-updates.sh: executing... 
INFO: You are running the latest version of this add-on
[cont-init.d] 02-updates.sh: exited 0.
[cont-init.d] 10-requirements.sh: executing... 
INFO: Password is NOT in the Have I Been Pwned database! Nice!
[cont-init.d] 10-requirements.sh: exited 0.
[cont-init.d] 11-nginx.sh: executing... 
Adding password for user ha_admin
[cont-init.d] 11-nginx.sh: exited 0.
[cont-init.d] 20-system-users.sh: executing... 
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
INFO: InfluxDB init process in progress...
[cont-init.d] 20-system-users.sh: exited 1.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] 99-message.sh: executing... 
-----------------------------------------------------------
                Oops! Something went wrong.
 
 We are so sorry, but something went terribly wrong when
 starting or running this add-on.
 
 Be sure to check the log above, line by line, for hints.
-----------------------------------------------------------
[cont-finish.d] 99-message.sh: exited 0.
[cont-finish.d] done.
[s6-finish] syncing disks.
[s6-finish] sending all processes the TERM signal.
/var/run/s6/etc/cont-init.d/20-system-users.sh: line 9:   625 Killed                  influxd

@frenck
Member

frenck commented Sep 29, 2018

@Pteranodon That is not related. This issue is about losing users & databases; you are now reporting a fatal error of the add-on that causes it not to start.

Don't go off-topic on GitHub issues, that is really not appreciated. I've created issue #9 for you.

@frenck
Member

frenck commented Nov 8, 2018

For the upcoming release, I've improved the waiting-for-InfluxDB logic overall, which takes care of the remaining issues listed here.

Closing this issue.

@frenck frenck closed this as completed Nov 8, 2018
@ghost ghost removed the Status: In progress label Nov 8, 2018
@FinMati

FinMati commented Nov 8, 2018

I also get "INFO: InfluxDB init process in progress..." until failure. When should the new release be out?

@frenck
Member

frenck commented Nov 8, 2018

I've just finished the cross-platform tests, so I'm currently writing the release notes.

@addons-assistant

This thread has been automatically locked because it has not had recent activity. Please open a new issue for related bugs and link to relevant comments in this thread.

@addons-assistant addons-assistant bot locked as resolved and limited conversation to collaborators Dec 8, 2018

10 participants