Docker Desktop Upgrade issue with Telegraf #368

Open
jaffro2k opened this issue Oct 11, 2023 · 15 comments

Comments

@jaffro2k

Hi folks - I've had an issue where Docker Desktop upgraded to v4.24.1 (Windows 11) and seems to have broken Telegraf.

The container is now stuck in a restart loop with the log showing this error:

Loading config file: /etc/telegraf/telegraf.conf
error loading config file /etc/telegraf/telegraf.conf: open /etc/telegraf/telegraf.conf: operation not permitted

I subsequently ran upgrade.sh to see if there was an update to Telegraf, but the issue remained.

Any ideas? Would like to fix rather than re-building again from scratch (and ideally keep the history in the DB).

Cheers,

Jeff

@jasonacox
Owner

Hi @jaffro2k !

First, before you do anything else, back up your data:

# stop the containers
cd Powerwall-Dashboard # in case you aren't there already
./compose-dash.sh stop

# create a backup
tar -zcvf influxdb-snapshot.tgz influxdb 

I recommend you copy that influxdb-snapshot.tgz file to a safe place.
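Before relying on the archive, it may be worth confirming that it is readable. The following is a minimal sketch, assuming the archive was created with the tar command above; `verify_snapshot` is a hypothetical helper name, not part of Powerwall-Dashboard:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the archive exists, can be listed as a
# gzip-compressed tar, and contains at least one entry under influxdb/.
verify_snapshot() {
    archive="$1"
    [ -f "$archive" ] || { echo "missing: $archive" >&2; return 1; }
    # tar -tzf lists the contents without extracting anything
    tar -tzf "$archive" | grep -q '^influxdb/' || {
        echo "no influxdb/ entries in $archive" >&2
        return 1
    }
    echo "ok: $archive"
}
```

Usage would be `verify_snapshot influxdb-snapshot.tgz`, run from the Powerwall-Dashboard directory.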

Second, I'm not a Windows 11 user so I would welcome any help on the "cause". Having said that, if this is a known issue, telegraf may have fixed it in an update. The upgrade.sh won't fix it because we pin to a version. Here is a way to test:

# first back up the config
cp powerwall.yml powerwall.yml.bak

Edit the powerwall.yml file and update the version next to telegraf from 1.26.1 to 1.28.2:

    telegraf:
        image: telegraf:1.26.1
        container_name: telegraf

to

    telegraf:
        image: telegraf:1.28.2
        container_name: telegraf

Now try to start the stack:

./compose-dash.sh up -d

Check telegraf to see what happens: docker logs telegraf
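The manual edit above could also be scripted. This is only a sketch: `set_telegraf_tag` is a hypothetical helper, and it assumes the image line looks exactly like the one shown in powerwall.yml and that the backup copy has already been made.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the pinned telegraf image tag in a compose file.
# Usage: set_telegraf_tag FILE OLD_TAG NEW_TAG
set_telegraf_tag() {
    file="$1"; old="$2"; new="$3"
    # Write to a temporary file first, then replace the original on success
    sed "s|image: telegraf:${old}|image: telegraf:${new}|" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
    # Print the resulting image line (with its line number) for a quick check
    grep -n 'image: telegraf:' "$file"
}
```

For the change described above, the call would be `set_telegraf_tag powerwall.yml 1.26.1 1.28.2`. Restoring powerwall.yml.bak reverts it.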

@jaffro2k
Author

jaffro2k commented Oct 12, 2023 via email

@jasonacox
Owner

The error indicates that telegraf is unable to load the file. When docker compose runs the powerwall.yml file, it should have mapped the local file system into the container:

        volumes:
            - type: bind
              source: ./telegraf.conf
              target: /etc/telegraf/telegraf.conf
              read_only: true
            - type: bind
              source: ./telegraf.local
              target: /etc/telegraf/telegraf.d/local.conf
              read_only: true

I suspect that something happened to the permissions on that local file ./telegraf.conf

Can you run ls -l ./telegraf.conf to see if the permissions still allow it to be read by anyone?

@jaffro2k
Author

Permissions are as follows:

$ ls -l ./telegraf.conf
-rw-r--r-- 1 MINIBEAST+plex 197121 218620 Oct 11 20:23 ./telegraf.conf

Also, when I check the file in the container via Docker Desktop, it has the same permissions and has a Mount icon indicating a bind mount to that file.

@jasonacox
Owner

jasonacox commented Oct 13, 2023

Try to log in to the container to see what happens:

# log in
docker exec -it telegraf sh

# check the file
ls -l /etc/telegraf/telegraf.conf
cat /etc/telegraf/telegraf.conf

@jaffro2k
Author

Thanks @jasonacox - I'm unable to log into the container as it is stuck in restart - fails to start then tries again every few seconds.
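When docker exec is not possible because the container keeps restarting, one option is to inspect its state and then probe the same bind mount from a throwaway container that runs sh instead of telegraf. The following is a sketch under assumptions: `restart_state` and `probe_conf` are hypothetical helper names, and the image is assumed to ship sh (as the docker exec suggestion above implies).

```shell
#!/bin/sh
# DOCKER can be overridden (for example DOCKER=echo for a dry run).
DOCKER="${DOCKER:-docker}"

# Print the container status and how many times it has been restarted.
restart_state() {
    "$DOCKER" inspect -f '{{.State.Status}} restarts={{.RestartCount}}' "$1"
}

# Start a one-off container from the given image with the same read-only bind
# mount, but run sh instead of telegraf so it cannot crash-loop.
# Usage: probe_conf IMAGE ABSOLUTE_HOST_PATH_TO_CONF
probe_conf() {
    "$DOCKER" run --rm \
        -v "$2:/etc/telegraf/telegraf.conf:ro" \
        --entrypoint sh "$1" \
        -c 'ls -l /etc/telegraf/telegraf.conf && head -n 5 /etc/telegraf/telegraf.conf'
}
```

Usage might look like `restart_state telegraf` followed by `probe_conf telegraf:1.26.1 "$(pwd)/telegraf.conf"`. If the throwaway container hits the same "operation not permitted" error, the problem is likely in the bind mount rather than in telegraf itself. When run from Git Bash, path conversion may rewrite the container-side path; Git for Windows honours MSYS_NO_PATHCONV=1 to turn that off.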

@jasonacox
Owner

Ok, thanks for trying @jaffro2k . I spent some time on a Windows 11 system to try to replicate your case. I couldn't replicate it exactly, but I confess, it was a painful experience. It seems that WSL is basically an "emulation" so there are some sharp edges (probably because I don't know Windows). But I did spot a possible issue that may be related to your case.

If you used a path like /mnt/c/... to install Powerwall-Dashboard you will have a terrible experience. Since Linux (WSL) is a bolt-on rather than native, the /mnt/c file system uses the permissions and user ownership of NTFS (or whatever other file system Windows uses), which are not compatible with Linux. As a result, I noticed that chmod and chown commands would not work or only partially worked. When I upgraded Docker Desktop and the docker containers tried to restart, InfluxDB and Grafana could no longer write to their directories. I can only speculate that you had a similar case where the file for telegraf (telegraf.conf) no longer matches the NTFS-to-Linux translation that WSL is doing.

I could not figure out how to change the file ownership or permission using WSL to fix my use case. Perhaps a Windows guru here can help us figure out how to do that. For me, the only fix was to fully reinstall the stack. I created a new installation under /home/jason.

It seems you can copy the influxdb backup over. Under /home/jason, I was able to change the ownership and permissions so Grafana and InfluxDB could start.

There has to be a better way for Windows systems. I'm just not sure how to get there.
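A quick way to tell whether an install directory sits on a Windows drive mount, as in the /mnt/c case described above, is to look at its resolved path. This is a minimal sketch that assumes WSL's default automount root of /mnt; `path_kind` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: classify a path as a Windows drive mount under WSL
# (/mnt/<drive letter>/...) or as part of the Linux file system.
path_kind() {
    case "$1" in
        /mnt/[a-zA-Z]|/mnt/[a-zA-Z]/*) echo "windows-drive" ;;
        *)                             echo "linux" ;;
    esac
}
```

Running `path_kind "$(pwd -P)"` from the Powerwall-Dashboard directory would print windows-drive for an install under /mnt/c and linux for one under /home.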

@jaffro2k
Author

Thanks @jasonacox - very much appreciate you trying! I agree - something strange happened on upgrade where the linkage to the file has broken - I am unsure as to why this occurs (or in fact how Docker Desktop for Windows performs this).

I'm actually using Hyper-V virtualisation instead of WSL as I'm running other VMs - so potentially could just spin up a Linux VM to run docker as an alternative, which may be a better approach. Do you have any preferred/suggested platform for running these containers?

I had also considered looking at cloud hosting (Azure or AWS) K8S - has anyone had a crack at that?

In the meantime I'm going to start from scratch on a rebuild - I have the TGZ file from the backup I first made and the original folder structure too. Is the process just to stop the containers and copy over the files?

Will let you know how I go.
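For reference, restoring from the TGZ backup into a fresh install could look roughly like the sketch below. It assumes the archive was made with `tar -zcvf influxdb-snapshot.tgz influxdb` as shown earlier and that the containers are stopped first; `restore_snapshot` is a hypothetical helper, not a project script.

```shell
#!/bin/sh
# Hypothetical helper: unpack a snapshot into an install directory.
# Refuses to run if DEST/influxdb already holds files, to avoid mixing data.
# Usage: restore_snapshot ARCHIVE DEST
restore_snapshot() {
    archive="$1"; dest="$2"
    if [ -d "$dest/influxdb" ] && [ -n "$(ls -A "$dest/influxdb")" ]; then
        echo "refusing: $dest/influxdb is not empty" >&2
        return 1
    fi
    # -C extracts relative to the install directory
    tar -zxf "$archive" -C "$dest" && echo "restored into $dest/influxdb"
}
```

A typical sequence would be `./compose-dash.sh stop`, then `restore_snapshot influxdb-snapshot.tgz .`, then `./compose-dash.sh up -d`. Ownership and permissions may still need fixing afterwards, as noted earlier in the thread.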

@jaffro2k
Author

@jasonacox - a quick update (and fix) in case anything like this happens again (let's call it the sledgehammer approach :-)

  1. Stop all containers in Docker Desktop
  2. Copy entire Powerwall-Dashboard folder to a backup location (just in case)
  3. Delete containers in Docker Desktop
  4. Uninstall Docker Desktop
  5. Download and re-install fresh Docker Desktop
  6. Run .\setup.sh to re-create containers

Installation came back with all previous data intact!

@BJReplay
Contributor

BJReplay commented Oct 14, 2023

> I'm actually using Hyper-V virtualisation instead of WSL

WSL-2 (which you should be using - there is no reason to remain on WSL-1 if you are not running a legacy workflow that is broken on WSL-2) runs on Hyper-V

So you can have your cake and eat it.

You should be able to install WSL from an elevated command prompt:

wsl --install -d Ubuntu should install Ubuntu under WSL-2 under Hyper-V.

A subsequent installation of Docker Desktop will detect your existing WSL install, offer to use it, and you're set.

@jasonacox
Owner

@jaffro2k That's great news! Congratulations! I'm glad the setup.sh got you running with your data still intact. You may want to explore using the Tesla History Import Tool to fill in any gaps when the system was down. It pulls the data from Tesla, which isn't as high-resolution as what we get from telegraf but at least it will fill in your data.

@BJReplay Thanks for the insights there. I have no idea how to tell if I'm on WSL-1 or WSL-2. I should figure it out. I generally run all new versions of the dashboard through a native Ubuntu Linux box and a Raspberry Pi. I would like to be able to do the same on Windows. The steps @jaffro2k outlines above seem close to what I was thinking, but with a complete nuke/pave (reinstall from git). Any suggestions?

Also, I have to share something that you Windows gurus may find humorous. I spent hours trying to figure out how to get to a bash shell. I hunted for the "icon" for ages. I installed some Cygwin, git-bash thing or another and it wasn't doing anything right. I finally found a video of someone who launched a thing called "Command Prompt" which looked like DOS, but then they typed "WSL"... and Voilà... I'm bourne again! 🤦 😂

@BJReplay
Contributor

> I have no idea how to tell if I'm on WSL-1 or WSL-2

wsl --status at the command prompt

Default Distribution: Ubuntu
Default Version: 2

[image]
If you install the terminal app from the windows store you can get this sort of right-click option from a pinned app to open any number of command environments
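As a complement to wsl --status, the WSL generation can often be guessed from inside the Linux shell by looking at the kernel release string. This is a heuristic sketch, not an official API: it assumes the stock kernel naming (a "microsoft-standard" release for WSL-2, a release ending in "Microsoft" for WSL-1), and custom kernels may not follow it. `wsl_generation` is a hypothetical helper name.

```shell
#!/bin/sh
# Heuristic (assumption): guess the WSL generation from a kernel release
# string such as the output of `uname -r`.
wsl_generation() {
    case "$1" in
        *microsoft-standard*|*WSL2*) echo "WSL2" ;;
        *Microsoft*)                 echo "WSL1" ;;
        *)                           echo "not WSL" ;;
    esac
}
```

Usage would be `wsl_generation "$(uname -r)"`.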

@jasonacox
Owner

> terminal app from the windows store

❤️ !!!

Should we add something about WSL-2 or otherwise update the instructions we provide for installing the Dashboard on Win11?

https://github.com/jasonacox/Powerwall-Dashboard#windows-11-instructions

@BJReplay
Contributor

> Should we add something about WSL-2 or otherwise update the instructions we provide for installing the Dashboard on Win11?

Good question. WSL-2 might be the default on Windows 11, but I'd have to check.

I'm post shoulder-surgery, so typing wrong-handed, and no mouse.

The other thing I'm looking into is systemd support, which is now available on WSL-2 on Windows 11. That should allow for a better automatic startup on Windows 11 (and possibly 10) and installing without Docker Desktop (i.e. just docker on WSL).

I've been doing some left-handed reading on my phone, but that's not going to answer the question.

When my arm is out of a sling, I'm thinking about seeing if I can get a Docker Desktop-less install going with systemd running, so that when Windows boots, WSL starts, docker starts, and the dashboard starts. At the moment, it requires a user to log on.
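For context, systemd under WSL-2 is switched on through the [boot] section of /etc/wsl.conf inside the distribution. The sketch below only writes that setting; `write_wsl_conf` is a hypothetical helper, it overwrites the target file, and it does not by itself address starting WSL before a user logs on.

```shell
#!/bin/sh
# Hypothetical helper: write a wsl.conf that enables systemd.
# Usage: write_wsl_conf PATH
# The real file is /etc/wsl.conf and needs root. Any existing content at PATH
# is replaced, so merge by hand if the file already has other settings.
write_wsl_conf() {
    cat > "$1" <<'EOF'
[boot]
systemd=true
EOF
}
```

After writing the file, the distribution has to be restarted (for example with `wsl --shutdown` from Windows) before the setting takes effect.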

@jasonacox
Owner

> I'm post shoulder-surgery

Yikes! Take care of yourself @BJReplay ! Get rest and get well. We can wait. 😉

Love the ideas... sounds promising.
