Upgrade DCOS

Example: upgrading from v1.10.7 to v1.11.1

The upgrade steps are:

  1. SSH to the bootstrap node and, if needed, free up space by removing the older installer and tar files:

cd ~ ; rm -rf *.tar *.sh ~/genconf/serve/ ~/genconf/cluster* ~/genconf/state/*

  2. Optionally, clean up Docker data to free more space. Note that this deletes all Docker images, containers, and volumes on the node:

systemctl stop docker
rm -rf /var/lib/docker/*
systemctl start docker

  3. Run an Nginx container to serve the generated files:

sudo docker run -d -p <your-port>:80 -v $PWD/genconf/serve:/usr/share/nginx/html:ro nginx

  4. Make sure config.yaml matches the current version (a sample is attached in the repo).

  5. Download the current installer, dcos_generate_config.ee.sh, and generate the node upgrade script with the command below. The version argument is the version installed now (the older one), not the version you are upgrading to:

dcos_generate_config.ee.sh --generate-node-upgrade-script 1.10.7

  6. Use the URL from the previous step's output to download the upgrade script. It will look something like this:

curl -O http://<bootstrap-ip>/upgrade/b1989797fc91461ab4c9f4ffd64aa4bc/dcos_node_upgrade.sh
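If the installer output is long, a small filter can pick the script URL out of it. This is only a sketch: `extract_upgrade_url` is a hypothetical helper, and it assumes the URL appears somewhere in the installer's output and ends in `dcos_node_upgrade.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads installer output on stdin and prints the last
# URL that ends in dcos_node_upgrade.sh (prints nothing if none is found).
extract_upgrade_url() {
  grep -Eo 'https?://[^[:space:]]+/dcos_node_upgrade\.sh' | tail -n 1
}

# Possible usage (assumes the URL is written to stdout or stderr):
#   dcos_generate_config.ee.sh --generate-node-upgrade-script 1.10.7 2>&1 \
#     | extract_upgrade_url
```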

  7. Download the script on every node and run the upgrade with the command below. Upgrade the master nodes first; once the masters are back up, upgrade the agent nodes:

sudo bash ./dcos_node_upgrade.sh
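The masters-first ordering can be scripted roughly as follows. This is a sketch under assumptions, not a tested rollout tool: the host lists, the `UPGRADE_URL` placeholder, and the function names are all hypothetical, and it assumes SSH access to every node. By default it only prints the commands it would run (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Hypothetical inventory; replace with your own node addresses.
MASTERS=${MASTERS:-"10.0.0.11 10.0.0.12 10.0.0.13"}
AGENTS=${AGENTS:-"10.0.1.21 10.0.1.22"}
# Placeholder; paste the URL obtained in the previous step.
UPGRADE_URL=${UPGRADE_URL:-"http://<bootstrap-ip>/upgrade/<id>/dcos_node_upgrade.sh"}

# Download and run the upgrade script on one node.
# With DRY_RUN=1 (the default) the ssh command is printed, not executed.
upgrade_node() {
  local host=$1
  local remote="curl -O '${UPGRADE_URL}' && sudo bash ./dcos_node_upgrade.sh"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ssh ${host} \"${remote}\""
  else
    ssh "$host" "$remote"
  fi
}

# Upgrade all masters first, then all agents; stop at the first failure.
upgrade_cluster() {
  local host
  for host in $MASTERS; do
    upgrade_node "$host" || return 1
    # In a real run, pause here until this master is back up.
  done
  for host in $AGENTS; do
    upgrade_node "$host" || return 1
  done
}

# Possible usage:
#   upgrade_cluster              # print the plan
#   DRY_RUN=0 upgrade_cluster    # actually run it
```

Running one node at a time and stopping on the first failure keeps a bad upgrade from spreading across the cluster.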


  8. Validate the upgrade:

    • Verify that curl http://<dcos_agent_private_ip>:5051/metrics/snapshot has the metric slave/registered with a value of 1.
    • Monitor the Mesos UI to verify that the upgraded node rejoins the DC/OS cluster and that tasks are reconciled (http://<master-ip>/mesos). If you are upgrading from permissive to strict mode, this URL will be https://<master-ip>/mesos.
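The metric check in the first bullet can be automated with a small filter. This is a minimal sketch: `agent_registered` is a hypothetical helper, and it assumes the snapshot is a flat JSON object in which the key appears as `slave/registered` or, with an escaped slash, as `slave\/registered`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a /metrics/snapshot JSON document on stdin and
# succeeds only if the slave/registered metric is 1.
agent_registered() {
  local value
  # Pull out the key/value pair, then strip everything up to the colon.
  value=$(grep -Eo '"slave\\?/registered"[[:space:]]*:[[:space:]]*[0-9.]+' \
            | sed -E 's/.*:[[:space:]]*//') || value=""
  case "$value" in
    1|1.0) return 0 ;;
    *)     return 1 ;;
  esac
}

# Possible usage (AGENT_IP stands in for <dcos_agent_private_ip>):
#   curl -fsS "http://${AGENT_IP}:5051/metrics/snapshot" | agent_registered \
#     && echo "agent registered" || echo "agent NOT registered"
```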

Troubleshooting Recommendations

The following commands should provide insight into upgrade issues:

On All Cluster Nodes

sudo journalctl -u dcos-download
sudo journalctl -u dcos-spartan
sudo systemctl | grep dcos
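To narrow the `systemctl` listing down to the units that need attention, the output can be filtered by state. The sketch below is an assumption-laden example: `failed_dcos_units` is a hypothetical helper, and it expects the plain column layout (unit, load, active, sub, description) that `systemctl list-units --plain --no-legend` produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `systemctl list-units --plain --no-legend`
# output on stdin and prints the names of dcos-* units whose ACTIVE column
# (third field) is "failed".
failed_dcos_units() {
  awk '$1 ~ /^dcos-/ && $3 == "failed" { print $1 }'
}

# Possible usage:
#   sudo systemctl list-units --all --plain --no-legend 'dcos-*' | failed_dcos_units
# Each unit printed can then be inspected with: sudo journalctl -u <unit>
```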

If your upgrade fails because of a custom node or cluster check, run these commands for more details:

dcos-diagnostics check node-poststart
dcos-diagnostics check cluster

On DC/OS Masters

sudo journalctl -u dcos-exhibitor
less /opt/mesosphere/active/exhibitor/usr/zookeeper/zookeeper.out
sudo journalctl -u dcos-mesos-dns
sudo journalctl -u dcos-mesos-master

On DC/OS Agents

sudo journalctl -u dcos-mesos-slave