[fix] Fixed issues in reconnecting to device after firmware upgrade #235 #236

Merged (5 commits) on Jun 9, 2023

nemesifier (Member) commented on May 30, 2023

The test is missing.

Fixes #235

This may also fix #233 (needs manual testing to confirm).

nemesifier added the bug (Something isn't working) label on May 30, 2023
nemesifier added this to In progress in OpenWISP Priorities for next releases via automation on May 30, 2023
nemesifier force-pushed the issues/235-uncaugh-value-error-stops-retries branch 3 times, most recently from 705895d to 0d1b05b, on June 1, 2023 22:43
nemesifier changed the title from "[fix] Temptative fix of #235" to "[fix] Firmware upgrade fix for issues in reconnecting to device after upgrade #235" on Jun 3, 2023
nemesifier changed the title from "[fix] Firmware upgrade fix for issues in reconnecting to device after upgrade #235" to "[fix] Fixed issues in reconnecting to device after firmware upgrade #235" on Jun 3, 2023
pandafy force-pushed the issues/235-uncaugh-value-error-stops-retries branch 4 times, most recently from 71458ed to 5844f3e, on June 7, 2023 19:52
pandafy force-pushed the issues/235-uncaugh-value-error-stops-retries branch from 5844f3e to 82ade68 on June 7, 2023 19:56
Comment on lines 426 to 436

    self._refresh_addresses()
    addresses = ', '.join(self.addresses)
    self.log(
        _(
            'Trying to reconnect to device at {addresses} (attempt n.{attempt})...'.format(
                addresses=addresses, attempt=attempt
            )
        ),
        save=False,
    )
    self.connect()

Member:

While performing manual testing locally, I got this log for the device:

Connection successful, starting upgrade...
Image checksum file not found, proceeding with the upload of the new image...
Sysupgrade test passed successfully, proceeding with the upgrade operation...
Upgrade operation in progress...
Commencing upgrade. Closing all shell sessions.
Image metadata not found
Reading partition table from bootdisk...
Reading partition table from image...
Partition layout has changed. Full image will be written.

SSH connection closed, will wait 150 seconds before attempting to reconnect...
Device not reachable yet, (connection failed).
retrying in 30 seconds...
Device not reachable yet, (connection failed).
retrying in 30 seconds...
Trying to reconnect to device at 192.168.56.2 (attempt n.3)...
Connected! Writing checksum file to /etc/openwisp/firmware_checksum
Upgrade completed successfully.

Note that `Trying to reconnect to device at 192.168.56.2 (attempt n.3)...` appears only for the third attempt. In the previous two attempts the device was unreachable, so `self._refresh_addresses()` raised `NoWorkingDeviceConnectionError` before the log call was reached, and the message was skipped.

How can we get updated addresses without initiating a connection?
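
The skipped log lines can be reproduced in isolation. The following is only a sketch under stated assumptions: `reconnect`, `flaky_refresh` and the local `NoWorkingDeviceConnectionError` class are hypothetical stand-ins, not the upgrader's actual code. It shows how an exception raised by the address refresh prevents the log call placed right after it from running.

```python
class NoWorkingDeviceConnectionError(Exception):
    """Local stand-in for the exception named in the comment above."""


def reconnect(refresh_addresses, connect, max_attempts=3):
    """Hypothetical retry loop in which the refresh runs before the log call."""
    logs = []
    for attempt in range(1, max_attempts + 1):
        try:
            # If this raises, the "Trying to reconnect" line below never runs.
            addresses = refresh_addresses(attempt)
            logs.append(
                f'Trying to reconnect to device at {addresses} '
                f'(attempt n.{attempt})...'
            )
            connect()
        except NoWorkingDeviceConnectionError:
            logs.append('Device not reachable yet, (connection failed).')
            continue
        logs.append('Connected!')
        break
    return logs


def flaky_refresh(attempt):
    # Simulate a device that is unreachable during the first two attempts.
    if attempt < 3:
        raise NoWorkingDeviceConnectionError()
    return '192.168.56.2'


logs = reconnect(flaky_refresh, connect=lambda: None)
for line in logs:
    print(line)
```

In this sketch the "Trying to reconnect" line is emitted only once, for attempt 3, which matches the pattern seen in the device log.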

Member Author:

what about:

```python
def _log(addresses, attempt):
    # Translate the constant template first, then interpolate,
    # so that gettext always sees the same msgid.
    self.log(
        _(
            'Trying to reconnect to device at {addresses} '
            '(attempt n.{attempt})...'
        ).format(addresses=addresses, attempt=attempt),
        save=False,
    )

try:
    self._refresh_addresses()
except NoWorkingDeviceConnectionError as error:
    # Device unreachable: log the addresses known to the connection.
    _log(error.connection.addresses, attempt)
try:
    _log(', '.join(self.addresses), attempt)
    self.connect()
except (NoValidConnectionsError, socket.timeout, SSHException) as error:
    # etc...
```

pandafy (Member) commented on Jun 7, 2023

I also tested #233 locally and this code mitigates that problem.

            )
        ),
        save=False,
    )
    self.connect()

Member Author:

shouldn't we update this too?

pandafy (Member) commented on Jun 8, 2023

I added the following wrapper methods so we don't need to reference the connection object again and again:

    def connect(self):
        return self.connection.connect()

    def disconnect(self):
        return self.connection.disconnect()

    def exec_command(self, *args, **kwargs):
        return self.connection.connector_instance.exec_command(*args, **kwargs)
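
For readers unfamiliar with the pattern, the wrappers above are plain delegation to the connection object. Here is a minimal, self-contained sketch: `FakeConnector`, `FakeDeviceConnection` and `Upgrader` are hypothetical stand-ins invented for illustration, and the `(output, exit_code)` return value is an assumption, not the real connector's documented behavior.

```python
class FakeConnector:
    """Hypothetical stand-in for the SSH connector instance."""

    def exec_command(self, command, **kwargs):
        # Assumed return shape for this sketch only: (output, exit_code).
        return f'ran: {command}', 0


class FakeDeviceConnection:
    """Hypothetical stand-in for the device connection object."""

    def __init__(self):
        self.connector_instance = FakeConnector()
        self.is_connected = False

    def connect(self):
        self.is_connected = True
        return True

    def disconnect(self):
        self.is_connected = False


class Upgrader:
    """Holds a connection and delegates to it instead of inheriting from a connector."""

    def __init__(self, connection):
        self.connection = connection

    def connect(self):
        return self.connection.connect()

    def disconnect(self):
        return self.connection.disconnect()

    def exec_command(self, *args, **kwargs):
        return self.connection.connector_instance.exec_command(*args, **kwargs)


upgrader = Upgrader(FakeDeviceConnection())
connected = upgrader.connect()
output, exit_code = upgrader.exec_command('uname -a')
upgrader.disconnect()
print(connected, output, exit_code)
```

With this shape, callers only touch `upgrader.connect()`, `upgrader.disconnect()` and `upgrader.exec_command()`, so the way the connection is reached can change in one place.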


    try:
        self.connect()

Member:

We do not need to call `connect` here: `self.connection.get_working_connection` will open the SSH connection.

Comment on lines -23 to +21

    -class OpenWrt(BaseOpenWrt):
    +class OpenWrt(object):

Member:

@nemesisdesign since we are not directly backporting these changes to the 1.0 branch, we can go ahead and stop making upgraders inherit from SSH connectors.

nemesifier merged commit 75339ca into master on Jun 9, 2023
8 checks passed
OpenWISP Priorities for next releases automation moved this from In progress to Done on Jun 9, 2023
nemesifier deleted the issues/235-uncaugh-value-error-stops-retries branch on June 9, 2023 23:04
Successfully merging this pull request may close these issues.

[bug] No valid IP addresses to initiate connections found
2 participants