ocf:mssql:fci doesn't restart after change of server name #6

Open
simonf-dev opened this issue Jan 4, 2022 · 2 comments

@simonf-dev

Hello, I tried to test the ocf:mssql:fci agent on our Pacemaker cluster, following the official tutorial. Creating an ocf:mssql:fci resource failed, and when I started it with the debug-start command, it threw this error:

[root@virt-537 ~]# pcs resource debug-start mssql-server
crm_resource: Error performing operation: OK
Operation start for mssql-server (ocf:mssql:fci) returned: 'invalid parameter' (2)
58932 58924
 
Jan 04 11:17:32 INFO: mssql_validate
Jan 04 11:17:32 INFO: Resource agent invoked with: start
Jan 04 11:17:32 INFO: mssql_start
Jan 04 11:17:32 INFO: SQL Server started. PID: 58924; user: mssql; command: /opt/mssql/bin/sqlservr
Jan 04 11:17:33 INFO: start: 2022/01/04 11:17:33 fci-helper invoked with hostname [localhost]; port [1433]; credentials-file [/var/opt/mssql/secrets/passwd]; application-name [monitor-mssql-server-start]; connection-timeout [20]; health-threshold [3]; action [start]
Jan 04 11:17:33 INFO: start: 2022/01/04 11:17:33 fci-helper invoked with virtual-server-name [mssql-server]
Jan 04 11:17:33 INFO: start: 2022/01/04 11:17:33 From RetryExecute - Attempt 1 to connect to the instance at localhost:1433
Jan 04 11:17:33 INFO: start: 2022/01/04 11:17:33 Attempt 1 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
Jan 04 11:17:34 INFO: start: 2022/01/04 11:17:34 From RetryExecute - Attempt 2 to connect to the instance at localhost:1433
Jan 04 11:17:34 INFO: start: 2022/01/04 11:17:34 Attempt 2 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
Jan 04 11:17:35 INFO: start: 2022/01/04 11:17:35 From RetryExecute - Attempt 3 to connect to the instance at localhost:1433
Jan 04 11:17:35 INFO: start: 2022/01/04 11:17:35 Attempt 3 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
Jan 04 11:17:36 INFO: start: 2022/01/04 11:17:36 From RetryExecute - Attempt 4 to connect to the instance at localhost:1433
Jan 04 11:17:36 INFO: start: 2022/01/04 11:17:36 Connected to the instance at localhost:1433
Jan 04 11:17:41 INFO: start: 2022/01/04 11:17:41 Setting local server name to mssql-server...
Jan 04 11:17:41 INFO: start: 2022/01/04 11:17:41 Querying local server name...
Jan 04 11:17:41 INFO: start: 2022/01/04 11:17:41 Local server name is virt-537
Jan 04 11:17:41 INFO: start: ERROR: 2022/01/04 11:17:41 Expected local server name to be mssql-server but it was virt-537
ocf-exit-reason:2022/01/04 11:17:41 Expected local server name to be mssql-server but it was virt-537
Jan 04 11:17:41 INFO: mssql-server start : 2

Output of the 'pcs cluster status' command:

Full List of Resources:
  * fence-virt-535      (stonith:fence_xvm):     Started virt-535
  * fence-virt-537      (stonith:fence_xvm):     Started virt-537
  * mssql-server        (ocf::mssql:fci):        Stopped
 
Failed Resource Actions:
  * mssql-server_monitor_0 on virt-535 'invalid parameter' (2): call=17, status='complete', exitreason='2022/01/04 11:15:39 Expected local server name to be mssql-server but it was virt-535', last-rc-change='2022-01-04 11:15:33 +01:00', queued=0ms, exec=5169ms
  * mssql-server_monitor_0 on virt-537 'invalid parameter' (2): call=17, status='complete', exitreason='2022/01/04 11:15:39 Expected local server name to be mssql-server but it was virt-537', last-rc-change='2022-01-04 11:15:33 +01:00', queued=0ms, exec=5175ms

I think the problem is that SQL Server is not restarted after the local server name is set. It works when I remove the resource and create a new one with the same name. It also works if I restart SQL Server manually.
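One way to check this hypothesis is to compare the name the running instance reports (`@@SERVERNAME`) with the local name currently registered in `sys.servers`. `@@SERVERNAME` only picks up a changed local name after the instance restarts, so a mismatch would point to a pending restart. The following is a minimal sketch and not part of the agent: the `sqlcmd` path, the `sa` login, the `SA_PASSWORD` variable, and all function names are assumptions made for illustration.

```shell
#!/bin/sh
# Diagnostic sketch: detect a server-name change that is waiting for a restart.
# Assumptions: sqlcmd is at /opt/mssql-tools/bin/sqlcmd and SA_PASSWORD is set.

SQLCMD="${SQLCMD:-/opt/mssql-tools/bin/sqlcmd}"

# Run one query and print the bare result (no header, no trailing spaces).
run_query() {
    "$SQLCMD" -S localhost -U sa -P "$SA_PASSWORD" -h -1 -W \
        -Q "SET NOCOUNT ON; $1"
}

# Succeeds (exit 0) when the running name differs from the registered name.
restart_pending() {
    [ "$1" != "$2" ]
}

# Compare both names and report; run this on the node hosting SQL Server.
check_server_name() {
    running=$(run_query "SELECT @@SERVERNAME")
    registered=$(run_query "SELECT name FROM sys.servers WHERE server_id = 0")
    if restart_pending "$running" "$registered"; then
        echo "restart pending: running as '$running', registered as '$registered'"
    else
        echo "names match: '$running'"
    fi
}
```

Calling `check_server_name` right after a failed start should report a pending restart if the hypothesis holds, for example running as `virt-537` while registered as `mssql-server`.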

@simonf-dev
Author

Update:
To set up the resource successfully, these are the steps currently needed:

pcs resource create mssql-server ocf:mssql:fci op monitor timeout=60s
# wait until the cluster tries to start the resource on all nodes and fails
pcs resource remove mssql-server
pcs resource create mssql-server ocf:mssql:fci op monitor timeout=60s
# wait until the cluster tries to start the resource on all nodes and fails
pcs resource remove mssql-server
pcs resource create mssql-server ocf:mssql:fci op monitor timeout=60s
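The "wait until the resource fails on all nodes" step can be checked rather than guessed. The sketch below reads status text on stdin and lists the nodes that show a failed action for a given resource. It assumes the "Failed Resource Actions" line format shown earlier in this issue, which may differ between pcs versions; `failed_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list nodes on which a resource has a failed action.
# Input: `pcs status`-style text on stdin. $1 = resource name.
# Assumes lines shaped like:
#   * mssql-server_monitor_0 on virt-535 'invalid parameter' (2): ...
failed_nodes() {
    sed -n "s/^[[:space:]]*\* ${1}_[a-z]*_[0-9]* on \([^ ]*\) .*/\1/p" | sort -u
}
```

For example, `pcs status | failed_nodes mssql-server` prints one node name per line, and the resource can be removed once every cluster node appears in that list.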

@simonf-dev
Author

The workaround no longer seems to work on RHEL 8.7 and later.
