This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Fixed #703: restart using rest api, iterate hosts to verify #770

Merged: 5 commits, Jun 7, 2017

Conversation

grtjn
Contributor

@grtjn grtjn commented May 29, 2017

Fixes #703

@grtjn grtjn added this to the May 2017 milestone May 29, 2017
@RobertSzkutak
Contributor

RobertSzkutak commented Jun 1, 2017

Since the REST API restart is not available on ML7, should we keep the old restart method in there?

Additionally, this fails on ML 8.0-5.1 on Windows for me, so I am going to upgrade and test it again. Something to keep in mind for users who may have an environment on an older version of ML8.

@grtjn
Contributor Author

grtjn commented Jun 1, 2017

Hmm, good point, forgot about that!

@grtjn
Contributor Author

grtjn commented Jun 1, 2017

Maybe push this to the next release, to give us more time to think about it?

@RobertSzkutak
Contributor

Given the time crunch, that sounds good to me.

@grtjn
Contributor Author

grtjn commented Jun 1, 2017

One sec, I might have an idea..

@grtjn
Contributor Author

grtjn commented Jun 1, 2017

OK, this works against an ML 7 cluster, but like before it only checks ml.server..

@RobertSzkutak
Contributor

RobertSzkutak commented Jun 7, 2017

Upgraded to the latest ML8 on Windows and ran restart and got this output:

$ ./ml local restart -v
Restarting MarkLogic Server cluster of localhost
[POST] http://localhost:8002/manage/v2?format=json
Password for admin user:
[GET] http://localhost:8002/manage/v2/hosts?format=json
Verifying restart for robspc.attlocal.net: FAILED

When I forced the ML7 method of restart to run, I got this output:

$ ./ml local restart -v
Restarting MarkLogic Server cluster of localhost
[GET] http://localhost:8001/admin/v1/timestamp
Password for admin user:
this: #ServerConfig:0x000006007ace10
[POST] http://localhost:8000/v1/eval
Closing HTTP connection to localhost:8001
code: 200
Invoked cluster restart
[GET] http://localhost:8002/manage/v2/hosts?format=json
Closing HTTP connection to localhost:8000
Verifying restart for localhost: FAILED

So this leads me to believe there's something not working correctly with the old and new timestamp check. However, both restarts appear to be successful. I am digging into it...
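
For readers following the logs above, here is a rough sketch of the "iterate hosts to verify" step. It is not the code from this PR. The JSON shape (host-default-list / list-items / list-item / nameref) is an assumption based on how the Management API host list is usually documented, and host_names and verify_hosts are hypothetical helper names.

```ruby
require 'json'

# Assumed, trimmed shape of GET /manage/v2/hosts?format=json (not taken from this PR).
SAMPLE_HOSTS_JSON = <<~JSON
  {"host-default-list": {"list-items": {"list-item": [
    {"nameref": "host-a.example.com"},
    {"nameref": "host-b.example.com"}
  ]}}}
JSON

# Pull the host names out of the response; returns [] if the list is missing.
def host_names(hosts_json)
  items = JSON.parse(hosts_json).dig('host-default-list', 'list-items', 'list-item')
  Array(items).map { |item| item['nameref'] }.compact
end

# Run the caller's check for every host and build one status line per host,
# in the same format as the restart output shown in this thread.
def verify_hosts(names)
  names.map do |name|
    status = yield(name) ? 'OK' : 'FAILED'
    "Verifying restart for #{name}: #{status}"
  end
end
```

The check itself is passed in as a block, so the timestamp comparison can be swapped or stubbed without touching the host iteration.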

To run the ML7 restart, I added a param called ml.force-old-restart and set it to true. It took a simple change on line 550, turning that condition into an "or", to get it to work. I recommend implementing this for users who are stuck on an older ML8. The "best" approach would be to automatically try the ML7 method when the ML8 one fails, but I think including this property would be the bare minimum for now.
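
A minimal sketch of the two options described above, assuming the two restart methods can be passed in as callables. restart_cluster and its parameter names are hypothetical and are not Roxy's actual code; force_legacy stands in for a property such as ml.force-old-restart.

```ruby
# Hypothetical sketch: choose a restart strategy, with an automatic fallback.
# rest_restart and legacy_restart are callables supplied by the caller.
def restart_cluster(server_version:, rest_restart:, legacy_restart:, force_legacy: false)
  # ML7 has no REST restart, and the user may force the old method on ML8+.
  if server_version < 8 || force_legacy
    legacy_restart.call
    return :legacy
  end

  begin
    rest_restart.call
    :rest
  rescue StandardError => e
    # The "best" approach from the comment above: try the old method on failure.
    warn "REST restart failed (#{e.message}), trying the old restart method"
    legacy_restart.call
    :legacy_fallback
  end
end
```

Returning a symbol for the path taken makes it easy to log which method actually ran, which helps when diagnosing version-specific failures.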

@RobertSzkutak
Contributor

I dumped the values of new_timestamp and old_timestamp and got (in that order):

robspc.attlocal.net2017-06-07T00:18:07.063157-05:00
2017-06-07T00:18:07.063157-05:00

Continuing to look into it..

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

Did you check whether the hostname as known inside ML also works as a DNS name outside ML?

@RobertSzkutak
Contributor

RobertSzkutak commented Jun 7, 2017

Potentially found the issue by stepping through the function. I did not have "ml.verify_retry_max" defined. Setting that to a value of 10 made the function work correctly. So just add that to default.properties.

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

Regarding old restart, maybe some flags instead of properties? I could look into that..

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

Hmm, never tried what happens if you set retry to zero..

@RobertSzkutak
Contributor

RobertSzkutak commented Jun 7, 2017

Flags could work too. Setting server_version=7 throws the CSRF error, so we need a better way, since this doesn't work on every version of 8 and could potentially affect folks in production or staging with older versions of 8.

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

I'd expect it to work just fine on older ML8, but it would not hurt to check. I might have a VM running old ML8, or else I'll spin one up. Will have to wait though, afk now.. :)

@RobertSzkutak
Contributor

RobertSzkutak commented Jun 7, 2017

On ML8.0-5.1 it failed for me with this error:

$ ./ml local restart -v
Restarting MarkLogic Server cluster of localhost
[POST] http://localhost:8002/manage/v2?format=json
Password for admin user:
ERROR: 415 "Unsupported Media Type"
ERROR: {"errorResponse":{"statusCode":"415", "status":"Unsupported Media Type", "messageCode":"REST-INVALIDMIMETYPE", "message":"REST-INVALIDMIMETYPE: (rest:INVALIDMIMETYPE) Content-Type must be one of: 'application/x-www-form-urlencoded', Received: application/json"}}
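
As an illustration only, the error above could be recognized programmatically so that a tool can point the user at the older restart method instead of just printing the raw response. This is a sketch under assumptions: unsupported_media_type? is a hypothetical helper, and the only facts it relies on are the status code and the messageCode field visible in the error body above.

```ruby
require 'json'

# Hypothetical helper: true when the response is the 415 REST-INVALIDMIMETYPE
# error shown above, false for anything else (including unparsable bodies).
def unsupported_media_type?(status_code, body)
  return false unless status_code.to_i == 415

  parsed = JSON.parse(body)
  return false unless parsed.is_a?(Hash) && parsed['errorResponse'].is_a?(Hash)

  parsed['errorResponse']['messageCode'] == 'REST-INVALIDMIMETYPE'
rescue JSON::ParserError, TypeError
  false
end
```

A caller could use this to decide between reporting a hard failure and retrying with a different restart method.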

Somewhat unrelated: I highly recommend switching from VMs to containers. I just discovered them last week and they have been a lifesaver for letting me quickly spin up and down multiple versions of ML in parallel.

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

I use (ml)vagrant, with which I can spin up a complete cluster from scratch with any specific ML version for which I happen to have an installer. I already have plenty around, and it takes just one command to spin a cluster up or down.

I'll switch to docker as soon as I have worked out how to use the Vagrant docker provider.. ;-)

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

Works just fine on 8.0-6.3 for me, but let me try some more versions..

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

I checked:

  • 9.0-1.1: OK
  • 8.0-6.3: OK
  • 8.0-3.2: OK
  • 7.0-6.2: OK
  • 7.0-4.1: OK

The CSRF change was introduced in 8.0-4. I think you were put on the wrong foot initially because of the missing verify_retry_max property. I did add a check for whether max is > 0, and if not, it prints SKIPPED. Seems to work nicely, and gives better feedback if the restart is not verified.
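
A rough sketch of the verification behaviour described above, not the actual Roxy implementation: skip when the retry maximum is missing or not positive, otherwise poll until the timestamp changes. verify_restart, sleep_seconds, and the block that fetches the timestamp are hypothetical names; only ml.verify_retry_max and the SKIPPED/FAILED outcomes come from this thread.

```ruby
# Hypothetical sketch of restart verification with a retry limit.
# The block fetches the host's current timestamp and may raise while the
# host is still down. retry_max would come from ml.verify_retry_max.
def verify_restart(old_timestamp, retry_max:, sleep_seconds: 0, &fetch_timestamp)
  # An undefined (nil) or zero max means there is nothing to verify.
  return :skipped unless retry_max.to_i > 0

  retry_max.to_i.times do
    new_timestamp = begin
      fetch_timestamp.call
    rescue StandardError
      nil # host not reachable yet, count it as one attempt
    end
    # A changed timestamp means the host has restarted.
    return :ok if new_timestamp && new_timestamp != old_timestamp

    sleep sleep_seconds
  end
  :failed
end
```

Distinguishing :skipped from :failed is the point of the check: an undefined property no longer looks like a failed restart.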

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

@RobertSzkutak We already have the --verify=false flag. I see no particular need to avoid the management REST API for ML8+. WDYT?

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

Rereading your 8.0-5.1 issue: that must have been a regression in that specific patch release, as 8.0-3 as well as 8.0-6 worked fine. For the sake of production, some flag to not use the manage REST API does make sense after all, just for the actual restart (not the verification, since that uses the old admin UI timestamp endpoint, which probably goes back to ML5 or older)..

I'm thinking about --legacy..

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

OK, added the legacy flag, WDYT?
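
To make the intended semantics concrete, here is a small sketch of how the two flags discussed in this thread could map onto a restart plan. It is illustrative only: restart_plan and the symbols it returns are hypothetical, and Roxy's real argument handling is not shown in this conversation.

```ruby
# Hypothetical sketch: --legacy only changes how the restart is triggered,
# while --verify=false only controls whether the restart is verified.
def restart_plan(args, server_version)
  # ML7 has no REST restart, so it always uses the old method.
  legacy = args.include?('--legacy') || server_version < 8
  {
    restart_via: legacy ? :legacy_eval : :manage_rest_api,
    verify: !args.include?('--verify=false')
  }
end
```

Keeping the two decisions independent mirrors the comment above: the flag affects the actual restart, not the verification.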

@RobertSzkutak
Contributor

Looks good to me! Doing some testing on it...

@RobertSzkutak
Contributor

RobertSzkutak commented Jun 7, 2017

Could you add a definition for max in default.properties so it's defined by default? That may seem more intuitive than "SKIPPED" for our customers.

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

I thought it already was defined! Let me check again..

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

Yes, max and sleep are defined here: https://github.com/marklogic/roxy/blob/dev/deploy/default.properties#L278 (part of latest dev)

@grtjn
Contributor Author

grtjn commented Jun 7, 2017

I was piling on top of the existing single-server restart verification..

@RobertSzkutak
Contributor

RobertSzkutak commented Jun 7, 2017

Thanks! git merge fail.

@RobertSzkutak RobertSzkutak merged commit 67d2d0b into marklogic-community:dev Jun 7, 2017