test(manager): introduce Manager installation test on different OS #7456
Conversation
```python
# Check scylla-manager and scylla-manager-agent are up and running (/ping endpoint)
manager_node.is_manager_server_up()
scylla_node.is_manager_agent_up()
```
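For illustration, a minimal sketch of what such a /ping liveness probe could look like, assuming Python's requests library; the host names and ports (5080 for the manager server API, 10001 for the agent) are assumptions based on common Scylla Manager defaults, and real agent deployments may require HTTPS and an auth token:

```python
# Minimal sketch of a /ping liveness probe; hosts and ports are
# illustrative assumptions, not taken from the PR under review.
import requests

def ping(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if host:port answers GET /ping with an HTTP 2xx status."""
    try:
        return requests.get(f"http://{host}:{port}/ping", timeout=timeout).ok
    except requests.RequestException:
        return False

assert ping("manager-node", 5080), "scylla-manager is not up"
assert ping("db-node", 10001), "scylla-manager-agent is not up"
```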
Shouldn't the test also verify communication between the agent and the server?
Or at least that the cluster is registered in the server?
I believe several short functional checks could catch some regressions without sacrificing a lot of time.
Yes, makes sense; I added several checks for cluster addition and health check.
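For context, a rough sketch of what checks like these could look like when driven through sctool from the manager node; `sctool cluster add` and `sctool status` are documented sctool commands, while the node IP, cluster name, and auth token below are illustrative assumptions:

```python
# Rough sketch: register the cluster with the manager server and
# verify the healthcheck reports its hosts as UP. Values are examples.
import subprocess

def sctool(*args: str) -> str:
    """Run an sctool command and return its stdout, raising on failure."""
    return subprocess.run(
        ["sctool", *args], check=True, capture_output=True, text=True
    ).stdout

# Register the cluster in the manager server (cluster addition check).
cluster_id = sctool(
    "cluster", "add",
    "--host", "10.0.0.1",           # assumption: IP of one DB node
    "--name", "test-cluster",       # assumption: arbitrary cluster name
    "--auth-token", "agent-token",  # assumption: token configured on agents
).strip().splitlines()[0]

# Verify the cluster is registered and its nodes pass the healthcheck.
assert "UP" in sctool("status", "--cluster", cluster_id), "healthcheck failed"
```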
Please retry the test builds; let's see if everything passes.
LGTM
I don't understand why we need this PR.
It does the same and even more, but takes longer.
How much longer? If sanity became too long over the years, we can consider shortening it.
It's ~2h for each sanity run, and those are using spot instances. From my POV, it's more than fine to minimize the multiple distro cases to mostly installation.
From my POV it's unneeded duplication.
Yet again, I'm all for letting whoever owns it make the calls on how to structure it.
Yes, it's not a duplication, because the goal is to replace the sanity jobs for the debian10, debian11, and ubuntu20 distributions with these installation jobs and keep the sanity job only for ubuntu22. One more thing: these short installation jobs can be very useful when we need to verify changes in SCT setup steps, images, etc. We already had this case (#7459), and instead of running ~1.5-2h jobs for every distro we ran these installation jobs, which gave the result in 20 minutes.
Some extra checks were added to the test to address this comment.
The test is intended for execution on non-main OS distributions (for example, debian10), where the main goal is to run the installation test and verify that the manager and agent are up and running.
In that case, why do we need it also for ubuntu22 in this PR?
And now it looks like the sanity test did when it was first started.
From my point of view, we can also benefit from having such a job if we need to quickly verify the ubuntu22 installation process for particular cases (changes in the setup procedure).
It's not implemented yet, but we are planning to implement the triggers for them.
But it takes 15 minutes instead of 1.5-2 hours.
I wasn't aware of such an issue. In that case we may consider leaving, for example, one run of all sanity tests on all platforms per release (manager-3.2 release branch) and running installation tests only in master. When I joined the project, all the sanity tests except Ubuntu22 and CentOS7 (which is already deprecated) had been disabled in the Jenkins master branch for more than a year. So nobody really cared much about these runs in master all that time.
Because the developers probably never looked at master, so it was worthless to run on master. IMO, if you want to have it in master, it should be tracked by developers, and if you want it to be tracked by developers, it should be part of their CI.
From my perspective, I find it more efficient not to run a 2-hour job if the same result can be obtained from a 15-minute job. It's not only a question of time but of cloud resources as well.
I don't really think they run anything from SCT in Actions.
@roydahan as we now own manager testing and these are manager-specific changes: cc @mykaul
Isn't this an argument for why it's not very important?
Ouch! 20 minutes? What is taking so loooong? I'm very interested to know. Installation logs might help.
It's not the test itself that takes 20 minutes but the infra preparation steps.
In the current implementation I don't see many intersections between these two tests. Sanity test:
Installation test:
So, of course, they can be joined, but from my point of view it would make the sanity test more complex.
As for test speed: the collect-logs phase takes long mainly because of getting the monitoring screenshots.
Sure, once each reaches a few hundred significant commits, we will consider adding new maintainers.
@mikliapko again, if it's only the installation of the manager, have you considered using simple GH Actions that would be part of CI?
Hundreds?
I hope there's an issue on this - why would taking a screenshot take so long? (I assume there's a reason; perhaps we can improve that.)
No, we haven't tried this option. But what for?
Probably because the monitor is very, very slow.
That's exactly my whole point.
@roydahan we have a defined testing strategy, and we have tests in GH Actions maintained and watched by @karol-kokoszka and @Michal-Leszczynski. Having said that, as far as I know this is not a concern of an SCT maintainer. For the scope of this PR, Mikita already showed that the new test doesn't duplicate the old test; Israel, who is a maintainer, didn't think so either, nor did Lukasz, who initially approved the PR. We have approval from 2 maintainers and 2 members of the manager task force, which is more than we need. @mykaul, @gmizrahi - if I am wrong and the SCT maintainers' job is to define the test strategy (never witnessed this before), then all the maintainers should attend all the manager meetings that we have (bi-weekly, weekly grooming, etc.), be part of the manager channel, and keep up to date with all the changes that we make in real time.
It does introduce duplication.
It seems we already discussed the duplication question yesterday. Here you can find my vision and explanation.
There is no need to remove the sanities. This test should continue to be enabled for the main distro (ubuntu22). In addition, we are planning to keep it for secondary distributions as well, but run it only by request (manually) if we suspect any possible OS-related issues.
There is an ongoing effort to rework the test triggering approach a bit (scylladb/scylla-manager#3856). It will be addressed there.
This PR has been pending for a few days now.
Waiting to see the triggers PR, which is blocking these tests anyway.
Is it a must-have for any tests to be merged? We are planning to introduce the triggers for these installation tests as part of this task (scylladb/scylla-manager#3856), where we plan to rework the triggering process itself and, for manual builds, allow choosing the group of tests to execute depending on their priority. I just don't want to do double work adding triggers for installation tests only, if these triggers will be reworked again soon. Moreover, not having such triggers yet doesn't prevent us from manually executing these tests for now.
@roydahan please approve the PR or grant me permissions to approve it. Thanks!
Read the whole thread.
The reasoning for this PR makes sense to me.
The common SCT code is not affected, only manager-specific.
LGTM
While waiting for this PR to be merged, we've implemented the task (scylladb/scylla-manager#3856), in which the test triggering approach was reworked a bit. Installation tests were also added there.
thank you @roydahan |
@mergify backport manager-3.2 |
✅ Backports have been created
Closes scylladb/scylla-manager#3852
The test is designed for execution on non-main OS distributions (for example, debian10), where the main goal is to run the installation test and verify that the manager and agent are up and running.
The motivation for having such a test:
Testing
PR pre-checks (self review)
- I added the relevant backport labels
Reminders
- Add new configuration options and document them (in sdcm/sct_config.py)
- Add unit tests to cover my changes (under the unit-test/ folder)