Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display error message for total VCPU values which can't be factored #1489

Merged
merged 5 commits into from
Aug 23, 2021

Conversation

hstastna
Copy link
Member

@hstastna hstastna commented Jul 20, 2021

Fixes: #1431

This PR allows to dispaly error message

  • in the VM Details card and
  • in the Create VM Wizard's Basic settings in Advanced options

for:

  • total VCPU values which can't be factored,
  • for some other buggy scenarios regarding incorrectly selected VCPU topology values from the drop downs, as there can be many combinations for the number of sockets, cores, threads, and we need to respect the maximum allowed values together with the product of the number of sockets, cores, threads which should be equal to the number of the total VCPUs.

TODO: (DONE)

  • disable Save button for invalid value(s) - total VCPUs or regarding the VCPU topology, in the Create VM Wizard

Before:
VM Details card:
vmdetails_before1
The next picture shows an example of one of the buggy scenarios in the VM Details where after setting value 241 as the number of total VCPUs, there can be only two options in the drop down for the number of cores. If the user chooses number 1 in the drop down, the topology values become incorrect because of the limits of the sockets and cores don't allow to choose 241 in the appropriate drop downs (same scenario reproducible also in the Create VM Wizard):
vcpu_issue
Create VM Wizard:
createvm_before1
createvm_before2

After:
after_factor
select_topology
afterr
wizard_after2

@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from ca985d8 to 371a71c Compare July 20, 2021 16:58
@hstastna hstastna changed the title Add error message for total VCPU numbers which can't be factored [WIP] Add error message for total VCPU numbers which can't be factored Jul 20, 2021
@hstastna hstastna self-assigned this Jul 20, 2021
@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch 6 times, most recently from 1695c68 to 12815d1 Compare July 26, 2021 12:22
@hstastna hstastna changed the title [WIP] Add error message for total VCPU numbers which can't be factored [WIP] Display error message for total VCPU values which can't be factored Jul 26, 2021
@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch 10 times, most recently from 719270f to c2287e1 Compare July 27, 2021 00:13
@hstastna hstastna changed the title [WIP] Display error message for total VCPU values which can't be factored Display error message for total VCPU values which can't be factored Jul 27, 2021
@hstastna hstastna requested review from sjd78, sgratch and rszwajko and removed request for sjd78 July 27, 2021 00:14
sjd78
sjd78 previously requested changes Jul 29, 2021
Copy link
Member

@sjd78 sjd78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally looks good. A few code style and a few prop type issues that need to be resolved.

src/components/VmDetails/cards/DetailsCard/index.js Outdated Show resolved Hide resolved
src/intl/messages.js Outdated Show resolved Hide resolved
src/intl/messages.js Outdated Show resolved Hide resolved
src/components/CreateVmWizard/steps/BasicSettings.js Outdated Show resolved Hide resolved
src/components/CreateVmWizard/steps/BasicSettings.js Outdated Show resolved Hide resolved
src/components/CreateVmWizard/steps/BasicSettings.js Outdated Show resolved Hide resolved
src/components/CreateVmWizard/steps/BasicSettings.js Outdated Show resolved Hide resolved
@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from c2287e1 to 590e197 Compare July 30, 2021 10:59
@hstastna hstastna requested a review from sjd78 July 30, 2021 11:19
@hstastna hstastna dismissed sjd78’s stale review July 30, 2021 11:20

Browser warnings fixed, issues resolved

@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from df83c63 to 8941c98 Compare August 3, 2021 10:11
Copy link
Member

@sgratch sgratch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the following 2 issues:
issue 1) Regarding the validation of cpus topology:

select_topology

Can you please explain how do we get to this scenario in 1st place?
On webadmin for example, once selecting total vcpus to be 241, the topology is auto selected to V, C, T of: 1, 241, 1 and can't be changed since there are no options for changing it. Why letting the user with options that we already know that are non valid?

Another example is that for setting total cpus to be 340 for example, the user can select the following:
image

while on webadmin it is blocked since the options are more limited since cores num started from 17:
image

In addition, this PR doesn't cover all cases anyways since few are still verified and fail on backend:
image

Bottom line is that I don't think we need to cover all cases and it's ok to fail on backend. The important thing is to avoid creating a VM with one vcpu due to invalid architecture. I think we need to prefer backend failures as much as possible and just filter invalid options in advance where possible instead of current solution of displaying errors on ui for part of bad topology selection. The only important thing is to avoid creating a VM in that case. We just need to filter those options out within the selection box values.
The current solution causes strange behaviors.

issue 2: I remember we raised this in the past and it's not solved by this PR and I'm still not sure if solved by pr 1491 as well (if it is solved then please ignore this):

when manually entering a total vcpu value (not necessarily the upper limit value), the cpu arch is sometimes auto set to invalid values.
For example, set the "total vcpus" to 33 and then remove one digit to leave only 3 value and the following error appears:

image
or:
image

As discussed, this is reproduced only for create VM and not for edit VM and seems related to the UI fields input manipulation. <-- fixed by PR 1491 so not relevant anymore.

src/intl/messages.js Outdated Show resolved Hide resolved
@hstastna
Copy link
Member Author

hstastna commented Aug 3, 2021

Please check the following 2 issues:
issue 1) Regarding the validation of cpus topology:

select_topology

Can you please explain how do we get to this scenario in 1st place?
On webadmin for example, once selecting total vcpus to be 241, the topology is auto selected to V, C, T of: 1, 241, 1 and can't be changed since there are no options for changing it. Why letting the user with options that we already know that are non valid?

@sgratch, because this requires dealing with all the possible combinations of the numbers of sockets, cores, threads, together with dealing with their maximum values, in advance. This PR is not about implementing this functionality. Not sure how this is implemented in webadmin but I consider it as non trivial. In this PR, a simple check is added and error message for invalid values of total VCPUs or VCPU topology is displayed.

Another example is that for setting total cpus to be 340 for example, the user can select the following:
image

while on webadmin it is blocked since the options are more limited since cores num started from 17:
image

Yeah, that makes sense. It is because the user has chosen a value from some drop down (number of cores in this case) which is not valid according to other chosen values. The user should pick another value from some drop down or to choose a different value as the number of total VCPUs. This is why the drop downs and the Total VCPUs field have the red border.

Bottom line is that I don't think we need to cover all cases and it's ok to fail on backend. The important thing is to avoid creating a VM with one vcpu due to invalid architecture.

This is solved by disabling Next button in the wizard, for invalid values. Also this PR does some job.

I think we need to prefer backend failures as much as possible and just filter invalid options in advance where possible instead of current solution of displaying errors on ui for part of bad topology selection. The only important thing is to avoid creating a VM in that case. We just need to filter those options out within the selection box values.

I asked before starting working on this issue about possible solutions of some situations and more of us suggested to add some error message, to make clear to the user why they cannot choose some values. By simply filtering the values user will have no idea...

The current solution causes strange behaviors.

Hmm, I haven't found anything strange. Except that the error messages are not 100% clear, IMHO.

issue 2: I remember we raised this in the past and it's not solved by this PR and I'm still not sure if solved by pr 1491 as well (if it is solved then please ignore this):

when manually entering a total vcpu value (not necessarily the upper limit value), the cpu arch is sometimes auto set to invalid values.
For example, set the "total vcpus" to 33 and then remove one digit to leave only 3 value and the following error appears:

image
or:
image

As discussed, this is reproduced only for create VM and not for edit VM and seems related to the UI fields input manipulation.

Yep, I was able to reproduce, but only sometimes. It appears after I press Enter a couple of times :D

@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from 8941c98 to e9d6949 Compare August 3, 2021 12:54
@hstastna
Copy link
Member Author

hstastna commented Aug 3, 2021

In addition, this PR doesn't cover all cases anyways since few are still verified and fail on backend:
image

@sgratch this is weird because I am pretty sure those values are not above the maximum values, in this example. I was able to reproduce, but it does not make sense. I suggest to create a separate issue for that. Not sure if it is the UI issue. You can see the maximum possible values of the number of sockets, cores, threads in the browser console in the picture below:
wtf

@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from e9d6949 to 7ad7971 Compare August 3, 2021 14:19
@sgratch
Copy link
Member

sgratch commented Aug 3, 2021

Yeah, that makes sense. It is because the user has chosen a value from some drop down (number of cores in this case) which is not valid according to other chosen values. The user should pick another value from some drop down or to choose a different value as the number of total VCPUs. This is why the drop downs and the Total VCPUs field have the red border.
Originally, the error message said that the user should pick another value, as there are all the possible values present in the drop downs, but I was suggested to change the error message to how it is now.

The error for invalid total cpus is ok.
The issue is regarding the error handling for the cpu architecture values. We needed to assume the user chooses sockets, cores and threads on that order up to down and filter out the invalid values in advance or just make the VM creation to fail on backend with the correct error for all cases.

The current scenario is that the user can choose all kinds of combinations while the values are dynamically changed when selecting different values in different order without any logic. If values are invalid then it failed on backend or it is turned to 1, 1, 1 or other invalid non chosen values with the error displayed but without letting the user remember what he tried to set.
For example- the user set values to 1, 1, 384 but error appears for 16,1,8
Peek 2021-08-03 16-43

in addition, current web-ui blocks options like in case of total vcpus set to 9 and topology is set to S=1, C=3, T=3.
This is possible on webadmin:
image

But is not possible on web-ui:
Peek 2021-08-03 17-17

So there are 2 issues here:

  1. The current cpu topology options selection is not fully supported and working a bit strange. This is not part of this fix so we can open a separate issue for this.

  2. Error handling for detected invalid CPU architecture:
    If we decide to leave the current solution of displaying an error then please rephrase the error to:
    "Selected CPU topology is invalid" (instead of vCPU)
    And check if possible to display a fixed (empty) or the original invalid value in case of this error occurrence instead of switching to a different invalid values that the user didn't select.

Yep, I was able to reproduce, but only sometimes. It appears after I press Enter a couple of times :D

This is not reproduced to me anymore so seems fixed by PR 1491

this is weird because I am pretty sure those values are not above the maximum values, in this example. I was able to reproduce, but it does not make sense. I suggest to create a separate issue for that. Not sure if it is the UI issue. You can see the maximum possible values of the number of sockets, cores, threads in the browser console in the picture below:

The combination of 340 (1, 170, 2) is invalid on webadmin as well so it's totally ok that it failed on backend with our current solution.

@hstastna
Copy link
Member Author

hstastna commented Aug 3, 2021

The error for invalid total cpus is ok.
The issue is regarding the error handling for the cpu architecture values. We needed to assume the user chooses sockets, cores and threads on that order up to down and filter out the invalid values in advance...

I think the #1431 is about weird topology values according to the total VCPUs value, not about changing the user experience significantly. The user always could choose sockets, cores, threads randomly, not in any order, and the other values (other than actually changed value) were always changed according to the user's change and the number of total VCPUs.

... or just make the VM creation to fail on backend with the correct error for all cases.

I am not sure how I can achieve this here in the UI. First, the backend has to fail only when it should fail. Seems that for some cases this doesn't happen.

The current scenario is that the user can choose all kinds of combinations while the values are dynamically changed when selecting different values in different order without any logic. If values are invalid then it failed on backend or it is turned to 1, 1, 1 or other invalid non chosen values with the error displayed but without letting the user remember what he tried to set.

Yes, it seems this is the original design of the behavior. From my experience, it always worked like that. And when it turns to 1,1,1, it is because it is the only possible combination of values, according to their maximum. But unfortunately inconsistent with the total VCPUs value. And this was missing: some check if the product of those values = number of total VCPUs.

For example- the user set values to 1, 1, 384 but error appears for 16,1,8

That's how the behavior is designed. After every change, the other values are changed, too, to respect maximum values and the number of total VCPUs. The number of sockets/threads change when changing the number of cores to 1. It seems it is not possible to set 1,1,whatever values, for 384 total VCPUs. We would have to have possibility to set 384 threads, but maximum is much less.

in addition, current web-ui blocks options like in case of total vcpus set to 9 and topology is set to S=1, C=3, T=3.

Yes, this IS the problem. Seems like it is not possible to reach some of the valid combinations. But this problem is different than we are dealing with in this PR and already present for a long time. It follows from the design. The same happens in the VM Details card. I prefer not to deal with too many different problems incl. UX stuff in one PR. I suggest to open a separate issue for that and to have a talk with the UX folks if we want to change how VCPU topology feature works.

@sgratch
Copy link
Member

sgratch commented Aug 4, 2021

@hstastna

So there are 2 issues here:

1. The current cpu topology options selection is not fully supported and working a bit strange. This is not part of this fix so we can open a separate issue for this.

Based on your reply, please open a separate new issue for this inconsistency and gaps with setting the cpu architecture (with the example for setting S=1, C=3, T=3 for total vcpus=9 and maybe other examples raised here).
Not sure if UX shoud help since it's a functional bug on the way we implement this but let's see.

2. Error handling for detected invalid CPU architecture:
If we decide to leave the current solution of displaying an error then please rephrase the error to:
"Selected CPU topology is invalid" (instead of vCPU)
And check if possible to display a fixed (empty) or the original invalid value in case of this error occurrence instead of switching to a different invalid values that the user didn't select.

Based on your reply - do you mean that we can't fix this use case?:

values to 1, 1, 384 but error appears for 16,1,8

So the error appears but the CPU topology marked in red is switched to a suggested other topology which is also wrong but different than the original user's entering values?
If we can't display the original invalid topology set by the user then at least we can make sure that the cpu topology marked in red in case of a topology error is always empty or set to 0, 0, 0 or even 1, 1, 1.

@hstastna
Copy link
Member Author

hstastna commented Aug 4, 2021

And check if possible to display a fixed (empty) or the original invalid value in case of this error occurrence instead of switching to a different invalid values that the user didn't select

@sgratch, I have only a few questions on this: How this would work? How and when the remaining values would be chosen after the user makes some choice? If ever. When the validation of those values would happen?

It doesn't switch to a different values because of the error but because those values are the best possible values.
It is possible to change everything, of course, but I think it is not a good idea. Not here and right now. We would have to change the behavior of that feature, to answer all of my questions above. I think such a change should be discussed with the UX folks and solved by a different issue/PR - it can take some time.

Actually it works the way that as soon as the user chooses some value in the drop down, the other values are chosen according to the user's selected value, with respect to the maximum allowed values and to the total VCPU value. Then the validation of all the chosen values happens because sometimes it is not possible to choose the values that would meet all the required conditions. Such a validation was missing there. If we wanted the validation before the two remaining values are chosen (+ the chosen values would be fixed), the error would appear immediately and very often. Only after the user chooses the third value corresponding to the maximum values and the total VCPUs, the error would disappear. I consider this quite annoying. Or we can have validation of those values later, only after the user clicks on Save button. But I consider this as a step back as we can check the values immediately and to display the corresponding errors. This would be also inconsistent with some other already present checks there, for example checking the maximum allowed value of the total VCPUs.

@sgratch
Copy link
Member

sgratch commented Aug 5, 2021

@sgratch, I have only a few questions on this: How this would work? How and when the remaining values would be chosen after the user makes some choice? If ever. When the validation of those values would happen?

Let me summarize my concerns and suggestions:
As far as I checked, the backend is running all cpu topology validations for us so in case of a wrong cpu topology, a backend error is diaplyed e.g. "Cannot add VM. Maximum number of threads per cpu exceeded" will be displayed."

The main problem with our current topology calculations/validation logic s that it validates user input for vcpu=X and if not succeeded then:

  1. either switch to 1,1,1 (which leads to a VM created with vcpu=1 <> X)
  2. or switched to a suggested other topology which might also fail or include a result in which the product of the number of sockets, cores, threads is inconsistent with the number of the total VCPus (which leads to a VM created with vcpu != X).

So the backend is working correctly and the only problem that I can see is with this whole auto calculation for cpu topology.

As a quick solution to make it less confusing till refactoring might be done, we can either:

  1. Cancel this whole auto calculation for cpu topology and let the VM created with the original user cpu topology selection. If wrong then it will fail on backend with the correct error.

  2. Leave the code as is for now but at least avoid those jumping/invalid re-calculations by displaying a consistent empty values in case of a vcpu topology error. e.g.:
    Peek 2021-08-05 14-47

This is easy to implement - if validateTopologyValues() returns false, display the error and just set the topology values to -1 for example

@hstastna
Copy link
Member Author

hstastna commented Aug 9, 2021

@sjd78 Any thoughts about this? Thanks!

  1. Leave the code as is for now but at least avoid those jumping/invalid re-calculations by displaying a consistent empty values in case of a vcpu topology error. e.g.:

@sgratch I can't imagine avoiding those jumping/invalid re-calculations without canceling the whole auto calculation. But we can add another "jumping", by adding the condition you've already mentioned, after the validation. Those re-calculations happen before validation and it makes sense to me like that.

@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch 2 times, most recently from e686f06 to 40076b8 Compare August 11, 2021 09:08
@sgratch
Copy link
Member

sgratch commented Aug 12, 2021

@hstastna

@sgratch I can't imagine avoiding those jumping/invalid re-calculations without canceling the whole auto calculation. But we can add another "jumping", by adding the condition you've already mentioned, after the validation. Those re-calculations happen before validation and it makes sense to me like that.

If it's not easy to reset those fields after the validation failure since it will break the behavior when choosing invalid values in a row, then let's not invest more time on this now, merge this PR and open a follow up issue for the vCPU problems that were mentioned above.
Please open the issue before merging so that we won't forget.

Copy link
Member

@sgratch sgratch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just suggest to rephrase the message.

A follow up issue was opened for open problems: #1503.

@@ -120,6 +120,8 @@ export const messages: { [messageId: string]: MessageType } = {
},
coresPerSockets: 'Cores per Virtual Socket',
cpus: 'Total Virtual CPUs',
cpusBadTopology: 'No valid CPU topology exists for this total Virtual CPUs count.',
cpusBadTopologySelection: 'Selected vCPU topology is invalid.',
Copy link
Member

@sgratch sgratch Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was somehow hidden between all posts but as mentioned on #1489 (comment):

  1. Error handling for detected invalid CPU architecture:
    If we decide to leave the current solution of displaying an error then please rephrase the error to:
    "Selected CPU topology is invalid" (instead of vCPU)

Hilda Stastna added 4 commits August 23, 2021 12:32
Issue: oVirt#1431

Display an error message for VCPU values which cannot be factored properly
for the VCPU topology (number of sockets, cores, threads).
Display an error message for incorrectly selected VCPU topology value
(number of sockets, cores or threads), in the VM Details card.
Display an error message for VCPU values which cannot be factored properly
for the VCPU topology. Also display an error message for incorrectly selected
VCPU topology value from the drop down (number of sockets, cores or threads).
Both in the CreateVMWizard.
…l VCPU

Prevent from continuing in VM creation if the total VCPU value cannot be factored
properly for the VCPU topology values. Same for the invalid VCPU topology values,
regarding the value of the total VCPUs.
@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from 40076b8 to 83a58ed Compare August 23, 2021 10:32
@hstastna hstastna force-pushed the Calculation_CPU_topology_some_values branch from 83a58ed to d95b813 Compare August 23, 2021 10:40
@hstastna hstastna requested a review from sgratch August 23, 2021 10:43
Copy link
Member

@sgratch sgratch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@isaranova
Copy link

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calculation of CPU topology is incorrect for some vCPU values
4 participants