Feat: System Info & Diagnose #4379

Merged
merged 7 commits into from Jul 29, 2022

Conversation

foursevenlove
Contributor

Signed-off-by: foursevenlove <foursevenlove@gmail.com>

Description of your changes

Feat #3924

I have:

  • Read and followed KubeVela's contribution process.
  • Related Docs updated properly. In a new feature or configuration option, an update to the documentation is necessary.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

  • Run vela system info to view detailed system information.
  • Run vela system diagnose to check the system's health.

Special notes for your reviewer

@Somefive

Signed-off-by: foursevenlove <foursevenlove@gmail.com>
@codecov

codecov bot commented Jul 13, 2022

Codecov Report

Merging #4379 (558812d) into master (0e71a9d) will increase coverage by 0.91%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4379      +/-   ##
==========================================
+ Coverage   60.59%   61.51%   +0.91%     
==========================================
  Files         343      348       +5     
  Lines       33670    34512     +842     
==========================================
+ Hits        20402    21229     +827     
+ Misses      10589    10512      -77     
- Partials     2679     2771      +92     
Flag Coverage Δ
apiserver-e2etests 27.65% <ø> (?)
apiserver-unittests 40.26% <ø> (+5.64%) ⬆️
core-unittests 56.49% <ø> (+1.28%) ⬆️
e2e-multicluster-test 19.64% <ø> (-0.46%) ⬇️
e2e-rollout-tests 22.86% <ø> (+0.55%) ⬆️
e2etests 29.28% <ø> (+0.58%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/apiserver/domain/repository/application.go 42.59% <0.00%> (-16.39%) ⬇️
pkg/resourcekeeper/dispatch.go 77.35% <0.00%> (-7.55%) ⬇️
pkg/utils/file.go 55.76% <0.00%> (-6.40%) ⬇️
pkg/oam/util/test_utils.go 57.50% <0.00%> (-5.52%) ⬇️
pkg/utils/apply/apply.go 82.27% <0.00%> (-4.80%) ⬇️
pkg/cue/model/value/value.go 77.74% <0.00%> (-4.59%) ⬇️
pkg/apiserver/domain/model/application.go 86.41% <0.00%> (-4.39%) ⬇️
pkg/apiserver/event/sync/cache.go 72.97% <0.00%> (-4.17%) ⬇️
pkg/apiserver/domain/service/workflow.go 53.28% <0.00%> (-4.03%) ⬇️
pkg/velaql/providers/query/utils.go 40.00% <0.00%> (-2.86%) ⬇️
... and 81 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0e71a9d...558812d.

return cmd
}

// NewSystemDiagnoseCommand create command to help user to diagonse system's health

misspell: diagonse is a misspelling of diagnose


"strings"

"github.com/oam-dev/cluster-gateway/pkg/generated/clientset/versioned"
"github.com/oam-dev/kubevela/apis/types"

goimports: File is not goimports-ed with -local github.com/oam-dev/kubevela


Signed-off-by: foursevenlove <foursevenlove@gmail.com>
if err != nil {
panic(err)
}
table.AddRow(deployment.Name, deployment.Namespace, deployment.Spec.Template.Spec.Containers[0].Image, strings.Join(deployment.Spec.Template.Spec.Containers[0].Args, " "))
Member

If we are getting a single deployment, which means the user wants detailed info, is fitting that deployment into a single row a good choice?

func NewSystemInfoCommand(c common.Args) *cobra.Command {
cmd := &cobra.Command{
Use: "info",
Short: "Print the system deployment detail information in vela-system namespace.",
Collaborator

Note that when dealing with SystemInfo, the install namespace of KubeVela could be something other than "vela-system", so it would be better to have namespace-agnostic detection logic. This can be left as a future enhancement; it is not necessary for this PR.
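
A minimal sketch of what namespace-agnostic detection could look like: deployments from every namespace are filtered by label rather than by namespace. The Deployment struct is a hypothetical, trimmed-down stand-in for the Kubernetes type, and the label key and value used in main are assumptions for illustration, not necessarily the labels KubeVela actually sets.

```go
package main

import "fmt"

// Deployment is a hypothetical stand-in for a Kubernetes Deployment;
// only the fields needed for this sketch are included.
type Deployment struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// matchesSelector reports whether every key/value pair in selector
// is present in labels.
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// FindByLabels returns the deployments, from any namespace, whose
// labels satisfy selector.
func FindByLabels(all []Deployment, selector map[string]string) []Deployment {
	var out []Deployment
	for _, d := range all {
		if matchesSelector(d.Labels, selector) {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	// Assumed label; the real chart may use a different key or value.
	selector := map[string]string{"app.kubernetes.io/name": "vela-core"}

	all := []Deployment{
		{Name: "kubevela-vela-core", Namespace: "custom-ns", Labels: map[string]string{"app.kubernetes.io/name": "vela-core"}},
		{Name: "unrelated", Namespace: "vela-system", Labels: map[string]string{"app": "other"}},
	}

	for _, d := range FindByLabels(all, selector) {
		fmt.Printf("%s/%s\n", d.Namespace, d.Name)
	}
}
```

With this approach, a controller installed outside vela-system is still found, and an unrelated deployment inside vela-system is skipped.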

Collaborator

@Somefive Somefive left a comment

Aside from the comments by @charlie0129 (which are great advice), this PR, as an initial bootstrap for the vela system command, generally LGTM. There are more details to dig into for these two commands, which can be addressed in future PRs. Thanks for the contribution!

Details:

  1. The logic for vela system info looks for deployments in vela-system, but that has potential problems. Users can install the KubeVela controller in a namespace other than vela-system, and deployments unrelated to KubeVela can be installed in vela-system. Furthermore, a deployment can have zero replicas, which means it is not working. A deployment can also be mid-upgrade, in which case the currently running pods might not use the deployment's latest configuration. So it might be better to add checks on the pods and to find KubeVela controllers by label.
  2. We could add a simple resource check for the KubeVela controller and ClusterGateway, for example showing the current CPU/memory usage.
  3. Showing environment variables, not only args, would also be helpful.
  4. Args can be grouped according to their usage. You can try some groupings based on your understanding, and we can discuss them in the future.
  5. Discussion of the diagnose command can be left for the future as well.
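
The replica cases in item 1 could be told apart roughly as follows. This is only a sketch: DeploymentStatus is a hypothetical struct holding the replica counters a Kubernetes Deployment exposes (spec.replicas, status.updatedReplicas, status.readyReplicas), and the state names are made up for illustration.

```go
package main

import "fmt"

// DeploymentStatus is a hypothetical subset of a Deployment's replica counters.
type DeploymentStatus struct {
	Desired int32 // corresponds to spec.replicas
	Updated int32 // corresponds to status.updatedReplicas
	Ready   int32 // corresponds to status.readyReplicas
}

// Classify maps the replica counters to a coarse state.
// The state names are illustrative, not part of any KubeVela API.
func Classify(s DeploymentStatus) string {
	switch {
	case s.Desired == 0:
		// Nothing is supposed to run, so the component is not working.
		return "scaled to zero"
	case s.Updated < s.Desired:
		// Some pods still run the previous pod template.
		return "upgrading"
	case s.Ready < s.Desired:
		return "not ready"
	default:
		return "healthy"
	}
}

func main() {
	samples := []DeploymentStatus{
		{Desired: 0},
		{Desired: 2, Updated: 1, Ready: 2},
		{Desired: 2, Updated: 2, Ready: 1},
		{Desired: 2, Updated: 2, Ready: 2},
	}
	for _, s := range samples {
		fmt.Printf("%+v -> %s\n", s, Classify(s))
	}
}
```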

@Somefive
Collaborator

Would you like to make a short video, GIF, or some screenshots of the command and attach them to the PR description? That would make it easy for other reviewers to understand. :)

… of by namespace 3.when getting a single deployment, the result is displayed in multi rows. Feat: 1.the system info command displays the cpu and memory metrics 2.the system info command displays the numbers of ready pods and desired pods.
@foursevenlove
Contributor Author

Thank you @Somefive @charlie0129 for reviewing my code and making valuable suggestions. Building on the original version, I made the following modifications:
Fix:

  • Return an error instead of panicking.
  • Get deployments by label instead of by namespace.
  • When getting a single deployment, display the result in multiple rows.
  • Use -s to specify a deployment instead of -n.

Feat:

  • The system info command displays the CPU and memory metrics.
  • The system info command displays the numbers of ready and desired pods.
  • The system info command displays the environment variables.

Finally, I made a GIF to show how to use the system command:
2022-07-28_15-57-17

@Somefive
Collaborator

Looks great! I notice that the DCO CI has some problem. You could follow the guide to fix your commit signature.

Collaborator

@Somefive Somefive left a comment

As a first PR, this looks good to me. Possible enhancements for later PRs:

  1. For system info, the list of arguments in the demo GIF appears truncated by the terminal width. A multi-line display might be more helpful.
  2. The demo GIF makes it clear that the command works well when everything is healthy. Let's create some deliberately broken scenarios, for example taking the controller down, and see what system info reports.
  3. Another interesting case for the demo is the upgrade process. If kubevela-controller is upgrading and two different pods are running at the same time, can the system info command show their differences?
  4. For system info, showing CPU/memory usage is cool. Let's add the percentage usage when a resource limit is set, for example, memory: 456Mi (45.6%).
  5. For system info, some logs could be useful, but I haven't yet worked out how to handle log information. Feel free to start a discussion if you have any ideas.

As for diagnose, similar to [2] above, let's create some bad situations and see how the command performs.
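
The percentage display suggested in item 4 could be formatted along these lines. This is a sketch only: FormatMemory is a hypothetical helper, and it assumes usage and limit are already available as whole Mi values, with a limit of 0 meaning no limit is set.

```go
package main

import "fmt"

// FormatMemory renders memory usage in Mi. When a limit is set
// (limitMi > 0), it appends usage as a percentage of that limit.
func FormatMemory(usedMi, limitMi int64) string {
	if limitMi <= 0 {
		// No limit configured, so a percentage would be meaningless.
		return fmt.Sprintf("%dMi", usedMi)
	}
	pct := float64(usedMi) * 100 / float64(limitMi)
	return fmt.Sprintf("%dMi (%.1f%%)", usedMi, pct)
}

func main() {
	fmt.Println("memory:", FormatMemory(456, 1000)) // memory: 456Mi (45.6%)
	fmt.Println("memory:", FormatMemory(456, 0))    // memory: 456Mi
}
```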

Member

@charlie0129 charlie0129 left a comment

Good job!

@wonderflow wonderflow added the backport release-1.5 add this label will automatically backport this PR to release-1.5 branch label Jul 29, 2022
@wonderflow wonderflow merged commit 8a82ac6 into kubevela:master Jul 29, 2022
@github-actions

Successfully created backport PR #4499 for release-1.5.

4 participants