src debug command #731

Merged
merged 114 commits into from
May 13, 2022

DaedalusG
Contributor

@DaedalusG DaedalusG commented Apr 22, 2022

Src Debug

src debug is a little pet project I've been working on for a long time now as a way to learn Go. The intention is to create a CLI command that collects the output of common diagnostic commands, to speed up debugging and make it easier to share information about a Sourcegraph instance with our support team.

The command is divided into three subcommands (server, compose, and kube), each specific to its deployment type, with some unique flags and values.
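As a rough illustration of the subcommand split described above, here is a minimal sketch that dispatches on the subcommand name and registers flags per deployment type with the standard `flag` package. The `-o`, `-n`, and `-c` flags appear in the examples in this description; the `debugOptions` type, the `parseDebugArgs` helper, and the default values are hypothetical and are not taken from the PR's code.

```go
package main

import (
	"flag"
	"fmt"
)

// debugOptions is a hypothetical container for the parsed flags.
type debugOptions struct {
	subcommand string
	out        string // -o: base name of the output archive
	namespace  string // -n: only registered for "kube"
	container  string // -c: only registered for "server"
}

// parseDebugArgs parses arguments such as
// ["kube", "-n", "ns-sourcegraph", "-o", "kube-test"].
func parseDebugArgs(args []string) (*debugOptions, error) {
	if len(args) == 0 {
		return nil, fmt.Errorf("expected a subcommand: server, compose, or kube")
	}
	opts := &debugOptions{subcommand: args[0]}

	// Each subcommand gets its own FlagSet so that flags do not leak between them.
	fs := flag.NewFlagSet("debug "+args[0], flag.ContinueOnError)
	fs.StringVar(&opts.out, "o", "debug", "output archive name (default is an assumption)")

	switch args[0] {
	case "kube":
		fs.StringVar(&opts.namespace, "n", "default", "kubernetes namespace (default is an assumption)")
	case "server":
		fs.StringVar(&opts.container, "c", "", "container name")
	case "compose":
		// No extra flags in this sketch.
	default:
		return nil, fmt.Errorf("unknown subcommand %q", args[0])
	}

	if err := fs.Parse(args[1:]); err != nil {
		return nil, err
	}
	return opts, nil
}

func main() {
	// Fixed arguments keep the sketch deterministic; a real CLI would pass os.Args.
	opts, err := parseDebugArgs([]string{"kube", "-n", "ns-sourcegraph", "-o", "kube-test"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *opts)
}
```

Giving each subcommand its own `FlagSet` means a flag like `-n` is rejected for `server` instead of being silently accepted.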

Here is an example of the output of a src debug server command:

warrengifford@Warrens-MacBook-Pro src-debug-test % src -v debug server -c vibrant_ritchie -o serv3
This command will archive docker-cli data for container: vibrant_ritchie
 SRC_ENDPOINT: https://cse-aws-test.sgdev.org
 Output filename: serv3.zip
Do you want to start writing to an archive? [y/N]: y
archiving file "serv3/inspect-vibrant_ritchie.txt" with 12294 bytes
archiving file "serv3/vibrant_ritchie.log" with 571587 bytes
archiving file "serv3/top-vibrant_ritchie.txt" with 84951 bytes
archiving file "serv3/config/siteConfig.json" with 3772 bytes
archiving file "serv3/config/external_services.txt" with 9480 bytes

This is the simplest version of the command and its output.

An abridged example for Kubernetes:

warrengifford@Warrens-MacBook-Pro src-debug-test % src debug kube -n ns-sourcegraph -o kube-test
Archiving kubectl data for 33 pods
 SRC_ENDPOINT: https://cse-aws-test.sgdev.org
 Context: gke_beatrix-test-overlay_us-central1-c_beatrix-test
 Namespace: ns-sourcegraph
 Output filename: kube-test.zip
Do you want to start writing to an archive? [y/N]: y

... after unzip

warrengifford@Warrens-MacBook-Pro src-debug-test % cd kube-test
warrengifford@Warrens-MacBook-Pro kube-test % ls
config  kubectl
warrengifford@Warrens-MacBook-Pro kube-test % ls config
external_services.txt   siteConfig.json
warrengifford@Warrens-MacBook-Pro kube-test % ls kubectl
events.txt                      persistent-volume-claims.txt    pods
getPods.txt                     persistent-volumes.txt
warrengifford@Warrens-MacBook-Pro kube-test % ls kubectl/pods
cadvisor-44nvf                                  jaeger-56fbb74f8-jmtjs
cadvisor-55b9d                                  minio-54b558cdd6-hlgbt
cadvisor-79pgg                                  pgsql-8565fd54ff-6n8tk
cadvisor-7s8gw                                  precise-code-intel-worker-67946948c5-f9282
cadvisor-9dxct                                  precise-code-intel-worker-67946948c5-xq8pf
cadvisor-cwmpm                                  prometheus-79f68b8d76-x8skc
cadvisor-gnnbq                                  redis-cache-bb84b65cf-j6gfj
cadvisor-lmd6q                                  redis-store-75ffcfb74b-b8pxf
cadvisor-nq75p                                  repo-updater-978f46dff-27b52
cadvisor-zkspf                                  searcher-67fd7f64cc-l6dqs
codeinsights-db-78b8747cdd-bzj72                searcher-67fd7f64cc-tpjj9
codeintel-db-8fb75ddfd-xhncd                    sourcegraph-frontend-6869dbb7d-tf4v8
github-proxy-7bfd897445-tgtxv                   sourcegraph-frontend-6869dbb7d-tr7rk
gitserver-0                                     symbols-95d9c59d9-v6brz
grafana-0                                       syntect-server-6b4478987-tpvsg
indexed-search-0                                worker-6dc84c8cf5-bkmdh
indexed-search-1
warrengifford@Warrens-MacBook-Pro kube-test % ls kubectl/pods/gitserver-0
describe-gitserver-0.txt        jaeger-agent.log                prev-gitserver.log
gitserver.log                   manifest-gitserver-0.yaml

Here you can see the general directory structure written to the zip archive and get an idea of the data the command captures.
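To make the layout in the listing above concrete, here is a minimal sketch of writing entries under a base directory with the standard `archive/zip` package. The `entry` type and the `writeArchive` and `listArchive` helpers are hypothetical names for illustration, and the sketch builds the archive in memory rather than on disk; it is not the PR's implementation.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"path"
)

// entry is a hypothetical name/content pair for one archived file.
type entry struct {
	name string
	data []byte
}

// writeArchive stores every entry under baseDir and returns the zip bytes.
func writeArchive(baseDir string, entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, e := range entries {
		// zip.Writer.Create expects forward slashes, so this sketch joins with
		// the path package rather than path/filepath.
		w, err := zw.Create(path.Join(baseDir, e.name))
		if err != nil {
			return nil, fmt.Errorf("creating %q: %w", e.name, err)
		}
		if _, err := w.Write(e.data); err != nil {
			return nil, fmt.Errorf("writing %q: %w", e.name, err)
		}
	}
	// Close flushes the central directory; without it the archive is unreadable.
	if err := zw.Close(); err != nil {
		return nil, fmt.Errorf("closing archive: %w", err)
	}
	return buf.Bytes(), nil
}

// listArchive returns the entry names in the order they were stored.
func listArchive(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	data, err := writeArchive("kube-test", []entry{
		{name: "config/siteConfig.json", data: []byte("{}")},
		{name: "kubectl/pods/gitserver-0/gitserver.log", data: []byte("log line\n")},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names, err := listArchive(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```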

Test plan

I'd love to get some testing from friends here 🙏

I tested each command on every iteration against the two AER test environments, cse-aws and cse-k8s (for Docker CLI output, the command was tested against a local compose instance). These instances are fairly small, so it might be a good idea to run the kube command against a much larger instance to exercise the semaphore throttling. I don't have kubectl access to .com, but that would be a good testing ground.
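For readers unfamiliar with the throttling mentioned above, here is a minimal sketch of capping the number of concurrent per-pod calls. It uses a buffered channel as a counting semaphore, which is a stand-in and not necessarily the semaphore type the PR uses; `runThrottled` and its parameters are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runThrottled calls fn once per pod with at most limit calls in flight, and
// returns the highest number of calls observed running at the same time.
func runThrottled(pods []string, limit int, fn func(pod string)) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64

	for _, pod := range pods {
		pod := pod        // capture the current value for the goroutine below
		sem <- struct{}{} // acquire: blocks while limit calls are already running
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this call finishes

			n := atomic.AddInt64(&inFlight, 1)
			// Record the peak with a compare-and-swap loop.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			fn(pod)
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

func main() {
	pods := make([]string, 33)
	for i := range pods {
		pods[i] = fmt.Sprintf("pod-%d", i)
	}
	peak := runThrottled(pods, 4, func(pod string) {
		// A real worker would shell out to kubectl here.
	})
	fmt.Printf("processed %d pods, peak concurrency %d (limit 4)\n", len(pods), peak)
}
```

On a small instance the limit is rarely reached, which is why a larger cluster is a more meaningful test of the throttle.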

Testing amounted to executing the command and checking to make sure that expected outputs were received and written to a zip file.

To run the command, check out the src-debugger branch and run it from there:

wglaptop@Warrens-MacBook-Pro-2 src-cli % pwd
/Users/wglaptop/src-cli
wglaptop@Warrens-MacBook-Pro-2 src-cli % go run ./cmd/src debug
'src debug' gathers and bundles debug data from a Sourcegraph deployment for troubleshooting.

Usage:

	src debug command [command options]

The commands are:

	kube                 dumps context from k8s deployments
	compose              dumps context from docker-compose deployments
	server               dumps context from single-container deployments


Use "src debug [command] -h" for more information about a subcommands.
src debug has access to flags on src -- Ex: src -v kube -o foo.zip

Note that the command uses your SRC_ENDPOINT to gather siteConfig.json and the external services config JSON. It also uses your local machine's kubectl context to target the Sourcegraph Kubernetes cluster, and the host machine's Docker CLI.

Feel free to hit me here or DM me with any questions!

DaedalusG and others added 30 commits July 2, 2021 09:03
@DaedalusG DaedalusG changed the title Draft: src debug command src debug command May 6, 2022
@bobheadxi
Member

bobheadxi commented May 6, 2022

This looks great! I wonder if this might be a good opportunity to remove the "report a bug" page entirely and ask admins to provide a dump using this new command instead?

@DaedalusG
Contributor Author

@bobheadxi Maybe we can synthesize the page and this command. I'd like to eventually add some more functionality to this command. In particular, I'd love for it to include a dump of the Prometheus data, with a supporting way to reconstruct our Grafana interface from that data. I like the alerts that are listed in the site admin debug page though, and I'll def review those PRs for further ideas here.

Listing for later:
sourcegraph/sourcegraph#10704
sourcegraph/sourcegraph#7657

@bobheadxi
Member

Happy to chat about the alerts summary implementation if you need help/more context! It's been a looong time but I also find the "smart summary" concept behind it neat

Contributor

@mucles mucles left a comment


@DaedalusG Great work 💯! This looks great 🔥 I think it would also be helpful for the data dump to show whether any alerts are firing in the instance, based on the Prometheus data. It would also be great to have a docs page we could share with our site admins.

@DaedalusG
Contributor Author

@mucles Definitely want to build out some further features and usability here. For the docs page, I believe src commands have a doc page automatically generated, but I'll probably flesh things out a bit after merging this.

@DaedalusG DaedalusG requested a review from ferozsalam May 11, 2022 17:44
Contributor

@LawnGnome LawnGnome left a comment


Much like @ericfritz, I don't think there's anything here that blocks merge, other than maybe one missing error return. I've tried to call out places where I think Go conventions would normally lead to something slightly different, or where there might be some room for future improvements, but none of that requires fixes in this PR.

Great job conceiving and leading this!

CHANGELOG.md (outdated)
cmd/src/debug.go (outdated)
cmd/src/debug_common.go (outdated)
type archiveFile struct {
name string
data []byte
err error
Contributor


It's unusual in Go to bundle an error field into a structure, as Go tends to strongly prefer returning error values as separate return parameters. Normally, the "constructor" would return something like *archiveFile, error, and let callers handle err that way, rather than having to introspect the returned *archiveFile. This tends to look something like this in practice:

type MyType struct {
  derivedFoo any
  derivedBar any
}

func NewMyType(foo any, bar any) (*MyType, error) {
  derivedFoo, err := doSomethingWithFoo(foo)
  if err != nil {
    return nil, errors.Wrap(err, "could not derive foo")
  }

  derivedBar, err := doSomethingWithBar(bar)
  if err != nil {
    return nil, errors.Wrap(err, "could not derive bar")
  }

  return &MyType{
    derivedFoo: derivedFoo,
    derivedBar: derivedBar,
  }, nil
}

Where we do tend to break this in Sourcegraph is where the error actually needs to persist across invocations — the most common case is where a GraphQL resolver does some sort of one time internal call to retrieve things from the database, but then might need to return the same error for each field that was requested in the GraphQL query.

You don't need to change this, but it's worth being aware of for future Go projects.

Contributor Author


Will investigate after merge, ty ty

Contributor


@LawnGnome FYI these values go through a channel which is why it's packaged as a struct in the first place. Does that change this advice?
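For context on the question above: a channel carries one value per send, so a worker that can fail has to ship its result and its error together. Below is a minimal sketch of that pattern, where the worker itself still follows the usual `(value, error)` convention and the struct exists only to cross the channel. The `result` type and the `fetch` and `collect` helpers are hypothetical stand-ins, not the PR's `archiveFile` code.

```go
package main

import (
	"errors"
	"fmt"
)

// result bundles a value with its error so both can travel through one channel.
type result struct {
	name string
	data []byte
	err  error
}

// fetch follows the conventional (value, error) return style.
func fetch(name string) ([]byte, error) {
	if name == "" {
		return nil, errors.New("empty name")
	}
	return []byte("contents of " + name), nil
}

// collect runs fetch for every name in its own goroutine, gathers the
// successful results, and reports the first error it receives.
func collect(names []string) ([]result, error) {
	ch := make(chan result, len(names)) // buffered so senders never block
	for _, name := range names {
		go func(name string) {
			data, err := fetch(name)
			ch <- result{name: name, data: data, err: err}
		}(name)
	}

	results := make([]result, 0, len(names))
	var firstErr error
	for range names {
		r := <-ch
		if r.err != nil {
			if firstErr == nil {
				firstErr = fmt.Errorf("fetching %q: %w", r.name, r.err)
			}
			continue
		}
		results = append(results, r)
	}
	return results, firstErr
}

func main() {
	results, err := collect([]string{"gitserver.log", ""})
	fmt.Println("successful results:", len(results))
	fmt.Println("error:", err)
}
```

In this shape the error field is unpacked at the receiving end and turned back into an ordinary `error` return, so callers of `collect` never inspect the struct for failures.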

cmd/src/debug_comp.go (outdated)
cmd/src/debug_kube.go (outdated)
cmd/src/debug_kube.go (outdated)
cmd/src/debug_serv.go (outdated)
cmd/src/debug_serv.go (outdated)
cmd/src/debug_serv.go (outdated)
DaedalusG and others added 11 commits May 13, 2022 11:26
Co-authored-by: Adam Harvey <adam@adamharvey.name>
Co-authored-by: Adam Harvey <adam@adamharvey.name>
Co-authored-by: Adam Harvey <adam@adamharvey.name>
Co-authored-by: Adam Harvey <adam@adamharvey.name>
Co-authored-by: Adam Harvey <adam@adamharvey.name>
@DaedalusG DaedalusG merged commit ee83879 into main May 13, 2022
@DaedalusG DaedalusG deleted the src-debugger branch May 13, 2022 18:51
@efritz efritz mentioned this pull request May 13, 2022
@efritz
Contributor

efritz commented May 13, 2022

Small follow-up: #743

efritz added a commit that referenced this pull request May 18, 2022
scjohns pushed a commit that referenced this pull request Apr 24, 2023
* setup debug command scaffold

* added test kubectl events logging

* debug: Create ZIP archive

* debug: Fail if zip file already exists

* debug: introduce directory structure in zip

* added --all-namespaces to get events command

* populated with TODO tasks, add get pods output

* setting up k8s get pods to enable grabbing logs

* got pod names parsed to slice

* pulling logs from all pods, TODO: handle subcontainers

* add missing .txt string to log archive names

* used json get pods out, refactored k8s logs into their own spot

* refactor get logs into savek8sLogs

* working on logs for previous pods, need to only write when command returns without error

* archives prev-logs if exists

* refactoring getPods to handler scope

* refactored all kubectl call functions to be declared outside of handler

* added pod manifests and describes

* added some more TODOs

* added a little error handling on archives

* corrected some error handling

* added validation on out flag, and setup for deployment flag

* improved logic for deployment flag

* refactored deployment script

* added get PV and PVC, used out flag as baseDir for archive filestructure

* changed archive filestructure to be oriented around pod directories containing logs, past logs, and manifests

* cleaned up some finished TODOs, shortened selector values for -d flag

* refactored kubectl archiving into -d flag switch

* working get containers

* Concurrency!!

* made deploy level kubectl functions concurrent

* figured out bug in getLogs caused by immediately invoked function and variable scope in for loop

* all kubectl functions refactored to run as goroutines, bug present causing some zip writes to write empty files

* added some prints for debugging

* clean up some flag validating prints and comments

* A bunch of improvements

1. Set open file limits in process to support this much concurrency.
   This could also be done in the shell with `ulimit -n 99999`.
2. Pass in `context.Context` so we cancel any pending go-routines when
   returning early.
3. Change getXXX signatures from returning a slice to a single
   *archiveFile (since we were never actually returning more than one
   file)
4. Check for error in the file archiving loop and abort if there's any.
5. Verbose logging support.

* add todo for logging

* fixed hardcoding 999999 in setOpenFileLimits

* removed plural k8s func names, started work on docker commands

* working log archival for docker containers

* added docker inspect

* added docker container stats

* improved logic for get containers

* cleanup

* clarify assumptions

* introduced debug sub commands kube, comp, and serv -- Ex: src debug kube -out=test

* for some reason my .gitignore was ignoring these new files

* changed flagset to use .StringVar rather than .String in flagSet package

* correct flag typo in serv

* fixed verbose output in kube command, changed -out flag to -o

* switching to passenger, starting getConfig

* ignore errors and get extsvc configs

* add some comments from work with Tomas

* added namespace flag for kube subcommand

* added a little safety check to the kube command

* added a semaphore to kube, added some text safety checks

* added semaphore to all RPC calls in archiveKube

* added current-context logging before kube command execution

* Attempting to debug inconsistent error in --previous logs call

* added network validation to comp subcommand

* added a function to pull site config, cleaned up logging and usage funcs

* moved sub-functions to relevant file, trial utility archiveFileFromCommand in src extsvc function

* converted all kube commands to , renamed kube getContainers to getPods

* refactored compose functions with archiveFileFromCommand

* Made archiveFileFromCommand calls aesthetically pleasing, added docker ps, and get pods commands as a sort of index

* changes all path package calls to path/filepath calls

* emptied graveyard

* return early from failed semaphore.Acquire rather than reading error

* handle errors, remove direct comparison to booleans

* cleaned up some more linter errors

* refactor adding verify func

* fixed final improperly handled error

* use semaphore in debug comp

* working errgroups in comp

* refactor to errgroup in kube

* refactored writer in archive functions

* corrected all complex filepath calls using fmt.Sprintf()

* filter docker ps output to appropriate network

* removed setup debug as well as openfile limit adjustment

* normalize site config

* correctly process json

* fleshed out serv command

* Update .gitignore

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* Apply suggestions from code review

Implementation of Eric's review suggestions; untested, requiring duplication in other files

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* implement and repair inferred refactors in eric's suggestions

* addressed many suggested code improvements for readability and style; still needs run through for error handling, and potentially some more refactors

* refactor baseDir processor

* went over errors, still probably needs another run since I widely used errors.Wrapf

* clarify all string serializations

* Update cmd/src/debug_comp.go

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* Update cmd/src/debug_common.go

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* Update cmd/src/debug_common.go

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* Update cmd/src/debug_kube.go

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* Update cmd/src/debug_kube.go

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* Update cmd/src/debug_kube.go

Co-authored-by: Eric Fritz <eric@eric-fritz.com>

* finishing touches

* final error handling cleanup

* add changelog entry

* Update cmd/src/debug_serv.go

Co-authored-by: Adam Harvey <adam@adamharvey.name>

* Update cmd/src/debug_comp.go

Co-authored-by: Adam Harvey <adam@adamharvey.name>

* Update cmd/src/debug_kube.go

Co-authored-by: Adam Harvey <adam@adamharvey.name>

* Update cmd/src/debug_kube.go

Co-authored-by: Adam Harvey <adam@adamharvey.name>

* Update cmd/src/debug.go

Co-authored-by: Adam Harvey <adam@adamharvey.name>

* address harvey suggestions

* finalize changes from harvey's suggestions

* fix go.mod

* really fix go.mod

Co-authored-by: Tomás Senart <tsenart@gmail.com>
Co-authored-by: Tomás Senart <tomas@sourcegraph.com>
Co-authored-by: Eric Fritz <eric@eric-fritz.com>
Co-authored-by: Adam Harvey <adam@adamharvey.name>
scjohns pushed a commit that referenced this pull request Apr 24, 2023
6 participants