Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy collector enhanced #237

Merged
merged 26 commits into from Aug 24, 2020

Conversation

manavellamnimble
Copy link
Contributor

Hi there! To copy all the files in a given directory, I added this three steps:

  1. Execute find <folder or file path specified> -type f in the container console, which finds all the files and returns each path
  2. I then split the result and store it in an arrange of strings, variable file
  3. Lastly I made a loop in the Copy function to cat the file in each path.

I've already tested it with different pods, and with multiple copy statements in the yaml file referencing different containers. Files are stored as before, under namespace/pod.name/containerpath// filename in the support-bundle.

Let me know if there is something you would like to edit/add!

Fix #212

@divolgin
Copy link
Member

divolgin commented Aug 5, 2020

@manavellamnimble this should use tar command to retrieve the entire folder (or file, if the path is a file). There are two main reasons for this. Using tar allows copying all data in a single API call, leaving less chance for an error. But also we need to preserve file metadata: permissions, ownership, and timestamps. Metadata should be applied to files when the archive is extracted.

@marccampbell
Copy link
Member

@divolgin what should we do in the event hat tar is not installed in the container?

@manavellamnimble
Copy link
Contributor Author

@divolgin @marccampbell those are both good points. I can figure out a way to test if tar is installed (checking if running tar --help for example, returns an error). If it is not installed we go with the find+cat function, which is working (or at least it seems to be).The thing is, in case we use tar, should we leave the compressed file in /namespace/pod.name/ filename.tar.gz?
I should also investigate a if it is possible to run multiple commands (or concatenate commands) using the k8s client.

I assume we are not considering windows-based containers, is that correct?

Let me know what you think!

@divolgin
Copy link
Member

divolgin commented Aug 5, 2020

@marccampbell it is possible for tar to not installed. But I'd argue that this is such a rare configuration that we can resolve it with documentation. Even alpine image comes with tar.

@manavellamnimble
Copy link
Contributor Author

Ok, I am working on a solution using tar too, and making only one call to k8s client, I hope to have it working by tomorrow, and then you can tell me which of the two solutions would you rather implement. Does that sounds ok?

@emosbaugh
Copy link
Member

emosbaugh commented Aug 6, 2020

@manavellamnimble
Copy link
Contributor Author

@marccampbell @divolgin @emosbaugh I was able to extract the folder or files as .tar and place the tar file inside the support bundle under /<pod.name>/<filename.tar>. Nevertheless, it doesn't work when the folder being copied contains an image. I tracked the problem to the redactor (If I skip the redaction stage, everything works ok). My question is, should I work in finding way to preserve images? Or are images not important and I can exclude them from the tar?

@emosbaugh
Copy link
Member

There is a known issue that redaction appends new lines. I wonder if this is the issue with images?

@manavellamnimble
Copy link
Contributor Author

@emosbaugh It might be, yes! But you made me realize, it is not posible to redact any file in a tar file. Based on what you say, the redactor must be making a modification in the bytes of the tar file, corrupting the file.

I think...maybe extracting the files in the tar prior to redaction may work and have a double benefit. We only make one request to the client, and we can apply the redactors to the files, which may solve issue #58 too.

@emosbaugh
Copy link
Member

@laverya you may have more of an opinion on the last comment from @manavellamnimble

@marccampbell
Copy link
Member

I think the redactors should be taking archives into account. If a tar or zip or tgz is found in a bundle, the Redactor phase must decompress, redact and recommpress.

copyCollector.ContainerPath: output.Bytes(),
}, nil
for _, file := range filepaths {
if file != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better written has

if file == "" {
  continue
}

The code is less conditionally indented and more readable this way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeLingoBot capture continue in conditional in loop

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule captured at https://dash.codelingo.io/repos/replicatedhq/troubleshoot/rules/1392

CodeLingoBot Help

When responding to PR-level comments, I understand the following commands:

  • capture to capture the context for a fresh rule.

  • review to trigger a fresh review.

When responding to comments on specific lines of code, I understand the following commands:

  • capture to capture the context for a fresh rule.

  • review to trigger a fresh review.

  • ignore to ignore the parent review comment.

  • unignore to stop ignoring a parent review comment.

if file != "" {
command := []string{"cat", file}

req := client.CoreV1().RESTClient().Post().Resource("pods").Name(pod.Name).Namespace(pod.Namespace).SubResource("exec")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessarily verbose.

req := client.Corev1().Pods(pod.Namespace).Get(pod.Name)

Is cleaner and how we use the clientset

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeLingoBot capture unnecessary use of client-go restClient

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule captured at https://dash.codelingo.io/repos/replicatedhq/troubleshoot/rules/1393

CodeLingoBot Help

When responding to PR-level comments, I understand the following commands:

  • capture to capture the context for a fresh rule.

  • review to trigger a fresh review.

When responding to comments on specific lines of code, I understand the following commands:

  • capture to capture the context for a fresh rule.

  • review to trigger a fresh review.

  • ignore to ignore the parent review comment.

  • unignore to stop ignoring a parent review comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Marc! I didn't have much contact with the client set yet, so I left that line as it was. The way you suggest returns another type of variable and requires another type of variable too instead of pod.Name. I didn't investigate it further because I am sure I will have to deal with the clientset in later PR's, and I didn't wanted to delay the copy collector.
That said I can either change it later, once I've dealt with the client set, or look into it now, as you prefer!

@manavellamnimble
Copy link
Contributor Author

@marccampbell Good Morning Marc! You are right, it is way better as you propose in both cases, thank you for the observation!

I will work in the following process for the copy collector (@divolgin and @emosbaugh I think your opinion about the process would be helpful too):

1- Compress file or folder in the pod and copy it (1 request to the client to get all, already working)
2- Decompress the tar file and send it through Redactor. In case one of the files in the selected folder is a tar, decompress it and redact it too.
3- Compress all again and save it in the bundle under /<pod.name>/<folder/file>.tar

Considering that originally the copy collector used cat to copy a file, I would think this would be a great improvement. Let me know what you think!

@emosbaugh
Copy link
Member

that looks good. would you include gzipped files as well?

@manavellamnimble
Copy link
Contributor Author

Yes, I was thinking about choosing a decompressor/compresor according to the extension of the compressed file in the folder.

@divolgin
Copy link
Member

@manavellamnimble I think it's not necessary to re-compress everything. If fact, it's probably better to leave the result as a folder structure matching the original folder in the pod.

@manavellamnimble
Copy link
Contributor Author

@divolgin taking into account what you said about keeping the file info of the copied files (for example, permissions), I found that the best way to keep this info was to make a tar file.
I have a version already working, where both the tar with the folder and any tar inside the folder are decompressed, redacted, and compressed back. The tar file is, as you said, matching the original folder in the pod.

In the image, for example, I copied two folders (httpd and httpdocs) and one file (cheddar.yaml). And every file inside each tar is redacted (and also each file inside a tar inside the tar...tar-inception).

Of course it may be removed but I think it is a robust solution! First thing tomorrow I will commit and we can discuss the code! Let me know what you think!

Screen Shot 2020-08-11 at 7 55 28 PM

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@manavellamnimble
Copy link
Contributor Author

manavellamnimble commented Aug 12, 2020

@marccampbell @divolgin @emosbaugh Morning! I've made a new commit. The upgrades are:

  • It compress and extract a folder or file in a tar, making just one call to the client.
  • The redaction algorithm take the tar file, and every tar file inside it, decompress it, redact each file, and compress all back together.
  • Copied folder or file is placed compressed into the bundle matching the original folder path in the pod. In case of copying multiple folders from the same directory, both compressed folder are placed under the same path (no duplicate folders)
  • As a plus, it implies no changes or modifications to the copy collector function, or how it is specified in the .yml file.
  • FileInfo (including permissions) is saved.
  • In my case, the files inside each tar where redacted (when it comes to yaml files)

Possible upgrades:

  • extend functionality to decompress other tar-ish files (gzip, tar.gz,...)
  • avoid redacting images or fix bug if there were one (still not working)

Please let me know any thought you have! Have a great day!

As an example, the image represents the output of the following yaml file:

collectors:
    - copy:
        selector: 
          - app=myhttpd
        namespace: default
        containerPath: /usr/local/apache2/htdocs
    - copy:
        selector: 
          - app=myapp
        namespace: default
        containerPath: /go/src/app/k8s/Ejemplos/cheddar.yaml
    - copy:
        selector: 
          - app=myapp
        namespace: default
        containerPath: /go/src/app/httpd

Screen Shot 2020-08-11 at 7 55 28 PM

pkg/collect/copy.go Outdated Show resolved Hide resolved
pkg/collect/redact.go Outdated Show resolved Hide resolved
pkg/collect/redact.go Outdated Show resolved Hide resolved
pkg/collect/redact.go Outdated Show resolved Hide resolved
pkg/collect/redact.go Outdated Show resolved Hide resolved
pkg/collect/copy.go Outdated Show resolved Hide resolved
Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@manavellamnimble
Copy link
Contributor Author

@divolgin @emosbaugh Hi there! I've added the possibility to decompress and redact files from .tar.gz and .tgz files. It works on nested tar-ish files as well. I tried to use the same functions, to keep it simple.
Tomorrow is holiday in Argentina, and I thought about committing the code today so you can see it tomorrow morning!

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@divolgin
Copy link
Member

@manavellamnimble does this copy collector leave folders as folders in the support bundle, or are they in tar format?

@manavellamnimble
Copy link
Contributor Author

@divolgin The target folder (lets say, folder httpd in go/src/app/httpd) is place as tar in the bundle under /<pod_name>/ go/src/app/htrpd.tar as tar. Every content in the tar is left as it was (folders as folders, files as files and compressed files (tar, tar.gz and tgz) as compressed files).
That way we not only copy the folder in its entirety, but also keep all the info of it contents, and of the target-folder itself. Image as example, with the target folder decompressed:
Screen Shot 2020-08-18 at 10 28 45 AM

var err error
if filepath.Ext(filename) != ".tar" {
zr, err = gzip.NewReader(tarFile)
defer zr.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this defer should come after checking the err != nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you are right! Done

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@divolgin
Copy link
Member

@manavellamnimble files and folders in the support bundle should match the source. So in your example, cheddar.yaml.tar should be plain cheddar.yaml text file. And httpd.tar should be httpd folder with the same structure as in the container.

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@manavellamnimble
Copy link
Contributor Author

manavellamnimble commented Aug 19, 2020

@divolgin Hi Dimitriy! After considering many approaches, I think this is the cleaner one. It requires almost no modifications, and it doesnt spread changes through the application. It stores folders and files in the support bundle, keeping the permissions and matching the path in origin.

It maintains all the upgrades developed until this point.

I've tested it copying two folders (httpd and htdocs) and a file (k8s/ingress.yaml), see image below:

Screen Shot 2020-08-19 at 4 29 14 PM

for filename, maybeContents := range output {
if strings.Contains(c.GetDisplayName(), "copy") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be if c.Collect.Copy != nil {

Copy link

@codelingo codelingo bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@manavellamnimble
Copy link
Contributor Author

manavellamnimble commented Aug 21, 2020

@divolgin Hi Dimitriy, good news... I end up changing the command. Working with the client set today I realized that stdout is redirected when called from the client, therefore avoiding the error "refusing to write...". I've tested it and it works. I've also changed the conditional you suggested.

@divolgin divolgin merged commit 6ad9103 into replicatedhq:master Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'copy' collector should work with directories
5 participants