
feat: add docker-archive unpacking #106

Merged
merged 1 commit into master on Sep 27, 2019

Conversation

ivanstanev
Contributor

As part of static image analysis we need to process an image tar on the filesystem.
This PR adds functionality for unpacking what is known as a docker-archive.
For the Runtime team this means processing docker-archives as produced by the Skopeo tool; the format there is slightly different, but it is handled in this PR as well.
For every unpacked file of interest we can run a set of actions, which will be useful for hashing key binaries in the future.

Also added unit tests with mocks and fixtures to verify it works as intended.

  • Ready for review
  • Follows CONTRIBUTING rules
  • Reviewed by Snyk internal team

Where should the reviewer start?

Have a look at the image-extractor.ts file: the entry point is extractFromTar.

Any background context you want to provide?

Part of static image analysis for @snyk/runtime

What are the relevant tickets?

Jira ticket RUN-450
Jira ticket RUN-462

@ivanstanev ivanstanev requested a review from a team September 24, 2019 16:13
@ivanstanev ivanstanev self-assigned this Sep 24, 2019
@ghost ghost requested review from hisenb3rg and removed request for a team September 24, 2019 16:13
@ivanstanev
Contributor Author

ivanstanev commented Sep 25, 2019

@mladkau I think one answer will address both your comments! The current implementation does not yet account for key binary hashing; we would most likely pass through the archive twice, as there isn't a clean way to consume a stream N times. The current approach is for finding specific files and applying some sort of "transformation" to each if necessary, much like applying .map() to an array.

The idea is that we will have two phases for processing files: first, get or locate the file in the archive; second, apply the different analyzers (e.g. apk, apt, rpm) once the file is obtained.

I would like to land the initial implementation here so we can start working with something; when it comes time to worry about key binaries we can adapt the code. For example, we could pass the stream directly to the callbacks, and each callback handler could pipe it into its own processing logic or listen for 'data' events. This is a really good example of how we could adapt it: https://stackoverflow.com/a/51143558
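The two-phase idea could be sketched roughly like this (the names `ExtractAction` and `applyActions` are hypothetical, for illustration only — not code from this PR): each callback receives the stream of a matched file and returns a result, and the analyzers later consume those results.

```typescript
import { Readable } from "stream";

// Hypothetical type: an action receives the stream of a matched file
// and produces some result (a hash, a parsed manifest, raw contents).
type ExtractAction = (fileStream: Readable) => Promise<string>;

// Phase 1 sketch: for every file of interest found in the archive,
// apply the registered action, much like .map() over an array.
async function applyActions(
  files: Map<string, Readable>,
  actions: Record<string, ExtractAction>,
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const [filePath, stream] of files) {
    const action = actions[filePath];
    if (action) {
      results[filePath] = await action(stream);
    }
  }
  return results;
}
```

Phase 2 (the apk/apt/rpm analyzers) would then operate purely on the collected results, without touching the archive again.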

As to why we have the streamToBuffer function: it's just the default case, returning the file as binary. We could default to streamToString instead, but not every file is a text file!
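A streamToBuffer-style default like the one described could look roughly like this (a sketch, not the PR's actual implementation): collect every emitted chunk and concatenate into a single Buffer, so binary files survive intact.

```typescript
import { Readable } from "stream";

// Sketch of a streamToBuffer-style helper: gather all emitted chunks
// and concatenate them into one Buffer at the end.
function streamToBuffer(stream: Readable): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on("error", reject);
    stream.on("end", () => resolve(Buffer.concat(chunks)));
  });
}
```

A streamToString variant would have the same shape, just decoding the concatenated buffer at the end.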

@mladkau
Contributor

mladkau commented Sep 25, 2019

My main concern here is memory consumption. You might encounter some large files in a filesystem, and loading them into memory is not a good idea. If you really cannot work with streams, then write the stream to a temp file.
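The temp-file fallback suggested here could be sketched as follows (hypothetical helper name and naming scheme, not code from this PR): pipe the extracted file straight to disk instead of buffering it in memory.

```typescript
import { createWriteStream, readFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";
import { pipeline, Readable } from "stream";

// Sketch: instead of buffering a large extracted file in memory,
// pipe it straight into a temp file and hand back the path.
function streamToTempFile(stream: Readable, name: string): Promise<string> {
  const target = join(tmpdir(), name); // hypothetical naming scheme
  return new Promise((resolve, reject) => {
    pipeline(stream, createWriteStream(target), (err) =>
      err ? reject(err) : resolve(target),
    );
  });
}

// Usage: write a stream out, then read the file back.
streamToTempFile(Readable.from(["big file contents"]), "demo-archive.bin")
  .then((path) => console.log(readFileSync(path, "utf8"))); // prints "big file contents"
```

stream.pipeline handles error propagation and cleanup across the pipe, which a bare .pipe() call does not.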

@Shesekino
Contributor

Shesekino commented Sep 25, 2019

@mladkau these are great concerns, thanks for raising them!
Just out of curiosity, today (with the "dynamic" scanning), aren't we already running into the same memory consumption scenario?

The package manager manifest files would still be read (`cat`) in their entirety into a variable, and the binaries would still be executed (which requires loading them into memory, I guess?).

Basically my question is: with the current implementation, are we simply not improving some aspects of the scan, or are we actually making some aspects worse, to your knowledge?

@ivanstanev ivanstanev requested a review from a team as a code owner September 25, 2019 15:29
@ghost ghost removed their request for review September 25, 2019 15:29
@ivanstanev
Contributor Author

@mladkau I have updated the PR; we can now process the stream directly in the callback! This should help us do things like key binary hashing efficiently.

@mladkau
Contributor

mladkau commented Sep 26, 2019

@Shesekino You are correct, we do run into the same memory consumption issues with the current approach; that is one reason why we don't do a proper key binary lookup at the moment. Getting static scanning to work properly, in terms of both architecture and performance, would be a HUGE step forward!

@ivanstanev Giving the stream directly to the callback avoids loading everything into memory. But now we need to be careful, as the stream can only be read once, right?

@ivanstanev
Contributor Author

@mladkau If we can trust the first two answers here https://stackoverflow.com/questions/51076356/multiple-listeners-reading-from-the-same-stream-in-nodejs, then the approach I went with (running the tar file stream through a PassThrough stream for every callback) should allow us to clone the stream and process it independently in every callback! If that really doesn't work, then we'll have to make some code changes, but I think it's a problem only for static analysis so far, and it'll be on me to make it work 😄
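The PassThrough cloning approach can be sketched like this (a minimal stdlib-only illustration, not the PR's actual code). One caveat: with .pipe(), backpressure means the source advances at the pace of the slowest consumer, so every branch must actually be consumed.

```typescript
import { PassThrough, Readable } from "stream";

// Sketch: pipe one source stream into a separate PassThrough per
// callback, so each consumer reads an independent copy of the data.
function fanOut(source: Readable, consumerCount: number): PassThrough[] {
  const copies: PassThrough[] = [];
  for (let i = 0; i < consumerCount; i++) {
    const copy = new PassThrough();
    source.pipe(copy);
    copies.push(copy);
  }
  return copies;
}

// Two independent consumers of the same layer stream, e.g. one hashing
// a key binary while another parses a manifest.
const [first, second] = fanOut(Readable.from(["layer contents"]), 2);
```

Each PassThrough buffers the bytes it has received until its consumer reads them, so the branches can proceed at slightly different speeds without losing data.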

@ivanstanev ivanstanev merged commit 00043ca into master Sep 27, 2019
@ivanstanev ivanstanev deleted the feat/image-unpacking branch September 27, 2019 09:44
@snyksec

snyksec commented Sep 27, 2019

🎉 This PR is included in version 1.31.0 🎉


Your semantic-release bot 📦🚀

@team-lumos
Collaborator

🎉 This issue has been resolved in version 6.7.0 🎉


Your semantic-release bot 📦🚀
