Implement zip extract #158

tempusfrangit · 2024-02-15T06:37:25Z

Implements a zip extraction consumer. This is required to handle dreambooth updates.

In Go, extraction of tar and zip takes within ~300ms of each other for equivalent files (test case: dreambooth processing), so let's lighten the load and handle zip files directly in pget.

* Include FileSize in the call to consume() to support consumers like ZipExtractor that require a size
* Implement a ReaderAt interface that can convert the multiChanReader; ZipExtractor requires an io.ReaderAt implementation
* Implement the ZipExtractor consumer
* Add better logging when -x is supplied
* Handle overwrite support in the Consumer
* Address an import loop
* Add contentType as input to the consumer (and output from Fetch)
* Support tar and zip extraction in multifile mode
* Correct an issue with MultiReader: it was not blocking on the bufferedReader ready signal
* Implement the --unzip (-u) option for setting the unzip consumer
* Multifile uses the --unzip and --extract options now
* README update
* Debug logging update for unzip and tar extract

Some future consumers will need to know the expected fileSize depending on implementation (e.g. unzip). This wires up basic support for adding the fileSize as an argument to Consume; the value is already available at the time Consume is called.

tempusfrangit · 2024-02-15T06:38:00Z

~~Still Missing: multifile support for .tar and .zip~~

multiReader is a reader that implements the ReadAt functionality needed by some future consumers (e.g. unzip). At a basic level, multiReader consumes a multiChanReader via the NewmultiReader() function and returns an io.ReaderAt implementation.

bufferedReader now has a .len() calculation that reports the content length once that header is received. Since we do not know the actual content length until the download starts, a new signal channel indicates that the download has started and that the size of the bufferedReader can be read. As a result, there is a real likelihood that reading from multiReader will block more often than reading from chanmultiReader.

MultiReader may be able to implement Seek() and other related functions for reading the data out of strict order.

Implement ZipExtractor consumer

If the consumer is not File or tar-extractor when -x is used, log a warning that the tar-extractor supersedes the specified consumer.

Make the consumer handle overwriting explicitly. This addresses edge cases with tar and zip consumer when extracting files.

Move the ConsistentHashingStrategyKey to client not config.

'unzip' is the binary used on Linux to extract from a zip file; let's stick with names that are more aligned with the CLI tools we otherwise use.

Fetch now returns contentType and consumers take ContentType as an argument. This is in preparation for multifile being able to direct different contentTypes to different consumers in the case of tar/zip extraction.

Multifile can now extract tar and zip files based upon the content-type. The -u and -t flags for multifile command control unzip and untar capabilities respectively.
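
As a rough illustration of content-type based dispatch, the sketch below picks a consumer name from a Content-Type header. `consumerFor`, the returned names, and the specific media types matched (`application/zip`, `application/x-tar`) are assumptions for the example, not confirmed pget behaviour.

```go
package main

import (
	"fmt"
	"mime"
)

// consumerFor is a hypothetical helper that selects a consumer name based
// on the response content type and which extraction modes are enabled.
// The media types matched here are assumptions.
func consumerFor(contentType string, untar, unzip bool) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Unparseable or missing content type: fall back to a plain file write.
		return "file"
	}
	switch {
	case unzip && mediaType == "application/zip":
		return "unzip"
	case untar && mediaType == "application/x-tar":
		return "tar-extractor"
	default:
		return "file"
	}
}

func main() {
	fmt.Println(consumerFor("application/zip", true, true))
	fmt.Println(consumerFor("application/x-tar", true, false))
	fmt.Println(consumerFor("text/plain", true, true))
}
```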

* MultiReader was not blocking on the buffered reader ready signal
* Unzip now joins the path name to the target instead of using '+' incorrectly

* Implement `-u` shorthand for `--unzip`
* `--unzip` option for invoking the unzip consumer added
* multifile mode utilizes `--unzip/-u` and `--extract/-x` for tar and unzip modes
* Improved debug logs for tar and unzip
* Update README

PreRun and PreRunE are mutually exclusive. This moves the extraction and unzip consumer handling via shorthand options to PreRunE, where we validate that -x and -u are not used concurrently.
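
The validation itself can be sketched independently of the CLI framework. In the example below, `chooseConsumer` is a hypothetical function of the kind that could be called from a PreRunE hook; the consumer names it returns are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// chooseConsumer is a hypothetical helper: it rejects the combination of
// -x and -u and otherwise returns the consumer name implied by the flags.
func chooseConsumer(extract, unzip bool) (string, error) {
	if extract && unzip {
		return "", errors.New("-x and -u cannot be used together")
	}
	switch {
	case extract:
		return "tar-extractor", nil
	case unzip:
		return "unzip", nil
	default:
		return "file", nil
	}
}

func main() {
	cases := [][2]bool{{false, false}, {true, false}, {false, true}, {true, true}}
	for _, c := range cases {
		name, err := chooseConsumer(c[0], c[1])
		fmt.Printf("extract=%v unzip=%v -> %q, err=%v\n", c[0], c[1], name, err)
	}
}
```

Returning an error rather than logging lets the caller surface the failure through the hook's error return.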

dkhokhlov · 2024-02-21T06:06:56Z

pkg/extract/zip.go

+
+	for _, file := range zipReader.File {
+		err := handleFileFromZip(file, destPath, overwrite)
+		if err != nil {


Is it ensured that extracted files do not end up outside the intended destination directory?
I assume the zip code checks the archive's size and structure for signs of a corrupted/junk archive...
Same for other archive types.

Malicious tar and zip checking should be added. I do not want to support non-standard zip (read: extensions) unless there is a real need.

Include fileSize to the call to Consume()

e192bd2

Some future consumers will need to know the expected fileSize depending on implementation (e.g. unzip). This wires up basic support for adding the fileSize as an argument to Consume; the value is already available at the time Consume is called.

tempusfrangit requested review from philandstuff and anotherjesse February 15, 2024 06:37

tempusfrangit added 5 commits February 14, 2024 22:40

Implement ZipExtractor consumer

8ebeb05

Implement ZipExtractor consumer

Log warning on -x if consumer doesn't match

4cd9172

If the consumer is not File or tar-extractor when -x is used, log a warning that the tar-extractor supersedes the specified consumer.

Handle Overwriting in the Consumer

46ed2db

Make the consumer handle overwriting explicitly. This addresses edge cases with tar and zip consumer when extracting files.

Address import loop

9ca65bd

Move the ConsistentHashingStrategyKey to client not config.

tempusfrangit force-pushed the implement-zip-extract branch from e7d58cf to 9ca65bd Compare February 15, 2024 06:42

tempusfrangit mentioned this pull request Feb 15, 2024

Make O_TRUNC use better #152

Closed

tempusfrangit added 3 commits February 14, 2024 22:58

Rename zip-extractor to unzip

e0b5c39

'unzip' is the binary used on Linux to extract from a zip file; let's stick with names that are more aligned with the CLI tools we otherwise use.

Add ContentType support for Consumer use

e822d33

Fetch now returns contentType and consumers take ContentType as an argument. This is in preparation for multifile being able to direct different contentTypes to different consumers in the case of tar/zip extraction.

Enable Multifile to extract tar and zip

fbab194

Multifile can now extract tar and zip files based upon the content-type. The -u and -t flags for multifile command control unzip and untar capabilities respectively.

tempusfrangit requested a review from Pwntus February 15, 2024 07:52

tempusfrangit mentioned this pull request Feb 15, 2024

Enhancement Request: GZIP support (tar mode) #1

Closed

tempusfrangit added 3 commits February 19, 2024 12:10

Fix unzip

27a5054

* MultiReader was not blocking on the buffered reader ready signal
* Unzip now joins the path name to the target instead of using '+' incorrectly

Update: Tar/Unzip

c21731c

* Implement `-u` shorthand for `--unzip`
* `--unzip` option for invoking the unzip consumer added
* multifile mode utilizes `--unzip/-u` and `--extract/-x` for tar and unzip modes
* Improved debug logs for tar and unzip
* Update README

PreRun and PreRunE are not both run.

3e02620

PreRun and PreRunE are mutually exclusive. This moves the extraction and unzip consumer handling via shorthand options to PreRunE, where we validate that -x and -u are not used concurrently.

dkhokhlov reviewed Feb 21, 2024

tempusfrangit marked this pull request as draft February 26, 2024 20:10

tempusfrangit closed this Feb 27, 2024

tempusfrangit deleted the implement-zip-extract branch March 1, 2024 16:50
