Read Archive from string? #42

Open
Fossil01 opened this issue Mar 6, 2024 · 11 comments

@Fossil01

Fossil01 commented Mar 6, 2024

What happened?

Would it be possible to pass data to this package as a string? I am reading the first X amount of bytes from a partially downloaded file which is a string and not saved to disk. Currently I only see the read function but that only takes a disk path.

I am trying to list the files in the archive and read their CRC32 value.

Another thing that would be nice is the ability to set the path for 7zip. Since the p7zip is unmaintained there's the normal 7zip package now which comes with the 7zz binary.

Also, is there password support on the horizon some time? Like $archive->setPassword('password'); before trying to list/extract.

How to reproduce the bug

Package Version

Latest

PHP Version

8.3.2

Which operating systems does this happen with?

Linux

Notes

No response

@Fossil01 Fossil01 added the bug Something isn't working label Mar 6, 2024
@ewilan-riviere ewilan-riviere self-assigned this Mar 7, 2024
@ewilan-riviere
Contributor

Thanks for these ideas, I will work on them!

@Fossil01
Author

Fossil01 commented Mar 7, 2024

Awesome!

I am trying to migrate from this fork https://github.com/DariusIII/rarinfo, which does not support passwords either. So I am calling unrar/unzip/7zz manually now which is kind of a pain in the ass.

The one thing that package does properly (most of the time) is being able to read archive contents without extracting (if there is no password) with the getArchiveFileList() function.

@ewilan-riviere
Contributor

To summarize, I will try to add these features:

  • password option with setPassword()
  • read an archive from file contents as string
  • manually set path for 7zip binary
  • read the contents of an archive without extracting it

Is that OK?
I can't guarantee all of these features, but I will try to implement them.

@Fossil01
Author

Fossil01 commented Mar 7, 2024

Sounds great!

@ewilan-riviere
Contributor

ewilan-riviere commented Mar 7, 2024

About reading an archive from a string: one solution could be to write the contents to a temporary file and read that file.

<?php

$contents = file_get_contents($path); // simulate the zip file content
$path = tempnam(sys_get_temp_dir(), 'zip'); // create a temporary file
file_put_contents($path, $contents); // write the content to the temporary file

$archive = Archive::read($path); // now we can read the zip file

We can imagine another method, like fromString().

<?php

$contents = file_get_contents($path);
$archive = Archive::fromString($contents);

Inside this method, $contents will be written to a temporary file and then read.
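The temporary-file approach described above could be sketched as follows. This is only an illustration: withTemporaryArchive() is a hypothetical helper, not part of kiwilan/php-archive, and the callback stands in for whatever would call Archive::read($path).

```php
<?php

/**
 * Hypothetical helper (not part of kiwilan/php-archive): writes $contents to a
 * temporary file, passes the path to $reader, and always removes the file.
 */
function withTemporaryArchive(string $contents, callable $reader): mixed
{
    $path = tempnam(sys_get_temp_dir(), 'arc');
    if ($path === false) {
        throw new RuntimeException('Unable to create a temporary file.');
    }

    try {
        if (file_put_contents($path, $contents) === false) {
            throw new RuntimeException('Unable to write the temporary file.');
        }

        return $reader($path);
    } finally {
        // Runs on success and on exception, so no stale files pile up.
        if (is_file($path)) {
            unlink($path);
        }
    }
}

// Usage: the callback would typically call Archive::read($path).
$size = withTemporaryArchive('dummy bytes', fn (string $path): int => filesize($path));
```

Because the file is deleted as soon as the callback returns, everything that needs the archive on disk (listing, extracting) has to happen inside the callback.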

@Fossil01
Author

Fossil01 commented Mar 7, 2024

I was trying to keep it off disk and in memory. This will save my SSDs in the long run :-)

fromString() would be perfect.

@ewilan-riviere
Contributor

ewilan-riviere commented Mar 7, 2024

I published a beta version; you can test it and report any issues.

{
  "require": {
    "kiwilan/php-archive": "dev-main"
  }
}

Now you can use readFromString() to read an archive from a string.

$archive = Archive::readFromString($contents);

This method tries to detect the archive type from the string and may throw an exception if detection fails. You can set the archive type manually with the third parameter.

$archive = Archive::readFromString($contents, extension: 'zip');

You can set a password for the archive using the second parameter.

This does not work yet on Windows for RAR and 7z (WIP).

$archive = Archive::read($path, 'password');
$archive = Archive::readFromString($contents, 'password');

You can also set the 7z binary path manually.

This does not work yet on Windows (WIP).

$archive = Archive::read($path)->overrideBinaryPath($binary_path);

@ewilan-riviere
Contributor

The password option and the binary path override now work on Windows.

@Fossil01
Author

Fossil01 commented Mar 8, 2024

Looks great. Will test this weekend or Monday.

@Fossil01
Author

Fossil01 commented Mar 10, 2024

Okay so I tried it on a partial RAR file (Size: 3,5 MB)

$test = Archive::readFromString($this->_tmpExtractPath, $this->_release->password, 'rar')->overrideBinaryPath($this->_7zipPath);

Archive: Error detecting extension from mime type, please add manually archive extension as third parameter of readFromString().

Running unrar -l on this file shows the contents of the RAR, although it (obviously) throws an Unexpected end of archive error since I am only reading part of the file. All good.

7Zip however can't read the file: Cannot open the file as archive. I was running 7zip 21.07, so I upgraded to the latest beta 24.00 (https://www.7-zip.org/download.html) and that could read this file. But readFromString() keeps throwing that same error.

It looks like it will always set $extension to null if the match() fails, therefore ignoring the $extension passed into the function as a parameter.
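One way to avoid that behaviour is to let an explicit extension take precedence and run detection only when none was passed. The sketch below is an assumption about how this could be structured, not the package's actual code; resolveExtension() and the injected $detect callable are hypothetical names.

```php
<?php

/**
 * Hypothetical resolver, not the package's code: an explicit extension always
 * wins; detection only runs when the caller passed nothing.
 *
 * @param callable(string): ?string $detect returns an extension or null
 */
function resolveExtension(string $contents, ?string $extension, callable $detect): string
{
    if ($extension !== null && $extension !== '') {
        return strtolower(ltrim($extension, '.'));
    }

    $detected = $detect($contents);
    if ($detected === null) {
        throw new InvalidArgumentException(
            'Could not detect the archive type; pass the extension explicitly.'
        );
    }

    return $detected;
}

// Detection fails here, but the explicit 'rar' is still honoured.
$ext = resolveExtension('partial bytes', 'rar', fn (string $c): ?string => null);
```

With this ordering, a truncated file whose MIME type cannot be determined still opens as long as the caller states the type.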

Also relying on Mime Type might give some issues. I have been relying on my own function in another bit of code for a while now to detect the archive type by the first few bytes of the file which seems to work pretty well. This could be used instead of mime-type or as a fallback perhaps:

public function detectArchiveType($filePath): string|bool
{
    $handle = fopen($filePath, 'rb');
    if (! $handle) {
        return false; // Could not open the file; returning a message string here would look like a detected type
    }

    $bytes = fread($handle, 16); // Read the first 16 bytes
    fclose($handle);

    $hexBytes = bin2hex($bytes);

    // Check for PAR2
    if (strpos($bytes, "PAR2\0PKT") === 0) {
        return 'PAR2';
    }

    // Combined regex for ZIP formats
    if (preg_match('/504b0304|504b0708/', $hexBytes)) {
        return 'ZIP';
    }

    // Check for RAR, including version 5
    if (preg_match('/526172211a07(0100)?/', $hexBytes, $matches)) {
        return isset($matches[1]) ? 'RAR5' : 'RAR';
    }

    // Check for TAR
    if (strpos($hexBytes, '7573746172') !== false) {
        return 'TAR';
    }

    // Check for 7z
    if (strpos($hexBytes, '377abcaf271c') !== false) {
        return '7z';
    }

    // Check for gzip
    if (substr($hexBytes, 0, 4) == '1f8b') {
        return 'GZIP';
    }

    // Check for bzip2
    if (substr($hexBytes, 0, 6) == '425a68') {
        return 'BZIP2';
    }

    // Check for SFV (simple heuristic). This is not a reliable method to detect SFV files.
    if (preg_match('/^;.*\r?\n;.*\r?\n[\w.-]+\s+[A-Fa-f0-9]{8}\r?\n/', $bytes)) {
        return 'SFV';
    }

    return false;
}

This might work more reliably. However there are more MIME Types for RAR:

  • application/vnd.rar
  • application/x-rar-compressed
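Since the data in this issue is already held in a string, the same byte-signature idea can be applied without opening a file. The following is a minimal sketch under that assumption; detectArchiveTypeFromString() is a hypothetical name, it covers only a subset of the formats checked above (no PAR2, SFV, or spanned ZIP), and it anchors every signature at its expected offset. The tar magic "ustar" lives at offset 257, so it can only be found if at least 262 bytes are available.

```php
<?php

/**
 * Sketch: detect an archive type from bytes already in memory.
 * Hypothetical function; returns a lowercase type name or null.
 */
function detectArchiveTypeFromString(string $bytes): ?string
{
    // Signatures that must appear at offset 0.
    $signatures = [
        'rar5'  => "Rar!\x1A\x07\x01\x00",
        'rar'   => "Rar!\x1A\x07\x00",
        'zip'   => "PK\x03\x04",
        '7z'    => "7z\xBC\xAF\x27\x1C",
        'gzip'  => "\x1F\x8B",
        'bzip2' => 'BZh',
    ];

    foreach ($signatures as $type => $magic) {
        if (str_starts_with($bytes, $magic)) {
            return $type;
        }
    }

    // The POSIX tar magic "ustar" sits at offset 257 of the first header block.
    if (substr($bytes, 257, 5) === 'ustar') {
        return 'tar';
    }

    return null;
}

$type = detectArchiveTypeFromString("Rar!\x1A\x07\x01\x00" . 'partial data');
```

A check like this could serve as a fallback when MIME detection returns nothing, which is the case most likely to occur with partial downloads.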

@Fossil01
Author

Fossil01 commented Mar 10, 2024

I edited my comment above to explain why the MIME type check fails.

By the way, p7zip is old and no longer updated; it would be best to use the official 7zz binary from 7-zip.org.

I removed the mime-type check to see if it worked then but looks like binary override also does not work:

sh: 1: 7z: not found
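The error suggests the command is still invoked as 7z regardless of the override. As an illustration of the expected behaviour only, a lookup could honour the explicit path first and otherwise search PATH, preferring 7zz. resolveSevenZipBinary() is a hypothetical function, not the package's implementation, and the bare binary names assume a Unix-like system.

```php
<?php

/**
 * Hypothetical lookup, not the package's implementation: returns the explicit
 * override when it is executable, otherwise the first candidate found on PATH.
 */
function resolveSevenZipBinary(?string $override = null, ?string $pathEnv = null): ?string
{
    if ($override !== null) {
        return is_executable($override) ? $override : null;
    }

    $pathEnv ??= (string) getenv('PATH');
    $dirs = array_filter(explode(PATH_SEPARATOR, $pathEnv));

    // Prefer the official 7-Zip binary (7zz), then fall back to 7z and 7za.
    foreach (['7zz', '7z', '7za'] as $name) {
        foreach ($dirs as $dir) {
            $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }

    return null;
}

$binary = resolveSevenZipBinary();
```

Returning null instead of silently falling back to 7z makes a bad override visible before any shell command is run.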

@ewilan-riviere ewilan-riviere mentioned this issue Mar 20, 2024