WebReflection
Contributor

@WebReflection WebReflection commented Feb 23, 2024

In this Discord conversation a user asked to be able to provide a single .zip file (or .tar.gz or other formats) to be easily extracted into a generic destination folder.

Since Pyodide already provides a way to do this, we exchanged a few ideas about explicitly targeting that .zip file as ./destination/*, where the /* ending of the target is the trigger and the recognized archive extension completes it. I've explored that possibility and to me this looks like a no-brainer.

About MicroPython (or others)

This feature requires an interpreter that can handle these archive formats so that complex structures with both folders and files, not just a single compressed entry, can be represented in the file system. Right now (in this MR) this is available on Pyodide only.

@ntoll @fpliger @JeffersGlass thoughts? I'd be happy to move this first bit forward, if the contract is clear/approved:

[files]
"./stuff.zip" = "./stuff/*"

That's it, that's the contract: it's explicit in the config, and both the source extension and the /* ending of the target signal that the file should be extracted.
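As a rough illustration of that contract, here is a minimal Python sketch of the check. The helper names and the exact list of extensions are assumptions made for this example; they are not taken from the PR's code.

```python
# Hypothetical helpers: names and the extension list are illustrative only.
ARCHIVE_EXTENSIONS = (".zip", ".tar.gz")


def should_extract(source: str, target: str) -> bool:
    """Return True when a [files] entry asks for archive extraction.

    Both conditions must hold: the target ends with "/*" and the
    source has a recognized archive extension.
    """
    return target.endswith("/*") and source.lower().endswith(ARCHIVE_EXTENSIONS)


def extraction_dir(target: str) -> str:
    """Turn a target such as "./stuff/*" into the folder "./stuff/"."""
    return target[:-1]
```

With this sketch, `"./stuff.zip" = "./stuff/*"` would be extracted into `./stuff/`, while `"./stuff.zip" = "./stuff/stuff.zip"` would still be copied as a plain file.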

Waiting for comments or thumbs up before moving forward.

@WebReflection WebReflection force-pushed the unzip branch 4 times, most recently from 1a99647 to 6f283ec Compare February 23, 2024 16:20
@ntoll
Member

ntoll commented Feb 23, 2024

This would be really helpful. Would be awesome if it worked on MicroPython too (asking for a friend). 😉

@WebReflection
Contributor Author

WebReflection commented Feb 26, 2024

@ntoll based on the lazily loaded zip.js library, it is now possible to un-archive in MicroPython too.

My only thoughts around this are:

  1. I should change this issue topic - done
  2. I don't know if it's worth disabling other formats in Pyodide and telling everyone we only support zip, as in Pyodide everything is pretty straightforward
  3. I don't know if it makes sense for me to find a better way to lazy load the module: in PyScript we also provide 3rd party packages as part of the dist so everything can work offline, but here both TOML and zip.js are foreign dependencies and that might be an issue offline. Then again, in PyScript it would also be an issue if the code is MicroPython and a zip file is required for the offline test ...

My quick answers to myself:

  1. definitely - done
  2. I don't think so, it's rather a matter of documentation
  3. I think we can improve later on and leave it lazy loaded (or simple) for the time being ... a quick workaround via import maps would also save the day, only when/if needed ... thoughts?

@WebReflection WebReflection changed the title Pyodide unzip/untar abilities Pyodide unzip/untar abilities + MicroPython unzip Feb 26, 2024
@WebReflection WebReflection force-pushed the unzip branch 3 times, most recently from 2c8472c to 938e378 Compare February 26, 2024 13:38
@ntoll
Member

ntoll commented Feb 27, 2024

@WebReflection this is great. When I mentioned this feature to folks working on Invent they simply said, "oh yes please".

Given MicroPython has mip (IIRC) - sort of micropython "pip" module - and I believe it understands wheels (which are basically zip files), I wonder if there are already compression capabilities built into MicroPython. Best ask @dpgeorge when next we speak. 😉

I agree with your concerns about "offline" or isolated dependencies. If stuff is built into MicroPython I think we may be in a better place WRT this aspect of the work.

@WebReflection
Contributor Author

About offline: we still have the TOML parser, which is lazy and requires a valid URL to point at ... I am thinking more and more about improving the dist packaging of this module exactly the same way we did with PyScript, as that would work out of the box for anything 3rd party we'd like to lazy load. This shouldn't take a lot of effort, and things should be easier to implement in the near future.

@WebReflection
Contributor Author

Apparently we already have tar.gz ability in MicroPython out of the box, so what's missing here is a way to natively support tar.gz files too.

/cc @ntoll @dpgeorge

@WebReflection
Contributor Author

After a discussion we agreed that .zip and .tar.gz are good if normalized across interpreters.

@dpgeorge

We do actually have support for reading uncompressed .zip files in MicroPython (although not currently exposed in any user Python module). But I guess the reason for using zip is to get compression and save bandwidth downloading the file?

@WebReflection
Contributor Author

@dpgeorge

I guess the reason for using zip is to get compression and save bandwidth downloading the file?

it's rather a way to simplify both hosting and config instructions:

<!-- before -->
<py-config>
[files]
"./a/b.py" = "./dest/b.py"
"./a/b/c.py" = "./dest/b/c.py"
"./a/d.json" = "./dest/d.json"
"./a/b/x/y/z.py" = "./dest/b/x/y/z.py"
</py-config>

<!-- after -->
<py-config>
[files]
"./thing.zip" = "./dest/*"
</py-config>
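For the "after" config to work, the folder has to be bundled into a single archive first. A minimal sketch using CPython's standard `zipfile` module follows; the function name `zip_folder` is hypothetical and the thread does not prescribe any particular tool for creating the archive.

```python
import os
import zipfile


def zip_folder(folder: str, archive: str) -> list:
    """Write every file under `folder` into `archive`; return the stored names.

    Keep `archive` outside of `folder`, otherwise the archive would
    try to include itself.
    """
    names = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for filename in sorted(files):
                path = os.path.join(root, filename)
                # store paths relative to `folder`, always with forward slashes
                name = os.path.relpath(path, folder).replace(os.sep, "/")
                zf.write(path, name)
                names.append(name)
    return names
```

Calling `zip_folder("./a", "./thing.zip")` would produce an archive whose entries mirror the layout under `./a`, which is what the `"./thing.zip" = "./dest/*"` entry expects to unpack.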

We do actually have support for reading uncompressed .zip files

I believe we don't want to confuse users about what is a valid zip and what isn't for MicroPython so unless the .zip implementation is capable of understanding default zip functionality we're likely better off using the 3rd party library.

@WebReflection WebReflection force-pushed the unzip branch 4 times, most recently from 6c9e40b to a56d6e1 Compare February 28, 2024 15:57
@WebReflection
Contributor Author

WebReflection commented Feb 28, 2024

@dpgeorge I am really having a hard time making utarfile work ... I import it correctly and everything is fine, but I always get errors ... I've tried to create the file with default tar.gz and crickets, it breaks on:

micropython.mjs:5532 MicroPython exception: Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/lib/utarfile/__init__.py", line 128, in __next__
  File "/lib/utarfile/__init__.py", line 118, in next
UnicodeError: 

then I've used just tar -cf and also crickets ... is there any way to untar files with gz?

# the "./_.tar.gz" file is a valid tar file Pyodide can handle with ease
# the extractDir is `./packages/` in my smoke test ... nothing works
import os
import utarfile
tar = utarfile.TarFile("./_.tar.gz")

# this loop breaks right away no matter what's in it
for f in tar:
    name = f'${extractDir}{f.name[2:]}'
    if f.type == utarfile.DIRTYPE:
        if f.name != './':
            os.mkdir(name.strip('/'))
    else:
        sub_file = tar.extractfile(f)
        with open(name, "wb") as dest:
            copyfileobj(sub_file, dest)
            dest.close()
tar.close()
os.remove('./_.tar.gz')

@JeffersGlass

_.tar.gz is a tar file that’s been compressed, but I think utarfile only deals with the tar part. uzip seems to handle the zip compression part, but maybe not the file headers? Either way, probably needs another library to handle the decompression.

@dpgeorge

Indeed you'll need to do the decompression explicitly. Eg:

import sys
import gzip
import tarfile

file = tarfile.TarFile(fileobj=gzip.GzipFile(fileobj=open(sys.argv[1], "rb")))
for elem in file:
    print(elem.name)

That works in both CPython and MicroPython (although in older versions of MicroPython you need to use utarfile instead of tarfile).
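Building on that snippet, here is a hedged sketch of a full extraction to a destination folder, written for CPython only. The function name `untar_gz` and the path-safety check are additions made for this illustration and are not part of the PR.

```python
import gzip
import io
import os
import tarfile


def untar_gz(data: bytes, dest: str) -> list:
    """Extract a .tar.gz held in memory into `dest`; return written file paths."""
    written = []
    # Decompress explicitly first, then hand the plain tar stream to tarfile.
    plain = io.BytesIO(gzip.decompress(data))
    with tarfile.TarFile(fileobj=plain) as tar:
        for member in tar:
            # "./pkg/a.txt" becomes "pkg/a.txt"
            name = os.path.normpath(member.name)
            if name == ".." or name.startswith(".." + os.sep) or os.path.isabs(name):
                continue  # skip entries that would land outside of dest
            target = os.path.join(dest, name)
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with open(target, "wb") as out:
                    out.write(tar.extractfile(member).read())
                written.append(target)
    return written
```

Reading the whole archive into memory keeps the sketch short; for large archives a streaming approach, as in the snippet above, would be the better choice.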

@ntoll
Member

ntoll commented Feb 29, 2024

Yeah, tar = "tape archive" format. No compression involved in tar (that's why we need the gzip bit).
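Because both plain .tar and .tar.gz files came up in this thread, one way to treat them uniformly is to look at the first two bytes: gzip streams start with the magic bytes `1f 8b`. A minimal sketch, with hypothetical helper names:

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(data: bytes) -> bool:
    """Return True when `data` looks like a gzip stream."""
    return data[:2] == GZIP_MAGIC


def plain_tar_bytes(data: bytes) -> bytes:
    """Return uncompressed bytes, gunzipping only when needed."""
    return gzip.decompress(data) if is_gzipped(data) else data
```

The result of `plain_tar_bytes` can then be wrapped in `io.BytesIO` and passed to `tarfile.TarFile(fileobj=...)`.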

@WebReflection
Contributor Author

awesome folks, I'll give it a try in some boring free time while on vacation 🙏

@WebReflection
Contributor Author

@dpgeorge tarfile is not available out of the box, but both tarfile and utarfile still give me errors:

tar = utarfile.TarFile(fileobj=gzip.GzipFile(fileobj=open("./_.tar.gz","rb")))
for f in tar:
    print(f.name)

I feel like there's something obvious I am missing ...

error ValueError: invalid syntax for integer with base 8: '00000000437\x00'

@dpgeorge

dpgeorge commented Mar 1, 2024

ValueError: invalid syntax for integer with base 8: '00000000437\x00'

You are using an old version of utarfile that has a bug!

Please try with the latest MicroPython wasm from here: micropython/micropython#13583
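For context on that error message: numeric fields in a tar header are stored as ASCII octal digits padded with NUL bytes or spaces, so passing the raw field straight to an integer parser fails on the trailing `\x00`. The thread does not show the actual utarfile fix; the sketch below only illustrates the general idea, and `parse_octal_field` is a hypothetical name.

```python
def parse_octal_field(field: bytes) -> int:
    """Parse a tar header numeric field (octal digits plus NUL/space padding)."""
    # Strip the trailing NUL bytes and spaces before parsing the digits.
    digits = field.rstrip(b"\x00 ")
    return int(digits.decode("ascii"), 8) if digits else 0
```

Applied to the field from the traceback, `parse_octal_field(b"00000000437\x00")` yields 287, which is the entry size in bytes.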

@WebReflection
Contributor Author

@dpgeorge tried the latest (built literally 2 minutes ago) ... the error is now different, but it still errors:

ValueError: invalid syntax for integer with base 8: '00000001060\x00'

I think I might wait for your latest PR to land and then take it from there with this untar-ability ... there is a chance I am doing something wrong but I want to be sure latest is latest.

@dpgeorge

dpgeorge commented Mar 6, 2024

@WebReflection are you using import tarfile (and not import utarfile)?

What version does it give you when you run import sys; sys.version?

@WebReflection
Contributor Author

@dpgeorge

3.4.0; MicroPython v1.20.0-1070.gb0b1e95a6 on 2024-03-05

is that wrong? 🤔 I just rebased and ran make within ports/webassembly as I did before, and most things seem to be aligned with the current code

@dpgeorge

dpgeorge commented Mar 6, 2024

is that wrong?

That's correct. But now that I try it out myself, I realise that I have not frozen tarfile into the build! Sorry, that's my mistake, which is now fixed.

Please try the latest build on that PR. You should get something like this:

MicroPython v1.23.0-preview.214.gf7debc1a0 on 2024-03-07; JS with Emscripten
Type "help()" for more information.
>>> import sys
>>> sys.version
'3.4.0; MicroPython v1.23.0-preview.214.gf7debc1a0 on 2024-03-07'
>>> import tarfile
>>> tarfile.__version__
'0.4.1'

(You may get a different version string, but still the hash of f7debc1a0 should be the same. To get the right version string you'll need to do git pull --tags.)

@ntoll
Member

ntoll commented Mar 20, 2024

Hi folks,

Things went quiet here. Just wondering what the current status of progress is..? (Asking for a friend 😉 )

@WebReflection
Contributor Author

WebReflection commented Mar 20, 2024

@ntoll I am waiting for the latest release of MicroPython on npm to finalize this MR (currently broken in there) and more ... if this happens soon, with the promise fixes in, we can release tomorrow with the latest and greatest including this MR (as long as I manage to make it work on MicroPython) ... otherwise let me know if we should ship this half-baked on the MicroPython side, no strong opinion here.

Edit: the reason it's on hold: #84 (comment)

@dpgeorge

As far as I'm aware, the latest npm release of MicroPython should work for this untar case.

@WebReflection
Contributor Author

WebReflection commented Mar 20, 2024

As far as I'm aware, the latest npm release of MicroPython should work for this untar case.

I think I've fully missed this notification as I was in Boston micropython/micropython#13583 (comment) (actually I was not ... I just missed that!)

I will then try again with latest and see where it goes from there, thank you! (and I'll keep you posted if something ain't right)

@dpgeorge

I have just made another release which should also fix the thenable issue: https://www.npmjs.com/package/@micropython/micropython-webassembly-pyscript/v/1.22.0-269

@WebReflection
Contributor Author

@dpgeorge we're here ... everything broken now pyscript/pyscript#2001 (comment)

@WebReflection
Contributor Author

@dpgeorge never mind, I can work around that: I have a Wasmoon shim that works with just FS, so we're good (but I really would like to have all Emscripten VFS utilities exposed, not behind a _xxx field, if possible)!

@dpgeorge

The PATH/PATH_FS issue is fixed in the latest version: https://www.npmjs.com/package/@micropython/micropython-webassembly-pyscript/v/1.22.0-272

@WebReflection
Contributor Author

@dpgeorge I am almost there but having headaches I don't understand ... so ...

const TMP = './_.tar.gz';
writeFile(fs, TMP, buffer);
interpreter.runPython(`
    import os, gzip, tarfile
    tar = tarfile.TarFile(fileobj=gzip.GzipFile(fileobj=open("${TMP}", "rb")))
    for f in tar:
        name = f"${extractDir}{f.name[2:]}"
        if f.type == tarfile.DIRTYPE:
            if f.name != "./":
                os.mkdir(name.strip("/"))
        else:
            source = tar.extractfile(f)
            with open(name, "wb") as dest:
                dest.write(source.read())
                dest.close()
    tar.close()
    os.remove("${TMP}")
`);

This code works 🥳 ... meaning within that code I can then read every single written file and it's just fine.

On the browser though ...

  <script type="micropython" config="pyodide.toml" worker>
    from polyscript import xworker
    f = open("./package/micropython.html", "r")
    lines = f.readlines()
    xworker.window.document.documentElement.classList.add(
      lines[0] == "<!DOCTYPE html>\n"
    )
  </script>

The open(...) line produces the following error:

[Image: screenshot of the error, 2024-03-21 15-36-50]

Please note everything is fine if I unzip instead, but the zip feature is 100% JS: there's no runPython involved, although it uses the exact same FS reference for both mkdir and writeFile. I can't test with a previous MicroPython because tarfile is not there. I'm starting to wonder whether there is an issue with operations performed via Python instead of via the JS FFI to read or write things ... please keep in mind what happens if I do this in that script instead:

import os
print(os.listdir()) # the package is there
print(os.listdir("package")) # the index.html is there

Do you have any idea what could possibly go wrong? I feel so close yet so far from shipping this tar.gz thing ... I am updating regardless latest effort to move from there, thank you!

@WebReflection
Contributor Author

@dpgeorge OK, this is awkward ... I now have MicroPython working with .zip and Pyodide working with .tar.gz, but not vice-versa ... it looks like the OS Error 44 (also produced by Pyodide) is about the missing file. I don't understand why two archives that work in other envs wouldn't work in the current env (and vice-versa), and I've no idea what's going on ... do you know if there's any sync step Emscripten wants me to do before I bother you further? I've managed to have the easy thing on Pyodide now failing and the harder thing on MicroPython now working (3rd party library), while the easy one that works when I deal with Python code fails when I deal with JS code ... I am going a bit bananas around this VFS thing, because from the os module everything is there but open fails in unexpected ways all over!
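When `os.listdir()` looks right but `open()` still reports a missing file, walking the path one component at a time can show exactly where it stops existing. This is a minimal diagnostic sketch and not something taken from the PR; `first_missing` is a hypothetical helper that assumes forward-slash paths as used in this thread.

```python
import os


def first_missing(path: str):
    """Return the first prefix of `path` that does not exist, or None."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    current = "/" if path.startswith("/") else "."
    for part in parts:
        current = current.rstrip("/") + "/" + part
        try:
            os.stat(current)
        except OSError:
            # this is the first component the file system cannot find
            return current
    return None
```

For example, `first_missing("./package/micropython.html")` would return the full path if the `package` folder exists but the file inside it does not, and `"./package"` if the folder itself is missing.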

@WebReflection
Contributor Author

@dpgeorge I am an idiot ... I am going to fix this ASAP, it was me messing around with the files, apologies for all the pings ... I think I am really close to fixing this!
