Compression Improvements #2

Open
6 of 9 tasks
solonovamax opened this issue May 1, 2024 · 8 comments
Labels
enhancement New feature or request

solonovamax commented May 1, 2024

Hi,
just found your project and this looks rather interesting.

Here's a couple of improvements/changes that would be amazing if they could be added:

  • Explode nested jars and recursively repack them, but only ever STORE files inside the nested jars, then apply DEFLATE compression at the outermost level. Separate issue: Jar-in-jar minifier #3
  • Sort files by their extension. When files of the same type sit next to each other in a nested jar, the jar compresses better. (Based on some rudimentary testing, this combined with the previous item can drastically improve compression at times.) Separate issue: Jar-in-jar minifier #3
  • Ability to specify the oxipng level
  • Recompress nbt files (they are just gzip compressed, so uncompress them, then attempt to recompress at level 9), then STORE them. Separate issue: NBT minifier #4
  • Don't include zip entries for directories. Fun fact: you don't need zip entries for directories!
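
A minimal sketch of the last two structural ideas (grouping entries by extension and leaving out directory entries), assuming Python's standard `zipfile` module rather than mc-repack's own code. The name `rewrite_sorted` is made up for illustration.

```python
import io
import posixpath
import zipfile


def rewrite_sorted(jar_bytes: bytes) -> bytes:
    """Rewrite a jar without directory entries, grouping files by extension."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as dst:
        # Directory entries are optional in a zip, so keep only real files.
        files = [info for info in src.infolist() if not info.is_dir()]
        # Sort by (extension, full name) so files of one type sit side by side.
        files.sort(key=lambda info: (posixpath.splitext(info.filename)[1].lower(),
                                     info.filename))
        for info in files:
            dst.writestr(info.filename, src.read(info))
    return out.getvalue()
```

One caveat worth checking before relying on this ordering: some JVM tooling (for example `java.util.jar.JarInputStream`) only picks up the manifest when it sits at the start of the archive, so a real implementation would probably pin `META-INF/MANIFEST.MF` first and sort the rest.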

Based on a quick little command I ran, I recorded the frequencies of different file types present in jar files.
The command used was

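mkdir ./tmp && cd ./tmp
# copy a bunch of jars into this temporary directory
for i in *.jar; do
    # originally `unzd "$i"`, presumably a personal helper; plain unzip into a per-jar directory is assumed to be equivalent
    unzip -q "$i" -d "${i%.jar}"
done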

find . -type f \( -iname '*.jar' -o -iname '*.zip' \) -print0 \
    | xargs -0 -n1 unzip -qqql \
    | perl -0777 -C -pe 's/.*?\/?(.*\.(.*))?/\2/g' \
    | sed '/^$/d' \
    | sort \
    | uniq -c \
    | sort -n \
    | awk '{s+=$1; print $0} END {print s}'
# note: the last line is the total count of all files. this isn't by any means perfect, but whatever.

Here are the results from that:

| % of files in jar | file extension |
|---|---|
| 51.13% | .class |
| 31.34% | .json |
| 10.01% | .png |
| 2.5% | .nbt |
| 1.69% | .ogg |
| 0.51% | .MF |
| 0.4% | .jar |
| 0.28% | .mcmeta |
| 0.23% | .xdelta (I have no clue what this file format is) |
| 0.23% | .md5 |
| 0.21% | .properties |
| 0.14% | .at |
| 0.1% | .txt |
| 0.1% | .accesswidener |
| 0.09% | .xml |
| 0.07% | .md |
| 0.06% | .js |

Based on this, I think it might be reasonable to consider adding optimization processes for the following files:

  • .ogg files. These can possibly be optimized using ffmpeg. (You could consider decreasing the quality to increase compression; a CLI arg for this would be nice. There seem to be a couple of crates that provide ffmpeg support, so those could be considered.)
  • .nbt files. They are just gzip-compressed binary data, so uncompress them, recompress at level 9, then STORE them.
  • .properties files. Blank lines can be stripped from these. (They don't make up a large share of the files, but it's an easy addition.)
  • .accesswidener files. Blank lines can be stripped from these, as can any lines starting with a hash (#), which are comments. (They don't make up a large share of the files, but it's an easy addition.)
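
The three simpler items in this list could look roughly like the sketch below. It uses Python's standard `gzip` module for illustration, not mc-repack's API, and all function names are hypothetical. The gzip magic-byte check is an added assumption, since an .nbt file is not guaranteed to be gzip compressed.

```python
import gzip


def recompress_nbt(data: bytes) -> bytes:
    """Re-gzip an NBT file at level 9, keeping whichever version is smaller."""
    if data[:2] != b"\x1f\x8b":  # no gzip magic bytes: leave the file alone
        return data
    raw = gzip.decompress(data)
    # mtime=0 keeps the output identical between runs.
    candidate = gzip.compress(raw, compresslevel=9, mtime=0)
    return candidate if len(candidate) < len(data) else data


def strip_blank_lines(text: str) -> str:
    """Drop blank lines, as suggested for .properties files.

    Assumption: no backslash line continuations are followed by a blank line.
    """
    return "".join(line for line in text.splitlines(keepends=True) if line.strip())


def strip_accesswidener(text: str) -> str:
    """Drop blank lines and whole-line '#' comments from an .accesswidener file."""
    kept = [
        line for line in text.splitlines(keepends=True)
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return "".join(kept)
```

The recompressed .nbt bytes would then be written to the jar with STORE, since running DEFLATE over gzip output rarely gains anything.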

It's probably not worth it to consider additional compressors for files not in that table as they appear so infrequently it just won't make much of a difference.

Do note, however, that this is based on the number of files in the jars, not on their size. I may make something basic to calculate this by size later.
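
A size-weighted tally could be sketched as follows, assuming Python's standard `zipfile` module. `tally` and `shares` are hypothetical helper names, and sizes are the uncompressed sizes recorded in each zip entry.

```python
import io
import posixpath
import zipfile
from collections import Counter


def tally(jar_bytes: bytes, counts: Counter, sizes: Counter) -> None:
    """Add per-extension file counts and uncompressed sizes, recursing into nested jars."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue
            ext = posixpath.splitext(info.filename)[1] or "(none)"
            counts[ext] += 1
            sizes[ext] += info.file_size
            if ext.lower() in (".jar", ".zip"):
                # Like the shell pipeline, this counts the nested jar itself
                # and then its contents as well.
                tally(jar.read(info), counts, sizes)


def shares(counter: Counter) -> dict:
    """Convert raw totals into percentages, largest first."""
    total = sum(counter.values()) or 1
    return {ext: 100.0 * n / total for ext, n in counter.most_common()}
```

Calling `tally` once per jar with two shared `Counter` objects and then comparing `shares(counts)` with `shares(sizes)` would show whether a rare but large file type (for example .ogg) deserves more attention than its file count suggests.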

@solonovamax
Author

To elaborate on

> • Explode nested jars and recursively repack them. however, in those nested jars only ever STORE files. then add DEFLATE compression at the final level.
> • Sort files by their extension. When files of the same type are beside each other in a nested jar file, it will be able to be better compressed. (based on some rudimentary testing, this combined with the previous item, can drastically improve the compression at times)

here are a few jars which had significant gains from this (using Detonater) (this was after having been processed with mc-repack):

| Mod | Version | % saved | Size after mc-repack | Size after detonater |
|---|---|---|---|---|
| Fabric Language Kotlin | 1.10.10+kotlin.1.9.10 | 34.83% | 6.40M | 4.17M |
| Farsight | 1.20.1-4.1 | 33.84% | 361.15K | 238.92K |
| Boat Item View | 1.20.1-0.0.5 | 33.57% | 997.18K | 662.36K |
| Quartz Elevator | 2.2.5+1.20 | 32.78% | 1019.87K | 685.46K |
| Appleskin | mc1.20.1-2.5.1 | 32.35% | 1.02M | 699.90K |
| Cardinal Components API | 5.2.2 | 31.71% | 215.76K | 147.34K |
| Dawn | 5.0.0 | 29.11% | 1.34M | 966.34K |
| Fabric API | 0.90.7+1.20.1 | 28.13% | 1.96M | 1.41M |
| BeaconOverhaul | 1.8.4+1.20 | 27.11% | 355.19K | 258.89K |
| Blur | 3.1.0 | 26.30% | 155.12K | 114.32K |
| Highlight | 1.20-2.0.1 | 25.92% | 274.70K | 203.48K |
| Graves | 3.0.0+1.20.1 | 25.80% | 1.76M | 1.31M |
| Create | 0.5.1-d+mc1.20.1 (Prominent fork) | 9.13% | 22.01M | 20.00M |
| LibZ | 1.0.2 | 21.56% | 2.00M | 1.57M |
| Industrial Revolution | 1.16.5-BETA | 7.73% | 4.65M | 4.29M |
| CC Tweaked | 1.20.1-fabric-1.108.3 | 10.85% | 3.14M | 2.80M |
| Tom's Simple Storage | 1.20-1.6.5 | 23.58% | 1.38M | 1.06M |
| Zenith Attributes | 0.0.6 | 24.44% | 6.61M | 4.99M |

so, there are definitely significant savings to be had here by doing this.
and, I didn't even run this for all my mods, just a smaller subset, as detonater is kinda slow lol

the mods that primarily benefit from this change are the ones which bundle many libraries in them.
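
The effect can be reproduced on synthetic data. The sketch below is not Detonater or mc-repack; it builds three made-up "bundled libraries" whose files share a vocabulary, then compares DEFLATE at every level against STORE inside the nested jars with DEFLATE only at the outermost level. All names and the data generator are assumptions for illustration.

```python
import io
import random
import string
import zipfile

# Shared vocabulary so files resemble each other, as assets of one type tend to.
_rng = random.Random(0)
VOCAB = ["".join(_rng.choices(string.ascii_lowercase, k=12)) for _ in range(300)]


def make_jar(files: dict, method: int) -> bytes:
    """Build an in-memory jar from {name: bytes} using one compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method, compresslevel=9) as jar:
        for name, data in files.items():
            jar.writestr(name, data)
    return buf.getvalue()


def sample_library(seed: int) -> dict:
    """Hypothetical bundled library: ten text files drawn from VOCAB."""
    rng = random.Random(seed)
    return {
        f"data/file_{n}.txt": "\n".join(rng.choices(VOCAB, k=300)).encode()
        for n in range(10)
    }


libs = {f"META-INF/jars/lib_{i}.jar": sample_library(i) for i in range(3)}


def bundle(inner_method: int) -> bytes:
    """Pack every library with inner_method, then DEFLATE the outer jar."""
    nested = {name: make_jar(files, inner_method) for name, files in libs.items()}
    return make_jar(nested, zipfile.ZIP_DEFLATED)


deflate_everywhere = bundle(zipfile.ZIP_DEFLATED)
store_then_deflate = bundle(zipfile.ZIP_STORED)
print("DEFLATE at every level:         ", len(deflate_everywhere), "bytes")
print("STORE nested, DEFLATE outermost:", len(store_then_deflate), "bytes")
```

With DEFLATE inside the nested jar, every file is compressed on its own and the outer pass sees mostly incompressible bytes. With STORE, the outer pass compresses each nested jar as one stream and can reuse matches across its files. The numbers from this toy data say nothing about how much a given real mod would save.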

szeweq added the enhancement label May 2, 2024
@szeweq
Owner

szeweq commented May 2, 2024

Thanks for the details! Your proposal will definitely help improve mod(pack) sizes.

I will work on optionally skipping directory entries in ZIP/JAR files. Removing them will already make the mods smaller. This would make things simpler, because the library (mc-repack-core) also works with a file system, where directories must be created before a minified file is saved.
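
To illustrate the file-system side of this: when output goes to a directory instead of an archive, the parent directories can be derived from the file paths themselves, so directory entries in the source jar are not needed. This is a sketch using Python's standard library, not mc-repack-core, and `extract_files` is a hypothetical name.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def extract_files(jar_bytes: bytes, dest: Path) -> None:
    """Write every file entry under dest, creating parent directories on demand."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries are ignored entirely
            parts = PurePosixPath(info.filename).parts
            if info.filename.startswith("/") or ".." in parts:
                raise ValueError(f"unsafe entry name: {info.filename}")
            target = dest.joinpath(*parts)
            # The directory must exist before the file can be saved.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(jar.read(info))
```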

The new file types you mentioned can be easily added for minification or recompression. I was testing mainly on Forge mods, so I may have overlooked file types like .at, .accesswidener or .md. There are a lot of mods to check, and I must determine the most used file formats to support.

The jar-in-jar repacking situation is very tricky. It may need a completely new minifier with customizable options. I will try to make it possible.

New separate issues will be made. Thanks again for using MC-Repack!

This was referenced May 2, 2024
@solonovamax
Author

solonovamax commented May 3, 2024

> The new file types you mentioned can be easily added for minification or recompression. I was testing mainly on Forge mods so I may overlook file types like .at, .accesswidener or .md. There are a lot of mods to check and I must determine the most used file formats to be supported.

the mods I used to generate that list were mainly fabric mods, as I just used the modpacks on my laptop. I have a few more instances on my desktop, so I can re-run it on there when I get back home.


also, I don't think that .jpeg/.jpg, .webp, or .avif files are particularly common in mods, as they mostly stick to pngs, however you could possibly add a compressor for those, doing smth similar to what rimage is doing. Assuming it's not a large amount of work and doesn't bloat the binary too much. tbh, if it's anything more than like 30 mins or anything past like 500kb-1mb, it's not worth it imo lol (going from 1.9mb to like 5mb for some compressors that will be rarely used isn't particularly worth it, tbh. just useless bloat that almost never gets run)


> The jar-in-jar repacking situation is very tricky. It may need a completely new minifier with customizable options. I will try to make it possible.

you could do smth similar to the following:
have a method for jars called smth like repack_jar, which takes an optional parameter, deflate, defaulting to true. then, if you're already in a jar, just invoke it with false lol. that's similar to what detonater did (it handles "not in a jar" and "in a jar" separately), though unsure how feasible this is given your code structure. may need to pass some kind of context around indicating if you're in a jar or in an fs.

but for the sorting, you could just sort all of them.
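
The `repack_jar` idea described above could be sketched like this. It is an illustration in Python using the standard `zipfile` module, not mc-repack's or Detonater's actual code. Only the name `repack_jar` and its `deflate` parameter come from the comment; everything else is an assumption.

```python
import io
import posixpath
import zipfile


def repack_jar(jar_bytes: bytes, deflate: bool = True) -> bytes:
    """Repack a jar; anything nested inside it is repacked with STORE only."""
    method = zipfile.ZIP_DEFLATED if deflate else zipfile.ZIP_STORED
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
            zipfile.ZipFile(out, "w", method, compresslevel=9) as dst:
        # Sort every level the same way: by extension, then by name.
        entries = sorted(
            (info for info in src.infolist() if not info.is_dir()),
            key=lambda info: (posixpath.splitext(info.filename)[1], info.filename),
        )
        for info in entries:
            data = src.read(info)
            if info.filename.lower().endswith(".jar"):
                # Already inside a jar from here on, so never deflate below this level.
                data = repack_jar(data, deflate=False)
            dst.writestr(info.filename, data)
    return out.getvalue()
```

Here the `deflate` flag doubles as the "am I inside a jar" context. A caller working on a file system would call `repack_jar(data)` for each top-level jar and never pass `deflate=False` itself.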


Also, I'm interested in using this as part of a gradle plugin I'm making (I was originally planning to do all these things myself, and was looking at making an oxipng kotlin/java jni wrapper, but then I found this and this is honestly basically what I need)
would you be willing to work with me to make a jni wrapper for this (not asking you to do it all on your own lol, bc that's cringe), so that instead of having to bundle the cli & invoke it, I could just use it via jni?

@solonovamax
Author

yo, unsure if you saw this or not bc it was a weekend, so imma bump it lol

@szeweq
Owner

szeweq commented May 8, 2024

Sorry for the late response.

I would rather not make a library for a specific environment. I will keep maintaining the CLI and the core library. There is another project I am working on, so mc-repack may not receive a release for a couple of weeks.

@solonovamax
Author

> Sorry for the late response.
>
> I would rather not make a library for a specific environment. I will keep maintaining the CLI and the core library. There is another project I am working on, so mc-repack may not receive a release for a couple of weeks.

wdym a library for a specific environment?

also, no worries, take your time 👍

@szeweq
Owner

szeweq commented May 8, 2024

I meant JNI "glue" for JVM.

@solonovamax
Author

> I meant JNI "glue" for JVM.

yeah, I just mainly want to make a jni interface bc it's a lot more convenient to have this applied as a gradle plugin rather than a cli library
