Early edge predictions #74

Merged 77 commits, May 29, 2024

@Melirius (Collaborator) commented May 8, 2024

Closes #70

DCT coefficients are now stored in transposed raster order, which suits the IDCT better because it removes one transposition. Edge coefficient predictions are now produced together with the edge DCT coefficients. The overall performance gain on my machine (Zen3, x5950) is ~1.5%.
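To illustrate why transposed storage saves a transposition, here is a minimal scalar sketch. It is not the code from this PR: the function names are hypothetical, and the 1D pass is a simple prefix sum standing in for a real IDCT kernel. It assumes a row-only kernel, so a separable 2D transform needs a transpose before each pass; if the block is already stored transposed, the first transpose can be dropped.

```rust
const N: usize = 8;
type Block = [i32; N * N];

/// Transpose an 8x8 block: element (r, c) moves to (c, r).
fn transpose(b: &Block) -> Block {
    let mut out = [0i32; N * N];
    for r in 0..N {
        for c in 0..N {
            out[c * N + r] = b[r * N + c];
        }
    }
    out
}

/// Hypothetical stand-in for a 1D IDCT pass: a prefix sum over each row.
/// Any row-wise 1D kernel could take its place.
fn pass_rows(b: &mut Block) {
    for row in b.chunks_exact_mut(N) {
        for i in 1..N {
            row[i] += row[i - 1];
        }
    }
}

/// Input in raster order: transpose, row pass, transpose, row pass.
fn transform_from_raster(b: &Block) -> Block {
    let mut t = transpose(b);
    pass_rows(&mut t);
    let mut u = transpose(&t);
    pass_rows(&mut u);
    u
}

/// Input already stored transposed: the first transpose is not needed.
fn transform_from_transposed(bt: &Block) -> Block {
    let mut t = *bt;
    pass_rows(&mut t);
    let mut u = transpose(&t);
    pass_rows(&mut u);
    u
}
```

Calling `transform_from_transposed(&transpose(&block))` gives the same result as `transform_from_raster(&block)`, so a decoder that writes coefficients directly in transposed order pays for one transpose fewer per block.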

@mcroomp (Collaborator) commented May 25, 2024

Merge the latest changes; it looks like it's ready to check in.

@mcroomp (Collaborator) commented May 27, 2024

Hi there, if you get a chance to merge with the master I can approve. Thanks!

@Melirius (Collaborator, Author) replied:

> Hi there, if you get a chance to merge with the master I can approve. Thanks!

This evening then :)

@Melirius added the enhancement label May 27, 2024
@Melirius requested a review from mcroomp May 27, 2024 19:50
@mcroomp merged commit a222605 into microsoft:main May 29, 2024
2 checks passed
Melirius added a commit to Melirius/lepton_jpeg_rust that referenced this pull request Jun 11, 2024
* use bitscan to shortcut zero searching

* fastest so far

* cleaned up code a bit

* more optimizations

* clarified changes

* added comments

* use aligned block as input

* add unroll dependency

* work in progress

* work in progress

* update cargo.lock

* minor fixes

* make envli 32 bit

* update z in 16 increments to avoid extra shifts

* working

* remove bogus change

* clean up envli

* added comments

* add comments

* improved comments for envli

* update wide library

* precalculate abs value

* Update src/structs/jpeg_write.rs

Co-authored-by: Ivan Siutsou <47280527+Melirius@users.noreply.github.com>
Signed-off-by: Kristof Roomp <kristofr@gmail.com>

* remove unused field in HuffCodes

* Store length with code

* Alternative VLI encoding

* Cosmetics

* Working early prediction - to clean

* Cleared

* Early edge prediction

* Nonzero mask
TODO: move raster update into decode_one_edge

* Start of work on encoder

* WIP: get rid of first transposition in IDCT

* WIP: Next step

* No transposition in decoder

* Code clear and formatting

* More masks

* Use transposed block all the way in decoding

* Transposed nonzero mask in decoder

* Simplified test

* Working early prediction in encode, to clean

* Code cleared, unused arrays removed

* Shortened min_noise_threshold, more unified encoder/decoder code

* Code clear

* fill raster in at the same time as coordinate

* missing updated cargo.lock since widen_mul was added later

* RUSTy NeighborSummary

* attempt 1 merge

* remove extraneous changes

* remove extraneous changes

* removing extraneous changes

* removing extraneous changes

* removing extraneous changes

* removing extraneous changes

* removing extraneous changes

* avoid casting i32x8 using bytemuck if not strictly necessary as wide also provides typesafe casts

* Some comments (by code review)

* Comment elaboration

* Typo fix

* Unification of DC predictors calculations

* fixed transpose

* fixed mul

* fix warnings

* incorrect upcast of quantization table value

* update dependencies and use from_u16x8 which is new in the wide crate

* Revert "Merge remote-tracking branch 'MS/idctmul' into fasterjpeg_simd_variation"

This reverts commit f2511d8, reversing
changes made to 438a1a1.

* Shorter FREQ_MAX

* Correct checks of quantization tables
Initial Lepton implementation tests all the DC coefficients on the edges to be nonzero

* Formatting

---------

Signed-off-by: Kristof Roomp <kristofr@gmail.com>
Co-authored-by: Kristof <kristofr@gmail.com>
mcroomp added a commit that referenced this pull request Jul 25, 2024