
Add FSQ implementation #74

Merged: 14 commits into lucidrains:fsq on Sep 29, 2023

Conversation

@sekstini (Contributor) commented Sep 28, 2023

TODO:

  • Verify correctness
  • Add usage example (see the sketch below this list)

Notes:

  • Torch doesn't support uint32 yet, so we use int32. Should be fine.
  • Part of the offset calculation (in the bound function) is missing. Just took copilot's suggestion for now.
  • Fixed some grammatical errors (indexes -> indices, and an incorrect docstring)
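Since the usage example is still on the TODO list, here is a minimal sketch of what it might look like, assuming the class is exported as FSQ from vector_quantize_pytorch and follows the (quantized, indices) return convention used by the other quantizers in this repo; the exact levels and shapes are placeholders.

```python
import torch
from vector_quantize_pytorch import FSQ  # assumed export, matching the module added in this PR

# per-dimension quantization levels (just one example configuration)
quantizer = FSQ(levels = [8, 5, 5, 5])

x = torch.randn(1, 1024, 4)   # (batch, seq, dim); dim must equal len(levels)
xhat, indices = quantizer(x)  # xhat: quantized latents, indices: integer codes (int32, see the note above)

assert xhat.shape == x.shape  # quantization preserves the latent shape
```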

@lucidrains (Owner)

nice! give it a test drive with the cifar script in the examples folder

@kashif ^

@sekstini (Contributor Author)

Still not 100% sure this is correct, but initial results seem promising.

 vq :: rec loss: 0.114 | cmt loss: 0.001 | active %: 21.094

fsq :: rec loss: 0.111 | active %: 58.333

Not parameter matched, so losses aren't really representative, but interesting to see the higher codebook usage.
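For readers wondering what the active % column measures: a common way to compute this kind of codebook-usage metric is the fraction of code indices that appear at least once in a batch. The snippet below is only a sketch of that idea, not necessarily how the cifar example script computes it.

```python
import torch


def active_fraction(indices: torch.Tensor, codebook_size: int) -> float:
    # fraction of the codebook hit at least once in this batch of code indices;
    # for FSQ the codebook size is simply the product of the per-dimension levels
    return indices.unique().numel() / codebook_size
```

Multiplying by 100 gives a percentage like the numbers in the log lines above.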

@lucidrains (Owner)

do you have the training curves for each?

@sekstini (Contributor Author) commented Sep 28, 2023

edit: removed the images that were here because the test was broken

@sekstini (Contributor Author) commented Sep 28, 2023

Probably isn't representative of FSQ performance in general, but it does seem to be working on some level at least ^^

@sekstini (Contributor Author)

Ok wait, strike that. Reading the paper a bit more closely, the offsets should be flipped relative to what I put there initially. Doing so nets a much higher active %.

Review thread on examples/autoencoder_fsq.py (outdated, resolved)
@sekstini (Contributor Author) commented Sep 28, 2023

[plot]

Okay, this seems fine for an example. Curious to see if it works in more realistic scenarios too.

@lucidrains (Owner)

@sekstini getting there! we can let @kashif do a review before getting it merged! thanks for beasting through this

return z + (zhat - z).detach()  # straight-through estimator: forward pass returns zhat, gradients flow back through z


class FSQ(nn.Module):
@fab-jul

Thanks so much for porting this! Do you mind if we link this repo in the next version and our own public code release?

@fab-jul

LMK if you are also planning to update the README and I can send some figs.

@kashif (Contributor)

please do! and i believe @lucidrains and @sekstini will appreciate it

@sekstini (Contributor Author)

Yup, go ahead 👍

@sekstini (Contributor Author)

@fab-jul

> LMK if you are also planning to update the README and I can send some figs.

That would be great 🙏

@fab-jul commented Sep 29, 2023

> [plot]
> Okay, this seems fine for an example. Curious to see if it works in more realistic scenarios too.

Cool to see the high util out of the box :) I assume this is a fairly shallow AE? Probably the loss gap could be a bit smaller but looks like it's WAI! nice

@sekstini marked this pull request as ready for review on September 29, 2023 07:20
@kashif (Contributor) commented Sep 29, 2023

@sekstini can you also kindly add a section in the README and the appropriate bibtex entry as #73 had started?

@sekstini (Contributor Author) commented Sep 29, 2023

>> [plot]
>> Okay, this seems fine for an example. Curious to see if it works in more realistic scenarios too.
>
> Cool to see the high util out of the box :) I assume this is a fairly shallow AE? Probably the loss gap could be a bit smaller but looks like it's WAI! nice

Yeah, the test models are both shallow, and tiny at ~10k parameters. Don't expect this to realistically reflect performance at all. This is the FSQ network.

> @sekstini can you also kindly add a section in the README and the appropriate bibtex entry as #73 had started?

Will do 👍

@fab-jul commented Sep 29, 2023

> Yeah, the test models are both shallow, and tiny at ~10k parameters. Don't expect this to realistically reflect performance at all. This is the FSQ network.

Makes sense! Yeah one thing with FSQ is that because your codebook is fixed, you need some decent number of layers before and after the quantizer. Usually this is the case (eg the VQ-GANs people train are fairly deep).

Anyway, I think the results you see are what we expect! Good stuff.

For the figures, I revisited your README and it seems to be mostly code, so maybe simply using the cube (PDF here) would be enough or already too much. Your call :)

In the README of our upcoming mini-repo, I also added the table, but again, this might be out of place in your README:

|                  | VQ | FSQ |
|------------------|----|-----|
| Quantization     | argmin_c \|\| z-c \|\| | round(f(z)) |
| Gradients        | Straight Through Estimation (STE) | STE |
| Auxiliary Losses | Commitment, codebook, entropy loss, ... | N/A |
| Tricks           | EMA on codebook, codebook splitting, projections, ... | N/A |
| Parameters       | Codebook | N/A |

source:

|                  | VQ | FSQ |
|------------------|----|-----|
| Quantization     | argmin_c \|\| z-c \|\| | round(f(z)) |
| Gradients        | Straight Through Estimation (STE) | STE |
| Auxiliary Losses | Commitment, codebook, entropy loss, ... | N/A |
| Tricks           | EMA on codebook, codebook splitting, projections, ...| N/A |
| Parameters       | Codebook | N/A |

@lucidrains (Owner)

lgtm! releasing

thank you @sekstini !

@lucidrains merged commit 0cce037 into lucidrains:fsq on Sep 29, 2023
@sekstini (Contributor Author)

@lucidrains Oh, apparently I pointed this at the fsq branch, so you might need to merge it into master

@lucidrains (Owner)

@sekstini yup, no problem, thank you! 🙏

@fab-jul commented Sep 29, 2023

Thanks everyone! I'll add a link to this repo in our next revision

Added a link to the official README now.

@dribnet commented Oct 1, 2023

Appreciate this quick port and example code. Just thought I would add that reconstructions from the FashionMNIST example appear reasonable:

[image: figgy1]

However, I was a bit surprised to notice that it fails when switching back to MNIST:

[image: figgy2]

Changing levels/seeds didn't help, so perhaps, as @fab-jul mentioned, it's just a case of needing more layers before/after the quantizer in some cases.

@sekstini (Contributor Author) commented Oct 1, 2023

@dribnet Interesting. Not sure why we would see this particular failure mode here, but I made a toy example of residual vector quantization where it's working: https://gist.github.com/sekstini/7f089f71d4b975ec8bde37d878b514d0.

[image: residual_fsq]
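For readers who don't want to open the gist, here is a rough sketch of the residual idea it demonstrates. This is not the gist's code; it only assumes FSQ is importable as added in this PR and returns (quantized, indices), and the ResidualFSQ name and two-stage setup are purely illustrative.

```python
import torch
from torch import nn
from vector_quantize_pytorch import FSQ  # assumed import, matching the module added in this PR


class ResidualFSQ(nn.Module):
    # toy residual quantization: each stage quantizes what the previous stages missed
    def __init__(self, levels, num_quantizers = 2):
        super().__init__()
        self.layers = nn.ModuleList([FSQ(levels) for _ in range(num_quantizers)])

    def forward(self, z):
        quantized = torch.zeros_like(z)
        residual = z
        indices = []
        for fsq in self.layers:
            q, idx = fsq(residual)            # quantize the current residual
            quantized = quantized + q         # accumulate the reconstruction
            residual = residual - q.detach()  # later stages only see the leftover error
            indices.append(idx)
        return quantized, torch.stack(indices, dim = -1)
```

Note that the residual is typically much smaller than the quantizer's fixed grid, so some per-stage rescaling is probably needed in practice; that question comes up again later in this thread.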

@lucidrains (Owner)

@dribnet adding more layers seems testable

@lucidrains (Owner)

LFQ https://arxiv.org/abs/2310.05737 looks similar?

@lucidrains (Owner)

nevermind, it is slightly different, will be adding

@fab-jul commented Oct 10, 2023

> nevermind, it is slightly different, will be adding

It's FSQ with levels = 2 plus an entropy maximization loss, IIUC.
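To make that comparison concrete, here is a rough sketch of that reading of LFQ: binarize each latent dimension and read the sign pattern as a binary index. This is a paraphrase, not the MAGVIT-v2 authors' code, and the entropy maximization term is deliberately left out.

```python
import torch


def lfq_quantize(z: torch.Tensor):
    # binarize each dimension to {-1, +1}; with levels = 2 per dimension this is
    # the FSQ grid with no learned codebook
    q = torch.where(z > 0, torch.ones_like(z), -torch.ones_like(z))

    # straight-through estimator, same trick as elsewhere in this repo
    q = z + (q - z).detach()

    # read the sign pattern as bits of an integer code
    bits = (z > 0).long()
    powers = 2 ** torch.arange(z.shape[-1], device = z.device)
    indices = (bits * powers).sum(dim = -1)
    return q, indices
```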

@lucidrains (Owner)

yes, FSQ generalizes it, save for the entropy loss. no ablation of the entropy loss, so unsure how necessary it is, but i'll add it. perhaps that will be their contribution

@lucidrains (Owner)

@fab-jul congrats either way! this could be a winner

@fab-jul commented Oct 10, 2023

@lucidrains thanks!

for us, levels = 2 was suboptimal but I can see how an entropy maximization would fix that. I wonder if that also improves FSQ (although ofc the goal of our paper was to get rid of all aux losses haha)

@sekstini (Contributor Author)

> @sekstini you should pair up with an audio researcher and think hard about this (residual fsq), maybe do a tweetstorm or blogpost if you see anything but do not have bandwidth to write a paper. i'm sure many will be thinking along these lines, looking at the new paper

Funny you mention it, as I'm experimenting with a vocoder based on this idea at this very moment ^^
I should definitely improve my tweeting game though.

@lucidrains (Owner)

@sekstini are you seeing good results?

thinking about a soundstream variation with multi-headed LFQ

@sekstini (Contributor Author)

@lucidrains

> @sekstini are you seeing good results?

Not really, but definitely not because of FSQ.

I have exclusively been toying with various parallel encoding schemes, which I'm guessing are difficult for the model to learn, and I suspect residual quantization would work a lot better.

@lucidrains (Owner) commented Oct 16, 2023

@sekstini ah got it, thanks!

yea, to do residual, i think the codes will need to be scaled down an order of magnitude, or inputs scaled up (cube within cubes, if you can imagine it), but i haven't worked it out. it is probably a 3 month research project. somebody will def end up trying it..

@lucidrains (Owner)

@sekstini yea, if LFQ pans out over at magvit2, i'll do some improvisation here and maybe someone can do the hard experimental work.

@lucidrains (Owner)

@sekstini almost done! #80

@sekstini (Contributor Author)

> @sekstini almost done! #80

Neat! I made some decent progress on my vocoder with "parallel FSQ", but I'd be interested in swapping this in to compare performance. Feel free to tag me when it's done.

@lucidrains (Owner)

ok it is done, integrated in soundstream over here

@lucidrains (Owner)

may be of interest! lucidrains/magvit2-pytorch#4

@mueller-franzes

Hi,
Thanks for the implementation!
@sekstini or @fab-jul Could you perhaps briefly explain what the purpose of the "shift/offset" is?
The reason I am asking: for levels=2, "shift" becomes infinite. Should it be (1 + eps) instead of (1 - eps)?

@sekstini (Contributor Author)

@mueller-franzes You may find this comment by Fabian interesting.

> Should it be (1 + eps) instead of (1 - eps)?

Makes sense to me. I copied it directly from the paper, but at the time there was a tan instead of atanh there, so I didn't notice any issues while testing levels = 2.

As a side note, you may want to check out LFQ if you're interested in the levels = 2 case in particular.
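For context, here is a minimal sketch of the bound step being discussed, following the paper's pseudocode as I read it; the names half_l, offset, and shift come from that pseudocode and may not match this repo's implementation exactly.

```python
import torch


def bound(z: torch.Tensor, levels: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # half width of the bounded range per dimension; the (1 - eps) factor is the one questioned above
    half_l = (levels - 1) * (1 - eps) / 2
    # dimensions with an even number of levels get shifted by half a step to keep the grid symmetric
    offset = torch.where(levels % 2 == 0, torch.tensor(0.5), torch.tensor(0.0))
    shift = torch.atanh(offset / half_l)
    return torch.tanh(z + shift) * half_l - offset
```

With levels = 2, half_l = (1 - eps) / 2, so offset / half_l is slightly greater than 1 and atanh no longer returns a finite value; with (1 + eps) the ratio drops just below 1 and shift stays well defined, which is why the suggestion above fixes the levels = 2 case.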

@lucidrains (Owner)

@mueller-franzes @sekstini maybe we should just enforce odd levels for now for FSQ until a correction to the paper comes out?

@lucidrains (Owner)

and yea, agreed with checking out LFQ. so many groups seeing success with it

@sekstini (Contributor Author)

> @mueller-franzes @sekstini maybe we should just enforce odd levels for now for FSQ until a correction to the paper comes out?

Other than the asymmetry being weird, levels > 2 seems fine (actually most of my code has been using even values).

I think switching to (1 + eps) or enforcing levels > 2 makes sense.

@mueller-franzes

Thank you both for the super quick response! That's a good tip, I'll have a look at LFQ next.

@lucidrains (Owner)

@sekstini oh you are right, it is a levels == 2 problem

ok let's just go with 1 + eps! thank you both
