
Allow other types of visual SSL when initiating CLIP #5

Closed
Froskekongen opened this issue Feb 20, 2022 · 4 comments

Froskekongen commented Feb 20, 2022

In the following code from `CLIP.__init__`:

        if use_visual_ssl:
            if visual_ssl_type == 'simsiam':
                ssl_type = SimSiam
            elif visual_ssl_type == 'simclr':
                ssl_type = partial(SimCLR, temperature = simclr_temperature)
            else:
                raise ValueError(f'unknown visual_ssl_type {visual_ssl_type}')

            self.visual_ssl = ssl_type(
                self.visual_transformer,
                image_size = visual_image_size,
                hidden_layer = visual_ssl_hidden_layer
            )

the visual self-supervised learning is hardcoded. I would suggest changing this to accept the visual SSL module as an argument when instantiating CLIP, giving the same flexibility it already offers for the image encoder and text encoder.

Example:

barlow = BarlowTwins(augmentation_fns)
clip = CLIP(..., visual_ssl=barlow)
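One possible shape for this change (a minimal sketch only; `CLIP`, `SimSiam`, and `SimCLR` here are bare stand-ins for illustration, not the library's actual classes):

```python
from functools import partial

# Stand-in SSL modules, for illustration only.
class SimSiam:
    def __init__(self, encoder, image_size=None, hidden_layer=None):
        self.encoder = encoder

class SimCLR:
    def __init__(self, encoder, image_size=None, hidden_layer=None, temperature=0.1):
        self.encoder = encoder
        self.temperature = temperature

class CLIP:
    def __init__(
        self,
        visual_transformer=None,
        visual_ssl=None,               # optionally pass a pre-built SSL module
        use_visual_ssl=False,
        visual_ssl_type='simsiam',
        simclr_temperature=0.1,
        visual_image_size=256,
        visual_ssl_hidden_layer=-1,
    ):
        self.visual_transformer = visual_transformer

        if visual_ssl is not None:
            # Caller supplied the SSL module; use it as-is.
            self.visual_ssl = visual_ssl
        elif use_visual_ssl:
            # Otherwise fall back to the existing string-based selection.
            if visual_ssl_type == 'simsiam':
                ssl_type = SimSiam
            elif visual_ssl_type == 'simclr':
                ssl_type = partial(SimCLR, temperature=simclr_temperature)
            else:
                raise ValueError(f'unknown visual_ssl_type {visual_ssl_type}')

            self.visual_ssl = ssl_type(
                self.visual_transformer,
                image_size=visual_image_size,
                hidden_layer=visual_ssl_hidden_layer,
            )
        else:
            self.visual_ssl = None
```

This keeps the current string-based path working while letting a caller inject any SSL module, e.g. `CLIP(visual_ssl=BarlowTwins(augmentation_fns))`.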
lucidrains (Owner) commented

@Froskekongen Hi Erlend! Took up your suggestion here: https://github.com/lucidrains/x-clip/tree/0.2.4#custom-vision-self-supervised-learning-module — let me know if that works for you.

lucidrains (Owner) commented

How is your experience with Barlow? Does it work?

Froskekongen (Author) commented

Thanks a lot!

BarlowTwins was just an example. Personally, I work with frameworks that are more akin to VICReg (https://arxiv.org/abs/2105.04906) and VIbCreg (https://arxiv.org/abs/2109.00783).

And I am investigating CLIP with other modalities than images and words, with less data.

lucidrains (Owner) commented

@Froskekongen awesome! hope this feature is fruitful for you then! :)
