Try this to increase resolution w/o finetuning (Instruction) #62

Open
KyunHwan opened this issue Mar 28, 2024 · 4 comments

KyunHwan commented Mar 28, 2024

With the default setup, large input images are resized to 512 x 384 (using DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth), but I wanted higher-resolution results (1024 x 768). Following "Extending Context Window of Large Language Models via Position Interpolation" by Meta, I made only two changes: I raised the default image_size value in demo.py from 512 to 1024, and I multiplied the variable t inside the get_cos_sin method of RoPE2D in croco/models/pos_embed.py by (512/1024). This gave pretty good results without any finetuning, though finetuning would most likely improve them further.
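A minimal pure-Python sketch of why the scaling helps, mirroring the angle computation in get_cos_sin without PyTorch. The function name rope_angles, the default base of 100.0, and the patch size of 16 mentioned in the comments are assumptions for illustration; the thread itself only shows self.base and the image sizes.

```python
import math

def rope_angles(position, dim, base=100.0, scale=1.0):
    """Rotary angles for one position index; `scale` multiplies the position.

    `base=100.0` is an assumed default, not taken from the thread.
    """
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    return [position * scale * f for f in inv_freq]

# Token 40 on a 64-token axis (1024 px at an assumed patch size of 16),
# interpolated by 512/1024, lands on the angles of token 20 on the
# 32-token axis that the 512 checkpoint saw during training.
interpolated = rope_angles(40, dim=8, scale=512 / 1024)
trained = rope_angles(20, dim=8)
same = all(math.isclose(a, b) for a, b in zip(interpolated, trained))
```

Odd token indices fall halfway between two trained positions, which is the interpolation the paper relies on instead of extrapolating past the trained range.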

KyunHwan commented Mar 28, 2024

I also tested another resolution (1536 x 1152), multiplying t by (512/1536), and got reasonable results. So far this works well for objects that have a "good number" of features.
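The factors used so far follow one pattern: trained size divided by target size. The sketch below checks that, for each tested size, the largest scaled position index stays inside the range the 512 checkpoint was trained on. The helper name scaled_position_range and the patch size of 16 are assumptions for illustration, not values stated in this thread.

```python
def scaled_position_range(image_side, trained_side=512, patch=16):
    """Return (tokens along the axis, largest position index after scaling).

    `patch=16` is an assumed patch size.
    """
    tokens = image_side // patch
    factor = trained_side / image_side
    return tokens, (tokens - 1) * factor

for side in (512, 1024, 1536):
    tokens, max_pos = scaled_position_range(side)
    print(f"side={side} tokens={tokens} max scaled position={max_pos:.2f}")
```

Under these assumptions the scaled positions never exceed the 32 positions per axis seen at 512, even though the token count per axis doubles or triples.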

KyunHwan changed the title from "How to finetune w/ resolution increase (Instruction)" to "Try this to increase resolution w/o finetuning (Instruction)" on Mar 28, 2024

hdzys commented Apr 1, 2024

How to modify the parameter t?

def get_cos_sin(self, D, seq_len, device, dtype):
    if (D, seq_len, device, dtype) not in self.cache:
        inv_freq = 1.0 / (self.base ** (torch.arange(0, D, 2).float().to(device) / D))
        t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype)
        freqs = torch.einsum("i,j->ij", t, inv_freq).to(dtype)
        freqs = torch.cat((freqs, freqs), dim=-1)
        cos = freqs.cos()  # (Seq, Dim)
        sin = freqs.sin()
        self.cache[D, seq_len, device, dtype] = (cos, sin)
    return self.cache[D, seq_len, device, dtype]

KyunHwan commented Apr 1, 2024

If you're going from 512 to 1024, change the line that defines t inside get_cos_sin to:

    t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype) * (512/1024)
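If the factor is made configurable rather than hard-coded as above, the cache key probably needs to include it, since get_cos_sin caches on (D, seq_len, device, dtype) only. The following is a hypothetical pure-Python stand-in for that caching logic, not the CroCo implementation: the class name ScaledRoPECache, the scale attribute, and the default base of 100.0 are all assumptions.

```python
import math

class ScaledRoPECache:
    """Hypothetical stand-in for the caching in get_cos_sin, with a scale."""

    def __init__(self, base=100.0, scale=1.0):
        self.base = base    # assumed default, not taken from the thread
        self.scale = scale  # e.g. 512 / 1024 when going from 512 to 1024
        self.cache = {}

    def get_cos_sin(self, D, seq_len):
        # Include the scale in the key so a changed factor is never
        # answered with values cached for a different factor.
        key = (D, seq_len, self.scale)
        if key not in self.cache:
            inv_freq = [1.0 / (self.base ** (i / D)) for i in range(0, D, 2)]
            # Repeating each row mirrors torch.cat((freqs, freqs), dim=-1).
            freqs = [[t * self.scale * f for f in inv_freq] * 2
                     for t in range(seq_len)]
            cos = [[math.cos(a) for a in row] for row in freqs]
            sin = [[math.sin(a) for a in row] for row in freqs]
            self.cache[key] = (cos, sin)
        return self.cache[key]

rope = ScaledRoPECache(scale=512 / 1024)
cos, sin = rope.get_cos_sin(D=4, seq_len=64)  # 64 rows of 4 values each
```

With the one-line hard-coded change from the comment above, none of this is needed, because the factor never changes while the process is running.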

hdzys commented Apr 2, 2024

@KyunHwan thanks
