Try this to increase resolution w/o finetuning (Instruction) #62

KyunHwan · 2024-03-28T09:35:56Z

Using the default setup (DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth), large input images were resized to 512 x 384, but I wanted higher-resolution results (1024 x 768). Following Meta's "Extending Context Window of Large Language Models via Position Interpolation", I changed only two things: the default image_size value in demo.py, from 512 to 1024, and the variable t inside the get_cos_sin method of RoPE2D in croco/models/pos_embed.py, which I multiplied by (512/1024). This gave reasonably good results without any finetuning, though finetuning would most likely improve them further.
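The idea behind position interpolation is to squeeze the larger grid of patch positions back into the range the model saw during training, instead of asking it to extrapolate to positions it never saw. A minimal sketch of that arithmetic, assuming a ViT patch size of 16 (so a 512-pixel side gives 32 patch positions); the helpers `patch_grid` and `interpolated_positions` are hypothetical and not part of DUSt3R or CroCo:

```python
# Sketch of the position-interpolation arithmetic (not DUSt3R/CroCo code).
# Assumption: patch size 16; helper names are hypothetical.
PATCH_SIZE = 16


def patch_grid(width: int, height: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Number of patches along x and y for an image of the given size."""
    return width // patch_size, height // patch_size


def interpolated_positions(n_positions: int, train_size: int, new_size: int) -> list[float]:
    """Integer patch indices 0..n-1 rescaled by train_size / new_size."""
    scale = train_size / new_size
    return [i * scale for i in range(n_positions)]


# Training resolution: 512 x 384 -> 32 x 24 patches, positions 0..31 along x.
train_w, _ = patch_grid(512, 384)
# Target resolution: 1024 x 768 -> 64 x 48 patches, positions 0..63 along x.
new_w, _ = patch_grid(1024, 768)

pos = interpolated_positions(new_w, train_size=512, new_size=1024)
print(train_w, new_w)    # 32 64
print(pos[:4], pos[-1])  # [0.0, 0.5, 1.0, 1.5] 31.5
```

Every scaled position stays inside the trained range [0, 32), with the new in-between positions falling halfway between trained ones. The same factor applies to both axes here because width and height are both doubled. For the 1536 x 1152 setting, the factor becomes 512/1536 = 1/3 and the 96 positions along x map to roughly 0..31.67.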

KyunHwan · 2024-03-28T09:37:23Z

I also tested another resolution (1536 x 1152), multiplying t by (512/1536), and the results were similarly acceptable. So far this works well for objects that have a good number of features.

hdzys · 2024-04-01T11:24:29Z

How do I modify the parameter t?

```python
def get_cos_sin(self, D, seq_len, device, dtype):
    if (D,seq_len,device,dtype) not in self.cache:
        inv_freq = 1.0 / (self.base ** (torch.arange(0, D, 2).float().to(device) / D))
        t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype)
        freqs = torch.einsum("i,j->ij", t, inv_freq).to(dtype)
        freqs = torch.cat((freqs, freqs), dim=-1)
        cos = freqs.cos() # (Seq, Dim)
        sin = freqs.sin()
        self.cache[D,seq_len,device,dtype] = (cos,sin)
    return self.cache[D,seq_len,device,dtype]
```

KyunHwan · 2024-04-01T11:41:29Z


If you're going from 512 to 1024, change the line that defines t to:

```python
t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype) * (512/1024)
```
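If you prefer not to hard-code the ratio, the same change can be written with the scale as a parameter. The sketch below is a standalone function that mirrors the get_cos_sin method quoted above, without the cache. The `scale` argument, the per-axis dimension D=32 used in the example, and the default `base=100.0` (assumed to match the `RoPE100` positional embedding of the DUSt3R checkpoints) are illustrative assumptions, not CroCo's actual signature.

```python
import torch


def get_cos_sin_interpolated(D, seq_len, device, dtype, base=100.0, scale=1.0):
    """Hypothetical standalone version of RoPE2D.get_cos_sin with position interpolation.

    scale = train_size / new_size (e.g. 512 / 1024); scale=1.0 reproduces the original.
    base=100.0 is an assumption (RoPE100).
    """
    # Inverse frequencies, one per pair of channels (same as the original method).
    inv_freq = 1.0 / (base ** (torch.arange(0, D, 2).float().to(device) / D))
    # Position indices 0..seq_len-1, squeezed back into the range seen during training.
    t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype) * scale
    freqs = torch.einsum("i,j->ij", t, inv_freq).to(dtype)
    freqs = torch.cat((freqs, freqs), dim=-1)
    return freqs.cos(), freqs.sin()  # each of shape (seq_len, D)


# 1024-pixel side with patch size 16 -> 64 positions, scaled back to the 512 training range.
cos_new, sin_new = get_cos_sin_interpolated(32, 64, "cpu", torch.float32, scale=512 / 1024)
# Original behaviour at the training resolution: 32 positions, no scaling.
cos_old, sin_old = get_cos_sin_interpolated(32, 32, "cpu", torch.float32)

# Every even position at the new resolution lands exactly on a training position.
print(torch.allclose(cos_new[::2], cos_old), torch.allclose(sin_new[::2], sin_old))  # True True
```

The odd positions fall halfway between training positions, which is the interpolation the paper relies on. Note that the original method caches results keyed on (D, seq_len, device, dtype); if the scale were made configurable at runtime, it would also need to be part of that cache key, or a stale entry could be returned.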

hdzys · 2024-04-02T01:08:41Z

@KyunHwan thanks

KyunHwan changed the title ~~How to finetune w/ resolution increase (Instruction)~~ Try this to increase resolution w/o finetuning (Instruction) Mar 28, 2024

rwn17 mentioned this issue Sep 3, 2024

[QUESTION] of inference with the image shape of 256 naver/mast3r#35 (Open)
