
Are you planning on training? #5
Open · zsxkib opened this issue Apr 24, 2024 · 14 comments
@zsxkib commented Apr 24, 2024

Let me know if you're planning on training; I could maybe help.

@johndpope (Owner) commented Apr 25, 2024

I want the VoxCeleb2 dataset - it doesn't seem to be available anymore, and the torrents are dead. I did write a data loader on my Emote-hack repo - I will wire it up in a few days.

In the diffusion transformer architecture - from what I understand - they use patches, and I don't see those in the code (spat out by Claude).
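
For reference, patchification in a DiT-style diffusion transformer usually looks something like this (a minimal sketch for illustration - not code from this repo or the VASA paper):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project each one to a token."""
    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        # A strided conv with kernel == stride == patch_size is the standard trick.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                        # x: [N, 3, H, W]
        x = self.proj(x)                         # [N, embed_dim, H/p, W/p]
        return x.flatten(2).transpose(1, 2)      # [N, num_patches, embed_dim]

tokens = PatchEmbed()(torch.randn(1, 3, 256, 256))
print(tokens.shape)  # torch.Size([1, 256, 768])
```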

Realistically, it's better to press for the VOODOO3D code to be released. This is kind of just an academic exercise.

@francqz31

Hey, I just found your repo. Tell me if these work for you - this should be the full VoxCeleb2 dataset:

1. URLs and timestamps:
https://fex.net/s/lmaobde

2. Audio files:
Dev A: Download
Dev B: Download
Dev C: Download
Dev D: Download
Dev E: Download
Dev F: Download
Dev G: Download
Dev H: Download
Dev: Concatenated
Test: Download
Download all parts and concatenate the files using the command `cat vox2_dev_aac* > vox2_aac.zip`.

Video files:
Dev A: Download
Dev B: Download
Dev C: Download
Dev D: Download
Dev E: Download
Dev F: Download
Dev G: Download
Dev H: Download
Dev I: Download
Dev: Concatenated
Test: Download
Download all parts and concatenate the files using the command `cat vox2_dev_mp4* > vox2_mp4.zip`.
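
In case `cat` isn't available (e.g. on Windows), here is a minimal Python sketch that does the same concatenation; the glob patterns assume the part naming above:

```python
import glob

def concat_parts(pattern, out_path):
    # Parts must be joined in lexicographic order, same as the shell glob.
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(pattern)):
            with open(part, "rb") as f:
                while chunk := f.read(1 << 20):  # copy 1 MiB at a time
                    out.write(chunk)

concat_parts("vox2_dev_aac*", "vox2_aac.zip")
concat_parts("vox2_dev_mp4*", "vox2_mp4.zip")
```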

@johndpope (Owner) commented Apr 27, 2024

I'm actually on holiday, away from my CUDA workstation, so I can't run this.

These edits were from ChatGPT - nowadays I'm almost exclusively using Claude:

```python
# Generate holistic facial dynamics using the diffusion transformer
audio_features = batch['audio']      # conditioning: audio
gaze_direction = batch['gaze']       # conditioning: gaze direction
head_distance = batch['distance']    # conditioning: head-to-camera distance
emotion_offset = batch['emotion']    # conditioning: emotion offset
```

If you look here - Claude spat this out - and it seems more closely aligned with the VASA paper:

https://github.com/johndpope/VASA-1-hack/blob/main/train.py
```python
# Extract keypoints from the generated dynamics
kp_s = generated_dynamics[:, :, :3]  # source keypoints
kp_d = generated_dynamics[:, :, 3:]  # driving keypoints

# Rotation matrices (identity placeholders)
Rs = torch.eye(3).unsqueeze(0).repeat(kp_s.shape[0], 1, 1)  # source rotation matrix
Rd = torch.eye(3).unsqueeze(0).repeat(kp_d.shape[0], 1, 1)  # driving rotation matrix

# Call the MotionFieldEstimator
deformation, occlusion, occlusion_2 = motion_field_estimator(appearance_volume, kp_s, kp_d, Rs, Rd)
```

I have to plug this back in as context for Claude.

```python
self.motion_field_estimator = MotionFieldEstimator(model_scale)
```

@trithucxx

> I want the VoxCeleb2 dataset - it doesn't seem to be available anymore, and the torrents are dead. I did write a data loader on my Emote-hack repo - I will wire it up in a few days.
>
> In the diffusion transformer architecture - from what I understand - they use patches, and I don't see those in the code (spat out by Claude).
>
> Realistically, it's better to press for the VOODOO3D code to be released. This is kind of just an academic exercise.

I downloaded it; the links still work. Anyway, how can I start/run your project, bro?

@johndpope (Owner) commented Apr 29, 2024

Hi @trithucxx -

I'm looking at booting up MegaPortrait by upgrading the training for this repo - https://github.com/johndpope/MegaPortrait/
@Kevinfringe had used a couple of directories, but I have some code in the works with decord / mp4s (see the sketch below):
https://github.com/johndpope/Emote-hack/blob/main/Net.py#L1085
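
The decord idea is roughly this (a minimal sketch with a hypothetical clip path - not the Emote-hack loader itself):

```python
from decord import VideoReader, cpu

vr = VideoReader("clip.mp4", ctx=cpu(0))                        # hypothetical mp4 path
frames = vr.get_batch(list(range(0, len(vr), 10))).asnumpy()    # every 10th frame, [T, H, W, 3]
```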

For now, this model Eapp1 needs to be 100% right - otherwise nothing else is going to work.
Or maybe the volumetric part can be sourced from another repo? Can this one do it? I don't know - https://real3dportrait.github.io/

This is the first part of the Appearance Encoder; it generates a 4D tensor of volumetric features `vs`:
https://github.com/johndpope/MegaPortrait/blob/master/model.py#L82

UPDATE
I'm pretty sure we can piggyback off the VOODOO3D paper (code coming in June).

[screenshot attached]

@trithucxx

> I'm looking at booting up MegaPortrait by upgrading the training for this repo - https://github.com/johndpope/MegaPortrait/ @Kevinfringe had used a couple of directories, but I have some code in the works with decord / mp4s: https://github.com/johndpope/Emote-hack/blob/main/Net.py#L1085
>
> For now, this model Eapp1 needs to be 100% right - otherwise nothing else is going to work. Or maybe the volumetric part can be sourced from another repo? Can this one do it? I don't know - https://real3dportrait.github.io/
>
> This is the first part of the Appearance Encoder; it generates a 4D tensor of volumetric features `vs`: https://github.com/johndpope/MegaPortrait/blob/master/model.py#L82
>
> UPDATE: I'm pretty sure we can piggyback off the VOODOO3D paper (code coming in June).

I tested Real3DPortrait; it seems to be inaccurate, and completion takes about 3 hours for a 2-minute talking video (too long). What about the torrent you couldn't download? Hope to see your project run.

@johndpope (Owner) commented May 5, 2024

So a few days ago I was looking at some other code - basically Claude thinks there's enough here to avoid needing the MegaPortrait code, specifically the 4D tensor of volumetric features. This supposedly handles it:
```python
self.appearance_extractor = AppearanceFeatureExtractor()
```

```python
import torch.nn as nn
# ConvBlock2D, DownBlock2D, and ResBlock3D are defined elsewhere in the repo.

class AppearanceFeatureExtractor(nn.Module):
    # 3D appearance feature extractor. Shapes through the network:
    # [N,3,256,256] -> [N,64,256,256] -> [N,128,128,128] -> [N,256,64,64]
    # -> [N,512,64,64] -> [N,32,16,64,64]
    def __init__(self, model_scale='standard'):
        super().__init__()
        use_weight_norm = False
        down_seq = [64, 128, 256]
        n_res = 6
        C = 32
        D = 16
        self.in_conv = ConvBlock2D("CNA", 3, down_seq[0], 7, 1, 3, use_weight_norm)
        self.down = nn.Sequential(*[DownBlock2D(down_seq[i], down_seq[i + 1], use_weight_norm) for i in range(len(down_seq) - 1)])
        self.mid_conv = nn.Conv2d(down_seq[-1], C * D, 1, 1, 0)
        self.res = nn.Sequential(*[ResBlock3D(C, use_weight_norm) for _ in range(n_res)])
        self.C, self.D = C, D

    def forward(self, x):
        x = self.in_conv(x)                   # 2D conv stem
        x = self.down(x)                      # 2D downsampling blocks
        x = self.mid_conv(x)                  # lift channels to C * D
        N, _, H, W = x.shape
        x = x.view(N, self.C, self.D, H, W)   # unfold channels into a 3D volume
        x = self.res(x)                       # 3D residual blocks
        return x
```
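
A quick shape check against the comment block above (a sketch; it assumes the repo's ConvBlock2D, DownBlock2D, and ResBlock3D are importable):

```python
import torch

extractor = AppearanceFeatureExtractor()
x = torch.randn(2, 3, 256, 256)   # [N, 3, 256, 256] input frames
feats = extractor(x)
print(feats.shape)                # expected: torch.Size([2, 32, 16, 64, 64])
```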


@francqz31

With all due respect, I don't actually believe that Opus or even current LLMs (GPT-4 Turbo, Opus, Google's latest thing whatever the name, Llama 3 400B, etc.) can accurately implement machine learning papers. I tried it multiple times and it just misses so many points, and makes really simple mistakes, as if it doesn't even have a clue what it is writing. Good thing, Mr. John, that you document every step. I think your best shot will be with GPT-5. I think in order to have an advanced LLM implement a machine learning paper, you've got to have some kind of agentic thing like Devin, but with the reasoning of GPT-5 for example: you provide the paper and code similar to the paper you want to implement (for example, you upload the VASA-1 paper and make it fully read the Audio2Head code), and then it starts developing off of that, just like a professional software engineer. What do you think, Mr. John?

@francqz31

If GPT-5 can't do that, then good luck having any kind of LLM implement any machine learning paper before 2026.

@johndpope (Owner) commented May 9, 2024

@francqz31 I agree with most of your thoughts. The world will be a different place when GPT-5 drops. I'd add: don't use ChatGPT-4, use Opus - and if the code it's spitting out is (or feels) off, discard the chat and start afresh with updates. E.g. base code + paper / increment the logic / the LLM goes off on a wrong tangent / discard the chat / feed it updated code, and even give it more context - header files or relevant code from other repos, etc.

I completely rebuilt the MegaPortrait codebase:
https://github.com/johndpope/megaPortrait-hack
I need to wire up the dataloaders - can't decide on the best approach.
johndpope/MegaPortrait-hack#2

UPDATE - I found some loss functions from SamsungLabs in the ROME repo.

• I think this is probably close to what MS are mentioning here.
  [screenshot attached]

This work at SamsungLabs would flow on from MegaPortraits.
[screenshot attached]

UPDATE
@francqz31 - maybe too early to call it, but I just started training MegaPortrait:
https://github.com/johndpope/MegaPortrait-hack

[screenshot attached]

@johndpope (Owner) commented May 30, 2024

OK - so it took me a month, but I believe I got the dependent paper, MegaPortraits, implemented:
https://github.com/johndpope/MegaPortrait-hack/tree/main
There's actually going to be a new code upgrade with video data from FB dropping in July '24 - https://github.com/neeek2303/EMOPortraits

I am running local training on a couple of videos:
johndpope/MegaPortrait-hack#21

The interesting thing with this paper is that there are no keypoints - it's all ResNet feature maps with warping (see the sketch below).
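
For a sense of what "feature maps with warping" means in practice, here is a minimal 2D `grid_sample` sketch (illustrative only - MegaPortraits actually warps 3D volumes, and the flow field here is a random stand-in):

```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 32, 32)  # feature map to warp
# Identity sampling grid in [-1, 1], shape [N, H, W, 2]
grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0), list(feat.shape), align_corners=False)
flow = 0.05 * torch.randn_like(grid)  # stand-in for a predicted offset field
warped = F.grid_sample(feat, grid + flow, align_corners=False)
print(warped.shape)  # torch.Size([1, 64, 32, 32])
```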
UPDATE
I ran some numbers past ChatGPT: at 250 seconds/epoch, 200,000 epochs is 50 million seconds - around 580 days, so like 2 years on a 3090, or about 2 months on an H100.

UPDATE 2 - some warping code is taking a long time - I've chopped it out for now.

johndpope/MegaPortrait-hack#28

@fenghe12

Do you still need a talking-head video dataset? We collected some.

@johndpope (Owner)

Hi @fenghe12 -
Sorry for the late reply - I've been distracted recreating the code for this paper - https://arxiv.org/pdf/2405.07257
https://github.com/johndpope/SPEAK-hack

I would appreciate any help cross-checking the code against the paper.
I've included some test inference code.

If you want to share a link to the videos, I'm happy to grab them.

@johndpope (Owner) commented Jul 28, 2024

This paper by Microsoft - Implicit Motion Function:
https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_Implicit_Motion_Function_CVPR_2024_paper.pdf

I recreated it here:
https://github.com/johndpope/IMF

(Assume it's all wrong - I had to switch in ResNets as the feature extractor (it's not mentioned in the paper), yet it seems to be converging. A rough sketch of the idea is below.)
https://wandb.ai/snoozie/IMF/runs/f9o9vvje?nw=nwusersnoozie
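
The kind of ResNet swap meant here looks roughly like this (a sketch - the layer cut and output shape are assumptions, not the exact IMF code):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights  # torchvision >= 0.13

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
# Keep conv1 through layer3 as a feature extractor; drop layer4, avgpool, fc.
feature_extractor = nn.Sequential(*list(backbone.children())[:-3])

with torch.no_grad():
    feats = feature_extractor(torch.randn(1, 3, 256, 256))
print(feats.shape)  # torch.Size([1, 256, 16, 16])
```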

UPDATE - sorry, this needs completely redoing:
https://github.com/johndpope/IMF/tree/v1
