
Are you planning on training? #5
Open · zsxkib opened this issue Apr 24, 2024 · 14 comments
@zsxkib commented Apr 24, 2024

Let me know if you're planning on training; I could maybe help.

@johndpope (Owner) commented Apr 25, 2024

I want the VoxCeleb2 dataset - it doesn't seem to be available anymore, and the torrents are dead. I did write a data loader on my Emote-hack repo - I will wire it up in a few days.

In the diffusion transformer architecture - from what I understand - they use patches, and I don't see those in the code (spat out by Claude).
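
For reference, patchification in a DiT-style diffusion transformer usually looks something like this (a minimal sketch for illustration - not code from this repo or the VASA paper):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project each one to a token."""
    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        # A strided conv with kernel == stride == patch_size is the standard trick.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                        # x: [N, 3, H, W]
        x = self.proj(x)                         # [N, embed_dim, H/p, W/p]
        return x.flatten(2).transpose(1, 2)      # [N, num_patches, embed_dim]

tokens = PatchEmbed()(torch.randn(1, 3, 256, 256))
print(tokens.shape)  # torch.Size([1, 256, 768])
```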

Realistically, it's better to press for the VOODOO3D code to be released. This is kind of just an academic exercise.

@francqz31

Hey, I just found your repo. Tell me if these work for you - this should be the full VoxCeleb2 dataset:

1. URLs and timestamps:
https://fex.net/s/lmaobde

2. Audio files:
Dev A: Download
Dev B: Download
Dev C: Download
Dev D: Download
Dev E: Download
Dev F: Download
Dev G: Download
Dev H: Download
Dev: Concatenated
Test: Download
Download all parts and concatenate the files using the command `cat vox2_dev_aac* > vox2_aac.zip`.

Video files:
Dev A: Download
Dev B: Download
Dev C: Download
Dev D: Download
Dev E: Download
Dev F: Download
Dev G: Download
Dev H: Download
Dev I: Download
Dev: Concatenated
Test: Download
Download all parts and concatenate the files using the command `cat vox2_dev_mp4* > vox2_mp4.zip`.
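
In case `cat` isn't available (e.g. on Windows), here is a minimal Python sketch that does the same concatenation; the glob patterns assume the part naming above:

```python
import glob

def concat_parts(pattern, out_path):
    # Parts must be joined in lexicographic order, same as the shell glob.
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(pattern)):
            with open(part, "rb") as f:
                while chunk := f.read(1 << 20):  # copy 1 MiB at a time
                    out.write(chunk)

concat_parts("vox2_dev_aac*", "vox2_aac.zip")
concat_parts("vox2_dev_mp4*", "vox2_mp4.zip")
```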

@johndpope (Owner) commented Apr 27, 2024

I'm actually on holiday, away from my CUDA workstation, so I can't run this.

These edits were from ChatGPT - nowadays I'm almost exclusively using Claude:

```python
# Generate holistic facial dynamics using the diffusion transformer
audio_features = batch['audio']      # conditioning: audio
gaze_direction = batch['gaze']       # conditioning: gaze direction
head_distance = batch['distance']    # conditioning: head-to-camera distance
emotion_offset = batch['emotion']    # conditioning: emotion offset
```

If you look here - Claude spat this out - and it seems more closely aligned with the VASA paper:

https://github.com/johndpope/VASA-1-hack/blob/main/train.py
```python
# Extract keypoints from the generated dynamics
kp_s = generated_dynamics[:, :, :3]  # source keypoints
kp_d = generated_dynamics[:, :, 3:]  # driving keypoints

# Rotation matrices (identity placeholders)
Rs = torch.eye(3).unsqueeze(0).repeat(kp_s.shape[0], 1, 1)  # source rotation matrix
Rd = torch.eye(3).unsqueeze(0).repeat(kp_d.shape[0], 1, 1)  # driving rotation matrix

# Call the MotionFieldEstimator
deformation, occlusion, occlusion_2 = motion_field_estimator(appearance_volume, kp_s, kp_d, Rs, Rd)
```

I have to plug this back in as context for Claude.

```python
self.motion_field_estimator = MotionFieldEstimator(model_scale)
```

@trithucxx

> I want the VoxCeleb2 dataset - it doesn't seem to be available anymore, and the torrents are dead. I did write a data loader on my Emote-hack repo - I will wire it up in a few days.
>
> In the diffusion transformer architecture - from what I understand - they use patches, and I don't see those in the code (spat out by Claude).
>
> Realistically, it's better to press for the VOODOO3D code to be released. This is kind of just an academic exercise.

I downloaded it; the links still work. Anyway, how can I start/run your project, bro?

@johndpope (Owner) commented Apr 29, 2024

Hi @trithucxx -

I'm looking at booting up MegaPortrait by upgrading the training for this repo - https://github.com/johndpope/MegaPortrait/
@Kevinfringe had used a couple of directories, but I have some code in the works with decord / mp4s (see the sketch below):
https://github.com/johndpope/Emote-hack/blob/main/Net.py#L1085
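
The decord idea is roughly this (a minimal sketch with a hypothetical clip path - not the Emote-hack loader itself):

```python
from decord import VideoReader, cpu

vr = VideoReader("clip.mp4", ctx=cpu(0))                        # hypothetical mp4 path
frames = vr.get_batch(list(range(0, len(vr), 10))).asnumpy()    # every 10th frame, [T, H, W, 3]
```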

For now, this model Eapp1 needs to be 100% right - otherwise nothing else is going to work.
Or maybe the volumetric part can be sourced from another repo? Can this one do it? I don't know - https://real3dportrait.github.io/

This is the first part of the Appearance Encoder; it generates a 4D tensor of volumetric features `vs`:
https://github.com/johndpope/MegaPortrait/blob/master/model.py#L82

UPDATE
I'm pretty sure we can piggyback off the VOODOO3D paper (code coming in June).

[screenshot attached]

@trithucxx

> I'm looking at booting up MegaPortrait by upgrading the training for this repo - https://github.com/johndpope/MegaPortrait/ @Kevinfringe had used a couple of directories, but I have some code in the works with decord / mp4s: https://github.com/johndpope/Emote-hack/blob/main/Net.py#L1085
>
> For now, this model Eapp1 needs to be 100% right - otherwise nothing else is going to work. Or maybe the volumetric part can be sourced from another repo? Can this one do it? I don't know - https://real3dportrait.github.io/
>
> This is the first part of the Appearance Encoder; it generates a 4D tensor of volumetric features `vs`: https://github.com/johndpope/MegaPortrait/blob/master/model.py#L82
>
> UPDATE: I'm pretty sure we can piggyback off the VOODOO3D paper (code coming in June).

I tested Real3DPortrait; it seems to be inaccurate, and completion takes about 3 hours for a 2-minute talking video (too long). What about the torrent you couldn't download? Hope to see your project run.

@johndpope (Owner) commented May 5, 2024

So a few days ago I was looking at some other code - basically Claude thinks there's enough here to avoid needing the MegaPortrait code, specifically the 4D tensor of volumetric features. This supposedly handles it:
```python
self.appearance_extractor = AppearanceFeatureExtractor()
```

```python
import torch.nn as nn
# ConvBlock2D, DownBlock2D, and ResBlock3D are defined elsewhere in the repo.

class AppearanceFeatureExtractor(nn.Module):
    # 3D appearance feature extractor. Shapes through the network:
    # [N,3,256,256] -> [N,64,256,256] -> [N,128,128,128] -> [N,256,64,64]
    # -> [N,512,64,64] -> [N,32,16,64,64]
    def __init__(self, model_scale='standard'):
        super().__init__()
        use_weight_norm = False
        down_seq = [64, 128, 256]
        n_res = 6
        C = 32
        D = 16
        self.in_conv = ConvBlock2D("CNA", 3, down_seq[0], 7, 1, 3, use_weight_norm)
        self.down = nn.Sequential(*[DownBlock2D(down_seq[i], down_seq[i + 1], use_weight_norm) for i in range(len(down_seq) - 1)])
        self.mid_conv = nn.Conv2d(down_seq[-1], C * D, 1, 1, 0)
        self.res = nn.Sequential(*[ResBlock3D(C, use_weight_norm) for _ in range(n_res)])
        self.C, self.D = C, D

    def forward(self, x):
        x = self.in_conv(x)                   # 2D conv stem
        x = self.down(x)                      # 2D downsampling blocks
        x = self.mid_conv(x)                  # lift channels to C * D
        N, _, H, W = x.shape
        x = x.view(N, self.C, self.D, H, W)   # unfold channels into a 3D volume
        x = self.res(x)                       # 3D residual blocks
        return x
```
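
A quick shape check against the comment block above (a sketch; it assumes the repo's ConvBlock2D, DownBlock2D, and ResBlock3D are importable):

```python
import torch

extractor = AppearanceFeatureExtractor()
x = torch.randn(2, 3, 256, 256)   # [N, 3, 256, 256] input frames
feats = extractor(x)
print(feats.shape)                # expected: torch.Size([2, 32, 16, 64, 64])
```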


@francqz31

With all due respect, I don't actually believe that Opus or even current LLMs (GPT-4 Turbo, Opus, Google's latest thing whatever the name, Llama 3 400B, etc.) can accurately implement machine learning papers. I tried it multiple times and it just misses so many points, and makes really simple mistakes, as if it doesn't even have a clue what it is writing. Good thing, Mr. John, that you document every step. I think your best shot will be with GPT-5. I think in order to have an advanced LLM implement a machine learning paper, you've got to have some kind of agentic thing like Devin, but with the reasoning of GPT-5 for example: you provide the paper and code similar to the paper you want to implement (for example, you upload the VASA-1 paper and make it fully read the Audio2Head code), and then it starts developing off of that, just like a professional software engineer. What do you think, Mr. John?

@francqz31

If GPT-5 can't do that, then good luck having any kind of LLM implement any machine learning paper before 2026.

@johndpope (Owner) commented May 9, 2024

@francqz31 I agree with most of your thoughts. The world will be a different place when GPT-5 drops. I'd add: don't use ChatGPT-4, use Opus - and if the code it's spitting out is (or feels) off, discard the chat and start afresh with updates. E.g. base code + paper / increment the logic / the LLM goes off on a wrong tangent / discard the chat / feed it updated code, and even give it more context - header files or relevant code from other repos, etc.

I completely rebuilt the MegaPortrait codebase:
https://github.com/johndpope/megaPortrait-hack
I need to wire up the dataloaders - can't decide on the best approach.
johndpope/MegaPortrait-hack#2

UPDATE - I found some loss functions from SamsungLabs in the ROME repo.

• I think this is probably close to what MS are mentioning here.
  [screenshot attached]

This work at SamsungLabs would flow on from MegaPortraits.
[screenshot attached]

UPDATE
@francqz31 - maybe too early to call it, but I just started training MegaPortrait:
https://github.com/johndpope/MegaPortrait-hack

[screenshot attached]

@johndpope (Owner) commented May 30, 2024

OK - so it took me a month, but I believe I got the dependent paper, MegaPortraits, implemented:
https://github.com/johndpope/MegaPortrait-hack/tree/main
There's actually going to be a new code upgrade with video data from FB dropping in July '24 - https://github.com/neeek2303/EMOPortraits

I am running local training on a couple of videos:
johndpope/MegaPortrait-hack#21

The interesting thing with this paper is that there are no keypoints - it's all ResNet feature maps with warping (see the sketch below).
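
For a sense of what "feature maps with warping" means in practice, here is a minimal 2D `grid_sample` sketch (illustrative only - MegaPortraits actually warps 3D volumes, and the flow field here is a random stand-in):

```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 32, 32)  # feature map to warp
# Identity sampling grid in [-1, 1], shape [N, H, W, 2]
grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0), list(feat.shape), align_corners=False)
flow = 0.05 * torch.randn_like(grid)  # stand-in for a predicted offset field
warped = F.grid_sample(feat, grid + flow, align_corners=False)
print(warped.shape)  # torch.Size([1, 64, 32, 32])
```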
UPDATE
I ran some numbers past ChatGPT: at 250 seconds/epoch, 200,000 epochs is 50 million seconds - around 580 days, so like 2 years on a 3090, or about 2 months on an H100.

UPDATE 2 - some warping code is taking a long time - I've chopped it out for now.

johndpope/MegaPortrait-hack#28

@fenghe12

Do you still need a talking-head video dataset? We collected some.

@johndpope (Owner)

Hi @fenghe12 -
Sorry for the late reply - I've been distracted recreating the code for this paper - https://arxiv.org/pdf/2405.07257
https://github.com/johndpope/SPEAK-hack

I would appreciate any help cross-checking the code against the paper.
I've included some test inference code.

If you want to share a link to the videos, I'm happy to grab them.

@johndpope (Owner) commented Jul 28, 2024

This paper by Microsoft - Implicit Motion Function:
https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_Implicit_Motion_Function_CVPR_2024_paper.pdf

I recreated it here:
https://github.com/johndpope/IMF

(Assume it's all wrong - I had to switch in ResNets as the feature extractor (it's not mentioned in the paper), yet it seems to be converging. A rough sketch of the idea is below.)
https://wandb.ai/snoozie/IMF/runs/f9o9vvje?nw=nwusersnoozie
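
The kind of ResNet swap meant here looks roughly like this (a sketch - the layer cut and output shape are assumptions, not the exact IMF code):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights  # torchvision >= 0.13

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
# Keep conv1 through layer3 as a feature extractor; drop layer4, avgpool, fc.
feature_extractor = nn.Sequential(*list(backbone.children())[:-3])

with torch.no_grad():
    feats = feature_extractor(torch.randn(1, 3, 256, 256))
print(feats.shape)  # torch.Size([1, 256, 16, 16])
```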

UPDATE - sorry, this needs completely redoing:
https://github.com/johndpope/IMF/tree/v1
