Are you planning on training? #5
Comments
I want the VoxCeleb2 dataset, but it doesn't seem available anymore; the torrents are dead. I did write a data loader on my Emote-hack repo, and I'll wire it up into the diffusion transformer architecture in a few days. From what I understand they use patches, and I don't see those in the code (spat out by Claude). Realistically it's better to press for the original code to be released; this is kind of just an academic exercise.
Hey, I just found your repo. Tell me if these work for you; this should be the full VoxCeleb2 dataset.
Audio files:
Video files:
Actually on holiday, away from my workstation with CUDA, so I can't run this. These edits were from ChatGPT; nowadays I'm almost exclusively using Claude, so I'll have to plug this back in as context for Claude.
I downloaded them; they still work. Anyway, how can I start/run your project, bro?
Hi @trithucxx - I'm looking at booting up MegaPortrait by upgrading the training for this repo: https://github.com/johndpope/MegaPortrait/ For now, this model (Eapp1) needs to be 100% correct, otherwise everything else isn't going to work. It's the first part of the Appearance Encoder, which generates a 4D tensor of volumetric features. UPDATE
I tested Real3DPortrait; it seems to be inaccurate, and the video took 3 hours of completion time for a 2-minute talking video (too long). As for the torrent, I could not download it. Hope to see your project run.
So a few days ago I was looking at some other code. Basically, Claude thinks there's enough here to avoid needing the MegaPortrait code, specifically the 4D tensor of volumetric features:
import torch
import torch.nn as nn

# ConvBlock2D, DownBlock2D and ResBlock3D are helper blocks defined
# elsewhere in this repo (conv+norm+activation, stride-2 downsampling,
# and 3D residual blocks respectively).

class AppearanceFeatureExtractor(nn.Module):
    """3D appearance feature extractor.

    Shape flow for a 256x256 input:
        [N, 3, 256, 256]    input image
        [N, 64, 256, 256]   after in_conv
        [N, 128, 128, 128]  after first DownBlock2D
        [N, 256, 64, 64]    after second DownBlock2D
        [N, 512, 64, 64]    after mid_conv (512 = C * D)
        [N, 32, 16, 64, 64] reshaped into a volumetric 4D tensor per sample
    """

    def __init__(self, model_scale='standard'):
        super().__init__()
        use_weight_norm = False
        down_seq = [64, 128, 256]
        n_res = 6
        C = 32  # channels of the volumetric features
        D = 16  # depth of the volumetric features
        self.in_conv = ConvBlock2D("CNA", 3, down_seq[0], 7, 1, 3, use_weight_norm)
        self.down = nn.Sequential(*[
            DownBlock2D(down_seq[i], down_seq[i + 1], use_weight_norm)
            for i in range(len(down_seq) - 1)
        ])
        self.mid_conv = nn.Conv2d(down_seq[-1], C * D, 1, 1, 0)
        self.res = nn.Sequential(*[ResBlock3D(C, use_weight_norm) for _ in range(n_res)])
        self.C, self.D = C, D

    def forward(self, x):
        x = self.in_conv(x)                   # [N, 64, 256, 256]
        x = self.down(x)                      # [N, 256, 64, 64]
        x = self.mid_conv(x)                  # [N, C*D, 64, 64]
        N, _, H, W = x.shape
        x = x.view(N, self.C, self.D, H, W)   # lift 2D features into a 3D volume
        x = self.res(x)                       # [N, 32, 16, 64, 64]
        return x
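The key move in this block is the `view` that lifts the 2D feature map into a volumetric tensor. A minimal numpy sketch of just that reshape (shapes taken from the comments above; numpy's `reshape` stands in for torch's `view` here, since both reinterpret the channel axis without copying or reordering data):

```python
import numpy as np

# Stand-in for the output of mid_conv: [N, C*D, H, W] with C=32, D=16.
N, C, D, H, W = 2, 32, 16, 64, 64
feat_2d = np.arange(N * C * D * H * W, dtype=np.float32).reshape(N, C * D, H, W)

# The same view(N, C, D, H, W) the module performs: the channel axis is
# split into (channels C, depth D).
feat_3d = feat_2d.reshape(N, C, D, H, W)

print(feat_3d.shape)  # (2, 32, 16, 64, 64)

# Channel index c*D + d of the 2D map becomes slice [c, d] of the volume.
c, d = 5, 3
assert np.array_equal(feat_3d[0, c, d], feat_2d[0, c * D + d])
```

In other words, the 512 channels coming out of `mid_conv` are reinterpreted as a 32-channel, 16-slice-deep volume at 64x64 spatial resolution.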
With all due respect, I don't actually believe that Opus or even current LLMs (GPT-4 Turbo, Opus, Google's latest model whatever its name, Llama 3 400B, etc.) can accurately implement machine learning papers. I tried it multiple times, and it just misses so many points and makes really simple mistakes, as if it doesn't even have a clue what it is writing. It's a good thing, Mr. John, that you document every step. I think your best shot will be with GPT-5. I think in order to have an advanced LLM implement a machine learning paper, you've got to have some kind of agentic system like Devin but with the reasoning of GPT-5, for example: you provide the paper plus code similar to the paper you want to implement (for example, you upload the VASA-1 paper and make it fully read the Audio2Head code), and then it starts developing off of that, just like professional software engineers. What do you think, Mr. John?
If GPT-5 can't do that, then good luck having any kind of LLM implement any machine learning paper before 2026.
@francqz31 I agree with most of your thoughts. The world will be a different place when GPT-5 drops. I'd add: don't use ChatGPT-4, use Opus, and if the code it's spitting out is (or feels) off, discard the chat and start afresh with updates. E.g. base code + paper, increment the logic; when the LLM goes off on a wrong tangent, discard the chat and feed it the updated code, even giving it more context: header files or relevant code from other repos, etc. I completely rebuilt the MegaPortrait codebase. UPDATE - I found some loss functions from SamsungLabs in the ROME repo. This work at SamsungLabs would flow on from MegaPortraits. UPDATE
OK, so it took me a month, but I believe I got the dependent paper, MegaPortraits, implemented. I am running local training on a couple of videos. The interesting thing with this paper is that there are no keypoints; it's all ResNet feature maps with warping. UPDATE 2 - some warping code is taking a long time; I've chopped it out for now.
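For anyone following along: "warping" here means backward-sampling a feature map through a per-pixel displacement field (in a PyTorch codebase this is typically done with `torch.nn.functional.grid_sample`). The helper below is a hypothetical nearest-neighbour numpy illustration of the idea, not the repo's actual implementation:

```python
import numpy as np

def warp_nearest(feat, flow):
    """Backward-warp feat by flow (nearest-neighbour sampling).

    feat: [H, W, C] feature map.
    flow: [H, W, 2] per-pixel (dy, dx) offsets; output pixel (y, x)
          reads from feat[y + dy, x + dx], clamped to the image bounds.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, H - 1)
    sx = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, W - 1)
    return feat[sy, sx]

# Identity flow leaves the map unchanged; a constant (0, 1) flow
# shifts content one pixel to the left.
feat = np.random.rand(8, 8, 4).astype(np.float32)
assert np.array_equal(warp_nearest(feat, np.zeros((8, 8, 2))), feat)
shifted = warp_nearest(feat, np.full((8, 8, 2), (0.0, 1.0)))
assert np.array_equal(shifted[:, :-1], feat[:, 1:])
```

The real thing uses bilinear interpolation over 3D volumes and is differentiable, which is what makes it expensive compared to a lookup like this.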
Do you still need a talking-head video dataset? We collected some.
Hi @fenghe12 - I would appreciate any help in cross-checking the code against the paper. If you want to share a link to the videos, I'm happy to grab them.
This paper by Microsoft, Implicit Motion Function, I recreate here (assume it's all wrong; I had to swap in ResNets as the feature extractor, which isn't mentioned in the paper, yet it seems to be converging). UPDATE - sorry, this needs completely redoing.
Let me know if you're planning on training; I could maybe help.