CELEB DF #38
You probably downloaded the wrong version of Celeb-DF v2, or some file is missing. That file should be in the root folder of the dataset, as indicated on their official GitHub page https://github.com/yuezunli/celeb-deepfakeforensics
Right, I didn't notice this file in my folder. I copied only the videos to my other computer a few months ago, without this one file. Thank you very much :)
Hey @MartaUch , have you tried playing with the parameter? Edoardo
Hi @CrohnEngineer, when I set --source to the folder of 800 videos this error appeared: "FileNotFoundError: Unable to find C:\Users\seminarium\Marta_U\praktyka\DataSet\Celeb_DF_model6\Celeb-real\id33_0000.mp4. Are you sure that C:\Users\seminarium\Marta_U\praktyka\DataSet\Celeb_DF_model6 is the correct source directory for the video you indexed in the dataframe?"
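For reference, this error comes from joining the root passed with --source to the relative path stored in the indexed DataFrame. A minimal sketch of that logic (the function name and message wording are illustrative, not the repo's actual code):

```python
from pathlib import Path

def resolve_video(source_root: str, rel_path: str) -> Path:
    """Join the dataset root (--source) with the relative path stored in the
    indexed DataFrame; the video file must exist at the resulting location."""
    full = Path(source_root) / rel_path
    if not full.exists():
        raise FileNotFoundError(
            f"Unable to find {full}. Are you sure that {source_root} "
            "is the correct source directory for the video you indexed?"
        )
    return full

# The index stores paths relative to the root, e.g. "Celeb-real/id33_0000.mp4",
# so --source must point at the folder that directly contains Celeb-real/.
```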
Hey @MartaUch , just a quick check to verify that you made all the necessary steps (and remember, please always refer to the scripts if you have any doubt about how to use our code: in this case we will look at make_dataset.sh ):
Sorry if I sound nitpicky, but I'm not 100% sure about your situation. You have a reduced version of the Celeb-DF dataset, right? Edoardo
Hi @CrohnEngineer, I understand your doubts about my problem, so I will try to explain it better. These were my steps: If I may, I have one more question about the next step, training. I want to split my dataset into 80% for training and 20% for validation. I understand that --traindb and --valdb should refer to two different folders.
Also, after the training step, my test set doesn't have to contain videos from these 800 videos, does it? Because if that is necessary, then I need to split my dataset differently (for example 70% for training, 15% for validation and 15% for test). Sorry for so many questions; I'm writing my master's dissertation about deepfakes and I want to be sure of each step I make. I'm really grateful for your help. Best,
Hey @MartaUch , thank you for all the info, now I have a clearer picture in mind!
Perfect, that's good to know :)
Actually, the parameters
This is true, so what I would suggest is to create a 70-15-15 split by modifying the function get_split_df from line 80.
Good luck with your dissertation 💪 and let us know how it goes :) Edoardo
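For anyone landing here, a 70-15-15 split of the video DataFrame could be sketched like this (a standalone helper under assumed column names, not the repo's actual get_split_df):

```python
import pandas as pd

def split_70_15_15(df: pd.DataFrame, seed: int = 41) -> dict:
    """Shuffle the indexed video DataFrame and cut it into
    70% train, 15% validation and 15% test partitions."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n = len(shuffled)
    n_train = int(n * 0.70)
    n_val = int(n * 0.15)
    return {
        "train": shuffled.iloc[:n_train],
        "val": shuffled.iloc[n_train:n_train + n_val],
        "test": shuffled.iloc[n_train + n_val:],  # remainder, ~15%
    }
```

With 800 videos this yields 560/120/120 train/val/test videos, with no video appearing in more than one partition.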
Hi @CrohnEngineer, Best,
Hey @MartaUch ,
We take the fake videos starting from line 93. Edoardo
@CrohnEngineer, Best,
Hey @MartaUch , there was a small bug in the type of argument required by the script for Edoardo
Hi @CrohnEngineer, I've just got my results from the test and I'm a little confused, because the number of testing videos was equal:
My dataset contains 800 videos, and in split.py I split it into 0.7 (train), 0.15 (val) and 0.15 (test). At the training step I got correct values:
This is a piece of my code from split.py: I've also calculated the avg score for these testing videos and I just want to ask for advice: do these results seem so bad because I trained my model for too short a time? I set --maxiter to 100 because I thought it would be enough. Best,
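As a side note on the average score mentioned above, per-frame network outputs are typically logits, so one common convention is to average them per video and squash with a sigmoid (a sketch of that convention; check the repo's test scripts for the aggregation actually used there):

```python
import numpy as np

def video_score(frame_logits: np.ndarray) -> float:
    """Average per-frame logits into one video-level score, then map it
    to a [0, 1] fake probability with a sigmoid. Under this (assumed)
    convention, a score above 0.5 would be read as FAKE."""
    mean_logit = float(np.mean(frame_logits))
    return 1.0 / (1.0 + np.exp(-mean_logit))
```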
Hello @CrohnEngineer, I've already figured out why the number of testing videos didn't include all videos, especially the fake ones. I think it is because I selected only 400 fake videos, but they don't necessarily correspond to the original videos my dataset contains. Do you think that might be the reason? And still, I'm not sure about my results and whether I should train the model longer. Maybe my dataset is too small. Best,
Hey @MartaUch ,
That might actually be the reason. You should check that for every FAKE video in your small dataset there exists a REAL counterpart, to be sure that your dataset ends up balanced in all splits.
Your dataset is indeed quite small, and 100 iterations is definitely a small number for training your model. Edoardo
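That counterpart check can be sketched with a small pandas helper. The column names below ('name', 'label' with True = fake, and 'original' pointing at the source REAL video) are hypothetical, so adapt them to however your DataFrame is actually indexed:

```python
import pandas as pd

def find_orphan_fakes(df: pd.DataFrame) -> pd.DataFrame:
    """Return the FAKE videos whose original REAL video is missing
    from the dataset, so you can drop them and keep the splits balanced."""
    real_names = set(df.loc[~df["label"], "name"])   # names of all REAL videos
    fakes = df[df["label"]]                          # rows flagged as fake
    return fakes[~fakes["original"].isin(real_names)]
```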
Hi @CrohnEngineer, Thank you very much for your advice. I prepared my dataset once more and now each fake video has its real counterpart. As far as I can see, the split works properly. Now I have 540 real and 540 fake videos. I also set the maximum number of iterations to 550 and I hope it will be enough. To be honest, I think changing the training procedure in train_model.py would be too difficult for me now, so I'll keep it unchanged. Best,
Hello,
I'm trying to create a model for the Celeb-DF dataset and I have a problem with creating the indexes. I'm not sure what the file "List_of_testing_videos.txt" should contain. I only gave two paths: one to the folder containing the two folders with synthesized and real videos, and the second to save the video DataFrames.
Thank you in advance
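For context, List_of_testing_videos.txt ships with the official Celeb-DF v2 release and, as far as I can tell, holds one test video per line as a numeric label followed by a path relative to the dataset root (the exact label semantics, e.g. 1 = real / 0 = fake, should be double-checked against the official Celeb-DF repository). A parser sketch under that assumption:

```python
def parse_testing_list(text: str) -> list:
    """Parse the contents of List_of_testing_videos.txt into
    (label, relative_path) tuples; label semantics are an assumption
    and should be verified against the official Celeb-DF repo."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, path = line.split(maxsplit=1)
        entries.append((int(label), path))
    return entries
```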