Advice on quality of output (not an issue) #259

Fibonacci134 opened this issue May 6, 2022 · 18 comments

@Fibonacci134

In case someone overlooked it, you can preserve quality a lot better by changing the temporary frame files from JPG, which is lossy by design, to PNG, which is lossless. The folder will grow quickly because PNG files are much larger, but you can always delete it once you have your output video.

@pidgeon777

Interesting. How would you actually implement this? I mean, what code modifications are needed to use PNG instead?

@Fibonacci134

It's quite simple: go to videoswap.py and, near the bottom where you see jpg, just replace it with png. The difference is quite noticeable. You can also set up MoviePy so it doesn't recompress the video file by exporting it as a .mov file, which is also lossless, so no matter how many times you edit it, it will never lose quality. I will upload the altered files to my GitHub today and give some instructions on a few things, like what to do when the image is "not iterable" and how to get past that, and also how to use GFPGAN to quickly enhance the faces to HD and recompile the video at a higher resolution.
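
Roughly speaking, the change boils down to the two spots sketched below (not the repo's exact code, just the pattern: the file writes each frame with cv2.imwrite and later globs the frames back up to build the output video):

import glob
import os
import cv2

def write_frame(temp_results_dir, frame_index, frame):
    # was 'frame_{:0>7d}.jpg' -- switch the extension to lossless PNG
    cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)

def collect_frames(temp_results_dir):
    # the glob pattern must be updated to match, otherwise the list comes back
    # empty and ImageSequenceClip fails with an IndexError
    return sorted(glob.glob(os.path.join(temp_results_dir, '*.png')))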

@pidgeon777

@Fibonacci134 yours would be a tremendous contribution and I will eagerly await it.

Many thanks in advance 👍.

@Fibonacci134

No problem at all! I uploaded videoswap.py, and I will be posting a tutorial on how to use GFPGAN to enhance the output. I will also post a reverse2original.py that is a bit faster at inference; I'm still working on it, so it is a work in progress.

@razr112

razr112 commented May 9, 2022

So yesterday after reading this post I downloaded GFPGAN to try it out. All I can say is... Wow. The results are incredible. Thanks for the tip.

@pidgeon777

@razr112 how does GFPGAN handle multiple faces in a picture?

Also, what if the face is not looking at the camera, but for example to the left or the right?

@Fibonacci134 once you've tested your improved output-quality code, and if you wouldn't mind sharing it, please kindly let us know 👍.

@Fibonacci134

Yes, GFPGAN is amazing!! Quick tip: if you want to enhance just the faces (it's a lot quicker), you can add --bg_upsampler none and it will only do the faces. You can also have it save only the "restored_imgs" output by altering the inference.py file. To convert an entire video into sequential images:

ffmpeg -i (name of your video).mp4 -vf fps=25 out%d.png

To convert back to video once the images are enhanced:

ffmpeg -f image2 -framerate 25 -i out%d.png -vcodec libx264 -crf 22 video.mp4

Model v1.3 is really good, but I prefer v1.
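
For reference, a face-only GFPGAN pass over a folder of extracted frames looks roughly like this (assuming the stock inference_gfpgan.py script from the GFPGAN repo; check its README for the exact flags):

python inference_gfpgan.py -i whole_imgs -o results -v 1.3 --bg_upsampler none

The restored frames then land in results/restored_imgs.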

@razr112

razr112 commented May 11, 2022

Where exactly do I place that code in the inference.py file?

@Fibonacci134

@Fibonacci134

> Where exactly do I place that code in the inference.py file?

Hey, that code isn't meant to be put in inference.py; it's just a normal FFmpeg command. To make things simple, all you have to do is install FFmpeg and, on Windows, add the FFmpeg program to the PATH via environment variables. If you are on Linux there is no need for that; it will work without any extra effort. The steps are as follows:

- Put the video in an empty folder.
- Right-click inside the folder and open a terminal there.
- Type the first command to extract the frames, then remove the video. Now just use that folder in place of GFPGAN's whole_imgs folder.
- To convert back to video, run the second command while inside the restored_imgs folder.

Now that you've mentioned it, I think I will add the script into the GFPGAN script to automate the process. Give me a couple of days lol
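
In the meantime, the whole chain could be scripted along these lines (a rough sketch only; the paths, fps and inference_gfpgan.py flags are assumptions to adapt, and it has to be run from the GFPGAN folder or with full paths):

import os
import subprocess

video = "input.mp4"        # source video (assumed name)
frames_dir = "frames"      # extracted frames go here
results_dir = "results"    # GFPGAN writes a restored_imgs subfolder in here

os.makedirs(frames_dir, exist_ok=True)

# 1. Extract frames (same ffmpeg command as above)
subprocess.run(["ffmpeg", "-i", video, "-vf", "fps=25",
                os.path.join(frames_dir, "out%d.png")], check=True)

# 2. Enhance only the faces with GFPGAN
subprocess.run(["python", "inference_gfpgan.py", "-i", frames_dir, "-o", results_dir,
                "-v", "1.3", "--bg_upsampler", "none"], check=True)

# 3. Re-encode the restored frames into a video (same ffmpeg command as above)
subprocess.run(["ffmpeg", "-f", "image2", "-framerate", "25",
                "-i", os.path.join(results_dir, "restored_imgs", "out%d.png"),
                "-vcodec", "libx264", "-crf", "22", "enhanced.mp4"], check=True)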

@razr112

razr112 commented May 11, 2022

Ah okay. I already use FFmpeg commands to extract the frames. I just couldn't figure out how to implement it into the inference file.

> Now that you've mentioned it, I think I will add the script into the GFPGAN script to automate the process. Give me a couple of days lol

Awesome! Looking forward to it. Implementing the script is way above my level of beginner coding knowledge lol.

@Fibonacci134

@Fibonacci134

Ohh okay, awesome. Lol, it's just that there are generally more Windows users, and most are usually not too familiar with FFmpeg. It's great that you use it, such a handy little tool. And no worries bro, we're all beginners in the grand scheme of things lol. I'm guessing you're also self-taught, so you know there's never any structure; we just try to learn things as they come 😂. Will hopefully get to work on some stuff this weekend.

@cheetahfightfx

I use FFmpeg and GFPGAN to get better output, and the difference is just legendary. But GFPGAN is a resource-intensive and slow algorithm, which is the only con I have found.

@epicstar7

Implementing GFPGAN or GPEN into the main swap pipeline would be an amazing improvement to this repo. If it could be implemented in a similar way to dot, it would be easy to use as an option before swapping, --gpen_type 512 for instance.

Are there any plans to implement something like this in SimSwap?
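
For illustration only, such an option might be exposed along these lines (a hypothetical sketch; neither the flag nor the hook exists in SimSwap, and the values are just examples):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--gpen_type', type=int, default=None, choices=[256, 512],
                    help='if set, run a GPEN/GFPGAN face enhancer at this resolution')
args = parser.parse_args([])  # empty list so the sketch runs without CLI arguments

def maybe_enhance(face_crop, enhancer=None):
    # enhancer would be whatever restoration model gets loaded for --gpen_type;
    # with no enhancer the swapped crop passes through untouched
    return face_crop if enhancer is None else enhancer(face_crop)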

@fitzgeraldja

@epicstar7 you can include GFPGAN cleaning as a step applied only to the masked results, for minimal overhead, by adding just a few cells to the colab, at least for single-face swaps in a video (I haven't tried the multi-face case, but it should be relatively straightforward to extend). First, get all the necessary packages:

# Clone GFPGAN and enter the GFPGAN folder
%cd /content
!rm -rf GFPGAN
!git clone https://github.com/TencentARC/GFPGAN.git
%cd GFPGAN

# Set up the environment
# Install basicsr - https://github.com/xinntao/BasicSR
# We use BasicSR for both training and inference
!BASICSR_JIT='True' BASICSR_EXT=True pip install basicsr
# Install facexlib - https://github.com/xinntao/facexlib
# We use face detection and face restoration helper in the facexlib package
!pip install facexlib
# Install other dependencies
!pip install -r requirements.txt
!python setup.py develop
!pip install realesrgan  # used for enhancing the background (non-face) regions
# Download the pre-trained model
# !wget https://github.com/TencentARC/GFPGAN/releases/download/v0.2.0/GFPGANCleanv1-NoCE-C2.pth -P experiments/pretrained_models
# Now we use the V1.3 model for the demo
!wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth -P experiments/pretrained_models

Then write a new version of videoswap to use. The version below also saves frames as lossless PNGs rather than JPGs, which can improve the quality of the results at the cost of extra disk space:

%%writefile /content/SimSwap/util/videoswap_gfpgan.py
import os 
import torch
from torchvision.transforms.functional import normalize
from torchvision.transforms import Resize, ToTensor, Normalize, Compose
from basicsr.utils import tensor2img

# Resize swapped face crops to the 512x512 input GFPGAN expects and map them to [-1, 1]
gfpgan_transform_upsample = Compose([
    Resize([512, 512]),
    Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

# Resize GFPGAN's 512x512 output back down to the SimSwap crop size
def gfpgan_downsampler(crop_size):
  return Compose([
    Resize([int(crop_size), int(crop_size)]),
])

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

os.chdir('/content/GFPGAN')
from gfpgan.archs.gfpganv1_clean_arch import GFPGANv1Clean

arch = 'clean'
channel_multiplier = 2
model_name = 'GFPGANv1.3'
model_path = os.path.join('experiments/pretrained_models', model_name + '.pth')

if arch == 'clean':
  gfpgan = GFPGANv1Clean(
      out_size=512,
      num_style_feat=512,
      channel_multiplier=channel_multiplier,
      decoder_load_path=None,
      fix_decoder=False,
      num_mlp=8,
      input_is_latent=True,
      different_w=True,
      narrow=1,
      sft_half=True)

loadnet = torch.load(model_path)
if 'params_ema' in loadnet:
    keyname = 'params_ema'
else:
    keyname = 'params'
gfpgan.load_state_dict(loadnet[keyname], strict=True)
gfpgan.eval()
gfpgan = gfpgan.to(device)


os.chdir('/content/SimSwap')

'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:52
Description: 
'''
import os 
import cv2
import glob
import torch
import shutil
import numpy as np
from tqdm import tqdm
from util.reverse2original import reverse2wholeimage
import moviepy.editor as mp
from moviepy.editor import AudioFileClip, VideoFileClip 
from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
import  time
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from parsing_model.model import BiSeNet



def video_swap(video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results', crop_size=224, no_simswaplogo = False,use_mask =False):
    video_forcheck = VideoFileClip(video_path)
    if video_forcheck.audio is None:
        no_audio = True
    else:
        no_audio = False

    del video_forcheck

    if not no_audio:
        video_audio_clip = AudioFileClip(video_path)

    video = cv2.VideoCapture(video_path)
    logoclass = watermark_image('./simswaplogo/simswaplogo.png')
    ret = True
    frame_index = 0

    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

    # video_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))

    # video_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    
    fps = video.get(cv2.CAP_PROP_FPS)
    if os.path.exists(temp_results_dir):
        shutil.rmtree(temp_results_dir)

    spNorm =SpecificNorm()
    if use_mask:
        n_classes = 19
        net = BiSeNet(n_classes=n_classes)
        net.cuda()
        save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
        net.load_state_dict(torch.load(save_pth))
        net.eval()
    else:
        net =None

    def _totensor(array):
        tensor = torch.from_numpy(array)
        img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
        return img.float().div(255)

    gfpgan_transform_downsample = gfpgan_downsampler(crop_size)

    def gfpgan_enhance(img_tensor):
      # Upsample the swapped crop to 512x512, restore it with GFPGAN, then
      # resize the restored face back down to crop_size and return it as a tensor
      img_tensor = gfpgan_transform_upsample(img_tensor.unsqueeze(0))
      output = gfpgan(img_tensor, return_rgb=True, weight=0.5)[0]
      down_img = tensor2img(gfpgan_transform_downsample(output).squeeze(0), rgb2bgr=True, min_max=(-1, 1))[:, :, [2, 1, 0]]
      return _totensor(down_img)


    # while ret:
    for frame_index in tqdm(range(frame_count)): 
        ret, frame = video.read()
        if  ret:
            detect_results = detect_model.get(frame,crop_size)

            if detect_results is not None:
                # print(frame_index)
                if not os.path.exists(temp_results_dir):
                        os.mkdir(temp_results_dir)
                frame_align_crop_list = detect_results[0]
                frame_mat_list = detect_results[1]
                swap_result_list = []
                frame_align_crop_tenor_list = []
                for frame_align_crop in frame_align_crop_list:

                    # BGR TO RGB
                    # frame_align_crop_RGB = frame_align_crop[...,::-1]

                    frame_align_crop_tenor = _totensor(cv2.cvtColor(frame_align_crop,cv2.COLOR_BGR2RGB))[None,...].cuda()

                    swap_result = swap_model(None, frame_align_crop_tenor, id_vetor, None, True)[0]
                    # print(swap_result.shape)
                    # input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
                    # img = cv2.resize(img, (512, 512))
                    # restore faces and background if necessary
                    
                    # these steps I think are identical to before
                    # cropped_face_t = img2tensor(cropped_face / 255., bgr2rgb=True, float32=True)
                    # normalize(cropped_face_t, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
                    # cropped_face_t = cropped_face_t.unsqueeze(0).to(self.device)
                    # output = self.gfpgan(cropped_face_t, return_rgb=False, weight=weight)[0]
                    # # convert to image
                    # restored_face = tensor2img(output.squeeze(0), rgb2bgr=True, min_max=(-1, 1))
                    swap_result = gfpgan_enhance(swap_result)
                    # print(swap_result.shape)

                    cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
                    swap_result_list.append(swap_result)
                    frame_align_crop_tenor_list.append(frame_align_crop_tenor)

                    

                reverse2wholeimage(frame_align_crop_tenor_list,swap_result_list, frame_mat_list, crop_size, frame, logoclass,\
                    os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)),no_simswaplogo,pasring_model =net,use_mask=use_mask, norm = spNorm)

            else:
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame = frame.astype(np.uint8)
                if not no_simswaplogo:
                    frame = logoclass.apply_frames(frame)
                cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.png'.format(frame_index)), frame)
        else:
            break

    video.release()

    # image_filename_list = []
    path = os.path.join(temp_results_dir,'*.png')
    image_filenames = sorted(glob.glob(path))

    clips = ImageSequenceClip(image_filenames,fps = fps)

    if not no_audio:
        clips = clips.set_audio(video_audio_clip)


    clips.write_videofile(save_path,audio_codec='aac')

Then finally run a slightly modified version of the original script. This version also removes the watermark placed by default, and lets you choose whether to use the 224 or 512 crop-size version of the model. (NB: if choosing 512, you will also need to add

!wget https://github.com/neuralchen/SimSwap/releases/download/512_beta/512.zip
!unzip ./512.zip -d ./checkpoints

above the original checkpoint download.)

%cd /content/SimSwap
import cv2
import torch
import fractions
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.videoswap_gfpgan import video_swap
import os

def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0

transformer = transforms.Compose([
        transforms.ToTensor(),
        #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

transformer_Arcface = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

opt = TestOptions()
opt.initialize()
opt.parser.add_argument('-f') ## dummy arg to avoid bug
opt = opt.parse()
opt.pic_a_path = '/path/to/desired/face.png' ## or replace it with image from your own google drive
opt.video_path = '/path/to/target/video.mp4' ## or replace it with video from your own google drive
opt.output_path = '/path/to/output/video.mp4'
opt.temp_path = './tmp'
opt.Arc_path = './arcface_model/arcface_checkpoint.tar'
opt.isTrain = False
opt.use_mask = True  ## new feature up-to-date
opt.no_simswaplogo = True

start_epoch, epoch_iter = 1, 0
crop_sizes = [224,512]
opt.crop_size = crop_sizes[0]
crop_size = opt.crop_size

torch.nn.Module.dump_patches = True
if crop_size == 512:
    opt.which_epoch = 550000
    opt.name = '512'
    mode = 'ffhq'
else:
    mode = 'None'

model = create_model(opt)
model.eval()

app = Face_detect_crop(name='antelope', root='./insightface_func/models')
app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640),mode=mode)
with torch.no_grad():
    pic_a = opt.pic_a_path
    # img_a = Image.open(pic_a).convert('RGB')
    img_a_whole = cv2.imread(pic_a)
    img_a_align_crop, _ = app.get(img_a_whole,crop_size)
    img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0],cv2.COLOR_BGR2RGB)) 
    img_a = transformer_Arcface(img_a_align_crop_pil)
    img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

    # pic_b = opt.pic_b_path
    # img_b_whole = cv2.imread(pic_b)

 
 
 
    # img_b_align_crop, b_mat = app.get(img_b_whole,crop_size)
    # img_b_align_crop_pil = Image.fromarray(cv2.cvtColor(img_b_align_crop,cv2.COLOR_BGR2RGB)) 
    # img_b = transformer(img_b_align_crop_pil)
    # img_att = img_b.view(-1, img_b.shape[0], img_b.shape[1], img_b.shape[2])

    # convert numpy to tensor
    img_id = img_id.cuda()
    # img_att = img_att.cuda()

    #create latent id
    img_id_downsample = F.interpolate(img_id, size=(112,112))
    latend_id = model.netArc(img_id_downsample)
    latend_id = F.normalize(latend_id, p=2, dim=1)

    video_swap(opt.video_path, latend_id, model, app, opt.output_path,temp_results_dir=opt.temp_path,\
        no_simswaplogo=opt.no_simswaplogo,use_mask=opt.use_mask,crop_size=crop_size)

hope that helps!

@DrBlou

DrBlou commented Nov 9, 2022

@fitzgeraldja Oh my god, man! You're my hero, thank you.

@zecretaccount

@fitzgeraldja This is really awesome. I can't get this version to work on my PC though, only https://github.com/mike9251/simswap-inference-pytorch.
Any chance you would adapt the code for that repo? :)

@ziko2222

> It's quite simple: go to videoswap.py and, near the bottom where you see jpg, just replace it with png.

I did that, and then I got this error:

Traceback (most recent call last):
  File "test_video_swapsingle.py", line 86, in <module>
    no_simswaplogo=opt.no_simswaplogo, use_mask=opt.use_mask, crop_size=crop_size)
  File "C:\SimSwap\SimSwap-main\util\videoswap.py", line 115, in video_swap
    clips = ImageSequenceClip(image_filenames, fps=fps)
  File "C:\Users\amt\anaconda3\envs\simswap\lib\site-packages\moviepy\video\io\ImageSequenceClip.py", line 64, in __init__
    if isinstance(sequence[0], str):
IndexError: list index out of range

@Solenyalyl

@ziko2222, have you solved it? I've run into the same problem.
