
Compressed model inference speed is slower than original pytorch pix2pix #62

Open
mpottinger opened this issue Dec 27, 2020 · 7 comments

Comments

@mpottinger

Hello, I am just curious. I have adapted test.py to do real-time inference on a webcam, made the exact same modifications to https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix, and set both to run inference on CPU only, since my goal is to run the model on a mobile phone.

While the memory usage of the compressed model is much smaller (only 5 MB vs. 200 MB), inference actually seems to be slower with the compressed model. I am getting 4 FPS with test_compressed.sh for edges2shoes_r, while in the original pytorch-CycleGAN-and-pix2pix code I am getting 8 FPS. Is this normal? I am sure the modifications I made to the inference code are the same in both repos, and it looks like the difference is actual model latency.

Having 5 MB instead of 200 MB of RAM usage is a dramatic decrease, but is there also supposed to be a dramatic speedup?

@mpottinger
Author

Update: one thing I missed was the PyTorch version; the other repo was using a newer one. I upgraded to 1.7.1 for this repo and inference speed is now identical at 8 FPS for the compressed model. So at least it is not slower, but there is no speed benefit either. Still a 40x reduction in memory use, though!

@lmxyy
Collaborator

lmxyy commented Dec 28, 2020

Hi! I am wondering which model you used. There should be some latency reduction, as suggested in our paper. We've also released the code for measuring latency in our repo. Could you double-check whether there are any differences between your latency script and ours?

@mpottinger
Author

mpottinger commented Dec 28, 2020

I am simply timing the inference within test.py like this:

```python
start_time = time.time()
model.test()  # run inference
print("test FPS: ", 1 / (time.time() - start_time))  # FPS
```

I also overlooked that I was using my own custom model in the original PyTorch code, but since that model is significantly larger than the compressed edges2shoes_r model, I expected some speedup even when comparing different models.

I am also finding pix2pix particularly difficult to convert to TFLite or PyTorch Mobile for inference on Android, so I might be forced to run it in Python on Android anyway, which cannot be multithreaded to take advantage of the lower memory footprint.

It would be great on an embedded device such as a Jetson Nano, but unfortunately I am targeting Android only.

@seekingdeep

@mpottinger upload the original and compressed models, along with some test images

@mpottinger
Author

OK, thank you. I am currently working on my app that will make use of the models. I have figured out how to convert the model for mobile via ONNX and have run inference successfully, so as I do more tests, if I keep seeing the same results, I will upload the models.
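
For reference, the conversion is roughly along these lines (a minimal sketch, not my exact script; the generator attribute `model.netG`, the 256x256 input size, and the output file name are assumptions based on the test code in this repo):

```python
import torch

# Assumes `model` was built via create_model(opt) as in test.py and that
# model.netG is the generator; 1x3x256x256 matches the edges2shoes_r input.
dummy_input = torch.randn(1, 3, 256, 256)
torch.onnx.export(
    model.netG,            # generator network to export
    dummy_input,           # example input used for tracing
    "netG.onnx",           # output file name (hypothetical)
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)
```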

@seekingdeep

seekingdeep commented Jan 2, 2021

How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size?
Also, please share your email so I can contact you; I am also working on utilizing this repository.

@mpottinger
Author

mpottinger commented Jan 3, 2021

> How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size?
> Also, please share your email so I can contact you; I am also working on utilizing this repository.

ONNX models are definitely faster than running in PyTorch; that is due to being able to use frameworks optimized purely for inference. Model size is about the same after conversion.

I am able to use onnxruntime, which is much faster for CPU inference. I have also successfully run inference with OpenCV's dnn module, which is slower but easy to implement on multiple platforms, including Android.
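
Roughly, the onnxruntime path looks like this (a minimal sketch; the file name and preprocessing mirror the webcam code further down and are assumptions, not an official script from this repo):

```python
import cv2
import numpy as np
import onnxruntime as ort

# Assumes the generator was exported to "netG.onnx" with a 1x3x256x256 input.
session = ort.InferenceSession("netG.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

frame = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
frame = cv2.resize(frame, (256, 256)).astype(np.float32) / 255.0
frame = (frame - 0.5) / 0.5                   # same scaling as ToTensor + Normalize(0.5, 0.5)
blob = frame.transpose(2, 0, 1)[None, ...]    # HWC -> NCHW with batch dimension

output = session.run(None, {input_name: blob})[0]  # 1x3x256x256 result in [-1, 1]
```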

I have also tried Alibaba MNN, which is supposed to be very fast on mobile; speeds on the phone are comparable to CPU speeds on my fast desktop PC, around 5-10 FPS with uncompressed models.

I have found that this issue can probably be closed. My mistake was comparing a custom-trained model in one repo to the edges2shoes model in the other repo. I thought inference time would be constant across models, but apparently not; the specific trained model makes a difference.

I modified the Jupyter notebook in this repo to do webcam inference on CPU only, comparing only the full and compressed edges2shoes models from this repo. There I was able to see the speed difference on a live webcam stream.

Approximately 3 FPS for the uncompressed full model and ~6 FPS for the compressed model, so I am assuming I will get a similar 2x speedup on my own custom models. My initial comparison was flawed.

Here is the inference code I am using to test on the webcam:

```python
#!/usr/bin/env python
import pickle
import time

import cv2
import numpy as np
import torch
import torchvision.transforms as transforms

from utils.util import tensor2im
from models import create_model

# Get our model: pick one of the pickled option files.
# filename = 'opts/opt_compressed.pkl'
filename = 'opts/opt_full.pkl'

with open(filename, 'rb') as f:
    opt = pickle.load(f)

opt.gpu_ids = []  # CPU-only inference
model = create_model(opt, verbose=False)
model.setup(opt, verbose=False)

transform_list = [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
transform = transforms.Compose(transform_list)
cap = cv2.VideoCapture(0)
start_time = time.time()
frameCount = 0
while True:
    ret, frame = cap.read()
    if not ret:
        continue
    frameCount += 1
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (256, 256))
    input = transform(frame).to('cpu')
    input = input.reshape([1, 3, 256, 256])

    model_start = time.time()
    output_ours = model.netG(input).cpu()
    image_numpy = tensor2im(output_ours)
    print("FPS: ", 1 / (time.time() - model_start))  # per-frame FPS = 1 / time for this frame
    print("Avg FPS: ", frameCount / (time.time() - start_time))  # average FPS over the whole run
    if len(image_numpy.shape) == 4:
        image_numpy = image_numpy[0]
    if len(image_numpy.shape) == 2:
        image_numpy = np.expand_dims(image_numpy, axis=2)
    if image_numpy.shape[2] == 1:
        image_numpy = np.repeat(image_numpy, 3, 2)
    cv2.imshow("input", frame)
    cv2.imshow("result", image_numpy)
    cv2.waitKey(1)
```
