
Compressed model inference speed is slower than original pytorch pix2pix #62

Open
mpottinger opened this issue Dec 27, 2020 · 7 comments

Comments

@mpottinger

Hello, I am just curious. I have adapted test.py to do real-time inference on a webcam, made the exact same modifications to https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix, and set both to run inference on CPU only, since my goal is to run the model on a mobile phone.

While the memory usage of the compressed model is much smaller (only 5 MB vs. 200 MB), inference actually seems to be slower with the compressed model. I am getting 4 FPS with test_compressed.sh for edges2shoes_r, while in the original pytorch-CycleGAN-and-pix2pix code I am getting 8 FPS. Is this normal? I am sure the modifications I made to the inference code are the same in both repos, and it looks like the difference is actual model latency.

Having 5 MB instead of 200 MB of RAM usage is a dramatic decrease, but is there also supposed to be a dramatic speedup?

@mpottinger
Author

Update: one thing I missed was the PyTorch version; the other repo was using a newer one. I upgraded to 1.7.1 for this repo and inference speed is now identical at 8 FPS for the compressed model. So at least it is not slower, but there is no speed benefit either. Still a 40x reduction in memory use, though!

@lmxyy
Collaborator

lmxyy commented Dec 28, 2020

Hi! I am wondering which model you used. There should be some latency reduction, as suggested in our paper. We've also released the code for measuring latency in our repo. Could you double-check whether there are any differences between your latency script and ours?

@mpottinger
Author

mpottinger commented Dec 28, 2020

I am simply timing the inference within test.py like this:

```python
start_time = time.time()
model.test()  # run inference
print("test FPS: ", 1 / (time.time() - start_time))  # FPS
```

I also overlooked that I was using my own custom model in the original PyTorch code, but since that model is significantly larger than the compressed edges2shoes_r model, I expected some speedup even when comparing different models.

I am also finding pix2pix particularly difficult to convert to TFLite or PyTorch Mobile for inference on Android, so I might be forced to run it in Python on Android anyway, which cannot be multithreaded to take advantage of the lower memory footprint.

It would be great on an embedded device such as a Jetson Nano, but unfortunately I am targeting Android only.

@seekingdeep

@mpottinger upload the original and compressed models, along with some test images

@mpottinger
Author

OK, thank you. I am currently working on my app that will make use of the models. I have figured out how to convert the model for mobile via ONNX and have run inference successfully, so as I do more tests, if I keep seeing the same results, I will upload the models.
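
For reference, the conversion is roughly along these lines (a minimal sketch, not my exact script; the generator attribute `model.netG`, the 256x256 input size, and the output file name are assumptions based on the test code in this repo):

```python
import torch

# Assumes `model` was built via create_model(opt) as in test.py and that
# model.netG is the generator; 1x3x256x256 matches the edges2shoes_r input.
dummy_input = torch.randn(1, 3, 256, 256)
torch.onnx.export(
    model.netG,            # generator network to export
    dummy_input,           # example input used for tracing
    "netG.onnx",           # output file name (hypothetical)
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)
```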

@seekingdeep

seekingdeep commented Jan 2, 2021

How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size?
Also, please share your email so I can contact you; I am also working on utilizing this repository.

@mpottinger
Author

mpottinger commented Jan 3, 2021

> How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size?
> Also, please share your email so I can contact you; I am also working on utilizing this repository.

ONNX models are definitely faster than running in PyTorch; that is due to being able to use frameworks optimized purely for inference. Model size is about the same after conversion.

I am able to use onnxruntime, which is much faster for CPU inference. I have also successfully run inference with OpenCV's dnn module, which is slower but easy to implement on multiple platforms, including Android.
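
Roughly, the onnxruntime path looks like this (a minimal sketch; the file name and preprocessing mirror the webcam code further down and are assumptions, not an official script from this repo):

```python
import cv2
import numpy as np
import onnxruntime as ort

# Assumes the generator was exported to "netG.onnx" with a 1x3x256x256 input.
session = ort.InferenceSession("netG.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

frame = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
frame = cv2.resize(frame, (256, 256)).astype(np.float32) / 255.0
frame = (frame - 0.5) / 0.5                   # same scaling as ToTensor + Normalize(0.5, 0.5)
blob = frame.transpose(2, 0, 1)[None, ...]    # HWC -> NCHW with batch dimension

output = session.run(None, {input_name: blob})[0]  # 1x3x256x256 result in [-1, 1]
```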

I have also tried Alibaba MNN, which is supposed to be very fast on mobile; speeds on the phone are comparable to CPU speeds on my fast desktop PC, around 5-10 FPS with uncompressed models.

I have found that this issue can probably be closed. My mistake was comparing a custom-trained model in one repo to the edges2shoes model in the other repo. I thought inference time would be constant across models, but apparently not; the specific trained model makes a difference.

I modified the Jupyter notebook in this repo to do webcam inference on CPU only, comparing only the full and compressed edges2shoes models from this repo. There I was able to see the speed difference on a live webcam stream.

Approximately 3 FPS for the uncompressed full model and ~6 FPS for the compressed model, so I am assuming I will get a similar 2x speedup on my own custom models. My initial comparison was flawed.

Here is the inference code I am using to test on the webcam:

```python
#!/usr/bin/env python
import pickle
import time

import cv2
import numpy as np
import torch
import torchvision.transforms as transforms

from utils.util import tensor2im
from models import create_model

# Get our model: pick one of the pickled option files.
# filename = 'opts/opt_compressed.pkl'
filename = 'opts/opt_full.pkl'

with open(filename, 'rb') as f:
    opt = pickle.load(f)

opt.gpu_ids = []  # CPU-only inference
model = create_model(opt, verbose=False)
model.setup(opt, verbose=False)

transform_list = [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
transform = transforms.Compose(transform_list)
cap = cv2.VideoCapture(0)
start_time = time.time()
frameCount = 0
while True:
    ret, frame = cap.read()
    if not ret:
        continue
    frameCount += 1
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (256, 256))
    input = transform(frame).to('cpu')
    input = input.reshape([1, 3, 256, 256])

    model_start = time.time()
    output_ours = model.netG(input).cpu()
    image_numpy = tensor2im(output_ours)
    print("FPS: ", 1 / (time.time() - model_start))  # per-frame FPS = 1 / time for this frame
    print("Avg FPS: ", frameCount / (time.time() - start_time))  # average FPS over the whole run
    if len(image_numpy.shape) == 4:
        image_numpy = image_numpy[0]
    if len(image_numpy.shape) == 2:
        image_numpy = np.expand_dims(image_numpy, axis=2)
    if image_numpy.shape[2] == 1:
        image_numpy = np.repeat(image_numpy, 3, 2)
    cv2.imshow("input", frame)
    cv2.imshow("result", image_numpy)
    cv2.waitKey(1)
```
