
About the chunk code in yolov4-tiny #51

Closed

beizhengren opened this issue Sep 3, 2020 · 10 comments
@beizhengren
Contributor

@enazoe Hi,
For the yolov4-tiny part, I want to replace IPluginV2IOExt with IPluginV2 so that TensorRT 5 can be supported, like this:

trt7:

	class Chunk : public IPluginV2IOExt{...}

trt5:

	class Chunk : public IPluginV2{...}

After this change (let's call that version trt5 for now), do I need to build a new engine with trt5 before I can run inference?
Would there be any problem if trt5 directly loads the engine that was already built with trt7?
Thanks!

@enazoe
Owner

enazoe commented Sep 3, 2020

@beizhengren There would likely be a problem. After your change, just rebuild the engine and it will work.

@beizhengren
Contributor Author

@enazoe Great, I'll give it a try first.

@beizhengren
Contributor Author

@enazoe Done, thanks!

@beizhengren
Contributor Author

beizhengren commented Sep 10, 2020

@enazoe
Hi,
Is there anything special about building a half-precision engine compared with building a float engine?
After changing the interface to IPluginV2, I can build the float engine successfully,
but building the half engine fails when the following statement executes:

NV_CUDA_CHECK(cudaMemcpy((char*)outputs[0] + b * _n_size_split, (char*)inputs[0] + b * 2 * _n_size_split, _n_size_split, cudaMemcpyDeviceToDevice));

The biggest change is that I moved the code from void Chunk::configurePlugin into configureWithFormat. The complete chunk_V2.cpp is below:

#include <cmath>
#include <stdio.h>
#include <cassert>
#include <iostream>
#include "chunk_V2.h"
#include <cuda_runtime.h>
#define ASSERT(assertion)                                        \
    {                                                            \
        if (!(assertion))                                        \
        {                                                        \
            std::cout<<"ASSERTION FAILED in "                    \
                     <<__FILE__<<":"<<__LINE__                   \
                     <<std::endl;                                \
            abort();                                             \
        }                                                        \
    }

#define NV_CUDA_CHECK(status)                                                                      \
    {                                                                                              \
        if (status != 0)                                                                           \
        {                                                                                          \
            std::cout << "Cuda failure: " << cudaGetErrorString(status) << " in file " << __FILE__ \
                      << " at line " << __LINE__ << std::endl;                                     \
            abort();                                                                               \
        }                                                                                          \
    }

namespace nvinfer1
{
	ChunkV2::ChunkV2()
	{

	}
	ChunkV2::ChunkV2(const void* buffer, size_t size) 
	{
		assert(size == sizeof(_n_size_split));
		_n_size_split = *reinterpret_cast<const int*>(buffer);
	}
	ChunkV2::~ChunkV2()
	{

	}

	int ChunkV2::getNbOutputs() const
	{
		return 2;
	}

	Dims ChunkV2::getOutputDimensions(int index, const Dims* inputs, int nbInputDims)
	{
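		// Each output carries half of the input channels (d[0]); height and width (d[1], d[2]) are unchanged.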
		assert(nbInputDims == 1);
		assert(index == 0 || index == 1);
		return Dims3(inputs[0].d[0] / 2, inputs[0].d[1], inputs[0].d[2]);
	}

	int ChunkV2::initialize()
	{
		return 0;
	}

	void ChunkV2::terminate()
	{
	}

	size_t ChunkV2::getWorkspaceSize(int maxBatchSize) const
	{
		return 0;
	}

	size_t ChunkV2::getSerializationSize() const
	{
		return sizeof(_n_size_split);
	}

	void ChunkV2::serialize(void* buffer) const
	{
		*reinterpret_cast<int*>(buffer) = _n_size_split;
	}

	bool ChunkV2::supportsFormat(DataType type, PluginFormat format) const
	{
		return ((type == DataType::kFLOAT || type == DataType::kHALF || type == DataType::kINT8) &&
				(format == PluginFormat::kNCHW));
	}

	// Set plugin namespace
	void ChunkV2::setPluginNamespace(const char* pluginNamespace)
	{
		_s_plugin_namespace = pluginNamespace;
	}

	const char* ChunkV2::getPluginNamespace() const
	{
		return _s_plugin_namespace.c_str();
	}

	// Configure the layer with input and output data types.
	void ChunkV2::configureWithFormat(
			const Dims* inputDims, int nbInputs, const Dims* outputDims, int nbOutputs,
			DataType type, PluginFormat format, int maxBatchSize){
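		// Split size in bytes for one output of one batch item.
		// Note: this hard-codes sizeof(float) and ignores `type`, so the byte count
		// is only correct when the builder selects kFLOAT I/O for this plugin.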
		_n_size_split = inputDims->d[0] / 2 * inputDims->d[1] * inputDims->d[2] * sizeof(float);
		std::cerr << _n_size_split << std::endl;
		ASSERT(format == PluginFormat::kNCHW);
		ASSERT(type == DataType::kFLOAT || type == DataType::kHALF);
		//mDataType = type;
		ASSERT(inputDims[0].nbDims >= 1); // number of dimensions of the input tensor must be >=1

	}

	const char* ChunkV2::getPluginType() const
	{
		return "CHUNK_TRT";
	}

	const char* ChunkV2::getPluginVersion() const
	{
		return "1.0";
	}

	void ChunkV2::destroy()
	{
		delete this;
	}


	// Clone the plugin
	IPluginV2* ChunkV2::clone() const
	{
		ChunkV2 *p = new ChunkV2();
		p->_n_size_split = _n_size_split;
		p->setPluginNamespace(_s_plugin_namespace.c_str());
		return p;
	}

	int ChunkV2::enqueue(int batchSize,
		const void* const* inputs,
		void** outputs,
		void* workspace,
		cudaStream_t stream)
	{
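		// Per batch item, inputs[0] holds 2 * _n_size_split bytes: the first half goes to
		// outputs[0], the second half to outputs[1]. Note these cudaMemcpy calls are
		// synchronous; the `stream` argument is not used.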
		for (int b = 0; b < batchSize; ++b)
		{
			NV_CUDA_CHECK(cudaMemcpy((char*)outputs[0] + b * _n_size_split, (char*)inputs[0] + b * 2 * _n_size_split, _n_size_split, cudaMemcpyDeviceToDevice));
			NV_CUDA_CHECK(cudaMemcpy((char*)outputs[1] + b * _n_size_split, (char*)inputs[0] + b * 2 * _n_size_split + _n_size_split, _n_size_split, cudaMemcpyDeviceToDevice));
		}
		return 0;
	}


    PluginFieldCollection ChunkV2PluginCreator::_fc{};
    std::vector<PluginField> ChunkV2PluginCreator::_vec_plugin_attributes;

    ChunkV2PluginCreator::ChunkV2PluginCreator()
    {
        _vec_plugin_attributes.clear();
        _fc.nbFields = _vec_plugin_attributes.size();
        _fc.fields = _vec_plugin_attributes.data();
    }

    const char* ChunkV2PluginCreator::getPluginName() const
    {
        return "CHUNK_TRT";
    }

    const char* ChunkV2PluginCreator::getPluginVersion() const
    {
        return "1.0";
    }

    const PluginFieldCollection* ChunkV2PluginCreator::getFieldNames()
    {
        return &_fc;
    }

	IPluginV2* ChunkV2PluginCreator::createPlugin(const char* name, const PluginFieldCollection* fc)
    {
		ChunkV2* obj = new ChunkV2();
        obj->setPluginNamespace(_s_name_space.c_str());
        return obj;
    }

	IPluginV2* ChunkV2PluginCreator::deserializePlugin(const char* name, const void* serialData, size_t serialLength)
    {
		ChunkV2* obj = new ChunkV2(serialData,serialLength);
        obj->setPluginNamespace(_s_name_space.c_str());
        return obj;
    }

    void ChunkV2PluginCreator::setPluginNamespace(const char* libNamespace)
    {
        _s_name_space = libNamespace;
    }

    const char* ChunkV2PluginCreator::getPluginNamespace() const
    {
        return _s_name_space.c_str();
    }

	REGISTER_TENSORRT_PLUGIN(ChunkV2PluginCreator);
}//namespace nvinfer1
Thanks a lot!

@beizhengren beizhengren reopened this Sep 10, 2020
@enazoe
Owner

enazoe commented Sep 10, 2020

@beizhengren Hmm, I'm not sure about this one. fp16 precision shouldn't need a special implementation: implement the plugin in fp32, and the precision gets set to fp16 automatically when the engine is serialized.
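For reference, a minimal sketch of that "implement in fp32, let the builder emit fp16" flow on the TensorRT 5 builder API (the builder and network variables here are illustrative, not from this repo):

	// The plugin keeps working on whatever buffers the builder hands it; fp16 is
	// requested on the builder rather than implemented inside the plugin.
	if (builder->platformHasFastFp16())
		builder->setFp16Mode(true);  // TRT5-era API; TRT7+ uses IBuilderConfig::setFlag(BuilderFlag::kFP16)
	nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);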

@beizhengren
Contributor Author

@enazoe Oh, I see. Then the int8 process should be about the same as fp16, just with the extra setInt8Calibrator step?

@enazoe
Owner

enazoe commented Sep 10, 2020

@beizhengren Yes, that's how it looks so far.
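As a reference for the int8 path mentioned above, a minimal sketch on the TRT5-era builder API (`calibrator` stands for whatever IInt8Calibrator implementation you provide, e.g. an IInt8EntropyCalibrator fed with representative input batches):

	// Same builder-side switch as fp16, plus the calibrator that supplies
	// sample inputs for computing the int8 quantization scales.
	if (builder->platformHasFastInt8())
	{
		builder->setInt8Mode(true);
		builder->setInt8Calibrator(&calibrator);
	}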

@beizhengren
Contributor Author

@enazoe
Thanks a lot!

@beizhengren
Contributor Author

@enazoe
One more question, please.
For trt5 compatibility, I used setFp16Mode(true) here:

// m_Builder->setHalf2Mode(true);

Building the half engine fails when the following statement executes:

NV_CUDA_CHECK(cudaMemcpy((char*)outputs[0] + b * _n_size_split, (char*)inputs[0] + b * 2 * _n_size_split, _n_size_split, cudaMemcpyDeviceToDevice));

Could you try it when you have time? I've tried many approaches but couldn't get it working.
Thanks a lot!
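One plausible cause, not confirmed in this thread: supportsFormat above advertises kHALF (and kINT8), but configureWithFormat computes _n_size_split with a hard-coded sizeof(float). If the builder actually selects half-precision I/O for the plugin, each element is 2 bytes, so _n_size_split is twice the real buffer size and the cudaMemcpy in enqueue reads and writes past the end of the device allocations. A sketch of sizing from the `type` the builder reports (illustrative, untested here):

	void ChunkV2::configureWithFormat(
			const Dims* inputDims, int nbInputs, const Dims* outputDims, int nbOutputs,
			DataType type, PluginFormat format, int maxBatchSize)
	{
		// Derive the element size from the precision the builder actually chose.
		size_t elem_bytes = 4;                       // DataType::kFLOAT
		if (type == DataType::kHALF) elem_bytes = 2;
		if (type == DataType::kINT8) elem_bytes = 1;
		_n_size_split = inputDims->d[0] / 2 * inputDims->d[1] * inputDims->d[2] * elem_bytes;
		ASSERT(format == PluginFormat::kNCHW);
	}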

@seungtaek94

@beizhengren @enazoe

Hi. Can anyone explain this issue in English?

Actually, I have the same problem at the line below :(

NV_CUDA_CHECK(cudaMemcpy((char*)outputs[0] + b * _n_size_split, (char*)inputs[0] + b * 2 * _n_size_split, _n_size_split, cudaMemcpyDeviceToDevice));

My environment:

  • CUDA 11.3
  • TensorRT 8.0.1.6
  • OpenCV 4.2.0
  • Visual Studio 2019
  • Windows 10
