Why is the TRT model of yolov7 not as fast as the PT model #41

YFforever2022 · 2022-08-22T00:15:56Z

Do you know why inference takes only 9 ms with the PT (PyTorch) model but 20 ms with the TRT model? Both were warmed up 10 times beforehand. If this is right, TensorRT does not seem to give any speedup. Maybe there is a configuration error?

YFforever2022 · 2022-08-22T00:22:34Z

Linaom1214 · 2022-08-22T01:00:03Z

Which model?

Linaom1214 · 2022-08-22T01:00:52Z

What device are you using?

YFforever2022 · 2022-08-22T01:01:29Z

Both are the same model, trained with the official YOLOv7-tiny training code.

YFforever2022 · 2022-08-22T01:01:39Z

GTX 1080

Linaom1214 · 2022-08-22T01:03:34Z

python export.py -o xxx.onnx -e xxx.trt -p fp32

Try FP32 precision.

YFforever2022 · 2022-08-22T01:11:08Z

This is the inference speed of the FP32 model: it takes 19 ms. The command was: python export.py -o best.onnx -e best.trt -p fp32 --end2end

Linaom1214 · 2022-08-22T01:16:29Z

Maybe you should remove the image-saving step. I think saving the image is slow.

Linaom1214 · 2022-08-22T01:17:05Z

Can you provide more details about your test script?

YFforever2022 · 2022-08-22T01:19:59Z

Linaom1214 · 2022-08-22T01:22:20Z

What is your TensorRT version?

YFforever2022 · 2022-08-22T01:22:55Z

TensorRT-8.4.1.5

Linaom1214 · 2022-08-22T01:24:03Z

Can you show me the PyTorch code?

YFforever2022 · 2022-08-22T01:26:16Z

Linaom1214 · 2022-08-22T01:28:58Z

Could you run this experiment in a Colab environment (using the T4)?

Maybe the 1080 is too old.

YFforever2022 · 2022-08-22T01:30:05Z

I'll try

Linaom1214 · 2022-08-22T01:31:08Z

Thanks, looking forward to your report!

lxzatwowone1 · 2022-08-22T01:50:31Z

In my tests it really is much faster, but the confidence scores come out wrong. Are yours correct?

YFforever2022 · 2022-08-22T03:20:27Z

If you use pred.get_fps() to get the FPS, you get about 180-200, which corresponds to roughly 5 ms per frame, but that is not the time for the whole detection pipeline.

Linaom1214 · 2022-08-22T03:33:38Z

Yes. Currently, most reported FPS figures refer to inference time only.

YFforever2022 · 2022-08-22T03:43:33Z

Yes.
What I measured, though, is the time from the moment the image is passed in for inference until the result is returned.
With both the PT model and the TRT model the results are correct; it is just that this end-to-end time is longer for the TRT model than for the PT model.
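The difference between an engine-only FPS figure and the end-to-end latency described here can be sketched as follows. This is only an illustration, not code from this repository: `fps_to_ms`, `time_end_to_end`, and the three stand-in stage functions are hypothetical names, and the `time.sleep` calls merely simulate work.

```python
import time


def fps_to_ms(fps):
    """Convert throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


def time_end_to_end(preprocess, infer, postprocess, image, runs=5, warmup=1):
    """Mean wall-clock time in ms from passing the image in to getting results."""
    for _ in range(warmup):
        postprocess(infer(preprocess(image)))
    start = time.perf_counter()
    for _ in range(runs):
        postprocess(infer(preprocess(image)))
    return (time.perf_counter() - start) * 1000.0 / runs


# Hypothetical stand-ins for the real stages (assumption: durations are made up).
def fake_preprocess(image):
    time.sleep(0.002)
    return image


def fake_infer(tensor):
    time.sleep(0.001)
    return tensor


def fake_postprocess(output):
    time.sleep(0.002)
    return output


engine_only_ms = fps_to_ms(200.0)  # what an engine-only FPS figure implies
measured_ms = time_end_to_end(fake_preprocess, fake_infer, fake_postprocess, None)
print(f"engine only: {engine_only_ms:.2f} ms, end to end: {measured_ms:.2f} ms")
```

If the inference call returns before the GPU has finished (as asynchronous CUDA calls in PyTorch do), the clock should only be read after a synchronization such as `torch.cuda.synchronize()`, otherwise the measured time can look shorter than it is.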

Linaom1214 · 2022-08-22T04:00:25Z

Is it the same on the T4?

YFforever2022 · 2022-08-22T08:51:01Z

This is the inference speed in the Colab environment. I will run the PT model directly in a moment.

YFforever2022 · 2022-08-22T09:04:03Z

YFforever2022 · 2022-08-22T09:09:06Z

Based on the tests above, I think the official YOLOv7 PT model and your TRT model have similar inference times.

Linaom1214 · 2022-08-22T09:11:39Z

The preprocessing pipeline in v7 differs from the one in this repository. I suggest unifying the preprocessing and testing again.
https://github.com/WongKinYiu/yolov7/blob/064c71e7c261172dd8d7250444c4f5375bebdc66/utils/datasets.py#L984

Linaom1214 · 2022-08-22T09:27:37Z

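import cv2
import numpy as np


def preproc(image, input_size=(640, 640), mean=None, std=None, swap=(2, 0, 1)):
    # Convert to float32 and reverse the channel order
    # (BGR -> RGB, assuming the image was loaded with OpenCV).
    image = np.array(image, np.float32)
    image = image[:, :, ::-1]
    oh, ow = image.shape[:2]
    dh, dw = input_size
    # Uniform scale that fits the whole image inside the target size.
    scale = min(dw / ow, dh / oh)

    # Scale-only affine matrix: the resized image is anchored at the top-left
    # corner and the remaining area is zero-padded.
    M = np.array([
        [scale, 0, 0],
        [0, scale, 0]
    ])
    # warpAffine expects dsize as (width, height).
    padded_img = cv2.warpAffine(image, M, (dw, dh))
    padded_img /= 255.
    if mean is not None:
        padded_img -= mean
    if std is not None:
        padded_img /= std
    # HWC -> CHW, stored contiguously for copying to the device.
    padded_img = padded_img.transpose(swap)
    padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
    return padded_img, scale

I look forward to your test results. Please try testing again with this preprocessing method; if it helps, we will use it in future versions.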
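As a small worked example of the `scale` value that `preproc` returns: because the affine matrix has no translation, a box predicted in the padded input space maps back to the original image by dividing its coordinates by `scale`. The helper names below are hypothetical, and the `(x1, y1, x2, y2)` box layout is an assumption made for illustration.

```python
def letterbox_scale(orig_hw, input_hw=(640, 640)):
    """Same scale rule as preproc: fit the image inside the input size."""
    oh, ow = orig_hw
    dh, dw = input_hw
    return min(dw / ow, dh / oh)


def to_original(box_xyxy, scale):
    """Map a box from the padded input space back to original pixels.

    No offset is subtracted because the resized image sits at the
    top-left corner of the padded input.
    """
    return tuple(v / scale for v in box_xyxy)


scale = letterbox_scale((720, 1280))  # a 1280x720 frame into 640x640
box = to_original((100.0, 50.0, 300.0, 200.0), scale)
print(scale, box)
```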

YFforever2022 · 2022-08-22T09:32:38Z

Thank you, Linaom1214, for the patient answers. The results with this new code look worse:
trt best.trt
[08/22/2022-09:30:49] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
[08/22/2022-09:30:49] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
loading model...ok
202.96470545799218 FPS 4.929918999550864 ms
22.16935157775879 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

22.135257720947266 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

20.000696182250977 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

19.888877868652344 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

20.226716995239258 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

20.84493637084961 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

20.862102508544922 ms
0,293,320,154,115,0.9926884174346924
1,446,381,175,95,0.9879250526428223
1,460,296,166,82,0.9829330444335938

Linaom1214 · 2022-08-22T09:37:55Z

Hahaha, let us try some more stable approaches. For now, the time difference most likely still lies in data preprocessing: PyTorch's data loading and some of its post-processing run on the GPU, whereas our data processing is entirely CPU-based and will take some time to optimize. Thank you for the test data.
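One way to check where the time goes is to time each stage separately. The following is a minimal sketch with a hypothetical `StageTimer` helper; the stage names and the `time.sleep` calls are placeholders for the real preprocessing, engine call, and post-processing.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Mean time per call for each stage, in milliseconds."""
        return {n: 1000.0 * self.totals[n] / self.counts[n] for n in self.totals}


timer = StageTimer()
for _ in range(3):
    with timer.stage("preprocess"):
        time.sleep(0.003)  # placeholder for CPU preprocessing
    with timer.stage("inference"):
        time.sleep(0.001)  # placeholder for the engine call
    with timer.stage("postprocess"):
        time.sleep(0.002)  # placeholder for decoding and NMS

report = timer.mean_ms()
for name, ms in report.items():
    print(f"{name}: {ms:.2f} ms")
```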

Linaom1214 · 2022-08-22T09:51:05Z

I just realized one thing: v7 is RepVGG!

YFforever2022 · 2022-08-22T09:54:58Z

:) I'm afraid I don't follow. I'm a beginner and don't even know the frameworks well yet.

YFforever2022 · 2022-08-22T09:58:30Z

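By the way, yesterday I deployed it with C++ and the latency was frighteningly long. Using the PT model directly, sending a screenshot and getting the detections back takes 15-31 ms; with the TRT model it takes 60-120 ms. I haven't figured out what went wrong yet. Both figures come from a tight loop that issues detection requests with zero delay.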

Linaom1214 · 2022-08-22T10:00:19Z

Which code exactly? The end-to-end code?

Linaom1214 · 2022-08-22T10:00:45Z

You could try v5, which is more stable.

Linaom1214 · 2022-08-22T10:01:39Z

At the moment, v7 also suffers from accuracy loss when converted from ONNX to TRT.

YFforever2022 · 2022-08-22T10:01:41Z

OK, I'll find time to try it. Today was my first time using Colab, and the experience was good.

Linaom1214 · 2022-08-22T10:02:30Z

Could you describe the C++ setup in more detail?

YFforever2022 · 2022-08-22T10:02:59Z

The end2end one.

YFforever2022 · 2022-08-22T10:04:25Z

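It reads from shared memory. The C++ code compiles into a communication program: it receives the data length through a WM_COPYDATA message, reads the image from shared memory, runs detection, writes the detection result back to shared memory, closes the shared-memory mapping, and returns the length of the result data. The other program can then read shared memory to get the result.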
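The request/response flow over shared memory can be sketched in Python with the standard `multiprocessing.shared_memory` module. This is not the C++ program described above: both sides run in one process here, the payload length travels in a 4-byte header inside the block instead of in a WM_COPYDATA message, and `write_block`, `read_block`, and `detect` are hypothetical names.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def write_block(shm, payload):
    """Write a length header followed by the payload; return bytes written."""
    total = HEADER.size + len(payload)
    if total > shm.size:
        raise ValueError("payload does not fit in the shared block")
    shm.buf[:HEADER.size] = HEADER.pack(len(payload))
    shm.buf[HEADER.size:total] = payload
    return total


def read_block(shm):
    """Read the length header, then return a copy of the payload bytes."""
    (length,) = HEADER.unpack(bytes(shm.buf[:HEADER.size]))
    return bytes(shm.buf[HEADER.size:HEADER.size + length])


def detect(image_bytes):
    """Hypothetical stand-in for the detector; returns one CSV result line."""
    return b"0,293,320,154,115,0.99"


shm = shared_memory.SharedMemory(create=True, size=1 << 20)
try:
    write_block(shm, b"\x89fake-image-bytes")  # client: publish the image
    image = read_block(shm)                    # server: fetch the image
    write_block(shm, detect(image))            # server: publish the result
    result = read_block(shm)                   # client: fetch the result
finally:
    shm.close()
    shm.unlink()

print(result.decode())
```

Copying the payload out with `bytes(...)` before closing keeps the example simple; a latency-sensitive implementation might instead work on the shared buffer in place.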

Linaom1214 · 2022-08-22T10:06:52Z

Could you test it without NMS? The code in this repository is all fairly simple; if you find any bugs, feedback is welcome.

YFforever2022 · 2022-08-22T10:08:15Z

171ms
1 , 458 , 248 , 156 , 82 , 0.919060 , None
1 , 477 , 336 , 140 , 61 , 0.809349 , None
0 , 309 , 319 , 154 , 104 , 0.643329 , None
172ms
1 , 458 , 248 , 156 , 82 , 0.918909 , None
1 , 477 , 336 , 140 , 61 , 0.808587 , None
0 , 309 , 319 , 154 , 104 , 0.643854 , None
172ms
1 , 458 , 248 , 156 , 82 , 0.918983 , None
1 , 477 , 336 , 140 , 61 , 0.808988 , None
0 , 309 , 319 , 154 , 104 , 0.644179 , None
171ms
1 , 458 , 248 , 156 , 82 , 0.919059 , None
1 , 477 , 336 , 140 , 61 , 0.809353 , None
0 , 309 , 319 , 154 , 104 , 0.643329 , None
172ms
1 , 458 , 248 , 156 , 82 , 0.919059 , None
1 , 477 , 336 , 140 , 61 , 0.809353 , None
0 , 309 , 319 , 154 , 104 , 0.643329 , None
172ms
1 , 458 , 248 , 156 , 82 , 0.919059 , None
1 , 477 , 336 , 140 , 61 , 0.809353 , None
0 , 309 , 319 , 154 , 104 , 0.643329 , None

This is the time taken when the C++ program runs the TRT model with NMS.
The same code running the PT model takes a little over 10 ms.

YFforever2022 · 2022-08-22T10:10:54Z

OK, but this will take some time: the dirent.h file is missing, and part of the configuration is not finished.

Linaom1214 · 2022-08-22T10:12:20Z

That is extreme. Is the PT model called through libtorch?

YFforever2022 · 2022-08-22T10:13:22Z

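The C++ side does not use the PT model. The PT path is a large bundle built with PyInstaller that uses PyTorch. It is inconvenient to move between machines because it takes more than 4 GB of disk space.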

YFforever2022 · 2022-08-22T10:16:40Z

To clarify what I meant by "the same C++ code": I previously compiled C++ code that calls a yolov4 weights model, and with that same code yolov4 took under 50 ms.

Linaom1214 · 2022-08-22T10:22:33Z

Single-image inference
C++ end2end yolov6s

trtuser@0dee88e59c94:/workspace/TensorRT/TensorRT-For-YOLO-Series/TensorRT-For-YOLO-Series/end2end/build$ ./yolo  -model_path ../../yolov6s.trt -image_path  ../../src/1.jpg
model size: 173714332
Registered plugin creator - ::GridAnchor_TRT version 1
Registered plugin creator - ::GridAnchorRect_TRT version 1
Registered plugin creator - ::NMS_TRT version 1
Registered plugin creator - ::Reorg_TRT version 1
Registered plugin creator - ::Region_TRT version 1
Registered plugin creator - ::Clip_TRT version 1
Registered plugin creator - ::LReLU_TRT version 1
Registered plugin creator - ::PriorBox_TRT version 1
Registered plugin creator - ::Normalize_TRT version 1
Registered plugin creator - ::ScatterND version 1
Registered plugin creator - ::RPROI_TRT version 1
Registered plugin creator - ::BatchedNMS_TRT version 1
Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
Registered plugin creator - ::FlattenConcat_TRT version 1
Registered plugin creator - ::CropAndResize version 1
Registered plugin creator - ::DetectionLayer_TRT version 1
Registered plugin creator - ::EfficientNMS_TRT version 1
Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
Registered plugin creator - ::EfficientNMS_TFTRT_TRT version 1
Registered plugin creator - ::Proposal version 1
Registered plugin creator - ::ProposalLayer_TRT version 1
Registered plugin creator - ::PyramidROIAlign_TRT version 1
Registered plugin creator - ::ResizeNearest_TRT version 1
Registered plugin creator - ::Split version 1
Registered plugin creator - ::SpecialSlice_TRT version 1
Registered plugin creator - ::InstanceNormalization_TRT version 1
Using cublas as a tactic source
TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.6.1
Using cuDNN as a tactic source
Deserialization required 505462 microseconds.
Using cublas as a tactic source
TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.6.1
Using cuDNN as a tactic source
Total per-runner device persistent memory is 169579008
Total per-runner host persistent memory is 64960
Allocated activation device memory of size 26913280
7ms

YFforever2022 · 2022-08-22T10:28:17Z

Your v6 is fast. My yolov7-tiny.trt end2end engine, built directly from your C++ files, takes 20 ms.

Linaom1214 · 2022-08-22T11:39:09Z

I actually haven't tried the v7 end2end C++ path. My TensorRT version is 8.2, and it cannot recognize one of the nodes in the ONNX model.

YFforever2022 · 2022-08-22T12:11:36Z

engine init finished
blob image
[08/22/2022-20:08:39] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/22/2022-20:08:39] [W] [TRT] The enqueue() method has been deprecated when used with engines built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. Please use enqueueV2() instead.
[08/22/2022-20:08:39] [W] [TRT] Also, the batchSize argument passed into this function has no effect on changing the input shapes. Please use setBindingDimensions() function to change input shapes instead.
15ms
num of boxes before nms: 62
num of boxes: 6
0 = 0.90573 at 53.16 398.86 189.65 x 500.30
5 = 0.90219 at 13.93 234.68 770.86 x 508.74
0 = 0.89119 at 220.63 412.31 128.91 x 446.84
0 = 0.88738 at 666.77 394.24 142.23 x 481.20
0 = 0.61789 at 0.00 558.58 75.63 x 327.18
11 = 0.23620 at 0.41 252.30 33.84 x 71.59
save vis file
yolo destroy

Surprisingly, the normal yolov7-tiny.trt engine is faster than the end2end one.
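In the normal path, the log above shows 62 boxes going into NMS and 6 coming out, so that step appears to run in the host code after the engine rather than inside it. For reference, greedy NMS can be sketched in a few lines of plain Python. This is a class-agnostic illustration, not the repository's C++ implementation; the `(x1, y1, x2, y2)` box layout and the 0.45 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best box, drop boxes that overlap a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

With only a few dozen candidate boxes, a host-side NMS like this is cheap, which may be part of why the normal engine is not slower here; that is a guess rather than something the thread establishes.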

#define NOMINMAX needs to be added at the top of yolo.hpp,
and lines 363-364 of the code need to be changed to the following:
const char* INPUT_BLOB_NAME = "images";//image_arrays
const char* OUTPUT_BLOB_NAME = "output";

You also need to create a dirent.h file yourself.

YFforever2022 · 2022-08-22T12:12:34Z

Contents of dirent.h:

/*
 * Dirent interface for Microsoft Visual Studio
 *
 * Copyright (C) 1998-2019 Toni Ronkko
 * This file is part of dirent.  Dirent may be freely distributed
 * under the MIT license.  For all details and documentation, see
 * https://github.com/tronkko/dirent
 */

#define DIRENT_H

/* Hide warnings about unreferenced local functions */
#if defined(__clang__)
#   pragma clang diagnostic ignored "-Wunused-function"
#elif defined(_MSC_VER)
#   pragma warning(disable:4505)
#elif defined(__GNUC__)
#   pragma GCC diagnostic ignored "-Wunused-function"
#endif

/*
 * Include windows.h without Windows Sockets 1.1 to prevent conflicts with
 * Windows Sockets 2.0.
 */
#ifndef WIN32_LEAN_AND_MEAN
#   define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>

#include <stdio.h>
#include <stdarg.h>
#include <wchar.h>
#include <string.h>
#include <stdlib.h>
#include <malloc.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

/* Indicates that d_type field is available in dirent structure */
#define _DIRENT_HAVE_D_TYPE


/* Indicates that d_namlen field is available in dirent structure */
#define _DIRENT_HAVE_D_NAMLEN

/* Entries missing from MSVC 6.0 */
#if !defined(FILE_ATTRIBUTE_DEVICE)
#   define FILE_ATTRIBUTE_DEVICE 0x40
#endif

/* File type and permission flags for stat(), general mask */
#if !defined(S_IFMT)
#   define S_IFMT _S_IFMT
#endif

/* Directory bit */
#if !defined(S_IFDIR)
#   define S_IFDIR _S_IFDIR
#endif

/* Character device bit */
#if !defined(S_IFCHR)
#   define S_IFCHR _S_IFCHR
#endif

/* Pipe bit */
#if !defined(S_IFFIFO)
#   define S_IFFIFO _S_IFFIFO
#endif

/* Regular file bit */
#if !defined(S_IFREG)
#   define S_IFREG _S_IFREG
#endif

/* Read permission */
#if !defined(S_IREAD)
#   define S_IREAD _S_IREAD
#endif

/* Write permission */
#if !defined(S_IWRITE)
#   define S_IWRITE _S_IWRITE
#endif

/* Execute permission */
#if !defined(S_IEXEC)
#   define S_IEXEC _S_IEXEC
#endif

/* Pipe */
#if !defined(S_IFIFO)
#   define S_IFIFO _S_IFIFO
#endif

/* Block device */
#if !defined(S_IFBLK)
#   define S_IFBLK 0
#endif

/* Link */
#if !defined(S_IFLNK)
#   define S_IFLNK 0
#endif

/* Socket */
#if !defined(S_IFSOCK)
#   define S_IFSOCK 0
#endif

/* Read user permission */
#if !defined(S_IRUSR)
#   define S_IRUSR S_IREAD
#endif

/* Write user permission */
#if !defined(S_IWUSR)
#   define S_IWUSR S_IWRITE
#endif

/* Execute user permission */
#if !defined(S_IXUSR)
#   define S_IXUSR 0
#endif

/* Read group permission */
#if !defined(S_IRGRP)
#   define S_IRGRP 0
#endif

/* Write group permission */
#if !defined(S_IWGRP)
#   define S_IWGRP 0
#endif

/* Execute group permission */
#if !defined(S_IXGRP)
#   define S_IXGRP 0
#endif

/* Read others permission */
#if !defined(S_IROTH)
#   define S_IROTH 0
#endif

/* Write others permission */
#if !defined(S_IWOTH)
#   define S_IWOTH 0
#endif

/* Execute others permission */
#if !defined(S_IXOTH)
#   define S_IXOTH 0
#endif

/* Maximum length of file name */
#if !defined(PATH_MAX)
#   define PATH_MAX MAX_PATH
#endif
#if !defined(FILENAME_MAX)
#   define FILENAME_MAX MAX_PATH
#endif
#if !defined(NAME_MAX)
#   define NAME_MAX FILENAME_MAX
#endif

/* File type flags for d_type */
#define DT_UNKNOWN 0
#define DT_REG S_IFREG
#define DT_DIR S_IFDIR
#define DT_FIFO S_IFIFO
#define DT_SOCK S_IFSOCK
#define DT_CHR S_IFCHR
#define DT_BLK S_IFBLK
#define DT_LNK S_IFLNK

/* Macros for converting between st_mode and d_type */
#define IFTODT(mode) ((mode) & S_IFMT)
#define DTTOIF(type) (type)

/*
 * File type macros.  Note that block devices, sockets and links cannot be
 * distinguished on Windows and the macros S_ISBLK, S_ISSOCK and S_ISLNK are
 * only defined for compatibility.  These macros should always return false
 * on Windows.
 */
#if !defined(S_ISFIFO)
#   define S_ISFIFO(mode) (((mode) & S_IFMT) == S_IFIFO)
#endif
#if !defined(S_ISDIR)
#   define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR)
#endif
#if !defined(S_ISREG)
#   define S_ISREG(mode) (((mode) & S_IFMT) == S_IFREG)
#endif
#if !defined(S_ISLNK)
#   define S_ISLNK(mode) (((mode) & S_IFMT) == S_IFLNK)
#endif
#if !defined(S_ISSOCK)
#   define S_ISSOCK(mode) (((mode) & S_IFMT) == S_IFSOCK)
#endif
#if !defined(S_ISCHR)
#   define S_ISCHR(mode) (((mode) & S_IFMT) == S_IFCHR)
#endif
#if !defined(S_ISBLK)
#   define S_ISBLK(mode) (((mode) & S_IFMT) == S_IFBLK)
#endif

/* Return the exact length of the file name without zero terminator */
#define _D_EXACT_NAMLEN(p) ((p)->d_namlen)

/* Return the maximum size of a file name */
#define _D_ALLOC_NAMLEN(p) ((PATH_MAX)+1)


#ifdef __cplusplus
extern "C" {
#endif


/* Wide-character version */
struct _wdirent {
    /* Always zero */
    long d_ino;

    /* File position within stream */
    long d_off;

    /* Structure size */
    unsigned short d_reclen;

    /* Length of name without \0 */
    size_t d_namlen;

    /* File type */
    int d_type;

    /* File name */
    wchar_t d_name[PATH_MAX+1];
};
typedef struct _wdirent _wdirent;

struct _WDIR {
    /* Current directory entry */
    struct _wdirent ent;

    /* Private file data */
    WIN32_FIND_DATAW data;

    /* True if data is valid */
    int cached;

    /* Win32 search handle */
    HANDLE handle;

    /* Initial directory name */
    wchar_t *patt;
};
typedef struct _WDIR _WDIR;

/* Multi-byte character version */
struct dirent {
    /* Always zero */
    long d_ino;

    /* File position within stream */
    long d_off;

    /* Structure size */
    unsigned short d_reclen;

    /* Length of name without \0 */
    size_t d_namlen;

    /* File type */
    int d_type;

    /* File name */
    char d_name[PATH_MAX+1];
};
typedef struct dirent dirent;

struct DIR {
    struct dirent ent;
    struct _WDIR *wdirp;
};
typedef struct DIR DIR;


/* Dirent functions */
static DIR *opendir (const char *dirname);
static _WDIR *_wopendir (const wchar_t *dirname);

static struct dirent *readdir (DIR *dirp);
static struct _wdirent *_wreaddir (_WDIR *dirp);

static int readdir_r(
    DIR *dirp, struct dirent *entry, struct dirent **result);
static int _wreaddir_r(
    _WDIR *dirp, struct _wdirent *entry, struct _wdirent **result);

static int closedir (DIR *dirp);
static int _wclosedir (_WDIR *dirp);

static void rewinddir (DIR* dirp);
static void _wrewinddir (_WDIR* dirp);

static int scandir (const char *dirname, struct dirent ***namelist,
    int (*filter)(const struct dirent*),
    int (*compare)(const struct dirent**, const struct dirent**));

static int alphasort (const struct dirent **a, const struct dirent **b);

static int versionsort (const struct dirent **a, const struct dirent **b);


/* For compatibility with Symbian */
#define wdirent _wdirent
#define WDIR _WDIR
#define wopendir _wopendir
#define wreaddir _wreaddir
#define wclosedir _wclosedir
#define wrewinddir _wrewinddir


/* Internal utility functions */
static WIN32_FIND_DATAW *dirent_first (_WDIR *dirp);
static WIN32_FIND_DATAW *dirent_next (_WDIR *dirp);

static int dirent_mbstowcs_s(
    size_t *pReturnValue,
    wchar_t *wcstr,
    size_t sizeInWords,
    const char *mbstr,
    size_t count);

static int dirent_wcstombs_s(
    size_t *pReturnValue,
    char *mbstr,
    size_t sizeInBytes,
    const wchar_t *wcstr,
    size_t count);

static void dirent_set_errno (int error);


/*
 * Open directory stream DIRNAME for read and return a pointer to the
 * internal working area that is used to retrieve individual directory
 * entries.
 */
static _WDIR*
_wopendir(
    const wchar_t *dirname)
{
    _WDIR *dirp;
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
    /* Desktop */
    DWORD n;
#else
    /* WinRT */
    size_t n;
#endif
    wchar_t *p;

    /* Must have directory name */
    if (dirname == NULL  ||  dirname[0] == '\0') {
        dirent_set_errno (ENOENT);
        return NULL;
    }

    /* Allocate new _WDIR structure */
    dirp = (_WDIR*) malloc (sizeof (struct _WDIR));
    if (!dirp) {
        return NULL;
    }

    /* Reset _WDIR structure */
    dirp->handle = INVALID_HANDLE_VALUE;
    dirp->patt = NULL;
    dirp->cached = 0;

    /*
     * Compute the length of full path plus zero terminator
     *
     * Note that on WinRT there's no way to convert relative paths
     * into absolute paths, so just assume it is an absolute path.
     */
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
    /* Desktop */
    n = GetFullPathNameW (dirname, 0, NULL, NULL);
#else
    /* WinRT */
    n = wcslen (dirname);
#endif

    /* Allocate room for absolute directory name and search pattern */
    dirp->patt = (wchar_t*) malloc (sizeof (wchar_t) * n + 16);
    if (dirp->patt == NULL) {
        goto exit_closedir;
    }

    /*
     * Convert relative directory name to an absolute one.  This
     * allows rewinddir() to function correctly even when current
     * working directory is changed between opendir() and rewinddir().
     *
     * Note that on WinRT there's no way to convert relative paths
     * into absolute paths, so just assume it is an absolute path.
     */
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
    /* Desktop */
    n = GetFullPathNameW (dirname, n, dirp->patt, NULL);
    if (n <= 0) {
        goto exit_closedir;
    }
#else
    /* WinRT */
    wcsncpy_s (dirp->patt, n+1, dirname, n);
#endif

    /* Append search pattern \* to the directory name */
    p = dirp->patt + n;
    switch (p[-1]) {
    case '\\':
    case '/':
    case ':':
        /* Directory ends in path separator, e.g. c:\temp\ */
        /*NOP*/;
        break;

    default:
        /* Directory name doesn't end in path separator */
        *p++ = '\\';
    }
    *p++ = '*';
    *p = '\0';

    /* Open directory stream and retrieve the first entry */
    if (!dirent_first (dirp)) {
        goto exit_closedir;
    }

    /* Success */
    return dirp;

    /* Failure */
exit_closedir:
    _wclosedir (dirp);
    return NULL;
}

/*
 * Read next directory entry.
 *
 * Returns pointer to static directory entry which may be overwritten by
 * subsequent calls to _wreaddir().
 */
static struct _wdirent*
_wreaddir(
    _WDIR *dirp)
{
    struct _wdirent *entry;

    /*
     * Read directory entry to buffer.  We can safely ignore the return value
     * as entry will be set to NULL in case of error.
     */
    (void) _wreaddir_r (dirp, &dirp->ent, &entry);

    /* Return pointer to statically allocated directory entry */
    return entry;
}

/*
 * Read next directory entry.
 *
 * Returns zero on success.  If end of directory stream is reached, then sets
 * result to NULL and returns zero.
 */
static int
_wreaddir_r(
    _WDIR *dirp,
    struct _wdirent *entry,
    struct _wdirent **result)
{
    WIN32_FIND_DATAW *datap;

    /* Read next directory entry */
    datap = dirent_next (dirp);
    if (datap) {
        size_t n;
        DWORD attr;

        /*
         * Copy file name as wide-character string.  If the file name is too
         * long to fit in to the destination buffer, then truncate file name
         * to PATH_MAX characters and zero-terminate the buffer.
         */
        n = 0;
        while (n < PATH_MAX  &&  datap->cFileName[n] != 0) {
            entry->d_name[n] = datap->cFileName[n];
            n++;
        }
        entry->d_name[n] = 0;

        /* Length of file name excluding zero terminator */
        entry->d_namlen = n;

        /* File type */
        attr = datap->dwFileAttributes;
        if ((attr & FILE_ATTRIBUTE_DEVICE) != 0) {
            entry->d_type = DT_CHR;
        } else if ((attr & FILE_ATTRIBUTE_DIRECTORY) != 0) {
            entry->d_type = DT_DIR;
        } else {
            entry->d_type = DT_REG;
        }

        /* Reset dummy fields */
        entry->d_ino = 0;
        entry->d_off = 0;
        entry->d_reclen = sizeof (struct _wdirent);

        /* Set result address */
        *result = entry;

    } else {

        /* Return NULL to indicate end of directory */
        *result = NULL;

    }

    return /*OK*/0;
}

/*
 * Close directory stream opened by opendir() function.  This invalidates the
 * DIR structure as well as any directory entry read previously by
 * _wreaddir().
 */
static int
_wclosedir(
    _WDIR *dirp)
{
    int ok;
    if (dirp) {

        /* Release search handle */
        if (dirp->handle != INVALID_HANDLE_VALUE) {
            FindClose (dirp->handle);
        }

        /* Release search pattern */
        free (dirp->patt);

        /* Release directory structure */
        free (dirp);
        ok = /*success*/0;

    } else {

        /* Invalid directory stream */
        dirent_set_errno (EBADF);
        ok = /*failure*/-1;

    }
    return ok;
}

/*
 * Rewind directory stream such that _wreaddir() returns the very first
 * file name again.
 */
static void
_wrewinddir(
    _WDIR* dirp)
{
    if (dirp) {
        /* Release existing search handle */
        if (dirp->handle != INVALID_HANDLE_VALUE) {
            FindClose (dirp->handle);
        }

        /* Open new search handle */
        dirent_first (dirp);
    }
}

/* Get first directory entry (internal) */
static WIN32_FIND_DATAW*
dirent_first(
    _WDIR *dirp)
{
    WIN32_FIND_DATAW *datap;
    DWORD error;

    /* Open directory and retrieve the first entry */
    dirp->handle = FindFirstFileExW(
        dirp->patt, FindExInfoStandard, &dirp->data,
        FindExSearchNameMatch, NULL, 0);
    if (dirp->handle != INVALID_HANDLE_VALUE) {

        /* a directory entry is now waiting in memory */
        datap = &dirp->data;
        dirp->cached = 1;

    } else {

        /* Failed to open directory: no directory entry in memory */
        dirp->cached = 0;
        datap = NULL;

        /* Set error code */
        error = GetLastError ();
        switch (error) {
        case ERROR_ACCESS_DENIED:
            /* No read access to directory */
            dirent_set_errno (EACCES);
            break;

        case ERROR_DIRECTORY:
            /* Directory name is invalid */
            dirent_set_errno (ENOTDIR);
            break;

        case ERROR_PATH_NOT_FOUND:
        default:
            /* Cannot find the file */
            dirent_set_errno (ENOENT);
        }

    }
    return datap;
}

/*
 * Get next directory entry (internal).
 *
 * Returns a pointer to the next entry, or NULL at the end of the directory
 * stream or on error.
 */
static WIN32_FIND_DATAW*
dirent_next(
    _WDIR *dirp)
{
    WIN32_FIND_DATAW *p;

    /* Get next directory entry */
    if (dirp->cached != 0) {

        /* A valid directory entry already in memory */
        p = &dirp->data;
        dirp->cached = 0;

    } else if (dirp->handle != INVALID_HANDLE_VALUE) {

        /* Get the next directory entry from stream */
        if (FindNextFileW (dirp->handle, &dirp->data) != FALSE) {
            /* Got a file */
            p = &dirp->data;
        } else {
            /* The very last entry has been processed or an error occurred */
            FindClose (dirp->handle);
            dirp->handle = INVALID_HANDLE_VALUE;
            p = NULL;
        }

    } else {

        /* End of directory stream reached */
        p = NULL;

    }

    return p;
}

/*
 * Open directory stream using plain old C-string.
 */
static DIR*
opendir(
    const char *dirname)
{
    struct DIR *dirp;

    /* Must have directory name */
    if (dirname == NULL  ||  dirname[0] == '\0') {
        dirent_set_errno (ENOENT);
        return NULL;
    }

    /* Allocate memory for DIR structure */
    dirp = (DIR*) malloc (sizeof (struct DIR));
    if (!dirp) {
        return NULL;
    }
    {
        int error;
        wchar_t wname[PATH_MAX + 1];
        size_t n;

        /* Convert directory name to wide-character string */
        error = dirent_mbstowcs_s(
            &n, wname, PATH_MAX + 1, dirname, PATH_MAX + 1);
        if (error) {
            /*
             * Cannot convert file name to wide-character string.  This
             * occurs if the string contains invalid multi-byte sequences or
             * the output buffer is too small to contain the resulting
             * string.
             */
            goto exit_free;
        }


        /* Open directory stream using wide-character name */
        dirp->wdirp = _wopendir (wname);
        if (!dirp->wdirp) {
            goto exit_free;
        }

    }

    /* Success */
    return dirp;

    /* Failure */
exit_free:
    free (dirp);
    return NULL;
}

/*
 * Read next directory entry.
 */
static struct dirent*
readdir(
    DIR *dirp)
{
    struct dirent *entry;

    /*
     * Read directory entry to buffer.  We can safely ignore the return value
     * as entry will be set to NULL in case of error.
     */
    (void) readdir_r (dirp, &dirp->ent, &entry);

    /* Return pointer to statically allocated directory entry */
    return entry;
}

/*
 * Read next directory entry into caller-allocated buffer.
 *
 * Returns zero on success.  If the end of directory stream is reached, then
 * sets result to NULL and returns zero.
 */
static int
readdir_r(
    DIR *dirp,
    struct dirent *entry,
    struct dirent **result)
{
    WIN32_FIND_DATAW *datap;

    /* Read next directory entry */
    datap = dirent_next (dirp->wdirp);
    if (datap) {
        size_t n;
        int error;

        /* Attempt to convert file name to multi-byte string */
        error = dirent_wcstombs_s(
            &n, entry->d_name, PATH_MAX + 1, datap->cFileName, PATH_MAX + 1);

        /*
         * If the file name cannot be represented by a multi-byte string,
         * then attempt to use old 8+3 file name.  This allows traditional
         * Unix-code to access some file names despite Unicode
         * characters, although file names may seem unfamiliar to the user.
         *
         * Beware that the code below cannot come up with a short file
         * name unless the file system provides one.  At least
         * VirtualBox shared folders fail to do this.
         */
        if (error  &&  datap->cAlternateFileName[0] != '\0') {
            error = dirent_wcstombs_s(
                &n, entry->d_name, PATH_MAX + 1,
                datap->cAlternateFileName, PATH_MAX + 1);
        }

        if (!error) {
            DWORD attr;

            /* Length of file name excluding zero terminator */
            entry->d_namlen = n - 1;

            /* File attributes */
            attr = datap->dwFileAttributes;
            if ((attr & FILE_ATTRIBUTE_DEVICE) != 0) {
                entry->d_type = DT_CHR;
            } else if ((attr & FILE_ATTRIBUTE_DIRECTORY) != 0) {
                entry->d_type = DT_DIR;
            } else {
                entry->d_type = DT_REG;
            }

            /* Reset dummy fields */
            entry->d_ino = 0;
            entry->d_off = 0;
            entry->d_reclen = sizeof (struct dirent);

        } else {

            /*
             * Cannot convert file name to multi-byte string so construct
             * an erroneous directory entry and return that.  Note that
             * we cannot return NULL as that would stop the processing
             * of directory entries completely.
             */
            entry->d_name[0] = '?';
            entry->d_name[1] = '\0';
            entry->d_namlen = 1;
            entry->d_type = DT_UNKNOWN;
            entry->d_ino = 0;
            entry->d_off = -1;
            entry->d_reclen = 0;

        }

        /* Return pointer to directory entry */
        *result = entry;

    } else {

        /* No more directory entries */
        *result = NULL;

    }

    return /*OK*/0;
}

/*
 * Close directory stream.
 */
static int
closedir(
    DIR *dirp)
{
    int ok;
    if (dirp) {

        /* Close wide-character directory stream */
        ok = _wclosedir (dirp->wdirp);
        dirp->wdirp = NULL;

        /* Release multi-byte character version */
        free (dirp);

    } else {

        /* Invalid directory stream */
        dirent_set_errno (EBADF);
        ok = /*failure*/-1;

    }
    return ok;
}

/*
 * Rewind directory stream to beginning.
 */
static void
rewinddir(
    DIR* dirp)
{
    /* Rewind wide-character string directory stream */
    _wrewinddir (dirp->wdirp);
}

/*
 * Scan directory for entries.
 */
static int
scandir(
    const char *dirname,
    struct dirent ***namelist,
    int (*filter)(const struct dirent*),
    int (*compare)(const struct dirent**, const struct dirent**))
{
    struct dirent **files = NULL;
    size_t size = 0;
    size_t allocated = 0;
    const size_t init_size = 1;
    DIR *dir = NULL;
    struct dirent *entry;
    struct dirent *tmp = NULL;
    size_t i;
    int result = 0;

    /* Open directory stream */
    dir = opendir (dirname);
    if (dir) {

        /* Read directory entries to memory */
        while (1) {

            /* Enlarge pointer table to make room for another pointer */
            if (size >= allocated) {
                void *p;
                size_t num_entries;

                /* Compute number of entries in the enlarged pointer table */
                if (size < init_size) {
                    /* Allocate initial pointer table */
                    num_entries = init_size;
                } else {
                    /* Double the size */
                    num_entries = size * 2;
                }

                /* Allocate first pointer table or enlarge existing table */
                p = realloc (files, sizeof (void*) * num_entries);
                if (p != NULL) {
                    /* Got the memory */
                    files = (dirent**) p;
                    allocated = num_entries;
                } else {
                    /* Out of memory */
                    result = -1;
                    break;
                }

            }

            /* Allocate room for temporary directory entry */
            if (tmp == NULL) {
                tmp = (struct dirent*) malloc (sizeof (struct dirent));
                if (tmp == NULL) {
                    /* Cannot allocate temporary directory entry */
                    result = -1;
                    break;
                }
            }

            /* Read directory entry to temporary area */
            if (readdir_r (dir, tmp, &entry) == /*OK*/0) {

                /* Did we get an entry? */
                if (entry != NULL) {
                    int pass;

                    /* Determine whether to include the entry in result */
                    if (filter) {
                        /* Let the filter function decide */
                        pass = filter (tmp);
                    } else {
                        /* No filter function, include everything */
                        pass = 1;
                    }

                    if (pass) {
                        /* Store the temporary entry to pointer table */
                        files[size++] = tmp;
                        tmp = NULL;

                        /* Keep up with the number of files */
                        result++;
                    }

                } else {

                    /*
                     * End of directory stream reached => sort entries (when
                     * a comparison function was given) and exit.
                     */
                    if (compare != NULL  &&  size > 1) {
                        qsort (files, size, sizeof (void*),
                            (int (*) (const void*, const void*)) compare);
                    }
                    break;

                }

            } else {
                /* Error reading directory entry */
                result = /*Error*/ -1;
                break;
            }

        }

    } else {
        /* Cannot open directory */
        result = /*Error*/ -1;
    }

    /* Release temporary directory entry */
    free (tmp);

    /* Release allocated memory on error */
    if (result < 0) {
        for (i = 0; i < size; i++) {
            free (files[i]);
        }
        free (files);
        files = NULL;
    }

    /* Close directory stream */
    if (dir) {
        closedir (dir);
    }

    /* Pass pointer table to caller */
    if (namelist) {
        *namelist = files;
    }
    return result;
}

/* Alphabetical sorting */
static int
alphasort(
    const struct dirent **a, const struct dirent **b)
{
    return strcoll ((*a)->d_name, (*b)->d_name);
}

/* Sort versions */
static int
versionsort(
    const struct dirent **a, const struct dirent **b)
{
    /* FIXME: implement strverscmp and use that */
    return alphasort (a, b);
}

/* Convert multi-byte string to wide character string */
static int
dirent_mbstowcs_s(
    size_t *pReturnValue,
    wchar_t *wcstr,
    size_t sizeInWords,
    const char *mbstr,
    size_t count)
{
    int error;

#if defined(_MSC_VER)  &&  _MSC_VER >= 1400

    /* Microsoft Visual Studio 2005 or later */
    error = mbstowcs_s (pReturnValue, wcstr, sizeInWords, mbstr, count);

#else

    /* Older Visual Studio or non-Microsoft compiler */
    size_t n;

    /* Convert to wide-character string (or count characters) */
    n = mbstowcs (wcstr, mbstr, sizeInWords);
    if (!wcstr  ||  n < count) {

        /* Zero-terminate output buffer */
        if (wcstr  &&  sizeInWords) {
            if (n >= sizeInWords) {
                n = sizeInWords - 1;
            }
            wcstr[n] = 0;
        }

        /* Length of resulting wide-character string WITH zero terminator */
        if (pReturnValue) {
            *pReturnValue = n + 1;
        }

        /* Success */
        error = 0;

    } else {

        /* Could not convert string */
        error = 1;

    }

#endif
    return error;
}

/* Convert wide-character string to multi-byte string */
static int
dirent_wcstombs_s(
    size_t *pReturnValue,
    char *mbstr,
    size_t sizeInBytes, /* max size of mbstr */
    const wchar_t *wcstr,
    size_t count)
{
    int error;

#if defined(_MSC_VER)  &&  _MSC_VER >= 1400

    /* Microsoft Visual Studio 2005 or later */
    error = wcstombs_s (pReturnValue, mbstr, sizeInBytes, wcstr, count);

#else

    /* Older Visual Studio or non-Microsoft compiler */
    size_t n;

    /* Convert to multi-byte string (or count the number of bytes needed) */
    n = wcstombs (mbstr, wcstr, sizeInBytes);
    if (!mbstr  ||  n < count) {

        /* Zero-terminate output buffer */
        if (mbstr  &&  sizeInBytes) {
            if (n >= sizeInBytes) {
                n = sizeInBytes - 1;
            }
            mbstr[n] = '\0';
        }

        /* Length of resulting multi-byte string WITH zero terminator */
        if (pReturnValue) {
            *pReturnValue = n + 1;
        }

        /* Success */
        error = 0;

    } else {

        /* Cannot convert string */
        error = 1;

    }

#endif
    return error;
}

/* Set errno variable */
static void
dirent_set_errno(
    int error)
{
#if defined(_MSC_VER)  &&  _MSC_VER >= 1400

    /* Microsoft Visual Studio 2005 and later */
    _set_errno (error);

#else

    /* Non-Microsoft compiler or older Microsoft compiler */
    errno = error;

#endif
}


#ifdef __cplusplus
}
#endif
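
For anyone creating dirent.h from the code above, here is a minimal usage sketch. It assumes the header is saved as dirent.h next to the source file on Windows and falls back to the system header elsewhere. count_entries and print_sorted are hypothetical helper names used only for illustration; they are not part of this repository.

```c
/* Ask for POSIX.1-2008 declarations (scandir, alphasort) on non-Windows systems. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#ifdef _WIN32
#include "dirent.h"   /* assumption: the shim above, saved under this name */
#else
#include <dirent.h>
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the entries of a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0) {
            count++;
        }
    }
    closedir(dir);
    return count;
}

/* Print all entries in alphabetical order using scandir.
 * Returns the number of entries printed, or -1 on error. */
int print_sorted(const char *path)
{
    struct dirent **names = NULL;
    int i;
    int n = scandir(path, &names, NULL, alphasort);

    if (n < 0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        printf("%s\n", names[i]->d_name);
        free(names[i]);   /* each entry is allocated separately */
    }
    free(names);          /* then release the pointer table itself */
    return n;
}
```

Calling count_entries(".") returns the number of files and subdirectories in the current directory, and print_sorted(".") lists them. The entries returned by scandir are heap-allocated, so the caller frees each one and then the table.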

Linaom1214 · 2022-08-22T12:34:09Z

engine init finished
blob image
[08/22/2022-20:08:39] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/22/2022-20:08:39] [W] [TRT] The enqueue() method has been deprecated when used with engines built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. Please use enqueueV2() instead.
[08/22/2022-20:08:39] [W] [TRT] Also, the batchSize argument passed into this function has no effect on changing the input shapes. Please use setBindingDimensions() function to change input shapes instead.
15ms
num of boxes before nms: 62
num of boxes: 6
0 = 0.90573 at 53.16 398.86 189.65 x 500.30
5 = 0.90219 at 13.93 234.68 770.86 x 508.74
0 = 0.89119 at 220.63 412.31 128.91 x 446.84
0 = 0.88738 at 666.77 394.24 142.23 x 481.20
0 = 0.61789 at 0.00 558.58 75.63 x 327.18
11 = 0.23620 at 0.41 252.30 33.84 x 71.59
save vis file
yolo destroy

Surprisingly, the normal yolov7-tiny.trt engine is faster than the end2end one.

You need to add #define NOMINMAX at the beginning of yolo.hpp, and change lines 363-364 of the code to the following:
const char* INPUT_BLOB_NAME = "images"; //image_arrays
const char* OUTPUT_BLOB_NAME = "output";

You also need to create a dirent.h file yourself.

It looks like the end2end code still needs optimization.
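
One way to compare the normal and end2end engines on equal terms is to time only the inference call, after a few warm-up runs, and keep image saving outside the timed region. The sketch below is illustrative only: infer_fn, average_ms and dummy_infer are hypothetical names, not part of this repository. It assumes the callback blocks until the result is ready (for GPU work, that means synchronizing the stream before returning). It also uses C11 timespec_get, which reads wall-clock time rather than a monotonic clock.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type standing in for one blocking inference call. */
typedef void (*infer_fn)(void *ctx);

/* Average wall-clock milliseconds per call over `runs` timed calls,
 * measured after `warmup` untimed calls. */
double average_ms(infer_fn fn, void *ctx, int warmup, int runs)
{
    struct timespec t0, t1;
    double total_ms;
    int i;

    assert(fn != NULL && warmup >= 0 && runs > 0);

    /* Warm-up: let caches and lazy initialization settle, untimed */
    for (i = 0; i < warmup; i++) {
        fn(ctx);
    }

    /* Timed region: nothing but the inference call itself */
    timespec_get(&t0, TIME_UTC);
    for (i = 0; i < runs; i++) {
        fn(ctx);
    }
    timespec_get(&t1, TIME_UTC);

    total_ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    return total_ms / runs;
}

/* Dummy workload that only counts calls; replace with the real inference. */
void dummy_infer(void *ctx)
{
    int *calls = (int *)ctx;
    (*calls)++;
}
```

Running the same harness against both engines, with identical warm-up and run counts, gives numbers that can be compared directly. Post-processing such as NMS or drawing boxes can be timed separately in the same way.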

YFforever2022 closed this as completed Aug 23, 2022
