Branch: lm-opt
Commits on Jun 24, 2019
  1. Add AVX512 for 32x8,32x16,32x32,32x64,64x16,64x32 and 64x64 high bd s…

    kirithika7 committed Jun 24, 2019
    …mooth predictor Kernels and unit tests
    
    Following are the performance numbers compared against AVX2.
    
    aom_highbd_smooth_32x8     0.9232643x
    aom_highbd_smooth_32x16    1.1358859x
    aom_highbd_smooth_32x32    1.2735346x
    aom_highbd_smooth_32x64    1.3768220x
    aom_highbd_smooth_64x16    1.2992736x
    aom_highbd_smooth_64x32    1.3684765x
    aom_highbd_smooth_64x64    1.3934337x
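
    For readers unfamiliar with the kernel being vectorized here, the following is a minimal scalar sketch of a high-bit-depth smooth predictor. It is not the code from this commit. It assumes the usual AV1-style blend on a 256-point weight scale: each sample mixes the above row with the bottom-left sample, and the left column with the top-right sample. The function name and the choice to pass the weight tables in as parameters (`w_row`, `w_col`) are illustrative assumptions; the codec's own weight tables are not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference sketch (hypothetical name, not the project's API).
 * w_row[r] weights above[c] against the bottom-left sample,
 * w_col[c] weights left[r] against the top-right sample.
 * Weights are assumed to be on a 0..256 scale stored in uint8_t. */
static void highbd_smooth_pred_ref(uint16_t *dst, ptrdiff_t stride, int bw, int bh,
                                   const uint16_t *above, const uint16_t *left,
                                   const uint8_t *w_row, const uint8_t *w_col) {
    const uint32_t below_pred = left[bh - 1];  /* bottom-left neighbor */
    const uint32_t right_pred = above[bw - 1]; /* top-right neighbor */
    for (int r = 0; r < bh; ++r) {
        for (int c = 0; c < bw; ++c) {
            const uint32_t wr = w_row[r];
            const uint32_t wc = w_col[c];
            const uint32_t sum = wr * above[c] + (256u - wr) * below_pred +
                                 wc * left[r] + (256u - wc) * right_pred;
            /* Two blends of scale 256 each: divide by 512 with rounding. */
            dst[r * stride + c] = (uint16_t)((sum + 256u) >> 9);
        }
    }
}
```

    A flat neighborhood (all above and left samples equal) should reproduce that same value for any weights, which is a convenient sanity check when comparing a SIMD version against a reference like this.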
Commits on Jun 20, 2019
  1. Add AVX512 full_distortion_kernel32_bits and full_distortion_kernel_c…

    srikanthkurapati authored and agopikrishna13 committed Jun 18, 2019
    …bf_zero32_bits
    
    Following are the results of Unit tests and performance tests compared
    against AVX2 averaged across all 27 size combinations
    
    full_distortion_kernel_cbf_zero32_bits    0.0095345    0.0065718    1.4508127x
    full_distortion_kernel32_bits             0.0171904    0.0089507    1.9205691x
Commits on Jun 17, 2019
Commits on Jun 13, 2019
Commits on Jun 12, 2019
  1. Add AVX512 aom highbd h and v predictors and corresponding Unit tests

    srikanthkurapati authored and agopikrishna13 committed Jun 12, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
    aom_highbd_h_pred_32x8     0.9854015x
    aom_highbd_h_pred_32x16    1.2469734x
    aom_highbd_h_pred_32x32    1.2372881x
    aom_highbd_h_pred_32x64    0.9310345x
    aom_highbd_h_pred_64x16    1.6453674x
    aom_highbd_h_pred_64x32    1.7559727x
    aom_highbd_h_pred_64x64    2.0761301x
    
    aom_highbd_v_pred_32x8     1.7662338x
    aom_highbd_v_pred_32x16    1.7945205x
    aom_highbd_v_pred_32x32    1.7986111x
    aom_highbd_v_pred_32x64    1.5029240x
    aom_highbd_v_pred_64x16    1.8309859x
    aom_highbd_v_pred_64x32    1.8267857x
    aom_highbd_v_pred_64x64    1.8388839x
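
    As a point of reference for what the h and v predictors compute, here is a minimal scalar sketch, assuming the standard definitions: vertical prediction copies the row above the block into every row, and horizontal prediction fills each row with its left neighbor. The function names are hypothetical and this is not the AVX512 code from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Vertical prediction sketch: every row is a copy of the above row. */
static void highbd_v_pred_ref(uint16_t *dst, ptrdiff_t stride, int bw, int bh,
                              const uint16_t *above) {
    for (int r = 0; r < bh; ++r)
        for (int c = 0; c < bw; ++c)
            dst[r * stride + c] = above[c];
}

/* Horizontal prediction sketch: row r is filled with left[r]. */
static void highbd_h_pred_ref(uint16_t *dst, ptrdiff_t stride, int bw, int bh,
                              const uint16_t *left) {
    for (int r = 0; r < bh; ++r)
        for (int c = 0; c < bw; ++c)
            dst[r * stride + c] = left[r];
}
```

    Both kernels are pure stores with no arithmetic, so any speedup from wider registers comes from issuing fewer store instructions per row.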
Commits on Jun 11, 2019
  1. Compare only the computed values in forward transforms unit tests and…

    agopikrishna13 committed Jun 11, 2019
    … fix some coding guideline violations.
Commits on Jun 10, 2019
  1. Add AVX512 highbd dc top and dc predictors

    srikanthkurapati authored and kirithika7 committed May 31, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
    aom_highbd_dc_top_32x8      1.2147239x
    aom_highbd_dc_top_32x16     1.1875000x
    aom_highbd_dc_top_32x32     1.7562327x
    aom_highbd_dc_top_32x64     1.6613419x
    aom_highbd_dc_top_64x16     1.5671642x
    aom_highbd_dc_top_64x32     1.7555178x
    aom_highbd_dc_top_64x64     1.7414966x
    
    aom_highbd_dc_pred_32x8     1.1918605x
    aom_highbd_dc_pred_32x16    1.7355372x
    aom_highbd_dc_pred_32x32    1.4759207x
    aom_highbd_dc_pred_32x64    1.6085714x
    aom_highbd_dc_pred_64x16    1.4871194x
    aom_highbd_dc_pred_64x32    1.8536977x
    aom_highbd_dc_pred_64x64    1.8000000x
  2. Add AVX512 aom_highbd_dc_left_predictor (32x8, 32x16, 32x32, 32x64, 6…

    srikanthkurapati authored and kirithika7 committed May 29, 2019
    …4x16, 64x32, 64x64)
    
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
    aom_highbd_dc_left_32x8     0.9855072x
    aom_highbd_dc_left_32x16    1.9116279x
    aom_highbd_dc_left_32x32    1.4456233x
    aom_highbd_dc_left_32x64    1.6552262x
    aom_highbd_dc_left_64x16    1.9196429x
    aom_highbd_dc_left_64x32    1.7961336x
    aom_highbd_dc_left_64x64    1.7308622x
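
    The dc, dc_top and dc_left predictors in the two commits above all fill the block with a single rounded average of neighboring samples. The sketch below is a scalar illustration of that idea only: it uses plain integer division for the rounded mean, which is an assumption made for clarity (a production kernel may use a shift or multiply-based division instead), and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a bw x bh block with one value. */
static void fill_block(uint16_t *dst, ptrdiff_t stride, int bw, int bh, uint16_t v) {
    for (int r = 0; r < bh; ++r)
        for (int c = 0; c < bw; ++c)
            dst[r * stride + c] = v;
}

/* Rounded mean of n samples. */
static uint16_t rounded_mean(const uint16_t *p, int n) {
    uint32_t sum = 0;
    for (int i = 0; i < n; ++i) sum += p[i];
    return (uint16_t)((sum + ((uint32_t)n >> 1)) / (uint32_t)n);
}

/* dc_top sketch: average of the above row only. */
static void highbd_dc_top_ref(uint16_t *dst, ptrdiff_t stride, int bw, int bh,
                              const uint16_t *above) {
    fill_block(dst, stride, bw, bh, rounded_mean(above, bw));
}

/* dc_left sketch: average of the left column only. */
static void highbd_dc_left_ref(uint16_t *dst, ptrdiff_t stride, int bw, int bh,
                               const uint16_t *left) {
    fill_block(dst, stride, bw, bh, rounded_mean(left, bh));
}

/* dc sketch: average over the above row and the left column together. */
static void highbd_dc_ref(uint16_t *dst, ptrdiff_t stride, int bw, int bh,
                          const uint16_t *above, const uint16_t *left) {
    uint32_t sum = 0;
    const uint32_t n = (uint32_t)(bw + bh);
    for (int c = 0; c < bw; ++c) sum += above[c];
    for (int r = 0; r < bh; ++r) sum += left[r];
    fill_block(dst, stride, bw, bh, (uint16_t)((sum + (n >> 1)) / n));
}
```

    Because the arithmetic is a short reduction followed by a block fill, small sizes such as 32x8 have little work to amortize the wider vectors over, which is consistent with the near-1x figures reported for them.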
Commits on Jun 4, 2019
Commits on Jun 3, 2019
  1. Remove hme_l0 version of sad_loop_kernel()

    lzhangnj committed Jun 3, 2019
    The speed gain from keeping a special hme_l0 version (with the search width a multiple of 16) is trivial.
  2. Add sad_loop_kernel_avx512_intrin()

    lzhangnj committed Jun 3, 2019
    Speed up vs. avx2 with search area 64x64:
     4x 2: 1.41x  16x 4: 1.44x  32x 8: 1.62x
     4x 4: 1.51x  16x 8: 1.36x  32x16: 1.30x
     4x 8: 1.34x  16x12: 1.32x  32x24: 1.28x
     4x16: 1.23x  16x16: 1.34x  32x32: 1.30x
     8x 2: 1.42x  16x32: 1.25x  32x64: 1.34x
     8x 4: 1.37x  16x64: 1.26x  48x32: 1.63x
     8x 8: 1.24x  24x16: 1.27x  48x64: 1.71x
     8x16: 1.18x  24x32: 1.21x  64x16: 1.32x
     8x32: 1.12x                64x32: 1.32x
                                64x48: 1.33x
                                64x64: 1.33x
  3. Cherry pick bug fixes in SVT-HEVC

    lzhangnj committed May 29, 2019
    commit 1c3369bd:
    Author: cabirdme
    SadLoopKernel(AVX2) and SadLoopKernelHme0(AVX2/AVX512)
    
    commit 0e43ee38:
    Author: lzhangnj
    Fix a bug in SadLoopKernel_AVX2_HmeL0_INTRIN() and SadLoopKernel_AVX512_HmeL0_INTRIN()
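
    The sad_loop_kernel() commits above concern a SAD (sum of absolute differences) search over a search area. As a rough scalar illustration of that kind of search, and not the project's actual signature, the sketch below evaluates every position in a `search_w` x `search_h` area and keeps the lowest SAD and its offset. The names `block_sad` and `sad_search_ref` and the parameter layout are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD between a bw x bh source block and a reference block. */
static uint32_t block_sad(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride, int bw, int bh) {
    uint32_t sad = 0;
    for (int r = 0; r < bh; ++r)
        for (int c = 0; c < bw; ++c)
            sad += (uint32_t)abs((int)src[r * src_stride + c] -
                                 (int)ref[r * ref_stride + c]);
    return sad;
}

/* Exhaustive search sketch: ref points at the top-left of the search area,
 * which must span (search_w + bw - 1) x (search_h + bh - 1) samples.
 * Returns the best SAD and writes its offset to best_x / best_y. */
static uint32_t sad_search_ref(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride,
                               int bw, int bh, int search_w, int search_h,
                               int *best_x, int *best_y) {
    uint32_t best = UINT32_MAX;
    for (int y = 0; y < search_h; ++y) {
        for (int x = 0; x < search_w; ++x) {
            const uint32_t sad = block_sad(src, src_stride,
                                           ref + y * ref_stride + x, ref_stride,
                                           bw, bh);
            if (sad < best) { /* strict '<' keeps the first minimum found */
                best = sad;
                *best_x = x;
                *best_y = y;
            }
        }
    }
    return best;
}
```

    A SIMD version typically evaluates several horizontal search positions per iteration, which is why constraints on the search width (such as a multiple of 16) can matter for specialized variants.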
Commits on May 30, 2019
Commits on May 28, 2019
  1. Add AVX512 16x64, 64x16, 32x64, 64x32,32x16,16x32 inverse transforms

    agopikrishna13 authored and kirithika7 committed May 28, 2019
    Following are the results of Unit tests and performance tests compared
    against C
    av1_inv_txfm_16x32 DCT_DCT     15.25x   15603.67        237954.98
    av1_inv_txfm_16x32 IDTX        12.42x   8920.20         110774.59
    av1_inv_txfm_32x16 DCT_DCT     15.99x   16076.80        257134.97
    av1_inv_txfm_32x16 IDTX        11.73x   8785.64         103047.20
    av1_inv_txfm_32x64 DCT_DCT     18.49x   73420.14        1357677.38
    av1_inv_txfm_64x32 DCT_DCT     18.65x   73010.11        1361869.38
    av1_inv_txfm_16x64 DCT_DCT     17.57x   33115.33        581717.94
    av1_inv_txfm_64x16 DCT_DCT     19.55x   32008.93        625640.31
    
    Temporarily commented out 64x16 and 64x32 due to output mismatches on Linux and Windows, respectively.
Commits on May 27, 2019
Commits on May 24, 2019
Commits on May 22, 2019
  1. Add 32x16, 16x32 avx512 highbd fwd transform

    agopikrishna13 authored and kirithika7 committed May 22, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
    av1_fwd_txfm_16x32 DCT_DCT     1.60x    14757.76        23637.86
    av1_fwd_txfm_16x32 IDTX        1.53x    7734.84         11812.00
    av1_fwd_txfm_32x16 DCT_DCT     1.53x    14800.57        22688.14
    av1_fwd_txfm_32x16 IDTX        1.45x    7427.74         10802.36
  2. Add 32x64,64x16,16x64,64x32 avx512 highbd fwd txfm

    agopikrishna13 authored and kirithika7 committed May 20, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
    av1_fwd_txfm_64x16 DCT_DCT     1.50x    28018.70        42138.50
    av1_fwd_txfm_16x64 DCT_DCT     1.46x    28225.05        41247.71
    av1_fwd_txfm_64x32 DCT_DCT     1.72x    62297.01        107405.54
    av1_fwd_txfm_32x64 DCT_DCT     1.68x    63284.36        106119.36
  3. Add sad64x, sad128x and sad128xMx4d AVX512 optimization

    lzhangnj committed May 21, 2019
    For sad64xMx4d, AVX512 is slower than AVX2 because the compiler generates worse code for AVX512.
    
        sad64x16_AVX2()        :   16.19
        sad64x16_AVX512()      :   11.82   (Comparison:  1.37x)
        sad64x32_AVX2()        :   31.37
        sad64x32_AVX512()      :   21.43   (Comparison:  1.46x)
        sad64x64_AVX2()        :   61.70
        sad64x64_AVX512()      :   44.03   (Comparison:  1.40x)
        sad64x128_AVX2()       :  139.16
        sad64x128_AVX512()     :   70.04   (Comparison:  1.99x)
        sad128x64_AVX2()       :  124.19
        sad128x64_AVX512()     :   75.34   (Comparison:  1.65x)
        sad128x128_AVX2()      :  463.05
        sad128x128_AVX512()    :  396.70   (Comparison:  1.17x)
        sad128x64x4d_AVX2()    :  444.48
        sad128x64x4d_AVX512()  :  389.95   (Comparison:  1.14x)
        sad128x128x4d_AVX2()   : 1067.52
        sad128x128x4d_AVX512() : 1039.10   (Comparison:  1.03x)
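
    The "Comparison" figures in this table are consistent with the AVX2 time divided by the AVX512 time (for example 16.19 / 11.82 is about 1.37), so values above 1 mean the AVX512 kernel is faster. A tiny helper expressing that ratio, with a hypothetical name, could look like this:

```c
/* Speedup of an optimized kernel relative to a baseline:
 * baseline time divided by optimized time. Values > 1 mean faster,
 * values < 1 mean the "optimized" kernel is actually slower. */
static double speedup(double baseline_time, double optimized_time) {
    return baseline_time / optimized_time;
}
```

    The units of the timings do not matter as long as both measurements use the same unit and the same amount of work.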
Commits on May 20, 2019
  1. Fix segmentation fault in linux and enable forward transform 32x32 an…

    agopikrishna13 authored and kirithika7 committed May 20, 2019
    …d 64x64 AVX512 kernels
Commits on May 17, 2019
  1. avx512 high bd inv transform 64x64 code

    srikanthkurapati authored and kirithika7 committed May 15, 2019
    Following are the results of Unit tests and performance tests compared
    against SSE4
    av1_inv_txfm_64x64 DCT_DCT     2.37x    149633.67       353918.03
  2. Add AVX512 intrinsics for highbd_inv_txfm_32x32 kernels

    kirithika7 committed May 14, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    av1_inv_txfm_32x32 DCT_DCT     1.39x    27430.28        38015.54
    av1_inv_txfm_32x32 IDTX        1.67x    3440.48         5732.86
  3. Add AVX 512 intrinsics for highbd_inv_16x16 all transform types

    kirithika7 committed May 3, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
             av1_inv_txfm_16x16 DCT_DCT     1.40x    6690.10         9398.26
            av1_inv_txfm_16x16 ADST_DCT     1.41x    7591.71         10691.76
            av1_inv_txfm_16x16 DCT_ADST     1.42x    7482.08         10608.75
           av1_inv_txfm_16x16 ADST_ADST     1.43x    8305.86         11885.31
        av1_inv_txfm_16x16 FLIPADST_DCT     1.45x    7529.86         10900.24
        av1_inv_txfm_16x16 DCT_FLIPADST     1.38x    8002.28         11015.85
    av1_inv_txfm_16x16 FLIPADST_FLIPADST    1.45x    8745.55         12642.18
       av1_inv_txfm_16x16 ADST_FLIPADST     1.42x    8965.24         12716.10
       av1_inv_txfm_16x16 FLIPADST_ADST     1.41x    8447.21         11938.94
                av1_inv_txfm_16x16 IDTX     2.12x    1633.17         3459.71
               av1_inv_txfm_16x16 V_DCT     1.70x    3371.94         5746.88
               av1_inv_txfm_16x16 H_DCT     1.49x    5003.06         7478.93
              av1_inv_txfm_16x16 V_ADST     1.67x    4273.57         7121.52
              av1_inv_txfm_16x16 H_ADST     1.47x    5752.89         8468.71
          av1_inv_txfm_16x16 V_FLIPADST     1.69x    4148.47         7011.73
          av1_inv_txfm_16x16 H_FLIPADST     1.47x    6057.87         8876.66
  4. avx512 support for highbd_fwd_txfm2d_64x64

    agopikrishna13 authored and kirithika7 committed May 14, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    av1_fwd_txfm_64x64 DCT_DCT     1.62x    148655.70       240683.86
    av1_fwd_txfm_64x64 IDTX        1.76x    60273.27        106022.88
  5. avx512 support for highbd_fwd_txfm2d_32x32

    srikanthkurapati authored and kirithika7 committed May 10, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    av1_fwd_txfm_32x32 DCT_DCT     2.05x    29575.91        60650.65
    av1_fwd_txfm_32x32 IDTX        1.84x    14192.11        26102.09
  6. avx 512 intrinsics support for highbd_fwd_txfm2d_16x16 all 16 transforms

    srikanthkurapati authored and kirithika7 committed May 10, 2019
    Following are the results of Unit tests and performance tests compared
    against AVX2
    
             av1_fwd_txfm_16x16 DCT_DCT     1.42x    6848.39         9744.99
            av1_fwd_txfm_16x16 ADST_DCT     1.34x    7617.16         10201.58
            av1_fwd_txfm_16x16 DCT_ADST     1.34x    7610.70         10164.90
           av1_fwd_txfm_16x16 ADST_ADST     1.31x    8604.63         11229.74
        av1_fwd_txfm_16x16 FLIPADST_DCT     1.52x    6945.68         10565.52
        av1_fwd_txfm_16x16 DCT_FLIPADST     1.30x    7902.89         10305.39
    av1_fwd_txfm_16x16 FLIPADST_FLIPADST    1.41x    8199.65         11537.69
       av1_fwd_txfm_16x16 ADST_FLIPADST     1.48x    8143.18         12073.94
       av1_fwd_txfm_16x16 FLIPADST_ADST     1.40x    8568.55         11957.64
                av1_fwd_txfm_16x16 IDTX     2.09x    1675.73         3510.61
               av1_fwd_txfm_16x16 V_DCT     1.71x    3324.54         5678.31
               av1_fwd_txfm_16x16 H_DCT     1.54x    4874.80         7523.28
              av1_fwd_txfm_16x16 V_ADST     1.69x    4161.85         7032.22
              av1_fwd_txfm_16x16 H_ADST     1.52x    5744.67         8732.08
          av1_fwd_txfm_16x16 V_FLIPADST     1.69x    4222.15         7123.12
          av1_fwd_txfm_16x16 H_FLIPADST     1.50x    6104.69         9129.65
Commits on May 16, 2019
Commits on May 3, 2019
  1. Add Wiener filter avx512 optimizations

    lzhangnj committed May 3, 2019
    8% to 30% faster than avx2.