I created a new kind of MPOHamiltonian for quantum chemistry Hamiltonians, because you can gain some speed there by pre-summing the environments. I also put a fair amount of effort into making the effective Hamiltonian fast. In this notebook I benchmark the MPSKit effective Hamiltonians against my (supposedly optimized) implementation.

I don't make use of the pre-summation in this benchmark, so both codes should do exactly the same thing.

In [1]:
using Revise, MPSKit, MPSKitModels, TensorKit
using LinearAlgebra, Base.Threads, BenchmarkTools
using MPSKitExperimental: FusedMPOHamiltonian

In [2]:
BLAS.set_num_threads(1)

In [3]:
nthreads()

4

## Heisenberg spin 1

In [4]:
fmps_len = 10;
middle_site = round(Int, fmps_len/2)

# random SU(2)-symmetric spin-1 MPS; every virtual sector in 0:10 gets multiplicity 20
ts = FiniteMPS(rand,ComplexF64,fmps_len,Rep[SU₂](1=>1),Rep[SU₂](i => 20 for i in 0:10));

# the same hamiltonian twice: as a regular MPOHamiltonian and converted to the fused type
th_orig = heisenberg_XXX(SU2Irrep);
th_fused = convert(FusedMPOHamiltonian,repeat(th_orig,length(ts)));

env_orig = environments(ts,th_orig);
env_fused = environments(ts,th_fused);

# single-site effective hamiltonians at the middle site
ac = ts.AC[middle_site]
ac_eff_orig = MPSKit.∂∂AC(middle_site,ts,th_orig,env_orig);
ac_eff_fused = MPSKit.∂∂AC(middle_site,ts,th_fused,env_fused);

# two-site effective hamiltonians on middle_site and middle_site+1
ac2 = ts.AC[middle_site]*MPSKit._transpose_tail(ts.AR[middle_site+1])
ac2_eff_orig = MPSKit.∂∂AC2(middle_site,ts,th_orig,env_orig);
ac2_eff_fused = MPSKit.∂∂AC2(middle_site,ts,th_fused,env_fused);
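
Since the pre-summation is switched off, both effective Hamiltonians should represent the same linear map, and it is worth confirming that before comparing timings. Below is a minimal sketch of such a check. The helper name `relative_difference` is made up for this illustration (it is not part of MPSKit), and it only assumes that the outputs support subtraction and `norm`. Plain matrices stand in for the tensors so that the block runs on its own.

```julia
using LinearAlgebra

# Hypothetical helper (not an MPSKit function): relative difference between
# two operators applied to the same input.
function relative_difference(f, g, x)
    yf = f(x)
    yg = g(x)
    return norm(yf - yg) / max(norm(yf), norm(yg), eps())
end

# Stand-in example: two closures that compute the same product with a
# different order of operations.
A = rand(ComplexF64, 4, 4)
B = rand(ComplexF64, 4, 4)
v = rand(ComplexF64, 4)
f = x -> (A * B) * x
g = x -> A * (B * x)

relative_difference(f, g, v) < 1e-12
```

In this notebook the analogous call would be `relative_difference(ac_eff_orig, ac_eff_fused, ac)`, and likewise for the `ac2` operators, assuming that `-` and `norm` work on the returned tensors. That check was not part of the original run.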

In [5]:
@benchmark ac_eff_orig(ac)

BenchmarkTools.Trial: 3429 samples with 1 evaluation.
 Range (min … max):  835.424 μs … 13.222 ms  ┊ GC (min … max):  0.00% … 77.98%
 Time  (median):       1.178 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.451 ms ±  1.580 ms  ┊ GC (mean ± σ):  15.25% ± 12.38%

In [6]:
@benchmark ac_eff_fused(ac)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  33.590 μs …  15.737 ms  ┊ GC (min … max): 0.00% … 92.27%
 Time  (median):     84.420 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   98.488 μs ± 382.876 μs  ┊ GC (mean ± σ):  9.83% ±  2.53%

In [7]:
@benchmark ac2_eff_orig(ac2)

BenchmarkTools.Trial: 2475 samples with 1 evaluation.
 Range (min … max):  1.300 ms … 27.964 ms  ┊ GC (min … max):  0.00% … 85.46%
 Time  (median):     1.489 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.014 ms ±  2.939 ms  ┊ GC (mean ± σ):  20.46% ± 12.80%

In [8]:
@benchmark ac2_eff_fused(ac2)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  40.560 μs …  17.590 ms  ┊ GC (min … max): 0.00% … 94.11%
 Time  (median):     76.340 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   87.938 μs ± 290.214 μs  ┊ GC (mean ± σ):  5.53% ±  1.68%

## some weird hamiltonian

In [9]:
using MAT
include("haagerup/H3_anyon.jl")

In [10]:
fmps_len = 10;
middle_site = round(Int, fmps_len/2)

# physical space: a single H3(4) sector; virtual space: every H3 sector with multiplicity 20
physical = Vect[H3](H3(4)=>1);
virtual = Vect[H3](sector => 20 for sector in TensorKit.SectorValues{H3}());
ts = FiniteMPS(rand,ComplexF64,fmps_len,physical,virtual);

# two-site operator filled with ones, with the H3(1), H3(5) and H3(6) blocks set to zero
mpotensor = TensorMap(ones,ComplexF64,physical*physical,physical*physical);
blocks(mpotensor)[H3(1)]*=0;
blocks(mpotensor)[H3(5)]*=0;
blocks(mpotensor)[H3(6)]*=0;
th_orig = MPOHamiltonian(-mpotensor);
th_fused = convert(FusedMPOHamiltonian,repeat(th_orig,length(ts)));

env_orig = environments(ts,th_orig);
env_fused = environments(ts,th_fused);

# single-site effective hamiltonians at the middle site
ac = ts.AC[middle_site]
ac_eff_orig = MPSKit.∂∂AC(middle_site,ts,th_orig,env_orig);
ac_eff_fused = MPSKit.∂∂AC(middle_site,ts,th_fused,env_fused);

# two-site effective hamiltonians on middle_site and middle_site+1
ac2 = ts.AC[middle_site]*MPSKit._transpose_tail(ts.AR[middle_site+1])
ac2_eff_orig = MPSKit.∂∂AC2(middle_site,ts,th_orig,env_orig);
ac2_eff_fused = MPSKit.∂∂AC2(middle_site,ts,th_fused,env_fused);

In [11]:
@benchmark ac_eff_orig(ac)

BenchmarkTools.Trial: 859 samples with 1 evaluation.
 Range (min … max):  3.102 ms … 20.650 ms  ┊ GC (min … max):  0.00% … 70.13%
 Time  (median):     4.290 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   5.801 ms ±  4.297 ms  ┊ GC (mean ± σ):  25.62% ± 23.28%

In [12]:
@benchmark ac_eff_fused(ac)

BenchmarkTools.Trial: 7634 samples with 1 evaluation.
 Range (min … max):  460.617 μs …  16.485 ms  ┊ GC (min … max): 0.00% … 94.52%
 Time  (median):     552.822 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   649.255 μs ± 663.427 μs  ┊ GC (mean ± σ):  4.59% ±  4.39%

In [13]:
@benchmark ac2_eff_orig(ac2)

BenchmarkTools.Trial: 279 samples with 1 evaluation.
 Range (min … max):  12.344 ms … 35.760 ms  ┊ GC (min … max):  0.00% … 58.52%
 Time  (median):     13.661 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   17.930 ms ±  8.132 ms  ┊ GC (mean ± σ):  23.56% ± 23.87%

In [14]:
@benchmark ac2_eff_fused(ac2)

BenchmarkTools.Trial: 5339 samples with 1 evaluation.
 Range (min … max):  802.875 μs …  16.076 ms  ┊ GC (min … max): 0.00% … 89.76%
 Time  (median):     849.284 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   931.388 μs ± 957.625 μs  ┊ GC (mean ± σ):  6.75% ±  6.17%

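The median timings reported so far are easier to compare as ratios. The snippet below is just arithmetic on the medians copied from the outputs above (converted to microseconds); the labels are my own shorthand for the two sections.

```julia
# Median timings copied from the benchmark outputs above, in microseconds:
# (label, MPSKit effective hamiltonian, fused effective hamiltonian)
medians = [
    ("Heisenberg, ∂∂AC",  1178.0,  84.420),
    ("Heisenberg, ∂∂AC2", 1489.0,  76.340),
    ("H3, ∂∂AC",          4290.0,  552.822),
    ("H3, ∂∂AC2",         13661.0, 849.284),
]

# Ratio of medians: how many times faster the fused version is.
speedups = [(name, orig / fused) for (name, orig, fused) in medians]

for (name, s) in speedups
    println(rpad(name, 20), round(s; digits = 1), "x")
end
```

This gives roughly 14.0x and 19.5x for the Heisenberg model, and 7.8x and 16.1x for the H3 model. These are single runs on one machine with one BLAS thread, so the ratios are indicative only.
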
## manual

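Three versions of the same contraction: one written with @plansor, one written out manually as multiplications and transposes, and one that uses DelayedFact & friends to recycle the intermediate tensors. Recycling intermediates mostly becomes important when multithreading, so it is not very applicable here.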
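
To make the idea of recycling intermediates concrete, here is a minimal sketch with plain dense matrices. `MatrixPool`, `take_buffer!`, `return_buffer!` and `apply_pooled` are hypothetical names invented for this illustration. This is not how `DelayedFact` is implemented; it only shows the general pattern of taking a preallocated buffer, writing an intermediate into it with `mul!`, and handing it back afterwards.

```julia
using LinearAlgebra

# Hypothetical, simplified buffer pool for dense matrices of one fixed size.
struct MatrixPool{T}
    dims::Tuple{Int,Int}
    free::Vector{Matrix{T}}
    lock::ReentrantLock
end
MatrixPool{T}(m::Int, n::Int) where {T} =
    MatrixPool{T}((m, n), Matrix{T}[], ReentrantLock())

# Hand out a recycled buffer if one is available, otherwise allocate a new one.
function take_buffer!(p::MatrixPool{T}) where {T}
    lock(p.lock) do
        isempty(p.free) ? Matrix{T}(undef, p.dims...) : pop!(p.free)
    end
end

# Give a buffer back to the pool so that a later call can reuse it.
function return_buffer!(p::MatrixPool, buf)
    lock(p.lock) do
        push!(p.free, buf)
    end
    return nothing
end

# Computes A*(A*x); the intermediate A*x lives in a pooled buffer.
function apply_pooled(pool, A, x)
    tmp = take_buffer!(pool)
    mul!(tmp, A, x)              # intermediate written in place
    y = A * tmp                  # the result is freshly allocated for the caller
    return_buffer!(pool, tmp)    # the intermediate is recycled
    return y
end

A = rand(ComplexF64, 50, 50)
x = rand(ComplexF64, 50, 50)
pool = MatrixPool{ComplexF64}(50, 50)

apply_pooled(pool, A, x) ≈ A * (A * x)
```

The lock makes the pool safe to share between threads, which is the setting where avoiding repeated allocation of intermediates is expected to matter most.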

In [15]:
using MPSKitExperimental: DelayedFact, TransposeFact, free!

In [16]:
virtual = Rep[SU₂](i => 20 for i in 0:10);
physical = Rep[SU₂](1=>1)

ac = TensorMap(rand,ComplexF64,virtual*physical,virtual)

fun = let        
    e = transpose(TensorMap(rand,ComplexF64,oneunit(physical)*physical,physical*oneunit(physical)),(1,),(3,4,2))
    l = transpose(TensorMap(rand,ComplexF64,virtual*oneunit(physical)',virtual),(3,1),(2,))
    le = transpose(l*e,(2,5,4),(1,3))

    r = TensorMap(rand,ComplexF64,virtual*oneunit(physical),virtual)


    function tobench(x)
        @plansor y[-1 -2;-3] := le[-1 -2 5;2 3]*x[2 3;4]*r[4 5;-3]
    end
end

@benchmark fun(ac)

BenchmarkTools.Trial: 5515 samples with 1 evaluation.
 Range (min … max):  540.607 μs … 14.691 ms  ┊ GC (min … max):  0.00% … 90.60%
 Time  (median):     637.016 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   899.579 μs ±  1.651 ms  ┊ GC (mean ± σ):  25.06% ± 12.71%

In [17]:
virtual = Rep[SU₂](i => 20 for i in 0:10);
physical = Rep[SU₂](1=>1)

ac = TensorMap(rand,ComplexF64,virtual*physical,virtual)

fun = let        
    e = transpose(TensorMap(rand,ComplexF64,oneunit(physical)*physical,physical*oneunit(physical)),(1,),(3,4,2))
    l = transpose(TensorMap(rand,ComplexF64,virtual*oneunit(physical)',virtual),(3,1),(2,))
    le = transpose(l*e,(2,5,4),(1,3))

    r = TensorMap(rand,ComplexF64,virtual*oneunit(physical),virtual)


    function tobench(x)

        lex = le*x
        lex_trans = transpose(lex,(1,2),(4,3))
        lex_trans*r
    end
end

@benchmark fun(ac)

BenchmarkTools.Trial: 6529 samples with 1 evaluation.
 Range (min … max):  457.066 μs … 17.285 ms  ┊ GC (min … max):  0.00% … 94.75%
 Time  (median):     554.917 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   757.161 μs ±  1.434 ms  ┊ GC (mean ± σ):  23.12% ± 11.50%

In [18]:
virtual = Rep[SU₂](i => 20 for i in 0:10);
physical = Rep[SU₂](1=>1)

ac = TensorMap(rand,ComplexF64,virtual*physical,virtual)

fun = let        
    e = transpose(TensorMap(rand,ComplexF64,oneunit(physical)*physical,physical*oneunit(physical)),(1,),(3,4,2))
    l = transpose(TensorMap(rand,ComplexF64,virtual*oneunit(physical)',virtual),(3,1),(2,))
    le = transpose(l*e,(2,5,4),(1,3))

    r = TensorMap(rand,ComplexF64,virtual*oneunit(physical),virtual)

    multfactory_lex = DelayedFact(codomain(le)←virtual,storagetype(r))
    transposefactor_lex = TransposeFact(codomain(le)←virtual,storagetype(r),(1,2),(4,3))
    multfact_out = DelayedFact(virtual*physical←virtual,storagetype(r))

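    function tobench(x)
        # t_1 = le*x, written into a tensor obtained from multfactory_lex
        t_1 = multfactory_lex()
        mul!(t_1,le,x)

        # t_2 is the transposed t_1; t_1 is then handed back via free!
        t_2 = transposefactor_lex(t_1)
        free!(multfactory_lex,t_1);

        # y = t_2*r, written into a tensor obtained from multfact_out
        y = multfact_out()
        mul!(y,t_2,r)

        # t_2 is handed back as well; y is returned to the caller
        free!(transposefactor_lex,t_2)
        y
    end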
end

@benchmark fun(ac)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  235.498 μs …  14.324 ms  ┊ GC (min … max):  0.00% … 95.68%
 Time  (median):     254.949 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   311.675 μs ± 699.183 μs  ┊ GC (mean ± σ):  14.41% ±  6.27%