Propagation Performance #27
Two things can be parallelized here:

- The per-time-slice exponentiation can be parallelized.
- The multiplication of the time slices can be parallelized with a tree structure: in parallel, multiply entries 2k and 2k+1, shrink the list to half as many slices, and repeat. A sketch follows below.
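A minimal sketch of that tree reduction, assuming the slice propagators come stacked along axis 0 in time order (the function name and stacking convention are mine for illustration, not the existing tf_utils API):

```python
import tensorflow as tf

def multiply_slices_tree(us):
    """Reduce a stack of slice propagators (n, d, d) to the total propagator."""
    while us.shape[0] > 1:
        if us.shape[0] % 2:          # odd count: hold the last slice out
            us, last = us[:-1], us[-1:]
        else:
            last = None
        # Batched matmul: entry k becomes U_{2k+1} @ U_{2k}, halving the
        # number of slices while preserving time order.
        us = tf.matmul(us[1::2], us[0::2])
        if last is not None:
            us = tf.concat([us, last], axis=0)
    return us[0]
```

This turns n sequential multiplications into O(log n) batched `tf.matmul` calls, each of which the device can parallelize across the batch dimension.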
P.S. We probably want to switch to the split-operator method (https://cdnsciencepub.com/doi/pdf/10.1139/v92-078).
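For reference, the second-order (Strang) splitting the paper is built on, written here for a generic Hamiltonian split into two parts A and B (the paper's case is the kinetic/potential split):

```latex
e^{-i(A+B)\,\Delta t} \;=\; e^{-iA\,\Delta t/2}\, e^{-iB\,\Delta t}\, e^{-iA\,\Delta t/2} \;+\; \mathcal{O}(\Delta t^{3})
```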
One thing to keep in mind is that the python structure is not necessarily the tensorflow structure. The examples above use decorated functions (`@tf.function`). Below is an example notebook, but that's about as far as I got investigating this.

```python
import numpy as np
import tensorflow as tf
import time

slices = 3000
var = tf.Variable(np.random.rand(100, 100))

# Plain eager matrix exponential.
def expm(var):
    return tf.linalg.expm(var)

# Same call, compiled into a graph via tf.function.
@tf.function
def tf_expm(var):
    return tf.linalg.expm(var)

# Baseline: python loop over eager calls.
start_time = time.time()
res = []
for ii in range(slices):
    res.append(expm(var))
print(time.time() - start_time, "seconds")

# Python loop over the graph-compiled version.
start_time = time.time()
res2 = []
for ii in range(slices):
    res2.append(tf_expm(var))
print(time.time() - start_time, "seconds")

# Batch all slices into one tensor and vectorize over axis 0.
var_vec = tf.Variable(np.random.rand(slices, 100, 100))

def expm_vec():
    return tf.vectorized_map(expm, var_vec)

start_time = time.time()
res_vec = expm_vec()
print(time.time() - start_time, "seconds")

# Vectorized and graph-compiled.
@tf.function
def expm_tf_vec():
    return tf.vectorized_map(expm, var_vec)

start_time = time.time()
res_vec_2 = expm_tf_vec()
print(time.time() - start_time, "seconds")
```
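Worth noting: `tf.linalg.expm` accepts inputs of shape `[..., M, M]` and exponentiates each innermost matrix, so the `vectorized_map` above can probably be replaced by a single batched call:

```python
# One batched op instead of mapping expm over the slices;
# this should produce the same result as res_vec above.
res_batched = tf.linalg.expm(var_vec)  # shape (slices, 100, 100)
```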
Is your feature request related to a problem? Please describe.
In the propagation and the creation of the Us, everything is written using python for loops. These should be replaced by their tensorflow equivalents to increase performance and actually make use of tensorflow's workload distribution.
https://github.com/q-optimize/c3/blob/ef9533008dccbcb23810e1e7396f85b5f200da15/c3/utils/tf_utils.py#L219-L224
https://github.com/q-optimize/c3/blob/ef9533008dccbcb23810e1e7396f85b5f200da15/c3/utils/tf_utils.py#L141-L143
Currently, every du is calculated by itself and no parallelization is done, which could and should be improved.
Describe the solution you'd like
Implement the propagation with native tensorflow functions (see the sketch after this section).
Describe alternatives you've considered
Additional context
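As one possible direction, a minimal sketch of a loop-free propagation using `tf.scan` (the name `dus` and the time-ordering convention are assumptions for illustration, not the actual tf_utils interface):

```python
import tensorflow as tf

def propagate(dus):
    # dus: (n_slices, d, d) slice propagators in time order.
    d = dus.shape[-1]
    u0 = tf.eye(d, dtype=dus.dtype)
    # tf.scan replaces the python for loop inside the TF graph; each step
    # left-multiplies the next slice onto the accumulated propagator.
    us = tf.scan(lambda u, du: tf.matmul(du, u), dus, initializer=u0)
    return us[-1]  # total propagator; `us` also holds all intermediates
```

`tf.scan` keeps the sequential time ordering but runs inside the graph; combined with a batched `expm` and the tree reduction from the first comment, it would remove all python-level loops from the hot path.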