custom initial_state GPU #306
Conversation
Codecov Report
@@ Coverage Diff @@
## custominitop #306 +/- ##
==============================================
Coverage 100.00% 100.00%
==============================================
Files 57 57
Lines 10806 10808 +2
==============================================
+ Hits 10806 10808 +2
@stavros11 could you please have a look? It looks like the local state vectors are not recovered from the devices after applying gates; this may be due to the replacement of `.assign` with `=`.
Thanks for the fix, now it works on GPU. Regarding `.assign`: is there an important reason it should be changed to `=`? I am not sure why the tests fail with `=`, but the reason we were using `tf.Variable`s and `.assign` was that it made it easier to keep track of the device on which each part of the state is stored. The state is stored as pieces represented by `tf.Variable`s, and the pieces are transferred to each GPU for the calculation. I suspect that by changing `state.pieces[i].assign(piece)` to `state.pieces[i] = piece` after each calculation we are no longer transferring the updated piece back to the CPU; instead, `state.pieces[i]` ends up pointing at the GPU tensor, which is not what we want.
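The distinction between in-place assignment and rebinding can be sketched in plain Python, without TensorFlow. The `Variable` class below is a toy stand-in for `tf.Variable`, not qibo code:

```python
class Variable:
    """Toy stand-in for tf.Variable: a mutable holder that keeps its
    identity (and hence its device placement) across updates."""
    def __init__(self, value):
        self.value = value

    def assign(self, new_value):
        # In-place update: the holder object itself is unchanged.
        self.value = new_value
        return self

pieces = [Variable([0.0, 0.0])]  # stand-in for state.pieces on CPU
original = pieces[0]

# .assign mutates the existing variable: pieces[0] is still the same
# object, so the data stays wherever the variable was originally placed.
pieces[0].assign([1.0, 0.0])
assert pieces[0] is original

# = rebinds the list slot to a brand-new object (in qibo's case, the
# EagerTensor produced on the GPU), so the original variable is lost.
pieces[0] = [0.0, 1.0]
assert pieces[0] is not original
```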
I think the simplest solution is to leave all the `.assign` calls as they are and just pass the new initial-state op inside a `tf.Variable` when initializing `state.pieces[0]`. I checked this: all tests pass, and performance is equivalent to master for both single and multi GPU. If you like, I can push my branch with these changes here.
src/qibo/tensorflow/distutils.py
Outdated
    state.pieces[0] = op.initial_state(nqubits=state.nlocal,
                                       dtype=DTYPES.get('DTYPECPX'),
                                       is_matrix=False, omp_num_threads=get_threads())
We can leave all the `.assign` calls as they are and change this to:

    piece = op.initial_state(nqubits=state.nlocal,
                             dtype=DTYPES.get('DTYPECPX'),
                             is_matrix=False, omp_num_threads=get_threads())
    state.pieces[0] = tf.Variable(piece, dtype=piece.dtype)
This way all tests pass and performance is the same as master.
Thanks for spotting this issue. However, I had to change all the `.assign` calls to `=` because, at least on CPU, the code crashes with:

    > state.pieces[i].assign(piece)
    E AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'assign'

Does this happen for you too?
If I make the change above, `.assign` works on both CPU and GPU. I think changing the initialization as I wrote above, using `tf.Variable(op.initial_state(...))`, will fix this crash because all `.assign` calls will then happen on variables rather than tensors.
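A plain-Python sketch of why the wrap fixes the crash (toy stand-ins only, no TensorFlow; `initial_state` here is a hypothetical placeholder for `op.initial_state(...)`):

```python
class Variable:
    """Toy stand-in for tf.Variable: exposes the .assign method."""
    def __init__(self, value):
        self.value = value

    def assign(self, new_value):
        self.value = new_value

def initial_state():
    # Placeholder for op.initial_state(...): it returns a plain
    # tensor-like object (like an EagerTensor) with no .assign method.
    return (1.0, 0.0)

pieces = [None]

# Without the wrap, a later .assign call fails, as in the reported crash:
pieces[0] = initial_state()
try:
    pieces[0].assign((0.0, 1.0))
    crashed = False
except AttributeError:
    crashed = True
assert crashed

# Wrapping the initial piece in a Variable restores .assign everywhere:
pieces[0] = Variable(initial_state())
pieces[0].assign((0.0, 1.0))
assert pieces[0].value == (0.0, 1.0)
```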
Thanks, let me try reverting all the `.assign` calls after applying your initialization.
I am opening this PR to complement the implementation in #305 for GPU.
The code now compiles and passes several tests on GPU; however, something is wrong with some of the distributed circuit tests.