Sources:

https://pennylane.ai/qml/demos/tutorial_quantum_transfer_learning.html

https://pennylane.ai/qml/demos/tutorial_qgrnn.html

https://www.youtube.com/watch?v=YBHzT5V1SzU&list=PL_hJxz_HrXxsQNJHWp10up8x-hwd5uwr0

# Quantum Computing Definitions

Quantum systems are manipulated by a sequence of quantum gates arranged in a quantum circuit.

Their power comes from the ability to interfere amplitudes in a very high-dimensional vector space.

A quantum circuit consists of the following:


1.   Prepare some initial state $|\psi\rangle$
    *  usually the all-zero state (ground state)
    * ($|0\rangle|0\rangle|0\rangle \dots$)



2. Execute some unitary transformation $U(θ)$ using a sequence of gates

3. Measure some observable quantity $B$ (classical information)

## Quantum Circuits

The average value (expectation value) of the measurement result is given by the **Born rule**:

$\langle B \rangle = \langle \psi| U^\dagger(\theta) B U(\theta)|\psi\rangle$

- just linear algebra in a high-dimensional space
- every step is a matrix-vector or matrix-matrix multiplication
  - (inner product of two vectors)

The final expectation value is a continuous function of the gate parameters $\theta$.
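The three steps and the expectation value can be traced by hand for one qubit. The sketch below is only an illustration under assumed choices that are not in the sources: the unitary is a single rotation $R_X(\theta) = e^{-i\theta X/2}$, the observable is Pauli $Z$, and the helper names (`rx`, `apply`, `expectation`, `circuit`) are hypothetical.

```python
import math

def rx(theta):
    """2x2 matrix of the single-qubit rotation RX(theta) = exp(-i*theta*X/2)."""
    c = math.cos(theta / 2)
    s = math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def apply(matrix, state):
    """Matrix-vector product of a 2x2 matrix with a length-2 state vector."""
    return [sum(matrix[i][j] * state[j] for j in range(2)) for i in range(2)]

def expectation(observable, state):
    """<state| observable |state>, i.e. an inner product of two vectors."""
    transformed = apply(observable, state)
    return sum(state[i].conjugate() * transformed[i] for i in range(2)).real

PAULI_Z = [[1, 0], [0, -1]]

def circuit(theta):
    psi = [1 + 0j, 0j]                 # step 1: prepare |0>
    psi = apply(rx(theta), psi)        # step 2: apply U(theta) = RX(theta)
    return expectation(PAULI_Z, psi)   # step 3: expectation value of B = Z

# For this particular circuit the result is the smooth function cos(theta).
value = circuit(0.3)
```

For this assumed circuit the output equals $\cos\theta$, which makes the "continuous function of $\theta$" statement concrete.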

## Quantum Differentiable Programming

Quantum circuits are differentiable, so the ideas of differentiable programming apply
  - we can train quantum computers like we train neural networks

# Automatic Differentiation of Quantum Circuits

**How to compute gradients of quantum circuits using the parameter-shift rule**


Other options (each comes with its own problems):
1.   Approximate?
2.   Expectation value via ML methods?
3.   Use a different optimization method?

Consider the function

$f(\theta) = \sin \theta$


Suppose we know how to evaluate this function.


*   we know that cosine is the derivative of sine
*   even though it is a different function, it looks like a sine function with a phase shift
* the derivative of sin or cos can always be expressed as the value at a forward shift by $\frac{π}{2}$ minus the value at a backward shift by $\frac{π}{2}$, divided by 2


$\frac{d}{dθ} f(\theta) = \frac{f(\theta + \frac{π}{2}) - f(\theta - \frac{π}{2})}{2}$

- this is the parameter-shift strategy; more generally, with a fixed shift $s$ and a coefficient $c$:

$\frac{d}{dθ} f(\theta) = c[f(\theta + s) - f(\theta - s)]$

Fortunately, quantum circuits admit a parameter-shift rule!
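A minimal sketch of the rule for the sine example, using $s = \frac{π}{2}$ and $c = \frac{1}{2}$. The function name `parameter_shift` and the test point are illustrative choices, and the rule in this form assumes $f$ is a sinusoid of unit frequency.

```python
import math

def parameter_shift(f, theta, shift=math.pi / 2, coeff=0.5):
    """Derivative of a unit-frequency sinusoid f from two shifted evaluations."""
    return coeff * (f(theta + shift) - f(theta - shift))

theta = 0.7

# d/dtheta sin(theta) = cos(theta)
grad_sin = parameter_shift(math.sin, theta)

# d/dtheta cos(theta) = -sin(theta); the same two-evaluation recipe works
grad_cos = parameter_shift(math.cos, theta)
```

Only evaluations of `f` itself are needed, which is what makes the recipe usable when `f` is an expectation value estimated on a device.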

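**Different from Finite Differences!**
  - finite differences give an approximation, not an exact derivative
  - they rely on a small shift
  - if the shift is too large, the approximation is poor
  - if the shift is too small, the difference is swamped by noise (which is amplified when dividing by the shift), so no meaningful signal is seen
  - the parameter-shift rule is exact and uses a large shift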
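The trade-off can be made concrete with the sine function. In the sketch below, `noisy_sin` is a hypothetical, deterministic stand-in for sampling noise (it adds `+eps` above a reference point and `-eps` below it, a worst case for a difference); real device noise is random, so the numbers are only indicative.

```python
import math

def finite_difference(f, theta, h):
    """Central finite difference with step h (an approximation)."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

def parameter_shift(f, theta):
    """Parameter-shift rule with shift pi/2 (exact for sin and cos)."""
    s = math.pi / 2
    return (f(theta + s) - f(theta - s)) / 2

theta = 0.7
exact = math.cos(theta)

# Without noise: a large step gives a poor approximation, a small one a good one.
fd_error_large_step = abs(finite_difference(math.sin, theta, 1.0) - exact)
fd_error_small_step = abs(finite_difference(math.sin, theta, 1e-3) - exact)
ps_error = abs(parameter_shift(math.sin, theta) - exact)

# With noise of size eps on every evaluation, the small step amplifies it by 1/h.
eps = 1e-2

def noisy_sin(reference):
    """Worst-case perturbation: +eps above the reference point, -eps below."""
    return lambda t: math.sin(t) + (eps if t > reference else -eps)

fd_noisy_error = abs(finite_difference(noisy_sin(theta), theta, 1e-3) - exact)
ps_noisy_error = abs(parameter_shift(noisy_sin(theta), theta) - exact)
```

In this setup the finite-difference error grows like `eps / h`, while the parameter-shift error stays at the size of `eps`.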



## When can we use the parameter-shift rule?


*   Can be used on hardware or simulators
*   The parameter-shift property depends on the individual gate, not on the rest of the circuit
*   Not all gates have a parameter-shift recipe, but the most important gates do (e.g. all single-qubit rotations)
*   The finite-difference method can be used as a fallback





## Variants of the Parameter-Shift Rule



*   The Original
  * $s$ is fixed

$\frac{d}{dθ} f(\theta) = c[f(\theta + s) - f(\theta - s)]$

*   Continuous parameter-shift rule
  * $s$ is continuous (any shift with $\sin(s) \neq 0$)

$\frac{d}{dθ} f(\theta) = \frac{f(\theta + s) - f(\theta - s)}{2 \sin(s)}$



*   Parameter-shift & gate decomposition
  * decompose a gate into individual gates that implement the same transformation, where each of these gates either has no parameters or has a known parameter-shift rule
*   Stochastic parameter-shift rule
  * promises to differentiate arbitrary gates
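The continuous variant listed above can be checked numerically: for a unit-frequency sinusoid, every shift $s$ with $\sin(s) \neq 0$ should give the same derivative, and $s = \frac{π}{2}$ recovers the original rule with $c = \frac{1}{2}$. The function name `continuous_shift` and the chosen shifts are illustrative.

```python
import math

def continuous_shift(f, theta, s):
    """Parameter-shift rule with an arbitrary shift s (requires sin(s) != 0)."""
    return (f(theta + s) - f(theta - s)) / (2 * math.sin(s))

theta = 0.7
shifts = (0.1, math.pi / 4, math.pi / 2, 2.0)

# Every entry should agree with the analytic derivative cos(theta).
gradients = {s: continuous_shift(math.sin, theta, s) for s in shifts}
```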




## Higher-Order Derivatives

The parameter-shift method can be extended to:


*   Hessians
*   Geometric Tensor/Natural Gradient/Fisher information matrix
*   Arbitrary higher-order derivatives

Extending the rule means adding more terms, with more and more forward and backward shifts, all with a similar structure
  * the number of terms can blow up, but
  * through the symmetries of quantum gates and quantum circuits, many of these terms are redundant
    * derivative values can be reused to obtain higher orders
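A small sketch of how redundancy shows up in a second derivative. It assumes the expectation value depends on one parameter as $A\cos(\theta + B) + C$ (the typical form for a single rotation gate); the constants in `expectation` are made up for illustration. Applying the $\frac{π}{2}$ rule twice needs four evaluations, but since such a function is $2π$-periodic, $f(\theta + π) = f(\theta - π)$ and two evaluations suffice.

```python
import math

def shifted_derivative(f, theta):
    """First derivative via the pi/2 parameter-shift rule."""
    s = math.pi / 2
    return (f(theta + s) - f(theta - s)) / 2

def second_derivative_nested(f, theta):
    """Apply the rule to its own output: four evaluations of f."""
    return shifted_derivative(lambda t: shifted_derivative(f, t), theta)

def second_derivative_reduced(f, theta):
    """Use f(theta + pi) == f(theta - pi) for a 2*pi-periodic f: two evaluations."""
    return (f(theta + math.pi) - f(theta)) / 2

def expectation(theta):
    # Hypothetical single-parameter expectation value A*cos(theta + B) + C.
    return 0.8 * math.cos(theta + 0.2) + 0.1

theta = 0.5
exact = -0.8 * math.cos(theta + 0.2)  # analytic second derivative
nested = second_derivative_nested(expectation, theta)
reduced = second_derivative_reduced(expectation, theta)
```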

# Variational Quantum Algorithms

Variational circuits are the practical embodiment of the idea:

"Train quantum computers like we train neural networks"

There is some quantum circuit that forms the basic subroutine of a larger algorithm.

The quantum subroutine takes in the state preparation (input data $X$) and the circuit parameters $\theta$, and outputs some measurement statistics. The parameters are then updated via a classical optimization loop.

A variational quantum circuit consists of the following:


1.   Prepare some initial state $|\psi\rangle$
    *  usually the all-zero state (ground state)
    * ($|0\rangle|0\rangle|0\rangle \dots$)



2. Execute some parametrized unitary transformation $U(θ)$ using a sequence of gates

3. Measure some observable quantity $B$ (classical information)

A variational algorithm contains a few ingredients:



1.   A circuit **ansatz**
2.   A problem-specific **cost function**
3.   A training procedure (like Gradient Descent using the parameter-shift rule)
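The three ingredients can be put together in a toy training loop. Everything specific here is an assumption for illustration: the ansatz is a single rotation $R_X(\theta)$ on $|0\rangle$, the cost is the expectation value $\langle Z \rangle = \cos\theta$ (written in closed form instead of being sampled from a device), and the learning rate and step count are arbitrary.

```python
import math

def cost(theta):
    """Assumed cost: <Z> after RX(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)

def gradient(theta):
    """Gradient from two cost evaluations via the parameter-shift rule."""
    s = math.pi / 2
    return (cost(theta + s) - cost(theta - s)) / 2

theta = 0.4           # initial parameter
learning_rate = 0.4

# Classical optimization loop: evaluate, differentiate, update.
for step in range(100):
    theta -= learning_rate * gradient(theta)

final_cost = cost(theta)
```

With these settings the parameter moves toward $\theta = π$, where the assumed cost reaches its minimum of $-1$.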



# Embedding Classical Data

Variational circuits have free parameters, but it is often necessary to input or embed classical data as part of the ansatz.


There are a number of different strategies (which one to use is an open research question):
*   AmplitudeEmbedding
*   AngleEmbedding
*   BasisEmbedding
*   DisplacementEmbedding
*   IQPEmbedding
*   QAOAEmbedding
*   SqueezingEmbedding



## Effects of Embedding

A common simple choice is to embed data using the parameter of a single gate (e.g. through the rotation of a single qubit)

**NOT SUFFICIENT - gives just a simple sine function**
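As a small worked example, assume (this specific circuit is an illustration, not taken from the sources) that a data point $x$ is embedded with a single rotation $R_X(x) = e^{-ixX/2}$ on $|0\rangle$, followed by one trainable rotation $R_Y(\theta) = e^{-i\theta Y/2}$, and that $Z$ is measured:

$$
\langle Z \rangle = \langle 0 | R_X^\dagger(x)\, R_Y^\dagger(\theta)\, Z\, R_Y(\theta)\, R_X(x) | 0 \rangle = \cos\theta \, \cos x
$$

Training $\theta$ only rescales the amplitude. As a function of the data $x$, the model remains a single-frequency sinusoid.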

## Insights on Quantum Embeddings


1.   Data "reuploading"
  * repeated sequence of embeddings
  * more complex than just a single rotation
2.   Learned embeddings
  * don't train the circuit, train the embeddings
  * use standard quantum information metrics and tricks to e.g. classify the data
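A rough numerical sketch of why data reuploading gives richer functions than a single embedding. The circuits are assumptions chosen for illustration: a single embedding $R_Y(\theta) R_X(x)|0\rangle$ versus a reuploading circuit $R_X(x) R_Y(\theta) R_X(x)|0\rangle$, both measured in $Z$. The code estimates the second Fourier coefficient of the output as a function of $x$; the helper names are hypothetical.

```python
import cmath
import math

def rx(a):
    c, s = math.cos(a / 2), math.sin(a / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(a):
    c, s = math.cos(a / 2), math.sin(a / 2)
    return [[c, -s], [s, c]]

def apply(m, v):
    """2x2 matrix times length-2 state vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def expval_z(v):
    """<Z> = |amplitude of 0|^2 - |amplitude of 1|^2."""
    return abs(v[0]) ** 2 - abs(v[1]) ** 2

def run(gates):
    state = [1 + 0j, 0j]  # start in |0>
    for gate in gates:    # gates are applied in list order
        state = apply(gate, state)
    return expval_z(state)

def single_embedding(x, theta):
    return run([rx(x), ry(theta)])

def reuploading(x, theta):
    return run([rx(x), ry(theta), rx(x)])  # the data x enters twice

def fourier_coefficient(model, k, theta, n=16):
    """Discrete Fourier coefficient of x -> model(x, theta) at frequency k."""
    xs = [2 * math.pi * j / n for j in range(n)]
    return sum(model(x, theta) * cmath.exp(-1j * k * x) for x in xs) / n

theta = 0.8
c2_single = abs(fourier_coefficient(single_embedding, 2, theta))
c2_reupload = abs(fourier_coefficient(reuploading, 2, theta))
```

In this sketch the single embedding has no frequency-2 component, while the reuploading circuit does, so repeating the embedding enlarges the set of functions of $x$ the circuit can represent.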



# Optimizing Variational Circuits

Using the parameter-shift rule, we can optimize circuits using **gradient descent**
- many flavours: Adam, Newton, Momentum, GD

There are also a number of Quantum-aware optimizers:



*   Rotosolve/Rotoselect
  * don't use gradients, solve directly for the minimum
*   Quantum Natural Gradient
  * accounts for the inherent geometry of quantum Hilbert space
* iCANS/Rosalin
  * "shot-frugal" optimizers estimate many quantities using a limited number of samples (shots)
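To illustrate "solve directly for the minimum", here is a sketch of a Rotosolve-style update for one parameter. It assumes the cost is sinusoidal in that parameter, $A\sin(\theta + B) + C$, so three evaluations determine the minimizer in closed form. The function name `rotosolve_update` and the constants in `cost` are made up for the example; this is not a library API.

```python
import math

def rotosolve_update(cost, theta):
    """Closed-form minimizer of a sinusoidal cost from three evaluations."""
    here = cost(theta)
    plus = cost(theta + math.pi / 2)
    minus = cost(theta - math.pi / 2)
    return theta - math.pi / 2 - math.atan2(2 * here - plus - minus, plus - minus)

def cost(theta):
    # Hypothetical cost of the assumed form A*sin(theta + B) + C.
    return 0.7 * math.sin(theta + 0.3) + 0.1

# One update from an arbitrary starting point lands on a minimum (cost = C - A).
theta_star = rotosolve_update(cost, 1.234)
minimum = cost(theta_star)
```

No gradient and no learning rate are involved. In a multi-parameter circuit the same update would be applied to one parameter at a time while the others are held fixed.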



# Barren Plateaus

Quantum optimization landscapes can have vast regions where both the gradient and its variance are exponentially suppressed

They can arise from the circuit ansatz, the choice of parametrization, or the cost function

This is still an open research question, but there are some methods to mitigate it:



*   Specialized initialization strategies
*   Layer-wise growth & training
*   Adiabatic approaches (slow evolution towards target)



# Hybrid Models

Variational Circuits are already hybrid models

Quantum computer: efficiently samples from hard probability distributions

Classical computer: data storage, data loading, measurement aggregation, parameter updating, managing the optimization algorithm, controlling the quantum device, communication, ...

**Backpropagation does not work inside a quantum computer**
  * We would have to store quantum information about the state after each gate of the circuit

This is not a fundamental barrier to training: gradients of the quantum part can still be obtained with the parameter-shift rule.