Stiefel Manifold #264

Open
Fengcheng-Pei opened this issue Mar 22, 2024 · 6 comments

Fengcheng-Pei commented Mar 22, 2024

Hi,

In my problem, the optimization variable A (3x2) lies on the Stiefel manifold. My original problem P1 is:

max z
s.t.  g(A) >= z
      f(A) >= Q
      A lies on the Stiefel manifold St(3, 2)

I then use the exact penalty method to transform the original problem into a standard unconstrained manifold optimization problem P2:

minimize -z + a*b*log[1+exp((z-g(A))/b)] + a*b*log[1+exp((Q-f(A))/b)]
s.t.      A lies on the Stiefel manifold St(3, 2)

where z and Q are constants, a is the penalty factor and b is the smoothing factor. To solve this problem, I iteratively optimize A while increasing a and decreasing b until convergence. However, the objective function of P2 just grows as the penalty factor a increases and does not converge to -z. The original problem is to find a feasible solution; I do not know whether my transformation works, and I cannot understand why the objective of P2 keeps increasing until it reaches NaN.
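The penalty term above is the standard log-sum-exp smoothing of the hinge max(0, ·). A minimal numpy sketch (the function name and test values here are mine, not from the post); note that one possible source of the NaN is that evaluating log(1+exp(x)) naively overflows to inf once the argument gets large, which happens exactly when the smoothing factor b is driven toward zero, so a numerically stable form such as np.logaddexp is advisable:

```python
import numpy as np

def smoothed_penalty(violation, b):
    """Smooth surrogate for max(0, violation): b * log(1 + exp(violation / b)).

    As b -> 0 this tends to max(0, violation); for violations much larger
    than b it grows linearly, which is what makes the penalty (almost) exact.
    np.logaddexp(0, x) computes log(1 + exp(x)) without overflow, whereas
    the naive formula returns inf (and then NaN downstream) once x > ~700.
    """
    return b * np.logaddexp(0.0, violation / b)

# The smoothing factor b controls how closely we track the hinge max(0, v):
for b in (1.0, 0.1, 0.001):
    print(smoothed_penalty(2.0, b), smoothed_penalty(-2.0, b))
# -> approaches (2.0, 0.0) as b shrinks
```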

I really need your help and hope for your response!

Best regards,
Fengcheng Pei

Fengcheng-Pei commented Mar 24, 2024

Hi,

I also tried a simple Stiefel manifold optimization with the exact penalty method, and it does not converge either. However, with the Sphere manifold and the same exact penalty method it does converge. I do not understand why and would appreciate your advice. I have attached my code below.

# Excerpt from a class; requires numpy, torch and pymanopt.
import numpy as np
import torch
import pymanopt
from pymanopt.manifolds import Stiefel

def arraysteering(self, beam, ra, rd):
    W = torch.tensor(beam)
    Du = torch.tensor(self.Du)
    D = torch.tensor(self.D)
    rou = 1.0                                # penalty factor (rho)
    rou_max = 2 ** 20
    theta_rou = 2.0                          # growth rate of the penalty factor
    miu = 1.0                                # smoothing factor (mu)
    min_miu = 10 ** (-6)
    theta_miu = (min_miu / miu) ** (1 / 30)  # decay rate of the smoothing factor
    diff = 1.0
    ts = 0.01                                # convergence tolerance
    r1 = rd.H
    r2 = ra.H

    while diff > ts:
        # warm-start from the current (r1, r2)
        init_ori = np.zeros(shape=(3, 2))
        init_ori[:, 0] = np.array(r1).squeeze()
        init_ori[:, 1] = np.array(r2).squeeze()
        manifold = Stiefel(3, 2)

        @pymanopt.function.pytorch(manifold)
        def cost(A):
            # steering vectors a[i, j] built from the two columns of A
            a = torch.tensor(1j * np.zeros(shape=(self.T + self.K, self.N)))
            for i in range(self.T + self.K):
                for j in range(self.N):
                    a[i, j] = (self.alpha
                               * torch.sin(torch.arccos(A[:, 0] @ Du[:, i]))
                               * torch.complex(torch.cos(j * torch.pi * Du[:, i] @ A[:, 1]),
                                               torch.sin(-j * torch.pi * Du[:, i] @ A[:, 1])))

            obj_2 = 0
            for i in range(self.T):
                for j in range(self.K):
                    obj_2 = obj_2 - torch.linalg.norm(a[self.K + i, :] @ W[:, j]) ** 2
            for i in range(self.K):
                sig_p = torch.linalg.norm(1 / self.npower * torch.sqrt(torch.tensor(self.beta))
                                          / torch.linalg.norm(D[:, i]) * a[i, :] @ W[:, i]) ** 2
                int_p = -sig_p
                for j in range(self.K):
                    int_p = int_p + torch.linalg.norm(1 / self.npower * torch.sqrt(torch.tensor(self.beta))
                                                      / torch.linalg.norm(D[:, i]) * a[i, :] @ W[:, j]) ** 2
                # smoothed exact-penalty term for the QoS constraint
                obj_2 = obj_2 + rou * miu * torch.log2(1 + torch.exp((self.QoS - sig_p / (int_p + 1)) / miu))

            return obj_2

        prob_2 = pymanopt.Problem(manifold, cost)
        optimizer = pymanopt.optimizers.ConjugateGradient()
        Ori = np.matrix(optimizer.run(prob_2, initial_point=init_ori).point)
        r1 = np.matrix(np.copy(Ori[:, 0]))
        r2 = np.matrix(np.copy(Ori[:, 1]))

        diff = np.linalg.norm(Ori[:, 0] - init_ori[:, 0]) + np.linalg.norm(Ori[:, 1] - init_ori[:, 1])
        rou = min(theta_rou * rou, rou_max)   # tighten the penalty
        miu = max(theta_miu * miu, min_miu)   # sharpen the smoothing

    Ori_opt = np.matrix(np.zeros(shape=(3, 2)))
    Ori_opt[:, 0] = np.matrix(np.copy(r1))
    Ori_opt[:, 1] = np.matrix(np.copy(r2))
    return Ori_opt

Best regards,
Fengcheng Pei

@kellertuer

I do not work much with the Python package here (I am the Julia-variant person ;))

But the first thing I do not understand is why A only appears in the constraints. Doesn't that make A a parameter one can just choose, after which finding max z is easy? Or are you maximising over all A? Then I do not understand why your cost would be the constant z.
Maybe you can clarify.

For your switch of the manifolds, you would then switch from
Stiefel(3,2) – that is, two orthonormal vectors in R^3 stored column-wise in a matrix – to
Sphere(3,2) – a 3x2 matrix whose two columns are unit vectors.
So this is a completely different set you are optimising over, since for the second manifold there is nothing connecting the two columns (unlike orthogonality in the first). So why did you try the other manifold?

For the code I can sadly not contribute much; as I said, I do not program in Python (much), and I have never used pytorch.

Fengcheng-Pei commented Mar 24, 2024

Hi,

Thank you very much for your response! Let me explain my problem more clearly. My original problem P1 is:

max min(g_l(w,r1,r2)) for l=1,...,L
s.t.  f_k(w,r1,r2) >= Q for k=1,...,K
      ||r1||=1
      ||r2||=2
      r1.T*r2=0

So r1 and r2 lie on the Stiefel manifold. Problem P1 is equivalent to problem P2:

max z
s.t.  g_l(w,r1,r2)>=z for l=1,...,L
      f_k(w,r1,r2) >= Q for k=1,...,K
      ||r1||=1
      ||r2||=2
      r1.T*r2=0

With block coordinate descent, I optimize z and w using convex optimization. The remaining problem P3, for optimizing r1 and r2, is:

max z
s.t.  g_l(w,A)>=z for l=1,...,L
      f_k(w,A) >= Q for k=1,...,K
      A in Stiefel(3,2)

where z and w are constants optimized in the previous step. To solve P3, I use the exact penalty method and get problem P4:

min -z + m*(Σ_l n*log(1+exp((z-g_l)/n)) + Σ_k n*log(1+exp((Q-f_k)/n)))
s.t.  A in Stiefel(3,2)

where m and n are the penalty factor and the smoothing factor, respectively. I start with a small m and iteratively increase m while optimizing A until convergence. However, the objective function of P4 keeps increasing during the iterations because of the growth of m.
Previously, I solved P4 by optimizing r1 and r2 separately on the Sphere manifold, and the exact penalty method worked well; I kept the orthogonality by using the null-space trick. However, with the Stiefel manifold it does not converge as the penalty factor m increases. I am just very confused why this happens.
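For what it's worth, one plausible reading of the "null-space trick" mentioned above (my reconstruction, not the poster's actual code): fix r1 on the sphere, build an orthonormal basis N of the null space of r1.T, and parametrize r2 = N u with u on a lower-dimensional sphere, so that r1.T*r2 = 0 holds by construction:

```python
import numpy as np
from scipy.linalg import null_space  # orthonormal basis of the null space

rng = np.random.default_rng(0)
r1 = rng.standard_normal(3)
r1 /= np.linalg.norm(r1)          # r1 on the unit sphere in R^3

N = null_space(r1[None, :])       # 3x2; columns orthonormal, orthogonal to r1
u = rng.standard_normal(2)
u /= np.linalg.norm(u)            # u on the unit sphere in R^2
r2 = N @ u                        # unit norm and r1.T @ r2 == 0 by construction

print(abs(r1 @ r2))               # ~0 (orthogonality)
print(np.linalg.norm(r2))         # ~1 (rescale if ||r2|| = 2 is really meant)
```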

Best regards,
Fengcheng Pei


kellertuer commented Mar 24, 2024

Hi,
just two further short comments.

I see that your r2 has to have norm 2, but both Stiefel (second column) and Sphere (second component) would yield norm 1; does that make a difference?
Besides that, I am confused that P1 reads "max min"; is that a typo?

orthogonality by using the trick of the null space.

I do not understand this from such a short description, since there might just be too many things called a "trick".

Overall I would probably prefer to go for P1 directly; I am not sure why you rephrase it.

Just as a side remark – If you want to have something to compare to, there is

@Fengcheng-Pei (Author)

Hi,

Thank you very much for your help! I have solved my problem: I had made a mistake with the dimension of the output of the optimization problem. However, I found that the simulation using the Stiefel manifold takes much longer than using the Sphere manifold. Is that because the retraction step on the Stiefel manifold is more complex than on the Sphere manifold?

Best regards,
Fengcheng Pei

@kellertuer

I still have not understood the "orthogonality trick", but well, great that it works now!

"Much longer" is a bit vague and hard to judge. Does it take many more steps, or the same number of steps?
It also depends a lot on the choice of the retraction. One way to see retractions is as approximations of the exponential map, so some do that less accurately. On the sphere you probably use the exponential map, since it is easy enough to compute; on Stiefel several retractions are usually available (but not the exponential map).
The same of course holds for the inverse retraction (or logarithmic map).
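To make the comparison concrete, here is a numpy-only sketch (not pymanopt's internals) of the closed-form exponential map on the sphere versus a QR-based retraction on Stiefel, which is one common choice there:

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the sphere: exact geodesic step, cheap to compute."""
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def stiefel_qr_retraction(X, V):
    """QR-based retraction on Stiefel: a first-order approximation of the
    exponential map, returned as the Q factor of a QR decomposition."""
    Q, R = np.linalg.qr(X + V)
    # fix column signs (R with positive diagonal) so the map is continuous
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((3, 2)))   # point on Stiefel(3, 2)
W = rng.standard_normal((3, 2))
V = W - X @ (X.T @ W + W.T @ X) / 2                # project W to the tangent space
Y = stiefel_qr_retraction(X, 0.1 * V)
print(np.allclose(Y.T @ Y, np.eye(2)))             # True: stays on the manifold

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.3, 0.0])                      # tangent at x (v @ x == 0)
print(np.linalg.norm(sphere_exp(x, v)))            # stays on the sphere
```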
