Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weak forms for hyperelasticity #439

Closed
gdmcbain opened this issue Jul 21, 2020 · 71 comments
Closed

weak forms for hyperelasticity #439

gdmcbain opened this issue Jul 21, 2020 · 71 comments

Comments

@gdmcbain
Copy link
Contributor

@bhaveshshrimali (from #438):

Thanks! I was starting to take a look at implementing the hyperelasticity demo. The most natural examples to follow seem to be linear elasticity, nonlinear poisson and laplace with inhomogeneous bcs.

What would be the best place to look for functions like log, inv, det (like in UFL) when writing the weak forms? For instance the stress would have an expression that looks like the following in UFL:

def firstPKStress(u):
    F = Identity(len(u)) + grad(u) 
    J = det(F)
    return mu * F - mu * inv(F).T + J * (J-1) * inv(F).T

I can see the helper functions eye and grad, which should help in defining F as eye(1, u.shape[0]) + grad(u), but how about the determinant (det) and inverse (inv) ?

@bhaveshshrimali
Copy link
Contributor

Thanks a lot for keeping the discussion organized. Will open a separate issue (so that the main topic remains as is) for each topic/sub-topic in the future :)

@gdmcbain
Copy link
Contributor Author

Usually these kinds of operations are just done with NumPy, e.g. numpy.einsum. The physical indices (indexing over component or dimension) usually precede the numerical indices (indexing over element or quadrature point). Most of the NumPy operations do the right thing in broadcasting over the numerical indices, but, not all of them. I remember having a difficulty with one of them recently...

@gdmcbain
Copy link
Contributor Author

Yeah, does numpy.linalg.inv work if the ndarray indices are shuffled around so that the ones over component are at the end?

@bhaveshshrimali
Copy link
Contributor

Indeed it does! (never knew numpy.linalg.inv could handle higher dimensional arrays). So this means that inverse would then be:

from numpy import linalg as nla
def inv(T):
    reshapedT = np.einsum("ijk->kij",T)   # assuming that T is of shape `ndim x ndim x numQuadraturePoints` ?
    return np.einsum("ijk->kij", nla.inv(reshapedT)) 

and I think det does too

@kinnala
Copy link
Owner

kinnala commented Jul 21, 2020

Please note that since this is a nonlinear problem and we don't (yet?) have anything similar to derivative in FEniCS you have to either

  • calculate the Jacobian of the Piola-Kirchhoff stress by hand
  • try to use SumPy/Mathematica or similar symbolic engine to help in obtaining the Jacobian
  • try to apply something like JAX to do it in an automatized manner

In the past, I have done options 1 and 2 for hyperelasticity and tried option 3 for a simpler test problem. At some point I had a goal of encapsulating option 3 into scikit-fem so that it's simple to use but it is currently on hold as I've been focusing on other things.

@gdmcbain
Copy link
Contributor Author

Do Jacobian-free Newton–Krylov methods work well on this class of.problems?

@kinnala
Copy link
Owner

kinnala commented Jul 21, 2020

Do Jacobian-free Newton–Krylov methods work well on this class of.problems?

Oh yes, that's another option. You might need to do some work though to find a good preconditioner but I suppose small problems should work pretty well?

@bhaveshshrimali
Copy link
Contributor

  • calculate the Jacobian of the Piola-Kirchhoff stress by hand

This was my plan. I had done this previously for a simple hyperelasticity problem sometime back. The idea was to use SymPy to calculate the derivatives with respect to the invariants and then lambdify the the result.

  • try to apply something like JAX to do it in an automatized manner

Interesting. Although I could never manage to install jax on a windows machine in the past. Might be worthwhile to follow up on this.

@gdmcbain gdmcbain mentioned this issue Jul 22, 2020
Closed
@gdmcbain
Copy link
Contributor Author

You might need to do some work though to find a good preconditioner but I suppose small problems should work pretty well?

Yes, I think so. A (good or otherwise) preconditioner is not a strict requirement for Jacobian-free Newton–Krylov, as shown for ex10 when demonstrating scipy.optimize.root as a nonlinear solver #119. That example is a small problem but root(method='krylov') works really well for it. I think this is just using unpreconditioned lgmres. I think JFNK is a good first option for preliminary exploration of nonlinear problems on small meshes. I'll post the branch.

@gdmcbain gdmcbain mentioned this issue Jul 22, 2020
@bhaveshshrimali
Copy link
Contributor

@gdmcbain @kinnala , so I got some time today to start re-implementing the FEniCS demo in scikit-fem
I just wanted to make sure that I understand the syntax correctly to assemble the rhs and jacobian. For now (with the analytical jacobian), I have

import pygmsh as pm
import numpy as np
from skfem.io import from_meshio
from skfem.helpers import ddot, grad, eye, dot, transpose
from math import pi
from skfem import *

def vinv(T):
    reshapedT = np.einsum("ijk->kij",T)
    return np.einsum("ijk->kij", np.linalg.inv(reshapedT))

def vdet(T):
    reshapedT = np.einsum("ijk->kij",T)
    return np.einsum("ijk->kij", np.linalg.det(reshapedT))

def firstPKStress(u):
    F = eye(1., u.shape[0]) + grad(u)
    J = vdet(F)
    return mu * F - mu * transpose(vinv(F)) + lmbda * J * (J - 1) * transpose(vinv(F))

# TODO: check the expression once again
def jacobianPK(u):
    F = eye(1., u.shape[0]) + grad(u)
    J = vdet(F)
    Finv = vinv(F)
    dFdF = np.einsum("ik...,jl...->ikjl...", eye(1., u.shape[0]), eye(1., u.shape[0]))
    dFinvdF = -np.einsum("ik...,jl...->ijkl",Finv, Finv)
    C = mu * dFdF - mu * dFinvdF - lmbda * J * (J- 1) * dFinvdF + (2*J - 1) * J * np.einsum("ji...,lk...->ijkl",Finv, Finv)
    return C


geom = pm.opencascade.Geometry(1.e-1, 5.e-1)
box = geom.add_box([0., 0., 0.], [1., 1., 1.])
msh = pm.generate_mesh(geom, dim=3)

# import mesh in skfem
mesh = from_meshio(msh)
elem = ElementTetP1()
uelem = ElementVectorH1(elem)
iBasis = InteriorBasis(mesh, uelem, intorder=3)
fBasis = FacetBasis(mesh, uelem, intorder=3)
u = np.zeros(iBasis.N) # this takes care of dimension

# materialParams and init
bodyForce = np.array([0., -1./2, 0])
E, nu = 10., 0.3
mu = E/2/(1+nu)
lmbda = 2*mu*nu/(1-2*nu)

dofs = {
    "left": fBasis.get_dofs(lambda x: np.isclose(x[0], 0.)),
    "right": fBasis.get_dofs(lambda x: np.isclose(x[0], 1.))
}

I = iBasis.complement_dofs(dofs)

# assign DirichletBC
# variables used in the FEniCS demo
scale = y0 = z0 = 0.5
theta = pi/3.

u1Right = 0.
u2Right = lambda x,y,z: scale*(y0 + (y - y0)*np.cos(theta) - (z - z0)*np.sin(theta) - y)
u3Right = lambda x,y,z: scale*(z0 + (y - y0)*np.sin(theta) + (z - z0)*np.cos(theta) - z)

u[dofs["left"].nodal['u^1']] = 0.
u[dofs["left"].nodal['u^2']] = 0.
u[dofs["left"].nodal['u^3']] = 0.
u[dofs["right"].nodal['u^1']] = u1Right
u[dofs["right"].nodal['u^2']] = L2_projection(u2Right, fBasis, dofs["right"].nodal['u^2'])
u[dofs["right"].nodal['u^3']] = L2_projection(u3Right, fBasis, dofs["right"].nodal['u^3'])

@LinearForm
def rhs(v, w):
    return ddot(firstPKStress(v), grad(w["w"])) #+ dot(bodyForce, w["w"])

@BilinearForm
def jac(u, v, w):
    return ddot(ddot(grad(v), jacobianPK(u)), grad(w["w"]))


for itr in range(100):
    w = iBasis.interpolate(u)
    J = asm(jac, iBasis, w=w)  # not able to assemble properly
    F = asm(rhs, iBasis, w=w)
    u_prev = u.copy()
    u += solve(*condense(J, -F, I=I))  # try a full Newton Step
    if np.linalg.norm(u - u_prev) < 1e-8:
        break
    if __name__ == "__main__":
        print(np.linalg.norm(u - u_prev))

I think I am missing something when trying to correctly declare the linear and bilinear forms. I will probably take a fresh look later in the day tomorrow at how eye works. However, if you happen to spot something obvious, please let me know. Thanks!

This is the traceback,

F = eye(1., u.shape[0]) + grad(u)
AttributeError: 'DiscreteField' object has no attribute 'shape'

@kinnala
Copy link
Owner

kinnala commented Jul 22, 2020

I'm currently writing some additional Sphinx documentation on the subject "Anatomy of forms". I hope to add some tips for defining forms.

AttributeError: 'DiscreteField' object has no attribute 'shape'

A short explanation: parameters u and v in the form definition are DiscreteField objects, basically a tuple of ndarray's: https://github.com/kinnala/scikit-fem/blob/master/skfem/element/discrete_field.py#L7. Anyhow, they don't have a property shape.

Edit: I'd have to take a more careful look at the forms to propose a fix. I can look at it later when I have more time.

@kinnala
Copy link
Owner

kinnala commented Jul 22, 2020

So mathematically grad(u) is 2-tensor and you want to add ones to the diagonal.

You can get forwards by replacing

F = eye(1., u.shape[0]) + grad(u)

with

F = grad(u)
F[0, 0] += 1
F[1, 1] += 1
F[2, 2] += 1

I think skfem.helpers.eye is probably not the most useful function for this use case in it's current form. I think we need to consider replacing it with something better. I cannot recall why it was written that way.

@kinnala
Copy link
Owner

kinnala commented Jul 22, 2020

Maybe a more relevant helper for this purpose would be

def eye(u):
    v = np.zeros_like(u)
    for i in range(v.shape[0]):
        v[i, i] += 1
    return v

But this is good discussion also on what kind of helpers we should have in skfem.helpers and how to improve it.

@bhaveshshrimali
Copy link
Contributor

Thanks a lot for the help! I added the ellipsis to a couple of more places (after printing grad(u).shape)

def vinv(T):
    reshapedT = np.einsum("ij...->...ij",T)
    return np.einsum("...ij->ij...", np.linalg.inv(reshapedT))

and the same for determinant. And also changed to using your second suggestion.

def firstPKStress(u):
    F = grad(u)
    for i in range(mesh.p.shape[0]):
        F[i,i] += 1.
    J = vdet(F)
    return mu * F - mu * transpose(vinv(F)) + lmbda * J * (J - 1) * transpose(vinv(F))

and in the jacobian. Everything seems to be fine at least syntax-wise (maybe I am missing something). Will take a careful look at the forms and bcs later in the day. Thanks so much for your help!!

@bhaveshshrimali
Copy link
Contributor

Corrected (?) the forms (still haven't implemented the non-homogeneous Neumann condition). Will clean up the code in some time.

def firstPKStress(u):
    F = grad(u)
    for i in range(3):
        F[i,i] += 1.
    J = vdet(F)
    return mu * F - mu * transpose(vinv(F)) + lmbda * J * (J - 1) * transpose(vinv(F))

def jacobianPK(u):
    F = grad(u)
    eye = np.zeros_like(F)
    for i in range(3):
        F[i,i] += 1.
        eye[i,i] += 1.
    J = vdet(F)
    Finv = vinv(F)
    dFdF = np.einsum("ik...,jl...->ijkl...", eye, eye)
    dFinvdF = np.einsum("jk...,li...->ijkl...", Finv, Finv)
    C = mu * dFdF + mu * dFinvdF
    - lmbda * J * (J- 1) * dFinvdF
    + lmbda * (2*J - 1) * J * np.einsum("ji...,lk...->ijkl...",Finv, Finv)
    return C


@LinearForm
def rhs(v, w):
    return ddot(firstPKStress(w["w"]), grad(v)) + dot(bodyForce, v)

@BilinearForm
def jac(u, v, w):
    return ddot(ddot(grad(u), jacobianPK(w["w"])), grad(v))

However I noticed that assembling the bilinear form with ~7k dofs (precisely Points: 2290, Cells: 10191) takes about 3min on my laptop (will do precise profiling and report in some time). I suspect I may be doing something inefficiently in the jacobianPK function that affects the assembly.

@kinnala
Copy link
Owner

kinnala commented Jul 23, 2020

assembling the bilinear form with ~7k dofs (precisely Points: 2290, Cells: 10191) takes about 3min on my laptop

Can you post your full code including vdet and vinv so we can check it out?

@kinnala
Copy link
Owner

kinnala commented Jul 23, 2020

The implementation of vinv looks a bit suspicious to me as I'm not sure how the dimensions would go if you wanted to use np.linalg.inv for the inversion. Another option is to write the inverse explicitly using Cramer's rule as is done here: https://github.com/kinnala/scikit-fem/blob/master/skfem/mapping/mapping_affine.py#L49.

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Jul 23, 2020

assembling the bilinear form with ~7k dofs (precisely Points: 2290, Cells: 10191) takes about 3min on my laptop

Can you post your full code including vdet and vinv so we can check it out?

Sure (and thanks!). It is basically the full snippet that I posted above with the changes to the functions:

import pygmsh as pm
import numpy as np
from skfem.io import from_meshio
from skfem.helpers import ddot, grad, dot, transpose
from math import pi
from skfem import *

# define inverse:
def vinv(T):
    """
    physical dimensions precede numerical (see: https://github.com/kinnala/scikit-fem/issues/439#issuecomment-661649339)
    T_ij... (where i, j are physical dimensions)
    """
    reshapedT = np.einsum("ij...->...ij", T)
    return np.einsum("...ij->ij...", np.linalg.inv(reshapedT))

def vdet(T):
    """
    physical dimensions precede numerical (see: https://github.com/kinnala/scikit-fem/issues/439#issuecomment-661649339)
    T_ij... (where i, j are physical dimensions)
    """
    reshapedT = np.einsum("ij...->...ij", T)
    return np.einsum("...ij->ij...", np.linalg.det(reshapedT))

def firstPKStress(u):
    F = grad(u)
    F[0,0] += 1.
    F[1,1] += 1.
    F[2,2] += 1.
    J = vdet(F)
    return mu * F - mu * transpose(vinv(F)) + lmbda * J * (J - 1) * transpose(vinv(F))

def jacobianPK(u):
    F = grad(u)
    eye = np.zeros_like(F)
    for i in range(3):
        F[i,i] += 1.
        eye[i,i] += 1.
    J = vdet(F)
    Finv = vinv(F)
    dFdF = np.einsum("ik...,jl...->ijkl...", eye, eye)
    dFinvdF = np.einsum("jk...,li...->ijkl...", Finv, Finv)
    C = mu * dFdF + mu * dFinvdF -\
        lmbda * J * (J- 1) * dFinvdF +\
            lmbda * (2*J - 1) * J * np.einsum("ji...,lk...->ijkl...",Finv   , Finv)
    # print(C[0,0,0,0].min())
    return C


geom = pm.opencascade.Geometry(1.e-2, 1.e-1)
box = geom.add_box([0., 0., 0.], [1., 1., 1.])

msh = pm.generate_mesh(geom)
mesh = from_meshio(msh)
elem = ElementTetP1()
uelem = ElementVectorH1(elem)
iBasis = InteriorBasis(mesh, uelem, intorder=2)
fBasis = FacetBasis(mesh, uelem, intorder=2)
u = np.zeros(iBasis.N) #this takes care of dimension

# materialParams and init
bodyForce = np.array([0., -1./2, 0])
E, nu = 10., 0.3
mu = E/2/(1+nu)
lmbda = 2*mu*nu/(1-2*nu)
dofs = {
    "left": iBasis.get_dofs(lambda x: x[0]==0),
    "right": iBasis.get_dofs(lambda x: x[0]==1.)
}


# assign DirichletBC
# variables used in the FEniCS demo
scale = y0 = z0 = 0.5
theta = pi/3.

# scaling factor: bta: for Newton's method'
bta = 1.

u1Right = 0.
u2Right = lambda x,y,z: scale*(y0 + (y - y0)*np.cos(theta) - (z - z0)*np.sin(theta) - y)
u3Right = lambda x,y,z: scale*(z0 + (y - y0)*np.sin(theta) + (z - z0)*np.cos(theta) - z)

rightNodes = mesh.p[:,mesh.nodes_satisfying(lambda x: np.isclose(x[0], 1.))]
leftNodes = mesh.p[:,mesh.nodes_satisfying(lambda x: np.isclose(x[0], 0.))]

u[dofs["left"].nodal['u^1']] = 0.
u[dofs["left"].nodal['u^2']] = 0.
u[dofs["left"].nodal['u^3']] = 0.
u[dofs["right"].nodal['u^1']] = 0.1
# u[dofs["right"].nodal['u^2']] = L2_projection(u2Right, iBasis, dofs["right"].nodal['u^2']) # u2Right(rightNodes[0], rightNodes[1], rightNodes[2])
# u[dofs["right"].nodal['u^3']] = L2_projection(u3Right, iBasis, dofs["right"].nodal['u^3']) # u3Right(rightNodes[0], rightNodes[1], rightNodes[2])
I = iBasis.complement_dofs(dofs)

@LinearForm
def rhs(v, w):
    return ddot(firstPKStress(w["w"]), grad(v)) #+ dot(bodyForce, v)

@BilinearForm
def jac(u, v, w):
    return ddot(grad(v), ddot(jacobianPK(w["w"]), grad(u)))


for itr in range(2): #100
    print("Iteration: {}".format(itr+1))
    w = iBasis.interpolate(u)
    # print(w)
    J = asm(jac, iBasis, w=w)
    F = asm(rhs, iBasis, w=w)
    u_prev = u.copy()
    u += bta * solve(*condense(J, -F, I=I))
    print(np.linalg.norm(u_prev-u))
    print("lmbda + 2mu = {}".format(lmbda+2.*mu ))

You are right, there is something off with the calculation of the jacobian, and most likely it is in how inverse is calculated. But I also checked the deformation gradient and it seems to be off too.

Will do some thorough checks today in the evening.

Thanks again

@bhaveshshrimali
Copy link
Contributor

The implementation of vinv looks a bit suspicious to me as I'm not sure how the dimensions would go if you wanted to use np.linalg.inv for the inversion. Another option is to write the inverse explicitly using Cramer's rule as is done here: https://github.com/kinnala/scikit-fem/blob/master/skfem/mapping/mapping_affine.py#L49.

I remember stumbling accross this at the very beginning when I was looking for helper functions, but then the documentation in numpy.linalg.inv seemed to suggest that it will work when the physical indices are shifted to the end. I will check the explicit inversion (and try it too). Thanks!

I tried doing a small check:

import numpy as np
a = np.random.random((3,3,100,4))
ainvVec = np.einsum("...ij->ij...", np.linalg.inv(np.einsum("ij...->...ij", a)))
ainvLooped = np.array([
    [np.linalg.inv(a[:,:,i,j]) for j in range(a.shape[3])]
    for i in range(a.shape[2])
])
print(np.allclose(np.einsum("...ij->ij...", ainvLooped), ainvVec)) # returns `True`

@kinnala
Copy link
Owner

kinnala commented Jul 23, 2020

Here is the time on my laptop. I'll check quickly if there is something I can do to improve it.

In [2]: %time asm(jac, iBasis, w=w)                                                                                                                 
CPU times: user 17.9 s, sys: 16.1 s, total: 33.9 s
Wall time: 11.3 s
Out[2]: 
<3603x3603 sparse matrix of type '<class 'numpy.float64'>'
	with 135385 stored elements in Compressed Sparse Row format>

@bhaveshshrimali
Copy link
Contributor

Thanks a lot! I had previously done some vectorized assembly calculations (with loop over number of local basis) and remember that the time wasn't that huge. But that was a linear problem in 2D.

@kinnala
Copy link
Owner

kinnala commented Jul 23, 2020

Here is something:

def vdet(A):
    detA = np.zeros_like(A[0, 0])
    detA = A[0, 0] * (A[1, 1] * A[2, 2] -
                      A[1, 2] * A[2, 1]) -\
           A[0, 1] * (A[1, 0] * A[2, 2] -
                      A[1, 2] * A[2, 0]) +\
           A[0, 2] * (A[1, 0] * A[2, 1] -
                      A[1, 1] * A[2, 0])
    return detA

def vinv(A):
    invA = np.zeros_like(A)
    detA = vdet(A)
    invA[0, 0] = (-A[1, 2] * A[2, 1] +
                  A[1, 1] * A[2, 2]) / detA
    invA[1, 0] = (A[1, 2] * A[2, 0] -
                  A[1, 0] * A[2, 2]) / detA
    invA[2, 0] = (-A[1, 1] * A[2, 0] +
                  A[1, 0] * A[2, 1]) / detA
    invA[0, 1] = (A[0, 2] * A[2, 1] -
                  A[0, 1] * A[2, 2]) / detA
    invA[1, 1] = (-A[0, 2] * A[2, 0] +
                  A[0, 0] * A[2, 2]) / detA
    invA[2, 1] = (A[0, 1] * A[2, 0] -
                  A[0, 0] * A[2, 1]) / detA
    invA[0, 2] = (-A[0, 2] * A[1, 1] +
                  A[0, 1] * A[1, 2]) / detA
    invA[1, 2] = (A[0, 2] * A[1, 0] -
                  A[0, 0] * A[1, 2]) / detA
    invA[2, 2] = (-A[0, 1] * A[1, 0] +
                  A[0, 0] * A[1, 1]) / detA
    return invA, detA

and then use Finv, J = vinv(F) in jacobianPK. It gives

In [1]: %timeit asm(jac, iBasis, w=w)                                                                                                               
3.89 s ± 491 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

unless I made some grave mistake.

The rest of the time comes mainly from moving data between low-level and high-level code because this line has quite a bit of operations:

    C = mu * dFdF + mu * dFinvdF -\
        lmbda * J * (J- 1) * dFinvdF +\
            lmbda * (2*J - 1) * J * np.einsum("ji...,lk...->ijkl...",Finv   , Finv)

A package called numexpr might help here but not sure.

Btw, I think

@BilinearForm
def jac(u, v, w):
    return ddot(grad(v), ddot(jacobianPK(w["w"]), grad(u)))

is not what you're thinking: ddot is not detecting that jacobianPK is 4-tensor. I'll try to propose a fix.

@kinnala
Copy link
Owner

kinnala commented Jul 23, 2020

Maybe this is right?

@BilinearForm
def jac(u, v, w):
    return np.einsum('ijkl...,ij...,kl...', jacobianPK(w["w"]), grad(u), grad(v))

@bhaveshshrimali
Copy link
Contributor

I will check these. Really appreciate your help on this.

@bhaveshshrimali
Copy link
Contributor

Maybe this is right?

@BilinearForm
def jac(u, v, w):
    return np.einsum('ijkl...,ij...,kl...', jacobianPK(w["w"]), grad(u), grad(v))

This is exactly what I had in mind when trying to write the form. I didn't check, even though I checked the skfem.helpers.py file, ddot carefully enough. Thanks so much

@bhaveshshrimali
Copy link
Contributor

The rest of the time comes mainly from moving data between low-level and high-level code because this line has quite a bit of operations:

I see, Thanks!. Will also try experimenting with the optimize flag in einsum as well as comparing with opt_einsum

@gdmcbain
Copy link
Contributor Author

That looks good.

I was stumped for a while in an earlier quantitative comparison effort to reproduce the linear elastic FEniCS tutorial example in gdmcbain/fenics-tuto-in-skfem#3. The initial quite marked discrepancy was due to large error on the FEniCS side, it seeming much more sensitive to the coarseness of the mesh for some reason; it went away when the mesh was refined. Anyway it doesn't look as though there's any similar difficulty here.

Part of the motivation for having those examples in that other repo rather than here was that I wasn't sure about the copyright or licensing status of the FEniCS originals; I'm not a lawyer.

@bhaveshshrimali
Copy link
Contributor

I was stumped for a while in an earlier quantitative comparison effort to reproduce the linear elastic FEniCS tutorial example in gdmcbain/fenics-tuto-in-skfem#3. The initial quite marked discrepancy was due to large error on the FEniCS side, it seeming much more sensitive to the coarseness of the mesh for some reason; it went away when the mesh was refined. Anyway it doesn't look as though there's any similar difficulty here.

Ah interesting. Thanks for pointing this out. A more rigorous and self contained check would be to reproduce a manufactured solution. What do you think? I see if I can get time to do that this week. Simulating uniaxial tension (or any other affine bcs) and then comparing the stress-stretch vs the analytical solution (since the fields are trivial) is an option, but visually less appealing.

Also one needs to lower the tolerance in scipy.linalg.root. With default 6.e-6 I think the norm of difference between FEniCS and scikit-fem is much larger.

u = root(residual, u, method='krylov', tol=1.e-10,
    options={
        'disp': True, "line_search": "wolfe"
    }
    ).x

Part of the motivation for having those examples in that other repo rather than here was that I wasn't sure about the copyright or licensing status of the FEniCS originals; I'm not a lawyer.

Including this in that repo works too, in fact pretty much as-is with the current implementation. It's LGPL so I am assuming there would be no violations of any copyrights, but then again I am no legal expert either. :)

Just one remark though, this code (in current state) would take ~1min or longer (depending on the platform) to execute, which I believe is bit longer than some of the examples (ex-10 for instance) , but again I think serves to be more illustrative than performant.

@bhaveshshrimali
Copy link
Contributor

Also as a side comment for plotting (as mentioned in #361 ), vedo works pretty nicely (my workaround was to interpolate the vertex data into a dolfin.Function and then use vedo.dolfin.plot as-is to view it side-by-side with FEniCS) but the following works fine too

from vedo import show as vShow
from vedo import Mesh as vMesh
vShow(vMesh([
        mesh.p, mesh.t.T
    ]).warpByVectors(u[basis.nodal_dofs].T).addScalarBar() 
)

@gdmcbain
Copy link
Contributor Author

A more rigorous and self contained check would be to reproduce a manufactured solution. What do you think?

The method of manufactured solutions is good. There's a compelling argument for it in the FEniCS tutorial in which they state that by choosing a solution from within the finite element space (P2, e.g.), one of the kinds of error can be eliminated. (The nomenclature for the different kinds of error varies and confuses me; is it 'truncation' or 'discretization'?) Despite seeing the sense in that, I tend to go for comparison with analytic solutions (exx 16, 32), or mathematicophysically significant test-cases from the literature (exx 27, 29); that's just my aesthetics though.

Just one remark though, this code (in current state) would take ~1min or longer (depending on the platform) to execute, which I believe is bit longer than some of the examples

I do think it better to keep the examples in docs/examples short. Especially as the number increases, it's going to take longer and longer to run the tests/test_examples.py. (As aired recently on #443, I'm running on some pretty antiquated hardware.)

But if it's not possible to demonstrate the features and techniques in a small version of the example, we probably do more generally want to be thinking about a separate gallery in which more demanding cases can be showcased; like https://mfem.org/gallery/. Does hyperelasticity make sense in reduced dimensions, like axial symmetry or plane strain or stress? That also has the advantage that the script can use skfem.visuals.matplotlib. (Nice work on vedo #361 too though, thanks for that.)

@gdmcbain
Copy link
Contributor Author

gdmcbain commented Jul 29, 2020

Also one needs to lower the tolerance in scipy.linalg.root.

(I take it that that's scipy.optimize.root #119,)

Oh, so you're actually using JFNK in the latest version of the example? Cool. So it's competitive in some regime of size? And you don't need to assemble the Jacobian matrix at all?

Last night I finished reading the Knoll & Keyes (2004) paper. It's compelling; I think the general approach to nonlinearity could be a very good fit to scikit-fem & scipy.optimize.root (once we work out what all the options for that mean). K. & K. do stress (like @kinnala) that for larger problems, it's all about preconditioning (§7)

Preconditioning is where the battle for scalability is won or lost

and they are actually mostly interested in really large problems (for which the Jacobian is to be avoided), but for small demos where the point is the correctness of the discrete formulation of the physics via the residual, it's also ideal.

  • Knoll, D. A. & Keyes, D. E. (2004). Jacobian-free Newton-Krylov methods: A survey of approaches and applications. Journal of Computational Physics, 193, 357--397. doi: 10.1016/j.jcp.2003.08.010

@bhaveshshrimali
Copy link
Contributor

I do think it better to keep the examples in docs/examples short

Makes sense. That's the whole premise of examples.

Does hyperelasticity make sense in reduced dimensions, like axial symmetry or plane strain or stress?

Yes, infact I was thinking about a plane strain problem. I would need to reformulate the problem a bit, but shouldn't be too bad.

(I take it that that's scipy.optimize.root #119,)

Yeah, my bad. It is scipy.optimize.root.

Oh, so you're actually using JFNK in the latest version of the example? Cool. So it's competitive in some regime of size? And you don't need to assemble the Jacobian matrix at all?

Yes, it works surprisingly well. I don't assemble the Jacobian at all. The syntax was a bit confusing to me, but after reading properly this morning, I finally understood that the options["jac_options"]["method"], if given a function, needs to emulate the iterative solvers in scipy.sparse.linalg and not simply return the jacobian

In any case options["disp"] = True doesn't print the number of inner Krylov iterations, so nothing to make of how good/bad the Krylov method is doing. Also, nothing changes when I provide the function that calculates jacobian naively i.e. jac=jacobian like

u = root(residual, u, method='krylov', jac=jacobian, tol=1.e-10,
    options={
        'disp': True, "line_search": "wolfe"
    }
    ).x

which makes me think it is simply ignored when method="krylov" is specified.

Last night I finished reading the Knoll & Keyes (2004) paper. It's compelling; I think the general approach to nonlinearity could be a very good fit to scikit-fem & scipy.optimize.root (once we work out what all the options for that mean).

Thanks for the reference. I will try to take a look at it over the weekend. I still think that the solving can be improved (wrt speed) but would need to take a proper look at the documentation of the options in scipy. I tried naively pasting the option that they exemplify for the preconditioner for the inner Krylov method (inner_M), but it ruins the convergence.

My current target it to clean up the code, checkpoint it and then maybe test another more complicated hyperelastic free-energy function. Once that works (in a reasonable time), then moving over to incompressibility using mixed formulation (maybe even the non-conforming Crouzeix-Raviart and then conforming). For now, I can always implement in FEniCS and cross-check the solutions.

@bhaveshshrimali
Copy link
Contributor

Also I don't have scikits.umfpack installed on my windows machine, and I can't because of a broken Microsoft Visual C++ (need to fix it). Though not affecting the speed significantly in the original hand-written solve I realized that using umfpack does improve the speed for larger problems (the default SuperLU is super low).

Thanks again! I will probably be back on this towards Friday. And, yes also looking to experiment with jax as for more complicated problems AD could be useful (sympy is always there though AD is gaining popularity almost everywhere).

@kinnala
Copy link
Owner

kinnala commented Jul 29, 2020

Part of the motivation for having those examples in that other repo rather than here was that I wasn't sure about the copyright or licensing status of the FEniCS originals; I'm not a lawyer.

I haven't been too worried about the licensing of the examples because they are separate from the main source code, depend on skfem proper and not the other way around.

Strictly speaking all examples that depend on, e.g., pygmsh are GPLv3 due to the strong copyleft feature. If you feel that we should be explicit about this, then we can simply add a comment to the docstring about the differing license.

Edit: By separate, I mean that when installing skfem via the main distribution channel, we distribute only the contents of the directory skfem:

(testenv) conda-shell-chrootenv:tom@tunkki:~/src/scikit-fem$ pip show scikit-fem -f | egrep '.py$'
  skfem/__init__.py
  skfem/assembly/__init__.py
  skfem/assembly/basis/__init__.py
  skfem/assembly/basis/basis.py
  skfem/assembly/basis/facet_basis.py
  skfem/assembly/basis/interior_basis.py
  skfem/assembly/dofs.py
  skfem/assembly/form/__init__.py
  skfem/assembly/form/bilinear_form.py
  skfem/assembly/form/form.py
  skfem/assembly/form/form_parameters.py
  skfem/assembly/form/functional.py
  skfem/assembly/form/linear_form.py
  skfem/element/__init__.py
  skfem/element/discrete_field.py
  skfem/element/element.py
  skfem/element/element_composite.py
  skfem/element/element_global.py
  skfem/element/element_h1.py
  skfem/element/element_hcurl.py
  skfem/element/element_hdiv.py
  skfem/element/element_hex/__init__.py
  skfem/element/element_hex/element_hex1.py
  skfem/element/element_hex/element_hex_s2.py
  skfem/element/element_line/__init__.py
  skfem/element/element_line/element_line_hermite.py
  skfem/element/element_line/element_line_mini.py
  skfem/element/element_line/element_line_p1.py
  skfem/element/element_line/element_line_p2.py
  skfem/element/element_line/element_line_pp.py
  skfem/element/element_quad/__init__.py
  skfem/element/element_quad/element_quad0.py
  skfem/element/element_quad/element_quad1.py
  skfem/element/element_quad/element_quad2.py
  skfem/element/element_quad/element_quad_bfs.py
  skfem/element/element_quad/element_quad_dg.py
  skfem/element/element_quad/element_quad_s2.py
  skfem/element/element_quad/element_quadp.py
  skfem/element/element_tet/__init__.py
  skfem/element/element_tet/element_tet_mini.py
  skfem/element/element_tet/element_tet_n0.py
  skfem/element/element_tet/element_tet_p0.py
  skfem/element/element_tet/element_tet_p1.py
  skfem/element/element_tet/element_tet_p2.py
  skfem/element/element_tet/element_tet_rt0.py
  skfem/element/element_tri/__init__.py
  skfem/element/element_tri/element_tri_argyris.py
  skfem/element/element_tri/element_tri_dg.py
  skfem/element/element_tri/element_tri_mini.py
  skfem/element/element_tri/element_tri_morley.py
  skfem/element/element_tri/element_tri_p0.py
  skfem/element/element_tri/element_tri_p1.py
  skfem/element/element_tri/element_tri_p2.py
  skfem/element/element_tri/element_tri_rt0.py
  skfem/element/element_vector_h1.py
  skfem/helpers.py
  skfem/importers/__init__.py
  skfem/importers/meshio.py
  skfem/io/__init__.py
  skfem/io/json.py
  skfem/io/meshio.py
  skfem/mapping/__init__.py
  skfem/mapping/mapping.py
  skfem/mapping/mapping_affine.py
  skfem/mapping/mapping_isoparametric.py
  skfem/mapping/mapping_mortar.py
  skfem/mesh/__init__.py
  skfem/mesh/mesh.py
  skfem/mesh/mesh2d/__init__.py
  skfem/mesh/mesh2d/mesh2d.py
  skfem/mesh/mesh2d/mesh_quad.py
  skfem/mesh/mesh2d/mesh_tri.py
  skfem/mesh/mesh3d/__init__.py
  skfem/mesh/mesh3d/mesh3d.py
  skfem/mesh/mesh3d/mesh_hex.py
  skfem/mesh/mesh3d/mesh_tet.py
  skfem/mesh/mesh_line.py
  skfem/models/__init__.py
  skfem/models/elasticity.py
  skfem/models/general.py
  skfem/models/helpers.py
  skfem/models/poisson.py
  skfem/quadrature.py
  skfem/utils.py
  skfem/version.py
  skfem/visuals/__init__.py
  skfem/visuals/matplotlib.py

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Jul 31, 2020

A complete script as of now (will experiment a bit more following #449 starting tomorrow)

import meshio
import numpy as np
from numba import jit
from opt_einsum import contract
from scipy.optimize import root, least_squares
from scipy.sparse.linalg import spilu, LinearOperator
from typing import List, Optional
from skfem.io import from_meshio
from skfem.helpers import ddot, grad, dot, transpose
from skfem import *

# define inverse:
@jit(cache=True, nopython=True, nogil=True, parallel=True)
def vdet(A):
    detA = np.zeros_like(A[0, 0])
    detA = A[0, 0] * (A[1, 1] * A[2, 2] -
                    A[1, 2] * A[2, 1]) -\
        A[0, 1] * (A[1, 0] * A[2, 2] -
                    A[1, 2] * A[2, 0]) +\
        A[0, 2] * (A[1, 0] * A[2, 1] -
                    A[1, 1] * A[2, 0])
    return detA

@jit(cache=True, nopython=True, nogil=True, parallel=True)
def vinv(A):
    invA = np.zeros_like(A)
    detA = vdet(A)
    invA[0, 0] = (-A[1, 2] * A[2, 1] +
                A[1, 1] * A[2, 2]) / detA
    invA[1, 0] = (A[1, 2] * A[2, 0] -
                A[1, 0] * A[2, 2]) / detA
    invA[2, 0] = (-A[1, 1] * A[2, 0] +
                A[1, 0] * A[2, 1]) / detA
    invA[0, 1] = (A[0, 2] * A[2, 1] -
                A[0, 1] * A[2, 2]) / detA
    invA[1, 1] = (-A[0, 2] * A[2, 0] +
                A[0, 0] * A[2, 2]) / detA
    invA[2, 1] = (A[0, 1] * A[2, 0] -
                A[0, 0] * A[2, 1]) / detA
    invA[0, 2] = (-A[0, 2] * A[1, 1] +
                A[0, 1] * A[1, 2]) / detA
    invA[1, 2] = (A[0, 2] * A[1, 0] -
                A[0, 0] * A[1, 2]) / detA
    invA[2, 2] = (-A[0, 1] * A[1, 0] +
                A[0, 0] * A[1, 1]) / detA
    return invA, detA

def firstPKStress(u):
    F = np.zeros_like(grad(u))
    F[0,0] += 1.
    F[1,1] += 1.
    F[2,2] += 1.
    
    F += grad(u)
    Finv, J = vinv(F)

    return mu * F - mu * transpose(Finv) + lmbda * J * (J - 1) * transpose(Finv)

def jacobianPK(u):
    F = np.zeros_like(grad(u))
    F[0,0] += 1.
    F[1,1] += 1.
    F[2,2] += 1.

    F += grad(u)
    eye = np.zeros_like(F)
    for i in range(3):
        eye[i,i] += 1.

    Finv, J = vinv(F)
    dFdF = contract("ik...,jl...->ijkl...", eye, eye)
    dFinvdF = contract("jk...,li...->ijkl...", Finv, Finv)
    C = mu * dFdF + mu * dFinvdF -\
        lmbda * J * (J- 1) * dFinvdF +\
            lmbda * (2*J - 1) * J * contract("ji...,lk...->ijkl...", Finv, Finv)
    return C

def restBoundary(x):
    """
    returns the dofs that are not located on the left/right face
    for application of surface traction
    """
    topBottom = np.logical_or(
        np.isclose(x[1], 0.), np.isclose(x[1], 1.)
    )
    frontBack = np.logical_or(
        np.isclose(x[2], 0.), np.isclose(x[2], 1.)
    )
    leftRight = np.logical_or(
        np.isclose(x[0], 1.), np.isclose(x[0], 0.)
    )
    return np.logical_and(np.logical_or(topBottom, frontBack), np.logical_not(leftRight))

msh = meshio.xdmf.read("fenicsmesh.xdmf")

# import mesh in skfem
mesh = from_meshio(msh)

elem = ElementTetP1()
uelem = ElementVectorH1(elem)
iBasis = InteriorBasis(mesh, uelem, intorder=2)
fBasis = FacetBasis(mesh, uelem, intorder=2)
u = np.zeros(iBasis.N) #this takes care of dimension

# materialParams and init
bodyForce = np.array([0., -1./2, 0])
surfaceTraction = np.array([0.1, 0, 0]) 
E, nu = 10., 0.3
mu = E/2/(1+nu)
lmbda = 2*mu*nu/(1-2*nu)
dofs = {
    "left":  iBasis.get_dofs(lambda x: np.isclose(x[0], 0.)),
    "right": iBasis.get_dofs(lambda x: np.isclose(x[0], 1.))
    }
# assign DirichletBC
# variables used in the FEniCS demo
scale = y0 = z0 = 0.5
theta = np.pi/3.

u1Right = 0.
u2Right = lambda x,y,z: scale*(y0 + (y - y0)*np.cos(theta) - (z - z0)*np.sin(theta) - y)
u3Right = lambda x,y,z: scale*(z0 + (y - y0)*np.sin(theta) + (z - z0)*np.cos(theta) - z)

u[dofs["left"].nodal['u^1']] = 0.
u[dofs["left"].nodal['u^2']] = 0.
u[dofs["left"].nodal['u^3']] = 0.
u[dofs["right"].nodal['u^1']] = 0.

dofs2Right = dofs["right"].nodal['u^2']
dofs3Right = dofs["right"].nodal['u^3']

u[dofs2Right] = u2Right(*iBasis.doflocs[:, dofs2Right]) 
u[dofs3Right] = u3Right(*iBasis.doflocs[:, dofs3Right])

I = iBasis.complement_dofs(dofs)
D = iBasis.get_dofs(lambda x: np.logical_or(np.isclose(x[0], 0.), np.isclose(x[0], 1.))).all()

@LinearForm
def rhs(v, w):
    x = w.x
    return np.einsum("ij...,ij...", firstPKStress(w["w"]), grad(v)) - dot(bodyForce, v)

@LinearForm
def rhsSurf(v, w):
    return - dot(surfaceTraction, v) * (restBoundary(w.x))

@BilinearForm
def jacNewton(u, v, w):
    return np.einsum('ijkl...,ij...,kl...', jacobianPK(w["w"]), grad(u), grad(v))

def residual(x: np.ndarray) -> np.ndarray:
    xfull = np.empty_like(u)
    xfull[I] = x
    xfull[D] = u[D]
    res = asm(rhs, iBasis, w=iBasis.interpolate(xfull)) + asm(rhsSurf, fBasis)
    return res[I]

def jacobian(u: np.ndarray) -> np.ndarray:
    jacb = asm(jacNewton, iBasis, w=iBasis.interpolate(u))
    return jacb

M = LinearOperator([len(I)]*2, matvec=lambda x: x)

def update(x: np.ndarray,
        _: Optional[np.ndarray] = None,
        i: List[int] = [0],
        period: int = 10) -> None:

    if i[0] % period == 0:

        print('Updating Jacobian preconditioner.')

        u_full = np.empty_like(u)
        u_full[I] = x
        u_full[D] = u[D]

        J = asm(jacNewton, iBasis, w=iBasis.interpolate(u_full))
        JI = condense(J, D=D, expand=False).tocsc()
        JI_ilu = spilu(JI)
        M.matvec = JI_ilu.solve

    i[0] += 1

M.update = update
M.update(u[I])    

# JFNK
u[I] = root(residual, u[I], method='krylov', tol=1.e-10,
options={
    'disp': True, "line_search": "wolfe",
    "jac_options":{
        "method":"lgmres", "inner_M": M
    }
}
).x

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Aug 5, 2020

Sorry I got busy with other things, so just came back to this now. Following the discussion here I changed inner_M to use the jacobian from linear elasticity as the preconditioner.

def jacobianLinear(u: np.ndarray) -> np.ndarray:
    jacb = asm(linear_elasticity(mu, lmbda), iBasis)
    return jacb

With this, there is some gain in terms of the total time taken for the solve:

Updating Jacobian preconditioner.
0:  |F(x)| = 0.0191315; step 1
1:  |F(x)| = 0.00411836; step 1
2:  |F(x)| = 0.000316193; step 1
3:  |F(x)| = 1.63828e-06; step 1
4:  |F(x)| = 5.44201e-09; step 1
5:  |F(x)| = 1.77117e-11; step 1

As an aside, I had previously used the canned solvers in PyAMG along with Newton's method to solve problems in hyperelasticity back in one of my classes with Luke here at Illinois. (My bad I didn't document the code properly then)

This is because smoother aggregation multigrid, as in PyAMG when supplied with the near null space of the assembled jacobian works really well for Hyperelasticity. Will experiment with it to see if I can create a suitable preconditioner to be used in this case.

Also took a brief look at the discussion in #450. I had thought of rewriting the jacobian to see if it could be jit-ed (after naively trying to jit the forms) using numba but the discussion therein provides a really good starting point to be able to use it (even though it is cumbersome) for other more complicated forms.

Thanks so much.

@kinnala
Copy link
Owner

kinnala commented Aug 5, 2020

Also took a brief look at the discussion in #450. I had thought of rewriting the jacobian to see if it could be jit-ed (after naively trying to jit the forms) using numba but the discussion therein provides a really good starting point to be able to use it (even though it is cumbersome) for other more complicated forms.

I was playing with the idea of creating a variant of skfem.helpers called skfem.helpers_numba (or something) which has versions of the helpers that are more suitable for Numba's @jit decorator. I'm not sure how Numba handles function calls but I read somewhere that as long as everything is decorated with @jit it should be able to inline and compile them in one go.

Edit: This would include having a variant of BilinearForm._kernel which would also be written via explicit loops.

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Aug 5, 2020

I just rewrote the original Newton's method written explicitly with the residual (sans the forcing terms) and jacobian both jit-ed and now it takes about the same time as the JFNK approach (infact slightly better than JFNK) with the inverse jacobian from linear elasticity as the preconditioner.

JIT-ed functions for the residual and the jacobian

@jit(nopython=True, cache=True, fastmath=True)
def numbares(dv, dw):
    out = np.zeros_like(dv[0,0])
    F = np.zeros_like(dw)
    for a in range(dw.shape[2]):
        for b in range(dw.shape[3]):
            F[0, 0, a, b] += 1
            F[1, 1, a, b] += 1
            F[2, 2, a, b] += 1
    
    F += dw
    J = np.zeros_like(out)
    for a in range(J.shape[0]):
        for b in range(J.shape[1]):
            J[a,b] += F[0, 0, a, b] * (F[1, 1, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 1, a, b]) -\
                    F[0, 1, a, b] * (F[1, 0, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 0, a, b]) +\
                    F[0, 2, a, b] * (F[1, 0,a ,b] * F[2, 1, a, b] -
                                        F[1, 1, a, b] * F[2, 0, a, b])
    Finv = np.zeros_like(F)
    for a in range(J.shape[0]):
        for b in range(J.shape[1]):
            Finv[0, 0, a, b] += (-F[1, 2, a, b] * F[2, 1, a, b] +
                                F[1, 1, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[1, 0, a, b] += (F[1, 2, a, b] * F[2, 0, a, b] -
                                F[1, 0, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[2, 0, a, b] += (-F[1, 1, a, b] * F[2, 0, a, b] +
                                F[1, 0, a, b] * F[2, 1, a, b]) / J[a, b]
            Finv[0, 1, a, b] += (F[0, 2, a, b] * F[2, 1, a, b] -
                                F[0, 1, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[1, 1, a, b] += (-F[0, 2, a, b] * F[2, 0, a, b] +
                                F[0, 0, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[2, 1, a, b] += (F[0, 1, a, b] * F[2, 0, a, b] -
                                F[0, 0, a, b] * F[2, 1, a, b]) / J[a, b]
            Finv[0, 2, a, b] += (-F[0, 2, a, b] * F[1, 1, a, b] +
                                F[0, 1, a, b] * F[1, 2, a, b]) / J[a, b]
            Finv[1, 2, a, b] += (F[0, 2, a, b] * F[1, 0, a, b] -
                                F[0, 0, a, b] * F[1, 2, a, b]) / J[a, b]
            Finv[2, 2, a, b] += (-F[0, 1, a, b] * F[1, 0, a, b] +
                                F[0, 0, a, b] * F[1, 1, a, b]) / J[a, b]
    
    for i in range(dv.shape[0]):
        for j in range(dv.shape[0]):
            for a in range(J.shape[0]):
                for b in range(J.shape[1]):
                    out[a,b] += (mu * F[i, j, a, b] - mu * Finv[j, i, a, b] + lmbda * J[a,b] * (J[a,b] - 1.) * Finv[j, i, a, b]) * dv[i, j, a, b]
    
    return out


@jit(nopython=True, cache=True, fastmath=True)
def numbajac(du, dv, dw):
    out = np.zeros_like(du[0, 0])
    F = np.zeros_like(dw)
    for a in range(dw.shape[2]):
        for b in range(dw.shape[3]):
            F[0, 0, a, b] += 1
            F[1, 1, a, b] += 1
            F[2, 2, a, b] += 1
    
    F += dw
    J = np.zeros_like(out)
    for a in range(J.shape[0]):
        for b in range(J.shape[1]):
            J[a,b] += F[0, 0, a, b] * (F[1, 1, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 1, a, b]) -\
                    F[0, 1, a, b] * (F[1, 0, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 0, a, b]) +\
                    F[0, 2, a, b] * (F[1, 0,a ,b] * F[2, 1, a, b] -
                                        F[1, 1, a, b] * F[2, 0, a, b])
    Finv = np.zeros_like(F)
    for a in range(J.shape[0]):
        for b in range(J.shape[1]):
            Finv[0, 0, a, b] += (-F[1, 2, a, b] * F[2, 1, a, b] +
                                F[1, 1, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[1, 0, a, b] += (F[1, 2, a, b] * F[2, 0, a, b] -
                                F[1, 0, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[2, 0, a, b] += (-F[1, 1, a, b] * F[2, 0, a, b] +
                                F[1, 0, a, b] * F[2, 1, a, b]) / J[a, b]
            Finv[0, 1, a, b] += (F[0, 2, a, b] * F[2, 1, a, b] -
                                F[0, 1, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[1, 1, a, b] += (-F[0, 2, a, b] * F[2, 0, a, b] +
                                F[0, 0, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[2, 1, a, b] += (F[0, 1, a, b] * F[2, 0, a, b] -
                                F[0, 0, a, b] * F[2, 1, a, b]) / J[a, b]
            Finv[0, 2, a, b] += (-F[0, 2, a, b] * F[1, 1, a, b] +
                                F[0, 1, a, b] * F[1, 2, a, b]) / J[a, b]
            Finv[1, 2, a, b] += (F[0, 2, a, b] * F[1, 0, a, b] -
                                F[0, 0, a, b] * F[1, 2, a, b]) / J[a, b]
            Finv[2, 2, a, b] += (-F[0, 1, a, b] * F[1, 0, a, b] +
                                F[0, 0, a, b] * F[1, 1, a, b]) / J[a, b]

    kron = np.eye(dw.shape[0])
    for i in range(du.shape[0]):
        for j in range(du.shape[0]):
            for k in range(du.shape[0]):
                for l in range(du.shape[0]):
                    for a in range(J.shape[0]):
                        for b in range(J.shape[1]):
                            out[a, b] += (mu * kron[i, k] * kron[j, l] +\
                                (mu - lmbda * J[a, b] * (J[a, b] - 1)) * Finv[j, k, a, b] * Finv[l, i, a, b] +\
                                        lmbda * (2 * J[a, b] - 1) * J[a, b] * Finv[j,i,a,b] * Finv[l,k,a,b])\
                                * du[i, j, a, b] * dv[k, l, a, b]
    return out


@BilinearForm
def jac2(u, v, w):
    return numbajac(*(grad(u), grad(v), grad(w['w'])))

@LinearForm
def res2(v, w):
    return numbares(*(grad(v), grad(w["w"])))

Newton iterations

for itr in range(100):
    print(itr+1)
    w = iBasis.interpolate(u)
    Fres = asm(rhs, iBasis, w=w) + asm(rhsSurf, fBasis) + asm(res2, iBasis, w=w)
    Kres = asm(jac2, iBasis, w=w)
    delu = solve(*condense(Kres, -Fres, I=I), use_umfpack=True)
    u += delu  # try a full Newton Step
    normu = np.linalg.norm(delu)
    print(normu)
    if  normu < 1e-8:
        break

Output

1
9.19498128312998
2
1.381262471959698
3
0.2262866125841482
4
0.0036443534041997985
5
8.957027141211203e-06
6
2.0377558917947237e-10

The total wall time now is about ~100s with ~21k dofs (free + prescribed).
Let me know what you think of this. It does significantly increase the code-length but still not too bad in the sense that a couple of more modifications and it should be able to handle any compressible hyperelastic model.

I will post the complete code after some clean up.

@gdmcbain
Copy link
Contributor Author

gdmcbain commented Aug 5, 2020

Looks like good progress. I assume we'll find a way to tidy up the numba stuff so it doesn't unduly lengthen the top-level scripts; no need to worry about that for now.

Of course beyond the immediate speed tests showing that it

takes about the same time

the more important question is scalability, but that can be left for later too.

In general I think continuation will be helpful but isn't needed for this example as the Newton iteration converges quickly.

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Aug 6, 2020

Thanks @kinnala ! So this version of a hyperelasticity demo should be complete then? Please let me know if you think this is suitable as a core example illustrating scikit-fem or more of an applied example like in fenics-tuto-in-skfem. Practically the latter would be better because of the long execution time.

The JFNK approach (thanks to @gdmcbain ) can be either a separate file or within the same code in a conditional.

Also, some elementary profiling results using cProfile


         7132762 function calls (6679064 primitive calls) in 120.629 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      864   72.150    0.084   72.150    0.084 hyperElasticity.py:156(numbajac)
        6   14.705    2.451   14.705    2.451 {built-in method scikits.umfpack.__umfpack.umfpack_dl_numeric}
      216    6.893    0.032    6.898    0.032 {built-in method _imp.create_dynamic}
        3    3.184    1.061    4.454    1.485 mesh3d.py:33(boundary_edges)
        2    2.703    1.352    2.945    1.473 {method 'Start' of 'vtkmodules.vtkRenderingCore.vtkRenderWindowInteractor' objects}
     2211    2.312    0.001    2.312    0.001 {method 'reduce' of 'numpy.ufunc' objects}
        1    2.067    2.067    2.067    2.067 {method 'GetScreenSize' of 'vtkmodules.vtkRenderingOpenGL2.vtkXOpenGLRenderWindow' objects}
       72    1.890    0.026    1.890    0.026 hyperElasticity.py:105(numbares)

numbajac takes the bulk of the time as expected. And the rest of them (sans the vtk calls) seem reasonable to me considering the small problem size (~21k dofs). Infact as I see it, the jacobian taking long times is what would be a bottleneck to test scalability, right ? At least as long as the assembly cannot be parallelized (?)

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Aug 6, 2020

Complete code after some cleaning:

import numpy as np
from skfem.io import from_meshio
from skfem.helpers import grad, dot
from numba import jit
from vedo import show as vshow
from vedo import Mesh as vMesh
from time import time
import meshio
from skfem import *


def restBoundary(x):
    """
    returns the dofs that are not located on the left/right face
    for applying surface traction
    """
    topBottom = np.logical_or(
        np.isclose(x[1], 0.*x[1],atol=1e-4), np.isclose(x[1], np.ones(x[1].shape),atol=1e-4)
    )
    frontBack = np.logical_or(
        np.isclose(x[2], 0.*x[2],atol=1e-4), np.isclose(x[2], np.ones(x[2].shape),atol=1e-4)
    )
    leftRight = np.logical_or(
        np.isclose(x[0], np.ones(x[0].shape),atol=1e-4), np.isclose(x[0], 0.*x[0],atol=1e-4)
    )
    return np.logical_and(np.logical_or(topBottom, frontBack), np.logical_not(leftRight))

@jit(nopython=True, cache=True, fastmath=True)
def vdet(F):
    J = np.zeros_like(F[0,0])
    for a in range(J.shape[0]):
        for b in range(J.shape[1]):
            J[a,b] += F[0, 0, a, b] * (F[1, 1, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 1, a, b]) -\
                    F[0, 1, a, b] * (F[1, 0, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 0, a, b]) +\
                    F[0, 2, a, b] * (F[1, 0,a ,b] * F[2, 1, a, b] -
                                        F[1, 1, a, b] * F[2, 0, a, b])
    return J


@jit(nopython=True, cache=True, fastmath=True)
def vinv(F):
    J = vdet(F)
    Finv = np.zeros_like(F)
    for a in range(J.shape[0]):
        for b in range(J.shape[1]):
            Finv[0, 0, a, b] += (-F[1, 2, a, b] * F[2, 1, a, b] +
                                F[1, 1, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[1, 0, a, b] += (F[1, 2, a, b] * F[2, 0, a, b] -
                                F[1, 0, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[2, 0, a, b] += (-F[1, 1, a, b] * F[2, 0, a, b] +
                                F[1, 0, a, b] * F[2, 1, a, b]) / J[a, b]
            Finv[0, 1, a, b] += (F[0, 2, a, b] * F[2, 1, a, b] -
                                F[0, 1, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[1, 1, a, b] += (-F[0, 2, a, b] * F[2, 0, a, b] +
                                F[0, 0, a, b] * F[2, 2, a, b]) / J[a, b]
            Finv[2, 1, a, b] += (F[0, 1, a, b] * F[2, 0, a, b] -
                                F[0, 0, a, b] * F[2, 1, a, b]) / J[a, b]
            Finv[0, 2, a, b] += (-F[0, 2, a, b] * F[1, 1, a, b] +
                                F[0, 1, a, b] * F[1, 2, a, b]) / J[a, b]
            Finv[1, 2, a, b] += (F[0, 2, a, b] * F[1, 0, a, b] -
                                F[0, 0, a, b] * F[1, 2, a, b]) / J[a, b]
            Finv[2, 2, a, b] += (-F[0, 1, a, b] * F[1, 0, a, b] +
                                F[0, 0, a, b] * F[1, 1, a, b]) / J[a, b]
    return Finv

@jit(nopython=True, cache=True, fastmath=True)
def numbares(dv, dw):
    out = np.zeros_like(dv[0,0])
    F = np.zeros_like(dw)
    for a in range(dw.shape[2]):
        for b in range(dw.shape[3]):
            F[0, 0, a, b] += 1
            F[1, 1, a, b] += 1
            F[2, 2, a, b] += 1
    
    F += dw
    J = vdet(F)
    Finv = vinv(F)

    for i in range(dv.shape[0]):
        for j in range(dv.shape[0]):
            for a in range(J.shape[0]):
                for b in range(J.shape[1]):
                    out[a,b] += (mu * F[i, j, a, b] - mu * Finv[j, i, a, b] + lmbda * J[a,b] * (J[a,b] - 1.) * Finv[j, i, a, b]) * dv[i, j, a, b]
    
    return out

@jit(nopython=True, cache=True, fastmath=True)
def numbajac(du, dv, dw):
    out = np.zeros_like(du[0, 0])
    F = np.zeros_like(dw)
    for a in range(dw.shape[2]):
        for b in range(dw.shape[3]):
            F[0, 0, a, b] += 1
            F[1, 1, a, b] += 1
            F[2, 2, a, b] += 1
    
    F += dw
    J = vdet(F)
    Finv = vinv(F)

    kron = np.eye(dw.shape[0])
    for i in range(du.shape[0]):
        for j in range(du.shape[0]):
            for k in range(du.shape[0]):
                for l in range(du.shape[0]):
                    for a in range(J.shape[0]):
                        for b in range(J.shape[1]):
                            out[a, b] += (mu * kron[i, k] * kron[j, l] +\
                                (mu - lmbda * J[a, b] * (J[a, b] - 1)) * Finv[j, k, a, b] * Finv[l, i, a, b] +\
                                        lmbda * (2 * J[a, b] - 1) * J[a, b] * Finv[j,i,a,b] * Finv[l,k,a,b])\
                                * du[i, j, a, b] * dv[k, l, a, b]
    return out

t1 = time()

msh = meshio.xdmf.read("fenicsmesh.xdmf")
mesh = from_meshio(msh)

elem = ElementTetP1()
uelem = ElementVectorH1(elem)
iBasis = InteriorBasis(mesh, uelem, intorder=1)
fBasis = FacetBasis(mesh, uelem, intorder=1)
u = np.zeros(iBasis.N)

bodyForce = np.array([0., -1./2, 0])
surfaceTraction = np.array([0.1, 0, 0]) 
E, nu = 10., 0.3
mu = E/2/(1+nu)
lmbda = 2*mu*nu/(1-2*nu)
dofs = {
    "left":  iBasis.get_dofs(lambda x: np.isclose(x[0], 0.*x[0],atol=1e-4)),
    "right": iBasis.get_dofs(lambda x: np.isclose(x[0], np.ones(x[0].shape),atol=1e-4))
    }

scale = y0 = z0 = 0.5
theta = np.pi/3.

u1Right = 0.
u2Right = lambda x,y,z: scale*(y0 + (y - y0)*np.cos(theta) - (z - z0)*np.sin(theta) - y)
u3Right = lambda x,y,z: scale*(z0 + (y - y0)*np.sin(theta) + (z - z0)*np.cos(theta) - z)

u[dofs["left"].nodal['u^1']] = 0.
u[dofs["left"].nodal['u^2']] = 0.
u[dofs["left"].nodal['u^3']] = 0.
u[dofs["right"].nodal['u^1']] = 0.

dofs2Right = dofs["right"].nodal['u^2']
dofs3Right = dofs["right"].nodal['u^3']

u[dofs2Right] = u2Right(*iBasis.doflocs[:, dofs2Right]) 
u[dofs3Right] = u3Right(*iBasis.doflocs[:, dofs3Right]) 

I = iBasis.complement_dofs(dofs)
D = iBasis.get_dofs(lambda x: np.logical_or(np.isclose(x[0], 0.*x[0],atol=1e-4), np.isclose(x[0], np.ones(x[0].shape),atol=1e-4))).all()

@LinearForm
def rhs(v, w):
    return - dot(bodyForce, v)

@LinearForm
def rhsSurf(v, w):
    return - dot(surfaceTraction, v) * (restBoundary(w.x))

@BilinearForm
def jac2(u, v, w):
    return numbajac(*(grad(u), grad(v), grad(w['w'])))

@LinearForm
def res2(v, w):
    return numbares(*(grad(v), grad(w["w"])))

for itr in range(100):
    w = iBasis.interpolate(u)
    Fres = asm(rhs, iBasis, w=w) + asm(rhsSurf, fBasis) + asm(res2, iBasis, w=w)
    Kres = asm(jac2, iBasis, w=w)
    delu = solve(*condense(Kres, -Fres, I=I), use_umfpack=True)
    u += delu 
    normu = np.linalg.norm(delu)
    print(f"Newton iter: {itr+1}, norm_du: {normu}")
    if  normu < 1e-8:
        break

print(time() - t1)
disps = meshio.Mesh(
    mesh.p.T, cells={"tetra":mesh.t.T}, point_data={"u":u[iBasis.nodal_dofs].T}
)
meshio.xdmf.write("trialFEniCSCleaned.xdmf", disps)

# Comparison with FEniCS results

solFEniCS = meshio.xdmf.read("uFEniCS.xdmf")
normDiff = np.linalg.norm(disps.point_data["u"] - solFEniCS.point_data["uFE"],axis=1).max()
print(f"Absolute difference between the dofs from FEniCS and skfem: {normDiff}")

# Visualization: using `vedo` 
uf = disps.point_data["u"]
unorm = np.linalg.norm(uf, axis=1)
dispMesh = vMesh([mesh.p.T, mesh.t.T]).warpByVectors(uf).pointColors(unorm, cmap="jet").addScalarBar()
vshow(dispMesh)

@gdmcbain
Copy link
Contributor Author

gdmcbain commented Aug 6, 2020

The JFNK approach (thanks to @gdmcbain ) can be either a separate file or within the same code in a conditional.

i'd say leave it in a separate file for now. There are quite a few things to experiment with in JFNK (e.g. approximating the Jacobian in the preconditioner so it's cheaper to evaluate, not updating the Jacobian at every Newton iteration so ncalls comes down, trying AMG instead of ILU for the preconditioning, and then various parameters for ILU or AMG, ...).

@kinnala
Copy link
Owner

kinnala commented Aug 6, 2020

Just out of curiosity: if the scikit-fem version takes 100s, how long does the same example take in FEniCS (just to get an idea how far we are from C++ performance in this case)?

@bhaveshshrimali
Copy link
Contributor

if the scikit-fem version takes 100s, how long does the same example take in FEniCS (just to get an idea how far we are from C++ performance in this case)?

About ~25s. This is from the second time onwards as it jit compiles everything the first time. And just to have an idea, there the bulk of the time is indeed taken by the solve

         1539062 function calls (1516009 primitive calls) in 32.724 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   24.172   24.172   24.179   24.179 solving.py:243(_solve_varproblem)

@kinnala
Copy link
Owner

kinnala commented Aug 6, 2020

And just to have an idea, there the bulk of the time is indeed taken by the solve

It's not clear to me that this timing properly separates assembly time and linear solve time.

Looking at the source code of solving.py it seems that _solve_varproblem includes both:
https://bitbucket.org/fenics-project/dolfin/src/946dbd3e268dc20c64778eb5b734941ca5c343e5/python/dolfin/fem/solving.py#lines-276

I assume linear solve time is more or less equal in both cases because no magical differences there. Your previous profiling indicates that in scikit-fem linear solve takes a total of 15 seconds. Assuming that linear solve time is more or less equal, the assembly of the Jacobian takes approximately a total of 10 seconds in FEniCS and 70 seconds in scikit-fem.

It's actually surprisingly good, I would have expected a larger difference given the amount of optimizations done automatically by FEniCS.

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Aug 6, 2020

Looking at the source code of solving.py it seems that _solve_varproblem includes both:

Ah, right my bad. Indeed it does both. I too think the time to solve should be the same (at least for this case) although I am using the generic solve() syntactic sugar in FEniCS instead of fine tuning the solver (but even doing that wouldn't make much of a difference I think)**

**Edit: for this case

@bhaveshshrimali
Copy link
Contributor

bhaveshshrimali commented Aug 6, 2020

In yet another attempt to speed up the calculation of jacobian/residual, I tried experimenting with numba's prange at appropriate places (and to my surprise the results are significantly encouraging)

         849799 function calls (831449 primitive calls) in 54.190 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      864   28.022    0.032   28.022    0.032 hyperElasticityCleaned.py:90(numbajac)
        6   14.294    2.382   14.294    2.382 {built-in method scikits.umfpack.__umfpack.umfpack_dl_numeric}
        2    2.073    1.036    2.953    1.477 mesh3d.py:33(boundary_edges)
     2159    1.966    0.001    1.966    0.001 {method 'reduce' of 'numpy.ufunc' objects}
       72    0.969    0.013    0.969    0.013 hyperElasticityCleaned.py:68(numbares)
        6    0.770    0.128    0.770    0.128 {built-in method scipy.sparse._sparsetools.coo_tocsr}
       12    0.663    0.055    0.698    0.058 basis.py:301(linear_combination)
        6    0.604    0.101   31.875    5.312 bilinear_form.py:42(assemble)

The total time is now under a minute. I modified the definitions of vdet and vinv to use prange instead of range and also at several other places inside numbajac and numbares. Will experiment further a bit more...

replace range with prange

@jit(nopython=True, cache=True, fastmath=True, parallel=True)
def vdet(F):
    J = np.zeros_like(F[0,0])
    for a in prange(J.shape[0]):
        for b in prange(J.shape[1]):
            J[a,b] += F[0, 0, a, b] * (F[1, 1, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 1, a, b]) -\
                    F[0, 1, a, b] * (F[1, 0, a, b] * F[2, 2, a, b] -
                                        F[1, 2, a, b] * F[2, 0, a, b]) +\
                    F[0, 2, a, b] * (F[1, 0,a ,b] * F[2, 1, a, b] -
                                        F[1, 1, a, b] * F[2, 0, a, b])
    return J

@kinnala
Copy link
Owner

kinnala commented Sep 11, 2020

I'm closing this discussion for now since the hyperelasticity example is merged. Let's create a new issue if something pops up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants