# Exploring the Point Generators

In this notebook we explore the three different point generator classes.

1. The *PointGenerator* class is the base class the other two inherit from. It can be used for Complete Intersection Calabi-Yau (CICY) manifolds with a single hypersurface, e.g. the quintic, bicubic, p1p3, the tetraquadric, or a torus, some K3s ...
2. There is the *CICYPointGenerator* which works for any CICY.
3. There is the *ToricPointGenerator* which can generate points on any CY coming from the Kreuzer-Skarke list. NOTE: as of v0.0.1 there are still some issues with finding the correct integration weights from generalizing a theorem by Shiffman and Zelditch.

The Fermat quintic manifold can be implemented in each of the three PointGenerator classes and will be used as an example.

Recall that the Fermat quintic is given by the following polynomial:

$$
Q = \sum_{i=1}^5 z_i^5
$$

in a $\mathbb{P}^4$ ambient space.

In [1]:
import numpy as np
import os
import pickle

## PointGenerator

First we load the PointGenerator from the cymetric package.

In [2]:
from cymetric.pointgen.pointgen import PointGenerator

In general all routines and classes of the cymetric package will have a brief description of their functionality and their arguments. You can access it with the help function.

In [3]:
help(PointGenerator)

Help on class PointGenerator in module cymetric.pointgen.pointgen:

class PointGenerator(builtins.object)
 |  PointGenerator(monomials, coefficients, kmoduli, ambient, vol_j_norm=1, verbose=2, backend='multiprocessing')
 |  
 |  The PointGenerator class.
 |  
 |  The numerics are entirely done in numpy; sympy is used for taking 
 |  (implicit) derivatives.
 |  
 |  Use this one if you want to generate points and data on a CY given by
 |  one hypersurface.
 |  
 |  All other PointGenerators inherit from this class.
 |  
 |  Example:
 |      We consider the Fermat quintic given by
 |  
 |      .. math::
 |  
 |          Q(z) = z_1^5 + z_2^5 + z_3^5 + z_4^5 + z_5^5
 |  
 |      and set it up with:
 |  
 |      >>> import numpy as np
 |      >>> from cymetric.pointgen.pointgen import PointGenerator
 |      >>> monomials = 5*np.eye(5, dtype=np.int)
 |      >>> coefficients = np.ones(5)
 |      >>> kmoduli = np.ones(1)
 |      >>> ambient = np.array([4])
 |      >>> pg = PointGenerator(monom

In the next step we define the five monomials of the defining polynomial, the coefficient in front of each monomial, the single Kähler modulus, and the ambient space projective factor(s):

In [4]:
monomials = 5*np.eye(5, dtype=np.int64)
coefficients = np.ones(5)
kmoduli = np.ones(1)
ambient = np.array([4])
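
To make the encoding concrete: each row of `monomials` holds the exponents of one monomial, and `coefficients` holds the matching prefactors. The following is a minimal sketch of how such a pair can be evaluated as a polynomial. The helper `evaluate_polynomial` is a hypothetical name introduced for illustration and is not part of the cymetric API.

```python
import numpy as np

# Same encoding as in the cell above: one row of exponents per monomial.
monomials = 5 * np.eye(5, dtype=np.int64)
coefficients = np.ones(5)

def evaluate_polynomial(points, monomials, coefficients):
    """Evaluate sum_k c_k * prod_j z_j**m_kj for each row of `points`.

    Hypothetical helper for illustration, not a cymetric function.
    """
    points = np.atleast_2d(points)
    # powers has shape (n_points, n_monomials, n_coords)
    powers = points[:, None, :] ** monomials[None, :, :]
    return np.sum(coefficients * np.prod(powers, axis=-1), axis=-1)

# A generic point in C^5; it does not lie on the quintic.
z = np.array([1.0, 1.0j, -1.0, 0.5, 2.0], dtype=complex)
q_value = evaluate_polynomial(z, monomials, coefficients)[0]
print(q_value)
```

For the Fermat quintic this reduces to the sum of the fifth powers of the coordinates.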

We can now instantiate the PointGenerator

In [5]:
pg = PointGenerator(monomials, coefficients, kmoduli, ambient)

and generate some (100) points:

In [6]:
points = pg.generate_points(100)
points

array([[-0.26571881-9.17297455e-01j, -0.36345359+2.42049070e-01j,
         1.        +0.00000000e+00j, -0.59842131-1.43934436e-01j,
        -0.35605605-6.91997143e-01j],
       [ 1.        -5.55111512e-17j, -0.31804611-5.70832600e-01j,
         0.78152219+6.00384439e-01j,  0.15162194-1.69715014e-02j,
         0.31991803-3.70442042e-01j],
       [ 1.        +2.77555756e-17j, -0.23516908-2.60266458e-01j,
        -0.77837294+1.05176966e-01j,  0.81948808-4.76754593e-01j,
         0.66788379+2.93563540e-01j],
       [ 0.8244501 +5.64605799e-01j, -0.16831985-6.66244130e-01j,
         1.        +0.00000000e+00j,  0.28974831+6.27956749e-01j,
        -0.05495057-4.40566702e-02j],
       [-0.18569486+7.50333720e-01j,  0.24012511-3.46593059e-01j,
         1.        +0.00000000e+00j, -0.31459026+8.85538129e-01j,
        -0.45377699-5.09986505e-02j],
       [-0.62309363+5.81626089e-02j, -0.4679331 +6.79700462e-02j,
        -0.54795213-2.22345813e-02j,  1.        +0.00000000e+00j,
        -0.9677924

We see that the largest coordinate of each point is 1+0.j. The reason for that is two-fold:

1. The numerics are more stable when we work with points whose affine coordinates satisfy $|x_i| \leq 1$.
2. The coordinate with 1+0.j already specifies the ambient space patch we are working in.
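
The rescaling described in these two points can be sketched as follows. This is an illustrative snippet under the assumption that each point is divided by its coordinate of largest modulus; `normalise_to_patch` is a hypothetical helper, and the package may organise this step differently.

```python
import numpy as np

def normalise_to_patch(points):
    """Rescale each homogeneous point so its largest-modulus coordinate is 1.

    Returns the rescaled points and the index of the patch coordinate.
    Hypothetical helper for illustration.
    """
    points = np.asarray(points, dtype=complex)
    idx = np.argmax(np.abs(points), axis=-1)
    largest = np.take_along_axis(points, idx[:, None], axis=-1)
    return points / largest, idx

raw = np.array([[2.0 + 0.0j, 1.0j, -0.5, 0.25, 1.0],
                [0.1, 0.2j, 0.3, -4.0j, 0.5]])
scaled, patch = normalise_to_patch(raw)
print(patch)
print(np.abs(scaled).max(axis=1))
```

Because projective coordinates are only defined up to an overall complex rescaling, the rescaled rows describe the same points in $\mathbb{P}^4$.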

We can also check that these points really satisfy the hypersurface equation of the Calabi-Yau.

In [7]:
pg.cy_condition(points)

array([ 1.72084569e-15+1.11022302e-15j, -1.96370697e-15+7.50094431e-15j,
       -1.92901251e-15-6.13398221e-15j, -4.45792318e-15-5.78768899e-15j,
       -3.04270498e-15-2.25514052e-16j,  8.88178420e-16+3.74006381e-15j,
       -1.55431223e-15-4.71844785e-16j, -9.99200722e-16+5.55111512e-17j,
        2.16493490e-15+1.94289029e-15j, -7.38124839e-16+2.63677968e-16j,
        3.02904403e-15-1.26027661e-15j, -2.03682286e-15-3.62199420e-15j,
        1.15879528e-15-3.94649591e-16j,  4.14016152e-16-9.76097051e-16j,
       -2.56088553e-16-3.58220398e-16j, -5.55111512e-17-2.22044605e-16j,
       -2.88657986e-15+4.44089210e-16j,  9.99200722e-16+1.31838984e-15j,
       -6.55031585e-15+2.77555756e-15j,  1.11022302e-15+2.22044605e-16j,
        2.77555756e-15-3.19189120e-16j, -3.77475828e-15-2.27595720e-15j,
       -3.31408078e-15+2.33320308e-15j,  1.52655666e-15+1.55431223e-15j,
       -4.30211422e-16-4.85722573e-16j, -2.30284541e-16-1.68268177e-16j,
        2.67841305e-15-6.93889390e-15j,  2.77555756

What we are really interested in from the *PointGenerator* is a training set for our neural networks. Such a training set can be generated as follows:

In [8]:
help(pg.prepare_dataset)

Help on method prepare_dataset in module cymetric.pointgen.pointgen:

prepare_dataset(n_p, dirname, val_split=0.1, ltails=0, rtails=0) method of cymetric.pointgen.pointgen.PointGenerator instance
    Prepares training and validation data.
    
    Args:
        n_p (int): Number of points to generate.
        dirname (str): Directory name to save dataset in.
        val_split (float, optional): train-val split. Defaults to 0.1.
        ltails (float, optional): Percentage discarded on the left tail
            of weight distribution. Defaults to 0.
        rtails (float, optional): Percentage discarded on the right tail
            of weight distribution. Defaults to 0.
    
    Returns:
        int: 0



We specify the number of points and the directory name to save the file in. Note the file will always have the name *dataset.npz*.

In [9]:
dirname = 'fermat_pg'
n_p = 100000

and generate the dataset. This will also compute and return $\kappa=\text{vol}_\text{K}/\text{vol}_\text{CY}$

In [11]:
kappa = pg.prepare_dataset(n_p, dirname)
kappa

pointgen:INFO:Vol_k: 0.16666666666666663, Vol_cy: 9.450470169034233.


0.017635806863109615
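
As a quick sanity check, the returned value can be compared with the ratio of the two volumes printed in the log line. The numbers below are copied from the output above; the variable names are only illustrative.

```python
import math

vol_k = 0.16666666666666663    # Vol_k from the log line above
vol_cy = 9.450470169034233     # Vol_cy from the log line above
reported = 0.017635806863109615  # value returned by prepare_dataset above

kappa_check = vol_k / vol_cy
print(kappa_check, math.isclose(kappa_check, reported, rel_tol=1e-6))
```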

We load the dataset with

In [12]:
data = np.load(os.path.join(dirname, 'dataset.npz'))

and study its content

In [13]:
for key in data:
    print(key, type(data[key]))

X_train <class 'numpy.ndarray'>
y_train <class 'numpy.ndarray'>
X_val <class 'numpy.ndarray'>
y_val <class 'numpy.ndarray'>
val_pullbacks <class 'numpy.ndarray'>


It contains training and validation data and the validation pullbacks. You might ask what is stored in the *y* arrays for training and validation, given that we don't know the exact Ricci-flat metric. The *y_train* and *y_val* arrays contain the integration weight and $\Omega \wedge \bar\Omega$ for each point. In principle, they can hold any pointwise information needed during the training process.

In [14]:
weights = data['y_val'][:, 0]
omega = data['y_val'][:, 1]

We can also compute these values directly with the *PointGenerator*. Note that the points in *X_train* and *X_val* are stored as real floats, because our neural network works with real values. We can recover the complex points as follows:

In [15]:
points = data['X_val'][:, 0:pg.ncoords] + 1.j*data['X_val'][:, pg.ncoords:]
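
The layout implied by this slicing is that the first `ncoords` columns hold the real parts and the remaining columns hold the imaginary parts. A small round-trip sketch on random data, with hypothetical helper names, looks like this:

```python
import numpy as np

def complex_to_real(points):
    """Stack real parts and imaginary parts side by side (illustrative helper)."""
    return np.concatenate([points.real, points.imag], axis=-1)

def real_to_complex(x, ncoords):
    """Invert complex_to_real, mirroring the slicing used in the cell above."""
    return x[:, :ncoords] + 1.j * x[:, ncoords:]

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 5)) + 1.j * rng.normal(size=(4, 5))

x = complex_to_real(z)            # shape (4, 10), real-valued
z_back = real_to_complex(x, ncoords=5)
print(x.shape, np.allclose(z, z_back))
```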

and then compute the weights

In [16]:
weights2 = pg.point_weight(points)
np.allclose(weights, weights2)

True

and the holomorphic volume form

In [17]:
omega2 = pg.holomorphic_volume_form(points)
omega2 = omega2 * np.conj(omega2)
np.allclose(omega, omega2)

True

We will have to pass information about the monomials and their derivatives to the TensorFlow model. For this purpose we create a pickled dictionary denoted by *BASIS*.

In [18]:
pg.prepare_basis(dirname, kappa=kappa)

0

Let's have a look at the information stored in *basis.pickle*

In [19]:
with open(os.path.join(dirname, 'basis.pickle'), 'rb') as f:
    basis = pickle.load(f)
for key in basis:
    print(key)

DQDZB0
DQDZF0
QB0
QF0
NFOLD
AMBIENT
KMODULI
NHYPER
KAPPA


If you want to use your own point generator with our TensorFlow models, you will have to create a similar basis dictionary. Here we briefly describe what each of these keys stands for. In general, *Q* denotes the defining hypersurface(s), with the final integer digit giving the hypersurface index. *D* refers to derivatives, *Z* to the ambient space coordinates, *B* to a monomial basis and *F* to the factors/coefficients of each monomial.

1. "DQDZB0": monomial basis of the derivatives $\frac{\partial Q_0}{\partial z_i}$ of the first (and, for the quintic, only) hypersurface w.r.t. the ambient coordinates.
2. "DQDZF0": coefficients of the monomial basis of $\frac{\partial Q_0}{\partial z_i}$.
3. "QB0": monomial basis of the first hypersurface.
4. "QF0": coefficients of that monomial basis.
5. "NFOLD": CY dimension.
6. "AMBIENT": dimensions of the projective spaces making up the ambient space.
7. "KMODULI": Kähler moduli corresponding to each projective factor. Note the CY needs to be favourable, otherwise you will have some superposition.
8. "NHYPER": number of hypersurfaces.
9. "KAPPA": the ratio between the volume measures (Kähler volume over holomorphic volume).
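
To illustrate what a derivative basis means, the sketch below builds exponents and coefficients of $\frac{\partial Q_0}{\partial z_i}$ from the monomials and coefficients of $Q_0$ with plain NumPy. The function names and the array layout are assumptions made for this example; the arrays actually stored in *basis.pickle* may be shaped or ordered differently.

```python
import numpy as np

monomials = 5 * np.eye(5, dtype=np.int64)   # exponents of Q, as in the notebook
coefficients = np.ones(5)

def derivative_basis(monomials, coefficients):
    """Return exponents and coefficients of dQ/dz_i for every coordinate i.

    Hypothetical layout: d_exps[i] and d_coefs[i] describe dQ/dz_i.
    """
    n_coords = monomials.shape[1]
    # One copy of the exponent matrix per coordinate:
    # shape (n_coords, n_monomials, n_coords)
    d_exps = np.repeat(monomials[None, :, :], n_coords, axis=0)
    # d/dz_i of c * z**m pulls down the exponent m_i as a factor ...
    d_coefs = coefficients[None, :] * monomials.T
    # ... and lowers that exponent by one (clipped at zero, where the factor is 0).
    for i in range(n_coords):
        d_exps[i, :, i] = np.maximum(d_exps[i, :, i] - 1, 0)
    return d_exps, d_coefs

def evaluate(z, exps, coefs):
    """Evaluate sum_k coefs[k] * prod_j z_j**exps[k, j] at a single point z."""
    return np.sum(coefs * np.prod(z ** exps, axis=-1), axis=-1)

d_exps, d_coefs = derivative_basis(monomials, coefficients)
z = np.array([1.0, 0.5j, -0.3 + 0.2j, 0.7, -1.0j])
dq_dz = np.array([evaluate(z, d_exps[i], d_coefs[i]) for i in range(5)])
print(np.allclose(dq_dz, 5 * z ** 4))
```

For the Fermat quintic the result agrees with the closed form $\partial Q / \partial z_i = 5 z_i^4$.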

That pretty much sums up our introduction to the *PointGenerator* class. Next we will implement the Fermat quintic in the *CICYPointGenerator*.

## CICYPointGenerator

The *CICYPointGenerator* and *ToricPointGenerator* come with the same functionality and routines as the *PointGenerator*. 

First we load it from the cymetric package.

In [20]:
from cymetric.pointgen.pointgen_cicy import CICYPointGenerator

In contrast to the *PointGenerator* the *CICYPointGenerator* expects a list of monomials and coefficients. We reuse our previous monomials (and coefficients) with

In [21]:
pgcicy = CICYPointGenerator([monomials], [coefficients], kmoduli, ambient)

We again create a dataset

In [22]:
dirname = 'fermat_pgcicy'

In [23]:
kappa = pgcicy.prepare_dataset(n_p, dirname)

  improvement from the last ten iterations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  r = _umath_linalg.det(a, signature=signature)
  r = _umath_linalg.det(a, signature=signature)
pointgen:INFO:Vol_k: 0.16666666666666663, Vol_cy: 9.890030356317446.


As you might have noticed, this took significantly longer than before, which is also related to all the warnings. The *CICYPointGenerator* uses *scipy.optimize.fsolve* to find solutions on CICYs. *fsolve* only provides a single root of the defining hypersurfaces and requires more involved numerics, leading to worse performance and also lower accuracy.
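
For contrast, on a single hypersurface one can restrict the defining polynomial to a line $p + t\,q$ in the ambient space and obtain all five intersection points at once from a univariate root finder. The sketch below does this for the Fermat quintic with `np.roots`. It illustrates the idea only and is not a copy of the package internals; the points `p`, `q` and the helper names are arbitrary choices for this example.

```python
import numpy as np
from math import comb

def fermat_quintic(z):
    """Q(z) = sum_i z_i**5, evaluated along the last axis."""
    return np.sum(z ** 5, axis=-1)

def intersect_line(p, q):
    """Return all t with Q(p + t*q) = 0 for the Fermat quintic.

    The coefficient of t**k in Q(p + t*q) is comb(5, k) * sum_i p_i**(5-k) * q_i**k.
    np.roots expects the highest power first.
    """
    coeffs = [comb(5, k) * np.sum(p ** (5 - k) * q ** k) for k in range(5, -1, -1)]
    return np.roots(coeffs)

# Two arbitrary points spanning a line in C^5 (illustrative values).
p = np.array([1.0, 1.0j, 0.5, -0.3 + 0.2j, 0.7], dtype=complex)
q = np.array([0.2, -1.0, 1.0j, 0.4, 1.0 + 1.0j], dtype=complex)

t_roots = intersect_line(p, q)
points_on_cy = p[None, :] + t_roots[:, None] * q[None, :]
residuals = np.abs(fermat_quintic(points_on_cy))
print(len(t_roots), residuals.max())
```

A single line thus yields five points on the quintic, whereas a generic nonlinear solver started from one initial guess returns at most one solution.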

We again create a basis for the TensorFlow models:

In [24]:
pgcicy.prepare_basis(dirname, kappa)
with open(os.path.join(dirname, 'basis.pickle'), 'rb') as f:
    basis = pickle.load(f)
for key in basis:
    print(key)

DQDZB0
DQDZF0
QB0
QF0
NFOLD
AMBIENT
KMODULI
NHYPER
KAPPA


and see that the keys are identical to before.

## ToricPointGenerator

The *ToricPointGenerator* is somewhat special and requires additional input data generated from [SageMath](https://www.sagemath.org/). This *toric_data* can be straightforwardly generated in any Sage kernel that has access to the *cymetric* package. In practice only a single module is needed, which can be found [here](../cymetric/sage/sagelib.py).

The next cell won't work in your regular Python notebook because it requires some Sage routines for toric geometry. See [here](https://doc.sagemath.org/html/en/reference/schemes/sage/schemes/toric/variety.html) for more information about toric varieties and their implementation in Sage and [here](https://doc.sagemath.org/html/en/reference/discrete_geometry/sage/geometry/triangulation/point_configuration.html) for information about triangulations of point configurations.

We begin by setting up the quintic vertices, which define the fan of the toric ambient variety. After initialising said variety we load the *prepare_toric_cy_data()* routine and generate the necessary data for the *ToricPointGenerator* and the toric TensorFlow models.

In [24]:
#from cymetric.sage.sagelib import prepare_toric_cy_data
#import os as os
## Quintic vertices
#vertices = [
#    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, -1, -1, -1]
#]
#origin = [0 for _ in range(len(vertices[0]))]
#polytope = LatticePolytope(vertices)
#pConfig = PointConfiguration(polytope.points(), star=origin)
##set to topcom for more efficient triangulations
#pConfig.set_engine("TOPCOM")
## restrict to fine star regular.
#triangulations = pConfig.restrict_to_connected_triangulations()
#triangulations = triangulations.restrict_to_fine_triangulations()
#triangulations = triangulations.restrict_to_regular_triangulations()
#triangulations = triangulations.restrict_to_star_triangulations(origin)
##take first triangulation; build fan and TV
#triang = triangulations.triangulate()
#tv_fan = triang.fan()
#TV = ToricVariety(tv_fan)
#fname = os.path.join('fermat_pgtoric', 'toric_data.pickle')
#toric_data = prepare_toric_cy_data(TV, fname)

We are now in a position to go back to our regular python kernel or simply continue in Sage and load the *ToricPointGenerator*.

In [25]:
from cymetric.pointgen.pointgen_toric import ToricPointGenerator

Let's have a look at the information that has been written to *toric_data.pickle*.

In [26]:
dirname = 'fermat_pgtoric'
with open(os.path.join(dirname, 'toric_data.pickle'), 'rb') as f:
    toric_data = pickle.load(f)
for key in toric_data:
    print(key)
    print(toric_data[key])

dim_cy
3
vol_j_norm
5
coeff_aK
[(-0.1778961409489022-0.5388180239251266j), (0.8725496761145602-0.8920333636493865j), (-0.48191605742215693+0.6201213934126107j), (0.5051228687518922-0.3204616545019887j), (0.56043103821785-1.192604974618264j), (-0.6015747268105381-0.8929074734470247j), (1.0383518290867688+0.14412256466431925j), (0.3344885807041861+0.5064208681479809j), (1.30687759730605-1.4671020942548247j), (-0.7136849837439282+0.5216671541668138j), (0.733289798648282+0.8972693239731884j), (0.19342371875207343-0.4635239547631873j), (0.5052538131603456+0.3218897639883497j), (0.22650850630548+1.0682608912557352j), (1.3149222620353853-1.182838109398989j), (-0.2673464700585983+0.41601918559396117j), (-0.09129920673625604+2.3899675596611747j), (0.29348571406378704-0.44308746338305843j), (0.17687640551071815+0.8838703990298199j), (0.07625067819206656-0.060475984223955254j), (-1.0145564096851238+1.491610240145762j), (-0.5672606416845261+0.3606565801768684j), (-0.0861987923653643+2.128908293169

We have the following keys:

1. "dim_cy": the dimension of the Calabi-Yau.
2. "vol_j_norm": information for the normalisation of the weights.
3. "coeff_aK": generic complex coefficients in front of each of the defining hypersurface monomials, which you get from the Batyrev construction. Note: these are complex valued by default, as they represent a (redundant) description of the complex structure moduli.
4. "exp_aK": the monomial basis for the defining equation.
5. "exps_sections": a monomial basis for the sections of the Kähler cone generators. This one is needed to generate the integration weights and a Kähler metric in the same Kähler class as our Ricci-flat metric.
6. "patch_masks": (boolean) coordinate masks for all the patches in the TV.
7. "glsm_charges": the GLSM charges of the TV.
8. "triple": the triple intersection numbers of the TV.
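
As a small illustration of what GLSM charges encode: they are the linear relations among the rays of the fan. For $\mathbb{P}^4$, with the quintic vertices used in the Sage cell above, the single charge vector is $(1, 1, 1, 1, 1)$. The check below uses only NumPy; the exact shape and format in which *toric_data* stores the charges is not shown here and may differ.

```python
import numpy as np

# Quintic vertices, copied from the Sage cell above.
vertices = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, -1, -1, -1]
])
# Charge vector of P^4: every homogeneous coordinate has charge 1.
charges = np.array([1, 1, 1, 1, 1])

# The charges must annihilate the vertex matrix: sum_a Q_a * v_a = 0.
relation = charges @ vertices
print(relation)
```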

Having loaded the toric_data we can initialise the ToricPointGenerator

In [27]:
pgtoric_gen = ToricPointGenerator(toric_data, kmoduli)

We now have a generic quintic with coefficients in front of all 126 monomials.

What if we only want the Fermat quintic? We can override the information in "coeff_aK" and "exp_aK" with e.g.

In [28]:
toric_data["coeff_aK"] = coefficients
toric_data["exp_aK"] = monomials

and

In [29]:
pgtoric = ToricPointGenerator(toric_data, kmoduli)

We continue just as before with creating a dataset

In [30]:
kappa = pgtoric.prepare_dataset(n_p, dirname)

  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvement from the last five Jacobian evaluations.
  improvement from the last ten iterations.
  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
  improvem

pointgen:INFO:Vol_k: 0.1666666666666666, Vol_cy: 9.875469464470568.


We note that the data generation again took quite some time, even though there is just a single hypersurface. The reason is that we also optimise w.r.t. the sections coming from the Kähler cone generators and thus effectively have more than a single hypersurface to consider.

We create the *BASIS* containing all information for the TensorFlow models:

In [31]:
pgtoric.prepare_basis(dirname, kappa)
with open(os.path.join(dirname, 'basis.pickle'), 'rb') as f:
    basis = pickle.load(f)
for key in basis:
    print(key)

DQDZB0
DQDZF0
QB0
QF0
NFOLD
AMBIENT
KMODULI
NHYPER
KAPPA


In the next notebook we will load the *basis.pickle* and *toric_data.pickle* for the TensorFlow model. The information in *basis.pickle* will override what is written in *toric_data.pickle*, so that the models know we want to work with the Fermat quintic and not some generic quintic.

We can also check that the points all lie on the Fermat quintic:

In [32]:
data = np.load(os.path.join(dirname, 'dataset.npz'))
points = data['X_val'][:, 0:pg.ncoords] + 1.j*data['X_val'][:, pg.ncoords:]

using the regular *PointGenerator*

In [33]:
np.sum(np.isclose(np.abs(pg.cy_condition(points)), 0))/len(points)

1.0

the toric

In [34]:
np.sum(np.isclose(np.abs(pgtoric.cy_condition(points)), 0))/len(points)

1.0

and the cicy

In [35]:
np.sum(np.isclose(np.abs(pgcicy.cy_condition(points)), 0))/len(points)

1.0