In [None]:
# Copyright notice
__author__ = "Matteo Lulli"
__copyright__ = "Copyright (c) 2020-2021 Matteo Lulli (lullimat/idea.deploy), matteo.lulli@gmail.com"
__credits__ = ["Matteo Lulli"]
__license__ = """                                                                                                                                        
Permission is hereby granted, free of charge, to any person obtaining a copy                                                                             
of this software and associated documentation files (the "Software"), to deal                                                                            
in the Software without restriction, including without limitation the rights                                                                             
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell                                                                                
copies of the Software, and to permit persons to whom the Software is                                                                                    
furnished to do so, subject to the following conditions:                                                                                                 
                                                                                                                                                         
The above copyright notice and this permission notice shall be included in all                                                                           
copies or substantial portions of the Software.                                                                                                          
                                                                                                                                                         
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR                                                                               
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,                                                                                 
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE                                                                              
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER                                                                                   
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,                                                                            
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE                                                                            
SOFTWARE.                                                                                                                                                
"""
__version__ = "0.1"
__maintainer__ = "Matteo Lulli"
__email__ = "matteo.lulli@gmail.com"
__status__ = "Development"

# Tutorial: Two-dimensional Ising Model

Welcome to this tutorial on how to build a module for the two-dimensional Ising model using the *root* classes of the **idea.deploy** framework.

This tutorial guides you step by step through the development of a Metropolis Monte Carlo simulation for the two-dimensional Ising model. We will proceed through implementations of increasing complexity:

- starting from a simple implementation where spin variables are represented by a signed integer (```int```), 
- then shifting to unsigned chars (```unsigned char```), and 
- finally landing on an asynchronous-multi-spin-coded (aMSC) implementation where the spin variables are stored in the bits of unsigned integers (```unsigned int``` or ```unsigned``` according to the architecture) and are updated in parallel. 

The last step allows for roughly a 32-fold gain in performance with respect to the simplest signed integer representation, since we will be simulating 32 systems in parallel. This is done at the cost of introducing a small correlation between different realizations of the system. At the same time, it allows the statistical analysis to converge faster.
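
To make the idea of storing spins in the bits of an unsigned integer concrete, here is a minimal NumPy sketch. It is not the aMSC implementation developed later in the tutorial: the lattice size, the variable names, and the use of NumPy on the host are all assumptions made for illustration. It packs the spins of 32 independent replicas into one 32-bit word per site and shows that a single XOR compares each site with its neighbor in all replicas at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_replicas = 32   # one replica per bit of a 32-bit word
L = 4             # hypothetical small linear size, for illustration only

# bits[r, y, x] is 0 or 1: the occupation-number form of the -1/+1 spins
bits = rng.integers(0, 2, size=(n_replicas, L, L), dtype=np.uint32)

# Pack: bit r of packed[y, x] holds the spin of replica r at site (x, y)
shifts = np.arange(n_replicas, dtype=np.uint32).reshape(-1, 1, 1)
packed = np.bitwise_or.reduce(bits << shifts, axis=0)

# One XOR per site tells, for all 32 replicas at once, whether the spin
# differs from its right neighbor (periodic boundary conditions)
anti_aligned = packed ^ np.roll(packed, -1, axis=1)

# Unpack a single replica and compare with the same operation done on it alone
r = 5
unpacked = (anti_aligned >> np.uint32(r)) & np.uint32(1)
expected = bits[r] ^ np.roll(bits[r], -1, axis=1)
assert np.array_equal(unpacked, expected)
```

The same word-level operation serves all replicas, which is where the roughly 32-fold gain comes from.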

During this process we will also change the memory-addressing strategy: we start from a straightforward geometrical implementation, then move to a checkerboard separation, and finally arrive at a **sliced** scheme, which allows for a better alignment of the memory reads while halving the number of MPI memory transfers in the multi-process implementation.
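
The checkerboard separation can be illustrated with a short NumPy sketch, written under the assumption of a square lattice with even linear size, periodic boundaries and plain linear indexing (the names below are illustrative and not part of the framework). It checks that every nearest neighbor of a site has the opposite parity, which is what makes it safe to update all sites of one parity at the same time.

```python
import numpy as np

L = 6  # hypothetical even linear size
idx = np.arange(L * L)
x, y = idx % L, idx // L

# Checkerboard parity: 0 on "white" sites, 1 on "black" sites
parity = (x + y) % 2

# Nearest neighbors with periodic boundary conditions, as linear indices
right = (x + 1) % L + y * L
up = x + ((y + 1) % L) * L
left = (x - 1 + L) % L + y * L
down = x + ((y - 1 + L) % L) * L

# Every neighbor of a site sits on the other sub-lattice
for nbr in (right, up, left, down):
    assert np.all(parity[nbr] != parity)

# Each sub-lattice holds exactly half of the sites
assert np.count_nonzero(parity == 0) == L * L // 2
```

With an odd ```L``` the first assertion fails at the periodic boundary, since the wrap-around connects two sites of the same parity.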

We will be measuring the performance differences with the built-in tools of the **idea.deploy** framework and also develop some standard statistical analysis tools along the way.

This tutorial is clearly *physics-oriented* and serves the double purpose of illustrating the functionalities of the framework while being *educational* as far as the two-dimensional Ising model and simple statistical analysis are concerned.

I will be constantly pushing updates to the *develop* branch of this project, so that the latest version can simply be retrieved by pulling the changes from the repository.

I will also keep the tutorial finely grained, so that each subsection is self-contained: each new class is a copy of the previous one and is simply distinguished by a ```V#``` suffix.

## Bare-code implementation

## ```IdpySims``` Class implementation: V0

In [None]:
# Development cell
%load_ext autoreload
%autoreload 2

In [None]:
# Import Statements

'''
Appending the root directory of the idea.deploy project.
This is done explicitly here. 
In the modules this is done in the local __init__.py file
'''
import sys
sys.path.append("../")

'''
Importing numpy
'''
import numpy as np
'''
Import Code from IPython.display for pretty printing
'''
from IPython.display import Code

'''
Importing reduce functional tool
'''
from functools import reduce

'''
Importing languages types
'''
from idpy.IdpyCode import CUDA_T, OCL_T, IDPY_T, GetTenet, GetParamsClean

'''
Importing IdpySims class
'''
from idpy.IdpyCode.IdpySims import IdpySims

'''
Importing IdpyKernel, IdpyFunction, IdpyLoop
'''
from idpy.IdpyCode.IdpyCode import IdpyKernel, IdpyFunction, IdpyLoop

'''
Importing the congruential pseudo-random number generators
'''
from idpy.PRNGS.CRNGS import CRNGS
from idpy.PRNGS.CRNGS import F_Norm as F_Norm_CRNGS

'''
Importing class for custom types
'''
from idpy.Utils.CustomTypes import CustomTypes

'''
Importing dictionary for translating C/C++ types to numpy types
'''
from idpy.Utils.NpTypes import NpTypes

As a first step we need to define the custom types for the simulations, which in this case will simply be set to ```int```, allowing the spins to take one of the two values ```+1``` or ```-1```. As we shall see later in this tutorial, this choice amounts to a pretty large waste of resources, but at the same time it provides a very clear implementation that can be used to perform simple double checks of more advanced algorithms.

Finally, choosing to parametrize the relevant types for the simulation makes it easy to change a type later, without any need to heavily re-edit the whole code. So, even if at this moment it might look unnecessary, it helps us build a solid and useful practice.

We also define an ```NpTypes``` object which allows us to *translate* the typical ```C/C++``` types into ```numpy``` types, which can be specified when declaring ```numpy``` and ```IdpyMemory``` arrays.
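
The benefit of parametrizing the types can be sketched with plain Python objects. The dictionary ```c_to_numpy``` and the helper ```allocate``` below are hypothetical stand-ins written for this illustration; they are not the actual ```NpTypes``` or ```CustomTypes``` implementations, and the mapping of ```int``` to a 32-bit integer is an assumption that holds on most current platforms.

```python
import numpy as np

# Hypothetical translation table from C type names to numpy dtypes
# (assumes a 32-bit C int, which is the common case)
c_to_numpy = {'int': np.int32,
              'unsigned int': np.uint32,
              'unsigned char': np.uint8,
              'float': np.float32,
              'double': np.float64}

# Type aliases used throughout the simulation code
sim_types = {'SpinType': 'int', 'WeightType': 'float'}

def allocate(alias, size, types=sim_types):
    """Allocate a zeroed array whose dtype follows the alias definition."""
    return np.zeros(size, dtype=c_to_numpy[types[alias]])

# Code written against the alias does not change when the alias is redefined
spins = allocate('SpinType', 16)
compact_types = {**sim_types, 'SpinType': 'unsigned char'}
compact_spins = allocate('SpinType', 16, compact_types)

assert spins.dtype == np.int32 and spins.nbytes == 64
assert compact_spins.dtype == np.uint8 and compact_spins.nbytes == 16
```

Only the alias definition changes between the two allocations, while the calling code stays the same.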

In [None]:
Ising2DTypes_V0 = CustomTypes({'SpinType': 'int', 
                               'LatticeType': 'int',
                               'WeightType': 'float', 
                               'BetaType': 'float', 
                               'EnergyType': 'int'})
NPT_V0 = NpTypes()

The step was pretty straightforward, since for the moment we will only be managing two underlying types, ```int``` and ```float```, in the kernel code.

Let us now define the simulation class

In [None]:
class Ising2D_V0(IdpySims):
    '''
    class Ising2D_V0
    We declare the __init__ function (constructor) using already
    *args and **kwargs because this benefits the customizability
    of the class by allowing to pass an arbitrary number of parameters
    that can be easily managed
    '''
    def __init__(self, *args, **kwargs):
        '''
        params_dict is a common name across the project to indicate
        the dictionary containing the parameters for the class
        '''
        if not hasattr(self, 'params_dict'):
            self.params_dict = {}
        '''
        GetParamsClean: filters the list of needed_params out of kwargs and
        puts it in self.params_dict, what is left is put in _swap_kwargs
        '''
        _swap_kwargs = GetParamsClean(kwargs, [self.params_dict], 
                                      needed_params = ['L', 'beta', 'seed',
                                                       'lang', 'device', 
                                                       'cl_kind', 'custom_types'])        
        '''
        Initializing the parent class
        '''
        IdpySims.__init__(self, *args, **_swap_kwargs)
        '''
        Getting custom types
        '''
        if 'custom_types' in self.params_dict:
            self.custom_types = self.params_dict['custom_types']
        else:
            self.custom_types = Ising2DTypes_V0        
        '''
        We use the (inherited) dictionary sims_vars to store handy quantities
        We need to check for the parity of the system size because we would only
        like to use even linear-size systems in order to update in parallel
        half of the spins at each time step.
        '''
        _L, self.sims_vars['L'] = self.params_dict['L'], self.params_dict['L']
        if _L % 2:
            raise Exception("The lattice linear size L must be even!")
        
        self.sims_vars['dim_sizes'] = \
            np.array([_L, _L], dtype = NPT_V0.C[self.custom_types['LatticeType']])
        '''
        Computing the 'volume' in a dimension-independent way
        '''
        self.sims_vars['V'] = reduce(lambda x, y: x * y, self.sims_vars['dim_sizes'])
        
        self.sims_vars['beta'] = self.params_dict['beta']
        '''
        If the 'seed' for the random number generator is not passed set it by default to 1
        '''
        if 'seed' not in self.params_dict:
            self.sims_vars['seed'] = 1
        else:
            self.sims_vars['seed'] = self.params_dict['seed']
        '''
        Here we need to enforce that the lang variable is chosen
        '''
        if 'lang' not in self.params_dict:
            raise Exception("Need to specify parameter 'lang' either CUDA_T or OCL_T")        
                    
        '''
        Getting the tenet for the simulation
        '''
        self.tenet = GetTenet(self.params_dict)
        
        '''
        Setting up congruential random number generator
        The module will try to initialize by default
        We only allocate half of the volume because the algorithm is going to
        update half of the system in parallel
        '''
        print("Trying to use 'MINSTD' congruential pseudo-random number generator")
        self.crng = CRNGS(tenet = self.tenet, lang = self.params_dict['lang'],
                          n_prngs = self.sims_vars['V'] // 2, 
                          seed = self.sims_vars['seed'])
        '''
        We now setup the variables necessary for the launch of the device kernels
        namely number of threads and number of blocks
        '''

    def MainLoop(self):
        pass
        
    '''
    The function End frees up the memory allocated on the device
    '''
    def End(self):
        self.tenet.End()
        
'''
Let us now rename the IdpyFunction that will be used to obtain normalized
random numbers. We do this to show how to avoid name clashes among different modules.
During the import we import the class as F_Norm_CRNGS.
Now we will create a dummy class providing a better name
'''
class F_NormRand(F_Norm_CRNGS):
    def __init__(self, *args, **kwargs):
        F_Norm_CRNGS.__init__(self, *args, **kwargs)
        
'''
Let us move now to the definition of the kernel function
'''
        
class K_UpdateSpins_V0(IdpyKernel):
    '''
    class K_UpdateSpins_V0: child class of IdpyKernel
    The declaration of the constructor is somehow 'kindly forced' 
    to contain all the optional arguments, mainly for memorizing reasons, 
    or to easily retrieve the definition from previously written code.
    custom_types: is passed by using CustomTypes.Push()
    constants: is a dictionary of all the constant/macros definition
    f_classes: list of IdpyFunction classes used as device function
    optimizer_flag: boolean, toggles compiler optimizer'''
    def __init__(self, 
                 custom_types = {}, constants = {}, f_classes = [], 
                 optimizer_flag = None):
        '''
        Calling right away the constructor of the parent class to setup all
        inherited variables
        '''
        IdpyKernel.__init__(self, 
                            custom_types, constants = constants, f_classes = f_classes, 
                            optimizer_flag = optimizer_flag)
        '''
        We set the 'g_tid' flag that automatically gives access to the `global thread id`
        i.e. the sequential (lexicographic) thread index that the IdpyKernel parent class
        automatically instantiates according to CUDA or OpenCL computing models.
        When writing the code we can use the (unsigned int) variable g_tid
        to access the thread id and in turn use it to access the desired memory location.'''
        self.SetCodeFlags('g_tid')
        '''
        We can use the class attribute 'params' as part of the function/method
        declaration in C. It simply consists of a dictionary where we list the 
        arguments to be passed to the kernel in C style as the keys of the arguments
        whose descriptors, which are CUDA/OpenCL dependent, are passed as a list for the
        key value. The written implementation below is certainly clearer.
        
        All we need to pass to the kernel are the spins and the inverse temperature beta 
        for this simple implementation. We also pass the parity of the sites to execute
        the update on so that we can update the whole system with only one function. 
        Finally, we need the seeds for the random number generator.'''
        self.params = {'SpinType * spins': ['global', 'restrict'], 
                       'BetaType beta': ['const'], 
                       'LatticeType parity': ['const'], 
                       'CRNGType * seeds': ['global', 'restrict']}
        '''
        Finally we write the kernel itself which only consists in a string.
        An implicit option we pass to the kernel lies in which entry of the
        dictionary `kernels` we choose to define. Indeed it is possible to define
        language specific kernels for both CUDA and OpenCL in order to maximize the 
        efficiency of the kernel itself. However, in running for physics results
        a compromise between extreme optimization and time-to-result should be 
        looked for such that the most results can be obtained with the smallest effort.
        In this perspective we choose to define the idea.deploy metalanguage as a first
        good trade-off efficiency-wise. Indeed, most bottlenecks primarily come out
        of bad programming in the first place, rather than from poor optimization.
        This choice is implemented by defining the `[IDPY_T]` entry of the kernels
        dictionary. Further on, if any language-specific features will be desirable we
        will write language-specific kernels by defining the `[CUDA_T]` and `[OCL_T]`
        entries.
        
        Now, watch out, the commenting style and the syntax of the next lines
        are in the C fashion.'''
        self.kernels[IDPY_T] = """
        // First of all we check if the thread-id is inside the
        // system size domain. We do this since, performance-wise,
        // it is much more convenient to simply define a grid that is overall
        // a little larger than necessary but with the benefit of having
        // thread blocks of a convenient `canonical` size, e.g. 128
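        // The parity is computed from the sum of the lattice coordinates so that
        // no two neighboring sites are updated at the same time
        if(g_tid < V && (g_tid % L + g_tid / L) % 2 == parity){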
            // Let's read the value of the local spin corresponding
            // to the thread id
            SpinType lspin = spins[g_tid];
            // First, we get the thread id lattice coordinates
            LatticeType x = g_tid % L, y = g_tid / L;
            // Next we get the neighbors indices using periodic
            // boundary conditions. We use here the simplest linear
            // indexing with strides.
            LatticeType spx = (x + 1) % L + y * L; 
            LatticeType spy = x + ((y + 1) % L) * L;
            LatticeType smx = (x - 1 + L) % L + y * L;
            LatticeType smy = x + ((y - 1 + L) % L) * L;
            // Then, we calculate the energy difference for flipping the spin
            EnergyType delta_e = 2 * lspin * (spins[spx] + spins[spy] + spins[smx] + spins[smy]);
            // Then, we compute the acceptance probability for the flip
            WeightType w = exp((BetaType)(- beta * delta_e));
            // Finally we compare this number to a random number normalized in [0,1)
            // First we get the seed: only V / 2 generators are allocated, one for
            // each site of a given parity, hence the index g_tid / 2
            CRNGType lseed = seeds[g_tid / 2];
            // Then we compute the normalized pseudo-random number, which takes
            // the pointer to lseed since the seed needs to be updated after being used
            // We put a little `spin` to the code here and update the spin with a branchless
            // operation by first converting it to an occupation number variable
            // which is then flipped if the normalized random number is smaller than
            // the Boltzmann weight associated to the transition
            //
            // First, go to occupation number representation: -1 -> 0, +1 -> 1
            SpinType loccupation = (lspin + 1) / 2;
            // Second, flip if accepted (XOR with the 0/1 outcome of the comparison),
            // convert back to spin representation and write back to global memory
            spins[g_tid] = 2 * (loccupation ^ (F_NormRand(&lseed) < w)) - 1;
            // and finally we write the seed back in the global memory
            seeds[g_tid / 2] = lseed;
        }
        """

In the cell below we can check how the code is generated by the ```IdpyKernel``` class ```K_UpdateSpins_V0```, which we defined in the cell above. To check the differences between the CUDA and OpenCL versions one just needs to change ```CUDA_T``` to ```OCL_T``` in the argument of the ```Code``` method.

This is just a quick and dirty instantiation of an object of the class ```K_UpdateSpins_V0```. A few more steps are needed to actually *deploy* the kernel to the device and make it operate on the device memory.

In [None]:
Code(K_UpdateSpins_V0().Code(CUDA_T), language = 'C')
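
For double-checking purposes it can be handy to have a host-side reference of the same update rule. The following is a minimal NumPy sketch, not part of the **idea.deploy** framework: the function name ```metropolis_half_sweep``` and its signature are assumptions made for illustration. It updates all sites of one checkerboard parity with the Metropolis rule, using the same energy difference and the same branchless flip through the occupation-number representation.

```python
import numpy as np

def metropolis_half_sweep(spins, beta, parity, rng):
    """Metropolis update, in place, of all sites with (x + y) % 2 == parity."""
    y, x = np.indices(spins.shape)
    mask = (x + y) % 2 == parity
    # Sum of the four nearest neighbors with periodic boundary conditions
    nbr_sum = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0) +
               np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
    # Energy difference for flipping each spin and its Boltzmann weight
    delta_e = 2 * spins * nbr_sum
    w = np.exp(-beta * delta_e)
    # Accept the flip where the random number in [0, 1) is below the weight
    flip = mask & (rng.random(spins.shape) < w)
    # Branchless flip: -1/+1 -> 0/1, XOR with the outcome, back to -1/+1
    occupation = (spins + 1) // 2
    spins[...] = 2 * (occupation ^ flip) - 1
    return spins

rng = np.random.default_rng(1)
spins = np.ones((8, 8), dtype=np.int32)
for _ in range(10):
    for parity in (0, 1):
        metropolis_half_sweep(spins, 0.3, parity, rng)

# The update must never produce anything other than -1 or +1
assert set(np.unique(spins).tolist()) <= {-1, 1}
```

At ```beta = 0``` every proposed flip is accepted, which gives a simple deterministic check: starting from all spins up, one half sweep flips exactly the sites of the chosen parity.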

*(provisional)* Below we instantiate an object of the class ```Ising2D_V0``` and then release the device resources by calling its ```End``` method.

In [None]:
_test_i2d_v0 = Ising2D_V0(L = 8, beta = 1, lang = OCL_T, device = 0, cl_kind = 'cpu')

In [None]:
_test_i2d_v0.End()