# Introduction to configurations and supercells

Within CASM, a configuration represents a particular periodic perturbation of the infinite crystal within the space of allowed perturbations defined by the DoF specified in the prim. A configuration may be used as an input to an effective Hamiltonian.

A configuration is defined in terms of:
- A supercell, representing the translational periodicity of the perturbation. 
- DoF values, including discrete site DoF (occupant DoF) and continuous site DoF "within" the supercell (i.e. the translationally unique perturbation), and the values of the continuous global DoF.

Configurations are also associated with properties (e.g. energy, relaxation displacements, relaxation strain) that depend on the values of the DoF. CASM stores and looks up properties by `<calctype>` and property name, so that the same property can have different values for different calculation methods.

Supercells may be generated automatically when enumerating configurations, but it is also common to enumerate supercells independently. 

## Supercell enumeration

Supercell lattices are defined by $L^{scel} = L^{prim} * T$, where $T$ is a 3x3 integer transformation matrix, and $L^{scel}$ and $L^{prim}$ are the supercell and prim lattices, as column vector matrices. Two supercell lattices, $L^{scel}$ and ${L^{scel}}^{\prime}$, are equivalent if ${L^{scel}}^{\prime} = A*L^{scel}*U$ for some crystal point group operation matrix, $A$, and some unimodular matrix, $U$ (an integer matrix with determinant $\pm 1$).
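
As a minimal sketch of the relation $L^{scel} = L^{prim} * T$, the following plain-Python snippet builds the supercell lattice for the ZrO prim lattice vectors used later in this notebook. The 2x2x2 matrix `T` and the helper names are chosen for illustration only; note that `prim.json` lists lattice vectors as rows, so they are transposed to match the column-vector convention.

```python
# Prim lattice vectors as listed in prim.json: one lattice vector per ROW.
prim_rows = [
    [3.23398686, 0.00000000, 0.00000000],
    [-1.61699343, 2.80071477, 0.00000000],
    [0.00000000, 0.00000000, 5.16867834],
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Column-vector convention used in the text: each COLUMN is a lattice vector.
L_prim = transpose(prim_rows)

# Illustrative transformation matrix (an assumption): a 2x2x2 supercell.
T = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]

L_scel = matmul(L_prim, T)

# The supercell volume, in units of the prim volume, equals det(T).
volume_ratio = det3(L_scel) / det3(L_prim)
print(det3(T), round(volume_ratio, 6))  # 8 8.0
```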

CASM gives supercells a unique name of the form `SCELV_A_B_C_D_E_F` based on the supercell volume (`V` is the determinant of $T$) and the six non-zero elements (`A`-`F`) of the Hermite normal form of $T$.
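
The naming scheme can be sketched as follows. The order in which the six elements are written (diagonal first, then the three upper off-diagonal elements) is an assumption not stated above, and CASM names the canonical equivalent supercell, so the name printed by `ccasm` for an arbitrary $T$ may differ from this direct formatting.

```python
def scel_name(H):
    """Format a supercell name from an upper-triangular Hermite normal form H.

    Assumed element order (not stated in the text):
    H[0][0], H[1][1], H[2][2], then H[1][2], H[0][2], H[0][1].
    """
    volume = H[0][0] * H[1][1] * H[2][2]  # determinant of a triangular matrix
    elems = [H[0][0], H[1][1], H[2][2], H[1][2], H[0][2], H[0][1]]
    return "SCEL{}_{}".format(volume, "_".join(str(e) for e in elems))

print(scel_name([[2, 0, 0], [0, 2, 0], [0, 0, 2]]))  # SCEL8_2_2_2_0_0_0
```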

The `ScelEnum` method of `enum` enables enumerating supercells. It includes options to control enumerating supercells within a range of volumes, as multiples of a non-primitive unit cell, and restricted to particular directions (i.e. 1d or 2d supercells). 
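
To give a sense of the search space, the sketch below counts the distinct superlattices at each volume by listing all Hermite normal forms with that determinant. It is not the `ScelEnum` implementation: it applies no crystal symmetry, so `ScelEnum`, which keeps only symmetrically distinct supercells, should report no more than these counts. The bound convention used for the off-diagonal entries is an assumption.

```python
def hermite_normal_forms(volume):
    """Yield every upper-triangular 3x3 integer matrix H with det(H) == volume,
    positive diagonal, and off-diagonal entries 0 <= H[i][j] < H[i][i] (j > i).
    """
    for a in range(1, volume + 1):
        if volume % a:
            continue
        for b in range(1, volume // a + 1):
            if (volume // a) % b:
                continue
            c = volume // (a * b)
            for d in range(a):          # H[0][1]
                for e in range(a):      # H[0][2]
                    for f in range(b):  # H[1][2]
                        yield [[a, d, e], [0, b, f], [0, 0, c]]

counts = {v: sum(1 for _ in hermite_normal_forms(v)) for v in range(1, 5)}
print(counts)  # {1: 1, 2: 7, 3: 13, 4: 35}
```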

Additionally, `query` commands can be used to filter results so that only supercells that match particular criteria are added to the project database. A list of `query` properties of supercells can be obtained with `query --help properties -t scel`.


A list of `ScelEnum` method parameters can be obtained with:

In [None]:
ccasm enum --desc ScelEnum

This example will use the ZrO prim:

In [None]:
prim_str='
{
  "basis" : [
    {
      "coordinate" : [ 0.0000000, 0.0000000, 0.0000000 ],
      "occupants" : [ "Zr" ]
    },
    {
      "coordinate" : [ 0.6666666, 0.3333333, 0.5000000 ],
      "occupants" : [ "Zr" ]
    },
    {
      "coordinate" : [ 0.3333333, 0.6666666, 0.2500000 ],
      "occupants" : [ "Va", "O" ]
    },
    {
      "coordinate" : [ 0.3333333, 0.6666666, 0.7500000 ],
      "occupants" : [ "Va", "O" ]
    }
  ],
  "coordinate_mode" : "Fractional",
  "description" : "hcp Zr with octahedral interstitial O ",
  "lattice_vectors" : [
    [ 3.23398686, 0.00000000, 0.00000000 ],
    [ -1.61699343, 2.80071477, 0.00000000 ],
    [ -0.00000000, 0.00000000, 5.16867834 ]
  ],
  "title" : "ZrO"
}'

In [None]:
start=$(pwd)
mkdir -p "$start/enum/ZrO" && cd "$start/enum/ZrO"

echo "$prim_str" > prim.json
ccasm init
ls .

## Enumerate supercells up to a maximum volume

Enumerate all supercells of volume 4 or less.

In [None]:
ccasm enum -m ScelEnum -i '{"max": 4}'

## Enumerate supercells within a range of volumes

Enumerate all supercells of volume 6 or 7:

In [None]:
ccasm enum -m ScelEnum -i '{"min": 6, "max": 7}'

## Enumerate supercells of a non-primitive unit cell

Using the 2x2x2 supercell as the base unit cell, enumerate supercells of up to 4 times its volume:

In [None]:
ccasm enum -m ScelEnum -i '{
  "max": 4, 
  "unit_cell": [
    [2, 0, 0],
    [0, 2, 0],
    [0, 0, 2]
  ]
}'

## Restrict supercell enumeration to particular directions

_Note: The "dirs" values "a", "b", "c" indicate the first, second, and third lattice vector after applying "unit_cell"._

Using the 2x2x2 supercell as the base unit cell, enumerate 2d supercells in the hcp basal plane:

In [None]:
ccasm enum -m ScelEnum -i '{
  "max": 9, 
  "unit_cell": [
    [2, 0, 0],
    [0, 2, 0],
    [0, 0, 2]
  ],
  "dirs": "ab"
}'

## Use `select` and `query` to get properties of enumerated supercells

Use the `-t scel` option to indicate `query` and `select` operations on supercells. CASM stores a default list of selected supercells called the `MASTER` selection within the ".casm" directory. Other standard selections CASM defines that can be used with the `-c` option are: 

- `ALL`: all supercells in the database included and selected
- `NONE`: all supercells in the database included but none selected
- `EMPTY`: no supercells included

_Note: `CALCULATED` is not allowed for supercells_

Custom user lists can be generated with `-o <filename>` and used with `-c <filename>`.

In [None]:
ccasm select -h

In [None]:
# select all enumerated supercells
ccasm select -t scel --set-on

# query properties of selected supercells (default uses the MASTER selection)
ccasm query -t scel -k scel_size > all_supercells_and_size
head all_supercells_and_size

# from ALL supercells, select volume 4 supercells and save in a custom selection
ccasm select -t scel -c ALL --set 'eq(scel_size,4)' -o volume_4_scel_list.txt

# query some properties of just the volume 4 supercells, save as JSON
ccasm query -t scel -c volume_4_scel_list.txt -k multiplicity pointgroup_name lattice_params -o volume_4_props.json
head -n 30 volume_4_props.json

## Applying symmetry to configurations and determining the canonical form

_Note: This can be considered an advanced topic not necessary for all CASM users_

Supercell lattice vectors determine important symmetry properties of configurations. The supercell factor group is the subset of the prim factor group that leaves the supercell lattice vectors invariant. For each supercell, CASM generates a permutation representation of the supercell factor group, `factor_group_permutations`, that describes how DoF values are permuted among sites in the supercell by supercell factor group operations. For each supercell, CASM also generates the set of `translation_permutations` which describe how DoF values are permuted among sites by lattice translations. The number of unique translations is equal to the number of prim unit cells within a supercell. 
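
The following sketch illustrates why the number of translation permutations equals the number of prim unit cells in the supercell. It handles only diagonal supercells, and the site ordering (sublattice-major, then unit cell) and the permutation convention are assumptions made for illustration rather than a description of CASM's internal ordering.

```python
from itertools import product

def translation_permutations(dims, n_sublat):
    """Translation permutations for a diagonal supercell with `dims` unit cells
    along each lattice vector and `n_sublat` basis sites per unit cell.

    Assumed site ordering: linear index = sublattice * n_cells + unit-cell index.
    perm[l] is the index of the site whose value moves to site l.
    """
    cells = list(product(*(range(n) for n in dims)))
    cell_index = {cell: i for i, cell in enumerate(cells)}
    n_cells = len(cells)
    perms = []
    for shift in cells:  # one translation per unit cell in the supercell
        perm = []
        for b in range(n_sublat):
            for cell in cells:
                source = tuple((c - s) % n for c, s, n in zip(cell, shift, dims))
                perm.append(b * n_cells + cell_index[source])
        perms.append(perm)
    return perms

# 2x2x2 supercell of a prim with 4 basis sites (as in the ZrO example).
perms = translation_permutations((2, 2, 2), n_sublat=4)
print(len(perms), len(perms[0]))  # 8 32
```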

Site DoF values of configurations transform under application of a symmetry operation by first applying the factor group operation (i.e. rotating a displacement vector or permuting the occupation value for an anisotropic molecule with discrete orientations) using symmetry representations generated for each prim basis site, and then applying the factor group and translation permutations to permute DoF values among sites. Two configurations are symmetrically equivalent if any combination of factor group permutation and translation permutation results in equivalent DoF values. Additionally, if application of just a translation permutation results in equivalent DoF values, then the configuration is not primitive.
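
The primitiveness test can be illustrated with a toy model: a single sublattice in a 1 x 1 x n supercell, where translations are cyclic shifts of the occupation vector. This is a sketch under those assumptions only; it ignores factor group operations and continuous DoF, and the function names are hypothetical.

```python
def apply_permutation(values, perm):
    """Return new values where site l receives the value from site perm[l]."""
    return [values[source] for source in perm]

def cyclic_translations(n_cells):
    """Translation permutations of a single-sublattice 1 x 1 x n_cells supercell."""
    return [[(l - shift) % n_cells for l in range(n_cells)]
            for shift in range(n_cells)]

def is_primitive(occupation, translations):
    """True if no non-identity translation leaves the occupation unchanged."""
    identity = list(range(len(occupation)))
    return not any(
        perm != identity and apply_permutation(occupation, perm) == occupation
        for perm in translations
    )

translations = cyclic_translations(4)
print(is_primitive([0, 1, 0, 1], translations))  # False: period-2 ordering
print(is_primitive([0, 1, 1, 1], translations))  # True
```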

CASM defines an ordering among configurations with the same prim that compares, in order, supercell lattice vectors, continuous global DoF values, occupation DoF values, and continuous site DoF values. Lattices are compared using the CASM lattice comparison, and DoF values are compared lexicographically, using floating point tolerances when checking equivalence of continuous DoF. The comparison of a configuration before and after application of symmetry is optimized so that only as many DoF values are transformed as are needed to determine the result.

The canonical form of a configuration is the equivalent configuration which compares the greatest. By convention, to ease comparison when generating new configurations, the standard methods included in the CASM interface only store the canonical configuration in the CASM project's configuration database. There may be use cases with large supercells in which it is appropriate to relax this convention. Additionally, by convention, primitive configurations are always stored and users are given a choice whether non-primitive configurations are also stored.
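
A toy version of "the equivalent configuration which compares the greatest" is sketched below for occupation values on a ring of equivalent sites, using translations and mirror images as the symmetry permutations. It is an illustration under these assumptions, not CASM's algorithm, which also compares lattices and continuous DoF and avoids transforming every value.

```python
def apply_permutation(values, perm):
    """Return new values where site l receives the value from site perm[l]."""
    return [values[source] for source in perm]

def ring_symmetry_permutations(n_sites):
    """All translations and mirror images of a ring of n_sites equivalent sites."""
    perms = []
    for shift in range(n_sites):
        perms.append([(l - shift) % n_sites for l in range(n_sites)])
        perms.append([(shift - l) % n_sites for l in range(n_sites)])
    return perms

def canonical_occupation(occupation, perms):
    """Return the equivalent occupation that compares greatest (lexicographic)."""
    return max(apply_permutation(occupation, perm) for perm in perms)

perms = ring_symmetry_permutations(4)
equivalents = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
# Every equivalent occupation maps to the same canonical form.
print([canonical_occupation(occ, perms) for occ in equivalents])
```

Storing only the canonical form means a newly generated configuration can be checked against the database with a single lookup instead of a comparison against every symmetric equivalent.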

## Representation of configuration DoF values

_Note: This can be considered an advanced topic not necessary for all CASM users_

CASM stores the DoF values of a configuration in a set of vectors and matrices, one for each type of DoF.

### Occupation DoF

Occupation DoF values are stored in a vector of integers with size equal to the number of sites in the supercell, sorted by sublattice. Within the block of a particular sublattice, values are ordered by unit cell. Each integer value is the index of a molecule in the occupants list of the prim basis site corresponding to that sublattice. Applying symmetry first transforms the value at a site (if the occupant is anisotropic or occupants are ordered differently on equivalent sublattices) and then permutes the values among sites. For example:

    [<- sublattice 0 "occ" values -> | <- sublattice 1 "occ" values -> | ... ]
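
The layout above can be decoded with a few lines of Python. This sketch uses the occupant lists of the ZrO prim from this notebook; the occupation vector itself is a made-up example for a volume 2 supercell, and the helper names are hypothetical.

```python
# Occupant lists of the four ZrO prim basis sites, in prim order.
occupants = [["Zr"], ["Zr"], ["Va", "O"], ["Va", "O"]]

def site_index(sublattice, unitcell_index, n_cells):
    """Linear site index for sublattice-major ordering."""
    return sublattice * n_cells + unitcell_index

def occupant_names(occupation, occupants, n_cells):
    """Split the occupation vector into sublattice blocks and look up names."""
    names = []
    for b, allowed in enumerate(occupants):
        block = occupation[b * n_cells:(b + 1) * n_cells]
        names.append([allowed[value] for value in block])
    return names

# Made-up occupation of a volume 2 supercell (2 unit cells, 8 sites).
occupation = [0, 0,  0, 0,  1, 0,  0, 1]
print(occupant_names(occupation, occupants, n_cells=2))
# [['Zr', 'Zr'], ['Zr', 'Zr'], ['O', 'Va'], ['Va', 'O']]
```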


### Continuous site DoF

The values of the continuous site DoF of a particular type are stored in a matrix, with the column index indicating the site within the supercell and the row index indicating the component of the DoF value. This representation allows for applying factor group symmetry to each sublattice separately, with one matrix multiplication per sublattice, before permuting columns. Internally, CASM stores component values in the user-defined basis specified by the prim. However, when outputting the configuration DoF values for storage in the CASM project database and the `config.json` JSON format, DoF values are converted to the standard basis of the particular DoF type (e.g. "dx", "dy", "dz" for displacement).

#### Example: Displacement values, using the standard basis.

With the prim DoF basis equal to the standard basis (dx, dy, dz) the internal and external representations are equivalent:

    [<- sublattice 0 dx values -> | <- sublattice 1 dx values -> | ... ]
    [<- sublattice 0 dy values -> | <- sublattice 1 dy values -> | ... ]
    [<- sublattice 0 dz values -> | <- sublattice 1 dz values -> | ... ]

#### Example: Displacement values, with non-standard prim DoF basis:

    "basis" : [ {
        "coordinate": [c0x, c0y, c0z],
        "occupants": [...],
        "dofs": {
          "disp" : {
            "axis_names" : ["dxy", "dz"],
            "axes" : [[1.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]}}
      },
      {
        "coordinate": [c1x, c1y, c1z],
        "occupants": [...],
        "dofs": {
          "disp" : {
            "axis_names" : ["d\bar{x}y", "dz"],
            "axes" : [[-1.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]}}
      },
      ...
    ]

Internal representation:

    [<- sublat 0 dxy values -> | <- sublat 1 d\bar{x}y values ->| ... ]
    [<- sublat 0 dz values  -> | <- sublat 1 dz values ->       | ... ]

External representation:

    [<- sublattice 0 dx values -> | <- sublattice 1 dx values -> | ... ]
    [<- sublattice 0 dy values -> | <- sublattice 1 dy values -> | ... ]
    [<- sublattice 0 dz values -> | <- sublattice 1 dz values -> | ... ]


Note that the internal values matrix has only two rows, which is the maximum site basis dimension, while the external values matrix always has three rows.
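
The conversion from the internal to the external representation can be sketched as a per-site change of basis. This assumes that each row of `"axes"` is a basis vector expressed in (dx, dy, dz) and is used as given, without normalization, and that sites are ordered by sublattice; both are assumptions for illustration, and the numerical values are made up.

```python
def to_standard_basis(internal, axes_by_sublattice, n_cells):
    """Convert a (site basis dimension x n_sites) matrix to (3 x n_sites).

    internal[i][l] is component i of site l in that site's prim DoF basis.
    axes_by_sublattice[b][i] is axis i of sublattice b in (dx, dy, dz).
    Assumes sublattice-major site ordering and axes used as given.
    """
    n_sites = len(internal[0])
    external = [[0.0] * n_sites for _ in range(3)]
    for l in range(n_sites):
        axes = axes_by_sublattice[l // n_cells]
        for i, axis in enumerate(axes):
            for k in range(3):
                external[k][l] += internal[i][l] * axis[k]
    return external

axes_by_sublattice = [
    [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # first axis along x+y, second along z
    [[-1.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # first axis along -x+y, second along z
]
# Made-up values: one unit cell, so one site per sublattice.
internal = [
    [0.1, 0.2],  # first-axis component on sublattice 0 and sublattice 1
    [0.3, 0.0],  # dz component on both sublattices
]
external = to_standard_basis(internal, axes_by_sublattice, n_cells=1)
for row in external:
    print(row)
```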

#### Example: Displacement values, with varying prim DoF basis:

    "basis" : [ {
        "coordinate": [c0x, c0y, c0z],
        "occupants": [...],
        "dofs": {
          "disp" : {}}
      },
      {
        "coordinate": [c1x, c1y, c1z],
        "occupants": [...],
        "dofs": {
          "disp" : {
            "axis_names" : ["dxy", "dz"],
            "axes" : [[-1.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]}}
      },
      ...
    ]
    
Internal representation:

    [<- sublattice 0 dx values -> | <- sublattice 1 dxy values ->| ... ]
    [<- sublattice 0 dy values -> | <- sublattice 1 dz values -> | ... ]
    [<- sublattice 0 dz values -> | <- 0.0 values ->             | ... ]

Note that the values matrix has three rows, which is the maximum site basis dimension. For sublattices with a lower site basis dimension, the internal representation is padded with fixed zeros.

External representation:

    [<- sublattice 0 dx values -> | <- sublattice 1 dx values -> | ... ]
    [<- sublattice 0 dy values -> | <- sublattice 1 dy values -> | ... ]
    [<- sublattice 0 dz values -> | <- sublattice 1 dz values -> | ... ]

### Continuous global DoF values

The values of the continuous global DoF of a particular type are stored in a vector. Similar to continuous site DoF, the internal representation is in terms of the user-specified prim basis while the external representation is converted to the standard basis.

#### Example: GLstrain values, using the standard basis.

The internal representation equals the external representation:

    [E_{xx}, E_{yy}, E_{zz}, \sqrt(2)*E_{yz}, \sqrt(2)*E_{xz}, \sqrt(2)*E_{xy}]
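
As a sketch of how such a vector arises, the snippet below computes the Green-Lagrange strain $E = \frac{1}{2}(F^{T}F - I)$ from a deformation gradient $F$ and unrolls it in the order shown above. The deformation values are made up for illustration. The $\sqrt{2}$ factors on the shear terms make the Euclidean norm of the vector equal to the Frobenius norm of the symmetric tensor.

```python
import math

def green_lagrange_strain(F):
    """E = (F^T F - I) / 2 for a 3x3 deformation gradient F."""
    return [[0.5 * (sum(F[k][i] * F[k][j] for k in range(3)) - (i == j))
             for j in range(3)] for i in range(3)]

def strain_vector(E):
    """Unrolled order [E_xx, E_yy, E_zz, sqrt(2)E_yz, sqrt(2)E_xz, sqrt(2)E_xy]."""
    s = math.sqrt(2.0)
    return [E[0][0], E[1][1], E[2][2], s * E[1][2], s * E[0][2], s * E[0][1]]

# Made-up deformation: 1% stretch along x plus a small xy shear.
F = [[1.01, 0.02, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
E = green_lagrange_strain(F)
vec = strain_vector(E)
print([round(v, 6) for v in vec])
```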


#### Example: GLstrain values, excluding shear strains.

The internal representation:

    [E_{xx}, E_{yy}, E_{zz}]

The external representation:

    [E_{xx}, E_{yy}, E_{zz}, \sqrt(2)*E_{yz}, \sqrt(2)*E_{xz}, \sqrt(2)*E_{xy}]


## Use `info` to get supercell information directly

Initializing a CASM project directory and enumerating supercells is not required to get supercell information from `info`. The `SupercellInfo` method allows for getting information about the supercell lattice vectors, the ordering of the representation of DoF values in a configuration, permutation representations, and linear site ordering. 

In [None]:
ccasm info --desc SupercellInfo

In [None]:
supercell_info_str="{\"prim\":$prim_str, \"transformation_matrix_to_super\": [[2, 0, 0], [0, 2, 0], [0, 0, 2]]}"
ccasm info -m SupercellInfo -i "$supercell_info_str" | jq "keys"

In [None]:
cd "$start" && rm -r "$start/enum"