## Defining the primitive crystal structure and degrees of freedom 

A CASM project enables generating, fitting, and evaluating effective Hamiltonians. It consists of a directory hierarchy that holds project settings, DFT input and output files, the cluster expansion source code, compiled libraries, and coefficient parameters. After it is initialized, a CASM project directory is identified by the presence of a hidden ".casm" directory.

This example demonstrates defining the primitive crystal structure and degrees of freedom (DoF) and initializing a CASM project. Three example primitive crystal structures will be used to demonstrate the specification of different types of degrees of freedom:

1. [HCP Zr with O octahedral interstitial -- Occupation cluster expansion project](#occupation_clex)
2. [ZrH<sub>2</sub> -- Strain polynomial effective Hamiltonian prim](#strain_polynomial)
3. [ZrH<sub>2</sub> -- Coupled strain-displacement cluster expansion effective Hamiltonian prim](#coupled)


<a id='occupation_clex'></a>

## 1) HCP Zr with O octahedral interstitial -- Occupation cluster expansion project
    
This is an example CASM project used to fit a cluster expansion for the energy of HCP Zr with interstitial oxygen. Based on empirical knowledge of the system, this model makes the approximation that the HCP Zr crystal is perfect and octahedral interstitial positions may be either vacant or filled with oxygen atoms. To begin, we need to define the project's "prim".

The "prim" defines the primitive crystal structure and degrees of freedom. It includes lattice vectors, crystal basis sites, global degrees of freedom, and site degrees of freedom, including allowed occupant species on each basis site. A complete description of the JSON format used to specify the prim is located [here](https://prisms-center.github.io/CASMcode_docs/pages/formats/casm/crystallography/BasicStructure.html). In this particular case, it contains:

- **lattice_vectors**: Row-vector matrix of crystal lattice vectors. Units are typically Angstrom, but are ultimately determined by the method used to perform calculations. 

- **basis**: An array of crystal basis sites, including coordinate and allowed degrees of freedom. For this ZrO project, the basis sites contain:

  - **coordinate**: The location of the basis site, according to the "coordinate_mode".
  
  - **occupants**: A list of the possible occupant species that may reside at each site. The names are case sensitive, and "Va" is reserved for vacancies.

- **coordinate_mode**: Defines the units of basis site coordinates. May be one of:

  - "Cartesian": To specify basis coordinates as $r_{cart} = (x, y, z)$
  - "Fractional" or "Direct": To specify basis coordinates defined in terms of the lattice vectors, $r_{frac}$, where $r_{cart} = L r_{frac}$, and $L$ is the lattice as a column-vector matrix. 
  

For "lattice_vectors", it is common, but not required, to use the results of a fully relaxed calculation of the structure with the default occupation values. The default occupation on each site is the species listed first in "occupants". For occupation cluster expansions, ideal supercells of the prim lattice are used as the initial state of DFT calculations, and the prim lattice is the default reference for strain.
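
As a worked check of the "Fractional" coordinate mode described above, the following sketch evaluates $r_{cart} = L r_{frac}$ for the second Zr site of the ZrO prim used in this example. It uses plain `awk` rather than any CASM functionality, and the variable name `r_cart` is only illustrative.

```shell
# r_cart = L r_frac, where the columns of L are the lattice vectors,
# i.e. the rows of "lattice_vectors" in the prim.
r_cart=$(awk 'BEGIN {
  # rows of "lattice_vectors" for the ZrO prim
  a[1] =  3.23398686; a[2] = 0.0;        a[3] = 0.0
  b[1] = -1.61699343; b[2] = 2.80071477; b[3] = 0.0
  c[1] =  0.0;        c[2] = 0.0;        c[3] = 5.16867834
  # fractional coordinate of the second Zr basis site
  f1 = 0.6666666; f2 = 0.3333333; f3 = 0.5
  for (i = 1; i <= 3; i++) x[i] = f1 * a[i] + f2 * b[i] + f3 * c[i]
  printf "%.4f %.4f %.4f", x[1], x[2], x[3]
}')
echo "r_cart = $r_cart"
```

The result is in the same length units as the lattice vectors.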


In [None]:
prim_str='
{
  "basis" : [
    {
      "coordinate" : [ 0.0000000, 0.0000000, 0.0000000 ],
      "occupants" : [ "Zr" ]
    },
    {
      "coordinate" : [ 0.6666666, 0.3333333, 0.5000000 ],
      "occupants" : [ "Zr" ]
    },
    {
      "coordinate" : [ 0.3333333, 0.6666666, 0.2500000 ],
      "occupants" : [ "Va", "O" ]
    },
    {
      "coordinate" : [ 0.3333333, 0.6666666, 0.7500000 ],
      "occupants" : [ "Va", "O" ]
    }
  ],
  "coordinate_mode" : "Fractional",
  "description" : "hcp Zr with octahedral interstitial O ",
  "lattice_vectors" : [
    [ 3.23398686, 0.00000000, 0.00000000 ],
    [ -1.61699343, 2.80071477, 0.00000000 ],
    [ -0.00000000, 0.00000000, 5.16867834 ]
  ],
  "title" : "ZrO"
}'

### Initializing a CASM Project

Here we write the prim file and use the `init` method to initialize the project. In this step, CASM will read the prim file, perform a symmetry analysis, and generate some default directories. 

To facilitate comparisons and easier interpretation, CASM defines a standard definition for the prim lattice. If the input prim is found to be non-primitive, or not the Niggli reduced cell, or not in a standard orientation, it will print a recommended prim. The recommended prim can be overridden with the `init` `--force` option, but some standard methods may not be available with a non-primitive prim.

Project files that the user should not typically modify directly, including a copy of the prim, are stored in a hidden .casm sub-directory of the CASM project directory. The presence or absence of the .casm directory is used by CASM to detect a CASM project.
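
To make the detection rule concrete, here is a minimal sketch of how a script could locate a project root by looking for the `.casm` directory. The function `find_casm_root` is hypothetical and not part of CASM, and the upward search through parent directories is an assumption made for illustration, not a statement of how CASM implements detection.

```shell
# Hypothetical helper (not part of CASM): walk up from a directory until a
# ".casm" sub-directory is found, then print the directory that contains it.
find_casm_root() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -d "$dir/.casm" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && return 1   # reached the filesystem root: no project
    dir=$(dirname "$dir")
  done
}

# Demonstration in a throwaway directory tree
demo=$(mktemp -d)
mkdir -p "$demo/proj/.casm" "$demo/proj/training_data/sub"
found=$(find_casm_root "$demo/proj/training_data/sub")
echo "project root: $found"
rm -rf "$demo"
```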

The following code creates an `init/ZrO` subdirectory of the current working directory for purposes of this example. For a real project, modify the paths to the location where you want to create a CASM project.


In [None]:
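# remember the starting directory so the example can be cleaned up later
start=$(pwd)
mkdir -p "$start/init/ZrO" && cd "$start/init/ZrO"

# write the prim file, then initialize the project from it
echo "$prim_str" > prim.json
ccasm init

# -a also lists the hidden .casm directory created by init
ls -a .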

### Checking the initialized project

The effective Hamiltonians that CASM generates are constructed so that they obey the same symmetry as the prim structure. After initializing a project, it is a good idea to check that CASM has identified all of the expected symmetries. 

If CASM fails to find some expected symmetries, it may mean that there is a mistake, or that the lattice vectors or basis site coordinates do not include enough significant digits. With too few, the equivalence of sites or lattice vectors under the action of symmetry operations may be missed. CASM uses a default tolerance of 1e-5 for crystallography routines.
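
The effect of truncated coordinates can be estimated with a rough calculation. The sketch below assumes, for illustration only, that the tolerance is compared against a Cartesian-scale error in the same length units as the lattice vectors. It is not CASM's symmetry-finding algorithm, and the helper `within_tol` is hypothetical.

```shell
tol=1e-5          # default crystallography tolerance quoted above
a_len=3.23398686  # length of the first ZrO lattice vector

# Hypothetical helper: is a truncated decimal for 2/3 within tol of the exact value?
within_tol() {
  awk -v v="$1" -v a="$a_len" -v tol="$tol" 'BEGIN {
    d = (v - 2/3) * a            # truncation error scaled by the lattice vector length
    if (d < 0) d = -d
    print ((d < tol + 0) ? "ok" : "too coarse")
  }'
}

seven_digits=$(within_tol 0.6666666)
three_digits=$(within_tol 0.667)
echo "0.6666666: $seven_digits"
echo "0.667:     $three_digits"
```

Under this assumption, seven decimal places leave an error of roughly 2e-7, well inside the tolerance, while three decimal places leave an error of roughly 1e-3.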

After the project is initialized, the symmetry CASM has found can be printed using the `sym` command. `sym` will print information about three important symmetry groups. Each group is a vector of representative symmetry operations. The symmetry operations transform a spatial coordinate $x \rightarrow x'$ according to $x' = A*x+b$, where $A$ is the 3x3 "operation matrix" and $b$ is the "shift" vector. Operations may be printed either in fractional coordinates (FRAC) or Cartesian coordinates (CART).

- **lattice point group**: This is the point group of the Bravais lattice: the list of operations that map the lattice (i.e. all points that are integer multiples of the lattice vectors) onto itself and keep the origin fixed. The "shift" vectors will always be zero. 

- **factor group**: The crystal space group is the set of all symmetry operations that map the lattice onto itself and map basis sites onto equivalent basis sites (i.e. all degrees of freedom are equivalent). The crystal space group is not limited to operations that keep the origin fixed, so due to the periodicity of the crystal the crystal space group is infinite. The factor group is a finite description of the crystal space group, in which all operations that differ only by a "shift" are represented by a single operation whose "shift" lies within the primitive cell. Formally, this is a group formed by the cosets of $T$ in $S$, where $T$ is the translation group of the Bravais lattice and $S$ is the crystal space group.

- **crystal point group**: This is the group of point operations formed by taking the factor group operations and setting their "shift" to zero. Macroscopic properties of the crystal must exhibit the symmetries of the crystal point group. It is by definition a subgroup of the lattice point group.
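
As a worked illustration of $x' = A*x+b$, consider inversion through the midpoint of the two Zr sites of the ZrO prim. This is a hand check against the four basis sites, not output copied from `ccasm sym`, and the operation CASM prints may be expressed with a different but equivalent shift. In fractional coordinates:

$$
A = -I, \qquad b = \left(\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{2}\right)
$$

$$
\begin{aligned}
(0, 0, 0) &\rightarrow \left(\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{2}\right) \\
\left(\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{2}\right) &\rightarrow (0, 0, 0) \\
\left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{4}\right) &\rightarrow \left(\tfrac{1}{3}, -\tfrac{1}{3}, \tfrac{1}{4}\right) \equiv \left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{4}\right) \\
\left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{3}{4}\right) &\rightarrow \left(\tfrac{1}{3}, -\tfrac{1}{3}, -\tfrac{1}{4}\right) \equiv \left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{3}{4}\right)
\end{aligned}
$$

Here $\equiv$ denotes equality up to a lattice translation. The two Zr sites are exchanged and each interstitial site maps onto itself, so every site lands on a site with the same allowed occupants and the operation belongs to the factor group. Setting $b = 0$ gives pure inversion, the corresponding element of the crystal point group.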




In [None]:
ccasm sym --coord CART --factor-group

There is also an option to print the symmetry groups using a "brief" description, following the conventions of the International Tables for Crystallography. (Try adding the option `--coord CART` to see the Cartesian representation.)

In [None]:
ccasm sym --brief --lattice-point-group
ccasm sym --brief --factor-group
ccasm sym --brief --crystal-point-group

### The `info` method

The `info` method gives more direct and flexible access to CASM data and methods via JSON input and output. It currently allows for getting detailed information about a prim, supercells, and the neighbor lists used in effective Hamiltonian evaluations, with additional options planned for the future. For any data that does not require a CASM project, it will work whether or not a CASM project exists, but if called from within a CASM project, then that project will be used for default input values such as the prim.

In [None]:
# list available "info" methods
ccasm info -h

# list input and output options for a particular method
ccasm info --desc PrimInfo

### Using `info` 

As an example, we can get information about the ZrO prim without initializing a CASM project.

In [None]:
info_input_str="{ \"prim\": ${prim_str}, \"properties\": []}"
ccasm info -m PrimInfo -i "$info_input_str" > ZrO_info.json

# list names of output properties in ZrO_info.json
jq 'keys' ZrO_info.json

echo "asymmetric_unit:"
jq '.asymmetric_unit' ZrO_info.json

<a id='strain_polynomial'></a>

## 2) ZrH<sub>2</sub> -- Strain polynomial effective Hamiltonian prim

All continuous DoF are represented as vectors having a standard basis that is related to the fixed reference frame of the crystal. The DoF object may optionally encode a user-specified basis in terms of the standard basis. The user-specified basis may fully span the standard basis or only a subspace. Within a `"dofs"` object, each DoF is given by the key/object pair `"<dofname>" : {...}` where `<dofname>` is the name specifier of a particular DoF type and the associated object specifies non-default options.

The options include:

- `axis_names`: array of string

  Names given to the user-specified basis vectors when writing basis function formulas. The length of `axis_names` must match the number of rows in `basis`. 
  
- `basis`: row-vector matrix

  The basis provides the user-specified basis vectors in terms of the standard basis. The number of rows is the dimension of the user-specified basis. The number of columns must equal the number of dimensions in the standard basis (i.e. 3 for displacement, 6 for strain).
  

Example: Strain DoF, using the Green-Lagrange strain metric with custom user basis excluding shear strain:

    "dofs" : {
      "GLstrain" : {
        "axis_names" : ["E_{xx}", "E_{yy}", "E_{zz}"], 
        "basis" : [                       // optional, default is Identity matrix (equivalent to standard basis)
          [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], // This is an example of a custom user basis that excludes shear strain
          [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
        ]
      }
    }
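
The dimension rules above (one axis name per basis row, and six columns per row for strain) can be checked before handing a prim to CASM. The following is a minimal sketch using `jq` on a comment-free copy of the example. The variable names are illustrative, and this check is a convenience written for this example rather than something CASM provides.

```shell
# Comment-free copy of the "dofs" example above (JSON itself does not allow comments)
dofs_json='{
  "GLstrain": {
    "axis_names": ["E_{xx}", "E_{yy}", "E_{zz}"],
    "basis": [
      [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    ]
  }
}'

# true only if there is one axis name per basis row and every row has 6 columns
dof_check=$(echo "$dofs_json" | jq -r '
  .GLstrain
  | ((.axis_names | length) == (.basis | length))
    and (.basis | all(length == 6))')
echo "GLstrain basis consistent: $dof_check"
```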


Allowed global DoF include:

- "GLstrain": Green-Lagrange strain metric, $\frac{1}{2}(C-I)$
- "Hstrain": Hencky strain metric, $\frac{1}{2}\ln(C)$
- "Bstrain": Biot strain metric, $(U-I)$
- "Ustrain": Stretch tensor, $U$
- "EAstrain": Euler-Almansi strain metric, $\frac{1}{2}(I-(F F^{T})^{-1})$

The strain metrics are defined in terms of the deformation gradient tensor, $F$, and Green's deformation tensor, $C$. The deformation gradient tensor relates the strained and unstrained lattices through $L^{strained} = F * L^{ideal}$, and can be decomposed, via $F = R * U$, into a rotation tensor, $R$, and stretch tensor, $U$. Green's deformation tensor, $C = F^{T}*F$, excludes rigid rotations.
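
To compare the metrics numerically, the sketch below evaluates the $xx$ component of each one for a 1% uniaxial stretch, $F = \mathrm{diag}(1.01, 1, 1)$. For this diagonal $F$ there is no rotation, so $U = F$ and $F F^{T} = C$, and each tensor formula reduces to a scalar expression. The variable names are illustrative and nothing here calls CASM.

```shell
strain_xx=$(awk -v stretch=1.01 'BEGIN {
  C  = stretch * stretch     # xx component of C = F^T F
  GL = 0.5 * (C - 1)         # Green-Lagrange
  H  = 0.5 * log(C)          # Hencky (log is the natural logarithm)
  B  = stretch - 1           # Biot, using U = F for a pure stretch
  EA = 0.5 * (1 - 1 / C)     # Euler-Almansi, using F F^T = C for diagonal F
  printf "GLstrain=%.6f Hstrain=%.6f Bstrain=%.6f EAstrain=%.6f", GL, H, B, EA
}')
echo "$strain_xx"
```

All four values agree to first order in the stretch and differ only at second order, which is why the choice of metric matters mainly at larger strains.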

For all strain metrics, the standard basis is $[E_{xx}, E_{yy}, E_{zz}, \sqrt{2}E_{yz}, \sqrt{2}E_{xz}, \sqrt{2}E_{xy}]$, and the default axis names are ["e_1", "e_2", "e_3", "e_4", "e_5", "e_6"].
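
One consequence of the $\sqrt{2}$ factors, stated here as a general property of this kind of unrolling rather than something taken from the CASM documentation, is that the length of the six-component vector equals the Frobenius norm of the symmetric strain tensor. Writing the standard basis components as $e_1, \dots, e_6$:

$$
\sum_{k=1}^{6} e_k^2 = E_{xx}^2 + E_{yy}^2 + E_{zz}^2 + 2\left(E_{yz}^2 + E_{xz}^2 + E_{xy}^2\right) = \sum_{i,j} E_{ij}^2
$$

The factor of 2 arises because each off-diagonal component appears twice in the symmetric tensor, e.g. $E_{xy} = E_{yx}$.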


In [None]:
ZrH2_GLstrain_prim_str='{
  "basis" : [
    {
      "coordinate" : [ 0.000000000000, 0.000000000000, 0.000000000000 ],
      "occupants" : [ "Zr" ]
    },
    {
      "coordinate" : [ 0.250000000000, 0.250000000000, 0.250000000000 ],
      "occupants" : [ "H" ]
    },
    {
      "coordinate" : [ 0.750000000000, 0.750000000000, 0.750000000000 ],
      "occupants" : [ "H" ]
    }
  ],
  "dofs": {
      "GLstrain": {}
  },
  "coordinate_mode" : "Fractional",
  "lattice_vectors" : [
    [ 0.000000000000, 2.410696500000, 2.410696500000 ],
    [ 2.410696500000, 0.000000000000, 2.410696500000 ],
    [ 2.410696500000, 2.410696500000, 0.000000000000 ]
  ],
  "title" : "ZrH2"
}'

ZrH2_GLstrain_info_input_str="{ \"prim\": ${ZrH2_GLstrain_prim_str}, \"properties\": []}"
ccasm info -m PrimInfo -i "$ZrH2_GLstrain_info_input_str" > ZrH2_GLstrain_info.json

# list names of output properties in ZrH2_GLstrain_info.json
jq 'keys' ZrH2_GLstrain_info.json

<a id='coupled'></a>

## 3) ZrH<sub>2</sub> -- Coupled strain-displacement cluster expansion effective Hamiltonian prim

Continuous site DoF are specified with a "dofs" parameter that is equivalent to the global "dofs" parameter, but specified for each basis site.

Example: Displacement DoF, in the xy plane only

    "dofs" : {
      "disp" : {
        "axis_names" : ["dx", "dy"], 
        "basis" : [        // optional, default is Identity matrix (equivalent to standard basis)
          [1.0, 0.0, 0.0], // This is an example of a custom user basis that excludes displacements along z
          [0.0, 1.0, 0.0]
        ]
      }
    }


Allowed site DoF include:

- "disp": Displacement, with standard basis $[dx, dy, dz]$

Additionally, for this prim we will use the "selectivedynamics" species attribute to indicate that DFT calculations should fix Zr atoms at the position defined by the strain and displacement DoF, but allow H atoms to relax. Molecular occupants, and atomic occupants with user-specified properties, are specified using the `"species"` parameter. The JSON format for Molecule specifications is given [here](https://prisms-center.github.io/CASMcode_docs/pages/formats/casm/crystallography/BasicStructure.html).
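
As a quick sanity check on a `"species"` block, the sketch below uses `jq` to list the species whose "selectivedynamics" flags are all zero. It assumes that 0 means "held fixed" and 1 means "free to relax", matching the Zr/H description above. The fragment and variable names are illustrative.

```shell
# Illustrative "species" fragment with the same shape as the prim in this section
species_json='{
  "H":  { "properties": { "selectivedynamics": { "value": [1, 1, 1] } } },
  "Zr": { "properties": { "selectivedynamics": { "value": [0, 0, 0] } } }
}'

# keep only species whose three flags are all 0, then join their names
fixed_species=$(echo "$species_json" | jq -r '
  to_entries
  | map(select(.value.properties.selectivedynamics.value | all(. == 0)))
  | map(.key)
  | join(",")')
echo "fixed species: $fixed_species"
```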


In [None]:
ZrH2_GLstrain_disp_prim_str='{
  "basis" : [
    {
      "coordinate" : [ 0.0000000, 0.0000000, 0.0000000 ],
      "occupants" : [ "Zr" ],
      "dofs": {
        "disp": {}
      }
    },
    {
      "coordinate" : [ 0.2500000, 0.2500000, 0.2500000 ],
      "occupants" : [ "H" ]
    },
    {
      "coordinate" : [ 0.7500000, 0.7500000, 0.7500000 ],
      "occupants" : [ "H" ]
    }
  ],
  "species" : {
    "H": {
      "properties": {
        "selectivedynamics": {
          "value": [1, 1, 1]
        }
      }
    },
    "Zr": {
      "properties": {
        "selectivedynamics": {
          "value": [0, 0, 0]
        }
      }
    }
  },
  "dofs" : {
      "GLstrain" : {}
  },
  "coordinate_mode" : "Fractional",
  "description" : "Cubic ZrH_{2}",
  "lattice_vectors" : [
    [0.0      , 2.4106965, 2.4106965],
    [2.4106965, 0.0      , 2.4106965],
    [2.4106965, 2.4106965, 0.0      ]
  ],
  "title" : "ZrH2"
}'

ZrH2_GLstrain_disp_info_input_str="{ \"prim\": ${ZrH2_GLstrain_disp_prim_str}, \"properties\": []}"
ccasm info -m PrimInfo -i "$ZrH2_GLstrain_disp_info_input_str" > ZrH2_GLstrain_disp_info.json

# list names of output properties in ZrH2_GLstrain_disp_info.json
jq 'keys' ZrH2_GLstrain_disp_info.json

In [None]:
cd "$start" && rm -r "$start/init"