# Defining Defects 

A persistent challenge in organizing computational defect data is the ambiguity in how a defect simulation is defined.
The standard practice is to place the defect in a larger simulation cell so that it is isolated, and then to use charge-state corrections to approximate the properties of the defect in the dilute limit.
This means that the same defect can be simulated with different simulation cells.
To build a computational defects database, you need some way to check whether different simulations represent the same defect.

A core concept in this package is that a defect is defined independently of the simulation cell.
All of the information about which defect we are simulating is captured by the `Defect` object.
A point defect is defined by the Wigner-Seitz cell representation of the bulk material, stored as the `structure` attribute,
and a `site` attribute that indicates where the defect is in the unit cell.
The different kinds of point defects all subclass `Defect`, which gives easy access to functions such as generating a cubic simulation supercell.
As long as the database or algorithm keeps track of this `Defect` object, simple structure matching is enough to determine whether two simulations represent the same defect.
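
To make the idea of a cell-independent definition concrete, here is a minimal sketch in plain Python. It is not how the package implements anything: `ToyDefect` and `to_primitive` are hypothetical names introduced only for illustration. The sketch assumes diagonal supercells and ignores symmetry-equivalent sites, which is the part that real structure matching would take care of.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDefect:
    """Hypothetical, simplified stand-in for a cell-independent defect record."""

    species: str        # species sitting on the defect site
    frac_coords: tuple  # fractional coordinates in the primitive cell


def to_primitive(frac_in_supercell, scaling, ndigits=6):
    """Fold supercell fractional coordinates back into the primitive cell.

    Assumes a diagonal supercell, i.e. `scaling` holds the integer
    multiples (n_a, n_b, n_c) of the primitive lattice vectors.
    """
    folded = []
    for f, n in zip(frac_in_supercell, scaling):
        x = round((f * n) % 1.0, ndigits)
        folded.append(x % 1.0)  # maps a rounded 1.0 back to 0.0
    return tuple(folded)


# The same substitution, read from two different simulation cells.
from_2x2x2 = ToyDefect("O", to_primitive((2 / 3, 1 / 3, 0.5), (2, 2, 2)))
from_3x3x3 = ToyDefect("O", to_primitive((1 / 9, 2 / 9, 1 / 3), (3, 3, 3)))

print(from_2x2x2 == from_3x3x3)
```

Both records fold back to the same primitive-cell site, so they compare equal even though the supercells differ. Storing the primitive-cell description rather than the supercell is what makes this comparison possible.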


### We can look at a substitutional defect to see how this works.

In [None]:
from pathlib import Path
TEST_FILES = Path("../tests/test_files")

In [None]:
from pymatgen.analysis.defects.core import Substitution
from pymatgen.core import Structure, PeriodicSite, Species

bulk = Structure.from_file(TEST_FILES / "GaN.vasp")
bulk.lattice == bulk.get_primitive_structure().lattice # check that you have the primitive structure

True

In [None]:
ga_site = bulk.sites[0]
# place an O atom at the fractional coordinates of the first Ga site
o_site = PeriodicSite(Species("O"), ga_site.frac_coords, bulk.lattice)

In [None]:
Substitution(structure=bulk, site=o_site)

O subsitituted on the Ga site at at site #0