# Data structures


In [1]:
import numpy as np
import xftsim as xft
from xftsim import index, struct

xft.config.print_durations_threshold=10. ## reduce verbosity
np.random.seed(123) ## set random seed for reproducibility

Here we introduce `HaplotypeArray` and `PhenotypeArray` objects, the two primary data objects `xftsim` operates on. These objects are indexed as follows:

| Object | Row Index | Column Index |
| --- | --- | --- |
|`struct.HaplotypeArray`|`index.SampleIndex`|`index.HaploidVariantIndex`|
|`struct.PhenotypeArray`|`index.SampleIndex`|`index.ComponentIndex`|


:::{warning}

This material will be hard to follow if you haven't read [the tutorial on indexing](./indexing.ipynb)!

:::

Counterintuitively, neither `HaplotypeArray` nor `PhenotypeArray` is an actual class. Rather, both construct instances of `xarray.DataArray` with an extended API available through the `xft` accessor. If you're confused, don't worry; we'll go through all of this step by step.

:::{tip}

The [xarray documentation](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html) can be very helpful if you haven't used `xarray` before!

:::
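The constructor-plus-accessor pattern described above can be sketched with plain `xarray`. This is only an illustration of the general mechanism, not `xftsim`'s own code: the accessor name `demo`, the class `DemoAccessor`, the function `make_demo_array`, and the dimension names are all hypothetical, and whether `xftsim` registers its `xft` accessor in exactly this way is an assumption.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor: everything defined here becomes available as `arr.demo`
@xr.register_dataarray_accessor("demo")
class DemoAccessor:
    def __init__(self, data_array):
        # xarray passes in the DataArray the accessor is attached to
        self._obj = data_array

    @property
    def n_samples(self):
        # Number of rows along the (hypothetical) "sample" dimension
        return self._obj.sizes["sample"]


def make_demo_array(n, k):
    """Hypothetical constructor: returns a plain DataArray, not a subclass."""
    return xr.DataArray(np.zeros((n, k), dtype=np.int8),
                        dims=("sample", "variant"))


arr = make_demo_array(4, 6)
print(type(arr).__name__)   # the object is an ordinary xarray DataArray
print(arr.demo.n_samples)   # extra API reached through the accessor
```

The point of this design is that the result stays an ordinary `DataArray`, so all standard `xarray` operations still apply, while domain-specific functionality lives under a single accessor namespace.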

First, we'll run a small simulation (the details of which aren't important for now) so that we have some arrays to work with:


In [None]:
## founder haplotypes: 800 individuals, 100 variants
founder_haplotypes = xft.founders.founder_haplotypes_uniform_AFs(n=800, m=100)
## two phenotypes, each with heritability 0.5
architecture = xft.arch.GCTA_Architecture(h2=[.5,.5], phenotype_name=['height', 'BMD'], haplotypes=founder_haplotypes)
recombination_map = xft.reproduce.RecombinationMap.constant_map_from_haplotypes(founder_haplotypes, p=.1)
## assortative mating on both phenotypes, two offspring per pair
mating_regime = xft.mate.LinearAssortativeMatingRegime(r=.5, offspring_per_pair=2,
                                                       component_index = xft.index.ComponentIndex.from_product(['height', 'BMD'], ['phenotype']))
sim = xft.sim.Simulation(founder_haplotypes=founder_haplotypes,
                         mating_regime=mating_regime,
                         recombination_map=recombination_map,
                         architecture=architecture)
sim.run(2) ## simulate two generations

## the arrays we'll examine below
haplo = sim.haplotypes
pheno = sim.phenotypes

## Haplotype arrays

For $n$ individuals and $m$ diploid loci, a haplotype array is an $n\times 2m$ `xr.DataArray` of 8-bit integers:

In [None]:
haplo
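To make the $n\times 2m$ shape concrete, here is a minimal NumPy-only sketch that does not use `xftsim` at all. It assumes biallelic variants coded as 0/1, and the toy sizes and the name `toy` are purely illustrative. It makes no claim about how `xftsim` orders the two haploid copies along the column axis.

```python
import numpy as np

n, m = 4, 3  # toy sizes: 4 individuals, 3 diploid loci
rng = np.random.default_rng(0)

# One row per individual, two haploid copies per locus -> 2 * m columns.
# Alleles are assumed to be coded 0/1 and stored as 8-bit integers.
toy = rng.integers(0, 2, size=(n, 2 * m), dtype=np.int8)

print(toy.shape)   # (4, 6)
print(toy.dtype)   # int8
print(toy.nbytes)  # 24: one byte per stored allele
```

Storing alleles as 8-bit integers keeps memory use at one byte per haploid allele, which matters once $n$ and $m$ grow large.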