# Tutorial

### Getting started

Welcome to the *simcat* quickstart tutorial. This page walks you through using *simcat* to apply machine learning to genome-wide SNP data in order to infer a species network.

In [1]:
import simcat
import toytree

### What do you need?

To use *simcat*, you need 1) a **genome-wide SNP dataset**, such as one from a RAD method, and 2) an **inferred species tree** with Ne estimates on its branches and branch lengths in units of generations.
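
If your species tree is not yet scaled in generations, the branch lengths need to be converted first. The helpers below are a minimal sketch and are not part of *simcat* or *toytree*; the function names are hypothetical. They assume either a known generation time in years, or branch lengths in coalescent units for a diploid population, where one coalescent unit corresponds to 2Ne generations.

```python
def years_to_generations(years, generation_time):
    """Convert a branch length in years to generations.

    generation_time is the assumed number of years per generation.
    """
    if generation_time <= 0:
        raise ValueError("generation_time must be positive")
    return years / generation_time


def coalescent_units_to_generations(tau, ne, ploidy=2):
    """Convert a branch length in coalescent units to generations.

    For a diploid population one coalescent unit spans 2 * Ne generations,
    so the scaling factor is ploidy * ne.
    """
    if ne <= 0:
        raise ValueError("ne must be positive")
    return tau * ploidy * ne


# Example: a branch of 1 million years with a 2-year generation time,
# and a branch of 0.5 coalescent units with Ne = 100000.
branch_from_years = years_to_generations(1_000_000, 2.0)
branch_from_coal = coalescent_units_to_generations(0.5, 100_000)
```

The converted values can then be written back onto the tree with whatever tree-editing tool you prefer before the tree is passed to *simcat*.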

### The inference process

*simcat* uses supervised machine learning to infer the location of an admixture edge on a species tree. The first step is to generate a large database of simulated data that maps the signal of every possible introgressive edge back to the scenario that produced it.
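
The idea of learning from labeled simulations can be illustrated with a toy example. The sketch below is not *simcat*'s model or simulator: `simulate_signal` is a hypothetical stand-in that makes each scenario shift a different feature, and the classifier is a simple nearest-centroid rule chosen only because it fits in a few lines of standard-library Python.

```python
import random
from collections import defaultdict


def simulate_signal(scenario, rng, n_features=4, noise=0.1):
    """Hypothetical simulator: each scenario raises a different feature."""
    signal = [rng.gauss(0.0, noise) for _ in range(n_features)]
    signal[scenario] += 3.0
    return signal


def build_database(n_scenarios, n_reps, rng):
    """Return (features, label) rows, n_reps replicates per scenario."""
    return [
        (simulate_signal(s, rng, n_features=n_scenarios), s)
        for s in range(n_scenarios)
        for _ in range(n_reps)
    ]


def train_centroids(database):
    """Average the simulated signals of each scenario (the training step)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in database:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {lab: [v / counts[lab] for v in vec] for lab, vec in sums.items()}


def predict(centroids, features):
    """Assign an observation to the scenario with the nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], features))


rng = random.Random(123)
database = build_database(n_scenarios=4, n_reps=50, rng=rng)
centroids = train_centroids(database)

# An "observed" dataset generated under scenario 2 with an unknown label.
observed = simulate_signal(2, rng, n_features=4)
predicted = predict(centroids, observed)
```

The same pattern applies at a larger scale: the simulated database supplies labeled examples, a model is fit to them, and the fitted model is then applied to the empirical data.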

### Training simulation database

To generate a training database, read in the species tree with *toytree* and pass it, along with parameter boundaries, to `Database()`. Here is an example:

In [None]:
tree = toytree.tree('')

In [None]:
# `jobname` and `randseed` must be defined before running this cell.
db = simcat.Database(
    name=str(jobname),
    workdir=".",
    tree=tree,
    n_admix_edges=1,
    n_sampled_snps=20000,
    n_sampled_admix_prop=1,
    n_sampled_Ne=1,
    n_sampled_reps=1,
    Ne_min=100000,
    Ne_max=1000000,
    Ne_fixed=False,
    admix_prop_min=0.00,
    admix_prop_max=0.50,
    admix_edge_min=0.3,
    admix_edge_max=0.7,
    exclude_sisters=False,
    node_slider=True,
    max_rows_per_test=20,
    seed=randseed,
    force=False,
    quiet=False,
    random_sampling=True,
    nthreads=2)

In [None]:
db.run()
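
To make the role of the parameter boundaries more concrete, the sketch below draws Ne and admixture proportion values within the same bounds used in the `Database()` call above. It is an illustration only and does not reproduce *simcat*'s internal sampling: the function name `sample_parameters` is hypothetical, and uniform sampling is an assumption.

```python
import random


def sample_parameters(n_tests, ne_min, ne_max, prop_min, prop_max, seed):
    """Draw one (Ne, admix_prop) pair per simulated test.

    Values are sampled uniformly within the given bounds (an assumption),
    and the seed makes the draws reproducible.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_tests):
        rows.append({
            "Ne": rng.uniform(ne_min, ne_max),
            "admix_prop": rng.uniform(prop_min, prop_max),
        })
    return rows


# Bounds taken from the Database() call above; the seed value is arbitrary.
rows = sample_parameters(5, 100000, 1000000, 0.00, 0.50, seed=42)
for row in rows:
    print(round(row["Ne"]), round(row["admix_prop"], 3))
```

Reusing the same seed reproduces the same parameter draws, which is why fixing `seed` is useful when a database needs to be regenerated.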