# Integrating multi-omics data with `momics`

`momics` is designed to manage multi-omics data: it lets you store data from a variety of sources in a single repository and retrieve it later. Here, we will see how to create, populate, and query a local repository using `momics`.

## Creating a local `momics` repository

To create a local `momics` repository, use the `Momics` constructor in Python.

In [None]:
from momics.momics import Momics

## Creating repository
Momics("yeast_CNN_data.momics").remove()  # Purging existing files
mom = Momics("yeast_CNN_data.momics")  # Creating a brand new repository

## Adding data to the repository

The first step in populating a repository is to register chromosomes. To do this, pass the `ingest_chroms` method a dictionary with the chromosome name as the key and the chromosome length as the value. This dictionary can be written by hand or derived from a local `fasta` file, as done here.

In [None]:
## We will get chromosome sizes from a local fasta file.
from pyfaidx import Fasta

f = Fasta("/home/jaseriza/repos/momics/data/S288c.fa")
chrom_lengths = {chrom: len(f[chrom]) for chrom in f.keys()}

mom.ingest_chroms(chrom_lengths, genome_version="S288c")
mom.chroms()
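
The chromosome dictionary can also be built without loading the sequences, for instance from a FASTA index (`.fai`) file, whose first two tab-separated columns are the sequence name and its length. The sketch below is only an illustration: `read_fai_lengths` is a hypothetical helper that is not part of `momics`, and the index content is a small inline example rather than a real file.

```python
import io


def read_fai_lengths(handle):
    """Return {sequence name: length} from an open FASTA index (.fai) file."""
    lengths = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Columns are: name, length, offset, linebases, linewidth
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


# Inline example standing in for a real `.fai` file (hypothetical content)
example_fai = io.StringIO("I\t230218\t3\t60\t61\nII\t813184\t234061\t60\t61\n")
chrom_lengths_from_fai = read_fai_lengths(example_fai)
```

The resulting dictionary has the same `{name: length}` shape as the one passed to `ingest_chroms`.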

Once the chromosomes are registered, you can ingest data, such as the genome sequence or genomic features, into the repository.

In [None]:
## Ingesting genome reference sequence
mom.ingest_sequence("/home/jaseriza/repos/momics/data/S288c.fa")
mom.seq()

Coverage tracks in `bigwig` format can also be ingested into the local repository. As with chromosomes, a dictionary is passed to the ingestion method, here `ingest_tracks`; each entry maps a track ID to the path of its file (`<ID>:<path>`).

In [None]:
## Ingesting genome-wide tracks
mom.ingest_tracks(
    {
        "atac": "/home/jaseriza/repos/momics/data/S288c_atac.bw",
        # "rna": "/home/jaseriza/repos/momics/data/S288c_rna.bw",
        "scc1": "/home/jaseriza/repos/momics/data/S288c_scc1.bw",
        "mnase": "/home/jaseriza/repos/momics/data/S288c_mnase.bw",
    }
)
mom.tracks()
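
Because `ingest_tracks` takes a plain `<ID>:<path>` dictionary, it can be convenient to check that every path exists before ingesting. The following is a minimal sketch using only the standard library; `missing_track_files` is a hypothetical helper, not a `momics` function, and the file names are placeholders created in a temporary directory.

```python
import tempfile
from pathlib import Path


def missing_track_files(tracks):
    """Return the sorted track IDs whose path is not an existing file."""
    return sorted(label for label, path in tracks.items() if not Path(path).is_file())


# Toy example: one placeholder file that exists, one that does not
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.bw"
    present.touch()
    toy_tracks = {
        "present": str(present),
        "absent": str(Path(tmp) / "absent.bw"),
    }
    missing = missing_track_files(toy_tracks)
```

An empty result means every track file was found.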

## Querying data from the repository

Now that we have added data to the repository, we can query specific genomic ranges using `MomicsQuery` objects. 

In [None]:
## We define non-overlapping windows of 1kb over the entire S288c genome
windows = mom.bins(1000, cut_last_bin_out=True)
windows
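
To make the idea of non-overlapping windows concrete, here is a pure-Python sketch of genome tiling. It is not the `momics` implementation: `make_bins` is a hypothetical function, coordinates are assumed to be 0-based and half-open, and `cut_last_bin_out` is assumed to drop a trailing window shorter than the requested width.

```python
def make_bins(chrom_lengths, width, cut_last_bin_out=False):
    """Tile each chromosome with non-overlapping (chrom, start, end) windows."""
    bins = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, width):
            end = min(start + width, length)
            # Optionally skip the trailing window when it is shorter than `width`
            if cut_last_bin_out and end - start < width:
                continue
            bins.append((chrom, start, end))
    return bins


toy_lengths = {"I": 2500, "II": 1000}  # toy values, not real chromosome sizes
toy_bins = make_bins(toy_lengths, 1000, cut_last_bin_out=True)
```

With these toy lengths, the 500 bp remainder at the end of chromosome `I` is left out, so every window has exactly the requested width.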

In [None]:
## Next, we build a query object to query specific tracks from the momics object
from momics.query import MomicsQuery

q = MomicsQuery(mom, windows)
q.query_tracks(tracks=["atac", "scc1"], silent=False)
"ATAC coverage over the first range queried: " + str(q.coverage["atac"]["I:0-1000"][0:5]) + "..."

In [None]:
## We can also query sequences over the windows
q.query_sequence(silent=False)
"Genome sequence over the first range queried: " + str(q.seq["nucleotide"]["I:0-1000"][0:10]) + "..."

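The query results above are indexed by range strings such as `"I:0-1000"`. When these keys need to be turned back into coordinates, a small parser is enough. This sketch assumes the `<chrom>:<start>-<end>` layout seen in the keys used above; `parse_range_key` is a hypothetical helper, not part of `momics`.

```python
def parse_range_key(key):
    """Split a '<chrom>:<start>-<end>' string into (chrom, start, end)."""
    # Split on the last ':' so chromosome names containing ':' still work
    chrom, _, span = key.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


first_range = parse_range_key("I:0-1000")
```
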
## Extracting data from the repository

Data stored in a `momics` repository can also be extracted and saved to a local file. Here, we retrieve the `atac` track and write it back out as a `bigwig` file.

In [None]:
atac = mom.tracks(label="atac")
atac

In [None]:
from momics import utils as mutils

path = mutils.dict_to_bigwig(atac, "extracted_atac_track.bw")
"File saved to: " + path.name

## Deleting a repository 

To delete a repository, call the `remove()` method on the repository object. This deletes the repository and all its contents. Now that this notebook is complete, we can delete the repository by uncommenting the call below :)

In [None]:
# mom.remove()