# Integrating multi-omics data with `momics`

`momics` excels at managing multi-omics data. It lets you store data from a variety of sources (e.g. genome sequences and coverage tracks) in a single repository and retrieve it later. Here, we will see how to create, manage and query local repositories using `momics`.

## Creating a local `momics` repository

To create a local `momics` repository, use the `Momics` constructor in Python.

```python
from momics.momics import Momics

## Creating repository
Momics("yeast_CNN_data.momics").remove()  # Purging existing files
mom = Momics("yeast_CNN_data.momics")  # Creating a brand new repository
```

```
momics :: INFO :: 2025-03-26 17:14:44,504 :: No cloud config found for momics.Consider populating `~/.momics.ini` file with configuration settings for cloud access.
momics :: INFO :: 2025-03-26 17:14:44,535 :: Purged yeast_CNN_data.momics
momics :: INFO :: 2025-03-26 17:14:44,540 :: Created yeast_CNN_data.momics
```


## Adding data to the repository

The first step in populating a repository is to register its chromosomes. To do so, pass a dictionary mapping each chromosome name to its length (in bp) to the `ingest_chroms` method. This dictionary can be written by hand or derived from a local FASTA file, as done here.

```python
## We will get chromosome sizes from a local fasta file.
from pyfaidx import Fasta

f = Fasta("/home/jaseriza/repos/momics/data/S288c.fa")
chrom_lengths = {chrom: len(seq) for chrom, seq in zip(f.keys(), f.values())}

mom.ingest_chroms(chrom_lengths, genome_version="S288c")
mom.chroms()
```

|   | chrom_index | chrom | length |
|---|---|---|---|
| 0 | 0 | I | 230218 |
| 1 | 1 | II | 813184 |
| 2 | 2 | III | 316620 |
| 3 | 3 | IV | 1531933 |
| 4 | 4 | V | 576874 |
| 5 | 5 | VI | 270161 |
| 6 | 6 | VII | 1090940 |
| 7 | 7 | VIII | 562643 |
| 8 | 8 | IX | 439888 |
| 9 | 9 | X | 745751 |
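If `pyfaidx` is not available, the same `{chrom: length}` dictionary can be computed with the standard library alone. The sketch below is a minimal FASTA parser of our own (`fasta_chrom_lengths` and `fasta_file_chrom_lengths` are hypothetical helpers, not part of `momics`); it assumes the record name is the first whitespace-delimited word of each `>` header and that sequence lines contain no internal spaces.

```python
from typing import Dict, Iterable, Optional


def fasta_chrom_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record name: sequence length} for FASTA-formatted lines.

    Assumption: the record name is the first whitespace-delimited word
    after '>' (e.g. '>I some description' -> 'I').
    """
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]  # new record starts here
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # add the bases on this sequence line
    return lengths


def fasta_file_chrom_lengths(path: str) -> Dict[str, int]:
    """Convenience wrapper that reads a FASTA file from disk."""
    with open(path) as handle:
        return fasta_chrom_lengths(handle)


# Small in-memory example: record 'I' spans two sequence lines.
demo = [">I some description\n", "ACGT\n", "AC\n", ">II\n", "GGGG\n"]
print(fasta_chrom_lengths(demo))  # {'I': 6, 'II': 4}
```

The resulting dictionary could then be passed to `mom.ingest_chroms(..., genome_version="S288c")` in place of `chrom_lengths` above. For large genomes, an indexed reader such as `pyfaidx` avoids rescanning the file on every run.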


Once the chromosomes are registered, you can ingest data, e.g. the genome sequence or genomic features, into the repository.

```python
## Ingesting genome reference sequence
mom.ingest_sequence("/home/jaseriza/repos/momics/data/S288c.fa")
mom.seq()
```

```
momics :: INFO :: 2025-03-26 17:14:47,879 :: Genome sequence ingested in 3.1272s.
```

|   | chrom_index | chrom | length | seq |
|---|---|---|---|---|
| 0 | 0 | I | 230218 | CCACACCACA...TGTGTGTGGG |
| 1 | 1 | II | 813184 | AAATAGCCCT...GTGGGTGTGT |
| 2 | 2 | III | 316620 | CCCACACACC...GGTGTGTGTG |
| 3 | 3 | IV | 1531933 | ACACCACACC...TAGCTTTTGG |
| 4 | 4 | V | 576874 | CGTCTCCTCC...TTTTTTTTTT |
| 5 | 5 | VI | 270161 | GATCTCGCAA...TGGTGTGTGG |
| 6 | 6 | VII | 1090940 | CCACACCCAC...TTTTTTTTTT |
| 7 | 7 | VIII | 562643 | CCCACACACA...GTGTGTGTGG |
| 8 | 8 | IX | 439888 | CACACACACC...GTGTGTGTGT |
| 9 | 9 | X | 745751 | CCCACACACA...GTGTGGGTGT |


Coverage tracks in bigWig format can also be ingested into the local repository. Again, a dictionary is passed, this time to the `ingest_tracks` method, mapping each track label (`<ID>`) to the path of its bigWig file (`<path>`).

```python
## Ingesting genome-wide tracks
mom.ingest_tracks(
    {
        "atac": "/home/jaseriza/repos/momics/data/S288c_atac.bw",
        # "rna": "/home/jaseriza/repos/momics/data/S288c_rna.bw",
        "scc1": "/home/jaseriza/repos/momics/data/S288c_scc1.bw",
        "mnase": "/home/jaseriza/repos/momics/data/S288c_mnase.bw",
    }
)
mom.tracks()
```

```
momics :: INFO :: 2025-03-26 17:14:49,605 :: 3 tracks ingested in 1.5888s.
```

|   | idx | label | path |
|---|---|---|---|
| 0 | 0 | atac | /home/jaseriza/repos/momics/data/S288c_atac.bw |
| 1 | 1 | scc1 | /home/jaseriza/repos/momics/data/S288c_scc1.bw |
| 2 | 2 | mnase | /home/jaseriza/repos/momics/data/S288c_mnase.bw |
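With many tracks, writing the `<ID>:<path>` dictionary by hand gets tedious. Assuming your bigWig files follow the `<prefix>_<label>.bw` naming pattern used in this example (an assumption about your files, not a `momics` requirement), the dictionary can be derived from the file names. `tracks_from_paths` is a hypothetical helper; it only builds the dictionary that you would then pass to `mom.ingest_tracks`.

```python
from pathlib import Path
from typing import Dict, Iterable


def tracks_from_paths(paths: Iterable[str], prefix: str) -> Dict[str, str]:
    """Map track labels to bigWig paths, e.g. 'S288c_atac.bw' -> 'atac'.

    Hypothetical helper: assumes files are named '<prefix>_<label>.bw'
    (or '.bigwig', any case). Files not matching the pattern are ignored.
    """
    tracks: Dict[str, str] = {}
    for p in map(Path, paths):
        if p.suffix.lower() not in {".bw", ".bigwig"}:
            continue  # not a bigWig file
        stem = p.stem  # file name without its extension
        if not stem.startswith(prefix + "_"):
            continue  # does not follow the '<prefix>_<label>' pattern
        label = stem[len(prefix) + 1:]
        if label in tracks:
            raise ValueError(f"Duplicate track label: {label!r}")
        tracks[label] = str(p)
    return tracks


files = ["data/S288c_atac.bw", "data/S288c_scc1.bw", "data/S288c.fa"]
print(tracks_from_paths(files, prefix="S288c"))
```

In practice, the list of files could come from something like `Path("data").glob("*.bw")`. Raising on duplicate labels is a deliberate choice: silently overwriting one track with another would be easy to miss.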


## Querying data from the repository

Now that we have added data to the repository, we can query specific genomic ranges using `MomicsQuery` objects. 

```python
## We define non-overlapping windows of 1kb over the entire S288c genome
windows = mom.bins(1000, cut_last_bin_out=True)
windows
```

|   | Chromosome | Start | End |
|---|---|---|---|
| 0 | I | 0 | 1000 |
| 1 | I | 1000 | 2000 |
| 2 | I | 2000 | 3000 |
| 3 | I | 3000 | 4000 |
| 4 | I | 4000 | 5000 |
| ... | ... | ... | ... |
| 12143 | XVI | 943000 | 944000 |
| 12144 | XVI | 944000 | 945000 |
| 12145 | XVI | 945000 | 946000 |
| 12146 | XVI | 946000 | 947000 |
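Conceptually, `mom.bins` tiles each chromosome with fixed-width, non-overlapping windows, as the comment in the cell above says. The standard-library sketch below illustrates that idea starting from a `{chrom: length}` dictionary. `tile_genome` is a hypothetical helper, and its `drop_partial` flag is our own stand-in: we have not verified that it reproduces `cut_last_bin_out` exactly, so rely on `mom.bins` itself in practice.

```python
from typing import Dict, Iterator, Tuple


def tile_genome(
    chrom_lengths: Dict[str, int], width: int, drop_partial: bool = True
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) windows of `width` bp, 0-based half-open.

    If drop_partial is True, a trailing window shorter than `width` is
    skipped; otherwise it is kept and clipped to the chromosome end.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, width):
            end = start + width
            if end > length:
                if drop_partial:
                    break  # skip the incomplete last window
                end = length  # clip the last window to the chromosome end
            yield chrom, start, end


print(list(tile_genome({"I": 2500, "II": 1000}, 1000)))
# [('I', 0, 1000), ('I', 1000, 2000), ('II', 0, 1000)]
```

Dropping the incomplete last window keeps every window the same size, which is convenient when windows are stacked into fixed-shape arrays (e.g. model inputs); keeping and clipping it instead ensures the whole chromosome is covered.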


```python
## Next, we build a query object to query specific tracks from the momics object
from momics.query import MomicsQuery

q = MomicsQuery(mom, windows)
q.query_tracks(tracks=["atac", "scc1"], silent=False)
"ATAC coverage over the first range queried: " + str(q.coverage["atac"]["I:0-1000"][0:5]) + "..."
```
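Query results are looked up by region strings such as `"I:0-1000"`, as in the `q.coverage["atac"]["I:0-1000"]` lookup above. Assuming keys consistently follow the `chrom:start-end` pattern, the small helpers below (hypothetical, not part of `momics`) build and parse such keys, and find the key of the 1 kb window containing a given position, assuming windows start at 0 and are aligned as in `windows` above.

```python
import re
from typing import Tuple

# Assumed key format: "<chrom>:<start>-<end>", e.g. "I:0-1000".
_REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_key(chrom: str, start: int, end: int) -> str:
    """Format a region as a 'chrom:start-end' string."""
    return f"{chrom}:{start}-{end}"


def parse_region_key(key: str) -> Tuple[str, int, int]:
    """Split a 'chrom:start-end' string back into (chrom, start, end)."""
    match = _REGION_RE.match(key)
    if match is None:
        raise ValueError(f"Not a chrom:start-end region: {key!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def window_key(chrom: str, pos: int, width: int) -> str:
    """Key of the width-aligned window containing 0-based position `pos`."""
    start = (pos // width) * width
    return region_key(chrom, start, start + width)


print(parse_region_key("I:0-1000"))  # ('I', 0, 1000)
print(window_key("I", 1500, 1000))   # I:1000-2000
```

Note that `window_key` can produce a key for a window that was never created (e.g. an incomplete window at a chromosome end that was dropped), so check membership before indexing the query results.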

```python
## We can also query sequences over the windows
q.query_sequence(silent=False)
"Genome sequence over the first range queried: " + str(q.seq["nucleotide"]["I:0-1000"][0:10]) + "..."
```

```
momics :: INFO :: 2025-03-26 16:50:29,200 :: Query completed in 0.4867s.
```

```
'Genome sequence over the first range queried: CCACACCACA...'
```
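The repository name (`yeast_CNN_data`) suggests these sequences are meant as input to a convolutional network, for which a typical next step is one-hot encoding. The sketch below is a plain-Python illustration under our own conventions (channel order `A, C, G, T`, case-insensitive, all-zero rows for `N` or any other character); `one_hot` is not a `momics` function, and in practice you would likely vectorise this with NumPy.

```python
from typing import List

# Assumed channel order; use whatever order your model expects.
ALPHABET = "ACGT"
_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA string into a len(seq) x 4 list of lists.

    Lower-case (soft-masked) bases are treated like upper-case ones.
    Bases outside A/C/G/T (e.g. 'N') become all-zero rows.
    """
    encoded = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        idx = _INDEX.get(base)
        if idx is not None:
            row[idx] = 1  # set the channel for this base
        encoded.append(row)
    return encoded


print(one_hot("ACgN"))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

Assuming `q.seq["nucleotide"]["I:0-1000"]` is a plain string, as the output above suggests, `one_hot(q.seq["nucleotide"]["I:0-1000"])` would give a 1000 x 4 matrix for the first window.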

## Extracting data from the repository

Data stored in a `momics` repository can also be extracted and saved to a local file. For instance, `mom.tracks(label=...)` returns the full coverage of a track as a dictionary of per-chromosome arrays.

```python
atac = mom.tracks(label="atac")
atac
```

```
{'I': array([1.91385 , 2.73407 , 2.73407 , ..., 0.820222, 0.820222, 0.273407],
       dtype=float32),
 'II': array([1.09363, 1.09363, 1.09363, ...,     nan,     nan,     nan],
       dtype=float32),
 'III': array([nan, nan, nan, ..., nan, nan, nan], dtype=float32),
 'IV': array([     nan, 0.820222, 1.36704 , ...,      nan,      nan,      nan],
       dtype=float32),
 'V': array([nan, nan, nan, ..., nan, nan, nan], dtype=float32),
 'VI': array([     nan,      nan,      nan, ..., 2.46067 , 2.46067 , 0.546815],
       dtype=float32),
 'VII': array([0.820222, 1.36704 , 1.36704 , ...,      nan,      nan,      nan],
       dtype=float32),
 'VIII': array([nan, nan, nan, ..., nan, nan, nan], dtype=float32),
 'IX': array([3.5543 , 4.37452, 5.74156, ...,     nan,     nan,     nan],
       dtype=float32),
 'X': array([     nan,      nan,      nan, ..., 3.5543  , 2.46067 , 0.546815],
       dtype=float32),
 'XI': array([2.46067 , 2.46067 , 2.73407 , ..., 0.273407, 0.273407, 0.273407],
       dtype
```
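The extracted track maps each chromosome to a `float32` array (apparently one value per base) in which uncovered positions are `NaN`. To summarise it at the same 1 kb resolution as the query windows, NaN values have to be skipped, otherwise a single gap would turn a whole bin into NaN. The standard-library sketch below (`binned_nanmean` is a hypothetical helper) shows that logic on any sliceable sequence of floats and returns NaN for bins with no finite values; with NumPy, `np.nanmean` on a reshaped array achieves the same for full bins.

```python
import math
from typing import List, Sequence


def binned_nanmean(values: Sequence[float], width: int) -> List[float]:
    """Mean of each consecutive `width`-long bin, ignoring NaN values.

    The last bin may be shorter than `width`. Bins containing no
    finite values yield NaN.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    means = []
    for start in range(0, len(values), width):
        finite = [v for v in values[start:start + width] if not math.isnan(v)]
        means.append(sum(finite) / len(finite) if finite else math.nan)
    return means


nan = float("nan")
print(binned_nanmean([1.0, nan, 3.0, nan, nan, 4.0], 3))  # [2.0, 4.0]
```

Applied to the extracted track, `{chrom: binned_nanmean(arr, 1000) for chrom, arr in atac.items()}` would give one mean per 1 kb bin per chromosome.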

```python
from momics import utils as mutils

path = mutils.dict_to_bigwig(atac, "extracted_atac_track.bw")
"File saved to: " + path.name
```

```
'File saved to: extracted_atac_track.bw'
```

## Deleting a repository

To delete a repository, call the `remove()` method on the repository object. This deletes the repository and all its contents. Now that this notebook is complete, we can delete the repository :) (the call is commented out below; uncomment it to run it).

```python
# mom.remove()
```