# Virtual environment

```conda activate higlass-server``` 

# mcool files

**example:**
```python manage.py ingest_tileset --filename data/AMG_HS_heart_NHCFV_HiC.mcool --coordSystem hg38 --filetype cooler --datatype matrix --uid AMG_HS_heart_NHCFV_HiC --project-name NHCFV```

**note:** the uid must not contain a `.`.
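One way to catch a bad uid before calling `ingest_tileset` is a quick check in Python. This is a minimal sketch: `check_uid` is a hypothetical helper, and it enforces only the no-`.` rule stated above.

```python
def check_uid(uid: str) -> str:
    """Hypothetical helper: reject uids that contain a '.' (see the note above)."""
    if "." in uid:
        raise ValueError(f"uid must not contain '.': {uid!r}")
    return uid


# The uid from the mcool example passes unchanged
print(check_uid("AMG_HS_heart_NHCFV_HiC"))
```

A common slip is to reuse the file name as the uid; `check_uid("AMG_HS_heart_NHCFV_HiC.mcool")` raises `ValueError`.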

# beddb files

### Convert a BED file to beddb (HiGlass format)
**example:**<br>
In the `tmp` folder:<br>
```clodius aggregate bedfile --chromsizes-filename hg38.chrom.sizes AMG_HS_HEART_NHCFV_SNP.bed```<br>
Rename the output to follow our naming convention, then copy it to the data folder:<br>
```mv AMG_HS_HEART_NHCFV_SNP.bed.beddb AMG_HS_HEART_NHCFV_HIC_SNP.beddb```<br>
```cp AMG_HS_HEART_NHCFV_HIC_SNP.beddb ../data```<br>
Then, from the `higlass-server` folder:<br>
```python manage.py ingest_tileset --filename data/AMG_HS_HEART_NHCFV_HIC_SNP.beddb --coordSystem hg38 --filetype beddb --datatype bedlike --uid AMG_HS_HEART_NHCFV_HIC_SNP --project-name NHCFV```<br>
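The rename-and-copy steps can also be scripted. The sketch below assumes, as the `mv` command above suggests, that `clodius aggregate bedfile` writes its output next to the input as `<input>.beddb`; `stage_beddb` is a hypothetical helper, not part of clodius or higlass-server.

```python
import os
import shutil


def stage_beddb(bed_path: str, uid: str, data_dir: str) -> str:
    """Hypothetical helper mirroring the mv/cp steps above.

    Assumes clodius wrote its output as `<bed_path>.beddb`. The file is
    renamed to `<uid>.beddb` in the same folder, then copied into data_dir.
    Returns the path of the copy, ready for ingest_tileset.
    """
    clodius_output = bed_path + ".beddb"
    renamed = os.path.join(os.path.dirname(bed_path), uid + ".beddb")
    shutil.move(clodius_output, renamed)   # the `mv` step
    return shutil.copy(renamed, data_dir)  # the `cp` step; returns the new path


# Usage from the tmp folder (not executed here):
# stage_beddb("AMG_HS_HEART_NHCFV_SNP.bed", "AMG_HS_HEART_NHCFV_HIC_SNP", "../data")
```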


**another example for 1d-arcs:**
```python manage.py ingest_tileset --filename data/AMG_HS_heart_NHCFV_HiC_loop.beddb --coordSystem hg38 --filetype beddb --datatype bedlike --uid AMG_HS_heart_NHCFV_HiC_loop --project-name NHCFV```
<br>
**note:** this command can also be used to ingest arc files. Afterwards, the datatype of the arc track needs to be changed to "1d-arcs", because at the time of writing (5.5.2020) HiGlass's `ingest_tileset` command does not accept 1d-arcs as a datatype.

# bigwig files

**AMG data example:**
```python manage.py ingest_tileset --filename data/AMG_HS_heart_NHCFV_ATAC_pval.bigwig --coordSystem hg38 --filetype bigwig --datatype vector --uid AMG_HS_heart_NHCFV_ATAC_pval --project-name NHCFV```<br>


Metadata:<br>
https://www.encodeproject.org/experiments/ENCFF944YFS/ H3K4me1 <br>
https://www.encodeproject.org/experiments/ENCFF326LWU/ H3K27ac <br>

**ENCODE data example:**
```python manage.py ingest_tileset --filename data/ENCODE_HS_heart_PC_H3K4me1_pval.bigwig --coordSystem hg38 --filetype bigwig --datatype vector --uid ENCODE_HS_heart_PC_H3K4me1_pval --project-name AMG_internalized_ENCODE```<br>

```python manage.py ingest_tileset --filename data/ENCODE_HS_heart_PC_H3K4me3_pval.bigwig --coordSystem hg38 --filetype bigwig --datatype vector --uid ENCODE_HS_heart_PC_H3K4me3_pval --project-name AMG_internalized_ENCODE```<br>

# Multivec

```clodius convert bedfile-to-multivec E095_25_imputed12marks_hg38lift_dense.bed.gz --assembly hg38 --starting-resolution 200 --row-infos-filename row_infos.txt --num-rows 25 --format states```
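`--num-rows` should agree with the number of row labels supplied through `--row-infos-filename`. A small pre-flight check could look like the following. It assumes that `row_infos.txt` holds one label per line; that layout is an assumption, not something stated above, and `check_row_infos` is a hypothetical helper.

```python
def check_row_infos(path: str, num_rows: int) -> int:
    """Hypothetical pre-flight check: count non-empty lines in the row-infos file."""
    with open(path) as fh:
        n = sum(1 for line in fh if line.strip())
    if n != num_rows:
        raise ValueError(f"{path} has {n} rows, but --num-rows is {num_rows}")
    return n


# Usage, matching the command above (not executed here):
# check_row_infos("row_infos.txt", 25)
```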

# Automate the ingest process to save time
### This needs to be run from the data folder (i.e. `higlass-server/data`)

```python
import os
import subprocess

SPECIES_ALLOWED = ["HS"]


def get_uid(source, species, tissue, cell_type, expt, *add_on):
    """Build an uppercase uid: SOURCE_SPECIES_TISSUE_CELLTYPE_EXPT[_ADDON...]."""
    if species.upper() not in SPECIES_ALLOWED:
        raise ValueError(f"Not a recognized species: {species}")
    parts = [source, species, tissue, cell_type, expt, *add_on]
    return "_".join(part.upper() for part in parts)


def ingest(project_name, file_type, source, species, tissue, cell_type, expt, *add_on):
    # Map each filetype to its HiGlass datatype and file extension
    ft2dt = {"bigwig": "vector", "beddb": "bedlike", "cooler": "matrix"}
    ft2ext = {"bigwig": "bigwig", "beddb": "beddb", "cooler": "mcool"}

    # Get the uid and the matching file name
    uid = get_uid(source, species, tissue, cell_type, expt, *add_on)
    print(uid)
    file_name = uid + "." + ft2ext[file_type]

    # manage.py lives one level up, in higlass-server/
    if not os.path.isfile(os.path.join("..", "manage.py")):
        raise EnvironmentError("You should be running this script in the higlass-server/data folder")

    # Ingest; passing the arguments as a list avoids shell quoting issues
    process = subprocess.run(
        ["python", "../manage.py", "ingest_tileset",
         "--filename", file_name,
         "--coordSystem", "hg38",
         "--filetype", file_type,
         "--datatype", ft2dt[file_type],
         "--uid", uid,
         "--project-name", project_name],
        universal_newlines=True,
        stderr=subprocess.PIPE,
    )

    # Report failures, including a non-zero exit status with an empty stderr
    if process.returncode != 0:
        print(f"ingest_tileset exited with status {process.returncode}")
    if process.stderr:
        print(process.stderr)
```

```python
ingest("NHCFV", "beddb", "AMG", "HS", "heart", "NHCFV", "HiC", "SNP")
# prints: AMG_HS_HEART_NHCFV_HIC_SNP
```


## Random notes

```python manage.py ingest_tileset --filename data/ROADMAP_E095_HS_HEART_PC_CHROMHMM.beddb --coordSystem hg38 --filetype beddb --datatype bedlike --uid ROADMAP_E095_HS_HEART_PC_CHROMHMM --project-name ChromHMM```

```python manage.py ingest_tileset --filename data/AMG_HS_ --coordSystem hg38 --filetype beddb --datatype bedlike --uid AMG_HS_heart_NHCFV_HIC_SNP --project-name NHCFV```

```python manage.py ingest_tileset --filename data/ENCODE_HS_heart_PC_H3K4me3_pval.bigwig --coordSystem hg38 --filetype bigwig --datatype vector --uid ENCODE_HS_heart_PC_H3K4me3_pval --project-name AMG_internalized_ENCODE```