To create the files needed for uploading an assembly, all you need is pandas and bioframe (https://bioframe.readthedocs.io/).

In [None]:
import pandas as pd
import bioframe

If the centromeres of your assembly can be fetched from the UCSC database via bioframe:

In [28]:
centromeres = bioframe.fetch_centromeres("hg38")
centromeres

Unnamed: 0,chrom,start,end,mid
0,chr1,122026459,124932724,123479591
1,chr10,39686682,41593521,40640101
2,chr11,51078348,54425074,52751711
3,chr12,34769407,37185252,35977329
4,chr13,16000000,18051248,17025624
5,chr14,16000000,18173523,17086761
6,chr15,17083673,19725254,18404463
7,chr16,36311158,38265669,37288413
8,chr17,22813679,26616164,24714921
9,chr18,15460899,20861206,18161052


Fetch the chromosome sizes (by default, bioframe fetches them from the UCSC database):

In [30]:
chromsizes = bioframe.fetch_chromsizes("hg38")
chromsizes_frame = chromsizes.reset_index().query("index != 'chrM'")
chromsizes_frame.to_csv("hg38.chrom.sizes", sep="\t", header=None, index=False)
chromsizes_frame

Unnamed: 0,index,length
0,chr1,248956422
1,chr2,242193529
2,chr3,198295559
3,chr4,190214555
4,chr5,181538259
5,chr6,170805979
6,chr7,159345973
7,chr8,145138636
8,chr9,138394717
9,chr10,133797422
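
The chrom.sizes file written above is a headerless, tab-separated table with two columns. As a minimal sketch, it can be read back with pandas to check the format before uploading. The temporary file name and the column names `chrom` and `length` are assumptions made for this illustration; the two values are copied from the output above.

```python
import os
import tempfile

import pandas as pd

# Toy chromsizes table (values copied from the output above)
toy = pd.DataFrame({"index": ["chr1", "chr2"], "length": [248956422, 242193529]})

# Write it like the cell above: tab-separated, no header row, no index column
path = os.path.join(tempfile.mkdtemp(), "toy.chrom.sizes")  # hypothetical file name
toy.to_csv(path, sep="\t", header=False, index=False)

# Read it back; the column names are assumed, since the file has no header
sizes = pd.read_csv(path, sep="\t", header=None, names=["chrom", "length"])
sizes_series = sizes.set_index("chrom")["length"]
print(sizes_series)
```

Reading the file back this way makes it easy to spot a stray header row or an unwanted index column.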


Now we create the chromosome arms from the chromsizes and the centromeres.

In [35]:
arms = bioframe.make_chromarms(chromsizes, centromeres)
arms[['chrom','start','end']].to_csv('arms.hg38',index=False)
arms[['chrom','start','end']]

Unnamed: 0,chrom,start,end
0,chr1,0,123479591
1,chr1,123479591,248956422
2,chr2,0,93139351
3,chr2,93139351,242193529
4,chr3,0,92214016
5,chr3,92214016,198295559
6,chr4,0,50728006
7,chr4,50728006,190214555
8,chr5,0,48272853
9,chr5,48272853,181538259
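
Judging from the rows above, each chromosome is split at the `mid` column of the centromere table: one arm runs from 0 to the midpoint, the other from the midpoint to the chromosome length. The following is a plain-pandas sketch of that split for chr1 only, not bioframe's actual implementation; the variable names are chosen for illustration.

```python
import pandas as pd

# Toy inputs; the numbers are copied from the hg38 outputs above
chromsizes = pd.Series({"chr1": 248956422})
centromeres = pd.DataFrame({"chrom": ["chr1"], "mid": [123479591]})

rows = []
for chrom, mid in zip(centromeres["chrom"], centromeres["mid"]):
    length = int(chromsizes[chrom])
    rows.append((chrom, 0, int(mid)))       # first arm: chromosome start to midpoint
    rows.append((chrom, int(mid), length))  # second arm: midpoint to chromosome end

toy_arms = pd.DataFrame(rows, columns=["chrom", "start", "end"])
print(toy_arms)
```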


If centromeres cannot be fetched for our genome (here mm9, for which the result is empty):

In [37]:
centromeres = bioframe.fetch_centromeres("mm9")
centromeres

Unnamed: 0,chrom,start,end,mid


We can simply use the midpoint of each chromosome as its centromere, since the arms are only used for optimization purposes.

_Future versions of HiCognition might only need the chromsizes._

In [1]:
chromsizes = bioframe.fetch_chromsizes("mm9")
chromsizes_frame = chromsizes.reset_index().query("index != 'chrM'")

chromsizes_frame.to_csv("mm9.chrom.sizes", sep="\t", header=None, index=False)

# use the midpoint of each chromosome as its centromere
centromeres = chromsizes//2


arms = [
    arm
    for chrom in chromsizes_frame["index"]
    for arm in (
        (chrom, 0, centromeres.get(chrom, 0)),
        (chrom, centromeres.get(chrom, 0), chromsizes.get(chrom, 0)),
    )
]

# construct dataframe out of arms
arms = pd.DataFrame(arms, columns=["chrom", "start", "end"])

arms.to_csv("arms.mm9", index=False)
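
Before uploading, it can be worth checking that the arms of every chromosome are contiguous and span the whole chromosome. Below is a minimal sketch of such a check on toy data. The helper `check_arms` and the chromosome names `chrA` and `chrB` are hypothetical and not part of bioframe or HiCognition.

```python
import pandas as pd


def check_arms(arms, chromsizes):
    """Return True if each chromosome's arms are contiguous and span 0..length."""
    for chrom, group in arms.groupby("chrom", sort=False):
        group = group.sort_values("start")
        starts = group["start"].tolist()
        ends = group["end"].tolist()
        # the first arm must start at 0 and the last arm must end at the chromosome length
        if starts[0] != 0 or ends[-1] != chromsizes[chrom]:
            return False
        # each arm must start where the previous one ended
        if starts[1:] != ends[:-1]:
            return False
    return True


# Toy chromsizes with made-up names and lengths
toy_sizes = pd.Series({"chrA": 100, "chrB": 51})
mids = toy_sizes // 2  # midpoint used as centromere, as in the cell above

toy_arms = pd.DataFrame(
    [
        (chrom, start, end)
        for chrom in toy_sizes.index
        for start, end in ((0, mids[chrom]), (mids[chrom], toy_sizes[chrom]))
    ],
    columns=["chrom", "start", "end"],
)

print(check_arms(toy_arms, toy_sizes))
```

The same function could be applied to the `arms` and `chromsizes` objects created above.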