In [1]:
#| echo: false
#| output: false

%load_ext autoreload
%autoreload 2

In [2]:
from geneinfo.utils import GeneListCollection
from geneinfo.utils import GeneList as glist

## GeneList

Long lists of gene names are hard to read when displayed as a plain Python list:

In [3]:
dummy_genes = ['ABCB7', 'ACTRT1', 'AKAP4', 'ALG13', 'ARHGAP36', 'ATP7A', 'ATRX', 'BCLAF3', 'BRCC3', 'CAPN6', 'CCNB3', 'CFAP47', 'CLCN5', 'CMC4', 'CNKSR2', 'COX7B', 'CYBB', 'DCX', 'DKC1', 'DYNLT3', 'ENOX2', 'ENOX2-AS1', 'EZHIP', 'F8', 'F8A1', 'FAM120C', 'FGF16']
dummy_genes

['ABCB7',
 'ACTRT1',
 'AKAP4',
 'ALG13',
 'ARHGAP36',
 'ATP7A',
 'ATRX',
 'BCLAF3',
 'BRCC3',
 'CAPN6',
 'CCNB3',
 'CFAP47',
 'CLCN5',
 'CMC4',
 'CNKSR2',
 'COX7B',
 'CYBB',
 'DCX',
 'DKC1',
 'DYNLT3',
 'ENOX2',
 'ENOX2-AS1',
 'EZHIP',
 'F8',
 'F8A1',
 'FAM120C',
 'FGF16']

GeneList objects work just like normal lists but have additional features that are useful for exploring sets of genes.

When displayed they render as Markdown in columns to make them easier to read:

In [4]:
#| classes: .gene-list

list_A = glist(dummy_genes)
list_A

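| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |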

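The columns appear to be filled top to bottom, then left to right: with the 27 genes above in three rows, the first row is every third gene (ABCB7, ALG13, ATRX, ..., F8A1). A minimal sketch of that layout in plain Python, assuming the number of rows is given (how GeneList actually picks it is not shown here); `column_major_rows` and `to_markdown` are hypothetical helpers, not part of geneinfo:

```python
import math


def column_major_rows(items, nrows):
    """Arrange items top-to-bottom, then left-to-right, in `nrows` rows.

    Cells past the end of the list (in the last column) are filled with ''.
    """
    if nrows < 1:
        raise ValueError("nrows must be at least 1")
    ncols = math.ceil(len(items) / nrows)
    rows = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            i = c * nrows + r  # index of the item shown in column c, row r
            row.append(items[i] if i < len(items) else "")
        rows.append(row)
    return rows


def to_markdown(items, nrows):
    """Render the column-major layout as a Markdown table with numbered columns."""
    rows = column_major_rows(items, nrows)
    ncols = len(rows[0])
    lines = [
        "| " + " | ".join(str(c) for c in range(ncols)) + " |",
        "|" + "---|" * ncols,
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


genes = ['ABCB7', 'ACTRT1', 'AKAP4', 'ALG13', 'ARHGAP36', 'ATP7A', 'ATRX']
print(to_markdown(genes, nrows=3))
```

When the list length is not a multiple of the row count, the last column is left partly empty, which is the same pattern visible in the last table of this page.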

## Highlight genes

The left-shift operator `<<` is overloaded to highlight the genes that are also present in another gene list:

In [5]:
#| classes: .gene-list

list_B = glist(dummy_genes[::2])
list_A << list_B

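| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |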


In [6]:
#| classes: .gene-list

list_C = glist(dummy_genes[:12])
list_A << list_C

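| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |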

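To illustrate how an operator like `<<` can take on this meaning, here is a minimal sketch of a `list` subclass that overloads `__lshift__` to remember which other lists to highlight against, and marks matching genes in bold when rendered. The class name `HighlightList` and its `render` method are hypothetical; geneinfo's GeneList may be implemented differently.

```python
class HighlightList(list):
    """A list of gene names that can highlight genes found in other lists."""

    def __init__(self, genes=(), highlight=()):
        super().__init__(genes)
        # Gene sets to highlight against, in the order they were added with <<.
        self._highlight = list(highlight)

    def __lshift__(self, other):
        # Return a new object so the left operand is left unchanged.
        return HighlightList(self, self._highlight + [set(other)])

    def render(self):
        """Return a space-separated string with matching genes in **bold**."""
        marked = []
        for gene in self:
            if any(gene in other for other in self._highlight):
                marked.append(f"**{gene}**")
            else:
                marked.append(gene)
        return " ".join(marked)


a = HighlightList(['ABCB7', 'ACTRT1', 'AKAP4', 'ALG13'])
b = HighlightList(['ABCB7', 'AKAP4'])
print((a << b).render())  # **ABCB7** ACTRT1 **AKAP4** ALG13
```

Because `<<` is left-associative in Python, `a << b << c` evaluates as `(a << b) << c`, so each additional list is appended to the highlight sequence in order.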

You can apply the `<<` operator repeatedly to highlight genes from up to four other gene lists. Each application adds a new style of highlighting, in the following order:

1. <span style="font-weight: bold;">Bold</span>
2. <span style="color:#1876D2;">Color</span>
3. <span style="text-decoration: underline;">Underline</span>
4. <span style="font-style: italic;">Italic</span>

Genes with all styles applied look like <span style="font-weight: bold; color:#1876D2; text-decoration: underline; font-style: italic;">this</span>.

In [7]:
#| classes: .gene-list

list_D = glist(dummy_genes[::4])
list_E = glist(dummy_genes[2::10])

list_A << list_B << list_C << list_D << list_E

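| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |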

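The style sequence can be read as a lookup from a highlight list's position to a CSS declaration: a gene picks up one declaration for every highlight list that contains it. The sketch below assumes exactly that rule and the four styles listed above; the function names `gene_css` and `gene_html` and the limit check are illustrative assumptions, not geneinfo's implementation.

```python
def gene_css(gene, highlight_lists, color="#1876D2"):
    """Return inline CSS for `gene`, one style per highlight list containing it.

    Position 1 -> bold, 2 -> color, 3 -> underline, 4 -> italic.
    """
    styles = [
        "font-weight: bold;",
        f"color:{color};",
        "text-decoration: underline;",
        "font-style: italic;",
    ]
    if len(highlight_lists) > len(styles):
        raise ValueError(f"at most {len(styles)} highlight lists are supported")
    # zip pairs each highlight list with the style for its position.
    return " ".join(
        style for style, genes in zip(styles, highlight_lists) if gene in genes
    )


def gene_html(gene, highlight_lists, color="#1876D2"):
    """Wrap `gene` in a styled <span>, or return it unchanged if no style applies."""
    css = gene_css(gene, highlight_lists, color)
    return f'<span style="{css}">{gene}</span>' if css else gene


lists = [{'ABCB7', 'ATRX'}, {'ATRX'}, set(), {'ABCB7', 'ATRX'}]
print(gene_html('ATRX', lists))
# <span style="font-weight: bold; color:#1876D2; font-style: italic;">ATRX</span>
print(gene_html('DCX', lists))  # DCX
```

Under this rule, a gene found in the second and fourth lists is colored and italic but not bold, so each style can be traced back to a specific list.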

The highlight color can be changed by passing a hex color code to `set_highlight_color`:

In [16]:
#| classes: .gene-list

glist.set_highlight_color('#009D2B')
list_A << list_E << list_D << list_C << list_B

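| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |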


Reset the highlight color:

In [17]:
#| classes: .gene-list

glist.reset_highlight_color()
list_A << list_E << list_D << list_C << list_B

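| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |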

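Because the color is set on the class (`glist.set_highlight_color(...)`) rather than on a single list, it behaves like shared, class-level state that affects every list rendered afterwards. A minimal sketch of that pattern with classmethods; the class name `Palette`, the default `#1876D2` (taken from the style example above), and the hex-format check are assumptions for illustration, not geneinfo's code:

```python
import re


class Palette:
    """Holds a highlight color shared by all instances (class-level state)."""

    DEFAULT_COLOR = "#1876D2"
    highlight_color = DEFAULT_COLOR

    @classmethod
    def set_highlight_color(cls, color):
        # Accept only six-digit hex codes such as '#009D2B' (an assumption).
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", color):
            raise ValueError(f"not a hex color: {color!r}")
        cls.highlight_color = color

    @classmethod
    def reset_highlight_color(cls):
        # Restore the class-wide default.
        cls.highlight_color = cls.DEFAULT_COLOR


Palette.set_highlight_color('#009D2B')
print(Palette.highlight_color)  # #009D2B
Palette.reset_highlight_color()
print(Palette.highlight_color)  # #1876D2
```

Instances read the class attribute unless they set their own, so one call changes the color for all lists at once.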

## Set operations

The bitwise operators `&`, `|`, and `^` are overloaded to perform set operations on gene lists: intersection, union, and symmetric difference, respectively.
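These operators mirror Python's built-in set operations. A minimal sketch of overloading them on a `list` subclass, assuming (not confirmed by the source) that results drop duplicates and keep the order of the left operand, followed by genes found only in the right operand; `OrderedGeneList` is a hypothetical name:

```python
class OrderedGeneList(list):
    """A list of genes supporting &, | and ^ as order-preserving set operations."""

    @staticmethod
    def _unique(genes):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(genes))

    def __and__(self, other):
        # Intersection: genes present in both lists.
        other = set(other)
        return OrderedGeneList(self._unique(g for g in self if g in other))

    def __or__(self, other):
        # Union: genes present in either list.
        return OrderedGeneList(self._unique(list(self) + list(other)))

    def __xor__(self, other):
        # Symmetric difference: genes present in exactly one of the lists.
        left, right = set(self), set(other)
        return OrderedGeneList(
            self._unique([g for g in self if g not in right]
                         + [g for g in other if g not in left])
        )


b = OrderedGeneList(['ABCB7', 'AKAP4', 'ARHGAP36'])
c = OrderedGeneList(['ABCB7', 'ACTRT1', 'AKAP4'])
print(b & c)  # ['ABCB7', 'AKAP4']
print(b | c)  # ['ABCB7', 'AKAP4', 'ARHGAP36', 'ACTRT1']
print(b ^ c)  # ['ARHGAP36', 'ACTRT1']
```

Returning the subclass rather than a plain `set` keeps the results usable wherever a list is expected, for example on the right-hand side of `<<`.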

Highlight in A the intersection of B and C:

In [19]:
#| classes: .gene-list

list_A << (list_B & list_C)

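| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |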


Highlight in A the union of B and C:

In [20]:
#| classes: .gene-list

list_A << (list_B | list_C)

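| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |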


Highlight in A the genes that are in B or C but not in both (symmetric difference):

In [21]:
#| classes: .gene-list

list_A << (list_B ^ list_C)

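| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |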


Highlight in A the genes in B but not in C (set difference):

In [22]:
#| classes: .gene-list

list_A << (list_B ^ (list_B & list_C))

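| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |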


Highlight in A the genes in C but not in B (set difference):

In [23]:
#| classes: .gene-list

list_A << (list_C ^ (list_C & list_B))

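| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| ABCB7 | ALG13 | ATRX | CAPN6 | CLCN5 | COX7B | DKC1 | ENOX2-AS1 | F8A1 |
| ACTRT1 | ARHGAP36 | BCLAF3 | CCNB3 | CMC4 | CYBB | DYNLT3 | EZHIP | FAM120C |
| AKAP4 | ATP7A | BRCC3 | CFAP47 | CNKSR2 | DCX | ENOX2 | F8 | FGF16 |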

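The two set-difference examples build "B but not C" from `^` and `&` alone. Since `B & C` is a subset of `B`, taking the symmetric difference with it simply removes the shared genes from `B`, leaving `B - C`. A quick check of that identity with Python's built-in sets (which also offer `-` directly; whether GeneList supports `-` is not shown here), where `difference_via_xor` is an illustrative helper name:

```python
import random


def difference_via_xor(b, c):
    """Genes in b but not in c, using only ^ and & as in the cells above."""
    return b ^ (b & c)


# Compare against the built-in set difference on many random gene-like sets.
rng = random.Random(0)
universe = [f"GENE{i}" for i in range(30)]

for _ in range(1000):
    b = set(rng.sample(universe, rng.randint(0, 30)))
    c = set(rng.sample(universe, rng.randint(0, 30)))
    assert difference_via_xor(b, c) == b - c  # genes in B but not in C
    assert difference_via_xor(c, b) == c - b  # genes in C but not in B

print("identity holds on all sampled cases")
```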

## GeneListCollection

Load a table of gene lists from a CSV file:

In [31]:
gene_lists = GeneListCollection('google_sheet.csv')

Or from a Google Sheet using its ID and the sheet name:

In [25]:
# gene_lists = GeneListCollection(google_sheet='2JSjSLuto3jqdEnnG7JqzeC_1pUZw76n7XueVAYrUOpk')
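The layout of `google_sheet.csv` is not shown here. If, as an assumption, each column holds one gene list with its name in the header row and shorter lists padded with empty cells, the table can be read into a dictionary of lists using only the standard library. `read_gene_lists` is a hypothetical helper, not GeneListCollection's implementation, and the example rows are made up:

```python
import csv
import io


def read_gene_lists(fileobj):
    """Read a CSV with one gene list per column into {list_name: [genes]}.

    Assumes the first row holds the list names and empty cells are padding.
    """
    reader = csv.reader(fileobj)
    names = next(reader)
    lists = {name: [] for name in names}
    for row in reader:
        for name, cell in zip(names, row):
            cell = cell.strip()
            if cell:  # skip padding cells
                lists[name].append(cell)
    return lists


example = io.StringIO(
    "list_a,list_b\n"
    "ABCB7,ATRX\n"
    "ATRX,DCX\n"
    "DCX,\n"
)
print(read_gene_lists(example))
# {'list_a': ['ABCB7', 'ATRX', 'DCX'], 'list_b': ['ATRX', 'DCX']}
```

For a file on disk, open it with `open('google_sheet.csv', newline='')`, as the `csv` module documentation recommends, and pass the file object in.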

See which neuron genes are also SFARI genes:

In [33]:
#| echo: false
#| output: false
#| classes: .gene-list

gene_lists = GeneListCollection(google_sheet='1JSjSLuto3jqdEnnG7JqzeC_1pUZw76n7XueVAYrUOpk')

In [34]:
#| echo: false
#| output: false
#| classes: .gene-list

neuron_genes = glist(gene_lists.get('neuron_npx_proteome'))
sfari = glist(gene_lists.get('sfari_all_conf'))
neuron_genes << sfari

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| ABCB7 | BGN | EBP | GPR173 | LAS1L | NLGN4X | PNMA3 | RPS6KA3 | TAF9B | USP51 |
| ABCD1 | BRCC3 | EFHC2 | GPR82 | LDOC1 | NONO | PNMA6A | RPS6KA6 | TBC1D25 | USP9X |
| ACOT9 | BRWD3 | EFNB1 | GPRASP1 | LONRF3 | NSDHL | PNPLA4 | RRAGB | TBC1D8B | UTP14A |
| ACSL4 | C1GALT1C1 | EGFL6 | GPRASP2 | LRCH2 | NUDT11 | POLA1 | RS1 | TBL1X | UXT |
| ADGRG2 | CA5B | EIF1AX | GRIA3 | MAGED1 | NUP62CL | PORCN | SAT1 | TCEAL1 | VAMP7 |
| ADGRG4 | CACNA1F | EIF2S3 | GRIPAP1 | MAGED2 | NXT2 | PPEF1 | SCML1 | TCEAL2 | VBP1 |
| AFF2 | CASK | ELK1 | GSPT2 | MAGEE1 | NYX | PPP1R3F | SCML2 | TCEAL3 | VCX3B |
| AIFM1 | CCDC120 | EMD | GUCY2F | MAGEE2 | OCRL | PQBP1 | SH3BGRL | TCEAL4 | VMA21 |
| ALG13 | CCDC22 | ENOX2 | GYG2 | MAGEH1 | OFD1 | PRAF2 | SH3KBP1 | TCEAL5 | WDR13 |
| AMER1 | CD99L2 | F8 | HAUS7 | MAGIX | OGT | PRDX4 | SHROOM2 | TCEAL6 | WDR44 |


In [37]:
(glist(gene_lists.get('cDEG')) 
 << glist(gene_lists.get('Hama'))
 << glist(gene_lists.get('ech90_regions'))
 << glist(gene_lists.get('hum_nean_admix'))
 << glist(gene_lists.get('ari_nonPUR'))
)

| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| CFAP47 | EDA | HUWE1 | PHF8 | SCML1 | UPF3B |
| DDX3X | EIF1AX | IQSEC2 | PRICKLE3 | SRPX2 | VSIG1 |
| DIAPH2 | EMD | mc_ampl_SPANXN5 | RBM41 | SYP |  |
| DYNLT3 | HTR2C | OCRL | RTL4 | SYTL5 |  |
