# Set environment

In [1]:
source("Pathway_config.R")

Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats
Loading required package: org.Hs.eg.db
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append,

# Import gene ortholog map

In [18]:
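# Read the H99-to-JEC21 syntenic ortholog table (tab-separated, with a header row)
h99_jec21 <- read_tsv(
    file.path(INFODIR, 'h99_jec21_syntenic_orthologs.txt'),
    col_names = TRUE)
# The fourth column has no header (readr fills it in as X4); it is dropped below
colnames(h99_jec21) <- c('h99', 'source', 'jec21', 'NA')

# Keep only the two gene ID columns
h99_jec21 <- h99_jec21 %>% dplyr::select(h99, jec21)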

set.seed(1)
h99_jec21 %>% sample_n(size = 5)

“Missing column names filled in: 'X4' [4]”
Parsed with column specification:
cols(
  `[Gene ID]` = col_character(),
  `[source_id]` = col_character(),
  `[Input Ortholog(s)]` = col_character(),
  X4 = col_character()
)


h99,jec21
CNAG_02024,CNE05100
CNAG_02865,"CNC04803, CNC04805"
CNAG_04366,CNI01825
CNAG_07317,CNA01860
CNAG_01545,CNC06390
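
Some h99 genes map to more than one jec21 gene, stored as a single comma-separated string (for example `CNAG_02865` above). The following is a minimal sketch, not part of the original analysis, of how such cells could be expanded to one row per ortholog pair with `tidyr::separate_rows()`. The object names `toy_map` and `toy_long` are hypothetical; the rows are copied from the sample above.

```r
library(tibble)
library(tidyr)

# Toy rows copied from the sample output above (hypothetical object name)
toy_map <- tibble(
    h99   = c("CNAG_02024", "CNAG_02865"),
    jec21 = c("CNE05100", "CNC04803, CNC04805")
)

# One row per (h99, jec21) pair; the regex also consumes the space after the comma
toy_long <- separate_rows(toy_map, jec21, sep = ",\\s*")
toy_long
```

In the long form every jec21 ID sits in its own row, which makes later filtering and joining on `jec21` straightforward.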


- h99
    - [CNAG_00006](https://www.ncbi.nlm.nih.gov/gene/?term=CNAG+00006)
        - prohibitin-2
    - [CNAG_00007](https://www.ncbi.nlm.nih.gov/gene/?term=CNAG_00007)
        - methionine-tRNA ligase
    - [CNAG_00008](https://www.ncbi.nlm.nih.gov/gene/23883880)
        - hypothetical protein
    - [CNAG_07373](https://www.ncbi.nlm.nih.gov/gene/23890227)
        - carbamoyl-phosphate synthase, large subunit

- jec21
    - [CNA00070](https://www.uniprot.org/uniprot/Q5KQ94)
        - Proteolysis and peptidolysis-related protein
    - [CNA00060](https://www.ncbi.nlm.nih.gov/gene/3254020)
        - methionine-tRNA ligase
    - [CNA00050](https://www.ncbi.nlm.nih.gov/gene/3253550)
        - cell cycle arrest in response to pheromone-related protein
    - [CNA06000](https://www.ncbi.nlm.nih.gov/gene/3253832)
        - aspartate carbamoyltransferase

# Get genesets

In [26]:
attach(file.path(OUTDIR, "pathway_genesets.RData"))

The following objects are masked from file:/home/jovyan/work/scratch/analysis_output/out/pathway_genesets.RData (pos = 3):

    genesets_cne_h99, genesets_cne_jec
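
The masking message refers to a copy of the same file already on the search path at position 3, which is what happens when an `attach()` cell is run more than once. One possible alternative, sketched here under the assumption that the objects are only needed by name, is to `load()` the file into a dedicated environment. The temporary file, `genesets_demo`, and `genesets_env` are hypothetical stand-ins, not objects from the original analysis.

```r
# Hypothetical stand-in for pathway_genesets.RData, written to a temp file
rdata_path <- tempfile(fileext = ".RData")
genesets_demo <- list(pathway_a = c("CNAG_00038", "CNAG_00057"))
save(genesets_demo, file = rdata_path)
rm(genesets_demo)

# Load into a dedicated environment instead of attaching to the search path;
# re-running this replaces the objects rather than stacking another copy
genesets_env <- new.env()
loaded_names <- load(rdata_path, envir = genesets_env)

loaded_names                          # names of the objects that were restored
genesets_env$genesets_demo$pathway_a  # access an object through the environment
```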



In [75]:
names(genesets_cne_h99)[1]
tmp <- h99_jec21 %>% filter(h99 %in% genesets_cne_h99[[1]])
tmp %>% head(10)

h99,jec21
CNAG_00038,CNA00280
CNAG_00057,CNA00470
CNAG_00515,CNA04970
CNAG_00515,CNA04970
CNAG_00735,CNA07130
CNAG_00797,CNA07740
CNAG_01078,CND02060
CNAG_01120,CND02450
CNAG_01675,CNC01860
CNAG_01675,CNC01860
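
The filtered table above contains repeated rows (for example `CNAG_00515` and `CNAG_01675`). A minimal sketch of dropping such repeats with `dplyr::distinct()`, using toy rows copied from the output; `toy_hits` and `toy_unique` are hypothetical names.

```r
library(dplyr)

# Toy rows copied from the output above, including one repeated pair
toy_hits <- tibble::tibble(
    h99   = c("CNAG_00515", "CNAG_00515", "CNAG_00735"),
    jec21 = c("CNA04970", "CNA04970", "CNA07130")
)

# Keep one copy of each (h99, jec21) pair, preserving first-seen order
toy_unique <- toy_hits %>% distinct()
toy_unique
```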


In [76]:
names(genesets_cne_h99)[2]
tmp <- h99_jec21 %>% filter(h99 %in% genesets_cne_h99[[2]])
tmp %>% head(10)

h99,jec21
CNAG_00061,CNA00510
CNAG_00747,CNA07260
CNAG_01120,CND02450
CNAG_01264,CND03850
CNAG_01657,CNC01700
CNAG_01680,CNC01910
CNAG_02736,CNC03610
CNAG_03225,"CNG03100, CNG03490"
CNAG_03226,CNG03480
CNAG_03266,"CNG03100, CNG03490"


Let's look up each gene pair from the first gene set and compare the h99 and jec21 annotations.

- CNAG_00038 (h99)
    - [alcohol dehydrogenase](https://www.ncbi.nlm.nih.gov/gene/23883909)
- CNA00280 (jec21)
    - [Alcohol dehydrogenase](https://www.uniprot.org/uniprot/Q5KQ67)
-----
- CNAG_00057 (h99)
    - [fructose-1,6-bisphosphatase I](https://www.ncbi.nlm.nih.gov/gene/23883926)
- CNA00470 (jec21)
    - [Fructose-bisphosphatase](https://www.uniprot.org/uniprot/Q5KQ48)
-----
- CNAG_00515 (h99)
    - [mannitol dehydrogenase](https://www.ncbi.nlm.nih.gov/gene/?term=CNAG_00515)
- CNA04970 (jec21)
    - [Mannitol dehydrogenase](https://www.uniprot.org/uniprot/Q5KNX5)
-----
- CNAG_00735 (h99)
    - [aldehyde dehydrogenase family 7 member A1](https://www.ncbi.nlm.nih.gov/gene/?term=CNAG_00735)
- CNA07130 (jec21)
    - [Succinate-semialdehyde dehydrogenase [NAD(P)+]](https://www.uniprot.org/uniprot/Q5KNA9)
-----
- CNAG_00797 (h99)
    - [acetyl-CoA synthetase](https://www.ncbi.nlm.nih.gov/gene/?term=CNAG_00797)
- CNA07740 (jec21)
    - [Acetyl-coenzyme A synthetase](https://www.uniprot.org/uniprot/Q5KN48)
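
Several of the links above use the NCBI Gene search form `?term=<gene ID>`. As a convenience, the lookup URLs could be generated rather than typed by hand. This is only a sketch: `ncbi_gene_url` is a hypothetical helper, and whether a given locus tag resolves to a single record on NCBI is an assumption to check per gene.

```r
# Hypothetical helper: build an NCBI Gene search URL for each locus tag,
# following the ?term= pattern used in the links above
ncbi_gene_url <- function(gene_id) {
    paste0("https://www.ncbi.nlm.nih.gov/gene/?term=", gene_id)
}

# Vectorised over gene IDs
ncbi_gene_url(c("CNAG_00515", "CNAG_00735"))
```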

In [52]:
tmp <- genesets_cne_h99[1:2]

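# Keep only the ortholog rows whose h99 ID is in each gene set
tmp <- lapply(tmp, function(gene_lst){
    h99_jec21 %>% filter(h99 %in% gene_lst)
})

# Split multi-ortholog cells such as "CNG03100, CNG03490"; the pattern also
# consumes the space after the comma so the IDs carry no leading whitespace
tmp <- lapply(tmp, function(gene_dat){
    str_split(gene_dat$jec21, ',\\s*')
})

# Flatten to one character vector per gene set and drop repeated IDs
tmp <- lapply(tmp, unlist)
tmp <- lapply(tmp, unique)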

tmp
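
The filter, split, flatten, and de-duplicate steps in the cell above could also be wrapped in a single function and applied to every gene set. The sketch below runs on toy data only: `h99_to_jec21`, `toy_map`, `toy_genesets`, and the set names are hypothetical, and the rows are copied from earlier outputs.

```r
library(dplyr)
library(stringr)

# Toy ortholog map with rows copied from the outputs above
toy_map <- tibble::tibble(
    h99   = c("CNAG_01120", "CNAG_03225", "CNAG_03266"),
    jec21 = c("CND02450", "CNG03100, CNG03490", "CNG03100, CNG03490")
)

# Toy gene sets keyed by hypothetical names
toy_genesets <- list(
    set_a = c("CNAG_01120"),
    set_b = c("CNAG_03225", "CNAG_03266")
)

# Hypothetical helper: translate a vector of h99 IDs to unique jec21 IDs
h99_to_jec21 <- function(gene_lst, ortholog_map) {
    hits <- ortholog_map %>% filter(h99 %in% gene_lst)
    # Split on a comma plus any following whitespace, then flatten
    ids <- unlist(str_split(hits$jec21, ",\\s*"))
    unique(ids)
}

toy_jec <- lapply(toy_genesets, h99_to_jec21, ortholog_map = toy_map)
toy_jec
```

Because both h99 genes in `set_b` map to the same pair of jec21 genes, the translated set holds each jec21 ID once.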