# Community and Gene Semantic Similarity Analysis (n=495)
### Aim:
Calculate Wang's Semantic Similarity Score for the GO annotations for the GOI and the overrepresented GOs for the community computed via GSEA
<br>
### Output:
Dictionary of dictionaries of results of sematic similarity calculations.
The data for each gene is stored under the following three keys
##### Dict Keys:
- **['gsea_analysis']** - (DICT) Genes (keys) and semantic similarity scores as values
- **['missing_genes']** - (LIST) List of genes missing from the DepMap datafile
- **['community too small']** - (LIST) List of genes with communities too small to perform GSEA
<br>

#### Description:
- Load NCBI associations dictionary
- For each community detection dataset compute the semantic similarity score and compile results dictionary
- Export each results dictionary for each algorithm as a pickle file

In [3]:
# Import DepMap tools and NCBI associations dictionary
import os
import pandas as pd
from DepMapTools.DataImport import SaveLoad
from DepMapTools.GeneOntology import OntologyAnalysis
from DepMap.Networks import Permutations
from genes_ncbi_9606_proteincoding import GENEID2NT as g_id_hum

In [4]:
# Instantiate Classes
sl = SaveLoad()
oa = OntologyAnalysis()
pm = Permutations()
# Define NCBI_DICT of gene ontology associations
ncbi = g_id_hum

In [8]:
# Define dictionary paths (n=495)
PRD = ".."
k_path = os.path.join(PRD,
                      '2_Community_Analysis/pickle_files/chronos_k_community_495.pickle')
k_emp = os.path.join(PRD,
                     '2_Community_Analysis/pickle_files/chronos_k_permute_495.pickle')
g_path = os.path.join(PRD,
                      '2_Community_Analysis/pickle_files/chronos_girvan_community_495.pickle')
g_emp = os.path.join(PRD,
                     '2_Community_Analysis/pickle_files/chronos_girvan_permute_495.pickle')
lo_path = os.path.join(PRD,
                       '2_Community_Analysis/pickle_files/chronos_louvain_community_495.pickle')
lo_emp = os.path.join(PRD,
                      '2_Community_Analysis/pickle_files/chronos_louvain_permute_495.pickle')
la_path = os.path.join(PRD,
                       '2_Community_Analysis/pickle_files/chronos_label_community_495.pickle')
la_emp = os.path.join(PRD,
                      '2_Community_Analysis/pickle_files/chronos_label_permute_495.pickle')

In [9]:
# Load community analysis dictionaries
k_dict = sl.load_dict_pickle(k_path)
k_emp = sl.load_dict_pickle(k_emp)

g_dict = sl.load_dict_pickle(g_path)
g_emp = sl.load_dict_pickle(g_emp)

lo_dict = sl.load_dict_pickle(lo_path)
lo_emp = sl.load_dict_pickle(lo_emp)

la_dict = sl.load_dict_pickle(la_path)
la_emp = sl.load_dict_pickle(la_emp)

In [10]:
# Define URLs for OBO basic and OBO slim as variables
url_basic = 'http://purl.obolibrary.org/obo/go/go-basic.obo'
url_slim = 'http://current.geneontology.org/ontology/subsets/goslim_generic.obo'

In [11]:
# Get NCBI associations
associations = oa.get_ncbi_associations()

  EXISTS: gene2go
HMS:0:00:05.073382 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 


In [13]:
# Make final dict
k_final = pm.make_sig_dict(k_dict, k_emp)
g_final = pm.make_sig_dict(g_dict, k_emp)
lo_final = pm.make_sig_dict(lo_dict, k_emp)
la_final = pm.make_sig_dict(la_dict, k_emp)

### K-Clique Similarity Analysis

In [14]:
# Define Gene Ontology dictionary for K-Clique data
k_gos, k_small, k_missing = oa.make_go_dict(ncbi, k_final)
print('K_CLIQUE')
print(f'{len(k_gos)} genes with GSEA for community')
print(f'{len(k_small)} genes with no GSEA as community too small')

  EXISTS: gene2go
HMS:0:00:04.720103 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 
K_CLIQUE
345 genes with GSEA for community
0 genes with no GSEA as community too small


In [15]:
print(f"Missing Genes (total={len(k_missing)}):")
print('-'*20)
print(f'Community Too Small (total={len(k_small)}):')

Missing Genes (total=1):
--------------------
Community Too Small (total=0):


In [16]:
# Perform semantic similarity analysis for k-clique data
k_final, k_no_go = oa.wang_sim_analysis(k_gos, url_basic, url_slim)
results = {'gsea_analysis': k_final,
           'missing_genes': k_missing,
           'community_too_small': k_small
           }

  EXISTS: go-basic.obo
go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms; optional_attrs(relationship)
ACTR10
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
AHCTF1
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdat

In [17]:
# Save k-clique results
sl.save_dict_pickle(results, "chronos_k_similarity_significant_495")

<br>

### Girvan-Newman Similarity Analysis

In [18]:
# Define Gene Ontology dictionary for Girvan-Newman data
gn_gos, gn_small, gn_missing = oa.make_go_dict(ncbi, g_final)
print('Girvan-Newman')
print(f'{len(gn_gos)} genes with GSEA for community')
print(f'{len(gn_small)} genes with no GSEA as community too small')

  EXISTS: gene2go
HMS:0:00:05.378163 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 
Girvan-Newman
359 genes with GSEA for community
0 genes with no GSEA as community too small


In [19]:
print(f"Missing Genes (total={len(gn_missing)}):")
print('-'*20)
print(f'Community Too Small (total={len(gn_small)}):')

Missing Genes (total=2):
--------------------
Community Too Small (total=0):


In [20]:
# Perform semantic similarity analysis for Girvan-Newman data
gn_final, gn_no_go = oa.wang_sim_analysis(gn_gos, url_basic, url_slim)
results_gn = {'gsea_analysis': gn_final,
              'missing_genes': gn_missing,
              'community_too_small': gn_small
              }

  EXISTS: go-basic.obo
go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms; optional_attrs(relationship)
ACTR10
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
AIFM3
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata

In [21]:
# Save Girvan-Newman results
sl.save_dict_pickle(results_gn, "chronos_girvan_similarity_significant_495")

<br>

### Louvain Similarity Analysis

In [22]:
# Define Gene Ontology dictionary for Louvain data
lo_gos, lo_small, lo_missing = oa.make_go_dict(ncbi, lo_final)
print('Louvain')
print(f'{len(lo_gos)} genes with GSEA for community')
print(f'{len(lo_small)} genes with no GSEA as community too small')

  EXISTS: gene2go
HMS:0:00:05.282881 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 
Louvain
434 genes with GSEA for community
0 genes with no GSEA as community too small


In [23]:
print(f"Missing Genes (total={len(lo_missing)}):")
print('-' * 20)
print(f'Community Too Small (total={len(lo_small)}):')

Missing Genes (total=1):
--------------------
Community Too Small (total=0):


In [24]:
# Perform semantic similarity analysis for Louvain data
lo_final, lo_no_go = oa.wang_sim_analysis(lo_gos, url_basic, url_slim)
results_lo = {'gsea_analysis': lo_final,
              'missing_genes': lo_missing,
              'community_too_small': lo_small
              }

  EXISTS: go-basic.obo
go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms; optional_attrs(relationship)
ACTR10
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
AHCTF1
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdat

In [25]:
# Save Louvain results
sl.save_dict_pickle(results_lo, "chronos_louvain_similarity_significant_495")

<br>

### Label Propagation Similarity Analysis

In [26]:
# Define Gene Ontology dictionary for Label propagation data
la_gos, la_small, la_missing = oa.make_go_dict(ncbi, la_final)
print('Label Propagation')
print(f'{len(la_gos)} genes with GSEA for community')
print(f'{len(la_small)} genes with no GSEA as community too small')

  EXISTS: gene2go
HMS:0:00:05.181509 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 
Label Propagation
403 genes with GSEA for community
0 genes with no GSEA as community too small


In [27]:
print(f"Missing Genes (total={len(la_missing)}):")
print('-' * 20)
print(f'Community Too Small (total={len(la_small)}):')

Missing Genes (total=2):
--------------------
Community Too Small (total=0):


In [28]:
# Perform semantic similarity analysis for Louvain data
la_final, la_no_go = oa.wang_sim_analysis(la_gos, url_basic, url_slim)
results_la = {'gsea_analysis': la_final,
              'missing_genes': la_missing,
              'community_too_small': la_small
              }

  EXISTS: go-basic.obo
go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms; optional_attrs(relationship)
ACTR10
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
AHCTF1
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdat

In [29]:
# Save Label Propagation results
sl.save_dict_pickle(results_la, "chronos_label_similarity_significant_495")