# Community and Gene Semantic Similarity Analysis (n=4,672)
### Aim:
Calculate Wang's semantic similarity score between the GO annotations of each gene of interest (GOI) and the overrepresented GO terms of its community, as computed via GSEA.
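
Wang's method scores a GO term by the "semantic contribution" (S-value) of each of its ancestors. The term itself gets 1, and each step up the DAG multiplies by an edge weight (0.8 for `is_a`, 0.6 for `part_of` in the original paper). The similarity of two terms is the sum of S-values over their shared ancestors divided by the sum of all S-values of both terms. The sketch below assumes a plain `parents` dictionary in place of a parsed ontology, and the toy DAG is hypothetical. It is not the implementation inside `OntologyAnalysis`.

```python
from collections import deque

# Edge weights from Wang et al. (2007): 0.8 for "is_a", 0.6 for "part_of".
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}


def s_values(term, parents):
    """Return {ancestor: S-value} for `term`, including the term itself (S = 1).

    `parents` maps a GO ID to a list of (parent_id, relation) tuples.
    The S-value of an ancestor is the maximum product of edge weights over
    all paths from `term` up to that ancestor.
    """
    s = {term: 1.0}
    queue = deque([term])
    while queue:
        child = queue.popleft()
        for parent, relation in parents.get(child, []):
            candidate = WEIGHTS[relation] * s[child]
            # Keep only the best (largest) contribution and re-propagate it.
            if candidate > s.get(parent, 0.0):
                s[parent] = candidate
                queue.append(parent)
    return s


def wang_similarity(a, b, parents):
    """Wang similarity of two GO terms: shared S-values over total S-values."""
    sa, sb = s_values(a, parents), s_values(b, parents)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


# Hypothetical toy DAG: "c" is_a "a", "d" part_of "a", "a" is_a "root".
parents = {
    "c": [("a", "is_a")],
    "d": [("a", "part_of")],
    "a": [("root", "is_a")],
}
print(round(wang_similarity("c", "d", parents), 3))  # 0.558
```

Here "c" contributes 1 + 0.8 + 0.64 and "d" contributes 1 + 0.6 + 0.48. The shared ancestors "a" and "root" give (0.8 + 0.6) + (0.64 + 0.48) = 2.52 out of 4.52.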
<br>
### Output:
A dictionary holding the semantic similarity results together with the genes that could not be scored.
The results are stored under the following three keys:
##### Dict Keys:
- **['gsea_analysis']** - (DICT) Genes (keys) mapped to their semantic similarity scores (values)
- **['missing_genes']** - (LIST) Genes missing from the DepMap datafile
- **['community_too_small']** - (LIST) Genes whose communities are too small to perform GSEA
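
The results dictionary can be inspected with a couple of small helpers. This sketch assumes that the `'gsea_analysis'` values are plain floats. The helper names and the toy dictionary are hypothetical placeholders, not real results.

```python
def summarise(results):
    """Count the entries stored under each results key."""
    return {key: len(value) for key, value in results.items()}


def top_genes(results, n=10):
    """Return the n (gene, score) pairs with the highest similarity scores.

    Assumes results["gsea_analysis"] maps a gene symbol to a float score.
    """
    scores = results["gsea_analysis"]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical toy results; gene names and scores are placeholders.
toy = {
    "gsea_analysis": {"GENE_A": 0.42, "GENE_B": 0.77},
    "missing_genes": ["GENE_C"],
    "community_too_small": [],
}
print(summarise(toy))       # {'gsea_analysis': 2, 'missing_genes': 1, 'community_too_small': 0}
print(top_genes(toy, n=1))  # [('GENE_B', 0.77)]
```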
<br>

#### Description:
- Load the NCBI associations dictionary
- For the Funk et al. dataset, compute the semantic similarity scores and compile the results dictionary
- Export the results as a pickle file
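
The internals of `SaveLoad.save_dict_pickle` and `load_dict_pickle` are not shown here. The export step can be sketched with the standard-library `pickle` module as follows. The function names are hypothetical, and it is not known whether the project's helper appends the `.pickle` extension itself.

```python
import pickle
from pathlib import Path


def save_pickle(obj, path):
    """Serialise `obj` to `path` with the highest available pickle protocol."""
    path = Path(path)
    with path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_pickle(path):
    """Load an object previously written by save_pickle."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

For example, `save_pickle(results, "chronos_k_similarity_significant_funk.pickle")` would write the results dictionary to disk. Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.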

In [1]:
# Import DepMap tools and NCBI associations dictionary
import os
from DepMapTools.DataImport import SaveLoad
from DepMapTools.GeneOntology import OntologyAnalysis
from DepMapTools.Networks import Permutations
from genes_ncbi_9606_proteincoding import GENEID2NT as g_id_hum

In [2]:
# Instantiate Classes
sl = SaveLoad()
oa = OntologyAnalysis()
pm = Permutations()
# Alias the NCBI human protein-coding gene dictionary (GeneID -> gene info)
ncbi = g_id_hum
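
The community dictionaries are keyed by gene symbol (e.g. `AAAS` in the output below), while NCBI gene2go annotations are keyed by GeneID. A symbol-to-GeneID lookup bridges the two. This sketch assumes that each `GENEID2NT` value is a namedtuple with a `Symbol` field. The toy IDs and symbols are hypothetical placeholders.

```python
from collections import namedtuple


def symbol_to_geneid(geneid2nt):
    """Invert a GENEID2NT-style mapping into {gene symbol: NCBI GeneID}.

    Assumes each value is a namedtuple exposing a `Symbol` field.
    If two GeneIDs share a symbol, the one seen last wins.
    """
    return {nt.Symbol: geneid for geneid, nt in geneid2nt.items()}


# Hypothetical stand-in for GENEID2NT with placeholder IDs and symbols.
Gene = namedtuple("Gene", ["GeneID", "Symbol"])
toy_geneid2nt = {1001: Gene(1001, "GENE_A"), 1002: Gene(1002, "GENE_B")}
print(symbol_to_geneid(toy_geneid2nt))  # {'GENE_A': 1001, 'GENE_B': 1002}
```

With the real module this would be `symbol_to_geneid(ncbi)`. How `make_go_dict` performs this lookup internally is not stated in this notebook.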

In [3]:
# Define paths to the community and permutation pickle files (n=4,672)
PRD = ".."
k_path = os.path.join(PRD,
                      '2_Community_Analysis/pickle_files/chronos_k_community_funk.pickle')
k_emp = os.path.join(PRD,
                     '2_Community_Analysis/pickle_files/chronos_k_permute_funk.pickle')

In [4]:
# Load community analysis dictionaries
k_dict = sl.load_dict_pickle(k_path)
k_emp = sl.load_dict_pickle(k_emp)

In [5]:
# Combine the community results with the permutation results into the final (significant) dict
k_final = pm.make_sig_dict(k_dict, k_emp)
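
The criteria used by `make_sig_dict` are not shown. A common way to judge an observed statistic against permutations is the empirical p-value (r + 1)/(n + 1), where r is the number of permuted values at least as large as the observed one. The sketch below uses that rule with a hypothetical `alpha` threshold. It assumes one observed value and one list of permuted values per gene, which may not match the layout of `k_dict` and `k_emp`.

```python
def empirical_p_value(observed, permuted):
    """One-sided empirical p-value: share of permuted scores >= observed.

    Uses the (r + 1) / (n + 1) estimate so the p-value is never exactly zero.
    """
    exceed = sum(1 for value in permuted if value >= observed)
    return (exceed + 1) / (len(permuted) + 1)


def significant(observed_by_gene, permuted_by_gene, alpha=0.05):
    """Keep genes whose observed statistic beats their permutations at `alpha`."""
    return {
        gene: obs
        for gene, obs in observed_by_gene.items()
        if empirical_p_value(obs, permuted_by_gene[gene]) <= alpha
    }


print(empirical_p_value(5.0, [1.0, 2.0, 3.0, 6.0]))  # 0.4
```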

In [9]:
# Define URLs for OBO basic and OBO slim as variables
url_basic = 'http://purl.obolibrary.org/obo/go/go-basic.obo'
url_slim = 'http://current.geneontology.org/ontology/subsets/goslim_generic.obo'
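
The `EXISTS: go-basic.obo` line in the output below suggests that the ontology files are downloaded once and then reused. A standard-library sketch of that caching step follows. The function name and the destination paths are assumptions, not the project's code.

```python
import os
import urllib.request


def fetch_obo(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `fetch_obo(url_basic, "GOdata/go-basic.obo")` and `fetch_obo(url_slim, "GOdata/go-slim.obo")` would match the file names in the log. The downloaded file can then be parsed with goatools' `GODag(path, optional_attrs={'relationship'})`. This is consistent with the `optional_attrs(relationship)` log line, although the notebook does not show that call directly.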

In [10]:
# Get NCBI associations
associations = oa.get_ncbi_associations()

  EXISTS: gene2go
HMS:0:00:05.713514 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 
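
The log above appears to come from goatools' gene2go reader. It reports 350,074 annotations over 20,718 genes for one taxon. Conceptually, the reader turns NCBI's tab-separated gene2go file into a mapping from GeneID to a set of GO IDs. The sketch below assumes the standard gene2go column order. Skipping `NOT` qualifiers is a design choice here, not necessarily what the project does, and the three sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_gene2go(handle, taxid=9606):
    """Build {GeneID: set of GO IDs} from an NCBI gene2go tab-separated stream.

    Columns assumed: tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term,
    PubMed, Category. Rows with a NOT qualifier are skipped.
    """
    assoc = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        tax_id, gene_id, go_id, _evidence, qualifier = row[:5]
        if int(tax_id) != taxid or qualifier.startswith("NOT"):
            continue
        assoc[int(gene_id)].add(go_id)
    return dict(assoc)


# Hypothetical three-row extract; IDs are placeholders, not real annotations.
sample = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1001\tGO:A\tIDA\tenables\tterm one\t-\tFunction\n"
    "9606\t1001\tGO:B\tIEA\tNOT involved_in\tterm two\t-\tProcess\n"
    "10090\t2002\tGO:C\tIDA\tenables\tterm three\t-\tFunction\n"
)
print(read_gene2go(io.StringIO(sample)))  # {1001: {'GO:A'}}
```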


### Funk K-Clique Similarity Analysis

In [11]:
# Define Gene Ontology dictionary for K-Clique data
k_gos, k_small, k_missing = oa.make_go_dict(ncbi, k_final)
print('K_CLIQUE')
print(f'{len(k_gos)} genes with GSEA for community')
print(f'{len(k_small)} genes with no GSEA as community too small')

  EXISTS: gene2go
HMS:0:00:05.413919 350,074 annotations, 20,718 genes, 18,978 GOs, 1 taxids READ: gene2go 
K_CLIQUE
2986 genes with GSEA for community
0 genes with no GSEA as community too small


In [12]:
print(f"Missing Genes (total={len(k_missing)}):")
print('-'*20)
print(f'Community Too Small (total={len(k_small)}):')

Missing Genes (total=162):
--------------------
Community Too Small (total=0):


In [13]:
# Perform semantic similarity analysis for k-clique data
# (stored as k_sim so that the significant-community dict k_final is not overwritten)
k_sim, k_no_go = oa.wang_sim_analysis(k_gos, url_basic, url_slim)
results = {'gsea_analysis': k_sim,
           'missing_genes': k_missing,
           'community_too_small': k_small
           }

  EXISTS: go-basic.obo
go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms; optional_attrs(relationship)
AAAS
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-basic.obo: fmt(1.2) rel(2022-07-01) 47,008 Terms
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/go-slim.obo: fmt(1.2) rel(None) 210 Terms
AAGAB
/Users/robbie/My Drive/2-University/2-Masters_ SussexDataScience/Semester_3/S3-Dissertation/DiscusMsc22_DepMap-Robbie/3_GO_Analysis/GOdata/g
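
Each gene is compared as a *set* of GO terms against its community's *set* of overrepresented terms, so term-level scores must be combined into one number. Wang's original method does this by matching every term to its most similar term in the other set and averaging the best matches from both directions (often called best-match average). It is not stated here whether `wang_sim_analysis` uses this rule. In the sketch, any term-level function, such as a Wang term similarity, can be passed as `term_sim`. Returning 0.0 for an empty set is an assumption.

```python
def best_match_average(terms_a, terms_b, term_sim):
    """Combine pairwise term similarities into one set-level score.

    Each term is matched to its most similar term in the other set, and the
    best-match scores from both directions are averaged.
    """
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(terms_a) + len(terms_b))


# Hypothetical term similarity: 1.0 for identical terms, 0.5 otherwise.
def toy_sim(a, b):
    return 1.0 if a == b else 0.5


print(round(best_match_average({"x"}, {"x", "y"}, toy_sim), 3))  # 0.833
```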

In [14]:
# Save k-clique results
sl.save_dict_pickle(results, "chronos_k_similarity_significant_funk")