# Single Gene Analysis (Funk=4,672)
### Aim:
Perform single gene correlation analysis on a set of well-characterised genes from Funk et al. (4,672 genes).

### Output:
A dictionary of dictionaries holding the results of the single gene analysis.
Each gene is a key whose value is a dictionary with the following keys:
##### Dict Keys:
- **['Correlation']** - (DF) Correlation results
- **['Network']** - (Networkx Graph) STRING Network
- **['Interactions']** - (DF) List of node interactions used for edge weights

#### Description:
- Perform single gene analysis for each of the 4,672 genes to compute correlations and generate networks
- Save the final dictionary of results as a pickle file in the 'pickle_files' directory
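
The output structure described above can be sketched with a small stand-in dictionary. This is only an illustration: the gene name `GENE_A`, the placeholder values, and the helper `summarise` are hypothetical, and plain Python containers stand in for the DataFrames and the Networkx graph that the real results hold.

```python
# Stand-in for the real output: in the notebook, 'Correlation' and
# 'Interactions' hold pandas DataFrames and 'Network' holds a networkx Graph.
# Gene names and values below are placeholders, not real results.
gene_dict = {
    "GENE_A": {
        "Correlation": [("GENE_B", 0.0), ("GENE_C", 0.0)],
        "Network": {"GENE_A": ["GENE_B", "GENE_C"]},
        "Interactions": [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")],
    },
}


def summarise(results):
    """Return {gene: sorted list of result keys} for a dict of dicts."""
    return {gene: sorted(res) for gene, res in results.items()}


summary = summarise(gene_dict)

# Results for one gene are reached with two lookups: gene first, then result type
network_for_gene_a = gene_dict["GENE_A"]["Network"]
```

A summary like this is a quick way to confirm that every gene carries all three result types before the dictionary is saved.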

In [1]:
# Import packages + DepMap tools for analysis
import os
import pandas as pd
import time
from DepMapTools.DataImport import DataDownload, SaveLoad
from DepMapTools.GOI import GOIAnalysis

In [3]:
# Instantiate the DataDownload and SaveLoad helper classes
dd = DataDownload()
sl = SaveLoad()
# Define CSV file path
PRD = ".."
csv_path = os.path.join(PRD,
                        'AnalysisData')
# Load clean Achilles CRISPR data
df = dd.load_data('CRISPR_gene_effect_clean.csv', 0, csv_path)

In [4]:
# Load gene list dataset
gene_path = os.path.join(PRD,
                         'AnalysisData')
gene_df = dd.load_data('essential_genes_funk_etal.csv', 0, gene_path)

In [5]:
# Reset index for rows and columns
gene_df = gene_df.reset_index().T.reset_index().T
gene_df = gene_df.rename({0:'gene'}, axis=1)
gene_df = gene_df.reset_index()
gene_df = gene_df.drop('index', axis=1)

In [6]:
# Format gene_data to make it iterable
gene_df = gene_df.dropna()
gene_df['gene'] = gene_df['gene'].astype(pd.StringDtype())
gene_df = gene_df.sort_values(by=['gene'])
gene_df = gene_df.reset_index(drop=True)

In [8]:
# Find genes in the gene list that are not columns of the Achilles df
ach_genes = set(df.columns)
not_in_df = [gene for gene in gene_df['gene'] if gene not in ach_genes]

In [9]:
# Remove genes that are not in the Achilles df from the gene list, then drop duplicates
print(f'Starting length of gene_df = {len(gene_df)}')
print('-'*20)
gene_df = gene_df[~gene_df['gene'].isin(not_in_df)]
print(f'Removed {len(not_in_df)} genes from gene_df not present in Achilles df\nThere are now {len(gene_df)} genes in gene_df')
print('-'*20)
print('Drop duplicates')
length = len(gene_df)
gene_df = gene_df.drop_duplicates(ignore_index=True)
print(f'Removed {length - len(gene_df)} duplicate genes\nThere are now {len(gene_df)} genes in gene_df')

Starting length of gene_df = 5049
--------------------
Removed 377 genes from gene_df not present in Achilles df
There are now 4672 genes in gene_df
--------------------
Drop duplicates
Removed 0 duplicate genes
There are now 4672 genes in gene_df


In [11]:
# Run single gene analysis for every gene and collect the results in a dict of dicts
print('Computing Single Gene Analysis')
gene_dict = {}
for counter, gene in enumerate(gene_df['gene'], start=1):
    goi = GOIAnalysis(df, gene)
    gene_dict[str(gene)] = goi.goi_analysis(400)
    # Pause between genes (presumably to throttle requests made while building the STRING network)
    time.sleep(1)
    if counter % 500 == 0:
        print(f'{counter} genes analysed')
print('Complete')

Computing Single Gene Analysis
500 genes analysed
1000 genes analysed
1500 genes analysed
2000 genes analysed
2500 genes analysed
3000 genes analysed
3500 genes analysed
4000 genes analysed
4500 genes analysed
Complete
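
With a one-second pause per gene, the loop above sleeps for at least 4,672 s (about 78 minutes) in total, so a single failing gene late in the run can be costly. The sketch below shows one possible way to keep going after an error and record which genes failed. It is not part of DepMapTools: `run_all` and `fake_analyse` are hypothetical names, and `fake_analyse` merely stands in for `GOIAnalysis(df, gene).goi_analysis(400)`.

```python
import time


def run_all(genes, analyse, pause=0.0, report_every=500):
    """Run `analyse` on each gene, collecting results and failures separately."""
    results, failed = {}, {}
    for count, gene in enumerate(genes, start=1):
        try:
            results[gene] = analyse(gene)
        except Exception as exc:
            # Record the error and continue with the next gene
            failed[gene] = repr(exc)
        time.sleep(pause)
        if count % report_every == 0:
            print(f"{count} genes analysed")
    return results, failed


def fake_analyse(gene):
    """Hypothetical stand-in for the real single gene analysis call."""
    if gene == "BAD":
        raise ValueError("no data")
    return {"Correlation": None, "Network": None, "Interactions": None}


results, failed = run_all(["A", "BAD", "C"], fake_analyse, report_every=2)
```

Genes listed in `failed` could then be retried on their own instead of repeating the whole run.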


In [12]:
# Save dictionary of dictionaries as a pickle file in pickle folder
sl.save_dict_pickle(gene_dict, 'chronos_singlegene_funk')
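
To reuse the results later, the pickle has to be read back into memory. The internals of `save_dict_pickle` and the exact file name it writes are not shown in this notebook, so the following is only a sketch of a save/load round trip using the standard `pickle` module. The helpers `save_pickle` and `load_pickle`, the demo file name, and the temporary directory are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Write any picklable object to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read an object back from a pickle file (only load files you trust)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Placeholder dictionary with the same nesting as the real results
demo = {"GENE_A": {"Correlation": [0.0], "Network": None, "Interactions": []}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chronos_singlegene_funk_demo.pickle")
    save_pickle(demo, path)
    restored = load_pickle(path)
```

Loading real results this way requires pandas and networkx to be installed, because unpickling DataFrames and graphs imports the classes they were saved from.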