# 3 Putative Disease Gene Identification
In this notebook, we pick the best-performing algorithm, use it to find new putative disease genes, and run an enrichment analysis.

Make sure you do the following setup:

* install all requirements in `requirements.txt`.
* have a `ppi.txt` file in the root directory.
* have a `seed_genes.txt` file in the root directory.

## 3.1 Putative Disease Genes
Using all known gene-disease associations (GDAs) as seed genes, we obtain a list of 100 putative disease genes.

In [1]:
from diamond import diffusion
import pandas as pd
import pyperclip

In [2]:
num_disease_genes_to_find = 100
new_genes = diffusion("ppi_lcc.txt", "seed_genes.txt", num_disease_genes_to_find, time=0.01)
entrez_ids = new_genes.gene

Generating CX
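
The internals of `diffusion` are not shown in this notebook. As a rough illustration of what a heat-diffusion ranking with a `time` parameter typically computes, the sketch below spreads heat from the seed nodes over a toy graph and ranks the non-seed nodes by the heat they receive. The function names, the toy edge list, and the truncated Taylor series for exp(-tL) (with L = D - A) are assumptions made for this example; it is not the code behind `diamond.diffusion`.

```python
def heat_diffusion(edges, seeds, time=0.01, terms=20):
    """Approximate exp(-time * L) applied to the seed vector, with L = D - A."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)

    # Initial heat: 1.0 on every seed node, 0.0 elsewhere.
    heat = {n: (1.0 if n in seeds else 0.0) for n in nodes}

    # Truncated Taylor series: sum over k of (-time * L)^k / k! applied to heat.
    term = dict(heat)
    result = dict(heat)
    for k in range(1, terms + 1):
        # next term = (-time / k) * L @ previous term
        term = {
            n: (-time / k)
            * (len(neighbors[n]) * term[n] - sum(term[m] for m in neighbors[n]))
            for n in nodes
        }
        for n in nodes:
            result[n] += term[n]
    return result


def rank_candidates(edges, seeds, n, time=0.01):
    """Return the n non-seed nodes with the highest diffused heat."""
    heat = heat_diffusion(edges, seeds, time=time)
    candidates = [g for g in heat if g not in seeds]
    # Sort by descending heat; break ties alphabetically for determinism.
    return sorted(candidates, key=lambda g: (-heat[g], g))[:n]


# Toy path graph A - B - C - D with a single seed (hypothetical node names).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_heat = heat_diffusion(toy_edges, {"A"})
toy_ranking = rank_candidates(toy_edges, {"A"}, 2)
```

With a small `time`, most of the heat stays on the seeds, so the ranking is dominated by nodes close to them in the network.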


## 3.2 Enrichment Analysis
First, let's get the official interactor symbols of our gene IDs, since that is what Enrichr needs. For the conversion, we can use the `GeneDB` class from our own `genes` [package](genes).

In [3]:
from genes import GeneDB

gene_db = GeneDB("data/biogrid.txt")

### Predicted Disease Genes

In [4]:
official_symbols = gene_db.to_official_symbol_interactor(entrez_ids)

Then let's copy the putative disease genes to our clipboard using `pyperclip`. That way we can just paste them into the [Enrichr](https://maayanlab.cloud/Enrichr/enrich) search form.

In [5]:
pyperclip.copy("\n".join(official_symbols))
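
Joining the symbols directly fails if the list contains a non-string entry. How `GeneDB` reports IDs it cannot map is not shown here, so the following is only a defensive sketch under the assumption that such entries come back as `None`, a float `NaN`, or an empty string. The helper name `format_gene_list` is hypothetical.

```python
import math


def format_gene_list(symbols):
    """Return newline-separated symbols, skipping missing entries and duplicates."""
    seen = set()
    cleaned = []
    for symbol in symbols:
        # Skip None and float NaN (assumed markers for unmapped IDs).
        if symbol is None:
            continue
        if isinstance(symbol, float) and math.isnan(symbol):
            continue
        text = str(symbol).strip()
        # Skip empty strings and symbols that were already emitted.
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return "\n".join(cleaned)


# Placeholder input to show the behaviour; not output from this notebook.
example_text = format_gene_list(["GLI3", None, " SHH ", "GLI3", float("nan"), ""])
```

The resulting string can be passed to `pyperclip.copy` in place of the plain `"\n".join(...)` call.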

### Known Disease Genes
Now let's do the same for the known polydactyly disease genes. After running this cell, paste the clipboard contents into the [Enrichr](https://maayanlab.cloud/Enrichr/enrich) search form.

In [8]:
disease_genes = pd.read_csv("seed_genes.txt", dtype=str)
known_genes_official_symbols = gene_db.to_official_symbol_interactor(disease_genes.iloc[:, 0])

pyperclip.copy("\n".join(known_genes_official_symbols))
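
By default, `pd.read_csv` treats the first line of the file as a header, so if `seed_genes.txt` has no header row, its first gene ID would be used as a column name and left out of the data. Whether the file has a header is not stated here. The sketch below makes that choice explicit using only the standard library; the helper name `read_first_column` and the sample IDs are placeholders.

```python
import io


def read_first_column(lines, has_header=True, sep=","):
    """Return the first field of every non-empty line, optionally skipping a header."""
    rows = [line.strip() for line in lines if line.strip()]
    if has_header:
        rows = rows[1:]
    return [row.split(sep)[0].strip() for row in rows]


# Placeholder file contents (not real seed genes), with and without a header row.
with_header = read_first_column(io.StringIO("gene_id\n1001\n1002\n"))
without_header = read_first_column(io.StringIO("1001\n1002\n"), has_header=False)
```

In pandas, the equivalent for a headerless file is to pass `header=None` to `pd.read_csv`.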

# 4.1 Drugs
We should copy the first 20 putative genes, but these don't give us any results, so we copy the first 100 putative genes instead and paste them into [DGIdb](https://old.dgidb.org/search_interactions).

In [6]:
pyperclip.copy("\n".join(official_symbols))
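
To switch between the top 20 and the top 100 genes without rerunning the diffusion, the list can be sliced before it is joined. This is a minimal sketch, assuming the symbols are stored in rank order (best candidate first), which is not confirmed here; the helper name `first_n_symbols` and the placeholder gene names are hypothetical.

```python
def first_n_symbols(symbols, n):
    """Return the first n entries of a ranked symbol list as newline-separated text."""
    return "\n".join(list(symbols)[:n])


# Placeholder names standing in for the 100 ranked symbols; not real results.
ranked = [f"GENE{i}" for i in range(1, 101)]
top_20 = first_n_symbols(ranked, 20)
top_100 = first_n_symbols(ranked, 100)
```

Passing `top_20` or `top_100` to `pyperclip.copy` then puts the chosen subset on the clipboard.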