# TCRD Target Visualizations

I am most interested in Tdark targets that are on the brink of advancing to a new level, because discovering things is the exciting part, so those targets are the focus of this notebook.

In [78]:
# Importing what I will be using
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style

targets = pd.read_pickle('Data/pharos_data')
generif = pd.read_pickle('Data/generif')
ortholog = pd.read_pickle('Data/ortholog')
pm_scores = pd.read_pickle('Data/pm_scores')
pt_scores = pd.read_pickle('Data/pt_scores')
ab_info = pd.read_pickle('Data/tdl_info')
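# Note: matplotlib 3.6 renamed this style to 'seaborn-v0_8-poster';
# use that name if 'seaborn-poster' is not found.
style.use('seaborn-poster')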


The defined goals of the project are to find Tdark targets that could be ready to reach a new TDL, and then to find new features that can also indicate that a target is close to a new TDL.

To start, I will look for targets that are close to reaching a new TDL. I will then compare those targets to each other in the TCRD DB to see whether they share trends, which could point to new features that also signal a coming change in TDL.
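One way to start on that first step is to rank proteins by how much evidence has accumulated for them. The sketch below is only an illustration under stated assumptions: the helper name `rank_by_total_score`, the choice of signals (summed score and GeneRIF count), and the toy values are all hypothetical, and none of it reflects the actual TDL criteria. It assumes frames shaped like `pm_scores` and `generif`, with a `protein_id` column and, for the scores, a `score` column.

```python
import pandas as pd

def rank_by_total_score(scores, generif, top_n=10):
    """Hypothetical helper: rank proteins by summed score, then GeneRIF count."""
    # Sum the yearly scores for each protein.
    total = scores.groupby("protein_id")["score"].sum().rename("total_score")
    # Count GeneRIF rows for each protein.
    rif_count = generif.groupby("protein_id").size().rename("generif_count")
    # Outer-join the two summaries; proteins missing from one side get 0.
    summary = pd.concat([total, rif_count], axis=1).fillna(0)
    summary["generif_count"] = summary["generif_count"].astype(int)
    return summary.sort_values(
        ["total_score", "generif_count"], ascending=False
    ).head(top_n)

# Toy data, made up for illustration only.
toy_scores = pd.DataFrame({
    "protein_id": [1, 1, 2],
    "year": [2010, 2012, 2019],
    "score": [0.5, 1.0, 4.0],
})
toy_generif = pd.DataFrame({
    "protein_id": [1, 1, 3],
    "text": ["study a", "study b", "study c"],
})

ranked = rank_by_total_score(toy_scores, toy_generif)
print(ranked)
```

On the real data the same call would take the filtered `pm_scores` and `generif` frames. Whether a simple sum is the right way to combine yearly scores is an open question that the visualizations should help answer.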

In [79]:
# Extracting only relevant information

targets = targets[targets['tdl'] == 'Tdark'].reset_index(drop=True)

pm_scores = pm_scores[pm_scores['protein_id'].isin(targets['id'])].reset_index(drop=True)
pt_scores = pt_scores[pt_scores['protein_id'].isin(targets['id'])].reset_index(drop=True)
generif = generif[generif['protein_id'].isin(targets['id'])].reset_index(drop=True)
ortholog = ortholog[ortholog['protein_id'].isin(targets['id'])].reset_index(drop=True)

I have filtered every data set so that it only contains information on Tdark proteins, which will make the data easier to compare and visualize.
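The same filter is repeated for each table, so it could be wrapped in a small function and checked. This is a minimal sketch on toy data. The helper name `filter_to_targets` and the toy values are assumptions for illustration. Only the `isin` plus `reset_index(drop=True)` pattern comes from the cell above.

```python
import pandas as pd

def filter_to_targets(table, target_ids, key="protein_id"):
    """Keep only rows of `table` whose `key` value appears in `target_ids`."""
    return table[table[key].isin(target_ids)].reset_index(drop=True)

# Toy stand-ins for the real frames (values are made up for illustration).
toy_targets = pd.DataFrame({"id": [2, 3, 5], "tdl": ["Tdark"] * 3})
toy_scores = pd.DataFrame({
    "protein_id": [3, 3, 7, 5],
    "year": [2010, 2012, 2015, 2019],
    "score": [0.5, 1.0, 2.0, 0.25],
})

toy_filtered = filter_to_targets(toy_scores, toy_targets["id"])

# Sanity check: no remaining row should point at a protein outside the targets.
leftover = set(toy_filtered["protein_id"]) - set(toy_targets["id"])
print(len(toy_filtered), leftover)
```

An empty `leftover` set confirms that every remaining row belongs to a protein in the target table.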

In [80]:
targets.head()

Unnamed: 0,id,name,ttype,description,comment,tdl,idg,fam,famext
0,2,Uncharacterized protein C7orf77,Single Protein,,,Tdark,0,,
1,3,Uncharacterized protein C8orf34,Single Protein,,,Tdark,0,,
2,5,Uncharacterized protein C8orf76,Single Protein,,,Tdark,0,,
3,10,Cyclic nucleotide-binding domain-containing pr...,Single Protein,,,Tdark,0,,
4,11,Cyclic nucleotide-binding domain-containing pr...,Single Protein,,,Tdark,0,,


In [81]:
pm_scores.head()

Unnamed: 0,id,protein_id,year,score
0,476,16473,2010,0.666667
1,477,16473,2012,0.8
2,478,16473,2015,1.0
3,479,16473,2019,0.033333
4,708,14553,2019,1.607143


In [82]:
pt_scores.head()

Unnamed: 0,id,protein_id,year,score
0,258,2594,2015,0.333333
1,259,2594,2016,0.166667
2,260,2594,2018,0.142857
3,261,9038,1974,1.0
4,262,9038,1975,1.0


In [83]:
generif.head()

Unnamed: 0,id,protein_id,pubmed_ids,text,years
0,68,5,20379614,Clinical trial of gene-disease association and...,
1,109,10,20379614,Clinical trial of gene-disease association and...,
2,110,10,18825932,Observational study of gene-disease associatio...,2008
3,111,12,20877624|16385451,Observational study of gene-disease associatio...,2010|2006
4,255,26,19864490,Consortin is a trans-Golgi network cargo recep...,2010


In [84]:
ortholog.head()

Unnamed: 0,id,protein_id,taxid,species,db_id,geneid,symbol,name,mod_url,sources
0,178771,3,9598,Chimp,VGNC:3030,464223.0,C8H8orf34,chromosome 8 C8orf34 homolog,,"Inparanoid, OMA, EggNOG"
1,178772,3,9544,Macaque,,705490.0,C8H8orf34,"chromosome 8 open reading frame, human C8orf34",,"Inparanoid, OMA, EggNOG"
2,178773,3,10090,Mouse,MGI:2444149,320492.0,A830018L16Rik,RIKEN cDNA A830018L16 gene,http://www.informatics.jax.org/marker/MGI:2444149,"Inparanoid, OMA, EggNOG"
3,178774,3,10116,Rat,RGD:1564053,500390.0,RGD1564053,similar to hypothetical protein,http://rgd.mcw.edu/rgdweb/report/gene/main.htm...,"Inparanoid, OMA, EggNOG"
4,178775,3,9615,Dog,VGNC:52544,477905.0,C29H8orf34,chromosome 29 C8orf34 homolog,,"Inparanoid, OMA, EggNOG"


# Looking at Tdark proteins