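Notebook to annotate the list of differentially expressed genes (DEGs) from the comparison of 5 healthy and 5 sick RNA-seq libraries collected in summer 2021. The DEG list includes the library counts. The DEGs will be annotated with the BLAST output and with the UniProt Swiss-Prot GO table (uniprot-SP-GO).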

FILES:
1. BLAST output for 2015 Phel transcriptome: https://raw.githubusercontent.com/sr320/eimd-sswd/master/wd/Phel_uniprot_sprot.tab
2. Uniprot GO: http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-07-20/uniprot-SP-GO.sorted
3. DEGlist of healthy vs. sick: https://raw.githubusercontent.com/grace-ac/project_pycno/main/analyses/DESeq2/2015Phel/20230523-DEGlist_healthy-vs-sick_signif_5x5_counts.tab?token=GHSAT0AAAAAACC64DDE4B2NMR5W4HZQ6V7EZDNET4Q

In [1]:
pwd

'/Users/graciecrandall/Documents/GitHub/project_pycno/code'

In [2]:
wd = "../analyses/DESeq2/2015Phel/"

In [3]:
cd $wd

/Users/graciecrandall/Documents/GitHub/project_pycno/analyses/DESeq2/2015Phel


In [4]:
# curl the `blast` output for the Phel transcriptome from Up in Arms
!curl --insecure https://raw.githubusercontent.com/sr320/eimd-sswd/master/wd/Phel_uniprot_sprot.tab \
    -o 2015.Phel.BLASTx

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  837k  100  837k    0     0   818k      0  0:00:01  0:00:01 --:--:--  823k


In [5]:
# curl the sorted UniProt Swiss-Prot GO annotation table
!curl http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-07-20/uniprot-SP-GO.sorted -o uniprot-SP-GO.sorted

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3322  100  3322    0     0     32      0  0:01:43  0:01:43 --:--:--   830


In [6]:
# check that the curl-ed files are in the directory
!ls

2015.Phel.BLASTx
20230523-DEGlist_healthy-vs-sick_signif_5x5.tab
20230523-DEGlist_healthy-vs-sick_signif_5x5_counts.tab
20230523_DEGlist_healthy-vs-sick_5x5.tab
20230523_DEGs-clusters-annot.tab
20230523_healthyvsick-PCA.png
Blastquery-GOslim.tab
DEGlist_annot_4x4_healthyvsick.tab
DEGlist_annot_4x4_tankexposed_V_sick.tab
DEGlist_annot_8x8_healthyvsick.tab
DEGlist_healthy-vs-sick.tab
DEGlist_healthy-vs-sick_8x8.tab
DEGlist_healthy-vs-sick_signif.tab
DEGlist_healthy-vs-sick_signif_8x8.tab
DEGlist_tankexposed_v_sick_4x4.tab
DEGlist_tankexposed_v_sick_4x4_pval.tab
DEGs_LESS-healthy-annot.tab
DEGs_MORE-healthy-annot.tab
DEGs_healthy-v-sick_20221031.png
GO-GOslim.sorted
PCA_4x4_healthyVsick.png
PCA_4x4_tankExposedVsick.png
PCA_8x8_healthVsick_labeled.png
PCA_8x8_healthyVsick.png
_blast-GO-unfolded.sorted
_blast-GO-unfolded.tab
_blast-annot.tab
_blast-sep.tab
_blast-sort.tab
_intermediate.file
blastout
degs_annot_tankexposed_v_sick_BPonly.csv
degs_healthyVsick_4

In [7]:
# keep one BLAST hit per contig (unique on column 1) and save it as `blastout`
!sort -u -k1,1 --field-separator $'\t' 2015.Phel.BLASTx > blastout

In [8]:
!wc -l blastout

   10513 blastout
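As a cross-check of what the dedup step does, it can be sketched in Python. This is a minimal sketch, assuming `sort -u -k1,1` keeps the first line it encounters for each contig ID (column 1); `first_hit_per_query` is a hypothetical helper, not part of the pipeline. Because every remaining line belongs to a different contig, the `wc -l` count above is the number of contigs with at least one Swiss-Prot hit.

```python
def first_hit_per_query(lines):
    """Keep one BLAST hit per query ID (column 1) and return them sorted by ID.

    Assumes, like `sort -u -k1,1`, that the first line seen for a contig is
    the one kept; later lines with the same contig ID are dropped.
    """
    kept = {}
    for line in lines:
        query_id = line.split("\t", 1)[0]
        kept.setdefault(query_id, line)  # only the first hit per contig is stored
    # Code-point order on the key, as with `LC_ALL=C sort`.
    return [kept[q] for q in sorted(kept)]


# Tiny in-memory example; in practice the lines would come from 2015.Phel.BLASTx.
hits = [
    "Phel_contig_1000\tsp|Q8R4U2|PDIA1_CRIGR\t5e-146",
    "Phel_contig_100\tsp|Q16513|PKN2_HUMAN\t5e-162",
    "Phel_contig_100\tsp|EXAMPLE|MADE_UP\t1e-10",  # made-up second hit, same contig
]
unique_hits = first_hit_per_query(hits)
print(len(unique_hits))  # 2, the equivalent of `wc -l blastout`
```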


In [9]:
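# set the raw `blast` output filename as a Python variable
# (the shell commands below read the deduplicated file `blastout` directly, not this variable)
blastout="2015.Phel.BLASTx"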

In [10]:
!head -2 blastout

Phel_contig_100	sp|Q16513|PKN2_HUMAN	81.33	332	61	1	7935	6940	653	983	5e-162	  537
Phel_contig_1000	sp|Q8R4U2|PDIA1_CRIGR	53.62	442	201	2	199	1512	31	472	5e-146	  464
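Each line is one BLAST tabular hit. Assuming the file uses the default `-outfmt 6` columns (the 12 columns shown match that layout), a line can be parsed into named fields as below; `parse_outfmt6` is a hypothetical helper for illustration only.

```python
# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(line):
    """Turn one tab-separated BLAST hit into a dict keyed by outfmt 6 field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(OUTFMT6_FIELDS):
        raise ValueError(f"expected {len(OUTFMT6_FIELDS)} columns, got {len(values)}")
    hit = {}
    for name, value in zip(OUTFMT6_FIELDS, values):
        if name in FLOAT_FIELDS:
            hit[name] = float(value)  # float() also strips the padding in '  537'
        elif name in ("qseqid", "sseqid"):
            hit[name] = value  # contig ID and Swiss-Prot subject ID stay as text
        else:
            hit[name] = int(value)  # lengths, counts and alignment coordinates
    return hit


hit = parse_outfmt6(
    "Phel_contig_100\tsp|Q16513|PKN2_HUMAN\t81.33\t332\t61\t1"
    "\t7935\t6940\t653\t983\t5e-162\t  537"
)
print(hit["sseqid"], hit["evalue"], hit["bitscore"])
```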


In [11]:
# convert the pipes in the subject IDs (sp|ACCESSION|ENTRY_NAME) to tabs
!tr '|' '\t' < blastout \
> _blast-sep.tab

In [12]:
!head -2 _blast-sep.tab

Phel_contig_100	sp	Q16513	PKN2_HUMAN	81.33	332	61	1	7935	6940	653	983	5e-162	  537
Phel_contig_1000	sp	Q8R4U2	PDIA1_CRIGR	53.62	442	201	2	199	1512	31	472	5e-146	  464
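Splitting on the pipes turns the Swiss-Prot subject ID into three columns (database, accession, entry name), so each 12-column line becomes 14 columns. The sketch below mirrors the `tr` call with a hypothetical helper and shows why the next step picks awk columns `$3` (accession) and `$13` (e-value).

```python
def split_sseqid_columns(line):
    r"""Mimic `tr '|' '\t'`: turn every pipe into a tab and return the fields.

    'sp|Q16513|PKN2_HUMAN' becomes three columns, so a 12-column
    outfmt 6 line becomes 14 columns.
    """
    return line.rstrip("\n").replace("|", "\t").split("\t")


example = (
    "Phel_contig_100\tsp|Q16513|PKN2_HUMAN\t81.33\t332\t61\t1"
    "\t7935\t6940\t653\t983\t5e-162\t  537"
)
fields = split_sseqid_columns(example)
print(len(fields))             # 14
print(fields[2], fields[12])   # awk's $3 and $13 (Python is 0-based)
```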


In [14]:
# keep the accession ($3), contig ID ($1), and e-value ($13) columns, then sort by accession
!awk -v OFS='\t' '{print $3, $1, $13}' < _blast-sep.tab | sort \
> _blast-sort.tab

In [15]:
!head -2 _blast-sort.tab

A0AUR5	Phel_contig_24211	9e-67
A0AVT1	Phel_contig_12160	0.0
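The column selection and sort can be sketched as follows, assuming rows already split into 14 fields as in the previous step; `accession_table` is a hypothetical helper. Python sorts strings by code point, which matches `LC_ALL=C sort`; the notebook's `sort` follows the shell locale, which can order some characters differently, though UniProt accessions (uppercase letters and digits) should come out in the same order.

```python
def accession_table(split_rows):
    r"""Mimic `awk -v OFS='\t' '{print $3, $1, $13}' | sort`.

    `split_rows` are lists of the 14 fields produced by the pipe-to-tab step.
    Keeps accession ($3), contig ID ($1) and e-value ($13) and sorts the
    lines so the accession column is in the order `join` expects.
    """
    lines = ["\t".join((fields[2], fields[0], fields[12])) for fields in split_rows]
    return sorted(lines)  # code-point order, like `LC_ALL=C sort`


rows = [
    ("Phel_contig_1000\tsp\tQ8R4U2\tPDIA1_CRIGR\t53.62\t442\t201\t2"
     "\t199\t1512\t31\t472\t5e-146\t  464").split("\t"),
    ("Phel_contig_100\tsp\tQ16513\tPKN2_HUMAN\t81.33\t332\t61\t1"
     "\t7935\t6940\t653\t983\t5e-162\t  537").split("\t"),
]
for line in accession_table(rows):
    print(line)
```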


In [13]:
!head -2 uniprot-SP-GO.sorted

A0A023GPI8	LECA_CANBL	reviewed	Lectin alpha chain (CboL) [Cleaved into: Lectin beta chain; Lectin gamma chain]		Canavalia boliviana	237			mannose binding [GO:0005537]; metal ion binding [GO:0046872]	mannose binding [GO:0005537]; metal ion binding [GO:0046872]	GO:0005537; GO:0046872
A0A023GPJ0	CDII_ENTCC	reviewed	Immunity protein CdiI	cdiI ECL_04450.1	Enterobacter cloacae subsp. cloacae (strain ATCC 13047 / DSM 30054 / NBRC 13535 / NCDC 279-56)	145					
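Judging from the two rows shown, column 1 of `uniprot-SP-GO.sorted` is the UniProt accession and the last (12th) column lists GO IDs separated by `; `, empty when an entry has no GO annotation. Under that assumption, a hypothetical helper can pull out the GO IDs for each accession; the example rows are shortened copies of the rows above.

```python
def go_ids(line):
    """Return (accession, list of GO IDs) for one row of uniprot-SP-GO.sorted.

    Assumes 12 tab-separated columns with the accession first and the GO IDs
    last, separated by '; '; the GO column is empty for unannotated entries.
    """
    fields = line.rstrip("\n").split("\t")
    go_column = fields[11] if len(fields) > 11 else ""
    ids = [term.strip() for term in go_column.split(";") if term.strip()]
    return fields[0], ids


annotated = (
    "A0A023GPI8\tLECA_CANBL\treviewed\tLectin alpha chain\t\tCanavalia boliviana\t237"
    "\t\t\tmannose binding\tmannose binding\tGO:0005537; GO:0046872"
)
unannotated = (
    "A0A023GPJ0\tCDII_ENTCC\treviewed\tImmunity protein CdiI\tcdiI ECL_04450.1"
    "\tEnterobacter cloacae\t145\t\t\t\t\t"
)
print(go_ids(annotated))
print(go_ids(unannotated))
```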


In [16]:
# join the BLAST hits with the UniProt annotation file on the accession (column 1 of both)
!join -t $'\t' \
_blast-sort.tab \
uniprot-SP-GO.sorted \
> _blast-annot.tab

In [17]:
!head -2 _blast-annot.tab

A0AUR5	Phel_contig_24211	9e-67	MINY3_DANRE	reviewed	Ubiquitin carboxyl-terminal hydrolase MINDY-3 (EC 3.4.19.12) (Deubiquitinating enzyme MINDY-3) (Protein CARP)	mindy3 carp fam188a zgc:153892	Danio rerio (Zebrafish) (Brachydanio rerio)	446	apoptotic process [GO:0006915]	nucleus [GO:0005634]	thiol-dependent ubiquitinyl hydrolase activity [GO:0036459]	nucleus [GO:0005634]; thiol-dependent ubiquitinyl hydrolase activity [GO:0036459]; apoptotic process [GO:0006915]	GO:0005634; GO:0006915; GO:0036459
A0AVT1	Phel_contig_12160	0.0	UBA6_HUMAN	reviewed	Ubiquitin-like modifier-activating enzyme 6 (Ubiquitin-activating enzyme 6) (EC 6.2.1.45) (Monocyte protein 4) (MOP-4) (Ubiquitin-activating enzyme E1-like protein 2) (E1-L2)	UBA6 MOP4 UBE1L2	Homo sapiens (Human)	1052	amygdala development [GO:0021764]; cellular response to DNA damage stimulus [GO:0006974]; dendritic spine development [GO:0060996]; hippocampus development [GO:0021766]; learning [GO:0007612]; locomotory behavior [GO:0007626]; pro
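`join` matches rows on column 1 of each file (the accession) and expects both inputs sorted on that key, which is why `_blast-sort.tab` was sorted first; with unsorted input it can silently miss matches. A dict-based Python equivalent is sketched below, assuming both files are tab-separated with the key in column 1 (`join_on_first_column` is a hypothetical name). It performs the same inner join but does not depend on sorted input.

```python
from collections import defaultdict


def join_on_first_column(left_lines, right_lines):
    r"""Inner join of two tab-separated tables on column 1, like `join -t $'\t'`.

    Each output row is: key, the rest of the left row, the rest of the right row.
    Keys missing from either table are dropped; repeated keys give every pairing.
    """
    right_by_key = defaultdict(list)
    for line in right_lines:
        key, _, rest = line.rstrip("\n").partition("\t")
        right_by_key[key].append(rest)

    joined = []
    for line in left_lines:  # left order is kept, so sorted input stays sorted
        key, _, rest = line.rstrip("\n").partition("\t")
        for right_rest in right_by_key.get(key, []):
            joined.append("\t".join((key, rest, right_rest)))
    return joined


blast = [
    "A0AUR5\tPhel_contig_24211\t9e-67",
    "NOHIT1\tPhel_contig_1\t1e-5",  # made-up accession with no UniProt row
]
uniprot = ["A0AUR5\tMINY3_DANRE\treviewed", "A0A023GPI8\tLECA_CANBL\treviewed"]
for row in join_on_first_column(blast, uniprot):
    print(row)
```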