# miRNA Sequencing

2019.03.13

## Check the miRNA-target GO

In this experiment we will perform a GO analysis using the GO terms of the genes targeted by the active miRNAs.

As always, we will use only the PNRD mature datasets.

Targets will be predicted with [psRNATarget](http://plantgrn.noble.org/psRNATarget/analysis) (submit small RNAs) using a maximum expectation of 3.
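The `*_targets` cells collapse psRNATarget's transcript-level hits to unique gene IDs and rewrite `VIT_2…` IDs as `VIT_…`. The sketch below does the same step as one reusable function. It rests on three assumptions: the first line of the export is a comment, which is why the cells use `header=1`; `Target_Acc.` holds IDs of the form `GENE.SUFFIX`; and the prefix rewrite converts grapevine V2-style gene IDs to the older form, presumably so PANTHER recognises them. That last point is our reading and is not stated in the notebook. The function name and the example rows are hypothetical. Unlike the original `str.replace` on the joined text, this version only rewrites the prefix at the start of each ID.

```python
import io
import re

import pandas as pd


def target_genes(psrnatarget_tsv, old_prefix='VIT_2', new_prefix='VIT_'):
    """Collapse psRNATarget transcript hits to a de-duplicated list of gene IDs."""
    # skiprows=1 drops the leading comment line (same effect as header=1 in the cells).
    targets = pd.read_csv(psrnatarget_tsv, sep='\t', skiprows=1)
    # 'VIT_201s0011g00010.t01' -> 'VIT_201s0011g00010'
    genes = targets['Target_Acc.'].str.split('.', n=1).str[0]
    # Rewrite the prefix only at the start of each ID.
    genes = genes.str.replace('^' + re.escape(old_prefix), new_prefix, regex=True)
    return genes.drop_duplicates().tolist()


# Hypothetical export: two hits on the same transcript, one on another gene.
example = io.StringIO(
    '# psRNATarget output (hypothetical)\n'
    'miRNA_Acc.\tTarget_Acc.\tExpectation\n'
    'vvi-miR156a\tVIT_201s0011g00010.t01\t1.5\n'
    'vvi-miR156b\tVIT_201s0011g00010.t01\t2.0\n'
    'vvi-miR166a\tVIT_214s0066g01090.t01\t2.5\n'
)
print(target_genes(example))  # ['VIT_01s0011g00010', 'VIT_14s0066g01090']
```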

GO enrichment will be performed with [PANTHER](http://pantherdb.org/) using the *Statistical overrepresentation test* with default settings. PANTHER only reports terms with FDR < 0.05.
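Each sample needs three PANTHER exports (biological process, cellular component, molecular function), and every export goes through the same load, rename, colour and sort steps. The sketch below puts those steps into one helper. It assumes the export layout the cells rely on: a preamble (`header=6` in the real files) followed by eight tab-separated columns. It also reads the GO ID from the final `(GO:nnnnnnn)` group, so term names that contain their own parentheses still parse. The FDR filter is only a defensive re-check, because PANTHER should already have applied it. `load_panther` and the example rows are hypothetical.

```python
import io

import pandas as pd

PANTHER_COLUMNS = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                   'Fold Enrichment', 'P-value', 'FDR']


def load_panther(path_or_buffer, color, header=6, fdr_cutoff=0.05):
    """Load one PANTHER overrepresentation export and tidy it for plotting."""
    table = pd.read_csv(path_or_buffer, sep='\t', header=header)
    table.columns = PANTHER_COLUMNS
    # Drop the catch-all row, which carries no GO identifier.
    table = table[table['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
    # Anchor on the final '(GO:...)' so names with their own parentheses still parse.
    table['GO_term'] = table['GO'].str.extract(r'\((GO:\d+)\)\s*$', expand=False)
    table['color'] = color
    # Defensive: PANTHER should already have applied this cut-off.
    table = table[pd.to_numeric(table['FDR'], errors='coerce') < fdr_cutoff]
    return table.sort_values('Fold Enrichment')


# Hypothetical export with a 2-line preamble (the real exports use header=6).
lines = [
    'Analysis type: hypothetical example',
    'Reference list: hypothetical',
    'term\tref\tsample\texpected\tdirection\tfold\tp\tfdr',
    'sulfate adenylyltransferase (ATP) activity (GO:0004781)\t10\t4\t0.5\t+\t8.0\t1.0E-04\t2.0E-02',
    'photosynthesis (GO:0015979)\t100\t12\t2.0\t+\t6.0\t1.0E-06\t1.0E-03',
    'Unclassified (UNCLASSIFIED)\t5000\t20\t40.0\t-\t0.5\t1.0E-02\t4.0E-02',
]
example = load_panther(io.StringIO('\n'.join(lines)), '#CEC3A8', header=2)
print(example[['GO_term', 'Fold Enrichment']])
```

In the notebook this would be called as, for example, `load_panther('panther_fb_bp.txt', biological_process)`.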


### Table of contents

* [Loading required modules and data](#Loading-required-modules-and-data)
* [Female](#Female)
* [Male](#Male)
* [Hermaphrodite](#Hermaphrodite)


### Loading required modules and data

([go to top](#miRNA-Sequencing))

In [None]:
image_size = (8, 6)

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

from matplotlib.lines import Line2D

In [None]:
biological_process = '#CEC3A8'
celular_component = '#B23831'
molecular_function = '#532F35'

custom_lines = [Line2D([0], [0], color=biological_process, lw=4),
                Line2D([0], [0], color=celular_component, lw=4),
                Line2D([0], [0], color=molecular_function, lw=4)]

legend_labbels = ['Biological process', 'Cellular component', 'Molecular function']

Data will be imported from the existing table.

As we only want the PNRD mature counts, we will drop all the other columns and rename the remaining ones to clearer names.

In [None]:
mirna = pd.read_csv('../2019_miRNA_sequencing/all_miRNAs.csv',
                    sep = '\t',
                    header = 0,
                    index_col = 0)

mirna = mirna.drop(columns = ['FB_pnrd_m', 'FEF_pnrd_m', 'FH_pnrd_m', 'MB_pnrd_m', 'MEF_pnrd_m',
       'MH_pnrd_m', 'TNB_pnrd_m', 'TNEF_pnrd_m', 'TNH_pnrd_m', 'FB_mirbase_m',
       'FEF_mirbase_m', 'FH_mirbase_m', 'MB_mirbase_m', 'MEF_mirbase_m',
       'MH_mirbase_m', 'TNB_mirbase_m', 'TNEF_mirbase_m', 'TNH_mirbase_m',
       'FB_mirbase_p', 'FEF_mirbase_p', 'FH_mirbase_p', 'MB_mirbase_p',
       'MEF_mirbase_p', 'MH_mirbase_p', 'TNB_mirbase_p', 'TNEF_mirbase_p',
       'TNH_mirbase_p', 'relevant'])

mirna.columns = ['FB', 'FEF', 'FH', 'MB', 'MEF', 'MH', 'TNB', 'TNEF', 'TNH']
mirna.head()
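The cell above drops 28 columns by name and then renames the remaining ones by position. If the column order in `all_miRNAs.csv` ever changed, samples would be mislabelled without any warning. The sketch below drops the same columns by pattern and refuses to rename unless exactly nine columns remain. It assumes that every dropped column ends in `_pnrd_m`, `_mirbase_m` or `_mirbase_p`, or is `relevant`, which holds for the list above. It also assumes the remaining columns are already in the order FB…TNH, just as the original rename does. The function name and the `_norm` columns in the example are hypothetical.

```python
import re

import pandas as pd

# Matches every column the cell above drops: three per-database suffixes
# plus the 'relevant' flag column.
DROP_PATTERN = r'(?:_pnrd_m|_mirbase_m|_mirbase_p)$|^relevant$'
SAMPLES = ['FB', 'FEF', 'FH', 'MB', 'MEF', 'MH', 'TNB', 'TNEF', 'TNH']


def keep_sample_columns(table):
    """Drop the per-database columns and label the remaining ones by sample."""
    to_drop = [c for c in table.columns if re.search(DROP_PATTERN, c)]
    kept = table.drop(columns=to_drop)
    # Renaming is positional, so refuse to guess if the column count is off.
    if kept.shape[1] != len(SAMPLES):
        raise ValueError(f'expected {len(SAMPLES)} columns, got {list(kept.columns)}')
    kept.columns = SAMPLES
    return kept


# Hypothetical table: nine '<sample>_norm' columns plus three to be dropped.
names = [s + '_norm' for s in SAMPLES] + ['FB_pnrd_m', 'FB_mirbase_p', 'relevant']
demo = pd.DataFrame([list(range(12))], columns=names, index=['vvi-mir156a'])
print(keep_sample_columns(demo))
```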

### Female

([go to top](#miRNA-Sequencing))

In this segment we analyze the miRNAs that are expressed (normalized count > 10) in the female samples.
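Each section repeats the same two steps for its three samples: it keeps the miRNAs whose normalized count is above 10 and writes them out with the `vvi-miR` spelling. A minimal sketch of both steps as helpers follows. It assumes `counts` is shaped like `mirna`, with miRNA IDs as the index and one normalized-count column per sample. It uses a strict `>` to match the filtering cells. The helper names and the toy counts are hypothetical.

```python
import pandas as pd


def active_mirnas(counts, samples, min_count=10):
    """Map each sample to the miRNAs whose normalized count exceeds min_count."""
    # Strict '>' to match the filtering cells in this notebook.
    return {s: counts[counts[s] > min_count].index.tolist() for s in samples}


def write_mirna_list(mirna_ids, path):
    """Write one ID per line, switching 'vvi-mir' to the 'vvi-miR' spelling."""
    with open(path, 'w') as handle:
        handle.write('\n'.join(m.replace('vvi-mir', 'vvi-miR') for m in mirna_ids))


# Hypothetical normalized counts for two samples.
counts = pd.DataFrame({'FB': [250.0, 10.0, 3.0], 'FEF': [40.0, 12.5, 0.0]},
                      index=['vvi-mir156a', 'vvi-mir166c', 'vvi-mir3627'])
active = active_mirnas(counts, ['FB', 'FEF'])
print(active)  # {'FB': ['vvi-mir156a'], 'FEF': ['vvi-mir156a', 'vvi-mir166c']}
```

A count of exactly 10 is excluded, as in the cells. In the notebook, `write_mirna_list(active['FB'], 'mirnas_active_fb.txt')` would reproduce the per-sample file.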

In [None]:
fb_active_mirnas = mirna[mirna['FB'] > 10].index
fef_active_mirnas = mirna[mirna['FEF'] > 10].index
fh_active_mirnas = mirna[mirna['FH'] > 10].index

print('''
    Number of miRNAs on FB: {}
    Number of miRNAs on FEF: {}
    Number of miRNAs on FH: {}
    '''.format(len(fb_active_mirnas), len(fef_active_mirnas), len(fh_active_mirnas)))

#### Female B

Write the list of miRNAs active in this sample.

In [None]:
with open('mirnas_active_fb.txt', 'w') as file:
    file.write("\n".join(fb_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
fb_targets = pd.read_csv('psRNATarget_fb.txt',
                         sep = '\t',
                         header = 1
                        )

fb_targets['gene_name'] = [target.split('.')[0] for target in fb_targets['Target_Acc.'].tolist()]

fb_targets = fb_targets[['gene_name']].drop_duplicates()

with open('targets_fb.txt', 'w') as file:
    file.write("\n".join(fb_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
fb_biological_process = pd.read_csv('panther_fb_bp.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fb_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fb_biological_process['color'] = biological_process
fb_biological_process = fb_biological_process.sort_values('Fold Enrichment')

In [None]:
fb_biological_process

In [None]:
fb_celular_component = pd.read_csv('panther_fb_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fb_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fb_celular_component['color'] = celular_component
fb_celular_component = fb_celular_component.sort_values('Fold Enrichment')

In [None]:
fb_molecular_function = pd.read_csv('panther_fb_mf.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fb_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fb_molecular_function['color'] = molecular_function
fb_molecular_function = fb_molecular_function.sort_values('Fold Enrichment')

In [None]:
fb_go = pd.concat([fb_biological_process,
                   fb_celular_component,
                   fb_molecular_function
                  ],
                  ignore_index=True
                 )

fb_go = fb_go[fb_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
# Take the last '(...)' group so term names that contain parentheses still yield the GO ID
fb_go['GO_term'] = [go[-1].replace(')', '') for go in fb_go.GO.str.split('(').tolist()]

In [None]:
# Prepare plots: apply tight_layout automatically so tick labels and legends fit in the saved figures
from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})

In [None]:
plt.figure(figsize = image_size)
plt.barh(fb_go['GO_term'], fb_go['Fold Enrichment'], color = fb_go['color'])
plt.title('Female B')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_fb.png', dpi=600)
plt.show()
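The plotting cell is repeated nine times with only the table, title and file name changing. The sketch below wraps it in a helper that uses matplotlib's object-oriented API and returns the figure, so the caller decides whether to `plt.show()` it. It assumes each table has the `GO_term`, `Fold Enrichment` and `color` columns built above. The colours are copied from the notebook. `plot_go` and the example data are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

ASPECT_COLORS = {
    'Biological process': '#CEC3A8',
    'Cellular component': '#B23831',
    'Molecular function': '#532F35',
}


def plot_go(go_df, title, path=None, figsize=(8, 6), dpi=600):
    """Horizontal bar chart of fold enrichment, one bar per GO term."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.barh(go_df['GO_term'], go_df['Fold Enrichment'], color=go_df['color'].tolist())
    ax.set_title(title)
    ax.set_xlabel('Fold Enrichment')
    handles = [Line2D([0], [0], color=c, lw=4) for c in ASPECT_COLORS.values()]
    # barh draws the first rows at the bottom, so reverse the legend to list
    # the aspects top-to-bottom in the same order as the bars.
    ax.legend(handles[::-1], list(ASPECT_COLORS)[::-1])
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=dpi)
    return fig


# Hypothetical two-term table.
example = pd.DataFrame({
    'GO_term': ['GO:0015979', 'GO:0004781'],
    'Fold Enrichment': [6.0, 8.0],
    'color': ['#CEC3A8', '#532F35'],
})
fig = plot_go(example, 'Example')
```

In the notebook, `plot_go(fb_go, 'Female B', 'plot_fb.png')` followed by `plt.show()` would match the cell above.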

#### Female E/F

Write the list of miRNAs active at stage E/F in females.

In [None]:
with open('mirnas_active_fef.txt', 'w') as file:
    file.write("\n".join(fef_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
fef_targets = pd.read_csv('psRNATarget_fef.txt',
                         sep = '\t',
                         header = 1
                        )

fef_targets['gene_name'] = [target.split('.')[0] for target in fef_targets['Target_Acc.'].tolist()]

fef_targets = fef_targets[['gene_name']].drop_duplicates()

with open('targets_fef.txt', 'w') as file:
    file.write("\n".join(fef_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
fef_biological_process = pd.read_csv('panther_fef_bp.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fef_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fef_biological_process['color'] = biological_process
fef_biological_process = fef_biological_process.sort_values('Fold Enrichment')

In [None]:
fef_celular_component = pd.read_csv('panther_fef_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fef_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fef_celular_component['color'] = celular_component
fef_celular_component = fef_celular_component.sort_values('Fold Enrichment')

In [None]:
fef_molecular_function = pd.read_csv('panther_fef_mf.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fef_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fef_molecular_function['color'] = molecular_function
fef_molecular_function = fef_molecular_function.sort_values('Fold Enrichment')

In [None]:
fef_go = pd.concat([fef_biological_process,
                    fef_celular_component,
                    fef_molecular_function
                   ],
                   ignore_index=True
                  )

fef_go = fef_go[fef_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
fef_go['GO_term'] = [go[-1].replace(')', '') for go in fef_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(fef_go['GO_term'], fef_go['Fold Enrichment'], color = fef_go['color'])
plt.title('Female E/F')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_fef.png', dpi=600)
plt.show()

#### Female H

Write the list of miRNAs active at the last stage (H) in females.

In [None]:
with open('mirnas_active_fh.txt', 'w') as file:
    file.write("\n".join(fh_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
fh_targets = pd.read_csv('psRNATarget_fh.txt',
                         sep = '\t',
                         header = 1
                        )

fh_targets['gene_name'] = [target.split('.')[0] for target in fh_targets['Target_Acc.'].tolist()]

fh_targets = fh_targets[['gene_name']].drop_duplicates()

with open('targets_fh.txt', 'w') as file:
    file.write("\n".join(fh_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
fh_biological_process = pd.read_csv('panther_fh_bp.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fh_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fh_biological_process['color'] = biological_process
fh_biological_process = fh_biological_process.sort_values('Fold Enrichment')

In [None]:
fh_celular_component = pd.read_csv('panther_fh_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fh_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fh_celular_component['color'] = celular_component
fh_celular_component = fh_celular_component.sort_values('Fold Enrichment')

In [None]:
fh_molecular_function = pd.read_csv('panther_fh_mf.txt',
                                    sep = '\t',
                                    header = 6
                                   )

fh_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

fh_molecular_function['color'] = molecular_function
fh_molecular_function = fh_molecular_function.sort_values('Fold Enrichment')

In [None]:
fh_go = pd.concat([fh_biological_process,
                   fh_celular_component,
                   fh_molecular_function
                  ],
                  ignore_index=True
                 )

fh_go = fh_go[fh_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
fh_go['GO_term'] = [go[-1].replace(')', '') for go in fh_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(fh_go['GO_term'], fh_go['Fold Enrichment'], color = fh_go['color'])
plt.title('Female H')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_fh.png', dpi=600)
plt.show()

### Male

([go to top](#miRNA-Sequencing))

In this segment we analyze the miRNAs that are expressed (normalized count > 10) in the male samples.

In [None]:
mb_active_mirnas = mirna[mirna['MB'] > 10].index
mef_active_mirnas = mirna[mirna['MEF'] > 10].index
mh_active_mirnas = mirna[mirna['MH'] > 10].index

print('''
    Number of miRNAs on MB: {}
    Number of miRNAs on MEF: {}
    Number of miRNAs on MH: {}
    '''.format(len(mb_active_mirnas), len(mef_active_mirnas), len(mh_active_mirnas)))

#### Male B

Write the list of miRNAs active in this sample.

In [None]:
with open('mirnas_active_mb.txt', 'w') as file:
    file.write("\n".join(mb_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
mb_targets = pd.read_csv('psRNATarget_mb.txt',
                         sep = '\t',
                         header = 1
                        )

mb_targets['gene_name'] = [target.split('.')[0] for target in mb_targets['Target_Acc.'].tolist()]

mb_targets = mb_targets[['gene_name']].drop_duplicates()

with open('targets_mb.txt', 'w') as file:
    file.write("\n".join(mb_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
mb_biological_process = pd.read_csv('panther_mb_bp.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mb_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mb_biological_process['color'] = biological_process
mb_biological_process = mb_biological_process.sort_values('Fold Enrichment')

In [None]:
mb_celular_component = pd.read_csv('panther_mb_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mb_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mb_celular_component['color'] = celular_component
mb_celular_component = mb_celular_component.sort_values('Fold Enrichment')

In [None]:
mb_molecular_function = pd.read_csv('panther_mb_mf.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mb_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mb_molecular_function['color'] = molecular_function
mb_molecular_function = mb_molecular_function.sort_values('Fold Enrichment')

In [None]:
mb_go = pd.concat([mb_biological_process,
                   mb_celular_component,
                   mb_molecular_function
                  ],
                  ignore_index=True
                 )

mb_go = mb_go[mb_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
mb_go['GO_term'] = [go[-1].replace(')', '') for go in mb_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(mb_go['GO_term'], mb_go['Fold Enrichment'], color = mb_go['color'])
plt.title('Male B')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_mb.png', dpi=600)
plt.show()

#### Male E/F

Write the list of miRNAs active at stage E/F in males.

In [None]:
with open('mirnas_active_mef.txt', 'w') as file:
    file.write("\n".join(mef_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
mef_targets = pd.read_csv('psRNATarget_mef.txt',
                         sep = '\t',
                         header = 1
                        )

mef_targets['gene_name'] = [target.split('.')[0] for target in mef_targets['Target_Acc.'].tolist()]

mef_targets = mef_targets[['gene_name']].drop_duplicates()

with open('targets_mef.txt', 'w') as file:
    file.write("\n".join(mef_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
mef_biological_process = pd.read_csv('panther_mef_bp.txt',
                                     sep = '\t',
                                     header = 6
                                    )

mef_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                  'Fold Enrichment', 'P-value', 'FDR'
                                 ]

mef_biological_process['color'] = biological_process
mef_biological_process = mef_biological_process.sort_values('Fold Enrichment')

In [None]:
mef_celular_component = pd.read_csv('panther_mef_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mef_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mef_celular_component['color'] = celular_component
mef_celular_component = mef_celular_component.sort_values('Fold Enrichment')

In [None]:
mef_molecular_function = pd.read_csv('panther_mef_mf.txt',
                                     sep = '\t',
                                     header = 6
                                    )

mef_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                  'Fold Enrichment', 'P-value', 'FDR'
                                 ]

mef_molecular_function['color'] = molecular_function
mef_molecular_function = mef_molecular_function.sort_values('Fold Enrichment')

In [None]:
mef_go = pd.concat([mef_biological_process,
                    mef_celular_component,
                    mef_molecular_function
                   ],
                   ignore_index=True
                  )

mef_go = mef_go[mef_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
mef_go['GO_term'] = [go[-1].replace(')', '') for go in mef_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(mef_go['GO_term'], mef_go['Fold Enrichment'], color = mef_go['color'])
plt.title('Male E/F')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_mef.png', dpi=600)
plt.show()

#### Male H

Write the list of miRNAs active at the last stage (H) in males.

In [None]:
with open('mirnas_active_mh.txt', 'w') as file:
    file.write("\n".join(mh_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
mh_targets = pd.read_csv('psRNATarget_mh.txt',
                         sep = '\t',
                         header = 1
                        )

mh_targets['gene_name'] = [target.split('.')[0] for target in mh_targets['Target_Acc.'].tolist()]

mh_targets = mh_targets[['gene_name']].drop_duplicates()

with open('targets_mh.txt', 'w') as file:
    file.write("\n".join(mh_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
mh_biological_process = pd.read_csv('panther_mh_bp.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mh_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mh_biological_process['color'] = biological_process
mh_biological_process = mh_biological_process.sort_values('Fold Enrichment')

In [None]:
mh_celular_component = pd.read_csv('panther_mh_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mh_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mh_celular_component['color'] = celular_component
mh_celular_component = mh_celular_component.sort_values('Fold Enrichment')

In [None]:
mh_molecular_function = pd.read_csv('panther_mh_mf.txt',
                                    sep = '\t',
                                    header = 6
                                   )

mh_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

mh_molecular_function['color'] = molecular_function
mh_molecular_function = mh_molecular_function.sort_values('Fold Enrichment')

In [None]:
mh_go = pd.concat([mh_biological_process,
                   mh_celular_component,
                   mh_molecular_function
                  ],
                  ignore_index=True
                 )

mh_go = mh_go[mh_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
mh_go['GO_term'] = [go[-1].replace(')', '') for go in mh_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(mh_go['GO_term'], mh_go['Fold Enrichment'], color = mh_go['color'])
plt.title('Male H')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_mh.png', dpi=600)
plt.show()

### Hermaphrodite

([go to top](#miRNA-Sequencing))

In this segment we analyze the miRNAs that are expressed (normalized count > 10) in the hermaphrodite (Touriga Nacional, TN) samples.

In [None]:
tnb_active_mirnas = mirna[mirna['TNB'] > 10].index
tnef_active_mirnas = mirna[mirna['TNEF'] > 10].index
tnh_active_mirnas = mirna[mirna['TNH'] > 10].index

print('''
    Number of miRNAs on TNB: {}
    Number of miRNAs on TNEF: {}
    Number of miRNAs on TNH: {}
    '''.format(len(tnb_active_mirnas), len(tnef_active_mirnas), len(tnh_active_mirnas)))

#### TN B

Write the list of miRNAs active in this sample.

In [None]:
with open('mirnas_active_tnb.txt', 'w') as file:
    file.write("\n".join(tnb_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
tnb_targets = pd.read_csv('psRNATarget_tnb.txt',
                          sep = '\t',
                          header = 1
                         )

tnb_targets['gene_name'] = [target.split('.')[0] for target in tnb_targets['Target_Acc.'].tolist()]

tnb_targets = tnb_targets[['gene_name']].drop_duplicates()

with open('targets_tnb.txt', 'w') as file:
    file.write("\n".join(tnb_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
tnb_biological_process = pd.read_csv('panther_tnb_bp.txt',
                                     sep = '\t',
                                     header = 6
                                    )

tnb_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                  'Fold Enrichment', 'P-value', 'FDR'
                                 ]

tnb_biological_process['color'] = biological_process
tnb_biological_process = tnb_biological_process.sort_values('Fold Enrichment')

In [None]:
tnb_celular_component = pd.read_csv('panther_tnb_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

tnb_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

tnb_celular_component['color'] = celular_component
tnb_celular_component = tnb_celular_component.sort_values('Fold Enrichment')

In [None]:
tnb_molecular_function = pd.read_csv('panther_tnb_mf.txt',
                                     sep = '\t',
                                     header = 6
                                    )

tnb_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

tnb_molecular_function['color'] = molecular_function
tnb_molecular_function = tnb_molecular_function.sort_values('Fold Enrichment')

In [None]:
tnb_go = pd.concat([tnb_biological_process,
                    tnb_celular_component,
                    tnb_molecular_function
                   ],
                   ignore_index=True
                  )

tnb_go = tnb_go[tnb_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
tnb_go['GO_term'] = [go[-1].replace(')', '') for go in tnb_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(tnb_go['GO_term'], tnb_go['Fold Enrichment'], color = tnb_go['color'])
plt.title('Hermaphrodite B')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_tnb.png', dpi=600)
plt.show()

#### TN E/F

Write the list of miRNAs active at stage E/F in hermaphrodites.

In [None]:
with open('mirnas_active_tnef.txt', 'w') as file:
    file.write("\n".join(tnef_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
tnef_targets = pd.read_csv('psRNATarget_tnef.txt',
                           sep = '\t',
                           header = 1
                          )

tnef_targets['gene_name'] = [target.split('.')[0] for target in tnef_targets['Target_Acc.'].tolist()]

tnef_targets = tnef_targets[['gene_name']].drop_duplicates()

with open('targets_tnef.txt', 'w') as file:
    file.write("\n".join(tnef_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
tnef_biological_process = pd.read_csv('panther_tnef_bp.txt',
                                      sep = '\t',
                                      header = 6
                                     )

tnef_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                   'Fold Enrichment', 'P-value', 'FDR'
                                  ]

tnef_biological_process['color'] = biological_process
tnef_biological_process = tnef_biological_process.sort_values('Fold Enrichment')

In [None]:
tnef_celular_component = pd.read_csv('panther_tnef_cc.txt',
                                     sep = '\t',
                                     header = 6
                                    )

tnef_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                  'Fold Enrichment', 'P-value', 'FDR'
                                 ]

tnef_celular_component['color'] = celular_component
tnef_celular_component = tnef_celular_component.sort_values('Fold Enrichment')

In [None]:
tnef_molecular_function = pd.read_csv('panther_tnef_mf.txt',
                                      sep = '\t',
                                      header = 6
                                     )

tnef_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                   'Fold Enrichment', 'P-value', 'FDR'
                                  ]

tnef_molecular_function['color'] = molecular_function
tnef_molecular_function = tnef_molecular_function.sort_values('Fold Enrichment')

In [None]:
tnef_go = pd.concat([tnef_biological_process,
                     tnef_celular_component,
                     tnef_molecular_function
                    ],
                    ignore_index=True
                   )

tnef_go = tnef_go[tnef_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
tnef_go['GO_term'] = [go[-1].replace(')', '') for go in tnef_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(tnef_go['GO_term'], tnef_go['Fold Enrichment'], color = tnef_go['color'])
plt.title('Hermaphrodite E/F')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_tnef.png', dpi=600)
plt.show()

#### TN H

Write the list of miRNAs active at the last stage (H) in hermaphrodites.

In [None]:
with open('mirnas_active_tnh.txt', 'w') as file:
    file.write("\n".join(tnh_active_mirnas.tolist()).replace('vvi-mir', 'vvi-miR'))

In [None]:
tnh_targets = pd.read_csv('psRNATarget_tnh.txt',
                          sep = '\t',
                          header = 1
                         )

tnh_targets['gene_name'] = [target.split('.')[0] for target in tnh_targets['Target_Acc.'].tolist()]

tnh_targets = tnh_targets[['gene_name']].drop_duplicates()

with open('targets_tnh.txt', 'w') as file:
    file.write("\n".join(tnh_targets['gene_name'].tolist()).replace('VIT_2', 'VIT_'))

In [None]:
tnh_biological_process = pd.read_csv('panther_tnh_bp.txt',
                                     sep = '\t',
                                     header = 6
                                    )

tnh_biological_process.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                  'Fold Enrichment', 'P-value', 'FDR'
                                 ]

tnh_biological_process['color'] = biological_process
tnh_biological_process = tnh_biological_process.sort_values('Fold Enrichment')

In [None]:
tnh_celular_component = pd.read_csv('panther_tnh_cc.txt',
                                    sep = '\t',
                                    header = 6
                                   )

tnh_celular_component.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

tnh_celular_component['color'] = celular_component
tnh_celular_component = tnh_celular_component.sort_values('Fold Enrichment')

In [None]:
tnh_molecular_function = pd.read_csv('panther_tnh_mf.txt',
                                     sep = '\t',
                                     header = 6
                                    )

tnh_molecular_function.columns = ['GO', 'Ref list', 'Sample list', 'Expected', 'Over/under',
                                 'Fold Enrichment', 'P-value', 'FDR'
                                ]

tnh_molecular_function['color'] = molecular_function
tnh_molecular_function = tnh_molecular_function.sort_values('Fold Enrichment')

In [None]:
tnh_go = pd.concat([tnh_biological_process,
                    tnh_celular_component,
                    tnh_molecular_function
                   ],
                   ignore_index=True
                  )

tnh_go = tnh_go[tnh_go['GO'] != 'Unclassified (UNCLASSIFIED)'].copy()
tnh_go['GO_term'] = [go[-1].replace(')', '') for go in tnh_go.GO.str.split('(').tolist()]

In [None]:
plt.figure(figsize = image_size)
plt.barh(tnh_go['GO_term'], tnh_go['Fold Enrichment'], color = tnh_go['color'])
plt.title('Hermaphrodite H')
plt.xlabel('Fold Enrichment')
plt.legend(custom_lines[::-1], legend_labbels[::-1])
plt.savefig('plot_tnh.png', dpi=600)
plt.show()

## GO TERMS TABLE

In [None]:
# List every GO ID reported for any sample, without duplicates, in sorted order
all_go = pd.concat([fb_go, fef_go, fh_go,
                    mb_go, mef_go, mh_go,
                    tnb_go, tnef_go, tnh_go])
go_list = sorted(set(all_go['GO_term']))
go_list

In [None]:
import requests
from multiprocessing import Pool
import time

In [None]:
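start_time = time.time()
go_table = []
for go in go_list:
    results = requests.get(f'https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/{go}', timeout=30)
    results.raise_for_status()  # stop on HTTP errors instead of failing later on a missing key
    term = results.json()['results'][0]
    # Upper-case only the first letter: str.capitalize() would turn "DNA binding" into "Dna binding"
    go_table.append([go, term['name'][:1].upper() + term['name'][1:], term['aspect'].replace('_', ' ').capitalize()])
elapsed_time = time.time() - start_time
print(elapsed_time)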

In [None]:
go_table

In [None]:
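from concurrent.futures import ThreadPoolExecutor

start_time = time.time()

def gogo(go):
    """Return [GO ID, name, aspect] for one GO term from the QuickGO API."""
    results = requests.get(f'https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/{go}', timeout=30)
    results.raise_for_status()
    term = results.json()['results'][0]
    # Keep acronyms (DNA, ATP, ...) intact: only the first letter is upper-cased
    return [go, term['name'][:1].upper() + term['name'][1:], term['aspect'].replace('_', ' ').capitalize()]

# The work is network-bound, so threads are enough; unlike multiprocessing.Pool they also
# work for functions defined in a notebook on platforms that spawn worker processes (macOS, Windows)
with ThreadPoolExecutor(max_workers=8) as executor:
    go_table = list(executor.map(gogo, go_list))

elapsed_time = time.time() - start_time
print(elapsed_time)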
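Both cells above send one HTTP request per GO ID. Another option is to send a few batched requests. The sketch below assumes, from the QuickGO REST documentation rather than from this notebook, that the `/ontology/go/terms/{ids}` endpoint accepts a comma-separated list of IDs. The batch size of 100 is an arbitrary, conservative choice. The sketch uses `urllib.request` from the standard library instead of `requests`, and it sorts the rows by GO ID rather than relying on the order of the results. The function names and the sample payload are hypothetical. Only the offline parsing and batching helpers are exercised below.

```python
import json
import urllib.request

QUICKGO_TERMS = 'https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/{}'


def batched_ids(go_ids, chunk_size=100):
    """Yield comma-joined groups of at most chunk_size GO IDs."""
    for start in range(0, len(go_ids), chunk_size):
        yield ','.join(go_ids[start:start + chunk_size])


def parse_terms(payload):
    """Turn a QuickGO terms JSON payload into [GO ID, name, aspect] rows."""
    rows = []
    for term in payload['results']:
        name = term['name']
        rows.append([term['id'],
                     name[:1].upper() + name[1:],  # keep acronyms such as DNA intact
                     term['aspect'].replace('_', ' ').capitalize()])
    return rows


def fetch_go_terms(go_ids, chunk_size=100, timeout=30):
    """Fetch names and aspects for many GO IDs, one request per batch."""
    rows = []
    for ids in batched_ids(go_ids, chunk_size):
        request = urllib.request.Request(QUICKGO_TERMS.format(ids),
                                         headers={'Accept': 'application/json'})
        # urlopen raises HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            rows.extend(parse_terms(json.load(response)))
    return sorted(rows)  # sort by GO ID; the order within a batch is not relied upon


# Offline check with a hypothetical one-term payload.
sample = {'results': [{'id': 'GO:0003677', 'name': 'DNA binding',
                       'aspect': 'molecular_function'}]}
print(parse_terms(sample))  # [['GO:0003677', 'DNA binding', 'Molecular function']]
```

With network access, `go_table = fetch_go_terms(go_list)` would replace the per-ID loop.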

In [None]:
go_table

In [None]:
go_df = pd.DataFrame(
    go_table,
    columns =['GO', 'Name', 'Aspect']
)
go_df

In [None]:
go_df = go_df.sort_values(['Aspect', 'GO'])
go_df

In [None]:
go_df.to_csv(
    'go_table.csv',
    sep = '\t',
    index = False
)