Not all proteins listed in the paper 'Patterns of Conservation and Diversification in the Fungal Polarization Network' are present or have a clear functional analogue in *Saccharomyces cerevisiae*, so they cannot be integrated directly into the interaction network that we get from SGD. This code aims to find which *Saccharomyces* proteins these proteins interact with.

The proteins/genes that do not occur in Saccharomyces are: 

**SEPA FOR3 RAC1 SEPDIA**

## SEPA

SEPA is a gene from *Neurospora crassa* that has no identified orthologue in *Saccharomyces cerevisiae* (based on the FungiDB database, https://fungidb.org/fungidb/app/record/gene/UMAG_01141#category:taxonomy) (**Actually, SEPA may be an orthologue of BNI1**). The interactions of SEPA with other proteins/genes in *Neurospora crassa* can be found in the STRING database ().

In [211]:
import pandas as pd
import numpy as np
import csv

Interactions_Datafile = pd.ExcelFile('Interactions_Non_Saccharomyces/SEPA_Interactions.xlsx')
Interactions_SEPA = pd.read_excel(Interactions_Datafile, header=None)

# Collect, for each row, the first cell that contains an 'ncr' identifier
Gene_ID = []

for i in range(len(Interactions_SEPA)):

    # na=False treats non-string cells (e.g. NaN) as "no match"
    Indx = Interactions_SEPA.iloc[i, :].str.contains('ncr', na=False)
    Value = Interactions_SEPA.loc[i, Indx].tolist()

    if Value:
        Gene_ID.append(Value[0])

print(Gene_ID)


# Write the identifiers as a single tab-separated row
with open('Interactions_Non_Saccharomyces/Gene_ID_Sepa_Int.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t')
    tsv_output.writerow(Gene_ID)


['ncr:NCU00440', 'ncr:NCU03485', 'ncr:NCU03563', 'ncr:NCU03115', 'ncr:NCU03092', 'ncr:NCU02393', 'ncr:NCU04247', 'ncr:NCU04173', 'ncr:NCU10927', 'ncr:NCU05206', 'ncr:NCU06397', 'ncr:NCU06454', 'ncr:NCU06493', 'ncr:NCU09132', 'ncr:NCU06729', 'ncr:NCU06943', 'ncr:NCU01431', 'ncr:NCU08468', 'ncr:NCU08840', 'ncr:NCU09468', 'ncr:NCU09696', 'ncr:NCU01484']
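The row-by-row search above can also be wrapped in a small helper that tolerates rows without any string cells. This is only a sketch: the function name `extract_prefixed_ids` and the toy table are hypothetical and are not part of the original analysis or of the real SEPA spreadsheet.

```python
import pandas as pd


def extract_prefixed_ids(table: pd.DataFrame, prefix: str) -> list:
    """Return, for each row, the first string cell containing `prefix`.

    Rows without such a cell are skipped. (Hypothetical helper, for illustration.)
    """
    ids = []
    for _, row in table.iterrows():
        # Only string cells are tested, so numbers and NaN/None are ignored
        hits = [cell for cell in row if isinstance(cell, str) and prefix in cell]
        if hits:
            ids.append(hits[0])
    return ids


# Toy table standing in for the spreadsheet (illustrative layout only)
toy = pd.DataFrame([
    ["ncr:NCU00440", "some description", None],
    ["header text", None, None],
    [3.5, "ncr:NCU03485", "another description"],
])

sepa_ids = extract_prefixed_ids(toy, "ncr:")
print(sepa_ids)
```

Searching for `ncr:` rather than `ncr` reduces the chance of matching free-text cells that happen to contain those three letters.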


## FOR3

FOR3 is a gene from *Schizosaccharomyces pombe* and has no identified orthologue in *Saccharomyces cerevisiae* (https://www.pombase.org/gene/SPCC895.05). Here I will attempt to find which genes of our (sub) polarity network FOR3 interacts with.

In [119]:
import pandas as pd
import csv

# Import the physical and genetic interactors of FOR3 from separate tsv files. These tsv files contain the
# Budding yeast orthologs for each interactor of FOR3

Physical_Int_FOR3 = pd.read_csv('Interactions_Non_Saccharomyces/Physical_Interactions_FOR3.tsv',sep='\t')
Genetic_Int_FOR3 = pd.read_csv('Interactions_Non_Saccharomyces/Genetic_Interactions_FOR3.tsv',sep='\t')

# Select the column that contains the Budding yeast orthologs, omitting the rows that have a NaN value
# (those are the interactors in pombe that do not have a cerevisiae ortholog)
Phys_Int_cer = Physical_Int_FOR3['Budding yeast orthologs'].dropna()
Gen_Int_cer = Genetic_Int_FOR3['Budding yeast orthologs'].dropna()

# Split the occurrences where a pombe gene has 2 or more orthologues in cerevisiae
# (this means that if an interactor of FOR3 has more than 1 orthologue in cerevisiae, we assume that
# FOR3 will interact with all of them)
Phys_Int_Sep = Phys_Int_cer.str.split(',',expand=True)
Gen_Int_Sep = Gen_Int_cer.str.split(',',expand=True)

# Now stack the columns of the multi-column dataframes into a single column
# (pd.concat is used because DataFrame.append is not available in pandas >= 2.0)
# First do it for the physical interactions
Phys_Int_FOR3 = pd.DataFrame({"0":Phys_Int_Sep.iloc[:,0]}) 

for i in range(1,Phys_Int_Sep.shape[1],1):
    Temp_DataFrame_Phys = pd.DataFrame({"0":Phys_Int_Sep.iloc[:,i]})
    Phys_Int_FOR3 = pd.concat([Phys_Int_FOR3,Temp_DataFrame_Phys],ignore_index=True).dropna()
    
    
# Then do it for the genetic interactions
Gen_Int_FOR3 = pd.DataFrame({"0":Gen_Int_Sep.iloc[:,0]})

for j in range(1,Gen_Int_Sep.shape[1],1):
    # Take column j so that every additional orthologue column is included
    Temp_DataFrame_Gen = pd.DataFrame({"0":Gen_Int_Sep.iloc[:,j]})
    Gen_Int_FOR3 = pd.concat([Gen_Int_FOR3,Temp_DataFrame_Gen],ignore_index=True).dropna()

# Add a second column to the dataframes to describe the type of interaction
Phys_Int_FOR3["Interaction"] = 'Physical'
Gen_Int_FOR3["Interaction"] = 'Genetic'

# Create new dataframe that combines the physical and genetic interactions
Interactions_FOR3 = pd.concat([Phys_Int_FOR3,Gen_Int_FOR3],ignore_index=True)

print(Interactions_FOR3)

# Save the combined FOR3 interactions as a tsv file
# NOTE: the output file name is an assumption; adjust it to the project's naming
Interactions_FOR3.to_csv('Interactions_Non_Saccharomyces/Interactions_FOR3.tsv', sep='\t', index=False)

         0 Interaction
0     SRP1    Physical
1     KEL2    Physical
2     CTH1    Physical
3     SEC3    Physical
4     BOI1    Physical
5    BUD14    Physical
6     MYO4    Physical
7    CDC42    Physical
8     RHO3    Physical
9     KEL1    Physical
10   TIS11    Physical
11    BOI2    Physical
12    MYO2    Physical
13    BNI1     Genetic
14    BNR1     Genetic
15    MIH1     Genetic
16   EXO70     Genetic
17    KIN1     Genetic
18    KEL2     Genetic
19    DCV1     Genetic
20    BUD4     Genetic
21   SCS22     Genetic
22   SCS22     Genetic
23    SEC8     Genetic
24   BUD14     Genetic
25    GPD1     Genetic
26    RGA2     Genetic
27    INN1     Genetic
28    KIP2     Genetic
29    MYO4     Genetic
..     ...         ...
56  RVS161     Genetic
57    DOA1     Genetic
58    SSK1     Genetic
59    HST2     Genetic
60   PHO85     Genetic
61    SET1     Genetic
62    SWC5     Genetic
63    SPP1     Genetic
64    MSS4     Genetic
65    IQG1     Genetic
66    MLC2     Genetic
67    MLC1 
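The split-and-stack step for interactors with several cerevisiae orthologues can also be expressed with `Series.explode`, which avoids looping over columns. The sketch below is an alternative under stated assumptions, not the code that produced the output above: the helper name `orthologs_long`, the column name `Gene`, and the toy tables are hypothetical, and duplicate gene/interaction pairs are dropped, which the original cell does not do.

```python
import pandas as pd


def orthologs_long(table, interaction, column="Budding yeast orthologs"):
    """Turn a comma-separated ortholog column into one gene per row.

    (Hypothetical helper; the default column name follows the tsv files used above.)
    """
    genes = (
        table[column]
        .dropna()            # interactors without a cerevisiae ortholog
        .str.split(",")      # "KEL1,KEL2" -> ["KEL1", "KEL2"]
        .explode()           # one list element per row
        .str.strip()         # remove stray spaces around gene names
    )
    genes = genes[genes != ""]
    return pd.DataFrame({"Gene": genes.to_numpy(), "Interaction": interaction})


# Toy tables with made-up rows, for illustration only
toy_physical = pd.DataFrame({"Budding yeast orthologs": ["KEL1,KEL2", None, "CDC42"]})
toy_genetic = pd.DataFrame({"Budding yeast orthologs": ["BNI1, BNR1", "KEL2", "KEL2"]})

combined = pd.concat(
    [orthologs_long(toy_physical, "Physical"), orthologs_long(toy_genetic, "Genetic")],
    ignore_index=True,
).drop_duplicates(ignore_index=True)

print(combined)
```

Dropping duplicates is a design choice: it keeps one row per gene and interaction type, so a gene that is both a physical and a genetic interactor still appears twice.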

## RAC1

Rac1 is a gene from *Ustilago maydis* whose function appears similar to that of Cdc42: the deletion of either Rac1 or Cdc42 in *Ustilago maydis* causes morphological defects, but only the deletion of both is lethal. The systematic name for Rac1 in *Ustilago maydis* is UMAG_00774 and its interactors can be found in the STRING database (https://string-db.org/cgi/network.pl?taskId=DXu1omnaoRMI, use the systematic name when searching). The interactors were downloaded from the STRING database (including only the interactions from curated databases and experiments, with a medium confidence score of 0.4).

In [1]:
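import pandas as pd

# Read the downloaded file one line per row (a single column, no header)
Interactions = pd.read_csv("Interactions_Non_Saccharomyces/Rac1_Interactions.tsv",sep='\n',header=None)
# Split each line at the 'uma:' prefix; column 1 then holds the text after the prefix,
# i.e. the UMAG identifier (lines without the prefix give None)
Interactions = Interactions[0].str.split('uma:', expand=True)

print(Interactions[1])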

0           None
1     UMAG_00295
2     UMAG_00356
3     UMAG_00736
4     UMAG_00774
5     UMAG_00986
6     UMAG_01141
7     UMAG_01178
8     UMAG_10803
9     UMAG_11909
10    UMAG_10145
11    UMAG_02422
12    UMAG_11476
13    UMAG_03687
14    UMAG_03864
15    UMAG_12254
16    UMAG_12272
17    UMAG_10200
18    UMAG_05693
19    UMAG_10934
20    UMAG_06013
21    UMAG_06412
Name: 1, dtype: object


These gene identifiers can be converted to their *Saccharomyces* orthologues by using the FungiDB database (https://fungidb.org/fungidb/).
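Once an orthologue table has been downloaded, the conversion could be done with a left merge. The sketch below assumes a two-column table; the column names `Gene ID` and `Ortholog`, the helper `map_to_orthologs`, and the placeholder orthologue names are all hypothetical and do not reflect the actual FungiDB export format or real orthology assignments.

```python
import pandas as pd


def map_to_orthologs(gene_ids, ortholog_table, id_col="Gene ID"):
    """Attach orthologue rows to each unique identifier.

    Identifiers without a match are kept, with NaN in the orthologue columns.
    (Hypothetical helper; column names are assumptions.)
    """
    unique_ids = pd.Series(gene_ids).dropna().unique()
    ids = pd.DataFrame({id_col: unique_ids})
    # how="left" keeps every identifier, matched or not
    return ids.merge(ortholog_table, on=id_col, how="left")


# Identifiers as printed above (including a None row and a repeated entry)
umag_ids = [None, "UMAG_00295", "UMAG_00774", "UMAG_00774"]

# Toy orthologue table with placeholder names, NOT real orthology data
toy_table = pd.DataFrame({
    "Gene ID": ["UMAG_00774", "UMAG_00774"],
    "Ortholog": ["PLACEHOLDER_A", "PLACEHOLDER_B"],
})

mapped = map_to_orthologs(umag_ids, toy_table)
print(mapped)
```

Keeping unmatched identifiers makes it easy to see which interactors have no *Saccharomyces* orthologue, analogous to the NaN rows dropped in the FOR3 cell.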

## Drf1

Drf1 is a protein from *Ustilago maydis* that has no identified orthologue in *Saccharomyces cerevisiae* (based on the FungiDB database, https://fungidb.org/fungidb/app/record/gene/UMAG_01141). The interactions of Drf1 with other proteins in *Ustilago maydis* can be found in the STRING database (https://string-db.org/cgi/network.pl?taskId=116PvqGDcUOT, only interactions from experiments and other databases with a medium confidence score were included).

In [212]:
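import pandas as pd

# Read the downloaded file one line per row (a single column, no header)
Interactions = pd.read_csv("Interactions_Non_Saccharomyces/Drf1_Interactions.tsv",sep='\n',header=None)
# Split each line at the 'uma:' prefix; column 1 then holds the text after the prefix,
# i.e. the UMAG identifier (lines without the prefix give None)
Interactions = Interactions[0].str.split('uma:', expand=True)

print(Interactions[1])
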
0           None
1     UMAG_00295
2     UMAG_00774
3     UMAG_01141
4     UMAG_11657
5     UMAG_02494
6           None
7     UMAG_11076
8     UMAG_10663
9     UMAG_04070
10    UMAG_04411
11    UMAG_11985
12    UMAG_11792
13    UMAG_05734
14    UMAG_11232
Name: 1, dtype: object


These gene identifiers can be converted to their *Saccharomyces* orthologues by using the FungiDB database (https://fungidb.org/fungidb/).
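The identifier extraction used in the Rac1 and Drf1 cells can also be done on plain text lines with a regular expression, which skips lines without an identifier instead of returning `None` and does not depend on `read_csv` accepting a newline as separator (recent pandas versions may reject it). This is a sketch under assumptions: the helper name `extract_ids`, the default pattern, and the toy file content are hypothetical and only mimic the `uma:UMAG_...` form seen in the outputs above.

```python
import io
import re


def extract_ids(lines, pattern=r"uma:(UMAG_\d+)"):
    """Return the first identifier matched on each line, skipping lines without one.

    (Hypothetical helper; the default pattern is an assumption about the file layout.)
    """
    regex = re.compile(pattern)
    ids = []
    for line in lines:
        match = regex.search(line)
        if match:
            ids.append(match.group(1))
    return ids


# Toy stand-in for the downloaded file (illustrative content only)
toy_file = io.StringIO(
    "header line\n"
    "uma:UMAG_00295\tsome description\n"
    "\n"
    "no identifier here\n"
    "uma:UMAG_00774\n"
)

drf1_ids = extract_ids(toy_file)
print(drf1_ids)
```

With a real file, the same function could be called on an open file handle, for example `with open(path) as handle: ids = extract_ids(handle)`.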