## Input data construction notebook
This notebook pre-processes the input data and can be shared within the lab. It does not need to be re-run; we include it for completeness. Much of the analysis here can only be run with private tools from the Holehouse lab's internal stack.

## Dependencies
This notebook uses the following packages:

1. [**protfasta**](https://github.com/holehouse-lab/protfasta/) - package for reading/writing FASTA files. Installable via

        pip install protfasta

   
2. [**metapredict**](https://github.com/idptools/metapredict) - package for predicting disorder and structure from sequence. Installable via

        pip install metapredict


3. **housetools** - housetools is a private Python package developed by the Holehouse lab for internal tooling. It is not publicly accessible.

4. **yeastevo** - yeastevo is a private Python package developed by the Holehouse lab to organize data from the 2012 release (version 7) of the Yeast Gene Order Browser (YGOB).
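`protfasta` works with FASTA data as a dictionary that maps each header to its sequence (see `protfasta.read_fasta` and `protfasta.write_fasta`). The sketch below shows that round trip using only the standard library. It is a simplified stand-in, not protfasta's implementation. The function names, the 60-character line wrapping, the duplicate-header check, and the example headers are assumptions made for illustration.

```python
import os
import tempfile


def write_fasta(records, path, line_length=60):
    """Write a {header: sequence} dict to `path` in FASTA format,
    wrapping each sequence at `line_length` characters."""
    with open(path, "w") as fh:
        for header, seq in records.items():
            fh.write(f">{header}\n")
            for start in range(0, len(seq), line_length):
                fh.write(seq[start:start + line_length] + "\n")


def read_fasta(path):
    """Read a FASTA file into an insertion-ordered {header: sequence} dict.

    Raises ValueError on duplicate headers or on sequence data that
    appears before the first header.
    """
    chunks = {}
    header = None
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                if header in chunks:
                    raise ValueError(f"duplicate FASTA header: {header}")
                chunks[header] = []
            elif header is None:
                raise ValueError("sequence data found before the first header")
            else:
                chunks[header].append(line)
    return {h: "".join(parts) for h, parts in chunks.items()}


# Round trip with hypothetical records (real headers would come from the alignments).
demo = {"YFG001W_Scerevisiae": "M" + "A" * 130, "YFG002C_Scerevisiae": "MKT"}
with tempfile.TemporaryDirectory() as tmpdir:
    fasta_path = os.path.join(tmpdir, "demo.fasta")
    write_fasta(demo, fasta_path)
    assert read_fasta(fasta_path) == demo
```

In the notebook itself, protfasta is the tool to use. Among other things, it also validates and corrects sequences, which this sketch does not do.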



In [14]:
# public packages
import protfasta
import metapredict as meta
from tqdm.auto import tqdm 

In [2]:
# private packages
from yeastevo import Pillars
from housetools.sequence_tools.sequence_conservation import ConservationCalculator

# activate and pre-load the conservation object
CC = ConservationCalculator()

# build and read in the sequence information
fungi_matrix = Pillars()

# get the set of S. cerevisiae IDs that have aligned ortholog sets
all_valid_IDs = fungi_matrix.all_aligned_scerevisiae_YSNs()


Reading in all proteomes....
... DONE!
Reading in 5436 separate FASTA files - this may take 30-40 seconds...
... DONE!


In [4]:
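def return_sc_conservation_score(sequence_dict, CC):
    """
    Function which takes in a dictionary of aligned sequences and returns a tuple of
    (target_id, scores), where target_id is the S. cerevisiae header and scores is a
    list of per-residue conservation scores mapped onto the S. cerevisiae sequence.
    Raises a ValueError if no S. cerevisiae sequence is present.
    """

    # the S. cerevisiae entry is identified by the second '_'-delimited field of its header
    target = None
    for i in sequence_dict:
        if i.split('_')[1] == 'Scerevisiae':
            target = i
            break

    # fail with a clear message rather than an UnboundLocalError on the return line
    if target is None:
        raise ValueError('No S. cerevisiae sequence found in the alignment')
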
    return (target, CC.calculate_conservation(sequence_dict, target=target, gap_cutoff=0.999).target_normalized_scores)
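The function above picks out the S. cerevisiae sequence by its header and treats the second `_`-delimited field as the species name. `ConservationCalculator` is private, so its scoring scheme is not reproduced here. The sketch below uses a simple, hypothetical score instead: for each alignment column, the fraction of non-gap residues that match the column's most common residue. It then keeps only the columns where the S. cerevisiae sequence has a residue. This gives one score per S. cerevisiae residue, which is one plausible reading of `target_normalized_scores`. The sketch does not model `gap_cutoff`, and the header format and function names are assumptions.

```python
def find_target_id(sequence_dict, species="Scerevisiae"):
    """Return the first header whose second '_'-delimited field equals `species`."""
    for header in sequence_dict:
        fields = header.split("_")
        if len(fields) > 1 and fields[1] == species:
            return header
    raise KeyError(f"no {species} sequence found in alignment")


def column_identity_scores(sequence_dict, gap="-"):
    """Hypothetical per-column score: fraction of non-gap residues equal to the
    most common residue in that column (0.0 for all-gap columns)."""
    seqs = list(sequence_dict.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for col in range(length):
        residues = [s[col] for s in seqs if s[col] != gap]
        if not residues:
            scores.append(0.0)
            continue
        most_common = max(set(residues), key=residues.count)
        scores.append(residues.count(most_common) / len(residues))
    return scores


def project_onto_target(scores, target_aligned, gap="-"):
    """Keep only the columns where the target has a residue, giving one
    score per target residue (an assumed meaning of 'target normalized')."""
    return [score for score, aa in zip(scores, target_aligned) if aa != gap]
```

Projecting onto the target means the length of the output matches the ungapped S. cerevisiae sequence. Per-residue scores can then be lined up directly with per-residue predictions made on that sequence.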
        
    

In [30]:
%%capture
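# %%capture suppresses the large amount of diagnostic output printed internally
# by the conservation calculation. This cell takes ~5 minutes to run.

output_seqs = {}                  # S. cerevisiae ID -> S. cerevisiae sequence
output_conservation_scores = {}   # S. cerevisiae ID -> per-residue conservation scores
ortholog_count = {}               # S. cerevisiae ID -> number of sequences in the alignment (incl. S. cerevisiae)

for ysn in tqdm(all_valid_IDs):

    # S. cerevisiae sequence for this systematic name
    full_seq = fungi_matrix.get_orthologs(ysn)['Scerevisiae'][0][1]

    # aligned ortholog sequences, keyed by header
    seq_dict = fungi_matrix.get_aligned_scerevisiae_sequences(ysn)

    (target_id, cons_scores) = return_sc_conservation_score(seq_dict, CC)
    output_seqs[target_id] = full_seq
    output_conservation_scores[target_id] = cons_scores
    ortholog_count[target_id] = len(seq_dict)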
    

YPR101W
No hits found in S. cerevisiae Position 2
YCL068C
No hits found in S. cerevisiae Position 2
YPL094C
No hits found in S. cerevisiae Position 2
YMR124W
No hits found in S. cerevisiae Position 1
YJL171C
No hits found in S. cerevisiae Position 2
YBR200W
No hits found in S. cerevisiae Position 2
YOR069W
No hits found in S. cerevisiae Position 1
YMR032W
No hits found in S. cerevisiae Position 2
YDR096W
No hits found in S. cerevisiae Position 2
YDR194C
No hits found in S. cerevisiae Position 2
YML014W
No hits found in S. cerevisiae Position 2
YGR261C
No hits found in S. cerevisiae Position 2
YDR520C
No hits found in S. cerevisiae Position 2
YGL154C
No hits found in S. cerevisiae Position 2
YDR113C
No hits found in S. cerevisiae Position 2
YER069W
No hits found in S. cerevisiae Position 2
YDR079W
No hits found in S. cerevisiae Position 2
YGL106W
No hits found in S. cerevisiae Position 2
YCL054W
No hits found in S. cerevisiae Position 2
YGR118W
No hits found in S. cerevisiae Position 2


YNL259C
No hits found in S. cerevisiae Position 2
YBR171W
No hits found in S. cerevisiae Position 2
YGL014W
No hits found in S. cerevisiae Position 2
YML117W
No hits found in S. cerevisiae Position 2
YOL124C
No hits found in S. cerevisiae Position 2
YBR101C
No hits found in S. cerevisiae Position 2
YMR220W
No hits found in S. cerevisiae Position 2
YEL031W
No hits found in S. cerevisiae Position 2
YAL011W
No hits found in S. cerevisiae Position 2
YPR042C
No hits found in S. cerevisiae Position 1
YGR104C
No hits found in S. cerevisiae Position 2
YGL086W
No hits found in S. cerevisiae Position 2
YOR047C
No hits found in S. cerevisiae Position 1
YNL221C
No hits found in S. cerevisiae Position 2
YJL093C
No hits found in S. cerevisiae Position 2
YBR172C
No hits found in S. cerevisiae Position 2
YML104C
No hits found in S. cerevisiae Position 2
YLR133W
No hits found in S. cerevisiae Position 1
YOR049C
No hits found in S. cerevisiae Position 2
YIL097W
No hits found in S. cerevisiae Position 2


YOR308C
No hits found in S. cerevisiae Position 2
YLL027W
No hits found in S. cerevisiae Position 2
YML032C
No hits found in S. cerevisiae Position 2
YPR116W
No hits found in S. cerevisiae Position 2
YER114C
No hits found in S. cerevisiae Position 1
YPL010W
No hits found in S. cerevisiae Position 2
YOR247W
No hits found in S. cerevisiae Position 2
YOL065C
No hits found in S. cerevisiae Position 2
YPR144C
No hits found in S. cerevisiae Position 2
YKL029C
No hits found in S. cerevisiae Position 2
YLR016C
No hits found in S. cerevisiae Position 2
YHL020C
No hits found in S. cerevisiae Position 2
YLR222C
No hits found in S. cerevisiae Position 2
YOR113W
No hits found in S. cerevisiae Position 2
YLR179C
No hits found in S. cerevisiae Position 2
YLR382C
No hits found in S. cerevisiae Position 2
YMR273C
No hits found in S. cerevisiae Position 1
YML050W
No hits found in S. cerevisiae Position 2
YHR116W
No hits found in S. cerevisiae Position 2
YGL227W
No hits found in S. cerevisiae Position 2


YGR262C
No hits found in S. cerevisiae Position 2
YOR332W
No hits found in S. cerevisiae Position 2
YGL044C
No hits found in S. cerevisiae Position 2
YLR373C
No hits found in S. cerevisiae Position 1
YEL037C
No hits found in S. cerevisiae Position 2
YGR199W
No hits found in S. cerevisiae Position 2
YGL017W
No hits found in S. cerevisiae Position 2
YDL074C
No hits found in S. cerevisiae Position 2
YPL063W
No hits found in S. cerevisiae Position 2
YJL138C
No hits found in S. cerevisiae Position 2
YPL219W
No hits found in S. cerevisiae Position 1
YCR042C
No hits found in S. cerevisiae Position 2
YOL061W
No hits found in S. cerevisiae Position 2
YML062C
No hits found in S. cerevisiae Position 2
YLR435W
No hits found in S. cerevisiae Position 2
YKL068W
No hits found in S. cerevisiae Position 2
YLR192C
No hits found in S. cerevisiae Position 2
YLR099C
No hits found in S. cerevisiae Position 1
YGR006W
No hits found in S. cerevisiae Position 2
YJL063C
No hits found in S. cerevisiae Position 2


YDL046W
No hits found in S. cerevisiae Position 2
YPL050C
No hits found in S. cerevisiae Position 2
YOR157C
No hits found in S. cerevisiae Position 2
YAL039C
No hits found in S. cerevisiae Position 2
YOR383C
No hits found in S. cerevisiae Position 2
YHR096C
No hits found in S. cerevisiae Position 2
YGL094C
No hits found in S. cerevisiae Position 2
YOR025W
No hits found in S. cerevisiae Position 2
YOL011W
No hits found in S. cerevisiae Position 1
YHR050W
No hits found in S. cerevisiae Position 2
YLR086W
No hits found in S. cerevisiae Position 2
YDR204W
No hits found in S. cerevisiae Position 2
YOR349W
No hits found in S. cerevisiae Position 2
YDR172W
No hits found in S. cerevisiae Position 2
YOL055C
No hits found in S. cerevisiae Position 2
YGL242C
No hits found in S. cerevisiae Position 2
YIL030C
No hits found in S. cerevisiae Position 2
YBL060W
No hits found in S. cerevisiae Position 2
YOR110W
No hits found in S. cerevisiae Position 1
YBL009W
No hits found in S. cerevisiae Position 2


YCL024W
No hits found in S. cerevisiae Position 2
YNL272C
No hits found in S. cerevisiae Position 2
YDR002W
No hits found in S. cerevisiae Position 2
YFR049W
No hits found in S. cerevisiae Position 2
YJR129C
No hits found in S. cerevisiae Position 2
YIL033C
No hits found in S. cerevisiae Position 2
YNL026W
No hits found in S. cerevisiae Position 2
YDR484W
No hits found in S. cerevisiae Position 2
YMR304W
No hits found in S. cerevisiae Position 2
YOL148C
No hits found in S. cerevisiae Position 2
YGR241C
No hits found in S. cerevisiae Position 2
YOR124C
No hits found in S. cerevisiae Position 2
YFL041W
No hits found in S. cerevisiae Position 2
YFR007W
No hits found in S. cerevisiae Position 2
YOR299W
No hits found in S. cerevisiae Position 1
YFR039C
No hits found in S. cerevisiae Position 2
YIL042C
No hits found in S. cerevisiae Position 2
YJL196C
No hits found in S. cerevisiae Position 1
YKL085W
No hits found in S. cerevisiae Position 2
YHR013C
No hits found in S. cerevisiae Position 2


YKR042W
No hits found in S. cerevisiae Position 1
YMR185W
No hits found in S. cerevisiae Position 2
YIR033W
No hits found in S. cerevisiae Position 2
YMR166C
No hits found in S. cerevisiae Position 2
YGR275W
No hits found in S. cerevisiae Position 2
YHR144C
No hits found in S. cerevisiae Position 2
YDL200C
No hits found in S. cerevisiae Position 2
YOR022C
No hits found in S. cerevisiae Position 2
YJL183W
No hits found in S. cerevisiae Position 2
YGL040C
No hits found in S. cerevisiae Position 2
YOR291W
No hits found in S. cerevisiae Position 2
YBL007C
No hits found in S. cerevisiae Position 2
YBR161W
No hits found in S. cerevisiae Position 2
YPL209C
No hits found in S. cerevisiae Position 2
YDL048C
No hits found in S. cerevisiae Position 2
YDL045W-A
No hits found in S. cerevisiae Position 2
YJL129C
No hits found in S. cerevisiae Position 2
YJR065C
No hits found in S. cerevisiae Position 2
YBR256C
No hits found in S. cerevisiae Position 2
YKL094W
No hits found in S. cerevisiae Position 

YKL106W
No hits found in S. cerevisiae Position 2
YGL114W
No hits found in S. cerevisiae Position 2
YMR131C
No hits found in S. cerevisiae Position 2
YOR027W
No hits found in S. cerevisiae Position 2
YDR310C
No hits found in S. cerevisiae Position 2
YLR277C
No hits found in S. cerevisiae Position 2
YJL091C
No hits found in S. cerevisiae Position 2
YPL214C
No hits found in S. cerevisiae Position 2
YDR339C
No hits found in S. cerevisiae Position 2
YPL061W
No hits found in S. cerevisiae Position 2
YOL120C
No hits found in S. cerevisiae Position 1
YDR123C
No hits found in S. cerevisiae Position 2
YLR168C
No hits found in S. cerevisiae Position 1
YIL135C
No hits found in S. cerevisiae Position 2
YDR292C
No hits found in S. cerevisiae Position 2
YNL280C
No hits found in S. cerevisiae Position 2
YPR022C
No hits found in S. cerevisiae Position 2
YPR100W
No hits found in S. cerevisiae Position 2
YGR246C
No hits found in S. cerevisiae Position 2
YAL044C
No hits found in S. cerevisiae Position 2


YNL216W
No hits found in S. cerevisiae Position 2
YPL230W
No hits found in S. cerevisiae Position 1
YKL117W
No hits found in S. cerevisiae Position 2
YEL057C
No hits found in S. cerevisiae Position 2
YDL136W
No hits found in S. cerevisiae Position 1
YBL105C
No hits found in S. cerevisiae Position 2
YER101C
No hits found in S. cerevisiae Position 1
YER113C
No hits found in S. cerevisiae Position 2
YIL149C
No hits found in S. cerevisiae Position 2
YER073W
No hits found in S. cerevisiae Position 2
YLR332W
No hits found in S. cerevisiae Position 1
YMR236W
No hits found in S. cerevisiae Position 2
YIL017C
No hits found in S. cerevisiae Position 2
YPL260W
No hits found in S. cerevisiae Position 2
YMR126C
No hits found in S. cerevisiae Position 2
YKL157W
No hits found in S. cerevisiae Position 1
YOR255W
No hits found in S. cerevisiae Position 2
YLR107W
No hits found in S. cerevisiae Position 2
YDL049C
No hits found in S. cerevisiae Position 2
YCL064C
No hits found in S. cerevisiae Position 2


YGR162W
No hits found in S. cerevisiae Position 1
YOR361C
No hits found in S. cerevisiae Position 2
YML124C
No hits found in S. cerevisiae Position 2
YHR193C
No hits found in S. cerevisiae Position 2
YKL159C
No hits found in S. cerevisiae Position 2
YDR465C
No hits found in S. cerevisiae Position 2
YBR294W
No hits found in S. cerevisiae Position 2
YDR100W
No hits found in S. cerevisiae Position 2
YPL071C
No hits found in S. cerevisiae Position 2
YDR268W
No hits found in S. cerevisiae Position 2
YGL178W
No hits found in S. cerevisiae Position 2
YJR148W
No hits found in S. cerevisiae Position 2
YEL007W
No hits found in S. cerevisiae Position 2
YMR295C
No hits found in S. cerevisiae Position 1
YJR122W
No hits found in S. cerevisiae Position 2
YMR194W
No hits found in S. cerevisiae Position 2
YKR095W-A
No hits found in S. cerevisiae Position 2
YMR069W
No hits found in S. cerevisiae Position 2
YGR205W
No hits found in S. cerevisiae Position 2
YPL250C
No hits found in S. cerevisiae Position 

YHR199C
No hits found in S. cerevisiae Position 1
YOL021C
No hits found in S. cerevisiae Position 2
YJR030C
No hits found in S. cerevisiae Position 1
YDL169C
No hits found in S. cerevisiae Position 2
YDR472W
No hits found in S. cerevisiae Position 2
YGR041W
No hits found in S. cerevisiae Position 2
YPL199C
No hits found in S. cerevisiae Position 2
YPR026W
No hits found in S. cerevisiae Position 2
YCL038C
No hits found in S. cerevisiae Position 2
YMR198W
No hits found in S. cerevisiae Position 2
YHR090C
No hits found in S. cerevisiae Position 2
YDR217C
No hits found in S. cerevisiae Position 2
YKL095W
No hits found in S. cerevisiae Position 2
YHR036W
No hits found in S. cerevisiae Position 2
YGL254W
No hits found in S. cerevisiae Position 2
YER040W
No hits found in S. cerevisiae Position 2
YMR020W
No hits found in S. cerevisiae Position 2
YKL170W
No hits found in S. cerevisiae Position 2
YDR311W
No hits found in S. cerevisiae Position 2
YGR127W
No hits found in S. cerevisiae Position 2


YNL180C
No hits found in S. cerevisiae Position 2
YER180C-A
No hits found in S. cerevisiae Position 2
YIL007C
No hits found in S. cerevisiae Position 2
YMR319C
No hits found in S. cerevisiae Position 2
YLR096W
No hits found in S. cerevisiae Position 1
YNL287W
No hits found in S. cerevisiae Position 2
YKR087C
No hits found in S. cerevisiae Position 2
YLR061W
No hits found in S. cerevisiae Position 1
YDR432W
No hits found in S. cerevisiae Position 2
YJR008W
No hits found in S. cerevisiae Position 2
YPL153C
No hits found in S. cerevisiae Position 2
YKL073W
No hits found in S. cerevisiae Position 2
YPL133C
No hits found in S. cerevisiae Position 2
YOR122C
No hits found in S. cerevisiae Position 2
YIL050W
No hits found in S. cerevisiae Position 1
YGL196W
No hits found in S. cerevisiae Position 2
YOL083W
No hits found in S. cerevisiae Position 2
YMR129W
No hits found in S. cerevisiae Position 2
YER179W
No hits found in S. cerevisiae Position 2
YFR009W
No hits found in S. cerevisiae Position 

YJR059W
No hits found in S. cerevisiae Position 2
YIL132C
No hits found in S. cerevisiae Position 2
YKL069W
No hits found in S. cerevisiae Position 2
YDL087C
No hits found in S. cerevisiae Position 2
YGR245C
No hits found in S. cerevisiae Position 2
YGL009C
No hits found in S. cerevisiae Position 2
YJL025W
No hits found in S. cerevisiae Position 2
YEL052W
No hits found in S. cerevisiae Position 2
YNL042W
No hits found in S. cerevisiae Position 2
YLR131C
No hits found in S. cerevisiae Position 1
YMR060C
No hits found in S. cerevisiae Position 2
YDL160C-A
No hits found in S. cerevisiae Position 2
YKL088W
No hits found in S. cerevisiae Position 2
YKL034W
No hits found in S. cerevisiae Position 2
YMR218C
No hits found in S. cerevisiae Position 2
YKR038C
No hits found in S. cerevisiae Position 2
YJR062C
No hits found in S. cerevisiae Position 2
YLL057C
No hits found in S. cerevisiae Position 2
YDR390C
No hits found in S. cerevisiae Position 2
YNL320W
No hits found in S. cerevisiae Position 

YDR104C
No hits found in S. cerevisiae Position 2
YGR193C
No hits found in S. cerevisiae Position 2
YLR371W
No hits found in S. cerevisiae Position 1
YFL010C
No hits found in S. cerevisiae Position 2
YPL018W
No hits found in S. cerevisiae Position 2
YDL194W
No hits found in S. cerevisiae Position 2
YIR031C
No hits found in S. cerevisiae Position 2
YPL274W
No hits found in S. cerevisiae Position 1
YIL109C
No hits found in S. cerevisiae Position 2
YHR129C
No hits found in S. cerevisiae Position 2
YMR036C
No hits found in S. cerevisiae Position 2
YNL016W
No hits found in S. cerevisiae Position 2
YPL207W
No hits found in S. cerevisiae Position 2
YOR177C
No hits found in S. cerevisiae Position 2
YJR123W
No hits found in S. cerevisiae Position 2
YKL092C
No hits found in S. cerevisiae Position 2
YOR377W
No hits found in S. cerevisiae Position 2
YDR320C
No hits found in S. cerevisiae Position 2
YDR505C
No hits found in S. cerevisiae Position 2
YDR017C
No hits found in S. cerevisiae Position 2


YLR278C
No hits found in S. cerevisiae Position 2
YJR006W
No hits found in S. cerevisiae Position 2
YNR044W
No hits found in S. cerevisiae Position 1
YPL076W
No hits found in S. cerevisiae Position 2
YOR298W
No hits found in S. cerevisiae Position 2
YLR312W-A
No hits found in S. cerevisiae Position 2
YAR020C
No hits found in S. cerevisiae Position 2
YJL137C
No hits found in S. cerevisiae Position 2
YIL129C
No hits found in S. cerevisiae Position 2
YHR192W
No hits found in S. cerevisiae Position 2
YDR383C
No hits found in S. cerevisiae Position 2
YNL238W
No hits found in S. cerevisiae Position 2
YLL010C
No hits found in S. cerevisiae Position 2
YOR074C
No hits found in S. cerevisiae Position 2
YKR059W
No hits found in S. cerevisiae Position 1
YLR303W
No hits found in S. cerevisiae Position 1
YHR118C
No hits found in S. cerevisiae Position 2
YNL156C
No hits found in S. cerevisiae Position 1
YDR216W
No hits found in S. cerevisiae Position 2
YOR278W
No hits found in S. cerevisiae Position 

YDL079C
No hits found in S. cerevisiae Position 2
YJR094W-A
No hits found in S. cerevisiae Position 2
YDR319C
No hits found in S. cerevisiae Position 2
YPR070W
No hits found in S. cerevisiae Position 2
YNL185C
No hits found in S. cerevisiae Position 2
YPL151C
No hits found in S. cerevisiae Position 2
YKL172W
No hits found in S. cerevisiae Position 2
YEL005C
No hits found in S. cerevisiae Position 2
YNR039C
No hits found in S. cerevisiae Position 2
YJR076C
No hits found in S. cerevisiae Position 1
YDR208W
No hits found in S. cerevisiae Position 2
YML015C
No hits found in S. cerevisiae Position 2
YMR062C
No hits found in S. cerevisiae Position 2
YGL166W
No hits found in S. cerevisiae Position 2
YOR238W
No hits found in S. cerevisiae Position 2
YIL162W
No hits found in S. cerevisiae Position 2
YBR085C-A
No hits found in S. cerevisiae Position 2
YDR044W
No hits found in S. cerevisiae Position 2
YDR245W
No hits found in S. cerevisiae Position 2
YHR016C
No hits found in S. cerevisiae Positio

YML006C
No hits found in S. cerevisiae Position 2
YKR014C
No hits found in S. cerevisiae Position 2
YBR084W
No hits found in S. cerevisiae Position 2
YML116W
No hits found in S. cerevisiae Position 2
YGR101W
No hits found in S. cerevisiae Position 2
YOR306C
No hits found in S. cerevisiae Position 2
YKL155C
No hits found in S. cerevisiae Position 2
YPR060C
No hits found in S. cerevisiae Position 2
YFR033C
No hits found in S. cerevisiae Position 2
YMR137C
No hits found in S. cerevisiae Position 2
YDL081C
No hits found in S. cerevisiae Position 1
YJL081C
No hits found in S. cerevisiae Position 2
YKR029C
No hits found in S. cerevisiae Position 1
YER077C
No hits found in S. cerevisiae Position 2
YLR113W
No hits found in S. cerevisiae Position 2
YHR097C
No hits found in S. cerevisiae Position 1
YBR021W
No hits found in S. cerevisiae Position 1
YML074C
No hits found in S. cerevisiae Position 1
YOL084W
No hits found in S. cerevisiae Position 2
YOR115C
No hits found in S. cerevisiae Position 2


YFR030W
No hits found in S. cerevisiae Position 2
YPL081W
No hits found in S. cerevisiae Position 1
YGR148C
No hits found in S. cerevisiae Position 1
YIL099W
No hits found in S. cerevisiae Position 2
YPL194W
No hits found in S. cerevisiae Position 2
YLR054C
No hits found in S. cerevisiae Position 2
YCL052C
No hits found in S. cerevisiae Position 2
YMR121C
No hits found in S. cerevisiae Position 1
YNL190W
No hits found in S. cerevisiae Position 2
YER180C
No hits found in S. cerevisiae Position 2
YHR041C
No hits found in S. cerevisiae Position 2
YBR145W
No hits found in S. cerevisiae Position 2
YKL188C
No hits found in S. cerevisiae Position 2
YLR175W
No hits found in S. cerevisiae Position 2
YDR143C
No hits found in S. cerevisiae Position 2
YIL155C
No hits found in S. cerevisiae Position 2
YDR481C
No hits found in S. cerevisiae Position 2
YMR181C
No hits found in S. cerevisiae Position 2
YAL015C
No hits found in S. cerevisiae Position 2
YGR249W
No hits found in S. cerevisiae Position 1


YPR188C
No hits found in S. cerevisiae Position 2
YPR072W
No hits found in S. cerevisiae Position 2
YHR007C
No hits found in S. cerevisiae Position 2
YOL112W
No hits found in S. cerevisiae Position 1
YLR194C
No hits found in S. cerevisiae Position 2
YGL163C
No hits found in S. cerevisiae Position 2
YDR524C
No hits found in S. cerevisiae Position 2
YKL122C
No hits found in S. cerevisiae Position 2
YDL030W
No hits found in S. cerevisiae Position 2
YJL210W
No hits found in S. cerevisiae Position 2
YNR056C
No hits found in S. cerevisiae Position 2
YPL190C
No hits found in S. cerevisiae Position 2
YOR284W
No hits found in S. cerevisiae Position 2
YBR058C-A
No hits found in S. cerevisiae Position 2
YBR236C
No hits found in S. cerevisiae Position 2
YOR207C
No hits found in S. cerevisiae Position 2
YDR479C
No hits found in S. cerevisiae Position 2
YDL045C
No hits found in S. cerevisiae Position 2
YLR409C
No hits found in S. cerevisiae Position 2
YKL203C
No hits found in S. cerevisiae Position 

YFL034C-B
No hits found in S. cerevisiae Position 2
YER178W
No hits found in S. cerevisiae Position 2
YER183C
No hits found in S. cerevisiae Position 2
YHR072W
No hits found in S. cerevisiae Position 2
YLR353W
No hits found in S. cerevisiae Position 1
YGR112W
No hits found in S. cerevisiae Position 2
YBR211C
No hits found in S. cerevisiae Position 2
YMR195W
No hits found in S. cerevisiae Position 1
YPR036W-A
No hits found in S. cerevisiae Position 2
YBL107C
No hits found in S. cerevisiae Position 2
YLR228C
No hits found in S. cerevisiae Position 1
YDR249C
No hits found in S. cerevisiae Position 2
YML129C
No hits found in S. cerevisiae Position 2
YLR274W
No hits found in S. cerevisiae Position 2
YOR129C
No hits found in S. cerevisiae Position 2
YFL027C
No hits found in S. cerevisiae Position 2
YAL024C
No hits found in S. cerevisiae Position 2
YNL212W
No hits found in S. cerevisiae Position 2
YJL168C
No hits found in S. cerevisiae Position 2
YLR352W
No hits found in S. cerevisiae Positio

YNL152W
No hits found in S. cerevisiae Position 2
YDL069C
No hits found in S. cerevisiae Position 2
YDL100C
No hits found in S. cerevisiae Position 2
YJL156C
No hits found in S. cerevisiae Position 2
YJL134W
No hits found in S. cerevisiae Position 2
YCR047C
No hits found in S. cerevisiae Position 2
YJR121W
No hits found in S. cerevisiae Position 2
YKL003C
No hits found in S. cerevisiae Position 2
YBL058W
No hits found in S. cerevisiae Position 2
YLR342W
No hits found in S. cerevisiae Position 1
YAL012W
No hits found in S. cerevisiae Position 2
YIR007W
No hits found in S. cerevisiae Position 2
YOR176W
No hits found in S. cerevisiae Position 2
YMR061W
No hits found in S. cerevisiae Position 2
YAR031W
No hits found in S. cerevisiae Position 2
YAL056W
No hits found in S. cerevisiae Position 2
YFL001W
No hits found in S. cerevisiae Position 2
YDR180W
No hits found in S. cerevisiae Position 2
YLR446W
No hits found in S. cerevisiae Position 2
YIL133C
No hits found in S. cerevisiae Position 2


YDR021W
No hits found in S. cerevisiae Position 2
YFL047W
No hits found in S. cerevisiae Position 2
YHR168W
No hits found in S. cerevisiae Position 2
YOR126C
No hits found in S. cerevisiae Position 2
YHR158C
No hits found in S. cerevisiae Position 1
YOR035C
No hits found in S. cerevisiae Position 2
YBR297W
No hits found in S. cerevisiae Position 2
YDR081C
No hits found in S. cerevisiae Position 2
YBR018C
No hits found in S. cerevisiae Position 2
YLR310C
No hits found in S. cerevisiae Position 1
YKL109W
No hits found in S. cerevisiae Position 2
YPL026C
No hits found in S. cerevisiae Position 1
YKL072W
No hits found in S. cerevisiae Position 2
YDR419W
No hits found in S. cerevisiae Position 2
YDL004W
No hits found in S. cerevisiae Position 2
YFL038C
No hits found in S. cerevisiae Position 2
YDR267C
No hits found in S. cerevisiae Position 2
YIL044C
No hits found in S. cerevisiae Position 2
YBR173C
No hits found in S. cerevisiae Position 2
YGL244W
No hits found in S. cerevisiae Position 2


YCL066W
No hits found in S. cerevisiae Position 2
YBR158W
No hits found in S. cerevisiae Position 2
YMR011W
No hits found in S. cerevisiae Position 2
YBR164C
No hits found in S. cerevisiae Position 2
YHR091C
No hits found in S. cerevisiae Position 1
YJL174W
No hits found in S. cerevisiae Position 2
YPL007C
No hits found in S. cerevisiae Position 2
YLR307W
No hits found in S. cerevisiae Position 1
YOL040C
No hits found in S. cerevisiae Position 2
YGL082W
No hits found in S. cerevisiae Position 2
YDR221W
No hits found in S. cerevisiae Position 2
YEL070W
No hits found in S. cerevisiae Position 2
YPL070W
No hits found in S. cerevisiae Position 2
YHR172W
No hits found in S. cerevisiae Position 2
YMR212C
No hits found in S. cerevisiae Position 2
YHR005C-A
No hits found in S. cerevisiae Position 2
YLR134W
No hits found in S. cerevisiae Position 2
YBR213W
No hits found in S. cerevisiae Position 2
YBR176W
No hits found in S. cerevisiae Position 2
YJR117W
No hits found in S. cerevisiae Position 

YHL023C
No hits found in S. cerevisiae Position 2
YNR016C
No hits found in S. cerevisiae Position 1
YKL042W
No hits found in S. cerevisiae Position 2
YOR307C
No hits found in S. cerevisiae Position 1
YHR108W
No hits found in S. cerevisiae Position 1
YJR057W
No hits found in S. cerevisiae Position 2
YKR048C
No hits found in S. cerevisiae Position 2
YER057C
No hits found in S. cerevisiae Position 2
YGL181W
No hits found in S. cerevisiae Position 2
YPL020C
No hits found in S. cerevisiae Position 2
YGL255W
No hits found in S. cerevisiae Position 2
YPL042C
No hits found in S. cerevisiae Position 2
YNL138W-A
No hits found in S. cerevisiae Position 2
YDR059C
No hits found in S. cerevisiae Position 1
YDL112W
No hits found in S. cerevisiae Position 2
YPR074C
No hits found in S. cerevisiae Position 1
YNL022C
No hits found in S. cerevisiae Position 2
YDR083W
No hits found in S. cerevisiae Position 2
YMR240C
No hits found in S. cerevisiae Position 2
YOL113W
No hits found in S. cerevisiae Position 

YDR009W
No hits found in S. cerevisiae Position 1
YBR014C
No hits found in S. cerevisiae Position 2
YOL131W
No hits found in S. cerevisiae Position 1
YDR333C
No hits found in S. cerevisiae Position 2
YGL226C-A
No hits found in S. cerevisiae Position 2
YDR126W
No hits found in S. cerevisiae Position 2
YDR325W
No hits found in S. cerevisiae Position 2
YDL018C
No hits found in S. cerevisiae Position 2
YDR032C
No hits found in S. cerevisiae Position 1
YDR035W
No hits found in S. cerevisiae Position 2
YBL072C
No hits found in S. cerevisiae Position 2
YLL013C
No hits found in S. cerevisiae Position 2
YOR279C
No hits found in S. cerevisiae Position 2
YLR167W
No hits found in S. cerevisiae Position 2
YJL140W
No hits found in S. cerevisiae Position 2
YGR292W
No hits found in S. cerevisiae Position 2
YDL150W
No hits found in S. cerevisiae Position 2
YNL236W
No hits found in S. cerevisiae Position 2
YBR273C
No hits found in S. cerevisiae Position 2
YGR146C
No hits found in S. cerevisiae Position 

YNR065C
No hits found in S. cerevisiae Position 2
YNL223W
No hits found in S. cerevisiae Position 2
YIL157C
No hits found in S. cerevisiae Position 2
YBL027W
No hits found in S. cerevisiae Position 2
YGR284C
No hits found in S. cerevisiae Position 2
YMR161W
No hits found in S. cerevisiae Position 2
YDR332W
No hits found in S. cerevisiae Position 2
YGL019W
No hits found in S. cerevisiae Position 2
YNL294C
No hits found in S. cerevisiae Position 2
YAL028W
No hits found in S. cerevisiae Position 2
YOR230W
No hits found in S. cerevisiae Position 2
YJL051W
No hits found in S. cerevisiae Position 2
YKL186C
No hits found in S. cerevisiae Position 2
YDL090C
No hits found in S. cerevisiae Position 2
YGL230C
No hits found in S. cerevisiae Position 1
YDR490C
No hits found in S. cerevisiae Position 2
YPL089C
No hits found in S. cerevisiae Position 1
YHR106W
No hits found in S. cerevisiae Position 1
YER098W
No hits found in S. cerevisiae Position 1
YBR143C
No hits found in S. cerevisiae Position 2


YJR150C
No hits found in S. cerevisiae Position 2
YJR119C
No hits found in S. cerevisiae Position 2
YGR255C
No hits found in S. cerevisiae Position 2
YOR359W
No hits found in S. cerevisiae Position 2
YGR068C
No hits found in S. cerevisiae Position 2
YKL160W
No hits found in S. cerevisiae Position 2
YJL194W
No hits found in S. cerevisiae Position 2
YDR517W
No hits found in S. cerevisiae Position 2
YDR518W
No hits found in S. cerevisiae Position 1
YMR296C
No hits found in S. cerevisiae Position 2
YKR098C
No hits found in S. cerevisiae Position 1
YLR144C
No hits found in S. cerevisiae Position 2
YDL126C
No hits found in S. cerevisiae Position 2
YJR155W
No hits found in S. cerevisiae Position 2
YLR427W
No hits found in S. cerevisiae Position 2
YML054C
No hits found in S. cerevisiae Position 2
YMR260C
No hits found in S. cerevisiae Position 2
YHR162W
No hits found in S. cerevisiae Position 1
YLR405W
No hits found in S. cerevisiae Position 2
YLL039C
No hits found in S. cerevisiae Position 2


YPL244C
No hits found in S. cerevisiae Position 2
YOR269W
No hits found in S. cerevisiae Position 2
YHR200W
No hits found in S. cerevisiae Position 2
YPL095C
No hits found in S. cerevisiae Position 1
YNL224C
No hits found in S. cerevisiae Position 2
YPL046C
No hits found in S. cerevisiae Position 2
YLR262C-A
No hits found in S. cerevisiae Position 2
YLR259C
No hits found in S. cerevisiae Position 2
YDR177W
No hits found in S. cerevisiae Position 2
YML105C
No hits found in S. cerevisiae Position 2
YNL119W
No hits found in S. cerevisiae Position 2
YPR145C-A
No hits found in S. cerevisiae Position 2
YBR065C
No hits found in S. cerevisiae Position 2
YDR464W
No hits found in S. cerevisiae Position 2
YPL066W
No hits found in S. cerevisiae Position 2
YPL226W
No hits found in S. cerevisiae Position 2
YNL010W
No hits found in S. cerevisiae Position 2
YGR031W
No hits found in S. cerevisiae Position 2
YNL071W
No hits found in S. cerevisiae Position 2
YLR421C
No hits found in S. cerevisiae Positio

YGR268C
No hits found in S. cerevisiae Position 2
YOR147W
No hits found in S. cerevisiae Position 2
YKR006C
No hits found in S. cerevisiae Position 2
YML013W
No hits found in S. cerevisiae Position 2
YOR239W
No hits found in S. cerevisiae Position 2
YPR181C
No hits found in S. cerevisiae Position 2
YPL057C
No hits found in S. cerevisiae Position 1
YER134C
No hits found in S. cerevisiae Position 2
YJR127C
No hits found in S. cerevisiae Position 2
YEL048C
No hits found in S. cerevisiae Position 2
YFL030W
No hits found in S. cerevisiae Position 2
YER075C
No hits found in S. cerevisiae Position 2
YGL013C
No hits found in S. cerevisiae Position 1
YER159C
No hits found in S. cerevisiae Position 2
YLR095C
No hits found in S. cerevisiae Position 2
YLR220W
No hits found in S. cerevisiae Position 2
YLR241W
No hits found in S. cerevisiae Position 2
YLR200W
No hits found in S. cerevisiae Position 2
YKR046C
No hits found in S. cerevisiae Position 2
YDR243C
No hits found in S. cerevisiae Position 2


YBR091C
No hits found in S. cerevisiae Position 2
YOL078W
No hits found in S. cerevisiae Position 2
YAR014C
No hits found in S. cerevisiae Position 2
YDL137W
No hits found in S. cerevisiae Position 2
YNL004W
No hits found in S. cerevisiae Position 1
YCR089W
No hits found in S. cerevisiae Position 2
YOR357C
No hits found in S. cerevisiae Position 2
YGL233W
No hits found in S. cerevisiae Position 2
YCL028W
No hits found in S. cerevisiae Position 2
YLR047C
No hits found in S. cerevisiae Position 2
YDR462W
No hits found in S. cerevisiae Position 2
YDR202C
No hits found in S. cerevisiae Position 2
YFL018C
No hits found in S. cerevisiae Position 2
YEL049W
No hits found in S. cerevisiae Position 2
YER053C
No hits found in S. cerevisiae Position 2
YLR447C
No hits found in S. cerevisiae Position 2
YGL240W
No hits found in S. cerevisiae Position 2
YNR035C
No hits found in S. cerevisiae Position 2
YMR091C
No hits found in S. cerevisiae Position 2
YIL106W
No hits found in S. cerevisiae Position 2
YAL001C
No hits found in S. cerevisiae Position 2
YPR002W
No hits found in S. cerevisiae Position 2
YFR041C
No hits found in S. cerevisiae Position 2
YOR342C
No hits found in S. cerevisiae Position 1
YKL018W
No hits found in S. cerevisiae Position 2
YJL218W
No hits found in S. cerevisiae Position 2
YBR047W
No hits found in S. cerevisiae Position 2
YDR070C
No hits found in S. cerevisiae Position 2
YAL009W
No hits found in S. cerevisiae Position 2
YAR007C
No hits found in S. cerevisiae Position 2
YKL064W
No hits found in S. cerevisiae Position 2
YGR278W
No hits found in S. cerevisiae Position 2
YOR288C
No hits found in S. cerevisiae Position 2
YDL127W
No hits found in S. cerevisiae Position 1
YNL295W
No hits found in S. cerevisiae Position 2
YBL108C-A
No hits found in S. cerevisiae Position 2
YGL219C
No hits found in S. cerevisiae Position 2
YLR384C
No hits found in S. cerevisiae Position 2
YBR191W
No hits found in S. cerevisiae Position 2
YNL129W
No hits found in S. cerevisiae Position
YDR011W
No hits found in S. cerevisiae Position 2
YDL166C
No hits found in S. cerevisiae Position 2
YNL063W
No hits found in S. cerevisiae Position 2
YFL011W
No hits found in S. cerevisiae Position 2
YNL077W
No hits found in S. cerevisiae Position 2
YBR231C
No hits found in S. cerevisiae Position 2
YBR182C
No hits found in S. cerevisiae Position 2
YHR087W
No hits found in S. cerevisiae Position 2
YDR527W
No hits found in S. cerevisiae Position 2
YBR008C
No hits found in S. cerevisiae Position 2
YPL242C
No hits found in S. cerevisiae Position 2
YGR218W
No hits found in S. cerevisiae Position 2
YNR034W
No hits found in S. cerevisiae Position 1
YBR159W
No hits found in S. cerevisiae Position 2
YMR214W
No hits found in S. cerevisiae Position 2
YLR141W
No hits found in S. cerevisiae Position 2
YDR122W
No hits found in S. cerevisiae Position 2
YNR026C
No hits found in S. cerevisiae Position 1
YJL179W
No hits found in S. cerevisiae Position 2
YGL194C-A
No hits found in S. cerevisiae Position 

In [9]:
# write the final sequence set to disk as a FASTA file
protfasta.write_fasta(output_seqs, 'data/2yeast_sequence_dataset.fasta')

In [15]:
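# predict per-residue disorder and per-residue pLDDT for every sequence;
# version=1 requests the original (V1) metapredict disorder predictor
disorder_scores = {}
pPLDDT_scores = {}
for s in tqdm(output_seqs):
    disorder_scores[s] = meta.predict_disorder(output_seqs[s], version=1)
    pPLDDT_scores[s] = meta.predict_pLDDT(output_seqs[s])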

  0%|          | 0/5430 [00:00<?, ?it/s]
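Because metapredict returns one disorder score and one pLDDT value per residue, a quick consistency check after the loop can catch any sequence whose prediction length does not match its sequence length. The sketch below is a hypothetical helper (`check_per_residue_lengths` is not part of metapredict or the lab's tooling), shown on toy dictionaries that stand in for `output_seqs`, `disorder_scores` and `pPLDDT_scores`.

```python
def check_per_residue_lengths(sequences, *score_dicts):
    """Return the sorted IDs whose score list length differs from the
    sequence length (hypothetical helper, not part of metapredict).

    sequences   -- dict mapping ID -> sequence string
    score_dicts -- any number of dicts mapping ID -> per-residue scores
    """
    mismatched = set()
    for scores in score_dicts:
        for seq_id, seq in sequences.items():
            # a missing entry counts as a mismatch too
            if seq_id not in scores or len(scores[seq_id]) != len(seq):
                mismatched.add(seq_id)
    return sorted(mismatched)


# toy data standing in for output_seqs / disorder_scores / pPLDDT_scores
seqs = {'A_Scerevisiae_A': 'MKV', 'B_Scerevisiae_B': 'MSTE'}
disorder = {'A_Scerevisiae_A': [0.1, 0.2, 0.9], 'B_Scerevisiae_B': [0.5, 0.5, 0.5]}
plddt = {'A_Scerevisiae_A': [80.0, 75.1, 40.2], 'B_Scerevisiae_B': [90.0, 88.0, 70.0, 60.0]}
print(check_per_residue_lengths(seqs, disorder, plddt))  # ['B_Scerevisiae_B']
```

In this notebook it could be called as `check_per_residue_lengths(output_seqs, disorder_scores, pPLDDT_scores)`, where an empty list means every prediction has the expected length.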

In [16]:
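# one row per protein: ID, track name, then one tab-terminated
# per-residue conservation score (3 decimal places)
with open('data/conservation_scores_SHPRD.tsv', 'w') as fh:
    for s in output_conservation_scores:
        fh.write(f'{s}\tconservation\t')
        for c in output_conservation_scores[s]:
            fh.write('%1.3f\t' % c)
        fh.write('\n')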
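Each row of the conservation file is the protein ID, the track name, and then one tab-terminated value per position, so every row ends with a trailing tab. A minimal reader for that layout might look like the following; `read_track_tsv` is a hypothetical helper, and the scores in the demo string are made-up values.

```python
import io


def read_track_tsv(fh):
    """Parse rows of the form ID<TAB>track<TAB>v1<TAB>v2<TAB>...<TAB>.

    Returns a dict mapping ID -> (track name, list of floats). The trailing
    tab written after the last value yields an empty final field, which is
    skipped here.
    """
    tracks = {}
    for line in fh:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seq_id, track = fields[0], fields[1]
        values = [float(v) for v in fields[2:] if v != '']
        tracks[seq_id] = (track, values)
    return tracks


# toy input in the same layout the conservation cell writes
demo = io.StringIO('X_Scerevisiae_X\tconservation\t0.912\t0.450\t\n')
print(read_track_tsv(demo))
# {'X_Scerevisiae_X': ('conservation', [0.912, 0.45])}
```

Since the disorder and pLDDT files use the same row layout, the same reader should apply to them as well.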


In [18]:
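# one row per protein: ID, track name, then one tab-terminated
# per-residue disorder score (3 decimal places)
with open('data/2025_disorder_scores_SHPRD.tsv', 'w') as fh:
    for s in disorder_scores:
        fh.write(f'{s}\tdisorder\t')
        for c in disorder_scores[s]:
            fh.write('%1.3f\t' % c)
        fh.write('\n')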

In [20]:
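# one row per protein: ID, track name, then one tab-terminated
# per-residue predicted pLDDT value (1 decimal place)
with open('data/2025_pLDDT_scores_SHPRD.tsv', 'w') as fh:
    for s in pPLDDT_scores:
        fh.write(f'{s}\tpLDDT\t')
        for c in pPLDDT_scores[s]:
            fh.write('%1.1f\t' % c)
        fh.write('\n')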
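The conservation, disorder and pLDDT cells differ only in the output path, the track name, the score dictionary and the number of decimal places. One way to factor that out is a single writer that takes an open file handle; `write_track_tsv` below is a hypothetical refactor, not something the notebook defines, and it reproduces the trailing tab those cells write after the last value.

```python
import io


def write_track_tsv(fh, track_name, scores, fmt='%1.3f'):
    """Write one row per ID: ID, track name, then tab-terminated values.

    Hypothetical refactor of the per-residue writer cells. fmt is a
    printf-style format: '%1.3f' for conservation and disorder, '%1.1f'
    for pLDDT.
    """
    for seq_id, values in scores.items():
        fh.write(f'{seq_id}\t{track_name}\t')
        for v in values:
            fh.write((fmt + '\t') % v)
        fh.write('\n')


# write to an in-memory buffer instead of a file for the demo
buf = io.StringIO()
write_track_tsv(buf, 'pLDDT', {'X_Scerevisiae_X': [91.23, 40.0]}, fmt='%1.1f')
print(repr(buf.getvalue()))  # 'X_Scerevisiae_X\tpLDDT\t91.2\t40.0\t\n'
```

For example, `write_track_tsv(fh, 'pLDDT', pPLDDT_scores, fmt='%1.1f')` inside the `with open(...)` block would stand in for the body of the pLDDT cell. Passing a file handle rather than a path keeps the function easy to test against an in-memory buffer.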


In [21]:
# one row per protein giving its number of orthologs as a key:value field
with open('data/number_of_orthologs_SHPRD.tsv', 'w') as fh:
    for s in ortholog_count:
        fh.write(f'{s}\tortholog_count:{ortholog_count[s]}\n')


In [23]:
# essential genes: the first column of the annotation TSV holds the
# UniProt accession of each essential protein
essential_genes_file = 'data/essential_proteins_annotation.tsv'

with open(essential_genes_file, 'r') as fh:
    content = fh.readlines()

essential_YSNs = []
for i in content:
    uniprot_id = i.split('\t')[0].strip()

    # loop over every YSN rather than taking one, because two UniProt IDs
    # map to TWO YSNs each: P02309 -> ['YBR009C', 'YNL030W'] and
    # P61830 -> ['YBR010W', 'YNL031C']
    for ysn in fungi_matrix._Pillars__cerevisiae_UNIPROT_to_YSN[uniprot_id]:
        essential_YSNs.append(ysn)
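Because the UniProt-to-YSN lookup returns a list, the loop above keeps every YSN for accessions such as P02309 and P61830 that map to two genes. The sketch below isolates that expansion as a hypothetical helper; `toy_mapping` stands in for the private `Pillars` lookup and contains only the two multi-gene cases named in the comment.

```python
def uniprot_to_ysn_list(uniprot_ids, mapping):
    """Expand UniProt accessions into YSNs, keeping every YSN for
    accessions that map to more than one gene (hypothetical helper)."""
    ysns = []
    for uid in uniprot_ids:
        # extend (not append) so multi-gene accessions contribute all YSNs;
        # an accession missing from the mapping raises KeyError
        ysns.extend(mapping[uid])
    return ysns


# toy mapping built from the two multi-gene cases noted in the cell above
toy_mapping = {
    'P02309': ['YBR009C', 'YNL030W'],
    'P61830': ['YBR010W', 'YNL031C'],
}
print(uniprot_to_ysn_list(['P02309', 'P61830'], toy_mapping))
# ['YBR009C', 'YNL030W', 'YBR010W', 'YNL031C']
```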


In [25]:
# one annotation row per essential protein, keyed as <YSN>_Scerevisiae_<YSN>
with open('data/essential_proteins_SHPRD.tsv', 'w') as fh:
    for s in essential_YSNs:
        name = f'{s}_Scerevisiae_{s}'
        fh.write(f'{name}\tessential_protein:True\n')
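The ortholog-count and essential-protein files use a simpler `ID<TAB>key:value` layout. A minimal parser, assuming that only `True`/`False` and non-negative integers need type conversion, could look like this; `read_annotation_tsv` and the `X_Scerevisiae_X` rows are illustrative, not real data.

```python
import io


def read_annotation_tsv(fh):
    """Parse rows of the form ID<TAB>key:value into {ID: {key: value}}.

    'True'/'False' become bools and all-digit strings become ints; anything
    else is kept as a string (assumed conversions, hypothetical helper).
    """
    annotations = {}
    for line in fh:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        seq_id, field = line.split('\t', 1)
        key, raw = field.split(':', 1)
        if raw in ('True', 'False'):
            value = raw == 'True'
        elif raw.isdigit():
            value = int(raw)
        else:
            value = raw
        annotations.setdefault(seq_id, {})[key] = value
    return annotations


# toy rows in the two layouts written by the ortholog and essential cells
demo = io.StringIO('X_Scerevisiae_X\tessential_protein:True\n'
                   'X_Scerevisiae_X\tortholog_count:12\n')
print(read_annotation_tsv(demo))
# {'X_Scerevisiae_X': {'essential_protein': True, 'ortholog_count': 12}}
```

Merging rows by ID means the two files can be read into one dictionary by calling the parser on each and combining the per-ID dicts.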


In [26]:
# pull the aligned YGOB orthologs of Abf1 (S. cerevisiae YKL112W)
abf1_orthologs_al = fungi_matrix.get_aligned_scerevisiae_sequences('YKL112W')


In [27]:
# strip alignment gap characters to recover the unaligned ortholog sequences
abf1_orthologs = {}
for i in abf1_orthologs_al:
    abf1_orthologs[i] = abf1_orthologs_al[i].replace('-', '')
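Stripping `-` characters discards the link between alignment columns and residue positions. If a downstream step needs to move between the aligned and ungapped Abf1 files, a column-to-residue map like the hypothetical helper below recovers it; it assumes, as the `replace('-', '')` call does, that `-` is the only gap character.

```python
def column_to_residue(aligned_seq, gap='-'):
    """Map each alignment column to a 0-based index in the ungapped
    sequence; gap columns map to None (hypothetical helper)."""
    mapping = []
    residue_index = 0
    for char in aligned_seq:
        if char == gap:
            mapping.append(None)
        else:
            mapping.append(residue_index)
            residue_index += 1
    return mapping


aligned = 'M-KV--E'
print(aligned.replace('-', ''))     # MKVE
print(column_to_residue(aligned))  # [0, None, 1, 2, None, None, 3]
```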

In [28]:
# save both the ungapped and the aligned Abf1 ortholog sets
protfasta.write_fasta(abf1_orthologs, 'data/ygob_abf1_orthologs.fasta')
protfasta.write_fasta(abf1_orthologs_al, 'data/ygob_abf1_orthologs_aligned.fasta')