# Secretory Pathway Features Retrieval
This notebook retrieves information for different gene identifiers from the NIH database and adds it to the **"Secretory Pathway Recon" Google Sheet**.

### Load packages and define datasets

In [1]:
import pandas as pd
from Bio import Entrez
import Request_Utilis
from google_sheet import GoogleSheet

Entrez.email = "a.antonakoudis@sartorius.com"

In [2]:
##### ----- Generate datasets from Google Sheet ----- #####

#Credential file
KEY_FILE_PATH = 'credentials.json'

#CHO Network Reconstruction + Recon3D_v3 Google Sheet ID
SPREADSHEET_ID = '1DaAdZlvMYDqb7g31I5dw-ZCZH52Xj_W3FnQMFUzqmiQ'

# Initialize the GoogleSheet object
gsheet_file = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet
sec_recon_sheet = 'SecRecon'
sec_recon = gsheet_file.read_google_sheet(sec_recon_sheet)
# Create a copy of the dataset
sec_recon_dc = sec_recon.copy()

## 1. Retrieve Human, CHO and Mouse Entrez IDs
Here we use the function **get_entrez_id** from the **Request Utilis** module to fetch the Human Entrez IDs, and then use them as input to retrieve the CHO and Mouse orthologs.

### 1.1 Human Entrez IDs
Here we fill in the missing Human Entrez IDs using **get_entrez_id** with each gene symbol as input.
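The cell below only queries rows whose Human Entrez ID is still empty. A minimal sketch of that "fill only what is missing" rule in plain Python, assuming missing cells show up as `None`, `NaN` or a blank string. `is_missing`, `fill_missing` and the `lookup` callable are hypothetical stand-ins (the callable plays the role of `Request_Utilis.get_entrez_id`); unlike the notebook, the sketch skips `None` results instead of writing them.

```python
import math
from typing import Callable, Dict, List, Optional

def is_missing(value) -> bool:
    """Treat None, float NaN and blank strings as missing cells."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ''

def fill_missing(rows: List[Dict], target: str, key: str,
                 lookup: Callable[[str], Optional[str]]) -> int:
    """Fill rows[target] from lookup(rows[key]) only where target is missing.

    Returns how many cells were filled, so the caller can decide whether
    the sheet needs to be written back.
    """
    filled = 0
    for row in rows:
        if is_missing(row.get(target)):
            result = lookup(row[key])
            if result is not None:
                row[target] = result
                filled += 1
    return filled

# Placeholder symbols and made-up IDs; a dict lookup replaces the NCBI query
rows = [
    {'GENE SYMBOL': 'GENE_A', 'HUMAN ENTREZID': ''},
    {'GENE SYMBOL': 'GENE_B', 'HUMAN ENTREZID': '222'},
]
fake_lookup = {'GENE_A': '111'}.get
print(fill_missing(rows, 'HUMAN ENTREZID', 'GENE SYMBOL', fake_lookup))  # 1
```

Running it a second time fills nothing, which mirrors the notebook's "up-to-date" branch.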

In [None]:
# Update Human Entrez IDs
for i,row in sec_recon_dc.iterrows():
    if pd.isnull(row['HUMAN ENTREZID']) or row['HUMAN ENTREZID'] == '':
        human_entrez = Request_Utilis.get_entrez_id(row['GENE SYMBOL'])
        sec_recon_dc.at[i, 'HUMAN ENTREZID'] = human_entrez

if not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print("Google Sheet updated.")
else:
    print('Human Entrez IDs are up-to-date')

### 1.2 CHO Entrez IDs from other databases
Before running the **get_gene_ids** function on CHO genes, we first fill in some of the CHO Entrez IDs from our own ortholog mapping datasets (`cho2human_mapping.tsv` and `orthologs.xlsx`), compiled from different databases.
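The cell below looks up each Human Entrez ID in the first dataset and falls back to the second one only when the first has no entry. A sketch of the same precedence with the standard-library `collections.ChainMap`, which searches its mappings in order; the IDs are made-up placeholders. Note that `ChainMap` falls back on a *missing key*, not on a key whose value is `None`, and that both tables must use the same key type (the notebook casts Human IDs with `int()` for this reason).

```python
from collections import ChainMap

# Placeholder Human -> CHO Entrez ID tables (made-up IDs)
primary = {1: 'cho_100', 2: 'cho_200'}    # e.g. built from cho2human_mapping.tsv
secondary = {2: 'cho_999', 3: 'cho_300'}  # e.g. built from orthologs.xlsx

# Lookups search `primary` first, then `secondary`
cho_lookup = ChainMap(primary, secondary)

for human_id in (1, 2, 3, 4):
    print(human_id, cho_lookup.get(human_id))
# 1 cho_100
# 2 cho_200
# 3 cho_300
# 4 None
```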

In [None]:
# Map Human IDs to CHO IDs using the two in-house ortholog datasets

cho2human_mapping = pd.read_csv("../Orthologs/cho2human_mapping.tsv", sep='\t')
cho2human_mapping2 = pd.read_excel("../Orthologs/orthologs.xlsx", index_col=0)
# Coerce the Human GeneIDs to nullable integers so they match the int() keys used below
cho2human_mapping2['Human GeneID'] = pd.to_numeric(cho2human_mapping2['Human GeneID'], errors='coerce')
cho2human_mapping2['Human GeneID'] = cho2human_mapping2['Human GeneID'].astype('Int64')

# Convert both datasets to dicts (Human Entrez ID -> CHO Entrez ID) for fast lookup
cho_id_lookup = dict(zip(cho2human_mapping['HUMAN_ID'], cho2human_mapping['CHO_ID']))
cho_id_lookup2 = dict(zip(cho2human_mapping2['Human GeneID'], cho2human_mapping2['CHO GeneID']))

for index, row in sec_recon_dc.iterrows():
    if pd.isna(row['CHO ENTREZID']) or row['CHO ENTREZID'] == '':
        try:
            human_id = int(row['HUMAN ENTREZID'])
        except (ValueError, TypeError):
            # Report the raw cell value: human_id is not (re)assigned when int() fails
            print(f"{row['HUMAN ENTREZID']} is not a valid Human Entrez ID")
            continue
        # Use the first dataset and fall back to the second one if the ID is not found
        cho_id = cho_id_lookup.get(human_id)
        if cho_id is None:
            cho_id = cho_id_lookup2.get(human_id)
        if cho_id is not None:
            sec_recon_dc.at[index, 'CHO ENTREZID'] = cho_id

if not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print("Google Sheet updated on CHO Entrez IDs from cho2human dataset")
else:
    print('CHO Entrez IDs from "cho2human_mapping" dataset are up-to-date')

### 1.3 CHO and Mouse Entrez IDs 
Finally, we run the **get_gene_ids** function to retrieve the CHO and Mouse Entrez IDs by mapping orthologs, using the Human Entrez IDs as input (NCBI taxonomy IDs `10029` for CHO and `10090` for Mouse).
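The Mouse cell of this subsection (and most of the cells in Sections 2 to 4) writes to the Google Sheet every 50 updates and once more after the loop, so a long NCBI run does not lose all its progress if it is interrupted. A minimal sketch of that checkpointing pattern as a small reusable class; `BatchFlusher` and its `flush` callback are hypothetical, and in the notebook the callback would wrap `gsheet_file.update_google_sheet(...)`.

```python
from typing import Callable

class BatchFlusher:
    """Call `flush` every `threshold` recorded updates, plus once at the end."""

    def __init__(self, flush: Callable[[int], None], threshold: int = 50):
        self.flush = flush
        self.threshold = threshold
        self.pending = 0

    def record(self) -> None:
        """Count one update and flush when the threshold is reached."""
        self.pending += 1
        if self.pending >= self.threshold:
            self.flush(self.pending)
            self.pending = 0

    def close(self) -> None:
        """Write out whatever is left after the loop ends."""
        if self.pending > 0:
            self.flush(self.pending)
            self.pending = 0

# Record 120 updates; the callback just logs the batch sizes here
flushed = []
batcher = BatchFlusher(flushed.append, threshold=50)
for _ in range(120):
    batcher.record()
batcher.close()
print(flushed)  # [50, 50, 20]
```

Keeping the counter and the final flush in one object avoids the duplicated "remaining updates" block at the end of every cell.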

In [None]:
## -- CHO Entrez IDs -- ##

for index, row in sec_recon_dc.iterrows():
    if pd.isna(row['CHO ENTREZID']) or row['CHO ENTREZID'] == '':
        human_id = row['HUMAN ENTREZID']
        cho_ortholog_EntrezID = Request_Utilis.get_gene_ids(human_id, '10029')  # NCBI taxonomy ID 10029: Cricetulus griseus (CHO)
        if cho_ortholog_EntrezID is not None:
            sec_recon_dc.at[index, 'CHO ENTREZID'] = cho_ortholog_EntrezID
            
if not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print("Google Sheet updated on CHO Entrez IDs from NIH database")
else:
    print('CHO Entrez IDs from NIH database are up-to-date')

In [None]:
## -- Mouse Entrez IDs -- ##

loop_counter = 0
update_threshold = 50

for index, row in sec_recon_dc.iterrows():
    if pd.isna(row['MOUSE ENTREZID']) or row['MOUSE ENTREZID'] == '':
        human_id = row['HUMAN ENTREZID']
        mouse_ortholog_EntrezID = Request_Utilis.get_gene_ids(human_id, '10090')  # NCBI taxonomy ID 10090: Mus musculus
        if mouse_ortholog_EntrezID is not None:
            sec_recon_dc.at[index, 'MOUSE ENTREZID'] = mouse_ortholog_EntrezID
            loop_counter += 1

        if loop_counter >= update_threshold:
            if not sec_recon_dc.equals(sec_recon):
                gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
                print(f"Google Sheet updated on Mouse Entrez IDs from NIH database after {loop_counter} updates")
            else:
                print('Mouse Entrez IDs from NIH database are up-to-date')
            loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on Mouse Entrez IDs from NIH database after {loop_counter} updates")


## 2. Ensembl IDs
In this section we retrieve Ensembl IDs from the NIH database using the **Gene_Info_from_EntrezID** function from the Request Utilis module. In addition, we retrieve other identifiers to fill missing data in our dataset.
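Every cell in Sections 2 and 3 unpacks the result of **Gene_Info_from_EntrezID** into six names: `org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products`. A sketch of wrapping that tuple in a `NamedTuple`, so a cell can read `info.ensembl` instead of repeating the six-way unpacking; `GeneInfo` is a hypothetical type that assumes the field order seen in the notebook's unpacking.

```python
from typing import Any, List, NamedTuple

class GeneInfo(NamedTuple):
    """Field order follows the six-way unpacking used in this notebook."""
    org: Any
    symbol: Any
    name: Any
    synonyms: Any
    ensembl: Any
    products: List[Any]

# Placeholder tuple standing in for Request_Utilis.Gene_Info_from_EntrezID(...)
raw = ('organism', 'GENE_A', 'gene A name', ['ALIAS1'], ['ENSEMBL_A'], [])
info = GeneInfo(*raw)
print(info.symbol, info.ensembl)  # GENE_A ['ENSEMBL_A']
```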

### 2.1 Human Ensembl IDs and Extra Identifiers
Here we retrieve the Human Ensembl IDs, gene aliases and gene names.
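**Gene_Info_from_EntrezID** may give back lists (for example several aliases per gene), as the `isinstance(x, list)` checks in the cell below imply; since a Google Sheet cell holds a single string, the cell joins them with `', '`, and Section 4 later splits `HUMAN UNIPROT` back with `.split(", ")`. A minimal sketch of that round trip; `to_cell` and `from_cell` are hypothetical helpers, not part of Request Utilis.

```python
def to_cell(value, sep: str = ', '):
    """Join a list into one comma-separated string; leave other values unchanged."""
    if isinstance(value, list):
        return sep.join(str(v) for v in value)
    return value

def from_cell(text, sep: str = ', ') -> list:
    """Inverse of to_cell: split a sheet string back into a list, treating NaN/blank as empty."""
    if not isinstance(text, str) or not text.strip():
        return []
    return [part for part in text.split(sep) if part]

print(to_cell(['ALIAS1', 'ALIAS2']))       # ALIAS1, ALIAS2
print(from_cell('UNIPROT_A, UNIPROT_B'))  # ['UNIPROT_A', 'UNIPROT_B']
```

Handling the blank and NaN cases in `from_cell` matters because `float('nan').split` would raise an `AttributeError`.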

In [None]:
# Collect missing information from NIH database

updates = []
for i, gene in sec_recon_dc.iterrows():
    human_entrezID = gene['HUMAN ENTREZID']
    gene_symbol = gene['GENE SYMBOL']
    # Query NCBI only if at least one of the three fields is still empty (blank string or NaN)
    if any(pd.isna(gene[col]) or gene[col] == '' for col in ('ALIAS', 'GENENAME', 'HUMAN ENSEMBL')):
        print(gene_symbol)
        try:
            org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products = Request_Utilis.Gene_Info_from_EntrezID(human_entrezID)
            updates.append((i, gene_synonyms, gene_name, gene_ensemble))
        except ValueError:
            print(f'No valid Entrez ID for gene {gene_symbol}')

# Apply the updates outside the loop
for i, gene_synonyms, gene_name, gene_ensemble in updates:
    sec_recon_dc.at[i, 'ALIAS'] = gene_synonyms
    sec_recon_dc.at[i, 'GENENAME'] = gene_name
    sec_recon_dc.at[i, 'HUMAN ENSEMBL'] = gene_ensemble
    
# Google Sheet cells hold strings, so join list values with ', '
for col in ['ALIAS', 'GENENAME', 'HUMAN ENSEMBL']:
    sec_recon_dc[col] = sec_recon_dc[col].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)

if not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print("Google Sheet updated.")
else:
    print('Human identifiers are up-to-date')

### 2.2 CHO and Mouse Ensembl IDs and Gene Symbols
Using the same function, we retrieve the Ensembl IDs and gene symbols for CHO and Mouse.
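The CHO and Mouse cells below are identical except for the column prefix and the log messages. A hedged sketch of deriving the per-species column names from one prefix, so that a single function could serve both species; `species_columns` and `missing_fields` are hypothetical helpers, while the column names are the ones used in this notebook.

```python
import math

def species_columns(prefix: str) -> dict:
    """Column names used in the SecRecon sheet for one species prefix (e.g. 'CHO', 'MOUSE')."""
    return {
        'entrez': f'{prefix} ENTREZID',
        'ensembl': f'{prefix} ENSEMBL',
        'symbol': f'{prefix} GENE SYMBOL',
        'uniprot': f'{prefix} UNIPROT',
    }

def _blank(value) -> bool:
    """None, '' and float NaN all count as empty cells."""
    return value is None or value == '' or (isinstance(value, float) and math.isnan(value))

def missing_fields(row: dict, prefix: str, fields=('ensembl', 'symbol')) -> list:
    """Return which `fields` still need filling, or [] if the row has no Entrez ID to query."""
    cols = species_columns(prefix)
    if _blank(row.get(cols['entrez'])):
        return []
    return [f for f in fields if _blank(row.get(cols[f]))]

# Made-up row: the Entrez ID and symbol are placeholders
row = {'MOUSE ENTREZID': '12345', 'MOUSE ENSEMBL': '', 'MOUSE GENE SYMBOL': 'Abc1'}
print(missing_fields(row, 'MOUSE'))  # ['ensembl']
```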

In [None]:
## -- CHO Ensembl IDs and Gene Symbol -- ##

loop_counter = 0
update_threshold = 50

# Collect missing information for CHO identifiers
for i, gene in sec_recon_dc.iterrows():
    # Treat NaN as missing too: str(NaN) would give 'nan', which is not ''
    cho_entrezID = '' if pd.isna(gene['CHO ENTREZID']) else str(gene['CHO ENTREZID'])
    if cho_entrezID != '':
        if (pd.isna(gene['CHO ENSEMBL']) or gene['CHO ENSEMBL'] == '') or (pd.isna(gene['CHO GENE SYMBOL']) or gene['CHO GENE SYMBOL'] == ''):
            try:
                org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products = Request_Utilis.Gene_Info_from_EntrezID(cho_entrezID)
                if (pd.isna(gene['CHO GENE SYMBOL']) or gene['CHO GENE SYMBOL'] == ''):
                    sec_recon_dc.at[i, 'CHO GENE SYMBOL'] = gene_symbol
                if (pd.isna(gene['CHO ENSEMBL']) or gene['CHO ENSEMBL'] == ''):
                    sec_recon_dc.at[i, 'CHO ENSEMBL'] = gene_ensemble
            except ValueError:
                # Use the row's symbol: gene_symbol may still hold the previous gene's value here
                print(f"No valid Entrez ID for gene {gene['GENE SYMBOL']}")
            loop_counter += 1

            if loop_counter >= update_threshold:
                if not sec_recon_dc.equals(sec_recon):
                    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
                    print(f"Google Sheet updated on CHO Ensembl IDs after {loop_counter} updates")
                else:
                    print('CHO Ensembl IDs are up-to-date')
                loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on CHO Ensembl IDs after {loop_counter} updates")

In [None]:
## -- Mouse Ensembl IDs and Gene Symbol-- ##

loop_counter = 0
update_threshold = 50

# Collect missing information for Mouse identifiers
for i, gene in sec_recon_dc.iterrows():
    # Treat NaN as missing too: str(NaN) would give 'nan', which is not ''
    mouse_entrezID = '' if pd.isna(gene['MOUSE ENTREZID']) else str(gene['MOUSE ENTREZID'])
    if mouse_entrezID != '':
        if (pd.isna(gene['MOUSE ENSEMBL']) or gene['MOUSE ENSEMBL'] == '') or (pd.isna(gene['MOUSE GENE SYMBOL']) or gene['MOUSE GENE SYMBOL'] == ''):
            try:
                org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products = Request_Utilis.Gene_Info_from_EntrezID(mouse_entrezID)
                if (pd.isna(gene['MOUSE GENE SYMBOL']) or gene['MOUSE GENE SYMBOL'] == ''):
                    sec_recon_dc.at[i, 'MOUSE GENE SYMBOL'] = gene_symbol
                if (pd.isna(gene['MOUSE ENSEMBL']) or gene['MOUSE ENSEMBL'] == ''):
                    sec_recon_dc.at[i, 'MOUSE ENSEMBL'] = gene_ensemble
            except ValueError:
                # Use the row's symbol: gene_symbol may still hold the previous gene's value here
                print(f"No valid Entrez ID for gene {gene['GENE SYMBOL']}")
            loop_counter += 1

            if loop_counter >= update_threshold:
                if not sec_recon_dc.equals(sec_recon):
                    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
                    print(f"Google Sheet updated on Mouse Ensembl IDs after {loop_counter} updates")
                else:
                    print('Mouse Ensembl IDs are up-to-date')
                loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on Mouse Ensembl IDs after {loop_counter} updates")

## 3. Uniprot IDs
In this section we retrieve all the Uniprot IDs linked to each gene's Entrez ID from the NIH database, using the **Gene_Info_from_EntrezID** function from the Request Utilis module.
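In the cells below, each entry of `gene_products` is indexed with `x[2]`, which is treated as a list of Uniprot IDs; several products of one gene can share an accession, hence the de-duplication. A sketch of that flattening step which also keeps first-seen order (the notebook's `list(set(...))` does not), assuming the same tuple layout; `unique_uniprot_ids` is a hypothetical name and the product tuples are made-up placeholders where only index 2 is used.

```python
from itertools import chain

def unique_uniprot_ids(gene_products) -> list:
    """Flatten the Uniprot ID lists (element 2 of each product tuple) and drop duplicates, keeping order."""
    all_ids = chain.from_iterable(product[2] for product in gene_products)
    return list(dict.fromkeys(all_ids))

products = [
    ('field_0', 'field_1', ['UNIPROT_A', 'UNIPROT_B']),
    ('field_0', 'field_1', ['UNIPROT_A']),
    ('field_0', 'field_1', []),
]
print(unique_uniprot_ids(products))  # ['UNIPROT_A', 'UNIPROT_B']
```

A stable order means re-running the cell produces the same cell text, so `DataFrame.equals` does not report spurious changes.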

In [None]:
## -- Human Uniprot IDs -- ##

loop_counter = 0
update_threshold = 50

# Collect missing Uniprot IDs for Human genes
for i, gene in sec_recon_dc.iterrows():
    # Treat NaN as missing too: str(NaN) would give 'nan', which is not ''
    human_entrezID = '' if pd.isna(gene['HUMAN ENTREZID']) else str(gene['HUMAN ENTREZID'])
    if human_entrezID != '':
        if (pd.isna(gene['HUMAN UNIPROT']) or gene['HUMAN UNIPROT'] == ''):
            try:
                org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products = Request_Utilis.Gene_Info_from_EntrezID(human_entrezID)
                # Flatten the Uniprot ID lists of all gene products and remove duplicates
                unique_uniprotids = list(set([item for sublist in [x[2] for x in gene_products] for item in sublist]))
                sec_recon_dc.at[i, 'HUMAN UNIPROT'] = unique_uniprotids
                print(loop_counter+1, gene_symbol, human_entrezID, unique_uniprotids)
            except ValueError:
                # Use the row's symbol: gene_symbol may still hold the previous gene's value here
                print(f"No valid Entrez ID for gene {gene['GENE SYMBOL']}")
            loop_counter += 1

            if loop_counter >= update_threshold:
                if not sec_recon_dc.equals(sec_recon):
                    sec_recon_dc['HUMAN UNIPROT'] = sec_recon_dc['HUMAN UNIPROT'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
                    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
                    print(f"Google Sheet updated on Human Uniprot IDs after {loop_counter} updates")
                else:
                    print('HUMAN Uniprot IDs are up-to-date')
                loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    sec_recon_dc['HUMAN UNIPROT'] = sec_recon_dc['HUMAN UNIPROT'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on Human Uniprot IDs after {loop_counter} updates")

In [None]:
## -- CHO Uniprot IDs -- ##

loop_counter = 0
update_threshold = 50

# Collect missing information for CHO identifiers
for i, gene in sec_recon_dc.iterrows():
    # Treat NaN as missing too: str(NaN) would give 'nan', which is not ''
    cho_entrezID = '' if pd.isna(gene['CHO ENTREZID']) else str(gene['CHO ENTREZID'])
    if cho_entrezID != '':
        if (pd.isna(gene['CHO UNIPROT']) or gene['CHO UNIPROT'] == ''):
            try:
                org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products = Request_Utilis.Gene_Info_from_EntrezID(cho_entrezID)
                # Flatten the Uniprot ID lists of all gene products and remove duplicates
                unique_uniprotids = list(set([item for sublist in [x[2] for x in gene_products] for item in sublist]))
                sec_recon_dc.at[i, 'CHO UNIPROT'] = unique_uniprotids
                print(loop_counter+1, gene_symbol, cho_entrezID, unique_uniprotids)
            except ValueError:
                # Use the row's symbol: gene_symbol may still hold the previous gene's value here
                print(f"No valid Entrez ID for gene {gene['GENE SYMBOL']}")
            loop_counter += 1

            if loop_counter >= update_threshold:
                if not sec_recon_dc.equals(sec_recon):
                    sec_recon_dc['CHO UNIPROT'] = sec_recon_dc['CHO UNIPROT'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
                    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
                    print(f"Google Sheet updated on CHO Uniprot IDs after {loop_counter} updates")
                else:
                    print('CHO Uniprot IDs are up-to-date')
                loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    sec_recon_dc['CHO UNIPROT'] = sec_recon_dc['CHO UNIPROT'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on CHO Uniprot IDs after {loop_counter} updates")

In [None]:
## -- Mouse Uniprot IDs -- ##

loop_counter = 0
update_threshold = 50

# Collect missing Uniprot IDs for Mouse genes
for i, gene in sec_recon_dc.iterrows():
    # Treat NaN as missing too: str(NaN) would give 'nan', which is not ''
    mouse_entrezID = '' if pd.isna(gene['MOUSE ENTREZID']) else str(gene['MOUSE ENTREZID'])
    if mouse_entrezID != '':
        if (pd.isna(gene['MOUSE UNIPROT']) or gene['MOUSE UNIPROT'] == ''):
            try:
                org, gene_symbol, gene_name, gene_synonyms, gene_ensemble, gene_products = Request_Utilis.Gene_Info_from_EntrezID(mouse_entrezID)
                # Flatten the Uniprot ID lists of all gene products and remove duplicates
                unique_uniprotids = list(set([item for sublist in [x[2] for x in gene_products] for item in sublist]))
                sec_recon_dc.at[i, 'MOUSE UNIPROT'] = unique_uniprotids
                print(loop_counter+1, gene_symbol, mouse_entrezID, unique_uniprotids)
            except ValueError:
                # Use the row's symbol: gene_symbol may still hold the previous gene's value here
                print(f"No valid Entrez ID for gene {gene['GENE SYMBOL']}")
            loop_counter += 1

            if loop_counter >= update_threshold:
                if not sec_recon_dc.equals(sec_recon):
                    sec_recon_dc['MOUSE UNIPROT'] = sec_recon_dc['MOUSE UNIPROT'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
                    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
                    print(f"Google Sheet updated on Mouse Uniprot IDs after {loop_counter} updates")
                else:
                    print('Mouse Uniprot IDs are up-to-date')
                loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    sec_recon_dc['MOUSE UNIPROT'] = sec_recon_dc['MOUSE UNIPROT'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on Mouse Uniprot IDs after {loop_counter} updates")

## 4. Subcellular Localization
The subcellular localization is assigned in two steps. First, we map subcellular localizations to the genes using the data provided in the paper "[Global organelle profiling reveals subcellular localization and remodeling at proteome scale](https://www.biorxiv.org/content/10.1101/2023.12.18.572249v1)". Then, for the genes that still lack a localization, we use the **get_subcellular_localization** function from the Request Utilis module, with the Human Uniprot IDs retrieved previously as input.
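In the Uniprot step, free-text Uniprot locations are turned into Sec Recon compartments by an `if`/`elif` chain in which the first matching test wins. A hedged sketch of the same first-match idea as an ordered rule table, covering only a subset of the notebook's rules; `RULES`, `normalize_location` and `normalize_all` are hypothetical names. Keeping the rules in a list makes their order explicit, which matters because several tests overlap.

```python
from typing import Optional

# Ordered (test, label) rules: the first test that matches decides the label.
# The ERGIC rule must precede the ER rule; Uniprot writes
# 'Endoplasmic reticulum-Golgi intermediate compartment'.
RULES = [
    (lambda s: s.startswith('Endoplasmic reticulum-Golgi'), 'ERGIC'),
    (lambda s: s.startswith('Endoplasmic reticulum'), 'Endoplasmic Reticulum'),
    (lambda s: 'COPII' in s, 'ERGIC'),
    (lambda s: 'cytoskeleton' in s, 'Actin Cytoskeleton'),
    (lambda s: s.startswith('Cytoplasm'), 'Cytoplasm'),
    (lambda s: 'trans-Golgi' in s, 'trans-Golgi'),
    (lambda s: s.startswith('Golgi apparatus'), 'Golgi'),
    (lambda s: s in ('Membrane', 'Cell membrane'), 'Plasma Membrane'),
    (lambda s: s == 'Secreted', 'Secreted'),
]

def normalize_location(location: str) -> Optional[str]:
    """Map one Uniprot location string to a Sec Recon compartment, or None if untracked."""
    for test, label in RULES:
        if test(location):
            return label
    return None

def normalize_all(locations) -> list:
    """Normalize several locations, dropping untracked ones and duplicates (first-seen order)."""
    labels = (normalize_location(loc) for loc in locations)
    return list(dict.fromkeys(label for label in labels if label is not None))

print(normalize_all(['Golgi apparatus membrane', 'Cytoplasm, cytoskeleton', 'Golgi apparatus', 'Midbody']))
# ['Golgi', 'Actin Cytoskeleton']
```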

In [None]:
# Generate "subcell_dict" for direct mapping into our dataset
subcell = pd.read_csv("../Input/subcellular_localization.csv")
subcell_dict = dict(zip(subcell['Gene_name_canonical'], subcell['consensus graph-based annotation (this study)']))

# Standardization of the subcellular compartments to be merged with the compartments in the Sec Recon dataset
# Labels not listed here are kept unchanged; None marks compartments that are not used
label_map = {
    'early_endosome': 'Early Endosome',
    'centrosome': 'Centrosome',
    'ER': 'Endoplasmic Reticulum',
    'mitochondrion': 'Mitochondria',
    'stress_granule': 'Stress Granule',
    'unclassified': None,
    'peroxisome': 'Peroxisome',
    '14-3-3_scaffold': None,
    'recycling_endosome': 'Recycling Endosome',
    'plasma_membrane': 'Plasma Membrane',
    'lysosome': 'Lysosome',
    'translation': 'Translation',
    'actin_cytoskeleton': 'Actin Cytoskeleton',
    'cytosol': 'Cytosol',
    'nucleus': 'Nucleus',
    'ERGIC': 'ERGIC',
    'p-body': 'P-Body',
    'trans-Golgi': 'trans-Golgi',
    'nucleolus': 'Nucleolus',
    'proteasome': 'Proteasome',
    'Golgi': 'Golgi',
}
subcell_dict = {gene: label_map.get(label, label) for gene, label in subcell_dict.items()}

# Map subcellular localization to the dataset
sec_recon_dc['Subcellular Localization'] = sec_recon_dc['GENE SYMBOL'].map(subcell_dict)

# Update the Google Sheet file
if not sec_recon_dc.equals(sec_recon):
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print('Google Sheet updated on Subcellular Localization from "subcellular_localization.csv" dataset')
else:
    print('Subcellular Localizations from "subcellular_localization.csv" dataset are up-to-date')

In [10]:
#Retrieval of Subcellular localizations from Uniprot

loop_counter = 0
update_threshold = 50

for i, row in sec_recon_dc.iterrows():
    gene = row['GENE SYMBOL']
    # Only query genes that still have no subcellular localization
    if not (pd.isna(row['Subcellular Localization']) or row['Subcellular Localization'] == ''):
        continue
    # Subcellular compartments are extracted using the Human Uniprot IDs (NaN or blank means none)
    human_uniprot = row['HUMAN UNIPROT']
    if isinstance(human_uniprot, list):
        uniprot_ids = human_uniprot
    elif pd.isna(human_uniprot) or human_uniprot == '':
        continue
    else:
        uniprot_ids = human_uniprot.split(", ")

    # Reset for every gene so a previous gene's compartments are never reused
    new_sub_loc = []
    for uni_id in uniprot_ids:
        sub_loc = Request_Utilis.get_subcellular_localization(uni_id)
        if sub_loc is None:
            continue
        for sloc in sub_loc:
            # Standardization of the subcellular compartments to be included in the Sec Recon dataset
            # (the first matching test wins; compartments that match no test are skipped)
            if sloc.startswith('Recycling endosome'):
                sloc = 'Recycling Endosome'
            elif sloc.startswith('Late endosome'):
                sloc = 'Late Endosome'
            elif sloc.startswith('Endosome membrane'):
                sloc = 'Endosome'
            elif sloc.startswith('Early endosome'):
                sloc = 'Early Endosome'
            elif sloc.startswith('Endoplasmic reticulum-Golgi'):
                # Uniprot's term is 'Endoplasmic reticulum-Golgi intermediate compartment'
                sloc = 'ERGIC'
            elif sloc.startswith('Endoplasmic reticulum'):
                sloc = 'Endoplasmic Reticulum'
            elif 'COPII' in sloc:
                sloc = 'ERGIC'
            elif 'cytoskeleton' in sloc:
                sloc = 'Actin Cytoskeleton'
            elif sloc.startswith('Cytoplasm'):
                sloc = 'Cytoplasm'
            elif 'trans-Golgi' in sloc:
                sloc = 'trans-Golgi'
            elif 'cis-Golgi' in sloc:
                sloc = 'cis-Golgi'
            elif sloc.startswith('Golgi apparatus'):
                sloc = 'Golgi'
            elif 'nucleolus' in sloc:
                sloc = 'Nucleolus'
            elif sloc.startswith('Nucleus'):
                sloc = 'Nucleus'
            elif sloc.startswith('Mitochondrion'):
                sloc = 'Mitochondria'
            elif sloc in ('Membrane', 'Cell membrane'):
                sloc = 'Plasma Membrane'
            elif sloc.startswith('Lysosome'):
                sloc = 'Lysosome'
            elif sloc != 'Secreted':
                continue
            new_sub_loc.append(sloc)
        # Only the first Uniprot ID that returns localization data is used
        break

    print(f'Subcellular localization of {gene} is {list(set(new_sub_loc))}')
    sec_recon_dc.at[i, 'Subcellular Localization'] = list(set(new_sub_loc))
    loop_counter += 1

    # After 50 updates, write the Google Sheet file
    if loop_counter >= update_threshold:
        if not sec_recon_dc.equals(sec_recon):
            sec_recon_dc['Subcellular Localization'] = sec_recon_dc['Subcellular Localization'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
            gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
            print(f"Google Sheet updated on Subcellular Localizations from Uniprot after {loop_counter} updates")
        else:
            print('Subcellular Localizations from Uniprot are up-to-date')
        loop_counter = 0

# Check if there are any remaining updates after exiting the loop
if loop_counter > 0 and not sec_recon_dc.equals(sec_recon):
    sec_recon_dc['Subcellular Localization'] = sec_recon_dc['Subcellular Localization'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
    gsheet_file.update_google_sheet(sec_recon_sheet, sec_recon_dc)
    print(f"Google Sheet updated on Subcellular Localizations from Uniprot after {loop_counter} updates")

Subcellular localization of A3GALT2 is ['Golgi']
Subcellular localization of A4GALT is ['Golgi', 'Plasma Membrane']
Subcellular localization of A4GNT is ['Golgi']
Subcellular localization of ABL2 is ['Actin Cytoskeleton']
Subcellular localization of ABO is ['Golgi', 'Secreted']
Subcellular localization of ACAP1 is ['Recycling Endosome']
Subcellular localization of ACAP3 is ['Recycling Endosome']
Subcellular localization of AGAP1 is ['Cytoplasm']
Subcellular localization of AGAP2 is ['Nucleus', 'Cytoplasm']
Subcellular localization of AGAP3 is ['Cytoplasm']
Subcellular localization of AGR2 is ['Secreted']
Subcellular localization of AGR3 is ['Endoplasmic Reticulum']
Subcellular localization of AKAP10 is ['Mitochondria', 'Cytoplasm', 'Plasma Membrane']
Subcellular localization of ALG10B is ['Plasma Membrane']
Subcellular localization of ALG1L2 is ['Plasma Membrane']
Subcellular localization of AP1M2 is ['Golgi', 'Cytoplasm']
Subcellular localization of AP3B2 is ['Golgi', 'Cytoplasm']
...

Subcellular localization of GALNT3 is ['Plasma Membrane']
Subcellular localization of GALNT5 is ['Golgi']
Subcellular localization of GALNT8 is ['Golgi']
Subcellular localization of GALNT9 is ['Golgi']
Subcellular localization of GALNT20 is ['Late Endosome']
Subcellular localization of GANAB is ['Golgi', 'Endoplasmic Reticulum']
Subcellular localization of GBF1 is ['Golgi']
Subcellular localization of GBGT1 is ['Golgi']
Subcellular localization of GCNT1 is ['Golgi']
Subcellular localization of GCNT2 is ['Golgi']
Subcellular localization of GCNT3 is ['Golgi']
Subcellular localization of GCNT4 is ['Golgi']
Google Sheet updated on Subcellular Localizations from Uniprot after 50 updates
Subcellular localization of GCNT7 is ['Golgi']
Subcellular localization of GGA3 is ['trans-Golgi', 'Early Endosome', 'Plasma Membrane', 'Endosome']
Subcellular localization of GLT1D1 is ['Secreted']
Subcellular localization of GLT6D1 is ['Plasma Membrane']
Subcellular localization of GLT8D2 is ['Plasma Memb...

Subcellular localization of RAB37 is ['Cytoplasm']
Subcellular localization of RAB38 is ['Plasma Membrane']
Subcellular localization of RAB3C is ['Plasma Membrane']
Subcellular localization of RAB40A is ['Plasma Membrane']
Subcellular localization of RAB40AL is ['Mitochondria', 'Cytoplasm', 'Plasma Membrane']
Subcellular localization of RAB40B is ['Plasma Membrane']
Subcellular localization of RAB40C is ['Plasma Membrane']
Subcellular localization of RAB41 is ['Cytoplasm']
Subcellular localization of RAB42 is ['Plasma Membrane']
Subcellular localization of RAB44 is ['Plasma Membrane']
Subcellular localization of RAB6A is ['Golgi', 'Cytoplasm']
Subcellular localization of RAB6C is ['Nucleus', 'Actin Cytoskeleton', 'Cytoplasm']
Subcellular localization of RABGAP1L is ['Early Endosome', 'Golgi', 'Cytoplasm']
Subcellular localization of RASEF is ['Cytoplasm']
Subcellular localization of RBX1 is ['Nucleus', 'Cytoplasm']
Subcellular localization of RNF185 is ['Mitochondria', 'Endoplasmic Ret...

Subcellular localization of CAPN6 is ['Actin Cytoskeleton', 'Cytoplasm']
Subcellular localization of CAPN8 is ['Golgi', 'Cytoplasm']
Subcellular localization of CAPN9 is ['Golgi', 'Cytoplasm']
Subcellular localization of CASP12 is ['Golgi', 'Cytoplasm']
Subcellular localization of CASQ1 is ['Mitochondria', 'Endoplasmic Reticulum']
Subcellular localization of CASQ2 is []
Subcellular localization of CLPB is ['Mitochondria']
Google Sheet updated on Subcellular Localizations from Uniprot after 50 updates
Subcellular localization of CNIH is ['Golgi', 'Endoplasmic Reticulum']
Subcellular localization of CNIH2 is ['Endoplasmic Reticulum', 'Plasma Membrane']
Subcellular localization of CNIH3 is []
Subcellular localization of COG3 is ['Golgi']
Subcellular localization of COG5 is ['Golgi', 'Cytoplasm']
Subcellular localization of COPS1 is ['Nucleus', 'Cytoplasm']
Subcellular localization of CALR3 is ['Endoplasmic Reticulum']
Subcellular localization of DNAJA4 is ['Plasma Membrane']
...