# Use of hu.MAP 2.0 data programmatically with Python, taking advantage of Jupyter features

Work through the cells in this notebook, which go through preparation steps and then some example rounds of queries. This will give you an idea of what is occurring when programmatically mining and annotating proteins identified in complexes in human cells. You could then edit this to run queries on the complexes of your favorite human proteins.  
However, before spending a lot of time on that, I suggest looking at [the next notebook in this series](Making_many_hu.MAP2_reports_easily_using_Snakemake.ipynb). You are probably interested in at least a handful of proteins, and that notebook provides a more convenient way to query the complexes of multiple proteins. What the next notebook produces is very similar to what you see here, yet all you need to do is provide a list of protein/gene identifiers and let it automatically process them to produce report summaries like this file for each valid identifier in your list. (That system checks each round that what you provided is valid, so if your identifiers aren't working here, see that one. There is no 'check' step in this demonstration Jupyter notebook.)

-------

What this Jupyter notebook file will produce when run:

- This Jupyter `.ipynb` file containing the following information:
    - list of all the proteins found in complexes along with the examined protein.
    - details about the individual complexes the examined protein occurs in, featuring extra information from UniProt.
    - list of proteins not observed to directly complex with the examined protein, yet complex with proteins that do directly complex.
- tab-separated files containing the details in the first two bullet points listed above
- an HTML file with the details about the individual complexes the examined protein occurs in, featuring extra information from UniProt. (The idea is you can use this anytime independent of Jupyter, to possibly share with others or convert to PDF & then share.)

#### Preparation

##### Get the complexes with confidence scores

Because the author-provided source didn't work for the hu.MAP 3.0 data, I expected `curl -OL "http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_complexes_20200809.txt"` to work on my local machine yet fail on MyBinder, because the port involved may be blocked on MyBinder for retrieving it from the original resource. Because of that expectation, I made a copy at https://gist.githubusercontent.com/fomightez/af3edda957e4d71acbaa30192e74e9af/raw/108a8c3fb3374a74ef3ca5d772a9dfe96e996c93/humap2_complexes_20200809.txt where MyBinder would have access. However, curl from the original source works!  
(I'm keeping a note about my copy for now, but using the original source.)  
Run the next cell to get the data; it retrieves the standardized file `humap2_complexes_20200809InOrderMatched.csv` from the humap2-binder repository:

In [1]:
!curl -OL https://raw.githubusercontent.com/fomightez/humap2-binder/refs/heads/main/additional_nbs/standardizing_initial_data/humap2_complexes_20200809InOrderMatched.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  502k  100  502k    0     0  1171k      0 --:--:-- --:--:-- --:--:-- 1171k
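If `curl` isn't available (for example, when running this as a plain script on a machine without it), a standard-library sketch like the following does roughly what `curl -OL` does: it saves the file under its remote name and follows HTTP redirects. The helper name `fetch` is hypothetical and not part of this project's scripts.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def fetch(url: str, dest_dir: str = ".") -> Path:
    """Download `url` into `dest_dir`, keeping the remote file name (like `curl -O`)."""
    # Take the last path component of the URL as the local file name
    name = Path(urlparse(url).path).name
    dest = Path(dest_dir) / name
    # urllib's default opener follows redirects, similar to curl's -L flag
    urlretrieve(url, dest)
    return dest


# Usage with the same file as the cell above (needs network access):
# fetch("https://raw.githubusercontent.com/fomightez/humap2-binder/refs/heads/main/additional_nbs/standardizing_initial_data/humap2_complexes_20200809InOrderMatched.csv")
```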


Get an accessory script for adding information about the proteins in the complexes:

In [2]:
!curl -OL https://raw.githubusercontent.com/fomightez/structurework/refs/heads/master/humap3-utilities/make_lookup_table_for_extra_info4complexes.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2893  100  2893    0     0  11682      0 --:--:-- --:--:-- --:--:-- 11712


##### Put the data on the complexes into Pandas dataframe

(I'm using uv here just because I want to learn about it. I could have run the code in the script right in this notebook, and skipped the pickling and read pickle steps.)
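For reference, doing this step directly in the notebook (and skipping the pickle round trip) might look like the sketch below. The contents of `complexes_rawCSV_to_df.py` aren't shown here, so this is an assumption: it presumes the CSV has a header row with `HuMAP2_ID`, `Confidence`, `Uniprot_ACCs`, and `genenames` columns, matching the dataframe displayed further down. The helper name `raw_complexes_to_df` and the two-row inline sample are hypothetical.

```python
import io

import pandas as pd


def raw_complexes_to_df(csv_source) -> pd.DataFrame:
    """Read the complexes CSV into a DataFrame (assumed column layout, see lead-in)."""
    df = pd.read_csv(csv_source)
    # Guard against stray whitespace around the space-separated lists
    for col in ("Uniprot_ACCs", "genenames"):
        df[col] = df[col].str.strip()
    return df


# Tiny inline sample standing in for humap2_complexes_20200809InOrderMatched.csv
sample = io.StringIO(
    "HuMAP2_ID,Confidence,Uniprot_ACCs,genenames\n"
    "HuMAP2_00000,3,O95900 Q9BQS8,TRUB2 FYCO1\n"
    "HuMAP2_07017,2,Q96GP6 Q53GT1 P49448,SCARF2 KLHL22 GLUD2\n"
)
rd_df = raw_complexes_to_df(sample)
print(rd_df)
```

With the real file, the call would be `rd_df = raw_complexes_to_df('humap2_complexes_20200809InOrderMatched.csv')`.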

Get the script to use with `uv` to read in the raw data and make a dataframe.

In [3]:
!curl -OL https://raw.githubusercontent.com/fomightez/structurework/refs/heads/master/humap3-utilities/complexes_rawCSV_to_df.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1007  100  1007    0     0    553      0  0:00:01  0:00:01 --:--:--   553


In [4]:
!uv run complexes_rawCSV_to_df.py humap2_complexes_20200809InOrderMatched.csv
import pandas as pd
rd_df = pd.read_pickle('raw_complexes_pickled_df.pkl')
rd_df

Reading inline script metadata from `complexes_rawCSV_to_df.py`

| | HuMAP2_ID | Confidence | Uniprot_ACCs | genenames |
|---|---|---|---|---|
| 0 | HuMAP2_00000 | 3 | O95900 Q9BQS8 | TRUB2 FYCO1 |
| 1 | HuMAP2_00001 | 4 | Q15102 P68402 Q15797 P08133 Q99426 Q9H4M9 | PAFAH1B3 PAFAH1B2 SMAD1 ANXA6 TBCB EHD1 |
| 2 | HuMAP2_00002 | 5 | Q9UF11 A1KXE4 Q6ZRY4 Q9Y6M7 Q15038 O43251 Q930... | PLEKHB1 FAM168B RBPMS2 SLC4A7 DAZAP2 RBFOX2 RB... |
| 3 | HuMAP2_00003 | 5 | O14974 Q8WUM9 Q9Y5Y0 Q16563 Q14919 Q15836 Q299... | PPP1R12A SLC20A1 FLVCR1 SYPL1 DRAP1 VAMP3 MICA... |
| 4 | HuMAP2_00004 | 4 | Q8WV99 Q49A92 Q9NQT8 Q9H672 P20774 | ZFAND2B C8orf34 KIF13B ASB7 OGN |
| ... | ... | ... | ... | ... |
| 6960 | HuMAP2_07014 | 4 | Q9HC97 P31152 Q6S8J3 P13727 Q92871 | GPR35 MAPK4 POTEE PRG2 PMM1 |
| 6961 | HuMAP2_07015 | 4 | Q96E29 Q8N5N7 Q96I51 Q9H5L6 O75127 Q9NPE2 | MTERF3 MRPL50 RCC1L THAP9 PTCD1 NGRN |
| 6962 | HuMAP2_07016 | 5 | O75319 Q96HN2 Q8NE31 O43865 P52952 Q2T9J0 Q9UP... | DUSP11 AHCYL2 FAM13C AHCYL1 NKX2-5 TYSND1 PDZR... |
| 6963 | HuMAP2_07017 | 2 | Q96GP6 Q53GT1 P49448 | SCARF2 KLHL22 GLUD2 |


That's a lot of complexes!

--------

## Analyze complexes for a protein

Let's start in this notebook with a single protein and use Python/Pandas to access the data easily.  

The next line will define the identifier for this protein to be used as the search term for this entire notebook. (The query can be done with the human gene name or the UniProt accession number.) 

In [5]:
search_term = "ROGDI"

With that set, the rest of this notebook will do three things:  
- First, we'll list all the proteins found in complexes along with that protein.
- Then we'll detail the individual complexes themselves, adding extra information from UniProt.
- Finally, we'll go out another 'layer' by collecting a list of proteins observed in complexes with the query protein's complex partners (as identified in the hu.MAP 2.0 data), yet that don't appear as members of any complex alongside the query protein.

-----

### Show all proteins in related complexes

Run the following cell to initiate the query that will collect all the proteins occurring in complexes with the query protein.

In [6]:
# run the query collecting all proteins it occurs with
pattern = fr'\b{search_term}\b' # Create a regex pattern with word boundaries
rows_with_term = rd_df[rd_df['Uniprot_ACCs'].str.contains(pattern, case=False, regex=True) | rd_df['genenames'].str.contains(pattern, case=False, regex=True)]
list_all_associated_acc_name_tuples = []
for row in rows_with_term.itertuples():
    #print(row)
    list_all_associated_acc_name_tuples.extend((item1, item2) for item1, item2 in zip(row.Uniprot_ACCs.split(), row.genenames.split()))
partners_df = pd.DataFrame(set(list_all_associated_acc_name_tuples), columns=['Uniprot_ACCs', 'genenames'])
import rich
rich.print(f"\n[bold black]THE {len(partners_df)} PROTEINS OCCURRING IN COMPLEXES WITH '{search_term}':[/bold black]\n")
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(partners_df.style.hide())

| Uniprot_ACCs | genenames |
|---|---|
| Q9Y4E6 | WDR7 |
| P50993 | ATP1A2 |
| Q8TDJ6 | DMXL2 |
| Q9Y485 | DMXL1 |
| Q9GZN7 | ROGDI |


(Note: if you set `search_term` to an invalid identifier, you'll see `NameError: name 'rd_df' is not defined` as the output here. The easiest way to confirm that your identifier is valid and present in the data is to double-click the file `humap2_complexes_20200809InOrderMatched.csv` (retrieved in the preparation step above) in the file browser pane on the left. Then, with that `.csv` file open, choose `Edit` > `Find` from the menu, enter your identifier in the box in the upper right, and hit Enter. If nothing happens, that identifier is not valid or no data is available for it. The next notebook in the series includes a programmatic check for this.)
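A minimal programmatic check can stand in for that manual search. The sketch below uses a hypothetical helper, `find_complexes`, on toy data: it matches whole space-separated tokens case-insensitively and raises an error when nothing matches. Token matching also sidesteps a subtlety of the `\b...\b` regex used above: a hyphen counts as a word boundary, so a search for `NKX2` would also match `NKX2-5`.

```python
import pandas as pd


def find_complexes(df: pd.DataFrame, term: str) -> pd.DataFrame:
    """Return rows whose accession or gene-name lists contain `term` as a whole token.

    Matching is case-insensitive; a ValueError is raised if no complex matches.
    """
    term = term.strip().upper()

    def has_term(cell: str) -> bool:
        # Split the space-separated list and compare exact tokens
        return term in cell.upper().split()

    mask = df["Uniprot_ACCs"].map(has_term) | df["genenames"].map(has_term)
    if not mask.any():
        raise ValueError(f"'{term}' not found in any complex; check the identifier.")
    return df[mask]


# Toy data (not the real hu.MAP table)
toy = pd.DataFrame({
    "HuMAP2_ID": ["C1", "C2"],
    "Confidence": [3, 5],
    "Uniprot_ACCs": ["Q9GZN7 Q8TDJ6", "O75319 P52952"],
    "genenames": ["ROGDI DMXL2", "DUSP11 NKX2-5"],
})
print(find_complexes(toy, "rogdi")["HuMAP2_ID"].tolist())  # ['C1']
```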

For now, the list is inclusive, meaning the query protein itself is listed among them. That could easily be changed.
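If you'd rather exclude the query protein, one way is a small filter like this sketch (the helper name `drop_query` is hypothetical). It compares both columns case-insensitively, mirroring the `case=False` search above:

```python
import pandas as pd


def drop_query(partners: pd.DataFrame, term: str) -> pd.DataFrame:
    """Remove the query protein's row, matching accession or gene name case-insensitively."""
    term = term.upper()
    is_query = (
        partners["Uniprot_ACCs"].str.upper().eq(term)
        | partners["genenames"].str.upper().eq(term)
    )
    return partners[~is_query].reset_index(drop=True)


# Toy stand-in for `partners_df`
partners = pd.DataFrame(
    [("Q9Y4E6", "WDR7"), ("Q9GZN7", "ROGDI"), ("Q8TDJ6", "DMXL2")],
    columns=["Uniprot_ACCs", "genenames"],
)
print(drop_query(partners, "ROGDI")["genenames"].tolist())  # ['WDR7', 'DMXL2']
```

In the notebook it would be applied as `partners_df = drop_query(partners_df, search_term)`.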

The convenience of Pandas makes it easy to store this for later use as a tab-separated file that will work with Excel.

In [7]:
import datetime
now = datetime.datetime.now()
partners_df.to_csv(f'{search_term}_complexes_partners_humap3{now.strftime("_%Y_%m_%d")}.tsv', sep='\t',index = False) 

Show the file made:

In [8]:
ls *_complexes_partners_*

ROGDI_complexes_partners_humap3_2024_11_12.tsv


Make sure to download that to your local machine if it is useful, because this session is temporary.
That file should open in Excel just fine. (I could actually produce Excel files using openpyxl, but I leave that for later expansion.)
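To pick up where you left off later, the file can be read back with pandas. This sketch writes a small toy table to a temporary directory and reloads it. Passing `dtype=str` is my own precaution, not something the original cells do, so that identifiers are always kept as text rather than type-inferred.

```python
import datetime
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for `partners_df`
partners = pd.DataFrame(
    [("Q9Y4E6", "WDR7"), ("P50993", "ATP1A2")],
    columns=["Uniprot_ACCs", "genenames"],
)
# Same date-stamp style as the notebook's file names, e.g. '_2024_11_12'
stamp = datetime.date.today().strftime("_%Y_%m_%d")
out_path = Path(tempfile.mkdtemp()) / f"ROGDI_complexes_partners{stamp}.tsv"
partners.to_csv(out_path, sep="\t", index=False)

# Read it back; dtype=str keeps every column as text
reloaded = pd.read_csv(out_path, sep="\t", dtype=str)
print(reloaded)
```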

### Show all complexes that protein is in with extra information

The extra annotation information will come from the UniProt Knowledgebase (UniProtKB), using the package unipressed. This starts to show how doing this in Python/Jupyter can add convenience.

Run the following cell, and then those below it in this section, to perform the query for this round.

In [9]:
# Next few cells will run the query collecting all complexes it occurs with and adding details
pattern = fr'\b{search_term}\b' # Create a regex pattern with word boundaries
rows_with_term_df = rd_df[rd_df['Uniprot_ACCs'].str.contains(pattern, case=False, regex=True) | rd_df['genenames'].str.contains(pattern, case=False, regex=True)].copy()
# make the dataframe have each row be a single protein
# to prepare to use pandas `explode()` to do that, first convert the space-separated content into lists
rows_with_term_df['Uniprot_ACCs'] = rows_with_term_df['Uniprot_ACCs'].str.split()
rows_with_term_df['genenames'] = rows_with_term_df['genenames'].str.split()
# Now use explode to create a new row for each element in both columns
df_expanded = rows_with_term_df.explode(['Uniprot_ACCs', 'genenames']).copy()
# Reset the index 
df_expanded = df_expanded.reset_index(drop=True)
# Display the last few rows of the expanded dataframe
print(df_expanded.tail())
# Next add extra information from UniProt for each protein

       HuMAP2_ID  Confidence Uniprot_ACCs genenames
6   HuMAP2_01834           3       Q9Y4E6      WDR7
7   HuMAP2_03388           2       Q9Y485     DMXL1
8   HuMAP2_03388           2       Q9GZN7     ROGDI
9   HuMAP2_03388           2       Q8TDJ6     DMXL2
10  HuMAP2_03388           2       Q9Y4E6      WDR7
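The reshaping in the cell above relies on `DataFrame.explode()` accepting a list of columns, which is available in pandas 1.3 and later. Here is a self-contained sketch on a trimmed toy version of the ROGDI complexes, showing how exploding both columns together keeps each accession paired with its gene name:

```python
import pandas as pd

# Trimmed toy version of the matching complexes (not the full rows)
complexes = pd.DataFrame({
    "HuMAP2_ID": ["HuMAP2_01148", "HuMAP2_03388"],
    "Confidence": [4, 2],
    "Uniprot_ACCs": ["Q9GZN7 Q8TDJ6", "Q9Y485 Q9GZN7"],
    "genenames": ["ROGDI DMXL2", "DMXL1 ROGDI"],
})
# Turn each space-separated string into a list so explode() can unpack it
for col in ("Uniprot_ACCs", "genenames"):
    complexes[col] = complexes[col].str.split()

# Exploding both columns at once keeps position i of one list paired with
# position i of the other; ignore_index gives a fresh 0..n-1 index
long_df = complexes.explode(["Uniprot_ACCs", "genenames"], ignore_index=True)
print(long_df)
```

If the two lists in any row had different lengths, pandas would raise a `ValueError` instead of silently mispairing them, which doubles as a sanity check on the input data.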


In [10]:
# This cell makes lookup table with the extra information; it takes a while to run & so is in a cell on its own to save time during development
%run -i make_lookup_table_for_extra_info4complexes.py

In [11]:
# Use collected information to enhance the dataframe
pn_dict = {k: v['protein_name'] for k, v in lookup_dict.items()}
disease_dict = {k: v['disease'] for k, v in lookup_dict.items()}
synonyms_dict = {k: v['synonyms'] for k, v in lookup_dict.items()}
df_expanded['synonyms'] = df_expanded['Uniprot_ACCs'].map(synonyms_dict)
df_expanded['protein_name'] = df_expanded['Uniprot_ACCs'].map(pn_dict)
df_expanded['disease'] = df_expanded['Uniprot_ACCs'].map(disease_dict)
conf_val2text_dict = {
    1: 'Extremely High',
    2: 'Very High',
    3: 'High',
    4: 'Moderate High',
    5: 'Medium High',
    6: 'Medium'
}
# Use vectorized mapping to convert confidence values to text
df_expanded['Confidence'] = df_expanded['Confidence'].map(conf_val2text_dict)
base_uniprot_url = 'https://www.uniprot.org/uniprotkb/'
df_expanded = df_expanded.assign(Link=base_uniprot_url + df_expanded['Uniprot_ACCs'])
df_expanded

| | HuMAP2_ID | Confidence | Uniprot_ACCs | genenames | synonyms | protein_name | disease | Link |
|---|---|---|---|---|---|---|---|---|
| 0 | HuMAP2_01148 | Moderate High | Q9GZN7 | ROGDI | None reported | Protein rogdi homolog | Kohlschuetter-Toenz syndrome | https://www.uniprot.org/uniprotkb/Q9GZN7 |
| 1 | HuMAP2_01148 | Moderate High | Q8TDJ6 | DMXL2 | KIAA0856 | DmX-like protein 2 | Polyendocrine-polyneuropathy syndrome; Deafnes... | https://www.uniprot.org/uniprotkb/Q8TDJ6 |
| 2 | HuMAP2_01834 | High | P50993 | ATP1A2 | KIAA0778 | Sodium/potassium-transporting ATPase subunit a... | Migraine, familial hemiplegic, 2; Alternating ... | https://www.uniprot.org/uniprotkb/P50993 |
| 3 | HuMAP2_01834 | High | Q9Y485 | DMXL1 | XL1 | DmX-like protein 1 | None reported | https://www.uniprot.org/uniprotkb/Q9Y485 |
| 4 | HuMAP2_01834 | High | Q9GZN7 | ROGDI | None reported | Protein rogdi homolog | Kohlschuetter-Toenz syndrome | https://www.uniprot.org/uniprotkb/Q9GZN7 |
| 5 | HuMAP2_01834 | High | Q8TDJ6 | DMXL2 | KIAA0856 | DmX-like protein 2 | Polyendocrine-polyneuropathy syndrome; Deafnes... | https://www.uniprot.org/uniprotkb/Q8TDJ6 |
| 6 | HuMAP2_01834 | High | Q9Y4E6 | WDR7 | KIAA0541; TRAG | WD repeat-containing protein 7 | None reported | https://www.uniprot.org/uniprotkb/Q9Y4E6 |
| 7 | HuMAP2_03388 | Very High | Q9Y485 | DMXL1 | XL1 | DmX-like protein 1 | None reported | https://www.uniprot.org/uniprotkb/Q9Y485 |
| 8 | HuMAP2_03388 | Very High | Q9GZN7 | ROGDI | None reported | Protein rogdi homolog | Kohlschuetter-Toenz syndrome | https://www.uniprot.org/uniprotkb/Q9GZN7 |
| 9 | HuMAP2_03388 | Very High | Q8TDJ6 | DMXL2 | KIAA0856 | DmX-like protein 2 | Polyendocrine-polyneuropathy syndrome; Deafnes... | https://www.uniprot.org/uniprotkb/Q8TDJ6 |


**Note that disease entries are limited to the first two listed at UniProt.**  
The data will be displayed below in a better arrangement, so don't worry about studying this output yet.
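The enrichment in cell [11] boils down to dictionary lookups with `Series.map()`. The sketch below uses a hypothetical one-entry `lookup_dict` in the shape that cell expects (accession → dict with `protein_name`, `disease`, and `synonyms`), with values copied from the ROGDI row shown above. One thing worth knowing: an accession missing from the lookup table ends up as `NaN` rather than raising an error, so it can be worth listing those explicitly.

```python
import pandas as pd

# Hypothetical lookup table: accession -> annotation fields (Q8TDJ6 deliberately absent)
lookup_dict = {
    "Q9GZN7": {"protein_name": "Protein rogdi homolog",
               "disease": "Kohlschuetter-Toenz syndrome",
               "synonyms": "None reported"},
}
# Same confidence mapping as cell [11]
conf_val2text_dict = {1: "Extremely High", 2: "Very High", 3: "High",
                      4: "Moderate High", 5: "Medium High", 6: "Medium"}

df = pd.DataFrame({"Confidence": [2, 4], "Uniprot_ACCs": ["Q9GZN7", "Q8TDJ6"]})
for field in ("protein_name", "disease", "synonyms"):
    # Series.map with a dict leaves NaN for accessions missing from the lookup
    df[field] = df["Uniprot_ACCs"].map({acc: info[field] for acc, info in lookup_dict.items()})
df["Confidence"] = df["Confidence"].map(conf_val2text_dict)
df["Link"] = "https://www.uniprot.org/uniprotkb/" + df["Uniprot_ACCs"]

# Accessions that got no annotation
missing = df.loc[df["protein_name"].isna(), "Uniprot_ACCs"].tolist()
print(missing)  # ['Q8TDJ6']
```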

Saving that as tab-separated data.

In [12]:
import datetime
now = datetime.datetime.now()
df_expanded.to_csv(f'{search_term}_complexesHUMAP2{now.strftime("_%Y_%m_%d")}.tsv', sep='\t',index = False) 

In [13]:
ls *_complexesHUMAP2*

ROGDI_complexesHUMAP2_2024_11_12.tsv


If you are doing these steps with settings other than the demonstration, you may wish to save that to your local machine as this session is temporary.

-------------

### Detailing all the complexes nicely

Now, with that dataframe in hand, we can group the rows by individual complex and display each complex nicely and completely.
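The grouping uses two keys, so each group's key is a `(HuMAP2_ID, Confidence)` tuple that can be unpacked directly in the `for` statement. This toy sketch (trimmed rows; the date stamp is left out so the names are predictable) shows how those keys become the per-complex file names:

```python
import pandas as pd

# Trimmed toy rows in the shape of `df_expanded`
rows = pd.DataFrame({
    "HuMAP2_ID": ["HuMAP2_01148", "HuMAP2_01148", "HuMAP2_03388"],
    "Confidence": ["Moderate High", "Moderate High", "Very High"],
    "genenames": ["ROGDI", "DMXL2", "DMXL1"],
})
search_term = "ROGDI"
file_names = []
# With two grouping keys, each group key is a (HuMAP2_ID, Confidence) tuple
for (complex_id, confidence), members in rows.groupby(["HuMAP2_ID", "Confidence"]):
    conf_slug = "_".join(confidence.split())  # 'Moderate High' -> 'Moderate_High'
    file_names.append(
        f"{complex_id}_{search_term}_complx_CONF_{conf_slug}_{len(members)}_proteins.tsv"
    )
print(file_names)
```

By default `groupby()` sorts the groups by their keys, which is why the complexes come out in `HuMAP2_ID` order.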

In [14]:
import datetime
import rich
grouped = df_expanded.groupby(['HuMAP2_ID', 'Confidence'])
now = datetime.datetime.now()
# each group key is a (HuMAP2_ID, Confidence) tuple; unpack it instead of shadowing the built-in `complex`
for (complex_id, confidence), grouped_df in grouped:
    rich.print(f"Complex: [bold black]{complex_id}[/bold black]\tConfidence: [bold black]{confidence}[/bold black]\tProteins: [bold black]{len(grouped_df)}[/bold black]")
    with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
        # show from `genenames` onward (skip the ID, confidence, and accession columns)
        display(grouped_df[grouped_df.columns[3:]].reset_index(drop=True))
        grouped_df.to_csv(f'{complex_id}_{search_term}_complx_CONF_{"_".join(confidence.split())}_{len(grouped_df)}_proteins{now.strftime("_%Y_%m_%d")}.tsv', sep='\t', index=False)

| | genenames | synonyms | protein_name | disease | Link |
|---|---|---|---|---|---|
| 0 | ROGDI | None reported | Protein rogdi homolog | Kohlschuetter-Toenz syndrome | https://www.uniprot.org/uniprotkb/Q9GZN7 |
| 1 | DMXL2 | KIAA0856 | DmX-like protein 2 | Polyendocrine-polyneuropathy syndrome; Deafness, autosomal dominant, 71 | https://www.uniprot.org/uniprotkb/Q8TDJ6 |


| | genenames | synonyms | protein_name | disease | Link |
|---|---|---|---|---|---|
| 0 | ATP1A2 | KIAA0778 | Sodium/potassium-transporting ATPase subunit alpha-2 | Migraine, familial hemiplegic, 2; Alternating hemiplegia of childhood 1 | https://www.uniprot.org/uniprotkb/P50993 |
| 1 | DMXL1 | XL1 | DmX-like protein 1 | None reported | https://www.uniprot.org/uniprotkb/Q9Y485 |
| 2 | ROGDI | None reported | Protein rogdi homolog | Kohlschuetter-Toenz syndrome | https://www.uniprot.org/uniprotkb/Q9GZN7 |
| 3 | DMXL2 | KIAA0856 | DmX-like protein 2 | Polyendocrine-polyneuropathy syndrome; Deafness, autosomal dominant, 71 | https://www.uniprot.org/uniprotkb/Q8TDJ6 |
| 4 | WDR7 | KIAA0541; TRAG | WD repeat-containing protein 7 | None reported | https://www.uniprot.org/uniprotkb/Q9Y4E6 |


| | genenames | synonyms | protein_name | disease | Link |
|---|---|---|---|---|---|
| 0 | DMXL1 | XL1 | DmX-like protein 1 | None reported | https://www.uniprot.org/uniprotkb/Q9Y485 |
| 1 | ROGDI | None reported | Protein rogdi homolog | Kohlschuetter-Toenz syndrome | https://www.uniprot.org/uniprotkb/Q9GZN7 |
| 2 | DMXL2 | KIAA0856 | DmX-like protein 2 | Polyendocrine-polyneuropathy syndrome; Deafness, autosomal dominant, 71 | https://www.uniprot.org/uniprotkb/Q8TDJ6 |
| 3 | WDR7 | KIAA0541; TRAG | WD repeat-containing protein 7 | None reported | https://www.uniprot.org/uniprotkb/Q9Y4E6 |


**Keep in mind the disease entries are limited to the first two listed at UniProt.** 

These have been saved as tab-separated data. You may wish to download them, although the same information is already present in the tab-separated data saved earlier.

In [15]:
ls *complx_CONF_*

HuMAP2_01148_ROGDI_complx_CONF_Moderate_High_2_proteins_2024_11_12.tsv
HuMAP2_01834_ROGDI_complx_CONF_High_5_proteins_2024_11_12.tsv
HuMAP2_03388_ROGDI_complx_CONF_Very_High_4_proteins_2024_11_12.tsv


However, you might be admiring the output above concerning each complex and wish you didn't have to deal with Jupyter to see it, or you may want to share it with someone or print it out separately from this notebook.  
Running the next cell will make an HTML file that makes that easier:

In [16]:
def getTableHTML(df):
    """
    From https://stackoverflow.com/a/49687866/2007153
    
    Get a Jupyter like html of pandas dataframe with header underline (except index)
    """
    styles = [
        #table properties
        dict(selector=" ", 
             props=[("margin","0"),
                    ("font-family",'"Helvetica", "Arial", sans-serif'),
                    ("border-collapse", "collapse"),
                    ("border","none")]),
        #background shading
        dict(selector="tbody tr:nth-child(even)",
             props=[("background-color", "#f4f4f4")]), # light grey for even rows; to confirm this custom styling is used rather than the default pandas coloring, temporarily change it to e.g. `#fee` for a reddish tinge
        dict(selector="tbody tr:nth-child(odd)",
             props=[("background-color", "#fff")]),  
        #cell spacing
        dict(selector="td", 
             props=[("padding", ".5em")]),
        #header cell properties (excluding index)
        dict(selector="thead th:not(:first-child)", 
             props=[("font-size", "80%"),
                    ("text-align", "center"),
                    ("border-bottom", "2px solid #666"),
                    ("padding", ".5em")]),
        #index header cell properties (no border)
        dict(selector="thead th:first-child", 
             props=[("font-size", "80%"),
                    ("text-align", "center"),
                    ("padding", ".5em")]),
    ]
    return (df.style.set_table_styles(styles)).to_html()
collected_html = ""
for (complex_id, confidence), grouped_df in grouped:
    collected_html += (f"Complex: <strong>{complex_id}</strong>&emsp;Confidence: <strong>{confidence}</strong>&emsp;Proteins: <strong>{len(grouped_df)}</strong><br>")
    with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
        #display(grouped_df [grouped_df .columns[3:]].reset_index(drop=True))
        collected_html += getTableHTML(grouped_df)
        collected_html += "<br><br><br>"  # `<br>` is the valid line-break tag; `</br>` is not
collected_html_fn = f'{search_term}_individ_complexes_details{now.strftime("_%Y_%m_%d")}.html'
%store collected_html >{collected_html_fn}

Writing 'collected_html' (str) to file 'ROGDI_individ_complexes_details_2024_11_12.html'.
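If you'd rather not depend on the `%store` magic (for example, to run this as a plain Python script), a sketch like the following writes a standalone HTML page with one table per complex. It uses `DataFrame.to_html()`, which, unlike the Styler-based `getTableHTML()` above, doesn't need Jinja2 but also doesn't apply the custom shading. The helper name `write_complexes_html` and the toy rows are hypothetical.

```python
import html
import tempfile
from pathlib import Path

import pandas as pd


def write_complexes_html(df: pd.DataFrame, out_path) -> Path:
    """Write one plain HTML table per (HuMAP2_ID, Confidence) group to a standalone page."""
    parts = ["<!DOCTYPE html>", "<html><head><meta charset='utf-8'>",
             "<title>Complexes</title></head><body>"]
    for (complex_id, confidence), members in df.groupby(["HuMAP2_ID", "Confidence"]):
        parts.append(
            f"<p>Complex: <strong>{html.escape(complex_id)}</strong>&emsp;"
            f"Confidence: <strong>{html.escape(confidence)}</strong>&emsp;"
            f"Proteins: <strong>{len(members)}</strong></p>"
        )
        # to_html escapes cell text by default; render_links turns URLs into anchors
        parts.append(members.to_html(index=False, render_links=True))
    parts.append("</body></html>")
    out_path = Path(out_path)
    out_path.write_text("\n".join(parts), encoding="utf-8")
    return out_path


# Toy rows in the shape of `df_expanded` (trimmed)
toy = pd.DataFrame({
    "HuMAP2_ID": ["HuMAP2_01148", "HuMAP2_01148", "HuMAP2_03388"],
    "Confidence": ["Moderate High", "Moderate High", "Very High"],
    "genenames": ["ROGDI", "DMXL2", "DMXL1"],
    "Link": ["https://www.uniprot.org/uniprotkb/Q9GZN7",
             "https://www.uniprot.org/uniprotkb/Q8TDJ6",
             "https://www.uniprot.org/uniprotkb/Q9Y485"],
})
page = write_complexes_html(toy, Path(tempfile.mkdtemp()) / "complexes.html")
print(page)
```

In the notebook the call would be along the lines of `write_complexes_html(df_expanded, collected_html_fn)`.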


Download that file so you can then share that HTML file and tell the recipient to open it with a browser. Or if you prefer a PDF, after downloading it to your own machine, open it in your browser and print to PDF. (**TIP**: when using `'File'` > `'Print'` for printing to a PDF on a Mac, toggle on '`Background graphics`' to get the nice shading you see in the HTML file; to find '`Background graphics`', in the Print Dialog box, click the drop-down for '`More Settings`' to reveal at the bottom '`Options`'  with '`Background graphics`' to the right of it.)

-----

### 'Adjacent-complex' proteins?

How can we scale up and go beyond just the complexes your favorite protein occurs in? This is where having the data in conjunction with Python starts to reveal its power.

We have a list of proteins our favorite protein is known to occur with in complexes, i.e., 'complexed proteins'. What if we then go out another layer, collect all the complexes those 'complexed proteins' are in, and highlight any new proteins represented? This yields a list of proteins that share a complex with one of the 'complexed proteins' but aren't in any complex with the query protein. In turn, this starts to build up a network of interactions that our favorite protein may be involved in or directly influence.

So how do we get a list of 'adjacent-complex' proteins observed in the complexes in the data? This part is meant to show that, with Python, such a thing is quick and easy.

First, we define two groups. One is the proteins we want to skip looking into further, with the query protein from above at the top of the list. (You can further modify the `skip_proteins` list as you see fit.) The other is all the remaining proteins, which serve as the starting point for collecting the proteins that complex with them. Since we left the query protein in our list above, we need to be sure to filter it out, which is where the first list also comes in.

In [17]:
skip_proteins = [search_term] # you can put any other gene names or accessions after
# that in quotes to also skip those, for example: `skip_proteins = [search_term, 'ATP6V0A4']`
# The idea is to leave out any you expect, to make it easier to clue in on any new ones.

In [18]:
ptuples = [(row['Uniprot_ACCs'], row['genenames']) for index, row in df_expanded.iterrows()]
unique_ptuples = list(set(ptuples))
# skip any in the `skip_proteins` list (filter the de-duplicated list, not the original `ptuples`)
unique_ptuples = [ptuple for ptuple in unique_ptuples if all(element not in skip_proteins for element in ptuple)]

Now, with those two lists in hand, go through each protein observed in complexes with our query protein, collect the proteins that share complexes with it, and see if any new ones come up.

In [19]:
# run the query on each collecting all proteins it occurs with and removing any already in skip_proteins or those in the complexes directly with the query protein
rd_df = pd.read_pickle('raw_complexes_pickled_df.pkl') # make sure in memory
adjacent_proteins_dfs = []
for current_acc, pn in unique_ptuples:
    pattern = fr'\b{current_acc}\b' # Create a regex pattern with word boundaries
    rows_with_term_df = rd_df[rd_df['Uniprot_ACCs'].str.contains(pattern, case=False, regex=True)].copy()
    # explode these to be entries per row
    # to prepare to use pandas `explode()` to do that, first make the content in be lists
    rows_with_term_df['Uniprot_ACCs'] = rows_with_term_df['Uniprot_ACCs'].str.split()
    rows_with_term_df['genenames'] = rows_with_term_df['genenames'].str.split()
    # Now use explode to create a new row for each element in both columns
    df_expanded2 = rows_with_term_df.explode(['Uniprot_ACCs', 'genenames']).copy()
    # Reset the index 
    df_expanded2 = df_expanded2.reset_index(drop=True)
    #remove those that are in `skip_proteins` list or already in the ptuples
    accs_in_ptuples = [i[0] for i in unique_ptuples]
    new_partners_df = df_expanded2[~df_expanded2['Uniprot_ACCs'].isin(accs_in_ptuples)]
    new_partners_df = new_partners_df[~new_partners_df['Uniprot_ACCs'].isin(skip_proteins)]
    new_partners_df = new_partners_df[~new_partners_df['genenames'].isin(skip_proteins)]
    adjacent_proteins_dfs.append(new_partners_df)
if adjacent_proteins_dfs:
    final_new_partners_df = pd.concat(adjacent_proteins_dfs, ignore_index=True)
else:
    rich.print("Nothing 'adjacent' identified.")

In [20]:
try:
    list_all_associated_adj_name_tuples = []
    for row in final_new_partners_df.itertuples():
        #print(row)
        list_all_associated_adj_name_tuples.extend((item1, item2) for item1, item2 in zip(row.Uniprot_ACCs.split(), row.genenames.split()))
    adj_df = pd.DataFrame(set(list_all_associated_adj_name_tuples), columns=['Uniprot_ACCs', 'genenames'])
    import rich
    rich.print(f"\n[bold black]THE {len(adj_df)} PROTEINS THAT AREN'T IN '{search_term}' COMPLEXES THAT ARE\nOBSERVED IN OTHER COMPLEXES WITH PROTEINS FOUND IN '{search_term}' COMPLEXES:[/bold black]\n")
    with pd.option_context('display.max_rows', None, 'display.max_columns', None):
        display(adj_df.style.hide())
except NameError:
    rich.print("Likely, nothing 'adjacent' identified; see above cell.")

| Uniprot_ACCs | genenames |
|---|---|
| Q13733 | ATP1A4 |
| Q9BYI3 | HYCC1 |
| Q5VV41 | ARHGEF16 |
| Q86VZ1 | P2RY8 |


--------

Enjoy!

While you could edit the search term above and start analyzing the data for a protein of interest, it is unlikely you are interested in just one protein. Therefore, I suggest that you first continue on with the next notebook in this series, because it makes clear how to do this programmatically for more than one protein and bundles the collected data for easy download; see 'Available Notebooks' listed [here](../index.ipynb). Or click [here to open the next notebook in this series](Making_many_hu.MAP2_reports_easily_using_Snakemake.ipynb).

See my [humap2-binder repo](https://github.com/fomightez/humap2-binder), my [humap3-binder repo](https://github.com/fomightez/humap3-binder), and [humap3-utilities](https://github.com/fomightez/structurework/tree/master/humap3-utilities) for related information & resources for this notebook.