Import the required modules

To install `uniprotparser`, use pip with the following command:

```shell
python -m pip install uniprotparser
```

In [None]:
from uniprotparser import betaparser
import pandas as pd
import asyncio
import io

Then load the file and build a set of UniProt accession IDs from the accession ID column.

In this file, the UniProt accession ID column is `PG.ProteinGroups`.

Each entry in the column is first split on `;` so that every synonym and variant of the protein is parsed. Each parsed UniProt accession is then added to the `acc` set.

In [None]:
d = pd.read_csv(r"C:\Users\Toan Phung\Downloads\test_Copies_02.txt", sep="\t")
acc = set()
for a in d["PG.ProteinGroups"]:
    if pd.notnull(a):
        for i in a.split(";"):
            accession = betaparser.UniprotSequence(i.strip(), parse_acc=True)
            if accession.accession:
                acc.add(accession.accession)
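
The splitting step can be tried without the library or an input file. The sketch below is only an illustration: it assumes the accession pattern published in the UniProt help pages as a stand-in for `UniprotSequence(..., parse_acc=True)`, whose internals may differ, and `collect_accessions` and `example_column` are hypothetical names.

```python
import re

# Accession pattern as published in the UniProt help pages (an assumption here:
# the library's own parsing may be implemented differently).
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def collect_accessions(entries):
    """Split each entry on ';' and collect every accession found into a set."""
    found = set()
    for entry in entries:
        if not isinstance(entry, str):  # skip missing values such as None or NaN
            continue
        for part in entry.split(";"):
            match = ACCESSION.search(part.strip())
            if match:
                found.add(match.group(0))
    return found


# Hypothetical column content: an isoform suffix, stray spaces, a missing value.
example_column = ["P04637;P04637-2", "Q9Y6K9 ; P38398", None, "not_an_id"]
print(sorted(collect_accessions(example_column)))
```

Because the result is a set, an accession that appears in several rows, or both with and without an isoform suffix, is only requested once.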


The `UniprotParser` class retrieves data from the UniProt web database through its legacy REST API and returns it as `tab`, a tab-separated text format.

Each result is read into a `pandas` DataFrame.

Unlike the synchronous parsing method, which uses `requests` as the HTTP client, asynchronous parsing requires the `aiohttp` library and therefore a different syntax.
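
The difference in syntax comes down to consuming an asynchronous generator with `async for` inside a coroutine. The sketch below shows that pattern in isolation, with no network access. `fake_parse_async` is a hypothetical stand-in for `parser.parse_async`, the batching and the mass values are made up for illustration, and the standard library's `csv` module replaces `pandas` so the example has no dependencies.

```python
import asyncio
import csv
import io


async def fake_parse_async(ids, batch_size=2):
    """Hypothetical stand-in: yields one tab-separated text block per batch of ids."""
    ids = sorted(ids)
    for start in range(0, len(ids), batch_size):
        await asyncio.sleep(0)  # placeholder for awaiting an HTTP response
        batch = ids[start:start + batch_size]
        body = "".join(f"{acc}\t{1000 + n}\n" for n, acc in enumerate(batch, start))
        yield "Entry\tMass\n" + body  # placeholder masses, not real data


async def collect(ids):
    """Consume the async generator and parse every block into row dictionaries."""
    rows = []
    async for text in fake_parse_async(ids):
        rows.extend(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return rows


rows = asyncio.run(collect({"P04637", "P38398", "Q9Y6K9"}))
print([row["Entry"] for row in rows])
```

`async for` is only valid inside an `async def` function, which is why the real call is wrapped in a coroutine and started with `asyncio.run`.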

In [None]:
async def main():
    parser = betaparser.UniprotParser()
    frames = []
    # Read each tab-separated result returned by the parser into a DataFrame.
    async for r in parser.parse_async(ids=acc):
        frames.append(pd.read_csv(io.StringIO(r), sep="\t"))
    if not frames:
        # Nothing was returned, so there is nothing to annotate.
        return
    df = pd.concat(frames, ignore_index=True)

    # Iterate through the original file again, splitting each entry and parsing the accession.
    # If the accession is in the obtained UniProt data, add the molecular weight to the
    # original dataframe and move on to the next entry.
    for i, r in d.iterrows():
        if pd.isnull(r["PG.ProteinGroups"]):
            continue
        for a in r["PG.ProteinGroups"].split(";"):
            accession = betaparser.UniprotSequence(a.strip(), parse_acc=True)
            match = df[df["Entry"] == accession.accession]
            if not match.empty:
                d.at[i, "Mol.wt"] = match.iloc[0]["Mass"]
                break
    # Write the modified dataframe to disk. This path is the input file, so it is overwritten.
    d.to_csv(r"C:\Users\Toan Phung\Downloads\test_Copies_02.txt", index=False, sep="\t")

# In a Jupyter notebook an event loop is already running; use `await main()` there instead.
asyncio.run(main())
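
The matching step in `main()` filters the UniProt table once for every accession. For larger files, the same lookup could be done with a dictionary built once from the `Entry` and `Mass` columns. This is only a sketch of that idea: `first_matching_mass` is a hypothetical helper, the masses are placeholder numbers, and the accessions are assumed to be already cleaned, so no isoform suffix is stripped here.

```python
def first_matching_mass(protein_group, mass_by_accession):
    """Return the mass of the first accession in a ';'-separated group, or None."""
    if not isinstance(protein_group, str):  # missing value in the column
        return None
    for part in protein_group.split(";"):
        mass = mass_by_accession.get(part.strip())
        if mass is not None:
            return mass  # stop at the first accession that has an entry
    return None


# Placeholder lookup table; in the notebook it could be built from the
# UniProt DataFrame, for example with dict(zip(df["Entry"], df["Mass"])).
mass_by_accession = {"P04637": 1000, "P38398": 2000}

print(first_matching_mass("X00000; P38398;P04637", mass_by_accession))
print(first_matching_mass(None, mass_by_accession))
```

A dictionary lookup takes roughly constant time per accession, whereas filtering the DataFrame compares against every row on each pass.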