# METASPACE bulk reannotation
This notebook shows how to reannotate multiple METASPACE datasets against a new database.

<div class="alert alert-info"> 

You can also download this notebook [here](https://git.embl.de/mattausc/outer-spacem/-/raw/master/docs/examples/intracell_selection/bulk_reannotation.ipynb?inline=false) or as part of our [collection of SpaceM analysis notebooks](https://git.embl.de/grp-alexandrov/spacem-analysis-notebooks).

</div>

### Setup

Before running this notebook, ensure that you have [set up your API key](https://metaspace2020.readthedocs.io/en/latest/content/examples/fetch-dataset-annotations.html#Connect-to-the-sm-server) for METASPACE!

In [1]:
from metaspace import SMInstance

In [2]:
sm = SMInstance()

In [6]:
# IDs of datasets to reannotate

datasets = [
    "2021-10-27_00h20m47s", # Well 8
    # "2021-10-27_00h05m07s" # Well 3
] 

If you want to reannotate all datasets within a project, you can also download the project's metadata as a CSV file:  

![](project_export.png)


...which you can then import into this notebook to get the dataset IDs:

In [7]:
# import pandas as pd

# metadata = pd.read_csv("metaspace_datasets.csv", skiprows=2)
# metadata.head()

In [8]:
# datasets = metadata.datasetId.to_list()
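
The CSV import above can be tried out without a real export. The following is a minimal sketch, assuming the export starts with two preamble lines followed by a header row containing a `datasetId` column (which is what `skiprows=2` and `metadata.datasetId` in the cells above imply). The CSV content, the `datasetName` column and the variable `dataset_ids` are made up for illustration; a real export may look different.

```python
import io

import pandas as pd

# Hypothetical stand-in for a METASPACE project export: two preamble lines
# followed by the header row. A real export may contain more columns.
csv_text = (
    "# hypothetical preamble line 1\n"
    "# hypothetical preamble line 2\n"
    "datasetId,datasetName\n"
    "2021-10-27_00h20m47s,Well 8\n"
    "2021-10-27_00h05m07s,Well 3\n"
    "2021-10-27_00h05m07s,Well 3\n"
)

# skiprows=2 skips the two preamble lines so the third line becomes the header.
metadata = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# Drop missing and duplicate IDs while preserving the original order.
dataset_ids = metadata["datasetId"].dropna().drop_duplicates().to_list()
print(dataset_ids)
```

Dropping duplicates is optional, but it avoids submitting the same dataset twice if the exported file happens to list it more than once.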

### Selecting the database for reannotation

If you are unsure which ID corresponds to the database you want to reannotate against, you can determine it based on its name and version:

In [9]:
# sm.database(name="Gastrosome_DrugW8_FeedingW3_intra_ions", version="v1").id

<div class="alert alert-info"> 

**Note:** If this returns nothing, the database/version does not exist!

</div>

Once you do have your database's ID, enter it here:

In [10]:
new_db_id = 532 # (Well8)
# new_db_id = 531 # (Well3)

<div class="alert alert-info"> 

**Note:** The dataset(s) will be reannotated against the new database **in addition to the ones they have already been annotated against.**

</div>

### Submitting datasets for reannotation

In [11]:
for ds_id in datasets:
    ds = sm.dataset(id=ds_id)
    print(ds.name)
    # IDs of the databases the dataset is already annotated against
    database_ids = [db["id"] for db in ds.database_details]
    if new_db_id not in database_ids:
        # Keep the existing databases and append the new one
        new_databases = database_ids + [new_db_id]
        print("Adding new db...")
        sm.update_dataset_dbs(ds.id, new_databases, ds.adducts)
    else:
        print("Dataset has already been annotated against this database!")

2021-28-09_Gastrosome_Slide6Drugs_Well8_150x150_a29ss25_DHBpos
Adding new db...
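
The decision made inside the loop (append the new database ID only if it is not already present) can be checked offline before submitting anything to METASPACE. Below is a minimal sketch of that logic as a standalone function. The helper name `merged_database_ids` and the database IDs `22` and `38` are hypothetical and only serve as an illustration; the function does not call the METASPACE API.

```python
def merged_database_ids(current_ids, new_id):
    """Return the ID list to submit, or None if new_id is already present."""
    if new_id in current_ids:
        # Nothing to do: the dataset is already annotated against this database.
        return None
    # Keep all existing databases and append the new one at the end.
    return list(current_ids) + [new_id]


# Hypothetical existing database IDs (22, 38) combined with the new ID 532.
print(merged_database_ids([22, 38], 532))   # [22, 38, 532]
print(merged_database_ids([22, 532], 532))  # None
```

Because the existing IDs are always kept in the returned list, a resubmission built this way adds the new database without dropping the ones the dataset is already annotated against.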


Once METASPACE has finished reannotating your datasets, open SpaceM again, load the reannotated dataset, and move to the Dataset Reprocessing step, where you can now select the new database.