# Migrate from `v7.8.0` to `v8.0.0`

## Prerequisites

1. Start a MongoDB server on your local machine (or in a Docker container) and ensure it does **not** contain a database named `nmdc`.
2. Create and populate a **notebook configuration file** named `.notebook.env`.
    1. You can use the `.notebook.env.example` file as a template:
       ```shell
       $ cp .notebook.env.example .notebook.env
       ```
3. Create and populate **Mongo configuration files** for connecting to the origin and transformer Mongo servers.
    1. You can use the `.mongo.yaml.example` file as a template:
       ```shell
       $ cp .mongo.yaml.example .mongo.origin.yaml
       $ cp .mongo.yaml.example .mongo.transformer.yaml
       ```
       > When populating the file for the origin Mongo server, use root credentials since this notebook will be manipulating user roles on that server.
4. Run the cells in this notebook in order.

## Procedure

### Install dependencies

Install the third-party Python packages upon which this notebook depends.

In [None]:
!python -m pip install -r requirements.txt

Import the Python objects upon which this notebook depends.

In [None]:
# Standard library packages:
from pathlib import Path
from pprint import pformat
from shutil import rmtree
from tempfile import NamedTemporaryFile

# Third-party packages:
import pymongo

# First-party packages:
from helpers import Config

### Parse configuration files

Parse the notebook and Mongo configuration files.

In [None]:
cfg = Config()
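
The shell cells later in this notebook interpolate `{mongodump}` and `{mongorestore}`, but no cell shown here defines those names. Below is a minimal sketch of one way to define them, assuming both executables are on the `PATH`. The helper name `find_tool` is hypothetical, and the `Config` object may already expose equivalent paths.

```python
from shutil import which


def find_tool(name: str) -> str:
    """Return the path to an executable found on the PATH, or raise if it is missing."""
    path = which(name)
    if path is None:
        raise FileNotFoundError(f"Could not find `{name}` on the PATH.")
    return path


# In the notebook, the names used by the shell cells could then be defined like this:
# mongodump = find_tool("mongodump")
# mongorestore = find_tool("mongorestore")
```

Failing early here gives a clearer error than a shell cell that silently runs an empty command.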

### Create MongoDB clients

Create MongoDB clients you can use to access the "origin" MongoDB server (i.e. the one containing the database you want to migrate) and the "transformer" MongoDB server (i.e. the one you want to use to perform the data transformations).

In [None]:
# MongoDB client for origin MongoDB server.
origin_mongo_client = pymongo.MongoClient(host=cfg.origin_mongo_server_uri, directConnection=True)

# MongoDB client for transformer MongoDB server.
transformer_mongo_client = pymongo.MongoClient(host=cfg.transformer_mongo_server_uri)

### Disable writing to the origin MongoDB database

To disable writing to the database, I will set every user's role (except the admin user's) to `read` (i.e. read-only) with respect to the database. Before doing that, I will store the original users so that I can restore their original roles later.

Note: `pymongo` does not offer [`db.getUsers()`](https://www.mongodb.com/docs/manual/reference/method/db.getUsers/), so the cell below runs the `usersInfo` database command instead.

In [None]:
result: dict = origin_mongo_client["admin"].command("usersInfo")
users_initial = result["users"]

# Create temporary file in the notebook's folder, containing the initial users.
users_file = NamedTemporaryFile(delete=False, dir=str(Path.cwd()), prefix="tmp.origin_users_initial.")
users_file.write(bytes(pformat(users_initial), "utf-8"))
users_file.close()
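
The `pformat` output is easy to read but can be awkward to load back into Python. As an alternative, here is a minimal sketch that writes a snapshot as JSON. The function name `save_users_snapshot` and the example user are hypothetical, and `default=str` is a lossy fallback for values that JSON cannot represent natively.

```python
import json
import uuid
from pathlib import Path
from tempfile import TemporaryDirectory


def save_users_snapshot(users: list, file_path: Path) -> None:
    """Write the user documents to a JSON file, stringifying non-JSON values."""
    file_path.write_text(json.dumps(users, indent=2, default=str), encoding="utf-8")


# Example with a made-up user document (not real data).
with TemporaryDirectory() as folder:
    snapshot_path = Path(folder) / "origin_users_initial.json"
    example_users = [
        {"user": "alice", "userId": uuid.uuid4(), "roles": [{"role": "readWrite", "db": "nmdc"}]}
    ]
    save_users_snapshot(example_users, snapshot_path)
    reloaded = json.loads(snapshot_path.read_text(encoding="utf-8"))
```

The `user` and `roles` fields, which are the ones the role-restoring step relies on, survive the round trip unchanged.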

Now that I've stored their original roles, I'll convert every `readWrite` role (with respect to the `nmdc` database) into just plain `read`.

In [None]:
for user in users_initial:

    break  # Abort! TODO: Remove me when I'm ready to run this notebook for real.

    if any((role["db"] == "nmdc" and role["role"] == "readWrite") for role in user["roles"]):
        origin_mongo_client["admin"].command("grantRolesToUser", user["user"], roles=[{ "role": "read", "db": "nmdc" }])
        origin_mongo_client["admin"].command("revokeRolesFromUser", user["user"], roles=[{ "role": "readWrite", "db": "nmdc" }])
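
The prose above targets users who hold `readWrite` on `nmdc`. That selection can be checked without a database. Here is a minimal sketch, assuming user documents shaped like those returned by `usersInfo` (a `user` name plus a `roles` list of `role`/`db` pairs). The helper name `usernames_having_role` and the example users are hypothetical.

```python
def usernames_having_role(users: list, role_name: str, db_name: str) -> list:
    """Return the names of users that hold the given role on the given database."""
    return [
        user["user"]
        for user in users
        if any(role["role"] == role_name and role["db"] == db_name for role in user["roles"])
    ]


# Made-up user documents for illustration only.
example_users = [
    {"user": "writer", "roles": [{"role": "readWrite", "db": "nmdc"}]},
    {"user": "reader", "roles": [{"role": "read", "db": "nmdc"}]},
    {"user": "other", "roles": [{"role": "readWrite", "db": "elsewhere"}]},
]
print(usernames_having_role(example_users, "readWrite", "nmdc"))  # prints ['writer']
```

Printing this list before removing the `break` statement is a cheap way to preview which users would be affected.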


### Dump the necessary collections from the origin database

In this case, I'll dump the `extraction_set` collection only.

References:
- https://www.mongodb.com/docs/database-tools/mongodump/
- https://www.mongodb.com/docs/database-tools/mongodump/#std-option-mongodump.--config (`--config` option)

In [None]:
# Dump the `extraction_set` collection of the `nmdc` database from the origin MongoDB server.
!{mongodump} \
  --config="{cfg.origin_mongo_config_file_path}" \
  --db="nmdc" \
  --gzip \
  --collection="extraction_set" \
  --out="{cfg.origin_dump_folder_path}"
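
Outside of IPython, the `!` shell syntax is unavailable. Below is a minimal sketch of building the same command as an argument list suitable for `subprocess.run`. The function name `build_mongodump_args` and the example paths are hypothetical; the flags are the ones used in the cell above.

```python
import subprocess


def build_mongodump_args(executable, config_file_path, db_name, collection_name, out_folder_path):
    """Build a mongodump argument list equivalent to the shell cell above."""
    return [
        executable,
        f"--config={config_file_path}",
        f"--db={db_name}",
        "--gzip",
        f"--collection={collection_name}",
        f"--out={out_folder_path}",
    ]


# Example paths are placeholders, not values taken from this notebook's configuration.
args = build_mongodump_args("mongodump", ".mongo.origin.yaml", "nmdc", "extraction_set", "./dump_origin")
# To actually run the dump: subprocess.run(args, check=True)
```

Passing a list avoids shell quoting problems when a path contains spaces.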

### Restore the database into the transformer MongoDB server

References:
- https://www.mongodb.com/docs/database-tools/mongorestore/
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--config (`--config` option)
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--drop (`--drop` to drop the existing collection)
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--preserveUUID (`--preserveUUID` to use the existing UUIDs from the dump)

In [None]:
# Restore the dumped collection into the transformer MongoDB server.
!{mongorestore} \
  --config="{cfg.transformer_mongo_config_file_path}" \
  --gzip \
  --drop --preserveUUID \
  --dir="{cfg.origin_dump_folder_path}"

### Transform the database

Now that the transformer database contains a copy of the `extraction_set` collection from the origin database, we can transform it there.

Source: https://github.com/microbiomedata/nmdc-schema/blob/7802f295cfc80d056f9c73c79636802926be40ee/nmdc_schema/migration_recursion.py#L38
- Note: The source link will need to be updated once the linked commit is merged into `main`.
- Note: "NEXT" in the function's name will need to be changed to the correct version number.
- Changes made when copying: removed the logger calls and a commented-out line.

References:
- https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.replace_one

In [None]:
# <copy_pasted_snippet from="https://github.com/microbiomedata/nmdc-schema/blob/7802f295cfc80d056f9c73c79636802926be40ee/nmdc_schema/migration_recursion.py#L38">

def migrate_extractions_7_8_0_to_8_0_0(retrieved_extraction):
    """Rename the `sample_mass` field (if present) to `input_mass`, modifying the document in place."""

    if "sample_mass" in retrieved_extraction:
        retrieved_extraction["input_mass"] = retrieved_extraction.pop("sample_mass")

    return retrieved_extraction

# </copy_pasted_snippet>
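
To see what the function does to a single document, here is a minimal sketch that applies it to a made-up extraction. The `id` and the field values are invented for illustration and are not real NMDC data.

```python
# Copied from the cell above so that this example runs on its own.
def migrate_extractions_7_8_0_to_8_0_0(retrieved_extraction):
    if "sample_mass" in retrieved_extraction:
        retrieved_extraction["input_mass"] = retrieved_extraction.pop("sample_mass")
    return retrieved_extraction


# Made-up extraction document for illustration only.
example = {"id": "nmdc:extrp-00-000001", "sample_mass": {"has_numeric_value": 0.25, "has_unit": "g"}}
migrated = migrate_extractions_7_8_0_to_8_0_0(example)
print(sorted(migrated))  # prints ['id', 'input_mass']
print(migrated is example)  # prints True, because the document is modified in place
```

Documents without a `sample_mass` field pass through unchanged, so running the function on an already-migrated document has no effect.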

In [None]:
# Make a transformed version of each extraction in the transformer database.
transformed_extractions = []
for extraction in transformer_mongo_client["nmdc"]["extraction_set"].find():
    print(extraction)
    transformed_extraction = migrate_extractions_7_8_0_to_8_0_0(extraction)
    transformed_extractions.append(transformed_extraction)
    print(transformed_extraction)

# Replace the original versions with the transformed versions of themselves (in the transformer database).
for transformed_extraction in transformed_extractions:
    transformer_mongo_client["nmdc"]["extraction_set"].replace_one({"id": {"$eq": transformed_extraction["id"]}}, transformed_extraction)


### Validate the transformed database

In [None]:
# TODO
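
Validation is still a TODO in this notebook. As a stopgap, here is a minimal sketch of a sanity check specific to this migration: confirm that no document still has a `sample_mass` field. The helper name `find_unmigrated_ids` and the example documents are hypothetical, and this check is not a substitute for validating against the schema.

```python
def find_unmigrated_ids(documents) -> list:
    """Return the `id` of each document that still contains a `sample_mass` field."""
    return [document.get("id") for document in documents if "sample_mass" in document]


# Made-up documents for illustration only.
example_documents = [
    {"id": "a", "input_mass": {"has_numeric_value": 1.0}},
    {"id": "b", "sample_mass": {"has_numeric_value": 2.0}},
]
print(find_unmigrated_ids(example_documents))  # prints ['b']
```

In the notebook, the argument could be `transformer_mongo_client["nmdc"]["extraction_set"].find()`, in which case an empty list would be the expected result.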

### Dump the transformed database

In [None]:
# Dump the database from the transformer MongoDB server.
!{mongodump} \
  --config="{cfg.transformer_mongo_config_file_path}" \
  --db="nmdc" \
  --gzip \
  --out="{cfg.transformer_dump_folder_path}"

### Put the transformed data into the origin MongoDB server

In the case of this migration, given how focused the transformation was (i.e. only the `extraction_set` collection was affected), I will restore **only** the `extraction_set` collection to the origin server.

References:
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--nsInclude (`--nsInclude` to specify which collections to affect)
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--dryRun (`--dryRun` can be used to preview the outcome)

In [None]:
# Drop the original `extraction_set` collection from the origin server,
# and restore the transformed `extraction_set` collection into its place.
!{mongorestore} \
  --config="{cfg.origin_mongo_config_file_path}" \
  --gzip \
  --verbose \
  --dir="{cfg.transformer_dump_folder_path}" \
  --nsInclude="nmdc.extraction_set" \
  --drop --preserveUUID

Now that I've restored the `extraction_set` collection, I'll restore the original user roles (with respect to the `nmdc` database).

In [None]:
for user in users_initial:

    break  # Abort! TODO: Remove me when I'm ready to run this notebook for real.

    if any((role["db"] == "nmdc" and role["role"] == "readWrite") for role in user["roles"]):
        origin_mongo_client["admin"].command("grantRolesToUser", user["user"], roles=[{ "role": "readWrite", "db": "nmdc" }])
        origin_mongo_client["admin"].command("revokeRolesFromUser", user["user"], roles=[{ "role": "read", "db": "nmdc" }])

## Clean up

Delete the Mongo dumps created by this notebook.

> Note: You can skip this step if you would rather examine the folders and delete them manually.

In [None]:
paths_to_folders_to_delete = [
    cfg.origin_dump_folder_path,
    cfg.transformer_dump_folder_path,
]

for path in paths_to_folders_to_delete:
    try:
        rmtree(path)
        print(f"Deleted: {path}")
    except OSError as error:
        print(f"Failed to delete: {path} ({error})")