# Migrate from `v7.8.0` to `v8.0.0`

## Prerequisites

1. Start a MongoDB server on your local machine (or in a Docker container) and ensure it does **not** contain a database named `nmdc`.
2. Create and populate a **notebook configuration file** named `.notebook.env`.
    1. You can use the `.notebook.env.example` file as a template:
       ```shell
       cp .notebook.env.example .notebook.env
       ```
3. Create and populate **Mongo configuration files** for connecting to the origin and transformer Mongo servers.
    1. You can use the `.mongo.yaml.example` file as a template:
       ```shell
       cp .mongo.yaml.example .mongo.origin.yaml
       cp .mongo.yaml.example .mongo.transformer.yaml
       ```
       > When populating the file for the origin Mongo server, use root credentials since this notebook will be manipulating user roles on that server.
4. Run the cells in this notebook in order.

## Procedure

Install the third-party Python packages upon which this notebook (and `./helpers.py`) depends.

In [57]:
!python -m pip install -r requirements.txt

Import the Python objects upon which this notebook depends.

In [51]:
# Standard library packages:
from pprint import pformat
from pathlib import Path
from tempfile import NamedTemporaryFile

# Third-party packages:
import pymongo

# First-party packages:
from helpers import Config as Cfg

### Create MongoDB clients

Create MongoDB clients you can use to access the "origin" MongoDB server (i.e. the one containing the database you want to migrate) and the "transformer" MongoDB server (i.e. the one you want to use to perform the data transformations).

In [48]:
# MongoDB client for origin MongoDB server.
origin_mongo_client = pymongo.MongoClient(host=Cfg.origin_mongo_server_uri, directConnection=True)

# MongoDB client for transformer MongoDB server.
transformer_mongo_client = pymongo.MongoClient(host=Cfg.transformer_mongo_server_uri)
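
Prerequisite 1 requires that the local MongoDB server does not already contain a database named `nmdc`. Assuming that server is the transformer, a quick check could look like the sketch below. `database_exists` is a hypothetical helper (it is not part of `helpers.py`), and it takes a plain list of names so that it can be tried without a running server.

```python
from typing import Iterable


def database_exists(database_names: Iterable[str], name: str = "nmdc") -> bool:
    """Return True if `name` appears among the given database names."""
    return name in set(database_names)


# Made-up database names for illustration. In the notebook, the list could
# come from `transformer_mongo_client.list_database_names()`.
print(database_exists(["admin", "config", "local"]))  # False
print(database_exists(["admin", "nmdc"]))  # True
```

If the check returns `True` for the transformer server, drop or rename that database before continuing.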

### Disable writing to the origin MongoDB database

To disable writing to the database, I will set every user's role on it (except the admin user's) to `read` (i.e. read-only). Before doing that, I will store the original user definitions so that I can restore their roles later.

Note: `pymongo` does not offer [`db.getUsers()`](https://www.mongodb.com/docs/manual/reference/method/db.getUsers/), so the cell below runs the `usersInfo` database command instead.

In [30]:
result: dict = origin_mongo_client["admin"].command("usersInfo")
users_initial = result["users"]

# Create temporary file in the notebook's folder, containing the initial users.
users_file = NamedTemporaryFile(delete=False, dir=str(Path.cwd()), prefix="tmp.origin_users_initial.")
users_file.write(bytes(pformat(users_initial), "utf-8"))
users_file.close()
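
`pformat` output is easy to read but awkward to load back programmatically. If a machine-readable copy is wanted as well, something like the following might work. It is only a sketch: `users_to_json` is a hypothetical helper, the sample user is made up, and `default=str` is my assumption for handling values that JSON cannot represent natively (for example, a binary or UUID user ID).

```python
import json
from uuid import UUID


def users_to_json(users: list) -> str:
    """Serialize user documents to JSON, converting non-JSON values to strings."""
    return json.dumps(users, default=str, indent=2, sort_keys=True)


# Made-up user document for illustration only.
sample_users = [
    {
        "user": "alice",
        "db": "admin",
        "userId": UUID("12345678-1234-5678-1234-567812345678"),
        "roles": [{"role": "readWrite", "db": "nmdc"}],
    }
]

text = users_to_json(sample_users)
restored = json.loads(text)
print(restored[0]["roles"])
```

The roles survive the round trip unchanged, which is the part needed to restore access later.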

Now that I've stored their original roles, I'll convert every `readWrite` role (with respect to the `nmdc` database) into just plain `read`.

In [None]:
for user in users_initial:

    break  # Abort! TODO: Remove me when I'm ready to run this notebook for real.

    if any((role["db"] == "nmdc" and role["role"] == "readWrite") for role in user["roles"]):
        origin_mongo_client["admin"].command("grantRolesToUser", user["user"], roles=[{ "role": "read", "db": "nmdc" }])
        origin_mongo_client["admin"].command("revokeRolesFromUser", user["user"], roles=[{ "role": "readWrite", "db": "nmdc" }])
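
The intent stated above (downgrade only the users who hold `readWrite` on `nmdc`) can be tried without touching a server. This sketch uses made-up user documents shaped like the `usersInfo` output; `users_with_role` is a hypothetical helper.

```python
def users_with_role(users: list, role: str, db: str) -> list:
    """Return the names of users holding `role` on database `db`."""
    return [
        user["user"]
        for user in users
        if any(r["role"] == role and r["db"] == db for r in user["roles"])
    ]


# Made-up user documents for illustration only.
sample_users = [
    {"user": "writer", "roles": [{"role": "readWrite", "db": "nmdc"}]},
    {"user": "reader", "roles": [{"role": "read", "db": "nmdc"}]},
    {"user": "other", "roles": [{"role": "readWrite", "db": "test"}]},
]

print(users_with_role(sample_users, "readWrite", "nmdc"))  # ['writer']
```

Printing this list for the real `users_initial` before removing the `break` shows exactly which accounts will be affected.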


### Dump the necessary collections from the origin database

In this case, I'll dump the `extraction_set` collection only.

References:
- https://www.mongodb.com/docs/database-tools/mongodump/
- https://www.mongodb.com/docs/database-tools/mongodump/#std-option-mongodump.--config (`--config` option)

In [None]:
# Dump the database from the origin MongoDB server.
!{mongodump} \
  --config="{Cfg.origin_mongo_config_file_path}" \
  --db="nmdc" \
  --gzip \
  --collection="extraction_set" \
  --out="{Cfg.origin_dump_folder_path}"

### Restore the database into the transformer MongoDB server

References:
- https://www.mongodb.com/docs/database-tools/mongorestore/
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--config (`--config` option)
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--drop (`--drop` to drop the existing collection)
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--preserveUUID (`--preserveUUID` to use the existing UUIDs from the dump)

In [None]:
# Restore the database to the transformer MongoDB server.
!{mongorestore} \
  --config="{Cfg.transformer_mongo_config_file_path}" \
  --gzip \
  --drop --preserveUUID \
  --dir="{Cfg.origin_dump_folder_path}"

### Transform the database

Now that the transformer database contains a copy of the subject database, we can transform it there.

Source: https://github.com/microbiomedata/nmdc-schema/blob/7802f295cfc80d056f9c73c79636802926be40ee/nmdc_schema/migration_recursion.py#L38
- Note: The source link will need to be updated once the linked code is merged into `main`.
- Note: "NEXT" in the function's name will need to be changed to the correct version number.
- Changes from the source: removed logger calls and a commented-out line.

References:
- https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.replace_one

In [None]:
# <copy_pasted_snippet from="https://github.com/microbiomedata/nmdc-schema/blob/7802f295cfc80d056f9c73c79636802926be40ee/nmdc_schema/migration_recursion.py#L38">

def migrate_extractions_7_8_0_to_8_0_0(retrieved_extraction):

    if "sample_mass" in retrieved_extraction:
        retrieved_extraction['input_mass'] = retrieved_extraction.pop('sample_mass')
        
    return retrieved_extraction

# </copy_pasted_snippet>
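
To see the effect of the function, here it is applied to a made-up extraction document (the `id` and field values are illustrative, not real NMDC data). The function body is repeated so that the example runs on its own.

```python
def migrate_extractions_7_8_0_to_8_0_0(retrieved_extraction: dict) -> dict:
    """Rename `sample_mass` to `input_mass`, if present (modifies the dict in place)."""
    if "sample_mass" in retrieved_extraction:
        retrieved_extraction["input_mass"] = retrieved_extraction.pop("sample_mass")
    return retrieved_extraction


# Made-up extraction document for illustration only.
extraction = {"id": "example:1", "sample_mass": {"has_numeric_value": 0.25}}
migrated = migrate_extractions_7_8_0_to_8_0_0(extraction)

print(migrated)  # {'id': 'example:1', 'input_mass': {'has_numeric_value': 0.25}}
print(migrated is extraction)  # True
```

Because the function mutates its argument and returns the same object, make a copy first (e.g. with `copy.deepcopy`) if the original document is still needed for comparison.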

In [None]:
# Make a transformed version of each extraction in the transformer database.
transformed_extractions = []
for extraction in transformer_mongo_client["nmdc"]["extraction_set"].find():
    print(extraction)
    transformed_extraction = migrate_extractions_7_8_0_to_8_0_0(extraction)
    transformed_extractions.append(transformed_extraction)
    print(transformed_extraction)

# Replace the original versions with the transformed versions of themselves (in the transformer database).
for transformed_extraction in transformed_extractions:
    transformer_mongo_client["nmdc"]["extraction_set"].replace_one({"id": {"$eq": transformed_extraction["id"]}}, transformed_extraction)


### Validate the transformed database

In [None]:
# TODO
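
As a narrow sanity check (not a substitute for validating against the schema), one could confirm that no document still carries the old field name. This is a sketch under that assumption; `find_unmigrated` is a hypothetical helper and the sample documents are made up.

```python
def find_unmigrated(documents, old_field: str = "sample_mass") -> list:
    """Return the `id` of every document that still contains `old_field`."""
    return [doc.get("id") for doc in documents if old_field in doc]


# Made-up documents for illustration. In the notebook, the iterable could be
# `transformer_mongo_client["nmdc"]["extraction_set"].find()`.
sample_documents = [
    {"id": "example:1", "input_mass": {"has_numeric_value": 0.25}},
    {"id": "example:2", "sample_mass": {"has_numeric_value": 0.5}},
]

print(find_unmigrated(sample_documents))  # ['example:2']
```

An empty list means the rename reached every document that was checked.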

### Dump the transformed database

In [None]:
# Dump the database from the transformer MongoDB server.
!{mongodump} \
  --config="{Cfg.transformer_mongo_config_file_path}" \
  --db="nmdc" \
  --gzip \
  --out="{Cfg.transformer_dump_folder_path}"

### Put the transformed data into the origin MongoDB server

In the case of this migration, given how focused the transformation was (i.e. only the `extraction_set` collection was affected), I will restore **only** the `extraction_set` collection to the origin server.

References:
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--nsInclude (`--nsInclude` to specify which collections to affect)
- https://www.mongodb.com/docs/database-tools/mongorestore/#std-option-mongorestore.--dryRun (`--dryRun` can be used to preview the outcome)

In [None]:
# Drop the original `extraction_set` collection from the origin server,
# and restore the transformed `extraction_set` collection into its place.
!{mongorestore} \
  --config="{Cfg.origin_mongo_config_file_path}" \
  --gzip \
  --verbose \
  --dir="{Cfg.transformer_dump_folder_path}" \
  --nsInclude="nmdc.extraction_set" \
  --drop --preserveUUID
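
Since `--dryRun` can preview the outcome, it may be convenient to build the argument list once and toggle that flag. The sketch below only constructs the arguments, using the options already shown in the cell above; `build_mongorestore_args` and the file paths are hypothetical.

```python
import shlex


def build_mongorestore_args(
    config_path: str,
    dump_dir: str,
    ns_include: str,
    dry_run: bool = False,
    executable: str = "mongorestore",
) -> list:
    """Build a `mongorestore` argument list mirroring the cell above."""
    args = [
        executable,
        f"--config={config_path}",
        "--gzip",
        "--verbose",
        f"--dir={dump_dir}",
        f"--nsInclude={ns_include}",
        "--drop",
        "--preserveUUID",
    ]
    if dry_run:
        # Preview the restore without writing anything to the server.
        args.append("--dryRun")
    return args


# Hypothetical paths for illustration only.
preview_args = build_mongorestore_args(
    ".mongo.origin.yaml", "./dump", "nmdc.extraction_set", dry_run=True
)
print(shlex.join(preview_args))
```

The resulting list can be passed to `subprocess.run(preview_args, check=True)`; once the preview looks right, rebuild it with `dry_run=False`.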

Now that I've restored the database, I'll restore the original user roles (with respect to the `nmdc` database).

In [None]:
for user in users_initial:

    break  # Abort! TODO: Remove me when I'm ready to run this notebook for real.

    if any((role["db"] == "nmdc" and role["role"] == "readWrite") for role in user["roles"]):
        origin_mongo_client["admin"].command("grantRolesToUser", user["user"], roles=[{ "role": "readWrite", "db": "nmdc" }])
        origin_mongo_client["admin"].command("revokeRolesFromUser", user["user"], roles=[{ "role": "read", "db": "nmdc" }])
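
After restoring the roles, it may be worth confirming that every user ended up with the same roles they started with. The sketch below compares two lists of user documents; `users_with_changed_roles` is a hypothetical helper, and the sample data is made up. In the notebook, the "after" list could come from running the `usersInfo` command again.

```python
def role_set(user: dict) -> frozenset:
    """Return a user's roles as a set of (role, db) pairs."""
    return frozenset((r["role"], r["db"]) for r in user["roles"])


def users_with_changed_roles(before: list, after: list) -> list:
    """Return names of users whose roles differ between the two lists."""
    after_roles = {u["user"]: role_set(u) for u in after}
    return [u["user"] for u in before if after_roles.get(u["user"]) != role_set(u)]


# Made-up user documents for illustration only.
initial = [{"user": "writer", "roles": [{"role": "readWrite", "db": "nmdc"}]}]
locked = [{"user": "writer", "roles": [{"role": "read", "db": "nmdc"}]}]
restored = [{"user": "writer", "roles": [{"role": "readWrite", "db": "nmdc"}]}]

print(users_with_changed_roles(initial, locked))  # ['writer']
print(users_with_changed_roles(initial, restored))  # []
```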

### About db.fsyncLock() and db.fsyncUnlock()

I chose not to use `db.fsyncLock()`/`db.fsyncUnlock()` to disable and re-enable write access, because I want to be able to `mongorestore` a database while write access is still disabled. `db.fsyncLock()` would have disabled write access at the `mongod` level, preventing database-level write operations (while still allowing a system administrator to back up database **files** via `cp`, `scp`, `tar`, etc.).

Reference: https://www.mongodb.com/docs/manual/reference/method/db.fsyncLock/#mongodb-method-db.fsyncLock

## Clean up

You may want to manually delete the `.tmp.*` files that this notebook created in its folder, as well as the `tmp.origin_users_initial.*` file that stores the initial users. Some of them contain MongoDB passwords.
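
To review what is there before deleting anything, the leftover files can be listed first. This is a sketch: `find_temp_files` is a hypothetical helper, the glob patterns are my assumption based on the file names mentioned in this notebook, and the demonstration runs in a throwaway folder.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def find_temp_files(folder: Path, patterns=("tmp.*", ".tmp.*")) -> list:
    """Return the files in `folder` whose names match any of the patterns."""
    found = set()
    for pattern in patterns:
        found.update(p for p in folder.glob(pattern) if p.is_file())
    return sorted(found)


# Demonstrate in a throwaway folder with made-up file names.
with TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "tmp.origin_users_initial.abc123").write_text("example")
    (folder / "notebook.ipynb").write_text("example")
    names = [p.name for p in find_temp_files(folder)]

print(names)  # ['tmp.origin_users_initial.abc123']
```

In the notebook's folder, `find_temp_files(Path.cwd())` would list the candidates; each can then be removed with `Path.unlink()` after a manual check.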