In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # default is 'last_expr'

%load_ext autoreload
%autoreload 2

In [2]:
import os

from azure.cosmos.cosmos_client import CosmosClient

# Upsert items

This notebook updates an existing item or adds a new one (an upsert) in the `datasets` container.

## Connect to the Cosmos DB instance

`COSMOS_ENDPOINT` and `COSMOS_WRITE_KEY` must be set as environment variables.
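
A minimal sketch of a pre-flight check, so that a missing variable is reported by name before the client is created. The helper `missing_env_vars` and the constant `REQUIRED_VARS` are hypothetical names introduced here for illustration; they are not part of the original notebook.

```python
import os

# Names taken from the notebook; the helper itself is an illustrative assumption.
REQUIRED_VARS = ('COSMOS_ENDPOINT', 'COSMOS_WRITE_KEY')


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print('Set these environment variables first:', ', '.join(missing))
```

Passing `environ` explicitly makes the helper easy to try out with a plain dict instead of the real environment.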

In [4]:
# Initialize Cosmos DB client
url = os.environ['COSMOS_ENDPOINT']
key = os.environ['COSMOS_WRITE_KEY']
client = CosmosClient(url, credential=key)

database = client.get_database_client('camera-trap')
container_datasets = database.get_container_client('datasets')
container_sequences = database.get_container_client('sequences')  # not used here

## Upsert an item in the `datasets` container

When you're *updating* an existing item instead of *inserting* a new one, you need to find its `id` and include it in `item_to_upsert`.
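
One possible way to look up the `id` is a parameterized query on `dataset_name`. This is a sketch, not the notebook's own method: it assumes that `dataset_name` identifies at most one item in the container, and the helpers `find_item_ids` and `with_existing_id` are hypothetical names introduced here.

```python
def find_item_ids(container, dataset_name):
    """Return the ids of all items whose dataset_name matches."""
    query = 'SELECT d.id FROM datasets d WHERE d.dataset_name = @name'
    results = container.query_items(
        query=query,
        parameters=[{'name': '@name', 'value': dataset_name}],
        enable_cross_partition_query=True,
    )
    return [r['id'] for r in results]


def with_existing_id(container, item):
    """Return a copy of `item` with the stored id added when one match exists.

    Assumption: dataset_name is unique, so more than one match is an error.
    """
    ids = find_item_ids(container, item['dataset_name'])
    if len(ids) > 1:
        raise ValueError(f'Expected at most one match, found {len(ids)}')
    updated = dict(item)
    if ids:
        updated['id'] = ids[0]
    return updated
```

With the client from the previous section, this could be used as `item_to_upsert = with_existing_id(container_datasets, item_to_upsert)` before calling `upsert_item`. When no match is found the item is returned unchanged, which corresponds to the insert case.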

In [21]:
# dict-like object representing the item to update or insert
item_to_upsert = {
    "access": [
        "internal"
    ],
    "comment": "...",
    "container": "container_name",
    "container_sas_key": "...",
    "dataset_name": "dataset_name",
    "path_prefix": "prefix",
    "storage_account": "storage_account"
}

In [None]:
%%time

container_datasets.upsert_item(item_to_upsert)

Check that the `datasets` container is updated.

The Data Explorer view in the Azure Portal seems to remain outdated for a while after an upsert, so query the container directly instead.

In [23]:
%%time

query = '''SELECT * FROM datasets d'''

result_iterable = container_datasets.query_items(query=query, enable_cross_partition_query=True)

# Key each result by its dataset name, dropping system fields (keys starting with '_')
datasets = {
    i['dataset_name']: {k: v for k, v in i.items() if not k.startswith('_')}
    for i in result_iterable
}

print('Length of results:', len(datasets))

Length of results: 22
CPU times: user 5.71 ms, sys: 1.66 ms, total: 7.37 ms
Wall time: 113 ms


In [24]:
len(datasets)

22
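
The item count alone does not show whether an *update* took effect, since updating leaves the count unchanged. A rough sketch of a field-by-field comparison between the stored entry and the item that was sent is given below. The helper `field_mismatches` is a hypothetical name introduced here; it only checks the fields present in the sent item, so extra stored fields such as `id` are ignored.

```python
_MISSING = object()  # sentinel for fields absent from the stored item


def field_mismatches(stored, expected):
    """Map each differing field to a (stored_value, expected_value) pair.

    A field missing from `stored` is reported with None as its stored value.
    An empty result means every expected field matches.
    """
    mismatches = {}
    for key, want in expected.items():
        got = stored.get(key, _MISSING)
        if got is _MISSING:
            mismatches[key] = (None, want)
        elif got != want:
            mismatches[key] = (got, want)
    return mismatches
```

For example, `field_mismatches(datasets[item_to_upsert['dataset_name']], item_to_upsert)` should return an empty dict once the upsert is visible to queries.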