## IOT Readings - Embed all readings or not?
Test dataset to evaluate the RU cost of collocating all IOT readings in a single document versus splitting them across multiple documents with N readings each. <br/>
Maximum number of readings: 10000
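
Before loading anything, it can help to check offline whether a document with the maximum number of embedded readings would still fit in a single item. The sketch below copies the reading shape used later in this notebook but substitutes fixed placeholder values for the Faker output. It assumes the Cosmos DB item size limit of 2 MB; `estimate_doc_bytes` and `MAX_ITEM_BYTES` are hypothetical names, and the byte counts only approximate what the service stores.

```python
import json
import uuid

MAX_ITEM_BYTES = 2 * 1024 * 1024  # assumption: Cosmos DB item size limit of 2 MB

def estimate_doc_bytes(n_readings):
    """Approximate the UTF-8 JSON size of a document with n_readings embedded readings."""
    readings = [
        {'Dimension': 'D1', 'Value': 12345, 'Timestamp': '2024-01-01T00:00:00'}
        for _ in range(n_readings)
    ]
    doc = {
        'id': str(uuid.uuid4()),
        'PartitionKey': '00_ABCD_000000000000_Type1',  # placeholder value
        'Entity': {'Name': 'ABCD_000000000000', 'Type': 'Type1'},
        'Readings': readings,
        'class': 'A',
    }
    return len(json.dumps(doc).encode('utf-8'))

sizes = {n: estimate_doc_bytes(n) for n in (1, 100, 1000, 10000)}
for n, size in sizes.items():
    print(f'{n:>6} readings: {size:>8} bytes, fits in one item: {size <= MAX_ITEM_BYTES}')
```

The real documents will differ slightly in size because Faker produces varying values, so treat the numbers as an order-of-magnitude check only.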

### Prepare key references for notebook
The environment variables COSMOS_ACCOUNT_URI and COSMOS_ACCOUNT_KEY must be set before running the notebook.
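
A missing variable otherwise surfaces as a bare `KeyError`. One possible way to fail with a clearer message is sketched below; `require_env` is a hypothetical helper, and the variable name in the example is a placeholder so the snippet runs without a Cosmos DB account.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set')
    return value

# Placeholder variable so the sketch runs anywhere
os.environ.setdefault('EXAMPLE_COSMOS_SETTING', 'placeholder')
print(require_env('EXAMPLE_COSMOS_SETTING'))
```

In the notebook, the same helper could be called with `'COSMOS_ACCOUNT_URI'` and `'COSMOS_ACCOUNT_KEY'`.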

In [1]:
from azure.cosmos import CosmosClient, PartitionKey
from faker import Faker

import os
import json
import uuid

cosmosAccountURI = os.environ['COSMOS_ACCOUNT_URI']
cosmosAccountKey = os.environ['COSMOS_ACCOUNT_KEY']

databaseName = 'Models'
containerName = 'IOTEnergyTelemetry'
partitionKeypath = '/PartitionKey'

Faker.seed(0)
fake = Faker(['en-US'])

### Create DB and collection for sample data
If the resources already exist, just get references to the database and container.

In [2]:
client = CosmosClient(cosmosAccountURI, cosmosAccountKey)
db = client.create_database_if_not_exists(databaseName)

pkPath = PartitionKey(path=partitionKeypath)
ctr = db.create_container_if_not_exists(id=containerName, partition_key=pkPath, offer_throughput=1000) 

### Repeatable Reference lists
Contains documents that should stay consistent across cells/operations. <br/>
For example: patients, IOT devices, taxpayer information, ...

In [5]:
from collections import OrderedDict
maxRange = 10000
IOTSources = []

os.makedirs(os.path.dirname('./OutputFiles/'), exist_ok=True)
with open('./OutputFiles/' + containerName + '_referenceData.json', 'w') as jsonFile:
    for i in range(maxRange):
        entity = {            
            'Name': fake.bothify('????_############') ## SiteId_WellId
            , 'Type': fake.random_element(elements=('Type1', 'Type2', 'Type3', 'Type4', 'Type5', 'Type6'))
        }
        IOTSources.append(entity)

        # Save IOT sources for reference
        json.dump(entity, jsonFile)
        # Write a separator after every entity except the last one
        if i < maxRange - 1:
            jsonFile.write(',\n')

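
The reference file written above is a comma-separated sequence of JSON objects rather than a single JSON array, so `json.load` cannot read it back directly. If the list needs to be reloaded in a later session to keep it consistent, one option is the JSON Lines layout (one object per line). This is a minimal sketch with placeholder entities and a temporary path; `save_reference` and `load_reference` are hypothetical helpers, not part of the original notebook.

```python
import json
import os
import tempfile

def save_reference(entities, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, 'w') as f:
        for entity in entities:
            f.write(json.dumps(entity) + '\n')

def load_reference(path):
    """Read the entities back in the order they were written."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Placeholder entities in the same shape as IOTSources
sample = [
    {'Name': 'ABCD_000000000001', 'Type': 'Type1'},
    {'Name': 'EFGH_000000000002', 'Type': 'Type4'},
]
path = os.path.join(tempfile.mkdtemp(), 'referenceData.jsonl')
save_reference(sample, path)
restored = load_reference(path)
print(restored == sample)
```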
### Load sample documents in the container
Save generated documents in the 'OutputFiles' directory

In [7]:
RUCharges = []
docs = []
maxRange = 100

for j in range(maxRange):
    IOTSrc = IOTSources[fake.random_int(min=0, max=len(IOTSources) - 1)]
    readings = []

    # *** Produce N readings
    # Execute this repeatedly, changing the range to 1, 100, 1000 and 10000
    for r in range(100):
        readings.append(
            {
                'Dimension': fake.random_element(elements=('D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9'))
                , 'Value': fake.random_number(digits=5)
                , 'Timestamp': fake.date_time_this_year().isoformat()
            })

    doc = {
        'id': str(uuid.uuid4())
        , 'PartitionKey': fake.bothify('##_') + IOTSrc['Name'] + '_' + IOTSrc['Type']
        , 'Entity': IOTSrc
        , 'Readings': readings
        , 'class': fake.random_element(elements=OrderedDict([("A", 0.40), ("B", 0.35), ("C", 0.15), ("D", 0.05), ("E", 0.05)]))
    }

    # Create items, record charges and store docs locally (optional)
    # print(doc)
    ctr.create_item(doc)
    RUCharges.append(float(ctr.client_connection.last_response_headers['x-ms-request-charge']))
    # print(ctr.client_connection.last_response_headers['x-ms-request-charge'])
    docs.append(doc)

with open('./OutputFiles/' + containerName + '_docs.json', 'w') as jf:
    for d in docs:
        json.dump(d, jf)
        jf.write(',\n')
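
For the other side of the comparison, multiple documents with N readings each, the readings of one source can be split into fixed-size chunks before they are written. The following is a minimal sketch that uses placeholder data and makes no Cosmos DB calls; `split_into_docs` and the `ChunkIndex` field are hypothetical names introduced for illustration.

```python
import uuid

def split_into_docs(entity, partition_key, readings, chunk_size):
    """Build one document per chunk of at most chunk_size readings."""
    docs = []
    for start in range(0, len(readings), chunk_size):
        docs.append({
            'id': str(uuid.uuid4()),
            'PartitionKey': partition_key,
            'Entity': entity,
            'ChunkIndex': start // chunk_size,  # hypothetical field to keep chunks ordered
            'Readings': readings[start:start + chunk_size],
        })
    return docs

# Placeholder source and readings in the same shape as the cell above
entity = {'Name': 'ABCD_000000000001', 'Type': 'Type1'}
readings = [
    {'Dimension': 'D1', 'Value': i, 'Timestamp': '2024-01-01T00:00:00'}
    for i in range(250)
]
chunks = split_into_docs(entity, '00_ABCD_000000000001_Type1', readings, chunk_size=100)
print([len(c['Readings']) for c in chunks])  # [100, 100, 50]
```

Each chunk could then be passed to `ctr.create_item` in the same way as in the loop above, recording the request charge per chunk so that the totals can be compared with the single-document case.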

In [9]:
# print(RUCharges)
print('Average RU charge: ' + str(sum(RUCharges) / len(RUCharges)))

Average RU charge:9.519999999999998
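
An average alone can hide outliers, so it may be worth looking at the spread of the charges as well. This is a small sketch using only the standard library; `summarize_charges` is a hypothetical helper, and the `sample_charges` values are made-up placeholders, not measurements.

```python
import statistics

def summarize_charges(charges):
    """Return basic summary statistics for a list of RU charges."""
    return {
        'count': len(charges),
        'total': sum(charges),
        'mean': statistics.mean(charges),
        'median': statistics.median(charges),
        'min': min(charges),
        'max': max(charges),
    }

sample_charges = [9.0, 10.0, 9.5, 10.5, 11.0]  # placeholder values
summary = summarize_charges(sample_charges)
for key, value in summary.items():
    print(f'{key}: {value}')
```

In the notebook, `summarize_charges(RUCharges)` could be called after each load run to compare the different reading counts.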


### Adjust index policy
The readings stored in this container should not need indexing, since most operations fetch documents by Device/Time and then compute over the values. The application therefore gains nothing from the default policy, which indexes every property. <br/>
Execute the cell below to adjust the index policy, then re-execute the load cell (adjusting the number of readings) to compare RU charges.

In [None]:
# containerPath = 'dbs/'+ databaseName +'/colls/' + containerName
# container = db.get_container_client(container=containerName)

indexPolicy = {
    "indexingMode":"consistent",
    "includedPaths":[
        {"path":"/PartitionKey/?"}
        , {"path":"/Entity/*"}
        , {"path":"/class/?"}  # paths are case-sensitive; the documents use 'class'
        , {"path":"/_ts/?"}
        ]
    , "excludedPaths":[{"path":"/*"}]
}

db.replace_container(containerName, pkPath, indexing_policy=indexPolicy)


<ContainerProxy [dbs/Models/colls/IOTEnergyTelemetry]>


### Clean up code

In [10]:
# Assume objects are instantiated
db.delete_container(containerName)