# Semantic Heterogeneous Database Simulations

Let's start by generating random records and semantic operations, which will be used to run the performance tests.

```python
import time
from datetime import datetime

import numpy as np
import pandas as pd

from database_generator import DatabaseGenerator
```

## Load Phase

Insert all generated records into the semantic heterogeneous database. Note that the PyMongo library may reduce insertion performance. However, because the simulator uses PyMongo internally, it is fair to use it in the baseline test as well, so this overhead should roughly cancel out when the two are compared.
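For comparison, a plain baseline load could send the same records straight to a PyMongo collection. The sketch below is an assumption, not the simulator's code: `chunked`, `insert_in_batches` and the default batch size are hypothetical, and `collection` is any object exposing PyMongo's `insert_many(documents)` method.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def chunked(items: Iterable[Any], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def insert_in_batches(collection: Any, documents: Iterable[dict], batch_size: int = 10_000) -> int:
    """Send documents with one insert_many call per batch; return how many were sent."""
    total = 0
    for batch in chunked(documents, batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total
```

With pandas and a running MongoDB server, the baseline could be fed with `records.to_dict("records")` and a collection obtained from `pymongo.MongoClient()`. Batching keeps each request bounded in size; the batch size is a tuning knob, not a value taken from the simulator.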

### First scenario
Insert all records first and apply the semantic operations afterwards.

```python
# Generate the initial data set: 100,000 records, 11 fields, one version
d = DatabaseGenerator()
d.generate(number_of_records=100000, number_of_versions=1, number_of_fields=11, number_of_values_in_domain=20)
records = pd.DataFrame(d.records)
```

```python
# Time the bulk insertion of all generated records
start = time.time()
d.collection.insert_many_by_dataframe(records, 'valid_from_date')
end = time.time()
load_phase_s = end - start
print(load_phase_s)
```

```
9.252664804458618
```
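As a rough sanity check, the load time above can be turned into a throughput figure. This sketch only hard-codes the record count passed to `d.generate(...)` and the time printed above; it does not separate PyMongo overhead from the simulator's own work.

```python
number_of_records = 100_000          # as passed to d.generate(...) above
load_phase_s = 9.252664804458618     # measured load time printed above

records_per_second = number_of_records / load_phase_s
print(f"{records_per_second:,.0f} records/s")  # → 10,808 records/s
```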


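```python
# Generate four additional versions (their semantic operations accumulate in d.operations)
for _ in range(4):
    d.generate_version()

# Time applying every semantic operation to the loaded collection
start = time.time()
for operation in d.operations:
    d.collection.execute_operation(operation[0], operation[1], operation[2])
end = time.time()
load_phase_versions = end - start
print(load_phase_versions)
```

```
22.656521320343018
```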
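The cells above time each phase with `time.time()`, which reads the system clock and, per the Python documentation, can return a lower value if the clock is set back between two calls. `time.perf_counter()` is the clock Python provides for measuring short durations. A minimal sketch of a context manager that keeps the start/end bookkeeping in one place (the name `timed` and the `results` dictionary are hypothetical):

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(label: str, results: dict) -> Iterator[None]:
    """Measure the enclosed block with perf_counter and store the seconds in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs are not silently dropped
        results[label] = time.perf_counter() - start


timings: dict = {}
with timed("sleep", timings):
    time.sleep(0.05)
print(f"{timings['sleep']:.3f} s")
```

In the notebook, the operation loop could be wrapped as `with timed("load_phase_versions", timings): ...`, leaving the loop body unchanged.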


## Test 1 - Querying

Performance of querying the generated database on a single field as more versions are added.

Every test is executed 100 times, and the 95% confidence interval of the execution time is reported.
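A minimal sketch of this measurement, assuming each test can be wrapped in a zero-argument callable. It uses a normal approximation (z ≈ 1.96) for the 95% interval, which is reasonable for 100 samples; a Student t quantile (about 1.98 for 99 degrees of freedom) would give a slightly wider interval. `run_benchmark` and `confidence_interval` are hypothetical helpers, not part of `database_generator`.

```python
import statistics
import time
from typing import Callable, List, Tuple


def run_benchmark(test: Callable[[], object], repetitions: int = 100) -> List[float]:
    """Execute `test` `repetitions` times and return each run's duration in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        test()
        durations.append(time.perf_counter() - start)
    return durations


def confidence_interval(samples: List[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal approximation for the sample mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return mean, mean - z * std_error, mean + z * std_error


durations = run_benchmark(lambda: sum(range(10_000)))
mean, low, high = confidence_interval(durations)
print(f"mean {mean * 1e3:.3f} ms, 95% CI [{low * 1e3:.3f}, {high * 1e3:.3f}] ms")
```

For the querying test, `test` would be a callable that runs one single-field query against `d.collection`; the exact query method depends on the simulator's API and is not shown here.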

## Test 2 - Insertion

Performance of inserting new records.