Loading concrete strength data using MatODM in ArangoDB
==========================================================
For this example we use the UCI concrete compressive strength dataset (https://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength). The dataset consists of a single Excel file with 9 columns and roughly 1030 data entries, as shown below.

In [1]:
import pandas as pd
df = pd.read_excel (r'data/Concrete_Data.xls', sheet_name='Sheet1')
df

Unnamed: 0,Cement (component 1)(kg in a m^3 mixture),Blast Furnace Slag (component 2)(kg in a m^3 mixture),Fly Ash (component 3)(kg in a m^3 mixture),Water (component 4)(kg in a m^3 mixture),Superplasticizer (component 5)(kg in a m^3 mixture),Coarse Aggregate (component 6)(kg in a m^3 mixture),Fine Aggregate (component 7)(kg in a m^3 mixture),Age (day),"Concrete compressive strength(MPa, megapascals)"
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.052780
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075
...,...,...,...,...,...,...,...,...,...
1025,276.4,116.0,90.3,179.6,8.9,870.1,768.3,28,44.284354
1026,322.2,0.0,115.6,196.0,10.4,817.9,813.4,28,31.178794
1027,148.5,139.4,108.6,192.7,6.1,892.4,780.0,28,23.696601
1028,159.1,186.7,0.0,175.6,11.3,989.6,788.9,28,32.768036


Let's rename the columns so that they are easier to refer to later.

In [2]:
df = df.reset_index()
df = df.rename(columns={"Cement (component 1)(kg in a m^3 mixture)": "Cement", "Blast Furnace Slag (component 2)(kg in a m^3 mixture)": "BlastFurnaceSlag","Fly Ash (component 3)(kg in a m^3 mixture)":"FlyAsh",
                "Water  (component 4)(kg in a m^3 mixture)":"Water","Superplasticizer (component 5)(kg in a m^3 mixture)":"Superplasticizer","Coarse Aggregate  (component 6)(kg in a m^3 mixture)":"CoarseAgg",
                "Fine Aggregate (component 7)(kg in a m^3 mixture)":"FineAgg", "Age (day)":"Age","Concrete compressive strength(MPa, megapascals) ":"Strength"})
df


Unnamed: 0,index,Cement,BlastFurnaceSlag,FlyAsh,Water,Superplasticizer,CoarseAgg,FineAgg,Age,Strength
0,0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.052780
4,4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075
...,...,...,...,...,...,...,...,...,...,...
1025,1025,276.4,116.0,90.3,179.6,8.9,870.1,768.3,28,44.284354
1026,1026,322.2,0.0,115.6,196.0,10.4,817.9,813.4,28,31.178794
1027,1027,148.5,139.4,108.6,192.7,6.1,892.4,780.0,28,23.696601
1028,1028,159.1,186.7,0.0,175.6,11.3,989.6,788.9,28,32.768036
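
One thing to watch in the rename above: by default `DataFrame.rename` ignores mapping keys that do not match any column, so a header that differs by a single space is silently left unchanged. The sketch below is one possible guard, which normalises whitespace before looking up the short name. `SHORT_NAMES`, `normalise_header`, `build_rename_map` and `unmatched_headers` are hypothetical helpers introduced here for illustration; they are not part of pandas or MatODM.

```python
import re

# Short names keyed by the whitespace-normalised spreadsheet headers.
SHORT_NAMES = {
    "Cement (component 1)(kg in a m^3 mixture)": "Cement",
    "Blast Furnace Slag (component 2)(kg in a m^3 mixture)": "BlastFurnaceSlag",
    "Fly Ash (component 3)(kg in a m^3 mixture)": "FlyAsh",
    "Water (component 4)(kg in a m^3 mixture)": "Water",
    "Superplasticizer (component 5)(kg in a m^3 mixture)": "Superplasticizer",
    "Coarse Aggregate (component 6)(kg in a m^3 mixture)": "CoarseAgg",
    "Fine Aggregate (component 7)(kg in a m^3 mixture)": "FineAgg",
    "Age (day)": "Age",
    "Concrete compressive strength(MPa, megapascals)": "Strength",
}


def normalise_header(header):
    """Collapse runs of whitespace and strip leading/trailing spaces."""
    return re.sub(r"\s+", " ", str(header)).strip()


def build_rename_map(columns):
    """Map each original header to its short name, tolerating stray spaces."""
    rename_map = {}
    for column in columns:
        short = SHORT_NAMES.get(normalise_header(column))
        if short is not None:
            rename_map[column] = short
    return rename_map


def unmatched_headers(columns):
    """Return the headers for which no short name is defined."""
    return [c for c in columns if normalise_header(c) not in SHORT_NAMES]


# Headers with a doubled space, a trailing space, and an unrelated column.
headers = [
    "Water  (component 4)(kg in a m^3 mixture)",
    "Concrete compressive strength(MPa, megapascals) ",
    "index",
]
rename_map = build_rename_map(headers)
print(rename_map)
print(unmatched_headers(headers))
```

With a DataFrame, this could be applied as `df = df.rename(columns=build_rename_map(df.columns))`, and `unmatched_headers(df.columns)` lists any column that would keep its original name.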


To store this data in a database, we could create a very simple data model with a single document type that stores each row. However, to demonstrate the key features of MatODM, we develop a slightly more complex data model consisting of two collections, Mixes and Strengths. Each collection holds one document class: Mixes contains Mix documents and Strengths contains Strength documents. In addition, we need two user-defined physical quantities, amt_in_mix and strength, whose preferred units for storage in the database are kg m^-3 and MPa, respectively (the code below also registers a third quantity, age, in days). This data model is shown as a UML diagram below.

![alt text](figs/concrete_data_model_uml.png "Concrete data model (UML)")
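
For readers who prefer code to UML, the same structure can be sketched with plain Python dataclasses. This is only an illustration of the shape of the model and is not MatODM code: `Quantity` is a hypothetical stand-in for a physical quantity with a unit, and the field names follow the `Mix` and `Strength` signatures printed later in this tutorial.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quantity:
    """Hypothetical stand-in for a physical quantity: a value plus its unit."""
    value: float
    unit: str


@dataclass
class Mix:
    """One concrete mix: constituent amounts plus flags for optional additions."""
    name: str
    constituent_amounts: Dict[str, Quantity]
    has_fly_ash: bool
    has_superplasticizer: bool
    has_blast_furnace_slag: bool


@dataclass
class Strength:
    """One strength measurement, linked to the mix it was measured on."""
    mix: Mix
    age: Quantity
    strength: Quantity


# First row of the dataset expressed with these classes.
mix = Mix(
    name="Mix0",
    constituent_amounts={
        "cement": Quantity(540.0, "kg m^-3"),
        "water": Quantity(162.0, "kg m^-3"),
        "superplasticizer": Quantity(2.5, "kg m^-3"),
    },
    has_fly_ash=False,
    has_superplasticizer=True,
    has_blast_furnace_slag=False,
)
measurement = Strength(mix=mix, age=Quantity(28, "day"), strength=Quantity(79.986111, "MPa"))
print(measurement.mix.name, measurement.strength.value, measurement.strength.unit)
```

The point of splitting the row into two classes is that a Strength refers to a Mix rather than copying its constituents.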

The equivalent implementation for MatODM is shown below. The data model is independent of the database management system and is implemented in DataModels/UCIConcreteDataModel.py. Once it is registered, we can start inserting data into the database.

In [3]:
import sys
sys.path.append("..")
from MatODM import Documents, Fields
# register the documents defined in DataModels/UCIConcreteDataModel.py and the user-defined quantities
Documents.add_user_doc("Mix","DataModels.UCIConcreteDataModel")
Documents.add_user_doc("Strength","DataModels.UCIConcreteDataModel")
Fields.add_user_quantites("amt_in_mix", "kg m^-3")
Fields.add_user_quantites("strength", "MPa")
Fields.add_user_quantites("age","day")
# reload the modules so that the newly registered fields and documents are picked up
from importlib import reload
reload(Fields)
reload(Documents)
Fields.strength?

Init signature:
Fields.strength(
    value: Union[float, int, list, numpy.ndarray],
    unit: str,
    std_dev: Union[float, int] = None,
    experimental_technique: str = None,
    preferred_unit: str = None,
) -> None
Docstring:      UserDefinedPhysicalQty(value: Union[float, int, list, numpy.ndarray], unit: str, std_dev: Union[float, int] = None, experimental_technique: str = None, preferred_unit: 

In [4]:
Documents.Mix?

Init signature:
Documents.Mix(
    name: str,
    constituent_amounts: Dict[str, MatODM.Fields.PhysicalQty],
    has_fly_ash: bool,
    has_superplasticizer: bool,
    has_blast_furnace_slag: bool,
) -> None
Docstring:      Mix(name: str, constituent_amounts: Dict[str, MatODM.Fields.PhysicalQty], has_fly_ash: bool, has_superplasticizer: bool, has_blast_furnace_slag: bool)
File:           d:\switchdrive\codes\matodm\tutorials\datamodels\uciconcretedatamodel.py
Type:           MetaODM
Subclasses:     


In [5]:
Documents.Strength?

Init signature:
Documents.Strength(
    mix: DataModels.UCIConcreteDataModel.Mix,
    age: MatODM.Fields.PhysicalQty,
    strength: MatODM.Fields.PhysicalQty,
) -> None
Docstring:      Strength(mix: DataModels.UCIConcreteDataModel.Mix, age: MatODM.Fields.PhysicalQty, strength: MatODM.Fields.PhysicalQty)
File:           d:\switchdrive\codes\matodm\tutorials\datamodels\uciconcretedatamodel.py
Type:           MetaODM
Subclasses:     


The data model is now implemented, and user_fields.json and user_docs.json have been created. These files are read when the Documents and Fields modules of MatODM are initialized. Advanced users can edit these files directly instead of calling add_user_quantites or add_user_doc.

Let us now connect to ArangoDB and initialize the collections in the database.

In [6]:
from MatODM.Databases import ArangoDatabase
import json
with open("db_login_data.json","r") as f:
    logindata = json.load(f)
db = ArangoDatabase(logindata["dbname"], logindata["url"], logindata["user"], logindata["password"], verify_override=False)  # verify_override is set to avoid SSL certificate verification
# create the collections, or empty them if they already exist
if db.has_collection("Mixes"):
    db.delete_all_documents_from_collection("Mixes")
else:
    db.create_collection("Mixes")

if db.has_collection("Strengths"):
    db.delete_all_documents_from_collection("Strengths")
else:
    db.create_collection("Strengths")
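
The cell above reads its credentials from db_login_data.json. The only thing the tutorial fixes about that file is the four keys it accesses (`dbname`, `url`, `user`, `password`), so a minimal sketch of its layout, with placeholder values that are purely illustrative, might look as follows. `missing_login_keys` is a hypothetical helper, not part of MatODM.

```python
import json

# Keys that the connection cell reads from db_login_data.json.
REQUIRED_KEYS = ("dbname", "url", "user", "password")

# Placeholder content only; replace every value with your own settings.
EXAMPLE_LOGIN_JSON = """
{
    "dbname": "my_database",
    "url": "https://localhost:8529",
    "user": "my_user",
    "password": "my_password"
}
"""


def missing_login_keys(logindata):
    """Return the required keys that are absent from the login dictionary."""
    return [key for key in REQUIRED_KEYS if key not in logindata]


logindata = json.loads(EXAMPLE_LOGIN_JSON)
missing = missing_login_keys(logindata)
if missing:
    raise KeyError(f"db_login_data.json is missing: {missing}")
print("login data contains all required keys")
```

Checking the keys before connecting gives a clearer error than a `KeyError` raised in the middle of the `ArangoDatabase(...)` call.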

Now let's insert all the data into the database.

In [7]:
for index, row in df.iterrows():
    mix_name = f"Mix{index}"
    constituents = {}
    constituents["cement"] = Fields.amt_in_mix(row["Cement"],"kg m^-3")
    constituents["water"]  = Fields.amt_in_mix(row["Water"],"kg m^-3")
    constituents["coarse_agg"] = Fields.amt_in_mix(row["CoarseAgg"],"kg m^-3")
    constituents["fine_agg"] = Fields.amt_in_mix(row["FineAgg"],"kg m^-3")
    if row["BlastFurnaceSlag"] > 0 :
        constituents["blast_furnace_slag"] = Fields.amt_in_mix(row["BlastFurnaceSlag"],"kg m^-3")
    if row["FlyAsh"] > 0: 
        constituents["fly_ash"] = Fields.amt_in_mix(row["FlyAsh"],"kg m^-3")
    if row["Superplasticizer"] > 0:
        constituents["superplasticizer"] = Fields.amt_in_mix(row["Superplasticizer"],"kg m^-3")
    mix = Documents.Mix(name= mix_name, constituent_amounts=constituents,has_fly_ash=bool(row["FlyAsh"] > 0), has_superplasticizer=bool(row["Superplasticizer"] > 0),has_blast_furnace_slag = bool(row["BlastFurnaceSlag"]>0))
    mix = db.insert(mix)
    strength = Documents.Strength(mix=mix,age=Fields.age(row["Age"], "day"),strength=Fields.strength(row["Strength"],"MPa"))
    strength = db.insert(strength)
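
The per-row logic of the loop above (always store the four base constituents, store the three optional ones only when their amount is positive, and derive the `has_*` flags from the same test) can be isolated into small functions that are easy to check without a database. The sketch below uses plain `(value, unit)` tuples in place of `Fields.amt_in_mix`, and `build_constituents` and `presence_flags` are hypothetical helpers introduced here for illustration.

```python
UNIT = "kg m^-3"

# Column name -> constituent key; the required ones are always stored.
REQUIRED = {
    "Cement": "cement",
    "Water": "water",
    "CoarseAgg": "coarse_agg",
    "FineAgg": "fine_agg",
}
# Optional constituents are stored only when their amount is positive.
OPTIONAL = {
    "BlastFurnaceSlag": "blast_furnace_slag",
    "FlyAsh": "fly_ash",
    "Superplasticizer": "superplasticizer",
}


def build_constituents(row):
    """Return {constituent: (value, unit)} for one row, skipping absent optionals."""
    constituents = {key: (float(row[col]), UNIT) for col, key in REQUIRED.items()}
    for col, key in OPTIONAL.items():
        if row[col] > 0:
            constituents[key] = (float(row[col]), UNIT)
    return constituents


def presence_flags(row):
    """Return the has_* booleans for the optional constituents."""
    return {f"has_{key}": bool(row[col] > 0) for col, key in OPTIONAL.items()}


# First row of the dataset as a plain dictionary.
row = {
    "Cement": 540.0, "BlastFurnaceSlag": 0.0, "FlyAsh": 0.0, "Water": 162.0,
    "Superplasticizer": 2.5, "CoarseAgg": 1040.0, "FineAgg": 676.0,
}
constituents = build_constituents(row)
flags = presence_flags(row)
print(sorted(constituents))
print(flags)
```

Because the flag names match the keyword arguments of `Mix`, the result of `presence_flags(row)` could be unpacked into the constructor with `**flags`.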

To check that the data was inserted correctly into both collections, we can look at the total number of documents in each one and make sure that each collection holds 1030 documents.

In [8]:
print(f'Documents in collection Mixes: {db.collections["Mixes"].ndocs}')
print(f'Documents in collection Strengths: {db.collections["Strengths"].ndocs}')

Documents in collection Mixes: 1030
Documents in collection Strengths: 1030
