# DSI Diana Tutorial and Getting Started

The goal of the Data Science Infrastructure Project ([DSI](https://github.com/lanl/dsi)) is to provide a flexible, AI-ready metadata query capability which returns data subject to strict, POSIX-enforced file security. In this tutorial, you will learn how to:
 - initialize a DSI instance
 - load Tier 1 metadata into DSI
 - check the metadata loaded
 - query the metadata
 - load Tier 2 and Tier 3 metadata into DSI
 - apply a complex schema
 - use DSI Sync to index and move data and metadata

This tutorial uses data from the [Cloverleaf3D](https://github.com/UK-MAC/CloverLeaf3D) Lagrangian-Eulerian hydrodynamics solver. Data is provided in dsi/examples/clover3d/. Prior to running the tutorial, follow the instructions in the [Quick Start: Installation](https://lanl.github.io/dsi/installation.html) to set up DSI.



In [None]:
from dsi.dsi import DSI

In [None]:
# Create instance of DSI
baseline = DSI()

# Reading Metadata into DSI

For this tutorial, we will use CloverLeaf3D data available in our repository.

* To pull the repository, you will need to git clone https://github.com/lanl/dsi.git
* To access the data, go to examples/clover3d

The data is from [Cloverleaf3D](https://github.com/UK-MAC/CloverLeaf3D), a Lagrangian-Eulerian hydrodynamics solver.

The data is an **ensemble** of 8 runs, and has 5 metadata products of interest:

* genesis_datacard.xlsx - data card
* clover.in - input deck
* clover.out - simulation statistics
* timestamps.txt - time when each simulation was launched on Slurm
* viz files - in situ outputs in VTK format


In [None]:
from IPython.display import HTML

HTML("""
<video width="256" height="208" controls loop>
  <source src="clover3d/movie.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")


To begin the ingest:

In [None]:
# No backend is specified, so the target backend defaults to SQLite
store = DSI("dsi-diana-tutorial.db")

# Read in Datacard (Tier 1)
store.read("clover3d/genesis_datacard.xlsx", 'GenesisDatacard')

# Read in Tier 2 and Tier 3 metadata
# dsi.read(path, reader)
store.read("./clover3d/", 'Cloverleaf')

# Exploring the loaded metadata

In [None]:
# How many tables do we have
store.num_tables()

In [None]:
# Let's see what tables were created
store.list()

In [None]:
# Let's get more details about the data
store.summary()

In [None]:
# Preview the contents of the visualization files
store.display("viz_files")

# DSI Find to search within the metadata

DSI's find capability lets you explore your data with queries that use the following modifiers: >, <, >=, <=, =, ==, ~ (contains), ~~ (contains), !=, and (X, Y) for a range between values X and Y. Additionally, passing True as an extra input returns the result as a collection.

In [None]:
# Search string or value within all tables
store.find("wall_clock > 0.10")

In [None]:
# Perform a find and receive a collection
find_list = store.find("state2_density==8.0", True) # Use True to return a collection

In [None]:
# Simply display what this collection (pandas dataframe) looks like
find_list

In [None]:
find_list = store.find("time>3.0", True)

In [None]:
find_list

In [None]:
find_list = store.find("time(1.0,1.1)", True)

In [None]:
find_list
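To make the comparison modifiers concrete, here is a minimal sketch in plain Python that applies a few of them to a small in-memory list of rows. It does not call DSI and is not how find is implemented: the rows, their values, and the helpers `select` and `select_range` are hypothetical, and the range form is assumed to include both endpoints.

```python
import operator

# Hypothetical rows standing in for a metadata table; all values are made up.
rows = [
    {"sim_id": 1, "time": 0.9, "state2_density": 8.0},
    {"sim_id": 2, "time": 1.05, "state2_density": 2.0},
    {"sim_id": 3, "time": 3.2, "state2_density": 8.0},
]

# Map a subset of the modifiers listed above to Python comparisons.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def select(rows, column, modifier, value):
    """Return the rows whose `column` satisfies `modifier value`."""
    compare = COMPARATORS[modifier]
    return [row for row in rows if compare(row[column], value)]


def select_range(rows, column, low, high):
    """Return rows with low <= column <= high (inclusive bounds are an assumption)."""
    return [row for row in rows if low <= row[column] <= high]


late = select(rows, "time", ">", 3.0)              # analogous to "time>3.0"
dense = select(rows, "state2_density", "==", 8.0)  # analogous to "state2_density==8.0"
window = select_range(rows, "time", 1.0, 1.1)      # analogous to "time(1.0,1.1)"

print([row["sim_id"] for row in late])
print([row["sim_id"] for row in dense])
print([row["sim_id"] for row in window])
```

Each helper returns the full matching rows, which loosely mirrors receiving a collection rather than just a printed result.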

# Query DSI

DSI supports direct SQL queries against the metadata that you have ingested.

In [None]:
# Use sql statement to directly query the backend store
store.query("SELECT sim_id, xmin, ymin, xmax, ymax, state2_density FROM input") # Add True to get the result back as a collection

In [None]:
store.list()

In [None]:
# alternative to "query()" if you want to get a whole table
store.get_table("genesis_datacard", True) # Adding 'True' gives a collection
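Since the default backend is SQLite, the shape of such a query can be tried out with Python's built-in `sqlite3` module. The sketch below builds a throwaway in-memory `input` table with invented values (only the column names come from the query above) and runs a filtered version of the same SELECT. It does not touch the tutorial database.

```python
import sqlite3

# In-memory database, so nothing on disk is modified.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE input ("
    "sim_id INTEGER, xmin REAL, ymin REAL, xmax REAL, ymax REAL, "
    "state2_density REAL)"
)

# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO input VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 0.0, 0.0, 10.0, 10.0, 8.0),
        (2, 0.0, 0.0, 10.0, 10.0, 2.0),
        (3, 0.0, 0.0, 5.0, 5.0, 8.0),
    ],
)

# Same column list as the tutorial query, plus a parameterized filter.
cursor = conn.execute(
    "SELECT sim_id, xmin, ymin, xmax, ymax, state2_density "
    "FROM input WHERE state2_density = ? ORDER BY sim_id",
    (8.0,),
)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
for row in rows:
    print(row)
```

Passing the filter value as a parameter rather than formatting it into the string keeps the statement safe to reuse with user-supplied values.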

# DSI Write - Complex Schemas

By formatting your metadata and putting it into DSI, you have essentially created a schema. DSI also supports complex schemas, which are expressed by defining relations. For example, to relate the different tables to one another, you can use the schema reader, which takes a .json file.

* schema.json

Before defining and ingesting a complex schema, what does an Entity Relationship Diagram look like in our initial schema?

* To run this portion of the example, the graphviz package is required

pip install graphviz

(optional) brew install graphviz

In [None]:
store.write("clover_er_diagram_no_schema.png", "ER_Diagram")

from IPython.display import Image
Image(filename="clover_er_diagram_no_schema.png", width=200)

In [None]:
# Create a new database where we will relate a complex schema
schema_store = DSI("diana_schema_tutorial.db")

# dsi.schema(filename)
schema_store.schema("./clover3d/schema2.json") # Schema needs to be defined before reading Cloverleaf data

# Read in Tier 2 and Tier 3 metadata
# dsi.read(path, reader)
schema_store.read("./clover3d/", 'Cloverleaf')

# Read in Datacard (Tier 1)
schema_store.read("clover3d/genesis_datacard.xlsx", 'GenesisDatacard')

# dsi.write(filename, writer)
schema_store.write("clover_er_diagram.png", "ER_Diagram")

To preview the Entity Relationship Diagram (ER Diagram), import the library used to display images:

In [None]:
from IPython.display import Image
Image(filename="clover_er_diagram.png", width=300)
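Relations of this kind come down to primary and foreign keys. As a rough illustration in plain SQLite (this is not DSI's schema file format, which is not reproduced here), the sketch below links two tables through `sim_id` and then joins them. The `output` table, the choice of `sim_id` as the key, and all values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: one row per simulation (hypothetical layout).
conn.execute("CREATE TABLE input (sim_id INTEGER PRIMARY KEY, state2_density REAL)")
# Child table: each row must point at an existing simulation.
conn.execute(
    "CREATE TABLE output ("
    "sim_id INTEGER REFERENCES input(sim_id), wall_clock REAL)"
)

conn.executemany("INSERT INTO input VALUES (?, ?)", [(1, 8.0), (2, 2.0)])
conn.executemany("INSERT INTO output VALUES (?, ?)", [(1, 0.12), (2, 0.08)])

# A row that references a non-existent sim_id violates the relation.
try:
    conn.execute("INSERT INTO output VALUES (?, ?)", (99, 0.5))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# With the relation in place, a join combines columns from both tables.
joined = conn.execute(
    "SELECT input.sim_id, input.state2_density, output.wall_clock "
    "FROM input JOIN output ON input.sim_id = output.sim_id "
    "ORDER BY input.sim_id"
).fetchall()
conn.close()

print(orphan_rejected)
print(joined)
```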

# DSI Write - CSV

DSI supports writing (exporting) metadata so that you can use it in another project. For example, here we export the table "input" to a CSV file.

In [None]:
store.write("input.csv", "CSV", "input")
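An exported CSV can then be read by any tool that understands the format. As a minimal sketch using only the standard library, the helper below previews the header and first few rows of such a file. The name `preview_csv` is hypothetical, and the sample file written here is a stand-in with made-up values so that the snippet runs on its own; point the helper at input.csv to inspect the real export.

```python
import csv
import os
import tempfile


def preview_csv(path, limit=5):
    """Return (header, first `limit` data rows) of a CSV file as lists of strings."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # zip stops as soon as `limit` rows have been read.
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


# Stand-in file with invented values; not the real DSI export.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_input.csv")
with open(sample_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["sim_id", "state2_density"])
    writer.writerows([[1, 8.0], [2, 2.0], [3, 8.0]])

header, rows = preview_csv(sample_path, limit=2)
print(header)
print(rows)
```

Note that the `csv` module returns every field as a string, so numeric columns need an explicit conversion before any arithmetic.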

# Ending your workflow

In [None]:
store.close()
schema_store.close()

# Reloading your workflow

In [None]:
# No backend is specified, so the target backend defaults to SQLite
store = DSI("dsi-diana-tutorial.db")
store.summary()

# Moving your data and metadata with DSI

In [None]:
from dsi.core import Sync

In [None]:
# Origin
local_files = "./clover3d/"
# Remote (assumes a MacBook; otherwise change to another location)
remote_path = "/Users/Shared/staging/"

In [None]:
# Create Sync type with project name
s = Sync("dsi-diana-tutorial")

In [None]:
s.index(local_files, remote_path, True) # The paths are user-defined for now

In [None]:
store.summary()

In [None]:
s.copy("copy",True)
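Conceptually, indexing means recording file-system facts about each file, and copying means reproducing the directory tree at the destination. The sketch below shows that idea with the standard library only. It is not DSI Sync's implementation: `index_files`, the fields it records, and the temporary directories are all hypothetical.

```python
import os
import shutil
import tempfile


def index_files(root):
    """Walk `root` and record relative path, size, and mtime of each file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            records.append({
                "path": os.path.relpath(full_path, root),
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
            })
    # Sort so the result does not depend on directory traversal order.
    return sorted(records, key=lambda record: record["path"])


# Temporary stand-ins for the origin and staging locations.
origin = tempfile.mkdtemp()
staging = os.path.join(tempfile.mkdtemp(), "staging")
with open(os.path.join(origin, "clover.in"), "w") as handle:
    handle.write("placeholder input deck\n")

index = index_files(origin)
shutil.copytree(origin, staging)  # the destination must not exist yet
copied = index_files(staging)

print([record["path"] for record in index])
print([record["path"] for record in copied])
```

Comparing the two indexes, as the final lines do, is a simple way to confirm that a copy reproduced every file.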