# Demo 3: Use Information from DaQSS

This third demonstration uses two use cases to show how DQ results and their metadata stored in DaQSS can be utilized.

1. Import required Python packages:


In [6]:
import os

os.environ["TQDM_DISABLE"] = "1"  # Disable all progress bars, to retain a clear output

from cobadq import *  # import the Python package for the constraint-based aggregation
from daqss import *  # for accessing the database part of DaQSS
from sqlalchemy import text  # Used to directly query the database

import pandas  # for the representation of data and the computation of DQ measurement results

In [7]:
d = DaQSS()  # Create a DaQSS object

## Analysis of DQ Results

In this use case, the following question is answered:

Which records of the fake customer data have a DQ measurement result value less than 0.5 for the metric "arith_mean_completeness_per_row" (cf. [the first demonstration](../demo1))?

By directly using an [SQLAlchemy Connection](https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Connection), arbitrary queries can be executed on the database of DaQSS (cf. [the documentation of the schema](https://johannes.schrott.onl/daqss/database_docs)).
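
To make the shape of such a query concrete, the following sketch runs a comparable query on a toy database. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` instead of SQLAlchemy, the two tables contain only the columns referenced in this demonstration with simplified types, and the identifier `https://example.org/toy.csv` as well as the metric `some_other_metric` are hypothetical. The actual DaQSS schema is described in the linked documentation.

```python
import sqlite3

# Toy stand-in for the DaQSS database (assumption: reduced to the columns
# used in this demonstration, with simplified column types).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_element (
        data_element_global_identifier TEXT PRIMARY KEY,
        data_element_local_identifier TEXT,
        parent_data_element_global_identifier TEXT
    );
    CREATE TABLE dq_result (
        computed_on_data_element_global_id TEXT,
        calculated_by_dq_metric TEXT,
        result_value REAL
    );
""")

parent = "https://example.org/toy.csv"  # hypothetical data set identifier
conn.executemany(
    "INSERT INTO data_element VALUES (?, ?, ?)",
    [(f"{parent}#{i}", str(i), parent) for i in (1, 2, 3)],
)
conn.executemany(
    "INSERT INTO dq_result VALUES (?, ?, ?)",
    [
        (f"{parent}#1", "arith_mean_completeness_per_row", 0.25),
        (f"{parent}#2", "arith_mean_completeness_per_row", 0.75),
        (f"{parent}#3", "arith_mean_completeness_per_row", 0.40),
        (f"{parent}#3", "some_other_metric", 0.10),  # hypothetical second metric
    ],
)

# Explicit JOIN, named parameters, and sorting/limiting done by the database.
query = """
    SELECT de.data_element_local_identifier, r.result_value
    FROM dq_result AS r
    JOIN data_element AS de
      ON r.computed_on_data_element_global_id = de.data_element_global_identifier
    WHERE r.calculated_by_dq_metric = :metric
      AND de.parent_data_element_global_identifier = :id
      AND r.result_value < :threshold
    ORDER BY r.result_value
    LIMIT 5
"""
rows = conn.execute(
    query,
    {"metric": "arith_mean_completeness_per_row", "id": parent, "threshold": 0.5},
).fetchall()
print(rows)
conn.close()
```

Passing the metric name, the parent identifier, and the threshold as named parameters keeps the values out of the SQL string. Letting the database sort and limit the rows is one possible alternative to sorting the full result list in Python.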

In [3]:
fake_customer_global_identifier: str = "https://johannes.schrott.onl/fake_customer_data/fake_customer_data.csv"
metric_name = "arith_mean_completeness_per_row"

with d.connect() as connection:
    result: list = list(connection.execute(
        text("SELECT data_element_local_identifier, result_value FROM data_element, dq_result "
             "WHERE dq_result.calculated_by_dq_metric = :metric "
             "AND dq_result.computed_on_data_element_global_id = data_element.data_element_global_identifier "
             "AND data_element.parent_data_element_global_identifier = :id "
             "AND dq_result.result_value < 0.5"),
        {"metric": metric_name, "id": fake_customer_global_identifier},
    ))
    result.sort(key=lambda row: row[1])
    print(result[:5])  # For demonstration purposes, print the 5 smallest metric results

[('25221.0', Decimal('0.2857142857142857')), ('78828.0', Decimal('0.2857142857142857')), ('73946.0', Decimal('0.2857142857142857')), ('39076.0', Decimal('0.2857142857142857')), ('76087.0', Decimal('0.2857142857142857'))]
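
The rows are returned as tuples of a local identifier (a string) and a `Decimal` result value. For further analysis it may be convenient to convert them to plain Python numbers first. The following is a minimal sketch using three of the rows printed above. Reading the local identifiers as row numbers stored as float-like strings is an assumption and not something the output confirms.

```python
from collections import Counter
from decimal import Decimal

# Rows copied from the output above: (local identifier, result value).
rows = [
    ("25221.0", Decimal("0.2857142857142857")),
    ("78828.0", Decimal("0.2857142857142857")),
    ("73946.0", Decimal("0.2857142857142857")),
]

# Assumption: local identifiers are row numbers stored as float-like strings.
cleaned = [(int(float(local_id)), float(value)) for local_id, value in rows]

# Count how many records share each result value.
value_counts = Counter(value for _, value in cleaned)

print(cleaned[0])
print(value_counts)
```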


## Reuse of Definitions

This use case shows, by means of a small toy example, how DQ metrics can be reused:

In [8]:
metric_name = "arith_mean_completeness_per_row"

metric = d.retrieve_dq_metric_implementation_by_name(metric_name)

demo_dataframe = pandas.DataFrame({"index": [1], "col1": [None], "col2": [1]})
demo_dataframe = demo_dataframe.set_index("index")
demo_dataframe.head()

| index | col1 | col2 |
|-------|------|------|
| 1     | None | 1    |


In [9]:
res = demo_dataframe.apply(metric, axis=1)
res.head()

index
1    0.5
dtype: float64
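
The implementation retrieved from DaQSS is not shown in this demonstration. As a rough illustration of what a row-wise completeness metric could compute, the sketch below assumes that the metric is the share of non-missing values in a row. The functions `is_missing` and `mean_completeness` are hypothetical stand-ins and not part of cobadq or DaQSS.

```python
import math
from typing import Any, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_completeness(row: Sequence[Any]) -> float:
    """Hypothetical stand-in: share of non-missing values in a row."""
    if not row:
        raise ValueError("row must contain at least one value")
    return sum(0 if is_missing(v) else 1 for v in row) / len(row)


# The toy row from the demonstration: col1 is missing, col2 is present.
print(mean_completeness([None, 1]))
```

Under this assumption, the toy row yields 0.5, which matches the result above. A row with two of seven values present would yield 2/7, about 0.2857, which would be consistent with the smallest values found in the first use case.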