## Ediphi Data API Tutorial


To get started, you will need your `X_API_KEY` and `DATABASE_NO` in your `.env` file; check out the readme for more information. This also assumes you have installed all of the dependencies in requirements.txt, such as pandas, requests, and python-dotenv (which provides `load_dotenv`).

Data can be extracted from ediphi using endpoints under `data.ediphi.com/api`. These are authenticated with an API key and are POST calls, often with URL-encoded payloads.

At the end of this document, we'll provide a selection of endpoints for you to explore.

Let's begin. First, let's grab our API key and database number, and set up a method to query the database.


In [None]:
import os
import json
import requests
import pandas as pd
from dotenv import load_dotenv

load_dotenv()


def query_runner(query: str, df: bool = False):
    url = "https://data.ediphi.com/api/dataset/json"
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "X-API-KEY": os.getenv("X_API_KEY"),
    }
    data = {
        "query": json.dumps(
            {
                "database": int(os.getenv("DATABASE_NO")),
                "type": "native",
                "native": {"query": query},
            }
        )
    }
    response = requests.post(url, headers=headers, data=data)
    if response.status_code != 200:
        error_msg = {
            "response": response.text,
            "data": data,
        }  # this error triggers if there's a problem with your API key
        raise requests.exceptions.HTTPError(error_msg)
    else:
        result = json.loads(response.content)
        # A response carrying an "error" key signals a problem with your
        # sql query; in that case the error is returned instead of rows.
        if isinstance(result, dict) and "error" in result:
            return result["error"]
        if df:
            return pd.DataFrame(result)
        return result

Now let's look at the payload that we pass to this endpoint. Pro tip: it's just a SQL query. So we'll make a basic query and run it:

In [None]:
query = "select * from regions limit 1"
query_runner(query)
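
To make the shape of the request body concrete, here is a minimal sketch that builds the same form field `query_runner` sends, without making a network call. `build_payload` is a hypothetical helper introduced only for illustration, and the database number `1` is a placeholder.

```python
import json
from urllib.parse import urlencode


def build_payload(query: str, database_no: int) -> dict:
    """Build the form fields that query_runner posts (hypothetical helper)."""
    return {
        "query": json.dumps(
            {
                "database": database_no,
                "type": "native",
                "native": {"query": query},
            }
        )
    }


# 1 is a placeholder database number, not a real one.
payload = build_payload("select * from regions limit 1", 1)

# requests url-encodes a dict passed as `data` on its own; the encoding is
# done by hand here only to show what the body looks like on the wire.
encoded_body = urlencode(payload)
print(encoded_body)
```

The SQL text travels as a JSON string inside a single form field named `query`, which is why the `Content-Type` header is `application/x-www-form-urlencoded` rather than `application/json`.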

An important note: when running queries that return large result sets, it is necessary to break the query into chunks and iterate. A large query can run long enough that changes made in the primary database overtake it, which would cause the query to fail. We provide a helper class in `utils.ediphi` to show how this can be done, and we demonstrate its use below.

Now, here's a method that returns a data dictionary of your database, including columns and datatypes, as well as each table's relationships to other tables. You can optionally get the relationships for a specific table by passing in its name, and there's also an option to return the result as a pandas DataFrame:

In [None]:
def get_data_dict(table_name: str = None, df: bool = False):
    with open("data_dictionary.sql", "r") as dd:
        query = dd.read()
    if table_name:
        query += f" where c.table_name = '{table_name}' or cf.references_table = '{table_name}'"
        query += " order by c.table_name, cf.references_table"
    return query_runner(query, df)

We'll run it for the `estimates` table, and we'll set it to return the result as a dataframe:

In [None]:
df = get_data_dict("estimates", df=True)
display(df.describe())
display(df)

Suppose we want to join two tables, and we need to know the foreign key to primary key relationship. We could use the result of the previous query like so:

In [None]:
fk_mask = df["table_name"] == "line_items"
pk_mask = (df["table_name"] == "estimates") & df["is_pk"]
fk = df.loc[fk_mask, "column_name"].values[0]
pk = df.loc[pk_mask, "column_name"].values[0]
print("the foreign key for line_items connection to estimates is:", fk)
print("the primary key it connects to within the estimates table is:", pk)

query = (
    "select e.name as estimate_name, e.phase, l.name as item_name, l.quantity, l.uom, l.total_uc "
    + "from estimates e "
    + "join line_items l "
    + f"on l.{fk} = e.{pk} "
    + "where e.id = (select id from estimates order by created_at offset 5 limit 1)"
)  # this line is just limiting the result to one estimate

df2 = query_runner(query, df=True)
display(df2.head())
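
The key lookup above takes the first matching row with `.values[0]`, which raises an `IndexError` if nothing matches. As a hedged alternative, the sketch below does the same lookup on plain row dictionaries and fails with a clearer message. It assumes the rows carry the `table_name`, `column_name`, and `is_pk` fields used above and have already been filtered to the parent table's relationships, as `get_data_dict("estimates")` does. `find_join_keys` and the sample rows (including the `estimate` column name) are made up for illustration.

```python
def find_join_keys(rows, child_table, parent_table):
    """Return (fk, pk) column names, or raise if either one is missing."""
    fk = next(
        (r["column_name"] for r in rows if r["table_name"] == child_table),
        None,
    )
    pk = next(
        (
            r["column_name"]
            for r in rows
            if r["table_name"] == parent_table and r["is_pk"]
        ),
        None,
    )
    if fk is None or pk is None:
        raise LookupError(
            f"no key relationship found between {child_table} and {parent_table}"
        )
    return fk, pk


# Made-up rows for illustration; real rows would come from the data dictionary.
sample_rows = [
    {"table_name": "estimates", "column_name": "id", "is_pk": True},
    {"table_name": "estimates", "column_name": "name", "is_pk": False},
    {"table_name": "line_items", "column_name": "estimate", "is_pk": False},
]

fk, pk = find_join_keys(sample_rows, "line_items", "estimates")
print(f"join on line_items.{fk} = estimates.{pk}")
```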

Another common need is retrieving sort fields and sort codes. Sort fields are like additional columns for line items, so that the items can be cross-coded. Each sort field then has sort codes that can be applied to the line items. The payload below gets a sort field by name, along with all of its sort codes.

In [None]:
properties = "Bid Package"
query = f"select * from sort_codes where sort_field = (select id from sort_fields where name = '{properties}')"
data = query_runner(query)
data
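
Because the sort field name is interpolated directly into the SQL string, a name containing a single quote would break the query. The sketch below shows one way to guard against that by doubling single quotes, the standard SQL escape inside a string literal. `sql_literal` and `sort_codes_query` are hypothetical helpers, and "Owner's Codes" is a made-up sort field name.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sort_codes_query(sort_field_name: str) -> str:
    """Build the sort codes query for one sort field name."""
    return (
        "select * from sort_codes where sort_field = "
        f"(select id from sort_fields where name = {sql_literal(sort_field_name)})"
    )


# "Owner's Codes" is a made-up name chosen because it contains a quote.
print(sort_codes_query("Owner's Codes"))
```

For a name without quotes, such as "Bid Package", this produces the same query string as the cell above.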

Let's take a look at our helper classes. There are two: `Database` and `Table`. We'll start by instantiating a `Database` object and then inspect some of its properties.


In [None]:
import utils.ediphi as ediphi

db = ediphi.Database()
print(db.tenant_name)
print("\n\nfirst few tables:")
display(
    dict(list(db.tables.items())[:5])
)  # first 5 items of the dictionary of all tables in the database

Now we'll look at a `Table` object and some of its properties.

In [None]:
estimates = ediphi.Table("estimates")
print(f"table id: {estimates.table_id}\n\n")
print("first few columns:")
display(dict(list(estimates.columns.items())[:5]))

Earlier in this tutorial, we mentioned a demonstration of how to properly iterate through a large result set. The example we provide is the `get_table` method of the `Database` class. It uses the `query` method of the `Database` class to fetch chunks of a table, ordered by the table's primary key. Each chunk looks up the highest pk in the previous chunk and fetches the rows where pk is greater. The default chunk size is 1000 rows, but this can be adjusted. Here's an example:

In [None]:
products = db.get_table("products", df=True)
products.info()  # info() prints its summary and returns None, so display() is not needed
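
The chunking strategy described above can be sketched independently of the helper classes. This is not the `get_table` implementation, only an illustration of the same idea under stated assumptions: `fetch_in_chunks` is a hypothetical function, an in-memory SQLite table stands in for the ediphi database, and the primary key is assumed to be numeric (string keys would need quoting).

```python
import sqlite3


def fetch_in_chunks(run_query, table, pk, chunk_size=1000):
    """Yield rows of `table` in primary-key order, one chunk at a time."""
    last_pk = None
    while True:
        # After the first chunk, only ask for rows past the highest pk seen.
        where = "" if last_pk is None else f"where {pk} > {int(last_pk)} "
        rows = run_query(
            f"select * from {table} {where}order by {pk} limit {chunk_size}"
        )
        if not rows:
            break
        yield from rows
        last_pk = rows[-1][pk]
        if len(rows) < chunk_size:
            break  # a short chunk means the table is exhausted


# Stand-in database: 25 made-up rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("create table products (id integer primary key, name text)")
conn.executemany(
    "insert into products (id, name) values (?, ?)",
    [(i, f"product {i}") for i in range(1, 26)],
)


def run_query(sql):
    """Stand-in for a query function that returns a list of row dicts."""
    return [dict(row) for row in conn.execute(sql).fetchall()]


all_rows = list(fetch_in_chunks(run_query, "products", "id", chunk_size=10))
print(len(all_rows))
```

Filtering on the primary key instead of using a growing `offset` keeps each query short, and rows inserted or deleted between chunks do not shift the position of the next chunk.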

Other useful methods provided by the `Database` class include `data_dictionary` and `query`. More are coming soon. 

This should give you a basic understanding of the data pipeline. If you have any questions, reach out to one of the leaders of our data engineering team:

Swan Sodja, Senior Data Engineer 
swan@ediphi.com

Colby Ajoku, Director of Partnerships & Integrations
colby@ediphi.com

Mike Navarro, CTO
michael@ediphi.com