# Preprocess and Upload Data for Conquery

This tutorial shows how to preprocess the CSV files downloaded in the tutorials [Age and Gender](./age_gender.ipynb) or [ICD9](./icd9.ipynb) into the CQPP format (**c**on**q**uery **p**re**p**rocessed). The preprocessed files, together with the Table-JSONs and Concept-JSONs, are then uploaded to the backend using the REST API.


In [None]:
## The imports for this notebook
import io
import requests as r
import os
import json
from pathlib import Path

# Define working directory
wd = Path(".")

## Preprocessing

This step assumes that:
- the backend has already been built (using the script `conquery/scripts/build_backend_no_version.sh`) and the path after the `-jar` option points to the resulting JAR file
- the Import-JSONs are in the folder specified by the `--desc` option
- the CSV files are at the locations given by `sourceFile` in the Import-JSONs (these paths are relative; the `--in` option specifies the directory they are resolved against)
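
Before running the preprocessor, it can help to confirm the third assumption. The following is a minimal sketch, not part of Conquery: `find_source_files` and `missing_source_files` are hypothetical helpers, and since the exact layout of an Import-JSON is not described here, the code simply searches each JSON document recursively for `sourceFile` keys. The `*.json` file pattern is also an assumption.

```python
import json
from pathlib import Path


def find_source_files(node):
    """Recursively collect every string stored under a "sourceFile" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sourceFile" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_source_files(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_source_files(item))
    return found


def missing_source_files(desc_dir, in_dir, pattern="*.json"):
    """Return (import file name, sourceFile) pairs whose CSV does not exist.

    desc_dir corresponds to --desc, in_dir to --in; the pattern is an assumption.
    """
    missing = []
    for import_json in sorted(Path(desc_dir).glob(pattern)):
        with open(import_json, encoding="utf-8") as fh:
            description = json.load(fh)
        for relative_path in find_source_files(description):
            if not (Path(in_dir) / relative_path).is_file():
                missing.append((import_json.name, relative_path))
    return missing
```

With the folder layout of this notebook, `missing_source_files(wd / "data" / "imports", wd)` would be expected to return an empty list when all CSV files are in place.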

The command below converts the CSVs to CQPP files. The `--out` option specifies the folder where the files are written to.

In [None]:
(wd / "data" / "cqpp").mkdir(exist_ok=True, parents=True)
!java -jar ../../executable/target/executable-0.0.0-SNAPSHOT.jar preprocess --desc ./data/imports --in . --out ./data/cqpp
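
Outside a notebook, where the `!` shell escape is not available, the same command can be started with `subprocess`. This is a sketch under assumptions: `build_preprocess_command` and `run_preprocess` are hypothetical helper names, only the options shown above are used, and it is assumed that the JAR reports failure through a non-zero exit code.

```python
import subprocess
from pathlib import Path


def build_preprocess_command(jar, desc, in_dir, out_dir):
    """Assemble the argument list for the preprocess command shown above."""
    return [
        "java", "-jar", str(jar), "preprocess",
        "--desc", str(desc),
        "--in", str(in_dir),
        "--out", str(out_dir),
    ]


def run_preprocess(jar, desc, in_dir, out_dir):
    """Run the preprocessor and return the CQPP files found in out_dir."""
    Path(out_dir).mkdir(exist_ok=True, parents=True)
    # check=True raises CalledProcessError on a non-zero exit code (assumed failure signal)
    subprocess.run(build_preprocess_command(jar, desc, in_dir, out_dir), check=True)
    return sorted(Path(out_dir).glob("*.cqpp"))
```

For this notebook the call would be `run_preprocess("../../executable/target/executable-0.0.0-SNAPSHOT.jar", "./data/imports", ".", "./data/cqpp")`.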

## Import/Upload

This step assumes that the backend is running (e.g. started with the script `conquery/scripts/run_e2e_backend.sh`) and that the admin endpoint is reachable at the URL used in the requests below (`http://localhost:8081/admin`).
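
A quick way to test this assumption is to check whether anything is listening on the admin port. The sketch below uses only the standard library; `admin_port_open` is a hypothetical helper, and an open port only shows that some process accepts connections there, not that it is the Conquery backend.

```python
import socket


def admin_port_open(host="localhost", port=8081, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names
        return False
```

If `admin_port_open()` returns `False`, start the backend before running the cells below.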

### Meta Data Import

In [None]:
import time

def process_respose(res):
    # Translate the HTTP response into a short status message
    if res.ok:
        return 'done'
    elif res.status_code == 409:
        return 'already uploaded.'
    elif res.status_code == 401:
        return 'authentication failed'
    elif res.status_code == 403:
        return 'forbidden'
    else:
        return f'unknown error:\n{res.status_code}\n{res.text}'

# Create Dataset
datasetId = "mimic-iii-demo"
print(f'Uploading {datasetId} ... ', end='')
res = r.post("http://localhost:8081/admin/datasets", json={"name":datasetId, "label": "MIMIC-III Demo"}, headers={"content-type":"application/json"})
print(process_respose(res))
print('---')

time.sleep(2)

# Upload Table-JSONs
for filename in (wd / "data" / "tables").glob("*.table.json"):
    print(f'Uploading {filename} ... ', end='')
    with open(filename, 'rb') as file:
        table = json.load(file)
        res = r.post(f"http://localhost:8081/admin/datasets/{datasetId}/tables", json=table, headers={"content-type":"application/json"})
        print(process_respose(res))
print('---')

time.sleep(2)

# Upload Concept-JSONs
for filename in (wd / "data" / "concepts").glob("*.concept.json"):
    print(f'Uploading {filename} ... ', end='')
    with open(filename, 'rb') as file:
        concept = json.load(file)
        res = r.post(f"http://localhost:8081/admin/datasets/{datasetId}/concepts", json=concept, headers={"content-type":"application/json"})
        print(process_respose(res))

### Data Import

In [None]:

for filename in (wd / "data" / "cqpp").glob("*.cqpp"):
    with open(filename, "rb") as file:
        print(f'Uploading {filename} ... ', end='')
        res = r.post(f"http://localhost:8081/admin/datasets/{datasetId}/cqpp", data=file, headers={"content-type":"application/octet-stream"})
        print(process_respose(res))

### Update Matching Stats

This action collects statistics that are displayed in the frontend when hovering over concepts. 

In [None]:
print('Updating matching stats ... ', end='')
res = r.post(f"http://localhost:8081/admin/datasets/{datasetId}/update-matching-stats", headers={"content-type":"application/json"})
print(process_respose(res))
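
All requests in this notebook share the same admin base URL, so it can be convenient to build the endpoint paths in one place, for example when the backend runs on a different host or port. The following is a small sketch; `admin_url` is a hypothetical helper, and the paths it is used with are the ones that appear in the cells above.

```python
def admin_url(*parts, base="http://localhost:8081/admin"):
    """Join the admin base URL and the given path segments with slashes."""
    return "/".join([base.rstrip("/"), *parts])


# Endpoints used in this notebook, for the dataset created above
dataset_id = "mimic-iii-demo"
print(admin_url("datasets"))
print(admin_url("datasets", dataset_id, "tables"))
print(admin_url("datasets", dataset_id, "concepts"))
print(admin_url("datasets", dataset_id, "cqpp"))
print(admin_url("datasets", dataset_id, "update-matching-stats"))
```

A call such as `r.post(admin_url("datasets", datasetId, "cqpp"), ...)` would then replace the hard-coded f-strings.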

## Visit the Frontend

Finally, you can start the frontend (e.g. using the script `conquery/scripts/run_e2e_frontend.sh`) and open it at http://localhost:8000/?access_token=user.SUPERUSER@SUPERUSER to log in as the super user. In the top right corner, choose the *MIMIC-III Demo* dataset. You can then combine nodes of the *ICD* concept in the query editor and submit your query.