# Postprocessing

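Here we analyze the model search results: for each series (`unique_id`), we find the model with the lowest `metric` and count how often each model wins.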

In [None]:
import os

import pandas as pd

INPUT_DIR = os.path.abspath("data")
WORKING_DIR = os.path.abspath("data/working")

## Python Solution

In [None]:
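results = pd.read_parquet(f"{WORKING_DIR}/model_search.parquet")
# Lowest metric first, then keep the first row per series: the best model for each unique_id
best_models = results.sort_values("metric", ascending=True).groupby("unique_id").first()
# Number of series each model wins
best_models["models"].value_counts()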
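Because `model_search.parquet` is not reproduced here, the sketch below runs the same logic on a small hypothetical frame (made-up values, same column names as above). It also shows `idxmin`, an alternative that returns complete rows: `groupby(...).first()` takes the first non-null value of each column separately, so a null in `models` could mix values from different rows.

```python
import pandas as pd

# Hypothetical model-search results: one row per (series, model) pair.
results = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b", "b"],
    "models": ["ets", "arima", "ets", "arima", "theta"],
    "metric": [0.30, 0.25, 0.10, 0.40, 0.20],
})

# Approach used above: sort by metric, then take the first entry per series.
best_first = (
    results.sort_values("metric", ascending=True)
    .groupby("unique_id")
    .first()
)

# Alternative: idxmin gives the index label of the lowest metric per series,
# and .loc pulls those complete rows (no per-column null skipping).
best_idx = (
    results.loc[results.groupby("unique_id")["metric"].idxmin()]
    .set_index("unique_id")
)

print(best_first["models"].value_counts())
print(best_idx[["models", "metric"]])
```

Both approaches pick `arima` for series `a` and `ets` for series `b` on this toy data.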

In [None]:
from fugue_notebook import setup
setup()

In [None]:
%%fsql
results = LOAD '{{WORKING_DIR}}/model_search.parquet'
PRINT

In [None]:
%%fsql
results = LOAD '{{WORKING_DIR}}/model_search.parquet'

temp = SELECT models, metric, unique_id,
       ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY metric ASC) AS ranked_order
       FROM results

SELECT *
  FROM temp
 WHERE ranked_order = 1
PRINT
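`ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY metric ASC)` numbers the rows of each `unique_id` from 1 in ascending `metric` order, so `ranked_order = 1` marks the best model. A rough pandas equivalent, on hypothetical toy data, sorts and then uses `groupby(...).cumcount()`, which counts from 0 within each group:

```python
import pandas as pd

# Hypothetical toy data with the same columns as model_search.parquet.
results = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b", "b"],
    "models": ["ets", "arima", "ets", "arima", "theta"],
    "metric": [0.30, 0.25, 0.10, 0.40, 0.20],
})

# Equivalent of PARTITION BY unique_id ORDER BY metric ASC.
temp = results.sort_values(["unique_id", "metric"]).copy()
# cumcount() numbers rows 0, 1, 2, ... inside each group; add 1 to match ROW_NUMBER().
temp["ranked_order"] = temp.groupby("unique_id").cumcount() + 1

# WHERE ranked_order = 1
best = temp[temp["ranked_order"] == 1]
print(best)
```

As with `ROW_NUMBER()`, tied metrics still receive distinct numbers, so exactly one row per `unique_id` is kept.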

## Anonymity

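We can simplify the code further by removing the `temp` table. In FugueSQL, a statement that omits its input (here, the `FROM` clause) operates on the output of the previous statement, so the intermediate result does not need a name.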

In [None]:
%%fsql
results = LOAD '{{WORKING_DIR}}/model_search.parquet'

SELECT models, metric, unique_id,
       ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY metric ASC) AS ranked_order

SELECT *
 WHERE ranked_order = 1
PRINT

## Additional Keywords

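We can simplify the query further with FugueSQL's `TAKE` keyword: `PREPARTITION BY` splits the data by `unique_id`, `PRESORT` orders each partition by `metric`, and `TAKE 1 ROW` keeps the first row of each partition. This replaces both the window function and the filter above.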

In [None]:
%%fsql
results = LOAD '{{WORKING_DIR}}/model_search.parquet'

TAKE 1 ROW PREPARTITION BY unique_id PRESORT metric ASC
PRINT
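The `TAKE ... PREPARTITION BY ... PRESORT` pattern generalizes to keeping the top n rows per partition. A rough pandas analogue is sketched below as a hypothetical helper, `take_per_partition` (not part of Fugue or pandas), run on made-up data:

```python
import pandas as pd


def take_per_partition(df: pd.DataFrame, n: int, partition_by: str,
                       presort: str, ascending: bool = True) -> pd.DataFrame:
    """Hypothetical pandas analogue of
    TAKE n ROWS PREPARTITION BY <partition_by> PRESORT <presort> ASC|DESC."""
    # Sort once globally; groupby().head() preserves this order within each group.
    ordered = df.sort_values(presort, ascending=ascending)
    # head(n) returns up to n complete rows from each partition.
    return ordered.groupby(partition_by).head(n)


# Hypothetical toy data with the same columns as model_search.parquet.
results = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b", "b"],
    "models": ["ets", "arima", "ets", "arima", "theta"],
    "metric": [0.30, 0.25, 0.10, 0.40, 0.20],
})

top1 = take_per_partition(results, 1, "unique_id", "metric")
top2 = take_per_partition(results, 2, "unique_id", "metric")
print(top1)
print(top2)
```

Unlike `groupby(...).first()`, `groupby(...).head(n)` returns whole rows, which matches how `TAKE` keeps complete records.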

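## Python Interoperability

Here the query runs on Spark (`%%fsql spark`), and `YIELD DATAFRAME AS top` makes its result available in Python as `top`, a Fugue DataFrame. `top.native` is the underlying Spark DataFrame, which `toPandas()` collects into pandas for plotting.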

In [None]:
%%fsql spark
results = LOAD '{{WORKING_DIR}}/model_search.parquet'

TAKE 1 ROW PREPARTITION BY unique_id PRESORT metric ASC
YIELD DATAFRAME AS top

In [None]:
top = top.native.toPandas()

In [None]:
top['models'].value_counts()

In [None]:
import seaborn as sns

def plotter(df: pd.DataFrame) -> None:
    # Bar chart of how many series each model wins
    sns.countplot(x=df["models"])

In [None]:
plotter(top)

## Interoperable SQL and Python

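We can compress the previous steps into a single cell by calling the Python function from `FugueSQL`: `OUTPUT USING plotter` passes the result of the `TAKE` statement to `plotter`.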

In [None]:
%%fsql
results = LOAD '{{WORKING_DIR}}/model_search.parquet'

TAKE 1 ROW PREPARTITION BY unique_id PRESORT metric ASC
OUTPUT USING plotter
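`plotter` can be used in `OUTPUT USING` because it takes a pandas DataFrame and returns nothing, which is the shape of a Fugue outputter. Assuming Fugue accepts other functions with the same type annotations, a hypothetical `report` outputter that prints the win counts instead of plotting (avoiding the seaborn dependency) might look like this; it is called directly on toy data here, and inside FugueSQL it would be used as `OUTPUT USING report`:

```python
import pandas as pd


def report(df: pd.DataFrame) -> None:
    # Hypothetical outputter: prints how often each model was selected as best.
    print(df["models"].value_counts().to_string())


# Hypothetical stand-in for the result of the TAKE statement.
top = pd.DataFrame({
    "unique_id": ["a", "b"],
    "models": ["arima", "ets"],
    "metric": [0.25, 0.10],
})
report(top)
```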