# Praxis Interface

Praxis Interface is a tool that helps teams reason about time-series data. The Praxis team uses it internally for financial forecasts covering over $30B in transactional volume.

<figure>
  <img
  src="https://drive.google.com/uc?id=1wUv8d8Qvdrfo1SVawsvAkqsMAfZI0Vdu"
  alt="product overview"
  >
  <figcaption>visualizing bigquery-public-data.covid19_open_data</figcaption>
</figure>


## Features

- Visualize different metrics and time-series IDs and their decompositions from your pandas DataFrame
- Join covariates to your target series
- Find leading indicators among your covariates
- Export train-test splits at a given cutoff timestamp
- Quickly train baseline statistical and ML models (single-series and multi-series), then export the predictions, residuals, and model
- Benchmark different feature engineering settings and models

## Reporting Bugs / Providing Feedback

The tool is still in alpha, so there may be rough edges and bugs. We'd love to hear your bug reports and feedback :)

Feel free to join our Community Slack channel at https://join.slack.com/t/praxiscommunity/shared_invite/zt-1ef4vfje9-VCdEThDKIrYd0Z5ErVGo9A, or email engineering@praxispioneering.com.

## Relevant Links

- Colab demo: https://demo.praxispioneering.com

- Documentation and API reference: https://docs.praxispioneering.com


## Installing via Pip

You may save the wheel for installation on your own platforms (hosted Jupyter notebooks, Vertex AI, localhost, etc.).

Run the command below to install the package. This should take 3 to 5 minutes.

In [None]:
!pip install https://storage.googleapis.com/praxis-public/wheels/praxis_interface-0.0.9-py2.py3-none-any.whl

* After seeing:
```
WARNING: The following packages were previously imported in this runtime:
  [pkg1, pkg2, ..]
```
you may want to restart the runtime; otherwise, importing Praxis may not work.
* Pip may report an installation warning like:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
```
This has no effect on the success of the installation. You may proceed as usual.

## Getting Started

You can launch a Praxis interface in just a few lines:

In [None]:
from praxis_interface import Praxis
import pandas as pd

df = pd.read_parquet('https://storage.googleapis.com/praxis-public/assets/covid-19-us-demo.parquet')
praxis = Praxis(df)

In [None]:
praxis.run(mode="inline") # running inline inside a cell

In [None]:
praxis.run(mode="external") # running in an extra tab

If the interface does not show up, there are two common causes. In a hosted environment, the proxy setup may prevent the interface from loading. Your browser may also block it if Enhanced Tracking Protection is enabled.

## Working with the Interface

Praxis is designed to be flexible and performant. Here are some examples of how you can work with the tool:



### Selecting Time-series and Columns

In [None]:
from IPython.display import IFrame

IFrame(src="https://www.loom.com/embed/fc2616d1c28a4fcc9e552064276673ca",width=800, height=400)

### Add Covariates and Change Cutoff

In [None]:
IFrame(src="https://www.loom.com/embed/12af08aaf7eb4feebb85b318a16b35a3",width=800, height=400)

### Find Leading Indicators

In [None]:
IFrame(src="https://www.loom.com/embed/2b7257d2c59b4fe3b32c52b4940bacc0",width=800, height=400)

### Train, Investigate and Compare Single-series Models (Statistical & ML)

In [None]:
IFrame(src="https://www.loom.com/embed/9da6846324dc4c719ae14e1bd7cc90ea",width=800, height=400)

### Train, Investigate and Compare Multi-series Models (Statistical & ML)


In [None]:
IFrame(src="https://www.loom.com/embed/985fc3e2f2fa4e27a8645bd8e213c3e8",width=800, height=400)

## Getting Results from the Interface

Praxis complements your existing workflow by visualizing inputs from a DataFrame, then returning preliminary results from your investigative interactions with the interface. 

You can retrieve the following types of information from the interface:

### Models
Note that the `['model']` field contains a Darts model object that you can use elsewhere.

In [None]:
praxis.get_model()

{'Prophet (singleseries)': {'mode': 'forecast',
  'start_time': Timestamp('2021-01-01 00:00:00'),
  'target_column': 'new_confirmed',
  'model': <darts.models.forecasting.prophet_model.Prophet at 0x7f369f409510>,
  'model_selection': 'Prophet (singleseries)',
  'components': ['cumulative_confirmed'],
  'forecast_length': 20,
  'forecast_samples': 1,
  'disable_future_cov': False}}

### Forecasts

Forecasts are broken down by type (prediction or residual, where residual = actual - prediction), then by model, then by percentile (for probabilistic forecasts).

In [None]:
forecast = praxis.get_forecast()
forecast['prediction']['Prophet (singleseries)']['0.05'].head()

| | prediction |
|---|---|
| 2021-01-01 | 123.414772 |
| 2021-01-02 | 126.344334 |
| 2021-01-03 | 126.62696 |
| 2021-01-04 | 126.420224 |
| 2021-01-05 | 126.872941 |
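
To make the sign convention of a residual concrete, here is a minimal sketch using plain pandas Series. The values and the variable names (`actual`, `prediction`, `residual`) are hypothetical and are not taken from the interface; only the definition residual = actual - prediction comes from the description above.

```python
import pandas as pd

# Hypothetical observed and forecast values on a shared daily index.
dates = pd.date_range("2021-01-01", periods=3, freq="D")
actual = pd.Series([130.0, 125.0, 128.0], index=dates)
prediction = pd.Series([123.4, 126.3, 126.6], index=dates)

# Residual as defined above: actual minus prediction.
# A positive residual means the forecast was too low on that day.
residual = actual - prediction
print(residual)
```

Under this convention, a run of residuals with the same sign suggests a systematic bias in the model rather than random noise.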


### Metrics

Currently, we support calculating symmetric mean absolute percentage error (SMAPE) and root mean squared error (RMSE).

In [None]:
praxis.get_metrics()

{'Prophet (singleseries)': {'smape': 21.53, 'rmse': 39.09}}
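
As a rough illustration of what these two numbers measure, the sketch below computes SMAPE and RMSE in plain Python. It assumes the common SMAPE variant that ranges from 0 to 200; this document does not state which variant Praxis uses, so the helper names `smape` and `rmse` and the exact formula are assumptions, not the tool's implementation.

```python
import math


def smape(actual, predicted):
    """Symmetric MAPE in percent: 200 * mean(|a - p| / (|a| + |p|)).

    Assumption: a point where both values are zero contributes 0.
    """
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        terms.append(abs(a - p) / denom if denom > 0 else 0.0)
    return 200.0 * sum(terms) / len(terms)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the series."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only.
actual = [100.0, 120.0, 80.0]
predicted = [110.0, 100.0, 80.0]
print(round(smape(actual, predicted), 2), round(rmse(actual, predicted), 2))
```

SMAPE is scale-free, which makes it convenient for comparing series of different magnitudes, while RMSE stays in the units of the target and penalizes large errors more heavily.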

### Train/test Split

To speed up prototyping, you can export the train/test split from the interface as two DataFrames and feed them into your own, more complex training pipeline.

In [None]:
train, test = praxis.to_dataframe()
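
For intuition, a split at a timestamp cutoff can be sketched with plain pandas as follows. This is not how `to_dataframe()` is implemented; the helper `split_at_cutoff`, the sample data, and the choice to place the cutoff row in the test set are all assumptions made for illustration.

```python
import pandas as pd


def split_at_cutoff(df, cutoff, time_col="date"):
    """Rows strictly before `cutoff` go to train; the rest go to test."""
    cutoff = pd.Timestamp(cutoff)
    is_train = df[time_col] < cutoff
    return df[is_train], df[~is_train]


# Hypothetical single-series frame in the same long format as the demo data.
df_example = pd.DataFrame({
    "date": pd.date_range("2021-02-27", periods=5, freq="D"),
    "id": ["US_CA"] * 5,
    "new_deceased": [10.0, 12.0, 9.0, 11.0, 13.0],
})

train_example, test_example = split_at_cutoff(df_example, "2021-03-01")
print(len(train_example), len(test_example))
```

Splitting on time rather than at random keeps future observations out of the training set, which is what a forecasting evaluation needs.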

## Tutorial: Finding Your First Data Driver in the Demo Dataset

Now that you are familiar with the basic interactive components of the interface, this tutorial will demonstrate how to use the tool to discover important drivers on our demo dataset.


### The Data

The demo dataset we are using is a subset of `bigquery-public-data.covid19_open_data` available on Google BigQuery, cleaned and formatted for use inside the Praxis interface. It contains daily time-series data related to COVID-19. You can find the list of sources available here: https://github.com/open-covid-19/data.

In [None]:
df # get a glimpse of the dataset.

Unnamed: 0,date,id,new_confirmed,new_deceased,cumulative_confirmed,cumulative_deceased,new_persons_vaccinated,cumulative_persons_vaccinated,new_persons_fully_vaccinated,cumulative_persons_fully_vaccinated,...,new_vaccine_doses_administered_pfizer,cumulative_vaccine_doses_administered_pfizer,new_persons_fully_vaccinated_moderna,cumulative_persons_fully_vaccinated_moderna,new_vaccine_doses_administered_moderna,cumulative_vaccine_doses_administered_moderna,new_persons_fully_vaccinated_janssen,cumulative_persons_fully_vaccinated_janssen,new_vaccine_doses_administered_janssen,cumulative_vaccine_doses_administered_janssen
0,2020-01-01,US_AK,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,2020-01-02,US_AK,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,2020-01-03,US_AK,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,2020-01-04,US_AK,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,2020-01-05,US_AK,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53531,2022-08-09,US_WY,767.727060,6.000001,172089.136470,1862.000000,0.000669,341423.018870,1.004531,298931.497735,...,2.457173,401370.771414,0.502280,121411.748860,2.972672,341851.513664,0.049405,26924.975297,0.079584,28970.960208
53532,2022-08-10,US_WY,255.909020,2.000000,172345.045490,1864.000000,0.000223,341423.006290,0.334844,298931.832578,...,0.819058,401371.590471,0.167427,121411.916287,0.990891,341852.504555,0.016468,26924.991766,0.026528,28970.986736
53533,2022-08-11,US_WY,85.303007,0.666667,172430.348497,1864.666667,152.666741,341575.668763,306.111615,299237.944193,...,314.939686,401686.530157,149.389142,121561.305429,258.330297,342110.834852,5.338823,26930.330589,4.675509,28975.662245
53534,2022-08-12,US_WY,28.434336,0.222222,57476.782832,621.555556,50.888914,341626.556254,102.037205,299339.981398,...,104.979895,401791.510052,49.796381,121611.101810,86.110099,342196.944951,1.779608,26932.110196,1.558503,28977.220748


In [None]:
praxis.run(mode="external") # run the interface again.

Let's say we'd like to predict the `new_deceased` column for the `US_CA` time-series starting at 03/01/2021, 20 days into the future. We'll use [CatBoost](https://catboost.ai/) for a baseline model:


![baseline](https://drive.google.com/uc?id=1C2idiBK1HIKc_9i01wFePfuhhD9wif8N)

We now have a 64.26% SMAPE, which is not great. We will now use the leading indicator search to find potential data drivers: 

![leading-indicators](https://drive.google.com/uc?id=1_i3pV2wfb2GZzyrS6pUsbHdN1t5ftU54)

We've now found some interesting candidate leading indicators that may improve our results. We will pick the covariate that has the lowest f-value, `search_trends_hypoxia`, and retrain the model.
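
To give a feel for what a leading indicator is, the sketch below scores a covariate by its lagged Pearson correlation with the target. This is a simpler stand-in technique, not the f-value search that the interface reports, and the function `best_lead` and the synthetic series are hypothetical.

```python
import pandas as pd


def best_lead(target, covariate, max_lag=14):
    """Return (lag, correlation) for the lag at which the covariate,
    shifted forward by `lag` steps, correlates most strongly with the target."""
    scores = pd.Series({
        lag: target.corr(covariate.shift(lag))
        for lag in range(1, max_lag + 1)
    })
    best = scores.abs().idxmax()  # NaN scores are skipped by idxmax
    return int(best), float(scores[best])


# Synthetic data: the target follows the covariate with a 3-step delay.
covariate = pd.Series(
    [0, 1, 0, 3, 1, 4, 2, 6, 3, 5, 7, 4, 8, 6, 9, 7], dtype=float
)
target = covariate.shift(3).fillna(0.0) * 2 + 1

lag, corr = best_lead(target, covariate, max_lag=5)
print(lag, round(corr, 3))
```

A covariate whose past values track the target's current values is useful for forecasting because it is already known at prediction time.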


![new-model](https://drive.google.com/uc?id=1zIRsYX7-Rh4xR4-tBQ2rWaBYLqvrYTPW)

As shown above, SMAPE drops by about 30 percentage points (64.26% -> 34.56%, a relative reduction of roughly 46%). Praxis makes this possible by helping you find core data drivers in your time-series dataset and by letting you iterate quickly with investigative baseline models to measure the quality of your features.