# Quick start 

This tutorial assumes you know how to program in Python and have a basic understanding of the [pandas library](https://pandas.pydata.org/docs/index.html).

To use @voc@ you need to follow these steps:

## Install @voc@

You can install @voc@ using pip. Run this in your terminal or directly in a notebook cell:
```bash
pip install avoca
```

## Load the data


Internally, @voc@ uses pandas DataFrames.

The data must follow these rules:

* Columns are a [MultiIndex](https://pandas.pydata.org/docs/user_guide/advanced.html#multiindex-advanced-indexing) whose first level is the name of the compound and whose second level is the name of the variable.
* If a variable is shared among all compounds, the compound is `-`.
* One variable, called `flag`, is reserved for each compound. It is used to store the flags assigned to that compound's values.

```python
import pandas as pd
import numpy as np

np.random.seed(31415)

df = pd.DataFrame(
    np.random.randn(100, 4),
    columns=pd.MultiIndex.from_tuples(
        [
            ("compA", "area"),
            ("compA", "C"),
            ("compB", "area"),
            ("compB", "C"),
        ]
    ),
)
# Create an outlier to ensure we will have a flagged value
df.loc[0, ("compA", "C")] = 3.0
df.head()
```
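
The example above only contains per-compound variables. As a sketch of the other two rules, the following builds a small table with a shared variable under the `-` compound and a `flag` column for one compound. The variable name `datetime`, the name `df_full` and the choice of an integer column initialised to 0 are assumptions made for illustration, not requirements stated by @voc@.

```python
import numpy as np
import pandas as pd

n_rows = 5

# A dict with tuple keys produces two-level (MultiIndex) columns
df_full = pd.DataFrame(
    {
        # Variable shared by all compounds: the compound level is "-"
        # ("datetime" is a hypothetical variable name)
        ("-", "datetime"): pd.date_range("2024-01-01", periods=n_rows, freq="D"),
        # Per-compound variable
        ("compA", "C"): np.linspace(1.0, 2.0, n_rows),
        # Reserved per-compound variable; 0 is assumed to mean "no flag"
        ("compA", "flag"): np.zeros(n_rows, dtype=int),
    }
)

print(df_full.columns.tolist())
```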

## Define the QA model

Many models can be found in the [models](Models) section.

In this example we will use the simplest model: {py:class}`avoca.qa_class.zscore.ExtremeValues`

```python
from avoca.qa_class.zscore import ExtremeValues

model = ExtremeValues(
    # Here we define some parameters on which the model will be applied
    compounds=["compA", "compB"],
    variable="C",
    # Here are some parameters for the model itself
    threshold=2,
)
```

## Run the QA model

As in machine learning, we will first fit the model to the data and then predict the bad values.


```python
# The model will calculate some statistics on the data
model.fit(df)

# Predict the outliers
outliers = model.assign(df)
outliers
```
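
As a rough illustration of the fit/assign idea, here is a minimal sketch of a z-score check written with plain pandas. It is not @voc@'s implementation. It assumes that fitting learns a mean and a standard deviation, and that assigning flags the points whose absolute z-score exceeds `threshold`; the actual `ExtremeValues` class may differ in its details. The data is a deterministic series with one artificial outlier at index 0.

```python
import numpy as np
import pandas as pd

# 99 evenly spaced values in [-1, 1], plus an artificial outlier at index 0
values = pd.Series(np.concatenate([[3.0], np.linspace(-1, 1, 99)]))

# "fit": learn the statistics from the data
mean = values.mean()
std = values.std()

# "assign": flag every point whose absolute z-score exceeds the threshold
threshold = 2
z_scores = (values - mean) / std
outlier_index = values.index[z_scores.abs() > threshold]

print(list(outlier_index))
```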

The output lists the indexes of the bad values.
The best way to inspect the results, however, is to plot the data.

For this purpose we can use the `plot` method of the model, which plots the training data and the outliers.

```python
import matplotlib.pyplot as plt
plt.style.use("default")

model.plot()
```


## Setting the flags

Now that we have seen how the assigner works, we would like to set the flags
on the data so that we can export it.

For this we can use the following:

```python
from avoca.flagging import flag

df_out = flag(df, model, outliers)
df_out
```

We can see that the value we set at the start has now received a flag value, as expected.

A flag value of 0 means that no flag was set. Each individual flag is a power of 2,
so several flags can be combined by adding their values together.

```python
model.flag
```
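
Because each flag is a distinct power of 2, a combined value can be decoded with bitwise operations. The sketch below uses three hypothetical flag names and values chosen for illustration only; they are not @voc@'s actual flag definitions. It uses `|` to combine flags, which gives the same result as adding distinct flags and stays correct if the same flag is applied twice.

```python
import pandas as pd

# Hypothetical flag values, each a distinct power of 2
EXTREME_VALUE = 1        # 0b001
NEGATIVE_VALUE = 2       # 0b010
MISSING_CALIBRATION = 4  # 0b100

flags = pd.Series(
    [
        0,                                    # no flag set
        EXTREME_VALUE,                        # a single flag
        EXTREME_VALUE | MISSING_CALIBRATION,  # two flags combined: 1 + 4
        NEGATIVE_VALUE,
    ]
)

# Test whether a given flag is part of each combined value
has_extreme = (flags & EXTREME_VALUE) != 0

print(flags.tolist())
print(has_extreme.tolist())
```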

## Export the data

Finally we would like to share this data further.

We can use the `to_csv` method from pandas to export the data to a CSV file.



```python
df_out.to_csv("flagged_data.csv")
```
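
When such a file is read back, pandas needs to be told that the header spans two rows. Here is a minimal sketch of the round trip, using an in-memory buffer and a small made-up table (`small`) rather than the tutorial's `df_out`:

```python
import io

import pandas as pd

# Small stand-in table with two-level columns (values are made up)
small = pd.DataFrame(
    [[1.0, 0], [2.0, 1]],
    columns=pd.MultiIndex.from_tuples([("compA", "C"), ("compA", "flag")]),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
small.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds the two column levels, index_col=0 restores the index
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)

print(restored.columns.tolist())
```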

However, many programs use their own custom flag formats.

For this, `avoca` provides bindings to other software.
Have a look at the [bindings](https://avoca.readthedocs.io/en/latest/bindings/index.html) to see if your software is supported.

## Conclusions


Here we have shown with a toy example how to use @voc@ to detect bad values in a dataset.

Note that we used the same dataset for training and prediction, but in a real scenario, you could have some cleaned data that you use for training and then apply the model to a new dataset.
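
The benefit of training on cleaned data can be sketched with plain pandas, again assuming a simple z-score check rather than @voc@'s actual implementation. The statistics come from the clean series only, so an outlier in the new data cannot inflate the standard deviation it is compared against. With @voc@, this would presumably amount to calling `fit` on the cleaned DataFrame and `assign` on the new one.

```python
import numpy as np
import pandas as pd

# Cleaned reference data used for "training" (made-up values)
clean = pd.Series(np.linspace(-1, 1, 101))

# New data to check; index 2 holds an obvious outlier
new = pd.Series([0.0, 0.5, 4.0, -0.2])

# Statistics are learned from the clean data only
clean_mean = clean.mean()
clean_std = clean.std()

# The new data is scored against those statistics
limit = 2
new_z = (new - clean_mean) / clean_std
flagged = new.index[new_z.abs() > limit]

print(list(flagged))
```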