# Working with data: FHIRFlat

This Jupyter notebook shows how to load a sample FHIRFlat folder, compute simple statistics, and draw plots. You can run a live version of this notebook on Google Colab or MyBinder by clicking the 'Launch' button (rocket icon) in the top right corner.

```{note}
On Google Colab, you will need to install the polyflame package first.
You can use `pip` to install the package by typing into an empty code cell:

    !pip install git+https://github.com/globaldothealth/polyflame
```

First we import the necessary functions:

In [None]:
import pandas as pd
import polyflame.samples
from polyflame import load_taxonomy, plot, plot_unpacked
from polyflame.fhirflat import (
    use_source,
    condition_proportion,
    condition_upset,
    age_pyramid
)

Then we load a source using the `use_source()` function. A checksum **must** be specified: it lets the integrity of the FHIRFlat data be verified, so that outputs can be reproduced from exactly the same input.

In [None]:
source = use_source(polyflame.samples.fhirflat, checksum="03cc8e28d97a6a3ab20926d7c3f891f14e119eb882c6e8d3deb07e1b79eed089")
tx = load_taxonomy("fhirflat-isaric3")
source
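
To make the idea behind the checksum concrete, here is a minimal sketch of pinning a folder to a SHA-256 digest and refusing to proceed when the contents change. It uses only the Python standard library. The hashing scheme, the helper names `folder_sha256` and `verify`, and the sample file are all assumptions made for illustration; this is not necessarily how polyflame computes or checks its checksum.

```python
import hashlib
import tempfile
from pathlib import Path


def folder_sha256(folder: Path) -> str:
    """Hash every file under `folder` in sorted order (hypothetical scheme)."""
    digest = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        # Include the relative path so that renaming a file changes the digest.
        digest.update(path.relative_to(folder).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def verify(folder: Path, expected: str) -> None:
    """Raise if the folder no longer matches the pinned digest."""
    actual = folder_sha256(folder)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")


# Demonstration on a throwaway folder with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "patient.csv").write_text("id,age\n1,42\n")
    pinned = folder_sha256(root)
    verify(root, pinned)  # passes: the data is unchanged
    (root / "patient.csv").write_text("id,age\n1,43\n")
    changed = folder_sha256(root) != pinned  # True: one value was edited
```

Pinning the digest in the notebook itself means that anyone re-running the analysis either gets the same input data or an explicit error.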

A `source` is a Python dictionary with pre-specified keys that tells data processing and visualization functions where to get information from. Once we have a source, we can start looking at standard analyses, such as the proportion of patients having a particular condition:

In [None]:
plot(condition_proportion(source, tx))
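
As a rough illustration of what a condition proportion measures, the sketch below computes, for each condition, the share of patients who have it. The records are made up, and the choice of denominator (every patient appearing in the records) is an assumption; `condition_proportion()` may define either differently.

```python
from collections import Counter

# Made-up records: one (patient_id, condition) pair per reported condition.
records = [
    ("p1", "fever"), ("p1", "cough"),
    ("p2", "fever"),
    ("p3", "cough"), ("p3", "headache"),
    ("p4", "fever"),
]

patients = {pid for pid, _ in records}
# De-duplicate so a condition is counted at most once per patient.
patients_per_condition = Counter(cond for _, cond in set(records))
proportions = {
    cond: n / len(patients) for cond, n in patients_per_condition.items()
}
```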

Or an [UpSet](https://en.wikipedia.org/wiki/UpSet_plot) plot showing the top conditions and how often they co-occur:

In [None]:
plot(condition_upset(source))

We can also look at the age pyramid, grouped by outcome type:

In [None]:
plot(age_pyramid(source))
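
The counting behind such a pyramid can be sketched in plain Python: bin each patient into an age band, then tally by band, sex, and outcome. The patients, the 10-year band width, and the helper `age_band` are assumptions for illustration; they are not taken from `age_pyramid()`.

```python
from collections import Counter

# Made-up patients: (age in years, sex, outcome).
patients = [
    (34, "female", "recovered"),
    (37, "male", "recovered"),
    (61, "female", "died"),
    (68, "female", "recovered"),
    (5, "male", "recovered"),
]


def age_band(age: int, width: int = 10) -> str:
    """Label the band containing `age`, e.g. 34 falls in '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# One count per (age band, sex, outcome) combination.
counts = Counter(
    (age_band(age), sex, outcome) for age, sex, outcome in patients
)
```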

The examples above use the standard FHIRFlat analyses, but the plotting functions accept any dataframe as long as it follows a particular *shape*. Here we use the `plot_unpacked()` function, which takes a dataframe directly, whereas `plot()` expects it as part of a dictionary. For example, a hypothetical UpSet plot of how often movie genres intersect:

In [None]:
df = pd.DataFrame({'crime': [1, 0, 1], 'fantasy': [0, 1, 1], 'drama': [1, 0, 0]})
df

In [None]:
plot_unpacked(df, "upset")
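
To see what an UpSet plot tallies, the sketch below counts how many rows fall into each exact combination of genres, using the same membership table as the dataframe above written as plain dictionaries. This is only an illustration of the counting; it makes no claim about how `plot_unpacked()` computes or draws the plot.

```python
from collections import Counter

# Same membership table as the dataframe above, one dict per row.
rows = [
    {"crime": 1, "fantasy": 0, "drama": 1},
    {"crime": 0, "fantasy": 1, "drama": 0},
    {"crime": 1, "fantasy": 1, "drama": 0},
]

# Key each row by the exact set of genres it belongs to, then count.
intersections = Counter(
    frozenset(name for name, member in row.items() if member) for row in rows
)
```

Each key in `intersections` corresponds to one combination of genres, and its value is the number of rows with exactly that combination.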

Because `plot_unpacked()` is generic, PolyFLAME is easy to extend to other data source types, such as REDCap or your own source.