Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ jobs:
- uses: actions/setup-python@v2

with:
python-version: 3.8
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
Expand Down
1 change: 1 addition & 0 deletions docs/_toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,5 @@ format: jb-book
root: intro
chapters:
- file: getting_started
- file: articles/index.rst
- file: api/index.rst
97 changes: 97 additions & 0 deletions docs/articles/customize-pins-metadata.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
---
jupyter:
jupytext:
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.13.6
kernelspec:
display_name: venv-pins-python
language: python
name: venv-pins-python
---

# Using custom metadata



The `metadata` argument in pins is flexible and can hold any kind of metadata that you can formulate as a `dict` (convertable to JSON).
In some situations, you may want to read and write with _consistent_ customized metadata;
you can create functions to wrap `pin_write()` and `pin_read()` for your particular use case.

We'll begin by creating a temporary board for demonstration:

```{python setup}
import pins
import pandas as pd

from pprint import pprint

board = pins.board_temp()
```


# A function to store pandas Categoricals

Say you want to store a pandas Categorical object as JSON together with the _categories_ of the categorical in the metadata.

For example, here is a simple categorical and its categories:

```{python}
some_cat = pd.Categorical(["a", "a", "b"])

some_cat.categories
```

Notice that the categories attribute is just the unique values in the categorical.

We can write a function wrapping `pin_write()` that holds the categories in metadata, so we can easily re-create the categorical with them.

```{python}
def pin_write_cat_json(
board,
x: pd.Categorical,
name,
**kwargs
):
metadata = {"categories": x.categories.to_list()}
json_data = x.to_list()
board.pin_write(json_data, name = name, type = "json", metadata = metadata, **kwargs)
```

We can use this new function to write a pin as JSON with our specific metadata:

```{python}
some_cat = pd.Categorical(["a", "a", "b", "c"])
pin_write_cat_json(board, some_cat, name = "some-cat")
```

## A function to read categoricals

It's possible to read this pin using the regular `pin_read()` function, but the object we get is no longer a categorical!

```{python}
board.pin_read("some-cat")
```

However, notice that if we use `board.pin_meta()`, the information we stored on categories is in the `.user` field.

```{python}
pprint(
board.pin_meta("some-cat")
)
```

This enables us to write a special function for reading, to reconstruct the categorical, using the categories stashed in metadata:

```{python}
def pin_read_cat_json(board, name, version=None, hash=None, **kwargs):
data = board.pin_read(name = name, version = version, hash = hash, **kwargs)
meta = board.pin_meta(name = name, version = version, **kwargs)
return pd.Categorical(data, categories=meta.user["categories"])

pin_read_cat_json(board, "some-cat")
```

For an example of how this approach is used in a real project, look at look at how the vetiver package wraps these functions to [write](https://github.com/rstudio/vetiver-python/blob/main/vetiver/pin_read_write.py) and [read](https://github.com/rstudio/vetiver-python/blob/main/vetiver/vetiver_model.py) model binaries as pins.
5 changes: 5 additions & 0 deletions docs/articles/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Articles
========

.. toctree::
customize-pins-metadata.Rmd