4 changes: 4 additions & 0 deletions .gitignore
@@ -133,3 +133,7 @@ dmypy.json

# Pyre type checker
.pyre/

# RStudio
.Rproj.user
*.Rproj
6 changes: 3 additions & 3 deletions README.Rmd
@@ -9,7 +9,7 @@ pd.set_option("display.notebook_repr_html", False)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/pins-python/HEAD)

The pins package publishes data, models, and other python objects, making it
The pins package publishes data, models, and other Python objects, making it
easy to share them across projects and with your colleagues. You can pin
objects to a variety of pin *boards*, including folders (to share on a
networked drive or with services like DropBox), RStudio Connect, and Amazon
@@ -41,7 +41,7 @@ from pins.data import mtcars
board = pins.board_temp()
```

You can pin (save) data to a board with the `.pin_write()` method. It requires three
You can "pin" (save) data to a board with the `.pin_write()` method. It requires three
arguments: an object, a name, and a pin type:

```{python}
@@ -61,7 +61,7 @@ board.pin_read("mtcars")
A board on your computer is good place to start, but the real power of
pins comes when you use a board that’s shared with multiple people. To
get started, you can use `board_folder()` with a directory on a shared
drive or in dropbox, or if you use [RStudio
drive or in DropBox, or if you use [RStudio
Connect](https://www.rstudio.com/products/connect/) you can use
`board_rsconnect()`:

8 changes: 4 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/pins-python/HEAD)

The pins package publishes data, models, and other python objects, making it
The pins package publishes data, models, and other Python objects, making it
easy to share them across projects and with your colleagues. You can pin
objects to a variety of pin *boards*, including folders (to share on a
networked drive or with services like DropBox), RStudio Connect, and Amazon
@@ -35,7 +35,7 @@ from pins.data import mtcars
board = pins.board_temp()
```

You can pin (save) data to a board with the `.pin_write()` method. It requires three
You can "pin" (save) data to a board with the `.pin_write()` method. It requires three
arguments: an object, a name, and a pin type:


@@ -49,7 +49,7 @@ board.pin_write(mtcars.head(), "mtcars", type="csv")



Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220518T150837Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 5, 18, 15, 8, 37, 413288), hash='120a54f7e0818041'), name='mtcars', user={})
Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220526T165625Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 5, 26, 16, 56, 25, 738735), hash='120a54f7e0818041'), name='mtcars', user={})



@@ -79,7 +79,7 @@ board.pin_read("mtcars")
A board on your computer is good place to start, but the real power of
pins comes when you use a board that’s shared with multiple people. To
get started, you can use `board_folder()` with a directory on a shared
drive or in dropbox, or if you use [RStudio
drive or in DropBox, or if you use [RStudio
Connect](https://www.rstudio.com/products/connect/) you can use
`board_rsconnect()`:

2 changes: 1 addition & 1 deletion docs/api/boards.rst
@@ -1,7 +1,7 @@
Board Methods
=============

.. currentmodule:: pins
.. currentmodule:: pins.boards

Constructor
-----------
29 changes: 16 additions & 13 deletions docs/getting_started.Rmd
@@ -20,8 +20,8 @@ pd.options.display.max_rows = 25
Getting Started
===============

The pins package helps you publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues.
You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with dropbox), RStudio connect, Amazon S3, and more.
The pins package helps you publish data sets, models, and other Python objects, making it easy to share them across projects and with your colleagues.
You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with DropBox), RStudio connect, Amazon S3, and more.
This vignette will introduce you to the basics of pins.

```{python}
@@ -31,13 +31,13 @@ from pins import board_local, board_folder, board_temp, board_urls
## Getting started

Every pin lives in a pin *board*, so you must start by creating a pin board.
In this vignette I'll use a temporary board which is automatically deleted when your python session is over:
In this vignette I'll use a temporary board which is automatically deleted when your Python session is over:

```{python}
board = board_temp()
```

In real-life, you'd pick a board depending on how you want to share the data.
In real life, you'd pick a board depending on how you want to share the data.
Here are a few options:


@@ -51,23 +51,23 @@ board = board_rsconnect() # share data with RStudio Connect

## Reading and writing data

Once you have a pin board, you can write data to it with `pin_write()`:
Once you have a pin board, you can write data to it with the `.pin_write()` method:

```{python}
from pins.data import mtcars

meta = board.pin_write(mtcars, "mtcars", type="csv")
```

The first argument is the object to save (usually a data frame, but it can be any R object), and the second argument gives the "name" of the pin.
The name is basically equivalent to a file name: you'll use it when you later want to read the data from the pin.
The first argument is the object to save (usually a data frame, but it can be any Python object), and the second argument gives the "name" of the pin.
The name is basically equivalent to a file name; you'll use it when you later want to read the data from the pin.
The only rule for a pin name is that it can't contain slashes.


As you can see from the output, pins has chosen to save this data to an `.rds` file.
Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the
But you can choose another option depending on your goals:

- `type = "csv"` uses `write.csv()` to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
- `type = "csv"` uses `to_csv()` from pandas to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
- `type = "joblib"` uses `joblib.dump()` to create a binary python data file. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
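The trade-off between the two types listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not how pins works internally: the `csv` module stands in for the pandas `to_csv()` path, `pickle` stands in for joblib (which uses the pickle format), and the `rows` data is made up. pandas infers numeric types when it reads a CSV, so the effect there is milder than with the plain `csv` module used here.

```python
import csv
import io
import pickle

# Made-up rows for illustration; this is not the real mtcars data.
rows = [
    {"model": "A", "mpg": 21.0, "cyl": 6},
    {"model": "B", "mpg": 22.8, "cyl": 4},
]

# CSV round trip: the file only holds text, so every value comes back as a string.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["model", "mpg", "cyl"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)
csv_rows = list(csv.DictReader(buf))

# Binary (pickle) round trip: the original Python types are preserved.
pickled_rows = pickle.loads(pickle.dumps(rows))

print(type(csv_rows[0]["cyl"]).__name__)      # type name after the CSV round trip
print(type(pickled_rows[0]["cyl"]).__name__)  # type name after the pickle round trip
```

The CSV copy is readable by any tool but loses type information; the binary copy keeps it but can only be read from Python.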

🚧 Data formats TODO 🚧
@@ -88,17 +88,18 @@ That said, most boards transmit pins over HTTP, and this is going to be slow and
As a general rule of thumb, we don't recommend using pins with files over 500 MB.
If you find yourself routinely pinning data larger that this, you might need to reconsider your data engineering pipeline.
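One way to apply this rule of thumb is to check a file's size before pinning it. The sketch below is not part of pins: `is_reasonable_pin_size` and `MAX_PIN_BYTES` are hypothetical names, and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
import os
import tempfile

# Assumed interpretation of the 500 MB rule of thumb from the text.
MAX_PIN_BYTES = 500 * 1024 * 1024


def is_reasonable_pin_size(path, limit=MAX_PIN_BYTES):
    """Hypothetical helper: True if the file at `path` is at most `limit` bytes."""
    return os.path.getsize(path) <= limit


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w") as f:
        f.write("mpg,cyl\n21.0,6\n")

    small_ok = is_reasonable_pin_size(path)              # tiny file, default limit
    tiny_limit_ok = is_reasonable_pin_size(path, limit=4)  # same file, 4-byte limit
```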


<!-- #region -->
```{note}
If you are using the RStudio Connect board (`board_rsconnect`), then you must specify your pin name as
`<user_name>/<content_name>`. For example, `hadely/sales-report`.
`<user_name>/<content_name>`. For example, `hadley/sales-report`.
```
<!-- #endregion -->
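The two naming rules described in this document (no slashes in general, but exactly `<user_name>/<content_name>` on RStudio Connect) can be sketched as a small validator. `check_pin_name` is a hypothetical helper written for illustration; it is not a pins function, and pins may enforce these rules differently.

```python
def check_pin_name(name, rsconnect=False):
    """Hypothetical validator for the naming rules described in the docs."""
    if rsconnect:
        # RStudio Connect boards expect "<user_name>/<content_name>".
        parts = name.split("/")
        if len(parts) != 2 or not all(parts):
            raise ValueError(
                f"expected '<user_name>/<content_name>', got {name!r}"
            )
    elif "/" in name:
        # Other boards: the only rule is that the name has no slashes.
        raise ValueError(f"pin name can't contain slashes: {name!r}")
    return name


check_pin_name("mtcars")
check_pin_name("hadley/sales-report", rsconnect=True)
```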


## Metadata


Every pin is accompanied by some metadata that you can access with pin_meta():
Every pin is accompanied by some metadata that you can access with `pin_meta()`:

```{python}
board.pin_meta("mtcars")
@@ -139,7 +140,7 @@ While we’ll do our best to keep the automatically generated metadata consisten
> ⚠️: Warning the examples in this section use joblib to read and write data. Joblib uses the pickle format, and **pickle files are not secure**. Only read pickle files you trust. In order to read pickle files, set the `allow_pickle_read=True` argument. See: https://docs.python.org/3/library/pickle.html.
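The reason for this warning can be shown with a harmless standard-library sketch. It uses `pickle` directly rather than joblib or pins, and the `Payload` class is made up for illustration. An object's `__reduce__` method tells pickle which callable to invoke at load time, so loading a pickle can run code chosen by whoever wrote the file.

```python
import pickle


class Payload:
    # __reduce__ tells pickle how to "rebuild" the object on load:
    # call this callable with these arguments.
    def __reduce__(self):
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it runs eval("6 * 7") and returns the result.
loaded = pickle.loads(blob)
print(loaded)
```

Here the expression is trivial, but the same mechanism could call any importable function, which is why reading pickle data should be limited to sources you trust.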


> ⚠️: versioning is not yet implemented. These docs are copied from R pins.
> ⚠️: Turning off versioning is not yet implemented; all Python pins are versioned. These docs are copied from R pins.

In many situations it's useful to version pins, so that writing to an existing pin does not replace the existing data, but instead adds a new copy.
There are two ways to turn versioning on:
@@ -186,6 +187,7 @@ board2.pin_read("x", version = version)

## 🚧 Reading and writing files

> ⚠️: `pin_upload()` and `pin_download()` are not yet implemented in Python. These docs are copied from R pins.

So far we've focussed on `pin_write()` and `pin_read()` which work with R objects.
pins also provides the lower-level `pin_upload()` and `pin_download()` which work with files on disk.
@@ -231,6 +233,7 @@ But you can `pin_download()` something that you've pinned with `pin_write()`:

## Caching

> ⚠️: `board_url` is not yet implemented in Python. These docs are copied from R pins.

The primary purpose of pins is to make it easy to share data.
But pins is also designed to help you spend as little time as possible downloading data.
22 changes: 12 additions & 10 deletions docs/intro.md
@@ -21,12 +21,12 @@ kernelspec:
```

The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.
You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), RStudio Connect, Amazon S3, Azure storage and ~Microsoft 365 (OneDrive and SharePoint)~.
You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), RStudio Connect, Amazon S3, Azure storage and ~~Microsoft 365 (OneDrive and SharePoint)~~.
Pins can be automatically versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.

## Installation

To try out the development version of pins you'll need to install from GitHub:
To install the released version from PyPI:

```shell
python -m pip install pins
@@ -36,7 +36,7 @@ python -m pip install pins

To use the pins package, you must first create a pin board.
A good place to start is `board_folder()`, which stores pins in a directory you specify.
Here I'll use a special version of `board_folder()` called `board_temp()` which creates a temporary board that's automatically deleted when your R session ends.
Here I'll use a special version of `board_folder()` called `board_temp()` which creates a temporary board that's automatically deleted when your Python session ends.
This is great for examples, but obviously you shouldn't use it for real work!

```{code-cell} ipython3
@@ -47,23 +47,25 @@ board = board_temp()
board
```

You can "pin" (save) data to a board with `pin_write()`.
It takes three arguments: the board to pin to, an object, and a name:
You can "pin" (save) data to a board with the `.pin_write()` method.
It requires three arguments: an object, a name, and a pin type:

```{code-cell} ipython3
board.pin_write(mtcars.head(), "mtcars", type="csv")
```

~As you can see, the data saved as an `.rds` by default~, but depending on what you're saving and who else you want to read it, you might use the `type` argument to instead save it as a `csv`, ~`json`, or `arrow`~ file.
Above, we saved the data as a CSV, but depending on
what you’re saving and who else you want to read it, you might use the
`type` argument to instead save it as a `joblib` or `arrow` file (NOTE: arrow is not yet supported).

You can later retrieve the pinned data with `pin_read()`:
You can later retrieve the pinned data with `.pin_read()`:

```{code-cell} ipython3
board.pin_read("mtcars")
```

A board on your computer is good place to start, but the real power of pins comes when you use a board that's shared with multiple people.
To get started, you can use `board_folder()` with a directory on a shared drive or in dropbox, or if you use [RStudio Connect](https://www.rstudio.com/products/connect/) you can use `board_rsconnect()`:
To get started, you can use `board_folder()` with a directory on a shared drive or in DropBox, or if you use [RStudio Connect](https://www.rstudio.com/products/connect/) you can use `board_rsconnect()`:

🚧 TODO: add informational messages shown in display below

@@ -81,7 +83,7 @@ board.pin_write(tidy_sales_data, "hadley/sales-summary", type = "csv")

+++

Then, someone else (or an automated Rmd report) can read and use your pin:
Then, someone else (or an automated report) can read and use your pin:

+++

@@ -94,5 +96,5 @@ board.pin_read("hadley/sales-summary")

You can easily control who gets to access the data using the RStudio Connect permissions pane.

The pins package also includes boards that allow you to share data on services like Amazon's S3 (`board_s3()`), Azure's blob storage (`board_azure()`), and Microsoft SharePoint (`board_ms365()`).
The pins package also includes boards that allow you to share data on services like Amazon's S3 (`board_s3()`) and Azure's blob storage (`board_azure()`).
Learn more in [getting started](getting_started.Rmd).