17 changes: 7 additions & 10 deletions docs/getting_started.Rmd
@@ -65,13 +65,13 @@ The name is basically equivalent to a file name; you'll use it when you later wa
The only rule for a pin name is that it can't contain slashes.


-Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the
-But you can choose another option depending on your goals:
+Above, we saved the data as a CSV, but you can choose another option depending on your goals:

-- `type = "csv"` uses `to_csv()` from pandas to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
-- `type = "joblib"` uses `joblib.dump()` to create a binary python data file. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
-- `type = "arrow"` uses `pyarrow` to create an arrow/feather file. [Arrow](https://arrow.apache.org) is a modern, language-independent, high-performance file format designed for data science. Not every tool can read arrow files, but support is growing rapidly.
-- `type = "json"` uses `json.dump()` to create a `.json` file. Pretty much every programming language can read json files, but they only work well for nested lists.
+- `type = "csv"` uses `to_csv()` from pandas to create a CSV file. CSVs are plain text and can be read easily by many applications, but they only support simple columns (e.g. numbers, strings), can take up a lot of disk space, and can be slow to read.
+- `type = "parquet"` uses `to_parquet()` from pandas to create a Parquet file. [Parquet](https://parquet.apache.org/) is a modern, language-independent, column-oriented file format for efficient data storage and retrieval. Parquet is an excellent choice for storing tabular data.
+- `type = "arrow"` uses `to_feather()` from pandas to create an Arrow/Feather file.
+- `type = "joblib"` uses `joblib.dump()` to create a binary Python data file, such as for storing a trained model. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
+- `type = "json"` uses `json.dump()` to create a JSON file. Pretty much every programming language can read JSON files, but they only work well for nested lists.
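
As a rough illustration of how these `type` options are passed to `pin_write()`, here is a minimal sketch assuming a temporary board and a toy pandas DataFrame (the board and pin names are made up for the example):

```{python}
import pandas as pd
from pins import board_temp

# A throwaway board and a small example DataFrame (illustrative only)
board = board_temp()
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# The same data stored in two different formats via the type argument
board.pin_write(df, "df_csv", type="csv")
board.pin_write(df, "df_parquet", type="parquet")
```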

After you've pinned an object, you can read it back with `pin_read()`:
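
For example, continuing the hypothetical board sketched above (the actual chunk at this point is collapsed in this diff):

```{python}
# Read the pinned DataFrame back from the board; pin_read() returns the saved object
board.pin_read("df_parquet")
```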

@@ -201,10 +201,7 @@ my_data = board_urls("", {
})
```

-You can read this data by combining `pin_download()` with `read.csv()`[^1]:
-
-[^1]: Here I'm using `read.csv()` to the reduce the dependencies of the pins package.
-For real code I'd recommend using `data.table::fread()` or `readr::read_csv().`
+You can read this data by combining `pin_download()` with `read_csv()` from pandas:

```{python}
fname = my_data.pin_download("penguins")
```
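
As a sketch of the step that follows (assuming, as the chunk above suggests, that `pin_download()` returns the local path or paths of the cached CSV):

```{python}
import pandas as pd

# fname comes from pin_download() above; it may be a single path or a
# list of paths depending on the pins version, so handle both (assumption)
path = fname[0] if isinstance(fname, list) else fname
penguins = pd.read_csv(path)
penguins.head()
```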