csvapi

"Instantly" publish an API for a CSV hosted anywhere on the internet. Also supports Excel files.

This tool is used by data.gouv.fr to show a preview of hosted CSV and XLS files.

Installation

Requires Python 3.9+ and a Unix OS with the file command available.

python3 -m venv pyenv && . pyenv/bin/activate
pip install csvapi

For development:

poetry install

Quickstart

poetry run csvapi serve -h 0.0.0.0 -p 8000

Command line options

$ poetry run csvapi serve --help
Usage: csvapi serve [OPTIONS]

Options:
    --ssl-key TEXT             Path to SSL key
    --ssl-cert TEXT            Path to SSL certificate
    --cache / --no-cache       Do not parse CSV again if DB already exists
    --reload                   Automatically reload if code change detected
    --debug                    Enable debug mode - useful for development
    -p, --port INTEGER         port for server, defaults to 8001
    -h, --host TEXT            host for server, defaults to 127.0.0.1
    --dbs DIRECTORY            Where to store sqlite DBs
    --help                     Show this message and exit.

Deploy

With SSL, using Hypercorn:

hypercorn csvapi.webservice:app -b 0.0.0.0:443 --keyfile key.pem --ca-certs cert.pem

See the documentation for more options.

You can use the environment variable CSVAPI_CONFIG_FILE to point to a custom configuration file.

API usage

Conversion

/apify?url=http://somewhere.com/a/file.csv

This converts a CSV to an SQLite database (w/ agate) and returns the following response:

{"ok": true, "endpoint": "http://localhost:8001/api/cde857960e8dc24c9cbcced673b496bb"}

Parameters

Some parameters can be used in the query string.

`encoding`

default: automatic detection

You can force an encoding (e.g. utf-8) using this parameter, instead of relying on the automatic detection.

Data API

This is the endpoint attribute of the previous response.

/api/<md5-url-hash>

This queries a previously converted API file and returns the first 100 rows like this:

    {
        "ok": true,
        "rows": [[], []],
        "columns": [],
        "query_ms": 1
    }

Parameters

Some parameters can be used in the query string.

`_size`

default: 100

This will limit the query to a certain number of rows. For instance to get only 250 rows:

/api/<md5-url-hash>?_size=250

`_sort` and `_sort_desc`

Use those to sort by a column. sort will sort by ascending order, sort_desc by descending order.

/api/<md5-url-hash>?_sort=<column-name>

`_offset`

Use this to add on offset. Combined with _size it allows pagination.

/api/<md5-url-hash>?_size=1&_offset=1

`_shape`

default: lists

The _shape argument is used to specify the format output of the json. It can take the value objects to get an array of objects instead of an array of arrays:

/api/<md5-url-hash>?_shape=objects

For instance, instead of returning:

{
    "ok": true,
    "query_ms": 0.4799365997,
    "rows": [
        [1, "Justice", "0101", 57663310],
        [2, "Justice", "0101", 2255129],
        [3, "Justice", "0101", 36290]
    ],
    "columns": ["rowid", "Mission", "Programme", "Consommation de CP"]
}

It will return:

{
    "ok": true,
    "query_ms": 2.681016922,
    "rows": [
    {
        "rowid": 1,
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 57663310
    },
    {
        "rowid": 2,
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 2255129
    },
    {
        "rowid": 3,
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 36290
    }],
    "columns": ["rowid", "Mission", "Programme", "Consommation de CP"]
}

`_rowid`

default: show

The _rowid argument is used to display or hide rowids in the returned data. Use _rowid=hide to hide.

/api/<md5-url-hash>?_shape=objects&_rowid=hide

{
    "ok": true,
    "query_ms": 2.681016922,
    "rows": [
    {
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 57663310
    },
    {
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 2255129
    },
    {
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 36290
    }],
    "columns": ["Mission", "Programme", "Consommation de CP"]
}

`_total`

default: show

The _total argument is used to display or hide the total number of rows (independent of pagination) in the returned data. Use _total=hide to hide.

{
    "ok": true,
    "query_ms": 2.681016922,
    "rows": [
    {
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 57663310
    },
    {
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 2255129
    },
    {
        "Mission": "Justice",
        "Programme": "0101",
        "Consommation de CP": 36290
    }],
    "columns": ["Mission", "Programme", "Consommation de CP"],
    "total": 3
}

Column based filters

By adding {column}__{comparator}={value} to the query string, you can filter the results based on the following criterions:

{column} must be a valid column in your CSV
{comparator} is exact (SQL = {value}) or contains (SQL LIKE %{value}%)
{value} is the value you're filtering the column against

You can add multiple filters, they will be joined with a AND at the SQL level.

Credits

Inspired by the excellent Datasette.

Name		Name	Last commit message	Last commit date
Latest commit History 159 Commits
.circleci		.circleci
benchmark		benchmark
csvapi		csvapi
dbs		dbs
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
bumpr.rc		bumpr.rc
config.py		config.py
poetry.lock		poetry.lock
profiling-minimal.yml		profiling-minimal.yml
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg

License

etalab/csvapi

Folders and files

Latest commit

History

Repository files navigation

csvapi

Installation

Quickstart

Command line options

Deploy

API usage

Conversion

Parameters

encoding

Data API

Parameters

_size

_sort and _sort_desc

_offset

_shape

_rowid

_total

Column based filters

Credits

About

Resources

License

Stars

Watchers

Forks

Languages

`encoding`

`_size`

`_sort` and `_sort_desc`

`_offset`

`_shape`

`_rowid`

`_total`