Water Project Datasets

These data packages are part of the San Diego Water Quality Data Project. The source directory contains data packages with the original extracts from upstream sources, and the derived directory holds processed packages that combine multiple measures and add features.
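The layout, sketched from the description above (individual package names omitted), looks roughly like this:

water-datasets/
├── source/     original extracts from upstream sources
└── derived/    processed packages that combine measures and add features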

Quickstart

The easiest way to get these datasets is to visit the water-project tag in our data repository.

You can get programmatic access to them with metapack:

$ pip install metapack
$ mp run http://library.metatab.org/sandiegodata.org-beachwatch-3.csv#stations

Or, in a Jupyter notebook:

import metapack as mp
# Open the package, then read the 'stations' resource into a DataFrame
pkg = mp.open_package('http://library.metatab.org/sandiegodata.org-beachwatch-3.csv')
df = pkg.resource('stations').read_csv()
df.head()

Building the Datasets

The first step is to install the Python modules listed in requirements.txt, usually with:

make install
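If you prefer not to use make for this step, the install target most likely just wraps a plain pip install of the requirements file (an assumption about the Makefile, not something documented here):

$ pip install -r requirements.txt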

To build all of the datasets, run make. The key targets are:

  • make build: build all of the datasets
  • make s3: build, then upload to S3
  • make ckan: build, upload to S3, then submit to CKAN

The s3 target requires S3 credentials to be configured for boto for the library.metatab.org bucket, which is usually done with either:

  • A credentials file in ~/.aws/credentials
  • Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

Refer to the boto credentials documentation for details.
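For example, using the environment-variable option (the values below are placeholders; substitute your own credentials):

$ export AWS_ACCESS_KEY_ID=<your-access-key-id>
$ export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
$ make s3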

The ckan target requires two environment variables to be set:

  • METAKAN_CKAN_URL, with the base URL of the CKAN repository
  • METAKAN_API_KEY, with the API key for an account, which you can get from a user's page in CKAN
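For example, with a hypothetical CKAN instance at data.example.org (substitute your own URL and API key):

$ export METAKAN_CKAN_URL=https://data.example.org
$ export METAKAN_API_KEY=<your-ckan-api-key>
$ make ckan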

There are also clean targets for all of the build targets:

  • clean-build
  • clean-s3
  • clean-ckan
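For example, to force a full rebuild you can clean the build outputs and then build again:

$ make clean-build
$ make build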

Using Datasets

These datasets are all published to the Data Library's CKAN Repo, so it's easiest to get them from there.

If you are building the datasets for local analysis, you'll probably want to index them and use them through the index. First, build the datasets with make, then index them:

make index

Then you can check the index by listing the datasets with:

mp search -l 

Or search for one with:

mp search beachwatch 

Now you can refer to packages by name, using an 'index:' prefix and usually dropping the version number. For instance:

mp run index:sandiegodata.org-beachwatch#stations

Or, in Python/Jupyter:

import metapack as mp
mp.jupyter.init()
# Open the package from the local index, then read its default resource into a DataFrame
pkg = mp.open_package('index:sandiegodata.org-beachwatch#stations')
df = pkg.default_resource.read_csv()
print(df.head())