# Creating a Data Package to store the processed tables

The [Data Package standard](https://frictionlessdata.io/specs/data-package/) is a minimal yet effective way to attach descriptive metadata to CSV files when storing or distributing datasets.

See, for example, <https://frictionlessdata.io/field-guide/> for more information about the specification, typical workflows, and the ecosystem of existing tools.

In [1]:
from datapackage import Package
from config import PATH_RESULTS

In [2]:
p = Package(base_path=str(PATH_RESULTS))

In [3]:
p.descriptor

{'profile': 'data-package'}

Additional (but necessary) package metadata is stored in a separate YAML file. We load it and add it to the Package object at this stage.

In [4]:
import yaml
import datetime
from config import PATH_DATAPACKAGE, PATH_DATAPACKAGE_METADATA

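# safe_load avoids executing arbitrary YAML tags; recent PyYAML versions also require an explicit Loader for yaml.load
METADATA = yaml.safe_load(PATH_DATAPACKAGE_METADATA.read_text())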
METADATA.update(
    {'created': datetime.datetime.now().isoformat(timespec='seconds')}
)

METADATA

{'name': 'sdwis-wsd',
 'title': 'Data from the Water System Detail endpoint of the SDWIS portal of the California Water Boards',
 'description': 'The data was parsed (scraped) from the rendered HTML pages of the portal and converted to flat (CSV) tables after minimal processing.',
 'sources': [{'title': 'Water System Detail endpoint of the SDWIS portal of the CA Water Boards',
   'path': 'https://sdwis.waterboards.ca.gov/PDWW/JSP/WaterSystemDetail.jsp'},
  {'title': 'Water System Search endpoint of the SDWIS portal of the CA Water Boards',
   'path': 'https://sdwis.waterboards.ca.gov/PDWW/index.jsp'},
  {'title': 'Code and input data used to fetch, parse and process the data in the form of Jupyter notebooks (Python)',
   'path': 'https://github.com/waterdatacollaborative'}],
 'contributors': [{'title': 'Ludovico Bianchi',
   'email': 'me@ludob.com',
   'role': 'author',
   'organization': 'Berkeley Water Data Collaborative'},
  {'title': 'Greg Gearhart',
   'email': 'Greg.Gearhart@wate
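
The `created` value above comes from a naive `datetime.now()`, so it carries no UTC offset. The Data Package specification describes `created` as an RFC 3339 datetime, which includes an offset. If a stricter timestamp is wanted, one possible approach is sketched below. The helper name `created_timestamp` is hypothetical and not part of the notebook or of the `datapackage` library.

```python
import datetime


def created_timestamp(now=None):
    """Return an ISO 8601 timestamp in UTC with an explicit +00:00 offset.

    `now` is an optional timezone-aware datetime, mainly useful for testing;
    when omitted, the current time is used.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Normalize to UTC so the offset is always +00:00
    return now.astimezone(datetime.timezone.utc).isoformat(timespec="seconds")


# Example with a fixed, timezone-aware input
fixed = datetime.datetime(2020, 1, 2, 3, 4, 5, tzinfo=datetime.timezone.utc)
stamp = created_timestamp(fixed)
```

In the cell above, this would replace the `datetime.datetime.now().isoformat(timespec='seconds')` expression with `created_timestamp()`.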

In [5]:
p.descriptor.update(METADATA)

In [6]:
p.commit()

True

Automatically infer the table properties (resources and their schemas) from the output CSV files.

In [7]:
p.infer('*.csv')

{'profile': 'tabular-data-package',
 'name': 'sdwis-wsd',
 'title': 'Data from the Water System Detail endpoint of the SDWIS portal of the California Water Boards',
 'description': 'The data was parsed (scraped) from the rendered HTML pages of the portal and converted to flat (CSV) tables after minimal processing.',
 'sources': [{'title': 'Water System Detail endpoint of the SDWIS portal of the CA Water Boards',
   'path': 'https://sdwis.waterboards.ca.gov/PDWW/JSP/WaterSystemDetail.jsp'},
  {'title': 'Water System Search endpoint of the SDWIS portal of the CA Water Boards',
   'path': 'https://sdwis.waterboards.ca.gov/PDWW/index.jsp'},
  {'title': 'Code and input data used to fetch, parse and process the data in the form of Jupyter notebooks (Python)',
   'path': 'https://github.com/waterdatacollaborative'}],
 'contributors': [{'title': 'Ludovico Bianchi',
   'email': 'me@ludob.com',
   'role': 'author',
   'organization': 'Berkeley Water Data Collaborative'},
  {'title': 'Greg Gearha
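
To give a rough idea of what schema inference involves, here is a deliberately simplified stand-in that uses only the standard library. It is not the algorithm used by `datapackage`, which handles many more types and formats. The function names and the sample column names are made up for illustration.

```python
import csv
import io


def guess_type(values):
    """Guess 'integer', 'number', or 'string' for a column of raw CSV strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first, then fall back to wider ones
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue
        return type_name
    return "string"


def infer_schema(csv_text):
    """Build a minimal Table Schema-like dict from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "fields": [
            {"name": name, "type": guess_type([(row[name] or "") for row in rows])}
            for name in reader.fieldnames
        ]
    }


# Hypothetical sample data, not taken from the actual output tables
sample = "system_id,population,latitude\nCA0000001,1200,37.8\nCA0000002,,38.1\n"
schema = infer_schema(sample)
```

Empty cells are ignored when guessing, so a column with missing values can still be typed as `integer` or `number`.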

In [8]:
p.descriptor

{'profile': 'tabular-data-package',
 'name': 'sdwis-wsd',
 'title': 'Data from the Water System Detail endpoint of the SDWIS portal of the California Water Boards',
 'description': 'The data was parsed (scraped) from the rendered HTML pages of the portal and converted to flat (CSV) tables after minimal processing.',
 'sources': [{'title': 'Water System Detail endpoint of the SDWIS portal of the CA Water Boards',
   'path': 'https://sdwis.waterboards.ca.gov/PDWW/JSP/WaterSystemDetail.jsp'},
  {'title': 'Water System Search endpoint of the SDWIS portal of the CA Water Boards',
   'path': 'https://sdwis.waterboards.ca.gov/PDWW/index.jsp'},
  {'title': 'Code and input data used to fetch, parse and process the data in the form of Jupyter notebooks (Python)',
   'path': 'https://github.com/waterdatacollaborative'}],
 'contributors': [{'title': 'Ludovico Bianchi',
   'email': 'me@ludob.com',
   'role': 'author',
   'organization': 'Berkeley Water Data Collaborative'},
  {'title': 'Greg Gearha

In [9]:
p.save(PATH_DATAPACKAGE.with_suffix('.zip'))

True
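
As a quick sanity check, the saved archive can be inspected with the standard library alone. The sketch below assumes that the descriptor is stored as `datapackage.json` at the root of the zip file, which should be verified against the actual archive. The helper names are hypothetical, and the demo archive is a stand-in built on the fly rather than the real output.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def read_descriptor(zip_path):
    """Return the descriptor stored as datapackage.json inside a zipped package."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("datapackage.json") as f:
            return json.load(f)


def list_members(zip_path):
    """Return the sorted names of all files contained in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Build a tiny stand-in archive so the helpers can be tried without the real data
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr(
            "datapackage.json", json.dumps({"name": "demo", "resources": []})
        )
    demo_descriptor = read_descriptor(demo_zip)
    demo_members = list_members(demo_zip)
```

With the objects defined in this notebook, the equivalent call would be `read_descriptor(PATH_DATAPACKAGE.with_suffix('.zip'))`.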