This repository has been archived by the owner on May 19, 2021. It is now read-only.

R interface to EPA's air quality system #20

Open
rdpeng opened this issue Mar 1, 2015 · 5 comments

Comments

@rdpeng

rdpeng commented Mar 1, 2015

This has been on my list of things to do for a while now....

The EPA puts a ton of data out on the web from its air quality system. But it takes some time to learn how the data are formatted and how they are measured. Also, downloading from the web site, unzipping, and then reading into R can be a bit of a pain for the uninitiated. I thought it might be nice to have an R package that serves as an interface to the EPA's site and grabs data that would be useful for air pollution studies. The package could also implement some statistical methods that are commonly used in the preprocessing of the data (for health studies, for example).
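
A minimal sketch of the download, unzip, and read sequence described above, to make the three steps concrete. It is written in Python only because the thread itself contains no code; in R the same steps map onto the base functions `download.file()`, `unzip()`, and `read.csv()`. The function names and the URL in the usage comment are hypothetical placeholders, not actual EPA endpoints, and the sketch assumes the archive holds a UTF-8 CSV file with a header row.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def read_zipped_csv(zip_bytes, member=None):
    """Return the rows of one CSV file inside a zip archive as a list of dicts.

    Assumes the member is UTF-8 text with a header row (an assumption,
    not something the thread specifies).
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Default to the first file in the archive if none is named.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def fetch_table(url, member=None):
    """Download a zip archive from `url` and parse the CSV file inside it."""
    with urlopen(url) as response:
        return read_zipped_csv(response.read(), member)


# Usage (hypothetical URL, shown for shape only):
# rows = fetch_table("https://example.org/daily_pm25_2014.zip")
```

Keeping the parsing step separate from the download step means the parser can be exercised against a local archive without touching the network.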

@jhollist
Member

jhollist commented Mar 1, 2015

Although I am not attending the unconf, I thought I would give this a 👍 .

I work for EPA's Office of Research and Development and although I don't work with the air quality data (I do water quality/landscapey stuff), increasing access to EPA data from R is something I am very interested in. On the water quality side some of this has been done with usgs (cc @jread-usgs), but I believe that is just accessing EPA data that is in STORET, plus a lot of USGS data. There are plenty of other EPA datasets that reside outside of STORET that would be nice to grab (e.g. Environmental Dataset Gateway or EnviroFacts).

I am not trying to hijack @rdpeng's idea here. I think starting with the air quality system is a great first step and something that could get pretty far along in the two days at the unconf. What I would like to suggest is that the thinking that goes into it be broad enough to form the basis for adding additional EPA data sources. In the absence of a single, well-supported EPA data API, a suite of EPA R packages would be the next best thing.

@jordansread

I also agree that it would be great to have a friendly interface for these data. @jhollist mentioned some work for water quality data: the Water Quality Portal (http://waterqualitydata.us/) offers REST-like data services for EPA, state, tribal, USGS, and USDA water quality data. The R package dataRetrieval, which uses these services, is on CRAN: http://cran.r-project.org/web/packages/dataRetrieval/

For the air quality data - it would be good to connect with some folks at EPA so we could make sure we wouldn't be coding against a moving target (maybe they have some plans to expose an API?). Sounds cool and interesting to me.

@rdpeng
Author

rdpeng commented Mar 1, 2015

The EPA is always a moving target. That's just life. One issue is that often different types of data are available via different interfaces. So even if there were a clean API available, it might not have all the data accessible from it. You'd still need to obtain the raw zip files in order to get, for example, organic carbon blank data (or whatever).

@jhollist
Member

jhollist commented Mar 1, 2015

@jread-usgs I don't know of any plans to expose our data via an API, but I am only plugged into ORD things. Most of what I know of is the development of portals and such. They may have APIs, but those are not usually the focus. And @rdpeng is correct that EPA is a moving target. In fact, EPA is several moving targets, because a lot of these kinds of decisions are made on an office-by-office basis.

It would be nice to design the package so that the R front end is consistent (as much as possible anyway) regardless of which EPA (or other agency?!) data is hit.
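
One way to picture a front end that stays consistent across data sources is a single entry point that dispatches to per-source backends. The sketch below is only an illustration of that idea, written in Python because the thread has no code of its own. Every name in it (`register_source`, `get_data`, the `demo` backend) is hypothetical, and the demo backend returns a placeholder record rather than real EPA data.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# A backend takes (parameter, year) and returns a list of records.
Fetcher = Callable[[str, int], List[Record]]

# Registry mapping a source name to the function that fetches its data.
_SOURCES: Dict[str, Fetcher] = {}


def register_source(name: str, fetcher: Fetcher) -> None:
    """Make a backend available under `name`."""
    _SOURCES[name] = fetcher


def get_data(source: str, parameter: str, year: int) -> List[Record]:
    """Single entry point: same call shape whichever source is requested."""
    try:
        fetcher = _SOURCES[source]
    except KeyError:
        known = ", ".join(sorted(_SOURCES)) or "none"
        raise ValueError(f"unknown source {source!r}; known sources: {known}") from None
    return fetcher(parameter, year)


def _demo_backend(parameter: str, year: int) -> List[Record]:
    # Stand-in for a real downloader; returns one placeholder record.
    return [{"source": "demo", "parameter": parameter, "year": str(year)}]


register_source("demo", _demo_backend)
```

With this shape, adding another office's data means writing one more backend and registering it, while callers keep using `get_data(source, parameter, year)`.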

@maelle
Member

maelle commented Dec 31, 2015

(Regarding air quality only) I'd like to mention the good work of the Open AQ folks: https://github.com/openaq/openaq-fetch. The Open AQ platform could include EPA data sources (it might already include some, since it has data from many stations in the US). One can open an issue to suggest new data sources.

I've written an R package for accessing the Open AQ API: https://github.com/masalmon/Ropenaq

Below I'm copying two sentences from the Open AQ website: "In this first iteration of the platform:

  • We are only collecting data from official, stationary government monitors and not smaller-scale or mobile monitoring.
  • We are collecting PM2.5, PM10, ozone (O3), sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), and black carbon (BC)."


4 participants