Air Pollution

My first programming assignment for the [R Programming](https://www.coursera.org/course/rprog) course in the [Coursera Data Science Specialization](https://www.coursera.org/specialization/jhudatascience/1). The assignment was to write three functions that interact with an air pollution dataset. The dataset is contained in the zip file specdata.zip.

Data

The zip file contains 332 comma-separated-value (CSV) files with pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file holds the data from a single monitor, and the monitor's ID number is given in the file name. For example, data for monitor 200 is in the file "200.csv". Each file contains three variables:

Date: the date of the observation in YYYY-MM-DD format (year-month-day)
sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter)
nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter)
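
As a rough illustration of the file layout described above, a single monitor file could be loaded as follows. This is only a sketch: `monitor_path` and `read_monitor` are hypothetical helpers that are not part of the assignment, and zero-padding the ID to three digits (e.g. "001.csv") is an assumption about how the files are named.

```r
# Hypothetical helper: build the path to a monitor's CSV file from its ID.
# Assumption: IDs below 100 are zero-padded to three digits (e.g. "001.csv").
monitor_path <- function(directory, id) {
  file.path(directory, sprintf("%03d.csv", id))
}

# Hypothetical helper: read one monitor file, parsing Date as a Date and
# the two pollutant columns as numeric. Missing values stay as NA.
read_monitor <- function(directory, id) {
  read.csv(
    monitor_path(directory, id),
    colClasses = c(Date = "Date", sulfate = "numeric", nitrate = "numeric")
  )
}
```

With the data unzipped, `str(read_monitor("specdata", 200))` would show the three variables and their types.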

pollutantmean.R

This file contains the function named ‘pollutantmean’ that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function ‘pollutantmean’ takes three arguments: ‘directory’, ‘pollutant’, and ‘id’. The function then reads the particulate matter data for the corresponding monitor(s) from the directory specified in the ‘directory’ argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA. Some things to know:

‘directory’ is a character vector of length 1 indicating the location of the CSV files
‘pollutant’ is a character vector of length 1 indicating the name of the pollutant for which we will calculate the mean; either "sulfate" or "nitrate"
‘id’ is an integer vector indicating the monitor ID number(s) to be used
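
The behaviour described above can be sketched roughly like this. It is not necessarily the code in pollutantmean.R: the default `id = 1:332` and the zero-padded file names are assumptions, and the mean is taken over the pooled observations of all selected monitors.

```r
# Minimal sketch of pollutantmean, under the assumptions stated above.
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Assumption: file names are zero-padded to three digits (e.g. "001.csv").
  paths <- file.path(directory, sprintf("%03d.csv", id))

  # Collect the requested pollutant column from every selected monitor.
  values <- unlist(lapply(paths, function(p) read.csv(p)[[pollutant]]))

  # Pool all observations and ignore missing values coded as NA.
  mean(values, na.rm = TRUE)
}
```

A call might look like `pollutantmean("specdata", "sulfate", 1:10)`.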

complete.R

This file contains the function named ‘complete’ that reads a directory full of files and reports the number of completely observed cases in each data file. The function returns a data frame where the first column is the name of the file and the second column is the number of complete cases (“nobs”). Some things to know:

‘directory’ is a character vector of length 1 indicating the location of the CSV files
‘id’ is an integer vector indicating the monitor ID number(s) to be used
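
One way the counting could be written is sketched below. It is an illustration, not necessarily the code in complete.R. It assumes a default of `id = 1:332`, zero-padded file names, and that the first column identifies each file by its monitor ID (named `id` here).

```r
# Minimal sketch of complete, under the assumptions stated above.
complete <- function(directory, id = 1:332) {
  # Assumption: file names are zero-padded to three digits (e.g. "001.csv").
  paths <- file.path(directory, sprintf("%03d.csv", id))

  # For each file, count the rows that have no missing value in any column.
  nobs <- vapply(
    paths,
    function(p) sum(complete.cases(read.csv(p))),
    integer(1),
    USE.NAMES = FALSE
  )

  # One row per requested monitor, in the order the IDs were given.
  data.frame(id = id, nobs = nobs)
}
```

A call might look like `complete("specdata", c(2, 4, 8))`.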

corr.R

This file contains the function named ‘corr’ that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the specified threshold. The function returns a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function returns a numeric vector of length 0. Some things to know:

‘directory’ is a character vector of length 1 indicating the location of the CSV files
‘threshold’ is a numeric vector of length 1 indicating the number of complete cases (on all variables) a monitor must exceed for the correlation between nitrate and sulfate to be computed; the default is 0
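
The threshold logic described above could be sketched as follows. This is a hedged illustration rather than the code in corr.R, and it assumes that every file ending in ".csv" in the directory is a monitor file.

```r
# Minimal sketch of corr, under the assumption stated above.
corr <- function(directory, threshold = 0) {
  # Assumption: every .csv file in the directory is a monitor file.
  paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

  result <- numeric(0)
  for (p in paths) {
    data <- read.csv(p)
    ok <- complete.cases(data)

    # Only monitors with more complete cases than the threshold qualify.
    if (sum(ok) > threshold) {
      result <- c(result, cor(data$sulfate[ok], data$nitrate[ok]))
    }
  }

  # Stays numeric(0) when no monitor meets the threshold.
  result
}
```

A call might look like `summary(corr("specdata", 150))`.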

Setup

Unzip the data file, which creates the directory ‘specdata’. In each file you’ll notice that there are many days where sulfate, nitrate, or both are missing (coded as NA). This is common with air pollution monitoring data in the United States. Once you’ve unzipped the data file, call the functions and supply the required arguments as listed above.
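
The unzip step can also be done from within R. The sketch below uses a hypothetical helper, `prepare_data`, which is not part of the assignment; it assumes specdata.zip sits in the working directory and that the archive contains a top-level ‘specdata’ folder.

```r
# Hypothetical helper: extract the archive unless the data directory
# already exists, then return the directory name invisibly.
prepare_data <- function(zipfile = "specdata.zip", datadir = "specdata") {
  if (!dir.exists(datadir)) {
    # Assumption: the archive holds a top-level folder named like datadir,
    # so extracting into its parent directory recreates it.
    unzip(zipfile, exdir = dirname(datadir))
  }
  invisible(datadir)
}
```

After `prepare_data()`, load the functions with `source("pollutantmean.R")`, `source("complete.R")` and `source("corr.R")`, then call them with "specdata" as the ‘directory’ argument.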

Contributing

As this is a course assignment, it’s not open to contributions.

License

GNU General Public License

This file is part of Air Pollution.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
