
new pat_timeAverageDiagnostics() function #51

Closed
6 tasks done
jonathancallahan opened this issue May 2, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@jonathancallahan
Member

jonathancallahan commented May 2, 2019

I cloned the source code for openair and took a look at the implementation of the timeAverage() function.

That function is way too tightly coupled to the openair data model. We can create our own function that does what we want, runs faster and has much more readable code. I've made a start in local_examples/PROTOTYPE_pat_timeAverage.R.

The feature set for this function is:

  • accept a pat object and return a tibble with new data columns on a new time axis
  • accept a unit parameter specifying the new time axis period
  • the returned tibble will have mean, sd and count columns for each of pm25_A, pm25_B, temperature and humidity, named <parameter>_<statistic> (e.g. pm25_A_mean)
  • the returned tibble will also have t-test columns comparing the A and B channels: pm25_t, pm25_df and pm25_p
  • be sure to convert any NaN, Inf or NULL values generated by mean or sd into NA
  • if it takes less than 4 hours, add support for a data.thresh parameter
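A minimal base-R sketch of the feature set above, for illustration only. It assumes the pat data is available as a plain data.frame with a POSIXct `datetime` column and numeric `pm25_A`, `pm25_B`, `temperature` and `humidity` columns, represents the `unit` parameter as a period length in seconds (`unit_seconds`), and returns a data.frame rather than a tibble. `timeAverageDiagnostics_sketch()` and its helpers are hypothetical names, not part of AirSensor or openair. The t-test columns come from Welch's two-sample `t.test()` on the A and B values in each period; periods where `t.test()` fails (fewer than two values, or constant data) get NA.

```r
# Hypothetical sketch, not the AirSensor implementation.
# Assumes `data` is a data.frame with a POSIXct `datetime` column and
# numeric pm25_A, pm25_B, temperature and humidity columns.

# Convert NaN, Inf, -Inf (and empty results) to NA
finite_or_na <- function(value) {
  if (length(value) != 1 || !is.finite(value)) NA_real_ else as.numeric(value)
}

# mean, sd and count of the non-missing values in one chunk
chunk_stats <- function(x) {
  x <- x[!is.na(x)]
  c(mean  = finite_or_na(mean(x)),
    sd    = if (length(x) > 1) finite_or_na(sd(x)) else NA_real_,
    count = length(x))
}

# Welch two-sample t-test of the A and B channels in one chunk;
# t.test() errors on too few or constant values, so those chunks get NA
chunk_ttest <- function(a, b) {
  result <- tryCatch(t.test(a, b), error = function(e) NULL)
  if (is.null(result)) return(c(t = NA_real_, df = NA_real_, p = NA_real_))
  c(t  = finite_or_na(result$statistic),
    df = finite_or_na(result$parameter),
    p  = finite_or_na(result$p.value))
}

timeAverageDiagnostics_sketch <- function(data, unit_seconds = 3600) {
  # Start of each record's period, aligned to the Unix epoch (UTC)
  bin <- floor(as.numeric(data$datetime) / unit_seconds) * unit_seconds
  starts <- seq(min(bin), max(bin), by = unit_seconds)
  # Row indices for every period; empty periods are kept so the new
  # time axis has no gaps
  rows <- split(seq_len(nrow(data)),
                factor(match(bin, starts), levels = seq_along(starts)))

  out <- data.frame(datetime = as.POSIXct(starts, origin = "1970-01-01", tz = "UTC"))
  for (param in c("pm25_A", "pm25_B", "temperature", "humidity")) {
    summary_mat <- vapply(rows, function(i) chunk_stats(data[[param]][i]), numeric(3))
    for (s in rownames(summary_mat)) {
      out[[paste(param, s, sep = "_")]] <- unname(summary_mat[s, ])
    }
  }
  tt <- vapply(rows, function(i) chunk_ttest(data$pm25_A[i], data$pm25_B[i]), numeric(3))
  for (s in rownames(tt)) out[[paste0("pm25_", s)]] <- unname(tt[s, ])
  out
}
```

Because periods are aligned to the Unix epoch, daily bins (`unit_seconds = 86400`) would start at UTC midnight rather than local midnight; a timezone-aware binning step would probably be needed in practice.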
@jonathancallahan jonathancallahan added the enhancement New feature or request label May 2, 2019
@jonathancallahan jonathancallahan added this to To do in AirSensor 0.3 via automation May 2, 2019
@jonathancallahan jonathancallahan changed the title from new pat_timeAverage() function to new pat_timeAverageDiagnostics() function May 2, 2019
@hmrtn hmrtn moved this from To do to In progress in AirSensor 0.3 May 6, 2019
hmrtn added a commit that referenced this issue May 9, 2019
hmrtn added a commit that referenced this issue May 9, 2019
@hmrtn
Contributor

hmrtn commented May 9, 2019

I think that any sort of statistical analysis should exist independently of this function. Perhaps in a pat_ttest() or pat_statistics() function?

@jonathancallahan
Member Author

jonathancallahan commented May 9, 2019

I disagree. This function is all about applying statistical functions to consecutive chunks of the overall time series and returning a vector with the value of that statistic for each chunk. Currently, the statistical functions you have enabled are mean, sd, count, etc.
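The chunk-and-apply idea can be sketched in a few lines of base R. This assumes each observation already carries a label for its time chunk (`period`) and that statistics are selected by name, loosely mirroring the `stats == "ttest_qc"` check in this thread; `get_stat_func()` and `chunk_apply()` are hypothetical names, not AirSensor functions.

```r
# Hypothetical helpers, not AirSensor code: look up a statistic by name
# and apply it to consecutive chunks of a time series.
get_stat_func <- function(stats) {
  switch(stats,
    mean  = function(x) mean(x, na.rm = TRUE),
    sd    = function(x) sd(x, na.rm = TRUE),
    count = function(x) as.numeric(sum(!is.na(x))),
    stop("unknown statistic: ", stats))
}

# `period` labels each element of `x` with its chunk (e.g. its hour);
# returns one value per chunk, with NaN/Inf/-Inf converted to NA
chunk_apply <- function(x, period, stats) {
  func <- get_stat_func(stats)
  values <- vapply(split(x, period), func, numeric(1))
  values[!is.finite(values)] <- NA
  values
}
```

Each statistic here sees a single vector per chunk; a t-test comparing pm25_A with pm25_B needs both channels for the same chunk, so it would have to receive the chunk's rows rather than one column.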

The idea for the t-test is to have another function that behaves similarly to sd: it takes the data within a chunk of time, applies an algorithm and returns a number. So you will need to implement another internal function that looks like this:

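if ( stats == "ttest_qc" ) {
  func <- function(x) {
    # assumes x is the chunk of rows (a data frame) for one time period
    # Welch two-sample t-test comparing the A and B channels
    result <- t.test(x$pm25_A, x$pm25_B)
    # return the t test statistic (or result$p.value for the p value)
    unname(result$statistic)
  }
}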

The use case is to generate clean, hourly pm25 data with the following steps:

  • remove outliers from raw data
  • pm25 <- calculate AB mean
  • calculate t_statistic
  • badQCIndices <- t_statistic > someThreshold
  • pm25[badQCIndices] <- NA
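The QC steps above could look roughly like this in base R. This is a sketch under assumptions not stated in the issue: the input is a data.frame with POSIXct `datetime`, `pm25_A` and `pm25_B` columns; "outliers" are values more than `n_mad` scaled MADs from each channel's median (a simple stand-in for whatever outlier method is actually used); "AB mean" is taken as the hourly mean of the pooled A and B values; and the threshold is applied to |t| so large disagreements in either direction are flagged. `hourly_qc_sketch()`, `t_threshold` and `n_mad` are hypothetical names, and the default threshold of 2 is illustrative only.

```r
# Hypothetical sketch of the hourly QC steps; not an AirSensor function.
# Assumes `data` has a POSIXct `datetime` column plus pm25_A and pm25_B.
hourly_qc_sketch <- function(data, t_threshold = 2, n_mad = 4) {
  # 1. Remove outliers: values more than n_mad scaled MADs from the
  #    channel median (assumed rule) are set to NA
  for (ch in c("pm25_A", "pm25_B")) {
    x <- data[[ch]]
    outliers <- which(abs(x - median(x, na.rm = TRUE)) > n_mad * mad(x, na.rm = TRUE))
    data[[ch]][outliers] <- NA
  }

  # Group rows by hour (aligned to UTC)
  hour <- floor(as.numeric(data$datetime) / 3600) * 3600
  starts <- sort(unique(hour))
  rows <- split(seq_len(nrow(data)), match(hour, starts))

  # 2. pm25 <- hourly mean of the pooled A and B values
  pm25 <- vapply(rows, function(i) {
    mean(c(data$pm25_A[i], data$pm25_B[i]), na.rm = TRUE)
  }, numeric(1))

  # 3. Welch t statistic comparing A and B in each hour (NA if t.test fails)
  t_statistic <- vapply(rows, function(i) {
    tryCatch(unname(t.test(data$pm25_A[i], data$pm25_B[i])$statistic),
             error = function(e) NA_real_)
  }, numeric(1))

  # 4. Flag hours where the channels disagree by more than the threshold
  badQCIndices <- which(abs(t_statistic) > t_threshold)

  # 5. Invalidate flagged hours; also turn NaN (all-NA hours) into NA
  pm25[badQCIndices] <- NA
  pm25[is.nan(pm25)] <- NA

  data.frame(datetime = as.POSIXct(starts, origin = "1970-01-01", tz = "UTC"),
             pm25 = unname(pm25),
             t_statistic = unname(t_statistic))
}
```

A global median/MAD rule is crude: a genuine level shift later in the record can look like an outlier, so the outlier step is the part most likely to need a better method.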

@hmrtn hmrtn moved this from In progress to Done in AirSensor 0.3 May 10, 2019
@hmrtn hmrtn closed this as completed May 14, 2019