Cleaning temperature data #22

Closed
mpinsky opened this issue Feb 2, 2015 · 4 comments

@mpinsky
Collaborator

mpinsky commented Feb 2, 2015

I have spent some time looking for outlier surface and bottom temperature values in each region that may be mistakes. There are some that I had not caught in the code for the 2013 Science paper. My latest cleaning code (from my range projection project) is below. It may be useful?

wcann$surftemp = NA # field is not collected, apparently (or was not provided)

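# Newfoundland: values are stored in tenths of a degree (one implied decimal place).
# Values >= 900 encode negatives: subtract 900, divide by 10, and negate.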
i = newf$surftemp >= 900 & !is.na(newf$surftemp)
newf$surftemp[i] = -(newf$surftemp[i] - 900)/10
i = newf$surftemp < 900 & newf$surftemp > 0 & !is.na(newf$surftemp)
newf$surftemp[i] = newf$surftemp[i]/10
i = newf$bottemp >= 900 & !is.na(newf$bottemp)
newf$bottemp[i] = -(newf$bottemp[i] - 900)/10
i = newf$bottemp < 900 & newf$bottemp > 0 & !is.na(newf$bottemp)
newf$bottemp[i] = newf$bottemp[i]/10

# Fix -9999 to NA for SST and BT
ai$BOT_TEMP[ai$BOT_TEMP==-9999] = NA
ai$SURF_TEMP[ai$SURF_TEMP==-9999] = NA
ebs$BOT_TEMP[ebs$BOT_TEMP==-9999] = NA
ebs$SURF_TEMP[ebs$SURF_TEMP==-9999] = NA
goa$BOT_TEMP[goa$BOT_TEMP==-9999] = NA
goa$SURF_TEMP[goa$SURF_TEMP==-9999] = NA


# The SST entries on Scotian Shelf in 2010 and 2011 appear suspect. There are very few (as opposed to >1000 in previous years) and are only 0 or 1. There are no entries in 2009.
scot$SURFACE_TEMPERATURE[scot$year %in% c(2009, 2010, 2011)] = NA

# Turn 0 values in GoMex to NA. These are outliers (way too cold) and must be mistakes.
i = which(gmex$TEMP_SSURF == 0)
gmex$TEMP_SSURF[i] = NA
i = which(gmex$TEMP_BOT == 0)
gmex$TEMP_BOT[i] = NA

# 0 values in ai July and goa July are much lower than other values and seem suspect
ai$SURF_TEMP[ai$month == 7 & ai$SURF_TEMP==0] = NA
goa$SURF_TEMP[goa$month == 7 & goa$SURF_TEMP==0] = NA
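
The Newfoundland decoding above is repeated for surftemp and bottemp, so it could be wrapped in one function. This is only a sketch: `decode_newf_temp` is a hypothetical name, not something in the existing code, and it assumes the encoding is exactly as described in the comment (tenths of a degree, with 900 added to flag negatives).

```r
# Hypothetical helper (not part of the original cleaning code): decode the
# Newfoundland temperature encoding for any vector of raw values.
decode_newf_temp <- function(x) {
  out <- x
  neg <- !is.na(x) & x >= 900          # 900 and above flags a negative value
  pos <- !is.na(x) & x > 0 & x < 900   # ordinary positive values, in tenths
  out[neg] <- -(x[neg] - 900) / 10
  out[pos] <- x[pos] / 10
  out                                   # zeros and NAs pass through unchanged
}

# Toy input, not real survey data
decode_newf_temp(c(35, 905, 0, NA, 120))
# gives 3.5, -0.5, 0, NA, 12
```

It would then be applied once per column, e.g. `newf$surftemp = decode_newf_temp(newf$surftemp)`.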
@rBatt
Owner

rBatt commented Feb 2, 2015

Nice, thanks.

Are these issues that need to be fixed in the website code?

If so, could you create an issue there and link to the lines of the code that need the change?

If you can make these changes to the website code yourself, could you link that commit (commit of corrections to website code) in a comment on this issue (issue of cleaning temperatures in trawl repo)?

If you don't make an issue that links to line numbers or do the commit (either of which would let me see which pieces of code were changed), could you tell me which corrections are the new ones?

Also, as a general approach, rather than specifying the year and month where an error exists, is there more general logic that can be applied? I.e., is there something specific about the temperature value itself that is flawed? E.g., any temperature below a certain value should be NA, or a value that is way too cold for a region should be NA (e.g., if data is a data.table of trawl values, data[region=="gmex" & stemp < 5, stemp:=NA]).
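
A rough sketch of what a region-specific lower bound could look like, written in base R so it runs without data.table. The column names `region` and `stemp` follow the example above; the toy data and the bounds in `min_stemp` are placeholders I made up for illustration, not vetted thresholds.

```r
# Toy data with assumed column names
trawl <- data.frame(
  region = c("gmex", "gmex", "ai", "ai"),
  stemp  = c(0, 24.1, 0, 6.3),
  stringsAsFactors = FALSE
)

# Hypothetical per-region lower bounds in degrees C (placeholder values)
min_stemp <- c(gmex = 5, ai = -2)

# Look up each row's bound by region name, then blank out values below it
bound <- min_stemp[trawl$region]
too_cold <- !is.na(trawl$stemp) & !is.na(bound) & trawl$stemp < bound
trawl$stemp[too_cold] <- NA

trawl$stemp
# gives NA, 24.1, 0, 6.3
```

Only the gmex zero is removed here because the placeholder bound for ai is below 0; whether a single bound per region is adequate is exactly the open question.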

If it comes down to having a collection of manually identified errors, we should format them into a 2D structure, save them as a .csv or .txt file, then write code to update the object based on the contents of that file. That way we have a single file that explicitly states the manual corrections we're making (easier to track), and the code becomes less bloated.
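
As a minimal sketch of that idea: one row per manual correction, and one small function that applies them. Everything here is hypothetical (the table layout, the column names `region`, `variable`, `bad_value`, `stemp`, `btemp`, and the function name `apply_corrections`). The table is built inline so the example runs on its own; in practice it would be read from the file with something like `read.csv()`.

```r
# Hypothetical corrections table: each row says "in this region, set this
# variable to NA wherever it equals bad_value".
corrections <- data.frame(
  region    = c("gmex", "gmex", "ai"),
  variable  = c("stemp", "btemp", "stemp"),
  bad_value = c(0, 0, -9999),
  stringsAsFactors = FALSE
)

# Toy trawl data with assumed column names
trawl <- data.frame(
  region = c("gmex", "gmex", "ai", "ai"),
  stemp  = c(0, 24.1, -9999, 6.3),
  btemp  = c(0, 18.2, 4.1, 0),
  stringsAsFactors = FALSE
)

# Apply every row of the corrections table to the data
apply_corrections <- function(dat, corr) {
  for (k in seq_len(nrow(corr))) {
    v <- corr$variable[k]
    # which() drops NA comparisons, so existing NAs are left alone
    hit <- which(dat$region == corr$region[k] & dat[[v]] == corr$bad_value[k])
    dat[[v]][hit] <- NA
  }
  dat
}

cleaned <- apply_corrections(trawl, corrections)
```

In this toy run the gmex zeros and the ai -9999 become NA, while the ai bottom temperature of 0 is kept because no row of the table mentions it. Corrections that depend on year or month would need extra columns in the table.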

Or, at the least, we could have a separate R script that executes some of the cleaning.

@mpinsky
Collaborator Author

mpinsky commented Feb 3, 2015

The OceanAdapt code doesn't deal with temperature (yet).

I don't believe there is any specific logic that could be used universally.

@rBatt
Owner

rBatt commented Mar 24, 2015

@mpinsky I have not yet implemented these fixes, and they could be related to the low temperature values in #30. I see in your code that some of those fixes involve changing 0s in gmex to NAs.

I haven't gotten around to these yet because there isn't always a simple one-to-one correspondence between your code and mine.

I'll need to add this to the master list of data verifications that need to happen (along with the changes to taxonomic IDs).

@rBatt
Owner

rBatt commented Nov 24, 2015

see #36

@rBatt rBatt closed this as completed Nov 24, 2015