
Dataset Submission: HakaiPruthMooringProvisional #17

Closed · 11 of 12 tasks
raytula opened this issue Mar 2, 2021 · 24 comments

@raytula
Contributor

raytula commented Mar 2, 2021

Hakai Dataset Submission

Below are all the steps involved in the initial submission of a dataset.

A more detailed written and visual description of each step is available here and here, respectively.

Submission steps

Initial Submission (Data Administrator)

  • Original Data Submission
  • CIOOS Metadata Form completed

ERDDAP Dataset Creation (Data Integrator)

  • Dataset Transformation (Format label)
    • 🟢 Format Compatible
    • 🟡 Format Minor Revisions
    • 🟠 Format Major Revisions
    • 🔴 Format Incompatible/Missing Information
  • Near Real-time Data Integration
  • QARTOD Integration
  • ERDDAP Integration
  • ERDDAP Dataset Documentation
  • ERDDAP Test Locally
  • Add Dataset to Development Branch

Dataset Review (Data Administrator)

  • Metadata Record
  • CKAN (dev)
  • Dataset Development Branch Revision (Reviewer Label)
    • 🟢 Reviewer Approved
    • 🟡 Reviewer Minor Revisions
    • 🟠 Reviewer Major Revisions

Dataset Completion (Data Integrator)

  • Merge Development Dataset to Production Branch
  • COMPLETED
@raytula raytula self-assigned this Mar 2, 2021
@raytula
Contributor Author

raytula commented Mar 2, 2021

The real-time data from this node is available through a database view.

Here you can view the daily data.

                          View "sn.PruthMooring:5minuteSamples"
             Column              |           Type           | Collation | Nullable | Default 
---------------------------------+--------------------------+-----------+----------+---------
 measurementTime                 | timestamp with time zone |           |          | 
 PruthMooring:WaterTemp_0m_QL    | smallint                 |           |          | 
 PruthMooring:WaterTemp_0m_QC    | character varying        |           |          | 
 PruthMooring:WaterTemp_0m_UQL   | integer                  |           |          | 
 PruthMooring:WaterTemp_0m_Med   | double precision         |           |          | 
 PruthMooring:WaterTemp_0m_Avg   | double precision         |           |          | 
 PruthMooring:WaterTemp_0m_Min   | double precision         |           |          | 
 PruthMooring:WaterTemp_0m_Max   | double precision         |           |          | 
 PruthMooring:WaterTemp_0m_Std   | double precision         |           |          | 
...
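For reference, pulling one depth's series out of this wide view means naming each per-depth column explicitly. An illustrative query only (identifiers as in the psql description above; names containing ':' must be double-quoted in PostgreSQL):

-- 0 m temperature average and aggregated QC flag for recent data
select "measurementTime",
       "PruthMooring:WaterTemp_0m_Avg",
       "PruthMooring:WaterTemp_0m_UQL"
from sn."PruthMooring:5minuteSamples"
where "measurementTime" > '2021-01-01'
order by "measurementTime";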

@raytula
Contributor Author

raytula commented Mar 2, 2021

@JessyBarrette now that the automated QC configuration has been improved a bit, I'm looking at how to create an ERDDAP dataset for this.

I'm currently thinking that we should share the five-minute data, as we did for many of the other datasets mapped from the sensor network database.

However, I'm not sure how to handle the measurements at different depths properly. That is, can/should we continue to define a set of dataset fields for each measurement (i.e. five-minute average or median value, UQL level, secondary QC flag) and assign a depth value to each measurement, or will we need to handle the measurement at each depth as a separate ERDDAP record? Not sure if I'm explaining this right. Maybe we should discuss.

@raytula raytula closed this as completed Mar 2, 2021
@JessyBarrette JessyBarrette reopened this Mar 2, 2021
@JessyBarrette
Collaborator

dataset or datasetS
That is a good question. My first intuition would be to follow what we've done with other datasets and create one ERDDAP dataset, perhaps even a single dataset covering both sites (though that may still be up for debate with the CIOOS Metadata group). Combining sites may not be the simplest approach, though, if we're interfacing ERDDAP with a view of the database.

Sample Rate
Yes, I think it would be best to just provide the finest sampling rate available. However, I believe the tidbits themselves are sampled every 10 minutes, with some interpolation (or something similar) up to the 5-minute rate. The derived statistics likely aren't very meaningful in this specific case, since they are based on just a single value every 10 minutes.

Sort variables
I think it would be best to stack each temperature and its associated stats and flags into single variables, and create a depth variable that gives each record's specific depth. That will definitely make the data easier to handle through ERDDAP, like any other dataset. Ideally it would also be good to capture the serial numbers of the tidbits; I'm not sure whether those are stored in the database. All of this may be a big issue to deal with when working directly from the database; it would be easy to handle by generating NetCDFs, but that may just be because I'm more familiar with the NetCDF ERDDAP datasets.

@raytula
Contributor Author

raytula commented Mar 3, 2021

Thanks @JessyBarrette I was thinking of one ERDDAP dataset.

Re: "I think it would be best to stack each temperature and its associated stats and flags into single variables, and create a depth variable that gives each record's specific depth. That will definitely make the data easier to handle through ERDDAP, like any other dataset."

That is the key thing I was wondering about. @n-a-t-e, do you know if/how we can transform the results returned from the database to do this? Specifically, the database will return separate columns for the measurements at different depths, and we would like the ERDDAP dataset to break them out into (time, depth, measurement, QC) columns instead, assuming I understand @JessyBarrette correctly. A representative database view is PruthMooring:5minuteSamples.

@n-a-t-e
Member

n-a-t-e commented Mar 3, 2021

We would have to sort this out at the database level, creating views with one depth column instead of a column for each depth, and then connect that to ERDDAP. Should be easy enough to do

@raytula
Contributor Author

raytula commented Mar 3, 2021

We would have to sort this out at the database level, creating views with one depth column instead of a column for each depth, and then connect that to ERDDAP. Should be easy enough to do

Cool. I'm a database view novice and have not done that type of thing before. Can you take a crack at defining such a database view? I'm thinking it would be helpful to have a workflow for defining custom views that ERDDAP uses and that are automatically (re)created whenever the underlying tables are (re)created.

@JessyBarrette
Collaborator

JessyBarrette commented Mar 3, 2021 via email

@raytula
Contributor Author

raytula commented Mar 3, 2021

Ideally, you would also need to generate site, lat, long, and source file name variables. QU5 has been at the same site the whole time, while Pruth moved to another location after one year. I can get the official locations for you.


Having the database view populate and return the latitude and longitude may be a good approach, assuming it could populate those fields with different values based on date/time. In a few cases (e.g. the KC Buoy) the lat/long are stored in the SN database and can be passed through to ERDDAP. In many other cases the lat/long are fixed and could be added into the view as such (i.e. most terrestrial sensor nodes that don't move).

@JessyBarrette
Collaborator

JessyBarrette commented Mar 3, 2021 via email

@n-a-t-e
Member

n-a-t-e commented Mar 3, 2021

As @JessyBarrette is saying, we can set the lat/long to fixed values right in datasets.xml when appropriate. So it could be in the view but wouldn't have to be
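For illustration only (not the actual configuration for this dataset): I believe ERDDAP treats a dataVariable whose sourceName starts with "=" as a fixed value, so a static position could look roughly like the fragment below in datasets.xml; the coordinates here are example values only.

<!-- hypothetical fragment: fixed-value latitude/longitude set directly in datasets.xml -->
<dataVariable>
    <sourceName>=51.65635</sourceName>
    <destinationName>latitude</destinationName>
    <dataType>double</dataType>
</dataVariable>
<dataVariable>
    <sourceName>=-128.0914833</sourceName>
    <destinationName>longitude</destinationName>
    <dataType>double</dataType>
</dataVariable>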

@raytula
Contributor Author

raytula commented Mar 3, 2021

As @JessyBarrette is saying, we can set the lat/long to fixed values right in datasets.xml when appropriate. So it could be in the view but wouldn't have to be

Can that be done based on time (i.e. like with the Pruth Mooring, where the mooring was relocated after the first year)? If so, cool.

@JessyBarrette
Collaborator

JessyBarrette commented Mar 3, 2021 via email

@n-a-t-e
Member

n-a-t-e commented Mar 3, 2021

@JessyBarrette oh right, forgot that you are combining sites.

I created a view sn_unpivot."PruthMooring:5minuteSamples" with columns: measurementTime, depth, watertemp_ql, watertemp_qc, watertemp_uql, watertemp_med, watertemp_avg, watertemp_min, watertemp_max, watertemp_std. We can create a test ERDDAP dataset based on this and then set it up to auto-recreate as needed.
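For reference, such an unpivoted view can be written as a UNION ALL over the per-depth column groups. A minimal sketch, assuming the other depths follow the same "..._10m_..." naming pattern as the 0 m columns shown earlier (the actual definition in the database may differ):

create or replace view sn_unpivot."PruthMooring:5minuteSamples" as
select "measurementTime",
       0 as depth,
       "PruthMooring:WaterTemp_0m_QL"  as watertemp_ql,
       "PruthMooring:WaterTemp_0m_QC"  as watertemp_qc,
       "PruthMooring:WaterTemp_0m_UQL" as watertemp_uql,
       "PruthMooring:WaterTemp_0m_Med" as watertemp_med,
       "PruthMooring:WaterTemp_0m_Avg" as watertemp_avg,
       "PruthMooring:WaterTemp_0m_Min" as watertemp_min,
       "PruthMooring:WaterTemp_0m_Max" as watertemp_max,
       "PruthMooring:WaterTemp_0m_Std" as watertemp_std
from sn."PruthMooring:5minuteSamples"
union all
select "measurementTime",
       10 as depth,
       "PruthMooring:WaterTemp_10m_QL",
       "PruthMooring:WaterTemp_10m_QC",
       "PruthMooring:WaterTemp_10m_UQL",
       "PruthMooring:WaterTemp_10m_Med",
       "PruthMooring:WaterTemp_10m_Avg",
       "PruthMooring:WaterTemp_10m_Min",
       "PruthMooring:WaterTemp_10m_Max",
       "PruthMooring:WaterTemp_10m_Std"
from sn."PruthMooring:5minuteSamples";
-- ...and so on, one select per instrument depth; output column names come from the first select.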

@raytula
Contributor Author

raytula commented Mar 3, 2021

Nice. I tried out the following queries and they worked well.

hakai=> select * from sn_unpivot."PruthMooring:5minuteSamples" where "measurementTime">'2020-06-01';
hakai=> select * from sn_unpivot."PruthMooring:5minuteSamples" where "measurementTime">'2020-06-01' and depth=10;

As a general approach, perhaps we should put the database view definitions in this repo and automatically run them when the sensor network database changes, like you are doing now. I can imagine a bunch of situations where explicitly defined database views will work better than auto-generated ones. I'm not sure about performance/tuning, but I imagine you can do things to address any performance issues.
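On the performance point: if querying a plain view ever turns out to be too slow for ERDDAP, one option is a materialized copy that the same workflow refreshes after each rebuild. A sketch only; the "PruthMooring:5minuteSamplesMaterialized" name is made up:

-- hypothetical materialized copy of the unpivoted view, refreshed after each rebuild
create materialized view if not exists sn_unpivot."PruthMooring:5minuteSamplesMaterialized" as
  select * from sn_unpivot."PruthMooring:5minuteSamples";

refresh materialized view sn_unpivot."PruthMooring:5minuteSamplesMaterialized";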

@JessyBarrette
Collaborator

Here are the mooring locations over time:

Site  | Start time              | End time                | Latitude        | Longitude       | Latitude (decimal °N) | Longitude (decimal °E)
------|-------------------------|-------------------------|-----------------|-----------------|-----------------------|-----------------------
Pruth | 2017-12-03 00:00:00 PST | 2018-11-15 15:00:00 PST | 51° 39.3810' N  | 128° 05.4890' W | 51.65635              | -128.0914833
Pruth | 2018-11-15 15:00:00 PST | ongoing                 | 51° 39.126' N   | 128° 05.122' W  | 51.6521               | -128.0853667
QU5M  | 2018-10-05 00:00:00 PST | ongoing                 | 50° 07.20082' N | 125° 12.7317' W | 50.12001367           | -125.212195
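If the positions were to come from the view rather than datasets.xml, a time-based CASE over the relocation date above is one way to do it. An illustrative sketch only, with PST written as an explicit UTC-8 offset:

-- lat/long switch at the Pruth relocation time (2018-11-15 15:00 PST)
select "measurementTime",
       case when "measurementTime" < timestamptz '2018-11-15 15:00:00-08'
            then 51.65635 else 51.6521 end as latitude,
       case when "measurementTime" < timestamptz '2018-11-15 15:00:00-08'
            then -128.0914833 else -128.0853667 end as longitude,
       depth,
       watertemp_avg
from sn_unpivot."PruthMooring:5minuteSamples";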

@JessyBarrette
Collaborator

@raytula Just to make things clear regarding those two Tidbits Mooring datasets: should we group them together as one dataset, or keep the two sites separate?

@raytula
Contributor Author

raytula commented Mar 8, 2021

@raytula Just to make things clear regarding those two Tidbits Mooring datasets: should we group them together as one dataset, or keep the two sites separate?

Hmm. Not sure. I would be inclined to keep them separate, so let's start with that. Nate will be putting the related database views in this repo and working with you to define the datasets.xml fragments to get those views integrated into ERDDAP.

If we were to combine them, would it make sense to bundle it with the Provisional River's Inlet mooring data?

@JessyBarrette
Collaborator

JessyBarrette commented Mar 8, 2021 via email

@JessyBarrette
Collaborator

@raytula
Contributor Author

raytula commented Mar 17, 2021

Looks good to me.

@JessyBarrette
Collaborator

The metadata form was updated to reflect the provisional aspect

@raytula
Contributor Author

raytula commented Mar 17, 2021

Yeah, nice to see that graph. I kind of feel like ERDDAP is, in part, a much more functional version of the Hakai sensor network tool, though neither is super easy to use without spending some time exploring, or without someone preparing some great examples.

[image]

@JessyBarrette
Collaborator

This dataset is now available online. We can close this issue!
