Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement useful checks from kvalobs #37

Merged
merged 51 commits into from
Oct 25, 2024
Merged

Implement useful checks from kvalobs #37

merged 51 commits into from
Oct 25, 2024

Conversation

intarga
Copy link
Owner

@intarga intarga commented May 2, 2024

Closes #34 and #1

  • range check
  • range check direction
    • prototype written (range_check_wind_direction) but need to verify Y param and -3.0 special case with Line
  • logger_t
    • implemented as special_values_check
  • freeze check
    • implemented as flatline_check
  • geonor_drift
    • implemented as monotonic_increase_check
    • Why does it have 4 different flags in the log, when the linked code only returns 1 possible flag, and has no parameters
    • Why the hardcoded values 100 and 0.7?
    • Why does this strictly apply to 25 values in a row? 25 seems like a strange number
    • It only returns 1 flag. Does that apply to all 25 values?
    • Verify I got the pass and fail flags the right way around
    • Unsure how to apply this to a whole DataCache. Rolling windows of 25 values, or chunks?
  • humidity_range
    • implemented as range_check_humidity, but need to verify flag on correction
    • what to do about the warnings for 5-15%? Should we put a warn case in this test, or run a separate range check to test for that?
      - [ ] freeze_check_uu
    • The code for this really didn't make sense at all, and has a very obvious mistake in it. I suspect this is based on an early iteration of freeze_check, and then they updated freeze_check while leaving this unchanged. It seems like the only actually relevant part of this is that they exit early if humidity is > 95. I suggest that instead of having a special check for that we just use the regular flatline_check + a range check and handle it in the combi
  • gt
    • implemented at greater_than
    • Verify I got the pass and fail flags the right way around
  • max_gt
    • implemented as max_greater_than_single, the other 3 tests below follow the same naming convention
    • flagging conditions for missing data changed significantly, my version does not flag as pass if there's any missing data (but it can still flag as fail). kvalobs flags only one data point in the sequence, and is inconsistent about which one, I just flag everything together, open to changing that if a clear and coherent argument can be given.
  • min_gt
  • min_lt
  • max_lt
  • prognostic_space_check (compare_with_model)
    • implemented as not_equal
  • remove series/spatial cache at the olympian level
  • unit tests

- [x] badweather_highvisibility (implemented as range_check_pair)
- [ ] mediumclouds_low
- prototype implemented as cloud_consistency_check, but need help to verify types, and provide a better description
- this check only seems to look at clouds, but in the doc it's marked as flagging pressure?

- [ ] cloudbase_hi
- prototype implemented as cloud_consistency_check2, but same comments apply as mediumclouds_low.
- Additionally, I need help picking better names here
- as above, I am concerned that this is used to QC pressure but doesn't actually look at pressure

- [x] lt_eq
- implemented as lower_limit_special_value_pair
- as above, I am concerned that this is used to QC pressure but doesn't actually look at pressure

- [x] gt_eq
- implemented as upper_limit_special_value_pair
- as above, I am concerned that this is used to QC pressure but doesn't actually look at pressure

@intarga intarga added the enhancement New feature or request label May 2, 2024
@intarga intarga self-assigned this May 2, 2024
@intarga intarga linked an issue May 2, 2024 that may be closed by this pull request
@linebaserud
Copy link

My initial thoughts on these questions:

range check direction
prototype written (range_check_wind_direction) but need to verify Y param and -3.0 special case with Line

I do not understand this either. Y wind speed? What is special with -3.0?? Maybe PiM knows who to ask?

logger_t
what is this??? the code looks like it checks control info and a bunch of special values, but it has a comment that says it's a range check?

This should just be our special_values_check, right? Maybe it says so since it is using the same flag, fr/controlinfo(1). I find explanations like "Gjoer tester paa 9999-verdier og returner en ny verdi for grenseverdi-flagg" and "fr=7 Kontrollert. Funnet å svare til spesialverdi som betyr manglende")
From the code it looks like Øystein Lie is the person to ask for more info.

min_gt
prototype written (aggregate_less_than_set) but the test seems flawed to me. I don't think it should be able to pass the aggregate if there are missing values in the set (should be at best DataMissing), and if the condition fails, I think it should fail all elems of the set that fail the condition, not just the lowest. Also what is the adjustment? Where does that magic number come from?

I do not understand this after the first quick look.

geonor_drift
No explanation at all for this test. it checks if a set of 25 values are monotonically increasing, and if the difference between the first and last is in a hardcoded range. Struggling to see the rationale behind this one. Additionally why does it have 4 different flags in the log, when the linked code only returns 1 possible flag, and has no parameters

Geonor should be the precip buckets that fills with water as it rains. They are continuously weighed to measure the precipitation. -> values should be monotonically increasing. During the summer I guess some water can evaporate away. I do not know way they use an extra range check here. I can only see 1 flag in the log, not 4.

On the consistency checks (badweather_highvisibility, mediumclouds_low, cloudbase_hi, lt_eq, gt_eq, maxtemp_nosign, range_max_min):
I agree that these checks do not seem to look at pressure!
These seem to be consistency checks that have names that uses numbers equal to the paramids used by pressure variables:

consistency checks clouds uses 170, 171, 172, 173, and 175:
50 QC1-2-170 - nolowclouds_lowcloudheight2
445 QC1-2-171 - Cu_low
253 QC1-2-172 - mediumclouds_low
55 QC1-2-173 - cloudbase_hi
113 QC1-2-175 - lt_eq

and this one related to seatemp uses 179:
822 QC1-2-179 - flowrate_for_seatemp

and something related to temp uses 177 and 178 in the naming:
1 QC1-2-177b - maxtemp_nosign
1 QC1-2-177 - maxtemp_nosign
18 QC1-2-178 - range_max_min

also visibility/weather uses 176:
14737 QC1-6-176 - gt_eq

Here I find the consistency checks for pressure variables:
https://gitlab.met.no/groups/met/obsklim/bakkeobservasjoner/data-og-kvalitet/kvalobs/-/wikis/kvalobs/qc12#air-pressure
And this is the related statistics from the kvalobs log files (the gt_eq test are in fact used, but now with more relevant input):
58096 CHECK: QC1-2-21 - pressure_nottendency (AA = null & PP ≥ 0)
254 CHECK: QC1-2-22 - tendency_notpressure (AA = 4 & PP ≠ 0)
23262 CHECK: QC1-2-23 - gt_eq (AA = 4 & PP ≠ 0)
537 CHECK: QC1-2-24 - eq_between (PP = 0 & 1 ≤ AA ≤ 3)
1619 CHECK: QC1-2-25 - eq_between (PP = 0 & 6 ≤ AA ≤ 8)
0 CHECK: QC1-2-26 - pressurereduction
(Check on station pressure reduction. The check only applies to stations where reduction is calculated manually and 950 hPa ≤ PR ≤ 1050 hPa. The inconsistency is: | PR-P1 | > ε, where ε is instrument specific)
(125783 FLAG=1, 143487 EMPTY RESULTS)
73298 CHECK: QC1-2-27.b - tendency (abs( PP - |PO - PO(-3h)| ) > 0.3 hPa)

@intarga
Copy link
Owner Author

intarga commented Aug 6, 2024

@linebaserud

I do not understand this either. Y wind speed? What is special with -3.0?? Maybe PiM knows who to ask?

I think Y is wind speed, since it makes sense that wind direction would become meaningless when wind speed is 0, but it's not mentioned in the description. As for the -3.0 it seems like special handling for some specific sensor?

I do not understand this after the first quick look.

I think this is maybe easier to explain in person and with a diagram, we can bring it up at the next meeting

I can only see 1 flag in the log, not 4.

I meant that in your document it is listed as 4 entries:

263 	1719 		QC1-3a-104drift.501 - geonor_drift
295 	1012 		QC1-3a-104drift.502 - geonor_drift
373 	176 		QC1-3a-104drift.330 - geonor_drift
...
466 	24 		QC1-3a-104drift.342 - geonor_drift

But it's unclear to me what (if any) difference there could be between these checks, as the linked code has only 1 function which does not take in any params

I agree that these checks do not seem to look at pressure!
These seem to be consistency checks that have names that uses numbers equal to the paramids used by pressure variables:

Ok so those tests leaked into the list because the regex you used on the logs wasn't quite right? Or is it a mistake that they're named like that? In that case it seems like I shouldn't implement them, right? I feel kind of sad that I spent so long puzzling about this when it never made sense.

You linked the new set of consistency checks that actually are used for pressure, do you want me to implement these? are they actually flagging anything?

@linebaserud
Copy link

@intarga

I do not understand this either. Y wind speed? What is special with -3.0?? Maybe PiM knows who to ask?

I think Y is wind speed, since it makes sense that wind direction would become meaningless when wind speed is 0, but it's not mentioned in the description. As for the -3.0 it seems like special handling for some specific sensor?

Sounds reasonable. I will put it on my list to try to ask some people to verify.

I do not understand this after the first quick look.

I think this is maybe easier to explain in person and with a diagram, we can bring it up at the next meeting

Perfect. It is now in the agenda for the next meeting.

I can only see 1 flag in the log, not 4.

I meant that in your document it is listed as 4 entries:
263 1719 QC1-3a-104drift.501 - geonor_drift
295 1012 QC1-3a-104drift.502 - geonor_drift
373 176 QC1-3a-104drift.330 - geonor_drift
...
466 24 QC1-3a-104drift.342 - geonor_drift
But it's unclear to me what (if any) difference there could be between these checks, as the linked code has only 1 function which does not take in any params

It should be the same test (https://gitlab.met.no/met/obsklim/bakkeobservasjoner/data-og-kvalitet/kvalobs/kvoss_intern/-/blob/master/kvmeta/algorithms/geonor_drift.pl?ref_type=heads), but run for the different typeids (501,502,330,...,342).

I agree that these checks do not seem to look at pressure!
These seem to be consistency checks that have names that uses numbers equal to the paramids used by pressure variables:

Ok so those tests leaked into the list because the regex you used on the logs wasn't quite right? Or is it a mistake that they're named like that? In that case it seems like I shouldn't implement them, right? I feel kind of sad that I spent so long puzzling about this when it never made sense.

You linked the new set of consistency checks that actually are used for pressure, do you want me to implement these? are they actually flagging anything?

Correct, leaked in w/ the regex. Maybe that is showing us that naming most of the tests with the paramid, but also naming lists of tests with the same numbers can be confusing...
We should not implement them for pressure, however, these consistency tests do flag observations so we will need them for checking the cloud observations etc. The document I made for you shows the kvalobs checks for the main parameters (temperature, precip, wind, rh, pressure) that we wanted to start with, but in the end we need to check also all the other observations that kvalobs checks.
The new set is flagging, except for CHECK: QC1-2-26 - pressurereduction. The number of flagged observations is the number just before each test that I listed in my previous comment.

@intarga intarga changed the title Implement useful QC tests from kvalobs Implement useful QC tests from kvalobs and titanlib Oct 14, 2024
@intarga intarga linked an issue Oct 14, 2024 that may be closed by this pull request
@intarga intarga linked an issue Oct 21, 2024 that may be closed by this pull request
@intarga intarga marked this pull request as ready for review October 22, 2024 18:30
@intarga intarga changed the title Implement useful QC tests from kvalobs and titanlib Implement useful QC tests from kvalobs Oct 22, 2024
@intarga intarga changed the title Implement useful QC tests from kvalobs Implement useful checks from kvalobs Oct 22, 2024
@intarga intarga requested a review from Lun4m October 24, 2024 09:26
Copy link
Collaborator

@Lun4m Lun4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still haven't fully looked into buddy_check and src 🥲

src/checks/consistency/greater_than.rs Show resolved Hide resolved
src/checks/consistency/max_greater_than_single.rs Outdated Show resolved Hide resolved
src/checks/consistency/max_greater_than_single.rs Outdated Show resolved Hide resolved
src/checks/consistency/max_greater_than_single.rs Outdated Show resolved Hide resolved
src/checks/consistency/max_less_than_single.rs Outdated Show resolved Hide resolved
src/checks/spatial/buddy_check.rs Outdated Show resolved Hide resolved
src/checks/spatial/buddy_check.rs Show resolved Hide resolved
src/checks/spatial/buddy_check.rs Outdated Show resolved Hide resolved
src/checks/spatial/buddy_check.rs Outdated Show resolved Hide resolved
@intarga intarga merged commit 2a8cc87 into trunk Oct 25, 2024
1 check passed
@intarga intarga deleted the kvalobs_tests branch October 25, 2024 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants