PullBio.fn enhancement / documentation of filtering #45

Open
4 tasks
Tracked by #49
kellijohnson-NOAA opened this issue May 13, 2021 · 1 comment
Labels
enhancement warehouse Pertains to getting, documenting, or fixing data in the warehouse.

Comments

@kellijohnson-NOAA
Contributor

Lingcod_2021#21 summarizes, in part, how we compared the number of ages returned by nwfscSurvey::PullBio.fn to the number of ages per year that Patrick said we should see in the warehouse. @brianlangseth-NOAA and I then found that some ages are removed by a logical filtering process. Based on this, I have the following suggestions for enhancements to this package:

  • if verbose = TRUE, then print a summary of what was removed
  • add documentation on the filtering process
  • potentially make an option to not filter
  • add UrlText as an attribute of the returned data so users can at least see the url that was used and, if they want, paste it into a web browser to extract the data themselves; this attribute could be named url and wouldn't change how the average user interacts with the returned object at all
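
A minimal sketch of how the suggestions above could fit together, not the package's implementation. The helper filter_pull(), its arguments, the valid_station column, and the placeholder url are all hypothetical and only illustrate the verbose summary, the option to skip filtering, and the url attribute.

```r
# Hypothetical helper; NOT part of nwfscSurvey. All names are assumptions.
filter_pull <- function(data, keep, url, filter = TRUE, verbose = TRUE) {
  stopifnot(is.data.frame(data), is.logical(keep), length(keep) == nrow(data))
  # Treat a missing filter result as "remove" so it is counted explicitly
  keep[is.na(keep)] <- FALSE
  # Option to not filter: return every record when filter = FALSE
  out <- if (filter) data[keep, , drop = FALSE] else data
  if (verbose) {
    message(
      nrow(data) - nrow(out), " of ", nrow(data),
      " records removed by filtering."
    )
  }
  # Store the query url as an attribute; columns and printing are unchanged
  attr(out, "url") <- url
  out
}

# Made-up records standing in for a pull from the warehouse
bio <- data.frame(
  Year = c(2003, 2003, 2004, 2004),
  Age = c(3, 5, 4, 7),
  valid_station = c(TRUE, FALSE, TRUE, TRUE)
)
filtered <- filter_pull(
  data = bio,
  keep = bio$valid_station,
  url = "https://example.org/placeholder-query" # placeholder, not a real query
)
attr(filtered, "url")
```

One design caveat: custom attributes on a data frame are generally not carried through later row subsetting with `[`, so the url would need to be read before the object is subset further.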
@iantaylor-NOAA
Contributor

I just discovered https://www.webapps.nwfsc.noaa.gov/data/metadata/trawl.catch_fact, which provides information on the various fields available in the data warehouse and saves us from bugging Curt with every question about this stuff.

The field target_station_design_dim$stn_invalid_for_trawl_date_whid discussed in pfmc-assessments/lingcod#21 (comment) is described as "station design invalidation date warehouse identifier".

I also found reason_stn_invalid (for the target station) and actual_station_design_dim$reason_station_invalid (for the actual station), both described as "reason station was invalidated, multiple reasons are comma separated", with the following allowed values:

  • California State MPA,
  • Cowcod Conservation Area,
  • ENC Avoidance,
  • Farallon Island Radioactive Waste Dump,
  • ONMS MPA,
  • ONMS Permit Exclusion,
  • Outside Survey Extent,
  • Mooring,
  • San Clemente Is. Military test range

It seems useful to be able to extract one of those reason_station_invalid fields to better understand why any invalid stations get filtered.
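
As a rough sketch of what could be done with such a field once it is extracted, the comma-separated reasons can be split and tabulated in base R. The column name follows the metadata description quoted above; the station labels and rows are made up for illustration and are not warehouse data.

```r
# Made-up extract; only the column name comes from the metadata page
stations <- data.frame(
  station = c("a", "b", "c", "d"),
  reason_station_invalid = c(
    "California State MPA",
    "Cowcod Conservation Area, ONMS MPA",
    NA, # a station that was never invalidated
    "Mooring"
  ),
  stringsAsFactors = FALSE
)

# Split the comma-separated reasons and trim surrounding whitespace
reasons <- strsplit(stations$reason_station_invalid, split = ",", fixed = TRUE)
reasons <- lapply(reasons, trimws)

# Count how often each reason occurs; table() drops NA by default
reason_counts <- table(unlist(reasons))
reason_counts
```

A station with several reasons contributes once to each of them, so the counts can sum to more than the number of invalid stations.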

On a deeper level, my vague memory of the reason for this filter is that for design-based indices, there was interest in having the trawl footprint remain consistent across all years. Therefore, when a station was excluded from the survey, any hauls in that station prior to the exclusion were considered invalid. With the shift to geostatistical approaches to index standardization, we may want to re-think that exclusion. It does seem like our status-quo methods for creating age and length compositions could be sensitive to this choice if areas with more large or small fish get removed from the survey design. Regardless, it might be good to have a conversation with Curt about this in the future.

@kellijohnson-NOAA kellijohnson-NOAA added the warehouse Pertains to getting, documenting, or fixing data in the warehouse. label Dec 1, 2022

2 participants