update datapackage.json so it includes field names from etn_fields.csv #290

Open
PietrH opened this issue Feb 12, 2024 · 3 comments
Labels: actionable, documentation, enhancement

PietrH commented Feb 12, 2024

Jesus pointed this out to me via email:

Hi Pieter,

I need to update this datapackage.json so it includes the field definitions found [here](https://github.com/inbo/etn/blob/main/inst/assets/etn_fields.csv). Would you like to apply these changes yourself, or would you be OK with me applying the changes to this repository?

Best,
Jesus.

I wonder how this relates to: https://github.com/inbo/etn/milestone/4

Specifically #226

Questions

  • Is inst/assets/etn_fields.csv still correct?
  • Is inst/assets/etn_fields.csv somehow automatically updated? Or kept in sync with datapackage.json?
  • Should there be a test to make sure they are in sync?
  • https://inbo.github.io/etn/articles/etn_fields.html claims we are working on an update. If so, we have been working on it for a while. Can we pick this up again? What is the status?
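
A sync test could look roughly like the sketch below. It is only an illustration: `fields_out_of_sync()` is a hypothetical helper (not part of etn), the toy inputs stand in for the two real files, and it assumes that the resource name is the part of `view` before the first underscore.

```r
# Hypothetical helper: list "resource.field" pairs found in only one source.
# `datapackage` is a parsed datapackage.json (nested list, e.g. from
# jsonlite::read_json()); `field_definitions` is a data frame with the
# columns `view` and `field`, as in etn_fields.csv.
fields_out_of_sync <- function(datapackage, field_definitions) {
  from_json <- as.character(unlist(
    lapply(datapackage$resources, function(resource) {
      field_names <- vapply(
        resource$schema$fields,
        function(field) field$name,
        character(1)
      )
      if (length(field_names) == 0) return(character(0))
      paste(resource$name, field_names, sep = ".")
    })
  ))
  # Assumption: resource name = part of `view` before the first "_"
  resource_name <- sub("_.*$", "", field_definitions$view)
  from_csv <- paste(resource_name, field_definitions$field, sep = ".")
  list(
    only_in_json = setdiff(from_json, from_csv),
    only_in_csv = setdiff(from_csv, from_json)
  )
}

# Toy inputs (made up for illustration, not the real file contents)
datapackage <- list(resources = list(
  list(name = "animals", schema = list(fields = list(
    list(name = "animal_id", type = "integer"),
    list(name = "tag_type", type = "string")
  )))
))
field_definitions <- data.frame(
  view = c("animals_view", "animals_view"),
  field = c("animal_id", "scientific_name")
)

result <- fields_out_of_sync(datapackage, field_definitions)
print(result)
```

In the package, a testthat test could run this on the two real files and expect both vectors to be empty.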

I remember updating datapackage.json a good while ago, and I certainly know about etn_fields.csv...

PietrH added the enhancement, question, and documentation labels Feb 12, 2024
PietrH self-assigned this Feb 12, 2024

PietrH commented Feb 12, 2024

@peterdesmet, I have a few questions:


PietrH commented Feb 12, 2024

I did a little digging and found quite a few fields (45) in datapackage.json that aren't in etn_fields.csv. That is expected, since I remember keeping datapackage.json up to date but not etn_fields.csv. To reproduce:

# is the field definitions csv up to date with the datapackage.json? 


datapackage_json <-
  jsonlite::read_json("https://raw.githubusercontent.com/inbo/etn/main/inst/assets/datapackage.json")

field_definitions <- readr::read_csv("https://raw.githubusercontent.com/inbo/etn/main/inst/assets/etn_fields.csv")
#> Rows: 182 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): view, field, definition, example
#> dbl (1): order
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


# parse datapackage_json to a table ---------------------------------------

datapackage_tbl <-
  datapackage_json |>
  purrr::chuck("resources") |>
  purrr::map(
    \(resource) purrr::set_names(
      purrr::chuck(resource, "schema", "fields"),
      purrr::chuck(resource, "name")
    )
  ) |>
  purrr::map(
    \(resource) purrr::map(
      resource,
      ~ dplyr::tibble(
        name = purrr::pluck(.x, "name"),
        type = purrr::pluck(.x, "type"),
        resource_name = unique(names(resource))
      )
    )
  ) |>
  purrr::map(purrr::list_rbind) |>
  purrr::list_rbind()
  

# modify field_definitions ------------------------------------------------

field_definitions_rn <- 
  field_definitions |> 
  dplyr::mutate(
    resource_name = 
      stringr::str_extract(view, "^[a-z]+(?=_)")
  )


# See if any are missing --------------------------------------------------

# fields that are in datapackage.json but not in etn_fields.csv:
dplyr::anti_join(
  datapackage_tbl,
  field_definitions_rn,
  by = dplyr::join_by(resource_name == resource_name,
                      name == field)
) |>
  print(n = Inf)
#> # A tibble: 45 × 3
#>    name                        type     resource_name
#>    <chr>                       <chr>    <chr>        
#>  1 animal_id                   integer  animals      
#>  2 animal_project_code         string   animals      
#>  3 tag_serial_number           integer  animals      
#>  4 tag_type                    string   animals      
#>  5 tag_subtype                 string   animals      
#>  6 acoustic_tag_id             string   animals      
#>  7 acoustic_tag_id_alternative string   animals      
#>  8 tag_serial_number           integer  tags         
#>  9 tag_type                    string   tags         
#> 10 tag_subtype                 string   tags         
#> 11 acoustic_tag_id             string   tags         
#> 12 acoustic_tag_id_alternative string   tags         
#> 13 manufacturer                string   tags         
#> 14 model                       string   tags         
#> 15 activation_date             datetime tags         
#> 16 length                      number   tags         
#> 17 diameter                    number   tags         
#> 18 weight                      number   tags         
#> 19 floating                    boolean  tags         
#> 20 archive_memory              string   tags         
#> 21 sensor_range_min            integer  tags         
#> 22 sensor_range_max            integer  tags         
#> 23 sensor_resolution           number   tags         
#> 24 sensor_unit                 string   tags         
#> 25 sensor_accuracy             number   tags         
#> 26 owner_organization          string   tags         
#> 27 tag_id                      string   tags         
#> 28 tag_device_id               integer  tags         
#> 29 detection_id                integer  detections   
#> 30 tag_serial_number           integer  detections   
#> 31 acoustic_tag_id             string   detections   
#> 32 acoustic_project_code       string   detections   
#> 33 depth_in_meters             number   detections   
#> 34 sensor2_value               number   detections   
#> 35 sensor2_unit                number   detections   
#> 36 deployment_id               integer  detections   
#> 37 deployment_id               integer  deployments  
#> 38 acoustic_project_code       string   deployments  
#> 39 activation_date_time        datetime deployments  
#> 40 valid_data_until_date_time  datetime deployments  
#> 41 receiver_model              string   receivers    
#> 42 receiver_serial_number      string   receivers    
#> 43 owner_organization          string   receivers    
#> 44 built_in_acoustic_tag_id    string   receivers    
#> 45 ar_model                    string   receivers

Created on 2024-02-12 with reprex v2.0.2

@peterdesmet

As far as I know, datapackage.json is currently up to date. Is etn_fields.csv? In other words, did #226 ever get done?
What is the status of https://inbo.github.io/etn/articles/etn_fields.html ?

No, #226 never got done. Here's how the HTML page and CSV file originally worked:

  • The 5 tables (animals, detections, etc.) were views in the database. The definitions of the fields of those views were stored in a separate database table vliz.datapaper_metadata_fields, with a field indicating in which view they are used.
  • It was unclear if the table was a good source: it was hard to maintain, the order field was clunky and it never covered all the fields 100%. It was also unclear if the definitions you could see in the web application (behind the ? button) were derived from that table or stored elsewhere.
  • Nonetheless, it was important to show the information in a public overview, which is why etn_fields.Rmd was created.

With the database restructuring, the views disappeared and the table was renamed to app.field_metadata. etn_fields.Rmd could not be restored to a working state.

What I suggest: let's maintain all information in a datapackage.json file:

  • One source for the information: at least for the R package, and usable by the application too if needed.
  • We have the option to maintain it (here on GitHub).
  • Generated data packages have the definitions included.
  • A human-readable page can be created from that file, either with jekyll (but then it's better to have a separate repo) or with etn_fields.Rmd.
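
Moving the existing definitions into datapackage.json could be sketched as follows. This is an assumption-laden illustration, not an agreed approach: `add_definitions()` is a hypothetical helper, the toy definition text is made up, and the resource name is again assumed to be the part of `view` before the first underscore.

```r
# Hypothetical helper: copy `definition` and `example` from the CSV rows onto
# the matching fields of a parsed datapackage.json (nested list).
add_definitions <- function(datapackage, field_definitions) {
  # Assumption: resource name = part of `view` before the first "_"
  csv_resource <- sub("_.*$", "", field_definitions$view)
  for (i in seq_along(datapackage$resources)) {
    resource_name <- datapackage$resources[[i]]$name
    fields <- datapackage$resources[[i]]$schema$fields
    for (j in seq_along(fields)) {
      match_row <- which(
        csv_resource == resource_name &
          field_definitions$field == fields[[j]]$name
      )
      if (length(match_row) == 0) next # no definition available: leave as is
      fields[[j]]$description <- field_definitions$definition[match_row[1]]
      fields[[j]]$example <- field_definitions$example[match_row[1]]
    }
    datapackage$resources[[i]]$schema$fields <- fields
  }
  datapackage
}

# Toy inputs (made up for illustration, not the real file contents)
datapackage <- list(resources = list(
  list(name = "animals", schema = list(fields = list(
    list(name = "animal_id", type = "integer"),
    list(name = "tag_type", type = "string")
  )))
))
field_definitions <- data.frame(
  view = "animals_view",
  field = "animal_id",
  definition = "Toy definition, not the real one.",
  example = "123"
)

updated <- add_definitions(datapackage, field_definitions)
str(updated$resources[[1]]$schema$fields)
```

The result could then be written back with `jsonlite::write_json(updated, "datapackage.json", auto_unbox = TRUE, pretty = TRUE)`. Fields without a CSV match are left untouched, so they remain visible as gaps to fill by hand.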

The information for a field would look like this:

{
  "name": "capture_temperature_change",
  "description": "Difference between water temperature of the system where the fish was caught and the water temperature of the holding reservoir.",
  "type": "string",
  "unit": "degrees celsius",
  "example": "5ºC"
}
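
For the human-readable page, field entries of this shape could be flattened into a table along these lines. This is a minimal sketch: `fields_to_table()` is a hypothetical helper, and placing `capture_temperature_change` in the `animals` resource is an assumption made only for the toy input.

```r
# Hypothetical helper: flatten the field definitions of a parsed
# datapackage.json into one data frame, one row per field.
fields_to_table <- function(datapackage) {
  # Return NA for properties a field does not define
  get_or_na <- function(field, key) {
    value <- field[[key]]
    if (is.null(value)) NA_character_ else as.character(value)
  }
  rows <- lapply(datapackage$resources, function(resource) {
    do.call(rbind, lapply(resource$schema$fields, function(field) {
      data.frame(
        resource = resource$name,
        name = get_or_na(field, "name"),
        type = get_or_na(field, "type"),
        description = get_or_na(field, "description"),
        unit = get_or_na(field, "unit"),
        example = get_or_na(field, "example"),
        stringsAsFactors = FALSE
      )
    }))
  })
  do.call(rbind, rows)
}

# Toy input: one field without extra properties, one with the shape shown above
datapackage <- list(resources = list(
  list(name = "animals", schema = list(fields = list(
    list(name = "animal_id", type = "integer"),
    list(
      name = "capture_temperature_change",
      description = "Difference between water temperature of the system where the fish was caught and the water temperature of the holding reservoir.",
      type = "string",
      unit = "degrees celsius",
      example = "5ºC"
    )
  )))
))

field_table <- fields_to_table(datapackage)
print(field_table[, c("resource", "name", "type", "unit")])
```

In etn_fields.Rmd, the resulting data frame could be passed to `knitr::kable()` to render one table per resource.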

PietrH added the actionable label and removed the question label Feb 15, 2024