Adapt all download formats and exports to use the newly added multivalue fields in pipelines #283
Comments
@dshorthouse we are changing some fields to be arrays instead of strings (see above), and some of these fields are included in the bionomia downloads. I changed them to be arrays too; you can see the changes here. Is this OK with you? You can also test it in UAT if you want. It's not in production yet.
Thanks, @marcos-lg. I'm not sure what the implications are here, but it sounds like you have introduced a mechanism to explode a string into an array for |
Yes, @dshorthouse. We are now interpreting those fields, and we converted them into arrays because they sometimes contain more than one value; this way we can improve the search in our portal and in downloads. But it's OK, I'll change the bionomia download to use the verbatim fields for |
I just took a closer look at how @MattBlissett had made the queries at https://github.com/gbif/occurrence/blob/dev/occurrence-download/src/main/resources/download-workflow/bionomia/hive-scripts/execute-bionomia-query.q#L89 and it looks like he's use |
Right. Then we just need to remove the |
Aha - I drop those two columns in the spark queries at my end and use That said, we might one day work on an Elasticsearch plugin to properly contend with material in |
All the download formats are adapted and in PROD now.
The issue gbif/pipelines#665 brought some new interpreted fields and changed typeStatus from a string to an array.
Some of the newly added fields were previously used as strings because they were carried over from the verbatim values. Now they are interpreted fields in the basic record.
You can see the changes made to the Avro schemas here.
All the download formats and cloud exports need to be adapted to these changes, either by using arrays or by converting the arrays into strings.
The changes for ES search and the DwC and CSV downloads are here, but they should be reviewed too.
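To illustrate the "convert the arrays into strings" option, here is a minimal Python sketch of what flattening a multivalue field could look like for a string-only export such as CSV. It is not the code used in the downloads: the function names, the `flatten_record` helper, the sample record, and the ` | ` delimiter are all assumptions made for the example.

```python
from typing import Any, Dict, Optional, Sequence

# Assumption: a space-pipe-space separator; the real exports may use another one.
DELIMITER = " | "


def array_to_string(values: Optional[Sequence[str]], delimiter: str = DELIMITER) -> str:
    """Join a multivalue field into a single string, skipping null or blank entries."""
    if not values:
        return ""
    return delimiter.join(v.strip() for v in values if v and v.strip())


def flatten_record(record: Dict[str, Any], delimiter: str = DELIMITER) -> Dict[str, Any]:
    """Return a copy of the record where every list or tuple value becomes a string."""
    return {
        key: array_to_string(value, delimiter) if isinstance(value, (list, tuple)) else value
        for key, value in record.items()
    }


# Hypothetical record: typeStatus is now an array, the identifier stays a string.
record = {"gbifID": "1", "typeStatus": ["HOLOTYPE", "PARATYPE"]}
flat = flatten_record(record)
print(flat["typeStatus"])
```

Formats that support arrays natively could skip this step and write the list as-is; only string-only formats would need the join.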