Add interpretation for DWC term - preparations #474

Open
muttcg opened this issue Feb 8, 2021 · 6 comments
@muttcg
Member

muttcg commented Feb 8, 2021

As part of the VertNet feature, we need to interpret the preparations field and add it to the index and HDFS schemas.

Use the VertNet feature branch.

@muttcg muttcg changed the title Add interpretation for DWC preparation term Add interpretation for preparation DWC term Feb 8, 2021
@muttcg muttcg changed the title Add interpretation for preparation DWC term Add interpretation for DWC term - preparation Feb 8, 2021
@muttcg muttcg changed the title Add interpretation for DWC term - preparation Add interpretation for DWC term - preparations Feb 8, 2021
@muttcg muttcg self-assigned this Feb 8, 2021
@MattBlissett
Member

hasTissue - indicates that the content of the preparations field can be interpreted to infer the existence of material sample(s) that can be used for DNA sequencing.

We don't yet interpret the DWC preparations term; this issue is to add it.

  • Passing the verbatim value through to HDFS will be useful, as it will allow us to see the values in data we already have.
  • We will then need a new vocabulary.
  • And a parser that uses that vocabulary.
  • Unlike many other parsers, this term is multi-valued, and each value should probably be parsed individually. The interpreted result is an array.
  • The term will be present in the search index, API, downloads etc as for any other array term (recordedById for example).
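A minimal sketch of the per-value parsing described in the list above, assuming values are separated by `|`. The vocabulary entries and the name `parse_preparations` are hypothetical (no such vocabulary exists yet), and keeping unmatched values verbatim is an assumption rather than an agreed behaviour:

```python
# Hypothetical vocabulary: maps lower-cased verbatim variants to a normalized value.
# These entries are illustrative only; the real vocabulary is still to be built.
PREPARATIONS_VOCABULARY = {
    "tiss": "TISSUE",
    "tissue": "TISSUE",
    "blood": "BLOOD",
    "skin": "SKIN",
}


def parse_preparations(verbatim):
    """Split a multi-valued preparations string and parse each value on its own.

    Returns a list (the interpreted array). Values missing from the vocabulary
    are kept verbatim, trimmed, so that nothing is silently dropped (assumption).
    """
    if not verbatim:
        return []
    interpreted = []
    for raw in verbatim.split("|"):
        value = raw.strip()
        if not value:
            continue  # skip empty segments such as "a || b"
        interpreted.append(PREPARATIONS_VOCABULARY.get(value.lower(), value))
    return interpreted
```

For example, `parse_preparations("tiss | Blood | fossil")` would yield one array entry per verbatim value, with the first two normalized by the vocabulary.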

VertNet's query "is there tissue that can be used for DNA sequencing" is then an OR-query on preparations for the normalized values VertNet uses: "TISSUE", "BLOOD" etc (not "tiss" as that will become "TISSUE"). (The query excludes preparations like "fossil" or "photograph".)
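The OR-query can be sketched as a membership test over the interpreted array. Only "TISSUE" and "BLOOD" come from the discussion; the full set of tissue-type values is VertNet's to define, and the names `TISSUE_VALUES` and `has_tissue` are hypothetical:

```python
# Assumed set of normalized values that imply material usable for DNA sequencing.
# Only "TISSUE" and "BLOOD" are named in the discussion; the rest is undefined here.
TISSUE_VALUES = frozenset({"TISSUE", "BLOOD"})


def has_tissue(preparations):
    """Return True if any interpreted preparation is a tissue-type value (an OR match)."""
    return any(value in TISSUE_VALUES for value in preparations)
```

Because the test runs on normalized values, a record whose verbatim value was "tiss" matches once it has been interpreted as "TISSUE", while records with only "FOSSIL" or "PHOTOGRAPH" do not.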

@MattBlissett
Member

Implementation in #477:

hasTissue: es term preparation exists (Parse field DwcTerm.preparation -> if hasTissue -> add original value into index/hdfs)

This is only setting interpreted dwc:preparations if verbatim dwc:preparations matches one of the VertNet tissue types ("tiss", "blood" etc).

We don't need that filter; people might want to search for other preparations. We should split the value on delimiters such as | and store the resulting values in an array.

Later, we can interpret the values ("tiss" → "tissue" etc), but that requires a vocabulary first.
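A rough sketch of the unfiltered splitting step suggested above. The delimiter set is an assumption: `|` is from the comment, while `;` is only a guess at what "etc" might cover. The name `split_preparations` is hypothetical:

```python
import re

# Assumed delimiters: "|" (from the discussion) and ";" (a guess at what "etc" covers).
_DELIMITERS = re.compile(r"[|;]")


def split_preparations(verbatim):
    """Split a verbatim preparations string into trimmed, non-empty values.

    No filtering or normalization is applied; original spelling and case are kept.
    """
    if verbatim is None:
        return []
    return [part.strip() for part in _DELIMITERS.split(verbatim) if part.strip()]
```

Every value is retained, so non-tissue preparations such as "photograph" remain searchable, and normalization ("tiss" → "tissue") can be layered on later once a vocabulary exists.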

@tucotuco

tucotuco commented Feb 12, 2021 via email

@timrobertson100
Member

timrobertson100 commented Feb 12, 2021

My recommendation would be to start by adding preparations to the index as a multivalued field of Strings only (no interpretation other than splitting into the array of values). Once that is in operation, we can build a vocabulary to normalize the terms.

As a second, future step, I propose we consider the VertNet wish for hasTissue in a broader sense. It may be something to implement, but I feel it would be more useful if we 1) categorize the types of evidence available to support the assertion of occurrence and 2) capture what means there are for a consumer to verify the identification. This could include whether a specimen exists, whether genetic material is available, what media are available for review, etc. Determining what evidence is available, and whether the identification can be verified, is currently difficult because that information is scattered inconsistently across DwC terms.

@dhobern

dhobern commented Feb 15, 2021

My approximate view is that hasTissue/hasPreparations is less important than stillExistsInSomePhysicalFormThatCouldInPrincipleBeExaminedOrStudiedOrSequencedEtc. In other words: is there a specimen, or some part of a specimen, that remains and can be studied?

Not sure if that helps in any way ...

@tucotuco

tucotuco commented Feb 16, 2021 via email

5 participants