Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPL-473 Provide endpoint for PAM to update LH sample with yes or no ‘Value in Sequencing #35

Closed
rl15 opened this issue May 19, 2020 · 9 comments · Fixed by #39
Closed

Comments

@rl15
Copy link

rl15 commented May 19, 2020

User story
GPL-473 | As Stakeholders in sampling strategy (Keith A, Rob A & John S) we would like an endpoint that for a given root sample id updates value in sequencing with yes or no for SSR to use this in +ves lighthouse samples on site (see GPL-472)

Who the primary contacts are for this story
Rich L
Rich C
Christoph P (PAM)
Danni W (SM)
Acceptance criteria
Please see:
https://ssg-confluence.internal.sanger.ac.uk/x/iKMKBg
Sequence flow diagram, messages from SSR Actor to Sample crawler

To be considered successful the solution updates fields:

  1. Lighthouse samples all start initially with field 'value in sequencing' set to 'unknown', with updated date set
  2. Via API for a given root sample ID allow value to be set to yes or no + updated date
  3. New fields in report available at https://psd.pages.internal.sanger.ac.uk/lighthouse_reports/ of
  1. value in sequencing {yes, No, Unknown}
  2. Date value in sequencing set

Additional context
Is it wise to have a preshared API key for this?
API will be given 1-? values to update (probably few 10k)

@emrojo emrojo assigned emrojo and unassigned emrojo Jun 4, 2020
@rl15
Copy link
Author

rl15 commented Jun 8, 2020

Meeting with Rob A (PAM) on 8th June to agree
What information they will send
What changes in anything to the export
What format

@rl15 rl15 changed the title GPL-473 Upload the annotated file returned from PHE GPL-473 Provide endpoint for PAM to update LH sample with yes or no ‘Value in Sequencing Jun 8, 2020
@harrietc52
Copy link
Contributor

harrietc52 commented Jun 9, 2020

Tasks:

  • Define the api message in github with success (200) and error situations descriptions with examples
    value_in_sequencing allowed values = ['Yes', 'No', 'Unknown']
    declared_at allowed format = r"%Y-%m-%dT%H:%M:%S"

  • Allow creation of the entries in samples_declarations table.
    POST request accepts the following data, a list of samples
    [{"root_sample_id": "0006","value_in_sequencing": "Yes", "declared_at": "2013-04-04T10:29:13"}, {"root_sample_id": "0007","value_in_sequencing": "Unknown", "declared_at": "2013-04-04T10:29:13"}]

  • validate sample exists in samples table

  • validate value_in_sequencing is a either ["Yes", "No", "Unknown"]

  • validate declared_at is a date

  • Test performance of sending/receiving 100000 samples with the new api blueprint. Check size of data because the request is going to be 100.000 samples (5MB)

  • Create new table with name sample_description, to store { valid_in_sequencing, sample_id, updated_at }

  • Add a new validation to check that the samples description has unique samples ids inside the bulk_insert

  • Add to the Eve middleware a check for validation API key same as in Aker
    generate and send secret key

  • build join query:
    join samples table and sample_description table
    where sample id is the sample

  • modify jobs report to use this query, to generate the excel file
    (blueprints/reports.py get_reports_details)

  • When creating the plates in SS, (blueprints/plates.py)
    send the setting for value_in_sequencing to be stored in sample_metadata table.
    (CHECK: required in SS?)
    In SS, add value_in_sequencing field to sample_metadata table,
    add to SS api to insert data into table
    add to msg to MLWH
    add value_in_sequencing field to mlwh table

Nice to have, but not important
Check in 4 if is possible to merge from 2 different tables to answer the request

  • Update the samples api in lighthouse to include this value_in_sequencing

@pjvv
Copy link
Contributor

pjvv commented Jun 9, 2020

This issue belongs in the lighthouse service.

@emrojo emrojo transferred this issue from sanger/crawler Jun 10, 2020
@emrojo
Copy link
Contributor

emrojo commented Jun 10, 2020

Update on acceptance criteria (from mail chain 9/Jun/2020)

  • LIMS service receives a request to http/s api endpoint from a server inside the Sanger intranet
  • The request has a secret key as header
  • Service needs to support:
    • At initial stages, uploading the update of 50.000 - 100.000 samples once/twice every week
    • In future, receiving a delta (diff of samples) with 100-1000 once/twice a week
  • All samples will be assigned a time and date (value_in_sequencing_updated_at) that will correspond with the time and date when we received this request in LIMS.
  • The message of the request will contain a list of samples information with:
    • Root Sample ID
    • Value in sequencing (Y/N)
    • Date when value in sequencing is declared
  • If we receive the same sample id again, we will update its Y/N value in value_in_sequencing and the date in value_in_sequencing_updated_at.
  • In case of success, the answer will be the same list of samples with the updated values. It will return a HTTP success code (200).
  • In case of error it will return a HTTP error (5xx, 4xx) depending in the condition.
  • The error conditions in message:
    • Root sample id not present (5xx)
    • Incorrect message format (4xx)
  • In case of error, apply the message partially with all right samples and receive back a list of successful and error updates
  • The automatic report of positive samples on site will be generated with current settings of value_in_sequencing and value_in_sequencing_updated_at

@harrietc52
Copy link
Contributor

harrietc52 commented Jun 10, 2020

@harrietc52
Copy link
Contributor

# curl -X POST http://127.0.0.1:5000/samples_declarations -d '[{"root_sample_id": "0012", "declared_at": "2013-04-04T10:29:13"}, {"root_sample_id": "0012", "declared_at": "2013-04-04T10:29:13"}, {"root_sample_id": "0010", "declared_at": "2013-04-04T10:29:13"}, {"root_sample_id": "0011", "declared_at": "2013-04-04T10:29:13"}, {"root_sample_id": "0011", "declared_at": "2013-04-04T10:29:13"}]' -H 'Content-Type: application/json'

@harrietc52
Copy link
Contributor

Performance
10000 samples = 14s
20000 samples = 30s
30000 samples = 69s
40000 samples = 102.94s

@emrojo emrojo linked a pull request Jun 18, 2020 that will close this issue
@rl15
Copy link
Author

rl15 commented Jun 26, 2020

Requested update from PAM Friday, 26 June 2020 at 09:44

@rl15
Copy link
Author

rl15 commented Jun 26, 2020

Christoph P writes Friday, 26 June 2020 at 12:53

Hello again,
have done some testing now. Can confirm that the service is working well, many thanks! Error messages are also very clear, which is helpful. Couple of things to make you aware of - none of these is critical, but you might want to look into them:
when providing a JSON with errors, I do not get the full result (with fields "_updated", "_created" etc) for the records that are okay, so unclear whether they were registered or not (I think we agreed to accept the correct records and just fail the incorrect ones, see further down in the thread). Would be good to get clarity of what is happening.
when providing a JSON with a missing sample ID, I do not get a JSON with error message, but an html with "internal server error"
when re-uploading sample IDs, I would expect the "_updated" and "_created" to differ in the response, but they are always the same and show the most recent submission
Cheers,
Christoph

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants