Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPL-544 (materialized) view for the Samples Extraction for reporting #117

Closed
rl15 opened this issue Jun 18, 2020 · 20 comments · Fixed by #119
Closed

GPL-544 (materialized) view for the Samples Extraction for reporting #117

rl15 opened this issue Jun 18, 2020 · 20 comments · Fixed by #119

Comments

@rl15
Copy link

rl15 commented Jun 18, 2020

User story
GPL-544 | As SRA for CGaP (Laura) I want a (materialized) view for the Samples Extraction database, so we can check and report on the data

Who are the primary contacts for this story
Laura L
Neil

Draft Acceptance criteria
Column headers something like;

Input barcode,
Output barcode,
Activity type,
Instrument,
Kit barcode,
Kit type,
Date,
User.
[Samples]

This would create a row per plate/rack extracted, alternative would be to do it on a per sample basis, in which case the first column would be COG-UK barcode/Supplier sample name

Dependencies
This story is blocked by the following dependencies:

  • #<issue_no.>
  • sanger/#<issue_no.>

Additional context
L

@JamesGlover
Copy link

JamesGlover commented Jul 3, 2020

Refining with a select statement before building it as a view.

SELECT
  sf.object AS "Supplier sample name",
  ia.barcode AS "Input barcode",
  "TODO" AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  "TODO" AS "User",
  ac.id AS "_activity_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS ia ON ia.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = ia.id AND sf.predicate = "supplier_sample_name";

@JamesGlover
Copy link

Users are associated with steps, not activities. This means that it is possible to have multiple users per activity. (In practice I'm seeing 1 for the majority of activities, with 23 activities performed by two users, and one by three. A few options:

  1. One row per users
  2. Join the users together
  3. Pick only the user associated with the first/last step

@JamesGlover
Copy link

Data looking a little patchy at the moment, it may make sense to filter out some of the rows based on missing data.
ie. Only show completed activities, and ignore those without assets

@JamesGlover
Copy link

Ahh, I see, for plates, samples are linked to the wells, which in turn are linked to the plates.

@JamesGlover
Copy link

Bit more fleshed out now (stopped editing the one above). This handles samples on plates.
Checking that the empty rows are empty wells.

SELECT
  IFNULL(sf.object, wsf.object) AS "Supplier sample name",
  ia.barcode AS "Input barcode",
  "TODO" AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  "TODO" AS "User",
  ac.id AS "_activity_id_",
  ia.id AS "_input_asset_id_",
  wf.id AS "_wf_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS ia ON ia.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = ia.id AND sf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS wf ON wf.asset_id = ia.id AND wf.predicate = "contains"
	LEFT OUTER JOIN facts AS wsf ON wsf.asset_id = wf.object_asset_id AND wsf.predicate = "supplier_sample_name"
ORDER BY ac.id DESC;

@JamesGlover
Copy link

The input barcode on this appears to be an output barcode (at least for some actions), but its possible I'm just not understanding the process, and attempting to play around with the UAT app isn't helping.

Things I need:

  1. Consistent way of finding input and output barcodes across all processes

  2. Ensuring that the right input output barcode are linked per well (assuming processes with multiple input / output plates)

  3. Some asset group linked to steps, however for the CGAP Heron Extraction 500ul - 24ul I was looking at:
    a) The asset group on the activity was the LHR Stock plate
    b) The Heron Lysed Plate is linked to a Step, but that step doesn't belong to an activity
    c) There is a transferedFrom fact, which looks usable, but not sure if this applies to ALL activities

  4. User issue should be easy enough once question resolve. Assuming step is correct place.

@JamesGlover
Copy link

Current select statement

SELECT
  IFNULL(sf.object, wsf.object) AS "Supplier sample name",
  ia.barcode AS "Input barcode",
  "TODO" AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  "TODO" AS "User",
  ac.id AS "_activity_id_",
  aga.asset_group_id AS "_asset_group_id_",
  ia.id AS "_input_asset_id_",
  wf.id AS "_wf_id_",
  wf.object_asset_id AS "_well_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS ia ON ia.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = ia.id AND sf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS wf ON wf.asset_id = ia.id AND wf.predicate = "contains"
	LEFT OUTER JOIN facts AS wsf ON wsf.asset_id = wf.object_asset_id AND wsf.predicate = "supplier_sample_name"
WHERE sf.id IS NOT NULL OR wsf.id IS NOT NULL
ORDER BY ac.id DESC;

@JamesGlover
Copy link

Switched the input asset to output asset, as that's what its looking like for the examples I'm seeing.

SELECT
  IFNULL(sf.object, wsf.object) AS "Supplier sample name",
  "TODO" AS "Input barcode",
  oa.barcode AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  "TODO" AS "User",
  ac.id AS "_activity_id_",
  aga.asset_group_id AS "_asset_group_id_",
  oa.id AS "_output_asset_id_",
  wf.id AS "_wf_id_",
  wf.object_asset_id AS "_well_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS oa ON oa.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = oa.id AND sf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS wf ON wf.asset_id = oa.id AND wf.predicate = "contains"
	LEFT OUTER JOIN facts AS wsf ON wsf.asset_id = wf.object_asset_id AND wsf.predicate = "supplier_sample_name"
WHERE sf.id IS NOT NULL OR wsf.id IS NOT NULL
ORDER BY ac.id DESC;

@JamesGlover
Copy link

Spoke to Eduardo:

  1. Finished activities will not, under normal circumstances, be updated again, so this may be a good trigger point.
  2. The asset group associated with an activity is updated as the activity progresses. So for a finished activity it will represent the output.
  3. Input assets are a little tricky, and my be difficult to track down reliable for a view. transferedFrom is the thing to look at, potentially using facts or operations. However the number of transfers can be variable depending on the activity.

@JamesGlover
Copy link

Sent query:
I’m currently working on getting the views in place for the sample extraction reporting:
#117

I’ve got a few questions:

  1. You may have multiple users associated with a single activity, as it is possible for different users to perform each step. This isn’t a common occurrence, 23 activities in the past have two users associated with them, and 1 has three. There are a few options:
    a) Join all users associated with an activity together (eg. “jg16,ll9”)
    b) Select only the user associated with the first/last step
    c) Display a row per user.

  2. Finding the input barcode is surprisingly tricky, due to the range of different activities that need to be supported. As things stand, I’m not even sure it is possible without modifications to sample extraction itself.
    a) Would a v1 without this information be useful?
    b) Does the report need to support the full range of activities? For example things become substantially easier if we are limiting the report to activities where one source, is transferred directly onto one destination.

I’ll keep poking at point 2 for now, just in case a moment of inspiration strikes.

James

@JamesGlover
Copy link

Response:

For the user can we go with last step or finished the activity, whichever is easier?
The input barcode is key to being able to link up to the data in CGAP LIMS, so we would definitely need this. So I think we should restrict this report to Heron specific activities, which are:
CGAP Heron Extraction 500ul - 16ul
CGAP Heron Extraction 500ul - 24ul
Illumina Extraction
DNAP Heron Extraction 500ul - 24ul
CGAP Heron Extraction 200ul - 24ul
Heron R&D Extraction

Or if you can do it by any activity that contains step type 116 “Transfer Heron Lysed Plate to LHR Stock Plate”, that covers all of these activity types.

Follow up:

Thank you for getting back to me. That should make things easier. Do you happen to know if all those process work on a strictly input plate -> output plate basis, or if they contain intermediate plates?

Response:

Annoyingly they are pretty much always going to be input plate -> output plate, apart from one or two exceptions. Sorry I know that doesn’t really help. I think there is one so far that has had the intermediate plate. I can’t see there being many more but there is a very small possibility.

@JamesGlover
Copy link

So we can limit the possibility space a bit, but still have to deal with rare cases where we may have to find grandparents, not parents.

@JamesGlover
Copy link

Looks like this these limitations we can assume stamps, although if we go that rout I'd be happier if we immediately get a WH replacement prioritised. Still tempted to go that route from the off.

@JamesGlover
Copy link

Existing data appears to all conform to having a single output asset, checking to see if this is a constraint of the actual application.

@JamesGlover
Copy link

Hmm, looks like two plates Can be scanned into the process, but both the target plates appear to be linked to both source plates. This is probably not a scenario we need to worry ourselves with today...

@JamesGlover
Copy link

So, this query works for anything that hasn't had intermediate plates. Should be possible to rejig, but want a starting point to help highlight any errors. Bit concerned about writing tests for this, as barely understand the process.

SELECT
  IFNULL(sf.object, wsf.object) AS "Supplier sample name",
  ia.barcode AS "Input barcode",
  oa.barcode AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  "TODO" AS "User",
  ac.id AS "_activity_id_",
  aga.asset_group_id AS "_asset_group_id_",
  oa.id AS "_output_asset_id_",
  wf.id AS "_wf_id_",
  tff.id AS "_tff_id_",
  wf.object_asset_id AS "_well_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS oa ON oa.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = oa.id AND sf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS wf ON wf.asset_id = oa.id AND wf.predicate = "contains"
	LEFT OUTER JOIN facts AS wsf ON wsf.asset_id = wf.object_asset_id AND wsf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS tff ON tff.asset_id = oa.id AND tff.predicate = "transferredFrom"
	LEFT OUTER JOIN assets AS ia ON tff.object_asset_id = ia.id
WHERE 
	ac.state = 'finish' 
	AND (sf.id IS NOT NULL OR wsf.id IS NOT NULL)
	AND ac_ty.name IN ('CGAP Heron Extraction 500ul - 16ul',
'CGAP Heron Extraction 500ul - 24ul',
'Illumina Extraction',
'DNAP Heron Extraction 500ul - 24ul',
'CGAP Heron Extraction 200ul - 24ul',
'Heron R&D Extraction');

@JamesGlover
Copy link

JamesGlover commented Jul 7, 2020

Well got it working, but bit worried about how brittle it'll be: (Note: This probably isn't correct actually, its selecting the last transfer, not the first, but flipping that logic has no effect. Need to poke out exception a little more)
Edit again: Okay, I was an idiot. Sorting by created at ascending, then id desc. Of course flipping the id has no effect, it only handles rare edge cases.

SELECT
  IFNULL(sf.object, wsf.object) AS "Supplier sample name",
  ia.barcode AS "Input barcode",
  ia2.barcode AS "Input barcode 2",
  oa.barcode AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  "TODO" AS "User",
  ac.id AS "_activity_id_",
  aga.asset_group_id AS "_asset_group_id_",
  oa.id AS "_output_asset_id_",
  wf.id AS "_wf_id_",
  tff.id AS "_tff_id_",
  wf.object_asset_id AS "_well_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS oa ON oa.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = oa.id AND sf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS wf ON wf.asset_id = oa.id AND wf.predicate = "contains"
	LEFT OUTER JOIN facts AS wsf ON wsf.asset_id = wf.object_asset_id AND wsf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS tff ON tff.asset_id = oa.id AND tff.predicate = "transferredFrom"
	LEFT OUTER JOIN assets AS ia ON tff.object_asset_id = ia.id
	LEFT OUTER JOIN assets AS ia2 ON ia2.id = (
	  SELECT object_asset_id FROM operations
	  INNER JOIN steps ON steps.id = operations.step_id
	  WHERE operations.predicate = 'transferredFrom' AND steps.activity_id = ac.id AND operations.object_asset_id IS NOT NULL
	  ORDER BY operations.created_at, operations.id DESC
	  LIMIT 1
	)
WHERE 
	ac.state = 'finish' 
	AND (sf.id IS NOT NULL OR wsf.id IS NOT NULL)
	AND ac_ty.name IN ('CGAP Heron Extraction 500ul - 16ul',
'CGAP Heron Extraction 500ul - 24ul',
'Illumina Extraction',
'DNAP Heron Extraction 500ul - 24ul',
'CGAP Heron Extraction 200ul - 24ul',
'Heron R&D Extraction');

@JamesGlover
Copy link

Now with users as well. Just a bit of tidy-up:

SELECT
  IFNULL(sf.object, wsf.object) AS "Supplier sample name",
  ia.barcode AS "Input barcode",
  ia2.barcode AS "Input barcode",
  oa.barcode AS "Output barcode",
  ac_ty.name AS "Activity type",
  ins.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kt.name AS "Kit type",
  ac.completed_at AS "Date",
  users.fullname AS "User",
  ac.id AS "_activity_id_",
  aga.asset_group_id AS "_asset_group_id_",
  oa.id AS "_output_asset_id_",
  wf.id AS "_wf_id_",
  tff.id AS "_tff_id_",
  wf.object_asset_id AS "_well_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS oa ON oa.id = aga.asset_id
	LEFT OUTER JOIN activity_types AS ac_ty ON ac.activity_type_id = ac_ty.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types AS kt ON kits.kit_type_id = kt.id
	LEFT OUTER JOIN instruments AS ins ON ac.instrument_id = ins.id
	LEFT OUTER JOIN facts AS sf ON sf.asset_id = oa.id AND sf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS wf ON wf.asset_id = oa.id AND wf.predicate = "contains"
	LEFT OUTER JOIN facts AS wsf ON wsf.asset_id = wf.object_asset_id AND wsf.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS tff ON tff.asset_id = oa.id AND tff.predicate = "transferredFrom"
	LEFT OUTER JOIN assets AS ia ON tff.object_asset_id = ia.id
	LEFT OUTER JOIN assets AS ia2 ON ia2.id = (
	  SELECT object_asset_id FROM operations
	  INNER JOIN steps ON steps.id = operations.step_id
	  WHERE operations.predicate = 'transferredFrom' AND steps.activity_id = ac.id AND operations.object_asset_id IS NOT NULL
	  ORDER BY operations.created_at ASC, operations.id ASC
	  LIMIT 1
	)
	LEFT OUTER JOIN users ON users.id = (
	  SELECT user_id FROM steps
	  WHERE steps.activity_id = ac.id AND steps.user_id IS NOT NULL
	  ORDER BY steps.created_at, steps.id DESC
	  LIMIT 1
	)
WHERE 
	ac.state = 'finish' 
	AND (sf.id IS NOT NULL OR wsf.id IS NOT NULL)
	AND ac_ty.name IN ('CGAP Heron Extraction 500ul - 16ul',
'CGAP Heron Extraction 500ul - 24ul',
'Illumina Extraction',
'DNAP Heron Extraction 500ul - 24ul',
'CGAP Heron Extraction 200ul - 24ul',
'Heron R&D Extraction');

@JamesGlover
Copy link

Got rid of the abbreviated aliases, remove the misleading input asset barcode and stripped out a couple of the columns added for debugging

SELECT
  IFNULL(
    supplier_name_fact.object, 
    well_supplier_name_fact.object) AS "Supplier sample name",
  input_asset.barcode AS "Input barcode",
  output_asset.barcode AS "Output barcode",
  activity_types.name AS "Activity type",
  instruments.name AS "Instrument",
  kits.barcode AS "Kit barcode",
  kit_types.name AS "Kit type",
  ac.completed_at AS "Date",
  users.fullname AS "User",
  ac.id AS "_activity_id_"
FROM activities AS ac
	LEFT OUTER JOIN asset_groups_assets AS aga ON aga.asset_group_id = ac.asset_group_id
	LEFT OUTER JOIN assets AS output_asset ON output_asset.id = aga.asset_id
	LEFT OUTER JOIN activity_types ON ac.activity_type_id = activity_types.id
	LEFT OUTER JOIN kits ON ac.kit_id = kits.id
	LEFT OUTER JOIN kit_types ON kits.kit_type_id = kit_types.id
	LEFT OUTER JOIN instruments ON ac.instrument_id = instruments.id
	LEFT OUTER JOIN facts AS supplier_name_fact ON supplier_name_fact.asset_id = output_asset.id AND supplier_name_fact.predicate = "supplier_sample_name"
	LEFT OUTER JOIN facts AS well_fact ON well_fact.asset_id = output_asset.id AND well_fact.predicate = "contains"
	LEFT OUTER JOIN facts AS well_supplier_name_fact ON well_supplier_name_fact.asset_id = well_fact.object_asset_id AND well_supplier_name_fact.predicate = "supplier_sample_name"
	LEFT OUTER JOIN assets AS input_asset ON input_asset.id = (
	  SELECT object_asset_id FROM operations
	  INNER JOIN steps ON steps.id = operations.step_id
	  WHERE operations.predicate = 'transferredFrom' AND steps.activity_id = ac.id AND operations.object_asset_id IS NOT NULL
	  ORDER BY operations.created_at ASC, operations.id ASC
	  LIMIT 1
	)
	LEFT OUTER JOIN users ON users.id = (
	  SELECT user_id FROM steps
	  WHERE steps.activity_id = ac.id AND steps.user_id IS NOT NULL
	  ORDER BY steps.created_at, steps.id DESC
	  LIMIT 1
	)
WHERE 
	ac.state = 'finish' 
	AND (supplier_name_fact.id IS NOT NULL OR well_supplier_name_fact.id IS NOT NULL)
	AND activity_types.name IN ('CGAP Heron Extraction 500ul - 16ul',
'CGAP Heron Extraction 500ul - 24ul',
'Illumina Extraction',
'DNAP Heron Extraction 500ul - 24ul',
'CGAP Heron Extraction 200ul - 24ul',
'Heron R&D Extraction');

@JamesGlover
Copy link

Looks like my preliminary google to check exactly how a materialised view differed from a standard view missed a salient point.

MySQL doesn't support materialised views.

Proceeding with standard view. (Performance in testing hasn't seemed too bad.)

JamesGlover pushed a commit to JamesGlover/samples_extraction that referenced this issue Jul 9, 2020
Completes sanger#117

Adds a heron activiteis view
@rl15 rl15 added the Reporting label Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants