
Y24-061: Research the best way to pass Pool XP tube export messages from Limber to RabbitMQ #4085

Open
sdjmchattie opened this issue Apr 18, 2024 · 3 comments · May be fixed by #4147
Labels
Bioscan RabbitMQ Decoupled interface between Limber/SS & Traction for Bioscan


@sdjmchattie

Description
Sequencescape needs an endpoint that Limber can call so that, for a given Pool XP tube, an appropriate message is generated in Avro format and published to the RabbitMQ exchange, where it will be consumed by the TOL Lab Share tool. See the EPIC for more information. What we want to know is how to create this endpoint in Sequencescape while limiting the changes we make to the system and keeping it separate from the existing APIs, which are already harder to maintain than we'd like.

Who the primary contacts are for this work
Stuart McHattie

Knowledge or Stakeholders
Andrew Sparkes
Stephen Inglis
Ben Topping
Thomas Whiteley

Additional context or information
This is part of the EPIC: #3754

sdjmchattie added the Bioscan RabbitMQ Decoupled interface between Limber/SS & Traction for Bioscan label Apr 18, 2024
sdjmchattie self-assigned this Apr 18, 2024
@sdjmchattie

We (PSD) had a call today about this. There are still a lot of unknowns, but the general plan is that we will either extend the v2 API or create a separate endpoint. The request to this API will be very simple, identifying only the tube to export. The endpoint will then trigger the creation of an Avro-schema-compliant message for RabbitMQ, handled either by a new system (since this will be reused) or by further development of Warren. Warren is now a source of technical debt because its author no longer works with us, so we may look for a way to replace it if it is not easily extensible. None of this will be done until the Avro schema for the message has been agreed.

@sdjmchattie

sdjmchattie commented Apr 26, 2024

Generating Avro-encoded messages in Ruby requires adding a new library. The official library is poorly documented, as are all the other language variants from Apache that I've seen. However, I've pieced together the correct combination of objects to encode a message to put on RabbitMQ. Note that the official library can only write binary-encoded messages, not the JSON-encoded messages that ToL-lab-share is currently able to decode. I also looked at other libraries that extend this one, but they don't provide any additional functionality we want and often add complications, such as only reading schemas from files on disk. When we create the Schema object in practice, we will of course use the JSON string obtained from RedPanda rather than a file.

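```ruby
#!/usr/bin/env ruby

require 'avro'
require 'json'
require 'stringio'

# Schema and sample message are read from disk for this test only;
# in production the schema JSON will come from RedPanda.
schema_str = File.read('schema.avsc')
message_obj = JSON.parse(File.read('test_message.json'))

schema = Avro::Schema.parse(schema_str)
stream = StringIO.new.binmode # raw bytes, no text encoding
writer = Avro::IO::DatumWriter.new(schema)
encoder = Avro::IO::BinaryEncoder.new(stream)
encoder.write("\xC3\x01".b) # Avro single-object encoding marker
# 8-byte schema fingerprint; the spec requires little-endian, hence 'Q<'
encoder.write([schema.crc_64_avro_fingerprint].pack('Q<'))
writer.write(message_obj, encoder) # Avro binary encoding of the datum
message = stream.string

File.binwrite('test_message.avro', message)
```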

@sdjmchattie

The research into how to encode these messages correctly using Ruby and Python is in the GitLab Data Patch Archive repository: https://gitlab.internal.sanger.ac.uk/psd/data-patch-archive/-/tree/main/avro-tests/encoding-decoding?ref_type=heads

Note that the Ruby Avro library does not produce a Parsing Canonical Form for schemas that is consistent with fastavro's, which I have confirmed is the correct one. The Ruby library omits the namespace from custom named types, so schema.crc_64_avro_fingerprint is useless outside of Ruby implementations. We will need to extend the Ruby Avro library when generating these messages to ensure the correct canonical form is produced for hashing.
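To illustrate why the missing namespace matters, here is a sketch of the CRC-64-AVRO fingerprint algorithm from the Avro specification, written in plain Ruby so it does not rely on the gem's own canonical-form code. The canonical-form strings are hand-written for a hypothetical `uk.ac.sanger.Tube` record (the name, namespace and field are illustrative only, not taken from the real schema), and `crc64_avro` is a helper name of my own. The specification's Parsing Canonical Form rules require named types to appear under their full names, so only the qualified string corresponds to what a spec-compliant library such as fastavro would hash.

```ruby
# Plain-Ruby CRC-64-AVRO ("Rabin") fingerprint, following the reference
# algorithm in the Avro specification.
CRC64_AVRO_EMPTY = 0xc15d213aa4d7a795

# Precomputed lookup table: one 64-bit entry per possible byte value.
CRC64_AVRO_TABLE = Array.new(256) { |i|
  fp = i
  8.times { fp = (fp >> 1) ^ (CRC64_AVRO_EMPTY & -(fp & 1)) }
  fp
}.freeze

# Fingerprints a schema that has already been rendered in Parsing Canonical Form.
def crc64_avro(canonical_form)
  canonical_form.b.each_byte.reduce(CRC64_AVRO_EMPTY) do |fp, byte|
    (fp >> 8) ^ CRC64_AVRO_TABLE[(fp ^ byte) & 0xff]
  end
end

# Hypothetical record in canonical form, with and without its namespace.
unqualified = '{"name":"Tube","type":"record","fields":[{"name":"barcode","type":"string"}]}'
qualified = '{"name":"uk.ac.sanger.Tube","type":"record","fields":[{"name":"barcode","type":"string"}]}'

puts format('unqualified: %016x', crc64_avro(unqualified))
puts format('qualified:   %016x', crc64_avro(qualified))

# Single-object header built from the fully qualified fingerprint.
header = "\xC3\x01".b + [crc64_avro(qualified)].pack('Q<')
```

The two strings differ only in the record name, yet they produce different fingerprints, so a consumer comparing against fastavro's value would reject a header built from the unqualified form. One possible fix along these lines is to build the fully qualified canonical form ourselves and fingerprint it with something like `crc64_avro`, instead of calling `schema.crc_64_avro_fingerprint`.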

We will need to adapt lab-share-lib to consume these messages. It can support both binary forms (plain binary encoding and single-object encoding) either by checking for the two-byte marker and choosing how to decode the bytes accordingly, or, more likely, by trying to decode with the regular binary reader and, if that fails, switching to the single-object encoding produced by the code above.
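The marker-checking option could look roughly like the sketch below. It is written in Ruby for consistency with the encoder above, although lab-share-lib itself would need the equivalent in Python. `split_avro_payload` and `decode_avro_payload` are hypothetical helper names, and the actual Avro binary decoding is left to a block supplied by the caller (for example, one wrapping a datum reader).

```ruby
# Hypothetical helpers: accept both plain Avro binary payloads and
# single-object encoded payloads.
SINGLE_OBJECT_MARKER = "\xC3\x01".b.freeze
SINGLE_OBJECT_HEADER_SIZE = 10 # 2-byte marker + 8-byte little-endian fingerprint

# Returns [fingerprint, body]. fingerprint is nil when the payload does not
# start with the marker, i.e. it is treated as a plain binary datum.
def split_avro_payload(payload)
  bytes = payload.b
  return [nil, bytes] unless bytes.start_with?(SINGLE_OBJECT_MARKER)

  if bytes.bytesize < SINGLE_OBJECT_HEADER_SIZE
    raise ArgumentError, 'truncated single-object header'
  end

  fingerprint = bytes.byteslice(2, 8).unpack1('Q<')
  body = bytes.byteslice(SINGLE_OBJECT_HEADER_SIZE, bytes.bytesize - SINGLE_OBJECT_HEADER_SIZE)
  [fingerprint, body]
end

# Rejects single-object payloads written with an unexpected schema, then
# yields the remaining Avro binary bytes to the caller's decoder.
def decode_avro_payload(payload, expected_fingerprint)
  fingerprint, body = split_avro_payload(payload)
  if fingerprint && fingerprint != expected_fingerprint
    raise ArgumentError, format('unexpected schema fingerprint 0x%016x', fingerprint)
  end

  yield body
end

# Usage with a made-up fingerprint and body bytes:
fingerprint = 0x0102030405060708
framed = SINGLE_OBJECT_MARKER + [fingerprint].pack('Q<') + "\x06abc".b
decode_avro_payload(framed, fingerprint) { |body| p body }
decode_avro_payload("\x06abc".b, fingerprint) { |body| p body }
```

One point worth weighing when choosing between the two options: a plain binary decoder may not raise on a single-object payload and could instead misread the header bytes as data. Conversely, the marker check is not infallible, because a plain datum whose first field is an int or long of -98 also starts with 0xC3 0x01 (its zig-zag varint encoding), so also validating the fingerprint, as sketched here, helps catch misidentified payloads.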

sdjmchattie linked a pull request Jun 21, 2024 that will close this issue