-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a Blaze-PostgreSQL adapter #1543
Comments
Hi David, it is perfectly possible to implement a Postgres resource store adapter. Blaze is designed to support more than one implementation. For example, the distributed storage variant uses Cassandra as resource store. However I would not use the resource store for queries without the index. Let me explain that. In the resource store, blaze stores versions of resources by their content hash. That means that a resource with an update appears twice in the resource store and there is no way to tell which version is the current one. That information is only available in the index. For data quality queries, I would suggest to use CQL directly. I have some examples in the CQL Documentation. More in depth data quality queries should be possible with CQL. Another way, I could imagine is to implement SQL on FHIR in Blaze. So at the end, I don't see a performance problem using Postgres as resource store, but the way Blaze stores resources in the resource store is not applicable for queries. |
Hi Alex, I didn't know about SQL on FHIR. That sounds definitely useful! However, would that mean that we would need to duplicate the Blaze contents in order to be able to analyze them using standard SQL tools? If so wouldn't it be possible to "just" store some version-counter or timestamp together with the hash and FHIR resource upon insert when using Postgres as Resource Store? |
SQL on FHIR should be implemented on top of the Blaze storage architecture, the same as I did with CQL. So no duplicate storage. Why isn't CQL an option for the data quality queries? |
Very nice. |
Yes ok, got the point with SQL. But implementing SQL on FHIR will not go fast, even if you like to do it. So can I help you with solving your data quality queries with CQL? |
Yes, I can imagine that implementing SQL on FHIR will not be accomplished overnight. Thank you for offering to help with the CQL queries. I have built a workaround solution for the time being. That was mainly motivated by other limitations that have nothing to do with Blaze. My main goal is to make the raw data more accessible to our users (bridgehead responsibles) so that they can more easily perform exploratory analyses themselves. |
I would try to tech people CQL. We already did one such event at the TMF last autumn and we plan a new event for this year. So I would be interested in examples of data quality queries we can use for the lectures. In my opinion CQL has advantages over SQL for FHIR. It's also available in HAPI and Firefly Server. So I would suggest not to ignore it. |
I'd be interested in such a CQL event. Especially if it was possible to participate remotely since I currently don't have much time for travel. I will try to write down the most important questions and examples and send them to you. |
As stated in the documentation (https://github.com/samply/blaze/blob/master/docs/architecture.md), it should be possible to use PostgreSQL as the Resource Store for Blaze.
I think having that option would be very beneficial to us if it would enable users to perform SQL queries on the resources in order to more easily assess data quality in their Blaze instance.
On the other hand, using PostgreSQL as the Resource Store might come with decreased CQL-query performance. I imagine the performance hit wouldn't be too large for us since the datasets we (CCP) use are typically not very large (< 1GB) and are not expected to grow very rapidly.
Could we implement a Blaze-PostgreSQL adapter and compare the CQL-query performances on different datasets between two instances of Blaze? One instance would use RocksDB as the Resource Store as a gold standard, and the other would use PostgreSQL as the Resource Store.
The text was updated successfully, but these errors were encountered: