Using LinkML to drive data capture #3313
Hi @paulmillar, great question. This is definitely an area a few people in the community have been thinking about, and your approach sounds very reasonable to me. There isn't (as far as I know) a single turnkey "LinkML → interactive data entry web app" solution yet, but there are a few paths people have taken or are exploring:

JSON Schema + form libraries: I could imagine using

DataHarmonizer: DataHarmonizer provides a spreadsheet-like data entry interface and has been adapted to work with LinkML schemas directly. It's oriented towards "flat" tabular data, so it may or may not fit your use case depending on how nested/hierarchical your data is.

Tool Implementer Guide: There's a Tool Implementer Guide in the LinkML docs that goes into detail about how to build generic schema-driven applications, including guidance on using LinkML metamodel elements (

Working with non-Python stacks: One thing the guide also covers is that if you're building a JavaScript/TypeScript frontend, you can use the LinkML generator (

Your idea of using the LinkML meta-model to also edit schemas themselves is interesting: since LinkML is described in LinkML, in principle the same form-generation approach could work for both data and schema editing. I don't think anyone has built that yet, but it's a neat idea.

There's also been an older discussion along similar lines: #191. Maybe
I really love this idea; many projects I work on have the same needs.
I'm glad I found this discussion, thanks for starting it @paulmillar :) The problem-space you describe is exactly what we (colleagues and I) have been keeping ourselves busy with for the past two years or so, with LinkML schemas being a core component of our approach. We build open source tools to support data management for large, multi-site, research consortia, often working with large and idiosyncratic datasets that nevertheless have to be described, queried, accessed, and shared. Digital independence is also an important factor in our domain, so we build or incorporate self-hostable tools by design. What we have put together to address these (meta)data management challenges is a system of integrated tools:
I have a back-pocket illustration handy :) So, basically, with the combination of these tools it means you can model your data and then get a fit-for-purpose data entry app (with data validation on entry) as well as a storage backend and server (again with schema-driven validation) for free. Well, you have to host it of course :) We've already got this running in production for multiple use cases, one being a described here. Some examples of deployments and resulting outputs:
All of the above is constantly being maintained and improved, as we learn more from users and from our own errors. I'm curious how this approach could overlap with others' needs in the LinkML community. (P.S. many thanks to the LinkML creators and maintainers!)
In a specific project, we're exploring the idea of having JSON files stored in a (private) GitLab repository and allowing domain experts to maintain those JSON files. The git repo also contains the LinkML schemata for those files, with validation taking place via a GitLab CI/CD pipeline.
To make it easier for the domain experts, we would like to "hide" the JSON files by (for example) providing a web page where the user can see the current data and have the possibility to modify the data. Modifying the data would (in turn) create a corresponding branch, commit the changes in that branch and create a corresponding merge request. Handling concurrent merge requests (that target the same file) could be challenging, but (for us) this is unlikely.
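A minimal sketch of the "edit becomes a merge request" step, assuming the web page talks to the GitLab REST API. The function only builds the two request descriptions (paths are relative to the instance's `/api/v4` base) and sends nothing; the function name, the branch naming scheme, and the commit message are made up for illustration.

```python
import json
from urllib.parse import quote


def build_edit_requests(project, file_path, new_data, user, target_branch="main"):
    """Return (method, path, payload) tuples for one edit: a commit, then a merge request."""
    # Hypothetical branch naming scheme; any unique name would do.
    branch = f"edit/{user}/{file_path.replace('/', '-')}"
    # GitLab accepts the URL-encoded project path in place of the numeric ID.
    project_id = quote(project, safe="")

    commit = (
        "POST",
        f"/projects/{project_id}/repository/commits",
        {
            "branch": branch,                # new branch that receives the commit
            "start_branch": target_branch,   # branch to fork the new one from
            "commit_message": f"Update {file_path} via web editor",
            "actions": [
                {
                    "action": "update",
                    "file_path": file_path,
                    # Stable key order keeps diffs in the merge request small.
                    "content": json.dumps(new_data, indent=2, sort_keys=True) + "\n",
                }
            ],
        },
    )
    merge_request = (
        "POST",
        f"/projects/{project_id}/merge_requests",
        {
            "source_branch": branch,
            "target_branch": target_branch,
            "title": f"Update {file_path}",
        },
    )
    return [commit, merge_request]
```

Sending these with an authenticated HTTP client is left out on purpose. The existing CI/CD validation would then run on the merge request as it does for any other branch.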
I could imagine using the LinkML schema (either directly or by converting to JSON-Schema) to build an interactive webpage that presents the data in an intuitive way; for example, using JSON Schema annotations (like `title` and `description`) to provide semantic meaning. More generally, the schema would guide the editing process; for example, any unused fields could be offered to the user, required and recommended fields would be indicated, etc. The overall goal is that the user doesn't hand-edit JSON and shouldn't need to know the schema.
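To make the idea concrete, here is a rough sketch of how a page could derive its form fields from a JSON Schema object and the current data. It assumes a flat object schema with `properties` and `required`; the function name and the shape of the returned dictionaries are invented for illustration. "Recommended" is left out because plain JSON Schema has no keyword for it; that information would have to come from the LinkML schema itself.

```python
def form_fields(schema, data):
    """Describe one form field per schema property, in schema order."""
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        fields.append({
            "name": name,
            "label": prop.get("title", name),       # fall back to the raw key
            "help": prop.get("description", ""),    # tooltip or help text
            "required": name in required,
            "in_use": name in data,                 # False: offer as "add field"
            "value": data.get(name),
        })
    return fields


# Example: one field is filled in, the other can be offered to the user.
schema = {
    "properties": {
        "name": {"type": "string", "title": "Site name"},
        "url": {"type": "string", "description": "Homepage of the site"},
    },
    "required": ["name"],
}
fields = form_fields(schema, {"name": "Example site"})
unused = [f["name"] for f in fields if not f["in_use"]]
```

Nested objects and arrays would need the same function applied recursively, which is where an existing form library probably pays off.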
I've found some examples of this kind of idea in other contexts; e.g., the adamant project provides a way to build an interactive webpage for data capture, based on a JSON-Schema. However, I haven't yet found an example of this for data described with LinkML.
I think using LinkML directly (rather than first converting to JSON-Schema) would bring certain benefits. For example, if the data capture/editing page were based on the data's LinkML schema, then that LinkML schema could itself be edited using the same framework. This works because the LinkML meta-model is itself described in LinkML.
However, this discussion is really to gauge whether this approach makes sense, and whether anyone is already doing something similar.