Overhaul Metadata Management In QGIS #91
QGIS Enhancement: Overhaul Metadata Management In QGIS
Author Tim Sutton (@timlinux)
Version QGIS 3.0
QGIS Metadata Strategy: https://gist.github.com/tomkralidis/33f781e361f6d855c2f4
And keep in mind existing metadata editors:
Our intent is to build on these previous works and ideas by building a number of components that provide a comprehensive metadata strategy for QGIS.
The big picture of what we plan to produce is here:
We propose to first implement these components (WP = Work Package):
We will make a separate QEP for the other work packages once the ones above are taken care of. More details on these work packages can be found below.
Work package 1 - Schema Definition/Selection:
Input schema selection: In this phase we will identify input schemas to be used for validation. We propose initially to support Dublin Core and validate against the following CSW Record schemas:
Although we start with Dublin Core to keep things simple, we expect future development to move towards supporting ISO.
Internal Schema: In this phase we will specify a schema for internal representation of metadata within QGIS (‘the QGIS Metadata Schema’). This schema would be independent of any existing standards and would be the basic structure in which all incoming metadata would be stored. When we add support for additional formats in the future, the expectation would be that these formats are also transitioned to the QGIS internal format on import so that we can deal with a single common metadata structure internally.
Since the QGIS internal schema most likely won’t be a superset of all existing schemas, conversions between it and any other schema may lose information, which means we won’t support metadata “round trips”. One proposed way to mitigate this is to keep the original metadata document (if provided) and interpolate new values into it whenever the metadata is updated.
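The "keep the original document and interpolate" idea can be illustrated with a minimal Python sketch. It is not the proposed C++ implementation: the function name `interpolate`, the `<record>` wrapper element, and the sample values are all assumptions made for illustration. Only the Dublin Core elements that changed are overwritten, so content the internal schema does not model (here, `<custom>`) survives untouched.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; registering the prefix keeps "dc:" in the output.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def interpolate(original_xml: str, updates: dict) -> str:
    """Return original_xml with the Dublin Core elements named in `updates` replaced.

    Hypothetical helper: only direct children of the root element are considered.
    """
    root = ET.fromstring(original_xml)
    for name, value in updates.items():
        tag = f"{{{DC_NS}}}{name}"
        element = root.find(tag)
        if element is None:
            # The field was absent from the original document, so append it.
            element = ET.SubElement(root, tag)
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative source document: one element we understand, one we do not.
ORIGINAL = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Old title</dc:title>"
    "<custom>kept as-is</custom>"
    "</record>"
)

updated = interpolate(ORIGINAL, {"title": "New title"})
print(updated)
```

The trade-off is that the stored original, not the internal model, remains the document of record for anything QGIS cannot represent.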
We will also identify which fields should be mandatory within the QGIS Metadata Schema. These should mostly be fields that we can extract automatically from the dataset, without requiring any intervention from the user. Only in this way can we guarantee the automatic generation of internal metadata for every dataset.
Other things to mention:
Status: A proposed schema has been written here qgis/qgis/#4330
Work package 2 - QGIS Metadata API:
In this work package we will build the basic C++ framework for parsing metadata from a schema, initially Dublin Core and the QGIS Metadata Schema. This includes implementing an internal model for representing metadata, based on the metadata schema created in WP1.
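As a rough illustration of what "parsing into an internal model" means, here is a minimal Python sketch (the real framework is planned in C++). The `InternalMetadata` class, its field names, and the `<record>` wrapper are assumptions for illustration only and are not the schema proposed in WP1.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Dublin Core element namespace in ElementTree's "{uri}" notation.
DC = "{http://purl.org/dc/elements/1.1/}"


@dataclass
class InternalMetadata:
    """Hypothetical stand-in for the internal model; the fields are illustrative."""
    identifier: str = ""
    title: str = ""
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)


def parse_dublin_core(xml_text: str) -> InternalMetadata:
    """Map a flat Dublin Core record onto the internal model."""
    root = ET.fromstring(xml_text)

    def text(name: str) -> str:
        # A missing element becomes an empty string rather than None.
        return (root.findtext(DC + name) or "").strip()

    return InternalMetadata(
        identifier=text("identifier"),
        title=text("title"),
        abstract=text("description"),  # dc:description is mapped to "abstract"
        keywords=[e.text.strip() for e in root.iter(DC + "subject") if e.text],
    )


SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>roads-2017</dc:identifier>
  <dc:title>Roads</dc:title>
  <dc:description>Road centrelines.</dc:description>
  <dc:subject>transport</dc:subject>
  <dc:subject>roads</dc:subject>
</record>"""

metadata = parse_dublin_core(SAMPLE)
print(metadata)
```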
Work package 3 - Implement QGIS Metadata Storage support
In this WP we will introduce a physical format for persisting QGIS's internal metadata, the “metadata store”. The goal is to support portability, enabling users to share their metadata, even in offline scenarios. This WP will build directly on the outputs of WP1, which will define an "internal metadata schema", and WP2, the “QGIS metadata API”, which will encode/decode between the internal schema and the supported schemas (right now, only Dublin Core).
QGIS will support two types of metadata stores: remote and local. In this WP we will focus on local stores only. In the diagram below we depict the inheritance model for metadata stores, where an abstract metadata store will behave polymorphically according to the particular data format. For instance, in the case of a PostgreSQL DB, the “save” method will create a table in the database, whereas in the case of a Shapefile it would create an XML file.
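The inheritance model described above can be sketched in Python as an abstract base class with one `save` method and two concrete stores. This is an illustration under assumptions, not the planned C++ design: the class names, the key/value table layout, and the sidecar file naming are all invented here, and SQLite stands in for the database case.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from pathlib import Path


class MetadataStore(ABC):
    """Abstract store: callers use save() without knowing the physical format."""

    @abstractmethod
    def save(self, layer_id: str, fields: dict) -> None:
        ...


class SqliteMetadataStore(MetadataStore):
    """Database-backed store: save() writes rows to a table (layout is assumed)."""

    def __init__(self, path):
        self.connection = sqlite3.connect(str(path))
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            "layer_id TEXT, key TEXT, value TEXT, PRIMARY KEY (layer_id, key))"
        )

    def save(self, layer_id, fields):
        with self.connection:  # commits on success
            self.connection.executemany(
                "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
                [(layer_id, key, value) for key, value in fields.items()],
            )


class XmlSidecarStore(MetadataStore):
    """File-backed store: save() writes <layer_id>.xml into a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def save(self, layer_id, fields):
        root = ET.Element("metadata")
        for key, value in fields.items():
            ET.SubElement(root, key).text = value
        target = self.directory / f"{layer_id}.xml"
        ET.ElementTree(root).write(str(target), encoding="utf-8", xml_declaration=True)


# The same call works against either store.
workdir = Path(tempfile.mkdtemp())
stores = [SqliteMetadataStore(workdir / "store.sqlite"), XmlSidecarStore(workdir)]
for store in stores:
    store.save("roads", {"title": "Roads", "abstract": "Road centrelines."})
```

Supporting a further format would then mean adding one more subclass, with no change to the calling code.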
Some formats, such as text files, can be more limited than others. For that reason, we will create a “prime” format, the “QGIS metadata store”, which can accompany more restrictive formats. The prime format will be an SQLite database, because it is lightweight and well known within the QGIS community.
As the goal is to support all these different formats in the future, we will design an infrastructure to accommodate that, but in this first iteration we will focus on the simple use case of creating an XML file and an SQLite data store. The metadata contents will be supplied by the metadata API. In this WP we will implement format translation, but not schema translation.
We will implement a user interface to allow the user to configure serialization/deserialization behavior, e.g. in which format we should write metadata, and where. In WP5, we will add metadata detection (which perhaps we can turn on and off in the project settings). For instance, if there is an XML file with the same name and path as a Shapefile, QGIS would attempt to import the metadata automatically.
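A minimal Python sketch of the sidecar detection rule, assuming "same name and path" means the dataset path with its extension swapped for `.xml`. The function name `find_sidecar_metadata` and the demo files are hypothetical, and other naming conventions are not handled here.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_sidecar_metadata(dataset_path) -> Optional[Path]:
    """Return the XML file sitting next to the dataset, or None if there is none.

    Assumption: roads.shp pairs with roads.xml in the same directory.
    """
    candidate = Path(dataset_path).with_suffix(".xml")
    return candidate if candidate.is_file() else None


# Demo: one Shapefile with a sidecar, one without.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "roads.shp").touch()
(demo_dir / "roads.xml").write_text("<metadata/>")
(demo_dir / "rivers.shp").touch()

found = find_sidecar_metadata(demo_dir / "roads.shp")
missing = find_sidecar_metadata(demo_dir / "rivers.shp")
print(found, missing)
```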
The QGIS metadata store will be kept in sync with any changes that we apply to the metadata. When we export metadata to XML, those changes will be written to the XML file.
Metadata search will also be polymorphic, according to the data format. In this iteration we will implement basic text search for SQLite, and will use that rather than searching in text files, which tends to be slower.
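One simple way to do text search over an SQLite store is a parameterised `LIKE` query, sketched below in Python. The table layout, the sample rows, and the `search` function are assumptions for illustration; the QEP does not specify which SQLite search mechanism will be used.

```python
import sqlite3

# In-memory database with an assumed one-row-per-layer layout.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE metadata (layer_id TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
)
connection.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("roads", "Roads", "Road centrelines for the province"),
        ("rivers", "Rivers", "Perennial rivers and streams"),
        ("bridges", "Bridges", "Road and rail bridges"),
    ],
)


def search(connection, term):
    """Return ids of layers whose title or abstract contains `term`.

    SQLite's LIKE is case-insensitive for ASCII by default; '%' and '_'
    inside `term` act as wildcards because they are not escaped here.
    """
    pattern = f"%{term}%"
    rows = connection.execute(
        "SELECT layer_id FROM metadata "
        "WHERE title LIKE ? OR abstract LIKE ? ORDER BY layer_id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


print(search(connection, "road"))
```

A `LIKE` scan is linear in the number of rows, but it needs no SQLite extensions, which keeps the sketch portable.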
Work package 4 - Implement QGIS metadata viewer:
Metadata is only useful if it is visible to the users of the dataset that the metadata is associated with. For this reason we should have provision for presenting the metadata in an eye-pleasing and informative manner, with minimal work required on behalf of the user. We also aim to implement this early in the project workflow, so that we can start outputting the data stored in QgsMetadata.
The idea is to replace this:
With something like this (taken from GeoNode):
Work package 8 - Implement QGIS metadata editor for layers
In wizard mode:
In form mode:
(required if applicable)
(required if known at design time)
We have some funding to make these work packages happen (covering around 80%). If anyone is interested in co-funding the shortfall, please let us know.
There is a discussion group at: https://gitter.im/qgis/metadata for those who wish to collaborate in making QGIS metadata better.
The following people have already joined the effort and will be doing implementation work, planning, offering advice etc.
This will be new code and will replace any existing metadata implementation work (including what is currently in layer properties dialog). We will try to make sure that server and other parts that rely on metadata do not break - we would welcome support and input from those working on QGIS Server.
Issue Tracking ID(s)
@timlinux I was just notified of this thread. Recently I have been thinking of MD management within QGIS as well. What follows is a brain dump I shared via e-mail. So I release these ideas and thoughts to the wild as is.
Problem statement: although there are great tools for working with geospatial data, techs still spend a huge amount of time searching, collecting, storing and managing data.
Solution: A toolkit to facilitate a common work flow or tool set for GIS techs. Thinking of a Git Flow model for GIS.
User story: As a qgis user
Audience: the solo data wrangler managing data and resources on the desktop. Not a large team environment but maybe a small team accessing the same data?
Develop a "catalogue for everyone". This is not an enterprise catalogue but a personal one.
It will use pycsw as the back end, with a simple QGIS interface to create and manage MD. You can add local data to the catalogue or import one or more MD records from any CSW resource added to projects. MetaSearch will be the search interface.
The Metadata editing interface will be created dynamically from the schema. Or perhaps a generic model with XSLT to transform into other schemas?
Records can be exported or pushed to other catalogues through CSW.
It will use OGR and GDAL tools to populate the known data.
ISO 19115 as the profile (HNAP?).
Manage both geo and non geo data. Tabular data and images.
Did I just describe metatools by NextGIS (inactive for a year or more) or meta edit (dead since 2011)? They both seem to have parts. NextGIS was the company involved in MetaSearch at one point, I think.
I also wonder if it should come with a template for directory structures to store and access local data. Similar to how GRASS works but not using that data format.
A tool that can manage downloaded data. For instance monitoring an FTP or HTTP directory. When data is updated it can either notify the user or download and update local dataset.
A tool to create data management plans.
A tool to create data quality reports and data dictionaries
just some random thoughts.
Hi @samperd. Thanks very much for your brain dump. You will be pleased to know that a lot of your ideas are already incorporated into our thinking / planning. Take a look at our google doc for more details. I only included the 4 work packages above in this QEP because I don't want to muddy the waters by tabling too many features and sub proposals all at the same time. If metadata is something you care about, I encourage you to join our little subgroup in the gitter channel mentioned above.
Sorry for the late response to this but I have only just been made aware of this document. Is this the best way to make comments or would you prefer I did that somewhere else?
One thing that immediately springs to mind is that the definition of a service is a bit fuzzy. Here you define it as a QGIS project, but for INSPIRE etc. a service means a view or download service, e.g. a WMS/WFS server. I can see a scenario where a user might want to store WMS/WFS service metadata, metadata about the layers those services provide, and metadata about the project. There's also the question of how these things link together: WMS/WFS services have related child datasets, but the relation between the datasets and some parent QGIS project would also need to be considered.
Currently we have only two concepts:
It would be great to get your input about what design changes you would envisage. At the schema level, these comments are probably best directed to the PR at qgis/qgis/#4330. At the higher level, it would be great to hear your ideas on when, where, and how the concepts you propose would in practice be captured and displayed within the QGIS desktop application.
Bear in mind we have further work packages planned - this QEP covers only the first pass implementation addressing layer level metadata. Perhaps take a look at our scratch document to see what other things we plan for the short term.
Hi @mhugo, I think the objective here is slightly different, since AFAIU you want to store auxiliary data for a layer (row based) and we want to store the schema described in WP1 (layer based). We will use the same backend (SpatiaLite) for now, but the idea is to support a polymorphic storage. You can read more details in this blog post:
Are there any plans to add a 'Copyright' field to the Layers Panel Description area?
This is often a requirement for the use of OpenData sources, such as:
When geo-referencing an image from such a source, I add a
which is then taken over when geo-referencing the image with gdal, being one of the tags supported by Geo-Tiffs:
For the RasterLite2 project, Alessandro Furieri (Sandro) and I discussed this matter, and the final decision was to add copyright and licence fields, together with the existing title and abstract fields for both
to the metadata tables.
The next SpatiaLite version will create a data_licenses table and fill it with a list of common licenses for all new databases.
The idea is to avoid any legal hassle by offering a means to store the information, and for views to display it, as needed or required.
So when a Geo-Tiff is being imported, when the following TIFFTAGs are found:
they will be taken over.
After reading this and the other concepts, I was surprised to see that this aspect was not included.
In the case of 'Dublin Core', it seems to even have been removed:
But for QGIS, I would say it would also be wise to avoid any legal hassle by adding a copyright field to the Layers Panel Description area and to the metadata concept.
I'm +1, but would prefer "attribution" over "copyright". Many geospatial datasets are now under copyleft, so naming the field "copyright" just seems wrong to me.
The only important thing is that a user does not get into trouble through any infringement of copyright laws.
Sounds good to me, together with some sort of licence tag.
A question about the use of QgsLayerMetadata
My assumption is that a provider would be a major source for gathering the needed information and setting QgsLayerMetadata.
QgsDataProvider does not (at present) contain an mMetadata member, as QgsMapLayer does.
So when setDataProvider runs in QgsVectorLayer and QgsRasterLayer, any Metadata gathered by a provider (in the form of QgsLayerMetadata) cannot be set.
So adding metadata() and setMetadata(..) to QgsDataProvider would be needed to make QgsLayerMetadata truly useful.