Utilizing Patterns #19

Closed · MapZombie opened this issue Jan 17, 2017 · 1 comment

@MapZombie
What language is used when utilizing patterns? For example, I want to harvest a folder, but I don't want it to harvest from a specific subfolder called "Archive". What pattern would be required?

This would be useful info for a future Wiki.

@pandzel-zz commented Jan 18, 2017

The pattern language used here is called "glob" syntax. More about it can be found here. Basically, it is very similar to the patterns used to filter files with the 'dir' or 'ls' commands on various operating systems. I doubt you can come up with any pattern which would skip a particular folder. Keep in mind that the 'pattern' feature on the WAF broker has been designed to filter individual files by their extension rather than by the folder they belong to (by default the pattern is '**.xml', which only allows the broker to grab XML files).
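For illustration, Java's built-in glob matcher (the same syntax family) shows why a pattern like `**.xml` filters by extension but cannot exclude a subfolder such as "Archive". This is a minimal standalone sketch, not harvester code:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
  public static void main(String[] args) {
    // The default WAF pattern: any path ending in .xml, at any depth.
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**.xml");

    Path top = Paths.get("waf/metadata.xml");
    Path archived = Paths.get("waf/Archive/old.xml");

    System.out.println(matcher.matches(top));      // true
    System.out.println(matcher.matches(archived)); // also true: glob offers no
                                                   // way to exclude "Archive"
  }
}
```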

It doesn't mean it is not possible, but it requires a bit more explanation. All the harvester does is execute, or schedule for execution, something called a task. A task represents the workflow of data from the moment it is acquired from the input broker to the moment it is published to the output broker(s). In general it is as simple as that: take something from the input and publish it to the output. However, there is already an existing concept of filters and transformers. Each of these entities is a little piece of code: a filter screens incoming data by some sort of predicate, while a transformer converts data from one form into another.
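To make that distinction concrete, here is a hedged sketch of what such entities could look like; the type names (`DataRecord`, `DataFilter`, `DataTransformer`) are hypothetical and are not the harvester's actual API:

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical record type: an id plus the raw payload. */
class DataRecord {
  final String id;
  final byte[] content;
  DataRecord(String id, byte[] content) { this.id = id; this.content = content; }
}

/** A filter screens incoming records by a predicate: keep or drop. */
interface DataFilter extends Predicate<DataRecord> {}

/** A transformer converts a record from one form into another. */
interface DataTransformer extends Function<DataRecord, DataRecord> {}
```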

Both filters and transformers can be chained together to perform more complicated operations; moreover, they can be executed in parallel. For example, one could create a task where one thread selects only PDF files and publishes them to a local folder; at the same time, another thread selects XML files and, regardless of the kind of metadata (FGDC, ISO, Dublin Core), normalizes it to ISO and publishes it to an instance of Geoportal Catalog 2.0; yet another thread selects only CSV files, publishes them to an instance of 'koop' to create a Feature Service based on the CSV data, then registers the URL of each feature service with Geoportal Catalog 2.0 AND with ArcGIS Online. A sketch of such chaining appears below.
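Reusing the hypothetical types from the sketch above, a chained pipeline might look like this; each such pipeline could run on its own thread, which is the parallel execution described:

```java
import java.util.List;
import java.util.stream.Collectors;

class PipelineDemo {
  /** Chain a filter and a transformer over incoming records. */
  static List<DataRecord> run(List<DataRecord> input,
                              DataFilter filter,
                              DataTransformer transformer) {
    return input.stream()
        .filter(filter)       // e.g. keep only XML records
        .map(transformer)     // e.g. normalize FGDC/Dublin Core to ISO
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    DataFilter xmlOnly = r -> r.id.endsWith(".xml");
    DataTransformer toIso = r -> r; // identity stand-in for a real normalizer
    run(List.of(new DataRecord("a.xml", new byte[0]),
                new DataRecord("b.pdf", new byte[0])),
        xmlOnly, toIso)
        .forEach(r -> System.out.println(r.id)); // prints only a.xml
  }
}
```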

Such a framework does already exist; what is missing is a rich collection of filters and transformers and a sophisticated UI allowing users to build such tasks. At this moment, only a REGEX filter is available, plus one transformer which uses XSLT to transform one metadata format into another.
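The XSLT approach maps directly onto the standard Java API; a transformer of that kind would likely be built on something like the JAXP sketch below (the file names are illustrative only):

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class XsltDemo {
  public static void main(String[] args) throws Exception {
    // Compile an FGDC-to-ISO stylesheet (illustrative file name).
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new File("fgdc-to-iso.xsl")));
    // One metadata format in, another out.
    t.transform(new StreamSource(new File("record-fgdc.xml")),
                new StreamResult(new File("record-iso.xml")));
  }
}
```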

It's worth mentioning that the harvester API is being kept intentionally simple, so that entities like filters, transformers, and brokers can easily be developed as needed.

I hope my explanation sheds some light on what is possibly coming in future releases of the harvester.
