Utilizing Patterns #19

Closed · MapZombie opened this issue Jan 17, 2017 · 1 comment

@MapZombie
What language is used when utilizing patterns? For example, I want to harvest a folder, but I don't want it to harvest from a specific subfolder called "Archive". What pattern would be required?

This would be useful info for a future Wiki.

@pandzel-zz commented Jan 18, 2017

The pattern language used here is called "glob" syntax. More about it can be found here. Basically, it is very similar to the patterns used to filter files with the 'dir' or 'ls' commands on various operating systems. I doubt you can come up with any pattern which would skip a particular folder. Keep in mind that the 'pattern' feature on the WAF broker has been designed to filter individual files by their extension rather than by the folder they belong to (by default the pattern is '**.xml', which only allows the broker to grab XML files).
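For illustration, Java's built-in glob matcher (the same syntax family) shows why a pattern like `**.xml` filters by extension but cannot exclude a subfolder such as "Archive". This is a minimal standalone sketch, not harvester code:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
  public static void main(String[] args) {
    // The default WAF pattern: any path ending in .xml, at any depth.
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**.xml");

    Path top = Paths.get("waf/metadata.xml");
    Path archived = Paths.get("waf/Archive/old.xml");

    System.out.println(matcher.matches(top));      // true
    System.out.println(matcher.matches(archived)); // also true: glob offers no
                                                   // way to exclude "Archive"
  }
}
```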

It doesn't mean it is not possible, but it requires a bit more explanation. All the harvester does is execute, or schedule for execution, something called a task. A task represents the workflow of data from the moment it is acquired from the input broker to the moment it is published to the output broker(s). In general it is as simple as that: take something from the input and publish it to the output. However, there is already an existing concept of filters and transformers. Each of these entities is a little piece of code: a filter screens incoming data by some sort of predicate, while a transformer converts data from one form into another.
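To make that distinction concrete, here is a hedged sketch of what such entities could look like; the type names (`DataRecord`, `DataFilter`, `DataTransformer`) are hypothetical and are not the harvester's actual API:

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical record type: an id plus the raw payload. */
class DataRecord {
  final String id;
  final byte[] content;
  DataRecord(String id, byte[] content) { this.id = id; this.content = content; }
}

/** A filter screens incoming records by a predicate: keep or drop. */
interface DataFilter extends Predicate<DataRecord> {}

/** A transformer converts a record from one form into another. */
interface DataTransformer extends Function<DataRecord, DataRecord> {}
```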

Both filters and transformers can be chained together to perform more complicated operations; moreover, they can be executed in parallel. For example, one could create a task where one thread selects only PDF files and publishes them to a local folder; at the same time, another thread selects XML files and, regardless of the kind of metadata (FGDC, ISO, Dublin Core), normalizes it to ISO and publishes it to an instance of Geoportal Catalog 2.0; yet another thread selects only CSV files, publishes them to an instance of 'koop' to create a Feature Service based on the CSV data, then registers the URL of each feature service with Geoportal Catalog 2.0 AND with ArcGIS Online. A sketch of such chaining appears below.
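Reusing the hypothetical types from the sketch above, a chained pipeline might look like this; each such pipeline could run on its own thread, which is the parallel execution described:

```java
import java.util.List;
import java.util.stream.Collectors;

class PipelineDemo {
  /** Chain a filter and a transformer over incoming records. */
  static List<DataRecord> run(List<DataRecord> input,
                              DataFilter filter,
                              DataTransformer transformer) {
    return input.stream()
        .filter(filter)       // e.g. keep only XML records
        .map(transformer)     // e.g. normalize FGDC/Dublin Core to ISO
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    DataFilter xmlOnly = r -> r.id.endsWith(".xml");
    DataTransformer toIso = r -> r; // identity stand-in for a real normalizer
    run(List.of(new DataRecord("a.xml", new byte[0]),
                new DataRecord("b.pdf", new byte[0])),
        xmlOnly, toIso)
        .forEach(r -> System.out.println(r.id)); // prints only a.xml
  }
}
```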

Such a framework does already exist; what is missing is a rich collection of filters and transformers and a sophisticated UI allowing users to build such tasks. At this moment, only a REGEX filter is available, plus one transformer which uses XSLT to transform one metadata format into another.
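The XSLT approach maps directly onto the standard Java API; a transformer of that kind would likely be built on something like the JAXP sketch below (the file names are illustrative only):

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class XsltDemo {
  public static void main(String[] args) throws Exception {
    // Compile an FGDC-to-ISO stylesheet (illustrative file name).
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new File("fgdc-to-iso.xsl")));
    // One metadata format in, another out.
    t.transform(new StreamSource(new File("record-fgdc.xml")),
                new StreamResult(new File("record-iso.xml")));
  }
}
```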

It's worth mentioning that the harvester API is being kept intentionally simple, so that entities like filters, transformers, and brokers can easily be developed as needed.

I hope my explanation sheds some light on what is possibly coming in future releases of the harvester.
