Utilizing Patterns #19
The pattern language used here is called "glob" syntax. More about it can be found here. Basically, it is very similar to the patterns used to filter files with the 'dir' or 'ls' commands on various operating systems. I doubt you can come up with any pattern which would skip a particular folder. Keep in mind that the 'pattern' feature on the WAF broker has been designed to filter individual files by their extension rather than by the folder they belong to (by default the pattern is '**.xml', which allows the broker to grab only XML files). That doesn't mean it is not possible, but it requires a bit more explanation.

All the harvester does is execute, or schedule for execution, something called a task. A task represents the workflow of data from the moment it is acquired from the input broker to the moment it is published to the output broker(s). In general it is as simple as that: take something from the input and publish it to the output. However, there is already an existing concept of filters and transformers. Each of these entities is a small piece of code: a filter screens incoming data by some sort of predicate, while a transformer converts data from one form to another. Both filters and transformers can be chained together to perform more complicated operations; moreover, they can be executed in a parallel manner. For example, one could create a task which selects only PDF files and publishes them to a local folder; at the same time it selects XML files and, regardless of the kind of metadata (FGDC, ISO, Dublin Core), normalizes it to ISO and publishes it to an instance of Geoportal Catalog 2.0; yet another thread selects only CSV files, publishes them to an instance of 'koop' to create a Feature Service based on the CSV data, then registers the URL of each Feature Service with Geoportal Catalog 2.0 AND with ArcGIS Online.
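To illustrate why a glob pattern filters by extension rather than by folder, here is a minimal sketch using Java's standard `java.nio.file.PathMatcher` with `glob:` syntax (the harvester itself is a Java project, but this snippet is illustrative and does not use the harvester's own API). Note how the default `**.xml` pattern matches XML files in every subfolder, including an "Archive" folder:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    public static void main(String[] args) {
        // The default broker pattern "**.xml": "**" matches any characters,
        // including directory separators, so only the extension is filtered.
        PathMatcher xmlOnly = FileSystems.getDefault().getPathMatcher("glob:**.xml");

        Path record  = Paths.get("data/metadata/record1.xml");
        Path archive = Paths.get("data/Archive/old.xml");
        Path readme  = Paths.get("data/Archive/readme.txt");

        System.out.println(xmlOnly.matches(record));  // true
        System.out.println(xmlOnly.matches(archive)); // true -- glob cannot exclude the Archive folder
        System.out.println(xmlOnly.matches(readme));  // false -- wrong extension
    }
}
```

Since glob has no general negation for a path segment, skipping a folder like "Archive" is something a filter, rather than the broker pattern, would have to do.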
Such a framework does already exist; what is missing is a rich collection of filters and transformers and a sophisticated UI allowing users to build such tasks. At this moment only a REGEX filter is available, plus one transformer which uses XSLT to transform one metadata format into another. It's worth mentioning that the harvester API is being kept intentionally simple, so entities like filters, transformers, and brokers can easily be developed as needed. I hope my explanation sheds some light on what is possibly coming in future releases of the harvester.
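Conceptually, the filter/transformer chain described above behaves like a predicate followed by a mapping over the stream of incoming records. The sketch below is a hypothetical illustration only: the interface shapes (`Predicate`/`Function` from the JDK) and the stubbed "normalize" step stand in for the harvester's real filter and XSLT-transformer entities, whose actual API is not shown here:

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class ChainDemo {
    public static void main(String[] args) {
        // Filter: keep only records that look like XML metadata (illustrative predicate).
        Predicate<String> xmlFilter = record -> record.trim().startsWith("<");

        // Transformer: stands in for an XSLT normalization step (stubbed as a string replace).
        Function<String, String> normalize = record -> record.replace("fgdc", "iso");

        List<String> incoming = List.of("<fgdc-record/>", "plain,csv,row");
        incoming.stream()
                .filter(xmlFilter)   // filters drop records by a predicate
                .map(normalize)      // transformers convert data from one form to another
                .forEach(System.out::println); // "publish" to the output; prints "<iso-record/>"
    }
}
```

Chaining more filters and transformers is just additional `.filter(...)` and `.map(...)` stages, which is why the harvester can keep each individual entity small and simple.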
What language is used when utilizing patterns? For example, I want to harvest a folder, but I don't want it to harvest from a specific subfolder called "Archive". What pattern would be required?
This would be useful info for a future Wiki.