// Package filter implements flexible ISIL attachments with expression trees[1],
// serialized as JSON. The top-level key is the label that is to be given to a
// record. Here, this label is an ISIL. Each ISIL can specify a tree of filters.
// Intermediate nodes can be "or", "and" or "not" filters; leaf nodes contain
// filters that are matched against records (like "collection", "source" or "issn").
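//
// As a rough illustration of how such a tree evaluates, here is a minimal
// sketch. It does not show the package's actual types: Record, SourceFilter,
// OrFilter, AndFilter and NotFilter are hypothetical stand-ins. Intermediate
// nodes delegate to their children, leaf nodes inspect the record:
//
//     package main
//
//     import "fmt"
//
//     // Record is a hypothetical stand-in for the real record type.
//     type Record struct{ SourceID string }
//
//     // Filter accepts or rejects a record.
//     type Filter interface{ Apply(r Record) bool }
//
//     // SourceFilter is a leaf: it matches records from any of the listed sources.
//     type SourceFilter struct{ IDs []string }
//
//     func (f SourceFilter) Apply(r Record) bool {
//         for _, id := range f.IDs {
//             if id == r.SourceID {
//                 return true
//             }
//         }
//         return false
//     }
//
//     // OrFilter matches if at least one child matches.
//     type OrFilter struct{ Filters []Filter }
//
//     func (f OrFilter) Apply(r Record) bool {
//         for _, c := range f.Filters {
//             if c.Apply(r) {
//                 return true
//             }
//         }
//         return false
//     }
//
//     // AndFilter matches only if every child matches.
//     type AndFilter struct{ Filters []Filter }
//
//     func (f AndFilter) Apply(r Record) bool {
//         for _, c := range f.Filters {
//             if !c.Apply(r) {
//                 return false
//             }
//         }
//         return true
//     }
//
//     // NotFilter inverts the result of its single child.
//     type NotFilter struct{ Filter Filter }
//
//     func (f NotFilter) Apply(r Record) bool { return !f.Filter.Apply(r) }
//
//     func main() {
//         from55 := SourceFilter{IDs: []string{"55"}}
//         from49 := SourceFilter{IDs: []string{"49"}}
//         tree := OrFilter{Filters: []Filter{
//             AndFilter{Filters: []Filter{from55, NotFilter{Filter: from49}}},
//             from49,
//         }}
//         fmt.Println(tree.Apply(Record{SourceID: "55"})) // true
//         fmt.Println(tree.Apply(Record{SourceID: "49"})) // true
//         fmt.Println(tree.Apply(Record{SourceID: "7"}))  // false
//     }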
//
// A filter needs to implement Apply. If the filter takes configuration
// options, it needs to implement UnmarshalJSON as well. Each filter can define
// arbitrary options; for example, a HoldingsFilter can load KBART data from a
// single file or a list of URLs.
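//
// A minimal sketch of a filter with configuration options, under stated
// assumptions: PrefixFilter and its "prefixes" option are hypothetical and not
// part of this package, and Record stands in for the real record type.
//
//     package main
//
//     import (
//         "encoding/json"
//         "fmt"
//         "strings"
//     )
//
//     // Record is a hypothetical stand-in for the real record type.
//     type Record struct{ ArticleTitle string }
//
//     // PrefixFilter accepts records whose title starts with a configured prefix.
//     type PrefixFilter struct{ Prefixes []string }
//
//     // UnmarshalJSON reads the options from a value like {"prefixes": ["A", "B"]}.
//     func (f *PrefixFilter) UnmarshalJSON(p []byte) error {
//         var s struct {
//             Prefixes []string `json:"prefixes"`
//         }
//         if err := json.Unmarshal(p, &s); err != nil {
//             return err
//         }
//         f.Prefixes = s.Prefixes
//         return nil
//     }
//
//     // Apply reports whether the title starts with any configured prefix.
//     func (f *PrefixFilter) Apply(r Record) bool {
//         for _, prefix := range f.Prefixes {
//             if strings.HasPrefix(r.ArticleTitle, prefix) {
//                 return true
//             }
//         }
//         return false
//     }
//
//     func main() {
//         var f PrefixFilter
//         if err := json.Unmarshal([]byte(`{"prefixes": ["On ", "The "]}`), &f); err != nil {
//             panic(err)
//         }
//         fmt.Println(f.Apply(Record{ArticleTitle: "On Computable Numbers"})) // true
//         fmt.Println(f.Apply(Record{ArticleTitle: "Computable Numbers"}))    // false
//     }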
//
// [1] https://en.wikipedia.org/wiki/Binary_expression_tree#Boolean_expressions
//
//
// The simplest filter is one that says yes to all records:
//
//     {"DE-X": {"any": {}}}
//
// On the command line:
//
//     $ span-tag -c '{"DE-X": {"any": {}}}' < input.ldj > output.ldj
//
//
// Another, slightly more complex example: here, the ISIL "DE-14" is attached to
// a record if either of two alternatives holds, each consisting of a
// conjunction. The first says: IF "the record is from source id 55" AND "the
// record can be validated against one of the holding files given by their
// URLs", THEN "attach DE-14". The second says: IF "the record is from source
// id 49" AND "it validates against any one of the holding files given by their
// URLs" AND "the record belongs to any one of the given collections", THEN
// "attach DE-14".
//
//     {
//       "DE-14": {
//         "or": [
//           {
//             "and": [
//               {
//                 "source": [
//                   "55"
//                 ]
//               },
//               {
//                 "holdings": {
//                   "urls": [
//                     "http://www.jstor.org/kbart/collections/asii",
//                     "http://www.jstor.org/kbart/collections/as"
//                   ]
//                 }
//               }
//             ]
//           },
//           {
//             "and": [
//               {
//                 "source": [
//                   "49"
//                 ]
//               },
//               {
//                 "holdings": {
//                   "urls": [
//                     "https://example.com/KBART_DE14",
//                     "https://example.com/KBART_FREEJOURNALS"
//                   ]
//                 }
//               },
//               {
//                 "collection": [
//                   "Turkish Family Physicans Association (CrossRef)",
//                   "Helminthological Society (CrossRef)",
//                   "International Association of Physical Chemists (IAPC) (CrossRef)",
//                   "The Society for Antibacterial and Antifungal Agents, Japan (CrossRef)",
//                   "Fundacao CECIERJ (CrossRef)"
//                 ]
//               }
//             ]
//           }
//         ]
//       }
//     }
//
// It is relatively easy to add a new filter. Imagine we want to build a filter
// that only allows records that have the word "awesome" in their title.
//
// We first define a new type:
//
//     type AwesomeFilter struct{}
//
// We then implement the Apply method:
//
//     func (f *AwesomeFilter) Apply(is finc.IntermediateSchema) bool {
//         return strings.Contains(strings.ToLower(is.ArticleTitle), "awesome")
//     }
//
// That is all the filter itself needs. To use it in the configuration file, we
// still have to register it. The "unmarshalFilter" function (filter.go) acts as
// a dispatcher:
//
//     func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) {
//         switch name {
//         // Add more filters here.
//         case "any":
//             return &AnyFilter{}, nil
//         case "doi":
//             ...
//
//         // Register awesome filter. No configuration options, so no need to unmarshal.
//         case "awesome":
//             return &AwesomeFilter{}, nil
//
//         ...
//
// We can then use the filter in the JSON configuration:
//
//     {"DE-X": {"awesome": {}}}
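//
// Putting the pieces together, the following is a self-contained sketch, not
// the package's actual code. Assumptions: Record stands in for
// finc.IntermediateSchema, and this unmarshalFilter is a simplified dispatcher
// that knows only "any" and "awesome".
//
//     package main
//
//     import (
//         "encoding/json"
//         "fmt"
//         "strings"
//     )
//
//     // Record is a hypothetical stand-in for finc.IntermediateSchema.
//     type Record struct{ ArticleTitle string }
//
//     type Filter interface{ Apply(r Record) bool }
//
//     type AnyFilter struct{}
//
//     func (f *AnyFilter) Apply(r Record) bool { return true }
//
//     type AwesomeFilter struct{}
//
//     func (f *AwesomeFilter) Apply(r Record) bool {
//         return strings.Contains(strings.ToLower(r.ArticleTitle), "awesome")
//     }
//
//     // unmarshalFilter maps a filter name to a filter value (simplified).
//     func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) {
//         switch name {
//         case "any":
//             return &AnyFilter{}, nil
//         case "awesome":
//             return &AwesomeFilter{}, nil
//         }
//         return nil, fmt.Errorf("unknown filter: %s", name)
//     }
//
//     func main() {
//         // Top-level key: label (ISIL). Second-level key: filter name.
//         var config map[string]map[string]json.RawMessage
//         if err := json.Unmarshal([]byte(`{"DE-X": {"awesome": {}}}`), &config); err != nil {
//             panic(err)
//         }
//         r := Record{ArticleTitle: "An Awesome Article"}
//         for label, spec := range config {
//             for name, raw := range spec {
//                 f, err := unmarshalFilter(name, raw)
//                 if err != nil {
//                     panic(err)
//                 }
//                 if f.Apply(r) {
//                     fmt.Println("attach", label) // prints "attach DE-X"
//                 }
//             }
//         }
//     }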
//
//
// Further reading: http://theory.stanford.edu/~sergei/papers/sigmod10-index.pdf
package filter