= JSONgrep

JSONgrep is like a combination of grep, sed and awk, but for JSON. It expects one JSON string per line. It filters each line by string, parses the JSON, filters by condition, outputs arbitrary and even virtual fields, and summarizes the values. JSONgrep is built on the methods #random_sampling, #grep, #condition, #print and #summarize. These methods are made to work together like this:

  JsonGrepper.new('{"a":1, "b":3}').grep('"a":').condition('b > 2').print('b').summarize('b')

The library also ships with a command line tool, which is (for now) its primary use case:

  echo '{"a":1, "b":3}' | jsongrep --grep '"a":' --condition 'b > 2' --print 'b' --summarize 'b'

The following examples can be executed by Q.E.D.[https://github.com/rubyworks/qed]; just run @qed@ in the root directory. Before each example we reset the internal sum hash of JsonGrepper:

  Before :each do
    JsonGrepper.reset_sums!
  end

== #grep

#grep filters the whole line before it is even parsed. Because parsing is the expensive step, cutting down the number of lines beforehand can speed up processing by orders of magnitude. For really large input files, consider using command line grep and piping its output into JSONgrep instead.

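The idea of filtering before parsing can be sketched in plain Ruby. This is only an illustration, not JSONgrep's actual code: the helper +prefiltered_parse+ and the constants are hypothetical names, and the sketch assumes a simple substring test on the raw line.

```ruby
require 'json'

# Hypothetical helper (not part of JSONgrep): run a cheap substring test on
# the raw line and only parse the lines that survive it.
def prefiltered_parse(lines, needle)
  lines.each_with_object([]) do |line, parsed|
    next unless line.include?(needle) # cheap string test, no parsing yet
    parsed << JSON.parse(line)        # only matching lines are parsed
  end
end

LINES = ['{"a":1}', '{"b":2}', '{"a":3,"b":4}'].freeze

# Only the first and third line contain '"a":', so only those are parsed.
MATCHES = prefiltered_parse(LINES, '"a":')
puts MATCHES.size
```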
When processing {a: 1} and grepping for "a", the whole line is printed.

  output_of {
    JsonGrepper.new('{"a":1}').grep('"a":').condition(nil).print(nil).summarize(nil)
  }.equals '{"a":1}'

When processing {a: 1} and grepping for "b", nothing is printed.

  output_of {
    JsonGrepper.new('{"a":1}').grep('"b"').condition(nil).print(nil).summarize(nil)
  }.equals ''

== #condition

#condition is the most powerful directive. Its argument is evaluated in the context of the parsed JSON object, so the fields of the JSON object are available as local methods. Under the hood the JSON object is an OpenStruct, so you can define new virtual fields just by assigning them and use them later in #print and #summarize. The #condition *must* evaluate to true for the line to be processed any further. In that way, using 'title == "walk this way"' as a condition has roughly the same effect as using '"title":"walk this way"' as the argument for #grep. The difference is that #grep is *way* faster, while #condition is robust against variations in the format of the JSON string.

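How a condition string could be evaluated against an OpenStruct is sketched below. The use of +instance_eval+ is an assumption made for illustration (the text above only states that an OpenStruct is involved), and +evaluate_condition+ is a hypothetical helper, not a JSONgrep method.

```ruby
require 'json'
require 'ostruct'

# Hypothetical helper (not part of JSONgrep): wrap the parsed JSON in an
# OpenStruct and evaluate the condition string with that struct as self.
# Assumption: instance_eval is used; the real library may do it differently.
def evaluate_condition(json_line, condition)
  record = OpenStruct.new(JSON.parse(json_line))
  passed = record.instance_eval(condition)
  [passed ? true : false, record]
end

# Inside the condition, a and b resolve to fields of the record, and
# self.c = ... stores a new virtual field on it.
PASSED, RECORD = evaluate_condition('{"a":3, "b":5}', 'self.c = a + b ; c > 6')

puts PASSED   # the condition result as a boolean
puts RECORD.c # the virtual field is still available after evaluation
```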
A condition that matches.

  output_of {
    JsonGrepper.new('{"a":3}').grep(nil).condition('(0..5).include? a').print(nil).summarize(nil)
  }.equals '{"a":3}'

A condition that fails.

  output_of {
    JsonGrepper.new('{"a":3}').grep(nil).condition('(5..9).include? a').print(nil).summarize(nil)
  }.equals ''

A condition assigning a virtual field.

  output_of {
    JsonGrepper.new('{"a":3, "b":5}').grep(nil).condition('self.c = a + b ; c > 6').print(nil).summarize(nil)
  }.equals '{"a":3, "b":5}'

Assigning a virtual field is really powerful combined with print or summarize.

  output_of {
    JsonGrepper.new('{"a":3, "b":5}').grep(nil).condition('self.c = a + b ; c > 6').print('c').summarize(['a','c'])
    JsonGrepper.new('{"a":4, "b":4}').grep(nil).condition('self.c = a + b ; c > 6').print('c').summarize(['a','c'])
    JsonGrepper.new('{"a":3, "b":4}').grep(nil).condition('self.c = a + b ; c > 6').print('c').summarize(['a','c'])
  }.equals "c: 8
  c: 8
  c: 7"

  output_of {
    JsonGrepper.summarize
  }.equals "Summary for a:
  [#] [a]
  2 3
  1 4
  3 total

  Summary for c:
  [#] [c]
  1 7
  2 8
  3 total"

== #print

#print is the way to output the filtered lines, one by one. Pass the fields you want to output as arguments.

  output_of {
    JsonGrepper.new('{"a":1, "b":2, "c":3}').grep(nil).condition(nil).print('a','c').summarize(nil)
  }.equals 'a: 1 c: 3'

== #summarize

#summarize counts the occurrences of the values of the given fields and outputs the counts at the very end. The output is ordered by the values.

  @songs = %q{
  {"artist":"RJD2","title":"Smoke & Mirrors"}
  {"artist":"Asio Kids","title":"loom"}
  {"artist":"Asio Kids","title":"The Answer"}
  {"artist":"Kinderzimmer Productions","title":"Lights! Camera! Action!"}
  {"artist":"Pharoahe Monch","title":"The Hitman"}
  {"artist":"Looptroop","title":"feel so good"}
  {"artist":"Asio Kids","title":"Spacek"}
  {"artist":"RJD2","title":"Good Times Roll Pt 2"}
  {"artist":"Dendemann","title":"Das erste Mal"}
  {"artist":"Pharoahe Monch","title":"Clap"}
  {"artist":"Looptroop","title":"Focus w/ Freestyle"}
  {"artist":"Kinderzimmer Productions","title":"Das Gegenteil von gut ist gut gemeint"}
  {"artist":"RJD2","title":"The Horror"}
  {"artist":"Asio Kids","title":"loom"}
  {"artist":"Pharoahe Monch","title":"Assassins"}
  {"artist":"Kinderzimmer Productions","title":"wo is mein kopf"}
  {"artist":"Looptroop","title":"dont hate the player"}
  {"artist":"Dendemann","title":"Endlich Nichtschwimmer"}
  {"artist":"RJD2","title":"Good Times Roll Pt 2"}
  {"artist":"Looptroop","title":"feel so good"}
  {"artist":"Asio Kids","title":"Spacek"}
  {"artist":"Kinderzimmer Productions","title":"Lights! Camera! Action!"}
  {"artist":"Dendemann","title":"Saldo Mortale"}
  {"artist":"Pharoahe Monch","title":"Clap"}
  }

  @songs.split("\n").each do |line|
    JsonGrepper.new(line).grep('').condition('true').print(:none).summarize('artist')
  end

  output_of {
    JsonGrepper.summarize
  }.equals 'Summary for artist:
  [#] [artist]
  5 Asio Kids
  3 Dendemann
  4 Kinderzimmer Productions
  4 Looptroop
  4 Pharoahe Monch
  4 RJD2
  24 total'

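The counting behind such a summary can be approximated in a few lines of plain Ruby. This is a minimal sketch under assumptions, not JSONgrep's implementation: +summarize_field+ and +SONG_LINES+ are hypothetical names, and the ordering is assumed to be by the string form of each value.

```ruby
require 'json'

# Hypothetical helper (not part of JSONgrep): count how often each value of
# `field` occurs and return [value, count] pairs ordered by value.
def summarize_field(lines, field)
  counts = Hash.new(0)
  lines.each { |line| counts[JSON.parse(line)[field]] += 1 }
  counts.sort_by { |value, _count| value.to_s }
end

SONG_LINES = [
  '{"artist":"RJD2","title":"The Horror"}',
  '{"artist":"Dendemann","title":"Saldo Mortale"}',
  '{"artist":"RJD2","title":"Smoke & Mirrors"}'
].freeze

SUMMARY = summarize_field(SONG_LINES, 'artist')

# Print one "count value" row per distinct value, followed by the total.
SUMMARY.each { |value, count| puts "#{count} #{value}" }
puts "#{SUMMARY.sum { |_value, count| count }} total"
```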
== #random_sampling

#random_sampling is meant to speed up the processing of large data sets.

  output_of {
    JsonGrepper.new('{"a":3, "b":5}').random_sampling(1).grep(nil).condition('self.c = a + b ; c > 6').print('c').summarize(['a','c'])
  }.equals 'c: 8'

  # if this one fails, you are really lucky:
  output_of {
    JsonGrepper.new('{"a":3, "b":5}').random_sampling(1_000_000_000).grep(nil).condition('self.c = a + b ; c > 6').print('c').summarize(['a','c'])
  }.equals ''

Random sampling also works with an explicit nil.

  g = JsonGrepper.new('{"a":3, "b":5}').random_sampling(nil).grep(nil).condition('self.c = a + b ; c > 6').print('c').summarize(['a','c'])
  output_of { JsonGrepper.summarize(nil) }

JsonGrepper.summarize then also takes a random sampling argument and multiplies the respective occurrences by the sample rate.

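The interplay of sampling and scaling can be sketched as follows. All names here (+sample+, +scaled_counts+, the constants) are hypothetical, and the sketch assumes that a rate of n means "keep roughly one line in n" and that nil or 1 means "keep everything"; the library's exact semantics may differ.

```ruby
# Hypothetical helper (not part of JSONgrep): keep roughly one line in `rate`.
# Assumption: nil or a rate of 1 disables sampling.
def sample(lines, rate, rng = Random.new)
  return lines if rate.nil? || rate <= 1
  lines.select { rng.rand(rate).zero? } # rand(rate) is 0 with probability 1/rate
end

# Hypothetical helper: count the sampled values and multiply each count by
# the sample rate to estimate the counts in the full data set.
def scaled_counts(values, rate)
  factor = rate || 1
  values.tally.transform_values { |count| count * factor }
end

VALUES = Array.new(1000) { |i| i.even? ? 'x' : 'y' }

# A seeded generator keeps the sketch reproducible.
SAMPLED = sample(VALUES, 10, Random.new(42))
ESTIMATE = scaled_counts(SAMPLED, 10)
puts ESTIMATE.inspect
```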