# ReVerb

ReVerb is a program that automatically identifies and extracts binary relationships from English sentences. ReVerb is designed for Web-scale information extraction, where the target relations cannot be specified in advance and speed is important.

ReVerb takes raw text as input, and outputs (argument1, relation phrase, argument2) triples. For example, given the sentence "Oranges are high in vitamin C," ReVerb will extract the triple (oranges, are high in, vitamin c).

More information is available at the ReVerb homepage: <http://reverb.cs.washington.edu>

## Quick Start
If you want to run ReVerb on a small amount of text without modifying its source code, we provide an executable jar file that can be run from the command line. Follow these steps to get started:

1.  Download the latest ReVerb jar from <http://reverb.cs.washington.edu/reverb-latest.jar>

2.  Run `java -Xmx512m -jar reverb.jar yourfile.txt`, replacing `reverb.jar` with the name of the jar you downloaded.

3.  Run `java -Xmx512m -jar reverb.jar -h` for more options.

## Building
Building ReVerb from source requires Apache Maven (<http://maven.apache.org>). Run this command to download the required dependencies, compile, and create a single executable jar file:

    mvn clean compile assembly:single

## Command Line Interface
Once you have built ReVerb, you can run it from the command line.

The command line interface to ReVerb takes plain text or HTML as input and produces a tab-separated table. Each row represents a single extracted (argument1, relation phrase, argument2) triple, plus metadata. The output has the following columns:

1. The filename (or `stdin` if the source is standard input).
2. The sentence number this extraction came from.
3. Argument1 words, space-separated.
4. Relation phrase words, space-separated.
5. Argument2 words, space-separated.
6. The start index of argument1 in the sentence. Indices count words from 0, so if the value is `i`, the first word of argument1 is the `(i+1)`th word in the sentence.
7. The end index of argument1 in the sentence. The end index is exclusive: if the value is `j`, the last word of argument1 is the `j`th word in the sentence.
8. The start index of the relation phrase.
9. The end index of the relation phrase.
10. The start index of argument2.
11. The end index of argument2.
12. The confidence that this extraction is correct. The higher the number, the more trustworthy the extraction.
13. The words of the sentence this extraction came from, space-separated.
14. The part-of-speech tags for the sentence words, space-separated.
15. The chunk tags for the sentence words, space-separated. These represent a shallow parse of the sentence.

For example:

    $ echo "Olympia is the capital city of Washington." | ./bin/reverb -q -s | tr '\t' '\n' | cat -n
         1  stdin
         2  1
         3  Olympia
         4  is the capital city of
         5  Washington
         6  0
         7  1
         8  1
         9  6
        10  6
        11  7
        12  0.9999999999644988
        13  Olympia is the capital city of Washington .
        14  NNP VBZ DT NN NN IN NNP .
        15  B-NP B-VP B-NP I-NP I-NP I-NP I-NP O

For a list of options to the command line interface to ReVerb, run `./bin/reverb -h`.

### Examples

#### Running ReVerb on a small set of files
    ./bin/reverb file1 file2 file3 ...

#### Running ReVerb on standard input
    ./bin/reverb < input

#### Running ReVerb on HTML files
The `--strip-html` flag (short version: `-s`) removes tags from the input before running ReVerb.

    ./bin/reverb --strip-html myfile.html

#### Running ReVerb on a list of files
You may have an entire directory structure that you want to run ReVerb on. ReVerb takes approximately 10 seconds to initialize, so starting a new process for each file is impractical. To pass ReVerb a list of paths on standard input, use the `-f` switch:

    # Run ReVerb on all files under mydir/
    find mydir/ -type f | ./bin/reverb -f

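Because the first output column records the source filename and the twelfth records the confidence, output from a multi-file run can be summarized with standard tools. The sketch below is one possible post-processing step, not a ReVerb feature: `row`, `filter_confident`, and `count_per_file` are hypothetical helper names, the threshold `0.8` is arbitrary, and the rows are made-up placeholders that only mimic the 15-column layout.

```shell
#!/bin/sh
# Emit one placeholder row in the 15-column layout: row FILENAME CONFIDENCE.
# The row contents are invented for illustration only.
row() {
  printf '%s\t1\tX\trel\tY\t0\t1\t1\t2\t2\t3\t%s\tX rel Y\tNN VB NN\tB-NP B-VP B-NP\n' "$1" "$2"
}

# Keep only rows whose confidence (column 12) is at least the given threshold.
filter_confident() {
  awk -F '\t' -v min="$1" '$12 + 0 >= min + 0'
}

# Count the remaining extractions per source file (column 1).
count_per_file() {
  awk -F '\t' '{ n[$1]++ } END { for (f in n) printf "%s\t%d\n", f, n[f] }' | sort
}

{ row a.txt 0.91; row a.txt 0.42; row b.txt 0.87; } | filter_confident 0.8 | count_per_file
```

Assuming ReVerb writes its extractions to standard output, the same two filters could be appended to the command above: `find mydir/ -type f | ./bin/reverb -f | filter_confident 0.8 | count_per_file`.
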
## Java Interface
To include ReVerb as a library in your own project, please take a look at the example class `ReVerbExample` in the `src/main/java/edu/washington/cs/knowitall/examples` directory.

When running code that calls ReVerb, make sure to increase the Java Virtual Machine heap size by passing the argument `-Xmx512m` to `java`. ReVerb loads multiple models into memory, and will be significantly slower if the heap size is not large enough.

## Using Eclipse
To modify the ReVerb source code in Eclipse, use Apache Maven to create the appropriate project files:

    mvn eclipse:eclipse

Then start Eclipse and navigate to File > Import. Under General, select "Existing Projects into Workspace", and point Eclipse to the main ReVerb directory.

## Retraining the Confidence Function

## Help and Contact
For more information, please visit the ReVerb homepage at the University of Washington: <http://reverb.cs.washington.edu>.

## Contributors
* Anthony Fader (afader at cs.washington.edu)
* Michael Schmitz (schmmd at cs.washington.edu)
* Robert Bart (rbart at cs.washington.edu)
* Janara Christensen (janara at cs.washington.edu)
* Niranjan Balasubramanian (getniranj at yahoo.com)
* Jonathan Berant (jonatha6 at post.tau.ac.il)

## Citing ReVerb
If you use ReVerb in your academic work, please cite ReVerb with the following BibTeX citation:

    @inproceedings{ReVerb2011,
      author    = {Anthony Fader and Stephen Soderland and Oren Etzioni},
      title     = {Identifying Relations for Open Information Extraction},
      booktitle = {Proceedings of the Conference on Empirical Methods
                   in Natural Language Processing ({EMNLP} '11)},
      year      = {2011},
      month     = {July 27-31},
      address   = {Edinburgh, Scotland, UK}
    }