English Knowledge Resources

###SQL-Based Resources

@TODO: The download links for the resources still point to the old BIUTEE webpage. They should point to the new Maven repository.

Some knowledge resources are stored as MySQL tables, provided as compressed .sql files. In order to use them:

1. Download the resources from the links in the table below. Each file represents one MySQL schema and may contain several knowledge resources. You don't need to download them all; download only the schema files containing the resources you wish to use.
2. Install MySQL, a free SQL server.
3. Install its administration tool, MySQL Workbench.
4. Run the server.
5. Connect to the server via MySQL Workbench, and in it:
   * Create a user named db_readonly with the password BIUTEE: *Users and Privileges --> Add Account*
   * Import the schema files into the database: *Data Import/Restore --> Import from Dump Project Folder --> (input folder path containing uncompressed .sql files) --> Load Folder Contents --> (select all required schemas) --> Start Import*
   * Make sure the user db_readonly has read (SELECT) privileges on all of the tables in the imported schemas.
6. Define an environment variable named MYSQL whose value is the MySQL server address (name or IP address) and port, for example: dbsql.cs.biu.ac.il:3306.
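
The last two steps can be sketched in Java. This is a minimal illustration, not the platform's own code: it parses a host:port value like the one stored in MYSQL, builds a standard MySQL JDBC URL from it, and prints the SQL equivalent of granting read access to db_readonly. The class name, the schema name example_schema, and the '%' host part of the account are assumptions made for the example; how the platform itself reads the variable is not shown here.

```java
public class MysqlEnvSketch {

    /** Host and port taken from a value such as "dbsql.cs.biu.ac.il:3306". */
    static final class Address {
        final String host;
        final int port;

        Address(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Splits "host:port" at the last colon and validates both parts. */
    static Address parseAddress(String value) {
        if (value == null) {
            throw new IllegalArgumentException("MYSQL environment variable is not set");
        }
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got: " + value);
        }
        String host = value.substring(0, colon).trim();
        int port = Integer.parseInt(value.substring(colon + 1).trim());
        return new Address(host, port);
    }

    /** Builds a JDBC URL of the form jdbc:mysql://host:port/schema. */
    static String jdbcUrl(Address address, String schema) {
        return "jdbc:mysql://" + address.host + ":" + address.port + "/" + schema;
    }

    /** SQL equivalent of giving db_readonly SELECT rights on one schema (host '%' is an assumption). */
    static String grantSelect(String schema) {
        return "GRANT SELECT ON `" + schema + "`.* TO 'db_readonly'@'%';";
    }

    public static void main(String[] args) {
        // Falls back to the example value from the text when MYSQL is not defined.
        String value = System.getenv().getOrDefault("MYSQL", "dbsql.cs.biu.ac.il:3306");
        Address address = parseAddress(value);
        // "example_schema" is a hypothetical schema name used only for illustration.
        System.out.println(jdbcUrl(address, "example_schema"));
        System.out.println(grantSelect("example_schema"));
    }
}
```

Running the GRANT statement once per imported schema is an alternative to setting the privileges through MySQL Workbench.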

| Schema Name | Knowledge Resources in Configuration | Schema Download | File Size (Compressed) |
|---|---|---|---|
| BAP (Directional Similarity) | BAP | Download | 111 MB |
| Lin Similarity | LIN_DEPENDENCY_ORIGINAL, LIN_PROXIMITY_ORIGINAL | Download | 236 MB |
| Original DIRT | ORIG_DIRT | Download | 55 MB |
| Wikipedia Knowledge Resource | WIKIPEDIA | Download | 214 MB |
| Binary Lin, Dependency Reuters | BINARY_LIN, LIN_DEPENDENCY_REUTERS | Download | 2.4 GB |
| Framenet | FRAMENET | Download | 228 KB |
| Geo (Geographical Knowledge Resource) | GEO | Download | 1.4 MB |
| ReVerb (Distributional Similarity with Global Constraints) | REVERB | Download | 161 MB |
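
Since only the schemas containing the resources in use need to be downloaded, the table above can be read as a lookup from configuration resource name to schema. The following is a small sketch of that lookup; the class and method names are hypothetical, and the mapping is copied from the table rather than taken from any platform API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemaLookupSketch {

    /** Configuration resource name -> schema name, as listed in the table. */
    static final Map<String, String> SCHEMA_BY_RESOURCE = new LinkedHashMap<>();

    static {
        SCHEMA_BY_RESOURCE.put("BAP", "BAP (Directional Similarity)");
        SCHEMA_BY_RESOURCE.put("LIN_DEPENDENCY_ORIGINAL", "Lin Similarity");
        SCHEMA_BY_RESOURCE.put("LIN_PROXIMITY_ORIGINAL", "Lin Similarity");
        SCHEMA_BY_RESOURCE.put("ORIG_DIRT", "Original DIRT");
        SCHEMA_BY_RESOURCE.put("WIKIPEDIA", "Wikipedia Knowledge Resource");
        SCHEMA_BY_RESOURCE.put("BINARY_LIN", "Binary Lin, Dependency Reuters");
        SCHEMA_BY_RESOURCE.put("LIN_DEPENDENCY_REUTERS", "Binary Lin, Dependency Reuters");
        SCHEMA_BY_RESOURCE.put("FRAMENET", "Framenet");
        SCHEMA_BY_RESOURCE.put("GEO", "Geo (Geographical Knowledge Resource)");
        SCHEMA_BY_RESOURCE.put("REVERB", "ReVerb (Distributional Similarity with Global Constraints)");
    }

    /** Returns the distinct schemas needed for the given resources, in request order. */
    static Set<String> schemasFor(List<String> resources) {
        Set<String> schemas = new LinkedHashSet<>();
        for (String resource : resources) {
            String schema = SCHEMA_BY_RESOURCE.get(resource);
            if (schema == null) {
                throw new IllegalArgumentException("Unknown resource: " + resource);
            }
            schemas.add(schema);
        }
        return schemas;
    }

    public static void main(String[] args) {
        // Two resources from the same schema plus one from another: two downloads are enough.
        System.out.println(schemasFor(Arrays.asList("BINARY_LIN", "LIN_DEPENDENCY_REUTERS", "GEO")));
    }
}
```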

###Redis-based Resources

####Distributional Similarity

Distribution:

* Redis database files
* License: MIT license

#####Lexical

Java interface: SimilarityStorageBasedLexicalResource

######Lin proximity-based

Distributional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying Lin's method [Lin 1998] on the Reuters RCV1 and RCV2 corpora, without dependency-based features. Top 1000 similarities were selected for each element.

About 57M rules.

Download
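
To make the construction of such rules concrete, here is a minimal sketch of Lin's similarity measure [Lin 1998] followed by a top-K cut-off. It is an illustration under stated assumptions, not the code used to build the resource: each element is represented by a toy map from feature to weight (in Lin's method the weights are mutual-information scores, and only positive ones are counted), and the class name, feature names, and values are invented for the example. The score is the summed weight of the shared features divided by the summed weight of all features of both elements.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinSimilaritySketch {

    /** Lin similarity over feature -> weight maps; only positive weights are counted. */
    static double lin(Map<String, Double> a, Map<String, Double> b) {
        double shared = 0.0;
        double total = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double wa = e.getValue();
            if (wa <= 0) {
                continue;
            }
            total += wa;
            Double wb = b.get(e.getKey());
            if (wb != null && wb > 0) {
                shared += wa + wb; // a shared feature contributes both weights
            }
        }
        for (double wb : b.values()) {
            if (wb > 0) {
                total += wb;
            }
        }
        return total == 0.0 ? 0.0 : shared / total;
    }

    /** Scores every other element against the target and keeps the k best, highest first. */
    static List<Map.Entry<String, Double>> topK(
            String target, Map<String, Map<String, Double>> vectors, int k) {
        Map<String, Double> targetVector = vectors.get(target);
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : vectors.entrySet()) {
            if (e.getKey().equals(target)) {
                continue;
            }
            double score = lin(targetVector, e.getValue());
            if (score > 0) {
                scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), score));
            }
        }
        scored.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return new ArrayList<>(scored.subList(0, Math.min(k, scored.size())));
    }

    public static void main(String[] args) {
        // Toy feature vectors; the words, features, and weights are made up.
        Map<String, Map<String, Double>> vectors = new HashMap<>();
        Map<String, Double> car = new HashMap<>();
        car.put("drive", 2.0);
        car.put("road", 1.0);
        Map<String, Double> truck = new HashMap<>();
        truck.put("drive", 2.0);
        truck.put("load", 1.0);
        Map<String, Double> banana = new HashMap<>();
        banana.put("eat", 3.0);
        vectors.put("car", car);
        vectors.put("truck", truck);
        vectors.put("banana", banana);
        // The resource keeps the top 1000 per element; k = 1 here keeps the toy output short.
        System.out.println(topK("car", vectors, 1));
    }
}
```

In the proximity-based resource the features are not dependency-based, while the dependency-based resource uses dependency features; the scoring sketch is the same in both cases and only the feature extraction differs.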

######Lin dependency-based

Distributional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying Lin's method [Lin 1998] on the Reuters RCV1 and RCV2 corpora, with dependency-based features. Top 1000 similarities were selected for each element.

About 58M rules.

Download

######Directional similarities, Reuters

Directional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] on the Reuters RCV1 and RCV2 corpora, with dependency-based features. Top 1000 similarities were selected for each element.

About 53M rules for left side, and about 43M rules for right side.

Download

######Directional similarities, ukWaC

Directional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] on the English ukWaC corpus, with dependency-based features. Top 1000 similarities were selected for each element.

About 21M rules for left side, and about 33M rules for right side.

Download

#####Syntactic

Java interface: SimilarityStorageBasedDIRTSyntacticResource

######DIRT, Reuters, Redis-based

Distributional similarity rules for English dependency paths (which appear at least 100 times in the corpus). The similarities were calculated by applying the DIRT method [Lin 1998] on the Reuters RCV1 and RCV2 corpora. Top 1000 similarities were selected for each element.

About 10M rules.

Download

######Distributional Similarity based on ReVerb dataset, Redis-based

Distributional similarity rules for English predicates, based on ReVerb extractions [Fader et al. 2011].

Download

####Wikipedia

Distribution:

* Redis database file
* License: MIT license
* Java interface: eu.excitementproject.eop.lexicalminer.redis.RedisBasedWikipediaLexicalResource
* Download

####Geo (Geographical Knowledge Resource)

Distribution:

* Redis database file
* License: MIT license
* Java interface: eu.excitementproject.eop.core.component.lexicalknowledge.geo.RedisBasedGeoLexicalResource
* Download
