
Query in Information Retrieval (IR)

Le Thu Nguyen edited this page Mar 28, 2018 · 1 revision

Types of queries

Keyword-based query:

A query is composed of keywords, and the documents containing those keywords are searched for.

+Single-word query: the simplest kind of query, consisting of one word

+Context query: words searched for in a given context, i.e. near other words, as in phrase and proximity queries

+Boolean query: basic queries (atoms) combined with Boolean operators (AND, OR, BUT, where BUT means AND NOT)

+Natural-language query: an enumeration of words and context queries that are of interest to the user.
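The single-word and context queries above can be illustrated with a small Python sketch. It assumes each document is a single string split into lowercase alphanumeric words by a hypothetical `tokenize` helper; the function names and the proximity window `k` are illustrative choices, not part of the source.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens: list[str], word: str) -> list[int]:
    """Return every token offset at which `word` occurs."""
    return [i for i, t in enumerate(tokens) if t == word]


def single_word_match(text: str, word: str) -> bool:
    """Single-word query: does the document contain the word at all?"""
    return word.lower() in tokenize(text)


def phrase_match(text: str, phrase: str) -> bool:
    """Phrase query: the query words must occur consecutively and in order."""
    tokens, words = tokenize(text), tokenize(phrase)
    n = len(words)
    return n > 0 and any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def proximity_match(text: str, w1: str, w2: str, k: int) -> bool:
    """Proximity query: w1 and w2 occur within k tokens of each other, in either order."""
    tokens = tokenize(text)
    return any(abs(i - j) <= k
               for i in positions(tokens, w1.lower())
               for j in positions(tokens, w2.lower()))


doc = "Information retrieval finds relevant documents in large collections."
print(single_word_match(doc, "retrieval"))                  # True
print(phrase_match(doc, "information retrieval"))           # True
print(phrase_match(doc, "retrieval information"))           # False
print(proximity_match(doc, "information", "documents", 4))  # True
```

The phrase check is the strictest context query: it is a proximity query with distance exactly one between each consecutive pair of words and a fixed order.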

Pattern matching:

Pattern matching uses the concept of a pattern (a set of syntactic features that must be found in a text segment) to retrieve pieces of text that have some property.

Types of patterns include words, prefixes, suffixes, substrings, ranges, and regular expressions.
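A minimal sketch of these pattern types applied to a small word list. The vocabulary and the `match_*` helper names are hypothetical; a range is assumed to mean an inclusive lexicographic range, and regular expressions use Python's `re.fullmatch`, so the whole word must match.

```python
import re

# A hypothetical vocabulary to match patterns against.
VOCAB = ["retrieval", "retrieve", "index", "indexing", "reindex", "query", "queries"]


def match_word(vocab, word):
    return [w for w in vocab if w == word]


def match_prefix(vocab, prefix):
    return [w for w in vocab if w.startswith(prefix)]


def match_suffix(vocab, suffix):
    return [w for w in vocab if w.endswith(suffix)]


def match_substring(vocab, sub):
    return [w for w in vocab if sub in w]


def match_range(vocab, low, high):
    # Inclusive lexicographic range, e.g. all words between "index" and "query".
    return [w for w in vocab if low <= w <= high]


def match_regex(vocab, pattern):
    # fullmatch: the whole word must match the expression, not just a part of it.
    return [w for w in vocab if re.fullmatch(pattern, w)]


print(match_prefix(VOCAB, "retriev"))        # ['retrieval', 'retrieve']
print(match_suffix(VOCAB, "index"))          # ['index', 'reindex']
print(match_substring(VOCAB, "dex"))         # ['index', 'indexing', 'reindex']
print(match_range(VOCAB, "index", "query"))  # ['index', 'indexing', 'query', 'queries']
print(match_regex(VOCAB, r"quer(y|ies)"))    # ['query', 'queries']
```

A linear scan like this is only for illustration; real systems answer prefix and range patterns from a sorted dictionary or tree so they do not touch every term.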

The differences between Boolean retrieval, wildcard queries, and phrase queries

  1. Boolean queries:

Exact match: a document either satisfies the Boolean expression or it does not, so the results are not ranked.
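To illustrate exact match, here is a minimal sketch of Boolean retrieval over a toy inverted index, assuming lowercase whitespace tokenization. Each atom looks up one term's set of document IDs, and the operators are plain set operations (BUT is set difference, i.e. AND NOT). The documents and function names are hypothetical.

```python
DOCS = {
    1: "information retrieval with boolean queries",
    2: "ranked retrieval with the vector space model",
    3: "boolean logic and set operations",
}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each whitespace-separated term to the set of document IDs containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def atom(term: str) -> set[int]:
    """An atom: the documents containing a single term (a copy of its postings)."""
    return set(INDEX.get(term.lower(), ()))


def and_query(a: set[int], b: set[int]) -> set[int]:
    return a & b  # documents in both operands


def or_query(a: set[int], b: set[int]) -> set[int]:
    return a | b  # documents in either operand


def but_query(a: set[int], b: set[int]) -> set[int]:
    return a - b  # BUT = AND NOT: in a, but not in b


print(and_query(atom("boolean"), atom("retrieval")))  # {1}
print(or_query(atom("boolean"), atom("retrieval")))   # {1, 2, 3}
print(but_query(atom("retrieval"), atom("boolean")))  # {2}
```

The result is an unordered set of document IDs: every matching document is equally "relevant", which is exactly the exact-match behavior described above.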

  2. Wildcard queries:

Used when a word has several accepted spellings, or when the user is unsure of the spelling, e.g. ar or bar*l.

Each pattern must be mapped to the matching term(s) in the vocabulary, which can make query execution expensive.
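One way to map a wildcard pattern to vocabulary terms is a k-gram index, as described by Manning et al. (2009). The sketch below assumes bigrams (k = 2), marks word boundaries with `$`, and only supports the `*` wildcard; the vocabulary and function names are hypothetical. Intersecting k-gram postings can return false positives, so the candidates are post-filtered against the original pattern.

```python
import fnmatch
from collections import defaultdict


def kgrams(s: str, k: int = 2) -> set[str]:
    """All contiguous substrings of length k."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_kgram_index(vocab, k: int = 2) -> dict[str, set[str]]:
    """Map each k-gram of '$term$' ('$' marks the word boundaries) to the terms containing it."""
    index = defaultdict(set)
    for term in vocab:
        for gram in kgrams("$" + term + "$", k):
            index[gram].add(term)
    return index


def wildcard_lookup(query: str, index, k: int = 2) -> list[str]:
    """Expand a '*' wildcard query into the matching vocabulary terms."""
    # Collect the k-grams of each fixed piece of the query; the '$' markers
    # record that the query is anchored at the start and end of the word.
    grams = set()
    for piece in ("$" + query + "$").split("*"):
        grams |= kgrams(piece, k)
    postings = [index.get(g, set()) for g in grams]
    if postings:
        candidates = set.intersection(*postings)
    else:  # e.g. the query "*": no k-grams, so every term is a candidate
        candidates = set().union(*index.values())
    # Intersecting k-grams can over-generate (red* also yields "retired"),
    # so post-filter the candidates against the original pattern.
    return sorted(t for t in candidates if fnmatch.fnmatchcase(t, query))


VOCAB = ["retrieval", "retired", "red", "reduce", "rental", "removal"]
INDEX = build_kgram_index(VOCAB)
print(wildcard_lookup("re*al", INDEX))  # ['removal', 'rental', 'retrieval']
print(wildcard_lookup("red*", INDEX))   # ['red', 'reduce']
```

The post-filtering step is where some of the extra cost of wildcard queries shows up: every candidate term must be checked, and each surviving term then adds its own postings list to the query.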

  3. Phrase queries:

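As with wildcard queries, phrase queries need processing beyond a plain vector-space lookup. The representation of documents as vectors is lossy: the relative order of terms in a document is lost when the document is encoded as a vector, so a phrase cannot be matched from the vectors alone.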
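A positional index is one standard way to keep the term order that a vector discards (Manning et al., 2009): for each term it stores the positions at which the term occurs in each document. The sketch below assumes lowercase whitespace tokenization; the documents and function names are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs: dict[int, str]):
    """term -> {doc_id -> [positions]}, using lowercase whitespace tokenization."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_query(phrase: str, index) -> list[int]:
    """Return the IDs of documents containing the words of `phrase` consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents contain every term (all that a bag of words can tell us).
    candidates = set.intersection(*(set(index[t]) for t in terms))
    hits = []
    for doc_id in sorted(candidates):
        pos_sets = [set(index[t][doc_id]) for t in terms]
        # The phrase occurs if term i appears at position start + i for some start.
        if any(all(start + i in pos_sets[i] for i in range(len(terms)))
               for start in pos_sets[0]):
            hits.append(doc_id)
    return hits


DOCS = {1: "stanford university is in california", 2: "the university of stanford"}
INDEX = build_positional_index(DOCS)
print(phrase_query("stanford university", INDEX))     # [1]
print(phrase_query("university of stanford", INDEX))  # [2]
```

Both documents contain "stanford" and "university", so their term vectors agree on those terms; only the stored positions show that document 2 does not contain the phrase.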

Some techniques that can improve the efficiency of scoring and ranking in search systems:

  • Cluster pruning

  • Tiered indexes

  • Query-term proximity

  • Designing parsing and scoring functions

  • Vector space scoring and query operator interaction for free text queries
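As an example of one technique from this list, cluster pruning (as described by Manning et al., 2009) preprocesses the collection by picking about √N documents at random as leaders and attaching every document to its nearest leader; at query time only the leader closest to the query and its followers are scored. The sketch below assumes raw term-frequency vectors and cosine similarity; the documents, the seed, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def vectorize(text: str) -> Counter:
    """Raw term-frequency vector (an assumption; tf-idf weights would also work)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_clusters(vectors: dict[int, Counter], seed: int = 0) -> dict[int, list[int]]:
    """Preprocessing: pick about sqrt(N) random leaders, attach each document to its nearest leader."""
    rng = random.Random(seed)
    leaders = rng.sample(sorted(vectors), max(1, math.isqrt(len(vectors))))
    clusters = {leader: [] for leader in leaders}
    for doc_id, vec in vectors.items():
        nearest = max(leaders, key=lambda lead: cosine(vec, vectors[lead]))
        clusters[nearest].append(doc_id)
    return clusters


def pruned_search(query: str, vectors, clusters, k: int = 3) -> list[int]:
    """Query time: score only the members of the cluster whose leader is closest to the query."""
    q = vectorize(query)
    best = max(clusters, key=lambda lead: cosine(q, vectors[lead]))
    return sorted(clusters[best], key=lambda d: cosine(q, vectors[d]), reverse=True)[:k]


DOCS = {
    1: "boolean retrieval model",
    2: "boolean query operators",
    3: "vector space model ranking",
    4: "cosine similarity ranking",
}
VECTORS = {doc_id: vectorize(text) for doc_id, text in DOCS.items()}
CLUSTERS = build_clusters(VECTORS)
print(pruned_search("vector space ranking", VECTORS, CLUSTERS))
```

The saving comes from computing about √N leader similarities plus one cluster's worth of scores instead of N scores; the price is that relevant documents attached to other leaders can be missed, and which documents those are depends on the randomly sampled leaders.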

Reference:

Manning, C. D., Raghavan, P., & Schütze, H. (2009). Introduction to Information Retrieval (Online ed.). Cambridge, UK: Cambridge University Press. Available at http://nlp.stanford.edu/IR-book/information-retrieval-book.html