cirrina-search

What is a minimum span text search?

One way to characterise the relevance of a search result is the span of text in which the search terms are found. A large span means that the search terms are scattered sparsely around that spot, so the result is probably not very relevant or useful.
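To make "span" concrete, here is a minimal sketch. It assumes the text has been split into words and that each found search term is identified by its word position. The span is then the distance from the first to the last chosen position, counted inclusively. The function name `span` and the sample positions are hypothetical.

```python
def span(positions):
    """Return the span (number of words) covered by a set of word positions."""
    return max(positions) - min(positions) + 1

# Word positions where each search term was found (hypothetical data)
tight = [10, 12, 13]       # terms close together
scattered = [3, 150, 410]  # terms far apart

print(span(tight))      # 4
print(span(scattered))  # 408
```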

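Minimum span text search looks for the combination of search term occurrences that minimises the span covering all of the terms. For example, if one term occurs at words 2 and 40 and the other only at word 38, the minimum span runs from word 38 to word 40 rather than from word 2 to word 38. This makes the results more meaningful (except in some pathological cases).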
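Finding the minimum span can be sketched with a classic sliding-window search over the term occurrences. This is a minimal sketch, not Cirrina's actual implementation. It assumes the text is already split into word tokens and that terms match whole tokens case-insensitively. The name `minimum_span` and the sample sentence are hypothetical.

```python
from collections import Counter

def minimum_span(tokens, terms):
    """Find the shortest token window containing every term at least once.

    Returns (start, end) token indices, inclusive, or None if some term
    never occurs.
    """
    wanted = {t.lower() for t in terms}
    # (position, term) pairs for every occurrence of a search term, in text order
    hits = [(i, tok.lower()) for i, tok in enumerate(tokens) if tok.lower() in wanted]

    counts = Counter()  # occurrences of each term inside the current window
    covered = 0         # number of distinct terms currently in the window
    best = None
    left = 0
    for right in range(len(hits)):
        term = hits[right][1]
        counts[term] += 1
        if counts[term] == 1:
            covered += 1
        # Shrink from the left while the window still covers all terms
        while covered == len(wanted):
            start, end = hits[left][0], hits[right][0]
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
            left_term = hits[left][1]
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best

tokens = "the cat sat on the mat while the dog slept near the cat".split()
print(minimum_span(tokens, ["cat", "dog"]))  # (8, 12): "dog slept near the cat"
```

Each occurrence enters and leaves the window at most once, so after collecting the hits the search is linear in the number of occurrences.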

Why should I use it?

Usually full text search results are just occurrences of the search terms (perhaps augmented with a score that depends on whether the hit was a whole word or only part of one). Especially when the search terms are somewhat vague, or when the subject the user is looking for has no very specific vocabulary associated with it, simple search term matching does not provide results as accurately as needed.

When searching with multiple search terms, it becomes even more important that the system ranks the results in a meaningful and preferably well-defined way. Minimum span text search provides a way to improve the search results so that they are more useful to the user.
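One way to get such a well-defined ranking is sketched below. It assumes each candidate passage is scored by the length of its minimum span, with passages that lack a term dropped. The names `min_span_length` and `rank` and the sample passages are hypothetical. The brute-force search over all occurrence combinations is chosen for brevity, not speed.

```python
from itertools import product

def min_span_length(tokens, terms):
    """Length of the smallest window covering all terms, or None if a term is missing.

    Brute force over every combination of occurrences; fine for short passages.
    """
    lowered = [tok.lower() for tok in tokens]
    positions = [[i for i, tok in enumerate(lowered) if tok == term.lower()]
                 for term in terms]
    if any(not p for p in positions):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))

def rank(passages, terms):
    """Order passage names by minimum span, smallest (most focused) first."""
    scored = []
    for name, text in passages.items():
        length = min_span_length(text.split(), terms)
        if length is not None:  # drop passages missing a term
            scored.append((length, name))
    return [name for length, name in sorted(scored)]

passages = {  # hypothetical example passages
    "a": "the dog barked at the cat",
    "b": "a cat slept all day long and later a dog came by",
    "c": "only a cat is mentioned here",
    "d": "cat and dog",
}
print(rank(passages, ["cat", "dog"]))  # ['d', 'a', 'b']
```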

Example

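from Cirrina import Cirrina

# Read the document to be searched
with open("....") as f:
    text = f.read()

# Parse text into a searchable corpus
search = Cirrina(text)

# Search for a minimum span that contains all given search terms
# (hypothetical terms; passing a list of words is an assumption)
search_terms = ["minimum", "span"]
result = search(search_terms)

# Show the lines covered by the span; result[0] and result[1]
# are the first and last line numbers (inclusive)
lines = text.splitlines()
print("\n".join(lines[result[0] : result[1] + 1]))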
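The slicing above treats the result as a pair of first and last line indices. The following stand-in shows what such a line-level search could look like without installing Cirrina. It is a hypothetical sketch, not Cirrina's code. It assumes whitespace-separated, case-insensitive word matching, and the name `line_span` and the sample text are illustrative only.

```python
def line_span(text, terms):
    """Return (first_line, last_line) of the fewest consecutive lines that
    together contain every search term, or None if a term never occurs."""
    wanted = {t.lower() for t in terms}
    if not wanted:
        return None
    # Search terms present on each line
    line_words = [set(line.lower().split()) & wanted for line in text.splitlines()]
    best = None
    for start in range(len(line_words)):
        found = set()
        for end in range(start, len(line_words)):
            found |= line_words[end]
            if found == wanted:  # shortest window starting at this line
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break
    return best

text = "intro line\nthe cat sat here\nnothing relevant\na dog barked\nthe end"
result = line_span(text, ["cat", "dog"])
lines = text.splitlines()
print(result)  # (1, 3)
print("\n".join(lines[result[0] : result[1] + 1]))
```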

mgronhol/cirrina-search

Folders and files

Latest commit

History

Repository files navigation

cirrina-search

What is a minimum span text search?

Why should I use it?

Example

About

Resources

License

Stars

Watchers

Forks

Languages