ePub text search #47

Closed
dghag opened this issue Sep 16, 2013 · 9 comments

Comments


dghag commented Sep 16, 2013

How can I implement an "ePub text search" feature?

Author

dghag commented Sep 18, 2013

Could anyone at least tell me whether "text search" is possible or not?

Contributor

fchasen commented Sep 19, 2013

For the immediate future, search isn't possible, though we are planning to develop a backend to enable it.

It will likely be separate from the ePub.js code, as it will require Python.


dkurth commented Oct 22, 2013

I wonder if you could build the search indexes using lunr.js. It already supports tokenization, stemming, and stop words. I haven't used it for anything serious, so I don't know about performance, and building the indexes on the client is not optimal, so just a thought.
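
To make the idea concrete, here is a rough Python sketch of the three steps mentioned above (tokenization, stop words, stemming) feeding a small inverted index. It does not use lunr.js: the stop-word list, the `stem` suffix stripper, and all function names are illustrative assumptions, and a real stemmer would cover many more cases.

```python
import re
from collections import defaultdict

# Illustrative stop-word list, far from complete.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into alphanumeric words, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(documents):
    """documents maps a document id to its text; returns term -> set of ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))
```

For a book, each document id would presumably be a chapter file. Building the index this way means reading every chapter up front, which is the client-side cost raised above.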

Contributor

fchasen commented Jan 7, 2014

@AJRenold - could you update this thread with the search progress and close?

Contributor

AJRenold commented Jan 8, 2014

We're developing a lightweight indexing and search tool, epubjs-search (https://github.com/futurepress/epubjs-search), which allows an entire epub file to be searched via a search API. The reader plugin search.js (https://github.com/futurepress/epub.js/blob/master/reader/plugins/search.js#L4) points to the API endpoint, where you must be running a search server that holds the index of the same book. This model only supports indexing and searching a single book.

I'll update the epubjs-search readme today with better instructions on how to use the library.

AJRenold closed this as completed Jan 8, 2014


grayxr commented Jan 15, 2014

Please update the epubjs-search readme. Thanks!
@AJRenold @fchasen

By the way, would a reasonable alternative be to just use the unzipped/decompressed epub folder and run a search query that iterates over all of the epub's '.xhtml' files?
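
A minimal sketch of that approach in Python, assuming the epub has already been unzipped to a local folder. The function name `search_unzipped_epub` and the excerpt width are made up for illustration, and tags are stripped with a simple regex rather than a real XHTML parser.

```python
import re
from pathlib import Path


def search_unzipped_epub(root, query):
    """Yield (relative path, excerpt) for each match in the folder's .xhtml files."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for path in sorted(Path(root).rglob("*.xhtml")):
        # Replace tags with spaces so only the visible text is searched.
        text = re.sub(r"<[^>]+>", " ", path.read_text(encoding="utf-8"))
        for match in pattern.finditer(text):
            start = max(match.start() - 30, 0)
            excerpt = text[start:match.end() + 30].strip()
            yield path.relative_to(root).as_posix(), excerpt
```

Note that the files are visited in file-name order, not reading order, and each hit carries only a file path and an excerpt, not a position inside the chapter.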

@AJRenold
Contributor

I just updated the epubjs-search README.md with instructions that should get you up and running with a simple Python search API.

We are actively considering other options to create a search feature that runs on the client side only.

Contributor

fchasen commented Jan 16, 2014

The search engine needs to have a sense of how an epub is put together, or it can't generate the EPUB CFI, and CFIs are what allow the reader to jump to a location in the book.

To briefly describe what the search engine does:

  • Parses the epub spine / manifest
  • Indexes each page that was in the spine, along with the spine item's CFI position
  • Retrieves pages from the index that match a search query
  • Looks for all the instances of the query in each page and assigns them full CFIs
  • Returns the list of results as JSON
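
The steps above can be sketched roughly as follows. This is not the epubjs-search code: it is a simplified Python illustration that scans pages directly instead of building an index, and it emits only the spine-level part of each CFI (for example `/6/4!`), not a full CFI down to the matching text. The names `spine_items`, `search`, and `SAMPLE_OPF` are hypothetical.

```python
import json
import re
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document, used only to show the expected input shape.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata/>
  <manifest>
    <item id="c1" href="chap1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c2" href="chap2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>
"""


def spine_items(opf_xml):
    """Return (href, cfi_base) pairs for each spine item, in reading order."""
    package = ET.fromstring(opf_xml)
    manifest = package.find("opf:manifest", OPF_NS)
    spine = package.find("opf:spine", OPF_NS)
    # CFI steps number element children 2, 4, 6, ..., so the n-th child is step 2n.
    spine_step = (list(package).index(spine) + 1) * 2
    hrefs = {item.get("id"): item.get("href") for item in manifest}
    items = []
    for position, itemref in enumerate(spine):
        # The optional [id] assertion is left out to keep the sketch small.
        cfi_base = "/%d/%d!" % (spine_step, (position + 1) * 2)
        items.append((hrefs[itemref.get("idref")], cfi_base))
    return items


def search(opf_xml, pages, query):
    """pages maps href -> XHTML source; returns the matches as a JSON string."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for href, cfi_base in spine_items(opf_xml):
        text = re.sub(r"<[^>]+>", " ", pages[href])  # crude tag stripping
        for match in pattern.finditer(text):
            start = max(match.start() - 20, 0)
            results.append({
                "href": href,
                "cfi_base": cfi_base,
                "excerpt": text[start:match.end() + 20].strip(),
            })
    return json.dumps(results)
```

Producing a full CFI for each hit would additionally require walking the chapter's DOM to the text node and character offset of the match, which is why the engine needs to know how the book is structured.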

@shenzhuxi

Is lunr.js good for this issue?
