Fulltext search: all entries in entire document #11313
@@ -126,6 +127,13 @@ function ReaderSearch:onShowFulltextSearchInput()
UIManager:close(self.input_dialog)
end,
},
{
text = _("All"),
I'd probably add a context. Whether it's "all search results" or "all aliens from Mars" might make a difference in some languages.
"@translators Find all entries in entire document, button displayed on the search bar, should be short."?
Performance for just text boxes is okay? I can easily see going through all pages being quite heavy.
Yes, it's not very fast, but rather good. I'll try to speed it up.
koreader/frontend/document/koptinterface.lua, line 1523 in f8eba22
The transform is not needed for every item; it can be done only when drawing.
Various thoughts:
This feels a bit like a proof of concept showing that we can have something working with PDF, while having nothing (yet) working with EPUBs. EPUBs are more text-centered, and we may have more info to work with and present in a search result list (i.e. the title of the chapter a result is in). We may need to think about both to figure out the common needs/possibilities/API (and possibly have an API that would be reusable if we need something to present in Book map).
For crengine, I think we'd need some work/additions to cre.cpp, even just removing the scan limit of 2 page heights to see how fast/slow it could be, which you can't do/compile/test, I guess :/
There's also the notion of "context" I mentioned at #11203 (comment).
About the context again: above, in your screenshot for "the", the middle word is always lowercase, even when the surrounding words are uppercase. Maybe you should fetch the middle word from the text instead of reusing the lowercased pattern.
You could also keep the list of results to reuse/display it again if the user comes back to run the same query to go see another page, to avoid the slowness.
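The suggestion above to fetch the matched word from the text, rather than reusing the lowercased pattern, could look roughly like this. It is a minimal sketch in plain Lua, assuming a plain (non-regex) pattern and ASCII-only case folding; `findOriginalCase` is a hypothetical helper, not an existing KOReader function.

```lua
-- Hypothetical helper (not part of KOReader): return the matched text
-- exactly as it appears in the source, even for a case-insensitive search.
-- Assumes plain (non-regex) patterns and ASCII-only case folding.
local function findOriginalCase(text, pattern, init)
    -- Search in lowercased copies; lower() does not change string length,
    -- so the offsets found here are valid in the original text too.
    local s, e = text:lower():find(pattern:lower(), init or 1, true)
    if not s then return nil end
    -- Slice the original text so its casing is preserved
    return text:sub(s, e), s, e
end

local word = findOriginalCase("THE QUICK BROWN FOX", "the")
print(word)
```

The returned offsets could also be used to cut the surrounding context from the same original text, so the whole result line keeps its casing.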
That's the main reason I started with PDF. Thanks for the various ideas, I'll work with them.
frontend/document/koptinterface.lua
Outdated
end
end

function KoptInterface:findTextAll(doc, pattern, caseInsensitive)
findAllText?
(Unless there's a previous pattern of functions being named findText<Something>?)
For internal usage there is (koreader/frontend/document/koptinterface.lua, line 1410 in 5d2a441):
function KoptInterface:findAllMatches(doc, pattern, caseInsensitive, page)
and for external calls (koreader/frontend/document/koptinterface.lua, line 1430 in 5d2a441):
function KoptInterface:findText(doc, pattern, origin, reverse, caseInsensitive, pageno)
So the new external one is supposed to be findTextAll.
Here's a patch to base/cre.cpp that may help with getting all results and more info, so you can build something similar to what you have with PDF:
I tested it just on the console with:
+++ b/frontend/apps/reader/modules/readersearch.lua
- local retval, words_found = self.ui.document:findText(pattern, origin, direction, case_insensitive, page, regex, self.max_hits)
+ local retval, words_found = self.ui.document._document:findTextAll(pattern, case_insensitive and 1 or 0, regex and 1 or 0, self.max_hits or 200, 1, 3)
+ logger.warn(retval, words_found)
The last 2 arguments can be set to 0 to only get xpointers, as in findText(), if you need to benchmark how expensive the extra work is. If the second-to-last is 1, you additionally get word (actually, what matched), word_prefix and word_suffix, which allow you to reconstruct the full word when only part of it matched.
{
["end"] = "/body/DocFragment[16]/body/section/div/p[23]/text()[8].10",
start = "/body/DocFragment[16]/body/section/div/p[23]/text()[8].6",
word = "oman",
word_prefix = "R",
word_suffix = "ia"
} --[[table: 0x7f0535156bd8]],
{
["end"] = "/body/DocFragment[17]/body/section/div/p[63]/text()[2].32",
start = "/body/DocFragment[17]/body/section/div/p[63]/text()[2].28",
word = "oman",
word_prefix = "w"
} --[[table: 0x7f0535156da8]],
The last one is the number of additional words you want before and after.
{
["end"] = "/body/DocFragment[16]/body/section/div/p[8]/text()[3].360",
next_text = " of his caste",
prev_text = "would marry a ",
start = "/body/DocFragment[16]/body/section/div/p[8]/text()[3].356",
word = "oman",
word_prefix = "w"
} --[[table: 0x7f33806e7d38]],
{
["end"] = "/body/DocFragment[16]/body/section/div/p[23]/text()[8].10",
next_text = " and ego in",
prev_text = "pronounced yo) in ",
start = "/body/DocFragment[16]/body/section/div/p[23]/text()[8].6",
word = "oman",
word_prefix = "R",
word_suffix = "ia"
} --[[table: 0x7f33806e7f68]],
{
["end"] = "/body/DocFragment[17]/body/section/div/p[63]/text()[2].32",
next_text = ", was imagined as",
prev_text = "high-born ",
start = "/body/DocFragment[17]/body/section/div/p[63]/text()[2].28",
word = "oman",
word_prefix = "w"
} --[[table: 0x7f33806e81a0]],
I haven't checked how it behaves when matching the first or last word of a book :)
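To illustrate how the fields shown above could be combined into one result line, here is a small sketch in plain Lua. `buildResultText` is a hypothetical helper; it only assumes the table shape from the sample output (`word`, optional `word_prefix`/`word_suffix`, optional `prev_text`/`next_text`).

```lua
-- Hypothetical helper: build a display line from one result table shaped
-- like the findTextAll() sample output shown above.
local function buildResultText(item)
    -- Rebuild the whole word; prefix/suffix are absent when the match
    -- already starts/ends at a word boundary.
    local word = (item.word_prefix or "") .. (item.word or "") .. (item.word_suffix or "")
    -- Context is only present when extra words were requested.
    return (item.prev_text or "") .. word .. (item.next_text or "")
end

local item = {
    prev_text = "pronounced yo) in ",
    word_prefix = "R",
    word = "oman",
    word_suffix = "ia",
    next_text = " and ego in",
}
print(buildResultText(item)) -- prints: pronounced yo) in Romania and ego in
```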
Not at my computer to build,
@hius07 @poire-z
Thank you and @mergen3107 for the lib. I haven't tried it yet; I want to finish with PDF and then dig into cre. It's better without additional spaces:
About brackets: in cre we will sometimes need to put them around part of a word, and even without additional spaces the gaps are rather wide.
It depends on how you want to present stuff: you could bracket the full word even if the pattern matches only some part (the new cre functions would allow you to construct the whole word with the returned word_prefix/word_suffix). We could also use HtmlBoxWidget so you could just use
In most of our UI list text stuff, we are in a plain-text world (single font size, no styling, not even bold or italic, single color, single bgcolor).
Is there a way we could just render the current page of results at a time to lower that to maybe a maximum of 40 results/page? Maybe this is covered above, I only skimmed.
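The idea of building only the currently visible page of results can be sketched as a simple slice over the full result list. This is an illustration in plain Lua, not how KOReader's menu widget is implemented; `getResultsPage` and its parameters are hypothetical.

```lua
-- Hypothetical helper: return only the results belonging to one page,
-- so that display items need to be built for at most per_page entries.
local function getResultsPage(results, page, per_page)
    local first = (page - 1) * per_page + 1
    local last = math.min(page * per_page, #results)
    local slice = {}
    for i = first, last do
        slice[#slice + 1] = results[i]
    end
    return slice
end

-- 100 fake results, 40 per page: the third page holds the last 20
local results = {}
for i = 1, 100 do results[i] = i end
local page3 = getResultsPage(results, 3, 40)
print(#page3, page3[1], page3[#page3])
```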
Why ?
Or maybe not. MuPDF is probably not as good with text as our text/xtext stuff (RTL, BiDi, picking the right glyphs from the CJK font per language, using our long chain of fallback fonts...).
Works well, thanks!
local text = item.matched_text or ""
if item.matched_word_prefix then
    text = item.matched_word_prefix .. text
end
I would call it local word and mention why we show the full word and not only the requested matched text:
-- PDF/Kopt shows full words when only some part matches; let's do the same with CRE
If there are no more comments about this PR, I'll merge koreader/koreader-base#1720 tomorrow (and bump it in the koreader repo; dunno if you can do this as part of this PR with the GitHub web UI).
Very good for Saturday, thanks.
I tried to pursue the idea at #11313 (comment). Simplest and least intrusive (and to not have to pass a
--- a/frontend/apps/reader/modules/readersearch.lua
+++ b/frontend/apps/reader/modules/readersearch.lua
@@ -518,2 +518,3 @@ function ReaderSearch:showFindAllResults(not_cached)
if self.ui.rolling and not_cached then -- for ui.paging: items are built in KoptInterface:findAllText()
+ local TextBoxWidget = require("ui/widget/textboxwidget")
for _, item in ipairs(self.findall_results) do
@@ -528,3 +529,4 @@ function ReaderSearch:showFindAllResults(not_cached)
-- append context before and after the word
- local text = "【" .. word .. "】"
+ -- local text = "【" .. word .. "】"
+ local text = TextBoxWidget.FORMATTING_BOLD_START .. word .. TextBoxWidget.FORMATTING_BOLD_END
if item.prev_text then
@@ -535,2 +537,3 @@ function ReaderSearch:showFindAllResults(not_cached)
end
+ text = TextBoxWidget.FORMATTING_ENABLED .. text
item.text = text
--- a/frontend/ui/widget/textboxwidget.lua
+++ b/frontend/ui/widget/textboxwidget.lua
@@ -112,2 +112,9 @@ local TextBoxWidget = InputContainer:extend{
for_measurement_only = nil, -- When the widget is a one-off used to compute text height
+
+ -- Simple formatting chars that can be embedded in self.text
+ -- (these codepoints are not part of Unicode, so hopefully not present naturally in any provided self.text)
+ FORMATTING_ENABLED = "\u{FFF1}", -- should be put at start of 'text' to indicate we may find the next ones
+ FORMATTING_BOLD_START = "\u{FFF2}",
+ FORMATTING_BOLD_END = "\u{FFF3}",
+ _formatting_char_bold = nil,
}
@@ -156,2 +163,34 @@ function TextBoxWidget:init()
+ if self.text and type(self.text) == "string" and self.text:sub(1, #TextBoxWidget.FORMATTING_ENABLED) == TextBoxWidget.FORMATTING_ENABLED then
+ -- Support for very simple formatting (bold only for now)
+ self._formatting_char_bold = {}
+ -- Alas, we can't let any of our flag characters be fed to xtext (even with ASCII control
+ -- chars, it would give them a width, which would result at best in spurious added spacing).
+ -- So, split text into a table of chars, filter our flags out keeping track of where they
+ -- start and end bold, and rebuild a string.
+ local charlist = util.splitToChars(self.text)
+ table.remove(charlist, 1)
+ local is_bold = false
+ local len = #charlist
+ local i = 1
+ while i <= len do
+ local ch = charlist[i]
+ if ch == TextBoxWidget.FORMATTING_BOLD_START then
+ is_bold = true
+ table.remove(charlist, i)
+ len = len - 1
+ elseif ch == TextBoxWidget.FORMATTING_BOLD_END then
+ is_bold = false
+ table.remove(charlist, i)
+ len = len - 1
+ else
+ if is_bold then
+ self._formatting_char_bold[i] = true
+ end
+ i = i + 1
+ end
+ end
+ self.text = table.concat(charlist, "")
+ end
+
self:_computeTextDimensions()
@@ -824,3 +863,4 @@ function TextBoxWidget:_renderText(start_row_idx, end_row_idx)
local face = self.face.getFallbackFont(xglyph.font_num) -- callback (not a method)
- local glyph = RenderText:getGlyphByIndex(face, xglyph.glyph, self.bold)
+ local bolder = self._formatting_char_bold and self._formatting_char_bold[xglyph.text_index] or false
+ local glyph = RenderText:getGlyphByIndex(face, xglyph.glyph, self.bold, bolder)
local color = self.fgcolor
--- a/frontend/ui/rendertext.lua
+++ b/frontend/ui/rendertext.lua
@@ -291,3 +291,3 @@ end
-- @treturn glyph
-function RenderText:getGlyphByIndex(face, glyphindex, bold)
+function RenderText:getGlyphByIndex(face, glyphindex, bold, bolder)
if face.is_real_bold then
@@ -295,3 +295,3 @@ function RenderText:getGlyphByIndex(face, glyphindex, bold)
end
- local hash = "xglyph|"..face.hash.."|"..glyphindex.."|"..(bold and 1 or 0)
+ local hash = "xglyph|"..face.hash.."|"..glyphindex.."|"..(bold and 1 or 0)..(bolder and "x" or "")
local glyph = GlyphCache:check(hash)
@@ -301,3 +301,11 @@ function RenderText:getGlyphByIndex(face, glyphindex, bold)
end
- local rendered_glyph = face.ftsize:renderGlyphByIndex(glyphindex, bold and face.embolden_half_strength)
+ local embolden_strength
+ if bold or bolder then
+ embolden_strength = face.embolden_half_strength
+ if bolder then
+ -- Even if not bold, get it bolder than the strength we'd use for bold
+ embolden_strength = embolden_strength * 1.5 -- or 2 for very bold
+ end
+ end
+ local rendered_glyph = face.ftsize:renderGlyphByIndex(glyphindex, embolden_strength)
if not rendered_glyph then
which would give us, instead of our strange brackets:
(I think we can't really go too bold, as complex glyphs like CJK may get ugly and unreadable.)
Good enough and not too ugly to go with that?
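The marker filtering from the TextBoxWidget:init() hunk above can be tried outside KOReader with the following sketch. It is a variation, not the patch itself: it builds a new character list instead of removing entries in place, uses a byte-pattern stand-in for util.splitToChars() (assuming valid UTF-8 without NUL bytes), and writes the U+FFF2/U+FFF3 markers as explicit UTF-8 bytes so it also runs on Lua 5.1. All names here are hypothetical.

```lua
-- UTF-8 encodings of U+FFF2 and U+FFF3, written as decimal byte escapes
local BOLD_START = "\239\191\178"
local BOLD_END = "\239\191\179"

-- Stand-in for util.splitToChars(): one table entry per UTF-8 character
local function splitToChars(text)
    local chars = {}
    for ch in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
        chars[#chars + 1] = ch
    end
    return chars
end

-- Strip the markers and record which character indices (in the cleaned
-- text) fall between a start and an end marker.
local function extractBold(text)
    local out, bold = {}, {}
    local is_bold = false
    for _, ch in ipairs(splitToChars(text)) do
        if ch == BOLD_START then
            is_bold = true
        elseif ch == BOLD_END then
            is_bold = false
        else
            out[#out + 1] = ch
            if is_bold then bold[#out] = true end
        end
    end
    return table.concat(out), bold
end

local clean, bold = extractBold("a " .. BOLD_START .. "woman" .. BOLD_END .. " of")
print(clean) -- prints: a woman of
```

Appending to a new table avoids the repeated shifting that table.remove() does on long strings, at the cost of a second table.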
I like the bold font, more than the brackets.
Just mentioning that:
Exactly, that's the reason.
A few remarks I had while playing with it on my Kobo (while testing bold):
I agree with the last point. You can search terms within files of a directory and its subdirectories. Each search query is like a chapter in the TOC: you can expand and collapse it, and all searches are always available as a cache. Since book files usually don't change their contents, it would make a lot of sense to just keep accumulating a cache of searched terms attached to this book's sdr (?). (This now starts to resemble Kindle's X-Ray feature a lot...)
I love the Notepad++ search for the same reason. Remembering and caching the last few results is nice as long as it doesn't take up tons of space.
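Keeping only the last few result lists, so the cache cannot grow without bound, could be sketched like this. It is a plain-Lua illustration with first-in, first-out eviction; `SearchCache` and its methods are hypothetical names, and nothing here is persisted to the book's sdr.

```lua
-- Hypothetical bounded cache of find-all results, keyed by query
local SearchCache = {}
SearchCache.__index = SearchCache

function SearchCache.new(max_entries)
    return setmetatable({ max = max_entries, keys = {}, results = {} }, SearchCache)
end

-- Case-insensitive queries share one entry regardless of how they were typed
local function makeKey(pattern, case_insensitive)
    return case_insensitive and ("i:" .. pattern:lower()) or ("s:" .. pattern)
end

function SearchCache:get(pattern, case_insensitive)
    return self.results[makeKey(pattern, case_insensitive)]
end

function SearchCache:put(pattern, case_insensitive, results)
    local key = makeKey(pattern, case_insensitive)
    if self.results[key] == nil then
        table.insert(self.keys, key)
        if #self.keys > self.max then
            -- Evict the oldest query (first in, first out)
            local oldest = table.remove(self.keys, 1)
            self.results[oldest] = nil
        end
    end
    self.results[key] = results
end

local cache = SearchCache.new(2)
cache:put("a", false, { 1 })
cache:put("b", false, { 2 })
cache:put("c", false, { 3 }) -- evicts the entry for "a"
```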
This is a brilliant addition and I am using it more and more every day. However, there is no option to adjust the font size of the results. Would it be possible to add this?
Also, in some results, when the matched word is at the beginning of a paragraph, it gets joined to the previous word. For example, say you are looking for the word "Train" and the text is [he departed that night. Train playing in the background], where each line is a separate paragraph. It will then be displayed as [he departed that night.Train playing in the background], without any space between "night." and "Train". (Makes sense?)
It's not obvious, but it is the same font size that is used by the file browser in Classic display mode.
It would probably be best if it were under settings, as it is for dictionary lookup, and independent of the Classic display mode. Another thing: currently you can only set 'words in context' and 'results per page'. It would be great to have an option that, instead of making you worry about how many words you may or may not need, shows as many words as fit in, say, 2-4 lines. So the user selects the font size and how many lines per result to display, and the software shows whatever fits there. One could also choose, instead of a 50/50 split of context (before/after the matched word), perhaps 25/75 or 33/67, or any combination really. And implementing the blinky thing when you get back to the original place where you started the search from a findAll search (I will die on this hill) would be REALLY useful.
Oops, just found this. In my defence: it was hidden.
New All button to search for all entries in a pdf document.
Search results showing two previous and two next words, and page number.
On tap jumps to the page and highlights the search string.
Extreme: search for "the".