Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when opening .epub generated by a specific RSS feed due to 0-width image in SVG #12004

Closed
PlethoraChutney opened this issue Jun 9, 2024 · 6 comments
Milestone

Comments

@PlethoraChutney
Copy link

  • KOReader version: v2024.04
  • Device: Kobo Clara HD

Issue

I'm a subscriber to 404 Media, which produces private, subscriber-only RSS feeds. When I download news articles from these feeds, KOReader experiences a segfault when trying to open them. Other RSS feeds work without issue. I've attached one of the generated .epub files, since the RSS feed is not public. It's zipped, since GitHub wouldn't let me upload the .epub

24-06-06_05-55_Inside a Kidnapping Planned on the FBI's Secret Chat App.epub.zip

Please let me know if you want anything else! And thank you so much for making KOReader!!

Steps to reproduce

  1. Sync news feed from private RSS stream (which I cannot share, sorry!)
  2. Open any of the resulting .epub files in KOReader
crash.log (if applicable)

Attached crash.log downloaded after restart -> open RSS .epub, with verbose debugging enabled

crash.log

@benoit-pierre
Copy link
Contributor

I can reproduce with the 2024.04 AppImage, but not on master: try the nightly.

@PlethoraChutney
Copy link
Author

The nightly works! Thanks so much!

@poire-z
Copy link
Contributor

poire-z commented Jun 9, 2024

Adding a note for me to fix, for another bug with this book:
Select some text in the first paragraph of the article, View HTML, long press on the text, you get the popup with the classnames.
Select same text but extend it to pass over the broken image, long press on the same text in the 1st paragraph, frontend crash:

06/09/24-19:46:26 DEBUG hold_release detected @ 342 185
06/09/24-19:46:26 DEBUG onHoldReleaseText (duration: 0.806094 ) : 2167 > 2167 = breaking
./luajit: frontend/ui/viewhtml.lua:246: attempt to compare number with nil
stack traceback:
        frontend/ui/viewhtml.lua:246: in function '_handleLongPress'
        frontend/ui/viewhtml.lua:176: in function 'text_selection_callback'
        frontend/ui/widget/textviewer.lua:550: in function 'handleTextSelection'
        frontend/ui/widget/textviewer.lua:160: in function 'callback'
        frontend/ui/widget/textboxwidget.lua:2091: in function 'handleEvent'
        frontend/ui/widget/container/widgetcontainer.lua:83: in function 'propagateEvent'

@Frenzie Frenzie added this to the 2024.06 milestone Jun 9, 2024
@poire-z
Copy link
Contributor

poire-z commented Jun 9, 2024

For reference, the original segfault is solved by koreader/crengine@cb7d5e2.
zero-width images are some stupid SVGs

zero width /body/DocFragment/body/header/div[2]/div/div/div[3]/button/i/svg.0
zero width /body/DocFragment/body/footer/div[1]/div/div/div[2]/div[4]/form/button/i/svg.0
          <button id="menu-open" class="header__menu--open hamburger"
            type="button" title="Menu" aria-label="Menu">
            <i class="icon icon-menu icon--sm">
  <svg class="icon__svg">
    <use xlink:href="https://www.404media.co/assets/icons/feather-sprite.svg?v=41c333e33d#menu"></use>
  </svg>
</i>          </button>

@Frenzie Frenzie changed the title Segfault when opening .epub generated by a specific RSS feed Segfault when opening .epub generated by a specific RSS feed due to 0-width image in SVG Jun 9, 2024
@poire-z
Copy link
Contributor

poire-z commented Jun 9, 2024

The "View HTML" crash is due to some multilines attribute value:

line 0  2       33      body
line 9  3       49      DocFragment     [StyleSheet=OEBPS/https://www.404media.co/assets/dist/app.min.css?v=41c333e33d] [id=_doc_fragment_0]    [lang=en]       [Source=OEBPS/content.html]
line 166        4       65      stylesheet      [href=OEBPS/]
line 1793       4
line 1798       4       81      body    .post-template  .tag-features
line 1847       5       1889    main    .main
line 1875       6       1905    div     .container-fluid        .wrapper
line 1924       7       1921    div     .row
line 1954       8       1937    div     .col-xs-12
line 1992       9       1953    div     .post-hero
line 2032       10      2097    div     .post-hero__content
line 2083       11      2305    div     .post-hero__excerpt
line 2249       11
line 2272       10
line 2289       10      2321    figure  .post-hero__image
line 2341       11      2385    autoBoxing
line 2353       12      2337    img     .lazyload       [data-srcset=/content/images/size/w300/2024/06/anom2.png 300w,
line                 /content/images/size/w600/2024/06/anom2.png 600w,
line                 /content/images/size/w1000/2024/06/anom2.png 1000w,
line                 /content/images/size/w2000/2024/06/anom2.png 2000w]        [data-sizes=(max-width: 800px) 50vw,
line                 (max-width: 1170px) 60vw,
line                 1400px]    [data-src=/content/images/size/w2000/2024/06/anom2.png] [src=https://www.404media.co/assets/images/img-placeholder-md.jpg?v=41c333e33d] [alt=Image of Dillan Mancuso, the Anom logo, and a Swedish police officer.]
line 2956       12
line 2969       11
line 2988       11      2353    figcaption
line 3000       12      2369    span    [style=white-space: pre-wrap;]
line 3057       12

a case I didn't think about :)

@poire-z
Copy link
Contributor

poire-z commented Jun 9, 2024

Should be fixed with:

--- a/crengine/src/lvtinydom.cpp
+++ b/crengine/src/lvtinydom.cpp
@@ -4738,8 +4738,10 @@ static void writeNodeEx( LVStream * stream, ldomNode * node, lString32Collection
                         }
                     }
                     else {
-                        // Ignoring an attribute or its value is left to frontend
-                        *extra_stream << "\t[" << attrName << "=" << attrValue << "]";
+                        // Ignoring an attribute or its value is left to frontend.
+                        // But we don't want to mess the extra_stream specific format
+                        // with a multilines attribute value: replace \n with a space.
+                        *extra_stream << "\t[" << attrName << "=" << attrValue.replace('\n', ' ') << "]";
                     }
                 }
             }

image

Frenzie pushed a commit that referenced this issue Jun 16, 2024
Includes:
- Russian hyphenation: revert "allow hyphens after не" koreader/crengine#568
- Serbian hyphenation: combine patterns for Cyrillic and Latin scripts koreader/crengine#566
- writeNodeEx(): fix handling of multilines attribute values koreader/crengine#569
  See #12004 (comment).
- Add getBalancedHTML() helper

Also includes:
- kobo: add missing blitbuffer library koreader/koreader-base#1823
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants