This repository has been archived by the owner on Sep 4, 2023. It is now read-only.

guardian.co.uk headlines not correctly detected as being composed of separate sentences #617

Closed
marco-c opened this issue Dec 14, 2022 · 2 comments


@marco-c
Contributor

marco-c commented Dec 14, 2022

Here's an example:
[image]
[image]

HTML content:

<li class="fc-slice__item l-row__item l-row__item--span-1 u-faux-block-link"> 
        <div class="fc-item fc-item--has-image fc-item--pillar-news fc-item--type-article js-fc-item fc-item--list-media-mobile fc-item--standard-tablet js-snappable" data-link-name="news | group-0 | card-@3" data-item-visibility="all" data-test-id="facia-card" data-id="world/2022/dec/13/ukrainian-forces-damage-key-bridge-near-melitopol-reports-say" data-loyalty-short-url="/p/mqe8d"> 
         <div class="fc-item__container"> 
          <div class="fc-item__media-wrapper"> 
           <div class="fc-item__image-container u-responsive-ratio "> 
            <picture> 
             <!--[if IE 9]><video style="display: none;"><![endif]--> 
             <source media="(min-width: 980px) and (-webkit-min-device-pixel-ratio: 1.25), (min-width: 980px) and (min-resolution: 120dpi)" sizes="220px" srcset="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=220&amp;quality=45&amp;auto=format&amp;fit=max&amp;dpr=2&amp;s=aaf697b68aac1294270d15849425e21b 440w"> 
             <source media="(min-width: 980px)" sizes="220px" srcset="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=220&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=00155e57b2a1b399754a5cdf788b51f1 220w"> 
             <source media="(min-width: 740px) and (-webkit-min-device-pixel-ratio: 1.25), (min-width: 740px) and (min-resolution: 120dpi)" sizes="160px" srcset="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=160&amp;quality=45&amp;auto=format&amp;fit=max&amp;dpr=2&amp;s=22c7310dd7e68c710c9c46d31caa4e80 320w"> 
             <source media="(min-width: 740px)" sizes="160px" srcset="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=160&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=585f8dbe045b9d882d4482b644e55e84 160w"> 
             <source media="(min-width: 0px) and (-webkit-min-device-pixel-ratio: 1.25), (min-width: 0px) and (min-resolution: 120dpi)" sizes="127px" srcset="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=127&amp;quality=45&amp;auto=format&amp;fit=max&amp;dpr=2&amp;s=e590f5208444b89ce28c78c4d5c3f0f1 254w"> 
             <source media="(min-width: 0px)" sizes="127px" srcset="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=127&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=1b0a9670a4676dafb44a32bfb6c70aca 127w"> 
             <!--[if IE 9]></video><![endif]--> 
             <img loading="auto" class="responsive-img" alt="" src="https://i.guim.co.uk/img/media/6cb1d83b06dc5aeda615f9c68dadc9eea75d6302/0_42_3500_2100/master/3500.jpg?width=300&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=ca692a8a74908c6d3f95767d63b7a74c"> 
            </picture> 
           </div> 
          </div> 
          <div class="fc-item__content "> 
           <div class="fc-item__header"> 
            <h3 class="fc-item__title"><a href="https://www.theguardian.com/world/2022/dec/13/ukrainian-forces-damage-key-bridge-near-melitopol-reports-say" class="fc-item__link" data-link-name="article"><span class="fc-item__kicker">Melitopol</span> <span class="u-faux-block-link__cta fc-item__headline"> <span class="js-headline-text">Ukrainian forces damage key bridge, reports say</span></span> </a></h3> 
           </div> 
           <div class="fc-item__standfirst-wrapper"> 
            <div class="fc-item__standfirst">
             Supply lines to Russian troops likely to be affected after bridge over Molochna River partly collapsed
            </div> 
            <div class="fc-item__meta js-item__meta"> 
            </div> 
           </div> 
          </div> 
          <a href="https://www.theguardian.com/world/2022/dec/13/ukrainian-forces-damage-key-bridge-near-melitopol-reports-say" class="u-faux-block-link__overlay js-headline-text" data-link-name="article" tabindex="-1" aria-hidden="true">Ukrainian forces damage key bridge, reports say</a> 
         </div> 
        </div> </li>

Here's another example:
[image]
[image]

HTML content:

<li class="fc-slice__item l-list__item l-row__item l-row__item--span-1 u-faux-block-link"> 
          <div class="fc-item fc-item--has-image fc-item--pillar-news fc-item--type-article js-fc-item fc-item--list-mobile fc-item--list-tablet js-snappable" data-link-name="news | group-0 | card-@9" data-item-visibility="all" data-test-id="facia-card" data-id="environment/2022/dec/14/include-biodegradable-plastic-in-uk-single-use-cutlery-ban-say-campaigners" data-loyalty-short-url="/p/mqjb3"> 
           <div class="fc-item__container"> 
            <div class="fc-item__content "> 
             <div class="fc-item__header"> 
              <h3 class="fc-item__title"><a href="https://www.theguardian.com/environment/2022/dec/14/include-biodegradable-plastic-in-uk-single-use-cutlery-ban-say-campaigners" class="fc-item__link" data-link-name="article"><span class="fc-item__kicker">Plastic</span> <span class="u-faux-block-link__cta fc-item__headline"> <span class="js-headline-text">Include biodegradable plastic in UK single-use cutlery ban, say campaigners</span></span> </a></h3> 
             </div> 
             <div class="fc-item__standfirst-wrapper"> 
              <div class="fc-item__meta js-item__meta"> 
              </div> 
             </div> 
            </div> 
            <a href="https://www.theguardian.com/environment/2022/dec/14/include-biodegradable-plastic-in-uk-single-use-cutlery-ban-say-campaigners" class="u-faux-block-link__overlay js-headline-text" data-link-name="article" tabindex="-1" aria-hidden="true">Include biodegradable plastic in UK single-use cutlery ban, say campaigners</a> 
           </div> 
          </div> </li>
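
To make the expected behaviour concrete, here is a minimal sketch using only Python's standard `html.parser`. It is not the extension's actual segmentation code. The trimmed `HEADLINE_HTML`, the `SEPARATE_CLASSES` set and the `SegmentCollector` class are hypothetical names introduced for illustration, and treating those two CSS classes as sentence boundaries is an assumption, not a proposed fix. It shows that flattening the `<h3>` yields a single string, while grouping text by the kicker and headline spans yields two.

```python
from html.parser import HTMLParser

# Trimmed copy of the <h3> from the first example (href shortened).
HEADLINE_HTML = (
    '<h3 class="fc-item__title"><a href="#" class="fc-item__link">'
    '<span class="fc-item__kicker">Melitopol</span> '
    '<span class="u-faux-block-link__cta fc-item__headline"> '
    '<span class="js-headline-text">'
    'Ukrainian forces damage key bridge, reports say</span></span> </a></h3>'
)

# Assumption: spans with these classes should each be their own sentence.
SEPARATE_CLASSES = {"fc-item__kicker", "js-headline-text"}


class SegmentCollector(HTMLParser):
    """Collects text both flattened and grouped by marked <span>."""

    def __init__(self):
        super().__init__()
        self.flat = []      # every text node, in document order
        self.segments = []  # one string per marked span
        self._stack = []    # per open span: index into segments, or None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = set((dict(attrs).get("class") or "").split())
        if classes & SEPARATE_CLASSES:
            self.segments.append("")
            self._stack.append(len(self.segments) - 1)
        else:
            self._stack.append(None)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        self.flat.append(data)
        # Attribute the text to the innermost marked span, if any.
        for index in reversed(self._stack):
            if index is not None:
                self.segments[index] += data
                break


collector = SegmentCollector()
collector.feed(HEADLINE_HTML)
collector.close()

# Inline spans flattened into one string: kicker and headline run together.
flat_text = " ".join("".join(collector.flat).split())
# Text grouped per marked span: kicker and headline stay separate.
segments = [s.strip() for s in collector.segments]

print(flat_text)
print(segments)
```

The first print shows the merged string that gets treated as one sentence; the second shows the kicker and the headline as two separate items, which is what the page visually presents.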
@marco-c
Contributor Author

marco-c commented Dec 14, 2022

Google Translate makes the same mistake on the first example, but is correct with the second:
[image]
[image]

@marco-c
Contributor Author

marco-c commented Jul 11, 2023

Moved to Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1842795.

marco-c closed this as not planned on Jul 11, 2023