Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to pull social images from content #268

Closed
sybrew opened this issue Feb 21, 2018 · 9 comments

Comments

@sybrew
Copy link
Owner

commented Feb 21, 2018

From marchino65's concern:

Images aren't pulled from the content. This is the default and intended behavior.

We could pull images from the content (WordPress has functions for it), but we only want to implement this optionally.

I'm not sure if Twitter supports this behavior. Facebook (through Open Graph) certainly does.

@sybrew

This comment has been minimized.

Copy link
Owner Author

commented Aug 2, 2018

Proposed method:

/**
 * Returns the social image URL from the post content.
 *
 * @since 3.1.0
 *
 * @param int $id The post ID. Required.
 * @param array $args The image args.
 * @param bool $set_og_dimensions Whether to set Open Graph image dimensions.
 * @return string The attachment URL.
 */
public function get_social_image_url_from_post_content( $id, $args = [], $set_og_dimensions = false ) {

	$images = \get_attached_media( 'image', $id ) ?: [];
	$_id = key( $images );
	$src = '';

	if ( isset( $images[ $_id ]->ID ) ) {

		$args = $this->reparse_image_args( $args );
		$args['get_the_real_ID'] = true;

		$src = $this->parse_og_image( $images[ $_id ]->ID, $args, $set_og_dimensions );

		if ( $src && $this->matches_this_domain( $src ) )
			$src = $this->set_preferred_url_scheme( $src );
	}

	return $src;
}

Now, get_attached_media() seriously harms performance.
So I'm hesitant on adding it like this.

It'd be better scraping the images from content via regex, where we obtain the first image (or more, in the future). This is at least a thousand times faster, as we don't have to fetch (all) the attachment posts from the database.

With this, we must be careful about pulling smilies.

@lifeforceinst

This comment has been minimized.

Copy link

commented Aug 2, 2018

You also need to consider that the URL from a post could be absolute or relative URL - remember the golden rule of programming is not to assume.

Here is an example of how I retrieved images in one of my themes, which determines if there is a featured image, if not, then tried to extract image from post using preg_match and if no image can then apply a default image (such as the fallback image if set).

 $thumbnail    = wp_get_attachment_image_src (get_post_thumbnail_id( $post->ID), 'thumbnail');
   if (empty($thumbnail)) {
    // There is no featured image, so check for embedded image
    $theimage = preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $matches);
    $firstimg = $matches[1][0];
    if (empty($firstimg)) {
      $thumbnail = array ('0' => $the_fallback_image);
    } else {
      // Need to handle relative or absolute image URI, by using parse_url to get the image path
     // Then add this to the home URI
      $fileimg  = parse_url ($firstimg, PHP_URL_PATH);
      $firsturl = home_url() . $fileimg;
      $thumbnail = array ('0' => $firsturl);
    }      

}

@sybrew

This comment has been minimized.

Copy link
Owner Author

commented Aug 2, 2018

I think that the golden rule is to assume this: There are users out there doing weird stuff 😅
It's what's called defensive programming, and it's applied throughout the plugin.

Thank you for the regex and the example 👍 It's in the direction of what I had in mind!

Now, the XML standard allows omission of the '" tags, so this is a perfectly valid img tag:
<img src=example.com/image.jpg />


I won't add this enhancement/feature to v3.1, so you might find the following snippet useful.

It'll only pass URLs that are matching the same domain (or are relative). Now, it won't detect broken HTML (missing closing quotes or having extraneous spaces in the URL)--it's not a DOM parser, after all.

Nevertheless, here it is:

add_filter( 'the_seo_framework_og_image_after_featured', function( $image, $post_id ) {
	// Someone else already supplied an image in this filter.
	if ( $image ) return $image;

	$post = get_post( $post_id );
	if ( isset( $post->post_content ) ) {
		preg_match_all(
			'/<img[^>]+src=(\"|\')?([^\"\'>\s]+)\1?.*?>/mi',
			$post->post_content,
			$matches,
			PREG_SET_ORDER
		);

		$tsf = the_seo_framework();
		foreach ( $matches as $match ) {
			if ( empty( $match[2] ) ) continue;

			if ( $tsf->matches_this_domain( $match[2] ) ) {
				$image = $tsf->set_url_scheme( $match[2] );
				break;
			} elseif ( 0 === strpos( $match[2], '/' ) && 0 !== strpos( $match[2], '//' ) ) {
				$url   = trailingslashit( $tsf->get_home_host() ) . ltrim( $match[2], ' /' );
				$image = $tsf->set_url_scheme( $url );
				break;
			}
		}
	}

	return $image;
}, 10, 2 );

I tested it, and it works great. It adds 0.00009s±~200% of load time, which is negligible for a feature like this.
The prior proposed function added roughly 0.003s of load time, which is twice the plugin's runtime.

Let me know if this works for you 😄, and I'll implement it for v3.2.

@sybrew sybrew self-assigned this Aug 2, 2018

@lifeforceinst

This comment has been minimized.

Copy link

commented Aug 3, 2018

@sybrew

This comment has been minimized.

Copy link
Owner Author

commented Aug 3, 2018

Hi @lifeforceinst,

Could you use GitHub in the browser to reply to issues? Your email client doesn't seem to support HTML (and in extend Markdown) and has odd breaking points, which are making your replies unreadable.


The filter's meant for testing purposes. The placement of this feature is to be determined, and it'll be before the fallback filter fires. Thanks for highlighting that 👍

There's no commented out code. It's PHPDOC, which helps me (and others) understand the code better in the future.

The code doesn't pick up on <img src="/uploads/whiteblue-150x150.png"> because it's not valid XHTML, it should have a closing tag: <img src="/uploads/whiteblue-150x150.png" />

However, since we're in the HTML5 era, the closing tag and space may be omitted.
Thanks for looking into that 😄 I'll update the snippet to support HTML5, too.

@sybrew

This comment has been minimized.

Copy link
Owner Author

commented Aug 3, 2018

Updated the regex.

From:

<img.*?\bsrc=(\'|\")?(.*?)(\1|\s).*?\/>

To:

<img[^>]+src=(\"|\')?([^\"\'>\s]+)\1?.*?>

image


Now, the PHP method will always look for a domain or a slash at the start of each match.
Because WordPress doesn't upload files where posts are located, this is fine.

@lifeforceinst

This comment has been minimized.

Copy link

commented Aug 3, 2018

Thanks for the advice about the email client,i will instead add directly here.

The new preg_match_all rules works like a charm, I created about 200 different combinations of embedded images and it worked for all permutations. So yes using preg_match_all ('/<img[^>]+src=(\"|\')?([^\">\s]+)\1?.*?>/mi', $post->post_content, $matches, PREG_SET_ORDER); is a good way to go with trying to deal with all the variations of supporting new HTML5 tags and also dealing with things which may be legacy HTML.

Can you confirm that the file /autodescription/inc/classes/generate-image.class.php is meant to have //* instead of // as the 36 instances of this seemed to result in a large amount of code being unused?

Would you be looking to change the order of the fallback filter firing so that fallback_1 fires as match 4 ahead of 5 Fetch image from SEO settings. Example follows

Last item I am looking at the final logic elseif ( 0 === strpos( $match[2], '/' ) && 0 !== strpos( $match[2], '//' ) ) { this does not seem to be working root relative URLs, if I shortened this to just elseif ( 0 === strpos( $match[2], '/' ) ) { then that worked.

@lifeforceinst

This comment was marked as off-topic.

Copy link

commented Aug 3, 2018

FYI here is the latest filter which I am using and which is working once the fallback firing order is addressed.

// Adds a filter which places og metadata for non-features images.
add_filter ('the_seo_framework_og_image_after_featured', function ($image, $post_id) {
  if ($image) { return $image; }
  // No image is set, so try to find one in the post content.
  $post = get_post ($post_id);
  if ( isset ($post->post_content) ) {
    preg_match_all ('/<img[^>]+src=(\"|\')?([^\">\s]+)\1?.*?>/mi', $post->post_content, $matches, PREG_SET_ORDER);
    $tsf = the_seo_framework();
    foreach ($matches as $match) {
      if ( empty( $match[2] ) ) continue;
      if ( $tsf->matches_this_domain( $match[2] ) ) {
        $image = $tsf->set_url_scheme( $match[2] );
        break;
      } elseif ( 0 === strpos( $match[2], '/' ) ) {
        $url   = trailingslashit( $tsf->get_home_host() ) . ltrim( $match[2], ' /' );
        $image = $tsf->set_url_scheme( $url );
        break;
        }
      }
/* Alternative code for matching images.
    foreach ($matches as $postimg) {
      if ( empty ($postimg [2]) ) continue;
      $image = home_url() . parse_url ($postimg [2], PHP_URL_PATH);
      break;
    }
*/
  }
//  echo '<pre>THE Image is: ' . $image . '</pre>';
  return $image;
}, 10, 2 );

@sybrew

This comment has been minimized.

Copy link
Owner Author

commented Aug 3, 2018

I'm glad everything's in working (intended) order 😄

The root relative URLs are ignored, and no condition is necessary to be added.
I commented on this at the end of #268 (comment)

To explain: When you add an image to the root location of a post, the post won't show up, as WP_Rewrite can't reach it.


Off topic:

  1. Regarding the filters:

    1. I'm not going to change the order of the filters. It might destroy someone's site.
    2. I'm not planning on adding new filters, either. In fact, I'm moving towards removing API reliance: everything should be possible via options or extensions. My goal is that there's no longer a need to fiddle with code.
  2. Whenever a line starts with # or //, it's an inline comment, regardless of the appended characters.
    I use various forms of documenting code:

// general inline comment
//* comment relating to the code below. Used as an alternative to a codeblock: /* ... */
//! caution/should-change/planned-change: probable bug, difficult code that looks like a bug, or inconsistency ahead
//? explanatory: this is odd, this is why we did this anyway...

p.s. I hid your latest comment because it was a copy of my snippet, with extra debugging output added. This will confuse users that are stumbling upon it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.