Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images blacklist #54

Closed
soullivaneuh opened this issue Mar 23, 2015 · 13 comments
Closed

Images blacklist #54

soullivaneuh opened this issue Mar 23, 2015 · 13 comments

Comments

@soullivaneuh
Copy link
Contributor

Hi, little feature idea.

What about images blacklist?

Some website always return their logo or whatever useless on open-graph.

Maybe can we have a blacklist solution to ignore some images url?

Thanks.

@oscarotero
Copy link
Owner

Hi, that's a great idea, and it's not difficult to implement. How about something like this?

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'http?://ads.com/*',
                '*banners/*',
            ]
        ]
    ]
];

@soullivaneuh
Copy link
Contributor Author

I didn't think about regex, good catch!

But, will not regexp be annoying to make with special chars like / or ?, used for standard URL?

Maybe something like this?

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'plain' => [
                    'http://example.com/full/path/to/image.png'
                ],
                'regex' => [
                    'http?://ads.com/*',
                    '*banners/*',
                ]
            ]
        ]
    ]
];

@oscarotero
Copy link
Owner

Well, that's not really a regex, it's a url string that can contain two special chars: ? and * and it's converted to regex automatically by this function:
https://github.com/oscarotero/Embed/blob/master/src/Url.php#L57

This is used already in many places, for example for oembed patterns (https://github.com/oscarotero/Embed/blob/master/src/Providers/OEmbed/Instagram.php#L21), so you can do this:

$config = [
    'adapter' => [
        'config' => [
            'minImageWidth' => 16,
            'minImageHeight' => 16,
            'imagesBlacklist' => [
                'http://example.com/full/path/to/image.png',
                'http?://ads.com/*',
                '*banners/*',
            ]
        ]
    ]
];

@soullivaneuh
Copy link
Contributor Author

Ok, let's do this so! 👍

Do you have time to do it soon? If you want, I can work for a PR this evening.

@oscarotero
Copy link
Owner

I wont have time today, so pull requests are welcome.
I think the filter can be done in this function https://github.com/oscarotero/Embed/blob/master/src/Adapters/Adapter.php#L305
Thanks 😄

@soullivaneuh
Copy link
Contributor Author

Ok I will take a look, thanks.

@soullivaneuh
Copy link
Contributor Author

I'm woking on it.

If I try this var_dump for testing:

    public function getImagesUrls()
    {
        $data = Utils::getData($this->providers, 'imagesUrls', $this->request->url);

        var_dump($data);

        return $data;
    }

I sometime got this:

array(1) {
  [0] =>
  array(2) {
    'value' =>
    string(0) ""
    'providers' =>
    array(1) {
      [0] =>
      string(9) "opengraph"
    }
  }
}

Is that normal? What sould I suppose to do with that? Nothing?

@oscarotero
Copy link
Owner

value should contain the image url and providers the list of providers that provide this url (in this case, only opengraph). The value key is empty, what url are you using for testing?

@soullivaneuh
Copy link
Contributor Author

Many URLs with this kind of result. But, if I dump getImages data the empty data aren't present.

For example:

string(67) "http://rss.lefigaro.fr/~r/lefigaro/laune/~3/FpJOta0Eow0/story01.htm"
string(13) "getImagesUrls"
array(2) {
  [0] =>
  array(2) {
    'value' =>
    string(0) ""
    'providers' =>
    array(1) {
      [0] =>
      string(9) "opengraph"
    }
  }
  [1] =>
  array(2) {
    'value' =>
    string(82) "http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg"
    'providers' =>
    array(1) {
      [0] =>
      string(4) "html"
    }
  }
}
string(9) "getImages"
array(1) {
  [0] =>
  array(6) {
    'value' =>
    string(82) "http://evene.lefigaro.fr/files/imagecache/celebrity_image_full/celebrity/13658.jpg"
    'providers' =>
    array(1) {
      [0] =>
      string(4) "html"
    }
    'width' =>
    int(90)
    'height' =>
    int(90)
    'size' =>
    int(8100)
    'mime' =>
    string(10) "image/jpeg"
  }
}

Maybe they are filtered on getImages? I think I should filter for the blacklist here instead of getImagesUrls.

@oscarotero
Copy link
Owner

getImages get the urls provided by getImagesUrls and execute curl process to get the image dimmensions and mimetypes, and removes the non-valid images (urls that does not exist or does not belong to real images). I think it's better to filter the images before this, to avoid unnecessary requests. GetImagesUrls should remove the empty values and check each value whether or not is included in the black list.

@soullivaneuh
Copy link
Contributor Author

Ok so two goals now for my PR:

  • Remove empty URLs
  • Remove blacklisted URLs

All on getImagesUrlsfunction. Are we good?

@oscarotero
Copy link
Owner

👍

@soullivaneuh soullivaneuh mentioned this issue Mar 24, 2015
4 tasks
@soullivaneuh
Copy link
Contributor Author

I close to continue on #55.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants