Parse a HTML string #18

Closed
nazieb opened this issue Mar 21, 2014 · 6 comments

Comments

@nazieb

nazieb commented Mar 21, 2014

Hello,

Is there any way to parse an HTML string instead of a URL? In my app I already fetch the HTML via Guzzle HTTP, and I want to parse it from that response rather than request the URL all over again.

@oscarotero
Owner

Hi.
The Url class has the Url->resolve method to do the request and get all available data (https://github.com/oscarotero/Embed/blob/master/Embed/Url.php#L58).
You can provide your own resolver class by editing the Embed\Url::$resolver variable. Note that the Url class is used not only to get the content of the main url but also to get the responses of other secondary urls (APIs, redirects, oembed, etc). I guess you only want to change how the main url is resolved, not all these secondary requests, so changing the resolver of the Url class is not the best way.
A possible solution would be to provide a new method to manually set the content and headers of the url.

@oscarotero
Owner

Hi again, @nazieb
I've been working on a new feature to provide custom url resolvers in a more flexible way. There is a new branch called "custom_url_resolvers" with some changes:

You can create your own url resolver and use it when creating a new Url instance:

$resolver = new GuzzleResolver($guzzleData);
$url = new Embed\Url($resolver); //You can pass the resolver directly instead of the url string
$info = Embed\Embed::create($url);

You can also set your url resolver as the default so it is always used, not only for the main url:

Embed\Url::setDefaultResolver('GuzzleResolver');

Please, let me know if this is what you need.

@nazieb
Author

nazieb commented Mar 22, 2014

Hello Oscar,

It's really nice of you to build the custom URL resolver. That might come
in handy, but what I really meant is the ability to parse an HTML string
without needing to do an HTTP request.

For example, in my app the HTML is already stored in the database
(after being resolved by Guzzle in another process), and I want to parse the
OpenGraph, Twitter Card, etc. data from that stored HTML.

Without Wax,

Ainun Nazieb
http://nazie.bz/


@oscarotero
Owner

Hi.
Some data requires http requests: data provided by oembed, by the facebook graph or by other APIs.
For example, youtube has an oembed service that provides information about its videos (example: http://www.youtube.com/oembed?format=xml&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DeiHXASgRTcA), so this request is required to get the title, author, description, embed code, etc.
If you have the html content of the page stored, you can avoid the first http request (the one that gets the page content) but not the requests that connect to other APIs.
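
To illustrate the secondary request mentioned above: the oembed url is just the provider endpoint with the page url percent-encoded in the `url` query parameter. A minimal sketch follows; the helper name `buildOembedUrl` is made up for illustration and is not part of Embed.

```php
<?php

// Hypothetical helper (not part of Embed): compose an oembed request url
// from a provider endpoint and the url of the page being inspected.
function buildOembedUrl(string $endpoint, string $pageUrl, string $format = 'xml'): string
{
    // urlencode() percent-encodes ":", "/", "?" and "=" in the page url
    return $endpoint . '?format=' . $format . '&url=' . urlencode($pageUrl);
}

$oembedUrl = buildOembedUrl(
    'http://www.youtube.com/oembed',
    'http://www.youtube.com/watch?v=eiHXASgRTcA'
);

echo $oembedUrl, PHP_EOL;
```

This request still has to go over the network even when the page html itself comes from the database.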
With the url resolver, you can do something like this:

//Create your own resolver class that implements Embed\UrlResolvers\UrlResolverInterface and instantiate it:
$resolver = new MyOwnResolver();

//Now set the information you have stored in your database (url, content, etc) to avoid the request
$resolver->setUrl($url);
$resolver->setContent($content);

//These values can be set as defaults:
$resolver->setMimetype('text/html');
$resolver->setHttpCode(200);

//Ok, you can now get all information about this url
$info = Embed\Embed::create($resolver);
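
For reference, a resolver like `MyOwnResolver` could look roughly like the sketch below. The real methods of Embed\UrlResolvers\UrlResolverInterface are not listed in this thread, so the interface here is a hypothetical stand-in: the setters mirror the calls in the example above, and the getters are assumptions.

```php
<?php

// Hypothetical stand-in for Embed\UrlResolvers\UrlResolverInterface.
// The setters mirror the calls used in this thread; the getters are assumed.
interface StoredPageResolverInterface
{
    public function setUrl(string $url): void;
    public function setContent(string $content): void;
    public function setMimetype(string $mimetype): void;
    public function setHttpCode(int $code): void;
    public function getUrl(): string;
    public function getContent(): string;
    public function getMimetype(): string;
    public function getHttpCode(): int;
}

// Resolver that never touches the network: it only returns what it was given.
class MyOwnResolver implements StoredPageResolverInterface
{
    private $url = '';
    private $content = '';
    private $mimetype = 'text/html'; // default for stored html pages
    private $httpCode = 200;         // assumes the original fetch succeeded

    public function setUrl(string $url): void { $this->url = $url; }
    public function setContent(string $content): void { $this->content = $content; }
    public function setMimetype(string $mimetype): void { $this->mimetype = $mimetype; }
    public function setHttpCode(int $code): void { $this->httpCode = $code; }

    public function getUrl(): string { return $this->url; }
    public function getContent(): string { return $this->content; }
    public function getMimetype(): string { return $this->mimetype; }
    public function getHttpCode(): int { return $this->httpCode; }
}

// Fill the resolver from stored data instead of doing a request
$resolver = new MyOwnResolver();
$resolver->setUrl('http://example.com/article');
$resolver->setContent('<html><head><title>Stored page</title></head></html>');

echo $resolver->getHttpCode(), ' ', $resolver->getMimetype(), PHP_EOL;
```

With the mimetype and http code defaulted in the class, only the url and content need to be set for each stored page.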

@nazieb
Author

nazieb commented Mar 22, 2014

Wow, that's a nice solution. I'm fine with the HTTP requests to other API
providers.

I'll test your example right away. Thanks for the support and this great
library!

@younes0
Contributor

younes0 commented Sep 13, 2015

With the new Guzzle5 resolver:

use Embed\Embed;
use GuzzleHttp\Client;
use GuzzleHttp\Event\BeforeEvent;
use GuzzleHttp\Message\Response;
use GuzzleHttp\Stream\Stream;

$html = file_get_contents('http://whatever'); // HTML string

$client = new Client();

// Answer every request sent through this client with the stored HTML,
// so the client itself never goes to the network
$client->getEmitter()->on('before', function(BeforeEvent $e) use ($html) {
    $body = isset($html) ? Stream::factory($html) : null;
    $e->intercept(new Response(200, [], $body));
});

$embed = Embed::create($url = 'dummy', [ // $url must not be empty
    'resolver' => [ 
        'class' => \Embed\RequestResolvers\Guzzle5::class,
        'config' => [ 'client' => $client ],
    ],
]);
