Parsing differences between f43.me and a local graby #83

Closed
Simounet opened this issue Mar 8, 2017 · 11 comments

@Simounet
Contributor

Simounet commented Mar 8, 2017

Hi,
I'm seeing strange behavior with graby on my local machine. I started with wallabag 2.2.2, graby 1.6.0, and readability 1.1.6, but I have the same issue with a standalone graby.
This URL http://www.rom-game.fr/news/2446-Jean%20Baudlot%20-%20de%20l%20Eurovision%20a%20Delphine%20Software.html cannot be parsed locally, but it works on https://f43.me/feed/test .
Any idea?

@j0k3r
Owner

j0k3r commented Mar 8, 2017

I just tried with a simple script (which you can put at the root of the project) and I got the same result:

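<?php

// Load Composer's autoloader (run this from the root of the graby project).
require 'vendor/autoload.php';

// Create a Graby instance with its debug option turned on.
$full = new \Graby\Graby(array('debug' => true));
$res = $full->fetchContent('http://www.rom-game.fr/news/2446-Jean%20Baudlot%20-%20de%20l%20Eurovision%20a%20Delphine%20Software.html');

// Dump the whole result array (status, html, title, ...).
var_export($res);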

Graby 1.6.1 (same result with 1.6.0), php-readability 1.1.6

@Simounet
Contributor Author

Simounet commented Mar 8, 2017

I did that but got no content. Do I need a special extension? I don't know how to debug this. :/

@j0k3r
Owner

j0k3r commented Mar 8, 2017

Do you have tidy installed?

j0k@MBP:~/Sites/github/graby$ php -i | grep tidy
/usr/local/etc/php/5.6/conf.d/ext-tidy.ini,
tidy
tidy.clean_output => no value => no value
tidy.default_config => no value => no value
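
The same check can also be done from inside PHP, which may be useful because the CLI and a web server can load different php.ini files. This is only a minimal sketch: `describeTidySupport()` is a hypothetical helper name, not part of graby or php-readability, and it relies only on PHP's built-in `extension_loaded()` and `php_ini_loaded_file()`.

```php
<?php

// Hypothetical helper (not part of graby or php-readability):
// reports whether the tidy extension is loaded in the PHP that runs this script.
function describeTidySupport()
{
    // php_ini_loaded_file() returns false when no php.ini was loaded.
    $ini = php_ini_loaded_file() ?: '(none)';
    $status = extension_loaded('tidy') ? 'loaded' : 'NOT loaded';

    return sprintf('tidy is %s (SAPI: %s, php.ini: %s)', $status, PHP_SAPI, $ini);
}

echo describeTidySupport(), PHP_EOL;
```

Running it once with `php` on the command line and once through the web server shows whether both environments see the extension.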

@j0k3r
Owner

j0k3r commented Mar 8, 2017

Without tidy:

j0k@MBP:~/Sites/github/graby$ php test.php
array (
  'status' => 200,
  'html' => '[unable to retrieve full-text content]',
  'title' => 'Jean Baudlot - de l\'Eurovision à Delphine Software',
  'language' => 'fr',
  'url' => 'http://www.rom-game.fr/news/2446-Jean%20Baudlot%20-%20de%20l%20Eurovision%20a%20Delphine%20Software.html',
  'content_type' => 'text/html',
  'open_graph' =>
  array (
    'og_title' => 'Jean Baudlot - de l\'Eurovision à Delphine Software',
    'og_description' => 'Il est celui par qui tout a commencé : raconter l\'histoire de Jean Baudlot revient à faire la genèse de Delphine Software, un des fruits les plus savoureux de l\'aventure du jeu vidéo français. De l\'Eurovision 1979 à l\'Amiga 2000, des Voyageurs du Temps à Croisière pour un Cadavre, Jean Baudlot a bien voulu nous livrer quelques anecdotes sur son parcours...',
    'og_type' => 'article',
    'og_image' => 'http://www.rom-game.fr/multimedia/news/170123_jeanbaudlot730x334.jpg',
    'og_url' => 'http://www.rom-game.fr/news/2446-Jean+Baudlot+-+de+l+Eurovision+a+Delphine+Software.html',
  ),
  'native_ad' => false,
  'summary' => '[unable to retrieve full-text content]',
)

I think you need to install the tidy extension.
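
A script could spot this situation by itself. The sketch below assumes the result array has the shape shown in the dump above, where a failed extraction puts the placeholder string in `html`. `extractionFailed()` is a hypothetical helper, not a graby function, and the sample array is a trimmed copy of that dump rather than a live fetch.

```php
<?php

// Hypothetical helper (not part of graby): returns true when the result
// array carries no usable full-text content.
function extractionFailed(array $result)
{
    // Placeholder string taken from the dump above.
    $placeholder = '[unable to retrieve full-text content]';

    return !isset($result['html'])
        || trim($result['html']) === ''
        || $result['html'] === $placeholder;
}

// Trimmed copy of the result shown above, used here instead of a real fetch.
$sample = array(
    'status' => 200,
    'html' => '[unable to retrieve full-text content]',
);

if (extractionFailed($sample) && !extension_loaded('tidy')) {
    echo 'No content extracted and tidy is not loaded; installing the extension may help.', PHP_EOL;
}
```

The message is only a hint: a missing tidy extension is one possible cause of an empty result, not the only one.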

@Simounet
Contributor Author

Simounet commented Mar 8, 2017

Hmmm, it fixed the standalone issue but not the wallabag one, even though I cleared my cache.

@Simounet
Contributor Author

Simounet commented Mar 9, 2017

It might have been a cache issue this time, because it is working this morning. It would be great to add tidy to the requirements, or at least to document it as an extension that improves the results.

@Simounet Simounet closed this as completed Mar 9, 2017
@j0k3r
Owner

j0k3r commented Mar 9, 2017

It's a requirement from php-readability, not graby.
https://github.com/j0k3r/php-readability/blob/master/README.md#requirements

Also, suggesting the extension might prevent this problem, but it doesn't seem to work at the moment: j0k3r/php-readability#25

@Simounet
Contributor Author

Simounet commented Mar 9, 2017

Fair enough. Thanks anyway!

@Simounet
Contributor Author

Sorry, I'm back. I'm getting no content with http://www.begeek.fr/the-elder-scrolls-legends-officiellement-lance-pc-233727 even on wallabag.it, but it works on f43.me.

@Simounet Simounet reopened this Mar 10, 2017
@j0k3r
Owner

j0k3r commented Nov 12, 2017

Sorry to get back to this so late, but I'm getting content for that URL using graby.

@Simounet
Contributor Author

It seems to be fixed. Closing.
