PhantomEnvironment is undefined #88
Comments
Hello @Dugnist. Which version of goose-parser are you using? We've started separating goose, the environments and other blocks into parts that can be used independently.
Here is a usage example for the latest version:
{
  "dependencies": {
    "goose-parser": "^0.5.0-alpha.3",
    "goose-phantom-environment": "^1.0.12"
  }
}
Usage:
const Parser = require('goose-parser');
const PhantomEnvironment = require('goose-phantom-environment');
const env = new PhantomEnvironment({
  url: 'http://www.gooseplanet.ru/',
});
const parser = new Parser({ environment: env });
(async function () {
  try {
    const results = await parser.parse(
      require('./rules/rules'),
    );
    // Print the parsed JSON so the run produces visible output
    console.log(results);
  } catch (e) {
    console.log(e.message, e.stack);
  }
})();
You can also consider using version |
Let me know if you have any other issues |
@maZahaca
and it throws this error:
|
The current issue with JSDom is that this environment does not support dynamic JavaScript, so any wait, click or other interaction with the page won't work. |
@maZahaca OK, I changed jsdom to goose-chrome-environment.
It also throws errors:
|
@maZahaca I also changed the URL to 'https://habrahabr.ru' and got this error:
|
@Dugnist Please provide your package.json and the OS you're running on, and I will try to reproduce these issues. These are weird bugs, because we are currently using these parsers in production. |
Also, let's stick to one website when testing, and try it out. |
@maZahaca I'm using Linux, Ubuntu 16.04 LTS.
I want to get the whole HTML page with its JavaScript executed (in case the target site uses a framework like React.js) and save the result to an HTML file along with the required assets. |
@Dugnist This parsing tool (goose-parser) only lets you save JSON results, not HTML and assets. Here is an example that uses goose-parser with goose-chrome-environment:
const Parser = require('goose-parser');
const ChromeEnvironment = require('goose-chrome-environment');
const env = new ChromeEnvironment({
  url: 'https://www.google.com/search?newwindow=1&ei=mzDCWoPkOI-RmwWaoLzYCg&q=goose-parser&oq=goose-parser&gs_l=psy-ab.3..0i30k1.1186908.1189012.0.1189621.12.12.0.0.0.0.154.877.9j2.11.0....0...1c.1.64.psy-ab..1.11.876...0j0i131k1j0i131i67k1j0i67k1j0i10k1j0i19k1j0i30i19k1j0i10i30i19k1j0i13i30k1j0i8i30k1.0.lU1cumFem2s&gws_rd=cr&dcr=0&fg=1',
});
const parser = new Parser({ environment: env });
(async function () {
  try {
    const results = await parser.parse({
      actions: [
        {
          type: 'wait',
          timeout: 10 * 1000,
          scope: '.srg>.g',
          parentScope: 'body'
        }
      ],
      rules: {
        scope: '.srg>.g',
        collection: [[
          {
            name: 'url',
            scope: 'h3.r>a',
            attr: 'href',
          },
          {
            name: 'text',
            scope: 'h3.r>a',
          }
        ]]
      }
    });
    console.log(results);
  } catch (e) {
    console.log(e.message);
  }
})();
And the results will be:
[
  {
    url: 'https://www.npmjs.com/package/goose-parser',
    text: 'goose-parser - npm'
  },
  {
    url: 'https://github.com/advancedlogic/GoOse/blob/master/parser.go',
    text: 'GoOse/parser.go at master · advancedlogic/GoOse · GitHub'
  },
  {
    url: 'https://habrahabr.ru/post/271425/',
    text: 'Как парсить интернет по-гусиному / Хабрахабр'
  },
  {
    url: 'https://pypi.python.org/pypi/goose-extractor/',
    text: 'goose-extractor 1.0.25 : Python Package Index'
  },
  {
    url: 'https://toster.ru/q/337511',
    text: 'Как добавлять комментарии в Instagram без api? — Toster.ru'
  },
  {
    url: 'https://www.youtube.com/watch?v=BEbAhwyQeOM',
    text: 'Continued Work on Goose\'s Parser - YouTube'
  },
  {
    url: 'https://godoc.org/github.com/advancedlogic/GoOse',
    text: 'goose - GoDoc'
  },
  {
    url: 'http://blog.reddikh.com/goose-parser/',
    text: 'Goose parser |'
  },
  {
    url: 'https://www.kth.se/social/upload/538599b1f27654141f4cc333/Master',
    text: 'Development of a library to generate and parse IEC 61850-90-5 ... - KTH'
  },
  {
    url: 'http://nullege.com/codes/search/goose.parsers.Parser',
    text: 'goose.parsers.Parser - Nullege Python Samples'
  }
] |
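Since goose-parser hands back plain JSON-style results, persisting them is ordinary Node.js file I/O rather than anything the parser does for you. The following is only a minimal sketch under that assumption: `saveResults` and the output file name are hypothetical and not part of goose-parser, and the sample data is copied from two of the results above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of goose-parser): write parse results
// to a pretty-printed JSON file and return the path that was written.
function saveResults(results, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(results, null, 2), 'utf8');
  return filePath;
}

// Sample data in the same shape as the results shown above.
const sampleResults = [
  { url: 'https://www.npmjs.com/package/goose-parser', text: 'goose-parser - npm' },
  { url: 'https://godoc.org/github.com/advancedlogic/GoOse', text: 'goose - GoDoc' },
];

// The temp directory and file name are arbitrary choices for this sketch.
const outputPath = saveResults(sampleResults, path.join(os.tmpdir(), 'goose-results.json'));
console.log(`Saved ${sampleResults.length} results to ${outputPath}`);
```

In a real script the call to `saveResults` would go right after `await parser.parse(...)`, inside the `try` block.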
TypeError: _gooseParser.PhantomEnvironment is not a constructor
I looked at the imported entities and both of them are undefined.
If I write:
import Parser from 'goose-parser'
it returns
[Function: Parser]
But where can I find PhantomEnvironment?
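The error itself can be reproduced without installing anything. The sketch below uses a stand-in function in place of the real package and assumes, based on the `[Function: Parser]` output above, that the goose-parser entry point exports only the Parser constructor. Reading a `PhantomEnvironment` property from it then yields `undefined`, and calling `new` on that throws a TypeError.

```javascript
// Stand-in for the goose-parser entry point (assumption: the module
// exports the Parser constructor itself and nothing else).
function Parser(options) {
  this.environment = options.environment;
}
const gooseParser = Parser;

// Rough equivalent of `import { PhantomEnvironment } from 'goose-parser'`:
// the export has no such property, so the binding is undefined.
const { PhantomEnvironment } = gooseParser;
console.log(typeof PhantomEnvironment); // prints "undefined"

// Constructing from an undefined binding fails with a TypeError.
let errorName = null;
try {
  new PhantomEnvironment({ url: 'http://www.gooseplanet.ru/' });
} catch (e) {
  errorName = e.name;
  console.log(`${e.name}: ${e.message}`);
}
```

With the environments split out, the constructor comes from its own module, as in the `require('goose-phantom-environment')` example in the comments.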