Looping through pages when next is available #402

Closed
SantoshSrinivas79 opened this issue Dec 22, 2015 · 6 comments

Comments

@SantoshSrinivas79

Hi All,

What would be the best way to keep browsing for as long as a "next" link is available, and then return the results to the main process?

The example below returns the first search result from the first page only. How can I return the first link from each search results page?

// Run using node --harmony yahoo.js

var Nightmare = require('nightmare');
var vo = require('vo');

vo(function* () {
  var nightmare = Nightmare({ show: true });
  var link = yield nightmare
    .goto('http://yahoo.com')
    .type('input[title="Search"]', 'github nightmare')
    .click('#UHSearchWeb')
    .wait('a.next')
    .evaluate(function () {
      var links = document.querySelectorAll("a.td-u");
      return links[0].href;
    });
  yield nightmare.end();
  return link;
})(function (err, result) {
  if (err) return console.log(err);
  console.log(result);
});
@rosshinkley
Contributor

A hastily thrown together sample:

var Nightmare = require('nightmare');
var vo = require('vo');

vo(run)(function(err, result) {
    if (err) throw err;
});

function* run() {
    var nightmare = Nightmare(),
        MAX_PAGE = 10,
        currentPage = 0,
        nextExists = true,
        links = [];

    // Run the search and wait for the results page to load.
    yield nightmare
        .goto('https://www.yahoo.com')
        .type('.input-query', 'github nightmare')
        .click('#search-submit')
        .wait('body');

    nextExists = yield nightmare.visible('.next');
    while (nextExists && currentPage < MAX_PAGE) {
        // Collect the first result link on the current page.
        links.push(yield nightmare
            .evaluate(function() {
                var links = document.querySelectorAll("ol.searchCenterMiddle a");
                return links[0].href;
            }));

        // Move to the next results page.
        yield nightmare
            .click('.next')
            .wait('body');

        currentPage++;
        nextExists = yield nightmare.visible('.next');
    }

    console.dir(links);
    yield nightmare.end();
}

You could, of course, remove the MAX_PAGE guard, but if you're searching for something more popular, the script would take much longer to complete.

@SantoshSrinivas79
Author

Thank you @rosshinkley

@misbach

misbach commented Oct 26, 2017

@rosshinkley I can't get your script to run. It gives the following error: "Cannot read property 'focus' of null". Does it still run for you?
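
An error like "Cannot read property 'focus' of null" most likely means the selector passed to `.type()` no longer matches any element, for example because the site's markup has changed since the sample was written. One way to guard against that is to check which selector is present before typing. The sketch below is only an illustration: `firstExistingSelector` is a hypothetical helper, and `browser` is assumed to be any object with an `exists(selector)` method that returns a promise-like value, as a Nightmare instance does.

```javascript
// Hypothetical helper (not part of Nightmare): return the first selector
// from `candidates` that exists on the current page, or null if none does.
// `browser` is assumed to expose exists(selector) returning a thenable.
async function firstExistingSelector(browser, candidates) {
  for (const selector of candidates) {
    if (await browser.exists(selector)) {
      return selector;
    }
  }
  return null;
}

// Possible usage with Nightmare (assumes `nightmare` has already loaded a page):
//
//   const box = await firstExistingSelector(nightmare, [
//     '.input-query',
//     'input#uh-search-box',
//   ]);
//   if (box === null) throw new Error('search box not found');
//   await nightmare.type(box, 'github nightmare');
```

Failing early with a clear message makes a stale selector easier to spot than a null dereference inside the page.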

@misbach

misbach commented Oct 26, 2017

Never mind, I just had to update the script a bit.

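var Nightmare = require('nightmare');
var vo = require('vo');

// Browser instance used by run() below.
var nightmare = Nightmare();

vo(run)(function(err, result) {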
  if (err) throw err;
});

function* run() {
  var MAX_PAGE = 2;
  var currentPage = 0;
  var nextExists = true;
  var links = [];

  yield nightmare
    .goto('https://www.yahoo.com')
    .type('input#uh-search-box', 'bitcoin')
    .click('button#uh-search-button')
    .wait('div#main')

  nextExists = yield nightmare.visible('.next');

  while (nextExists && currentPage < MAX_PAGE) {
    links.push(yield nightmare
      .evaluate(function() {
        return document.querySelectorAll(".lh-24")[0].href;
      }));

    yield nightmare
      .click('.next')
      .wait('body')

    currentPage++;
    nextExists = yield nightmare.visible('.next');
  }

  console.dir(links);
  yield nightmare.end();
}

@Macxim

Macxim commented Oct 31, 2017

@misbach I'm still missing the last page, even with the updated script.
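
The scripts above only collect a link while `.next` is visible, so the final page (which has no "next" link) is never read. A minimal sketch of a loop that collects first and checks for "next" afterwards is shown here. The `page` adapter and its three method names (`firstLink`, `hasNext`, `goNext`) are hypothetical, introduced only so the loop can be shown independently of the site's selectors.

```javascript
// `page` is a hypothetical adapter with three promise-returning methods:
//   firstLink() -> href of the first result on the current page
//   hasNext()   -> true if a "next" link is visible
//   goNext()    -> clicks "next" and waits for the new page to load
async function collectFirstLinks(page, maxPages) {
  const links = [];
  let visited = 0;
  while (visited < maxPages) {
    // Collect before checking for "next", so the last page is included.
    links.push(await page.firstLink());
    visited++;
    if (visited >= maxPages || !(await page.hasNext())) break;
    await page.goNext();
  }
  return links;
}

// Possible Nightmare adapter, reusing the selectors from the script above
// (assumes `nightmare` is already on the first results page):
//
//   const page = {
//     firstLink: () => nightmare.evaluate(function () {
//       return document.querySelectorAll('.lh-24')[0].href;
//     }),
//     hasNext: () => nightmare.visible('.next'),
//     goNext: () => nightmare.click('.next').wait('body'),
//   };
//   const links = await collectFirstLinks(page, 10);
```

The `maxPages` argument plays the same role as the MAX_PAGE guard: it caps the run time for searches with many result pages.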

@globalkonvict

This was helpful. Thanks a lot.
