Can't scrape airplane prices #142

Open
aragar opened this issue Jan 10, 2017 · 9 comments

Comments

@aragar

aragar commented Jan 10, 2017

I am trying to get airplane ticket prices between two cities from https://www.air.bg/en

I started with the following code:

const osmosis = require('osmosis');

osmosis
  .get('https://www.air.bg/')
  .submit('#amadeus_book', {
    B_LOCATION_1: 'SOF',        // departure airport
    E_LOCATION_1: 'VAR',        // destination airport
    TRIP_TYPE: 'R',             // round trip
    B_DATE_1: '201704140000',   // outbound date
    DATE_RANGE_VALUE_1: 0,      // outbound date +/- days
    DATE_RANGE_VALUE_2: 0,      // return date +/- days
    B_DATE_2: '201704170000',   // return date
    CABIN: 'E',                 // economy
    ADTPAX: 1                   // number of adults
  })

From then on, I can't get any more information. Every time I try .find(SOMETHING) I receive no results for ...

Can somebody help me continue from here and get information from the results page that follows?
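For anyone adapting the form fields above to other routes or dates, here is a minimal sketch that builds the same field object from plain inputs. The helper names `formatBookingDate` and `buildRoundTripFields` are hypothetical, and the `YYYYMMDDHHmm` date layout is only inferred from the sample values `201704140000` and `201704170000`; the site itself may expect something different.

```javascript
// Hypothetical helper: format a Date as YYYYMMDDHHmm using local time.
// The layout is an assumption inferred from the sample values in the thread.
function formatBookingDate(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return (
    String(date.getFullYear()) +
    pad(date.getMonth() + 1) +
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes())
  );
}

// Hypothetical helper: build the round-trip fields used in the submit() call above.
// Field names are copied from the original snippet; nothing else is known about the form.
function buildRoundTripFields(from, to, outbound, inbound) {
  return {
    B_LOCATION_1: from,
    E_LOCATION_1: to,
    TRIP_TYPE: 'R',
    B_DATE_1: formatBookingDate(outbound),
    DATE_RANGE_VALUE_1: 0,
    DATE_RANGE_VALUE_2: 0,
    B_DATE_2: formatBookingDate(inbound),
    CABIN: 'E',
    ADTPAX: 1
  };
}

// Months are zero-based in the Date constructor, so 3 means April.
const fields = buildRoundTripFields(
  'SOF',
  'VAR',
  new Date(2017, 3, 14),
  new Date(2017, 3, 17)
);
console.log(fields);
```

The resulting object could be passed as the second argument to `.submit('#amadeus_book', fields)`.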

@taylorsmcclure

@aragar I have the same issue in #141

I am also trying to scrape an airline page, and I can see you are running into the same issue as me. The site you are scraping makes AJAX (XHR) requests between the initial page load and the actual ticket price results. There are some suggestions in #81, but none of them worked in my case. Give them a shot; maybe you will have better luck.

I think the key to solving this is utilizing https://github.com/rchipka/node-libxmljs-dom to simulate a browser, but I have not been able to implement that correctly.
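To illustrate the AJAX point above: when results arrive after the initial page load, a one-shot query sees nothing, and the usual workaround is to poll until the content appears. The sketch below is plain JavaScript and is not part of osmosis or libxmljs-dom; `waitFor` and the simulated `page` object are hypothetical stand-ins for a real page whose results are filled in by an XHR response.

```javascript
// Hypothetical helper: call `check` repeatedly until it returns a value
// other than null/undefined, or reject once `timeoutMs` has elapsed.
function waitFor(check, { intervalMs = 100, timeoutMs = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    const tick = () => {
      let value;
      try {
        value = check();
      } catch (err) {
        reject(err);
        return;
      }
      if (value !== null && value !== undefined) {
        resolve(value);
      } else if (Date.now() - startedAt >= timeoutMs) {
        reject(new Error('Timed out waiting for content'));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Simulated page: the results are missing at first and "arrive" 50 ms later,
// the way an XHR response would fill them in after the initial load.
const page = { results: null };
setTimeout(() => {
  page.results = ['SOF-VAR'];
}, 50);

waitFor(() => page.results, { intervalMs: 10, timeoutMs: 1000 })
  .then((results) => console.log('results ready:', results))
  .catch((err) => console.error(err.message));
```

Browser automation tools typically offer a built-in equivalent, so a hand-rolled loop like this is mainly useful for understanding why an immediate `.find()` comes back empty.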

@bchr02

bchr02 commented Jan 10, 2017 via email

@taylorsmcclure

@bchr02 That library looks rather spiffy. I think I will give it a shot! Thank you for the input.

@aragar
Author

aragar commented Jan 11, 2017

@taylorsmcclure thanks for the link. I've tried those suggestions already, but with no success. I will try the libxmljs idea.
@bchr02 I've already looked at nightmare, but only at the .wait method, and I wasn't sure how to combine it with osmosis. Maybe I don't need to. I will try with nightmare alone.

@taylorsmcclure

taylorsmcclure commented Jan 11, 2017

@aragar I played around with nightmare yesterday and it solved the issue I was having with AJAX. I think I will continue to use that library for my project. If you are running in a headless environment, check out segment-boneyard/nightmare#224

Here is my proof of concept using nightmare: https://gist.github.com/taylorsmcclure/76d1ecd7f999b009f6b4f8c03c600a97

It's a shame libxmljs-dom and osmosis don't seem to work for my use case. By comparison, osmosis is much more lightweight. With nightmare you need to emulate a screen with xvfb. I am not sure that will scale well for what I am trying to do... time will tell...

@aragar
Author

aragar commented Jan 11, 2017

Wow, thank you very much @taylorsmcclure. This is really helpful. I agree that osmosis looked better :\ I hope @rchipka could help us with this.

@aragar
Author

aragar commented Jan 17, 2017

I tried the advice in #81.

  • I added .click("#main-layout-header"). As a result, it started executing some JS scripts, but some of them ended with errors.
    [image not preserved]
  • I then tried .then((window) => { osmosis.find('.availability-bound-0') }). I think the result was the same.

@create-account

I have the same issue. I tried what @aragar mentioned in his comment, but the issue persisted. I hope someone can help with this. I would prefer to use osmosis over the much heavier nightmarejs (although I did get it to work with nightmarejs).

@aragar
Author

aragar commented Sep 22, 2017

Hello, guys. I've restarted the project after a long delay, and after reading the Osmosis source code I've realised that the result of the submit (which is a POST request) is actually stripped from the available data. You need to reconfigure Osmosis to be able to see it.

Here is the code I use for getting the result with WizzAir:

const osmosis = require('osmosis');

osmosis
  .get('https://wizzair.com/')
  .config('keep_data', true)       // keep the response data instead of discarding it
  .config('parse_response', true)
  .post('https://be.wizzair.com/7.2.1/Api/search/search', {
    adultCount: 1,
    childCount: 0,
    flightList: {
      0: { arrivalStation: 'VAR', departureDate: '2017-09-27', departureStation: 'SOF' },
      1: { arrivalStation: 'SOF', departureDate: '2017-09-29', departureStation: 'VAR' }
    },
    infantCount: 0,
    isFlighChange: false,
    isSeniorOrStudent: false,
    rescueFareCode: '',
    wdc: false
  })
  .then(function (document) {
    // With keep_data enabled, the response of the POST is available here
    console.log(document.response.data);
  });

As a result, the response to the query is saved in the response field of the current context.
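Once the response is available, it usually needs decoding before it is useful. The following is a minimal sketch, assuming the endpoint returns JSON and that `document.response.data` arrives as a Buffer or a string; neither is confirmed in this thread. `parseResponseData` is a hypothetical helper, not part of osmosis.

```javascript
// Hypothetical helper: turn the raw response data into a JavaScript value.
// Returns null when the data is missing or is not valid JSON.
function parseResponseData(data) {
  if (data === undefined || data === null) {
    return null;
  }
  // If something upstream already parsed the body, pass it through unchanged.
  if (typeof data === 'object' && !Buffer.isBuffer(data)) {
    return data;
  }
  const text = Buffer.isBuffer(data) ? data.toString('utf8') : String(data);
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// Example with made-up data standing in for document.response.data
const sample = Buffer.from('{"flights": []}', 'utf8');
console.log(parseResponseData(sample));
```

Inside the `.then` callback this could be used as `const result = parseResponseData(document.response.data)`, followed by a null check before reading any fields.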


4 participants