
Casper.download() not working correctly with binaries #73

Closed
n1k0 opened this Issue March 23, 2012 · 26 comments

7 participants

Nicolas Perriault Kristijan Tim Bunce izumeroot Nick Currier FergusNelson hellojinjie
Nicolas Perriault
Owner
n1k0 commented March 23, 2012

From someone having reported the issue privately by email:

casper.download() is supposed to get the job done,

but when I tried it, casper.download() behaved strangely and the saved
image files were all broken.

I made a sample script to show the download issue. I ran the following code on Windows XP 32-bit with phantomjs 1.4.1 & casperjs 0.6.4.

I use casper.download() and casper.captureSelector() to fetch the same image file.
captureSelector() gives a good image file; download() gives a broken image file.

phantom.casperPath = 'E:/casperjs';
var casperjsFile = phantom.casperPath + '/bin/bootstrap.js';
var ret = phantom.injectJs(casperjsFile);
if (ret) {
       console.log("load casperjs successfully");
       var casper = require("casper").create( {
               verbose : true,
               logLevel : 'info'
       });
} else {
       console.log("load failed");
}

var logo = null;
casper.start('http://www.baidu.com/', function() {
       logo = this.evaluate(function() {
               var imgUrl = document.querySelector('img').getAttribute('src');
               var title = document.title;

               console.log("title="+title);
               return title;
       });

       // a.jpg will be a broken image file
       this.wait(2000, function() {
               this.echo("start downloading");
               this.download("http://www.baidu.com/img/baidu_sylogo1.gif", "a.jpg");
               this.echo("finish download");
       });

       // b.gif is a good image file
       this.captureSelector("b.gif", "img[usemap='#mp']");

});

console.log("ready to go");
casper.run(function() {
       //this.exit();
});
Nicolas Perriault
Owner
n1k0 commented March 23, 2012

As phantomjs 1.5 now ships with a WebKit version providing Uint8Array, this will solve the issue.

Kristijan

Hey, any workarounds?

Nicolas Perriault
Owner
n1k0 commented May 28, 2012

I'm still struggling with writing base64 encoded contents onto the filesystem using native phantomjs' fs module.

I'm more and more thinking that this should be solved by the c++ side of things in phantomjs rather than hacking around in casperjs… stay tuned though.

Kristijan

Thanks for your answer @n1k0!

Tim Bunce

Untested, but this might help:

Change: fs.write(targetPath, cu.decode(this.base64encode(url, method, data)), 'w');
To: fs.write(targetPath, cu.decode(this.base64encode(url, method, data)), 'wb'); // note the 'b' flag
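
A minimal sketch of why a text-mode write can break an image, written for Node.js rather than PhantomJS so it runs standalone. It assumes that a write without the 'b' flag applies a text encoding such as UTF-8 to the decoded string, which the thread does not confirm. The temporary file names and the sample bytes (the 8-byte PNG signature) are chosen for illustration only.

```javascript
// Node.js sketch, not PhantomJS code: compares a text-encoded write
// with a byte-for-byte write of the same decoded base64 payload.
const fs = require('fs');
const os = require('os');
const path = require('path');

// The 8-byte PNG signature. 0x89 is above 0x7f, so UTF-8 stores it as two bytes.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const base64 = original.toString('base64');

// Decode base64 into a "binary string": one character per byte.
const binaryString = Buffer.from(base64, 'base64').toString('latin1');

// Hypothetical output locations inside a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'download-demo-'));
const textPath = path.join(dir, 'text-mode.bin');
const binaryPath = path.join(dir, 'binary-mode.bin');

// Text-style write: characters are re-encoded as UTF-8.
fs.writeFileSync(textPath, binaryString, 'utf8');
// Binary-style write: each character is written as exactly one byte.
fs.writeFileSync(binaryPath, binaryString, 'latin1');

const textBytes = fs.readFileSync(textPath);
const binaryBytes = fs.readFileSync(binaryPath);

console.log('text-mode size:', textBytes.length);
console.log('binary-mode size:', binaryBytes.length);
console.log('binary-mode intact:', binaryBytes.equals(original));
```

The text-encoded file comes out one byte longer than the original because the single byte above 0x7f is expanded, which is enough to invalidate an image header.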

Nicolas Perriault
Owner
n1k0 commented June 03, 2012

Works in at least 80% of cases, good enough right now.

Nicolas Perriault n1k0 closed this in e19d77e June 03, 2012
izumeroot

Hello! This has stopped working again. casper.download() saves zero-sized images.
casperjs --version
1.0.2

Nicolas Perriault
Owner

Works for me with both casper 1.0.2 and master… what version of casper and phantom are you using? on which platform?

Edit: also, pasting the error you encountered or a stack trace might help.

izumeroot

$ phantomjs --version
1.8.1
$ casperjs --version
1.0.2
Platform: OpenSUSE Linux 12.2 32bit.

I don't get any errors in the console. I just get zero-sized images as a result :(

Nicolas Perriault
Owner

Are you trying to download things over SSL? If so, you may want to try the --ignore-ssl-errors option.

izumeroot

In my example I did not use https:
var url = 'http://google.ru/logos/2013/fyodor_shalyapins_140th_birthday-1047005-hp.jpg';
this.download(url, 'test.jpg');

izumeroot

Is there another way to save an image to the filesystem?

Nicolas Perriault
Owner

Nope. That's a strange issue I unfortunately can't investigate until I can reproduce it :/

izumeroot

Maybe there is a way to see additional info (errors or anything else)? In the Linux console I see no errors.

Nick Currier
Collaborator

I had been running into this issue when trying to download some images, but I found that it was due to the images being on a subdomain of the page I was viewing. The script below shows an example of the problem.
I'm not sure whether this is the same problem being experienced here, but they could be connected.

var casper = require('casper').create();
var img = 'http://i.imgur.com/rvNBmlf.gif';

casper.start();

casper.thenOpen('http://i.imgur.com/', function() { // the sub-domain of the image
  this.download(img, 'Success.gif', 'GET');
});
casper.thenOpen('http://imgur.com/', function() { // the domain where the image was found
  this.download(img, 'Failed.gif', 'GET');
});
casper.thenOpen(img, function() { // the image
  this.download(img, 'Success2.gif', 'GET');
});

casper.run(function() {
  this.echo('Finished downloading.');
  this.exit();
});

Tested using CasperJS 1.0.2 and PhantomJS 1.8.1
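
The pattern in the script above matches a same-origin restriction: the error log later in this thread shows the download going through an XMLHttpRequest, so a page on imgur.com cannot read a resource from i.imgur.com while web security is enabled. Here is a minimal sketch of that check, written for Node.js so it runs standalone. `isSameOrigin` is a hypothetical helper for illustration, not part of CasperJS or PhantomJS.

```javascript
// Hypothetical helper (not a CasperJS API): reports whether a resource
// shares scheme, host, and port with the page that requests it.
function isSameOrigin(pageUrl, resourceUrl) {
  const page = new URL(pageUrl);
  const resource = new URL(resourceUrl);
  return page.origin === resource.origin;
}

const img = 'http://i.imgur.com/rvNBmlf.gif';

// Same three page contexts as the thenOpen() calls in the script above.
const fromSubdomain = isSameOrigin('http://i.imgur.com/', img);
const fromParentDomain = isSameOrigin('http://imgur.com/', img);
const fromImageItself = isSameOrigin(img, img);

console.log('i.imgur.com page:', fromSubdomain);
console.log('imgur.com page:', fromParentDomain);
console.log('image URL as page:', fromImageItself);
```

Only the imgur.com page differs in host from the image, which lines up with Failed.gif being the one broken file.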

Nicolas Perriault
Owner

In this case, could using the web-security=no option solve the issue?

Nick Currier
Collaborator

That did it.

Also, it should be pointed out that the pageSettings.webSecurityEnabled option is currently missing from the API.

izumeroot

Hello, hexid!
Your example works for me! I mean Success.gif and Success2.gif were loaded correctly and Failed.gif was loaded as zero-sized.

izumeroot

I tested my examples and they work fine with web-security=false! Thank you!

FergusNelson

I am also hitting this issue. Here are some more details.
phantomjs --version
1.9.1
casperjs --version
1.0.2

screen-capture.js

var casper = require("casper").create({
    viewportSize: {
        width: 1024,
        height: 768
    }, 
    pageSettings: {
        webSecurityEnabled: false
    },
    verbose: true,
    logLevel: 'debug'
});

var address = casper.cli.get(0);
var output       = casper.cli.get(1);

if (!address || !output || !/\.(png|jpg|pdf)$/i.test(output)) {
    casper
        .echo("Usage: $ casperjs screen-capture.js <address> <output.[jpg|png|pdf]>")
        .exit(1)
    ;
}

casper.start(address, function(status) {
    if (status !== 'success') {
        casper.echo(casper.page.settings.webSecurityEnabled);
        this.download(address, output +'.binary');
    } else {
        // `output` is the target path read from the CLI above
        this.waitForSelector(".stream-container", function() {
            this.captureSelector(output, "html");
            this.echo("Saved screenshot of " + this.getCurrentUrl() + " to " + output);
        }, function() {
            this.die("Timeout reached. Fail whale?");
            this.exit();
        }, 12000);
    }
});

casper.run();

command 
c:\Program Files\casper\samples>casperjs --web-security=no screen-capture.js http://www.elliottmarketingpr.com/wp-content/uploads/2012/07/Foodservice-Europe-Gourmet-Burger-UK-by-Katie-Dunne1.pdf out.png
false
[error] [remote] getBinary(): Error while fetching http://www.elliottmarketingpr.com/wp-content/uploads/2012/07/Foodservice-Europe-Gourmet-Burger-UK-by-Katie-Dunne1.pdf: Error: NETWORK_ERR: XMLHttpRequest Exception 101
Nicolas Perriault
Owner
n1k0 commented June 24, 2013

@FergusNelson have you tried using the --web-security=no CLI option or the webSecurityEnabled setting as suggested above?

FergusNelson

@n1k0 Yes, I am using that command line option. See above for the exact command that I am running. I also added some console output for "casper.page.settings.webSecurityEnabled", which is the "false" line in the output above, so the setting is applied correctly, but the error is still thrown.

hellojinjie

I also encountered this issue.
With
pageSettings: {
webSecurityEnabled: false
}
Problem solved.

thx

Shashidhar Dakuri sdakuri referenced this issue in hdxsfbr/coursera-downloader October 07, 2013
Merged

Bugfix - Downloaded files were of 0 bytes. #2
