Casper.download() not working correctly with binaries #73

Closed
n1k0 opened this Issue · 27 comments

8 participants

@n1k0
Owner

From someone having reported the issue privately by email:

casper.download() is supposed to get the job done.

But when I tried it, casper.download() behaved strangely and the saved
image files were all broken.

I wrote some sample code to show the download issue. I ran the following code on Windows XP 32-bit with phantomjs 1.4.1 and casperjs 0.6.4.

I used casper.download() and casper.captureSelector() to save the same image file.
captureSelector() gives a good image file; download() gives a broken one.

phantom.casperPath = 'E:/casperjs';
var casperjsFile = phantom.casperPath + '/bin/bootstrap.js';
var ret = phantom.injectJs(casperjsFile);
if (ret) {
       console.log("casperjs loaded successfully");
       var casper = require("casper").create( {
               verbose : true,
               logLevel : 'info'
       });
} else {
       console.log("load failed");
}

var logo = null;
casper.start('http://www.baidu.com/', function() {
       logo = this.evaluate(function() {
               var imgUrl = document.querySelector('img').getAttribute('src');
               var title = document.title;

               console.log("title="+title);
               return title;
       });

       // a.jpg will be a broken image file
       this.wait(2000,function() {
               casper.echo ("start downloading");
               this.download("http://www.baidu.com/img/baidu_sylogo1.gif","a.jpg");
               this.echo("finish download");
       });


       // b.gif is a good image file
       this.captureSelector("b.gif","img[usemap='#mp']");

});

console.log("ready to go");
casper.run(function() {
       //this.exit();
});
@n1k0
Owner

As phantomjs 1.5 now ships with a WebKit version providing Uint8Array, this will solve the issue.

@n1k0 n1k0 was assigned
@xpepermint

Hey, any workarounds?

@n1k0
Owner

I'm still struggling with writing base64-encoded contents to the filesystem using phantomjs' native fs module.

I'm increasingly thinking that this should be solved on the C++ side of phantomjs rather than by hacking around in casperjs… stay tuned though.

@xpepermint

Thanks for your answer @n1k0!

@timbunce

Untested, but this might help:

Change: fs.write(targetPath, cu.decode(this.base64encode(url, method, data)), 'w');
To: fs.write(targetPath, cu.decode(this.base64encode(url, method, data)), 'wb'); // note the 'b' flag
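
One plausible reason the 'b' flag matters: a base64 payload decoded into a string holds one character per byte, and a text-mode write would presumably re-encode characters above 0x7F as multi-byte sequences. The sketch below illustrates that effect with Node.js Buffers. It is an analogy only, not PhantomJS code, and it assumes cu.decode returns such a one-character-per-byte string; the sample bytes are made up.

```javascript
// Node.js illustration only; PhantomJS's fs module is a different API.
// Sample bytes (made up): "GIF89a" followed by two bytes above 0x7F.
var original = Buffer.from([0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0xff, 0x80]);
var base64 = original.toString('base64');

// Decode to a "binary string": one character per byte
// (assumed to resemble what cu.decode returns).
var binaryString = Buffer.from(base64, 'base64').toString('binary');

// Text-style write: characters above 0x7F expand to two UTF-8 bytes each.
var writtenAsText = Buffer.from(binaryString, 'utf8');

// Binary-style write: each character becomes exactly one byte.
var writtenAsBinary = Buffer.from(binaryString, 'binary');

console.log(original.length, writtenAsText.length, writtenAsBinary.length); // 8 10 8
console.log(writtenAsBinary.equals(original)); // true
```

The text-style copy is two bytes longer than the source, which is enough to corrupt an image file.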

@n1k0
Owner

Works in at least 80% of cases, good enough right now.

@n1k0 n1k0 closed this in e19d77e
@izumeroot

Hello! This is broken again: casper.download() saves images as zero-size files.
casperjs --version
1.0.2

@n1k0
Owner

Works for me with both casper 1.0.2 and master… what version of casper and phantom are you using? on which platform?

Edit: also, pasting the error you encountered or a stack trace would help.

@izumeroot

$ phantomjs --version
1.8.1
$ casperjs --version
1.0.2
Platform: OpenSUSE Linux 12.2 32bit.

I don't get any errors in the console. I just get zero-size images as a result :(

@n1k0
Owner

Are you trying to download things over SSL? If so, you may want to try the --ignore-ssl-errors option.

@izumeroot

In my example I did not use https:
var url = 'http://google.ru/logos/2013/fyodor_shalyapins_140th_birthday-1047005-hp.jpg';
this.download(url, 'test.jpg');

@izumeroot

Is there another way to save an image to the filesystem?

@n1k0
Owner

Nope. That's a strange issue I unfortunately can't investigate until I can reproduce it :/

@izumeroot

Maybe there is a way to see additional info (errors or otherwise)? I don't get any errors in the Linux console.
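
For anyone looking for more diagnostic output, here is a minimal sketch. It assumes the `remote.message`, `page.error` and `resource.received` events behave as described in the CasperJS events documentation. `attachDownloadDiagnostics` and its `log` parameter are hypothetical names introduced for illustration. The helper only needs an object exposing `on()`, so it can be handed a real casper instance inside a CasperJS script.

```javascript
// Hypothetical helper: wires diagnostic listeners onto an emitter such as a casper instance.
// `log` is any function that accepts a string.
function attachDownloadDiagnostics(casper, log) {
    // console.log() calls made inside the page context
    casper.on('remote.message', function (msg) {
        log('[remote] ' + msg);
    });
    // uncaught JavaScript errors raised by the page
    casper.on('page.error', function (msg) {
        log('[page error] ' + msg);
    });
    // HTTP responses; report each resource once, when its final chunk has arrived
    casper.on('resource.received', function (resource) {
        if (resource.stage === 'end') {
            log('[resource] ' + resource.status + ' ' + resource.url);
        }
    });
    return casper;
}
```

In a script this might be called as `attachDownloadDiagnostics(casper, function (line) { console.log(line); })` before `casper.start()`. Creating the instance with `verbose: true` and `logLevel: 'debug'` is another option for surfacing `[remote]` errors.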

@hexid
Collaborator

I had been running into this issue when trying to download some images; however, I found that it was due to the images being on a subdomain of the page I was viewing. The script below shows an example of the problem.
I'm not sure if this is the same problem that is being experienced here, but they could be connected.

var casper = require('casper').create();
var img = 'http://i.imgur.com/rvNBmlf.gif';

casper.start();

casper.thenOpen('http://i.imgur.com/', function() { // the sub-domain of the image
  this.download(img, 'Success.gif', 'GET');
});
casper.thenOpen('http://imgur.com/', function() { // the domain where the image was found
  this.download(img, 'Failed.gif', 'GET');
});
casper.thenOpen(img, function() { // the image
  this.download(img, 'Success2.gif', 'GET');
});

casper.run(function() {
  this.echo('Finished downloading.');
  this.exit();
});

Tested using CasperJS 1.0.2 and PhantomJS 1.8.1
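
The three cases above line up with the browser same-origin rule. That would explain the failure if download() fetches the file with an XMLHttpRequest from inside the page, which is an assumption here, although the getBinary() XMLHttpRequest error reported later in this thread points the same way. The sketch below is a standalone Node.js illustration, not CasperJS code. `sameOrigin` is a hypothetical helper, and the global `URL` class it relies on is a Node.js feature.

```javascript
// Hypothetical helper: true when both URLs share scheme, host and port.
function sameOrigin(pageUrl, resourceUrl) {
    return new URL(pageUrl).origin === new URL(resourceUrl).origin;
}

var img = 'http://i.imgur.com/rvNBmlf.gif';

console.log(sameOrigin('http://i.imgur.com/', img)); // true  (Success.gif)
console.log(sameOrigin('http://imgur.com/', img));   // false (Failed.gif)
console.log(sameOrigin(img, img));                   // true  (Success2.gif)
```

Only the second case crosses origins, which matches the one download that failed.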

@n1k0
Owner

In this case, could using the web-security=no option solve the issue?

@hexid
Collaborator

That did it.

Also, it should be pointed out that the pageSettings.webSecurityEnabled option is currently missing from the API.

@izumeroot

Hello, hexid!
Your example works for me! I mean Success.gif and Success2.gif were loaded correctly and Failed.gif was loaded as zero-sized.

@izumeroot

I tested my examples and they work fine with web-security=false! Thank you!

@FergusNelson

I am also hitting this issue. Here are some more details.
phantomjs --version
1.9.1
casperjs --version
1.0.2

screen-capture.js

var casper = require("casper").create({
    viewportSize: {
        width: 1024,
        height: 768
    }, 
    pageSettings: {
        webSecurityEnabled: false
    },
    verbose: true,
    logLevel: 'debug'
});

var address = casper.cli.get(0);
var output       = casper.cli.get(1);

if (!address || !output || !/\.(png|jpg|pdf)$/i.test(output)) {
    casper
        .echo("Usage: $ casperjs screen-capture.js <address> <output.[jpg|png|pdf]>")
        .exit(1)
    ;
}

casper.start(address, function(status) {
    if (status !== 'success') {
        casper.echo(casper.page.settings.webSecurityEnabled);
        this.download(address, output +'.binary');
    } else {
        this.waitForSelector(".stream-container", function() {
            // capture the whole document into the output path given on the command line
            this.captureSelector(output, "html");
            this.echo("Saved screenshot of " + this.getCurrentUrl() + " to " + output);
        }, function() {
            this.die("Timeout reached. Fail whale?");
            this.exit();
        }, 12000);
    }
});

casper.run();

command
c:\Program Files\casper\samples>casperjs --web-security=no screen-capture.js http://www.elliottmarketingpr.com/wp-content/uploads/2012/07/Foodservice-Europe-Gourmet-Burger-UK-by-Katie-Dunne1.pdf out.png
false
[error] [remote] getBinary(): Error while fetching http://www.elliottmarketingpr.com/wp-content/uploads/2012/07/Foodservice-Europe-Gourmet-Burger-UK-by-Katie-Dunne1.pdf: Error: NETWORK_ERR: XMLHttpRequest Exception 101
@n1k0
Owner

@FergusNelson have you tried using the --web-security=no CLI option or the webSecurityEnabled setting as suggested above?

@FergusNelson

@n1k0 Yes, I am using that command line option. See above for the exact command that I am running. I also added console output for "casper.page.settings.webSecurityEnabled", which is the "false" line above, so it is being set correctly, but the error is still thrown.

@hellojinjie

I also encountered this issue.
With
pageSettings: {
    webSecurityEnabled: false
}
the problem was solved.

thx

@sdakuri sdakuri referenced this issue in hdxsfbr/coursera-downloader
Merged

Bugfix - Downloaded files were of 0 bytes. #2

@pasht

I'm trying to download some video files from my Coursera account, with no luck. The files get created on my disk, but their length is zero. After logging in and getting the correct links, here is the code that I'm using:

casper.thenOpen('https://class.coursera.org/mmds-001/lecture', function() {
    this.waitUntilVisible('div[class="course-lectures-list"]', function() {
        // getLinks is defined elsewhere in the script (not shown)
        var links = this.evaluate(getLinks);
        this.eachThen(links, function(response) {
            this.echo('Downloading ' + response.data.filename);
            this.download(response.data.link, './' + response.data.filename + '.mp4');
        });
    });
});
I've used the -web-security=no CLI option as suggested above, with no success.
I believe that Coursera is hosted on Amazon. Any thoughts?
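
Since the recurring symptom in this thread is a file that exists but is empty, a script can at least detect that case once download() has returned. This is a minimal sketch that assumes the `exists()` and `size()` methods of PhantomJS's fs module. `isEmptyDownload` is a hypothetical helper name, and the `fs` object is passed in so the function does not depend on a particular runtime.

```javascript
// Hypothetical helper: true when the target file is missing or has zero length.
// `fs` is expected to expose exists(path) and size(path), as PhantomJS's fs module does.
function isEmptyDownload(fs, path) {
    return !fs.exists(path) || fs.size(path) === 0;
}
```

Inside a step, this could be used as `if (isEmptyDownload(require('fs'), target)) this.echo('empty download: ' + target);`, where `target` is whatever path was passed to download().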
