Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

js9Helper / fits2fits woes #5

Closed
o-smirnov opened this issue Jul 4, 2018 · 106 comments
Closed

js9Helper / fits2fits woes #5

o-smirnov opened this issue Jul 4, 2018 · 106 comments

Comments

@o-smirnov
Copy link

Sorry @ericmandel, I'm struggling to get fits2fits to work, or wrap my head around it.

Firstly, a conceptual problem. Here you say "the browser knows to ask the js9 helper to extract a smaller file for display from the original file (which is one the server)", but in this comment you tell me to "...Using js9prefs.js in Jupyter, you'll need to turn off the helper". So if I do that, who's the browser going to ask for a smaller file?

I have tried to re-enable the helper -- but when I have JS9.Load("my_fits_file.fits", {fits2fits:true}) in my script, I get this on the console:

js9.min.js:410 Uncaught TypeError: Cannot read property 'image' of undefined
    at Object.b.fits2RepFile (VM193 js9.min.js:410)
    at Object.Load (VM193 js9.min.js:587)
    at loadIm (3C147-CD-LO-spw0-s7-lwimager.fullrest.js9-inline.html:44)
b.fits2RepFile @ js9.min.js:410
(anonymous) @ js9.min.js:587
loadIm @ 3C147-CD-LO-spw0-s7-lwimager.fullrest.js9-inline.html:44
setTimeout (async)
(anonymous) @ 3C147-CD-LO-spw0-s7-lwimager.fullrest.js9-inline.html:43

A simple JS9.Load("my_fits_file.fits") works fine.

@ericmandel
Copy link

We're getting confused between different conversations. You definitely need a helper to be running. Advice to turn off the helper at various points in the conversation was part of testing different aspects of the Tornado problem, trying to see what worked and what did not work.

I've just updated GitHub, and will write about how far I've gotten in the appropriate issue.

@o-smirnov
Copy link
Author

o-smirnov commented Jul 5, 2018

Good news is, I've got the helper working (note to others: the nodejs package in Ubuntu 16.04 is too old to run it right. Need to install an updated version from https://nodejs.org. Ubuntu 18.04 already has a sufficiently new version, I think). I can upload files. So far so good.

But, I can't get fits2fits to work as intended. I'm following https://js9.si.edu/js9/help/repfile.html, and @ericmandel's comment here.

So firstly https://js9.si.edu/js9/help/repfile.html is a bit incorrect, it says to modify js9Prefs.json, but in fact j9prefs.js also needs the "image" option set (or is the latter meant to be auto-generated from the former somehow?)

Having set things up like this:

{
  "globalOpts":
  {
    "helperType":      "nodejs",
    "helperPort":       2718,
    "helperCGI":        "./cgi-bin/js9/js9Helper.cgi",
    "fits2png":         false,
    "debug":            10,
    "loadProxy":        true,
    "useWasm":          false,
    "workDir":          ".radiopadre",
    "workDirQuota":     500,
    "dataPath":         ".",
    "analysisPlugins":  "./analysis-plugins",
    "analysisWrappers": "./analysis-wrappers",
    "fits2fits":        "always",
    "image":
    {
      "xdim": 256,
      "ydim": 256,
      "bin":  1
    }
  },
  "imageOpts":
  {
    "colormap":       "grey",
    "scale":          "linear"
  }
}

I then expect JS9.Load("/./3C147-CD-LO-spw0-s7-lwimager.fullrest.fits", {fits2fits:true}); on the client to only fetch a tiny representation file. But I can see in the network console that the entire 129MB original FITS file is being fetched:

image

...although what then gets displayed is the 256x256 cutout.

What am I missing here?

@o-smirnov
Copy link
Author

Following the example URLs here https://js9.si.edu/js9/js9large.html, I've tried

JS9.Load("/./3C147-CD-LO-spw0-s7-lwimager.fullrest.fits",{fits2fits:true,xcen:2048,ycen:1024,xdim:256,ydim:256,bin:4})

but I still get a full download. If I check the log of the helper process, I see it chattering about fits2fits:

  engine:ws received "421["fits2fits",{"sect":"256@2048,256@1024,4","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":".:.:/.../../"}]" +877ms
  engine:socket packet +1ms
  socket.io-parser decoded 21["fits2fits",{"sect":"256@2048,256@1024,4","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":".:.:/.../../"}] as {"type":2,"nsp":"/","id":1,"data":["fits2fits",{"sect":"256@2048,256@1024,4","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":".:.:/.../../"}]} +0ms
  socket.io:socket got packet {"type":2,"nsp":"/","id":1,"data":["fits2fits",{"sect":"256@2048,256@1024,4","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":".:.:/.../../"}]} +0ms
  socket.io:socket emitting event ["fits2fits",{"sect":"256@2048,256@1024,4","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":".:.:/.../../"}] +0ms
  socket.io:socket attaching ack callback to event +0ms
  socket.io:socket dispatching an event ["fits2fits",{"sect":"256@2048,256@1024,4","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":".:.:/.../../"},null] +0ms
  socket.io:socket sending ack [{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}] +0ms

...but I don't observe the intended effect. Nor is there any sign of activity in the working directory server-side.

@ericmandel
Copy link

In the node log, you should see an exec command like this:

exec: /Users/eric/js9/analysis-wrappers/js9Xeq [imsection,-parent,fits/casa.fits[events][energy=3000:7000],700@3888,600@4098,2]  [2018-7-5 09:43:33]

Its presence would tell us that node exec'ed the imsection command, which runs the script:

analysis-wrappers/js9Xeq

with the specified arguments.

Once you have verified that js9Xeq was run, the easiest thing to do is to edit js9Xeq and uncomment the lines at the top (edit the path of the file as needed):

# set -x
# exec 2>$HOME/foo.log

The foo.log file will tell us what the script is doing. Last time I debugged this remotely, it turned out to be a user PATH problem ...

@o-smirnov
Copy link
Author

But I don't see the exec command at all. Here's the fuller log:

  engine:ws received "421["fits2fits",{"sect":"256,256,1","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}]" +1s
  engine:socket packet +0ms
  socket.io-parser decoded 21["fits2fits",{"sect":"256,256,1","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}] as {"type":2,"nsp":"/","id":1,"data":["fits2fits",{"sect":"256,256,1","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}]} +0ms
  socket.io:socket got packet {"type":2,"nsp":"/","id":1,"data":["fits2fits",{"sect":"256,256,1","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}]} +0ms
  socket.io:socket emitting event ["fits2fits",{"sect":"256,256,1","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}] +0ms
  socket.io:socket attaching ack callback to event +0ms
  socket.io:socket dispatching an event ["fits2fits",{"sect":"256,256,1","fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","parent":true,"cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"},null] +0ms
  socket.io:socket sending ack [{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}] +1ms
  socket.io:client writing packet {"id":1,"type":3,"data":[{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}],"nsp":"/"} +0ms
  socket.io-parser encoding packet {"id":1,"type":3,"data":[{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}],"nsp":"/"} +0ms
  socket.io-parser encoded {"id":1,"type":3,"data":[{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}],"nsp":"/"} as 31[{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}] +0ms
  engine:socket sending packet "message" (31[{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}]) +0ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "431[{"stdout":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","stderr":null,"errcode":0,"encoding":"ascii"}]" +0ms
  engine:ws received "422["image",{"image":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}]" +8s
  engine:socket packet +0ms
  socket.io-parser decoded 22["image",{"image":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}] as {"type":2,"nsp":"/","id":2,"data":["image",{"image":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}]} +0ms
  socket.io:socket got packet {"type":2,"nsp":"/","id":2,"data":["image",{"image":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}]} +0ms
  socket.io:socket emitting event ["image",{"image":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"}] +0ms
  socket.io:socket attaching ack callback to event +0ms
  socket.io:socket dispatching an event ["image",{"image":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits","cookie":"_xsrf=2|89506d0f|3a80c841be4330ee255e237b763d1d24|1530023674","dataPath":"$HOME/Desktop:$HOME/data:.:/.../../"},null] +0ms
  socket.io:socket sending ack [{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}] +1ms
  socket.io:client writing packet {"id":2,"type":3,"data":[{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}],"nsp":"/"} +0ms
  socket.io-parser encoding packet {"id":2,"type":3,"data":[{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}],"nsp":"/"} +0ms
  socket.io-parser encoded {"id":2,"type":3,"data":[{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}],"nsp":"/"} as 32[{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}] +0ms
  engine:socket sending packet "message" (32[{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}]) +0ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "432[{"stdout":null,"stderr":null,"errcode":0,"encoding":"ascii"}]" +0ms

@o-smirnov
Copy link
Author

BTW, this is bypassing Tornado again and just using SimpleHTTPServer. (I've ran into more Tornado problems, on which I'll comment on the other thread...)

@ericmandel
Copy link

ericmandel commented Jul 5, 2018

Sorry, I'm talking about the console output from the node command itself:

node js9Helper.js

The question is whether node got the imsection command and exec'ed the script to process it.
It looks like node decided not to exec the imsection command, and that would be good to know.

@o-smirnov
Copy link
Author

No, the node is definitely not exec'ing the imsection command... (The log above is from running DEBUG='*' nodejs js9Helper.js, so it contains both the packet log and the console output).

Funnily enough, in the "Analysis" menu, I have the "upload FITS to make tasks available" option, which I can click on, to which the node responds with:

exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [quotacheck]  [2018-7-5 16:14:42]
exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [uploadfits,__3C147-CD-LO-spw0-s7-lwimager.fullrest.fits]  [2018-7-5 16:14:43]

...and then an upload commences, and server-side analysis becomes available. But this is kind of perverse: the FITS file starts out server-side, the browser downloaded it, and now it's sent it back to the server....

But the js9Xeq script is definitely there and working. It's just not getting executed when the initial image is loaded...

@ericmandel
Copy link

Well, node looks for the FITS file and if it cannot find it, it lets the client deal with file:

	myenv.JS9_DATAPATH = getDataPath(obj.dataPath);
	s = getFilePath(obj.fits, myenv.JS9_DATAPATH, myenv);
	if( !s ){
	    // did not find file, let js9 take care of it
	    if( cbfunc ){
		res.stdout = obj.fits;
		cbfunc(res);
	    }
	    return;

Where is this data file located relative to the JS9 install directory?

"fits":"/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits"

Probably node did not find the data file ... which might be due to confusion on my part about where to look and what the paths mean ...

@ericmandel
Copy link

No, you definitely should not have to upload the FITS file. But what is happening is that the uploaded file is being stored in a place the helper can find easily.

@ericmandel
Copy link

So you could try removing the initial "/" from the path of the FITS file ...

@o-smirnov
Copy link
Author

Where is this data file located relative to the JS9 install directory?

Bingo, that's exactly the problem. If I take the leading slash off the file name, things start to work.

So I see where the confusion arises, I'm just not quite sure how to work it out properly yet. Let's say the FITS file lives in ~/data. When I start Jupyter (and SimpleHTTPServer) under ~/data, they interpret the URLs the browser asks for relative to that directory. To them, the leading slash actually means ~/data. But obviously not to the helper (which I also run under ~/data in this instance).

@ericmandel
Copy link

Yes, this has caused me endless confusion and obviously I don't have it right yet. So I suggest leaving the "/" prefix off for now, and I'll look into it ... once again.

The problem is that the web page is not always in the same location as the node server, and I try really hard to correct the path relative to node -- when node is processing that data file.

@o-smirnov
Copy link
Author

OK, without the slash I get into a new problem. Playing with the Bin/Filter/Section plugin, I quickly break it with:

image

And the corresponding console line:

exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [imsection,/home/oms/radiopadre/js9-www/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits,256@2047,256@2047,1,,]  [2018-7-5 16:38:35]

In this case, /home/oms/radiopadre/js9-www/ is my JS9 web install dir. So obviously it's gotten prepended to the filename somewhere along the line.

@ericmandel
Copy link

Hmmm ... this might be a simple bug in js9Helper.js, in which it is processing "/" differently from the browser. Can you please make the change below in js9Helper.js, go back to using "/" as a prefix, and let me know if it works? If it does (and it should), I have to make a bunch of tests before updating.

screen shot 2018-07-05 at 10 52 40 am

@ericmandel
Copy link

exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [imsection,/home/oms/radiopadre/js9-www/3C147-CD-LO-spw0-s7-lwimager.fullrest.fits,256@2047,256@2047,1,,] [2018-7-5 16:38:35]

What is the actual directory in which the FITS file resides? Is it /home/oms/data?

@o-smirnov
Copy link
Author

What is the actual directory in which the FITS file resides? Is it /home/oms/data?

/home/oms/RP-3C147. Where I speak of ~/data above, I mean ~/RP-3C147. (Sorry -- I should've used the correct name to begin with to avoid confusion.)

I'll try your patch momentarily.

@ericmandel
Copy link

To them, the leading slash actually means ~/data. But obviously not to the helper (which I also run under ~/data in this instance).

and:

In this case, /home/oms/radiopadre/js9-www/ is my JS9 web install dir. So obviously it's gotten prepended to the filename somewhere along the line.

The helper should be running in the js9 install directory, where the node_modules reside. Is that not the case here? I'm confused ...

@o-smirnov
Copy link
Author

With the patch, and with a leading slash, I'm getting a "Can't find FITS file '3C147-CD-LO-spw0-s7-lwimager.fullrest.fits'" dialog.

I'm pretty sure the problem is, the helper is not even looking in the directory that I originally run it in (~/RP-3C147).

The helper should be running in the js9 install directory, where the node_modules reside. Is that not the case here? I'm confused ...

No, I'm launching it in ~/RP-3C147, where the FITS files reside:

oms@jakob:~/RP-3C147$ nodejs ~/radiopadre/js9-www/js9Helper.js 

I guess I'm trying to use it in a way that you haven't intended! Is it the case that the helper figures out where it's been installed and chdirs into there before doing anything else? If that's the case, I understand why it's not working...

@ericmandel
Copy link

Yes, JS9 figures out the path from the web page to the js9 install dir, where it is assumed that the helper is running. It uses this info to fix up the paths of the data files when they are processed by the helper (since they are specified relative to the web page).

Let me think about this for a bit. We should be able to make it work.

@ericmandel
Copy link

But before I tear JS9 apart, could you please try this patch of js9Helper.js? I see that the previous patch did not quite handle the "/" properly:

screen shot 2018-07-05 at 11 33 29 am

@o-smirnov
Copy link
Author

Yep, exactly. If I add $HOME/RP-3C147 to globalOpts.dataDir in js9Prefs.json, the helper is able to find my FITS files again (with a leading slash in the name, and with your patch in place).

I think we're running into philosophical differences here. You designed it for the scenario of a single webserver and a single helper process serving multiple users, correct? Whereas in the Jupyter use case, each user essentially runs their own private Tornado webserver (or even multiple Tornado servers, if they launch multiple notebooks in different directories). This does not align well with the "single global JS9 install" philosophy.

(The discussion in ericmandel/js9#38 (comment) is related to this -- I wanted per-user helper processes for similar reasons...)

In principle I can work around this by generating a custom js9Prefs.json on-the-fly per each Jupyter session, setting up a fake custom JS9 install dir, etc... Very kludgy... it would be easier if you could either:

  • add a command-line option to the helper to specify an additional dataDir on-the-fly

  • have the helper look in the directory it was started in

For full Jupyter integration, there's going to be an additional problem. If I start a notebook under ~/x/, then in Tornado, the file ~/x/y/z.fits is actually http://localhost:8888/files/y/z.fits. So the helper would have to engage in some URL rewriting: if the browser is trying to load /files/y/z.fits, it should know to look at ~/x/y/z.fits.

@o-smirnov
Copy link
Author

But before I tear JS9 apart, could you please try this patch of js9Helper.js?

Tried it. Still getting the "Can't find FITS file" dialog...

exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [imsection,-parent,3C147-CD-LO-spw0-s9-lwimager.fullrest.fits,2048,2048,2]  [2018-7-5 17:44:49]

@ericmandel
Copy link

have the helper look in the directory it was started in

I thought that is what the latest patch accomplished by adding "." to the list of data directories. But then why does this not work:

exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [imsection,-parent,3C147-CD-LO-spw0-s9-lwimager.fullrest.fits,2048,2048,2] [2018-7-5 17:44:49]

Isn't the FITS file in the same directory from which nodejs was started?

@o-smirnov
Copy link
Author

Isn't the FITS file in the same directory from which nodejs was started?

It most certainly is. :( Silly question, but how do make it print to the console? I could add some print statements to that code...

@ericmandel
Copy link

ericmandel commented Jul 5, 2018

Console.log should work ... unless we have reached that point where nothing seems to work!

So:

console.log("trying: %s", parr[i])

@o-smirnov
Copy link
Author

Aha, so I also added a log statement inside the if... and I see:

Trying 3C147-CD-LO-spw0-s7-lwimager.fullrest.fits
Found 3C147-CD-LO-spw0-s7-lwimager.fullrest.fits
exec: /home/oms/radiopadre/js9-www/analysis-wrappers/js9Xeq [imsection,-parent,3C147-CD-LO-spw0-s7-lwimager.fullrest.fits,2048,2048,2]  [2018-7-5 19:30:51]

So the helper finds the FITS file fine at that point. It's something down the line that causes the error...

@o-smirnov
Copy link
Author

OK, I see the problem. It's in js9Xeq. Looks like it cd's into the output directory:

/home/oms/radiopadre/js9-www/.radiopadre/ecf792de-0a5a-446e-a27a-1c186214461d

Then runs

js9helper -i 3C147-CD-LO-spw0-s7-lwimager.fullrest.fits imsection \!3C147-CD-LO-spw0-s7-lwimager.fullrest_2048_2048_2.fits.gz 2048,2048,2

So I guess you just need to invoke it with an absolute path to the original FITS file in this case (which would have happened anyway if the file was located in a dataDir...)

@ericmandel
Copy link

I was just about to get to the same place ...

I need to set up a similar situation, with the helper running in the data directory. Let me do that and make some changes to facilitate that sort of processing and I'll be back ...

@o-smirnov
Copy link
Author

By the way, is there cube support for fits2fits?

@o-smirnov
Copy link
Author

This does not seem to be true. When I first tried to set up a direct proxy from Mac to Linux, I ran the Linux js9 helper on port 2718. This did not work, because our firewall blocks that port by default. ssh -N -f

Ah wait, but that's just a quirk of your local setup. You're not tunneling all the way to the host running the helper. You're doing something like:

ssh -L localhost:2717:js9.si.edu:2718 eric@xxx.cfa.harvard.edu

...which means that your ssh end-point, xxx.cfa.hardvard.edu, is connecting to js9.si.edu:2718, so if there's a firewall in between those two machines, it can certainly get in the way.

I'm running something like

ssh -L localhost:11001:localhost:1025 oms@stills.ru.ac.za

...so I have an ssh connection directly to stills (where the helper is running). So stills connects to port 1025 on itself ("localhost:1015"). As far as any firewalls are concerned, all they see is ssh traffic between my laptop and stills.

I'm pretty sure the tunneling itself is OK -- I'm running all sorts of other services through it, with no problems whatsoever. And I can connect to the helper with wget just fine (see my comment here #5 (comment)). So there's something odd in JS9 itself.

Can I run js9.js instead of js9.min.js, and start adding console.log statements? Not sure I'll have much time for it today though...

@ericmandel
Copy link

OK, thanks, I have verified that these both work, even though 2718 is blocked by the firewall:

ssh -N -f -L 2717:localhost:14285 eric@bokhara.cfa.harvard.edu
ssh -N -f -L 2717:localhost:2718 eric@bokhara.cfa.harvard.edu

I think using localhost instead of bokhara made it work on the otherwise blocked port 2718.

So can you, at your convenience, use tcpdump to make sure you are receiving the tcp packets on stills?

Yes, you can replace js9.min.js with js9.js but I don't know which console statement will help. Let me think whether I can come up with some debugging experiments. This error message:

JS9 helper connect error: timeout (or connection refused)

is just the error callback from the connect(), but perhaps there is something I can add to the xhr call itself ...

@o-smirnov
Copy link
Author

Interesting. This is tcpdump on my laptop when J9 starts up:

$ sudo tcpdump -i lo "tcp port 11001"
10:54:38.691326 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [S], seq 1594062229, win 43690, options [mss 65476,sackOK,TS val 12231557 ecr 0,nop,wscale 7], length 0
10:54:38.691340 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [S.], seq 2972804348, ack 1594062230, win 43690, options [mss 65476,sackOK,TS val 12231557 ecr 12231557,nop,wscale 7], length 0
10:54:38.691352 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [.], ack 1, win 342, options [nop,nop,TS val 12231557 ecr 12231557], length 0
10:54:38.691489 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [P.], seq 1:3322, ack 1, win 342, options [nop,nop,TS val 12231557 ecr 12231557], length 3321
10:54:41.953580 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [P.], seq 1:16385, ack 3322, win 1365, options [nop,nop,TS val 12232372 ecr 12231557], length 16384
10:54:41.953594 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [.], ack 16385, win 1365, options [nop,nop,TS val 12232372 ecr 12232372], length 0
10:54:41.954896 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [P.], seq 16385:32769, ack 3322, win 1365, options [nop,nop,TS val 12232373 ecr 12232372], length 16384
10:54:41.954911 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [.], ack 32769, win 2388, options [nop,nop,TS val 12232373 ecr 12232373], length 0
10:54:41.955056 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [P.], seq 32769:49153, ack 3322, win 1365, options [nop,nop,TS val 12232373 ecr 12232373], length 16384
10:54:41.955062 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [.], ack 49153, win 3410, options [nop,nop,TS val 12232373 ecr 12232373], length 0
10:54:41.955153 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [P.], seq 49153:62613, ack 3322, win 1365, options [nop,nop,TS val 12232373 ecr 12232373], length 13460
10:54:46.955478 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [F.], seq 62613, ack 3322, win 1365, options [nop,nop,TS val 12233623 ecr 12232373], length 0
10:54:46.994726 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [.], ack 62614, win 3636, options [nop,nop,TS val 12233633 ecr 12233623], length 0
10:55:31.982760 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [.], ack 62614, win 3636, options [nop,nop,TS val 12244880 ecr 12233623], length 0
10:55:31.982772 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [.], ack 3322, win 1365, options [nop,nop,TS val 12244880 ecr 12233633], length 0
10:55:37.471712 IP6 ip6-localhost.33330 > ip6-localhost.11001: Flags [F.], seq 3322, ack 62614, win 3636, options [nop,nop,TS val 12246252 ecr 12244880], length 0
10:55:37.471731 IP6 ip6-localhost.11001 > ip6-localhost.33330: Flags [.], ack 3323, win 1365, options [nop,nop,TS val 12246252 ecr 12246252], length 0

And this is what I see on stills:

0:54:40.271290 IP6 localhost.55334 > localhost.1025: Flags [S], seq 607750725, win 43690, options [mss 65476,sackOK,TS val 1386531999 ecr 0,nop,wscale 7], length 0
10:54:40.271299 IP6 localhost.1025 > localhost.55334: Flags [R.], seq 0, ack 607750726, win 0, length 0
10:54:40.271409 IP localhost.44736 > localhost.1025: Flags [S], seq 725647229, win 43690, options [mss 65495,sackOK,TS val 1386531999 ecr 0,nop,wscale 7], length 0
10:54:40.271423 IP localhost.1025 > localhost.44736: Flags [S.], seq 2447566619, ack 725647230, win 43690, options [mss 65495,sackOK,TS val 1386531999 ecr 1386531999,nop,wscale 7], length 0
10:54:40.271435 IP localhost.44736 > localhost.1025: Flags [.], ack 1, win 342, options [nop,nop,TS val 1386531999 ecr 1386531999], length 0
10:54:41.885465 IP localhost.44736 > localhost.1025: Flags [P.], seq 1:3322, ack 1, win 342, options [nop,nop,TS val 1386532402 ecr 1386531999], length 3321
10:54:41.885477 IP localhost.1025 > localhost.44736: Flags [.], ack 3322, win 1365, options [nop,nop,TS val 1386532402 ecr 1386532402], length 0
10:54:41.886100 IP localhost.1025 > localhost.44736: Flags [.], seq 1:21889, ack 3322, win 1365, options [nop,nop,TS val 1386532402 ecr 1386532402], length 21888
10:54:41.886118 IP localhost.44736 > localhost.1025: Flags [.], ack 21889, win 1365, options [nop,nop,TS val 1386532402 ecr 1386532402], length 0
10:54:41.886147 IP localhost.1025 > localhost.44736: Flags [P.], seq 21889:43777, ack 3322, win 1365, options [nop,nop,TS val 1386532402 ecr 1386532402], length 21888
10:54:41.886155 IP localhost.44736 > localhost.1025: Flags [.], ack 43777, win 2388, options [nop,nop,TS val 1386532402 ecr 1386532402], length 0
10:54:41.886179 IP localhost.1025 > localhost.44736: Flags [P.], seq 43777:62613, ack 3322, win 1365, options [nop,nop,TS val 1386532402 ecr 1386532402], length 18836
10:54:41.886186 IP localhost.44736 > localhost.1025: Flags [.], ack 62613, win 3411, options [nop,nop,TS val 1386532402 ecr 1386532402], length 0
10:54:46.891675 IP localhost.1025 > localhost.44736: Flags [F.], seq 62613, ack 3322, win 1365, options [nop,nop,TS val 1386533654 ecr 1386532402], length 0
10:54:46.930355 IP localhost.44736 > localhost.1025: Flags [.], ack 62614, win 3411, options [nop,nop,TS val 1386533664 ecr 1386533654], length 0
10:55:38.475411 IP localhost.44736 > localhost.1025: Flags [F.], seq 3322, ack 62614, win 3411, options [nop,nop,TS val 1386546550 ecr 1386533654], length 0
10:55:38.475426 IP localhost.1025 > localhost.44736: Flags [.], ack 3323, win 1365, options [nop,nop,TS val 1386546550 ecr 1386546550], length 0

The only obvious difference is that some packets from stills seem to be longer than the ones arriving on my laptop. Length being truncated at 16384?

@o-smirnov
Copy link
Author

o-smirnov commented Aug 3, 2018

Actually nevermind the packet size -- the sum of the response packet sizes is 62612 in both cases, so it's the same length message coming back.

Anyway, I'm convinced traffic from the helper is getting back properly (and remember, I can use wget to talk to it without a problem...)

@ericmandel
Copy link

OK, thanks, it's good to verify that the packets are arriving ... now all we need to do is understand why the helper is not seeing them.

I can't work on this right now, I'm buried deep in a problem with the socket.io fall-back transport mechanism (long-polling) . Don't ask why ... just know there are a lot of ways to (mis-)configure a web site!

@ericmandel
Copy link

  socket.io:server serve client source +27s
  socket.io:server serve client 304 +1s

I'm back ... these two messages above are output in a socket.io routine called Server.prototype.serve that serves socket.io.js back to the client. Once the browser has retrieved socket.io.js, it tries to connect to the server ... and the error you are seeing indicates its inability to connect.

We do see TCP packets going to port 1025 on stills, although there is no indication that the js9 helper is processing these TCP packets. We also see TCP packets going from 1025 on stills back to the other machine, but we still get a timeout.

Would that be a fair summary? If so, I have no idea (yet) what is going on.

@ericmandel
Copy link

When you start up the helper on stills, did you change the helperHost value or leave it as 0.0.0.0? That is, is the helper listening on all IP addresses (0.0.0.0) or a particular IP address?

@ericmandel
Copy link

Also, I see very few web pages returned when I google:

socket.io ipv6

Any chance you can try downgrading to ipv4?

@o-smirnov
Copy link
Author

o-smirnov commented Aug 3, 2018 via email

@o-smirnov
Copy link
Author

Would that be a fair summary? If so, I have no idea (yet) what is going on.

Yep, fair summary.

Tried it with IPv6 disabled, get the same problem, sadly...

When you start up the helper on stills, did you change the helperHost value or leave it as 0.0.0.0? That is, is the helper listening on all IP addresses (0.0.0.0) or a particular IP address?

0.0.0.0. But I changed it to 127.0.0.1 just in case, it didn't help.

What really intrigues me is that roughly 20% of the time it will connect successfully....

@ericmandel
Copy link

What really intrigues me is that roughly 20% of the time it will connect successfully....

Is the helper started up on the fly?

@ericmandel
Copy link

(oops, didn't mean to make that a comment by itself)

I agree that the 20% looks like a timing issue. If the helper is being started on the fly, perhaps you can start it manually instead (as a test) and see if it always connects.

@o-smirnov
Copy link
Author

Already tried that... anyway, it starts long before any request is made (Jupyter itself takes a while to come up, especially over a remote conection, and the helper is started first).

And it always successfully serves socket.io.js back to the client. It's just the subsequent bit that goes wrong.

@ericmandel
Copy link

And it always successfully serves socket.io.js back to the client.

I'm a little confused. Above you mentioned that DEBUG=* showed only two messages:

  socket.io:server serve client source +27s
  socket.io:server serve client 304 +1s

It doesn't seem like this would result in the return of the socket.io.js file, or else I would have expected more output from DEBUG=*. My output (up to the first js9 "initialize" message) is below, which seems to include the initial processing before a connect request.

  socket.io:server serve client source +13s
  engine intercepting request for path "/socket.io/" +0ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=MK5AT5E" +0ms
  engine handshaking client "IXyG7sMgqZBGt_sMAAAA" +3ms
  engine:socket sending packet "open" ({"sid":"IXyG7sMgqZBGt_sMAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}) +0ms
  engine:socket sending packet "message" (0) +0ms
  engine:polling setting request +0ms
  engine:socket flushing buffer to transport +1ms
  engine:polling writing "96:0{"sid":"IXyG7sMgqZBGt_sMAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40" +1ms
  engine:socket executing batch send callback +2ms
  socket.io:server incoming connection with id IXyG7sMgqZBGt_sMAAAA +110ms
  socket.io:client connecting to namespace / +0ms
  socket.io:namespace adding socket to nsp / +0ms
  socket.io:socket socket connected - writing packet +0ms
  socket.io:socket joining room IXyG7sMgqZBGt_sMAAAA +0ms
  socket.io:socket packet already sent in initial handshake +0ms
  socket.io:socket joined room IXyG7sMgqZBGt_sMAAAA +1ms
  engine upgrading existing transport +256ms
  engine:socket might upgrade socket transport from "polling" to "websocket" +253ms
  engine intercepting request for path "/socket.io/" +3ms
  engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=MK5AT9H&sid=IXyG7sMgqZBGt_sMAAAA" +0ms
  engine setting new request for existing client +0ms
  engine:polling received "52:420["initialize",{"displays":["JS9"],"pageid":null}]" +257ms

@ericmandel
Copy link

This page https://socket.io/docs/logging-and-debugging/ tells us that we can add debugging statements to the client in this way:

localStorage.debug = '*';

When I did this, I do get some beautiful console log statements (I again end at the point where JS9 sends its "initialize" message):
screen shot 2018-08-04 at 10 01 25 am

@o-smirnov
Copy link
Author

I'm a little confused. Above you mentioned that DEBUG=* showed only two messages:

That is correct. But from your comment here I assumed that that meant socket.io.js was being successfully returned. So let me put it more precisely: those two messages (whatever they mean) appear with 100% reliability. A subsequent helper connection is established with ~20% reliability.

I happened to stumble on something interesting just now though. If I run the helper like this:

DEBUG='*'  /usr/bin/nodejs /home/oms/.radiopadre-venv/js9-www/js9Helper.js '{"helperPort": 1025, "helperOpts":{"transports":["polling"],"maxHttpBufferSize": 10E9}}'

...then the connection is 100% reliable! Without DEBUG='*', it gets flaky again. (My previous runs with DEBUG='*' were without the long polling options.)

@o-smirnov
Copy link
Author

Gotta leave the computer for today, but I think we're getting somewhere... I'll enable debugging and report tomorrow...

@ericmandel
Copy link

OK, I have no idea what is going on, but the client debugging (with and without setting the transport value set to polling) will tell us some good stuff. It stinks of a race condition, though.

I'll be away from computers from early tomorrow until Monday evening, and will pick up when I return. (I will read emails on my phone but not much more ...)

@o-smirnov
Copy link
Author

o-smirnov commented Aug 4, 2018 via email

@ericmandel
Copy link

Well, given that we are more or less cut from the same cloth, we probably should try to avoid pushing one another during vacation time. Let me know when you want to start up ... I'm not in a hurry either, except to make sure your needs are taken care of.

@o-smirnov
Copy link
Author

It stinks of a race condition, though.

I agree with your olfactory diagnosis. I got the browser to log debug statements. With debugging on, it tends to connect successfully more frequently (same observation with server-side debugging above).

Maddeningly, when it does fail to connect, there is nothing at all in the log from socket.io -- just the "helper connect error: timeout or connection refused" message straight up. When it succeeds, I get the following (note the connect/reconnect messages...)

registering JS9 partner display plug-in
socket.io.js?v=20180811133135:6 socket.io-client:url parse http://localhost:11001 +0ms
socket.io.js?v=20180811133135:6 socket.io-client new io instance for http://localhost:11001 +0ms
socket.io.js?v=20180811133135:6 socket.io-client:manager readyState closed +0ms
socket.io.js?v=20180811133135:6 socket.io-client:manager opening http://localhost:11001 +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket creating transport "polling" +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling polling +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr poll +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr open GET: http://localhost:11001/socket.io/?EIO=3&transport=polling&t=MKenbf1 +3ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr data null +0ms
socket.io.js?v=20180811133135:6 engine.io-client:socket setting transport polling +6ms
socket.io.js?v=20180811133135:6 socket.io-client:manager connect attempt will timeout after 1000 +9ms
socket.io.js?v=20180811133135:6 socket.io-client:manager readyState opening +1ms
socket.io.js?v=20180811133135:6 socket.io-client:manager connect attempt timed out after 1000 +1s
socket.io.js?v=20180811133135:6 engine.io-client:socket socket close with reason: "forced close" +1s
socket.io.js?v=20180811133135:6 engine.io-client:polling transport not open - deferring close +1s
socket.io.js?v=20180811133135:6 engine.io-client:socket socket closing - telling transport to close +1ms
socket.io.js?v=20180811133135:6 socket.io-client:manager connect_error +1ms
socket.io.js?v=20180811133135:6 socket.io-client:manager cleanup +0ms
VM712:424 JS9 helper: connect error
socket.io.js?v=20180811133135:6 socket.io-client:manager will wait 855ms before reconnect attempt +1ms
VM712:424 JS9 helper: connect timeout
socket.io.js?v=20180811133135:6 engine.io-client:polling polling got data 96:0{"sid":"xv3hw4EYNzONzvTyAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40 +393ms
socket.io.js?v=20180811133135:6 socket.io-client:manager attempting reconnect +998ms
socket.io.js?v=20180811133135:6 socket.io-client:manager readyState closed +4ms
socket.io.js?v=20180811133135:6 socket.io-client:manager opening http://localhost:11001 +0ms
socket.io.js?v=20180811133135:6 engine.io-client:socket creating transport "polling" +1s
socket.io.js?v=20180811133135:6 engine.io-client:polling polling +612ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr poll +2s
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr open GET: http://localhost:11001/socket.io/?EIO=3&transport=polling&t=MKencAc +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr data null +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket setting transport polling +3ms
socket.io.js?v=20180811133135:6 socket.io-client:manager connect attempt will timeout after 1000 +6ms
socket.io.js?v=20180811133135:6 engine.io-client:polling polling got data 96:0{"sid":"-GSbGnkFuUL2f30nAAAB","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}2:40 +769ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "open", data "{"sid":"-GSbGnkFuUL2f30nAAAB","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}" +767ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket open +0ms
socket.io.js?v=20180811133135:6 socket.io-client:manager open +767ms
socket.io.js?v=20180811133135:6 socket.io-client:manager cleanup +1ms
socket.io.js?v=20180811133135:6 socket.io-client:socket transport is open - connecting +0ms
socket.io.js?v=20180811133135:6 socket.io-client:manager reconnect success +1ms
VM712:424 JS9 helper: reconnect
socket.io.js?v=20180811133135:6 engine.io-client:socket starting upgrade probes +4ms
socket.io.js?v=20180811133135:6 engine.io-client:socket probing transport "websocket" +0ms
socket.io.js?v=20180811133135:6 engine.io-client:socket creating transport "websocket" +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "message", data "0" +2ms
socket.io.js?v=20180811133135:6 socket.io-parser decoded 0 as {"type":0,"nsp":"/"} +0ms
socket.io.js?v=20180811133135:6 socket.io-client:socket emitting packet with ack id 0 +11ms
socket.io.js?v=20180811133135:6 socket.io-client:manager writing packet {"type":2,"data":["initialize",{"displays":[]}],"options":{"compress":true},"id":0,"nsp":"/"} +12ms
socket.io.js?v=20180811133135:6 socket.io-parser encoding packet {"type":2,"data":["initialize",{"displays":[]}],"options":{"compress":true},"id":0,"nsp":"/"} +6ms
socket.io.js?v=20180811133135:6 socket.io-parser encoded {"type":2,"data":["initialize",{"displays":[]}],"options":{"compress":true},"id":0,"nsp":"/"} as 20["initialize",{"displays":[]}] +3ms
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +11ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr open POST: http://localhost:11001/socket.io/?EIO=3&transport=polling&t=MKencMw&sid=-GSbGnkFuUL2f30nAAAB +788ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr data 33:420["initialize",{"displays":[]}] +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling polling +23ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr poll +6ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr open GET: http://localhost:11001/socket.io/?EIO=3&transport=polling&t=MKencN1&sid=-GSbGnkFuUL2f30nAAAB +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr data null +2ms
socket.io.js?v=20180811133135:6 engine.io-client:polling polling got data 94:430[{"pageid":"448a5df8-11a6-4ea6-9890-cd5b11dc46d3","js9helper":false,"dataPathModify":true}] +794ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "message", data "30[{"pageid":"448a5df8-11a6-4ea6-9890-cd5b11dc46d3","js9helper":false,"dataPathModify":true}]" +799ms
socket.io.js?v=20180811133135:6 socket.io-parser decoded 30[{"pageid":"448a5df8-11a6-4ea6-9890-cd5b11dc46d3","js9helper":false,"dataPathModify":true}] as {"type":3,"nsp":"/","id":0,"data":[{"pageid":"448a5df8-11a6-4ea6-9890-cd5b11dc46d3","js9helper":false,"dataPathModify":true}]} +801ms
socket.io.js?v=20180811133135:6 socket.io-client:socket calling ack 0 with [{"pageid":"448a5df8-11a6-4ea6-9890-cd5b11dc46d3","js9helper":false,"dataPathModify":true}] +809ms
VM712:424 JS9 helper: connect: nodejs
socket.io.js?v=20180811133135:6 engine.io-client:polling polling +8ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr poll +798ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr open GET: http://localhost:11001/socket.io/?EIO=3&transport=polling&t=MKencZX&sid=-GSbGnkFuUL2f30nAAAB +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling-xhr xhr data null +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket probe transport "websocket" opened +739ms
socket.io.js?v=20180811133135:6 engine.io-client:socket probe transport "websocket" pong +703ms
socket.io.js?v=20180811133135:6 engine.io-client:socket pausing current transport "polling" +0ms
socket.io.js?v=20180811133135:6 engine.io-client:polling we are currently polling - waiting to pause +1s
socket.io.js?v=20180811133135:6 engine.io-client:polling polling got data 1:6 +100ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "noop", data "undefined" +102ms
socket.io.js?v=20180811133135:6 engine.io-client:polling pre-pause polling complete +2ms
socket.io.js?v=20180811133135:6 engine.io-client:polling paused +0ms
socket.io.js?v=20180811133135:6 engine.io-client:socket changing transport and sending upgrade packet +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket setting transport websocket +0ms
socket.io.js?v=20180811133135:6 engine.io-client:socket clearing existing transport polling +1ms
socket.io.js?v=20180811133135:6 engine.io-client:polling ignoring poll - transport state "paused" +3ms
socket.io.js?v=20180811133135:6 engine.io-client:socket writing ping packet - expecting pong within 5000ms +23s
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +2ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "pong", data "undefined" +709ms
socket.io.js?v=20180811133135:6 engine.io-client:socket writing ping packet - expecting pong within 5000ms +25s
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "pong", data "undefined" +736ms
socket.io.js?v=20180811133135:6 engine.io-client:socket writing ping packet - expecting pong within 5000ms +25s
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "pong", data "undefined" +470ms
socket.io.js?v=20180811133135:6 engine.io-client:socket writing ping packet - expecting pong within 5000ms +26s
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "pong", data "undefined" +702ms
socket.io.js?v=20180811133135:6 engine.io-client:socket writing ping packet - expecting pong within 5000ms +25s
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +1ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "pong", data "undefined" +710ms
socket.io.js?v=20180811133135:6 engine.io-client:socket writing ping packet - expecting pong within 5000ms +25s
socket.io.js?v=20180811133135:6 engine.io-client:socket flushing 1 packets in socket +0ms
socket.io.js?v=20180811133135:6 engine.io-client:socket socket receive: type "pong", data "undefined" +693ms

And even with a successful connection, it's still not invoking fits2fits, and trying to download full images. Is it unable to find the helper shell script perhaps? (It would be nice BTW if this condition also displayed an error when requireHelper=true). This is the exchange I see in the log:

image

@ericmandel
Copy link

Maddeningly, when it does fail to connect, there is nothing at all in the log from socket.io

And, on the server side, on a connect failure, all you see is this, right?

  socket.io:server serve client source +27s
  socket.io:server serve client 304 +1s

If so, this is consistent with the unix system call connect() not actually connecting to a socket, and eventually timing out -- and this failure/timeout is independent of the implementation in which its embedded. (See https://github.com/ericmandel/xpa for a socket-based communication mechanism from the 90's, still in heavy use in xray astronomy -- man, I tripped over every problem imaginable back then).

Why is the connect just hanging and not connecting?? Especially when you see the TCP packets arriving on the server side!

Have you tried executing netstat on the server side:

netstat -an | egrep 2718

(or whatever port you are using to listen to). I'd be interested to know what is listening.

I assume you won't find that port in a TIME_WAIT state ... that would be a problem.

I'm starting to wonder if we need to replace the helper with another socket-based server to see what is happening at connect time. It doesn't seem like this is a socket.io problem, but only a replacement server will tell us that for sure.

@ericmandel
Copy link

socket.io.js?v=20180811133135:6 socket.io-client:manager connect attempt will timeout after 1000 +6ms

Also, let's try increasing the connect timeout, which is controlled by these globalOpts variables (probably the second, in your case):

    htimeout: 5000,		// connection timeout for the helper connect
    lhtimeout: 1000,		// connection timeout for local helper connect

Try:

    JS9.globalOpts.htimeout = 10000
    JS9.globalOpts.lhtimeout = 10000

Perhaps a miracle will happen.

I envy the fact that you got to attend the commissioning of Meerkat during the day. I guess I'll try to get up at 3am again to see Parker Solar Probe launch (my division built SWEAP, although my only real contact with that project is through my friend, SWEAP's long-long-suffering grant administrator) but I can't guarantee it.

@o-smirnov
Copy link
Author

Perhaps a miracle will happen.

BINGO! That's exactly what has happened. Seems 100% reliable now.

And it makes perfect sense in hindsight -- initialization of the whole thing (Jupyter+JS9) via the tunnel takes a while, and there's a lot of latency in opening up new connections in particular... so that 1s was too optimistic...

Right, new problems coming up soon though. :)

I envy the fact that you got to attend the commissioning of Meerkat during the day.

Yes, it was a pretty special event. Nothing quite like standing among a mob of radio dishes tracking around in unison... ;)

@ericmandel
Copy link

Whoah, that was hard to find! I've increased the timeout to 10 seconds everywhere, so hopefully this won't happen again.

And I'll look forward to some non-race-condition problems ...

@o-smirnov
Copy link
Author

And I'll look forward to some non-race-condition problems ...

Be careful what you wish for! Here's one coming right up. I've broken something again in the Load() call. At the moment it looks like this:

    opts = {fits2fits:true, xcen:(xsz/2>>0), ycen:(ysz/2>>0), xdim:xsz, ydim:ysz, bin:bin,
            onload: im => this.onLoadRebin(im, xsz, ysz),
            zoom:'T'}
    this.setDefaultImageOpts(opts)
    JS9p.log("Loading", path, "with options", opts)
    JS9.Load(path, opts, {display:this.disp_rebin});

And here's the console output:

image

The opts object looks OK to me (there are some extra fields in a misguided effort to set the initial clipping limits). And yet, it no longer calls the helper to get the rebinned image. It just silently falls back to getting a full image (I opened an issue about the silent part in ericmandel/js9#42).

The helper itself is configured correctly though -- when it goes to load a zoomed section later, I can see js9Xeq being invoked.

@ericmandel
Copy link

BINGO! That's exactly what has happened. Seems 100% reliable now.
Whoah, that was hard to find! I've increased the timeout to 10 seconds everywhere, so hopefully this won't happen again.

On reflection, you should explicitly set JS9.globalOpts.lhtimeout to a value that makes sense for you. I now have it set to 10 seconds, but that will cause other problems: if a local helper is expected but is not running, there will be a 10 second delay before JS9 recovers and moves on without the helper. I'll need to see if anyone complains -- this might just be my debugging setup, but setting the timeout value yourself will be safest.

@o-smirnov
Copy link
Author

OK, with the new requireFits2Fits mode, the reason became immediately apparent. The problem was with the http://localhost prefix on the image URL. If I'm always going via the helper, I should never have that prefix on the image...

Everything seems to work again now, so I'll close this thread, as it's getting unwieldy anyway. I'll open new ones when I break something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants