js9Helper / fits2fits woes #5
Comments
We're getting confused between different conversations. You definitely need a helper to be running. Advice to turn off the helper at various points in the conversation was part of testing different aspects of the Tornado problem, trying to see what worked and what did not work. I've just updated GitHub, and will write about how far I've gotten in the appropriate issue. |
Good news is, I've got the helper working (note to others: the But, I can't get fits2fits to work as intended. I'm following https://js9.si.edu/js9/help/repfile.html, and @ericmandel's comment here. So firstly https://js9.si.edu/js9/help/repfile.html is a bit incorrect, it says to modify Having set things up like this:
I then expect ...although what then gets displayed is the 256x256 cutout. What am I missing here? |
Following the example URLs here https://js9.si.edu/js9/js9large.html, I've tried
but I still get a full download. If I check the log of the helper process, I see it chattering about fits2fits:
...but I don't observe the intended effect. Nor is there any sign of activity in the working directory server-side. |
In the node log, you should see an exec command like this:
Its presence would tell us that node exec'ed the imsection command, which runs the script:
with the specified arguments. Once you have verified that js9Xeq was run, the easiest thing to do is to edit js9Xeq and uncomment the lines at the top (edit the path of the file as needed):
The foo.log file will tell us what the script is doing. Last time I debugged this remotely, it turned out to be a user PATH problem ... |
But I don't see the exec command at all. Here's the fuller log:
|
BTW, this is bypassing Tornado again and just using SimpleHTTPServer. (I've run into more Tornado problems, which I'll comment on in the other thread...) |
Sorry, I'm talking about the console output from the node command itself:
The question is whether node got the imsection command and exec'ed the script to process it. |
No, the node is definitely not exec'ing the imsection command... (The log above is from running Funnily enough, in the "Analysis" menu, I have the "upload FITS to make tasks available" option, which I can click on, to which the node responds with:
...and then an upload commences, and server-side analysis becomes available. But this is kind of perverse: the FITS file starts out server-side, the browser downloaded it, and now it's sent it back to the server.... But the |
Well, node looks for the FITS file and if it cannot find it, it lets the client deal with file:
Where is this data file located relative to the JS9 install directory?
Probably node did not find the data file ... which might be due to confusion on my part about where to look and what the paths mean ... |
No, you definitely should not have to upload the FITS file. But what is happening is that the uploaded file is being stored in a place the helper can find easily. |
So you could try removing the initial "/" from the path of the FITS file ... |
Bingo, that's exactly the problem. If I take the leading slash off the file name, things start to work. So I see where the confusion arises, I'm just not quite sure how to work it out properly yet. Let's say the FITS file lives in |
Yes, this has caused me endless confusion and obviously I don't have it right yet. So I suggest leaving the "/" prefix off for now, and I'll look into it ... once again. The problem is that the web page is not always in the same location as the node server, and I try really hard to correct the path relative to node -- when node is processing that data file. |
OK, without the slash I get into a new problem. Playing with the Bin/Filter/Section plugin, I quickly break it with: And the corresponding console line:
In this case, |
Hmmm ... this might be a simple bug in js9Helper.js, in which it is processing "/" differently from the browser. Can you please make the change below in js9Helper.js, go back to using "/" as a prefix, and let me know if it works? If it does (and it should), I have to make a bunch of tests before updating. |
What is the actual directory in which the FITS file resides? Is it /home/oms/data? |
I'll try your patch momentarily. |
and:
The helper should be running in the js9 install directory, where the node_modules reside. Is that not the case here? I'm confused ... |
With the patch, and with a leading slash, I'm getting a "Can't find FITS file '3C147-CD-LO-spw0-s7-lwimager.fullrest.fits'" dialog. I'm pretty sure the problem is, the helper is not even looking in the directory that I originally run it in (
No, I'm launching it in
I guess I'm trying to use it in a way that you haven't intended! Is it the case that the helper figures out where it's been installed and chdirs into there before doing anything else? If that's the case, I understand why it's not working... |
Yes, JS9 figures out the path from the web page to the js9 install dir, where it is assumed that the helper is running. It uses this info to fix up the paths of the data files when they are processed by the helper (since they are specified relative to the web page). Let me think about this for a bit. We should be able to make it work. |
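The lookup described here can be sketched roughly as follows. This is a minimal illustration and not the code in js9Helper.js: `findDataFile`, its `dataDirs` argument, and the injected `exists` callback are hypothetical names. The only behavior taken from this thread is that dropping the leading "/" and searching a list of data directories (including ".") lets the helper find a file whose path was given relative to the web page.

```javascript
// Hypothetical sketch: resolve a page-relative FITS path against the
// directories a helper might search. Not the actual js9Helper.js logic.
function findDataFile(file, dataDirs, exists) {
  const candidates = [];
  // A leading "/" may be meant literally (absolute path on the server)...
  if (file.startsWith("/")) {
    candidates.push(file);
  }
  // ...or relative to the web root, so also try it with the slash stripped.
  const relative = file.replace(/^\/+/, "");
  for (const dir of dataDirs) {
    candidates.push(dir.replace(/\/+$/, "") + "/" + relative);
  }
  // Return the first candidate that exists, or null so the caller can
  // fall back to letting the browser fetch the file itself.
  const found = candidates.find((p) => exists(p));
  return found === undefined ? null : found;
}

// Example with a fake file system containing a single file.
const fakeFiles = new Set(["/home/user/data/image.fits"]);
const hit = findDataFile("/image.fits", [".", "/home/user/data"], (p) => fakeFiles.has(p));
console.log(hit);
```

In a real helper the `exists` callback would be something like Node's `fs.existsSync`; it is injected here only so the sketch can run without touching the disk.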
Yep, exactly. If I add I think we're running into philosophical differences here. You designed it for the scenario of a single webserver and a single helper process serving multiple users, correct? Whereas in the Jupyter use case, each user essentially runs their own private Tornado webserver (or even multiple Tornado servers, if they launch multiple notebooks in different directories). This does not align well with the "single global JS9 install" philosophy. (The discussion in ericmandel/js9#38 (comment) is related to this -- I wanted per-user helper processes for similar reasons...) In principle I can work around this by generating a custom
For full Jupyter integration, there's going to be an additional problem. If I start a notebook under |
Tried it. Still getting the "Can't find FITS file" dialog...
|
I thought that is what the latest patch accomplished by adding "." to the list of data directories. But then why does this not work:
Isn't the FITS file in the same directory from which nodejs was started? |
It most certainly is. :( Silly question, but how do I make it print to the console? I could add some print statements to that code... |
console.log should work ... unless we have reached that point where nothing seems to work! So:
|
Aha, so I also added a log statement inside the if... and I see:
So the helper finds the FITS file fine at that point. It's something down the line that causes the error... |
OK, I see the problem. It's in
Then runs
So I guess you just need to invoke it with an absolute path to the original FITS file in this case (which would have happened anyway if the file was located in a dataDir...) |
I was just about to get to the same place ... I need to set up a similar situation, with the helper running in the data directory. Let me do that and make some changes to facilitate that sort of processing and I'll be back ... |
By the way, is there cube support for fits2fits? |
Ah wait, but that's just a quirk of your local setup. You're not tunneling all the way to the host running the helper. You're doing something like:
...which means that your ssh end-point, xxx.cfa.harvard.edu, is connecting to js9.si.edu:2718, so if there's a firewall in between those two machines, it can certainly get in the way. I'm running something like
...so I have an ssh connection directly to stills (where the helper is running). So stills connects to port 1025 on itself ("localhost:1025"). As far as any firewalls are concerned, all they see is ssh traffic between my laptop and stills. I'm pretty sure the tunneling itself is OK -- I'm running all sorts of other services through it, with no problems whatsoever. And I can connect to the helper with wget just fine (see my comment here #5 (comment)). So there's something odd in JS9 itself. Can I run js9.js instead of js9.min.js, and start adding |
OK, thanks, I have verified that these both work, even though 2718 is blocked by the firewall:
I think using localhost instead of bokhara made it work on the otherwise blocked port 2718. So can you, at your convenience, use tcpdump to make sure you are receiving the tcp packets on stills? Yes, you can replace js9.min.js with js9.js but I don't know which console statement will help. Let me think whether I can come up with some debugging experiments. This error message:
is just the error callback from the connect(), but perhaps there is something I can add to the xhr call itself ... |
Interesting. This is tcpdump on my laptop when JS9 starts up:
And this is what I see on stills:
The only obvious difference is that some packets from stills seem to be longer than the ones arriving on my laptop. Length being truncated at 16384? |
Actually never mind the packet size -- the sum of the response packet sizes is 62612 in both cases, so it's the same length message coming back. Anyway, I'm convinced traffic from the helper is getting back properly (and remember, I can use wget to talk to it without a problem...) |
OK, thanks, it's good to verify that the packets are arriving ... now all we need to do is understand why the helper is not seeing them. I can't work on this right now, I'm buried deep in a problem with the socket.io fall-back transport mechanism (long-polling). Don't ask why ... just know there are a lot of ways to (mis-)configure a web site! |
I'm back ... these two messages above are output in a socket.io routine called Server.prototype.serve that serves socket.io.js back to the client. Once the browser has retrieved socket.io.js, it tries to connect to the server ... and the error you are seeing indicates its inability to connect. We do see TCP packets going to port 1025 on stills, although there is no indication that the js9 helper is processing these TCP packets. We also see TCP packets going from 1025 on stills back to the other machine, but we still get a timeout. Would that be a fair summary? If so, I have no idea (yet) what is going on. |
When you start up the helper on stills, did you change the helperHost value or leave it as 0.0.0.0? That is, is the helper listening on all IP addresses (0.0.0.0) or a particular IP address? |
Also, I see very few web pages returned when I google:
Any chance you can try downgrading to ipv4? |
Ahh, ipv6. That's an idea. I'll take a look when I get back to my computer.
Might not be until tomorrow though.
Cheers,
Oleg
|
Yep, fair summary. Tried it with IPv6 disabled, get the same problem, sadly...
0.0.0.0. But I changed it to 127.0.0.1 just in case, it didn't help. What really intrigues me is that roughly 20% of the time it will connect successfully.... |
Is the helper started up on the fly? |
(oops, didn't mean to make that a comment by itself) I agree that the 20% looks like a timing issue. If the helper is being started on the fly, perhaps you can start it manually instead (as a test) and see if it always connects. |
Already tried that... anyway, it starts long before any request is made (Jupyter itself takes a while to come up, especially over a remote connection, and the helper is started first). And it always successfully serves socket.io.js back to the client. It's just the subsequent bit that goes wrong. |
I'm a little confused. Above you mentioned that DEBUG=* showed only two messages:
It doesn't seem like this would result in the return of the socket.io.js file, or else I would have expected more output from DEBUG=*. My output (up to the first js9 "initialize" message) is below, which seems to include the initial processing before a connect request.
|
This page https://socket.io/docs/logging-and-debugging/ tells us that we can add debugging statements to the client in this way:
When I did this, I do get some beautiful console log statements (I again end at the point where JS9 sends its "initialize" message): |
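As a rough sketch of the client-side switch: the socket.io documentation describes enabling debug output by setting the `debug` key in the browser's `localStorage` and reloading the page. The wrapper function `enableSocketIoClientDebug` and the stand-in storage object below are hypothetical, added only so the example runs outside a browser.

```javascript
// Hypothetical helper: turn on socket.io client debug logging by setting the
// "debug" key that the client's logging library reads from localStorage.
function enableSocketIoClientDebug(storage, pattern) {
  storage.setItem("debug", pattern || "*");
}

// In a browser console: enableSocketIoClientDebug(window.localStorage, "*");
// Outside a browser, a small stand-in object shows the effect:
const fakeStorage = {
  data: {},
  setItem(key, value) { this.data[key] = String(value); },
  getItem(key) { return key in this.data ? this.data[key] : null; }
};
enableSocketIoClientDebug(fakeStorage, "socket.io-client:*");
console.log(fakeStorage.getItem("debug"));
```

A narrower pattern such as `socket.io-client:*` keeps the log focused on the client; `*` shows everything that uses the same logging library.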
That is correct. But from your comment here I assumed that that meant socket.io.js was being successfully returned. So let me put it more precisely: those two messages (whatever they mean) appear with 100% reliability. A subsequent helper connection is established with ~20% reliability. I happened to stumble on something interesting just now though. If I run the helper like this:
...then the connection is 100% reliable! Without |
Gotta leave the computer for today, but I think we're getting somewhere... I'll enable debugging and report tomorrow... |
OK, I have no idea what is going on, but the client debugging (with and without setting the transport value set to polling) will tell us some good stuff. It stinks of a race condition, though. I'll be away from computers from early tomorrow until Monday evening, and will pick up when I return. (I will read emails on my phone but not much more ...) |
Well I'm technically on leave now so I'm in no hurry. But I'll check the
debug output when all the vacationing gets too boring.
|
Well, given that we are more or less cut from the same cloth, we probably should try to avoid pushing one another during vacation time. Let me know when you want to start up ... I'm not in a hurry either, except to make sure your needs are taken care of. |
I agree with your olfactory diagnosis. I got the browser to log debug statements. With debugging on, it tends to connect successfully more frequently (same observation with server-side debugging above). Maddeningly, when it does fail to connect, there is nothing at all in the log from socket.io -- just the "helper connect error: timeout or connection refused" message straight up. When it succeeds, I get the following (note the connect/reconnect messages...)
And even with a successful connection, it's still not invoking fits2fits, and trying to download full images. Is it unable to find the helper shell script perhaps? (It would be nice BTW if this condition also displayed an error when |
And, on the server side, on a connect failure, all you see is this, right?
If so, this is consistent with the unix system call connect() not actually connecting to a socket, and eventually timing out -- and this failure/timeout is independent of the implementation in which it's embedded. (See https://github.com/ericmandel/xpa for a socket-based communication mechanism from the 90's, still in heavy use in X-ray astronomy -- man, I tripped over every problem imaginable back then). Why is the connect just hanging and not connecting?? Especially when you see the TCP packets arriving on the server side! Have you tried executing netstat on the server side:
(or whatever port you are using to listen to). I'd be interested to know what is listening. I assume you won't find that port in a TIME_WAIT state ... that would be a problem. I'm starting to wonder if we need to replace the helper with another socket-based server to see what is happening at connect time. It doesn't seem like this is a socket.io problem, but only a replacement server will tell us that for sure. |
Also, let's try increasing the connect timeout, which is controlled by these globalOpts variables (probably the second, in your case):
Try:
Perhaps a miracle will happen. I envy the fact that you got to attend the commissioning of Meerkat during the day. I guess I'll try to get up at 3am again to see Parker Solar Probe launch (my division built SWEAP, although my only real contact with that project is through my friend, SWEAP's long-long-suffering grant administrator) but I can't guarantee it. |
BINGO! That's exactly what has happened. Seems 100% reliable now. And it makes perfect sense in hindsight -- initialization of the whole thing (Jupyter+JS9) via the tunnel takes a while, and there's a lot of latency in opening up new connections in particular... so that 1s was too optimistic... Right, new problems coming up soon though. :)
Yes, it was a pretty special event. Nothing quite like standing among a mob of radio dishes tracking around in unison... ;) |
Whoah, that was hard to find! I've increased the timeout to 10 seconds everywhere, so hopefully this won't happen again. And I'll look forward to some non-race-condition problems ... |
Be careful what you wish for! Here's one coming right up. I've broken something again in the Load() call. At the moment it looks like this:
opts = {fits2fits:true, xcen:(xsz/2>>0), ycen:(ysz/2>>0), xdim:xsz, ydim:ysz, bin:bin,
        onload: im => this.onLoadRebin(im, xsz, ysz),
        zoom:'T'}
this.setDefaultImageOpts(opts)
JS9p.log("Loading", path, "with options", opts)
JS9.Load(path, opts, {display:this.disp_rebin});
And here's the console output:
The opts object looks OK to me (there are some extra fields in a misguided effort to set the initial clipping limits). And yet, it no longer calls the helper to get the rebinned image. It just silently falls back to getting a full image (I opened an issue about the silent part in ericmandel/js9#42). The helper itself is configured correctly though -- when it goes to load a zoomed section later, I can see js9Xeq being invoked. |
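Since a bad options object falls back silently to a full download, one way to catch mistakes early is to build the section options in a single place and check the inputs first. The sketch below is hypothetical: `buildSectionOpts` is not part of JS9, and the option names (`fits2fits`, `xcen`, `ycen`, `xdim`, `ydim`, `bin`) are simply the ones used in the Load() call quoted in this comment.

```javascript
// Hypothetical helper; option names are the ones used in the Load() call above.
function buildSectionOpts(xsz, ysz, bin) {
  for (const [name, value] of [["xsz", xsz], ["ysz", ysz], ["bin", bin]]) {
    if (!Number.isFinite(value) || value <= 0) {
      throw new Error(name + " must be a positive number, got " + value);
    }
  }
  return {
    fits2fits: true,
    xcen: Math.floor(xsz / 2),  // same result as xsz/2>>0 for positive 32-bit sizes
    ycen: Math.floor(ysz / 2),
    xdim: xsz,
    ydim: ysz,
    bin: bin
  };
}

const sectionOpts = buildSectionOpts(255, 256, 4);
console.log(sectionOpts.xcen, sectionOpts.ycen);
```

Throwing on an undefined or non-numeric size makes the failure visible before the request ever reaches the helper, which is the opposite of the silent fallback seen here.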
On reflection, you should explicitly set JS9.globalOpts.lhtimeout to a value that makes sense for you. I now have it set to 10 seconds, but that will cause other problems: if a local helper is expected but is not running, there will be a 10 second delay before JS9 recovers and moves on without the helper. I'll need to see if anyone complains -- this might just be my debugging setup, but setting the timeout value yourself will be safest. |
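A minimal sketch of setting that value explicitly. `JS9.globalOpts.lhtimeout` is the property named in this comment; the helper function `withLocalHelperTimeout`, the stand-in `defaults` object, and the assumption that the value is expressed in milliseconds are mine and should be checked against the JS9 source.

```javascript
// Hypothetical helper: return a copy of a globalOpts-style object with the
// local-helper connect timeout set. Assumes the value is in milliseconds.
function withLocalHelperTimeout(globalOpts, seconds) {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error("timeout must be a positive number of seconds");
  }
  return Object.assign({}, globalOpts, { lhtimeout: Math.round(seconds * 1000) });
}

// Stand-in for JS9.globalOpts so the sketch runs outside a browser.
const defaults = { lhtimeout: 1000 };
const tuned = withLocalHelperTimeout(defaults, 10);
console.log(tuned.lhtimeout);
```

In a page that has loaded js9.js, the equivalent would presumably be a direct assignment to `JS9.globalOpts.lhtimeout` made before JS9 tries to connect to the helper. A longer value suits a slow tunnel, at the cost of a longer wait when no helper is running.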
OK, with the new requireFits2Fits mode, the reason became immediately apparent. The problem was with the Everything seems to work again now, so I'll close this thread, as it's getting unwieldy anyway. I'll open new ones when I break something else. |
Sorry @ericmandel, I'm struggling to get fits2fits to work, or wrap my head around it.
Firstly, a conceptual problem. Here you say "the browser knows to ask the js9 helper to extract a smaller file for display from the original file (which is one the server)", but in this comment you tell me to "...Using js9prefs.js in Jupyter, you'll need to turn off the helper". So if I do that, who's the browser going to ask for a smaller file?
I have tried to re-enable the helper -- but when I have
JS9.Load("my_fits_file.fits", {fits2fits:true})
in my script, I get this on the console:
A simple
JS9.Load("my_fits_file.fits")
works fine.