Aund gets stuck if the client stops talking #7

Closed
SteveFosdick opened this issue Feb 20, 2021 · 6 comments
@SteveFosdick

I came across this when the client (B-Em emulating a BBC Micro) malfunctioned and abandoned the dialogue partway through transferring the data for an OSFILE SAVE command. After restarting the client, aund does not respond to *I AM. Restarting aund restores operation.

@sai2791 sai2791 self-assigned this Feb 20, 2021
@sai2791 sai2791 added bug Something isn't working question Further information is requested labels Feb 20, 2021
@sai2791
Owner

sai2791 commented Feb 22, 2021

I have been trying to replicate this by loading and saving very large files and stopping the client halfway through, but in my tests the server does recognise that the connection has timed out.

Can you add debug on to your /etc/aund.conf file and try the OSFILE method again, please?
Sorry, one more config thing: if you are using NFS 3.60 or below, can you add safehandles on to the /etc/aund.conf file too?

@SteveFosdick
Author

Does the -d command line option do the same? With -d, a failed save looks like this:

fs_unixify_path: [TSTFIL]->[./TSTFIL]->[./TSTFIL]->[./TSTFIL]
fs_acornify_name: [.]->[]
{2=./TSTFIL} )

	(file server: {@=5,%=6} save [TSTFIL]
fs_unixify_path: [TSTFIL]->[./TSTFIL]->[./TSTFIL]->[./TSTFIL]
fs_acornify_name: [.]->[]
aund: send data: Connection timed out

Then, when attempting the *I AM, which hangs, aund does not log any further messages. I can see from the client end that aund acknowledges receipt of the command, but there seems to be no response carrying the result of the command. If I restart aund I get the normal:

Cli: BOB has 3
Env: URD: . CSD: . LIB: .
{3=.} {5=.} {6=.} returning: urd=3, csd=5, lib=6, opt4=0
)

output from aund and the client is happy.

I am using NFS 2.62 (BBC B).

@SteveFosdick
Author

Attaching gdb to the stuck aund process gives this backtrace:

(gdb) bt
#0  0x00007fe728b9a51a in recvfrom () from /usr/lib/libc.so.6
#1  0x0000555fdb6bf8fd in aun_recv (outsize=0x7fff3e4e1668, 
    vfrom=0x7fff3e4e1674, want_port=151) at aun.c:93
#2  0x0000555fdb6b9e4f in fs_data_recv (c=c@entry=0x7fff3e4e1760, 
    fd=fd@entry=9, size=8508320, size@entry=8523680, ackport=ackport@entry=145)
    at fs_fileio.c:1005
#3  0x0000555fdb6bb6eb in fs_save (c=0x7fff3e4e1760) at fs_fileio.c:856
#4  0x0000555fdb6b6d55 in file_server (pkt=pkt@entry=0x555fdb6cb480 <buf>, 
    len=<optimized out>, from=from@entry=0x7fff3e4e17c0) at fileserver.c:153
#5  0x0000555fdb6b68f8 in main (argc=<optimized out>, argv=0x7fff3e4e17c0)
    at aund.c:196

@sai2791
Owner

sai2791 commented Feb 22, 2021

I think I have found what is wrong. The server tries to send to each machine it finds in econet.cfg in turn, and if it does not get a reply, or the machine does not exist, it gives an error. I think I will have to look at the errors and see if I can just quietly ignore some.
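
The idea of quietly ignoring some send errors could look something like the sketch below. This is not the code from the eventual commit: the function name is hypothetical and the errno list is an assumption, chosen because the log earlier in the thread shows "send data: Connection timed out".

```c
#include <errno.h>

/*
 * Return 1 if a failed send to one station looks like "that machine is
 * absent or not listening", so the server can carry on; 0 otherwise.
 * The errno list is an assumption, not the list used in aund.
 */
int
send_error_is_ignorable(int err)
{
    switch (err) {
    case ETIMEDOUT:     /* "Connection timed out", as seen in the log */
    case ECONNREFUSED:  /* nothing listening on the destination port */
    case EHOSTUNREACH:  /* no route to that host */
    case ENETUNREACH:   /* no route to that network */
        return 1;
    default:
        return 0;
    }
}
```

The send path would then log and skip the station when this returns 1, and only treat other errors as fatal.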

@sai2791
Owner

sai2791 commented Feb 22, 2021

As a check, if you have only the two machines (the one client and the file server) specified in econet.cfg, can you still replicate the error?

@SteveFosdick
Author

SteveFosdick commented Feb 22, 2021

Yes, I can replicate this with only one client. I have found another client problem, though: the program I was testing with was failing to set bytes &10 and &11 of the OSFILE control block, i.e. the upper limit of the memory block to be saved. It looks like in this case NFS sends the length as calculated by a straight subtraction, but possibly does not send all the data. In one case it sent a length of 8523680 (about 8.1 MB) but then sent only 83 1K blocks. I can't tell at the moment whether that is because of something internal to NFS or because the network comms (or the AUN <> Econet state machine) are not so robust and this much data shows it up.
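
To illustrate the "straight subtraction", here is a small sketch assuming the standard OSFILE control block layout, with the start address in bytes &0A-&0D and the end address in bytes &0E-&11, both little-endian. The function names are invented for this example, and this is not NFS or aund code.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value, as stored in a BBC Micro control block. */
static uint32_t
read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
        (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Length of an OSFILE save computed as end address (bytes &0E-&11)
 * minus start address (bytes &0A-&0D), with 32-bit wrap-around.
 * Stale values in bytes &10 and &11 therefore inflate the result.
 */
uint32_t
osfile_save_length(const uint8_t block[18])
{
    return read_le32(block + 0x0E) - read_le32(block + 0x0A);
}
```

With made-up values, a start of &00001900 and an end of &00003000 give a length of &1700. If byte &10 is left holding a stale &12, the end address reads as &00123000 and the length becomes &121700, which is the kind of inflated length described above.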

sai2791 added a commit that referenced this issue Feb 23, 2021
…present or reachable on the network. The server attempts to reply to each machine it knows about, so when it encounters a machine that is not listening or is not present on the current network it fails hard. Added code to ignore these "failures" so that the aund server continues running instead of exiting. Note: This may not be the full list yet.

Github Issue #7
@sai2791 sai2791 closed this as completed Feb 23, 2021