
Conversation

@jubeormk1
Contributor

Hi there @mkj,

This is an early draft of a pull request to help with the SFTP subsystem for sunset. I have been reading and working on proto.rs, and I would like you to look at my commits and discuss problems and next steps. If you prefer other channels to discuss this I am happy to do it in any other way.

To begin with, there were a number of build errors, so to make incremental steps I opted to comment out mod sftpserver from lib.rs and the macro_rules! sftpmessages in recent commits (1, 2, 3). This way I could focus and use cargo-expand to analyse the SSHEncode/SSHDecode macro expansions.

On the 20th of August I was trying to fix small things without getting into how the sftp module would work. In those commits I fixed some minor problems, and it is likely that I will have to revert some of the changes later on, like 4, 5.

Before working on the big entity in proto.rs, sftpmessages!, I need to fix some issues with StatusCode. I would like to ask whether I am using the SSHEncode/SSHDecode macros correctly for StatusCode, since I felt forced to add a lifetime to it which might not be necessary. Also, I believe SSHEncodeEnum is not working as expected (6, 7). I added sftp/out/cargo-expand-sftp.rs as a reference for the expanded SSHEncode.
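For reference, my understanding so far is that only wire types which keep borrowed bytes from the decode buffer need a lifetime parameter; a C-like enum such as StatusCode decodes into an owned value. A minimal, self-contained sketch of that idea (a hypothetical Decode trait stands in for the real sshwire one):

// Hypothetical stand-in for the sshwire decode trait, for illustration only.
// A status code decodes into an owned value, so the enum itself carries no
// lifetime; the 'de parameter only matters for types that keep borrowed bytes.
trait Decode<'de>: Sized {
    fn dec(input: &mut &'de [u8]) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Eof,
    Other(u32),
}

impl<'de> Decode<'de> for Status {
    fn dec(input: &mut &'de [u8]) -> Option<Self> {
        let cur: &'de [u8] = *input;
        let (v, rest) = cur.split_first_chunk::<4>()?;
        *input = rest;
        Some(match u32::from_be_bytes(*v) {
            0 => Status::Ok,
            1 => Status::Eof,
            n => Status::Other(n),
        })
    }
}

fn main() {
    let mut wire: &[u8] = &[0, 0, 0, 1];
    assert_eq!(Status::dec(&mut wire), Some(Status::Eof));
}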

I would also like to mention that I had to modify some files outside of the sftp module, as you will see in the Files changed tab, to implement or alter the visibility of some elements (src/packets.rs, src/sshwire.rs).

@jubeormk1
Contributor Author

In 8 and 9 I have fixed the CI issue.

@jubeormk1
Contributor Author

After that I have:

  • added SSHDecode for StatusCode in 10,
  • added SSHEncode/Decode for Name in 11,
  • and in the rest of the commits I have been uncommenting more bits of sftpmessages!; it is not quite ready yet, but it no longer throws build errors.

As a minor change, I have clarified in 12 that the module will follow the SFTP v3 specification.

@jubeormk1
Contributor Author

We now have a complete sftp/src/proto.rs that compiles without complaints. However, I restricted the lifetime of SftpPacket and the decode_{response, request} methods so that they have the same lifetime as the SSHSource. This might have consequences once these structures start being used in sftpserver.rs and future demos.

Apart from that, I have avoided the use of sunset::Error in the decode_{response, request} methods. That means that no Error::bug() or error::SSHProto.fail() is returned.
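To make the restriction concrete, the relationship looks roughly like this (hypothetical Source/SftpPacket stand-ins, not the real proto.rs types): the decoded packet borrows its byte fields from the same buffer the source reads from, so it cannot outlive that buffer.

// Illustration only: the packet's borrowed fields tie it to the source buffer.
struct Source<'de> {
    buf: &'de [u8],
}

struct SftpPacket<'de> {
    // e.g. a path borrowed straight from the wire buffer
    path: &'de [u8],
}

fn decode_request<'de>(src: &mut Source<'de>) -> Option<SftpPacket<'de>> {
    let cur: &'de [u8] = src.buf; // copy the borrow out of the source
    let (len, rest) = cur.split_first_chunk::<4>()?;
    let len = u32::from_be_bytes(*len) as usize;
    let (path, rest) = rest.split_at_checked(len)?;
    src.buf = rest;
    Some(SftpPacket { path })
}

fn main() {
    let wire = [0u8, 0, 0, 4, b'/', b't', b'm', b'p'];
    let mut src = Source { buf: &wire };
    let pkt = decode_request(&mut src).unwrap();
    // pkt borrows from wire, so it is only usable while that buffer is alive,
    // which is exactly the restriction mentioned above.
    assert_eq!(pkt.path, b"/tmp");
}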

There are a number of TODOs in the file, mainly regarding:

  • The points made before
  • handling extension packets or fields
  • Variable number of ExtPair

jubeormk1 added a commit to jubeormk1/sunset that referenced this pull request Aug 29, 2025
Fixed according to the indications in the next discussion:
mkj#29 (comment)

Adding the expanded version for reference too

Thanks
@jubeormk1
Contributor Author

Hi again,

I am starting to work on the sftp subsystem: I have deserialized the init packet and the first request, an SSH_FXP_REALPATH packet, using the definitions and functions of sftp/src/proto.rs. The demo sftp server is not ready for review, so I am not pushing it yet.

I have made major changes in sftp/src/proto.rs that are worth mentioning. They consist of a redefinition of the sftpmessages! macro. The changes solve the issue of deserializing and serialising the request ids while keeping them out of the inner sftp packet definition. Is the complexity introduced worth it versus adding the ReqId to the inner packet definition?

The way the macro is used has changed as well, perhaps for the better.
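To make the trade-off easier to discuss, the split can be pictured roughly like this (hypothetical WithId/Request names, not the actual sftpmessages! output): the request id lives in a thin wrapper that is encoded before the inner packet, so the packet definitions themselves stay id-free.

// Illustration only: keep the SFTP request id in a wrapper around the inner
// packet instead of duplicating an id field in every packet variant.
struct ReqId(u32);

enum Request<'a> {
    Realpath { path: &'a [u8] },
    Close { handle: &'a [u8] },
}

struct WithId<T> {
    id: ReqId,
    inner: T,
}

fn encode_request(out: &mut Vec<u8>, req: &WithId<Request<'_>>) {
    // id first, then the inner packet body (simplified: the type byte is omitted).
    out.extend_from_slice(&req.id.0.to_be_bytes());
    match &req.inner {
        Request::Realpath { path } | Request::Close { handle: path } => {
            out.extend_from_slice(&(path.len() as u32).to_be_bytes());
            out.extend_from_slice(path);
        }
    }
}

fn main() {
    let mut out = Vec::new();
    encode_request(&mut out, &WithId { id: ReqId(7), inner: Request::Realpath { path: b"." } });
    assert_eq!(out, [0, 0, 0, 7, 0, 0, 0, 1, b'.']);

    out.clear();
    encode_request(&mut out, &WithId { id: ReqId(8), inner: Request::Close { handle: b"h" } });
    assert_eq!(out, [0, 0, 0, 8, 0, 0, 0, 1, b'h']);
}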

@jubeormk1
Contributor Author

jubeormk1 commented Sep 5, 2025

I am working on providing basic functionality to handle the sftp initialisation and adjusting the sftpserver trait to expose as few protocol details as possible to the user. I also have to handle errors properly inside sftphandle.

The code at this point is able to start an sftp session after running debug_sftp_client.sh but it does not fail gracefully for commands not included in the sftpmessages macro.

@jubeormk1
Contributor Author

jubeormk1 commented Sep 25, 2025

At this point, I need to handle the case where, after extracting a complete SFTP request from buffer_in, I am left with only a fragment of the next request, which cannot be decoded without extra information that will not be available until the buffer is read from stdio again.

I am exploring storing the fragment of the request in an auxiliary buffer to make sense of it after the next buffer read. Did you have to deal with this situation at a different level @mkj?

For now, I am going to use an auxiliary buffer to assemble the request. More coming soon...
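A rough sketch of the kind of reassembly buffer I have in mind (illustrative only; the names and sizes are made up, and the returned Vec is just to keep the sketch short):

// Accumulate bytes until a full SFTP packet (4-byte big-endian length prefix
// plus body) is available, keeping any leftover fragment for the next read.
struct Reassembler {
    buf: [u8; 4096], // fixed storage for fragments; push() silently truncates overflow
    len: usize,
}

impl Reassembler {
    fn new() -> Self {
        Reassembler { buf: [0; 4096], len: 0 }
    }

    fn push(&mut self, data: &[u8]) {
        let n = data.len().min(self.buf.len() - self.len);
        self.buf[self.len..self.len + n].copy_from_slice(&data[..n]);
        self.len += n;
    }

    /// Returns one complete packet body (without the length prefix) if the
    /// buffer holds one, and shifts any remaining fragment to the front.
    fn pop_packet(&mut self) -> Option<Vec<u8>> {
        if self.len < 4 {
            return None;
        }
        let plen = u32::from_be_bytes(self.buf[..4].try_into().unwrap()) as usize;
        if self.len < 4 + plen {
            return None; // only a fragment so far, wait for the next read
        }
        let pkt = self.buf[4..4 + plen].to_vec();
        self.buf.copy_within(4 + plen..self.len, 0);
        self.len -= 4 + plen;
        Some(pkt)
    }
}

fn main() {
    let mut r = Reassembler::new();
    // First read ends mid-packet; the second read completes it.
    r.push(&[0, 0, 0, 3, b'a']);
    assert!(r.pop_packet().is_none());
    r.push(&[b'b', b'c']);
    assert_eq!(r.pop_packet().as_deref(), Some(&b"abc"[..]));
}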

@mkj
Owner

mkj commented Sep 26, 2025

For now, I am going to use an auxiliary buffer to assemble the request. More coming soon...

Yeah I think a buffer is probably necessary (if SSHWire had partial decoding it'd still need to store intermediate parts in memory somewhere).

As an optimisation it might be possible to special-case FXP_DATA and avoid needing to store the data in the aux buffer, instead copying it to the destination? There would still be a need for relatively large SSH_FXP_NAME lists though, I guess.

@jubeormk1
Contributor Author

Thanks for the ideas. For now I am writing a helper where I can take enough data from buffer_in to get the size and packet type.

In the case of SSH_FXP_WRITE, I am copying enough data to create a multi-part write of the content, reusing the current technique for it.

SSH_FXP_NAME is an interesting case, and FXP_DATA will also require some thinking, since the sink length may limit the sftpserver implementer and would create a read bottleneck.
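For reference, the peek itself only needs the first 9 bytes of the SFTP v3 framing: a uint32 length, the packet type byte, and the uint32 request id. A sketch of the idea (hypothetical helper, not the actual code):

// Illustration only: classify a packet from its first bytes without consuming
// them. `length` counts everything after the length field itself, so the full
// packet occupies 4 + length bytes in the stream.
#[derive(Debug, PartialEq)]
struct PacketHead {
    length: u32, // body length (type byte + request id + payload)
    ptype: u8,   // e.g. SSH_FXP_WRITE = 6 in the v3 draft
    req_id: u32,
}

fn peek_head(buf: &[u8]) -> Option<PacketHead> {
    if buf.len() < 9 {
        return None; // not enough bytes yet to classify the packet
    }
    Some(PacketHead {
        length: u32::from_be_bytes(buf[0..4].try_into().unwrap()),
        ptype: buf[4],
        req_id: u32::from_be_bytes(buf[5..9].try_into().unwrap()),
    })
}

fn main() {
    // length = 9 (type + request id + 4 payload bytes), type 6, request id 42
    let wire = [0u8, 0, 0, 9, 6, 0, 0, 0, 42, 1, 2, 3, 4];
    let head = peek_head(&wire).unwrap();
    assert_eq!(head, PacketHead { length: 9, ptype: 6, req_id: 42 });
    // Here the whole packet is present; if it were not, 4 + length tells us
    // how many bytes are still to come before the packet can be decoded.
    assert_eq!(4 + head.length as usize, wire.len());
}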

@jubeormk1
Contributor Author

Thanks to @mkj's original design of the structures in proto.rs, there is only one entity that for now cannot be used in a no_std context. This is Name, the SSH_FXP_NAME response to SSH_FXP_READDIR and SSH_FXP_REALPATH.

The reason is simple, and @mkj mentioned it before: the response to SSH_FXP_READDIR is a collection of all the elements in the directory.

Since reading a directory is part of the basic features that I have proposed in the library roadmap, I think that I should address it.

I would like to share the approach that I am starting to put together but I am open to more ideas.

I am considering processing the directory data element by element, to cope with the arbitrary number of elements, with new traits and structures for collections of SSH_FXP_READDIR responses. I would like to capture the following workflow:

  • The SftpServer readdir return value would be a result including: the number of elements in the directory, the total length of all the data required to encode the SSH_FXP_NAME, and, importantly, a reference to a DirectoryElement struct for the first element in the directory,
  • the SftpHandler will encode the header of the SSH_FXP_NAME response, wait until the stdio.write is completed,
  • encode the DirectoryElement content, wait for the write completion,
  • obtain the next DirectoryElement reference and repeat until there are no more elements to process.

I have a lifetime challenge here, because the SftpServer would need to keep internal state with either the full list of elements in the directory, or a slim implementation that only keeps the relevant directory element and provides it to the SftpHandler.

I would like to hide those internal details from the Sftp library user by providing a trait and maybe a slim reference implementation for constrained environments.

Does anybody have a simpler or better idea? Maybe there is a better pattern to use here that I am overlooking.

@mkj
Owner

mkj commented Oct 8, 2025

One way to resolve lifetime problems could be to have the readdir be passed a closure to call with entries to add? Either a FnMut or AsyncFnMut. Just an idea, I've found that pattern useful before. There might be a reason it wouldn't fit here though.
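Roughly this shape (names made up, and a sync FnMut shown for brevity):

// Sketch only: the server pushes entries through a callback instead of
// returning a collection, so no directory listing needs to be buffered or
// share a lifetime with the handler.
trait SftpServer {
    fn readdir(
        &mut self,
        path: &str,
        entry: &mut dyn FnMut(&str, u64) -> Result<(), ()>,
    ) -> Result<(), ()>;
}

struct DemoServer;

impl SftpServer for DemoServer {
    fn readdir(
        &mut self,
        _path: &str,
        entry: &mut dyn FnMut(&str, u64) -> Result<(), ()>,
    ) -> Result<(), ()> {
        // In a real server these would come from the filesystem, one at a time.
        entry("a.txt", 12)?;
        entry("b.txt", 34)?;
        Ok(())
    }
}

fn main() {
    let mut server = DemoServer;
    let mut encoded = Vec::new();
    // Handler side: encode each entry as it arrives (stand-in encoding).
    server
        .readdir("/", &mut |name, size| {
            encoded.extend_from_slice(name.as_bytes());
            encoded.extend_from_slice(&size.to_be_bytes());
            Ok(())
        })
        .unwrap();
    assert_eq!(encoded.len(), "a.txt".len() + 8 + "b.txt".len() + 8);
}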

@brainstorm
Contributor

Thanks @mkj! We've been inspecting this piece of code (and your advice) for readdir, and we have a few questions:

We assume that the developer would eventually implement readdir(dir_handle: &T, reply: &mut DirReply). Our question is: which types/data are behind the lifetimes 'g and 'a in the DirReply/ChanOut types?

pub struct DirReply<'g, 'a> {
    chan: ChanOut<'g, 'a>,
}
(...)
// TODO Implement correct Channel Out
pub struct ChanOut<'g, 'a> {
    _phantom_g: PhantomData<&'g ()>,
    _phantom_a: PhantomData<&'a ()>,
}

We are a bit confused by the 'g and 'a lifetimes. We guess that PhantomData was a prototyping placeholder, but that you had more concrete types and/or data in mind that you didn't want to elaborate on while writing the SftpServer draft? In other words:

What is 'g supposed to hold down the road?
What is 'a supposed to hold eventually?

We think that 'g could be a ChanIO perhaps? ...

As in: we noticed that the DirReply impl's "reply" method takes a _data: &[u8]... is this a buffer that is supposed to shuttle bytes between the user-provided bytes (dropped into data) and ChanIO?

@mkj
Owner

mkj commented Oct 11, 2025 via email

@jubeormk1
Contributor Author

Hi again @mkj,

I think I finally understand what you were suggesting and how it could work: a visitor pattern to avoid borrowing/lifetime issues while still allowing the directory elements to be processed iteratively.

@brainstorm I am implementing a DirReply holding references to the output buffer and a channel out, and now I better understand the lifetimes proposed in the original DirReply prototype.

  • 'g: Probably the sunset ChannelOut
  • 'a: The buffer used for a SSHSink

For now I have a mock structure that simulates sending data. Soon I will push it.
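In other words, the shape I am heading towards is roughly this (mock types only, not the final API):

// Mock types: the point is only where the two lifetimes end up. 'g would
// borrow the outgoing channel for the whole reply, 'a the scratch buffer
// that entries are encoded into.
struct ChannelOut; // stand-in for the sunset channel handle

impl ChannelOut {
    fn write(&mut self, data: &[u8]) -> usize {
        data.len() // mock "send": pretend everything was written
    }
}

struct DirReply<'g, 'a> {
    chan: &'g mut ChannelOut,
    buf: &'a mut [u8],
}

impl<'g, 'a> DirReply<'g, 'a> {
    /// Encode one directory entry into the scratch buffer and push it out.
    fn reply(&mut self, name: &str) -> Result<(), ()> {
        let bytes = name.as_bytes();
        if bytes.len() > self.buf.len() {
            return Err(()); // the entry does not fit the scratch buffer
        }
        self.buf[..bytes.len()].copy_from_slice(bytes);
        let sent = self.chan.write(&self.buf[..bytes.len()]);
        if sent == bytes.len() { Ok(()) } else { Err(()) }
    }
}

fn main() {
    let mut chan = ChannelOut;
    let mut scratch = [0u8; 64];
    let mut reply = DirReply { chan: &mut chan, buf: &mut scratch };
    reply.reply("a.txt").unwrap();
    reply.reply("b.txt").unwrap();
}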

Looping to write all bytes. The counters confirm that there are no missing bytes, as they match.

Tested with a 100MB writing operation: ./test_long_write_requests.sh

I am confident that the write issue is solved
packet type + request id + items (1 + 4 + 4 bytes) hardcoded but will refactor
…e read dir request

[skip ci] WIP: BUG **Sending loop includes sftp client**

- the ReqId increases

- The loop stop in a client interruption [ctl]+C

I am going to need more visibility on why the client keeps on sending new requests for the same files

[skip ci] WIP: BUG I was not following the V3 specification, as I did not send EOF

https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filexfer-02#section-6.7

Next issue is that I am sending the relative path and not just the filename. Will fix next
… all the items and is asked again. Anything else fails

This is a point I had not considered. I could have sent one item at a time, with different challenges.

In my opinion I should add this behavior to the SftpServer implementation, but that passes the responsibility to the library user, and the idea is that the user should not need to care too much about the protocol implementation.

[skip ci] WIP: Typo and cleanup commented out code
Found a bug sending back the status code. It is always Ok!

[skip ci] WIP: Temporary ways to track completed Read operations

I will add a test to understand the unexpected behaviour in the response status packet
…ode implementation for enum

sshwire-derive lib.rs defines the rules to encode enums in `encode_enum` and the encoding decision is not to encode the value. See [implementation here](https://github.com/mkj/sunset/blob/8e5d20916cf7b29111b90e4d3b7bb7827c9be8e5/sshwire-derive/src/lib.rs#L287)

This could be addressed by introducing a parameter for enum fields where the encoded data type is declared (or not). There is probably a good reason for this behavior.
… generate encoded values?

To fix the issue with the message type encoding I have done a manual implementation of SSHEncode for `StatusCode`. It is tested in a proto_test mod.

From last commit:
sshwire-derive lib.rs defines the rules to encode enums in `encode_enum` and the encoding decision is not to encode the value. See [implementation here](https://github.com/mkj/sunset/blob/8e5d20916cf7b29111b90e4d3b7bb7827c9be8e5/sshwire-derive/src/lib.rs#L287)

This could be addressed by introducing a parameter for enum fields where the encoded data type is declared (or not). There is probably a good reason for this behavior.
Not ready yet. In order to answer EOF to an SSH_FXP_READDIR after all the dir items have already been sent, I am storing a `sftpserver::ReadStatus`, passed as a result of `SftpServer.dirread()`, with PendingData or EndOfFile.

This would not work well in scenarios where there is more than one read transaction in progress.

I have decided that I am going to make it the SftpServer implementors' responsibility to send the EOF status message, but I will help them by providing detailed trait documentation.

To test this listing you may run the demo in demo/sftp/std and then the script testing/test_read_dir.sh.

Mind that the hard link counter is not recovered, as it is not part of the SFTP v3 implementation.
@jubeormk1 changed the title from "Dev/sftp start: fixing errors to get proto.rs" to "Dev/sftp start: Implementing basic SFTP functionality" on Nov 11, 2025
@jubeormk1
Contributor Author

jubeormk1 commented Nov 11, 2025

In the commit Fixed Status Encoding… and the previous one, I came across the fact that sshwire-derive does not encode enum values. Therefore, SSHEncode-derived structures with enum fields would not encode those fields. To solve this, I implemented SSHEncode for StatusCode manually.

I identified the relevant function and point in sshwire-derive/src/lib.rs where automatic encoding of numeric enum fields could be done, but the enum data type would need to be provided somehow, maybe as a struct Attribute, to automate the numerical value encoding.

@mkj Do you think that implementing automatic enum value encoding would be a good use of time?
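For clarity, the manual implementation is essentially of this shape (simplified, with a hypothetical Encode trait standing in for SSHEncode):

// Simplified illustration of encoding the numeric value of a C-like enum.
trait Encode {
    fn enc(&self, out: &mut Vec<u8>);
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u32)]
enum StatusCode {
    Ok = 0,
    Eof = 1,
    NoSuchFile = 2,
    PermissionDenied = 3,
    Failure = 4,
}

impl Encode for StatusCode {
    fn enc(&self, out: &mut Vec<u8>) {
        // The derive currently emits nothing for the variant value, so the
        // u32 that the SFTP status packet needs is written explicitly here.
        out.extend_from_slice(&(*self as u32).to_be_bytes());
    }
}

fn main() {
    let mut buf = Vec::new();
    StatusCode::Eof.enc(&mut buf);
    assert_eq!(buf, [0, 0, 0, 1]);
}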

I have recorded packets (encrypted) and straces for the exec and test scripts, finding:

- invalid packets reading from the channel
- No data reading from channel after sending EOF
- the demo-sftp-std server stuck reading from a file
The next files have been updated to facilitate the collection of data.

The data collected will include:
- Output of trace level of the sunset-demo-sftp-std
- strace for sunset-demo-sftp-std
- Output for log_get_single.sh script
@jubeormk1
Contributor Author

First situation. The client:

  • sent_eof_waiting_4_data.tar.gz: In this archive we can see that after the server has responded with an EOF (request id 133), the client log reflects it.
    Expected client behaviour: after the sftp client receives an EOF it should continue with an SSH_FXP_CLOSE.
    Current behaviour: no data is received on the server side.

Files in the archive:

  • log_get_single.log: log_get_single.sh output
  • log_demo_get_single.log: output of sunset-demo-sftp-std
  • strace_target_debug_sunset_demo_sftp_std__20251201_150207.log: strace for the server executable

@jubeormk1
Contributor Author

completing_packet_waiting_for_incoming_bytes.tar.gz
In this archive we can observe that after receiving 24 bytes of data, the server cannot complete decoding the packet and waits for extra bytes.
Expected behaviour: channel.read() returns a number of bytes and these are used to complete the packet.
Current behaviour: channel.read() does not return

Files in the archive:

log_get_single.log: log_get_single.sh output
log_demo_get_single.log: output of sunset-demo-sftp-std
strace_target_debug_sunset_demo_sftp_std__XXX.log: strace for the server executable

@jubeormk1
Contributor Author

missing_bytes_break_communication.tar.gz: In this archive I have collected all the bytes received, grouped them by packet, and identified that:

  • ReqId 24, 25 are not received
  • ReqId 85 is incomplete
  • ReqId 86-91 are missing
  • Following that, requests 92 to 95 are well formed and complete

Files in the archive:

  • log_demo_get_single.log
  • log_demo_get_single_check_received.log
  • log_get_single.log
  • strace_target_debug_sunset_demo_sftp_std__20251201_150703.log

error!("server progress failed: {:?}", e); // NoRoom: 2048 Bytes Output buffer
return Err(e);
match e {
Error::NoRoom {} => {
@mkj
Owner


Don't think it's related to the current problems, but catching an error after progress() isn't generally OK. Some operations modify internal state so it should just exit if it fails here.

I think the specific changes in 2869501 should be OK though for the window adjust case.

@jubeormk1
Contributor Author


My apologies, I did not realise that your fix was not applied

My apologies, I thought that it had been merged to the main branch and pulled in when the env_var (pull 35) feature was included
… 100MB get

Will provide further logs in a comment.
Improved log scripts.

In summary, the communication stalls because the ChanOut.write at some point does not return after being called.

The reception of packets does not show any anomaly.
@jubeormk1
Contributor Author

I tested getting files from the server again and it worked, BUT when I made the file longer (100MB), the communication stalls after about 4 MB have been transferred, at a non-deterministic point. This is the only problem that I can record at this point.

I looked at the logs and can see that the communication stalls because ChanOut.write blocks and no more bytes get sent.

More details in the attached logs stalled_not_sending_bytes.zip and
strace_stalled_not_sending_bytes.zip.

Coordinating pcap files, test script output, demo program output and strace (only network traffic)
Not all the data that the producer tries to send reaches the consumer, which is the one that writes the data to the ChannelOut.

This is likely to be responsible for the stuck communication.

Apart from that, I added more testing and fixed an issue in Proto::Data (SSH_FXP_DATA). It had too many fields.

More details to come
@jubeormk1
Contributor Author

jubeormk1 commented Dec 10, 2025

Analysing the logs in more detail and putting together all the received and sent packets, I have noticed that not all the data gets sent on the ChannelOut.

In the library, the sunset_sftp::sftphandler::sftpoutputchannelhandler::SftpOutputProducer sends bytes to an SftpOutputConsumer that is the future that writes bytes to the sunset_async::async_channel::ChanOut.

I have noticed that the call to write_all gets blocked indefinitely at the point that the communication stalls.

You can find that in long_extract.log.

I have found and fixed some errors around the proto::Data structure, but they were not causing the problem. Moving forward, I am going to examine the packets sent to SftpOutputConsumer to rule out any missed packet and then look at why write_all does not return.

As a curiosity, I am attaching two sets of decoded SftpPackets: one for a short get transfer that succeeds and another for a long transfer that stalls.

It was introduced for read_packet_from_file.rs and it is no longer needed
At this point I can see that on stall, the backpressure from the ChanOut blocks the data put in the pipe.

I took measures in a previous commit (5d1c1f9) to guarantee that for SSH_FXP_DATA and SSH_FXP_NAME the announced length and the real length match.
- SftpServer Unit tests to check the length in header and data sent
- Runtime checks for ReadDir, Read and PathInfo requests

I am going to start digging in the ChanOut Write process
This gives us visibility on when the ChanOut won't take all the bytes. It happens regularly and that is expected. It does not provide extra visibility on the problem, though.

I will leave it like this for now
@jubeormk1
Contributor Author

One question @mkj:

Looking at the async channels, the first thing I have noticed is that I might not be using ChanOut as intended. According to the ChanOut docs, ChanOut

can also be obtained using [ChanInOut::split()] for cases where a channel's input should be discarded.

And later adds

only one instance each should be read from or written to (this applies to split()

Does this mean that only one instance should be used for writing and only one (other) instance for reading, or that only one instance should be used for both reading and writing?

I believe it is the first interpretation, given the word 'each', but please correct me if I am wrong and I will refactor accordingly.

@jubeormk1
Contributor Author

jubeormk1 commented Dec 12, 2025

I have noticed something important. ChanOut blocks writing data because internally it relies on the channel window, and this keeps shrinking from the beginning of the execution to the end:

# Note: This is an extract from `tail ./logs/*_long.log -n 2048 | grep -I "window"`
[2025-12-12T23:32:34.374668228Z TRACE sunset::channel] send_data: new window 15872
[2025-12-12T23:32:34.382384408Z TRACE sunset::channel] send_data: new window 15360
[2025-12-12T23:32:34.393365485Z TRACE sunset::channel] send_data: new window 14862
[2025-12-12T23:32:34.409278961Z TRACE sunset::channel] send_data: new window 14848
[2025-12-12T23:32:34.412931470Z TRACE sunset::channel] send_data: new window 14336
[2025-12-12T23:32:34.419289126Z TRACE sunset::channel] send_data: new window 13824

...

[2025-12-12T23:32:34.774689131Z TRACE sunset::channel] send_data: new window 2238
[2025-12-12T23:32:34.794697482Z TRACE sunset::channel] send_data: new window 2048
[2025-12-12T23:32:34.798146425Z TRACE sunset::channel] send_data: new window 1536
[2025-12-12T23:32:34.806313594Z TRACE sunset::channel] send_data: new window 1024
[2025-12-12T23:32:34.817257184Z TRACE sunset::channel] send_data: new window 512
[2025-12-12T23:32:34.824983202Z TRACE sunset::channel] send_data: new window 0

On every call to channel.send_data() the window gets reduced by data.len(). It looks like somewhere this window should be extended again.

The only point in the code where the window length is increased is in the dispatch_inner match on Packet::ChannelWindowAdjust. That is 100% correct given the channel mechanism section of the Connection Protocol.
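For reference, the bookkeeping works roughly like this (simplified model, not the sunset code):

// Every send_data consumes window; only a window adjust from the peer
// replenishes it, so without adjusts the writer eventually blocks.
struct SendWindow {
    remaining: u64,
}

impl SendWindow {
    /// Returns how many bytes may actually be sent right now (0 == blocked).
    fn send_data(&mut self, want: usize) -> usize {
        let n = (want as u64).min(self.remaining) as usize;
        self.remaining -= n as u64;
        n
    }

    /// Called when a SSH_MSG_CHANNEL_WINDOW_ADJUST arrives from the peer.
    fn window_adjust(&mut self, add: u32) {
        self.remaining += add as u64;
    }
}

fn main() {
    let mut w = SendWindow { remaining: 2048 };
    assert_eq!(w.send_data(1500), 1500);
    assert_eq!(w.send_data(1500), 548); // window exhausted mid-write
    assert_eq!(w.send_data(512), 0);    // writer now has to wait
    w.window_adjust(4096);              // peer consumed data and tells us so
    assert_eq!(w.send_data(512), 512);
}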

At first I could not find that case being hit in my execution logs, but I was only looking at the tail of the logs. Widening the scope to the whole log file, I can see a few tens of window adjustments vs tens of thousands of send_data calls.

[2025-12-13T00:28:47.703250554Z TRACE sunset::channel] send_data: new window 153348
[2025-12-13T00:28:47.786302383Z TRACE sunset::channel] send_data: new window 153338
[2025-12-13T00:28:47.790751584Z TRACE sunset::channel] send_data: new window 152836
[2025-12-13T00:28:47.801830460Z TRACE sunset::channel] new window 251914
[2025-12-13T00:28:47.806584885Z TRACE sunset::channel] send_data: new window 251904
[2025-12-13T00:28:47.811248300Z TRACE sunset::channel] send_data: new window 251774
[2025-12-13T00:28:47.818151850Z TRACE sunset::channel] new window 351092
[2025-12-13T00:28:47.835522023Z TRACE sunset::channel] new window 449408
[2025-12-13T00:28:47.839309890Z TRACE sunset::channel] send_data: new window 449036
[2025-12-13T00:28:47.844627079Z TRACE sunset::channel] send_data: new window 449026
[2025-12-13T00:28:47.849423990Z TRACE sunset::channel] send_data: new window 448524
[2025-12-13T00:28:47.855707082Z TRACE sunset::channel] new window 547340
[2025-12-13T00:28:47.861189357Z TRACE sunset::channel] send_data: new window 547330
[2025-12-13T00:28:47.866222949Z TRACE sunset::channel] send_data: new window 546828
[2025-12-13T00:28:47.880106747Z TRACE sunset::channel] new window 645642
[2025-12-13T00:28:47.887855211Z TRACE sunset::channel] send_data: new window 645632
[2025-12-13T00:28:47.892086132Z TRACE sunset::channel] send_data: new window 645294
[2025-12-13T00:28:47.898543081Z TRACE sunset::channel] new window 744100
[2025-12-13T00:28:47.911107041Z TRACE sunset::channel] new window 842926
[2025-12-13T00:28:47.929046225Z TRACE sunset::channel] send_data: new window 842762
[2025-12-13T00:28:47.933627800Z TRACE sunset::channel] send_data: new window 842752
[2025-12-13T00:28:47.938207076Z TRACE sunset::channel] send_data: new window 842250
[2025-12-13T00:28:47.948200535Z TRACE sunset::channel] send_data: new window 842240

Checking why they stopped being sent by the client, I realised that they were in fact sent. Here is an extract of the client vs server logs:

jubeor@ROG-WIN:~/repos/sunset/demo/sftp/std/testing$ cat ./logs/*_long_client.log | grep -I "window" | tail
[2025-12-13T00:28:45.117329302Z] debug2: channel 0: window 1998336 sent adjust 98816
[2025-12-13T00:28:49.137484672Z] debug2: channel 0: window 1998336 sent adjust 98816
[2025-12-13T00:28:53.874467705Z] debug2: channel 0: window 1997824 sent adjust 99328
[2025-12-13T00:28:57.749947274Z] debug2: channel 0: window 1998346 sent adjust 98806
[2025-12-13T00:29:01.568232186Z] debug2: channel 0: window 1998326 sent adjust 98826
[2025-12-13T00:29:05.362839669Z] debug2: channel 0: window 1997958 sent adjust 99194
[2025-12-13T00:29:09.151810514Z] debug2: channel 0: window 1998120 sent adjust 99032
[2025-12-13T00:29:12.958634612Z] debug2: channel 0: window 1998418 sent adjust 98734
[2025-12-13T00:29:16.750645884Z] debug2: channel 0: window 1998336 sent adjust 98816
[2025-12-13T00:29:20.659137579Z] debug2: channel 0: window 1998336 sent adjust 98816
jubeor@ROG-WIN:~/repos/sunset/demo/sftp/std/testing$ cat ./logs/*_long.log | grep -I "] new window" | tail
[2025-12-13T00:28:23.546574612Z TRACE sunset::channel] new window 578216
[2025-12-13T00:28:23.559009053Z TRACE sunset::channel] new window 676654
[2025-12-13T00:28:23.575699036Z TRACE sunset::channel] new window 774404
[2025-12-13T00:28:47.801830460Z TRACE sunset::channel] new window 251914
[2025-12-13T00:28:47.818151850Z TRACE sunset::channel] new window 351092
[2025-12-13T00:28:47.835522023Z TRACE sunset::channel] new window 449408
[2025-12-13T00:28:47.855707082Z TRACE sunset::channel] new window 547340
[2025-12-13T00:28:47.880106747Z TRACE sunset::channel] new window 645642
[2025-12-13T00:28:47.898543081Z TRACE sunset::channel] new window 744100
[2025-12-13T00:28:47.911107041Z TRACE sunset::channel] new window 842926

@mkj
Owner

mkj commented Dec 13, 2025

Does this mean that only one instance should be used for writing and only one (other) instance for reading, or that only one instance should be used for both reading and writing?

I believe it is the first interpretation, given the word 'each', but please correct me if I am wrong and I will refactor accordingly.

Yes, one reading instance and one writing instance. If you want they can be the same (ChanInOut).

I have noticed something important. ChanOut blocks writing data because internally it relies on the channel window, and this keeps shrinking from the beginning of the execution to the end:

Is it reporting "noroom for adjustment" anywhere in the logs? (from the recent workaround) (ignore that, I was thinking the opposite direction)

@mkj
Owner

mkj commented Dec 14, 2025

I'm not certain, but it looks like it might be a problem with OpenSSH's SFTP read request pipelining.
If the total amount of outstanding pipelined requests sent by the sftp client is larger than the current server send window, then it will deadlock.

The server is processing read requests sequentially, so it has to send all response channel data for a read request before proceeding to the next packet. If it runs out of channel window while there are outstanding read requests, it will get stuck. The Window Adjust is queued after the outstanding read requests.

I suspect it's a bigger problem with sunset vs other implementations because it's using a small buffer size, so the relative overhead of SFTP packet headers is a lot larger than usual. (SFTP packet headers are the same size regardless. Maybe sunset is just slower than others). Though in the right circumstances I think it could happen for any SFTP implementation if the server sending is a lot slower than the client requests. I'll have a think about how to avoid it.

By default the sftp client has 64 requests pending, each of size 32768. The default channel window is 64*32768 (maybe deliberate?). That doesn't have any leeway for SFTP headers.

https://github.com/openssh/openssh-portable/blob/94bf1154b4132727114f222a587daeac101f1f5b/channels.h#L229-L230

https://github.com/openssh/openssh-portable/blob/ab164f671609a3a25cd0efcd967aff29144081bb/sftp-client.c#L65

The deadlock always seems to occur when the gap between read requests and sent data reaches 2MB (the target window size). In an example run of log_get_single_long.sh with https://github.com/mkj/sunset/tree/dev/matt-sftp-debug (debugging changed a bit), the last request received by sunset-sftp is at offset 4751360:

[2025-12-14T14:29:50.471551566Z DEBUG sunset_sftp::sftphandler::sftphandler] Read request: Read(ReqId(150), Read { handle: FileHandle(BinString(len=4)), offset: 4751360, len: 32768 })

In the client log, that's around the point the amount of outstanding data crosses the window size:

[2025-12-14T14:29:45.071438059Z] debug3: Received data 2654208 -> 2686975
[2025-12-14T14:29:45.074868377Z] debug3: Request range 4751360 -> 4784127 (63/64)

4751360 - 2654208 = 2097152 (64*32 kB)

@jubeormk1
Contributor Author

That is a very interesting point and all your analysis adds up. Thanks for your insights!

Now that you have pointed me to the client logs, I have noticed something that I find peculiar. Doing a grep for Request|adjust, I can see that at the beginning of the GET action the window adjustments and the number of requests do not add up. As an example:

...
[2025-12-13T00:25:32.868605162Z] debug2: channel 0: window 1997824 sent adjust 99328
[2025-12-13T00:25:32.877251337Z] debug3: Request range 163840 -> 196607 (2/4)
[2025-12-13T00:25:32.880194201Z] debug3: Request range 196608 -> 229375 (3/4)
[2025-12-13T00:25:33.787715855Z] debug3: Request range 229376 -> 262143 (3/5)
[2025-12-13T00:25:33.790515847Z] debug3: Request range 262144 -> 294911 (4/5)
[2025-12-13T00:25:34.689963520Z] debug3: Request range 294912 -> 327679 (4/6)
[2025-12-13T00:25:34.692728774Z] debug3: Request range 327680 -> 360447 (5/6)
[2025-12-13T00:25:35.605152171Z] debug3: Request range 360448 -> 393215 (5/7)
[2025-12-13T00:25:35.608169177Z] debug3: Request range 393216 -> 425983 (6/7)
[2025-12-13T00:25:35.622657463Z] debug2: channel 0: window 1998336 sent adjust 98816
...

Here we have eight requests of 32kB each followed by a window adjust of 98816 bytes. This adjust deficit repeats for a good part of the client logs, until at some point it starts alternating adjusts of around 98300 bytes with three 32kB requests, which seems more reasonable.

Counting the 32kB requests in my log files and adding up all the window adjustments sent by the client, I can summarise this deficit:

~/repos/sunset/demo/sftp/std/testing$ echo $(( $(cat ./logs/*_long_client.log | grep -E "Request" | wc -l) * 32768 ))
8814592

cat ./logs/*_long_client.log | awk '/sent adjust/ {sum += $NF} END {print sum}'
6724608

which amounts to a total of 2,089,984 fewer bytes in adjust messages than in requests sent. Is that not strange?

@mkj
Owner

mkj commented Dec 15, 2025

Here we have eight requests of 32kB each followed by a window adjust of 98816 bytes. This adjust deficit repeats for a good part of the client logs, until at some point it starts alternating adjusts of around 98300 bytes with three 32kB requests, which seems more reasonable.

The reason is that the adjust occurs after the read response is received by the client, and that may be some time later than the read request. So when the deficit reaches 2097152 (64*32kB) it can't send any more data and the deadlock occurs. The exact window adjust sizes are kind of arbitrary (since data packets aren't necessarily a nice multiple); the client will send those whenever the window drops below 2097152, I think.

I added some logging here to the openssh client; the window adjustments matched what was being done in sunset.

@jubeormk1
Contributor Author

jubeormk1 commented Dec 15, 2025

You are right, the receiver window adjustments are not preemptive but need to follow the reception of bytes from the sending peer.

So really, the window is used to prevent a peer from sending more bytes than the receiver can process; in this case the client is sending more bytes than the server can process. Without a full understanding of the SSH Connection Protocol, my intuition tells me that maybe we should slow down the number of requests that the client sends by delaying the server's receive window adjustments. Looking at src/config.rs I can see that the window is already small and that the adjustment is automatic in check_window_adjust.

I believe that, since sending a 32kB data block in an SSH_FXP_DATA response is orders of magnitude slower than receiving short sftp requests, ideally the SFTP subsystem should slow the request rate. How to do this is tricky, since the SSH Connection Protocol layer is not concerned with the nature of the subsystem. @mkj: Would you consider an optional ready-signal mechanism from the channel user to the connection protocol layer to help balance the number of requests and the response rate?

Maybe a transparent way to do this is to use the ChanInOut.read() method to trigger a check_window_adjust, since at that point the channel user is asking for more data and has therefore processed the previously received requests.
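Sketched out (hypothetical policy type, not the current sunset API), the idea would be roughly:

// The receive window is only grown once the subsystem has actually consumed
// bytes, which naturally throttles how fast the client can pipeline requests.
struct RecvWindowPolicy {
    consumed: u64,         // bytes the subsystem has read since the last adjust
    adjust_threshold: u64, // how much to let accumulate before telling the peer
}

impl RecvWindowPolicy {
    /// Called from the channel-user read path (e.g. after ChanInOut.read()
    /// hands bytes to the SFTP handler). Returns Some(n) when a
    /// SSH_MSG_CHANNEL_WINDOW_ADJUST of n bytes should be queued for the peer.
    fn on_consumed(&mut self, n: usize) -> Option<u32> {
        self.consumed += n as u64;
        if self.consumed >= self.adjust_threshold {
            let adjust = self.consumed as u32;
            self.consumed = 0;
            Some(adjust)
        } else {
            None
        }
    }
}

fn main() {
    let mut p = RecvWindowPolicy { consumed: 0, adjust_threshold: 2048 };
    assert_eq!(p.on_consumed(1024), None);       // keep the window closed for now
    assert_eq!(p.on_consumed(1500), Some(2524)); // peer may now send 2524 more bytes
}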

I have tried with different thresholds in check_window_adjust and could not find a "sweet spot" to prevent the deadlock.
