
Panic #45

Closed
pronebird opened this issue May 7, 2024 · 12 comments


pronebird commented May 7, 2024

Using 0.0.9. Caught a panic today:

[2024-05-07 11:53:23.668][ipstack::stream::tcp][WARN] 1615879314
[2024-05-07 11:53:23.671][ipstack::stream::tcp][WARN] 1615887506
[2024-05-07 11:53:23.677][ipstack::stream::tcp][WARN] 1615879314
[2024-05-07 11:53:23.678][ipstack::stream::tcp][WARN] 1894419854
[2024-05-07 11:53:23.882][panic][ERROR] thread 'tokio-runtime-worker' panicked at 'Please report these values at: https://github.com/narrowlink/ipstack/': /Users/pronebird/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ipstack-0.0.9/src/stream/tcp.rs:495
   0: backtrace::capture::Backtrace::new
   1: <backtrace::capture::Backtrace as core::default::Default>::default
   2: log_panics::Config::install_panic_hook::{{closure}}
   3: <REDACTED>::abort_on_panic::{{closure}}
   4: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/alloc/src/boxed.rs:2029:9
      std::panicking::rust_panic_with_hook
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:785:13
   5: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:651:13
   6: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:171:18
   7: rust_begin_unwind
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:647:5
   8: core::panicking::panic_fmt
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:72:14
   9: <ipstack::stream::tcp::IpStackTcpStream as tokio::io::async_write::AsyncWrite>::poll_flush
  10: <ipstack::stream::tcp::IpStackTcpStream as tokio::io::async_read::AsyncRead>::poll_read
  11: <ipstack::stream::tcp_wrapper::IpStackTcpStream as tokio::io::async_read::AsyncRead>::poll_read
  12: tokio::io::util::copy::CopyBuffer::poll_copy
  13: tokio::io::util::copy_bidirectional::transfer_one_direction
  14: <REDACTED>::forwarder::Forwarder::handle_tcp::{{closure}}::{{closure}}
  15: tokio::runtime::task::core::Core<T,S>::poll
  16: tokio::runtime::task::harness::Harness<T,S>::poll
  17: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  18: tokio::runtime::scheduler::multi_thread::worker::Context::run
  19: tokio::runtime::context::runtime::enter_runtime
  20: tokio::runtime::scheduler::multi_thread::worker::run
  21: tokio::runtime::task::core::Core<T,S>::poll
  22: tokio::runtime::task::harness::Harness<T,S>::poll
  23: tokio::runtime::blocking::pool::Inner::run
  24: std::sys_common::backtrace::__rust_begin_short_backtrace
  25: core::ops::function::FnOnce::call_once{{vtable.shim}}
  26: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/alloc/src/boxed.rs:2015:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/alloc/src/boxed.rs:2015:9
      std::sys::pal::unix::thread::Thread::new::thread_start
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys/pal/unix/thread.rs:108:17
  27: __pthread_start

By looking at the code issuing the panic, I can conclude that inflight_packets is empty, because only four lines are logged and the loop is skipped entirely.
What's really going on in this piece of code? Maybe it shouldn't panic when there are no more inflight packets to flush?


pronebird commented May 7, 2024

I have also noticed this piece of code:

https://github.com/narrowlink/ipstack/blob/main/src/stream/tcp.rs#L459-L462

Note the early return there on UnexpectedEof: self.tcb.retransmission remains set. Maybe we should reset retransmission before packet_sender.send()?


RoDmitry commented May 7, 2024

I'm using self.tcb.retransmission.take() so that it is reset to None up front.
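A minimal sketch of that pattern (the Tcb struct and poll_flush_step here are simplified stand-ins for illustration, not the actual ipstack types):

```rust
// Simplified stand-in for the real TCB: only the field under discussion.
struct Tcb {
    retransmission: Option<u32>, // seq requested for retransmission
}

// take() moves the value out and leaves None behind, so an early
// return afterwards (e.g. on UnexpectedEof) cannot leave a stale
// retransmission request set.
fn poll_flush_step(tcb: &mut Tcb) -> Option<u32> {
    tcb.retransmission.take()
}

fn main() {
    let mut tcb = Tcb { retransmission: Some(1_615_879_314) };
    assert_eq!(poll_flush_step(&mut tcb), Some(1_615_879_314));
    assert_eq!(tcb.retransmission, None); // cleared even if we bail out later
    println!("ok");
}
```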


SajjadPourali commented May 7, 2024

The problem looks related to the logic of the TCP stack, and I don't believe it can be fixed by taking. If the function returns UnexpectedEof, it means the stream is terminated, and poll_flush won't be called anymore! Based on what you provided, ipstack received a retransmission request for data that is no longer available in inflight_packets, as you mentioned. There are two reasons this can happen:

  1. We removed that buffer from inflight_packets improperly.
  2. We are miscalculating the RetransmissionRequest in the check_pkt_type function.

I intentionally left the panic there because I suspected the accuracy of check_pkt_type for this part and aimed to identify the potential miscalculation.


RoDmitry commented May 7, 2024

The problem is that inflight_packets is empty.

@SajjadPourali (Collaborator)

inflight_packets should not be empty at this stage:

tcp_sequence: 1615887506
last_ack: 1615879314

8192 bytes should still be available in inflight_packets.
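That arithmetic can be checked with wrapping sequence-number math (the values are taken from the log in the first comment):

```rust
fn main() {
    let tcp_sequence: u32 = 1_615_887_506;
    let last_ack: u32 = 1_615_879_314;
    // TCP sequence numbers are modular, so use wrapping subtraction.
    let outstanding = tcp_sequence.wrapping_sub(last_ack);
    assert_eq!(outstanding, 8192); // bytes that should still be in inflight_packets
    println!("{outstanding}");
}
```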

@SajjadPourali (Collaborator)

I suspect that a regression occurred here: 2d54e7e


SajjadPourali commented May 7, 2024

self.inflight_packets.retain(|p| {
    let last_byte = p.seq.wrapping_add(p.payload.len() as u32);
    last_byte.saturating_sub(self.last_ack) > 0
        && self.seq.saturating_sub(last_byte) > 0
});

I'm suspicious of this part: seq in inflight_packets refers to the stream sequence at the time the packet was pushed into the buffer, so p.seq + 1 indicates its first byte, while p.seq itself isn't in the buffer.

Based on the explanation above, the condition self.seq.saturating_sub(last_byte) > 0 is incorrect, and removing it should solve the issue.
@RoDmitry Could you please verify my conclusion?
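If that analysis holds, dropping the second condition leaves a retain that keeps exactly the packets whose last byte is still unacknowledged. A sketch under those assumptions (the Packet struct and retain_unacked are illustrative, not the actual ipstack code):

```rust
struct Packet {
    seq: u32,          // stream sequence when the packet was pushed
    payload: Vec<u8>,
}

// Keep only packets whose last byte has not been acknowledged yet.
fn retain_unacked(inflight_packets: &mut Vec<Packet>, last_ack: u32) {
    inflight_packets.retain(|p| {
        let last_byte = p.seq.wrapping_add(p.payload.len() as u32);
        last_byte.saturating_sub(last_ack) > 0
    });
}

fn main() {
    let mut inflight = vec![
        Packet { seq: 100, payload: vec![0; 50] }, // last_byte = 150
        Packet { seq: 150, payload: vec![0; 50] }, // last_byte = 200
    ];
    retain_unacked(&mut inflight, 150); // peer acked up to byte 150
    assert_eq!(inflight.len(), 1);      // only the second packet survives
    println!("ok");
}
```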


RoDmitry commented May 7, 2024

If I understood the meaning of self.last_ack and self.seq, then maybe I could verify 😁. Maybe we need tests for this?

pronebird added a commit to pronebird/ipstack that referenced this issue May 8, 2024

pronebird commented May 8, 2024

@SajjadPourali patched it per your suggestion. I will run ipstack for the entire day and monitor the logs. f64ca2c

@SajjadPourali (Collaborator)

> If I understood the meaning of self.last_ack and self.seq, then maybe I could verify 😁. Maybe we need tests for this?

In the TCB of ipstack, "last_ack" means the last byte acknowledged by the peer (i.e., SND.UNA), and "seq" is the sequence number just past the last byte sent and still unacknowledged (i.e., SND.NXT).

                   1         2          3          4
              ----------|----------|----------|----------
                     SND.UNA    SND.NXT    SND.UNA
                                          +SND.WND


RoDmitry commented May 8, 2024

Then yes, checking self.seq.saturating_sub(last_byte) > 0 is invalid, because self.seq can be equal to a packet's last_byte, in which case that packet is dropped.
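A small reproduction of that edge case, with illustrative numbers (one 8192-byte packet sent, nothing acked yet):

```rust
fn main() {
    // A single packet was sent and nothing has been acknowledged yet.
    let last_ack: u32 = 1000;        // SND.UNA
    let seq: u32 = 1000 + 8192;      // SND.NXT after sending 8192 bytes
    let p_seq: u32 = 1000;           // packet's seq when pushed
    let payload_len: u32 = 8192;

    let last_byte = p_seq.wrapping_add(payload_len);
    let cond_a = last_byte.saturating_sub(last_ack) > 0; // true: not yet acked
    let cond_b = seq.saturating_sub(last_byte) > 0;      // false: seq == last_byte
    assert!(cond_a);
    assert!(!cond_b);
    // retain(cond_a && cond_b) drops this fully-unacked packet, leaving
    // inflight_packets empty; a later retransmission request then panics.
    println!("packet wrongly dropped: {}", !(cond_a && cond_b));
}
```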
