Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eviction of entries from owned neighbor cache #83

Closed
whitequark opened this Issue Nov 21, 2017 · 7 comments

Comments

Projects
None yet
3 participants
@whitequark
Copy link
Member

whitequark commented Nov 21, 2017

Right now the neighbor cache will simply grow for as long as it can when unseen before IPs are added. This can potentially take a lot of memory if we're talking about a /8 network that someone runs an ARP scan against. We should probably do something about it.

@squidarth

This comment has been minimized.

Copy link
Contributor

squidarth commented Jun 6, 2018

hi @whitequark! I'm new here and v. interested in contributing to this library! I want to confirm what needs to happen here before diving in:

It looks like in the case where the neighbor cache is not using heap-allocated storage that there is already logic in place to evict old entries:

     // src/iface/neighbor.rs:94

    Err((protocol_addr, neighbor)) => {
     // If we're going down this branch, it means that a fixed-size cache storage
     // is full, and we need to evict an entry.
     let old_protocol_addr = match self.storage {
         ManagedMap::Borrowed(ref mut pairs) => {
             pairs
                 .iter()
                 .min_by_key(|pair_opt| {
                     let (_protocol_addr, neighbor) = pair_opt.unwrap();
                     neighbor.expires_at
                 })
                 .expect("empty neighbor cache storage") // unwraps min_by_key
                 .unwrap() // unwraps pair
                 .0
         }
         // Owned maps can extend themselves.
         #[cfg(any(feature = "std", feature = "alloc"))]
         ManagedMap::Owned(_) => unreachable!()
     };

     let _old_neighbor =
         self.storage.remove(&old_protocol_addr).unwrap();
     match self.storage.insert(protocol_addr, neighbor) {
         Ok(None) => {
             net_trace!("filled {} => {} (evicted {} => {})",
                        protocol_addr, hardware_addr,
                        old_protocol_addr, _old_neighbor.hardware_addr);
         }
         // We've covered everything else above.
         _ => unreachable!()
     }
 }

is the task here to repeat this logic for the heap-allocated case?

  1. Do we want to make the the maximum size of the cache configurable, or do we want to be opinionated and set it to some reasonable number? Seems like the type of thing that ought to be configurable.

Thanks a lot for any guidance & for building this awesome library--I'm excited to dive in!

@whitequark

This comment has been minimized.

Copy link
Member Author

whitequark commented Jun 6, 2018

Hi @squidarth! The challenge here, first and foremost, in figuring out what logic exactly should be followed here. I didn't really have much time to put into this issue myself. Do you think you could look at other TCP/IP stacks and see what sort of logic they use for the neighbor table?

@squidarth

This comment has been minimized.

Copy link
Contributor

squidarth commented Jun 6, 2018

Yep--sounds good

@dlrobertson

This comment has been minimized.

Copy link
Collaborator

dlrobertson commented Jun 6, 2018

@squidarth You might also check out the NDISC RFC. It has some good info about the Neighbor Cache. I can't remember if it says anything about evicting entries from the cache though.

@squidarth

This comment has been minimized.

Copy link
Contributor

squidarth commented Jun 11, 2018

Alright, so today I spent some time digging into how Linux handles cache eviction:

My takeaway: I think we can simply add automatic garbage collection in the fill method that goes through the cache and removes any expired entries.

How the Linux Kernel handles cache eviction

First off, the kernel keeps a lot more state on entries in the ARP translation table does. It refers to items in the table as "neighbors", which go through a state machine (defined here), where they start out as NUD_REACHABLE, but might go into NUD_STALE if they have an old confirmation timestamp. These states are updated in the neigh_timer_handler that I linked to above.

There is another function neigh_periodic_work that actually goes through the items in the neighbor table and removes nodes that are NUD_FAILED.

Garbage collection is also triggered when allocating new neighbors if the number of entries in the table is above some constant value. On linux, these constants live at:

  • /proc/sys/net/ipv4/neigh/default/gc_thresh3
  • /proc/sys/net/ipv4/neigh/default/gc_thresh2

More thoughts

I don't think there's a huge need to redo how the ARP table is managed in smoltcp to adapt to having multiple states, and also think that an approach that uses another thread to garbage collect in the background would add a lot of complexity (we would have to move the NeighborCache into a shared data structure, etc.). An approach that synchronously removes the expired entries I feel like spiritually matches up with what linux does.

Re: the constants that linux provides, we probably want to allow users to supply these, but can use similar defaults.

Let me know what you think!

@whitequark

This comment has been minimized.

Copy link
Member Author

whitequark commented Jun 11, 2018

also think that an approach that uses another thread to garbage collect in the background would add a lot of complexity

No threads--all functionality should be available either with core or with core+alloc.

Otherwise, sounds great.

@dlrobertson

This comment has been minimized.

Copy link
Collaborator

dlrobertson commented Jun 12, 2018

I agree with @whitequark, this sounds great!

I don't think there's a huge need to redo how the ARP table is managed in smoltcp to adapt to having multiple states

The states mentioned above are the states of an entry in the Neighbor Cache required by NDISC that are required by Neighbor Unreachability Detection (NUD). So we will eventually need to add this state to the cache in order to be compliant with the RFC. I have a bullet in the issue #103 for this, but it should probably be its own issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.