Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sub-module for reading tc stats #8210

Closed
wants to merge 21 commits into from
Closed

Conversation

mmynk
Copy link
Contributor

@mmynk mmynk commented Oct 31, 2023

  • Adds the sub-crate for reading tc stats
  • Uses netlink-packet-route library which reads qdiscs via rtnetlink

Result:

$ below dump tc -b '5s ago'
Datetime            Interface   Kind       Queue Length   Bps        Pps        Bytes      Packets    Backlog    Drops      Requeues   Overlimits   MaxPacket   EcnMark    NewFlowsLen   OldFlowsLen   CeMark     DropOverlimit   NewFlowCount   MemoryUsage   DropOvermemory   Target     Limit      Interval   Ecn        Quantum    CeThreshold   DropBatchSize   MemoryLimit   Flows      Timestamp
2024-01-30 20:24:31 lo          noqueue    0              0.0 B/s    0/s        0.0 B/s    0/s        0/s        0/s        0/s        0/s          ?           ?          ?             ?             ?          ?               ?              ?             ?                ?          ?          ?          ?          ?          ?             ?               ?             ?          1706646271
2024-01-30 20:24:31 ens5        mq         0              0.0 B/s    0/s        167 B/s    2/s        0/s        0/s        0/s        0/s          ?           ?          ?             ?             ?          ?               ?              ?             ?                ?          ?          ?          ?          ?          ?             ?               ?             ?          1706646271
2024-01-30 20:24:31 ens5        fq_codel   0              0.0 B/s    0/s        0.0 B/s    0/s        0/s        0/s        0/s        0/s          110         0          0             0             0          0/s             0/s            0/s           0/s              4999       10240      99999      1          1514       0             64              33554432      0/s        1706646271
2024-01-30 20:24:31 ens5        fq_codel   0              0.0 B/s    0/s        167 B/s    2/s        0/s        0/s        0/s        0/s          182         0          0             0             0          0/s             0/s            0/s           0/s              4999       10240      99999      1          1514       0             64              33554432      0/s        1706646271

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 31, 2023
@mmynk mmynk force-pushed the feat/tc branch 10 times, most recently from 54cabb0 to bb0c013 Compare January 31, 2024 19:04
@mmynk mmynk marked this pull request as ready for review January 31, 2024 19:25
Copy link
Contributor

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only reviewed first commit so far

below/tc/src/lib.rs Outdated Show resolved Hide resolved
below/tc/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 62 to 64
'out: while let Ok(size) = socket.recv(&mut &mut recv_buf[offset..], 0) {
loop {
let bytes = &recv_buf[offset..];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two things:

  1. why double borrow?
  2. why keep track of offset? seems like you could always write-to and read-from the beginning of the buffer

Copy link
Contributor Author

@mmynk mmynk Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mostly inspired from this example in netlink-packet-route library
from my understanding:

  1. recv() accepts a mutable reference to BufMut, I believe that's why we need double borrow. it refuses to compile otherwise.
pub fn recv<B>(&self, buf: &mut B, flags: libc::c_int) -> Result<usize>
where
    B: bytes::BufMut,
{
    ...
}

i can change it to: &mut recv_buf[offset..].borrow_mut() instead of &mut &mut recv_buf[offset..]

  1. we write to the beginning of offset but don't read from beginning:
    let bytes = &recv_buf[offset..];

return Err(TcError::Netlink(err.to_string()));
}
if let NetlinkPayload::Done(_) = payload {
break 'out;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i misunderstanding or could this be a continue?

Copy link
Contributor Author

@mmynk mmynk Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the outer loop is for each packet received from the kernel
the inner loop is for parsing those packets into netlink object(s)
Done implies this is the last packet for the object we requested, so we're done parsing [1]

[1] from the library example

below/tc/src/lib.rs Outdated Show resolved Hide resolved
below/tc/src/test.rs Outdated Show resolved Hide resolved
below/tc/src/types.rs Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have discussed this before, but why create wrappers for netlink_packet_route structs? Doesn't seem like there's much value add. And later when new fields are supported you have to wire it all up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again, I think current approach is good. If we are going to save these types to disk, then it's better we control the layout.

@mmynk mmynk force-pushed the feat/tc branch 2 times, most recently from 21a3519 to dff7a60 Compare February 19, 2024 09:36
Copy link
Contributor

@brianc118 brianc118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Just gave it a first pass with some comments.

below/model/src/tc_model.rs Show resolved Hide resolved
below/tc/src/types.rs Outdated Show resolved Hide resolved
below/tc/src/types.rs Outdated Show resolved Hide resolved
/// The kernel responds with a message of type `RTM_NEWQDISC` for each qdisc.
fn get_netlink_qdiscs() -> Result<Vec<TcMessage>> {
// open a socket
let socket = Socket::new(NETLINK_ROUTE).map_err(|e| TcError::Netlink(e.to_string()))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to support IPv6 (NETLINK_ROUTE6)? The most important thing is to make sure the sample type in types.rs is extensible and will make sense in the future if more stats are added. Changes to it are difficult as we need to compare about forward/backward compatibility.

Copy link
Contributor Author

@mmynk mmynk Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the sample type in types.rs will not change regardless of whether we use IPv6 or IPv4. The reading part might change, ie, get_netlink_qdiscs() but not the types that the underlying response is translated into (since we are basically interested in the qdisc fields which should be independent of the reading).

tc: if !options.enable_tc_stats {
None
} else {
match tc::tc_stats() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive is this since we're talking over netlink sockets and it's blocking? We generally try to make sure the main collection is fast to minimize collection skew and make sure we're still able to collect data under high system load.

More expensive collection can be made "best effort" and delegated to some other thread (e.g. via AsyncCollectorPlugin).

Copy link
Contributor Author

@mmynk mmynk Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think netlink operations would be expensive since they directly communicate with the kernel. However, I believe the receive operation is blocking. It could be slow only if there are, say 1000+ interfaces, which I am not sure is likely.

That being said, I tried to duplicate the implementation for gpu_stats for tc in this commit. It seems to work but would appreciate a closer look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That commit looks good. Note that the async collection means there may be more skew. e.g. TC collected at t=0s but sample collected in main thread at t=4s.

below/model/src/collector_plugin.rs Outdated Show resolved Hide resolved
)]
pub struct TcModel {
#[queriable(subquery)]
pub tc: Vec<SingleTcModel>,
Copy link
Contributor

@brianc118 brianc118 Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TcModel is queriable on index (e.g. queriable field id might be like "tc.tc.<idx>.bps"). Is there some field in SingleTcModel that uniquely identifies the SingleTcModel (e.g. interface?). If so we should pull it out and make this a BTreeMap<String, SingleTcModel>, so that it's easier to query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not straightforward because there's no clear identifier for qdiscs, also an interface could have multiple qdiscs.
From this website:

qdiscs are always referred to using a combination of a major node number and a minor node number. For any given qdisc, the major node number has to be unique for the root hook for a given Ethernet interface.

I believe we could use a combination of interface, handle and parent to uniquely identify a qdisc. We could add a key field and assign to it the hashed or concatenated combination of the 3 fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like it should be keyed uniquely identified by major/minor which are actually included in TcMessage. I happen to already have a bit of code handy (we use it for identifying GPUs in the internal version).

You just need to change this to BTreeMap<MajMin, SingleTcModel>.

use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;

// A Queriable MajMin
#[derive(
    Default,
    Clone,
    PartialEq,
    PartialOrd,
    Ord,
    Eq,
    Debug,
    DeserializeFromStr,
    SerializeDisplay,
    below_derive::Queriable
)]
pub struct MajMin {
    pub minor_id: u64,
    pub major_id: u64,
}

impl std::fmt::Display for MajMin {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}:{}", self.major_id, self.minor_id)
    }
}

impl std::str::FromStr for MajMin {
    type Err = anyhow::Error;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if let Some(colon_idx) = s.find(':') {
            Ok(Self {
                major_id: (&s[..colon_idx]).parse::<u64>()?,
                minor_id: (&s[colon_idx + 1..]).parse::<u64>()?,
            })
        } else {
            Err(anyhow!("No colon in 'maj:min'"))
        }
    }
}

@mmynk
Copy link
Contributor Author

mmynk commented Apr 1, 2024

@brianc118 Could you take a look at the PR again? Thanks!

Copy link
Contributor

@brianc118 brianc118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Mostly want to make sure we agree on the right unique identifier for Tc/qdiscs and make sure that's recorded in samples.

below/tc/src/types.rs Outdated Show resolved Hide resolved
memory_usage: qdisc.memory_usage,
drop_overmemory: qdisc.drop_overmemory,
},
_ => Self::default(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, would it be better to just return None and make new return Option<FqCodelXStats>?

Or better, directly take TcFqCodelQdStats and leave it to caller to figure out what to do if xstats is not Qdisc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question for the second, makes sense to only deal with TcFqCodelQdStats but say in the future we want to expose class stats TcFqCodelClStats as well, then we'll have to change it back to something similar as above.
I wanted for FqCodelXStats to represent a combined struct of Qdisc and Class, if it's a Qdisc the fields relevant to Class would be None and vice-versa.

Or would it better to have another struct identify a classful FqCodel xstats if and when we need to? Then it would make sense to implement your second suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or would it better to have another struct identify a classful FqCodel xstats if and when we need to?

I don't have a strong preference for either - I'm not familiar with this stuff enough to have a strong opinion :), but my feeling is it might be better to split it so each struct has clear meaning? I'll leave the decision up to you.

Right now I don't think it's taking either approach though. We'd want to make the fields Option<> if we are to futureproof it for overloading with TcFqCodelClStats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have addressed this in commit 97ac6bf
create an enum wrapper which can be extended with TcFqCodelClStats when we decide to add classful qdiscs support

)]
pub struct TcModel {
#[queriable(subquery)]
pub tc: Vec<SingleTcModel>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like it should be keyed uniquely identified by major/minor which are actually included in TcMessage. I happen to already have a bit of code handy (we use it for identifying GPUs in the internal version).

You just need to change this to BTreeMap<MajMin, SingleTcModel>.

use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;

// A Queriable MajMin
#[derive(
    Default,
    Clone,
    PartialEq,
    PartialOrd,
    Ord,
    Eq,
    Debug,
    DeserializeFromStr,
    SerializeDisplay,
    below_derive::Queriable
)]
pub struct MajMin {
    pub minor_id: u64,
    pub major_id: u64,
}

impl std::fmt::Display for MajMin {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}:{}", self.major_id, self.minor_id)
    }
}

impl std::str::FromStr for MajMin {
    type Err = anyhow::Error;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if let Some(colon_idx) = s.find(':') {
            Ok(Self {
                major_id: (&s[..colon_idx]).parse::<u64>()?,
                minor_id: (&s[colon_idx + 1..]).parse::<u64>()?,
            })
        } else {
            Err(anyhow!("No colon in 'maj:min'"))
        }
    }
}


/// `Tc` represents a traffic control qdisc.
#[derive(Default, Clone, PartialEq, Debug, Serialize, Deserialize)]
pub struct TcStat {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should include a new MajMin struct to uniquely identify the qdisc. I believe the TcHandle (maj, min) is all part of the TcHeader of TcMessage. https://docs.rs/netlink-packet-route/latest/netlink_packet_route/tc/struct.TcMessage.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have handle but from some testing it looks like it does not uniquely identify the qdisc.
From my understanding, a handle would uniquely identify the qdisc for a given interface and a parent. So it's the combination of all these 3 fields (interface, parent, handle), eg:

TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 1, if_name: "lo", kind: "noqueue", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "mq", ... }
TcStat { handle: 0, parent: 2, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "fq_codel", ... }
TcStat { handle: 0, parent: 1, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "fq_codel", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 3, if_name: "docker0", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 4, if_name: "br-bddb616e0d48", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 76, if_name: "vethdc90bd5", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 108, if_name: "veth2dc7d25", ... }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep agreed, we'd need the combination of the 3. I'm happy with it as is, but in the long term, I think we should be keying on the unique identifier.

@@ -0,0 +1,13 @@
[package]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a README to the crate?

tc: if !options.enable_tc_stats {
None
} else {
match tc::tc_stats() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That commit looks good. Note that the async collection means there may be more skew. e.g. TC collected at t=0s but sample collected in main thread at t=4s.

@mmynk
Copy link
Contributor Author

mmynk commented Apr 10, 2024

@brianc118 not sure if you got a chance to read my last comment regarding the implementation of maj:min identifier for qdiscs: #8210 (comment)
I think that's the only remaining part for this PR. I'll add README in the meanwhile.

The struct `FqCodelXStats` is changed into enum.
The enum has type `FqCodelQdiscStats` which represents `tc_fq_codel_qd_stats`.
We can add type to represent `tc_fq_codel_cl_stats` when classful qdisc
support is added.
@mmynk
Copy link
Contributor Author

mmynk commented Apr 17, 2024

@brianc118 added README and refactored the fq_codel xstats definition. Could you please take another look?

I will refactor TcModel to be uniquely identifiable in a subsequent PR. Thanks!

Copy link
Contributor

@brianc118 brianc118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for this

below/model/src/tc_model.rs Outdated Show resolved Hide resolved
below/model/src/tc_model.rs Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@brianc118 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mmynk
Copy link
Contributor Author

mmynk commented Apr 18, 2024

@brianc118 fixed the lint issues, for some reason the test seems to be running for over a day, would re-triggering help? regardless, could you please re-import to trigger the linter again? thanks!

@facebook-github-bot
Copy link
Contributor

@brianc118 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@brianc118
Copy link
Contributor

@mmynk don't worry about the internal tests. I suspect the test hook isn't working as expected.

@mmynk
Copy link
Contributor Author

mmynk commented Apr 22, 2024

@brianc118 is the PR okay to be merged? just wanted to make sure if there is something I need to fix (lint or test). thanks!

@brianc118
Copy link
Contributor

@mmynk I was just working on vendoring the netlink-* crates internally which proved to be a pain. Expect to have this merged in the next few days.

@mmynk
Copy link
Contributor Author

mmynk commented Apr 22, 2024

@mmynk I was just working on vendoring the netlink-* crates internally which proved to be a pain.

Oh shoot! I wish I could help with that.

Expect to have this merged in the next few days.

Great, thanks for letting me know.

@facebook-github-bot
Copy link
Contributor

@brianc118 merged this pull request in 4acd748.

@mmynk mmynk deleted the feat/tc branch April 25, 2024 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants