[FEATURE IDEA] improving ports <1024 and >=1024 #59

Closed
candlerb opened this issue Mar 10, 2017 · 11 comments

@candlerb
Contributor

nfdump has this feature:

-B
Like -b but automagically swaps flows, such that src port is > 1024 and dst port
is < 1024 as some exporters do not care sending the flows in proper order.

Unfortunately this doesn't work for services running on higher ports. For example, it's quite common to run a webserver on port 8080, or an NFS server on port 2049.

In theory, all ports below 49152 are available for IANA to assign to services. In practice, the Linux kernel defaults to using ports 32768 and above for ephemeral ports.

So my first thought was to update the algorithm with several tests:

  • if src port < 49152 and dst port >= 49152, then swap
  • if src port < 32768 and dst port >= 32768, then swap
  • if src port < 1024 and dst port >= 1024, then swap
  • else leave alone
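
A minimal C sketch of the tiered test above. The function name `should_swap_tiered` is hypothetical and this is not nfdump's code; it only encodes the three boundaries from the list (49152, 32768, 1024) and leaves everything else alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from nfdump: returns true when the flow
 * looks reversed, i.e. the source port is below one of the boundaries
 * and the destination port is at or above the same boundary. */
static bool should_swap_tiered(uint16_t srcport, uint16_t dstport) {
    /* Boundaries from the list above, highest first. */
    static const unsigned bounds[] = { 49152, 32768, 1024 };

    for (size_t i = 0; i < sizeof bounds / sizeof bounds[0]; i++) {
        if (srcport < bounds[i] && dstport >= bounds[i])
            return true;
    }
    return false; /* else leave alone */
}
```

With this sketch a reply from a web server on 8080 to a client on 45678 is swapped (the ports straddle 32768), while a flow between 8080 and 9090 is left untouched because both ports fall in the same band.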

But this raises the question, why not simplify it to this:

  • if src port < dst port, then swap!

That seems to work for the vast majority of cases. It would be rare to initiate a connection from a low port number to a higher port number; either the destination is way up in the ephemeral range, or the source has explicitly bound to a low port number. (Only examples I can think of are peer-to-peer apps and this)


This then suggests another feature. In nfdump you can aggregate by srcport, dstport, or any port. The latter counts each flow twice, once for src and once for dst.

It would be useful to have a new aggregate "minport" to aggregate on the lower of src and dst ports. This would in the vast majority of cases show you the service being used, without the extraneous noise of ephemeral ports mixed in. (A side effect is that flows both to and from that port would be aggregated, so heavy uploads and heavy downloads would both count)
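
To make the "minport" idea concrete, here is a rough C sketch. The `struct port_flow` record and both function names are made up for illustration and do not reflect nfdump's record layout or aggregation code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow record for illustration only. */
struct port_flow {
    uint16_t srcport;
    uint16_t dstport;
    uint64_t bytes;
};

/* The proposed aggregation key: the lower of the two ports. */
static uint16_t minport(const struct port_flow *f) {
    return f->srcport < f->dstport ? f->srcport : f->dstport;
}

/* Add each flow's bytes to the counter for its minport.
 * totals must point to 65536 counters, zero-initialised by the caller. */
static void aggregate_by_minport(const struct port_flow *flows, size_t n,
                                 uint64_t *totals) {
    for (size_t i = 0; i < n; i++)
        totals[minport(&flows[i])] += flows[i].bytes;
}
```

Both directions of a connection land in the same bucket, which is the side effect mentioned above: uploads to and downloads from port 80 are summed under key 80.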

However, I think this can be generalised.

Note that you can also currently aggregate by srcip, dstip or any ip. It would be possible to add new aggregates for the ip address corresponding to the lower port and the higher port in the packet respectively. This would show you traffic in/out of server and traffic in/out of client respectively.

But rather than adding a whole load of new aggregate types like "srvip" and "cliip", I think it would be better to have a single flag which changes the meaning of "src" and "dst" so that "src" is the side with the higher port, and "dst" is the side with the lower port.

Setting this flag, and aggregating on dstport, would give the minport behaviour I described above. Setting this flag and aggregating on dstip would give the total traffic in and out of a given server (i.e. "busiest server"). Aggregating on srcip would give the total traffic in and out of a given client (i.e. "busiest client")

Example:

srcip:srcport    dstip:dstport  bytes
1.2.3.4:45678 -> 192.0.2.1:80   200KB
192.0.2.1:80  -> 1.2.3.4:45678  400KB

with this flag becomes:

srcip:srcport    dstip:dstport  bytes
1.2.3.4:45678    192.0.2.1:80   200KB
1.2.3.4:45678    192.0.2.1:80   400KB
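
The flag described above could be sketched in C roughly as follows. `struct ip_flow` and `normalise_direction` are hypothetical names, IPv4 addresses are stored as plain 32-bit integers for brevity, and none of this is nfdump's actual implementation.

```c
#include <stdint.h>

/* Hypothetical flow record for illustration only. */
struct ip_flow {
    uint32_t srcip;
    uint32_t dstip;
    uint16_t srcport;
    uint16_t dstport;
    uint64_t bytes;
};

/* Rewrite the record so that "src" is the side with the higher port
 * (presumed client) and "dst" is the side with the lower port
 * (presumed server). Flows with equal ports are left unchanged. */
static void normalise_direction(struct ip_flow *f) {
    if (f->srcport < f->dstport) {
        uint32_t ip = f->srcip;
        uint16_t port = f->srcport;
        f->srcip = f->dstip;
        f->srcport = f->dstport;
        f->dstip = ip;
        f->dstport = port;
    }
}
```

After normalising, any existing aggregation on dstip, dstport or srcip works on the presumed server or client side without needing new aggregate types.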
phaag pushed a commit that referenced this issue Nov 5, 2017
@phaag
Owner

phaag commented Nov 5, 2017

Your proposal has been integrated. -B now swaps flows if src port < dst port. I agree that it has little impact and brings an advantage.

Implementing minport needs a bit more work. It's on the todo list.

@aldem

aldem commented Feb 2, 2018

Just an idea for TCP flows: why not track SYN packets and determine who is the server?

@candlerb
Contributor Author

candlerb commented Feb 2, 2018

Simple: because Netflow doesn't give you that information.

You get one flow from A to B, and a separate flow from B to A, without any information about which had SYN only and which had SYN ACK. The flow start times are probably not accurate enough to deduce it either.

(Some devices do bi-directional flows, like ASA NSEL. There's a separate issue regarding the meaning of "in" and "out" in that context)

@phaag
Owner

phaag commented Dec 5, 2019

The repo code now has improved handling for swapping flows. -B swaps flows only if:

  • protocol is TCP or UDP
  • src port < 1024 and dst port > 1024

HighPort traffic is untouched.
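
Read literally, the two conditions above amount to something like the following C sketch. The function name and the `PROTO_*` constants are illustrative assumptions (6 and 17 are the IANA protocol numbers for TCP and UDP); the code in the repository may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA-assigned IP protocol numbers. */
enum { PROTO_TCP = 6, PROTO_UDP = 17 };

/* Hypothetical restatement of the rule described in this comment:
 * swap only TCP/UDP flows going from a port below 1024 to a port
 * above 1024. Everything else is left as exported. */
static bool should_swap_lowport(uint8_t proto, uint16_t srcport,
                                uint16_t dstport) {
    if (proto != PROTO_TCP && proto != PROTO_UDP)
        return false;
    return srcport < 1024 && dstport > 1024;
}
```

Under this rule a reply from port 8080 to port 45678 is not swapped, which is the high-port case the rest of the thread goes on to discuss.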

@candlerb
Contributor Author

candlerb commented Dec 5, 2019

Do you mean it has been rolled back to the old behaviour of -B ? What's the reason?

@phaag
Owner

phaag commented Dec 14, 2019

I rolled back to 1024 due to many reports of false swaps in high/high port connections.
Would that be too much of an issue for you? I am still flexible and open to discussion. There is still -b and -B.
The swap is limited to TCP/UDP anyway; other protocols are not affected.
Would a flexible condition help, such as a new filter term: '<any flow filter> and swap (src port < dst port)'? Just an idea to discuss.

@candlerb
Contributor Author

Happy to discuss ideas. Obviously in the absence of bidirectional stateful flows, it's only a best guess. Even the current algorithm can be wrong: e.g. I see NFS traffic sourced from low ports, so it might originate from port 750 and connect to port 2049.

What reverting the change seems to say is: "there's no point swapping the ports if both the low and high port are over 1024". But services running on high ports (e.g. web servers on 8080) are not uncommon.

Maybe it's worth going back to the initial suggestion I made:

  • if src port < 1024 and dst port >= 1024, then swap; OR
  • if src port < 32768 and dst port >= 32768, then swap; OR
  • if src port < 49152 and dst port >= 49152, then swap

This is based on the observation that clients tend to use very high ports for ephemeral ports.

phaag added a commit that referenced this issue Feb 16, 2020
@phaag
Owner

phaag commented Feb 16, 2020

ok - Agreed and added in repo.

@phaag phaag closed this as completed Feb 16, 2020
@foogitiff

I think there is an issue in #dce3e36: instead of OR you are using AND, which makes the -B option way worse than before.

@candlerb
Contributor Author

I think what you are saying is

	   			   ( flow_record->srcport < 1024 ) && ( flow_record->dstport >= 1024 ) &&
	   			   ( flow_record->srcport < 32768 ) && ( flow_record->dstport >= 32768 ) &&
	   			   ( flow_record->srcport < 49152 ) && ( flow_record->dstport >= 49152 ))

should be

	   			   ( flow_record->srcport < 1024 ) && ( flow_record->dstport >= 1024 ) ||
	   			   ( flow_record->srcport < 32768 ) && ( flow_record->dstport >= 32768 ) ||
	   			   ( flow_record->srcport < 49152 ) && ( flow_record->dstport >= 49152 ))

Is that right?

I think you're right. As it stands, the condition can only be true when src_port is below 1024 and dest_port is 49152 or above.
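
The difference between the two versions can be checked with a small C sketch. `cond_and` and `cond_or` are hypothetical standalone functions that mirror the two expressions quoted above, with explicit parentheses added around each pair in the OR form; they are not the actual nfdump source.

```c
#include <stdbool.h>
#include <stdint.h>

/* All six comparisons chained with &&: this collapses to
 * srcport < 1024 && dstport >= 49152. */
static bool cond_and(uint16_t srcport, uint16_t dstport) {
    return (srcport < 1024) && (dstport >= 1024) &&
           (srcport < 32768) && (dstport >= 32768) &&
           (srcport < 49152) && (dstport >= 49152);
}

/* Each boundary tested as its own pair, any one of which may match. */
static bool cond_or(uint16_t srcport, uint16_t dstport) {
    return ((srcport < 1024) && (dstport >= 1024)) ||
           ((srcport < 32768) && (dstport >= 32768)) ||
           ((srcport < 49152) && (dstport >= 49152));
}
```

For a reply from port 80 to port 45678, `cond_or` is true but `cond_and` is false, because 45678 is below 49152. That is the regression being described: the AND form misses even the classic low-port case unless the client port is 49152 or higher.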

@foogitiff

Yes, that's what I meant. It seems it was partially fixed in #215, but I still see some possible issues here: https://github.com/phaag/nfdump/blob/master/bin/nfstat.c#L1728
