Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache fields initialization #642

Merged
merged 1 commit into from Oct 30, 2018

Conversation

Projects
5 participants
@guedou
Copy link
Member

commented May 4, 2017

This PR attempts to enhance Scapy performance by caching the initialization of packets fields. Currently, they are initialized at each packet instantiation.

The benchmarks look quite nice and indicate that we can gain between 15% and 35% execution time depending on the use case.

Type  Before - 2.7   Before - 3.6 After - 2.7 After - 3.6
Build 15.77s 12.02s 11.28s (+28%) 9.87s (+17% / + 37%)
Dissect  6.84s 4.62s 4.50s (+34%) 3.82s (+17% / +44%)
Build & dissect 23.58s  17.52s 16.04s (+31%) 14.69s (+16% / +38%)

Here is the script used to bench this PR:

import time

from scapy.all import *
from scapy.modules.six.moves import range

N = 20000
raw_packet = b'E\x00\x00(\x00\x01\x00\x00@\x11|\xc2\x7f\x00\x00\x01\x7f\x00\x00\x01\x005\x005\x00\x14\x00Z\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00'

start = time.time()
for i in range(N):
    p = IP() / UDP() / DNS()
    assert raw(p) == raw_packet
print("Build - %.2fs" % (time.time() - start))

start = time.time()
for i in range(N):
    p = IP(raw_packet)
    assert DNS in p
print("Dissect - %.2fs" % (time.time() - start))

start = time.time()
for i in range(N):
    p = IP() / UDP() / DNS()
    s = raw(p)
    assert s == raw_packet
    p = IP(s)
    assert DNS in p
print("Build & dissect - %.2fs" % (time.time() - start))

fixes #619

@guedou guedou added the enhancement label May 4, 2017

@guedou guedou requested a review from p-l- May 4, 2017

@guedou guedou force-pushed the guedou:cache_init_fields branch from 74ff67e to f1e86c4 May 4, 2017

@@ -11,7 +11,7 @@ gtp.dport == 2123 and gtp.teid == 2807 and len(gtp.IE_list) == 5

= GTPCreatePDPContextRequest(), basic dissection
random.seed(0x2807)
str(gtp) == b"E\x00\x00O\x00\x01\x00\x00@\x11|\x9b\x7f\x00\x00\x01\x7f\x00\x00\x01\x08K\x08K\x00;{N2\x10\x00+\x00\x00\n\xf7\xd2y\x00\x00\x10\xf8>\x14\x05\x14\t\x85\x00\x04\xa6A\xd8+\x85\x00\x04z\xafnt\x87\x00\x0fxKbPaePK9oq0pb5"
str(gtp) == b'E\x00\x00O\x00\x01\x00\x00@\x11|\x9b\x7f\x00\x00\x01\x7f\x00\x00\x01\x08K\x08K\x00;\x97\xbd2\x10\x00+\x00\x00\n\xf7\xd2y\x00\x00\x10\xdeM\xb8\xf5\x14\x0f\x85\x00\x04\xabyk\xc1\x85\x00\x04\xb8\xcf\x96\xfe\x87\x00\x0f9Co27Fbj65eKHyQ'

This comment has been minimized.

Copy link
@p-l-

p-l- May 4, 2017

Member

What's the reason of this change?

This comment has been minimized.

Copy link
@guedou

guedou May 5, 2017

Author Member

A RandShort() is consumed while the Packet fields are cached. Therefore, the randomly built packet is slightly different after the patch.

@gpotter2

This comment has been minimized.

Copy link
Member

commented May 4, 2017

Looks like an amazing PR ! Looking forward to it

@guedou Could you have a small look at #619 to see if you could implement the idea with your system ?

PS: you were right about wireshark, #644 failed too...

@guedou

This comment has been minimized.

Copy link
Member Author

commented May 5, 2017

While debugging the HTTP2 issue @X-Cli found this corner case which is not a smaller version of the issue:

$ cat test_642.py
from scapy.all import *

print '-----'

class SmallPacket(Packet):
  name = 'Small Packet'
  fields_desc = [ ByteField('byte', 0) ]

class TestPacket(Packet):
  name = 'TestPacket'
  fields_desc = [ PacketListField('list', [], SmallPacket) ]

a = TestPacket()
a.list.append(SmallPacket('a'))
a.list.append(SmallPacket('b'))
a.show()

print '-----'
TestPacket().show()
$ python test_642.py
-----
###[ TestPacket ]###
  \list      \
   |###[ Small Packet ]###
   |  byte      = 97
   |###[ Small Packet ]###
   |  byte      = 98

-----
###[ TestPacket ]###
  \list      \
   |###[ Small Packet ]###
   |  byte      = 97
   |###[ Small Packet ]###
   |  byte      = 98

@gpotter2

This comment has been minimized.

Copy link
Member

commented May 27, 2017

@guedou Hey I've found a fix for that 😄

Edited: Hey no it did not work... Sorry @X-Cli

@guedou

This comment has been minimized.

Copy link
Member Author

commented May 27, 2017

@guedou guedou force-pushed the guedou:cache_init_fields branch from f1e86c4 to f8cd268 May 30, 2017

@guedou

This comment has been minimized.

Copy link
Member Author

commented May 30, 2017

I have been working on this new patch with @X-Cli but forgot to push it ... It fixes the issue triggered by http2.uts but can slightly change the output of repr().

@gpotter2 & @p-l- I have several questions:

  1. is it ok to cache in the Packet class ? We can also cache in subclass like IP or UDP
  2. do you agree to alter the repr() behavior ?
@codecov-io

This comment has been minimized.

Copy link

commented May 30, 2017

Codecov Report

❗️ No coverage uploaded for pull request base (master@e64b261). Click here to learn what that means.
The diff coverage is 98.11%.

@@           Coverage Diff            @@
##             master    #642   +/-   ##
========================================
  Coverage          ?   85.3%           
========================================
  Files             ?     179           
  Lines             ?   42300           
  Branches          ?       0           
========================================
  Hits              ?   36082           
  Misses            ?    6218           
  Partials          ?       0
Impacted Files Coverage Δ
scapy/packet.py 77.11% <98.11%> (ø)
@p-l-

This comment has been minimized.

Copy link
Member

commented Jun 1, 2017

@guedou: Can you rebase against current master?

Also, can you be more specific about the changes to repr() output: what changes and why?

@guedou guedou force-pushed the guedou:cache_init_fields branch from f8cd268 to 0d9b714 Jun 1, 2017

@guedou

This comment has been minimized.

Copy link
Member Author

commented Jun 1, 2017

With the patch, repr() displays the options field as if it was specified by the user. I did not find a way to keep the old behavior. I think that we can introduce this regression as repr() already display more fields than specified (i.e. repr(IP()/TCP()).

Before:

>>> repr(IP())
'<IP  |>'

After:

>>> repr(IP())
'<IP  options=[] |>'
@X-Cli

This comment has been minimized.

Copy link
Contributor

commented Jun 1, 2017

Hello,
Just to clarify and shed more light on the problem: @guedou's patch uncovered a very old bug in Scapy.
In fact, in Scapy master revision, any field whose default value has an internal representation as a mutable object (anything except str, int and float, really) is affected by this bug.

When someone fetches the internal representation of such a field and there is no user data for this field, then the returned value is a reference to that mutable object. Any alteration of that object leads to the corruption/alteration of the default value of that field for that Packet.
Before this patch, this was not really a problem, because the default value of each field of each Packet was unique to each field instance of each Packet instance. Almost no one would ever notice this.

@guedou's patch makes all instances of a given Packet type to share the same reference to a unique default value object. So, when one of the Packet instance returns the reference to that shared object, and the referenced object is mutated, the default value of all Packet of the same type are affected by this modification. This is what was happening in the earlier code excerpt that @guedou posted.

To work around this, I submitted a patch to @guedou that deepcopies into self.fields the default value of a field upon a Packet type first instantiation, if that field internal representation of the default value is a mutable object. Unfortunately, this messes with repr() because it displays all fields which contain user values (i.e. values that are in self.fields).

@X-Cli

This comment has been minimized.

Copy link
Contributor

commented Jun 1, 2017

Yeah, no. Your patch is similar to one of my early ideas to fix this issue. Unfortunately, it does not work. Returning a new mutable object upon __getattr__ on a field that contains only the default mutable value prevents the default value from being corrupted. However, this also breaks the expectation that self.mylist.append(...) would work uniformely (i.e. when there is already a user value in self.fields vs when there is none).

@gpotter2

This comment has been minimized.

Copy link
Member

commented Jun 1, 2017

@X-Cli (my bad for the fix)
Question:
What if i suddenly want to use hide_defaults ? (it breaks your part :/)

@gpotter2

This comment has been minimized.

Copy link
Member

commented Jun 1, 2017

@X-Cli Ok, here's another possibility:

  • we keep your change
  • we mark hide_defaults as useless / remove it
  • we add:
diff --git a/scapy/packet.py b/scapy/packet.py
index debbee7..cf95e6d 100644
--- a/scapy/packet.py
+++ b/scapy/packet.py
@@ -323,9 +323,17 @@ class Packet(BasePacket):
             if isinstance(f, ConditionalField) and not f._evalcond(self):
                 continue
             if f.name in self.fields:
-                val = f.i2repr(self, self.fields[f.name])
+                _fval = self.fields[f.name]
+                _def = self.default_fields[f.name]
+                if _fval.__class__ == _def.__class__ and _fval == _def:
+                    continue
+                val = f.i2repr(self, _fval)

             elif f.name in self.overloaded_fields:
-                val =  f.i2repr(self, self.overloaded_fields[f.name])
+                _over = self.overloaded_fields[f.name]
+                _def = self.default_fields[f.name]
+                if _over.__class__ == _def.__class__ and _over == _def:
+                    continue
+                val =  f.i2repr(self, _over)
             else:
                 continue
             if isinstance(f, Emph) or f in conf.emph:

to keep the same repr behavior ?
(this checks if the value is the same than the default one)...

The triggered question then is the lost of performances... I guess that as __repr__ is only used to show the results, it could be a possible way of doing the things...

@X-Cli

This comment has been minimized.

Copy link
Contributor

commented Jun 1, 2017

That sounds good to me. I would probably recommend the use of type(_fval) instead of _fval.__class__, though.
@guedou, @p-l-, any thoughts?

@guedou guedou force-pushed the guedou:cache_init_fields branch from 0d9b714 to b8b2d12 Jun 2, 2017

@guedou

This comment has been minimized.

Copy link
Member Author

commented Jun 2, 2017

Here is the updated version. I don't know what to do with hide_defaults() =/

@gpotter2

This comment has been minimized.

Copy link
Member

commented Jun 2, 2017

We can simply remove it as defaults are hidden automatically.

@guedou guedou closed this Sep 11, 2017

@guedou guedou deleted the guedou:cache_init_fields branch Sep 11, 2017

@guedou guedou restored the guedou:cache_init_fields branch Sep 12, 2017

@guedou guedou reopened this Sep 12, 2017

@guedou guedou force-pushed the guedou:cache_init_fields branch from b8b2d12 to 985af22 Sep 12, 2017

@gpotter2

This comment has been minimized.

Copy link
Member

commented Apr 28, 2018

So. Where are we here ?

@guedou

This comment has been minimized.

Copy link
Member Author

commented May 2, 2018

I need to find time to port it to Python3.

@guedou guedou force-pushed the guedou:cache_init_fields branch from 985af22 to 0a89cee May 31, 2018

@gpotter2

This comment has been minimized.

Copy link
Member

commented Jun 21, 2018

I have done several changes to MultipleTypeField in #1485, will check if it fixes anything

@gpotter2 gpotter2 added this to To do in Release 2.4.1 Jun 25, 2018

@guedou guedou force-pushed the guedou:cache_init_fields branch 2 times, most recently from 98e2b2d to df65e66 Oct 14, 2018

@guedou

This comment has been minimized.

Copy link
Member Author

commented Oct 15, 2018

@p-l- @gpotter2 ready to be reviewed and discussed!

p.hide_defaults()
assert(repr(p) in ["<IP frag=0 proto=icmp |<ICMP |>>", "<IP frag=0 proto=1 |<ICMP |>>"])
assert(repr(p) == "<IP ttl=42 proto=icmp |<ICMP |>>")

This comment has been minimized.

Copy link
@gpotter2

gpotter2 Oct 15, 2018

Member

So what does hide_defaults() do now ?

This comment has been minimized.

Copy link
@guedou

guedou Oct 15, 2018

Author Member

It is broken. You are correct.

@guedou guedou force-pushed the guedou:cache_init_fields branch 6 times, most recently from 584598d to 2c7d6cb Oct 15, 2018

@guedou guedou force-pushed the guedou:cache_init_fields branch from 2c7d6cb to e905cb8 Oct 23, 2018

@guedou

This comment has been minimized.

Copy link
Member Author

commented Oct 23, 2018

I made changes that will make codacity happy.

@gpotter2
Copy link
Member

left a comment

Amazing PR ! All good to me

@guedou guedou force-pushed the guedou:cache_init_fields branch from e905cb8 to de07974 Oct 24, 2018

@p-l-

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

Just restarted a failed test, thanks @guedou!

@p-l- p-l- merged commit d91b3c9 into secdev:master Oct 30, 2018

3 checks passed

Codacy/PR Quality Review Up to standards. A positive pull request.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

Release 2.4.1 automation moved this from To do to Done Oct 30, 2018

@guedou guedou referenced this pull request Dec 4, 2018

Closed

Remove raw() #1728

gpotter2 added a commit to gpotter2/scapy that referenced this pull request Dec 22, 2018

@gpotter2 gpotter2 referenced this pull request Dec 22, 2018

Merged

Random cleanups #1757

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.