Cache fields initialization #642

Merged
merged 1 commit into from
Oct 30, 2018

guedou
Member

@guedou guedou commented May 4, 2017

This PR attempts to improve Scapy performance by caching the initialization of packet fields. Currently, they are initialized at each packet instantiation.
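
As an illustration of the idea only, here is a minimal sketch of per-class caching of field defaults. It does not use Scapy: `Field`, `Packet`, `DemoPacket`, `_defaults_cache` and `slow_path_runs` are hypothetical names made up for this example, and the actual patch may be organized differently.

```python
class Field:
    """Hypothetical stand-in for a Scapy field: a name plus a default value."""
    def __init__(self, name, default):
        self.name = name
        self.default = default


class Packet:
    fields_desc = []
    _defaults_cache = {}   # hypothetical cache: packet class -> default values
    slow_path_runs = 0     # counts how often the field list is actually walked

    def __init__(self):
        cls = type(self)
        if cls not in Packet._defaults_cache:
            # Slow path, taken once per packet class.
            Packet.slow_path_runs += 1
            Packet._defaults_cache[cls] = {
                f.name: f.default for f in cls.fields_desc
            }
        # Fast path: copy the cached dict. The copy is shallow, so mutable
        # default values are shared between instances.
        self.default_fields = dict(Packet._defaults_cache[cls])


class DemoPacket(Packet):
    fields_desc = [Field("byte", 0), Field("options", [])]


first = DemoPacket()
second = DemoPacket()
print(Packet.slow_path_runs)  # the field list was walked only once
```

The saving comes from skipping the walk over `fields_desc` on every instantiation after the first; the price is that the cached default objects are reused across instances.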

The benchmarks look quite nice and indicate that we can cut execution time by between 15% and 35%, depending on the use case.

| Type | Before - 2.7 | Before - 3.6 | After - 2.7 | After - 3.6 |
|---|---|---|---|---|
| Build | 15.77s | 12.02s | 11.28s (+28%) | 9.87s (+17% / +37%) |
| Dissect | 6.84s | 4.62s | 4.50s (+34%) | 3.82s (+17% / +44%) |
| Build & dissect | 23.58s | 17.52s | 16.04s (+31%) | 14.69s (+16% / +38%) |

Here is the script used to bench this PR:

import time

from scapy.all import *
from scapy.modules.six.moves import range

N = 20000
raw_packet = b'E\x00\x00(\x00\x01\x00\x00@\x11|\xc2\x7f\x00\x00\x01\x7f\x00\x00\x01\x005\x005\x00\x14\x00Z\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00'

start = time.time()
for i in range(N):
    p = IP() / UDP() / DNS()
    assert raw(p) == raw_packet
print("Build - %.2fs" % (time.time() - start))

start = time.time()
for i in range(N):
    p = IP(raw_packet)
    assert DNS in p
print("Dissect - %.2fs" % (time.time() - start))

start = time.time()
for i in range(N):
    p = IP() / UDP() / DNS()
    s = raw(p)
    assert s == raw_packet
    p = IP(s)
    assert DNS in p
print("Build & dissect - %.2fs" % (time.time() - start))

fixes #619

@@ -11,7 +11,7 @@ gtp.dport == 2123 and gtp.teid == 2807 and len(gtp.IE_list) == 5

= GTPCreatePDPContextRequest(), basic dissection
random.seed(0x2807)
-str(gtp) == b"E\x00\x00O\x00\x01\x00\x00@\x11|\x9b\x7f\x00\x00\x01\x7f\x00\x00\x01\x08K\x08K\x00;{N2\x10\x00+\x00\x00\n\xf7\xd2y\x00\x00\x10\xf8>\x14\x05\x14\t\x85\x00\x04\xa6A\xd8+\x85\x00\x04z\xafnt\x87\x00\x0fxKbPaePK9oq0pb5"
+str(gtp) == b'E\x00\x00O\x00\x01\x00\x00@\x11|\x9b\x7f\x00\x00\x01\x7f\x00\x00\x01\x08K\x08K\x00;\x97\xbd2\x10\x00+\x00\x00\n\xf7\xd2y\x00\x00\x10\xdeM\xb8\xf5\x14\x0f\x85\x00\x04\xabyk\xc1\x85\x00\x04\xb8\xcf\x96\xfe\x87\x00\x0f9Co27Fbj65eKHyQ'
Member

What's the reason for this change?

Member Author

A RandShort() is consumed while the Packet fields are cached. Therefore, the randomly built packet is slightly different after the patch.
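
The effect can be reproduced without Scapy. The sketch below assumes only that one extra value is drawn from a seeded generator before the packet is built; `draw_ids` and its `consume_first` parameter are hypothetical names for this illustration.

```python
import random


def draw_ids(consume_first):
    """Return three pseudo-random 16-bit values from a freshly seeded generator."""
    rng = random.Random(0x2807)
    if consume_first:
        # One value is drawn and thrown away, as happens when a random
        # default is evaluated while the fields are being cached.
        rng.randint(0, 0xFFFF)
    return [rng.randint(0, 0xFFFF) for _ in range(3)]


before_patch = draw_ids(consume_first=False)
after_patch = draw_ids(consume_first=True)

# The second sequence is the first one shifted by one draw.
print(before_patch[1:] == after_patch[:2])
```

With the same seed, every value that depends on the generator moves one position along the sequence, which is why the expected bytes in the test had to be updated.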

@gpotter2
Member

gpotter2 commented May 4, 2017

Looks like an amazing PR ! Looking forward to it

@guedou Could you have a small look at #619 to see if you could implement the idea with your system ?

PS: you were right about wireshark, #644 failed too...

@guedou
Member Author

guedou commented May 5, 2017

While debugging the HTTP2 issue @X-Cli found this corner case which is not a smaller version of the issue:

$ cat test_642.py
from scapy.all import *

print('-----')

class SmallPacket(Packet):
  name = 'Small Packet'
  fields_desc = [ ByteField('byte', 0) ]

class TestPacket(Packet):
  name = 'TestPacket'
  fields_desc = [ PacketListField('list', [], SmallPacket) ]

# Mutate the list obtained from a packet that has no user-supplied value
a = TestPacket()
a.list.append(SmallPacket(b'a'))
a.list.append(SmallPacket(b'b'))
a.show()

print('-----')
# A fresh instance unexpectedly shows the two packets appended above
TestPacket().show()
$ python test_642.py
-----
###[ TestPacket ]###
  \list      \
   |###[ Small Packet ]###
   |  byte      = 97
   |###[ Small Packet ]###
   |  byte      = 98

-----
###[ TestPacket ]###
  \list      \
   |###[ Small Packet ]###
   |  byte      = 97
   |###[ Small Packet ]###
   |  byte      = 98

@gpotter2
Member

gpotter2 commented May 27, 2017

@guedou I'm gonna try to get this fixed

@guedou
Member Author

guedou commented May 27, 2017 via email

@guedou
Member Author

guedou commented May 30, 2017

I have been working on this new patch with @X-Cli but forgot to push it ... It fixes the issue triggered by http2.uts but can slightly change the output of repr().

@gpotter2 & @p-l- I have several questions:

  1. Is it OK to cache in the Packet class? We could also cache in subclasses like IP or UDP.
  2. Do you agree to alter the repr() behavior?

@codecov-io

codecov-io commented May 30, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@e64b261).
The diff coverage is 98.11%.

@@           Coverage Diff            @@
##             master    #642   +/-   ##
========================================
  Coverage          ?   85.3%           
========================================
  Files             ?     179           
  Lines             ?   42300           
  Branches          ?       0           
========================================
  Hits              ?   36082           
  Misses            ?    6218           
  Partials          ?       0
| Impacted Files | Coverage Δ |
|---|---|
| scapy/packet.py | 77.11% <98.11%> (ø) |

@p-l-
Member

p-l- commented Jun 1, 2017

@guedou: Can you rebase against current master?

Also, can you be more specific about the changes to repr() output: what changes and why?

@guedou
Member Author

guedou commented Jun 1, 2017

With the patch, repr() displays the options field as if it had been specified by the user. I did not find a way to keep the old behavior. I think that we can accept this regression, as repr() already displays more fields than specified (e.g. repr(IP()/TCP())).

Before:

>>> repr(IP())
'<IP  |>'

After:

>>> repr(IP())
'<IP  options=[] |>'

@X-Cli
Contributor

X-Cli commented Jun 1, 2017

Hello,
Just to clarify and shed more light on the problem: @guedou's patch uncovered a very old bug in Scapy.
In fact, in the Scapy master revision, any field whose default value is represented internally as a mutable object (anything except str, int and float, really) is affected by this bug.

When someone fetches the internal representation of such a field and there is no user data for this field, then the returned value is a reference to that mutable object. Any alteration of that object leads to the corruption/alteration of the default value of that field for that Packet.
Before this patch, this was not really a problem, because the default value of each field of each Packet was unique to each field instance of each Packet instance. Almost no one would ever notice this.

@guedou's patch makes all instances of a given Packet type share the same reference to a unique default value object. So, when one of the Packet instances returns the reference to that shared object, and the referenced object is mutated, the default value of all Packets of the same type is affected by this modification. This is what was happening in the earlier code excerpt that @guedou posted.

To work around this, I submitted a patch to @guedou that deepcopies the default value of a field into self.fields upon the first instantiation of a Packet type, if the internal representation of that field's default value is a mutable object. Unfortunately, this messes with repr() because it displays all fields which contain user values (i.e. values that are in self.fields).
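
The mechanism described above can be shown with a toy model that does not depend on Scapy. The class names `SharedDefaultPacket` and `DeepCopyPacket`, the `cached_defaults` attribute and the simplified `getfieldval` below are assumptions for this sketch, not the code of the patch.

```python
import copy


class SharedDefaultPacket:
    # One default object per class, shared by every instance.
    cached_defaults = {"list": []}

    def __init__(self):
        self.fields = {}  # user-supplied values only

    def getfieldval(self, name):
        if name in self.fields:
            return self.fields[name]
        return self.cached_defaults[name]  # reference to the shared object


class DeepCopyPacket(SharedDefaultPacket):
    cached_defaults = {"list": []}

    def __init__(self):
        # Workaround: each instance gets its own copy of mutable defaults,
        # stored where user values normally live.
        self.fields = {
            name: copy.deepcopy(value)
            for name, value in self.cached_defaults.items()
            if isinstance(value, (list, dict, set))
        }


a = SharedDefaultPacket()
a.getfieldval("list").append("x")
leaked = SharedDefaultPacket().getfieldval("list")  # sees the appended item

b = DeepCopyPacket()
b.getfieldval("list").append("x")
clean = DeepCopyPacket().getfieldval("list")        # default is untouched
```

In the second class the default survives, but `"list"` now sits in `fields` for every instance, which is why a repr() that prints everything in `fields` starts showing it.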

@X-Cli
Contributor

X-Cli commented Jun 1, 2017

Yeah, no. Your patch is similar to one of my early ideas to fix this issue. Unfortunately, it does not work. Returning a new mutable object upon __getattr__ on a field that contains only the default mutable value prevents the default value from being corrupted. However, this also breaks the expectation that self.mylist.append(...) would work uniformly (i.e. when there is already a user value in self.fields vs. when there is none).
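
A small stand-alone sketch of why the copy-on-read approach behaves inconsistently. `CopyOnGetPacket` and its `defaults` attribute are hypothetical and only mimic the lookup order described above.

```python
import copy


class CopyOnGetPacket:
    defaults = {"mylist": []}

    def __init__(self, **user_values):
        self.fields = dict(user_values)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self.fields:
            return self.fields[name]
        if name in self.defaults:
            # Hand out a fresh copy so the default cannot be corrupted.
            return copy.deepcopy(self.defaults[name])
        raise AttributeError(name)


with_user_value = CopyOnGetPacket(mylist=[])
with_user_value.mylist.append(1)   # mutates the stored user value

default_only = CopyOnGetPacket()
default_only.mylist.append(1)      # mutates a throwaway copy

print(with_user_value.mylist, default_only.mylist)
```

The default stays intact, but the same `append` call is kept in one case and silently lost in the other.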

@gpotter2
Member

gpotter2 commented Jun 1, 2017

@X-Cli (my bad for the fix)
Question:
What if I suddenly want to use hide_defaults? (It breaks your part :/)

@gpotter2
Member

gpotter2 commented Jun 1, 2017

@X-Cli Ok, here's another possibility:

  • we keep your change
  • we mark hide_defaults as useless / remove it
  • we add:
diff --git a/scapy/packet.py b/scapy/packet.py
index debbee7..cf95e6d 100644
--- a/scapy/packet.py
+++ b/scapy/packet.py
@@ -323,9 +323,17 @@ class Packet(BasePacket):
             if isinstance(f, ConditionalField) and not f._evalcond(self):
                 continue
             if f.name in self.fields:
-                val = f.i2repr(self, self.fields[f.name])
+                _fval = self.fields[f.name]
+                _def = self.default_fields[f.name]
+                if _fval.__class__ == _def.__class__ and _fval == _def:
+                    continue
+                val = f.i2repr(self, _fval)

             elif f.name in self.overloaded_fields:
-                val =  f.i2repr(self, self.overloaded_fields[f.name])
+                _over = self.overloaded_fields[f.name]
+                _def = self.default_fields[f.name]
+                if _over.__class__ == _def.__class__ and _over == _def:
+                    continue
+                val =  f.i2repr(self, _over)
             else:
                 continue
             if isinstance(f, Emph) or f in conf.emph:

to keep the same repr behavior?
(This checks whether the value is the same as the default one.)

The question this raises is the loss of performance... I guess that, as __repr__ is only used to display results, this could be an acceptable way of doing things.

@X-Cli
Contributor

X-Cli commented Jun 1, 2017

That sounds good to me. I would probably recommend the use of type(_fval) instead of _fval.__class__, though.
@guedou, @p-l-, any thoughts?
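
For illustration, the default-hiding check from the diff above, written with type() as suggested, could look like the following stand-alone sketch. `visible_fields` is a hypothetical helper operating on plain dicts, not a Scapy function.

```python
def visible_fields(fields, defaults):
    """Return only the entries whose value differs from the default."""
    shown = {}
    for name, value in fields.items():
        default = defaults.get(name)
        # Hide a value only when both its type and its content match the default.
        if type(value) is type(default) and value == default:
            continue
        shown[name] = value
    return shown


print(visible_fields({"options": [], "ttl": 42}, {"options": [], "ttl": 64}))
```

The type comparison matters for values that compare equal across types: with a default of `True`, a user value of `1` would still be displayed.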

@guedou
Member Author

guedou commented Jun 2, 2017

Here is the updated version. I don't know what to do with hide_defaults() =/

@gpotter2
Member

gpotter2 commented Jun 2, 2017

We can simply remove it as defaults are hidden automatically.

@guedou guedou closed this Sep 11, 2017
@guedou guedou deleted the cache_init_fields branch September 11, 2017 14:04
@guedou guedou restored the cache_init_fields branch September 12, 2017 06:54
@guedou guedou reopened this Sep 12, 2017
@gpotter2
Member

So. Where are we here ?

@guedou
Member Author

guedou commented May 2, 2018

I need to find time to port it to Python3.

@guedou
Member Author

guedou commented May 31, 2018

@gpotter2 Here is a version that works with Python3. Unfortunately, it does not work with MultiFlagField introduced in #1431

@p-l- can you find a way to fix it? Given the performance benefit, this is definitely a PR that we need to merge.

@@ -463,6 +464,9 @@ x = IP(dst="8.8.8.8")/fuzz(UDP()/NTP(version=4))
x.show2()
x = IP(raw(x))
assert NTP in x
=======
str(fuzz(IP()/ICMP())) == '5\xe1\x00\x1c\x9dC@\x007\x01\xb7\xba\x7f\x00\x00\x01\x7f\x00\x00\x01*\xdb\xf7,9\x8e\xa4i'
>>>>>>> Cache fields initialization
Member

conflicts

@gpotter2
Member

gpotter2 commented Jun 21, 2018

I have done several changes to MultipleTypeField in #1485, will check if it fixes anything

@guedou
Member Author

guedou commented Oct 15, 2018

@p-l- @gpotter2 ready to be reviewed and discussed!

p.hide_defaults()
assert(repr(p) in ["<IP frag=0 proto=icmp |<ICMP |>>", "<IP frag=0 proto=1 |<ICMP |>>"])
assert(repr(p) == "<IP ttl=42 proto=icmp |<ICMP |>>")
Member

So what does hide_defaults() do now ?

Member Author

It is broken. You are correct.

@guedou guedou force-pushed the cache_init_fields branch 6 times, most recently from 584598d to 2c7d6cb Compare October 16, 2018 08:14
@guedou
Member Author

guedou commented Oct 23, 2018

I made changes that will make Codacy happy.

Member

@gpotter2 gpotter2 left a comment

Amazing PR ! All good to me

@p-l-
Member

p-l- commented Oct 30, 2018

Just restarted a failed test, thanks @guedou!

Successfully merging this pull request may close these issues.

Duplication of dissect in show2() in case Packet was made from buffer.
5 participants