Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for pyre2 #11

Merged
merged 1 commit into from
Nov 6, 2018
Merged

Conversation

fim
Copy link
Contributor

@fim fim commented Jul 30, 2018

As part of a quick test I did to measure the performance of pyre2, I noticed
hat we spent a lot of time allocating objects and that we could improve
the performance simply by altering some of the C code to limit unnecessary
object allocations. The main culprit was Py_BuildValue but it seems like the
changes had a positive effect overall in terms of performance.

As part of that I did:

  • Deleted code that populated attributes of the attr_dict struct
  • Deleted code that initialized the struct itself

This is more of a request for comments about these changes, just to get some initial feedback based on the changes because I might be missing something obvious but as far as I can see nothing has broken and we have performance improvements ranging from 5% to 50% depending on the use case.

  • Before change:
In [1]: import re2

In [2]: import timeit

In [3]: r = re2.compile('\w+')

In [4]: %timeit r.match('foo bar')
1000000 loops, best of 3: 408 ns per loop

In [5]: r = re2.compile('(?P<test>\w*)')

In [6]: %timeit r.match('foo bar')
1000000 loops, best of 3: 419 ns per loop

In [7]: email_regex = '(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'

In [8]: r = re2.compile(email_regex)

In [9]: %timeit r.match('foo@bar.com')
1000000 loops, best of 3: 1.04 µs per loop
  • After change:
In [1]: import re2

In [2]: import timeit

In [3]: r = re2.compile('\w+')

In [4]: %timeit r.match('foo bar')
1000000 loops, best of 3: 178 ns per loop

In [5]: r = re2.compile('(?P<test>\w*)')

In [6]: %timeit r.match('foo bar')
1000000 loops, best of 3: 184 ns per loop

In [7]: email_regex = '(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'

In [8]: r = re2.compile(email_regex)

In [9]: %timeit r.match('foo@bar.com')
1000000 loops, best of 3: 774 ns per loop

PS: Also here's pprof execution graphs for timeit() test before and after the changes:
upstream-pprof

dev-pprof

@dreiss
Copy link
Member

dreiss commented Jul 30, 2018

Doesn't this break compatibility with programs that access m.pos, etc.?

@fim
Copy link
Contributor Author

fim commented Jul 30, 2018

With the current state of the PR, these are not exposed to the python module mostly because I wasn't sure what's the desired API/ABI that the re2 module should expose.

However, that info is already saved on the MatchObject2 and can be extracted really easy and efficiently with getter methods. Possible patch below:

diff --git a/_re2.cc b/_re2.cc
index 8e3d17d..673a3ba 100644
--- a/_re2.cc
+++ b/_re2.cc
@@ -39,6 +39,8 @@ using std::nothrow;
 using re2::RE2;
 using re2::StringPiece;
 
+#include <structmember.h>
+
 
 typedef struct _RegexpObject2 {
   PyObject_HEAD
@@ -59,6 +61,41 @@ typedef struct _MatchObject2 {
   StringPiece* groups;
 } MatchObject2;
 
+// getters for MatchObject2
+static PyObject* match_pos_get(MatchObject2* self)
+{
+  return PyLong_FromSsize_t(self->pos);
+}
+
+static PyObject* match_endpos_get(MatchObject2* self)
+{
+  return PyLong_FromSsize_t(self->endpos);
+}
+
+static PyObject* match_re_get(MatchObject2* self)
+{
+  if (self->re != NULL)
+    Py_INCREF(self->string);
+    return self->re;
+  Py_RETURN_NONE;
+}
+
+static PyObject* match_string_get(MatchObject2* self)
+{
+  if (self->string != NULL)
+    Py_INCREF(self->string);
+    return self->string;
+  Py_RETURN_NONE;
+}
+
+static PyGetSetDef match_getset[] = {
+    {"pos", (getter)match_pos_get, (setter)NULL},
+    {"endpos", (getter)match_endpos_get, (setter)NULL},
+    {"re",      (getter)match_re_get,      (setter)NULL},
+    {"string",      (getter)match_string_get,      (setter)NULL},
+    {NULL}
+};
+
 typedef struct _RegexpSetObject2 {
   PyObject_HEAD
   // True iff re2_set_obj has been compiled.
@@ -245,7 +282,7 @@ static PyTypeObject Match_Type2 = {
   0,                           /*tp_iternext*/
   match_methods,               /*tp_methods*/
   0,                           /*tp_members*/
-  0,                           /*tp_getset*/
+  match_getset,                /*tp_getset*/
 };
 
 static PyTypeObject RegexpSet_Type2 = {
@@ -475,13 +512,20 @@ create_match(PyObject* re, PyObject* string,
     StringPiece* groups)
 {
   MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
+
   if (match == NULL) {
     delete[] groups;
     return NULL;
   }
+  Py_INCREF(groups);
   match->groups = groups;
+  Py_INCREF(re);
   match->re = re;
+  Py_INCREF(string);
   match->string = string;
+  match->pos = pos;
+  match->endpos = endpos;
+
   return (PyObject*)match;
 }
 

That gives us:

In [1]: import re2

In [2]: r = re2.compile('\w+')

In [3]: m = r.match('foo bar')

In [4]: m.re
Out[4]: <_re2.RE2_Regexp at 0x7f096eb58618>

In [5]: m.string
Out[5]: 'foo bar'

In [6]: m.pos, m.endpos
Out[6]: (0L, 7L)

@dreiss
Copy link
Member

dreiss commented Jul 30, 2018

Cool. Can you update the PR with the full change? Unless @sunshowers has objections, I'm not going to worry about the performance of lazy access since these seem like they'd be rarely accessed.

@fim
Copy link
Contributor Author

fim commented Jul 31, 2018

Sounds good. In that case, I'll merge the proposed patch to the PR, add some unit tests and do some more thorough testing to make sure I haven't messed up with any object referencing and then push the update.

@fim
Copy link
Contributor Author

fim commented Aug 6, 2018

Any thoughts on this @sunshowers, @dreiss?


typedef struct _RegexpObject2 {
PyObject_HEAD
// __dict__. Simpler than implementing getattr and possibly faster.
PyObject* attr_dict;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the change necessary or beneficial? Regexp objects should be constructed very rarely in performance-sensitive programs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I got that. The dict stores groups/groupindex and pattern fields. I didn't change the way re2 objects are constructed/destroyed.

However, based on your feedback I'll re-implement these fields for RegexpObject2 (probably using getter/setter to avoid the performance penalties)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. In general execution, you create the regex once and then just match against it! That's true but I think we can get some performance improvements out of this so, why not :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default choice should be to leave working code as-is. Let's try to keep this PR scoped to changes that have measurable benefits. If you think changing RegexpObject2 is useful, let's handle that separately.

_re2.cc Outdated
PyObject* re;
PyObject* string;
Py_ssize_t pos, endpos; /* current target slice */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put these on separate lines.

_re2.cc Outdated

static PyObject* match_re_get(MatchObject2* self)
{
if (self->re != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be null?

_re2.cc Outdated

static PyObject* match_string_get(MatchObject2* self)
{
if (self->string != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be null?

Py_RETURN_NONE;
}

static PyGetSetDef match_getset[] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting looks weird.

_re2.cc Outdated
@@ -65,6 +61,43 @@ typedef struct _MatchObject2 {
StringPiece* groups;
} MatchObject2;

// getters for MatchObject2
static PyObject* match_pos_get(MatchObject2* self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the proper getter signature here and just cast to MatchObject2 internally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question but when you say "proper getter signature", what do you mean?

I had a look at other python extensions and the python documentation before implementing the getters and they both suggest using this kind of signatures. Eg: https://docs.python.org/2.7/extending/newtypes.html

_re2.cc Outdated
@@ -216,14 +249,6 @@ static PyTypeObject Regexp_Type2 = {
regexp_methods, /*tp_methods*/
0, /*tp_members*/
0, /*tp_getset*/
0, /*tp_base*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default all omitted entries are 0 and since we're no longer setting the attr_dict I thought i'd trim down the list to be consistent with the other typedef in the source. I can zero the fields out and leave it as is though if that's preferred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd leave it. Explicit is better than implicit.

_re2.cc Outdated
@@ -259,15 +284,7 @@ static PyTypeObject Match_Type2 = {
0, /*tp_iternext*/
match_methods, /*tp_methods*/
0, /*tp_members*/
0, /*tp_getset*/
0, /*tp_base*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, we no longer have the attr_dict so I trimmed the PyType object after the last member we manually set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave the zeroes? I think explicit is better than implicit in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

// to groupindex.
regexp->attr_dict = Py_BuildValue("{sisNsO}",
"groups", regexp->re2_obj->NumberOfCapturingGroups(),
"groupindex", groupindex,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that groupindex no longer works? In case it wasn't clear before, I don't think this change should be breaking any functionality that currently works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address that.

I will do some checks to see if the attr_dict is performant enough otherwise I'll just extend the RegexObject2 definition similiar to MatrchObject2

@fim
Copy link
Contributor Author

fim commented Aug 9, 2018

Updated the PR and unit tests to cover the re attributes as well.

@fim
Copy link
Contributor Author

fim commented Oct 1, 2018

@dreiss have you had any time to review my latest changes?

@ivan
Copy link

ivan commented Oct 9, 2018

I have nothing to do with this project, but I tested your PR:

pyre2 1.0.6:

# ~/re2-venv/bin/python
Python 3.7.0 (default, Sep 28 2018, 07:40:17)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re2
>>> re2.compile(")")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/grab/re2-venv/lib/python3.7/site-packages/re2.py", line 56, in compile
    return _compile(pattern, error)
re.error: 6

with this PR applied:

# ~/re2-venv/bin/python
Python 3.7.0 (default, Sep 28 2018, 07:40:17)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re2
>>> re2.compile(")")
zsh: segmentation fault (core dumped)  ~/re2-venv/bin/python

I tested with a pyenv-built Python 3.7.0 on an amd64 Debian 9.5 machine.

# dpkg -l | grep re2
ii  libre2-3:amd64                  20170101+dfsg-1                   amd64        efficient, principled regular expression library
ii  libre2-dev                      20170101+dfsg-1                   amd64        efficient, principled regular expression library, development kit

I see the same with a pyenv-built Python 3.7.0 on an amd64 Ubuntu 16.04.5 machine.

# dpkg -l | grep re2
ii  libre2-1v5:amd64                 20160201+dfsg-1                            amd64        efficient, principled regular expression library
ii  libre2-dev                       20160201+dfsg-1                            amd64        efficient, principled regular expression library, development kit

ivan added a commit to ArchiveTeam/grab-site that referenced this pull request Oct 9, 2018
Copy link
Member

@dreiss dreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to repro that crash?

_re2.cc Outdated
typedef struct _RegexpSetObject2 {
PyObject_HEAD
// True iff re2_set_obj has been compiled.
bool compiled;
RE2::Set* re2_set_obj;
} RegexpSetObject2;


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the extra lines in order to be consistent with spacing but I'll revert all style changes (including the tab -> space change)

_re2.cc Outdated
@@ -96,7 +119,6 @@ static PyObject* regexp_set_add(RegexpSetObject2* self, PyObject* pattern);
static PyObject* regexp_set_compile(RegexpSetObject2* self);
static PyObject* regexp_set_match(RegexpSetObject2* self, PyObject* text);


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

_re2.cc Outdated
@@ -170,9 +192,9 @@ static int
_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
(void)name;
(void)v;
PyErr_Format(PyExc_AttributeError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as bove

_re2.cc Outdated
@@ -259,15 +284,7 @@ static PyTypeObject Match_Type2 = {
0, /*tp_iternext*/
match_methods, /*tp_methods*/
0, /*tp_members*/
0, /*tp_getset*/
0, /*tp_base*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave the zeroes? I think explicit is better than implicit in this case.


static PyObject* regexp_pattern_get(RegexpObject2* self)
{
return self->pattern;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this incref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, you are correct this should increase the reference count. Not sure why/how I missed it but I'll fix it!

@fim
Copy link
Contributor Author

fim commented Oct 12, 2018

I tried various python versions (3.5, 3.6 and 3.7) and the segfault only happens in 3.7. I'll take a look into this and look at what might have changed in python3.7 to cause this behavior.

@fim
Copy link
Contributor Author

fim commented Oct 12, 2018

I think PyObject_New() up to python3.6 initialized all the fields defined in the TYPE object that we pass to it which resulted in all PyObject * fields defined in the RegexpObject2 struct to be null by default. For some reason it looks like the value for groupindex in python3.7 is initialized to be 0x1 which in turn segfaults when we call Py_XDECREF() for it since we expect either NULL or a valid pointer.

We can easily work around that by nulling the field after PyObject_New() but I want to dig a bit deeper first in order to understand why that is happening and whether it might indicate there is a more systematic problem with either our approach or something in Python's object initialization process.

On the bright side, while testing on python-3.7, I noticed the following:

Python 3.7.0 (default, Oct 12 2018, 06:58:16) 
[...]
In [1]: import re                                                                                                                                                                                                   
In [2]: import re2                                                                                                                                                                                                  
In [3]: r2 = re2.compile('\w+')                                                                                                                                                                                     
In [4]: r = re.compile('\w+')                                                                                                                                                                                       

In [5]: %timeit r.match('foo bar')                                                                                                                                                                                  
319 ns ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit r2.match('foo bar')                                                                                                                                                                                 
199 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

@fim
Copy link
Contributor Author

fim commented Oct 25, 2018

I haven't had the chance to look into the new way memory allocation/reference handling works in Python 3.7 yet (life and work got in the way) so I've updated the PR with the fix and the suggested changes (removed style fixes, readded the empty struct members for PyTypeObjects etc).

I'll try to look into the python object initialization at another time but I don't think that should hold up this set of changes.

@ivan can you check to see if you get any issues with this latest version?

@fim fim force-pushed the re2_speedup branch 2 times, most recently from 3555007 to 1f42a38 Compare November 5, 2018 08:15
@fim
Copy link
Contributor Author

fim commented Nov 5, 2018

I also checked with python core, their suggestion was: if we need the field to be NULL, mark it as NULL explicitly which is what I did. Otherwise, we could use tp_alloc which initializes memory as well but that'll be slower.

@dreiss do you mind taking another look ?

_re2.cc Outdated
// consistent across Py3 and Py2.
PyObject* index = PyLong_FromLong(it->second);
if (index == NULL) {
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to immediately return NULL (after decref) here instead of storing an incomplete groupindex in self.

_re2.cc Outdated
break;
}
}
Py_INCREF(groupindex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Shouldn't we transfer the reference that we got from PyDict_New?

_re2.cc Show resolved Hide resolved
As part of a quick test I did to measure the perfomance of pyre2, I noticed
thathat we spent a lot of time allocating objects and that we could improve
thethe performance simply by altering some of the C code to limit unnecessary
object allocations. The main culprit was Py_BuildValue but it seems like the
changes had a positive effect overall in terms of performance.

As part of that I did:

* Deleted code that populated attributes of the __dict__ struct
* Deleted code that initialized the struct itself
@fim
Copy link
Contributor Author

fim commented Nov 6, 2018

Addressed latest comments.

@dreiss dreiss merged commit 0d4980c into facebook:master Nov 6, 2018
noxiouz added a commit to noxiouz/pyre2 that referenced this pull request Jan 22, 2019
Changelog:
* Performance improvements facebook#11 
* Bugfixes facebook#12
@noxiouz noxiouz mentioned this pull request Jan 22, 2019
dreiss pushed a commit that referenced this pull request Feb 5, 2019
Changelog:
* Performance improvements #11 
* Bugfixes #12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants