makehashes: Regular expression not quoted #94

Closed
poeml opened this Issue Jun 5, 2015 · 0 comments


Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue94

Title:       makehashes: Regular expression not quoted
Priority:    bug
Status:      resolved
Superseder:  (none)
Nosy List:   poeml, sascha_silbe, toma
Assigned To: poeml
Keywords:    (none)

msg345 (view) Author: sascha_silbe Date: 2012-01-09.22:36:12

When running "mb -d makehashes /srv/upload -t /srv/mirrorbrain/hashes/srv/upload" on the server hosting http://download.sugarlabs.org/, MirrorBrain breaks with the following error:

[...]
2/QueryAll: SELECT filearr.path, hash.file_id
FROM filearr
LEFT JOIN hash
ON hash.file_id = filearr.id
WHERE filearr.path ~ '^services/gcc-c++/[^/]*$'
2/QueryR : SELECT filearr.path, hash.file_id
FROM filearr
LEFT JOIN hash
ON hash.file_id = filearr.id
WHERE filearr.path ~ '^services/gcc-c++/[^/]*$'
2/COMMIT : auto
Traceback (most recent call last):
  File "/usr/bin/mb", line 1638, in <module>
    r = mirrordoctor.main()
  File "/usr/lib/pymodules/python2.6/cmdln.py", line 257, in main
    return self.cmd(args)
  File "/usr/lib/pymodules/python2.6/cmdln.py", line 280, in cmd
    retval = self.onecmd(argv)
  File "/usr/lib/pymodules/python2.6/cmdln.py", line 412, in onecmd
    return self._dispatch_cmd(handler, argv)
  File "/usr/lib/pymodules/python2.6/cmdln.py", line 1100, in _dispatch_cmd
    return handler(argv[0], opts, *args)
  File "/usr/bin/mb", line 1024, in do_makehashes
    for i, j in mb.files.dir_filelist(self.conn, dst_dir_db)]
  File "/usr/lib/pymodules/python2.6/mb/files.py", line 160, in dir_filelist
    result = conn.Server._connection.queryAll(query)
  File "/usr/lib/python2.6/dist-packages/sqlobject/dbconnection.py", line 356, in queryAll
    return self._runWithConnection(self._queryAll, s)
  File "/usr/lib/python2.6/dist-packages/sqlobject/dbconnection.py", line 256, in _runWithConnection
    val = meth(conn, *args)
  File "/usr/lib/python2.6/dist-packages/sqlobject/dbconnection.py", line 349, in _queryAll
    self._executeRetry(conn, c, s)
  File "/usr/lib/python2.6/dist-packages/sqlobject/dbconnection.py", line 335, in _executeRetry
    return cursor.execute(query)
psycopg2.DataError: invalid regular expression: quantifier operand invalid

services/gcc-c++ is the name of a directory below /srv/upload:

silbe@sunjammer:~$ ls -d /srv/upload/services/gcc-c++
/srv/upload/services/gcc-c++

MirrorBrain should escape special (regular expression) characters in paths before using them as part of a regular expression.

Additional info:
The host is running MirrorBrain 2.15.0-1 on Ubuntu 10.04:

silbe@sunjammer:~$ lsb_release -ir
Distributor ID: Ubuntu
Release: 10.04
silbe@sunjammer:~$ dpkg -l mirrorbrain|grep ^ii
ii mirrorbrain 2.15.0-1 MirrorBrain is a scalable download redirector and Metalink generator.

msg358 (view) Author: toma Date: 2012-03-25.10:50:45

Bug confirmed. KDE runs into this one as well.

msg364 (view) Author: poeml Date: 2012-03-26.22:31:22

It seems to me that the only way to deal with this is to manually escape regexp
special characters in the path names.

select 'services/gcc-c++/a' ~ '***:^services/gcc-c\+\+/[^/]*$' as result;

result

t

There is no PostgreSQL function to do this, and there doesn't seem to be a way to
embed a literal string inside a regular expression.
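A minimal sketch of the manual escaping discussed above. The helper `escape_regexp_metachars` and the metacharacter set are hypothetical, chosen for illustration only, and are not part of MirrorBrain. Python's `re` module is used purely as a stand-in to check the result; PostgreSQL's regexp dialect differs in details, and backslashes inside an SQL string literal may need further quoting depending on the server configuration.

```python
import re

# Characters treated as special by most regex dialects. This set is an
# assumption for illustration, not taken from MirrorBrain.
REGEXP_METACHARS = set(r'\.^$*+?()[]{}|')


def escape_regexp_metachars(path):
    """Prefix every regex metacharacter in 'path' with a backslash."""
    return ''.join('\\' + c if c in REGEXP_METACHARS else c for c in path)


escaped = escape_regexp_metachars('services/gcc-c++')
pattern = '^' + escaped + '/[^/]*$'

# Python's re module stands in for the database here.
print(escaped)
print(bool(re.match(pattern, 'services/gcc-c++/a')))       # direct child
print(bool(re.match(pattern, 'services/gcc-c++/sub/a')))   # nested deeper
```

The pattern is meant to match only files directly inside the directory, which is why the trailing `[^/]*$` stays unescaped while the path prefix is escaped.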

msg377 (view) Author: poeml Date: 2012-04-11.21:22:20

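So the task is to pass a regexp from Python to PostgreSQL, through SQLObject and psycopg2, that
contains some characters that need to be treated as literals.

So, let's pass all literal characters as literal characters! I.e., write every character of the path in octal escape syntax (a backslash followed by three octal digits), so that none of them can be read as a regexp special character.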

Fixed in r8271.

Index: ../mb/mb/files.py

--- ../mb/mb/files.py (revision 8270)
+++ ../mb/mb/files.py (revision 8271)
@@ -1,5 +1,6 @@
 from sqlobject.sqlbuilder import AND
 
+from mb import util
 
 def has_file(conn, path, mirror_id):
     """check if file 'path' exists on mirror 'mirror_id'
@@ -156,7 +157,7 @@
                FROM filearr
                LEFT JOIN hash
                ON hash.file_id = filearr.id
-               WHERE filearr.path ~ '^%s/[^/]*$'""" % path
+               WHERE filearr.path ~ '^""" + util.pgsql_regexp_esc(path) +"""/[^/]*$'"""
 
     result = conn.Server._connection.queryAll(query)
     return result

Index: ../mb/mb/util.py

--- ../mb/mb/util.py (revision 8270)
+++ ../mb/mb/util.py (revision 8271)
@@ -210,3 +210,9 @@
         netloc = netloc.split('@')[1]
     return urlparse.urlunsplit((u[0], netloc, u[2], u[3], u[4]))
 
+def pgsql_regexp_esc(s):
+    if s:
+        return '\\' + '\\'.join(['%03o' % ord(c) for c in s])
+    else:
+        return s
+

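As a worked illustration of the octal escaping in the diff above, here is a standalone sketch of `pgsql_regexp_esc` run on the directory name from the report. Python's `re` module is used only as a stand-in for the database (it also accepts three-digit octal escapes); how PostgreSQL handles the backslashes inside the SQL string literal is not exercised here. The sample paths are ASCII only, and behaviour for other characters is not covered by this sketch.

```python
import re


def pgsql_regexp_esc(s):
    """Encode every character of 's' as a backslash plus three octal digits."""
    if s:
        return '\\' + '\\'.join('%03o' % ord(c) for c in s)
    return s


escaped = pgsql_regexp_esc('gcc-c++')
print(escaped)  # \147\143\143\055\143\053\053

# Same pattern shape as in dir_filelist: escaped directory, then one path level.
pattern = '^' + pgsql_regexp_esc('services/gcc-c++') + '/[^/]*$'
print(bool(re.match(pattern, 'services/gcc-c++/a')))
```

Because every character is encoded, the caller does not need to know which characters are special in the regexp dialect, at the cost of a longer and less readable query string.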
History
         Date             User     Action             Args
2012-04-11 21:22:20 poeml        set    status: chatting -> resolved
                                          messages: + msg377
2012-03-26 22:31:22 poeml        set    messages: + msg364
2012-03-26 21:47:00 poeml        set    assignedto: poeml
                                          nosy: + poeml
                                          status: unread -> chatting
2012-03-25 10:50:45 toma         set    nosy: + toma
                                          messages: + msg358
2012-01-09 22:36:14 sascha_silbe create

(end of migrated issue)
@poeml poeml added bug resolved labels Jun 5, 2015
@poeml poeml closed this Jun 5, 2015