Replace CkMinions with a sensible parser #10953

Closed
dw opened this Issue Mar 4, 2014 · 10 comments


dw commented Mar 4, 2014

Hi,

Today, while digging into CkMinions to try and trace the source of a problem, I was horrified to discover the _check_compound_minions method. I can't really begin to comment on all the things wrong with this code; it's enough to say that I stood no chance whatsoever of tracing my problem in a reasonable timeframe.

I'd like to suggest replacing it with a new implementation, either hand-written, or using an off-the-shelf parser. This problem is sufficiently frustrating to my day job that I spent time at home tonight prototyping a replacement.

The code sample below relies on the third-party Parsley library, is 6 times smaller than the existing mess, and removes various bugs, such as the inability to specify "not" at the start of an expression. Notice that this compound parser contains sub-parsers for the remaining Salt matcher syntaxes.

Although just a prototype, it is functionally complete, and the code size is representative of the weight of a complete implementation. Mostly what's missing is an implementation of the (trivial) MinionInfo class.

import fnmatch
import re
import parsley


_GRAMMAR_SOURCE = """
    wschar = ' ' | '\\t' | '\\n' | '\\r' | '\\v'
    bws = wschar+

    colon_word = <(~':' anything)+>
    comma_word = <~((',' | wschar) anything)+>
    ws_word = <(~' ' anything)+>

    ip_octet = digit{1,3}:oct ?(0 <= int(oct, 10) <= 255)
    subnet_mask = digit{1,2}:subn ?(0 <= int(subn, 10) <= 32)
    ip_addr = <ip_octet ('.' ip_octet){3}>
    subnet = <ip_addr '/' subnet_mask>

    glob_list = comma_word:one (',' comma_word)+:rest -> [one]+rest

    grain_rule = 'G' '@' colon_word:name ':' ws_word:pat
        -> lambda m,name=name,pat=pat: match(pat, m.get_grain(name) or '')

    pcre_id_rule = 'E' '@' ws_word:pat
        -> lambda m,pat=pat: re.match(pat, m.name)

    grain_pcre_rule = 'P' '@' colon_word:name ':' ws_word:pat
        -> lambda m,pat=pat,name=name: re.match(pat, m.get_grain(name) or '')

    minion_list_rule = 'L' '@' glob_list:pats
        -> lambda m,pats=pats: any(match(pat, m.name) for pat in pats)

    pillar_rule = 'I' '@' colon_word:name ':' ws_word:pat
        -> lambda m,name=name,pat=pat: match(pat, m.get_pillar(name) or '')

    subnet_rule = 'S' '@' subnet:subnet
        -> lambda m,subnet=subnet: in_subnet(subnet, m.get_grain('ipv4'))

    ip_rule = 'S' '@' ip_addr:ip
        -> lambda m,ip=ip: m.get_grain('ipv4') == ip

    glob_rule = ws_word:pat
        -> lambda m,pat=pat: match(pat, m.name)

    compound_rule = grain_rule
        | pcre_id_rule
        | grain_pcre_rule
        | minion_list_rule
        | pillar_rule
        | subnet_rule
        | ip_rule
        | glob_rule

    paren_expr = '(' ws expr:E ws ')'
        -> E

    not_expr = 'n' 'o' 't' bws atom_expr:E
        -> lambda m,E=E: not E(m)

    atom_expr = paren_expr
        | not_expr
        | compound_rule

    and_expr = atom_expr:L (bws 'a' 'n' 'd' bws atom_expr)*:Rs
        -> lambda m,L=L,Rs=Rs: L(m) and all(R(m) for R in Rs)

    expr = and_expr:L (bws 'o' 'r' bws and_expr)*:Rs
        -> lambda m,L=L,Rs=Rs: L(m) or any(R(m) for R in Rs)
"""


class MinionInfo(object):
    def __init__(self, opts, name):
        self.name = name

    def get_grain(self, name, default=None):
        return '123'

    def get_pillar(self, name, default=None):
        return '123'



GRAMMAR = parsley.makeGrammar(_GRAMMAR_SOURCE, {
    're': re,
    'in_subnet': lambda *_: False,  # stub: a real version would test CIDR membership
    'match': fnmatch.fnmatch,
})

_matcher_cache = {}

def get_matcher(expr):
    if expr not in _matcher_cache:
        _matcher_cache[expr] = GRAMMAR(expr).expr()
    return _matcher_cache[expr]


matcher = get_matcher('G@pants:321')
minion = MinionInfo({}, 'saltmaster')
print(matcher(minion))

Collaborator

basepi commented Mar 5, 2014

You're right, the existing compound matcher is a mess. I wrote the master-side version, and was horrified as I wrote it, but it needed to exactly match the behavior of the minion-side version, and I didn't have time to rewrite both at the time.

I love how clean your new implementation is, but the biggest problem is that it adds a hard dependency to Salt, which we like to avoid whenever possible.

I will also have to spend some time to learn the Parsley grammar syntax -- not a fan of merging in code that I don't know how to maintain.

Finally, this would break the existing implementation because it doesn't handle minion-side matching. Before I wrote this master-side matcher, we would broadcast compound commands to all minions and the minions would decide whether or not to run the command. In the current implementation, the master only sends the command to minions which match the target, but the minions still do their own calculation as well. We would need to evaluate whether we could rip out this redundant check safely, or if we'd need to implement separate matching rules for minion-side use of this grammar.

PS: I'd also be interested to hear the problem you were having that prompted this investigation. Having written that code, I'm fairly confident I could help you solve it.

@basepi basepi added the Feature label Mar 5, 2014

@basepi basepi added this to the Pending Discussion milestone Mar 5, 2014


dw commented Mar 6, 2014

Hello again!

Tonight I looked at options for generating a standalone parser, and it seems there is exactly one option for Python: http://www.seehuhn.de/pages/wisent. As Wisent also wants a separate lexing stage, it might be simpler just to carefully hand-code a new recursive descent parser, which would probably come out about the same size as the old code (but at least we could fix the 'not' issues and suchlike).

The problem is sufficiently small that if you give me a few days I might be able to finish this over the course of a few morning commutes ;)

Regarding the original bug, we have a commercial support ticket open for it already, it relates to mine.get() returning totally junk data (matching minions that the expression did not specify) following restart of saltmaster, specifically in the case of replacing the host EC2 machine but not the EBS volume that contains cachedir and pki_dir. In the process of trying to understand how Salt was performing matches, I found the parser :)
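For illustration, the hand-coded recursive descent approach mentioned above might look like the sketch below. It covers only the boolean skeleton (not/and/or, parentheses, and plain glob leaves matched against a minion id) with a whitespace-friendly tokenizer; these are simplifying assumptions, not Salt's full compound target syntax.

```python
import fnmatch
import re

# Tokens are parens or runs of non-space, non-paren characters.
_TOKEN = re.compile(r'\(|\)|[^\s()]+')

def parse(expr, minion_id):
    """Evaluate a boolean glob expression against one minion id."""
    tokens = _TOKEN.findall(expr)
    pos = [0]

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def next_tok():
        tok = peek()
        pos[0] += 1
        return tok

    # expr := and_term ('or' and_term)*
    def parse_or():
        result = parse_and()
        while peek() == 'or':
            next_tok()
            rhs = parse_and()
            result = result or rhs
        return result

    # and_term := unary ('and' unary)*
    def parse_and():
        result = parse_unary()
        while peek() == 'and':
            next_tok()
            rhs = parse_unary()
            result = result and rhs
        return result

    # unary := 'not' unary | '(' expr ')' | glob
    def parse_unary():
        tok = next_tok()
        if tok == 'not':
            return not parse_unary()
        if tok == '(':
            result = parse_or()
            if next_tok() != ')':
                raise ValueError('expected )')
            return result
        return fnmatch.fnmatch(minion_id, tok)

    result = parse_or()
    if peek() is not None:
        raise ValueError('trailing tokens: %r' % tokens[pos[0]:])
    return result
```

One function per precedence level is the whole trick, which is why "a few morning commutes" is a plausible estimate; extending the leaf rule to the G@/E@/L@/etc. matchers is mechanical from there.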


dw commented Mar 6, 2014

Hi there,

Just another follow-up. I chatted with the Parsley author, and it seems not only is he receptive to a patch that modifies it to generate a self-contained parser, but the work involved would be quite straightforward. I might hack on this for fun anyway, but I was just wondering how receptive you are to using something like this in Salt.

Internally, Parsley already generates Python source code to implement the grammar; it just depends on a "pymeta.runtime" module. It would be a simple task to inline pymeta.runtime into the generated source and have a command that outputs e.g. a totally self-contained "salt/utils/compound_parser.py" or similar.

FWIW as far as parsers go, the example Parsley syntax provided above is about as simple as they come. The PEG parsing algorithm is very straightforward and intuitive, so I think perhaps upstreaming might be practical.

My only remaining concern regarding Parsley is performance, but compound expressions are small (usually <100 bytes) and relatively few in number, which means caching the parse result (i.e. the matcher function) is effective, at least in the installs I have worked on.

Just thinking aloud. Let me know if you're interested.


Contributor

thedrow commented Mar 6, 2014

@dw If you're worried about performance you might want to use Cython for the parser after it's generated.


Collaborator

basepi commented Mar 7, 2014

@dw So it would generate code that would be a standalone parser? My only worry at that point would be license compatibility -- we're Apache licensed, and that's not always compatible. (Haven't had time to check into Parsley licensing)

Also, I looked up your commercial support ticket -- it turned out that it wasn't related to the compound matcher, correct?


Member

SEJeff commented Mar 10, 2014

Actually, could you use pyparsing, which has been around for EONS and is the de-facto tool for doing what you're trying to do using python?
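For comparison, a rough sketch of the pyparsing route (boolean skeleton only, with plain glob leaves against a minion id, not Salt's real matcher set): `infixNotation` handles precedence and parenthesisation directly, so each matcher syntax would only need its own leaf rule.

```python
import fnmatch
from pyparsing import Keyword, Word, infixNotation, opAssoc, printables

def make_matcher(minion_id):
    """Build a compound-expression evaluator bound to one minion id."""
    NOT, AND, OR = map(Keyword, ['not', 'and', 'or'])

    # Leaf: any bare word (that isn't an operator) is a glob on the minion id.
    leaf = ~(NOT | AND | OR) + Word(printables, excludeChars='()')
    leaf.setParseAction(lambda t: fnmatch.fnmatch(minion_id, t[0]))

    # Operators listed tightest-binding first; parens come for free.
    expr = infixNotation(leaf, [
        (NOT, 1, opAssoc.RIGHT, lambda t: not t[0][1]),
        (AND, 2, opAssoc.LEFT, lambda t: all(t[0][0::2])),
        (OR, 2, opAssoc.LEFT, lambda t: any(t[0][0::2])),
    ])
    return lambda expression: bool(expr.parseString(expression, parseAll=True)[0])
```

This evaluates during the parse rather than building a reusable matcher function, so it would want the same expression-level cache as the Parsley prototype above.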

@stale stale bot added the stale label Jun 22, 2017


stale bot commented Jun 22, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot closed this Jun 30, 2017


Contributor

The-Loeki commented Jun 30, 2017

I find it a bit disappointing this one never got any traction; I think this was a wonderful idea fixing half a ton of hairy bugs in the compound matcher and quite likely obsoleting all others...

As it stands I'm not sure how anyone would feel; the mentioned Wisent library seems dead, Parsley ain't looking too good either. Pyparsing seems alive & kicking and a very quick round-o-Google found http://compilers.pydata.org/ as a useful starting point.


dw commented Jun 30, 2017

@The-Loeki I no longer have any interest in fixing this (I've long since moved on from Salt), but perhaps the upstream Salt team might. Hand-writing a recursive descent parser is probably the simplest way to alleviate the worries about extra dependencies; a single working day should do it.


Contributor

The-Loeki commented Jun 30, 2017

@dw Ah that's not a problem, I don't think anyone was expecting you to pick this up, it's just that I found the idea very tantalizing ;)

Shameless plug: of course I don't know why you 'moved on from Salt', but it has dramatically improved since March 2014, w/Py3 support in this month's major release ;)
