Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using as config language #1

Open
lud opened this issue Dec 4, 2020 · 15 comments
Open

Using as config language #1

lud opened this issue Dec 4, 2020 · 15 comments

Comments

@lud
Copy link

lud commented Dec 4, 2020

Hi,

I just saw your comment on hacker news and I would like to do just that.

I am struggling to find a configuration language for my application because I want it to be created by users, not developers (though there is always a syntax to learn but I am targeting tech savvy people), so I need it to be safe in a way.

I wanted tagging but yamerl does not seem to support custom tagging, and yaml is too much. I need tag to set some special values in the config files, like !credentials "not the actual credentials but a reference".

Because that would be open to users, I do not want the config to be able to create arbitrary atoms, I just need binary strings (using elixir, not erlang). And it would have basically every command disabled besides the explicit commands I give to it (I just need them to declare some data structures that I would map to elixir structs).

Do you think that it is possible with your library?

@YellowApple
Copy link
Member

Howdy! You've sure come to the right place.

Starting with your second need (around only enabling the commands you explicitly provide), that's available out of the box (since I indeed had the same exact desire 😄). Basically, you'd want to use the 2-arity versions of :otpcl.eval and/or :otpcl.eval_file, which (in addition to either the script text or the script file, respectively) take a second parameter with the starting interpreter state. So if you take an empty state, you can then build it up (with calls to :otpcl.cmd/3 or :otpcl.import/2 to define the configuration commands).

So something like this (pardon my Elixir, it's a bit rusty):

defmodule Conf do
  defstruct [:something, credentials: %{}]
end

defmodule ConfParserCommands do
  # We do this so that OTPCL knows that we can treat these functions
  # as actual OTPCL-aware commands rather than Erlang functions
  @otpcl_cmds [:credentials, :something]
  
  def credentials([tag, username, password], state) do
    {conf = %Conf{}, state} = :otpcl.get(:RETVAL, state)
    creds = conf.credentials |>
      Map.put(tag, %{username: username, password: password})
    {%{conf | credentials: creds}, state}
    # From here, OTPCL's interpreter will automatically stick conf
    # into the RETVAL variable.
  end
  
  def something([cred_tag], state) do
    {conf = %Conf{}, state} = :otpcl.get(:RETVAL, state)
    case conf.credentials |> Map.get(cred_tag) do
      %{username: _, password: _} ->
	{%{conf | something: cred_tag}, state}
      _ ->
	{:error, {:invalid_cred_tag, cred_tag}, state}
    end
  end
end

defmodule ConfParser do  
  def parse(filename) do
    # Minimal state doesn't define any commands at all; if you want to
    # use pipes, you'll want :otpcl_env.core_state() here instead.
    state = :otpcl_env.minimal_state()
    # We'll want to include the commands we defined above in our
    # config parser's initial state.
    {:ok, state} = :otpcl.import(ConfParserCommands, state)
    # We'll also want to initialize the special RETVAL variable with
    # an empty Conf
    {:ok, state} = :otpcl.set(:RETVAL, %Conf{}, state)
    # And now we do the actual config parsing
    case :otpcl.eval_file(filename, state) do
      {conf = %Conf{}, _} -> conf
      {:error, reason, errstate} -> {:error, reason, errstate}
    end
  end
end

As for the atom creation, unfortunately there's no way (yet) to turn that off, though in theory it should be straightforward to add that functionality; I'll investigate the feasibility of a sort of "safe mode" interpreter that treats everything as strings.

That said, depending on how many of these configs you're expecting to read, it might not cause many issues in practice; you'd have to be parsing quite a substantially-large config file to blow out the limit. And you can always bump up that limit quite a bit higher than the default (e.g. elixir --erl "+t 2147483647" for the absolute maximum), at the presumable cost of memory consumption.

@lud
Copy link
Author

lud commented Dec 4, 2020

Hey thanksfor the detailed reply ! I will look into it :)

For atoms, the problem is that a malicious user could paste a big chunk of code just to blow the runtime. But that is not a real problem at the moment.

@YellowApple
Copy link
Member

Yeah, I'd say until OTPCL has support for an atom-free interpreter (I'll add it to the TODO list) the best approach would be to have the application limit the amount of code someone can paste in (which is probably a good idea anyway).

YellowApple added a commit that referenced this issue Apr 1, 2021
This should address the concerns raised in issue #1 re: interpreter
safety w/ potentially-adversarial user-input OTPCL scripts/configs.
The meta module is now aware of and able to accept non-atom arguments
for its commands, and only creates atoms as necessary (i.e. to
interface with Erlang functions that expect them); further, the
interpreter state now uses binstrings instead of atoms for its map
keys (unsure of the performance implications here), as does the
`subcmd` command in its generated dispatchers.

TODO: check the rest of the modules for atom-expecting commands.
@YellowApple
Copy link
Member

YellowApple commented Apr 1, 2021

Alright, so after a bit of a long delay, I think I've made some good headway on a "stringy" (i.e. atom-safe) interpreter mode. Modifying the previous example:

defmodule Conf do
  defstruct [:something, credentials: %{}]
end

defmodule ConfParserCommands do
  @otpcl_cmds [:credentials, :something]
  
  def credentials([tag, username, password], state) do
    # Notice that variables are now accessible via binstring rather
    # than atom keys.  Same thing for commands.
    {conf = %Conf{}, state} = :otpcl.get("RETVAL", state)
    creds = conf.credentials |>
      Map.put(tag, %{username: username, password: password})
    {%{conf | credentials: creds}, state}
  end
  
  def something([cred_tag], state) do
    # Variables/commands can also be queried with atoms, too; they'll
    # internally be converted into binstrings.  This is mostly to make
    # it easier for me to gradually convert the whole codebase over.
    {conf = %Conf{}, state} = :otpcl.get(:RETVAL, state)
    case conf.credentials |> Map.get(cred_tag) do
      %{username: _, password: _} ->
	{%{conf | something: cred_tag}, state}
      _ ->
	{:error, {:invalid_cred_tag, cred_tag}, state}
    end
  end
end

defmodule ConfParser do  
  def parse(filename) do
    # Stringy state is effectively the same as the core state -
    # i.e. return and | commands exist, but nothing else - but with
    # the addition of the $STRINGY_INTERPRETER variable being set; if
    # this variable exists at all in a state, then any interpreters
    # run with that state will output (binary) strings instead of
    # atoms.
    state = :otpcl_env.stringy_state()
    {:ok, state} = :otpcl.import(ConfParserCommands, state)
    {:ok, state} = :otpcl.set("RETVAL", %Conf{}, state)
    case :otpcl.eval_file(filename, state) do
      {conf = %Conf{}, _} -> conf
      {:error, reason, errstate} -> {:error, reason, errstate}
    end
  end
end

So now if you have a config file:

credentials primary foo {TotallySecurePa$$w0rd}
credentials secondary bar password

something 123

This should now (in theory) produce:

%Conf{
  something: "123",
  credentials: %{
    "primary" => %{username: "foo", password: "TotallySecurePa$$word"},
    "secondary" => %{username: "bar", password: "password"}
  }
}

@lud Thoughts?

(EDIT: now all scalars are binstrings when in "stringy" mode (rationale). Long story short: consistency in decoded types is something that probably matters quite a bit for config files.)

YellowApple added a commit that referenced this issue Apr 2, 2021
This should address the concerns raised in issue #1 re: interpreter
safety w/ potentially-adversarial user-input OTPCL scripts/configs.
The meta module is now aware of and able to accept non-atom arguments
for its commands, and only creates atoms as necessary (i.e. to
interface with Erlang functions that expect them); further, the
interpreter state now uses binstrings instead of atoms for its map
keys (unsure of the performance implications here), as does the
`subcmd` command in its generated dispatchers.

TODO: check the rest of the modules for atom-expecting commands.
@lud
Copy link
Author

lud commented Apr 6, 2021

Hi,

Thank you for having spent some time to add this atom-safe feature. It looks like it works well, I even checked the atom table :D

I had to write replace :otpcl.import(ConfParserCommands, state) with :otpcl_meta.import([:otpcl, ConfParserCommands, ["credentials", "something"]], state) though. I am not sure if it is the correct way but it worked.

If I read correctly, the import command is able to create arbitrary atoms too. Would it be possible to redefine the import command before running unsafe code?

@YellowApple
Copy link
Member

I had to write replace :otpcl.import(ConfParserCommands, state) with :otpcl_meta.import([:otpcl, ConfParserCommands, ["credentials", "something"]], state) though. I am not sure if it is the correct way but it worked.

I'll dig into that. OTPCL should detect those commands on its own (and normally does in Erlang) via the @otpcl_cmds module attribute, so it's weird that the same wouldn't happen under Elixir. Could be an Elixir-specific quirk, or could be something I missed.

If I read correctly, the import command is able to create arbitrary atoms too. Would it be possible to redefine the import command before running unsafe code?

Yeah, OTPCL's own import command works by looking up the relevant Erlang (or Elixir, or LFE, or whatever) module by name, and (AFAICT) the relevant functions on the Erlang side only accept atoms there. This is why the "stringy" state by default doesn't provide said import command.

If you did want to allow stringy-interpreted scripts to import modules (i.e. from some whitelist), you can of course define your own version of import that wraps or replicates the OTPCL version; for example (pardon my rusty Elixir):

def import([name|args], state) do
  allowed = ["foo", "bar", "baz"]
  if Enum.member?(allowed, name) do
    :otpcl_meta.import([name|args], state)
  else
    {:error, {:import_not_allowed, name}, state}
  end
end

This way, even though some dynamic atom creation does happen, you have control over it, and can know with reasonable certainty that scripts won't create atoms other than :foo, :bar, and :baz.

@lud
Copy link
Author

lud commented Apr 7, 2021

Ok I did not took attention to the attribute. I copied it but didn't think of it.

Attributes are not persisted by default with Elixir, but it works if I add this line:

defmodule ConfParserCommands do
  Module.register_attribute(__MODULE__, :otpcl_cmds, persist: true)
  @otpcl_cmds [:credentials, :something]

If it were an Elixir library you would add a module Otpcl.CommandProvider or something alike, with a macro like

defmacro __using__(opts) do
  quote do
      Module.register_attribute(__MODULE__, :otpcl_cmds, persist: true)
      @otpcl_cmds unquote(opts[:commands])
  end
end

And then a user would use like that:

defmodule ConfParserCommands do
  use Otpcl.CommandProvider, commands: [:something, :credentials]
  
  ...

I guess defining an Elixir macro in erlang is possible, but manually calling the register_attribute is fine.

To handle imports, would it be better to use erlang:binary_to_existing_atom(Name, utf8) ? you would have to catch a badarg though.


Just for fun I tried to do it in Erlang and it works:

%% otpcl_command.erl

-module(otpcl_command).

-export(['MACRO-__using__'/2, '__info__'/1]).


'__info__'(macros) -> [{'__using__', 1}].

'MACRO-__using__'(Caller, Opts) ->
    {'__block__',
     [],
     [{{'.',
        [],
        [{'__aliases__', [{alias, false}], ['Module']},
         register_attribute]},
       [],
       [{'__MODULE__', [], otpcl_command},
        otpcl_cmds,
        [{persist, true}]]},
      {'@',
       [{context, otpcl_command}, {import, 'Elixir.Kernel'}],
       [{otpcl_cmds,
         [{context, otpcl_command}],
         ['Elixir.Access':get(Opts, commands)]}]}]}.

And then

defmodule ConfParserCommands do
  use :otpcl_command, commands: [:something, :credentials]

  # ...
end

This is cool because you do not need an Elixir compiler for your library but can still provide elixir macros. Though I am not sure if it is maintainable in the long term.

YellowApple added a commit that referenced this issue Apr 8, 2021
Ties into issue #1.  Since module and function names should already
exist as atoms, this should make import/use safe(r) to use with
stringy interpreters.

TODO: consider better error handling if someone passes in a
module/function name that ain't already defined (e.g. typos); users
can probably figure out what happened via the stacktrace, but it might
be a good idea to catch the generated `badarg` error and rethrow
something that makes it clear that the module/function in question
doesn't exist.
@YellowApple
Copy link
Member

To handle imports, would it be better to use erlang:binary_to_existing_atom(Name, utf8) ? you would have to catch a badarg though.

Pushed up an experimental change to that effect. Seems to pass tests, but I haven't put it through its paces beyond that, so ¯\_(ツ)_/¯

This is cool because you do not need an Elixir compiler for your library but can still provide elixir macros. Though I am not sure if it is maintainable in the long term.

Yeah, I'll have to look into that maintainability, but this does seem neat, and assuming this doesn't have a whole lot of breakage risk it's something that'd be trivial enough to include.

For now, though, adding an extra line of boilerplate to persist that attribute on the Elixir side seems like a reasonable approach.

@YellowApple
Copy link
Member

YellowApple commented Apr 8, 2021

As an aside, I'm also wondering if I should reconsider using module attributes to specify OTPCL commands; going by EUnit test cases and (per above) Elixir macros as precedent, it might be cleaner and more ergonomic (and more "conventional", I guess) to add some prefix or suffix to the function name itself (e.g. 'OTPCL-foo'(Args, State) -> whatever would correspond to foo in an OTPCL script). Food for thought; I'll have to play around with that and see how easy that is to do.

@YellowApple
Copy link
Member

So after a lot of on-and-off work (which I've finally pushed up), I've migrated OTPCL's master branch to use prefixed function names for OTPCL commands instead of defining a separate attribute - with the bonus that it's now a lot easier for a module to simultaneously maintain OTPCL-aware and Erlang-native interfaces. The downside, however, is that this is currently kinda ugly to handle from within Elixir:

defmodule ConfParserCommands do
  # Ew...
  def unquote(:"CMD_credentials")([tag, username, password], state) do
    {conf = %Conf{}, state} = :otpcl.get("RETVAL", state)
    creds = conf.credentials |>
      Map.put(tag, %{username: username, password: password})
    {%{conf | credentials: creds}, state}
  end
  
  # Gross...
  def unquote(:"CMD_something")([cred_tag], state) do
    {conf = %Conf{}, state} = :otpcl.get(:RETVAL, state)
    case conf.credentials |> Map.get(cred_tag) do
      %{username: _, password: _} ->
	{%{conf | something: cred_tag}, state}
      _ ->
	{:error, {:invalid_cred_tag, cred_tag}, state}
    end
  end
end

However, this should be pretty simple to clean up with a macro - possibly directly in Erlang, assuming Elixir's AST is sufficiently stable. The ideal would then be something like:

defmodule ConfParserCommands do
  use :otpcl
  
  # Much nicer; automatically adds prefix and state handling in signature
  defcommand credentials(tag, username, password) do
    {conf = %Conf{}, state} = :otpcl.get("RETVAL", state)
    creds = conf.credentials |>
      Map.put(tag, %{username: username, password: password})
    {%{conf | credentials: creds}, state}
  end
  
  defcommand something(cred_tag) do
    {conf = %Conf{}, state} = :otpcl.get(:RETVAL, state)
    case conf.credentials |> Map.get(cred_tag) do
      %{username: _, password: _} ->
	{%{conf | something: cred_tag}, state}
      _ ->
	{:error, {:invalid_cred_tag, cred_tag}, state}
    end
  end
end

Thoughts @lud? Ain't sure yet if I want to try to tackle the Elixir niceties as part of 0.3.0; I might postpone that for either 0.3.1 or 0.4.0, especially if having to do the whole def unquote(:"CMD_foo")(args, state) shenanigans is deemed "good enough".

@lud
Copy link
Author

lud commented Oct 13, 2021

Hi !

I guess the macro should rather just prefix the function name, but not hide the state argument as we can match/destructure on it.

When defining def unquote(:"CMD_credentials")([tag, username, password], state), can we be sure that there will always be 3 elements in the list, so we can define defcommand credentials(tag, username, password) ?

Edit: Thank you for all your work ! Honestly we are evaluating Lua too and it might be the final choice, though we are far from even starting on the user-scripting part of the application. For configuration we are still stuck on YAML because it has dicts, no loops, and we can generate schemas, samples and elixir code from a single source. I hope you are not doing those changes just for me :S

@YellowApple
Copy link
Member

YellowApple commented Oct 14, 2021

I guess the macro should rather just prefix the function name, but not hide the state argument as we can match/destructure on it.

I'd advise against that, since state is meant to be somewhat opaque.

When defining def unquote(:"CMD_credentials")([tag, username, password], state), can we be sure that there will always be 3 elements in the list, so we can define defcommand credentials(tag, username, password) ?

That's the idea, yep. Passing in less or greater than 3 arguments to credentials within an OTPCL script would produce a function_clause error, just like if you tried to call that function directly with less or greater than 3 list elements.

For configuration we are still stuck on YAML because it has dicts, no loops, and we can generate schemas, samples and elixir code from a single source. I hope you are not doing those changes just for me :S

Not just for you, but so far these changes have made OTPCL more useful and robust, so happy to at least give 'em a shot :)

@lud
Copy link
Author

lud commented Oct 14, 2021

Great !

So ok for the state being an opaque, but in that case I would rather have my own state passed to the callbacks, the %Conf{} value, which I can match on or destructure, and return {ok, NewConf} or {error, Reason} ; in the way gen_server works by passing you your own state only and not its own full state. Are there cases when you would pull something else that the RETVAL from the state in custom code ?

For the commands arities, I don't think the people writing TCL code would understand a function_clause error if they are only using TCL in the app. So I guess the callback module could define defcommand something(a) and defcommand something(a, b) for instance, which would be compiled in a __optcl__/1 function (typical elixir naming but it can be exported_commands/0 in erlang, that have to be done manually) which would return [{something,1}, {something, 2}] so the runtime could know if the length of the list of command arguments corresponds to an existing command arity, and call the custom code or return a "wrong number of arguments" error in a TCL-like fashion.

@YellowApple
Copy link
Member

YellowApple commented Oct 16, 2021

So ok for the state being an opaque, but in that case I would rather have my own state passed to the callbacks, the %Conf{} value, which I can match on or destructure, and return {ok, NewConf} or {error, Reason} ; in the way gen_server works by passing you your own state only and not its own full state. Are there cases when you would pull something else that the RETVAL from the state in custom code ?

Up to the programmer, really. For example, if you want to define some other variable, like $conf, to store that %Conf{}, then that's totally doable, too - just return :otpcl.set("conf", conf, state) instead of {conf, state}.

In any case, for destructuring a variable's contents, it'd be something like {%Conf{credentials: creds}, state} = :otpcl.get(variable, state) (where variable could be "RETVAL", "conf", whatever).

It's slightly less convenient than GenServer's "just pass in literally anything as a state", but that's probably unavoidable given that the interpreter itself relies on that state having a specific structure, and easy enough to work around by designating some OTPCL variable(s) (like $RETVAL or $conf) to hold the application-specific state.

That is: state is really for OTPCL, not for the programmer using OTPCL, and anything application-specific should be stored as a variable within state (and should be easy to do via :otpcl.get/2 and :otpcl.set/3).

For the commands arities, I don't think the people writing TCL code would understand a function_clause error if they are only using TCL in the app.

Probably not. Right now, the solution would be for a command author to handle this explicitly as a catch-all, e.g.:

def unparse(:"CMD_credentials")(args, state), do: {:error, {:invalid_args, args}, state}

Or possibly go into further detail there (e.g. matching specifically on is_list guards and returning explicit errors for too few / too many args).

In interactive contexts it might be worthwhile to print some help/usage blurb (or even include it in the error result). At some point I'd like to build in a help system of sorts that would automate this to some degree (i.e. a help command, and some way to store/retrieve help/usage info on a per-module or per-command basis). I suspect that's a longer ways out, though, and kinda orthogonal to use as a config language (but definitely invaluable for use as a shell/REPL language).

If I do get around to defining some Elixir and/or Erlang macros to automate command creation (e.g. defcommand and ?OTPCL_CMD, respectively), nicer argument-related error handling (e.g. via a catch-all per above) is something that could probably be added automatically. Also should be possible to add for commands defined within OTPCL via the cmd, use, etc. commands.

Or I guess either the application or OTPCL itself could just catch function_clause errors and re-present them as something more descriptive. Or folks could just live with knowing that a function_clause error means that the command didn't get the arguments it was expecting (i.e. how it's understood to mean already for Erlang/Elixir users).

In any case, probably something for 0.4.0 or later rather than trying to bolt yet another overhaul into 0.3.0 ;)

@lud
Copy link
Author

lud commented Oct 18, 2021

Or I guess either the application or OTPCL itself could just catch function_clause errors and re-present them as something more descriptive.

That means matching on error details to be sure it is a function clause error at the command dispatch level and not further in user code. I wouldn't bother.

Anyway, building a command dispatcher based on arguments count, command names, and passing the full TCL state or a custom state can be built on top of a lower level API easily, notably with Elixir macros (and maybe parse transforms in erl if need be) so yeah, keeping it simple for 0.3 seems right :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants