Skip to content

Rb: Add an unsafe-code-construction query #10862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jan 16, 2023

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Oct 17, 2022

The query is generally the same pattern as unsafe-shell-command-construction, but for code-execution.

I got 4 CVEs I'm looking at in relation to this query, but none of them are currently flagged.
Although some of those projects have other alerts.
But all 4 sinks are recognized, so I would like to merge the query before looking further into the taint-steps.

Evaluations (nightly, rails-projects) both look good.
The new alerts are from the drive-by StringLiteral -> StringlikeLiteral in unsafe-shell-command-construction.

Here is the result from a MRVA run: https://github.com/github/codeql-dca-main/issues/8790#issuecomment-1327274386. (Indirect link to keep it internal).
It's loud on some projects, but I don't see a good reason for the use of e.g. module_eval.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 17, 2022

QHelp previews:

ruby/ql/src/queries/security/cwe-094/UnsafeCodeConstruction.qhelp

Unsafe code constructed from library input

When a library function dynamically constructs code in a potentially unsafe way, it's important to document to clients of the library that the function should only be used with trusted inputs. If the function is not documented as being potentially unsafe, then a client may incorrectly use inputs containing unsafe code fragments, and thereby leave the client vulnerable to code-injection attacks.

Recommendation

Properly document library functions that construct code from unsanitized inputs, or avoid constructing code in the first place.

Example

The following example shows two methods implemented using eval: a simple deserialization routine and a getter method. If untrusted inputs are used with these methods, then an attacker might be able to execute arbitrary code on the system.

module MyLib
    def unsafeDeserialize(value)
        eval("foo = #{value}")
        foo
    end

    def unsafeGetter(obj, path)
        eval("obj.#{path}")
    end
end

To avoid this problem, either properly document that the function is potentially unsafe, or use an alternative solution such as JSON.parse or another library that does not allow arbitrary code to be executed.

require 'json'

module MyLib
    def safeDeserialize(value)
        JSON.parse(value)
    end

    def safeGetter(obj, path)
        obj.dig(*path.split("."))
    end
end

Example

As another example, consider the below code which dynamically constructs a class that has a getter method with a custom name.

require 'json'

module BadMakeGetter
  # Makes a class with a method named `getter_name` that returns `val`
  def self.define_getter_class(getter_name, val)
    new_class = Class.new
    new_class.module_eval <<-END
      def #{getter_name}
        #{JSON.dump(val)}
      end
    END
    new_class
  end
end

one = BadMakeGetter.define_getter_class(:one, "foo")
puts "One is #{one.new.one}"

The example dynamically constructs a string which is then executed using module_eval. This code will break if the specified name is not a valid Ruby identifier, and if the value is controlled by an attacker, then this could lead to code-injection.

A more robust implementation, that is also immune to code-injection, can be made by using module_eval with a block and using define_method to define the getter method.

# Uses `define_method` instead of constructing a string
module GoodMakeGetter
  def self.define_getter_class(getter_name, val)
    new_class = Class.new
    new_class.module_eval do
      define_method(getter_name) { val }
    end
    new_class
  end
end

two = GoodMakeGetter.define_getter_class(:two, "bar")
puts "Two is #{two.new.two}"

Example

This example dynamically registers a method on another class which forwards its arguments to a target object. This approach uses module_eval and string interpolation to construct class variables and methods.

module Invoker
  def attach(klass, name, target)
    klass.module_eval <<-CODE
      @@#{name} = target

      def #{name}(*args)
        @@#{name}.#{name}(*args)
      end
    CODE
  end
end

A safer approach is to use class_variable_set and class_variable_get along with define_method. String interpolation is still used to construct the class variable name, but this is safe because class_variable_set is not susceptible to code-injection.

send is used to dynamically call the method specified by name. This is a more robust alternative than the previous example, because it does not allow arbitrary code to be executed, but it does still allow for any method to be called on the target object.

module Invoker
  def attach(klass, name, target)
    var = :"@@#{name}"
    klass.class_variable_set(var, target)
    klass.define_method(name) do |*args|
      self.class.class_variable_get(var).send(name, *args)
    end
  end
end

References

@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 36d9c05 to 865d5ab Compare October 18, 2022 10:35
@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 7d8bf01 to a7745ac Compare November 24, 2022 14:55
@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 5decea2 to 4c6478d Compare November 24, 2022 17:08
@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 78cd2c8 to 8ec6643 Compare November 25, 2022 08:45
@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 8ec6643 to c00a950 Compare November 25, 2022 09:27
@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from c00a950 to 53f24a5 Compare November 25, 2022 09:32
@erik-krogh erik-krogh marked this pull request as ready for review November 25, 2022 09:39
@erik-krogh erik-krogh requested a review from a team as a code owner November 25, 2022 09:39
@calumgrant calumgrant requested a review from hmac November 28, 2022 09:55
@hmac
Copy link
Contributor

hmac commented Dec 12, 2022

Looking at the results for this query, I'm still concerned that it is too noisy.

It's loud on some projects, but I don't see a good reason for the use of e.g. module_eval.

module_eval and friends are a common way to implement macros in Ruby. Taking a simplified example from dmendel/bindata, how would you rewrite the following to not use module_eval?

module BitField
  def self.define_class(nbits)
    new_class = Class.new
    BitField.define_methods(new_class, nbits)
  end

  def self.define_methods(bit_class, nbits)
    bit_class.module_eval <<-END
      def assign(val)
        #{create_nbits_code(nbits)}
        super(val)
      end
    END
  end

  def self.create_nbits_code(nbits)
    if nbits == :nbits
      "nbits = eval_parameter(:nbits)"
    else
      ""
    end
  end
end

By alerting on flow from library inputs to code execution like this, we are effectively outlawing any library that exposes a macro to its consumers. This seems too strict to me.

@erik-krogh
Copy link
Contributor Author

how would you rewrite the following to not use module_eval?

The example is incomplete so thats hard to say, but I don't see anything that couldn't be rewritten to not use module_eval with a constructed string.

To be clear, using module_eval is not the issue, using module_eval with a constructed string is the issue.
The documentation should be clearer about that, I'll look into it.

Below is an example I made of how to rewrite from the bad pattern to the good pattern.
I'm going to include that in the .qhelp, I'll push that when I've done it.

# Makes a class with a method named `getter_name` that returns `val` in a nasty way
module BadMakeGetter
  def self.define_getter_class(getter_name, val)
    new_class = Class.new
    new_class.module_eval <<-END
      def #{getter_name}
        #{val}
      end
    END
    new_class
  end
end

# Same as above, but uses `define_method` instead of constructing a string
module GoodMakeGetter
  def self.define_getter_class(getter_name, val)
    new_class = Class.new
    new_class.module_eval do
      define_method(getter_name) { val }
    end
    new_class
  end
end

one = BadMakeGetter.define_getter_class(:one, 1)
puts "One is #{one.new.one}"

two = GoodMakeGetter.define_getter_class(:two, 2)
puts "Two is #{two.new.two}"

@hmac
Copy link
Contributor

hmac commented Dec 15, 2022

My example is excerpted and simplified but it is complete in the sense that it is fully executable Ruby code. However I probably simplified it a bit too much. The original is here. Having looked at this again I think this example is representable without module_eval, though it behaves slightly differently. The original version does a comparison with signed at macro time and the rest of the computation at method call time. This version does more at macro time, but arguably that's actually an improvement:

# Original (simplified a bit)
class Int
  def self.define_class(name, nbits, signed)
    new_class = Class.new
    Int.define_methods(new_class, nbits, signed.to_sym)
    new_class
  end

  def self.define_methods(int_class, nbits, signed)
    raise "nbits must be divisible by 8" unless (nbits % 8).zero?

    int_class.module_eval <<-END
      def assign(val)
        #{create_clamp_code(nbits, signed)}
      end

      def do_num_bytes
        #{nbits / 8}
      end
    END
  end

  private

  def self.create_clamp_code(nbits, signed)
    if signed == :signed
      max = "(1 << (#{nbits} - 1)) - 1"
      min = "-((#{max}) + 1)"
    else
      max = "(1 << #{nbits}) - 1"
      min = "0"
    end

    "val = val.clamp(#{min}, #{max})"
  end
end

# Rewritten
class Int2
  def self.define_class(name, nbits, signed)
    new_class = Class.new
    Int2.define_methods(new_class, nbits, signed.to_sym)
    new_class
  end

  def self.define_methods(int_class, nbits, signed)
    raise "nbits must be divisible by 8" unless (nbits % 8).zero?

    max = signed == :signed ? (1 << (nbits - 1)) -1 : (1 << nbits) - 1

    min = signed == :signed ? -(max + 1) : 0

    num_bytes = nbits / 8

    int_class.define_method(:assign) { |val| val = val.clamp(min, max) }
    int_class.define_method(:do_num_bytes) { num_bytes }
  end
end

I would like us to spot check a few other different cases to make sure that this transformation is always possible before we go ahead with this query.

@hmac
Copy link
Contributor

hmac commented Dec 20, 2022

Here's some more examples taken from the MRVA run. The old code is commented out, my replacement is below it.

class Dsl
  def self.create_subclasses_with_endian(bnl_class)
    # instance_eval "class ::#{bnl_class}Le < ::#{bnl_class}; endian :little; end"
    const_set("#{bnl_class}Le", Class.new(bnl_class) { endian :little })
  end
end

module Do
  VISIBILITY_WORD = {
        public: "",
        private: "private ",
        protected: "protected "
    }.freeze

  def self.wrap_method(target, method, visibility)
    # target.module_eval(<<-RUBY, __FILE__, __LINE__ + 1)
    #   #{VISIBILITY_WORD[visibility]} def #{method}(...)
    #     if block_given?
    #       super
    #     else
    #       Do.() { super { |*ms| Do.bind(ms) } }
    #     end
    #   end
    # RUBY

    meth = target.define_method(method) do |*args|
      if block_given?
        super
      else
        Do.() { super { |*ms| Do.bind(ms) } }
      end
    end

    raise ArgumentError unless visibility.in? %i[public private protected]

    target.send(visibility)
  end
end

class PublicCall
  def self.call_interface(method, safe)
    @interfaces.fetch_or_store([method, safe]) do
            ::Module.new do
              if safe
          module_eval(<<~RUBY, __FILE__, __LINE__ + 1)
            def call(input, &block)                # def call(input, &block)
              @target.#{method}(input, &block)   #   @target.coerve(input, &block)
            end                                    # end
            RUBY
        else
          module_eval(<<~RUBY, __FILE__, __LINE__ + 1)
            def call(input, &block)                                          # def call(input, &block)
              @target.#{method}(input)                                     #   @target.coerce(input)
            rescue ::NoMethodError, ::TypeError, ::ArgumentError => error    # rescue ::NoMethodError, ::TypeError, ::ArgumentError => error
                  CoercionError.handle(error, &block)                          #   CoercionError.handle(error, &block)
            end                                                              # end
            RUBY
        end
      end
        end

        @interfaces.fetch_or_store([method, safe]) do
          ::Module.new do
            if safe
              define_method(:call) do |input, &block|
                @target.send(method, input)
              end
            else
              define_method(:call) do |input, &block|
                @target.send(method, input)
              rescue ::NoMethodError, ::TypeError, ::ArgumentError => error
                CoercionError.handle(error, &block)
              end
            end
          end
        end
  end
end

module ElasticAPM
  module Deprecations
    def deprecate(name, replacement = nil)
      alias_name = "#{name.to_s.chomp('=')}__deprecated_"
      alias_name += '=' if name.to_s.end_with?('=')

      # class_eval <<-RUBY, __FILE__, __LINE__ + 1
      #   alias :"#{alias_name}" :"#{name}"

      #   def #{name}(*args, &block)
      #     warn "[ElasticAPM] [DEPRECATED] `#{name}' is being removed. " \
      #       "#{replacement && "See `#{replacement}'."}" \
      #       "\nCalled from \#{caller.first}"
      #     send("#{alias_name}", *args, &block)
      #   end
      # RUBY

      self.class.alias_method alias_name, name
      self.class.define_method name do |*args, &block|
        warn "[ElasticAPM] [DEPRECATED] `#{name}' is being removed. " \
            "#{replacement && "See `#{replacement}'."}" \
            "\nCalled from \#{caller.first}"
        send(alias_name, *args, &block)
      end
    end
  end
end

module FFI
  class VariadicInvoker
    def attach(mod, mname)
      # invoker = self
      # params = "*args"
      # call = "call"
      # mod.module_eval <<-code
      #   @@#{mname} = invoker
      #   def self.#{mname}(#{params})
      #     @@#{mname}.#{call}(#{params})
      #   end
      #   def #{mname}(#{params})
      #     @@#{mname}.#{call}(#{params})
      #   end
      # code
      # invoker

      var = :"@@#{mname}"
      mod.class_variable_set(var, self)
      mod.define_singleton_method(mname) { |*args| class_variable_get(var).call(*args) }
      mod.define_method(mname) { |*args| class_variable_get(var).call(*args) }
      self
    end
  end
end


module RbNacl
  module Sodium
    def sodium_function(name, function, arguments)
      # module_eval <<-RUBY, __FILE__, __LINE__ + 1
      #   attach_function #{function.inspect}, #{arguments.inspect}, :int
      #   def self.#{name}(*args)
      #     ret = #{function}(*args)
      #     ret == 0
      #   end
      # RUBY

      self.class.attach_function function.inspect, arguments.inspect, :int
      self.class.define_singleton_method(name) do |*args|
        ret = self.send(function, *args)
        ret == 0
      end
    end
  end
end

In general I think this looks promising: there's a straightforward conversion for every case I've looked at. I think if the qhelp includes examples that use define_method, Class.new, {class,instance}_variable_{get,set} and send then that should cover most cases. We should also note that when using send (which I think is unavoidable in some cases) should you not pass unsanitized user input to the first arg (if you can avoid it). Otherwise their fix will just trigger another CodeQL alert.

@erik-krogh
Copy link
Contributor Author

@hmac are you up for writing some examples I can put in the .qhelp?
You seem to be more into the finer details of e.g. class_variable_get than I am at this point.

I think the only methods missing from your list are {class,instance}_variable_{get,set} and send.

A note about send:
The code-injection query only flags unsafe uses of send if an attacker can control the entire string, so if an attacker can only control e.g. a prefix/suffix, then we don't produce an alert.
That was the change I made using flow-labels recently.
And this query does not treat send() as a sink. This query only uses code-execution sinks where the runsArbitraryCode() predicate hold, which is not the case for e.g. send.

@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 824544a to 3815a5a Compare January 2, 2023 09:19
@github github deleted a comment from github-actions bot Jan 2, 2023
@github github deleted a comment from github-actions bot Jan 2, 2023
@github github deleted a comment from github-actions bot Jan 2, 2023
The `send()` example is not flagged by any current query, so it was weird talking about it as "vulnerable".
@erik-krogh erik-krogh force-pushed the unsafeCodeConstruction branch from 3901df6 to 3811eae Compare January 2, 2023 12:34
@calumgrant calumgrant requested a review from alexrford January 3, 2023 09:35
Copy link
Contributor

@alexrford alexrford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the code, and the qhelp is extremely informative. Just some minor nitpicks/questions.


// override to require the path doesn't have unmatched return steps
override DataFlow::FlowFeature getAFeature() {
result instanceof DataFlow::FeatureHasSourceCallContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question - I'm not familiar with FlowFeatures - why is predicate necessary for this particular flow configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents flow out of functions that weren't called as part of the path.
I have a test-case for it here:

def id(x)
IO.popen("cat #{x}", "w") # NOT OK - the parameter is not a constant.
return x
end
def thisIsSafe()
IO.popen("echo #{id('foo')}", "w") # OK - only using constants.
end

In that example the x parameter in id() is correctly identified as a source.
However, if that value flows out of the id() function (into e.g. thisIsSafe()), then it is no longer a valid source.

This flow-feature blocks that kind of flow for returns that doesn't have a matching call.

erik-krogh and others added 2 commits January 5, 2023 20:02
Co-authored-by: Alex Ford <alexrford@users.noreply.github.com>
@erik-krogh erik-krogh added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Jan 5, 2023
alexrford
alexrford previously approved these changes Jan 6, 2023
Copy link
Contributor

@alexrford alexrford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mchammer01 mchammer01 self-requested a review January 10, 2023 09:42
@mchammer01
Copy link
Contributor

I'll review this on behalf of Docs 😃

mchammer01
mchammer01 previously approved these changes Jan 10, 2023
Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erik-krogh - LGTM ✨
A few minor nits for your consideration.

<example>
<p>
The following example shows two methods implemented using <code>eval</code>: a simple
deserialization routine and a getter method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need some formatting here, like this? 🤔

Suggested change
deserialization routine and a getter method.
deserialization routine and a `getter` method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need any formatting here.
There is not a method named "getter". "getter" describes a kind of method in this context.

Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
@erik-krogh erik-krogh dismissed stale reviews from mchammer01 and alexrford via f2658a0 January 10, 2023 11:56
@erik-krogh erik-krogh requested a review from alexrford January 10, 2023 11:56
@erik-krogh erik-krogh removed the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Jan 10, 2023
Copy link
Contributor

@alexrford alexrford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@erik-krogh erik-krogh merged commit 59a8b21 into github:main Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants