Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ZipSlip and TarSlip query to ruby #12208

Merged
merged 13 commits into from
Mar 6, 2023
Merged

Add ZipSlip and TarSlip query to ruby #12208

merged 13 commits into from
Mar 6, 2023

Conversation

gregxsunday
Copy link
Contributor

This query detects previously undetected RCE in Discourse by uploading malicious ZIP/Tar archives.

CVE-2022-36066

Similarly as with Python's TarSlip query, I consider unarchiving a file as a source and file creation as a sink. I've modelled all 3 libraries used by Discourse (Gem::Package::TarReader, Zip::File and Zlib::GzipReader).

There are two things are I think should be improved here but I simply don't know how to do them as I'm new to CodeQL.

Flaw 1: Flow in blocks

The flow is followed correctly when the zipfile is returned from a function with a yield keyword but when it's inline like this, the flow breaks. I've tried debugging it with partial flow queries but I don't know how to model that.

Zip::File.open('example.zip') do |zip_file|
  zip_file.each do |entry|
    # Do something with each entry in the zip file
    puts entry.name
  end
end

Flaw 2: Sanitizer

The way the bug was fixed in Discourse is by adding a function

   def is_safe_path_for_extraction?(path, dest_directory)
       File.expand_path(path).start_with?(dest_directory)
     end

And calling it with the untrusted path as the first argument. While my predicate isSanitizer can detect inline File.expand_path(path), it doesn't catch this case. I probably need to look at more than one node in isSanitizer but I don't quite know how to do it.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 17, 2023

QHelp previews:

ruby/ql/src/experimental/cwe-022-zipslip/ZipSlip.qhelp

Arbitrary file write during zipfile/tarfile extraction

Extracting files from a malicious tar archive without validating that the destination file path is within the destination directory can cause files outside the destination directory to be overwritten, due to the possible presence of directory traversal elements (..) in archive paths.

Tar archives contain archive entries representing each file in the archive. These entries include a file path for the entry, but these file paths are not restricted and may contain unexpected special elements such as the directory traversal element (..). If these file paths are used to determine an output file to write the contents of the archive item to, then the file may be written to an unexpected location. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

For example, if a tar archive contains a file entry ..\sneaky-file, and the tar archive is extracted to the directory c:\output, then naively combining the paths would result in an output file path of c:\output\..\sneaky-file, which would cause the file to be written to c:\sneaky-file.

Recommendation

Ensure that output paths constructed from tar archive entries are validated to prevent writing files to unexpected locations.

The recommended way of writing an output file from a tar archive entry is to check that ".." does not occur in the path.

Example

In this example an archive is extracted without validating file paths. If archive.tar contained relative paths (for instance, if it were created by something like tar -cf archive.tar ../file.txt) then executing this code could write to locations outside the destination directory.

class FilesController < ActionController::Base
  def zipFileUnsafe
    path = params[:path]
    Zip::File.open(path).each do |entry|
      File.open(entry.name, "wb") do |os|
        entry.read
      end
    end
  end

  def tarReaderUnsafe
    path = params[:path]
    file_stream = IO.new(IO.sysopen(path))
    tarfile = Gem::Package::TarReader.new(file_stream)
    tarfile.each do |entry|
      ::File.open(entry.full_name, "wb") do |os|
        entry.read
      end
    end
  end  
end

To fix this vulnerability, we need to check that the path does not contain any ".." elements in it.

class FilesController < ActionController::Base
  def zipFileSafe
    path = params[:path]
    Zip::File.open(path).each do |entry|
      entry_path = entry.name
      next if !File.expand_path(entry_path).start_with?('/safepath/')
      File.open(entry_path, "wb") do |os|
        entry.read
      end
    end
  end

  def tarReaderSafe
    path = params[:path]
    file_stream = IO.new(IO.sysopen(path))
    tarfile = Gem::Package::TarReader.new(file_stream)
    tarfile.each do |entry|
      entry_path = entry.full_name
      raise ExtractFailed if entry_path != "/safepath"
      ::File.open(entry_path, "wb") do |os|
        entry.read
      end
    end
  end  
end

References

Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pull request. I see you used the Python version of this query as inspiration. There are a couple of Python related bits left in the documentation, let's fix that.

Copy link
Contributor

@alexrford alexrford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpicks, generally LGTM.

Since this is a new query for Ruby, our CI will want a change note. This can be added as a ruby/ql/src/change-notes/2023-02-17-zipslip-query.md with contents like:

---
category: newQuery
---
* Added a new query, `rb/zip-slip`, to detect arbitrary file writes during extraction of zip/tar archives.

ruby/ql/lib/codeql/ruby/security/ZipSlipQuery.qll Outdated Show resolved Hide resolved
ruby/ql/lib/codeql/ruby/security/ZipSlipCustomizations.qll Outdated Show resolved Hide resolved
@gregxsunday
Copy link
Contributor Author

Thanks for those suggestions @alexrford and sorry @aibaars for leaving out python references. I think it's better now

@aibaars
Copy link
Contributor

aibaars commented Feb 17, 2023

From https://github.com/github/codeql/actions/runs/4203950763/jobs/7293988883

...
Formatting ruby/ql/lib/codeql/ruby/security/ZipSlipCustomizations.qll.
ruby/ql/lib/codeql/ruby/security/ZipSlipCustomizations.qll would change by autoformatting.
Formatting ruby/ql/lib/codeql/ruby/security/OverlyLargeRangeQuery.qll.
Formatting ruby/ql/lib/codeql/ruby/security/ServerSideRequestForgeryQuery.qll.
Formatting ruby/ql/lib/codeql/ruby/security/ZipSlipQuery.qll.
ruby/ql/lib/codeql/ruby/security/ZipSlipQuery.qll would change by autoformatting.
...

Could you auto-format the files? You can run: find ruby/ql -type f \( -name "*.qll" -o -name "*.ql" \) -print0 | xargs -0 codeql query format -i or use "format document" in VScode if you are using the IDE for query development . I recommend enabling "format document on save" in VSCode.

/**
* A call to `Zip::File.open(path)`, considered a flow source.
*/
private class ZipFileOpen extends Source {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following project resulted in some strange looking results:
https://gist.github.com/aibaars/9f25acb1dcc81b4731d9781b955d3ddd

I think you need to model the case where the Zip::File.open method is called with a block differently from the case where it is called without block; Zip::File.open(path) do |file| ... end and file = Zip::File.open(path) are different.

@aibaars
Copy link
Contributor

aibaars commented Feb 20, 2023

@gregxsunday Could you also move the query and its support files to the experimental folder? The query is pretty good as-is, but for new queries we like to do some fine-tuning.

@aibaars
Copy link
Contributor

aibaars commented Feb 20, 2023

@gregxsunday These are the results of the query on the latest version of discourse: https://gist.github.com/aibaars/b1456bbab6c13ed6e3192819ad95632d

The results look sensible, but I think the query is not picking up checks like the one in https://github.com/discourse/discourse/blob/1a653d2ce931b76b3b217d9e4f571bcef124d2a9/lib/compression/strategy.rb#LL21C35-L21C35 . If you can teach the query to handle this type of sanitizer that would be great. However, this is not a requirement for the query being accepted.

@gregxsunday
Copy link
Contributor Author

Hi @aibaars,

I will move it to experimental and I will take a look at the results you've sent.

Regarding handling blocks and the sanitizer, as I wrote in the original PR message, I am aware of those limitations and I had tried to solve them but I simply don't know how to. I'm happy to improve the query if you hint me into the good direction.

@aibaars
Copy link
Contributor

aibaars commented Feb 21, 2023

Regarding handling blocks and the sanitizer, as I wrote in the original PR message, I am aware of those limitations and I had tried to solve them but I simply don't know how to. I'm happy to improve the query if you hint me into the good direction.

Sorry about that, I had forgotten you already mentioned those in the description. Let's add handling blocks, but leave the sanitizer as future work.

I think something like the following should work. We treat the return value as a source if there is no block, and if there is a block then we treat the block's first parameter as a source:

    ZipFileOpen() {
      exists(API::MethodAccessNode zipOpen |
        zipOpen = API::getTopLevelMember("Zip").getMember("File").getMethod("open") and
        // If argument refers to a string object, then it's a hardcoded path and
        // this file is safe.
        not zipOpen.getParameter(0).asSource().getConstantValue().isStringlikeValue(_)
      |
        not exists(zipOpen.getBlock()) and this = zipOpen.getReturn().asSource()
        or
        this = zipOpen.getBlock().getParameter(0).asSource()
      )
    }

@gregxsunday gregxsunday requested a review from a team as a code owner February 23, 2023 10:50
@gregxsunday gregxsunday requested review from hvitved and removed request for a team February 23, 2023 10:50
@gregxsunday
Copy link
Contributor Author

  • moved to experimental
  • added handling blocks to TarReader and Zip classes. Haven't found Gzip to be usable as the block
  • I took a look at the results and it indeed looks strange when the source and the sink are in the same place. The reason for it is that Zip::File.open is defined as one of the FileSystemAccess sinks in Archive.qll. The files in the roo repo are responsible for handling ZIP-like excel and office files and I think it's not impossible that there's an archive inside the archive inside those files. It might be hard to exploit but I don't see any obvious error in the query here.

@gregxsunday gregxsunday requested review from alexrford and aibaars and removed request for hvitved and alexrford February 23, 2023 12:21
Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks!

@aibaars
Copy link
Contributor

aibaars commented Mar 1, 2023

  • I took a look at the results and it indeed looks strange when the source and the sink are in the same place. The reason for it is that Zip::File.open is defined as one of the FileSystemAccess sinks in Archive.qll. The files in the roo repo are responsible for handling ZIP-like excel and office files and I think it's not impossible that there's an archive inside the archive inside those files. It might be hard to exploit but I don't see any obvious error in the query here.

You're right that there is a potential zip-slip issue in roo-rb/roo. The previous version sort of found the problem, but for the wrong reason. To find the result properly we need a model for zip.dir.foreach { | filename | ... } , so the query does not know that there should be taint flow from zip to filename:

https://github.com/roo-rb/roo/blob/3fecab545b943962213cdf5f0cbe2cdb142940b7/lib/roo/base.rb#L603

If we add a model for the dir and foreach methods the query finds results with much more sensible flow paths.

@hmac Shall we add modelling the Zip::File class and friends to our list of things to model, or is it simple enough to add to this PR?

With models for the Zip::File#dir and Zip::FileSystem::ZipFsDir#foreach methods the query would find the following results: https://gist.github.com/aibaars/2f4a573b4d1d87f70e5b6f6a4e575e9b

If we also model that Zip::File::open returns the value yielded by the block then we get the original 5 results back, but now with sensible looking paths: https://gist.github.com/aibaars/6ca7c116e7ece9ee652c281f94d9f2ba

@hmac
Copy link
Contributor

hmac commented Mar 6, 2023

@aibaars we do have rubyzip in our list - we can bump it up the priority if it's helpful here. I'll add a link to this PR to the relevant tracking issue. cc @alexrford

@aibaars
Copy link
Contributor

aibaars commented Mar 6, 2023

@gregxsunday Thanks a lot for this contribution!

@aibaars aibaars merged commit d2ab40c into github:main Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants