RC 3.3: merge codeql-ruby repository into github/codeql #6955

aibaars · 2021-10-25T14:56:44Z

@dbartol This pull request replays the steps I took to merge codeql-ruby into this repository. In this case I used the rc/3.3 branch of codeql and the codeql-ruby SHA of the 2.6.3 CLI release as starting points.

Below is the list of commits I made; 7741a72 is the actual merge, and the rest are small tweaks to make CI work smoothly etc. The changes are best reviewed on a per-commit basis.

b79f8f1 (HEAD -> codeql-ruby-3.3) Fix CI jobs
8cd86ae Move queries.xml to src
b23b3c3 Add a queries.xml file (for CWE coverage) docs
de38570 Merge identical-files.json
1bf4542 Remove github/codeql submodule
ddbba40 Update CodeSpaces configuration
aeb9ace Add ruby to CODEOWNERS
7741a72 Merge remote-tracking branch 'codeql-ruby/rc/3.3' into codeql/rc/3.3
8ce7b28 Update dependabot config
3554e8d Drop LICENSE and CODE_OF_CONDUCT.md
2de7573 Update Ruby workflows
068beef Move create-extractor-pack Action
d2ea732 Remove CodeSpaces configuration
ba32c54 Move files to ruby subfolder
tip of codeql rc/3.3 702c647 (origin/rc/3.3) Merge pull request #6904 from shati-patel/ruby-query-help
tip of codeql-ruby 1d58f8cd50 (tag: codeql-cli/v2.6.3) Merge pull request #320 from github/rasmuswl/fix-hasLocationInfo-url

Add integration test

Exclude beta releases of codeql-cli for qltest job

CFG: Allow `erb` top-level scopes

Add an example snippet query

The base `PrintAstConfiguration` class already has a predicate for filtering out desugared nodes - this change just makes use of it in the query. This fixes https://github.com/github/codeql-team/issues/408, which was caused by including nodes representing the desugaring of a[b] = c in the query output. This would result in multiple edges to the same target node (one from the surface AST and another from the desugared AST), which the VSCode AST viewer cannot handle.

Don't include desugared nodes in the printed AST

sync ReDoSUtil.qll with python/JS

Particularly, in tree-sitter-embedded-template

Add a dependabot.yml file to trigger daily dependabot updates on the four Rust projects in the codebase:
- `node_types`
- `generator`
- `extractor`
- `autobuilder`

Enable dependabot on the Rust projects

Bump tree-sitter versions to pick up parsing fixes

…late files

use toUnicode in ReDoSUtil.qll

Add DB upgrade script check

Temporarily disable operation call resolution

github-actions · 2021-10-25T15:04:02Z

QHelp previews:

javascript/ql/src/Performance/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.replace(/^\s+|\s+$/g, ''); // BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, use a negative look-behind (/^\s+|(?<!\s)\s+$/g), or simply use the built-in trim method (text.trim()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ // BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

javascript/ql/src/Performance/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			/^_(__|.)+_$/

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			/^_(__|[^_])+_$/

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

python/ql/src/experimental/Security/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			re.sub(r"^\s+|\s+$", "", text) # BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, use a negative look-behind (^\s+|(?<!\s)\s+$), or simply use the built-in strip method (text.strip()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ # BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

python/ql/src/experimental/Security/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			^_(__|.)+_$

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			^_(__|[^_])+_$

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

ruby/ql/src/queries/security/cwe-078/CommandInjection.qhelp

Uncontrolled command line

Code that passes user input directly to Kernel.system, Kernel.exec, or some other library routine that executes a command, allows the user to execute malicious code.

Recommendation

If possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals.

If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it.

Example

The following example shows code that takes a command string from a request parameter, which a malicious user can set to any value, and passes it straight to Kernel.system without examining it first.

class UsersController < ActionController::Base
  def create
    command = params[:command]
    system(command) # BAD
  end
end
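
One way to follow the recommendation above is to map the request parameter onto a fixed set of commands. The sketch below is only an illustration and not part of the query help: `ALLOWED_COMMANDS` and `command_for` are hypothetical names, and the listed commands are placeholders.

```ruby
# Hypothetical allowlist: user-facing action names mapped to fixed argument vectors.
ALLOWED_COMMANDS = {
  "date" => ["date", "-u"],
  "disk" => ["df", "-h"]
}.freeze

# Returns the hard-coded argument vector for an action, or raises for anything else.
def command_for(action)
  ALLOWED_COMMANDS.fetch(action) do
    raise ArgumentError, "unsupported action: #{action.inspect}"
  end
end
```

A controller could then call `system(*command_for(params[:command]))`. The user input only selects an entry; it never becomes part of the command, and passing the program and its arguments as separate values keeps `Kernel.system` from invoking a shell.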

References

OWASP: Command Injection.
Common Weakness Enumeration: CWE-78.
Common Weakness Enumeration: CWE-88.

ruby/ql/src/queries/security/cwe-079/ReflectedXSS.qhelp

Reflected server-side cross-site scripting

Directly writing user input (for example, an HTTP request parameter) to a webpage, without properly sanitizing the input first, allows for a cross-site scripting vulnerability.

Recommendation

To guard against cross-site scripting, escape user input before writing it to the page. Some frameworks, such as Rails, perform this escaping implicitly and by default.

Take care when using methods such as html_safe or raw. They can be used to emit a string without escaping it, and should only be used when the string has already been manually escaped (for example, with the Rails html_escape method), or when the content is otherwise guaranteed to be safe (such as a hard-coded string).

Example

The following example is safe because the params[:user_name] content within the output tags will be HTML-escaped automatically before being emitted.

<p>Hello <%= params[:user_name] %>!</p>

However, the following example is unsafe because user-controlled input is emitted without escaping, since it is marked as html_safe.

<p>Hello <%= params[:user_name].html_safe %>!</p>

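Outside a template, the same escaping can be applied by hand. A minimal sketch, assuming only the standard library's `ERB::Util.html_escape`; the `greeting` helper is hypothetical:

```ruby
require "erb"

# Escapes the user-supplied name before embedding it in markup.
def greeting(user_name)
  "<p>Hello #{ERB::Util.html_escape(user_name)}!</p>"
end

puts greeting("<script>alert(1)</script>")
# prints: <p>Hello &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```
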
References

OWASP: XSS Ruby on Rails Cheatsheet.
Wikipedia: Cross-site scripting.
Common Weakness Enumeration: CWE-79.
Common Weakness Enumeration: CWE-116.

ruby/ql/src/queries/security/cwe-089/SqlInjection.qhelp

SQL query built from user-controlled sources

If a database query (such as a SQL or NoSQL query) is built from user-provided data without sufficient sanitization, a malicious user may be able to run malicious database queries.

Recommendation

Most database connector libraries offer a way of safely embedding untrusted data into a query by means of query parameters or prepared statements.

Example

In the following Rails example, an ActionController class has a text_bio method to handle requests to fetch a biography for a specified user.

The user is specified using a parameter, user_name, provided by the client. This value is accessible using the params method.

The method illustrates three different ways to construct and execute an SQL query to find the user by name.

In the first case, the parameter user_name is inserted into an SQL fragment using string interpolation. The parameter is user-supplied and is not sanitized. An attacker could use this to construct SQL queries that were not intended to be executed here.

The second case uses string concatenation and is vulnerable in the same way that the first case is.

In the third case, the name is passed in a hash instead. ActiveRecord will construct a parameterized SQL query that is not vulnerable to SQL injection attacks.

class UserController < ActionController::Base
  def text_bio
    # BAD -- Using string interpolation
    user = User.find_by "name = '#{params[:user_name]}'"

    # BAD -- Using string concatenation
    find_str = "name = '" + params[:user_name] + "'"
    user = User.find_by(find_str)

    # GOOD -- Using a hash to parameterize arguments
    user = User.find_by name: params[:user_name]

    render plain: user&.text_bio
  end
end
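
To see why the first two cases are vulnerable, it can help to look at the string that reaches the database layer. The following is a minimal sketch that runs without Rails; `interpolated_condition` and the payload are hypothetical.

```ruby
# Builds the same SQL fragment as the BAD interpolation case above.
def interpolated_condition(user_name)
  "name = '#{user_name}'"
end

# A crafted value closes the quoted literal and appends its own condition.
payload = "x' OR '1'='1"
puts interpolated_condition(payload)
# prints: name = 'x' OR '1'='1'
```

With the hash form, the value is handed to ActiveRecord as data instead of being spliced into the SQL text, so a payload like this cannot change the structure of the query.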

References

Wikipedia: SQL injection.
OWASP: SQL Injection Prevention Cheat Sheet.
Common Weakness Enumeration: CWE-89.

ruby/ql/src/queries/security/cwe-094/CodeInjection.qhelp

Code injection

Directly evaluating user input (for example, an HTTP request parameter) as code without first sanitizing the input allows an attacker arbitrary code execution. This can occur when user input is passed to code that interprets it as an expression to be evaluated, using methods such as Kernel.eval or Kernel.send.

Recommendation

Avoid including user input in any expression that may be dynamically evaluated. If user input must be included, use context-specific escaping before including it. It is important that the correct escaping is used for the type of evaluation that will occur.

Example

The following example shows two functions setting a name from a request. The first function uses eval to execute the set_name method. This is dangerous as it can allow a malicious user to execute arbitrary code on the server. For example, the user could supply the value "' + exec('rm -rf') + '" to destroy the server's file system. The second function calls the set_name method directly and is thus safe.

class UsersController < ActionController::Base
  # BAD - Allow user to define code to be run.
  def create_bad
    first_name = params[:first_name]
    eval("set_name(#{first_name})")
  end

  # GOOD - Call code directly
  def create_good
    first_name = params[:first_name]
    set_name(first_name)
  end

  def set_name(name)
    @name = name
  end
end
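
When the method to call really does depend on the request, one possible approach is to check the name against a fixed list before dispatching. This is a sketch under assumptions, not the query's prescribed fix: the `Profile` class, `ALLOWED_FIELDS`, and `assign` are hypothetical.

```ruby
class Profile
  # Only these attribute names may be set from user input.
  ALLOWED_FIELDS = %w[first_name last_name].freeze

  attr_accessor :first_name, :last_name

  # Dispatches to a setter only if the field name is on the allowlist.
  def assign(field, value)
    raise ArgumentError, "unsupported field: #{field.inspect}" unless ALLOWED_FIELDS.include?(field)

    public_send("#{field}=", value)
  end
end
```

The user-supplied value is passed as an argument and is never parsed as Ruby code; the user-supplied field name can only select one of the listed setters.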

References

OWASP: Code Injection.
Wikipedia: Code Injection.
Common Weakness Enumeration: CWE-94.
Common Weakness Enumeration: CWE-95.
Common Weakness Enumeration: CWE-116.

ruby/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.gsub!(/^\s+|\s+$/, '') # BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, use a negative look-behind (/^\s+|(?<!\s)\s+$/), or simply use the built-in strip method (text.strip!).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.
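
As a quick sanity check, the two patterns discussed above can be compared against `String#strip` on a short single-line input. This is a sketch only; the constant and helper names are hypothetical, and it checks behavior rather than timing.

```ruby
AMBIGUOUS_TRIM = /^\s+|\s+$/          # the pattern flagged above
ANCHORED_TRIM  = /^\s+|(?<!\s)\s+$/   # the look-behind variant from the text

# Removes every match of the given pattern from the text.
def trim_with(pattern, text)
  text.gsub(pattern, "")
end

sample = "  a b  "
p trim_with(AMBIGUOUS_TRIM, sample) == sample.strip
p trim_with(ANCHORED_TRIM, sample) == sample.strip
```

Both comparisons hold for this input. Note that `^` and `$` are line anchors in Ruby, so on multi-line strings both patterns trim every line, which `strip` does not do.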

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			/^0\.\d+E?\d+$/ # BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: /^0\.\d+(E\d+)?$/.
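
The two patterns can be compared on a few short inputs. A minimal sketch with hypothetical constant names; the sample strings are chosen for illustration only.

```ruby
OVERLAPPING = /^0\.\d+E?\d+$/     # the pattern flagged above
REWRITTEN   = /^0\.\d+(E\d+)?$/   # the suggested rewrite

# Print whether each sample matches the original and the rewritten pattern.
["0.25", "0.5E10", "0.5", "0.5E"].each do |sample|
  puts "#{sample}: #{OVERLAPPING.match?(sample)} / #{REWRITTEN.match?(sample)}"
end
```

The rewrite is slightly more permissive: it accepts a single fractional digit such as "0.5", which the original rejects because each of its two \d+ sub-expressions needs at least one digit.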

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

ruby/ql/src/queries/security/cwe-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to n^k or even 2ⁿ. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

      /^_(__|.)+_$/

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

      /^_(__|[^_])+_$/
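
Before swapping one pattern for the other, it is worth checking that the rewrite still accepts the intended strings. A small sketch with hypothetical constant names, using short inputs only:

```ruby
AMBIGUOUS   = /^_(__|.)+_$/      # "." overlaps with "__"
UNAMBIGUOUS = /^_(__|[^_])+_$/   # the branches can no longer match the same text

# Print whether each sample matches the original and the rewritten pattern.
["_a_", "_ab_", "____", "___", "_a"].each do |sample|
  puts "#{sample}: #{AMBIGUOUS.match?(sample)} / #{UNAMBIGUOUS.match?(sample)}"
end
```

The patterns agree on most of these inputs, but not on "___": the original accepts it by matching the middle underscore with ".", while the rewrite only allows underscores inside the group in pairs.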

References

OWASP: Regular expression Denial of Service - ReDoS.
Wikipedia: ReDoS.
Wikipedia: Time complexity.
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack.
Common Weakness Enumeration: CWE-1333.
Common Weakness Enumeration: CWE-730.
Common Weakness Enumeration: CWE-400.

ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitrary objects. For libraries that support it, such as the Ruby standard library's JSON module, ensure that the parser is configured to disable deserialization of arbitrary objects.

Example

The following example calls the Marshal.load, JSON.load, and YAML.load methods on data from an HTTP request. Since these methods are capable of deserializing to arbitrary objects, this is inherently unsafe.

require 'json'
require 'yaml'

class UserController < ActionController::Base
  def marshal_example
    data = Base64.decode64 params[:data]
    object = Marshal.load data
    # ...
  end

  def json_example
    object = JSON.load params[:json]
    # ...
  end

  def yaml_example
    object = YAML.load params[:yaml]
    # ...
  end
end

Using JSON.parse and YAML.safe_load instead, as in the following example, removes the vulnerability. Note that there is no safe way to deserialize untrusted data using Marshal.

require 'json'
require 'yaml'

class UserController < ActionController::Base
  def safe_json_example
    object = JSON.parse params[:json]
    # ...
  end

  def safe_yaml_example
    object = YAML.safe_load params[:yaml]
    # ...
  end
end
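
The difference can be observed without a web framework. The sketch below uses only the standard library; the helper names are hypothetical, and the tagged YAML document is a harmless stand-in for an attacker-supplied payload.

```ruby
require "json"
require "yaml"

# JSON.parse builds only hashes, arrays, strings, numbers, booleans and nil by default.
def parse_untrusted_json(text)
  JSON.parse(text)
end

# YAML.safe_load refuses tags that would instantiate arbitrary Ruby classes.
def parse_untrusted_yaml(text)
  YAML.safe_load(text)
end

p parse_untrusted_json('{"role": "admin"}')
p parse_untrusted_yaml("role: admin")

begin
  parse_untrusted_yaml("--- !ruby/object:Object {}\n")
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.class}"
end
```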

References

OWASP vulnerability description: deserialization of untrusted data.
Ruby documentation: guidance on deserializing objects safely.
Ruby documentation: security guidance on the Marshal library.
Ruby documentation: security guidance on JSON.load.
Ruby documentation: security guidance on the YAML library.
Common Weakness Enumeration: CWE-502.

ruby/ql/src/queries/security/cwe-601/UrlRedirect.qhelp

URL redirection from remote source

Directly incorporating user input into a URL redirect request without validating the input can facilitate phishing attacks. In these attacks, unsuspecting users can be redirected to a malicious site that looks very similar to the real site they intend to visit, but which is controlled by the attacker.

Recommendation

To guard against untrusted URL redirection, it is advisable to avoid putting user input directly into a redirect URL. Instead, maintain a list of authorized redirects on the server; then choose from that list based on the user input provided.

Example

The following example shows an HTTP request parameter being used directly in a URL redirect without validating the input, which facilitates phishing attacks:

class HelloController < ActionController::Base
  def hello
    redirect_to params[:url]
  end
end

One way to remedy the problem is to validate the user input against a known fixed string before doing the redirection:

class HelloController < ActionController::Base
  VALID_REDIRECT = "http://cwe.mitre.org/data/definitions/601.html"

  def hello
    if params[:url] == VALID_REDIRECT
      redirect_to params[:url]
    else
      # error
    end
  end
end
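
The recommendation to choose from a server-side list can be sketched as a lookup keyed by a short identifier, so that the client never supplies a URL at all. `REDIRECT_TARGETS`, `redirect_target`, and the paths are hypothetical.

```ruby
# Hypothetical server-side list of permitted redirect destinations.
REDIRECT_TARGETS = {
  "home" => "/",
  "help" => "/help"
}.freeze

# Returns the destination for a known key and falls back to the home page otherwise.
def redirect_target(key)
  REDIRECT_TARGETS.fetch(key, "/")
end
```

A controller action could then call `redirect_to redirect_target(params[:target])`; an attacker-supplied URL simply fails the lookup and yields the fallback.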

References

OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Rails Guides: Redirection and Files.
Common Weakness Enumeration: CWE-601.

ruby/ql/src/queries/security/cwe-732/WeakFilePermissions.qhelp

Overly permissive file permissions

When creating a file, POSIX systems allow permissions to be specified for owner, group and others separately. Permissions should be kept as strict as possible, preventing access to the file's contents by other users.

Recommendation

Restrict the permissions of the file so that only its owner can read or write it.
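
As an illustration, a file can be created with owner-only permissions by passing the mode when opening it. This is a sketch for POSIX systems; `write_private_file` and the file name are hypothetical.

```ruby
require "tmpdir"

# Creates the file with mode 0600 so that only the owner can read or write it.
# File::EXCL makes the call fail rather than reuse a file that already exists.
def write_private_file(path, contents)
  File.open(path, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |file|
    file.write(contents)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "secret.txt")
  write_private_file(path, "token")
  # Show the permission bits that were actually applied.
  printf("%o\n", File.stat(path).mode & 0o777)
end
```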

References

Wikipedia: File system permissions.
Common Weakness Enumeration: CWE-732.

ruby/ql/src/queries/security/cwe-798/HardcodedCredentials.qhelp

Hard-coded credentials

Including unencrypted hard-coded inbound or outbound authentication credentials within source code or configuration files is dangerous because the credentials may be easily discovered.

Source or configuration files containing hard-coded credentials may be visible to an attacker. For example, the source code may be open source, or it may be leaked or accidentally revealed.

For inbound authentication, hard-coded credentials may allow unauthorized access to the system. This is particularly problematic if the credential is hard-coded in the source code, because it cannot be disabled easily. For outbound authentication, the hard-coded credentials may provide an attacker with privileged information or unauthorized access to some other system.

Recommendation

Remove hard-coded credentials, such as user names, passwords and certificates, from source code, placing them in configuration files or other data stores if necessary. If possible, store configuration files including credential data separately from the source code, in a secure location with restricted access.

For outbound authentication details, consider encrypting the credentials or the enclosing data stores or configuration files, and using permissions to restrict access.

For inbound authentication details, consider hashing passwords using standard library functions where possible, for example OpenSSL::KDF.pbkdf2_hmac.
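A minimal sketch of how a salted hash could be produced with `OpenSSL::KDF.pbkdf2_hmac` before being stored in a configuration file. The helper name `hash_password`, the 16-byte salt, the iteration count, and the key names in `record` are assumptions chosen for illustration.

```ruby
require 'openssl'
require 'securerandom'

# Hypothetical helper: derive a hex-encoded PBKDF2-HMAC-SHA256 hash.
def hash_password(password, salt, iterations: 100_000)
  digest = OpenSSL::Digest::SHA256.new
  derived_key = OpenSSL::KDF.pbkdf2_hmac(
    password,
    salt: salt,
    iterations: iterations,
    length: digest.digest_length,
    hash: digest
  )
  # Hex-encode the raw bytes so they can be stored as text.
  derived_key.unpack1('H*')
end

# Generate a fresh random salt and build the values to store.
salt = SecureRandom.random_bytes(16)
record = {
  'salt' => salt.unpack1('H*'),
  'hashed_password' => hash_password('example password', salt)
}
```

Only the salt and the derived hash are stored; the plaintext password never needs to appear in source code or configuration.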

Example

The following examples show different types of inbound and outbound authentication.

In the first case, RackAppBad, we accept a password from a remote user, and compare it against a plaintext string literal. If an attacker acquires the source code they can observe the password, and can log in to the system. Furthermore, if such an intrusion was discovered, the application would need to be rewritten and redeployed in order to change the password.

In the second case, RackAppGood, the password is compared to a hashed and salted password stored in a configuration file, using OpenSSL::KDF.pbkdf2_hmac. In this case, access to the source code would not reveal the password to an attacker. Even access to the configuration file containing the password hash and salt would be of little value to an attacker, as it is usually extremely difficult to recover the password from the hash and salt. In a real application, care should be taken to make the string comparison of the hashed input against the hashed password take close to constant time, as this will make timing attacks more difficult.

require 'rack'
require 'yaml'
require 'openssl'

class RackAppBad
  def call(env)
    req = Rack::Request.new(env)
    password = req.params['password']

    # BAD: Inbound authentication made by comparison to string literal
    if password == 'myPa55word'
      [200, {'Content-type' => 'text/plain'}, ['OK']]
    else
      [403, {'Content-type' => 'text/plain'}, ['Permission denied']]
    end
  end
end

class RackAppGood
  def call(env)
    req = Rack::Request.new(env)
    password = req.params['password']

    config_file = YAML.load_file('config.yml')
    hashed_password = config_file['hashed_password']
    salt = [config_file['salt']].pack('H*')

    # GOOD: Inbound authentication made by comparing to a hashed password from a config file.
    hash = OpenSSL::Digest::SHA256.new
    dk = OpenSSL::KDF.pbkdf2_hmac(
      password, salt: salt, hash: hash, iterations: 100_000, length: hash.digest_length
    )
    hashed_input = dk.unpack('H*').first
    if hashed_password == hashed_input
      [200, {'Content-type' => 'text/plain'}, ['OK']]
    else
      [403, {'Content-type' => 'text/plain'}, ['Permission denied']]
    end
  end
end
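The example above compares the two hashes with `==`. The constant-time comparison mentioned in the description could be sketched as follows, assuming a version of the `openssl` library that provides `OpenSSL.fixed_length_secure_compare` (openssl gem 2.2 or later); `digests_match?` is a hypothetical helper name.

```ruby
require 'openssl'

# Hypothetical helper: compare two hex digests without revealing, through
# timing, how many leading characters match.
def digests_match?(expected_hex, actual_hex)
  # fixed_length_secure_compare raises ArgumentError when the lengths
  # differ, so check the length first. Digest lengths are not secret.
  return false unless expected_hex.bytesize == actual_hex.bytesize

  OpenSSL.fixed_length_secure_compare(expected_hex, actual_hex)
end
```

In RackAppGood, the condition `hashed_password == hashed_input` would then become `digests_match?(hashed_password, hashed_input)`.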

References

OWASP: Use of hard-coded password.
Common Weakness Enumeration: CWE-259.
Common Weakness Enumeration: CWE-321.
Common Weakness Enumeration: CWE-798.

dbartol

Only a couple questions around new workflows that I thought we already had elsewhere.

dbartol · 2021-10-31T12:44:07Z

.github/workflows/qhelp-pr-preview.yml

@@ -0,0 +1,39 @@
+name: Query help preview


Don't we already have a job that does this for all languages?

There is no workflow for that in this repository. There is another indirect check that verifies qhelp files; however, the logs of that job are not available to open source developers. In addition, this job posts the markdown previews as a comment in the pull request for easy reviewing. See for example: #6955 (comment) ;-)

dbartol · 2021-10-31T12:44:44Z

.github/workflows/sync-files.yml

@@ -0,0 +1,20 @@
+name: Check synchronized files


Doesn't a workflow like this already exist?

Not in the codeql repository. There is likely an indirect check in the closed source CLI repository.

This probably isn't the right branch to be adding new workflows, but it seems both harmless and useful, so let's keep it.

aibaars · 2021-10-31T13:14:48Z

Only a couple questions around new workflows that I thought we already had elsewhere.

That's true. I don't think that is a problem for this pull request though.

dbartol

I'm OK with the two new workflows, and everything else looks fine.

alexrford and others added 30 commits August 11, 2021 16:54

Merge pull request #251 from github/aibaars/test

0f6c464

Add integration test

sync ReDoSUtil.qll with python/JS

8bd663a

add RegExpSubPattern.getOperand

5e63b0b

CFG: Allow erb top-level scopes

394c27a

exclude beta releases of code-cli for qltest job

8427a6b

Merge pull request #258 from github/qltest-no-beta

115a13f

Exclude beta releases of code-cli for qltest job

Merge pull request #257 from github/hvitved/cfg/erb

50cfd9c

CFG: Allow `erb` top-level scopes

Add an example snippet query

9b877dc

Merge pull request #246 from github/aibaars/tweaks

df4fb23

Add an example snippet query

Merge pull request #259 from github/hmac-print-ast

a2115f4

Don't include desugared nodes in the printed AST

Merge pull request #256 from github/syncRedos

9c17e00

sync ReDoSUtil.qll with python/JS

Implement getPrimaryQlClasses

5e783e4

extend modelling of ActionController, and start modelling ActionView

41ff10c

extend ActionController tests

d628716

tests

e403fc7

remove ErbFile refs

abc283e

Bump tree-sitter versions to pick up parsing fixes

289b59d

Particularly, in tree-sitter-embedded-template

Add ERB comment as regression test for parsing bug

bc06817

Use published crate for tree-sitter-ruby 0.19

3b0055a

Enable dependabot on the Rust projects

0bd7e59

Add a dependabot.yml file to trigger daily dependabot updates on the four Rust projects in the codebase:
- `node_types`
- `generator`
- `extractor`
- `autobuilder`

Merge pull request #264 from github/hmac-dependabot

4cbd848

Enable dependabot on the Rust projects

Merge pull request #263 from github/bump_ts

ffd80fc

Bump tree-sitter versions to pick up parsing fixes

Clean up how we map between Rails actions and default associated template files

4a4b244

use toUnicode in ReDoSUtil.qll

ff27a0c

Merge pull request #267 from github/erik-krogh/redosUnicode

4ec30b2

use toUnicode in ReDoSUtil.qll

improve ActionControllerHelperMethod doc

a3ae5bc

drop ViewComponent parts from the ActionView library

9571e7b

Add DB upgrade script check

42daf5b

Merge pull request #268 from github/hvitved/db-upgrade-pr-check

348b12c

Add DB upgrade script check

hvitved and others added 12 commits September 29, 2021 14:17

Merge pull request #317 from github/hvitved/disable-operation-resolution

c69762b

Temporarily disable operation call resolution

Merge pull request #320 from github/rasmuswl/fix-hasLocationInfo-url

1d58f8c

Move files to ruby subfolder

ba32c54

Remove CodeSpaces configuration

d2ea732

Move create-extractor-pack Action

068beef

Update Ruby workflows

2de7573

Drop LICENSE and CODE_OF_CONDUCT.md

3554e8d

Update dependabot config

8ce7b28

Merge remote-tracking branch 'codeql-ruby/rc/3.3' into codeql/rc/3.3

7741a72

Add ruby to CODEOWNERS

aeb9ace

Update CodeSpaces configuration

ddbba40

Remove github/codeql submodule

1bf4542

github-actions bot added the documentation label Oct 25, 2021

aibaars and others added 4 commits October 25, 2021 17:01

Merge identical-files.json

de38570

Add a queries.xml file (for CWE coverage) docs

b23b3c3

Move queries.xml to src

8cd86ae

Fix CI jobs

b79f8f1

aibaars force-pushed the codeql-ruby-3.3 branch from 1bff15f to b79f8f1 Compare October 25, 2021 15:03

aibaars changed the title ~~Codeql ruby 3.3~~ RC 3.3: merge codeql-ruby repository into github/codeql Oct 27, 2021

aibaars marked this pull request as ready for review October 27, 2021 18:56

aibaars requested a review from dbartol October 27, 2021 18:56

aibaars assigned dbartol Oct 27, 2021

aibaars added the no-change-note-required This PR does not need a change note label Oct 27, 2021

github deleted a comment from github-actions bot Oct 27, 2021

dbartol reviewed Oct 31, 2021

dbartol approved these changes Nov 2, 2021

dbartol merged commit d828ab7 into rc/3.3 Nov 2, 2021

dbartol deleted the codeql-ruby-3.3 branch November 2, 2021 13:57
