-
Notifications
You must be signed in to change notification settings - Fork 1.8k
RC 3.3: merge codeql-ruby repository into github/codeql #6955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add integration test
Exclude beta releases of code-cli for qltest job
CFG: Allow `erb` top-level scopes
Add an example snippet query
The base `PrintAstConfiguration` class already has a predicate for filtering out desugared nodes - this change just makes use of it in the query. This fixes https://github.com/github/codeql-team/issues/408, which was caused by including nodes representing the desugaring of a[b] = c in the query output. This would result in multiple edges to the same target node (one from the surface AST and another from the desugared AST), which the VSCode AST viewer cannot handle.
Don't include desugared nodes in the printed AST
sync ReDoSUtil.qll with python/JS
Particularly, in tree-siter-embedded-template
Add a dependabot.yml file to trigger daily dependabot updates on the four Rust projects in the codebase: - `node_types` - `generator` - `extractor` - `autobuilder`
Enable dependabot on the Rust projects
Bump tree-sitter versions to pick up parsing fixes
use toUnicode in ReDoSUtil.qll
Add DB upgrade script check
Temporarily disable operation call resolution
1bff15f
to
b79f8f1
Compare
QHelp previews: javascript/ql/src/Performance/PolynomialReDoS.qhelpPolynomial regular expression used on uncontrolled dataSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this use of a regular expression, which removes all leading and trailing whitespace in a string: text.replace(/^\s+|\s+$/g, ''); // BAD
The sub-expression This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ( Note that the sub-expression ExampleAs a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation: ^0\.\d+E?\d+$ // BAD
The problem with this regular expression is in the sub-expression This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity. To make the processing faster, the regular expression should be rewritten such that the two References
javascript/ql/src/Performance/ReDoS.qhelpInefficient regular expressionSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this regular expression: /^_(__|.)+_$/
Its sub-expression This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition: /^_(__|[^_])+_$/
References
python/ql/src/experimental/Security/CWE-730/PolynomialReDoS.qhelpPolynomial regular expression used on uncontrolled dataSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this use of a regular expression, which removes all leading and trailing whitespace in a string: re.sub(r"^\s+|\s+$", "", text) # BAD
The sub-expression This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ( Note that the sub-expression ExampleAs a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation: ^0\.\d+E?\d+$ # BAD
The problem with this regular expression is in the sub-expression This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity. To make the processing faster, the regular expression should be rewritten such that the two References
python/ql/src/experimental/Security/CWE-730/ReDoS.qhelpInefficient regular expressionSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this regular expression: ^_(__|.)+_$
Its sub-expression This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition: ^_(__|[^_])+_$
References
ruby/ql/src/queries/security/cwe-078/CommandInjection.qhelpUncontrolled command lineCode that passes user input directly to RecommendationIf possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals. If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it. ExampleThe following example shows code that takes a shell script that can be changed maliciously by a user, and passes it straight to class UsersController < ActionController::Base
def create
command = params[:command]
system(command) # BAD
end
end References
ruby/ql/src/queries/security/cwe-079/ReflectedXSS.qhelpReflected server-side cross-site scriptingDirectly writing user input (for example, an HTTP request parameter) to a webpage, without properly sanitizing the input first, allows for a cross-site scripting vulnerability. RecommendationTo guard against cross-site scripting, escape user input before writing it to the page. Some frameworks, such as Rails, perform this escaping implicitly and by default. Take care when using methods such as ExampleThe following example is safe because the
However, the following example is unsafe because user-controlled input is emitted without escaping, since it is marked as
References
ruby/ql/src/queries/security/cwe-089/SqlInjection.qhelpSQL query built from user-controlled sourcesIf a database query (such as a SQL or NoSQL query) is built from user-provided data without sufficient sanitization, a malicious user may be able to run malicious database queries. RecommendationMost database connector libraries offer a way of safely embedding untrusted data into a query by means of query parameters or prepared statements. ExampleIn the following Rails example, an The user is specified using a parameter, The method illustrates three different ways to construct and execute an SQL query to find the user by name. In the first case, the parameter The second case uses string concatenation and is vulnerable in the same way that the first case is. In the third case, the name is passed in a hash instead. class UserController < ActionController::Base
def text_bio
# BAD -- Using string interpolation
user = User.find_by "name = '#{params[:user_name]}'"
# BAD -- Using string concatenation
find_str = "name = '" + params[:user_name] + "'"
user = User.find_by(find_str)
# GOOD -- Using a hash to parameterize arguments
user = User.find_by name: params[:user_name]
render plain: user&.text_bio
end
end References
ruby/ql/src/queries/security/cwe-094/CodeInjection.qhelpCode injectionDirectly evaluating user input (for example, an HTTP request parameter) as code without first sanitizing the input allows an attacker arbitrary code execution. This can occur when user input is passed to code that interprets it as an expression to be evaluated, using methods such as RecommendationAvoid including user input in any expression that may be dynamically evaluated. If user input must be included, use context-specific escaping before including it. It is important that the correct escaping is used for the type of evaluation that will occur. ExampleThe following example shows two functions setting a name from a request. The first function uses class UsersController < ActionController::Base
# BAD - Allow user to define code to be run.
def create_bad
first_name = params[:first_name]
eval("set_name(#{first_name})")
end
# GOOD - Call code directly
def create_good
first_name = params[:first_name]
set_name(first_name)
end
def set_name(name)
@name = name
end
end References
ruby/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelpPolynomial regular expression used on uncontrolled dataSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this use of a regular expression, which removes all leading and trailing whitespace in a string: text.gsub!(/^\s+|\s+$/, '') # BAD
The sub-expression This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ( Note that the sub-expression ExampleAs a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation: /^0\.\d+E?\d+$/ # BAD
The problem with this regular expression is in the sub-expression This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity. To make the processing faster, the regular expression should be rewritten such that the two References
ruby/ql/src/queries/security/cwe-1333/ReDoS.qhelpInefficient regular expressionSome regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower. Typically, a regular expression is affected by this problem if it contains a repetition of the form RecommendationModify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. ExampleConsider this regular expression: /^_(__|.)+_$/
Its sub-expression This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition: /^_(__|[^_])+_$/
References
ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelpDeserialization of user-controlled dataDeserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code. RecommendationAvoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitarary objects. For libraries that support it, such as the Ruby standard library's ExampleThe following example calls the require 'json'
require 'yaml'
class UserController < ActionController::Base
def marshal_example
data = Base64.decode64 params[:data]
object = Marshal.load data
# ...
end
def json_example
object = JSON.load params[:json]
# ...
end
def yaml_example
object = YAML.load params[:yaml]
# ...
end
end Using require 'json'
class UserController < ActionController::Base
def safe_json_example
object = JSON.parse params[:json]
# ...
end
def safe_yaml_example
object = YAML.safe_load params[:yaml]
# ...
end
end References
ruby/ql/src/queries/security/cwe-601/UrlRedirect.qhelpURL redirection from remote sourceDirectly incorporating user input into a URL redirect request without validating the input can facilitate phishing attacks. In these attacks, unsuspecting users can be redirected to a malicious site that looks very similar to the real site they intend to visit, but which is controlled by the attacker. RecommendationTo guard against untrusted URL redirection, it is advisable to avoid putting user input directly into a redirect URL. Instead, maintain a list of authorized redirects on the server; then choose from that list based on the user input provided. ExampleThe following example shows an HTTP request parameter being used directly in a URL redirect without validating the input, which facilitates phishing attacks: class HelloController < ActionController::Base
def hello
redirect_to params[:url]
end
end One way to remedy the problem is to validate the user input against a known fixed string before doing the redirection: class HelloController < ActionController::Base
VALID_REDIRECT = "http://cwe.mitre.org/data/definitions/601.html"
def hello
if params[:url] == VALID_REDIRECT
redirect_to params[:url]
else
# error
end
end
end References
ruby/ql/src/queries/security/cwe-732/WeakFilePermissions.qhelpOverly permissive file permissionsWhen creating a file, POSIX systems allow permissions to be specified for owner, group and others separately. Permissions should be kept as strict as possible, preventing access to the files contents by other users. RecommendationRestrict the file permissions of files to prevent any but the owner being able to read or write to that file References
ruby/ql/src/queries/security/cwe-798/HardcodedCredentials.qhelpHard-coded credentialsIncluding unencrypted hard-coded inbound or outbound authentication credentials within source code or configuration files is dangerous because the credentials may be easily discovered. Source or configuration files containing hard-coded credentials may be visible to an attacker. For example, the source code may be open source, or it may be leaked or accidentally revealed. For inbound authentication, hard-coded credentials may allow unauthorized access to the system. This is particularly problematic if the credential is hard-coded in the source code, because it cannot be disabled easily. For outbound authentication, the hard-coded credentials may provide an attacker with privileged information or unauthorized access to some other system. RecommendationRemove hard-coded credentials, such as user names, passwords and certificates, from source code, placing them in configuration files or other data stores if necessary. If possible, store configuration files including credential data separately from the source code, in a secure location with restricted access. For outbound authentication details, consider encrypting the credentials or the enclosing data stores or configuration files, and using permissions to restrict access. For inbound authentication details, consider hashing passwords using standard library functions where possible. For example, ExampleThe following examples shows different types of inbound and outbound authentication. In the first case, In the second case, require 'rack'
require 'yaml'
require 'openssl'
class RackAppBad
def call(env)
req = Rack::Request.new(env)
password = req.params['password']
# BAD: Inbound authentication made by comparison to string literal
if password == 'myPa55word'
[200, {'Content-type' => 'text/plain'}, ['OK']]
else
[403, {'Content-type' => 'text/plain'}, ['Permission denied']]
end
end
end
class RackAppGood
def call(env)
req = Rack::Request.new(env)
password = req.params['password']
config_file = YAML.load_file('config.yml')
hashed_password = config_file['hashed_password']
salt = [config_file['salt']].pack('H*')
#GOOD: Inbound authentication made by comparing to a hash password from a config file.
hash = OpenSSL::Digest::SHA256.new
dk = OpenSSL::KDF.pbkdf2_hmac(
password, salt: salt, hash: hash, iterations: 100_000, length: hash.digest_length
)
hashed_input = dk.unpack('H*').first
if hashed_password == hashed_input
[200, {'Content-type' => 'text/plain'}, ['OK']]
else
[403, {'Content-type' => 'text/plain'}, ['Permission denied']]
end
end
end References
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a couple questions around new workflows that I thought we already had elsewhere.
@@ -0,0 +1,39 @@ | |||
name: Query help preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we already have a job that does this for all languages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no workflow for that in this repository. There is another indirect check that verifies qhelp files, however, the logs of that job are not available to open source developers. In addition this job posts the markdown previews as a comment in the pull request for easy reviewing. See for example: #6955 (comment) ;-)
@@ -0,0 +1,20 @@ | |||
name: Check synchronized files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't a workflow like this already exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in the codeql repository. There is likely an indirect check in the closed source CLI repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably isn't the right branch to be adding new workflows, but it seems both harmless and useful, so let's keep it.
That's true. I don't think that is a problem for this pull request though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with the two new workflows, and everything else looks fine.
@dbartol This pull request replays the steps I took to merge codeql-ruby into this repository. In this case I used the rc/3.3 branch of codeql and the codeql-ruby SHA of the 2.6.3 CLI release as starting points.
Below is the list of commits I made; 7741a72 is the actual merge, and the rest are small tweaks to make CI work smoothly etc. The changes are best reviewed on a per-commit basis.
b79f8f1 (HEAD -> codeql-ruby-3.3) Fix CI jobs
8cd86ae Move queries.xml to
src
b23b3c3 Add a queries.xml file (for CWE coverage) docs
de38570 Merge identical-files.json
1bf4542 Remove github/codeql submodule
ddbba40 Update CodeSpaces configuration
aeb9ace Add ruby to CODEOWNERS
7741a72 Merge remote-tracking branch 'codeql-ruby/rc/3.3' into codeql/rc/3.3
8ce7b28 Update dependabot config
3554e8d Drop LICENSE and CODE_OF_CONDUCT.md
2de7573 Update Ruby workflows
068beef Move create-extractor-pack Action
d2ea732 Remove CodeSpaces configuration
ba32c54 Move files to ruby subfolder
tip of codeql rc/3.3 702c647 (origin/rc/3.3) Merge pull request #6904 from shati-patel/ruby-query-help
tip of codeql-ruby 1d58f8cd50 (tag: codeql-cli/v2.6.3) Merge pull request #320 from github/rasmuswl/fix-hasLocationInfo-url