Skip to content

SafeQuery Explained

Ben Yu edited this page May 29, 2024 · 18 revisions

Why SafeQuery?

The StringFormat class offers compile-time safety with interpolation-style string formatting, using named placeholders.

What you shouldn't use it for, is to create SQL queries, because string interpolation with untrusted input can lead to SQL injection attack. For example:

private static final StringFormat FIND_USER_BY_ID =
    new StringFormat("SELECT * FROM Users WHERE user_id = '{user_id}'");

...
String query = FIND_USER_BY_ID.format(userIdInput);

If the userIdInput comes from untrusted sources, it can be used to steal information about other users. The attacker can just send a string like "' OR user_id = 'victim", and then they can see all information about the victim.

Instead, consider using the com.google.mu.safesql package:

SafeQuery query = SafeQuery.of(
    "SELECT * FROM Users WHERE user_id = '{user_id}'", userIdInput);

Benefits provided by the safesql package:

  • SafeQuery automatically escapes special characters to prevent injection attack.
  • You can interpolate not only literal values, but tables, columns and even sub-queries.
  • The same set of compile-time checks ensure that you can't make human errors (like passing the user password as the user id).
  • Extra sql-aware compile-time checks ensure that your SQL template is sane.
  • You can compose smaller SafeQuery objects to create large SafeQuery objects, making it easier to manage complex queries.
  • The GoogleSql class provides extra GoogleSQL conversions such as translating java.time.Instant to GoogleSQL's TIMESTAMP() function. You can use GoogleSql to generate SafeQuery objects for e.g. BigQuery.

Effectively Use SafeQuery

If you are using Maven, make sure to put mug-errorprone artifact in your annotationProcessorPath in order to get the compile-time guardrails.

Then refer to the javadoc of SafeQuery and GoogleSql for the specifics of the API.

Overall, there are a few best practices and examples worth mentioning.

Pass Around SafeQuery Parameters Instead of Raw String

Create a client library that wraps the interface with the underlying database. The interface should accept SafeQuery as the input instead of raw String so that you can better enforce the protection. The client library internally can call SafeQuery#toString() to get the wrapped query, but the callers of the client library must all pass in SafeQuery. Example:

class BigQueryClient {

  public BigQueryResponse sendQuery(SafeQuery query) {
    // ...
  }
}

...

// a caller that calls BigQuery
private static final StringFormat.Template<SafeQuery> JOBS_TIMELINE_BY_PROJECT_ID =
    GoogleSql.template(
        """
          SELECT period_start, period_slot_ms
          FROM INFORMATION_SCHEMA.JOBS_TIMELINE
          WHERE period_start BETWEEN {start_time} AND {end_time}
              AND project_id = '{project_id}'
        """);

private final BigQueryClient bigQuery;

// ...
  Instant startTime = ...;
  Instant endTime = ...;
  String projectId = ...;
  var response = bigQuery.sendQuery(
      JOBS_TIMELINE_BY_PROJECT_ID.with(startTime, endTime, projectId));

Quote String Literals in the Template

If the query expects a string literal, do quote the placeholder in the template, like the '{project_id}' in the above example. Failing to do so will not compile and even if you manage to fool the compile-time checks, SafeQuery will throw IllegalArgumentException.

This makes the query template clearer in terms of what's expected to be string.

Backtick-quote Identifiers in the Template

To parameterize by table name, column names etc., use backticks to quote around the placeholder (or else you get a compile-time error).

For example:

private static StringFormat.To<SafeQuery> SLOT_MS_PER_PERIOD =
    GoogleSql.template(
        """
        SELECT sum(period_slot_ms) as total_slot_ms FROM `{timeline_table}`
        GROUP BY period_start
        """);

// ...
  SafeQuery query = SLOT_MS_PER_PERIOD.with("JOBS_TIMELINE_BY_PROJECT");

Compose SafeQuery

The above example parameterizes the query by table name, but sometimes you might want to pass in a sub query instead.

For that, you can use a smaller SafeQuery object to interpolate the larger SafeQuery. Example:

private static StringFormat.Template<SafeQuery> WINDOW_TIMELINE =
    GoogleSql.template(
        """
        SELECT * FROM ({timeline_table})
        WHERE period_start BETWEEN {start_time} AND {end_time}
        """);
// ...
  SafeQuery timelineInWindow = WINDOW_TIMELINE.with(timelineTable, startTime, endTime);
  SafeQuery query = GoogleSql.from(
      """
      SELECT * FROM ({timeline_table})
      ORDER BY period_start
      """,
      timelineInWindow);

Encapsulate Trusted Input

SafeQuery and GoogleSql only allow types that are known to be safe (numbers, enums, Instant, ZonedDateTime, LocalDate, compile-time string literals, quoted string values, backtick-quoted identifiers etc., and other SafeQuery objects).

But sometimes you may have string values that come from trusted sources (say, from a command-line flag). These string values are not compile-time string literals, so you won't be able to pass them to SafeQuery.of(), or use them as unquoted placeholder values.

As a work-around, you can create your own TrustedSql class. If you tightly control the construction of this class only from trusted sources, it'll be provably safe. You can then set the system property com.google.mu.safesql.SafeQuery.trusted_sql_type to this class name. SafeQuery will pick it up and objects of this type will be accepted.

For example:

package com.mycompany.safesql;

public final class TrustedSql {
  private final String sql;

  private TrustedSql(String sql) {
    this.sql = sql;
  }

  public static TrustedSql fromFlag(Flag<String> stringFlag) {
    return new TrustedSql(stringFlag.get());
  }

  public static TrustedSql loadFromResource(String resourceName)
      throws IOException {
    ...
  }

  /** The class is expected to be annotated with @Query("SELECT ..."). */
  public static TrustedSql loadFromQueryAnnotation(Class<?> annotatedClass) {
    ...
  }

  @Override public String toString() {
    return sql;
  }
}

// ...
SafeQuery query = SLOT_MS_PER_PERIOD.with(
    TrustedSql.loadFromResource("com/google/mycompany/all_job_timeline.sql"));

Iterables are Auto-expanded

As a convenience feature, when you pass an Iterable as the value of a template placeholder, it'll be automatically expanded, using , as the delimiter.

For example:

private static final StringFormat.To<SafeQuery> GET_JOBS_BY_ID =
    SafeQuery.template("SELECT * FROM Jobs WHERE job_id IN ({ids})");

...
List<Integer> ids = asList(1, 2, 3);
SafeQuery query = GET_JOBS_BY_ID.with(ids);
// produces "SELECT * FROM Jobs WHERE job_id IN (1, 2, 3)

If you expect the list elements to be string values, you need to quote the placeholder (the library will throw otherwise):

List<String> ids = asList("foo", "bar's");
SafeQuery query =
    SafeQuery.of("SELECT * FROM Jobs WHERE job_id IN ('{ids}')", ids);
// produces "SELECT * FROM Jobs WHERE job_id IN ('foo', 'bar\'s')