Skip to content

ENG-50887: Mask value using a masking config#227

Merged
siddhant2001 merged 28 commits intomainfrom
mask-unbofuscated-cookies
Oct 10, 2024
Merged

ENG-50887: Mask value using a masking config#227
siddhant2001 merged 28 commits intomainfrom
mask-unbofuscated-cookies

Conversation

@siddhant2001
Copy link
Contributor

@siddhant2001 siddhant2001 commented Sep 27, 2024

  • Write tests and verify correctness

@github-actions
Copy link

github-actions bot commented Sep 27, 2024

Test Results

330 tests  +5   330 ✅ +5   11s ⏱️ ±0s
 42 suites ±0     0 💤 ±0 
 42 files   ±0     0 ❌ ±0 

Results for commit d38e08c. ± Comparison against base commit e092fbf.

This pull request removes 72 and adds 18 tests. Note that renamed tests count towards both.
            long: 3600000
            valueType: LONG
          columnName: "SERVICE.startTime"
          long: 1570658506605
          long: 1727521612551
          long: 1727521614904
          long: 1727521615242
          long: 1727521633079
          long: 1727525212711
          long: 1727525214904
…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.spanTags"
      }
    }
    operator: CONTAINS_KEY
    rhs {
      literal {
        value {
          string: "span.kind"
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.spanTags"
        subpath: "span.kind"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          string: "client"
        }
      }
    }
  }
}
selection {
  at…, 10, server
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "API_TRACE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556715813
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "API_TRACE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560315969
        }
      }
    }
  }
  …
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "BACKEND.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718115
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "BACKEND.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318115
        }
      }
    }
  }
  chil…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718415
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318415
        }
      }
    }
  }
  chil…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "API_TRACE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556715813
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "API_TRACE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560315969
        }
      }
  …
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "BACKEND.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718115
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "BACKEND.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318115
        }
      }
    }
…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.startTime"
      }
    }
    operator: GT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1570658506605
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "EVENT.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 2570744906673
        }
      }
    }
  }
…, 2, server
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [2] filter {
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "SERVICE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556718415
        }
      }
    }
  }
  childFilter {
    lhs {
      attribute_expression {
        attributeId: "SERVICE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560318415
        }
      }
    }
…
org.hypertrace.core.query.service.htqueries.HTPinotQueriesTest ‑ [3] selection {
  function {
    functionName: "sum"
    arguments {
      attribute_expression {
        attributeId: "EVENT.spanTags"
        subpath: "otel.status_code"
      }
    }
  }
}
, 1, 0.0
org.hypertrace.core.query.service.htqueries.HTPostgresQueriesTest ‑ [1] filter {
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: GE
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728556736851
        }
      }
    }
  }
  childFilter {
    lhs {
      columnIdentifier {
        columnName: "SERVICE.startTime"
      }
    }
    operator: LT
    rhs {
      literal {
        value {
          valueType: LONG
          long: 1728560336851
        }
      }
    }
  }
  chil…
…

♻️ This comment has been updated with latest results.

tenantScopedMaskingCriteria = [
{
"tenantId": "testTenant",
"timeRangeAndMaskValues": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we can have multiple TimeRange conditions, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, each timeRange condition will have it's own masking values

# ]
# }
# ]
tenantScopedMaskingCriteria = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's comment it in main application.conf file here. As a default, there is no masking applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes done, you've reviewed an older commit

public class HandlerScopedMaskingConfig {
private static final String TENANT_SCOPED_MASKS_CONFIG_KEY = "tenantScopedMaskingCriteria";
private final Map<String, List<MaskValuesForTimeRange>> tenantToMaskValuesMap;
private HashMap<String, Boolean> shouldMaskAttribute = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is key in both these maps - shouldMaskAttribute and maskedValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeId is the key to both.
Removing shouldMaskAttribute as it is not needed.


private static boolean isTimeRangeOverlap(
MaskValuesForTimeRange timeRangeAndMasks, Instant queryStartTime, Instant queryEndTime) {
boolean timeRangeOverlap = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should be false, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following conditionals check for no overlap, i.e. they set the timeRangeOverlap to false. This statement is correct.

timeRangeOverlap = false;
}

Instant endTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getStartTimeMillis().get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be getEndTimeMillis?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, is this what we are looking as function?

if (timeRangeAndMasks.getStartTimeMillis().isPresent()) {
      
      Instant startTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getStartTimeMillis().get());
      Instant endTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getEndTimeMillis().get());

      if (!(startTimeInstant.isAfter(queryEndTime) || endTimeInstant.isBefore(queryStartTime))) {
        return true;
      }
    }
    return false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixed the condition.

}

public void parseColumns(ExecutionContext executionContext) {
shouldMaskAttribute.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the state is maintained per request, but we should only test against the timeRange condition.

Pre-compute using config:

  • tenantId -> List of timeRanges
  • timeRange -> set of attributes
  • timeRange -> map(attributeId, maskValue)

During response processing:

  • Check if any time range matches.
  • Pick the first match (or should we apply UNION?).
    • If UNION is used, and the same attribute is present in two time ranges with different mask values, which one should we consider? I guess any Should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how I've done it.
In case of attribute in multiple time ranges, I choose any value.


return Observable.fromIterable(rowBuilderList)
.map(Builder::build)
// .map(row -> handlerScopedMaskingConfig.mask(row))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

# "tenantId": "testTenant",
# "timeRangeAndMaskValues": [
# {
# "startTimeMillis": 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should take the timestamp as mandatory.

  • if startTime or endTime missing -> log an warn stating that the filter will be ignored.

if (indexToLogicalName.containsKey(colIdx)) {
return indexToLogicalName.get(colIdx);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove additional space.

return result;
}

String getLogicalNameFromColIdx(Integer colIdx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Lets see if we can use Optional as return type.


private static final String MASKED_VALUE = "*";
// This is how empty list is represented in Pinot
private static final String PINOT_EMPTY_LIST = "[\"\"]";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PINOT_EMPTY_LIST -> ARRAY_TYPE_MASKED_VALUE
MASKED_VALUE-> DEFAULT_MASKED_VALUE

public List<String> getMaskedAttributes(ExecutionContext executionContext) {
String tenantId = executionContext.getTenantId();
List<String> maskedAttributes = new ArrayList<>();
// maskedValue.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this.

this.tenantId = config.getString(TENANT_ID_CONFIG_KEY);
this.maskValues =
config.getConfigList(TIME_RANGE_AND_MASK_VALUES_CONFIG_KEY).stream()
.map(MaskValuesForTimeRange::new)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter out the empty maskings

// to retrieve data
String colVal = resultAnalyzer.getDataFromRow(rowId, logicalName);
String colVal =
!maskedAttributes.contains(logicalName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : get rid of the ! and invert to simplify


Observable<Row> convert(ResultSetGroup resultSetGroup, LinkedHashSet<String> selectedAttributes) {
Observable<Row> convert(ResultSetGroup resultSetGroup, ExecutionContext executionContext) {
LinkedHashSet<String> selectedAttributes = executionContext.getSelectedColumns();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this also inside resultSetGroup.getResultSetCount() > 0?

for (int colIdx = 0; colIdx < resultSet.getColumnCount(); colIdx++) {
for (int colIdx = 0, logicalColIdx = 0;
colIdx < resultSet.getColumnCount();
colIdx++, logicalColIdx++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need logicalColIdx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple map fields. When creating the idxToLogical name map, a map field would only increment the counter by one. Here each map field increments colIdx by 2. That's why I have created a new variable, which is incremented only once even if we go through a maps 2 columns (key and value)

}

Optional<String> getLogicalNameFromColIdx(Integer colIdx) {
return Optional.ofNullable(indexToLogicalName.get(colIdx));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On what scenario, can it be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it shouldn't be

@kotharironak kotharironak self-requested a review October 10, 2024 11:28
kotharironak
kotharironak previously approved these changes Oct 10, 2024
Copy link
Contributor

@kotharironak kotharironak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siddhant2001 siddhant2001 merged commit cd60ebb into main Oct 10, 2024
@siddhant2001 siddhant2001 deleted the mask-unbofuscated-cookies branch October 10, 2024 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants