Explore: Add transformations to correlation data links #61799
Conversation
force-pushed from 55724fb to fac0c56
Backend code coverage report for PR #61799
Frontend code coverage report for PR #61799
… variables between links
force-pushed from 65561b4 to b1cf69b
```typescript
if (link.internal?.transformations) {
  const fieldValue = field.values.get(rowIndex);
  link.internal?.transformations.forEach((transformation) => {
    if (transformation.type === 'regex' && transformation.expression) {
```
Should we implement this in a cleaner way? This has the potential to grow quite a bit; maybe break the logic for transformations out into its own file?
We will definitely need it in the future, and we could maybe move it to feature/correlations, but we can do that later. You could also do it now, as it may make it easier to test just the transformation logic.
Nice, I was able to extract fields from Loki nicely with this change. Probably worth adding some tests to dataLink.ts / links.ts.
.betterer.results (outdated)

```diff
@@ -3038,6 +3038,9 @@ exports[`better eslint`] = {
     [0, 0, 0, "Do not use any type assertions.", "0"],
     [0, 0, 0, "Do not use any type assertions.", "1"]
   ],
+  "public/app/features/correlations/transformations.ts:5381": [
+    [0, 0, 0, "Do not use any type assertions.", "0"]
```
Opened DefinitelyTyped/DefinitelyTyped#64263 to get rid of this.
```diff
@@ -192,7 +192,7 @@ export const safeParseJson = (text?: string): any | undefined => {
 };
 
 export const safeStringifyValue = (value: any, space?: number) => {
-  if (!value) {
+  if (value === undefined || value === null) {
```
This slightly changes the behaviour of a bunch of things, like the panel model inspector in dashboards and variables. Did we test that it's safe? (Especially for variables, as `false` and `0` would have resulted in an empty string before, but produce `"false"` and `"0"` now; not sure if that's even possible tho.)
I just couldn't imagine why we would want true to evaluate to "true" and false to evaluate to an empty string. Let's look at the usages of this function outside how it is used in this PR:

* `features/dashboard/state/PanelModel.ts`: passes the entire model into this function, and that model is a required parameter. I am extremely confident that PanelModel will never evaluate to false or 0; there are a lot of required fields in it.
* `features/variables/utils.ts`: safe-stringifies the first arg, concatenates all args together split by a space except the last one, and checks whether that string has matches for any of the three formats for variables. I do not believe that having false or 0 evaluate to an empty string or not will have any impact on this, because the safe-stringify only runs on the first argument, and in all cases a variable reference must start with either `$` or `[`, which is not possible in a false or 0 scenario.
* `features/variables/inspect/utils.ts`: similar to the above, we safe-stringify a value and check if it matches any of the variable patterns. False, 0, and empty string will all not match. That value is not used again; it will use the matching groups instead.
* `plugins/datasource/prometheus/datasource.tsx`: this will only run if the if statement depending on the same value is true, so if the value passed in is 0 or false, it will never run in the first place.

I'm very confident based on those usages that this logic will not impact anything.
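The before/after behaviour can be sketched like this (a minimal sketch; the function names and the JSON.stringify fallback are illustrative, not Grafana's exact code):

```typescript
// Minimal sketch contrasting the old truthiness guard with the new
// explicit null/undefined guard. Illustrative only, not Grafana's code.
const safeStringifyOld = (value: any, space?: number): string => {
  if (!value) {
    return ''; // falsy values such as false and 0 also short-circuit here
  }
  try {
    return JSON.stringify(value, null, space);
  } catch {
    return '';
  }
};

const safeStringifyNew = (value: any, space?: number): string => {
  if (value === undefined || value === null) {
    return ''; // only "no value" short-circuits; false and 0 now stringify
  }
  try {
    return JSON.stringify(value, null, space);
  } catch {
    return '';
  }
};

safeStringifyOld(false); // ''
safeStringifyNew(false); // 'false'
safeStringifyNew(0);     // '0'
```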
To document some discussion that happened off this PR: @Elfo404, @ifrost, and I looked at this solution alongside the original design doc and decided on a couple of things.
To this effect, I have kept the existing solution but implemented logic to take advantage of named capture groups. I have added a test to show how this works, and an example provisioning and CSV pair will be available below. However, when looking into this, I really would like reviewers to reconsider removing the regex variable name from what can be defined. In my opinion, in order of increasing complexity, regexes could fall into the following categories.

There could also be combinations of the above in the same regex, but the pattern would always be that named capture groups would get those names and be guaranteed to work, and any unnamed capture groups in the same expression would most likely not work. I think we should prioritize the user experience of that first category. I predict that users who can figure out named capture groups will understand that the names they define override what is set in their configuration, vs a user who is struggling to understand why their variable has to match their field name with no alternative. I also think that a variable definition gives us a reasonable path forward to supporting multiple unnamed capture groups.

In short (ha!), I think users will most likely take the path of creating 10 simple regex transformations rather than 1 complex regex expression, and we should prioritize making that as simple as possible. Currently the PR keeps the transformation variable name option. If you disagree with the above, I can remove it.

Example datasource provisioning yaml with named regex capture groups:
```yaml
- targetUID: WyFv5154z
  label: "Superhero complicated regex"
  description: "this is a test regex correlation from provisioning"
  config:
    type: query
    target:
      editorMode: "code"
      format: "table"
      rawQuery: "true"
      rawSql: "SELECT * FROM superhero WHERE name='$${name}' AND alignment='$${align}'"
      refId: "A"
    field: "text"
    transformations:
      - type: "regex"
        expression: "(?=.*(?<align>(good|bad)))(?=.*(?<name>(Superman|Batman)))"
```
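For illustration, here is a minimal sketch (not the PR's actual code; the variable names are my assumptions) of how named capture groups from an expression like the one above can be mapped to variables:

```typescript
// Minimal sketch: extract named capture groups from a field value into a
// map of variables. Illustrative only; the PR's real logic lives under
// public/app/features/correlations.
const expression = /(?=.*(?<align>(good|bad)))(?=.*(?<name>(Superman|Batman)))/;
const fieldValue = 'Superman is good';

const match = fieldValue.match(expression);
const scopedVars: Record<string, string> = {};
if (match) {
  const groups = (match.groups ?? {}) as Record<string, string | undefined>;
  for (const [key, value] of Object.entries(groups)) {
    // a named group that did not participate in the match is undefined
    if (value !== undefined) {
      scopedVars[key] = value;
    }
  }
}
// scopedVars → { align: 'good', name: 'Superman' }
```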
Also, none of this covers another use case that we are holding off on, which is when one capture group matches multiple times 🙃
It's gonna be difficult to have one transformation support all use cases. I like that we start with something powerful that can be simplified for the user later. Also, we know that regex is complex for many users anyway. That's why we're providing logfmt, and we can provide more in the future based on use cases (e.g. extracting key-value pairs, and labels). We don't want users to use only regex.

I can imagine we could add a new transformation (simpleregex) that would be the default, allowing users to write an expression and a name for the first match (basically what you had before the change). A simpleregex won't prevent us from doing transformations in the future; I can imagine a data frame transformation that would behave this way. The only thing we need to remember is to keep configuration decoupled from the current logic. So, as we said, we shouldn't use the "variable" property in the settings yet (but we could call it "name" to indicate it's the name of the first match).

Anyway, I feel it's powerful and flexible with multiple matches, and still open to go in many directions in the future.
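As a rough illustration of the logfmt direction (a sketch under my own assumptions, not the PR's implementation; real logfmt parsers such as go-logfmt handle more edge cases):

```typescript
// Sketch of a logfmt-style transformation: parse key=value pairs from a
// field value into variables. Illustrative only.
function parseLogfmt(line: string): Record<string, string> {
  const vars: Record<string, string> = {};
  // key="quoted value" or key=bareword
  const pair = /(\w+)=("[^"]*"|\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = pair.exec(line)) !== null) {
    const raw = m[2];
    // strip surrounding quotes from quoted values
    vars[m[1]] = raw.startsWith('"') ? raw.slice(1, -1) : raw;
  }
  return vars;
}

parseLogfmt('name="Clark Kent" hero=Superman');
// → { name: 'Clark Kent', hero: 'Superman' }
```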
@ifrost In my opinion, adding a variable name option doesn't tie configuration to logic any more than any other definitions we specify, but I trust in your vision of this feature and will remove it. Additionally, this limitation now means users cannot do multiple transformations with unnamed capture groups; it will override the fieldName variable with the last transformation. Again, I anticipate the most likely use case will be multiple simple regexes, and this will prohibit that from working without defining the capture group name.
I was only concerned about the naming. Let's say we have provisioning looking like:

```yaml
- name: testData-correlations
  correlations:
    - targetUID: WyFv5154z
      config:
        type: query
        target: ...
        field: "text"
        transformations:
          - type: "regex"
            expression: "(Superman|Batman)"
            variable: "name"
```

And in the future, we want to use real transformations where:

```yaml
- name: testData-correlations
  correlations:
    - targetUID: WyFv5154z
      config:
        type: query
        target: ...
        field: "text"
        transformations:
          - type: "regex"
            expression: "(Superman|Batman)"
            fieldName: "name"
```

Only the name of the last property changes. From the user's perspective and other configuration options, nothing changes, and we can start adding more super-sophisticated transformations to the mix. Of course, it's just a minor thing; we can deprecate the config, and support …

Totally agree it "doesn't tie configuration to logic". It seems like a valid transformation that takes a regex, a simple expression, and creates a field with a name provided by the user.
Great stuff 🎉
I think it'd be great to have better provisioning validation, but we can add it in a separate PR.
For example, we should check that the correct structure of the config is provided. I was testing it with the example in the PR description and it created a link without transformations (they are incorrectly nested in the yaml example, so please update it). I also mistyped the transformation type (`regexp` instead of `regex`) and got a link with no transformations and no errors. We do this for some other properties (e.g. we validate that a correct correlation type, i.e. `query`, is provided) and return an error when provisioning is parsed.
```diff
@@ -40,12 +40,21 @@ export interface DataLink<T extends DataQuery = any> {
   internal?: InternalDataLink<T>;
 }
 
+/** @internal */
+export interface DataLinkTransformationConfig {
+  type: 'regex' | 'logfmt';
```
Minor nit: it could be an enum.
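A sketch of the suggested change (the enum and field names here are my assumptions, not necessarily what the PR ended up with):

```typescript
// String enum replacing the 'regex' | 'logfmt' union type, as suggested.
export enum SupportedTransformationType {
  Regex = 'regex',
  Logfmt = 'logfmt',
}

// The interface can then reference the enum instead of string literals.
export interface DataLinkTransformationConfig {
  type: SupportedTransformationType;
  expression?: string;
  mapValue?: string;
}
```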
…or bad type name and format
@ifrost I added the provisioning checks you asked for (although the errors do seem to come out a little garbled; not sure if there's something I'm missing there) and the enum change. I checked on how one could do enums in Go and it didn't seem as straightforward as TypeScript, so I left it as is. Let me know if you have any thoughts about that.
…na/transformation
@ryantxu This is what we discussed yesterday. Let me know what you think would be a better term for this feature.
`regex` validation doesn't seem to check if `expression` is provided (it's required by the regex transformation to make it work).
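The check being asked for could look something like this (a hypothetical TypeScript sketch of the validation logic; the PR's actual provisioning validation lives in the Go backend, and all names here are illustrative):

```typescript
// Hypothetical provisioning validation sketch: reject unknown transformation
// types (e.g. the 'regexp' typo) and require an expression for regex,
// instead of silently creating a link with no transformations.
interface RawTransformation {
  type?: string;
  expression?: string;
}

function validateTransformation(t: RawTransformation): string | null {
  if (t.type !== 'regex' && t.type !== 'logfmt') {
    return `unsupported transformation type: ${t.type}`;
  }
  if (t.type === 'regex' && !t.expression) {
    return 'regex transformations require an expression';
  }
  return null; // valid
}

validateTransformation({ type: 'regexp' }); // error instead of a silent no-op
validateTransformation({ type: 'regex' }); // missing-expression error
validateTransformation({ type: 'regex', expression: '(Superman|Batman)' }); // null
```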
* bring in source from database
* bring in transformations from database
* add regex transformations to scopevar
* Consolidate types, add better example, cleanup
* Add var only if match
* Change ScopedVar to not require text, do not leak transformation-made variables between links
* Add mappings and start implementing logfmt
* Add mappings and start implementing logfmt
* Remove mappings, turn off global regex
* Add example yaml and omit transformations if empty
* Fix the yaml
* Add logfmt transformation
* Cleanup transformations and yaml
* add transformation field to FE types and use it, safeStringify logfmt values
* Add tests, only safe stringify if non-string, fix bug with safe stringify where it would return empty string with false value
* Add test for transformation field
* Do not add null transformations object
* Break out transformation logic, add tests to backend code
* Fix lint errors I understand 😅
* Fix the backend lint error
* Remove unnecessary code and mark new Transformations object as internal
* Add support for named capture groups
* Remove type assertion
* Remove variable name from transformation
* Add test for overriding regexes
* Add back variable name field, but change to mapValue
* fix go api test
* Change transformation types to enum, add better provisioning checks for bad type name and format
* Check for expression with regex transformations
What is this feature?
Transformations act as a lens through which we focus on specific pieces of source data to be used by target data. This implements the `regex` and `logfmt` transformations as a first pass.

Why do we need this feature?
We have the ability to correlate data from one datasource to another, but most of the time we will need to do something to the source data to make it suitable for the target data source. Transformations are the answer to that and this is the first example of how they may work.
Who is this feature for?
Anyone using correlations.
Which issue(s) does this PR fix?:
Fixes #60023
Special notes for your reviewer:
Example 1
Example datasource provisioning yaml with regex only
You will need to edit the target datasource to be one that is available in your environment. I have a postgres datasource running with a table of superhero data.
Using the above provisioned datasource, use the scenario 'CSV Content' and use the following CSV
This will create a link to the target datasource defined and the query will contain "Superman" instead of the variable name.
Example 2
Example datasource provisioning yaml with regex and logfmt
You will need to edit the target datasource to be one that is available in your environment. I have a postgres datasource running with a table of superhero data.
Using the above provisioned datasource, use the scenario 'CSV Content' and use the following CSV
This will create a link to the target datasource defined and the query will contain "Superman" instead of the variable name.