Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33282][test] Verify labeler for DSTREAM config entries ending in a single wildcard #19

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

kbendick
Copy link
Owner

@kbendick kbendick commented Nov 4, 2020

Once I changed the pattern /external/kinesis* to external/kinesis*/**/* this works. I've already verified several times that using the terminal **/* works for files that are in either a nested subdirectory or in the directory just below it. The docs say that we'd need to also add a matching glob that only ended in /*, but I have verified that is not the case.

I also verified that this works when the config is changed to simply external/kinesis/*.

Several issues indicate that the docs are somewhat outdated. But given that this is the PR labeler from github, I'd rather open issues there than try using a less official GH action (though many of them are great, but likely the official one provided by GH will be supported and will see work on opened issues eventually).

@kbendick kbendick changed the title [SPARK-33282][test] Verify labeler for dstream with updated globs [SPARK-33282][test] Verify labeler for dstream with a true wildcard in middle folder Nov 5, 2020
@kbendick kbendick changed the title [SPARK-33282][test] Verify labeler for dstream with a true wildcard in middle folder [SPARK-33282][test] Verify labeler for DSTREAM config entries ending in a single wildcard Nov 5, 2020
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Nov 5, 2020
…beler action

### What changes were proposed in this pull request?

This PR removes the old Probot Autolabeler labeling configuration, as the probot autolabeler has been deprecated. I've updated the configs in Iceberg and in Avro, and we also need to update here. This PR adds in an additional workflow for labeling PRs and migrates the old probot config to the new format. Unfortunately, because certain features have not been released upstream, we will not get the _exact_ behavior as before. I have documented where that is and what changes are neeeded, and in the associated ticket I've also discussed other options and why I think this is the best way to go. Definitely a follow up ticket is needed to get the original behavior back in these few cases, but PRs have not been labeled for almost a month and so it's probably best to get it right 95% of the time and occasionally have some UI related PRs labeled as `CORE` while the issue is resolved upstream and/or further investigated.

### Why are the changes needed?

The probot autolabeler is dead and will not be maintained going forward. This has been confirmed with github user [at]mithro in an issue in their repository.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

To test this PR, I first merged the config into my local fork. I then edited it several times and ran tests on that.

Unfortunately, I've overwritten my fork with the apache repo in order to create a proper PR. However, I've also added the config for the same thing in the Iceberg repo as well as the Avro repo.

I have now merged this PR into my local repo and will be running some tests on edge cases there and for validating in general:
- [Check that the SQL label is applied for changes directly below repo root's sql directory](kbendick#16) ✅
- [Check that the structured streaming label is applied](kbendick#20) ✅
- [Check that a wildcard at the end of a pattern will match nested files](kbendick#19) ✅
- [Check that the rule **/*pom.xml will match the root pom.xml file](kbendick#25) ✅

I've also discovered that we're likely not killing github actions that run (like large tests etc) when users push to their PR. In most cases, I see that a user has to mark something as "OK to test", but it still seems like we might want to discuss whether or not we should add a cancellation step In order to save time / capacity on the runners. If so desired, we would add an action in each workflow that cancels old runs when a `push` action occurs on a PR. This will likely make waiting for test runners much faster iff tests are automatically rerun on push by anybody (such as PMCs, PRs that have been marked OK to test, etc). We could free a large number of resources potentially if a cancellation step was added to all of the workflows in the Apache account (as github action API limits are set at the account level).

Admittedly, the fact that the "old" workflow runs weren't cancelled could admittedly be because of the fact that I was working in a fork, but given that there are explicit actions to be added to the start of workflows to cancel old PR workflows and given that we don't have them configured indicates to me that likely this is the case in this repo (and in most `apache` repos as well), at least under certain circumstances (e.g. repos that don't have "Ok to test"-like webhooks as one example).

This is a separate issue though, which I can bring up on the mailing list once I'm done with this PR. Unfortunately I've been very busy the past two weeks, but if somebody else wanted to work on that I would be happy to support with any knowledge I have.

The last Apache repo to still have the probot autolabeler in it is Beam, at which point we can have Gavin from ASF Infra remove the permissions for the probot autolabeler entirely. See the associated JIRA ticket for the links to other tickets, like the one for ASF Infra to remove the dead probot autolabeler's read and write permissions to our PRs in the Apache organization.

Closes #30244 from kbendick/begin-migration-to-github-labeler-action.

Authored-by: Kyle Bendickson <kjbendickson@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
kbendick pushed a commit that referenced this pull request May 29, 2022
…aceable

### What changes were proposed in this pull request?

This PR uses a manual recursion to replace `RuntimeReplaceable` expressions instead of `transformAllExpressionsWithPruning`. The problem of `transformAllExpressionsWithPruning` is it will automatically make the replacement expression inherit  the function alias name from the parent node, which is quite misleading. For example, `select date_part('month', c) from t`, the optimized plan in EXPLAIN before this PR is
```
Project [date_part(cast(c#18 as date)) AS date_part(month, c)#19]
+- Relation default.t[c#18] parquet
```
Now it's
```
Project [month(cast(c#9 as date)) AS date_part(month, c)#10]
+- Relation default.t[c#9] parquet
```

### Why are the changes needed?

fix misleading EXPLAIN result

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new test

Closes apache#35821 from cloud-fan/follow2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
1 participant