Seq join bug fix #1169
Merged
Resolved review threads:
...-impl/src/main/scala/com/linkedin/feathr/offline/join/workflow/AnchoredFeatureJoinStep.scala
...c/main/scala/com/linkedin/feathr/offline/derived/strategies/SequentialJoinAsDerivation.scala
feathr-impl/src/test/scala/com/linkedin/feathr/offline/AnchoredFeaturesIntegTest.scala
feathr-impl/src/main/scala/com/linkedin/feathr/offline/derived/DerivedFeatureEvaluator.scala
rakeshkashyap123 added the "safe to test" label (Tag to execute build pipeline for a PR from forked repo) on May 16, 2023.
anirudhagar13 approved these changes on May 16, 2023:
Thanks for the fix!
Yuqing-cat pushed a commit to Yuqing-cat/feathr that referenced this pull request on May 23, 2023:
* Seq join bug fix
* Address comments
* version bump
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
xiaoyongzhu added a commit that referenced this pull request on May 31, 2023:
* add spark sql doc
Signed-off-by: Yuqing Wei <weiyuqing021@outlook.com>
* Clean up redis keys created by CI tests (#1100)
* Bump version to 1.0.0 (#1104)
* Update ARM template to use v1.0.0 tag (#1106)
- Update pre-built docker image from `feathrfeaturestore/feathr-registry:releases-v0.9.0` to `feathrfeaturestore/feathr-registry:releases-v1.0.0`
* Improve CI/CD workflow configuration (#1105)
- Update workflow names to be more descriptive
- Restrict pull_request_target configuration to workflow requires secret access
- Isolate gradle test off E2E test, and trigger for scala change only
* Add a guide to use Feathr in MLOps v2 Solution Accelerator (#1103)
* Improve Quickstart and Release Guide for v1.0.0 (#1107)
- Update the quickstart guide to make it easier for users to get started, validate feature definitions and develop new things
* Implement an optional null filter before join (#1098)
* Add null filter
* Add spark flag
* filter obs data nulls
* Remove feature data null handling
* Update test
* remove additional test
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Add support for external SWA library (#1093)
* working test
* Minor comment
* bump version
* documentation update
* update version
---------
Co-authored-by: rkashyap <rkashyap@linkedin.com>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Update GitHub Actions for building and pushing images (#1109)
This PR addresses the issue of Spark materialize job failure on machines with an arm platform, such as Mac M1, due to pre-fetched amd64 versions of Python packages and Maven jars during docker image creation. To resolve this problem, Sandbox Docker GitHub action is updated to support the arm 64 platform.
- Update job name in `.github/workflows/publish-to-dockerhub.yml`
- Update `build-push-action` from v3 to v4
- Add `setup-qemu-action` and `setup-buildx-action`
- Add support for Linux/AMD64 and Linux platforms
* Upgrade actions/checkout version from v2 to v3 to clean up node 12 deprecated warnings (#1110)
- Upgrade action checkout version from `v2` to `v3`
* add simulate time delay feature (#1108)
* Support sql expression in FDSExtract (#1112)
* Add Fake Data Generator (#1113)
* Add Fake Data Generator
* update
* Update data_generator.py
* Update README (#1119)
* Update README to reflect the latest thought
* update readme
* Allow alien value in MVEL-based derivations (#1120)
* Fix feathr hocon command (#1121)
* Honor debug.output.num.parts in debug mode (#1122)
* Fix "value is not a valid dict" (#1111) (#1126)
Fix "value is not a valid dict" when access sql-registry api /projects/{project}/datasources/{datasource}
Co-authored-by: brianxiao <brianxiao@tencent.com>
* Fix skipping features when derived feature contains a swa feature (#1128)
* Fix skipping features when derived feature contains a swa feature
* Fix comments
* Update documentation
* update version
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Skip snowflakes test in CI (#1131)
* Add Feathr chat bot in the notebook(Experimental, powered by ChatGPT) (#1132)
* ChatGPT integration
* Delay version bump
* Fix bug when SWA hdfs and local paths without data.avro.json extensio… (#1130)
* fix bug when SWA hdfs and local paths without data.avro.json extensions are included for evaluation
* try
* Fix tests
* revert test file
* Add tests
* Add private classifier to variable
* fix test
* fix test
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Revert mvel log (#1140)
* Revert "Allow alien value in MVEL-based derivations (#1120) and remove stdout statements"
This reverts commit 55290e7.
* updating rc version after last commit
---------
Co-authored-by: Anirudh Agarwal <aniagarw@aniagarw-mn1.linkedin.biz>
* Exclude experimental changes under feathr/chat for test coverage check (#1142)
* Revert "Update GitHub Actions for building and pushing images (#1109)" (#1141)
* Add try and catch for getTensorFeatures (#1136)
* Add try and catch for getTensorFeatures
* Attach the original exception with the throw
---------
Co-authored-by: Minh Nguyen <minnguyen@linkedin.com>
* Enable override_time_delay (#1144)
* Update query_feature_list.py
* Update query_feature_list.py
* Fix incorrect merge in PR #1141
* #latest should pick the latest available path (#1146)
* #latest should pick the latest available path
* update gradle.properties
* add empty folder
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Update README.md
* Update README.md
* Section spell fix (#1147)
* Update troubleshoot-feature-definition.md
* Add a flag for adding a default value column for missing data features (#1149)
* WIP: safe mode
* Add swallowedExceptionHandler
* Fix minor bug
* Address comments
* version bump
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Improve debug logging (#1150)
* Suppressed exceptions api (#1152)
* Add another API for accessing doJoinObsAndFeatures which suppresses exceptions
* version bump
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Fix doJoinObsAndFeaturesWithSuppressedExceptions API (#1153)
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* minor version bump to 1.0.2-rc9 (#1154)
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* made function interface consistent with underlying delegation call (#1156)
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
* minor version bump (#1157)
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
* Update feathr-snowflake-guide.md
* fix debug path limit (#1160)
* Add default column for missing features (#1158)
* Add default column for missing features
* Fix failing test
* Fix SWA sparksession issue
* address comments
* Add comment
* bump version
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Jinghui Mo <jmo@linkedin.com>
* Add new multi-level aggregation framework and bucketed count distinct aggregation. (#1159)
The bucketed aggregation works by aggregate data at lower level timestamp, e.g. 5 minutes bucket, then leverage the lower level bucket aggregated result to produce the higher level aggregation result such as 1 hour, 1 day, etc. The support levels are 5 minutes, 1 hour, 1 week, 1 month, 1 year.
* Fix bug when skipping missing feature data (#1161)
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* version bump (#1162)
* version bump
* add logs
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* fix for handling missing feature data (#1163)
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
* Fix bug when skipping anchored features with missing data (#1164)
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* minor version bump to consume latest fix (#1165)
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
* Allow alien value in MVEL-based derivations (#1120) (#1166)
Add feature value wrapper for 3rdparity feature value compatibility
* add bucketed_sum aggregation (#1168)
* Seq join bug fix (#1169)
* Seq join bug fix
* Address comments
* version bump
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Fix failing tests
* Support high-dimensional tensor in derivations (#1172)
* Fix bug in SWA with missing feature data (#1171)
* Fix bug in SWA with missing feature data
* remove unwanted code
* Address feedback and version bump
---------
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* minor version bump due to a PR getting directly merged (#1173)
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
* sparksql source doc
---------
Signed-off-by: Yuqing Wei <weiyuqing021@outlook.com>
Co-authored-by: Enya-Yx <108409954+enya-yx@users.noreply.github.com>
Co-authored-by: Blair Chen <blrchen@users.noreply.github.com>
Co-authored-by: Rizo-R <56843532+Rizo-R@users.noreply.github.com>
Co-authored-by: rakeshkashyap123 <hanasoge@usc.edu>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: rkashyap <rkashyap@linkedin.com>
Co-authored-by: aabbasi-hbo <92401544+aabbasi-hbo@users.noreply.github.com>
Co-authored-by: Jinghui Mo <jmo@linkedin.com>
Co-authored-by: Xiaoyong Zhu <xiaoyongzhu@users.noreply.github.com>
Co-authored-by: BrianXiao <187150266@qq.com>
Co-authored-by: brianxiao <brianxiao@tencent.com>
Co-authored-by: Anirudh Agarwal <anirudhagarwal13@gmail.com>
Co-authored-by: Anirudh Agarwal <aniagarw@aniagarw-mn1.linkedin.biz>
Co-authored-by: Minh Nguyen <44143370+minhmo1620@users.noreply.github.com>
Co-authored-by: Minh Nguyen <minnguyen@linkedin.com>
Co-authored-by: Hangfei Lin <hnlin@linkedin.com>
Co-authored-by: nj879 <127419491+nj879@users.noreply.github.com>
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
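The #1159 entry above describes bucketed aggregation. Data is aggregated at a fine granularity, such as 5-minute buckets, and those results are reused to produce coarser ones like 1 hour or 1 day. For count distinct, the lower level has to keep the distinct values themselves rather than per-bucket counts, because distinct counts are not additive. What follows is a minimal, hypothetical sketch of just two levels (5 minutes and 1 hour) using exact in-memory sets. `BucketedCountDistinct`, `Event` and the method names are illustrative assumptions. This is not Feathr's implementation, which may use a different representation.

```scala
// Hypothetical sketch of two-level bucketed count distinct (not Feathr's code).
object BucketedCountDistinct {
  // One raw event: epoch time in seconds plus the value to count distinctly.
  final case class Event(tsSeconds: Long, value: String)

  val FiveMinutes: Long = 5 * 60
  val OneHour: Long = 60 * 60

  // Level 1: map each event to the start of its 5-minute bucket and keep the
  // set of distinct values seen in that bucket.
  def fiveMinuteBuckets(events: Seq[Event]): Map[Long, Set[String]] =
    events
      .groupBy(e => e.tsSeconds / FiveMinutes * FiveMinutes)
      .map { case (bucketStart, es) => bucketStart -> es.map(_.value).toSet }

  // Level 2: roll the 5-minute sets up into hourly buckets by set union.
  // Sets, not counts, are carried upward because distinct counts cannot be summed.
  def hourlyBuckets(lower: Map[Long, Set[String]]): Map[Long, Set[String]] =
    lower
      .groupBy { case (bucketStart, _) => bucketStart / OneHour * OneHour }
      .map { case (hourStart, buckets) =>
        hourStart -> buckets.values.foldLeft(Set.empty[String])(_ union _)
      }

  // Final result: hour start -> number of distinct values in that hour.
  def hourlyCountDistinct(events: Seq[Event]): Map[Long, Int] =
    hourlyBuckets(fiveMinuteBuckets(events)).map { case (hour, values) => hour -> values.size }
}
```

Take events at 0 s, 100 s and 400 s with values "a", "b" and "a". The hour starting at 0 then contains two 5-minute buckets with distinct counts 2 and 1. Adding those counts would give 3, but the union of the sets gives the correct answer, 2.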
If a sequential join (seq join) feature's expansion feature includes an anchored feature whose data is missing, Feathr threw an error. This PR handles that case.
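The PR description does not show how the fix is implemented. The sketch below is purely illustrative, uses plain Scala collections, and is not Feathr's API or the code in `SequentialJoinAsDerivation.scala`. It models a sequential join in three steps:

1. For each observation key, look up the keys produced by the base feature.
2. Look up the expansion feature for each of those keys.
3. Aggregate the results.

Missing expansion-feature data is modelled as `None`. It yields empty results instead of an exception, which is one possible way to handle the case described above. `SeqJoinSketch`, `seqJoinSum` and the choice of sum as the aggregation are all hypothetical.

```scala
// Hypothetical, simplified model of a sequential join over in-memory maps.
// None of these names are Feathr APIs; this only illustrates the failure mode.
object SeqJoinSketch {
  // Base feature: observation key -> keys to expand (e.g. a member's companies).
  type BaseFeature = Map[String, Seq[String]]
  // Expansion feature: expansion key -> value.
  // None models "the anchored expansion feature's data is missing".
  type ExpansionFeature = Option[Map[String, Double]]

  // Sums the expansion-feature values for each observation key.
  // If the expansion data is missing entirely, every row gets None
  // instead of the whole join failing.
  def seqJoinSum(
      obsKeys: Seq[String],
      base: BaseFeature,
      expansion: ExpansionFeature
  ): Map[String, Option[Double]] =
    expansion match {
      case None =>
        obsKeys.map(k => k -> Option.empty[Double]).toMap
      case Some(expansionValues) =>
        obsKeys.map { k =>
          // Keys with no expansion value are skipped; no matches at all -> None.
          val values = base.getOrElse(k, Seq.empty).flatMap(key => expansionValues.get(key))
          k -> (if (values.isEmpty) None else Some(values.sum))
        }.toMap
    }
}
```

Returning `None` leaves it to downstream code to drop the row, substitute a default, or surface the gap, rather than aborting the entire join.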