Seq join bug fix #1169

Merged
merged 3 commits into from
May 16, 2023

rakeshkashyap123

When the expansion feature of a seq join feature includes an anchored feature whose data is missing, Feathr failed with an error. This PR handles that case.
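To illustrate the failure mode, here is a minimal sketch in plain Python. It is not Feathr's implementation: dictionaries stand in for the Spark datasets, and `seq_join`, its parameters, and the `default` fallback are hypothetical names chosen for this example. It only shows the general idea of tolerating an expansion feature whose anchored data is absent instead of failing.

```python
from typing import Dict, Hashable, Iterable, Optional


def seq_join(
    observation_keys: Iterable[Hashable],
    base_feature: Dict[Hashable, Hashable],
    expansion_feature: Optional[Dict[Hashable, float]],
    default: Optional[float] = None,
) -> Dict[Hashable, Optional[float]]:
    """Hypothetical sequential join over in-memory dictionaries.

    The base feature maps an observation key to an intermediate key, and
    the expansion feature maps that intermediate key to the final value.
    `expansion_feature=None` models anchored feature data that is missing.
    """
    result: Dict[Hashable, Optional[float]] = {}
    for key in observation_keys:
        intermediate = base_feature.get(key)
        if expansion_feature is None or intermediate is None:
            # Missing data: fall back to the default instead of raising.
            result[key] = default
        else:
            result[key] = expansion_feature.get(intermediate, default)
    return result


base = {"user1": "item1", "user2": "item2"}
with_data = seq_join(["user1", "user2"], base, {"item1": 0.5})
without_data = seq_join(["user1", "user2"], base, None)
```

In this sketch, a missing expansion dataset yields the default for every observation rather than an exception.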

rakeshkashyap123 added the "safe to test" label (Tag to execute build pipeline for a PR from forked repo) on May 16, 2023

anirudhagar13 left a comment

Thanks for the fix!

anirudhagar13 merged commit 402efe1 into main on May 16, 2023
anirudhagar13 deleted the seqJoinBugFix branch on May 16, 2023 20:46
rakeshkashyap123 restored the seqJoinBugFix branch on May 16, 2023 21:28
Yuqing-cat pushed a commit to Yuqing-cat/feathr that referenced this pull request May 23, 2023
* Seq join bug fix

* Address comments

* version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
xiaoyongzhu added a commit that referenced this pull request May 31, 2023
* add spark sql doc

Signed-off-by: Yuqing Wei <weiyuqing021@outlook.com>

* Clean up redis keys created by CI tests (#1100)

* Bump version to 1.0.0 (#1104)

* Update ARM template to use v1.0.0 tag (#1106)

- Update pre-built docker image from `feathrfeaturestore/feathr-registry:releases-v0.9.0` to `feathrfeaturestore/feathr-registry:releases-v1.0.0`

* Improve CI/CD workflow configuration (#1105)

- Update workflow names to be more descriptive
- Restrict the pull_request_target configuration to workflows that require secret access
- Isolate the Gradle test from the E2E test, and trigger it for Scala changes only

* Add a guide to use Feathr in MLOps v2 Solution Accelerator (#1103)

* Improve Quickstart and Release Guide for v1.0.0 (#1107)

- Update the quickstart guide to make it easier for users to get started, validate feature definitions and develop new things

* Implement an optional null filter before join (#1098)

* Add null filter

* Add spark flag

* filter obs data nulls

* Remove feature data null handling

* Update test

* remove additional test

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Add support for external SWA library (#1093)

* working test

* Minor comment

* bump version

* documentation update

* update version

---------

Co-authored-by: rkashyap <rkashyap@linkedin.com>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Update GitHub Actions for building and pushing images (#1109)

This PR addresses the Spark materialize job failure on ARM-based machines, such as the Mac M1, caused by amd64 versions of Python packages and Maven jars being pre-fetched during Docker image creation. To resolve this problem, the Sandbox Docker GitHub action is updated to support the arm64 platform.

- Update job name in `.github/workflows/publish-to-dockerhub.yml`
- Update `build-push-action` from v3 to v4
- Add `setup-qemu-action` and `setup-buildx-action`
- Add support for Linux/AMD64 and Linux platforms

* Upgrade actions/checkout version from v2 to v3 to clean up node 12 deprecated warnings (#1110)

- Upgrade action checkout version from `v2` to `v3`

* add simulate time delay feature (#1108)

* Support sql expression in FDSExtract (#1112)

* Add Fake Data Generator (#1113)

* Add Fake Data Generator

* update

* Update data_generator.py

* Update README (#1119)

* Update README to reflect the latest thought

* update readme

* Allow alien value in MVEL-based derivations (#1120)

* Fix feathr hocon command (#1121)

* Honor debug.output.num.parts in debug mode (#1122)

* Fix "value is not a valid dict" (#1111) (#1126)

Fix the "value is not a valid dict" error
when accessing the sql-registry API endpoint /projects/{project}/datasources/{datasource}

Co-authored-by: brianxiao <brianxiao@tencent.com>

* Fix skipping features when derived feature contains a swa feature (#1128)

* Fix skipping features when derived feature contains a swa feature

* Fix comments

* Update documentation

* update version

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Skip snowflakes test in CI (#1131)

* Add Feathr chat bot in the notebook(Experimental, powered by ChatGPT) (#1132)

* ChatGPT integration

* Delay version bump

* Fix bug when SWA hdfs and local paths without data.avro.json extensio… (#1130)

* fix bug when SWA hdfs and local paths without data.avro.json extensions are included for evaluation

* try

* Fix tests

* revert test file

* Add tests

* Add private classifier to variable

* fix test

* fix test

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Revert mvel log (#1140)

* Revert "Allow alien value in MVEL-based derivations (#1120) and remove stdout statements"

This reverts commit 55290e7.

* updating rc version after last commit

---------

Co-authored-by: Anirudh Agarwal <aniagarw@aniagarw-mn1.linkedin.biz>

* Exclude experimental changes under feathr/chat for test coverage check (#1142)

* Revert "Update GitHub Actions for building and pushing images (#1109)" (#1141)

* Add try and catch for getTensorFeatures (#1136)

* Add try and catch for getTensorFeatures

* Attach the original exception with the throw

---------

Co-authored-by: Minh Nguyen <minnguyen@linkedin.com>

* Enable override_time_delay (#1144)

* Update query_feature_list.py

* Update query_feature_list.py

* Fix incorrect merge in PR #1141

* #latest should pick the latest available path (#1146)

* #latest should pick the latest available path

* update gradle.properties

* add empty folder

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Update README.md

* Update README.md

* Section spell fix (#1147)

* Update troubleshoot-feature-definition.md

* Add a flag for adding a default value column for missing data features (#1149)

* WIP: safe mode

* Add swallowedExceptionHandler

* Fix minor bug

* Address comments

* version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Improve debug logging (#1150)

* Suppressed exceptions api (#1152)

* Add another API for accessing doJoinObsAndFeatures which suppresses exceptions

* version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Fix doJoinObsAndFeaturesWithSuppressedExceptions API (#1153)

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* minor version bump to 1.0.2-rc9 (#1154)

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* made function interface consistent with underlying delegation call (#1156)

Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>

* minor version bump (#1157)

Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>

* Update feathr-snowflake-guide.md

* fix debug path limit (#1160)

* Add default column for missing features (#1158)

* Add default column for missing features

* Fix failing test

* Fix SWA sparksession issue

* address comments

* Add comment

* bump version

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Jinghui Mo <jmo@linkedin.com>

* Add new multi-level aggregation framework and bucketed count distinct aggregation. (#1159)

The bucketed aggregation works by aggregating data into lower-level time buckets, e.g. 5-minute buckets, then using the lower-level aggregated results to produce higher-level aggregation results such as 1 hour, 1 day, etc.

The supported levels are 5 minutes, 1 hour, 1 week, 1 month, 1 year.

* Fix bug when skipping missing feature data (#1161)

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* version bump (#1162)

* version bump

* add logs

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* fix for handling missing feature data (#1163)

Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>

* Fix bug when skipping anchored features with missing data (#1164)

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* minor version bump to consume latest fix (#1165)

Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>

* Allow alien value in MVEL-based derivations (#1120) (#1166)

Add feature value wrapper for third-party feature value compatibility

* add bucketed_sum aggregation (#1168)

* Seq join bug fix (#1169)

* Seq join bug fix

* Address comments

* version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* Fix failing tests

* Support high-dimensional tensor in derivations (#1172)

* Fix bug in SWA with missing feature data (#1171)

* Fix bug in SWA with missing feature data

* remove unwanted code

* Address feedback and version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>

* minor version bump due to a PR getting directly merged (#1173)

Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>

* sparksql source doc

---------

Signed-off-by: Yuqing Wei <weiyuqing021@outlook.com>
Co-authored-by: Enya-Yx <108409954+enya-yx@users.noreply.github.com>
Co-authored-by: Blair Chen <blrchen@users.noreply.github.com>
Co-authored-by: Rizo-R <56843532+Rizo-R@users.noreply.github.com>
Co-authored-by: rakeshkashyap123 <hanasoge@usc.edu>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: rkashyap <rkashyap@linkedin.com>
Co-authored-by: aabbasi-hbo <92401544+aabbasi-hbo@users.noreply.github.com>
Co-authored-by: Jinghui Mo <jmo@linkedin.com>
Co-authored-by: Xiaoyong Zhu <xiaoyongzhu@users.noreply.github.com>
Co-authored-by: BrianXiao <187150266@qq.com>
Co-authored-by: brianxiao <brianxiao@tencent.com>
Co-authored-by: Anirudh Agarwal <anirudhagarwal13@gmail.com>
Co-authored-by: Anirudh Agarwal <aniagarw@aniagarw-mn1.linkedin.biz>
Co-authored-by: Minh Nguyen <44143370+minhmo1620@users.noreply.github.com>
Co-authored-by: Minh Nguyen <minnguyen@linkedin.com>
Co-authored-by: Hangfei Lin <hnlin@linkedin.com>
Co-authored-by: nj879 <127419491+nj879@users.noreply.github.com>
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
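
For readers unfamiliar with the multi-level bucketed aggregation mentioned in the #1159 entry above, the idea can be sketched roughly as follows. This is an illustrative sketch only, not the code from that PR: the function names, the `(timestamp, value)` event shape, and the use of in-memory sets are all assumptions made for this example. It keeps the set of distinct values per 5-minute bucket, then unions those sets to obtain the 1-hour result.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

FIVE_MINUTES = 5 * 60  # bucket width in seconds
ONE_HOUR = 60 * 60


def bucket_distinct(
    events: Iterable[Tuple[int, Hashable]], bucket_seconds: int
) -> Dict[int, Set[Hashable]]:
    """Group (epoch_seconds, value) events into buckets of distinct values."""
    buckets: Dict[int, Set[Hashable]] = defaultdict(set)
    for timestamp, value in events:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].add(value)
    return dict(buckets)


def roll_up(
    lower: Dict[int, Set[Hashable]], bucket_seconds: int
) -> Dict[int, Set[Hashable]]:
    """Build higher-level buckets by unioning the lower-level sets."""
    higher: Dict[int, Set[Hashable]] = defaultdict(set)
    for start, values in lower.items():
        higher[start - start % bucket_seconds] |= values
    return dict(higher)


def count_distinct(buckets: Dict[int, Set[Hashable]]) -> Dict[int, int]:
    """Reduce each bucket to its distinct count."""
    return {start: len(values) for start, values in buckets.items()}


events = [(0, "a"), (10, "b"), (310, "a"), (3600, "c"), (3700, "c")]
five_minute = bucket_distinct(events, FIVE_MINUTES)
hourly = count_distinct(roll_up(five_minute, ONE_HOUR))
```

The sketch carries sets rather than per-bucket counts because distinct counts are not additive: "a" appears in two 5-minute buckets within the first hour, so summing the bucket counts would overcount it.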