Refine example, add utilities, and fix xdist test error (#794)
* Fix xdist test error. Also make a small code cleanup

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Revert "Revert 756 (#798)"

This reverts commit ff438f5.

* Revert 798 (the revert of 756 - example notebook refactor). Also add job_utils unit tests

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Update test_azure_spark_e2e.py

* Fix doc dead links (#805)

This PR fixes dead links detected in the latest CI run. The doc scan CI action has been updated to run on main only, because running it on every PR frequently reports false alarms caused by CI changes that have not yet been deployed.

* Improve UI experience and clean up ui code warnings (#801)

* Add DataSourcesSelect, FlowGraph, and ResizeTable components. Fix all warnings and lint issues.

Signed-off-by: Boli Guan <ifendoe@gmail.com>

* Add CardDescriptions component and fix ESLint warning.

Signed-off-by: Boli Guan <ifendoe@gmail.com>

* Update FeatureDetails page title.

Signed-off-by: Boli Guan <ifendoe@gmail.com>

* Rename ProjectSelect

Signed-off-by: Boli Guan <ifendoe@gmail.com>

Signed-off-by: Boli Guan <ifendoe@gmail.com>

* Add release instructions for Release Candidate (#809)

* Add release instructions for Release Candidate

* Add a section for release versioning

* Add a section for overall process triggered by the release manager

* Bump version to 0.9.0-rc1 (#810)

* Fix tests to use mocks and fix get_result_df's Databricks behavior

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Fix temp file handling to use a temp dir

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Check out feature_derivations.py from main (it was temporarily changed to work around previous issues)

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Remove the old Databricks sample notebook. Change pip install to install feathr from the GitHub main branch so that it always picks up the latest changes

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Fix config and get_result_df for Synapse

* Fix generate_config to accept all of the Feathr env var config names (see the sketch below)

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
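For illustration only, here is a minimal, self-contained sketch of what accepting env-var-style config names can look like. It is not the actual feathr generate_config implementation; the function name, the "__" nesting convention, and the example variable names are assumptions.

```python
from typing import Dict

import yaml  # PyYAML


def generate_config_from_env(env: Dict[str, str], output_path: str) -> str:
    """Collect variables whose names use '__' as a section separator and nest
    them into a YAML config, e.g. SPARK_CONFIG__SPARK_CLUSTER -> spark_config.spark_cluster."""
    config: Dict = {}
    for name, value in env.items():
        if "__" not in name:
            # Not a nested config key; ignore it in this sketch.
            continue
        parts = [p.lower() for p in name.split("__")]
        node = config
        for key in parts[:-1]:
            node = node.setdefault(key, {})
        node[parts[-1]] = value
    with open(output_path, "w") as f:
        yaml.safe_dump(config, f)
    return output_path


# Example usage with made-up values:
# generate_config_from_env(
#     {
#         "SPARK_CONFIG__SPARK_CLUSTER": "databricks",
#         "ONLINE_STORE__REDIS__HOST": "example.redis.cache.windows.net",
#     },
#     "feathr_config.yaml",
# )
```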

* Add more pytests

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>

* Use None as the default data format in job_utils; instead, set 'avro' as the default output format in the job tags from the client (see the sketch below)

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
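A minimal sketch of the intended split, assuming a job-tag key named output_format (the real constant and function names in feathr may differ): the utility stays format-agnostic, while the client is the one that defaults the tag to 'avro'.

```python
from typing import Dict, Optional

# Hypothetical tag key; the actual constant name in feathr may differ.
OUTPUT_FORMAT_TAG = "output_format"


def resolve_data_format(job_tags: Dict[str, str],
                        data_format: Optional[str] = None) -> Optional[str]:
    """job_utils side of the sketch: no built-in default format,
    only a fallback to whatever the job tags carry."""
    return data_format or job_tags.get(OUTPUT_FORMAT_TAG)


def build_job_tags(user_tags: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Client side of the sketch: default the output format to 'avro'
    unless the caller overrides it."""
    tags = dict(user_tags or {})
    tags.setdefault(OUTPUT_FORMAT_TAG, "avro")
    return tags


# Example: the client defaults to avro, and the utility simply reads it back.
# resolve_data_format(build_job_tags())  -> "avro"
```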

* Change the Feathr client to a mocked object (see the sketch below)

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
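As a hedged illustration, a pytest-style sketch of swapping the client for a mock so unit tests never touch Spark or Azure; the attributes and methods stubbed on the mock are assumptions, not the exact FeathrClient API.

```python
from unittest.mock import MagicMock

import pytest


@pytest.fixture
def mock_feathr_client():
    """Stand-in for the Feathr client so unit tests never hit Spark or Azure.
    The attribute and method names below are illustrative only."""
    client = MagicMock()
    client.spark_runtime = "databricks"
    client.get_job_result_uri.return_value = "dbfs:/fake/output/path"
    return client


def test_result_uri_comes_from_client(mock_feathr_client):
    # Code under test would only need the URI, so the mock is sufficient here.
    assert mock_feathr_client.get_job_result_uri().startswith("dbfs:/")
```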

* Change the timeout to 1000s in the notebook (see the sketch below)

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
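Roughly what the notebook change amounts to, assuming the wait helper used in the feathr sample notebooks (client.wait_job_to_finish) and a local feathr_config.yaml; treat both as assumptions rather than a spec.

```python
from feathr import FeathrClient  # assumes feathr is installed and a config file exists

client = FeathrClient(config_path="feathr_config.yaml")

# ... submit a materialization or offline feature-join job here ...

# Give slower Spark clusters more headroom before the notebook gives up.
client.wait_job_to_finish(timeout_sec=1000)
```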

Signed-off-by: Jun Ki Min <42475935+loomlike@users.noreply.github.com>
Signed-off-by: Boli Guan <ifendoe@gmail.com>
Co-authored-by: Blair Chen <blrchen@hotmail.com>
Co-authored-by: Blair Chen <blrchen@users.noreply.github.com>
Co-authored-by: Boli Guan <ifendoe@gmail.com>
4 people committed Nov 23, 2022
1 parent 799fac0 commit 15550ca
Showing 39 changed files with 3,852 additions and 3,798 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/pull_request_push_test.yml
@@ -22,7 +22,7 @@ on:
- "docs/**"
- "ui/**"
- "**/README.md"

schedule:
# Runs daily at 1 PM UTC (9 PM CST), will send notification to TEAMS_WEBHOOK
- cron: '00 13 * * *'
@@ -127,7 +127,7 @@ jobs:
SQL1_USER: ${{secrets.SQL1_USER}}
SQL1_PASSWORD: ${{secrets.SQL1_PASSWORD}}
run: |
- # run only test with databricks. run in 4 parallel jobs
+ # run only test with databricks. run in 6 parallel jobs
pytest -n 6 --cov-report term-missing --cov=feathr_project/feathr feathr_project/test --cov-config=.github/workflows/.coveragerc_db
azure_synapse_test:
# might be a bit duplication to setup both the azure_synapse test and databricks test, but for now we will keep those to accelerate the test speed
@@ -195,7 +195,7 @@ jobs:
SQL1_PASSWORD: ${{secrets.SQL1_PASSWORD}}
run: |
# skip databricks related test as we just ran the test; also seperate databricks and synapse test to make sure there's no write conflict
- # run in 4 parallel jobs to make the time shorter
+ # run in 6 parallel jobs to make the time shorter
pytest -n 6 --cov-report term-missing --cov=feathr_project/feathr feathr_project/test --cov-config=.github/workflows/.coveragerc_sy
local_spark_test:
3 changes: 3 additions & 0 deletions .gitignore
@@ -213,3 +213,6 @@ null/*
project/.bloop
metals.sbt
.bsp/sbt.json

+# Feathr output debug folder
+**/debug/
6 changes: 3 additions & 3 deletions docs/dev_guide/new_contributor_guide.md
@@ -6,19 +6,19 @@ parent: Feathr Developer Guides

# What can I contribute?
All forms of contributions are welcome, including and not limited to:
-* Improve or contribute new [notebook samples](https://github.com/feathr-ai/feathr/tree/main/feathr_project/feathrcli/data/feathr_user_workspace)
+* Improve or contribute new [notebook samples](https://github.com/feathr-ai/feathr/tree/main/docs/samples)
* Add tutorial, blog posts, tech talks etc
* Increase media coverage and exposure
* Improve user-facing documentation or developer-facing documentation
-* Add testing code
+* Add testing code
* Add new features
* Refactor and improve architecture
* For any other forms of contribution and collaboration, don't hesitate to reach out to us.

# I am interested, how can I start?
If you are new to this project, we recommend start with [`good-first-issue`](https://github.com/feathr-ai/feathr/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22).

-The issues are also labled with what types of programming language the task need.
+The issues are also labled with what types of programming language the task need.
* [`good-first-issue` and `Python`](https://github.com/feathr-ai/feathr/issues?q=is%3Aopen+label%3A%22good+first+issue%22+label%3Apython)
* [`good-first-issue` and `Scala`](https://github.com/feathr-ai/feathr/issues?q=is%3Aopen+label%3A%22good+first+issue%22+label%3Ascala)
* [`good-first-issue` and `Java`](https://github.com/feathr-ai/feathr/issues?q=is%3Aopen+label%3A%22good+first+issue%22+label%3Ajava)
2 changes: 1 addition & 1 deletion docs/quickstart_synapse.md
@@ -24,7 +24,7 @@ Feathr has native cloud integration. Here are the steps to use Feathr on Azure:

1. Follow the [Feathr ARM deployment guide](https://feathr-ai.github.io/feathr/how-to-guides/azure-deployment-arm.html) to run Feathr on Azure. This allows you to quickly get started with automated deployment using Azure Resource Manager template. Alternatively, if you want to set up everything manually, you can checkout the [Feathr CLI deployment guide](https://feathr-ai.github.io/feathr/how-to-guides/azure-deployment-cli.html) to run Feathr on Azure. This allows you to understand what is going on and set up one resource at a time.

-2. Once the deployment is complete,run the Feathr Jupyter Notebook by clicking this button: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/feathr-ai/feathr/main?labpath=feathr_project%2Ffeathrcli%2Fdata%2Ffeathr_user_workspace%2Fnyc_driver_demo.ipynb).
+2. Once the deployment is complete,run the Feathr Jupyter Notebook by clicking this button: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/feathr-ai/feathr/main?labpath=docs%2Fsamples%2Fnyc_taxi_demo.ipynb).
3. You only need to change the specified `Resource Prefix`.

## Step 2: Install Feathr