chore: Adding Spark34 support #2052

Merged

Conversation

KeerthiYandaOS
Contributor

@KeerthiYandaOS KeerthiYandaOS commented Aug 22, 2023

What changes are proposed in this pull request?

  • Added Spark 3.4 support to SynapseML.
  • Updated the dependencies to make them compatible with Spark 3.4.
  • Updated the tests to use Spark 3.4 pools.

How is this patch tested?

  • Ran the Databricks E2E tests and the unit tests.
  • Also created a custom VHD and ran the CST tests on the custom pool.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following below steps.

@github-actions

Hey @KeerthiYandaOS 👋!
Thank you so much for contributing to our repository 🙌.
Someone from the SynapseML team will review this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel
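
As a rough illustration of the convention above, a title check could look like the following sketch. The prefix list is taken from the examples in this comment plus `chore`, which this PR's own title uses; the function name and the exact rule (prefix, colon, space, then text) are assumptions, not the repository's actual check.

```python
import re

# Prefixes from the examples above, plus "chore" (used by this PR's title).
# The project's real check may accept a different set; this list is an assumption.
SEMANTIC_PREFIXES = (
    "fix", "feat", "docs", "build", "perf", "refactor", "style", "test", "chore",
)

# A title qualifies if it starts with "<prefix>: " followed by some text.
_PREFIX_RE = re.compile(r"^(" + "|".join(SEMANTIC_PREFIXES) + r"): \S")


def has_semantic_prefix(title: str) -> bool:
    """Return True if the title starts with one of the known semantic prefixes."""
    return _PREFIX_RE.match(title) is not None
```

Under these assumptions, `has_semantic_prefix("chore: Adding Spark34 support")` is true, while a branch-style title such as `Keerthiyanda/spark3.4` is rejected.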

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

@codecov-commenter

codecov-commenter commented Aug 22, 2023

Codecov Report

Merging #2052 (4ef41cd) into spark3.4 (0836e40) will decrease coverage by 3.98%.
The diff coverage is 50.00%.

@@             Coverage Diff              @@
##           spark3.4    #2052      +/-   ##
============================================
- Coverage     87.00%   83.03%   -3.98%     
============================================
  Files           306      306              
  Lines         16066    16224     +158     
  Branches        847     1446     +599     
============================================
- Hits          13978    13471     -507     
- Misses         2088     2753     +665     
Files Changed Coverage Δ
...microsoft/azure/synapse/ml/codegen/PyCodegen.scala 93.61% <ø> (ø)
...a/com/microsoft/azure/synapse/ml/nn/BallTree.scala 83.17% <ø> (+0.32%) ⬆️
...spark/ml/recommendation/RecommendationHelper.scala 83.54% <0.00%> (-3.30%) ⬇️
...osoft/azure/synapse/ml/core/env/PackageUtils.scala 100.00% <100.00%> (ø)
...se/ml/exploratory/DistributionBalanceMeasure.scala 96.42% <100.00%> (+0.06%) ⬆️
.../azure/synapse/ml/io/binary/BinaryFileFormat.scala 98.86% <100.00%> (ø)
...spark/ml/source/image/PatchedImageFileFormat.scala 89.28% <100.00%> (-1.46%) ⬇️

... and 73 files with indirect coverage changes
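
The two coverage percentages in the report can be reproduced from the hit and line counts, assuming coverage is simply hits divided by total lines. Codecov's exact formula is not shown here, so the helper below is a hypothetical sketch rather than the tool's implementation.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals (assumed formula)."""
    return round(100.0 * hits / lines, 2)


# Counts copied from the report above: base branch spark3.4 vs. this PR.
base_hits, base_misses, base_lines = 13978, 2088, 16066
head_hits, head_misses, head_lines = 13471, 2753, 16224

# Sanity check: hits and misses account for every line in each column.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base_coverage = coverage_percent(base_hits, base_lines)
head_coverage = coverage_percent(head_hits, head_lines)
```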

@svotaw
Collaborator

svotaw commented Aug 29, 2023

/azp

@azure-pipelines

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@KeerthiYandaOS KeerthiYandaOS changed the title from "Keerthiyanda/spark3.4" to "chore: Adding Spark34 support" Aug 31, 2023
BrendanWalsh previously approved these changes Aug 31, 2023
Collaborator

@BrendanWalsh BrendanWalsh left a comment

Lots of nits, but the only serious concern I have here is the commented-out lines in the anomaly detection notebook.

eisber previously approved these changes Sep 1, 2023
@KeerthiYandaOS KeerthiYandaOS changed the base branch from master to spark3.4 September 1, 2023 16:43
@KeerthiYandaOS KeerthiYandaOS merged commit f54275f into microsoft:spark3.4 Sep 1, 2023
61 checks passed
mhamilton723 pushed a commit that referenced this pull request Sep 27, 2023
* chore: bump to spark 3.3.1

* Adding Spark3.4 Support

* Removing comment

* Updating evn file

* Fixing build errors

* Fixing build error

* fixing style error

* Fixing scala errors

* Adding json

* Adding play

* Converting to json

* Adding scalac plugin

* Version update

* Updating scoverage version

* Adding scalariform

* Adding versionScheme

* update version

* Updating the path

* Update hadoop

* Updating breeze version

* Remove scalactic exclusion

* Adding scikit-learn

* removing versions

* Adding scikit-learn

* try fix LightGBM unit test

* Adding sklearn

* Adding exclusion

* Exclude avro

* exc sql

* Adding protobuf

* Updating horovod to 0.28.1

* Update Quickstart - Document Question and Answering with PDFs.ipynb

Add line to delete the azure search index before recreating it because "put" fails on the index if one already exists.

(cherry picked from commit 214060c)

* Fixing Databricks Runtime and Tests

* Adding Reccomendations fix

* Adding recomnd fix

* formating

* Skipping SynapseE2E and Io2

* Addressing comment

* Removed the commented code

* Adding comment

* Adding comments

* Added default values

* Delete .github/workflows/ci-tests-r.yml

---------

Co-authored-by: Jessica Wang <jessiwang@microsoft.com>
Co-authored-by: Scott Votaw <svotaw@gmail.com>
Co-authored-by: Brendan Walsh <37676373+BrendanWalsh@users.noreply.github.com>
Co-authored-by: JessicaXYWang <108437381+JessicaXYWang@users.noreply.github.com>
mhamilton723 pushed a commit that referenced this pull request Sep 27, 2023
mhamilton723 pushed a commit that referenced this pull request Oct 25, 2023
mhamilton723 pushed a commit that referenced this pull request Oct 26, 2023
mhamilton723 pushed a commit to mhamilton723/SynapseML that referenced this pull request Oct 30, 2023
mhamilton723 pushed a commit to mhamilton723/SynapseML that referenced this pull request Nov 1, 2023
fix isolation forest version

fix spark serving

drop synapse cluster version

chore: remove problematic streaming example

add back 3.4 in synapse tests

use dev pool

fix pool sub

fix langchain, stop notebook deletion

fixes

local repro

fix other issues

fix other issues

refactor notebook

Update docs/Explore Algorithms/Responsible AI/Explanation Dashboard.ipynb

fixes

fixes

fixes
mhamilton723 pushed a commit to mhamilton723/SynapseML that referenced this pull request Nov 1, 2023
mhamilton723 pushed a commit to mhamilton723/SynapseML that referenced this pull request Nov 1, 2023
* chore: bump to spark 3.4.1

---------

Co-authored-by: Jessica Wang <jessiwang@microsoft.com>
Co-authored-by: Scott Votaw <svotaw@gmail.com>
Co-authored-by: Brendan Walsh <37676373+BrendanWalsh@users.noreply.github.com>
Co-authored-by: JessicaXYWang <108437381+JessicaXYWang@users.noreply.github.com>

fixes
mhamilton723 pushed a commit to mhamilton723/SynapseML that referenced this pull request Nov 1, 2023
mhamilton723 added a commit that referenced this pull request Nov 1, 2023
* chore: bump to spark 3.4.1

---------

Co-authored-by: Jessica Wang <jessiwang@microsoft.com>
Co-authored-by: Scott Votaw <svotaw@gmail.com>
Co-authored-by: Brendan Walsh <37676373+BrendanWalsh@users.noreply.github.com>
Co-authored-by: JessicaXYWang <108437381+JessicaXYWang@users.noreply.github.com>

fixes

Co-authored-by: Keerthi Yanda <98137159+KeerthiYandaOS@users.noreply.github.com>

6 participants