Skip to content
This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Python bindings for Hyperspace #36

Merged
merged 36 commits into from
Aug 5, 2020
Merged

Python bindings for Hyperspace #36

merged 36 commits into from
Aug 5, 2020

Conversation

thrajput
Copy link
Contributor

@thrajput thrajput commented Jun 24, 2020

What changes were proposed in this pull request?

Adding python bindings on top of hyperspace scala APIs to be used from pyspark.

Why are the changes needed?

This will allow python or pyspark developers, to use hyperspace APIs using python wrapper classes.

Does this PR introduce any user-facing change?

Yes. This introduces python wrapper classes of hyperspace APIs like createIndex, deleteIndex, enable or disable hyperspace on spark session and so on. This will make it easier for python developer to utilize hyperspace in their spark applications. Example user experience screenshots are attached in testing section below.

How was this patch tested?

Below are few screenshots of python API utilization of hyperspace in synapse with spark 2.4.4.2.6 and python 3.6. Also ran the unit tests locally in the PR.

Creating hyperspace object
image

Creating index
image

Deleting index
image

Restore index
image

Vacuum index
image

Enable Disable Hyperspace
image

Explain API:
image

Installation instructions for user:

  • User can create hyperspace*.whl file using setup.py by below command.
    python setup.py sdist bdist_wheel
  • *.whl file can be installed using pip install like other python packages.
    pip install ..\hyperspace-0.0.1-py3-none-any.whl

@imback82
Copy link
Contributor

How will user install this extension?

@thrajput
Copy link
Contributor Author

thrajput commented Jun 24, 2020

How will user install this extension?

  • They can create hyperspace*.whl file using setup by below command.
    python setup.py sdist bdist_wheel
  • whl file can be installed using pip installed
    pip install ..\hyperspace-0.0.1-py3-none-any.whl

@imback82
Copy link
Contributor

What would it take to do something like:

pyspark --packages com.microsoft:hyperspace-core_2.11:0.1.0

or

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft:hyperspace-core_2.11:0.1.0")

from hyperspace import *

@imback82 imback82 requested a review from AFFogarty June 24, 2020 05:23
@imback82 imback82 added the enhancement New feature or request label Jun 24, 2020
python/hyperspace/hyperspace.py Outdated Show resolved Hide resolved
python/hyperspace/hyperspace.py Show resolved Hide resolved
python/hyperspace/hyperspace.py Show resolved Hide resolved
python/hyperspace/hyperspace.py Outdated Show resolved Hide resolved
python/hyperspace/hyperspace.py Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
python/hyperspace/indexconfig.py Show resolved Hide resolved
python/hyperspace/hyperspace.py Show resolved Hide resolved
@rapoth rapoth changed the title Python bindings for hyperspace Python bindings for Hyperspace Jun 26, 2020
@rapoth
Copy link
Contributor

rapoth commented Jun 26, 2020

Can you add the necessary documentation for a user to install this extension on their own Spark clusters?

python/setup.py Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
python/hyperspace/hyperspace.py Outdated Show resolved Hide resolved
python/hyperspace/hyperspace.py Outdated Show resolved Hide resolved
@rapoth rapoth added this to the 0.2.0 milestone Jul 16, 2020
build.sbt Outdated
@@ -14,30 +14,45 @@
* limitations under the License.
*/

/***************************
* Spark Packages settings *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this below similar to https://github.com/delta-io/delta/blob/master/build.sbt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved thanks.


Then, run PySpark with the Hyperspace package:
```
pyspark --packages com.microsoft.hyperspace:hyperspace-core_2.11:0.1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.1.0 wouldn't work no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea right. need to be the new version to be released.

@@ -0,0 +1,32 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this under script directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.thanks

SPARK_HOME=$(pwd)

export PATH=$PATH:$SPARK_HOME/bin
echo "Spark Home: ${SPARK_HOME}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a new line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added new line in the end. thanks.

project/plugins.sbt Outdated Show resolved Hide resolved
build.sbt Outdated Show resolved Hide resolved
build.sbt Show resolved Hide resolved
python/hyperspace/tests/test_indexmanagement.py Outdated Show resolved Hide resolved
python/hyperspace/tests/test_indexutilization.py Outdated Show resolved Hide resolved
python/run-tests.py Outdated Show resolved Hide resolved
python/run-tests.py Outdated Show resolved Hide resolved
python/run-tests.py Outdated Show resolved Hide resolved
def run_python_style_checks(root_dir):
run_cmd([os.path.join(root_dir, "dev", "lint-python")], stream_output=True)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file has a different style?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually haven't integrated this lint check in this PR? Can do it in separate PR?

python/run-tests.py Outdated Show resolved Hide resolved
script/download_spark.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (except for two minor comments), thanks @thrajput!

- task: Bash@3
inputs:
filePath: 'script/download_spark.sh'
arguments: '2.4.2 2.7'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought if we need to support multiple versions of spark this may be useful. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the script doesn't support args anyway, no? (in dotnet/spark, the script downloads all the versions needed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting back as you suggested. Will follow dotnet/spark style of script. thanks

indexed_columns = self._getScalaSeqFromList(index_config.indexedColumns)
included_columns = self._getScalaSeqFromList(index_config.includedColumns)
_jindexConfig = self.jvm.com.microsoft.hyperspace.index.IndexConfig(
self.jvm.java.lang.String(index_config.indexName), indexed_columns, included_columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the right indentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. thanks

@imback82 imback82 merged commit aad15bd into microsoft:master Aug 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants