Conversation
How will users install this extension?

What would it take to do something like:

```
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft:hyperspace-core_2.11:0.1.0") \
    .getOrCreate()

from hyperspace import *
```

Can you add the necessary documentation for users to install this extension on their own Spark clusters?
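As an illustration of the flow being asked about, here is a minimal end-to-end sketch, assuming the package coordinates from this thread and the Hyperspace entry point this PR adds (exact names and the released version number may differ):

```
from pyspark.sql import SparkSession

# Pull the Hyperspace jar at session startup (coordinates/version assumed
# from this thread; the released version may differ).
spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.jars.packages",
            "com.microsoft.hyperspace:hyperspace-core_2.11:0.1.0") \
    .getOrCreate()

# Import the Python bindings and bind them to the session.
from hyperspace import Hyperspace  # name assumed from this PR

hs = Hyperspace(spark)
```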
…rspace-1 into thrajput/hpcpython
build.sbt
Outdated
```
@@ -14,30 +14,45 @@
 * limitations under the License.
 */

/***************************
 * Spark Packages settings *
```
Can you move this below, similar to https://github.com/delta-io/delta/blob/master/build.sbt?
Moved, thanks.
Then, run PySpark with the Hyperspace package:

```
pyspark --packages com.microsoft.hyperspace:hyperspace-core_2.11:0.1.0
```
0.1.0 wouldn't work, no?
Yeah, right. It needs to be the new version to be released.
download_spark.sh
Outdated
```
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
```
Move this under the `script` directory.
Updated, thanks.
download_spark.sh
Outdated
```
SPARK_HOME=$(pwd)

export PATH=$PATH:$SPARK_HOME/bin
echo "Spark Home: ${SPARK_HOME}"
```
Can we add a newline at the end of the file?
Added a newline at the end, thanks.
python/run-tests.py
Outdated
```
def run_python_style_checks(root_dir):
    run_cmd([os.path.join(root_dir, "dev", "lint-python")], stream_output=True)
```
this file has a different style?
Actually, I haven't integrated this lint check in this PR. Can I do it in a separate PR?
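For context on the snippet above: run_cmd is not defined in this excerpt. A minimal sketch of what such a helper might look like, assuming it wraps subprocess (hypothetical; the actual helper in run-tests.py may differ):

```
import subprocess

def run_cmd(cmd, stream_output=False):
    # Hypothetical helper: run a command list such as
    # [os.path.join(root_dir, "dev", "lint-python")].
    if stream_output:
        # Inherit stdout/stderr so lint output shows up live.
        return subprocess.call(cmd)
    # Otherwise capture the output and return it with the exit code.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```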
LGTM (except for two minor comments), thanks @thrajput!
azure-pipelines.yml
Outdated
```
- task: Bash@3
  inputs:
    filePath: 'script/download_spark.sh'
    arguments: '2.4.2 2.7'
```
do we need this argument?
I thought this may be useful if we need to support multiple versions of Spark. What do you think?
the script doesn't support args anyway, no? (in dotnet/spark, the script downloads all the versions needed)
Reverting as you suggested; will follow the dotnet/spark style of script. Thanks.
python/hyperspace/hyperspace.py
Outdated
```
indexed_columns = self._getScalaSeqFromList(index_config.indexedColumns)
included_columns = self._getScalaSeqFromList(index_config.includedColumns)
_jindexConfig = self.jvm.com.microsoft.hyperspace.index.IndexConfig(
    self.jvm.java.lang.String(index_config.indexName), indexed_columns, included_columns)
```
is this the right indentation?
Fixed, thanks.
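As background for the snippet above: _getScalaSeqFromList turns a Python list into a Scala Seq over the py4j gateway before handing it to the JVM-side IndexConfig. A hedged sketch of one way such a helper could be written (hypothetical; the actual implementation in hyperspace.py may differ):

```
def _getScalaSeqFromList(self, python_list):
    # Hypothetical sketch: copy the Python list into a java.util.ArrayList
    # through the py4j gateway, then convert it to a Scala Seq using
    # scala.collection.JavaConverters (Scala 2.11/2.12).
    java_list = self.jvm.java.util.ArrayList()
    for item in python_list:
        java_list.add(item)
    return self.jvm.scala.collection.JavaConverters \
        .asScalaBufferConverter(java_list) \
        .asScala() \
        .toSeq()
```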
What changes were proposed in this pull request?
Adding Python bindings on top of the Hyperspace Scala APIs, to be used from PySpark.
Why are the changes needed?
This will allow Python and PySpark developers to use the Hyperspace APIs through Python wrapper classes.
Does this PR introduce any user-facing change?
Yes. This introduces Python wrapper classes for Hyperspace APIs such as createIndex, deleteIndex, enabling or disabling Hyperspace on a Spark session, and so on. This will make it easier for Python developers to use Hyperspace in their Spark applications. Example user-experience screenshots are attached in the testing section below.
How was this patch tested?
Below are a few screenshots of Hyperspace Python API usage in Synapse with Spark 2.4.4.2.6 and Python 3.6. The unit tests were also run locally for this PR.
Screenshots (images omitted from this excerpt; see the sketch below):
- Creating Hyperspace object
- Creating index
- Deleting index
- Restore index
- Vacuum index
- Enable/disable Hyperspace
- Explain API
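Since the screenshots themselves are not reproduced here, the following sketch shows roughly what those interactions look like in PySpark, assuming the class and method names this PR introduces (Hyperspace, IndexConfig, createIndex, deleteIndex, restoreIndex, vacuumIndex, explain); exact signatures may differ from the final API:

```
from hyperspace import Hyperspace, IndexConfig  # names assumed from this PR

hs = Hyperspace(spark)

df = spark.read.parquet("/data/sample")  # hypothetical dataset

# Create an index over an indexed column ("c1") with an included column ("c2").
hs.createIndex(df, IndexConfig("sampleIndex", ["c1"], ["c2"]))

# Soft-delete, then restore the index.
hs.deleteIndex("sampleIndex")
hs.restoreIndex("sampleIndex")

# Soft-delete again and permanently remove it.
hs.deleteIndex("sampleIndex")
hs.vacuumIndex("sampleIndex")

# Explain how a query plan would change with the index (signature assumed).
hs.explain(df.filter("c1 = 'a'").select("c2"))

# Enabling/disabling Hyperspace on the Spark session is also wrapped,
# but the exact Python entry point is not shown in this excerpt.
```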
Installation instructions for users:

```
python setup.py sdist bdist_wheel
pip install ..\hyperspace-0.0.1-py3-none-any.whl
```