-
Notifications
You must be signed in to change notification settings - Fork 114
Conversation
examples/build.sbt
Outdated
|
||
javaOptions += "-Xmx1024m" | ||
|
||
publishMavenStyle := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's end with a newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I have added one before pushing; however I dont see it here. I am not sure if it gets removed automatically or sth else is happening.
examples/src/main/scala/App.scala
Outdated
object App { | ||
def main(args: Array[String]): Unit = { | ||
// Create Spark session | ||
val sparkConf = new SparkConf().setMaster("local[*]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be local? What if I try to run this app on a cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added Local to make the code runnable right after checkout. However, I looked at how Spark does it for its examples and followed the same pattern. As an example, check here.
examples/src/main/scala/App.scala
Outdated
// Create Spark session | ||
val sparkConf = new SparkConf().setMaster("local[*]") | ||
val spark = SparkSession.builder.config(sparkConf).getOrCreate() | ||
spark.sparkContext.setLogLevel("ERROR") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why error loglevel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added it to make the output less noisy with Info logs. Removed it.
examples/build.sbt
Outdated
@@ -0,0 +1,22 @@ | |||
name := "hyperspaceApp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the whole app to examples/scala/...
so that we can put the C# example in examples/csharp/...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have few minor comments, but generally looks good to me.
examples/scala/build.sbt
Outdated
javaOptions += "-Xmx1024m" | ||
|
||
publishMavenStyle := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed them.
val spark = SparkSession | ||
.builder() | ||
.appName("Hyperspace example") | ||
.config("spark.some.config.option", "some-value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which line do you exactly mean?
We need to create a SparkSession and instead of using local mode I switched to this way as Spark examples use this style. As an example, please look at here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, the line right above the comment: config
in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it; yes I guess it is a good idea to keep this line to highlight the fact that user can change this to set master and create the Spark session in his desired mode, or add any other config override. But if you think it should be removed, I will drop it.
hyperspace.createIndex(deptDF, deptIndexConfig) | ||
hyperspace.createIndex(empDF, empIndexConfig) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra empty line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it.
// Create Hyperspace indexes | ||
val hyperspace = new Hyperspace(spark) | ||
|
||
val deptDF: DataFrame = spark.read.parquet(deptLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we want types for local variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped type decelerations for local ones.
.join(deptDF, empDF("deptId") === deptDF("deptId")) | ||
.select(empDF("empName"), deptDF("deptName")) | ||
eqJoin.show() | ||
hyperspace.explain(eqJoin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add spark.stop
to be explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added it. Thnx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few small comments.
// Save example data records as Parquet | ||
import spark.implicits._ | ||
val deptLocation = "departments" | ||
val empLocation = "employees" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this down to right above where it is first used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
eqFilter.show() | ||
hyperspace.explain(eqFilter) | ||
|
||
// Example of index usage for join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's end single-sentence comments with a .
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
val deptDF = spark.read.parquet(deptLocation) | ||
val empDF = spark.read.parquet(empLocation) | ||
|
||
val deptIndexConfig = IndexConfig("deptIndex", Seq("deptId"), Seq("deptName")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if these configs could just be inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea to keep it separate to improve readability as the main purpose of this App is helping newbie users of Hyperspace. We have followed similar separation of configs and creation step in our tutorial Notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for one nit comment.
|
||
// Example of index usage for join. | ||
val eqJoin = empDF | ||
.join(deptDF, empDF("deptId") === deptDF("deptId")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't auto-format give you two space indentations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it is fixed.
What changes were proposed in this pull request?
This change adds a sample Scala application to show how Hyperspace can be used as a library by other applications.
It addresses issue #79 .
Why are the changes needed?
This change demonstrates how an application can use Hyperspace by adding a dependency to a project and import Hyperspace in application's code. This example helps users understand the details by showing required changes to import Hyperspace and use it to create and leverage indexes on sample data.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This change consists of an standalone example application and is verified by running it manually and validating the output and its behavior.