This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Hyperspace "HelloWorld" application #90

Merged 33 commits on Jul 21, 2020

@pirz pirz commented Jul 14, 2020

What changes were proposed in this pull request?

This change adds a sample Scala application to show how Hyperspace can be used as a library by other applications.
It addresses issue #79.

Why are the changes needed?

This change demonstrates how an application can use Hyperspace by adding it as a project dependency and importing it in the application's code. The example shows the changes required to import Hyperspace and use it to create and leverage indexes on sample data.
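
As a rough illustration of the dependency step described above, a `build.sbt` for such an application might look like the sketch below. The Scala, Spark, and Hyperspace version numbers are assumptions chosen for illustration and are not taken from this PR; check which artifact versions are actually published before using them.

```scala
// build.sbt (sketch; all version numbers below are assumptions, not from this PR)
name := "hyperspaceApp"

scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  // Spark is marked "provided" on the assumption that the app is launched with spark-submit.
  "org.apache.spark" %% "spark-sql" % "2.4.2" % "provided",
  // Hyperspace as a library dependency.
  "com.microsoft.hyperspace" %% "hyperspace-core" % "0.2.0"
)
```

With the dependency in place, application code would typically bring the API into scope with `import com.microsoft.hyperspace._` and `import com.microsoft.hyperspace.index._`.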

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This change consists of a standalone example application. It was verified by running it manually and validating its output and behavior.

@pirz pirz self-assigned this Jul 14, 2020
@rapoth rapoth added this to the 0.2.0 milestone Jul 14, 2020

javaOptions += "-Xmx1024m"

publishMavenStyle := true

Contributor:
Let's end with a newline.

Contributor Author:
Hmm, I added one before pushing, but I don't see it here. I am not sure whether it gets removed automatically or something else is happening.

object App {
  def main(args: Array[String]): Unit = {
    // Create Spark session
    val sparkConf = new SparkConf().setMaster("local[*]")

Contributor:
Does it need to be local? What if I try to run this app on a cluster?

Contributor Author:
I had added local mode so the code would be runnable right after checkout. However, I looked at how Spark does it for its examples and have now followed the same pattern. As an example, check here.

// Create Spark session
val sparkConf = new SparkConf().setMaster("local[*]")
val spark = SparkSession.builder.config(sparkConf).getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

Contributor:
Why the ERROR log level?

Contributor Author:
I had added it to keep INFO logs from cluttering the output. Removed it.

examples/src/main/scala/App.scala (outdated, resolved)
@@ -0,0 +1,22 @@
name := "hyperspaceApp"

Contributor:
Can you move the whole app to examples/scala/... so that we can put the C# example in examples/csharp/...?

Contributor Author:
Done.

@imback82 imback82 (Contributor) left a comment

I have a few minor comments, but it generally looks good to me.

Comment on lines 20 to 22
javaOptions += "-Xmx1024m"

publishMavenStyle := true

Contributor:
Do we need these?

Contributor Author:
Removed them.

val spark = SparkSession
  .builder()
  .appName("Hyperspace example")
  .config("spark.some.config.option", "some-value")

Contributor:
Do we need this?

Contributor Author:
Which line exactly do you mean?
We need to create a SparkSession, and instead of using local mode I switched to this approach because the Spark examples use this style. As an example, please look here.

Contributor:
Usually, the line right above the comment: config in this case.

Contributor Author:
Got it. Yes, I think it is a good idea to keep this line to highlight that users can change it to set the master and create the Spark session in their desired mode, or add any other config override. But if you think it should be removed, I will drop it.
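
One way to reconcile the two concerns raised in this thread (runnable right after checkout, yet not tied to local mode on a cluster) is sketched below. It is not what the PR does; it is a minimal alternative that assumes the master is normally supplied by `spark-submit` and only falls back to `local[*]` when none is set. The object and method names (`SessionSketch`, `buildSession`) are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of the PR.
object SessionSketch {
  // Uses the master passed by spark-submit when present; otherwise falls back to local mode.
  def buildSession(): SparkSession = {
    val conf = new SparkConf()
      .setAppName("Hyperspace example")
      .setIfMissing("spark.master", "local[*]")
    SparkSession.builder().config(conf).getOrCreate()
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    try {
      // Print the master that was actually picked up.
      println(spark.sparkContext.master)
    } finally {
      // Stop the session explicitly, even if the body throws.
      spark.stop()
    }
  }
}
```

Any further override can still be added through `.config(key, value)` on the builder, which is the point the placeholder line in the snippet above is meant to highlight.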

hyperspace.createIndex(deptDF, deptIndexConfig)
hyperspace.createIndex(empDF, empIndexConfig)


Contributor:
nit: extra empty line.

Contributor Author:
Fixed it.

// Create Hyperspace indexes
val hyperspace = new Hyperspace(spark)

val deptDF: DataFrame = spark.read.parquet(deptLocation)

Contributor:
nit: do we want types for local variables?

Contributor Author:
Dropped type declarations for the local ones.

.join(deptDF, empDF("deptId") === deptDF("deptId"))
.select(empDF("empName"), deptDF("deptName"))
eqJoin.show()
hyperspace.explain(eqJoin)

Contributor:
Add spark.stop() to be explicit?

Contributor Author:
Added it. Thanks!

@AFFogarty AFFogarty (Contributor) left a comment

Left a few small comments.

// Save example data records as Parquet
import spark.implicits._
val deptLocation = "departments"
val empLocation = "employees"

Contributor:
Move this down to right above where it is first used.

Contributor Author:
Done.

eqFilter.show()
hyperspace.explain(eqFilter)

// Example of index usage for join

Contributor:
Let's end single-sentence comments with a period (.).

Contributor Author:
Done.

val deptDF = spark.read.parquet(deptLocation)
val empDF = spark.read.parquet(empLocation)

val deptIndexConfig = IndexConfig("deptIndex", Seq("deptId"), Seq("deptName"))

Contributor:
Wondering if these configs could just be inline.

Contributor Author:
I think it is a good idea to keep them separate for readability, since the main purpose of this app is to help new Hyperspace users. We follow a similar separation of config and creation steps in our tutorial notebook.
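
For comparison, the two styles discussed in this thread can be sketched side by side. The sample rows, the `IndexConfigStyles` object, and the second index name are hypothetical and exist only so both calls can appear in one place; the sketch also assumes the master is supplied by `spark-submit` and that Hyperspace is on the classpath.

```scala
import com.microsoft.hyperspace.Hyperspace
import com.microsoft.hyperspace.index.IndexConfig
import org.apache.spark.sql.SparkSession

// Hypothetical object, not part of the PR.
object IndexConfigStyles {
  def main(args: Array[String]): Unit = {
    // Assumes spark-submit provides the master.
    val spark = SparkSession.builder().appName("IndexConfig styles").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; the PR's actual data is not shown in this thread.
    val deptLocation = "departments"
    Seq((10, "Accounting"), (20, "Research"))
      .toDF("deptId", "deptName")
      .write.mode("overwrite").parquet(deptLocation)

    val hyperspace = new Hyperspace(spark)
    val deptDF = spark.read.parquet(deptLocation)

    // Style kept in the PR: name the config first, then create the index.
    val deptIndexConfig = IndexConfig("deptIndex", Seq("deptId"), Seq("deptName"))
    hyperspace.createIndex(deptDF, deptIndexConfig)

    // Inline alternative raised in review (the index name here is hypothetical).
    // Note: creating an index whose name already exists is expected to fail,
    // so re-running this sketch may require removing the earlier indexes first.
    hyperspace.createIndex(deptDF, IndexConfig("deptIndexInline", Seq("deptId"), Seq("deptName")))

    spark.stop()
  }
}
```

The named form costs one extra line per index but gives each config a label a first-time reader can refer back to, which is the readability argument made above.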

AFFogarty previously approved these changes Jul 21, 2020

@AFFogarty AFFogarty (Contributor) left a comment

LGTM.

imback82 previously approved these changes Jul 21, 2020

@imback82 imback82 (Contributor) left a comment

LGTM except for one nit comment.


// Example of index usage for join.
val eqJoin = empDF
.join(deptDF, empDF("deptId") === deptDF("deptId"))

Contributor:
Doesn't auto-format give you two-space indentation?

Contributor Author:
Thanks, it is fixed.

@pirz pirz dismissed stale reviews from imback82 and AFFogarty via 4fd12b9 July 21, 2020 23:27
4 participants