## Environment Setup

There are quite a few ways to set up an environment for trying Spark. Here are two easy ones.

### Local Environment with Jupyter + Spylon-kernel

1. Install JDK
2. Download and install [Spark](http://spark.apache.org/downloads.html)
3. Download and install [Anaconda 3](https://www.anaconda.com/distribution/#download-section)
4. Install [Spylon-Kernel](https://github.com/Valassis-Digital-Media/spylon-kernel)
5. Start Jupyter with `jupyter notebook`
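
As an optional sanity check (not part of the original steps), a first cell in a new spylon-kernel notebook could print the Scala and Java versions the kernel runs on. This sketch uses only the Scala standard library and the JVM; the exact strings depend on your installation.

```scala
// Print the Scala version of the kernel, something like "version 2.11.12"
println(scala.util.Properties.versionString)
// Print the Java version, since Spark also depends on the installed JDK
println(System.getProperty("java.version"))
```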

### Using [Databricks](https://databricks.com/try-databricks)

Databricks is a company founded by the creators of Apache Spark that helps clients with cloud-based big data processing using Spark. Databricks grew out of the AMPLab project at the University of California, Berkeley, which created Apache Spark, a distributed computing framework written in Scala.

Databricks also offers a free Community Edition that can be used to try out Spark.

## Value and Variable

Scala has two kinds of variables: vals and vars. A val is similar to a final variable in Java: once initialized, it can never be reassigned. A var, by contrast, is similar to a non-final variable in Java and can be reassigned throughout its lifetime. Here are a val definition and a var definition, followed by a reassignment of the var:

```scala
val value = "Hello World"
var variable = "Hello World"
variable = "Hello World again"
```

```
value: String = Hello World
variable: String = Hello World again
variable: String = Hello World again
```


Scala is a statically typed language. In the previous example, no types were given; the compiler inferred them automatically. We can still provide types explicitly when that makes the code clearer.

The following code has exactly the same effect as the previous code.

```scala
val value: String = "Hello World"
var variable: String = "Hello World"
variable = "Hello World again"
```

```
value: String = Hello World
variable: String = Hello World again
variable: String = Hello World again
```
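
A short sketch of the behaviour described above, using made-up names (`counter`, `greeting`, `buffer`, `inferred`, `annotated`). It shows that a `var` can be reassigned, that a `val` fixes the reference but not the contents of a mutable object (much like Java's `final`), and that an explicit annotation can change the type the compiler would otherwise infer.

```scala
import scala.collection.mutable.ArrayBuffer

var counter = 0
counter += 1              // a var can be reassigned as often as needed
counter += 1

val greeting = "Hello"
// greeting = "Hi"        // would not compile: a val cannot be reassigned

// Like Java's final, a val fixes the reference, not the object it points to
val buffer = ArrayBuffer(1, 2)
buffer += 3               // mutating the buffer itself is allowed

// An explicit type can change what the compiler would infer
val inferred = 1          // inferred as Int
val annotated: Double = 1 // the literal is widened to Double (1.0)

println(s"$greeting $counter ${buffer.mkString(",")} $inferred $annotated")
// prints: Hello 2 1,2,3 1 1.0
```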


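## Primitive Types

Scala infers `Int` for integer literals and `Double` for decimal literals. The suffixes `L` and `f` produce `Long` and `Float`, while the `Byte` and `Short` values here are obtained with an explicit type annotation. `String` is listed alongside them for convenience, although it is a reference type (`java.lang.String`).
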
```scala
val boolean_value = true
val int_value     = 3
val long_value    = 3L
val double_value  = 3.0
val float_value   = 3.0f
val string_value  = "hello"
val byte_value: Byte   = 3
val short_value: Short = 3
```

```
boolean_value: Boolean = true
int_value: Int = 3
long_value: Long = 3
double_value: Double = 3.0
float_value: Float = 3.0
string_value: String = hello
byte_value: Byte = 3
short_value: Short = 3
```
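
The numeric types behave like their JVM counterparts. This sketch, with illustrative names, shows silent overflow, integer versus floating-point division, and a narrowing conversion; these are standard JVM semantics rather than anything Spark-specific.

```scala
val big = Int.MaxValue        // 2147483647, the largest 32-bit Int
val wrapped = big + 1         // Int arithmetic overflows silently to Int.MinValue
val safe = big.toLong + 1     // widen to Long first: 2147483648
val intDiv = 7 / 2            // 3: both operands are Int, so the result is truncated
val realDiv = 7 / 2.0         // 3.5: one Double operand gives floating-point division
val narrowed = 300.toByte     // 44: toByte keeps only the low 8 bits
```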


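## Tuple

A tuple groups a fixed number of values, which may have different types. `productIterator` walks over its elements as values of type `Any`.
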
```scala
val tuple_2_value = (1, 2)
val tuple_3_value = (1, 2, 3)
val tuple_different_types = (1, "hello")
```

```
tuple_2_value: (Int, Int) = (1,2)
tuple_3_value: (Int, Int, Int) = (1,2,3)
tuple_different_types: (Int, String) = (1,hello)
```


```scala
for (v <- tuple_2_value.productIterator) {
    println(v)
}
```

```
1
2
```
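
Tuple elements can also be read by position or unpacked with a pattern. A minimal sketch with an illustrative `pair`:

```scala
val pair = (1, "hello")
val first = pair._1                 // 1: tuple fields are numbered from _1
val second = pair._2                // "hello"
val (number, word) = pair           // destructure both fields at once
val swapped = pair.swap             // ("hello", 1); swap exists only on 2-tuples
println(s"$first $second $number $word $swapped")
// prints: 1 hello 1 hello (hello,1)
```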


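## List

A `List` is an immutable, ordered sequence. When the elements have different types, the inferred element type is their common supertype, such as `List[Any]` in the output below.
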
```scala
val list_value = List(1, 2, 3)
val list_not_same_type = List(1, "Hello")

for (v <- list_value) {
    println(v)
}
```

```
1
2
3

list_value: List[Int] = List(1, 2, 3)
list_not_same_type: List[Any] = List(1, Hello)
```
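
Lists come with the usual collection operations. The sketch below uses an illustrative `nums` list; Spark's RDD API has methods with the same names (`map`, `filter`, `reduce`), so this style carries over, although the RDD versions run distributed.

```scala
val nums = List(1, 2, 3)
val doubled = nums.map(_ * 2)          // List(2, 4, 6)
val odds = nums.filter(_ % 2 == 1)     // List(1, 3)
val total = nums.sum                   // 6
val reduced = nums.reduce(_ + _)       // 6, combining elements pairwise
val withZero = 0 :: nums               // List(0, 1, 2, 3); nums itself is unchanged
```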


## Map

```scala
val map_value = Map("name" -> "Rockie", "like" -> "Simple")

val name = map_value("name")

val keys = map_value.keys
val values = map_value.values
```

```
map_value: scala.collection.immutable.Map[String,String] = Map(name -> Rockie, like -> Simple)
name: String = Rockie
keys: Iterable[String] = Set(name, like)
values: Iterable[String] = MapLike(Rockie, Simple)
```


```scala
for ((key, value) <- map_value) {
    println(key, value)
}
```

```
(name,Rockie)
(like,Simple)
```
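
Calling `map_value(key)` with a key that is not present throws an exception. A sketch of safer lookups on the default immutable `Map`; the key `"age"` and the value `"30"` are made up for illustration.

```scala
val profile = Map("name" -> "Rockie", "like" -> "Simple")

// profile("age") would throw NoSuchElementException because the key is absent
val maybeAge = profile.get("age")                 // None: get wraps the result in an Option
val age = profile.getOrElse("age", "unknown")     // "unknown": falls back to a default
val hasName = profile.contains("name")            // true

// The default Map is immutable: + returns a new Map and leaves profile unchanged
val updated = profile + ("age" -> "30")           // "30" is an illustrative value
```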


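## Functions

Functions are defined with `def`. The return type is inferred from the body: `Unit` when the body ends with `println`, and `String` when it ends with a string expression.
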
```scala
def fun_no_parameter_no_value() = {
    println("hello Spark")
}
fun_no_parameter_no_value()
```

```
hello Spark

fun_no_parameter_no_value: ()Unit
```


```scala
def fun_no_value(name: String) = {
    println("hello " + name)
}

fun_no_value("Spark")
```

```
hello Spark

fun_no_value: (name: String)Unit
```


```scala
def fun(name: String) = {
    "hello " + name
}

println(fun("Spark"))
```

```
hello Spark

fun: (name: String)String
```
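
Two further features are worth knowing: default parameter values, and function literals (lambdas) that can be stored in values and passed to other functions. The names `greet`, `shout` and `shouted` are illustrative.

```scala
// Explicit return type plus a default value for the second parameter
def greet(name: String, greeting: String = "hello"): String = greeting + " " + name

// A function literal stored in a val; its type is String => String
val shout: String => String = s => s.toUpperCase

// Functions are values, so they can be passed to methods such as map
val shouted = List("spark", "scala").map(shout)

println(greet("Spark"))               // hello Spark
println(greet("Spark", "hi"))         // hi Spark
println(shout(greet("Spark")))        // HELLO SPARK
println(shouted)                      // List(SPARK, SCALA)
```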


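A standalone Scala program puts its `main` method inside an `object`, Scala's built-in singleton:

```scala
// Equivalent to a singleton
object HelloWorld {
    // Methods inside an object are equivalent to static methods in Java
    // args: parameter name
    // Array[String]: parameter type
    // Unit: return type, equivalent to Java's void
    def main(args: Array[String]): Unit = {
        // Define a string constant, equivalent to Java's
        // final String exp = "Hello World";
        val exp: String = "Hello World"

        // Types the compiler can infer do not need to be declared explicitly,
        // so this can also be written as below.
        // Much can be omitted in Scala; newer compilers are much smarter.
        // The trailing semicolon can be omitted too,
        // although the Scala plugin for Eclipse indents the next line when it is missing.
        val imp = "Hello World"

        // Define a variable, equivalent to Java's
        // String v = "Hello";
        var v = "Hello"
        v += " World"

        // Print, equivalent to Java's
        // System.out.println(v);
        println(v)
    }
}

// In a notebook nothing calls main automatically, so invoke it explicitly
HelloWorld.main(Array.empty[String])
```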
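
Because an `object` is a singleton, any state it holds is shared by every piece of code that uses it. A minimal sketch with a hypothetical `Counter` object:

```scala
// An object has exactly one instance, initialized the first time it is used
object Counter {
  private var count = 0          // state shared by every caller of Counter
  def next(): Int = {
    count += 1
    count
  }
}

val first = Counter.next()       // 1
val second = Counter.next()      // 2: same instance, so the count carries over
println(s"$first $second")       // prints "1 2"
```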