Commit
Signed-off-by: Greg Watson <g.watson@computer.org>
Showing 4 changed files with 44 additions and 0 deletions.
@@ -0,0 +1,7 @@
---
layout: page
title: Advanced Python for Data Science Assignment 13
exercises: ['BigData 1', 'BigData 2', 'BigData 3']
---

{% include assignment.html %}
@@ -0,0 +1,12 @@
---
layout: exercise
title: BigData 1
---

The `wordcount_spark.py` program we wrote earlier finds the word that is used the most times in the input text. It does this by performing a sum reduction using the `add` operator. Your job is to modify this program, using a different kind of reduction, to count the number of distinct words in the input text. A sketch of one possible approach appears below.

Call your new program `distinct_spark.py` and commit it to the repository you used for Assignment 3.
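For reference, here is a minimal sketch of one way the distinct-word count could look. Since `wordcount_spark.py` itself is not shown in this commit, the input file name and the `SparkContext` setup are assumptions; adapt them to match the original program.

```python
# distinct_spark.py -- a minimal sketch of one possible approach.
# The input path "input.txt" and the SparkContext setup are assumptions.
from pyspark import SparkContext

sc = SparkContext(appName="DistinctWords")

# Split every line of the input into individual words.
words = sc.textFile("input.txt").flatMap(lambda line: line.split())

# Instead of summing counts with `add`, collapse each word's pairs down
# to a single representative, then count how many distinct keys survive.
distinct_count = (words.map(lambda w: (w, None))
                       .reduceByKey(lambda a, b: a)
                       .count())
print("distinct words:", distinct_count)

sc.stop()
```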
@@ -0,0 +1,13 @@
---
layout: exercise
title: BigData 2
---

We saw how to use the `SparkContext.parallelize` method to create a distributed dataset (RDD) containing all the numbers from 0 to 1,000,000. Use this same method to create an RDD containing the numbers from 1 to 1000. The RDD class has a handy method called [fold](https://spark.apache.org/docs/1.1.1/api/python/pyspark.rdd.RDD-class.html#fold) that aggregates all the elements of the dataset using a function supplied as an argument. Use this method to create a program that calculates the product of all the numbers from 1 to 1000 and prints the result. A sketch of one possible approach appears below.

Call your new program `product_spark.py` and commit it to the repository you used for Assignment 3.
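As a guide, a minimal sketch of the `fold`-based product might look like the following. One detail worth noting: `fold` folds its zero value into every partition, so the zero value must be the identity element for the operator, which for multiplication is 1, not 0.

```python
# product_spark.py -- a minimal sketch, not the only way to do it.
from pyspark import SparkContext

sc = SparkContext(appName="Product")

# An RDD holding the numbers 1..1000.
nums = sc.parallelize(range(1, 1001))

# fold(zeroValue, op): zeroValue is applied once per partition, so it
# must be the identity for the operator -- 1 for multiplication.
product = nums.fold(1, lambda x, y: x * y)
print(product)

sc.stop()
```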
@@ -0,0 +1,12 @@
---
layout: exercise
title: BigData 3
---

There is nothing to stop you from combining the `map` operation with the `fold` operation. You can even apply `map` more than once to generate more complex mappings. For *bonus marks*, see if you can work out how to use `map` and `fold` to calculate the average of the square roots of all the numbers from 1 to 1000, i.e., the sum of the square roots of all the numbers divided by 1000. A sketch of one possible approach appears below.

Call your new program `squareroot_spark.py` and commit it to the repository you used for Assignment 3.
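One minimal sketch, assuming the same `parallelize` setup as the previous exercise, applies `map` twice (square root, then divide by the element count) before summing with `fold`:

```python
# squareroot_spark.py -- a minimal sketch of one possible approach.
from math import sqrt
from pyspark import SparkContext

sc = SparkContext(appName="SqrtAverage")

# An RDD holding the numbers 1..1000.
nums = sc.parallelize(range(1, 1001))

# First map takes each square root; second divides by the element count,
# so the fold-sum of the results is exactly the average.
average = (nums.map(sqrt)
               .map(lambda r: r / 1000.0)
               .fold(0, lambda x, y: x + y))
print(average)

sc.stop()
```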