Signed-off-by: Greg Watson <g.watson@computer.org>
jarrah42 committed May 3, 2017
1 parent d2d3e80 commit d1b6064
Showing 4 changed files with 44 additions and 0 deletions.
7 changes: 7 additions & 0 deletions assignments/advanced-13.md
@@ -0,0 +1,7 @@
---
layout: page
title: Advanced Python for Data Science Assignment 13
exercises: ['BigData 1', 'BigData 2', 'BigData 3']
---

{% include assignment.html %}
12 changes: 12 additions & 0 deletions exercises/BigData-1.md
@@ -0,0 +1,12 @@
---
layout: exercise
title: BigData 1
---

The `wordcount_spark.py` program we wrote earlier finds the word that is used the most times in the input text. It does this by doing a sum reduction
using the `add` operator. Your job is to modify this program to use a different kind of reduction in order to count the number of distinct words in
the input text.
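As a hint, here is the reduction idea sketched in plain Python (no Spark needed to try it). A hypothetical `distinct_spark.py` would apply the same pattern with `RDD.map` and `RDD.reduce`: map each word to a one-element set, then reduce with set union instead of `add`.

```python
from functools import reduce

words = "the quick brown fox jumps over the lazy dog the".split()

# Map each word to a one-element set, then reduce with set union
# instead of `add` -- the size of the final set is the distinct count.
distinct = reduce(lambda a, b: a | b, ({w} for w in words))
print(len(distinct))  # 8 distinct words
```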

Call your new program `distinct_spark.py` and commit it to the repository you used for Assignment 3.


13 changes: 13 additions & 0 deletions exercises/BigData-2.md
@@ -0,0 +1,13 @@
---
layout: exercise
title: BigData 2
---

We saw how to use the `SparkContext.parallelize` method to create a distributed dataset (RDD) containing all the numbers from 0 to 1,000,000. Use this
same method to create an RDD containing the numbers from 1 to 1000. The RDD class has a handy method called
[fold](https://spark.apache.org/docs/1.1.1/api/python/pyspark.rdd.RDD-class.html#fold) which aggregates all the elements of the data set
using a function that is supplied as an argument. Use this method to create a program that
calculates the product of all the numbers from 1 to 1000 and prints the result.
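The aggregation can be sketched in plain Python, with `functools.reduce` standing in for the distributed `fold` (note that `fold` additionally takes a "zero" value, which for multiplication must be `1` so it is safe to apply in every partition). A hypothetical `product_spark.py` would call `rdd.fold(1, mul)` the same way; the sketch uses 1 to 10 so the result is readable.

```python
from functools import reduce
from operator import mul

numbers = range(1, 11)  # 1..10 here; the exercise asks for 1..1000
# The zero value 1 is the multiplicative identity, so folding it in
# once per partition does not change the result.
product = reduce(mul, numbers, 1)
print(product)  # 3628800, i.e. 10!
```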

Call your new program `product_spark.py` and commit it to the repository you used for Assignment 3.

12 changes: 12 additions & 0 deletions exercises/BigData-3.md
@@ -0,0 +1,12 @@
---
layout: exercise
title: BigData 3
---

There is nothing to stop you from combining the `map` operation with the `fold` operation. You can even apply `map` more than once in order
to generate more complex mappings. For *bonus marks*, see if you can work out how to use `map` and `fold` to calculate the average of
the square roots of all the numbers from 1 to 1000, i.e. the sum of the square roots of all the numbers divided by 1000.
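The map/fold pipeline a hypothetical `squareroot_spark.py` would build can be sketched in plain Python, with `map` and `functools.reduce` standing in for `RDD.map` and `RDD.fold`:

```python
from functools import reduce
from math import sqrt
from operator import add

n = 1000
# Map each number to its square root, fold with `add` (zero value 0),
# then divide the total by the count to get the average.
total = reduce(add, map(sqrt, range(1, n + 1)), 0)
average = total / n
print(average)
```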

Call your new program `squareroot_spark.py` and commit it to the repository you used for Assignment 3.

