
ch05 - potentially unexpected result from sample code #33

Closed
dolaameng opened this issue Jul 17, 2015 · 4 comments

@dolaameng

On page 92, sumSquares is calculated with the following code:

    val sumSquares = dataAsArray.fold(
      new Array[Double](numCols)
    )(
      (a,b) => a.zip(b).map(t => t._1 + t._2 * t._2)
    )

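RDD.fold requires its operator to be commutative, but this one is not: the map() treats its two arguments asymmetrically, adding the square of each element of the second array to the first. Since fold also uses the same operator to merge the per-partition results, those partial sums get squared as well, so the result can differ depending on the number of partitions in the RDD.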
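To see why the partition count matters, here is a small Spark-free sketch that can be pasted into the Scala REPL. It assumes RDD.fold folds each partition starting from the zero value and then merges the per-partition results with the same operator, again starting from the zero value. The `simulatedRddFold` helper and the two-row sample data are hypothetical, for illustration only.

```scala
// Spark-free simulation of RDD.fold semantics (an assumption for illustration):
// each partition is folded from zeroValue, then the per-partition results are
// folded together with the SAME op, again starting from zeroValue.
object FoldDemo {
  type Row = Array[Double]

  // The operator from the book: adds the square of each element of b to a.
  // zip/map build new arrays, so the shared zero value is never mutated.
  val sumSquaresOp: (Row, Row) => Row =
    (a, b) => a.zip(b).map(t => t._1 + t._2 * t._2)

  // Hypothetical helper mimicking how RDD.fold combines partitions.
  def simulatedRddFold(partitions: Seq[Seq[Row]], zero: Row)(op: (Row, Row) => Row): Row = {
    val perPartition = partitions.map(p => p.foldLeft(zero)(op))
    perPartition.foldLeft(zero)(op)
  }

  // Made-up sample data: two rows, two columns.
  val rows: Seq[Row] = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
  val zero: Row = new Array[Double](2)

  // Same data, split into one partition vs. two partitions.
  val onePartition: Row = simulatedRddFold(Seq(rows), zero)(sumSquaresOp)
  val twoPartitions: Row = simulatedRddFold(rows.map(r => Seq(r)), zero)(sumSquaresOp)
}

println(FoldDemo.onePartition.mkString(", "))  // 100.0, 400.0
println(FoldDemo.twoPartitions.mkString(", ")) // 82.0, 272.0
```

With one partition, the single partial result (10, 20) is squared during the merge, giving (100, 400); with two partitions the merge gives (1 + 81, 16 + 256). Neither matches the true per-column sums of squares, (10, 20).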

@srowen
Collaborator

srowen commented Jul 17, 2015

I think you're right; shouldn't this really be aggregate? That is, it needs to treat adding the square of a new value to the current sum of squares differently from adding together partial sums of squares.

    val sumSquares = dataAsArray.aggregate(
        new Array[Double](numCols)
      )(
        (a, b) => a.zip(b).map(t => t._1 + t._2 * t._2),
        (a, b) => a.zip(b).map(t => t._1 + t._2)
      )
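A similar Spark-free sketch suggests the aggregate version gives the same answer however the rows are partitioned. It assumes RDD.aggregate applies seqOp within each partition and then merges the per-partition results with combOp, starting from the zero value. `simulatedRddAggregate` and the sample rows are hypothetical.

```scala
// Spark-free simulation of RDD.aggregate semantics (illustrative assumption):
// seqOp folds rows into an accumulator within each partition, and combOp
// merges the per-partition accumulators, starting again from zeroValue.
object AggregateDemo {
  type Row = Array[Double]

  // Within a partition: add the square of each new value to the running sum.
  val seqOp: (Row, Row) => Row = (a, b) => a.zip(b).map(t => t._1 + t._2 * t._2)
  // Across partitions: partial sums of squares are simply added.
  val combOp: (Row, Row) => Row = (a, b) => a.zip(b).map(t => t._1 + t._2)

  // Hypothetical helper mimicking how RDD.aggregate combines partitions.
  def simulatedRddAggregate(partitions: Seq[Seq[Row]], zero: Row)(
      seq: (Row, Row) => Row, comb: (Row, Row) => Row): Row =
    partitions.map(p => p.foldLeft(zero)(seq)).foldLeft(zero)(comb)

  // Made-up sample data: two rows, two columns.
  val rows: Seq[Row] = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
  val zero: Row = new Array[Double](2)

  val onePartition: Row = simulatedRddAggregate(Seq(rows), zero)(seqOp, combOp)
  val twoPartitions: Row = simulatedRddAggregate(rows.map(r => Seq(r)), zero)(seqOp, combOp)
}

println(AggregateDemo.onePartition.mkString(", "))  // 10.0, 20.0
println(AggregateDemo.twoPartitions.mkString(", ")) // 10.0, 20.0
```

Both calls yield (10, 20), i.e. (1*1 + 3*3, 2*2 + 4*4), because squaring happens only once per value and combOp merges partial sums without squaring them.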

srowen self-assigned this Jul 17, 2015
srowen added this to the 1.0.1 milestone Jul 17, 2015
srowen added the bug label Jul 17, 2015
@srowen
Collaborator

srowen commented Jul 17, 2015

Resolved in source and filed as an erratum.

srowen closed this as completed Jul 17, 2015
@dolaameng
Author

Another way might be to just use stats(), even though I think it's a little bit slower. The book is great work. Thanks!

@srowen
Collaborator

srowen commented Jul 18, 2015

Fair enough, yes; I kind of wanted to show computing these things directly in an FP-oriented way.
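As a sketch of that FP-oriented route, per-column means and standard deviations can be derived from n, the per-column sums, and the per-column sums of squares. The `ColumnStats` object and its sample rows are hypothetical. The local `foldLeft` calls run sequentially, so the asymmetric operator is safe there; on an RDD the aggregate form would be needed instead. The formula gives the population standard deviation (dividing by n), which is what Spark's `StatCounter.stdev` reports.

```scala
// Sketch: per-column mean and population standard deviation derived from
// n, per-column sums, and per-column sums of squares. Sample data is made up.
object ColumnStats {
  type Row = Array[Double]

  val rows: Seq[Row] = Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 9.0))
  val n: Double = rows.size.toDouble
  val numCols: Int = rows.head.length

  // Sequential foldLeft: asymmetric operators are fine here (not on an RDD).
  val sums: Row = rows.foldLeft(new Array[Double](numCols))(
    (a, b) => a.zip(b).map(t => t._1 + t._2))
  val sumSquares: Row = rows.foldLeft(new Array[Double](numCols))(
    (a, b) => a.zip(b).map(t => t._1 + t._2 * t._2))

  val means: Row = sums.map(_ / n)

  // Population stdev: sqrt(sumSq/n - mean^2) == sqrt(n*sumSq - sum^2) / n
  val stdevs: Row = sumSquares.zip(sums).map { case (sumSq, sum) =>
    math.sqrt(n * sumSq - sum * sum) / n
  }

  // Two-pass reference computation for one column, for comparison.
  def directStdev(col: Int): Double = {
    val xs = rows.map(r => r(col))
    val mean = xs.sum / n
    math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / n)
  }
}

println(ColumnStats.means.mkString(", "))  // 3.0, 5.0
println(ColumnStats.stdevs.mkString(", "))
```

The single-pass formula can lose precision when the mean is large relative to the spread, which is one reason a two-pass or incremental computation is sometimes preferred.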
