P2: Vector-matrix multiplication #44
Comments
…, vector-matrix multiplication), and P4 (#46, deflation). Not yet tested, but this is what an initial proof of concept looks like.
@magsol

    File "/home/targol/anaconda2/lib/python2.7/R1DL_Pyspark.py", line 216, in <module>
        S = S.apply(deflate, keepDType = True, keepIndex = True)
    TypeError: apply() got an unexpected keyword argument 'keepDType'

The log file is as follows:
Can you figure out what it means? iPhone'd
@magsol
No. Read the error message carefully. It's complaining about unrecognized parameter names. Check the thunder documentation and see if you can figure out how to fix it. iPhone'd
Ok, let me check it.
Actually, the problem was just a spelling error! In 'keepDType', the 'T' must be changed to a lowercase 't'. I'll correct it in the main file.
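The TypeError above can be reproduced with plain Python: keyword arguments are case-sensitive, so 'keepDType' and 'keepDtype' are different names. The signature below is a stand-in for illustration, not thunder's actual API:

```python
# Stand-in for thunder's Series.apply; the parameter names here are
# assumptions for illustration, not thunder's actual signature.
def apply(func, keepDtype=False, keepIndex=False):
    return func

try:
    # Wrong capitalisation ('T' instead of 't') raises a TypeError,
    # just like the traceback above.
    apply(lambda x: x, keepDType=True, keepIndex=True)
except TypeError as e:
    print(e)  # apply() got an unexpected keyword argument 'keepDType'

# Correct spelling works fine.
apply(lambda x: x, keepDtype=True, keepIndex=True)
```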
I've now tested the code again on the small test1 pattern, and the z there is much better than our previous z file! Now z.txt is a sparse matrix. I'm going to test the bigger data set; the results for the first small data set are as follows:
Excellent work!
@magsol -49.625731 -90.950085 -107.148851 -22.263390 -27.960949 -74.573206 -35.491131 -1.820312 106.864215 14.072931 171.595561 -93.018838 -3.851441 281.873055 212.104157 375.934794 -69.247916 -79.771974 -27.565432 335.760112 330.057942 255.645971 129.561707 23.689732 39.457266 338.431347 358.045253 -16.198390 211.919775 120.124855 66.542751 282.075863 378.395402 -94.307979 -2.779630 -11.584412 185.832728 279.141163 101.102970 -99.788754 -82.138987 99.249246 175.284746 101.319492 -94.943044 -29.128951 26.582609 -22.439812 16.184655 -30.774730 -42.659585 -28.481978 -76.469311 -137.889147 -69.109695 -74.959590 -93.705282 -121.603436 -149.070855 -55.650968 4.239743 -17.991413 -64.647887 -55.436329 -55.543341 -233.434969 -226.427454 -73.695304 -141.986671 -140.047461 -242.440411 -280.187721 -196.235706 89.043456 -22.907281 -11.296129 -80.976172 -138.241792 -352.324480 -125.427455 43.500121 -186.793748 -112.535951 -205.595161 -278.406738 -371.797682 -80.563537 48.026023 287.180729 178.378065 121.456420 87.679904 -109.481793 -114.439424 11.187516 282.435522 -78.271834 -78.662650 -222.487548 -393.253565 |
I'm not sure what that means. How does it compare to what we see in the milestone 2 output?
The answers to those two questions:
If the quality changes with the size (i.e. the results are better with
Still, we need more testing. I'm on the road again tomorrow, but almost
If you and Xiang could start working on unit tests, that would be great.
@magsol
Hmm, that's a good question. However the fact that the very nature of the
On Mon, Dec 28, 2015 at 2:44 PM MOJTABAFA notifications@github.com wrote:
@magsol

    File "/home/targol/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 2089, in <genexpr>
    File "/home/targol/anaconda2/lib/python2.7/R1DL_Pyspark.py", line 26, in <lambda>
        .map(lambda x: np.array(map(float, x.strip().split("\t")))) \
    ValueError: could not convert string to float:

Moreover, it's really difficult and time-consuming for me to test on my laptop because of a lack of resources.
It looks like there's a non-float character that we're trying to cast to a float, e.g. float("?") or something like that. Nonetheless, I hear you loud and clear. I'm sorry I haven't had time to finish setting up my cluster, but that's still in progress. I should have some news for you today or tomorrow.
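One way to track down the offending token is to wrap the per-line parse and report exactly which string failed to convert. This is a debugging sketch; the tab-separated format is taken from the lambda in the traceback above:

```python
def parse_line(line):
    """Parse one tab-separated line into a list of floats; on failure,
    report exactly which token could not be converted."""
    tokens = line.strip().split("\t")
    values = []
    for tok in tokens:
        try:
            values.append(float(tok))
        except ValueError:
            # Re-raise with the bad token included, so the Spark logs
            # show which character caused the failure.
            raise ValueError("could not convert string to float: %r" % tok)
    return values  # the real pipeline wraps this in np.array(...)

print(parse_line("1.0\t2.5\t-3.0"))  # [1.0, 2.5, -3.0]
```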
This Spark primitive is a little trickier than #20. This is due to the fact that the matrix will be row-distributed, but in vector-matrix multiplication, the columns of the matrix are multiplied. Still, this can be done in a fairly straightforward manner.

- Broadcast the vector u to be multiplied, e.g. sc.broadcast(u).
- Run a .flatMap over the RDD: each row i emits its elements scaled by u[i], keyed by column index (hence .flatMap instead of map).
- .reduceByKey will then sum up the values for each key, which correspond to the elements of the resulting vector u.
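The steps above can be sketched locally without a Spark cluster. The Spark version would be roughly `rdd.flatMap(...).reduceByKey(add)` with u broadcast first; the function and variable names below are illustrative, not the project's actual code:

```python
from operator import add

def vector_matrix_multiply(indexed_rows, u):
    """Compute u^T * A, where A is a row-distributed matrix given as
    (row_index, row) pairs. Local sketch of the Spark pattern:
    broadcast u, flatMap rows into (column_index, partial_product)
    pairs, then reduceByKey(add)."""
    # "flatMap" stage: row i contributes (j, u[i] * A[i][j]) for every column j
    pairs = [(j, u[i] * a_ij)
             for i, row in indexed_rows
             for j, a_ij in enumerate(row)]
    # "reduceByKey(add)" stage: sum the partial products for each column index
    result = {}
    for j, v in pairs:
        result[j] = add(result.get(j, 0.0), v)
    return [result[j] for j in sorted(result)]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
u = [1.0, 0.5, 2.0]
print(vector_matrix_multiply(list(enumerate(A)), u))  # [12.5, 16.0]
```

Each key in the reduce stage is a column index, so the summed values are exactly the entries of the output vector.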