<img src="img/logocs.jpeg" width="200" align="left">
<img src="img/logops.jpg" width="200" align="right">

# <center>Introduction to Resilient Distributed Property Graph</center>

<img src="http://spark.apache.org/docs/latest/img/graphx_logo.png" width=300/>
#### Family Name: 
#### First Name: 


## Exploring GraphX
### Apache Spark's API for graph-parallel processing

The purpose of this lab is to learn how to use the GraphX library to build a simple directed multigraph in Scala, and to explore some methods of the Graph class.

First, we import the following libraries:

- org.apache.spark._ 
- org.apache.spark.graphx._
- org.apache.spark.rdd.RDD 

In [3]:
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD



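Next, we create the vertices and edges of our graph as two arrays, <code>facebook_vertices</code> and <code>facebook_edges</code>. Each vertex is a pair of a <code>Long</code> vertex ID and a (name, type) attribute, and each edge is an <code>Edge(srcId, dstId, attr)</code> whose attribute is the relationship label.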

In [4]:
val facebook_vertices = Array((1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")), (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")), (5L, ("Captain America Fan Page", "Page")))
val facebook_edges = Array(Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"), Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower"))


facebook_vertices: Array[(Long, (String, String))] = Array((1,(Billy Bill,Person)), (2,(Jacob Johnson,Person)), (3,(Andrew Smith,Person)), (4,(Iron Man Fan Page,Page)), (5,(Captain America Fan Page,Page)))
facebook_edges: Array[org.apache.spark.graphx.Edge[String]] = Array(Edge(1,2,Friends), Edge(1,3,Friends), Edge(2,4,Follower), Edge(2,5,Follower), Edge(3,5,Follower))


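### A summary list of Graph class operators

The listing below summarizes the signatures of the main graph operators, as presented in the GraphX programming guide. Several of them (for example <code>inDegrees</code>, <code>joinVertices</code> and <code>pageRank</code>) are actually defined in <code>GraphOps</code> but are shown as members of <code>Graph</code> for simplicity. The listing is a reference, not executable code, so evaluating the cell fails as shown.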

In [5]:
class Graph[VD, ED] {
  // Information about the Graph ===================================================================
  val numEdges: Long
  val numVertices: Long
  val inDegrees: VertexRDD[Int]
  val outDegrees: VertexRDD[Int]
  val degrees: VertexRDD[Int]
  // Views of the graph as collections =============================================================
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]
  val triplets: RDD[EdgeTriplet[VD, ED]]
  // Functions for caching graphs ==================================================================
  def persist(newLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
  def cache(): Graph[VD, ED]
  def unpersistVertices(blocking: Boolean = false): Graph[VD, ED]
  // Change the partitioning heuristic  ============================================================
  def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED]
  // Transform vertex and edge attributes ==========================================================
  def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
  def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
  def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2]
  def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
  def mapTriplets[ED2](map: (PartitionID, Iterator[EdgeTriplet[VD, ED]]) => Iterator[ED2])
    : Graph[VD, ED2]
  // Modify the graph structure ====================================================================
  def reverse: Graph[VD, ED]
  def subgraph(
      epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
      vpred: (VertexId, VD) => Boolean = ((v, d) => true))
    : Graph[VD, ED]
  def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
  def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]
  // Join RDDs with the graph ======================================================================
  def joinVertices[U](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD): Graph[VD, ED]
  def outerJoinVertices[U, VD2](other: RDD[(VertexId, U)])
      (mapFunc: (VertexId, VD, Option[U]) => VD2)
    : Graph[VD2, ED]
  // Aggregate information about adjacent triplets =================================================
  def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]
  def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]
  def aggregateMessages[Msg: ClassTag](
      sendMsg: EdgeContext[VD, ED, Msg] => Unit,
      mergeMsg: (Msg, Msg) => Msg,
      tripletFields: TripletFields = TripletFields.All)
    : VertexRDD[Msg]
  // Iterative graph-parallel computation ==========================================================
  def pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED]
  // Basic graph algorithms ========================================================================
  def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
  def connectedComponents(): Graph[VertexId, ED]
  def triangleCount(): Graph[Int, ED]
  def stronglyConnectedComponents(numIter: Int): Graph[VertexId, ED]
}

<console>: 28: error: not found: type StorageLevel
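To see a few of these operators in action, the sketch below rebuilds the same five-vertex graph and applies <code>numVertices</code>, <code>numEdges</code>, <code>reverse</code> and <code>subgraph</code>. It assumes Spark is on the classpath and reuses a running SparkContext through <code>SparkContext.getOrCreate</code>, or starts one with a local master (the master and app name are assumptions, not part of the lab).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-operators"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")),
  (5L, ("Captain America Fan Page", "Page"))))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"),
  Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))
val graph = Graph(vertices, edges, ("Default User", "Missing"))

// Information about the graph
val numV = graph.numVertices   // 5
val numE = graph.numEdges      // 5

// reverse flips every edge: 1 -> 2 becomes 2 -> 1
val reversedPairs = graph.reverse.edges.map(e => (e.srcId, e.dstId)).collect().toSet

// subgraph keeps only Person vertices; edges touching a removed vertex are dropped as well
val people = graph.subgraph(vpred = (id, attr) => attr._2 == "Person")
val peopleEdges = people.edges.map(e => (e.srcId, e.dstId)).collect().toSet
```

Only the two "Friends" edges survive the <code>subgraph</code> call, because every "Follower" edge points to a Page vertex.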

### Question 1:
Now we need to create the Graph object. Create the vertex RDD <code>facebook_RDD_vertices</code> and the edge RDD <code>facebook_RDD_edges</code> from which the Graph is built. Define a <code>default_user</code> attribute, which is assigned to any vertex that is referenced by an edge but missing from the vertex RDD.

As a reminder, we have a SparkContext called <code>sc</code>. What happens when <code>sc</code> is used?

In [12]:
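//To Do
// Distribute the local arrays as RDDs
val facebook_RDD_vertices = sc.parallelize(facebook_vertices)
val facebook_RDD_edges = sc.parallelize(facebook_edges)
// Attribute given to any vertex referenced by an edge but absent from facebook_RDD_vertices
val default_user = ("Default User", "Missing")

val myFacebookGraph = Graph(facebook_RDD_vertices, facebook_RDD_edges, default_user)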

facebook_RDD_vertices: org.apache.spark.rdd.RDD[(Long, (String, String))] = ParallelCollectionRDD[4] at parallelize at <console>:43
facebook_RDD_edges: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[String]] = ParallelCollectionRDD[5] at parallelize at <console>:44
default_user: (String, String) = (Default User,Missing)
myFacebookGraph: org.apache.spark.graphx.Graph[(String, String),String] = org.apache.spark.graphx.impl.GraphImpl@27433583
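The role of the default attribute can be checked on a smaller graph. In this sketch, vertex ID 6 is a hypothetical vertex that appears only in an edge; the SparkContext setup (local master, app name) is an assumption for running the snippet on its own.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-default-vertex"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person"))))
// Hypothetical edge to vertex 6, which is never declared in `vertices`
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(2L, 6L, "Follower")))

val default_user = ("Default User", "Missing")
val graph = Graph(vertices, edges, default_user)

// Vertex 6 is created automatically and receives the default attribute
val attrs: Map[VertexId, (String, String)] = graph.vertices.collect().toMap
println(attrs(6L))   // (Default User,Missing)

// The "Missing" marker makes such vertices easy to drop afterwards
val cleaned = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
```

Removing vertex 6 with <code>subgraph</code> also removes the edge 2 -> 6, leaving two vertices and one edge.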


In [20]:
myFacebookGraph.vertices.toDF.show()

+---+--------------------+
| _1|                  _2|
+---+--------------------+
|  1|[Billy Bill, Person]|
|  2|[Jacob Johnson, P...|
|  3|[Andrew Smith, Pe...|
|  4|[Iron Man Fan Pag...|
|  5|[Captain America ...|
+---+--------------------+



In [21]:
myFacebookGraph.edges.toDF.show()

+-----+-----+--------+
|srcId|dstId|    attr|
+-----+-----+--------+
|    1|    2| Friends|
|    1|    3| Friends|
|    2|    4|Follower|
|    2|    5|Follower|
|    3|    5|Follower|
+-----+-----+--------+



In [25]:
myFacebookGraph.triplets.collect

res15: Array[org.apache.spark.graphx.EdgeTriplet[(String, String),String]] = Array(((1,(Billy Bill,Person)),(2,(Jacob Johnson,Person)),Friends), ((1,(Billy Bill,Person)),(3,(Andrew Smith,Person)),Friends), ((2,(Jacob Johnson,Person)),(4,(Iron Man Fan Page,Page)),Follower), ((2,(Jacob Johnson,Person)),(5,(Captain America Fan Page,Page)),Follower), ((3,(Andrew Smith,Person)),(5,(Captain America Fan Page,Page)),Follower))


Here is a visual representation of the graph:

<img src = "img/rhkiopM.png">
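Because each <code>EdgeTriplet</code> carries the source attribute, the destination attribute and the edge attribute together, the triplets view can be turned into readable relationship descriptions. A minimal sketch, assuming the same SparkContext setup as a standalone snippet; the arrow format is an arbitrary choice.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-triplets"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")),
  (5L, ("Captain America Fan Page", "Page"))))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"),
  Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))
val graph = Graph(vertices, edges, ("Default User", "Missing"))

// srcAttr._1 and dstAttr._1 are the vertex names; attr is the relationship label
val sentences: Array[String] = graph.triplets
  .map(t => s"${t.srcAttr._1} --${t.attr}--> ${t.dstAttr._1}")
  .collect()
  .sorted

sentences.foreach(println)
```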

### Question 2:
Now get information about the graph and its vertices, and build different views using the vertices, edges and triplets methods. Compute the maximum and minimum in-degrees and out-degrees.

In [31]:
//To Do
// Reduce operations that keep the vertex with the higher / lower degree
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
  if (a._2 > b._2) a else b
}

def min(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
  if (a._2 > b._2) b else a
}

// Compute the max degrees
val maxInDegree: (VertexId, Int)  = myFacebookGraph.inDegrees.reduce(max)
val maxOutDegree: (VertexId, Int) = myFacebookGraph.outDegrees.reduce(max)

// Compute the min degrees. inDegrees/outDegrees omit vertices with degree 0,
// so these minima are taken over vertices with at least one in-/out-edge
val minInDegree: (VertexId, Int)  = myFacebookGraph.inDegrees.reduce(min)
val minOutDegree: (VertexId, Int) = myFacebookGraph.outDegrees.reduce(min)

max: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))(org.apache.spark.graphx.VertexId, Int)
min: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))(org.apache.spark.graphx.VertexId, Int)
maxInDegree: (org.apache.spark.graphx.VertexId, Int) = (5,2)
maxOutDegree: (org.apache.spark.graphx.VertexId, Int) = (2,2)
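Since <code>inDegrees</code> and <code>outDegrees</code> return no entry for a vertex without incoming or outgoing edges, a true minimum (which is 0 here) needs those vertices joined back in. A sketch using <code>outerJoinVertices</code>; the helper name <code>withZeros</code> is hypothetical, and the SparkContext setup is an assumption for running it standalone.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-degrees"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")),
  (5L, ("Captain America Fan Page", "Page"))))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"),
  Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))
val graph = Graph(vertices, edges, ("Default User", "Missing"))

// Hypothetical helper: attach a degree to every vertex, using 0 where the degree RDD has no entry
def withZeros(degrees: VertexRDD[Int]): VertexRDD[Int] =
  graph.outerJoinVertices(degrees)((id, attr, deg) => deg.getOrElse(0)).vertices

val inAll  = withZeros(graph.inDegrees)
val outAll = withZeros(graph.outDegrees)

// Ties are broken arbitrarily, depending on the reduce order
val minIn  = inAll.reduce((a, b) => if (a._2 <= b._2) a else b)   // (1,0): vertex 1 has no incoming edge
val minOut = outAll.reduce((a, b) => if (a._2 <= b._2) a else b)  // vertex 4 or 5, degree 0
val maxIn  = inAll.reduce((a, b) => if (a._2 >= b._2) a else b)   // (5,2)
val maxOut = outAll.reduce((a, b) => if (a._2 >= b._2) a else b)  // vertex 1 or 2, degree 2
```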


In [37]:
myFacebookGraph.vertices.toDF.show(20,false)

+---+--------------------------------+
|_1 |_2                              |
+---+--------------------------------+
|1  |[Billy Bill, Person]            |
|2  |[Jacob Johnson, Person]         |
|3  |[Andrew Smith, Person]          |
|4  |[Iron Man Fan Page, Page]       |
|5  |[Captain America Fan Page, Page]|
+---+--------------------------------+



In [38]:
myFacebookGraph.edges.toDF.show(20,false)

+-----+-----+--------+
|srcId|dstId|attr    |
+-----+-----+--------+
|1    |2    |Friends |
|1    |3    |Friends |
|2    |4    |Follower|
|2    |5    |Follower|
|3    |5    |Follower|
+-----+-----+--------+



### Question 3:

Use the filter function to find the persons who follow the "Captain America Fan Page" (vertex 5).

In [47]:
// Keep only "Follower" edges pointing to vertex 5 (Captain America Fan Page)
myFacebookGraph.edges.filter {
  case Edge(src, dst, prop) => dst == 5 && prop == "Follower"
}.toDF.show()

+-----+-----+--------+
|srcId|dstId|    attr|
+-----+-----+--------+
|    2|    5|Follower|
|    3|    5|Follower|
+-----+-----+--------+



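Filtering the triplets instead of the edges is more convenient, because each triplet also carries the attributes of its source and destination vertices, so the followers' names appear directly in the result.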

In [53]:
myFacebookGraph.triplets.filter(t => t.dstId==5 && t.attr=="Follower").collect

res33: Array[org.apache.spark.graphx.EdgeTriplet[(String, String),String]] = Array(((2,(Jacob Johnson,Person)),(5,(Captain America Fan Page,Page)),Follower), ((3,(Andrew Smith,Person)),(5,(Captain America Fan Page,Page)),Follower))
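The triplet filter can also match on the page name instead of the hard-coded ID 5, and keep only Person sources, returning just the follower names. A minimal sketch, assuming the same standalone SparkContext setup as above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-filter"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")),
  (5L, ("Captain America Fan Page", "Page"))))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"),
  Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))
val graph = Graph(vertices, edges, ("Default User", "Missing"))

// Persons with a "Follower" edge to the page named "Captain America Fan Page"
val captainFans: Array[String] = graph.triplets
  .filter(t => t.dstAttr._1 == "Captain America Fan Page" &&
               t.srcAttr._2 == "Person" &&
               t.attr == "Follower")
  .map(_.srcAttr._1)
  .collect()
  .sorted   // Array(Andrew Smith, Jacob Johnson)
```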


### Question 4:
Transform vertex and edge attributes using the mapVertices, mapEdges or mapTriplets methods. For instance, convert the edge attributes to "friendof" and "followerof", and add each user's or page's popularity to the graph (popularity could be defined as the number of friends or followers).

In [70]:
import shapeless.syntax.std.tuple._




In [75]:
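//To Do
// Keep the vertex ID in the attribute and append a "hey" field to the (name, type) tuple
// (`:+` on tuples comes from the shapeless import above)
val newGraph = myFacebookGraph.mapVertices((id, attr) => (id, attr :+ "hey"))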


newGraph: org.apache.spark.graphx.Graph[(org.apache.spark.graphx.VertexId, (String, String, String)),String] = org.apache.spark.graphx.impl.GraphImpl@18c69a54


In [76]:
newGraph.vertices.toDF.show(false)

+---+------------------------------------------+
|_1 |_2                                        |
+---+------------------------------------------+
|1  |[1, [Billy Bill, Person, hey]]            |
|2  |[2, [Jacob Johnson, Person, hey]]         |
|3  |[3, [Andrew Smith, Person, hey]]          |
|4  |[4, [Iron Man Fan Page, Page, hey]]       |
|5  |[5, [Captain America Fan Page, Page, hey]]|
+---+------------------------------------------+
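One way to answer the full question is sketched below: <code>mapEdges</code> renames the relationship labels, <code>aggregateMessages</code> counts popularity, and <code>outerJoinVertices</code> attaches it as a third vertex field. The popularity definition is an assumption: a friendship counts for both of its endpoints (the relation is mutual even though it is stored as a directed edge), while a follow counts only for the followed vertex. The SparkContext setup is also an assumption for running the snippet standalone.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-popularity"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")),
  (5L, ("Captain America Fan Page", "Page"))))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"),
  Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))
val graph = Graph(vertices, edges, ("Default User", "Missing"))

// Rename relationship labels, leaving any unexpected label unchanged
val relabeled: Graph[(String, String), String] = graph.mapEdges(e => e.attr match {
  case "Friends"  => "friendof"
  case "Follower" => "followerof"
  case other      => other
})

// Assumed popularity: +1 per friendship for both ends, +1 per follower for the followed vertex
val popularity: VertexRDD[Int] = relabeled.aggregateMessages[Int](
  ctx => ctx.attr match {
    case "friendof" =>
      ctx.sendToSrc(1)
      ctx.sendToDst(1)
    case "followerof" =>
      ctx.sendToDst(1)
    case _ => ()
  },
  _ + _,                    // sum the messages received by each vertex
  TripletFields.EdgeOnly)   // only the edge attribute is read

// Attach popularity as a third field; vertices that received no message get 0
val withPopularity: Graph[(String, String, Int), String] =
  relabeled.outerJoinVertices(popularity) { (id, attr, pop) =>
    (attr._1, attr._2, pop.getOrElse(0))
  }

withPopularity.vertices.collect().sortBy(_._1).foreach(println)
```

With this definition Billy Bill and the Captain America page both score 2, and every other vertex scores 1.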



### Question 5:
Modify the graph structure using join methods. Create another graph to be merged with the above graph.

In [82]:
//To Do
// Build a three-field attribute: the name twice, followed by a constant "efv" tag
val newGraph = myFacebookGraph.mapVertices((id, attr) => (attr._1, attr._1, "efv"))


In [79]:
newGraph.vertices.toDF.show(false)

+---+---------------------------------------------------------+
|_1 |_2                                                       |
+---+---------------------------------------------------------+
|1  |[Billy Bill, Billy Bill, efv]                            |
|2  |[Jacob Johnson, Jacob Johnson, efv]                      |
|3  |[Andrew Smith, Andrew Smith, efv]                        |
|4  |[Iron Man Fan Page, Iron Man Fan Page, efv]              |
|5  |[Captain America Fan Page, Captain America Fan Page, efv]|
+---+---------------------------------------------------------+



In [84]:
newGraph.triplets.collect

res46: Array[org.apache.spark.graphx.EdgeTriplet[(String, String, String),String]] = Array(((1,(Billy Bill,Billy Bill,efv)),(2,(Jacob Johnson,Jacob Johnson,efv)),Friends), ((1,(Billy Bill,Billy Bill,efv)),(3,(Andrew Smith,Andrew Smith,efv)),Friends), ((2,(Jacob Johnson,Jacob Johnson,efv)),(4,(Iron Man Fan Page,Iron Man Fan Page,efv)),Follower), ((2,(Jacob Johnson,Jacob Johnson,efv)),(5,(Captain America Fan Page,Captain America Fan Page,efv)),Follower), ((3,(Andrew Smith,Andrew Smith,efv)),(5,(Captain America Fan Page,Captain America Fan Page,efv)),Follower))
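The join operators merge another graph's vertex attributes into this one. In the sketch below, the second graph and its city attributes are hypothetical, made-up data, and the SparkContext setup is an assumption for running standalone. <code>joinVertices</code> keeps the vertex type, while <code>outerJoinVertices</code> can change it and passes an <code>Option</code> for vertices the other graph lacks.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Reuse the running SparkContext, or start a local one (master and app name are assumptions)
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("graphx-joins"))

val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")),
  (5L, ("Captain America Fan Page", "Page"))))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"),
  Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))
val graph = Graph(vertices, edges, ("Default User", "Missing"))

// Hypothetical second graph whose vertex attribute is a home city (made-up data)
val cityVertices: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "Paris"), (2L, "Lyon"), (6L, "Nice")))
val cityEdges: RDD[Edge[String]] = sc.parallelize(Seq(Edge(1L, 6L, "Friends")))
val otherGraph: Graph[String, String] = Graph(cityVertices, cityEdges, "unknown")

// joinVertices keeps the (String, String) type; only vertices found in both graphs change
val tagged: Graph[(String, String), String] =
  graph.joinVertices(otherGraph.vertices) { (id, attr, city) =>
    (s"${attr._1} ($city)", attr._2)
  }

// outerJoinVertices changes the vertex type; vertices missing from otherGraph get None
val withCity: Graph[(String, String, String), String] =
  graph.outerJoinVertices(otherGraph.vertices) { (id, attr, city) =>
    (attr._1, attr._2, city.getOrElse("unknown"))
  }

withCity.vertices.collect().sortBy(_._1).foreach(println)
```

Both joins are driven by the original graph, so vertex 6 and the edge 1 -> 6 from the second graph are not added; merging edges as well would mean building a new Graph from the union of the edge RDDs, which requires both graphs to share the same attribute types.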
