In [17]:
from graphframes import GraphFrame


Create the vertices of the graph. Each vertex needs a unique ID, placed in a column named `id`. Additional arbitrary columns can be added.

In [18]:
vertices = spark.createDataFrame([
  ("a", "Alice", 34, "female"),
  ("b", "Bob", 36, "male"),
  ("c", "Charlie", 30, "male"),
  ("d", "David", 29, "male"),
  ("e", "Esther", 32, "female"),
  ("f", "Fanny", 36, "female"),
  ("g", "Gabby", 60, "female"),
], ["id", "name", "age", "gender"])

Create the edges of the graph. Each edge needs a `src` and a `dst` column, both holding vertex IDs. Additional arbitrary columns can be added.

In [19]:
edges = spark.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend"),
], ["src", "dst", "relationship"])

Create the graph from the vertices and edges.

In [20]:
g = GraphFrame(vertices, edges)

g.vertices.show()
g.edges.show()


+---+-------+---+------+
| id|   name|age|gender|
+---+-------+---+------+
|  a|  Alice| 34|female|
|  b|    Bob| 36|  male|
|  c|Charlie| 30|  male|
|  d|  David| 29|  male|
|  e| Esther| 32|female|
|  f|  Fanny| 36|female|
|  g|  Gabby| 60|female|
+---+-------+---+------+

+---+---+------------+
|src|dst|relationship|
+---+---+------------+
|  a|  b|      friend|
|  b|  c|      follow|
|  c|  b|      follow|
|  f|  c|      follow|
|  e|  f|      follow|
|  e|  d|      friend|
|  d|  a|      friend|
|  a|  e|      friend|
+---+---+------------+



Show the degree, out-degree and in-degree of each vertex. Vertices without any edges (here Gabby) do not appear in these tables.

In [5]:
for table in (g.degrees, g.outDegrees, g.inDegrees):
    table.show()

+---+------+
| id|degree|
+---+------+
|  b|     3|
|  a|     3|
|  c|     3|
|  f|     2|
|  e|     3|
|  d|     2|
+---+------+

+---+---------+
| id|outDegree|
+---+---------+
|  a|        2|
|  b|        1|
|  c|        1|
|  f|        1|
|  e|        2|
|  d|        1|
+---+---------+

+---+--------+
| id|inDegree|
+---+--------+
|  b|       2|
|  c|       2|
|  f|       1|
|  d|       1|
|  a|       1|
|  e|       1|
+---+--------+
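
The three tables can be cross-checked without Spark. The sketch below recounts the same edge list in plain Python; the `edge_list` name and the `Counter` approach are illustrative choices and not part of GraphFrames. It also shows why vertex `g` (Gabby) is absent: she never occurs as a `src` or a `dst`.

```python
from collections import Counter

# Same (src, dst) pairs as the edges DataFrame above.
edge_list = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
             ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]

out_degree = Counter(src for src, _ in edge_list)
in_degree = Counter(dst for _, dst in edge_list)
degree = out_degree + in_degree  # degree = outDegree + inDegree

for vertex in sorted(degree):
    print(vertex, degree[vertex], out_degree[vertex], in_degree[vertex])
```

Each printed row lists a vertex with its degree, out-degree and in-degree, which should agree with the Spark output row by row.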



Group the vertices by gender, and show the minimum age, maximum age and average age for each gender.

In [6]:
from pyspark.sql import functions as F

# Aggregate age per gender; the F alias avoids shadowing Python's built-in min and max.
g.vertices.groupBy("gender").agg(F.min("age"), F.avg("age"), F.max("age")).show()

+------+--------+------------------+--------+
|gender|min(age)|          avg(age)|max(age)|
+------+--------+------------------+--------+
|female|      32|              40.5|      60|
|  male|      29|31.666666666666668|      36|
+------+--------+------------------+--------+
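
As a sanity check on the arithmetic, the same summary can be computed in plain Python. This is a minimal sketch; the `people` list and the `summary` dictionary are hypothetical names introduced here, with values copied from the vertices table.

```python
from statistics import mean

# (name, age, gender) copied from the vertices DataFrame above.
people = [("Alice", 34, "female"), ("Bob", 36, "male"), ("Charlie", 30, "male"),
          ("David", 29, "male"), ("Esther", 32, "female"), ("Fanny", 36, "female"),
          ("Gabby", 60, "female")]

ages_by_gender = {}
for _name, age, gender in people:
    ages_by_gender.setdefault(gender, []).append(age)

# gender -> (min, avg, max), in the same column order as the Spark output.
summary = {gender: (min(ages), mean(ages), max(ages))
           for gender, ages in ages_by_gender.items()}
print(summary)
```

The female average of 40.5 is pulled up by Gabby (60); the other three women are between 32 and 36.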



How many "follow" relationships are there?

In [8]:
# Count the edges whose relationship is "follow".
print(g.edges.filter("relationship = 'follow'").count())

4


How many males are there?

In [10]:
# Count the vertices whose gender is "male".
print(g.vertices.filter("gender = 'male'").count())

3


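Use motif finding to list the pairs of vertices connected by directed edges in both directions. Each pair appears twice in the result, once for each assignment of the two vertices to `a` and `b`.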

In [12]:
# Match an edge a -> b together with the reverse edge b -> a.
g.find("(a)-[e]->(b); (b)-[e2]->(a)").show()

+--------------------+--------------+--------------------+--------------+
|                   a|             e|                   b|            e2|
+--------------------+--------------+--------------------+--------------+
|{c, Charlie, 30, ...|{c, b, follow}|  {b, Bob, 36, male}|{b, c, follow}|
|  {b, Bob, 36, male}|{b, c, follow}|{c, Charlie, 30, ...|{c, b, follow}|
+--------------------+--------------+--------------------+--------------+



Use motif finding to list the pairs of vertices connected by an edge in one direction only.

In [18]:
# Match an edge a -> b for which no reverse edge b -> a exists.
g.find("(a)-[e]->(b); !(b)-[]->(a)").show()

+--------------------+--------------+--------------------+
|                   a|             e|                   b|
+--------------------+--------------+--------------------+
|{e, Esther, 32, f...|{e, f, follow}|{f, Fanny, 36, fe...|
|{a, Alice, 34, fe...|{a, e, friend}|{e, Esther, 32, f...|
|{d, David, 29, male}|{d, a, friend}|{a, Alice, 34, fe...|
|{e, Esther, 32, f...|{e, d, friend}|{d, David, 29, male}|
|{f, Fanny, 36, fe...|{f, c, follow}|{c, Charlie, 30, ...|
|{a, Alice, 34, fe...|{a, b, friend}|  {b, Bob, 36, male}|
+--------------------+--------------+--------------------+
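
Both motif exercises reduce to a membership test on the reversed pair: an edge `(src, dst)` is mutual if `(dst, src)` is also an edge, and one-way otherwise. The plain-Python sketch below illustrates that logic on the same edge list; `edge_set`, `mutual` and `one_way` are names made up for this illustration, and the code does not use GraphFrames.

```python
# Same (src, dst) pairs as the edges DataFrame above.
edge_list = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
             ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]
edge_set = set(edge_list)

# Edges whose reverse also exists; each mutual pair shows up in both orders.
mutual = sorted(edge for edge in edge_set if edge[::-1] in edge_set)

# Edges whose reverse is missing.
one_way = sorted(edge for edge in edge_set if edge[::-1] not in edge_set)

print(mutual)
print(one_way)
```

The two mutual rows and six one-way rows together account for all eight edges.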



Find the vertices that have no inbound edges, that is, vertices nobody follows or lists as a friend.

In [29]:
# Negated motif: no edge from any vertex into a (assumes this GraphFrames version accepts an anonymous source here).
g.find("!()-[]->(a)").show()

+--------------------+
|                   a|
+--------------------+
|{g, Gabby, 60, fe...|
+--------------------+
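
The result can be verified by hand: a vertex has no inbound edges exactly when its ID never appears in the `dst` column. A minimal plain-Python sketch of that check, with `vertex_ids` and `targets` as illustrative names:

```python
vertex_ids = ["a", "b", "c", "d", "e", "f", "g"]
edge_list = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
             ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]

# Every vertex that is the destination of at least one edge.
targets = {dst for _, dst in edge_list}

no_inbound = [v for v in vertex_ids if v not in targets]
print(no_inbound)
```

Only `g` (Gabby) remains, consistent with her absence from the `inDegree` table.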



Use Breadth-First Search to find the shortest path from Alice to David.

In [25]:
# Search from the vertex named Alice to the vertex named David.
g.bfs("name = 'Alice'", "name = 'David'").show()

+--------------------+--------------+--------------------+--------------+--------------------+
|                from|            e0|                  v1|            e1|                  to|
+--------------------+--------------+--------------------+--------------+--------------------+
|{a, Alice, 34, fe...|{a, e, friend}|{e, Esther, 32, f...|{e, d, friend}|{d, David, 29, male}|
+--------------------+--------------+--------------------+--------------+--------------------+
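
The output columns `from`, `e0`, `v1`, `e1`, `to` describe a two-edge path: Alice to Esther to David. To illustrate how breadth-first search arrives at it, here is a small stand-alone sketch over the same edge list. The `bfs_path` helper is hypothetical and written for this example; it is not how GraphFrames implements `bfs`, and it returns only one shortest path.

```python
from collections import deque

# Same (src, dst) pairs as the edges DataFrame above.
edge_list = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
             ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]


def bfs_path(edges, start, goal):
    """Return one shortest directed path from start to goal, or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


print(bfs_path(edge_list, "a", "d"))
```

Because the search expands vertices in order of distance from the start, the first time it reaches `d` is along a path with the fewest edges. An unreachable target such as `g` yields `None`.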

