# Schema Transform Example
In this example, we persist a source graph's metadata in Grakn, perform a motif query to 'transform' the graph, and document the updated schema. Ideally, generated code snippets will apply the transforms to the source graph in Spark.

## Source Graph


In [1]:
# ! pip install findspark

In [2]:
# !pip install graphframes
# https://towardsdatascience.com/graphframes-in-jupyter-a-practical-guide-9b3b346cebc5

In [1]:
import findspark
findspark.init()


In [2]:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession, SQLContext, DataFrame

In [3]:
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))
sqlC = SQLContext(sc)
sc.addPyFile("/Users/josephhaaga/.ivy2/jars/graphframes_graphframes-0.6.0-spark2.3-s_2.11.jar")

In [4]:
edges = sqlC.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("./data/peopleAndCompanies_edges.csv") 
    
vertices = sqlC.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("./data/peopleAndCompanies_vertices.csv") 

In [5]:
vertices.toPandas()

| | id | type | name | address |
|---|---|---|---|---|
| 0 | 0 | company | Cooper-Green | 6647 Roger Walks Suite 088 Julieburgh, NY 53881 |
| 1 | 1 | company | Williams-Figueroa | 65694 Maureen Mountain Morganhaven, ND 01430 |
| 2 | 2 | company | Montes LLC | 2464 Mark Unions Suite 345 Johnview, MN 95700 |
| 3 | 3 | company | Gomez-Morgan | 565 Jason Park Thomasmouth, TX 84271 |
| 4 | 4 | company | Bowen-Payne | 6161 Lynn Summit Suite 881 South Darleneshire,... |
| 5 | 5 | company | Smith-Jimenez | 34537 Briggs Light Suite 173 Danielfort, GA 06278 |
| 6 | 6 | company | Brandt and Sons | 43659 Butler Shores Apt. 723 Juliachester, NJ ... |
| 7 | 7 | company | Patterson-Allen | 04097 Turner Lake Apt. 998 Robinsonfurt, WA 58520 |
| 8 | 8 | company | Rivera-Taylor | 014 Jeffrey Pines West Bobbyfort, OK 65867 |
| 9 | 9 | company | Peters Ltd | 398 Cortez Point Suite 413 Kevintown, NC 49123 |


In [6]:
from graphframes import *
# https://stackoverflow.com/a/50404308

In [7]:
g = GraphFrame(vertices, edges)

In [15]:
# Dependents claiming dependents
g.find("(a)-[r]->(b); (b)-[r2]->(c)") \
    .filter("r.relationship == 'claims_dependent'") \
    .filter("r2.relationship == 'claims_dependent'").show()

+--------------------+--------------------+--------------------+--------------------+--------------------+
|                   a|                   r|                   b|                  r2|                   c|
+--------------------+--------------------+--------------------+--------------------+--------------------+
|[22, person, Mari...|[22, 66, claims_d...|[66, person, Andr...|[66, 69, claims_d...|[69, person, Javi...|
|[22, person, Mari...|[22, 88, claims_d...|[88, person, Caro...|[88, 34, claims_d...|[34, person, Tyle...|
|[22, person, Mari...|[22, 88, claims_d...|[88, person, Caro...|[88, 92, claims_d...|[92, person, Anit...|
|[22, person, Mari...|[22, 25, claims_d...|[25, person, Mary...|[25, 58, claims_d...|[58, person, Dona...|
|[26, person, Thom...|[26, 45, claims_d...|[45, person, Terr...|[45, 47, claims_d...|[47, person, Geor...|
|[29, person, Lisa...|[29, 71, claims_d...|[71, person, Evan...|[71, 58, claims_d...|[58, person, Dona...|
|[35, person, Devi...|[35, 36, claims
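
To make the semantics of the motif concrete without a Spark session, here is a minimal plain-Python sketch of the same two-hop pattern. The function name `two_hop_chains` and the toy edge list are hypothetical and not part of the notebook or the dataset; the tuple layout `(src, dst, relationship)` simply mirrors the edge columns used above.

```python
from collections import defaultdict

def two_hop_chains(edges, relationship):
    """Return (a, b, c) for every a -[rel]-> b -[rel]-> c in the edge list."""
    # Index outgoing neighbours, keeping only edges of the requested relationship.
    out = defaultdict(list)
    for src, dst, rel in edges:
        if rel == relationship:
            out[src].append(dst)

    chains = []
    for a, middles in list(out.items()):
        for b in middles:
            # .get avoids inserting new keys into the defaultdict while iterating.
            for c in out.get(b, []):
                chains.append((a, b, c))
    return chains

# Hypothetical toy edges (src, dst, relationship); not taken from the dataset.
toy_edges = [
    (1, 2, "claims_dependent"),
    (2, 3, "claims_dependent"),
    (2, 4, "employed_by"),
    (5, 6, "claims_dependent"),
]

print(two_hop_chains(toy_edges, "claims_dependent"))
```

Only the chain through vertices 1, 2 and 3 qualifies here, because both hops must carry the `claims_dependent` relationship, which is what the two `.filter` calls enforce in the GraphFrames query.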

## Describe Source Graph in Grakn Metamodel
The source graph, a GraphFrame named `g`, depicts a network of people claiming each other as dependents. We need methods to extract the relevant features of this graph so that it can be described in the Grakn metamodel. Things we need to describe include:

### graphVertex
- Attributes
    - Name
- Relationships
    - has-type
    - has-graphobjects
    - has-concept
    - has-vertexid
    - has_property
    - has-attribute

In [16]:
import uuid

#### Can GraphFrames give a list of the different vertex types?
e.g. something like a hypothetical `g.vertices.types`

It looks like GraphFrames has weak support for multiple vertex types in the same graph, so metamodel creation will need some manual intervention. Alternatively, we can keep the type in a column of the vertex DataFrame (as the `type` column does here) and read the distinct values from it.
https://forums.databricks.com/questions/7792/with-graphframes-are-there-ways-of-dealing-with-mu.html

In [17]:
# vertexTypes = ["Person"]
vertexTypes = g.vertices.select("type").distinct().rdd.flatMap(lambda x: x).collect()

# A more interesting graph would have multiple vertex types e.g.
# types = ["Person", "Return", "Company"]

createVertices = ['insert $' + str(uuid.uuid4()) + ' isa graphVertex has name "' + v + '";'
                  for v in vertexTypes]

# insert statements may need to become match-insert statements if we want to update existing metamodel graphs

createVertices

['insert $23b71f90-3b2c-475e-9beb-98853329b257 isa graphVertex has name "person";',
 'insert $6256bbe3-f48e-42e4-b1c9-edfa46037e8f isa graphVertex has name "company";']
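
The string concatenation in the cell above would produce a malformed statement if a type name ever contained a double quote. A minimal sketch of a helper that guards against this follows. The helper names `graql_string` and `insert_statement` are hypothetical, and the sketch assumes that Graql string literals accept backslash-escaped quotes; the `v` + hex variable naming is likewise only an illustrative choice, not a verified Graql requirement.

```python
import uuid

def graql_string(value):
    """Quote a Python value as a Graql string literal (assumes backslash escapes)."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def insert_statement(entity_type, name, var=None):
    """Build one `insert $var isa <entity_type> has name "<name>";` statement."""
    # Hypothetical naming scheme: a letter prefix plus a hyphen-free UUID.
    var = var or "v" + uuid.uuid4().hex
    return "insert ${} isa {} has name {};".format(var, entity_type, graql_string(name))

# Same two vertex types as in the output above.
create_vertices = [insert_statement("graphVertex", t) for t in ["person", "company"]]
```

The same helper could serve the edge statements by passing `"graphEdge"` as the entity type.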

### graphEdge
- Attributes
    - Name
- Relationships
    - has-type
    - has-graphobjects
    - has-concept
    - has-edgeids
    - has_property
    - has-attribute


In [18]:
edgeTypes = g.edges.select("relationship").distinct().rdd.flatMap(lambda x: x).collect()
createEdges = ['insert $' + str(uuid.uuid4()) + ' isa graphEdge has name "' + e + '";'
               for e in edgeTypes]

createEdges

['insert $30b6d21a-b030-412f-9d2f-b2bd4bfb6c5f isa graphEdge has name "owned_by";',
 'insert $dfd916b0-7640-4783-a700-da9867967ca1 isa graphEdge has name "employed_by";',
 'insert $8d8e2e02-94c8-4c28-988c-8d7bbb0abad0 isa graphEdge has name "claims_dependent";']

### graphTriplet
This may benefit from making `owned_by` an ambiguous relationship (e.g. a Company can be owned_by another Company or by a Person). This can be changed in the `GeneratePeopleAndCompanies.ipynb` notebook.

In [20]:
# Try to get the nodeType -> relationshipType -> nodeType triples from GraphFrame for insertion into metamodel
g.edges

DataFrame[src: int, dst: int, relationship: string]
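
One possible way to get the nodeType -> relationshipType -> nodeType triples is the `triplets` DataFrame that GraphFrames exposes, which has `src`, `edge` and `dst` struct columns. The sketch below is hedged: the function names `triple_types` and `triplet_inserts` are hypothetical, and so are the `graphTriplet` statement shape and the `srcType-relationship-dstType` naming, since the metamodel schema for triplets is not shown in this notebook.

```python
import uuid

def triple_types(g):
    """Return distinct (src_type, relationship, dst_type) tuples from a GraphFrame."""
    rows = (
        g.triplets
        .selectExpr(
            "src.type AS src_type",
            "edge.relationship AS relationship",
            "dst.type AS dst_type",
        )
        .distinct()
        .collect()
    )
    return [(r["src_type"], r["relationship"], r["dst_type"]) for r in rows]

def triplet_inserts(triples):
    """Build one insert statement per triple (statement shape is an assumption)."""
    statements = []
    for src_type, relationship, dst_type in triples:
        # Hypothetical naming convention for a triplet: src-relationship-dst.
        name = "{}-{}-{}".format(src_type, relationship, dst_type)
        statements.append(
            'insert ${} isa graphTriplet has name "{}";'.format(uuid.uuid4(), name)
        )
    return statements
```

With the GraphFrame `g` from the cells above, `triplet_inserts(triple_types(g))` would give one statement per distinct combination of source type, relationship and destination type.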

In [23]:
g.vertices

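In [15]:
# Unfinished: the iterable should be the (srcType, relationship, dstType) triples extracted from g.
# Until those are available, edgeTypes is used as a stand-in so the cell runs.
createRelationships = ['insert $' + str(uuid.uuid4()) + ' isa graphEdge has name "' + e + '";'
                       for e in edgeTypes]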

### graphAttribute