<a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/tigergraph/tigergraph_inventor_schema_creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install pyTigerGraph

In [1]:
from IPython.display import clear_output
!pip install -qq watermark

################################
# Packages to install
################################
!pip install -U -qq pyTigerGraph

################################

clear_output()
%reload_ext watermark
%watermark -v -p numpy,pyTigerGraph

Python implementation: CPython
Python version       : 3.9.10
IPython version      : 8.0.1

numpy       : 1.22.2
pyTigerGraph: 0.0.9.9.2



# Global Schema

This is an overarching schema that can be used by as many or as few **Graphs** as you desire. The purpose of the **Global Schema** is to give you a common schema that you can pull from when setting up Graphs. This is helpful if your Graphs share common elements and you want those elements to have the same schema everywhere. For example, my **Global Schema** might contain all of the information about my supply chain: suppliers, warehouses, transportation lines, user orders, product information, and user shipping information. However, one of our Graphs might only need the suppliers, warehouses, and transportation lines, as that side of the business doesn't need access to customer information. I could use just those elements and their edges from the Global Schema to populate this Graph without having to include all of the information about the ordering user.

# Graph Schema
As we talked about above, the Graph Schema can contain as much or as little of the Global Schema as you desire. In addition, the Graph Schema can contain elements that are not in the Global Schema. Continuing the example above, say I want to keep track of the manufacturing side of my supply chain. I can include the suppliers, warehouses, and transportation from the Global Schema, but I can also add in something like weather forecasts and supply forecasts that could be used to predict disruptions to my manufacturing chain. This data can exist in the Graph Schema for the Manufacturing Graph, but not in the Global Schema.
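To make the relationship concrete, here is a minimal sketch of a Graph pulling a subset of types from the Global Schema. The type names (`Supplier`, `Warehouse`, and so on) and the helper `create_graph_statement` are hypothetical, invented from the supply-chain example rather than taken from a real schema. The only GSQL assumed is the `CREATE GRAPH <name>(<type>, ...)` form. Graph-only elements such as the weather forecasts would be added to the Graph afterwards and are not covered here.

```python
# Hypothetical global types, named after the supply-chain example above.
GLOBAL_TYPES = {
    "Supplier", "Warehouse", "TransportLine",
    "Order", "Product", "Customer",
}


def create_graph_statement(graph_name, wanted_types, global_types=GLOBAL_TYPES):
    """Build a CREATE GRAPH statement that reuses a subset of global types."""
    # Refuse types the global schema does not define, so typos fail early.
    missing = [t for t in wanted_types if t not in global_types]
    if missing:
        raise ValueError(f"not in the global schema: {missing}")
    return f"CREATE GRAPH {graph_name}({', '.join(wanted_types)})"


statement = create_graph_statement(
    "Manufacturing", ["Supplier", "Warehouse", "TransportLine"]
)
print(statement)
```

The Manufacturing Graph gets only the three supplier-side types; `Customer` and `Order` stay in the Global Schema, available to other Graphs.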

## GSQL Schema Design

For times when you don't want to click through the interface in order to create your schema, you can do it via TigerGraph's language **GSQL**. **GSQL** can be executed on your TigerGraph server via the **GSQL** terminal or remotely by one of our many TigerGraph connectors. For this example we'll be using [pyTigerGraph](https://github.com/pyTigerGraph/pyTigerGraph) to interface through this Python notebook. The GSQL will remain the same regardless of which connector method you use.

### Creating a Vertex

This is what the GSQL required to create the **Application** vertex looks like:

`CREATE VERTEX Application(PRIMARY_ID id STRING, filingDate DATETIME, confirmationNumber STRING, docketNumber STRING, title STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"`

To simplify things, this is the pattern your most basic vertex declaration will follow:

`CREATE VERTEX <VertexType>(PRIMARY_ID id <DataType>, <attributeName1> <DataType1>) WITH PRIMARY_ID_AS_ATTRIBUTE="true"`

Additional attributes are separated by commas and placed after the first.
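If you would rather generate these statements than type them, the pattern above maps onto a few lines of Python. This is only a sketch: `create_vertex_statement` is a hypothetical helper, not part of pyTigerGraph, and it assumes every vertex uses `id` as its primary ID and sets `PRIMARY_ID_AS_ATTRIBUTE="true"`, as all the vertices in this notebook do.

```python
def create_vertex_statement(vertex_type, attributes, id_type="STRING"):
    """Build a CREATE VERTEX statement following the pattern above.

    `attributes` maps attribute name -> GSQL data type, in declaration order.
    """
    fields = [f"PRIMARY_ID id {id_type}"]
    # Additional attributes are comma-separated after the primary ID.
    fields += [f"{name} {dtype}" for name, dtype in attributes.items()]
    return (
        f"CREATE VERTEX {vertex_type}({', '.join(fields)}) "
        'WITH PRIMARY_ID_AS_ATTRIBUTE="true"'
    )


application = create_vertex_statement(
    "Application",
    {
        "filingDate": "DATETIME",
        "confirmationNumber": "STRING",
        "docketNumber": "STRING",
        "title": "STRING",
    },
)
print(application)
```

The printed statement is the same **Application** declaration shown at the start of this section.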

### Creating an Edge

The GSQL to create an edge is very similar to the GSQL that creates a vertex. Here's the **is_continuation_type** edge:

`CREATE UNDIRECTED EDGE is_continuation_type(FROM Application, TO ContinuationType)`

Let's look at a **Directed** edge for comparison:

`CREATE DIRECTED EDGE has_child(FROM Application, TO Application, date DATETIME) WITH REVERSE_EDGE="reverse_has_child"`

And lastly, the pattern ( [  ] = **Optional** ):

`CREATE DIRECTED|UNDIRECTED EDGE <edge_name>(FROM <VertexType>, TO <VertexType>[, <attributeName1> <DataType1>]) [WITH REVERSE_EDGE="<reverse_edge_name>"]`
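The edge pattern can be sketched as a small Python helper in the same way. `create_edge_statement` is a hypothetical function written for this notebook, not a pyTigerGraph call. It handles a single source and target pair, with optional attributes and an optional reverse edge.

```python
def create_edge_statement(name, source, target, attributes=None,
                          directed=False, reverse_edge=None):
    """Build a CREATE EDGE statement for one FROM/TO pair."""
    kind = "DIRECTED" if directed else "UNDIRECTED"
    fields = [f"FROM {source}", f"TO {target}"]
    # Edge attributes, if any, follow the endpoints.
    fields += [f"{attr} {dtype}" for attr, dtype in (attributes or {}).items()]
    statement = f"CREATE {kind} EDGE {name}({', '.join(fields)})"
    if reverse_edge is not None:
        statement += f' WITH REVERSE_EDGE="{reverse_edge}"'
    return statement


undirected = create_edge_statement(
    "is_continuation_type", "Application", "ContinuationType"
)
directed = create_edge_statement(
    "has_child", "Application", "Application",
    attributes={"date": "DATETIME"},
    directed=True,
    reverse_edge="reverse_has_child",
)
print(undirected)
print(directed)
```

The two printed statements correspond to the **is_continuation_type** and **has_child** examples above.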

#### Compound Edges

[TODO] This is a slightly more advanced concept. Needs some research.

Let's take a look at what a **Compound Edge** looks like.

`CREATE UNDIRECTED EDGE in_code (FROM Address, TO PostalCode | FROM Correspondence, TO PostalCode | FROM City, TO PostalCode)`

Notice that pipes separate the individual **Source** - **Target** pairs.
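As a rough illustration of the pipe-separated form, the sketch below joins a list of (source, target) pairs into one compound edge declaration. `create_compound_edge_statement` is a hypothetical helper; it leaves out edge attributes and reverse edges to keep the example short.

```python
def create_compound_edge_statement(name, pairs, directed=False):
    """Build a CREATE EDGE statement with several FROM/TO pairs."""
    kind = "DIRECTED" if directed else "UNDIRECTED"
    # Each Source-Target pair is written out in full and separated by a pipe.
    endpoints = " | ".join(
        f"FROM {source}, TO {target}" for source, target in pairs
    )
    return f"CREATE {kind} EDGE {name}({endpoints})"


in_code = create_compound_edge_statement(
    "in_code",
    [
        ("City", "PostalCode"),
        ("Correspondence", "PostalCode"),
        ("Address", "PostalCode"),
    ],
)
print(in_code)
```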


## The Full Schema

Now it's time to declare our whole schema. I'm going to separate the declarations by vertices and edges, but there is nothing stopping you from declaring your whole schema in one statement.

Read through this and see how it lines up with the visual schema that we created earlier.

### Vertices

```
CREATE VERTEX Application(PRIMARY_ID id STRING, filingDate DATETIME, confirmationNumber STRING, docketNumber STRING, title STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX ContinuationType(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX USPCClass(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX USPCSubclass(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX EventCode(PRIMARY_ID id STRING, description STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PTAEvent(PRIMARY_ID id STRING, description STRING, applicantDelay FLOAT, usptoDelay FLOAT) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX ExtensionIndicator(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PTASummary(PRIMARY_ID id STRING, ptoDelayA FLOAT, ptoDelayB FLOAT, ptoDelayC FLOAT, overlapDelay FLOAT, nonOverlapDelay FLOAT, manualAdjustment FLOAT, applicationDelay FLOAT, PTA FLOAT) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PTESummay(PRIMARY_ID id STRING, ptoAdjustment FLOAT, ptoDelay FLOAT, applicantDelay FLOAT, PTE FLOAT) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX ApplicationStatus(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PatentNumber(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Examiner(PRIMARY_ID id STRING, fullName STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX ArtUnit(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Attorney(PRIMARY_ID id STRING, firstName STRING, middleName STRING, lastName STRING, suffix STRING, phone STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PracticeCategory(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX SmallEntity(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX First_toFile(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX FileLocation(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PGPUBNumber(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX WIPONumber(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Inventor(PRIMARY_ID id STRING, first STRING, middle STRING, last STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX ForeignParent(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX City(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Region(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Country(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX PostalCode(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Correspondence(PRIMARY_ID id STRING, name STRING, customerNumber STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX Address(PRIMARY_ID id STRING, line1 STRING, line2 STRING, line3 STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
```

### Edges

```
CREATE DIRECTED EDGE has_child(FROM Application, TO Application, date DATETIME) WITH REVERSE_EDGE="reverse_has_child"
CREATE DIRECTED EDGE has_parent(FROM Application, TO Application, date DATETIME) WITH REVERSE_EDGE="reverse_has_parent"
CREATE UNDIRECTED EDGE is_continuation_type(FROM Application, TO ContinuationType)
CREATE UNDIRECTED EDGE has_class(FROM Application, TO USPCClass)
CREATE UNDIRECTED EDGE has_subclass(FROM Application, TO USPCSubclass)
CREATE DIRECTED EDGE is_subclass(FROM USPCSubclass, TO USPCClass) WITH REVERSE_EDGE="reverse_is_subclass"
CREATE UNDIRECTED EDGE has_code(FROM Application, TO EventCode, date DATETIME)
CREATE UNDIRECTED EDGE has_pta_event(FROM Application, TO PTAEvent, date DATETIME)
CREATE UNDIRECTED EDGE is_extension(FROM PTAEvent, TO ExtensionIndicator)
CREATE DIRECTED EDGE has_start(FROM PTAEvent, TO PTAEvent) WITH REVERSE_EDGE="reverse_has_start"
CREATE UNDIRECTED EDGE has_pta_summary(FROM Application, TO PTASummary)
CREATE UNDIRECTED EDGE has_pte_summary(FROM Application, TO PTESummay)
CREATE UNDIRECTED EDGE has_status(FROM Application, TO ApplicationStatus, date DATETIME)
CREATE UNDIRECTED EDGE has_patent(FROM Application, TO PatentNumber, date DATETIME)
CREATE UNDIRECTED EDGE has_examiner(FROM Application, TO Examiner)
CREATE UNDIRECTED EDGE from_unit(FROM Examiner, TO ArtUnit)
CREATE UNDIRECTED EDGE has_attorney(FROM Application, TO Attorney)
CREATE UNDIRECTED EDGE has_practice_category(FROM Attorney, TO PracticeCategory)
CREATE UNDIRECTED EDGE is_small(FROM Application, TO SmallEntity)
CREATE UNDIRECTED EDGE follows_ftf(FROM Application, TO First_toFile)
CREATE UNDIRECTED EDGE at_location(FROM Application, TO FileLocation, date DATETIME)
CREATE UNDIRECTED EDGE hasPGPUB(FROM Application, TO PGPUBNumber, date DATETIME)
CREATE UNDIRECTED EDGE hasWIPO(FROM Application, TO WIPONumber, date DATETIME)
CREATE UNDIRECTED EDGE filed_application(FROM Application, TO Inventor, rank INT)
CREATE UNDIRECTED EDGE has_foreign_parent(FROM Application, TO ForeignParent, date DATETIME)
CREATE UNDIRECTED EDGE from_city(FROM Inventor, TO City)
CREATE UNDIRECTED EDGE from_region(FROM Inventor, TO Region)
CREATE UNDIRECTED EDGE from_country(FROM Inventor, TO Country)
CREATE UNDIRECTED EDGE filed_country(FROM ForeignParent, TO Country)
CREATE UNDIRECTED EDGE has_correspondence(FROM Inventor, TO Correspondence)
CREATE UNDIRECTED EDGE has_address(FROM Address, TO Correspondence)
CREATE UNDIRECTED EDGE in_code(FROM City, TO PostalCode | FROM Correspondence, TO PostalCode | FROM Address, TO PostalCode)
CREATE UNDIRECTED EDGE in_region(FROM PostalCode, TO Region | FROM Correspondence, TO Region | FROM Address, TO Region)
CREATE UNDIRECTED EDGE in_country(FROM PostalCode, TO Country | FROM Country, TO Region | FROM Correspondence, TO Country | FROM Address, TO Country)
CREATE UNDIRECTED EDGE in_city(FROM Region, TO City | FROM Correspondence, TO City | FROM Address, TO City)
```
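With this many declarations, a mistyped vertex name in an edge is easy to miss. The sketch below is one way to check a schema before sending it to the server: it collects the declared vertex types and reports every edge endpoint that refers to a type that was never declared. The function name and the regular expressions are my own and only cover the statement shapes used in this notebook, so treat it as a quick sanity check rather than a GSQL parser.

```python
import re

VERTEX_RE = re.compile(r"CREATE VERTEX (\w+)\(")
EDGE_RE = re.compile(r"CREATE (?:UN)?DIRECTED EDGE (\w+)\s*\((.*?)\)")
ENDPOINT_RE = re.compile(r"FROM (\w+), TO (\w+)")


def undeclared_endpoints(schema):
    """Return (edge, vertex_type) pairs whose vertex type is never declared."""
    declared = set(VERTEX_RE.findall(schema))
    problems = []
    for edge, body in EDGE_RE.findall(schema):
        # A compound edge has several FROM/TO pairs in one body.
        for source, target in ENDPOINT_RE.findall(body):
            for vertex_type in (source, target):
                if vertex_type not in declared:
                    problems.append((edge, vertex_type))
    return problems


# Shortened sample: USPCClass is deliberately left undeclared.
sample = '''
CREATE VERTEX Application(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE VERTEX ContinuationType(PRIMARY_ID id STRING) WITH PRIMARY_ID_AS_ATTRIBUTE="true"
CREATE UNDIRECTED EDGE is_continuation_type(FROM Application, TO ContinuationType)
CREATE UNDIRECTED EDGE has_class(FROM Application, TO USPCClass)
'''
problems = undeclared_endpoints(sample)
print(problems)
```

For the shortened sample this reports that `has_class` points at `USPCClass`, which the sample never declares. Passing the vertex and edge blocks above as one string runs the same check on the full schema.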

#### Solution Connection

In [2]:
import pyTigerGraph as tg

# connection parameters
# hostName is the TigerGraph solution URL
hostName = "http://localhost"
graphName = "Patents"
userName = "tigergraph"
password = "tigergraph"

# establish the connection to the TigerGraph Solution
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password)

# print any current schema so we can verify that we are connected
conn.gsql('LS')

'---- Global vertices, edges, and all graphs\nVertex Types:\n- VERTEX Paper(PRIMARY_ID id INT, x LIST<INT>, y INT, train_mask BOOL, val_mask BOOL, test_mask BOOL) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="true"\nEdge Types:\n- DIRECTED EDGE Cite(FROM Paper, TO Paper)\n\nGraphs:\n- Graph Cora(Paper:v, Cite:e)\n- Graph Patents()\nJobs:\n\n\nJSON API version: v2\nSyntax version: v2\n'

#### Graph Connection

Once connected to the Solution, we can create the required Secret and Token needed to authenticate with our Graph.

In [3]:
# set the name of the graph that we want to connect to
conn.graphname = graphName

# create a secret
secret = conn.createSecret()
# use the secret to get a token
authToken = conn.getToken(secret)[0]

# connect to graph with token
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName, apiToken=authToken)

# listing vertex count requires graph authentication and will prove that we're securely connected to the Graph
conn.getVertexCount("*")

{}

#### Clearing the Schema (optional)

Since we're going to be loading in the full schema, we should first clear any schema currently loaded on the graph: TigerGraph will throw an error if we try to create a Vertex or Edge with the same name as one that already exists.

In [4]:
# Get a list of vertices and edges currently in the graph schema
vertices = conn.getVertexTypes()
edges = conn.getEdgeTypes()

# we need a SCHEMA_CHANGE JOB to change the schema, we're going to put that together in the next couple lines
changeJob = '''CREATE SCHEMA_CHANGE JOB clearGraph FOR GRAPH Patents {'''

for vertex in vertices:
    changeJob += ('''DROP VERTEX ''' + vertex + ';')

for edge in edges:
    changeJob += ('''DROP EDGE ''' + edge + ';')

changeJob += '}'

# print the complete change job
print(changeJob)

# add the job to the graph
print(conn.gsql('''
    USE GRAPH patents
    ''' + changeJob))
# execute the change job
print(conn.gsql('''
    USE GRAPH patents
    RUN SCHEMA_CHANGE JOB clearGraph
    '''))
# delete the change job
print(conn.gsql('''
    USE GRAPH patents
    DROP JOB clearGraph
    '''))

CREATE SCHEMA_CHANGE JOB clearGraph FOR GRAPH Patents {}
Encountered " "}" "} "" at line 3, column 60.
Was expecting one of:
"add" ...
"alter" ...
"change" ...
"drop" ...

Graph 'patents' does not exist.
Error: Currently not using any graphs! Please use 'USE GRAPH' command to switch to a graph. Create a graph if no graph exists.
Graph 'patents' does not exist.
Semantic Check Fails: These jobs could not be found anywhere: [clearGraph].
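The output above shows two avoidable problems. The **Patents** graph had no vertex or edge types yet, so the generated job had an empty body and GSQL rejected it. The `USE GRAPH patents` statements also did not match the graph's name, `Patents`, so the graph was reported as missing. A slightly more defensive way to build the job is sketched below. `build_clear_job` is a hypothetical helper; dropping edges before vertices is a cautious ordering on my part rather than a documented requirement.

```python
def build_clear_job(graph_name, vertex_types, edge_types, job_name="clearGraph"):
    """Return a SCHEMA_CHANGE JOB that drops every type, or None if nothing to drop."""
    # Assumption: drop edges first so no edge outlives its endpoint types.
    drops = [f"DROP EDGE {edge};" for edge in edge_types]
    drops += [f"DROP VERTEX {vertex};" for vertex in vertex_types]
    if not drops:
        # An empty job body is a syntax error, so signal "nothing to do".
        return None
    body = " ".join(drops)
    return f"CREATE SCHEMA_CHANGE JOB {job_name} FOR GRAPH {graph_name} {{ {body} }}"


print(build_clear_job("Patents", [], []))
print(build_clear_job("Patents", ["Paper"], ["Cite"]))
```

In the notebook you would pass `graphName`, `conn.getVertexTypes()`, and `conn.getEdgeTypes()`, and only call `conn.gsql(...)` when the result is not `None`. Reusing `graphName` in the `USE GRAPH` line keeps the capitalization consistent.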


#### Clearing the Graph

Via GSQL, it's much easier for us to create our schema in the **Global** manner, then create a **Graph** utilizing elements from that **Global** schema. Adding schema to an existing **Graph** is possible; it just requires a [Schema Change Job](https://docs.tigergraph.com/gsql-ref/3.3/ddl-and-loading/modifying-a-graph-schema) rather than a schema definition.

Because this process is creating a new **Graph** from our **Global** schema, we need to drop our old **Graph** because we can't have two graphs with the same name.

Remember how we needed a **Secret** and **Token** to connect to a graph specifically? Once we delete our current Graph and create a new one, we will need to create a new **Secret** and **Token** tied to that Graph so we can connect to it.

In [5]:
print(conn.gsql('DROP GRAPH Patents'))

The graph Patents is dropped.
