# 10 Minutes to Katana Graph 
This tutorial shows you how to use the remote Python API of the Katana Graph Intelligence Platform (KGIP) to create a simple graph and try out some of Katana Graph's features on that graph. (Learn more about the differences between the KGIP remote and distributed Python APIs at [The Two Katana APIs](Two-Katana-APIs).)

This notebook assumes that the `katana` library is available to your Jupyter notebook's Python kernel.

First, we'll import the `remote` module of the library that enables Katana features:

```python
from katana import remote
```

## Creating a graph

We'll use an instance of the `katana` library's [`Client`](katana.remote.Client) class to create our graph.

```python
client = remote.Client()
```

When you create a graph, you can put it in a named database that you [create yourself](katana.remote.Client.create_database), but we will store this graph in the default database.

Setting the number of partitions for a graph lets you divide the processing of that graph across that many worker nodes in a cluster. To try out KGIP features, we'll use a very small graph, so one partition is enough.

```python
graph = client.create_graph(num_partitions=1)
```

We can start by using the `Graph` class's `num_nodes` method to find out how many nodes our new graph has. (This may take a minute or so because the system has to do some setup before it can complete its first operation on the graph.)

```python
print(graph.num_nodes())
```

```
0
```


Next, we'll load four nodes and three edges describing some fictional company employees into the new graph. For our sample graph, we'll pass a Cypher `CREATE` query to the `query()` method of the KGIP Python API. You can pass almost any kind of Cypher query to this method to modify or query the graph.

KGIP is a cloud-native platform, so in a real application you would typically load a much larger dataset from cloud storage, such as AWS or Google Cloud Storage buckets. This data can be in CSV, Parquet, KGIP's own RDG format, pandas DataFrames, or other formats.

```python
result = graph.query(
    """
    CREATE (alex:Employee {name: "Alex", title: "president", hireDate: date("2022-07-03")}),
           (bailey:Employee {name: "Bailey", title: "CMO", hireDate: date("2022-07-04")}),
           (bailey)-[:REPORTS_TO]->(alex),
           (cassidy:Employee {name: "Cassidy", title: "CTO", hireDate: date("2022-07-04")}),
           (cassidy)-[:REPORTS_TO]->(alex),
           (dana:Consultant {name: "Dana", title: "engineer", hireDate: date("2022-07-05")}),
           (dana)-[:REPORTS_TO]->(cassidy)
    """
)
```
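In a larger script you might generate the `CREATE` statement from Python data instead of typing it by hand. The sketch below shows one way to do that. `build_create_query` is a hypothetical helper, not part of the KGIP API, and it assumes simple, trusted string values: it only wraps them in double quotes with `json.dumps` rather than fully escaping them for Cypher.

```python
import json


def build_create_query(people, reports_to):
    """Build a Cypher CREATE statement (hypothetical helper, not a KGIP API).

    people: list of (variable, label, name, title, hire_date) tuples
    reports_to: list of (from_variable, to_variable) pairs
    """
    clauses = []
    for var, label, name, title, hire_date in people:
        # json.dumps wraps each value in double quotes (fine for simple, trusted strings)
        clauses.append(
            f"({var}:{label} {{name: {json.dumps(name)}, "
            f"title: {json.dumps(title)}, hireDate: date({json.dumps(hire_date)})}})"
        )
    for src, dst in reports_to:
        clauses.append(f"({src})-[:REPORTS_TO]->({dst})")
    # Node patterns come first, so every edge refers to an already-declared variable
    return "CREATE " + ",\n       ".join(clauses)


people = [
    ("alex", "Employee", "Alex", "president", "2022-07-03"),
    ("bailey", "Employee", "Bailey", "CMO", "2022-07-04"),
    ("cassidy", "Employee", "Cassidy", "CTO", "2022-07-04"),
    ("dana", "Consultant", "Dana", "engineer", "2022-07-05"),
]
reports_to = [("bailey", "alex"), ("cassidy", "alex"), ("dana", "cassidy")]

query = build_create_query(people, reports_to)
print(query)
# graph.query(query) would send it to KGIP, as in the cell above
```

Because `CREATE` always creates new nodes, running the generated query against `graph` after the cell above would add a second copy of all four employees, so try it on a fresh graph.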

Now how many nodes does `graph` have? 

```python
print(graph.num_nodes())
```

```
4
```


## Viewing the graph schema

We can ask the KGIP API about our graph's schema and print out information about its node and edge types. It won't be much for our little graph, but you can use these same tools on much larger and more complex graphs. 

```python
schema = graph.schema()
print("\nNodes:")
print(schema.nodes())
print("\nEdges:")
print(schema.edges())
```

```
Nodes:
           type               properties
0  [Consultant]  [hireDate, name, title]
1    [Employee]  [hireDate, name, title]

Edges:
         type properties from_node_type to_node_type
0  REPORTS_TO         []   [Consultant]   [Employee]
1  REPORTS_TO         []     [Employee]   [Employee]
```


We can also show an interactive visualization of our schema:

```python
schema.visualize()
```

![schema visualization](images/TenMinutesToKG4.png)
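To see why the schema lists two `REPORTS_TO` rows, it can help to derive a similar summary by hand. The sketch below works on a plain-Python copy of our four nodes and three edges; it is a toy model, not how KGIP computes its schema. It collects the property keys seen for each node label and the distinct (edge type, source label, target label) combinations.

```python
from collections import defaultdict

# Plain-Python copy of the tutorial graph (not a KGIP object)
nodes = {
    "alex": ("Employee", {"name": "Alex", "title": "president", "hireDate": "2022-07-03"}),
    "bailey": ("Employee", {"name": "Bailey", "title": "CMO", "hireDate": "2022-07-04"}),
    "cassidy": ("Employee", {"name": "Cassidy", "title": "CTO", "hireDate": "2022-07-04"}),
    "dana": ("Consultant", {"name": "Dana", "title": "engineer", "hireDate": "2022-07-05"}),
}
edges = [
    ("bailey", "REPORTS_TO", "alex"),
    ("cassidy", "REPORTS_TO", "alex"),
    ("dana", "REPORTS_TO", "cassidy"),
]


def summarize_schema(nodes, edges):
    """Return {label: sorted property keys} and sorted (type, from_label, to_label) tuples."""
    node_props = defaultdict(set)
    for label, props in nodes.values():
        node_props[label].update(props)  # updating a set with a dict adds its keys
    edge_types = {(rel, nodes[src][0], nodes[dst][0]) for src, rel, dst in edges}
    return {label: sorted(keys) for label, keys in node_props.items()}, sorted(edge_types)


node_props, edge_types = summarize_schema(nodes, edges)
print(node_props)
print(edge_types)
```

The two edge tuples correspond to the two `REPORTS_TO` rows in the schema output: Dana, a `Consultant`, reports to an `Employee`, while the other two edges connect `Employee` nodes.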

## Querying the graph 

Let's run a Cypher query on our graph to ask which employees report to whom, and then show the result of the query:

```python
result = graph.query(
    """
    MATCH (e1)-[:REPORTS_TO]->(e2)
    RETURN e1.name, e1.title, e1.hireDate, e2.name AS reportsTo
    """
)
```

We could use a regular Python `print()` statement to display the results in columns, but instead we'll use the `table()` method, which displays a scrollable, searchable table.

```python
result.table()
```

![interactive table](images/TenMinutesToKG3.png)
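The `MATCH` pattern produces one row for every `REPORTS_TO` edge, which is why the result has three rows. As a rough mental model (a plain-Python sketch over a copy of our data, not how KGIP executes Cypher), the query behaves like this loop over the edges:

```python
# Plain-Python copy of the tutorial data: (name, title, hireDate) per node variable
people = {
    "alex": ("Alex", "president", "2022-07-03"),
    "bailey": ("Bailey", "CMO", "2022-07-04"),
    "cassidy": ("Cassidy", "CTO", "2022-07-04"),
    "dana": ("Dana", "engineer", "2022-07-05"),
}
reports_to = [("bailey", "alex"), ("cassidy", "alex"), ("dana", "cassidy")]

# MATCH (e1)-[:REPORTS_TO]->(e2): one row per matching edge, with
# columns e1.name, e1.title, e1.hireDate, and e2.name AS reportsTo
rows = [(*people[src], people[dst][0]) for src, dst in reports_to]
for row in rows:
    print(row)
```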

When a query returns nodes and you pass `contextualize=True` to `query()`, you can call the `visualize()` method on the result. This displays an interactive, customizable visual version of the graph. Try clicking and dragging nodes, dragging the background, and zooming in and out of the image below with your mouse wheel. (You can do all of these with the schema visualization shown above as well.)

```python
result = graph.query(
    """
    MATCH (e1)-[:REPORTS_TO]->(e2)
    RETURN e1, e2
    """,
    contextualize=True,
)

result.visualize()
```

![node query result visualization](images/TenMinutesToKG2.png)

You can customize the visualization interactively with the Customize panel on the right. You can also automate the customization by creating a `GraphVisOptions` object and passing it to the `visualize()` method:

```python
from katana_visualization_widget import GraphVisOptions, NodeVisOption, EdgeVisOption, ANY

# Label nodes of any type with their "title" property
options = GraphVisOptions(
    node_options=[NodeVisOption(ANY, label="title")]
)
result.visualize(graph_vis_options=options)
```

![customized node query visualization](images/TenMinutesToKG1.png)

## Graph updates and versioning

A brand-new graph has a `version` value of 0, and certain kinds of changes to the graph update this value. Let's check the current version of `graph`:

```python
print(graph.version)
```

```
1
```


Now we'll edit the graph by giving Bailey a promotion:

```python
result = graph.query(
    """
    MATCH (e:Employee) WHERE e.name = "Bailey" SET e.title = "vice president"
    """
)
```

(You can re-run the query in the "Querying the graph" section above to confirm that Bailey's `title` value has been updated.) Now let's look at the graph's version again:

```python
print(graph.version)
```

```
2
```


The KGIP [graph checkpointing](data-mgmt/graph-checkpoints.rst) feature lets you save multiple versions of the same graph and return to any saved version whenever you like.

## Analytics

We're going to run the PageRank algorithm developed by Google's founders on the graph that we created. This will assign each node in the graph its own PageRank value, stored in a property that we specify.

We can't use a property name that the graph already has, so the following helper function generates a new, very likely unique name each time we run PageRank on our graph.

```python
import random

# Append a random number to a name stub to make a (very likely) unique name
def unique_name(name):
    return f"{name}_{random.randint(1, 10000)}"

prop_name = unique_name("nodeRank")
```
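Because `unique_name()` picks a random suffix between 1 and 10,000, two calls can occasionally produce the same name. If you know which property names already exist (for example, from the schema output earlier), a deterministic alternative is to count upward until you reach an unused name. `unique_name_avoiding` below is a hypothetical helper, not part of KGIP, and it assumes you collect the set of existing names yourself:

```python
import itertools


def unique_name_avoiding(stub, existing):
    """Return stub_N for the smallest N >= 1 that isn't already in existing."""
    for n in itertools.count(1):
        candidate = f"{stub}_{n}"
        if candidate not in existing:
            return candidate


# Hypothetical set of property names already present on the graph
existing = {"hireDate", "name", "title", "nodeRank_1"}
new_name = unique_name_avoiding("nodeRank", existing)
print(new_name)  # nodeRank_2
```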

After importing the `analytics` module, we can run PageRank on our graph:

```python
from katana.remote import analytics

analytics.pagerank(graph, result_property_name=prop_name)
```

Querying the graph and printing the result shows the value assigned to each node's new PageRank property (whose name is stored in `prop_name`):

```python
result = graph.query(
    f"""
    MATCH (e:Employee)
    RETURN e.name, e.{prop_name}
    """
)
print(result)
```

```
    e.name  e.nodeRank_8886
0  Cassidy         0.977500
1   Bailey         0.850000
2     Alex         1.124125
```


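Because more nodes point to the Alex node than to any other, it got the highest PageRank value. (Dana doesn't appear in this output because her node has the `Consultant` label, and this query matches only `Employee` nodes.)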
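To make the result less of a black box, here is a minimal pure-Python sketch of iterative PageRank on the same four nodes and three edges. It is not Katana's implementation: the constants `base=0.85` and `weight=0.15` are assumptions, chosen because with them the sketch reproduces the three values printed above. (The common textbook form instead applies a 0.85 damping factor to the incoming-rank term.) Each node starts from the base value and receives a weighted share of the rank of every node that points to it.

```python
def pagerank(nodes, edges, base=0.85, weight=0.15, iterations=20):
    """Iterative PageRank sketch: rank(v) = base + weight * sum(rank(u) / out_degree(u))
    over all edges u -> v. The constants are assumptions, not Katana's documented defaults."""
    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_rank = {n: base for n in nodes}
        for src, dst in edges:
            # Each node shares its current rank equally among its outgoing edges
            new_rank[dst] += weight * rank[src] / out_degree[src]
        rank = new_rank
    return rank


people = ["Alex", "Bailey", "Cassidy", "Dana"]
reports_to = [("Bailey", "Alex"), ("Cassidy", "Alex"), ("Dana", "Cassidy")]
ranks = pagerank(people, reports_to)
for name, value in sorted(ranks.items()):
    print(f"{name:8} {value:.6f}")
```

Bailey and Dana have no incoming edges, so they keep the base value; Cassidy gains a share of Dana's rank, and Alex gains shares from both Bailey and Cassidy, which is why Alex ends up highest.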

Of course, this is a toy example, and you would get more interesting results with a larger dataset, but it shows how easy it is to run an analytics algorithm with KGIP. The Reference Guide entry on [PageRank](analytics/pagerank.rst) describes additional parameters that you can pass to this method to tune how the algorithm runs.

Don't miss the [other analytics algorithms](analytics/index.rst) that are available to use with your data.

## Re-using and deleting the graph

When KGIP creates a graph, it assigns the graph an ID. We can get that value as shown below and use it to retrieve the graph in a later session with the [`client.get_graph_by_id()`](katana.remote.Client.get_graph_by_id) method; the retrieved graph will still have the nodes, edges, and properties that we added above. This works even if your Internet connection or the cluster hosting your KGIP deployment goes down, as long as you saved the ID first.

```python
print(graph.graph_id)
```

```
7yoq4iy9VybFKUcDZF2CTzwXTxn6NMcf2XLzFpMUkpf
```
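Since the ID is all you need to get back to the graph, it's worth persisting it somewhere outside the notebook. A minimal sketch, assuming a local JSON file is good enough for your purposes; `save_graph_id` and `load_graph_id` are hypothetical helpers, not KGIP APIs:

```python
import json
import tempfile
from pathlib import Path


def save_graph_id(graph_id, path):
    """Write a graph ID to a small JSON file."""
    Path(path).write_text(json.dumps({"graph_id": graph_id}))


def load_graph_id(path):
    """Read back a graph ID saved by save_graph_id()."""
    return json.loads(Path(path).read_text())["graph_id"]


# Offline demo using a temporary file and the ID printed above
demo_path = Path(tempfile.mkdtemp()) / "graph_id.json"
save_graph_id("7yoq4iy9VybFKUcDZF2CTzwXTxn6NMcf2XLzFpMUkpf", demo_path)
restored_id = load_graph_id(demo_path)
print(restored_id)
```

In a real session you would call `save_graph_id(graph.graph_id, "graph_id.json")` at this point, and in a later session pass `load_graph_id("graph_id.json")` to `client.get_graph_by_id()`.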


We mentioned earlier that this graph was created in the default database. If you iterate through `client.get_database("default").graphs_in_database()` and print the `graph_id` value of each graph, you will find the ID value printed above.
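That search can be wrapped in a small helper. This sketch relies only on the calls named in the paragraph above and assumes that `graphs_in_database()` returns an iterable of graph objects that each expose `graph_id`; `find_graph_in_database` itself is a hypothetical name, not part of the KGIP API.

```python
def find_graph_in_database(client, graph_id, database="default"):
    """Return the graph in `database` whose graph_id matches, or None if absent.

    Uses only client.get_database(...), .graphs_in_database(), and each
    graph's .graph_id attribute, as described in the text.
    """
    for g in client.get_database(database).graphs_in_database():
        if g.graph_id == graph_id:
            return g
    return None
```

For example, `find_graph_in_database(client, graph.graph_id)` should return a graph object now, and `None` after the graph has been deleted.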

Once you delete this graph, though, it will no longer be listed. To delete it, uncomment and run the following cell. If you don't get an error, the `delete()` operation worked.

```python
# graph.delete()
```

## Next steps

Now that you've had this brief tour of how you can use a Jupyter notebook to work with graphs in KGIP, you can explore more things to try in these sections of the KGIP documentation: 

- **[Graph Query](query/index.rst)**:  How to write and run Cypher queries that explore and update your data, and how to create interactive visualizations of your query results. 
- **[Graph Analytics](analytics/index.rst)**: The analytics routines provided by Katana, what they can do for you, and how to use them with your data.
- **[Graph AI](ai/index.rst)**:  How to apply Graph Neural Networks (GNNs) and other machine learning techniques that take advantage of graph technology.
