***

# Increase Insight into your Graph Data on Graph Studio
By Oracle Spatial and Graph Team
***

# Overview:

This notebook shows how to access graphs in an Autonomous Database instance, run graph algorithms such as PageRank on them, and query them with PGQL. We then transform the query result set with common data science tools such as pandas and matplotlib's pyplot.

---

## Step 1: Import required libraries
Libraries can be imported at any point in the notebook, but for simplicity we import all of them at the start so they are available throughout the rest of the notebook.

In [None]:
from opg4py.adb import AdbClient
import pandas
import matplotlib.pyplot as plt

In [None]:
from pypgx import setloglevel
# Turn off PGX client logging so cell output stays readable
setloglevel('ROOT', 'OFF')

## Step 2: Connect to ADB
The following four paragraphs create the configuration for an Autonomous Database connection and open a client connection, check whether the graph client is attached, start the Graph Studio environment, and check which user created the environment start job.

In [None]:
config = {
    'tenancy_ocid': '<tenancy_ocid>',
    'database': '<autonomous_database_name>',
    'database_ocid': '<autonomous_database_ocid>',
    'username': 'GRAPHUSER',
    'password': '<graphuser_password>',
    'endpoint': 'https://<hostname-prefix>.adb.<region>.oraclecloudapps.com/'
}

# AdbClient is a context manager; entering its context manually keeps the
# client open across notebook cells instead of inside a single `with` block
client = AdbClient(config)
client.__enter__()
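
Hardcoding the password in the notebook is easy to leak when sharing it. A minimal sketch of an alternative, assuming you export the values as environment variables: the variable names below (`ADB_TENANCY_OCID` and so on) and the helper `build_adb_config` are hypothetical and not part of opg4py; only the dictionary keys come from the config above.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup
REQUIRED_KEYS = {
    "tenancy_ocid": "ADB_TENANCY_OCID",
    "database": "ADB_DATABASE_NAME",
    "database_ocid": "ADB_DATABASE_OCID",
    "username": "ADB_USERNAME",
    "password": "ADB_PASSWORD",
    "endpoint": "ADB_ENDPOINT",
}


def build_adb_config(environ=os.environ):
    """Build the AdbClient config dict from environment variables.

    Raises KeyError naming every missing variable, so the notebook fails
    fast instead of sending an incomplete config to AdbClient.
    """
    missing = [var for var in REQUIRED_KEYS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in REQUIRED_KEYS.items()}
```

With the variables set, `config = build_adb_config()` can replace the literal dictionary before calling `AdbClient(config)`.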

In [None]:
client.is_attached()

In [None]:
# If the environment is already started, the line below throws an
# "IllegalStateException: environment currently attached" error.
# If so, skip this paragraph and the next one.

# Start the environment; the argument is the memory to allocate, in GB
job = client.start_environment(10)
job.get()  # wait for the start job to finish
job.get_name()

In [None]:
job.get_created_by()
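
The skip-on-error approach above relies on reading the exception message. A minimal sketch of an alternative checks first and starts the environment only when needed. It assumes `client.is_attached()` returns `True` once an environment is attached and that the value passed to `start_environment` is the memory size in GB; `ensure_environment` is a hypothetical helper, not part of opg4py.

```python
def ensure_environment(client, memory_gb=10):
    """Start the Graph Studio environment only if none is attached yet.

    Returns the finished start job, or None if an environment was already
    attached and nothing had to be started.
    """
    if client.is_attached():
        return None
    job = client.start_environment(memory_gb)
    job.get()  # block until the environment start job completes
    return job
```

Usage: `job = ensure_environment(client, 10)`, then call `job.get_created_by()` only when `job` is not `None`.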

## Step 3: Create PGX Session and Load Bank Graph into Memory

The next paragraph creates a PGX session. Here, we assume that you have already created a BANK_GRAPH in Graph Studio. If you have not, you can launch an ADB environment with this [reference architecture](https://docs.oracle.com/en/solutions/oci-adb-graph-analytics/index.html) and follow [these instructions](https://docs.oracle.com/en/cloud/paas/autonomous-database/csgru/create-graph-existing-relational-tables.html) to create the property graph.

The paragraph after that checks whether BANK_GRAPH is already loaded into memory and loads it if it is not.

In [None]:
instance = client.get_pgx_instance()
session = instance.create_session("adb-session")

In [None]:
GRAPH_NAME = "BANK_GRAPH"
# Try getting the graph from the in-memory graph server
graph = session.get_graph(GRAPH_NAME)
# If it is not loaded yet, read it into memory
if graph is None:
    graph = session.read_graph_by_name(GRAPH_NAME, "pg_view")
    print("Graph '" + GRAPH_NAME + "' successfully loaded")
else:
    print("Graph '" + GRAPH_NAME + "' already loaded")

## Step 4: Run PageRank Algorithm

PageRank measures the importance of each node in the graph based on the number of incoming relationships (edges) and the importance of the corresponding source nodes.
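
To make this definition concrete, here is a minimal pure-Python power-iteration sketch of the classic formula PR(v) = (1 - d)/N + d * sum(PR(u) / outdeg(u)) over all nodes u with an edge into v. It uses the same `tol`, `damping`, and `max_iter` values that are passed to `analyst.pagerank` below, but it is not PGX's implementation: the L1 convergence test and the choice to simply drop the rank held by nodes with no outgoing edges are assumptions of this sketch and may not match how PGX handles them.

```python
def pagerank(edges, num_vertices, damping=0.85, tol=0.001, max_iter=100):
    """Compute PageRank by power iteration over vertices 0..num_vertices-1.

    `edges` is a list of (source, destination) pairs.
    """
    out_degree = [0] * num_vertices
    incoming = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)

    # Every vertex starts with an equal share of rank
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(max_iter):
        # Each source passes its rank, split evenly over its outgoing edges
        new_rank = [
            (1 - damping) / num_vertices
            + damping * sum(rank[u] / out_degree[u] for u in incoming[v])
            for v in range(num_vertices)
        ]
        # Stop once the total change across all vertices drops below tol
        diff = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if diff < tol:
            break
    return rank
```

For example, `pagerank([(1, 0), (2, 0), (3, 0)], 4)` gives node 0, which every other node points to, a higher score than nodes 1 to 3.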

In [None]:
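# Create (or reuse) a double-valued vertex property to hold the scores
pagerank_prop = graph.get_or_create_vertex_property("pagerank", data_type='double', dim=0)
analyst = session.create_analyst()
# Tolerance 0.001, damping factor 0.85, at most 100 iterations; results go into "pagerank"
analyst.pagerank(graph, tol=0.001, damping=0.85, max_iter=100, norm=False, rank=pagerank_prop)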

## Step 5: Query Graph

Run the following paragraph to query BANK_GRAPH for each account's ID and PageRank value. This returns a result set, which we then print. Later in the notebook, we use this result set with some common data science conda packages.

In [None]:
# Select each account ID with the PageRank value stored in the "pagerank" property
rs = graph.execute_pgql("SELECT a.acct_id, a.pagerank AS pagerank FROM MATCH (a) ON bank_graph ORDER BY a.acct_id ASC")
rs.print()
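
Often only the most important accounts are of interest. A minimal sketch of building a PGQL query for the top k accounts, assuming the vertex property is named `pagerank` as created in Step 4 and that the graph is `bank_graph`; `top_k_pagerank_query` is a hypothetical helper, and the query itself uses standard PGQL `ORDER BY ... DESC` and `LIMIT` clauses.

```python
def top_k_pagerank_query(k, graph_name="bank_graph"):
    """Return a PGQL query selecting the k accounts with the highest PageRank."""
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    return (
        "SELECT a.acct_id, a.pagerank AS pagerank "
        f"FROM MATCH (a) ON {graph_name} "
        "ORDER BY a.pagerank DESC "
        f"LIMIT {k}"
    )
```

Usage: `graph.execute_pgql(top_k_pagerank_query(10)).print()`. Sorting and limiting inside the query keeps the result set small instead of transferring every account to the client.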

## Step 6: Convert to Pandas DataFrame

In the following paragraphs, we convert the result set to a pandas DataFrame and gather some basic statistics from it.

In [None]:
result_df = rs.to_pandas()
print(result_df)

In [None]:
# get basic statistics for the numerical columns of a Pandas DataFrame
result_df.describe()

In [None]:
# Calculate the sample standard deviation of each numeric column
result_df.std()
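
Note that pandas' `std()` defaults to `ddof=1`, the sample standard deviation, whereas NumPy's `np.std` defaults to `ddof=0`. The sketch below uses toy values (made up for illustration, not BANK_GRAPH results) to show the difference and how `numeric_only=True` skips non-numeric columns.

```python
import pandas as pd

# Toy values for illustration only; they are not BANK_GRAPH results
toy = pd.DataFrame({"label": ["a", "b", "c", "d"], "pagerank": [1.0, 2.0, 3.0, 4.0]})

# pandas default is ddof=1: squared deviations are divided by n - 1
sample_std = toy["pagerank"].std()
# ddof=0 divides by n instead (the population standard deviation)
population_std = toy["pagerank"].std(ddof=0)
# numeric_only=True skips non-numeric columns such as "label"
numeric_std = toy.std(numeric_only=True)

print(sample_std, population_std)
```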

## Step 7: Create Visualization Charts
With our data organized as a data frame, we can easily use the matplotlib package to create charts for further analysis.

In [None]:
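# Plot PageRank against account ID (plotting every column against the row index is less useful)
result_df.plot(x='acct_id', y='pagerank')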
plt.show()

In [None]:
# Reuse the DataFrame from Step 6 instead of querying the graph again
df = result_df.sort_values(by='acct_id', ascending=False)
accounts = df['acct_id']
values = df['pagerank']
plt.bar(accounts, values, color='maroon', width=0.4)
plt.xlabel("Account ID")
plt.ylabel("Page Rank Value")
plt.title("Page Rank Value by Account ID")
plt.show()
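
With many accounts, a bar per account becomes hard to read. A minimal sketch that charts only the k highest-ranked accounts as horizontal bars, assuming a DataFrame with the `acct_id` and `pagerank` columns used above; `plot_top_accounts` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_accounts(df: pd.DataFrame, k: int = 10):
    """Draw a horizontal bar chart of the k accounts with the highest PageRank."""
    # Keep the k largest values, then sort ascending so the largest bar is drawn on top
    top = df.nlargest(k, "pagerank").sort_values("pagerank")
    fig, ax = plt.subplots()
    # String labels make matplotlib treat account IDs as categories, not numbers
    ax.barh(top["acct_id"].astype(str), top["pagerank"], color="maroon")
    ax.set_xlabel("Page Rank Value")
    ax.set_ylabel("Account ID")
    ax.set_title(f"Top {len(top)} Accounts by Page Rank")
    return fig, ax
```

Usage: `plot_top_accounts(result_df, k=10)` followed by `plt.show()`.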

## Step 8: Close PGX Session

In [None]:
# Close the session after executing all graph queries
session.close()
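
If a query raises an error midway through the notebook, the `session.close()` call above may never run. A minimal sketch that guarantees cleanup with a context manager, using only the `create_session` and `close` calls shown in this notebook; `pgx_session` is a hypothetical helper, not part of pypgx.

```python
from contextlib import contextmanager


@contextmanager
def pgx_session(instance, name):
    """Create a session from `instance` and always close it on exit."""
    session = instance.create_session(name)
    try:
        yield session
    finally:
        # Runs even if code inside the with-block raises
        session.close()
```

Usage: `with pgx_session(client.get_pgx_instance(), "adb-session") as session:` followed by the graph loading and query code, indented inside the block.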