# A Deeper Dive into Graph Algorithms
What additional information can we get from the community-related properties of our objects: companies, managers and holdings? As an example, let's find the 10 most popular companies in which asset managers invest.
<br>First we need to connect to the Neo4j database.

## Connect to Neo4j
You'll need to enter the credentials from your Neo4j instance below. You can get these by running the command ":server connect" in the Neo4j Browser. DB_USER and DB_NAME both default to neo4j.

In [18]:
# Edit these variables!
DB_URL = "neo4j+s://<to be disclosed during hdh>.databases.neo4j.io"
DB_PASS = "<to be disclosed during hdh>"

DB_USER = "neo4j"
DB_NAME = "neo4j"

In [20]:
from graphdatascience import GraphDataScience
gds = GraphDataScience(DB_URL, auth=(DB_USER, DB_PASS), aura_ds=True)
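If you would rather not leave credentials in a shared notebook, one option is to read them from environment variables. The following is a minimal sketch, not part of the original notebook: the helper `load_neo4j_credentials` and the variable names `NEO4J_URL`, `NEO4J_USER` and `NEO4J_PASS` are illustrative assumptions.

```python
import os


def load_neo4j_credentials(env=None):
    """Return (url, user, password) read from an environment mapping.

    The variable names NEO4J_URL, NEO4J_USER and NEO4J_PASS are
    illustrative assumptions, not names defined by Neo4j.
    """
    env = os.environ if env is None else env
    # The URL and the password have no sensible default, so fail early.
    missing = [key for key in ("NEO4J_URL", "NEO4J_PASS") if not env.get(key)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # The user name falls back to "neo4j", the default mentioned above.
    return env["NEO4J_URL"], env.get("NEO4J_USER", "neo4j"), env["NEO4J_PASS"]
```

The returned values could then stand in for the hand-edited variables, for example `DB_URL, DB_USER, DB_PASS = load_neo4j_credentials()`.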

First we're going to create an in-memory graph representation of the data in Neo4j Graph Data Science (GDS).

In [5]:
result = gds.run_cypher(
  """
    CALL gds.graph.project(
      'mygraph',
      ['Company', 'Manager', 'Holding'],
      {
          OWNS: {orientation: 'UNDIRECTED'},
          PARTOF: {orientation: 'UNDIRECTED'}
      }
    )
    YIELD
      graphName AS graph,
      relationshipProjection AS readProjection,
      nodeCount AS nodes,
      relationshipCount AS rels
  """
)
display(result)

Unnamed: 0,graph,readProjection,nodes,rels
0,mygraph,"{'PARTOF': {'orientation': 'UNDIRECTED', 'inde...",458170,1787688


Note: if you get an error saying the graph already exists, you have probably run this code before. You can drop the existing graph by uncommenting and running this command:

In [None]:
# result = gds.run_cypher(
#   """
#     CALL gds.graph.drop('mygraph',false)
#   """
# )
# display(result)
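The second argument of `gds.graph.drop` (`false` above) is the `failIfMissing` flag, so the call does nothing when the graph is absent. That makes it safe to wrap in a small helper and call before every projection. A minimal sketch, assuming a hypothetical helper name `drop_graph_if_exists` and the `gds` client created earlier:

```python
def drop_graph_if_exists(gds, graph_name):
    """Drop a projected graph without failing when it does not exist.

    `gds` is any object exposing run_cypher(query, params=...), such as
    the GraphDataScience client created earlier in this notebook.
    """
    # failIfMissing=false turns a missing graph into an empty result
    # instead of an error.
    query = (
        "CALL gds.graph.drop($graph_name, false) "
        "YIELD graphName RETURN graphName"
    )
    return gds.run_cypher(query, params={"graph_name": graph_name})
```

Calling `drop_graph_if_exists(gds, "mygraph")` right before the projection cell would let the notebook be re-run from the top without manual cleanup.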

Now, let's list the details of the graph to make sure the projection was created as intended.

In [None]:
result = gds.run_cypher(
  """
    CALL gds.graph.list()
  """
)
display(result)

The companies most popular with investors will most likely have more "PARTOF" relationships with holdings than less popular ones, so node centrality looks like the right metric here. Let's use, for example, the PageRank algorithm to find the most central nodes.
<br>Looking at the node properties, we find that the company names are stored in the "nameOfIssuer" attribute. We'll return this attribute in our Cypher query and then keep only the Company nodes in the output dataframe (other nodes don't have a "nameOfIssuer" attribute, so their name is empty).
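To build some intuition for why a company with more holdings ends up with a higher score, here is a toy power-iteration PageRank in plain Python. It is a simplified sketch on made-up data, not the GDS implementation, which may differ in details such as score scaling and convergence handling. The node names and the function `pagerank` are hypothetical.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: [neighbours]}."""
    nodes = list(adjacency)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbours in adjacency.items():
            if not neighbours:
                continue  # toy sketch: nodes without neighbours pass nothing on
            share = damping * scores[node] / len(neighbours)
            for neighbour in neighbours:
                new_scores[neighbour] += share
        scores = new_scores
    return scores


# Hypothetical toy data: company "A" has two holdings, company "B" has one.
# Each undirected relationship is listed in both directions.
toy_graph = {
    "A": ["h1", "h2"],
    "B": ["h3"],
    "h1": ["A"],
    "h2": ["A"],
    "h3": ["B"],
}
toy_scores = pagerank(toy_graph)
most_central = max(toy_scores, key=toy_scores.get)
```

In this toy graph `most_central` is company "A", the node with the most holdings attached, which mirrors the reasoning used for the real data.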

In [28]:
result = gds.run_cypher(
    """
        CALL gds.pageRank.stream('mygraph')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).nameOfIssuer AS name, score
        ORDER BY score DESC, name ASC
    """
)

And here is our result:

In [None]:
result[result["name"].notna()].head(10)
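As an alternative to filtering the dataframe afterwards, the restriction to Company nodes and the top-10 cut-off can be pushed into the Cypher query itself, so that only the rows of interest are sent back. A minimal sketch, assuming the hypothetical names `TOP_COMPANIES_QUERY` and `top_companies` and the `gds` client and `Company` label used earlier:

```python
TOP_COMPANIES_QUERY = """
    CALL gds.pageRank.stream($graph_name)
    YIELD nodeId, score
    WITH gds.util.asNode(nodeId) AS node, score
    WHERE node:Company
    RETURN node.nameOfIssuer AS name, score
    ORDER BY score DESC, name ASC
    LIMIT $limit
"""


def top_companies(gds, graph_name="mygraph", limit=10):
    """Return the highest-ranked Company nodes of a projected graph.

    `gds` is any object exposing run_cypher(query, params=...), such as
    the GraphDataScience client created earlier in this notebook.
    """
    params = {"graph_name": graph_name, "limit": limit}
    return gds.run_cypher(TOP_COMPANIES_QUERY, params=params)
```

With the real client, `top_companies(gds)` should give the same kind of table as the filtered dataframe, while transferring far fewer rows.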

So we can extract quite interesting information about the community-related characteristics of our objects (nodes), which can be useful downstream, for example as features in a classification model.
<br>You can explore other graph algorithms further, looking for example at node connectivity, local communities, or the topological properties of the entire graph. On this [page](https://neo4j.com/docs/graph-data-science/current/algorithms/) you can find descriptions of the available graph algorithms, together with explanations and usage examples.