# Running Graph Analytics on Large Scale Graphs Effortlessly with Nvidia and Memgraph 

In this short tutorial, you will see how to use **Memgraph** and **Nvidia cuGraph** to run various graph analytics in a matter of seconds on a Facebook dataset containing more than 1 million edges. If you have a huge dataset you want to analyze, you can import it into **Memgraph** using **Python** and then run any of the following algorithms:

* Balanced Cut [clustering]
* Spectral Clustering 
* HITS [hubs vs authorities analytics]
* PageRank
* Leiden [community]
* Louvain [community]
* Katz Centrality 
* Betweenness Centrality 

All of the above algorithms are powered by **Nvidia** and execute on the **GPU**. You will still need to query the database using the **Cypher** language, but it is pretty simple and straightforward. You can think of Cypher as SQL for graph databases. It offers comparable clauses for creating, updating, and deleting data, such as `CREATE`, `SET`, and `DELETE`.

Before we start, here is a short summary of things you can do from now on with **Memgraph** and **Nvidia**:
* effortlessly import data into a graph database
* run analytics on graphs and get results really fast: around 4 seconds for a graph with 1.3M edges
* run GPU algorithms directly from the graph database




## Prerequisites

In this tutorial we will use the following tools, which you will need to install in order to follow along:

- [Jupyter](https://jupyter.org/install)
- [Docker](https://docs.docker.com/get-docker/)
- [GQLAlchemy](https://pypi.org/project/gqlalchemy/)

Docker is used because Memgraph is a native Linux application and cannot be installed directly on Windows or macOS.

## Installation using Docker

After installing Docker, you can set up Memgraph by running:

```
docker run -it -p 7687:7687 -p 3000:3000 -p 7444:7444 memgraph/memgraph-platform
```

This command downloads the image and, once the download finishes, runs the Memgraph container.
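Before moving on, it can help to confirm that the container is actually accepting connections. The snippet below is a minimal sketch using only the Python standard library; the helper name `is_port_open` is our own, and the port `7687` is taken from the `docker run` command above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 7687 is the port published by the docker run command above.
    print("Memgraph reachable:", is_port_open("127.0.0.1", 7687))
```

If this prints `False`, the container is most likely still starting or the port mapping differs from the one shown above.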

## Connecting to Memgraph with GQLAlchemy

We will be using the **GQLAlchemy** object graph mapper (OGM) to connect to Memgraph and execute **Cypher** queries easily from **Python**. GQLAlchemy also serves as a Python driver/client for Memgraph. Go to our [installation](https://memgraph.com/docs/gqlalchemy/) page to see how to install it. It is pretty straightforward, and you can install it with `pip`.


In the next few steps we will do the following:
* import a large dataset from **CSV** files into **Memgraph** in a matter of seconds
* run `PageRank` and `Louvain community detection`, all from **Python**

But before we continue, let's import `gqlalchemy` and connect to **Memgraph** using its `host` and `port`. We will also clear the database, just in case.

In [21]:
from gqlalchemy import Memgraph

In [22]:
memgraph = Memgraph("127.0.0.1", 7687)  # port published by the docker run command above

In [3]:
memgraph.drop_database()

**CSV** files containing **Facebook** dataset have the following structure:
```
node_1,node_2
0,1794
0,3102
0,16645
```
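Before importing, it can be useful to sanity-check a file with this layout. The following is a small sketch using Python's built-in `csv` module on the sample rows shown above; the helper name `summarize_edge_list` is hypothetical, and for a real file you would pass an open file handle instead of the in-memory sample.

```python
import csv
import io

# The sample rows shown above, kept in memory for illustration.
SAMPLE = """node_1,node_2
0,1794
0,3102
0,16645
"""


def summarize_edge_list(lines):
    """Count edges and distinct node ids in a node_1,node_2 edge list."""
    reader = csv.DictReader(lines)
    nodes, edge_count = set(), 0
    for row in reader:
        nodes.update((row["node_1"], row["node_2"]))
        edge_count += 1
    return edge_count, len(nodes)


edges, nodes = summarize_edge_list(io.StringIO(SAMPLE))
print(f"{edges} edges, {nodes} distinct nodes")  # 3 edges, 4 distinct nodes
```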
The dataset consists of Facebook pages (from November 2017). It represents blue verified Facebook page networks of different categories. Nodes represent the pages, and edges are mutual likes among them. The nodes are reindexed (starting from 0) in order to achieve a certain level of anonymity. To make the import run really fast in **Memgraph**, we will create an index on the `id` property of nodes with the label `Page`.

In [4]:
memgraph.execute(
    """
    CREATE INDEX ON :Page(id);
    """
)

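Now let's list all the CSV files in the `data` subfolder. Keep in mind that `LOAD CSV` reads files from the filesystem of the machine running Memgraph, so these paths must be accessible from inside the Docker container.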

In [7]:
import os
from os import listdir
from os.path import isfile, join
csv_dir_path = os.path.abspath("./data/facebook_clean_data/")
csv_files = [join(csv_dir_path, f) for f in listdir(csv_dir_path) if isfile(join(csv_dir_path, f))]
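If the folder might also contain other files, such as a README or hidden system files, you may want to keep only the `.csv` ones. Here is one possible sketch using `pathlib`; the helper name `list_csv_files` is our own and not part of GQLAlchemy.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return absolute paths of the .csv files directly inside directory, sorted."""
    root = Path(directory).resolve()
    return sorted(
        str(path)
        for path in root.iterdir()
        # Skip subdirectories and anything that is not a CSV file.
        if path.is_file() and path.suffix.lower() == ".csv"
    )
```

With the layout used in this tutorial, `csv_files = list_csv_files("./data/facebook_clean_data/")` should be a drop-in alternative to the list comprehension above.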


In [8]:
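for csv_file_path in csv_files:
    # MERGE creates a node or relationship only if it does not exist yet,
    # so a page that appears in many rows is stored once.
    memgraph.execute(
        f"""
        LOAD CSV FROM "{csv_file_path}" WITH HEADER AS row
        MERGE (p1:Page {{id: row.node_1}})
        MERGE (p2:Page {{id: row.node_2}})
        MERGE (p1)-[:LIKES]->(p2);
        """
    )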


## PageRank
Now we will execute PageRank to find the important pages in the Facebook dataset. To read more about how **PageRank** works, you can go to our **[docs]()** page. All algorithms mentioned in the [introduction](#introduction) are developed by **Nvidia** and are now integrated into **MAGE - Memgraph Advanced Graph Extensions**. Our goal at **Memgraph** is to make it very easy for you to use algorithms on a graph database and get results really fast. MAGE algorithms are implemented in C++ or Python, but you don't need to understand how all those pieces are connected in order to execute PageRank.
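For intuition about what the resulting scores mean, here is a tiny pure-Python sketch of PageRank using power iteration on a toy graph. It is only an illustration of the idea, not the cuGraph implementation; the function, its default parameters, and the toy edges are our own assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node gets a small base share; the rest flows along outgoing edges.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            targets = out_links[node] or nodes  # dangling nodes spread rank evenly
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: "c" is pointed to by every other node, so it should rank highest.
toy_edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
for node, score in sorted(pagerank(toy_edges).items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

The scores sum to 1, which is why the ranks on a graph with tens of thousands of pages are such small numbers.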

You can run PageRank on the GPU with the following call, and you will get results in ~4 seconds for a graph with 1.3 million edges. The second part of the query sets the `rank` property of every `node` to the value that the `cugraph.pagerank` algorithm returned in the `rank` variable.

In [11]:
memgraph.execute(
    """
    CALL cugraph.pagerank.get() YIELD node, rank
    SET node.rank = rank;
    """
)

Now the ranks are ready, and you can retrieve them with the following Python call:

In [19]:
results =  memgraph.execute_and_fetch(
        """
        MATCH (n)
        RETURN n.id as node, n.rank as rank
        ORDER BY rank DESC
        LIMIT 10;
        """
    )
for dict_result in results:
    print(f"node id: {dict_result['node']}, rank: {dict_result['rank']}")

node id: 50493, rank: 0.0030278728385218327
node id: 31456, rank: 0.0027350282311318468
node id: 50150, rank: 0.0025153975342989345
node id: 48099, rank: 0.0023413620866201052
node id: 49956, rank: 0.0020696403564964
node id: 23866, rank: 0.001955167533390466
node id: 50442, rank: 0.0019417018181751462
node id: 49609, rank: 0.0018211204462452515
node id: 50272, rank: 0.0018123518843272954
node id: 49676, rank: 0.0014821440895415787


The results are returned as dictionaries. One more thing you can do is visualize the results with **Memgraph Lab**. Besides creating beautiful visualizations powered by D3.js and our graph style script, you can use **Memgraph Lab** to query the graph database, write your own graph algorithms in **Python, C++, or even Rust**, check the Memgraph database logs, and visualize the graph schema. If you don't have a dataset of your own in mind, there are plenty of datasets available for you to start exploring.

Now let's find the top 3 pages and visualize their relationships in the graph. We will use the following query in Memgraph Lab:
```
MATCH (n)
WITH n
ORDER BY n.rank DESC
LIMIT 3
MATCH (n)<-[e]-(m)
RETURN *;
```


That's it for PageRank. Next, you will see how to use Louvain community detection to find communities.

## Louvain modularity maximization
From our [docs](https://memgraph.com/docs/mage/query-modules/cpp/community-detection) page:
> The Louvain algorithm belongs to the modularity maximization family of community detection algorithms. Each node is initially assigned to its own community, and the algorithm uses a greedy heuristic to search for the community partition with the highest modularity score by merging previously obtained communities.

In other words, the Louvain algorithm measures how densely connected the nodes within a community are compared to how connected they would be in a random network. It also recursively merges each community into a single node and runs the modularity clustering on the condensed graph. This is one of the most popular community detection algorithms. Let's run it to find out how many communities we have in the graph:

In [23]:
memgraph.execute(
    """
    CALL cugraph.louvain.get() YIELD cluster_id, node
    SET node.cluster_id = cluster_id;
    """
)
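To make the notion of modularity more concrete, here is a small pure-Python sketch that computes the modularity score of a given partition for an undirected, unweighted graph. It illustrates only the score that Louvain tries to maximize, not the Louvain search itself or the cuGraph implementation; the function name and the toy graph are our own.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # sum of node degrees per community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    # Fraction of internal edges minus the fraction expected in a random graph.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

print(f"two triangles: {modularity(toy_edges, two_groups):.3f}")  # 0.357
print(f"single group:  {modularity(toy_edges, one_group):.3f}")   # 0.000
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of partition a modularity-maximizing algorithm prefers.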

Now let's find the number of communities:

In [28]:
results =  memgraph.execute_and_fetch(
        """
        MATCH (n)
        WITH DISTINCT n.cluster_id as cluster_id
        RETURN count(cluster_id ) as num_of_clusters;
        """
    )
# The query returns a single row
result = list(results)[0]

# Each row is a dict keyed by the returned column names
print(f"Number of clusters: {result['num_of_clusters']}")


Number of clusters: 2664
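Beyond the total count, it is often interesting to see how large the communities are. The sketch below assumes you have rows shaped like the dictionaries returned by `execute_and_fetch`, with `node` and `cluster_id` keys; the helper name and the sample rows are made up for illustration.

```python
from collections import Counter


def community_sizes(rows):
    """Count nodes per community from rows like {'node': ..., 'cluster_id': ...}."""
    return Counter(row["cluster_id"] for row in rows)


# Hypothetical rows, standing in for real query results.
sample_rows = [
    {"node": "0", "cluster_id": 7},
    {"node": "1", "cluster_id": 7},
    {"node": "2", "cluster_id": 3},
    {"node": "3", "cluster_id": 7},
]

sizes = community_sizes(sample_rows)
for cluster_id, size in sizes.most_common():
    print(f"cluster {cluster_id}: {size} nodes")
```

On the real graph, the rows could come from a query such as `MATCH (n) RETURN n.id AS node, n.cluster_id AS cluster_id;` passed to `memgraph.execute_and_fetch`.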


Next, you can visualize some of these communities. For example, you can find nodes that belong to one community but are connected to a node that belongs to a different community.
Louvain tries to minimize the number of such nodes, so we shouldn't see many of them. In Memgraph Lab, try executing the following query:

```
MATCH  (n2)<-[e1]-(n1)-[e]->(m1)
WHERE n1.cluster_id != m1.cluster_id AND n1.cluster_id = n2.cluster_id
RETURN *
LIMIT 1000;
```
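If you would rather have a number than a picture, you can measure what fraction of edges cross community boundaries. This is a minimal sketch on a toy edge list with a made-up assignment; the function name is our own, and on the real graph the edges and the `cluster_id` of each node would come from Memgraph.

```python
def cross_community_fraction(edges, community_of):
    """Fraction of edges whose endpoints belong to different communities."""
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if community_of[u] != community_of[v])
    return crossing / len(edges)


# Two triangles joined by one bridge edge; only the bridge crosses communities.
bridge_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
assignment = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}

print(f"{cross_community_fraction(bridge_edges, assignment):.3f}")  # 0.143
```

A low fraction means that most relationships stay inside a community, which matches what the query above is meant to show visually.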

Here is the graph style script we used to create these beautiful colorizations:

#### insert image

You can read more about the background of this color scheme in our [blog post](https://memgraph.com/blog/optimizing-telco-networks-with-graph-coloring-and-memgraph-mage).