# Projecting Bimodal to Unimodal

# Database Versioning

In order to really explore the ramifications of the different graph models, I created three versions of the Marvel Universe Social Network. The one that we have right now is a straightforward bimodal graph.

This is the type of graph that I work with for my job. During the course of my work, I've started wondering several things:

- Is it necessary to project a bimodal graph to a unimodal graph to run the graph algorithms?
- Is it easier to use a projected unimodal graph when looking to engineer features for machine learning?
- Can a unimodal and bimodal version of the graph exist in the same space and still be usable?

To answer these questions, I created three versions of the Marvel graph:

1. Bimodal
2. Mixed bimodal and projected unimodal
3. Unimodal

Now, you can go through all the steps above to get two more databases to the same point as the one we've already got. Or you can simply clone the Bimodal Graph.

### 1. Clone database

Click on the dots in the upper right corner of the database and choose `Clone`

<img src='images/clone_db1.png'>

### 2. Rename database

1. The default database name is `DBMS`. To change it, go to `Manage`, then click the pencil icon next to the default name

<img src='images/rename_db1.png'>

2. Type in the name you want and click the checkbox

<img src='images/rename_db2.png'>

*Side note* I tried to rename the database from the main page under My Project, but the name didn't reflect the change when I checked the properties in `Manage`. Also, when I went to create a second clone, I got an error stating that there was already a database named DBMS.

# Unimodal Projection

## Context

The purpose of projecting a bimodal graph to a unimodal structure is to directly connect nodes of interest.

The Marvel example naturally lends itself to a bimodal structure. Heroes are in comics, and we can tell which heroes appear together by connecting them through the comics they share.

<img src='images/unimodal_pro2.png' width=650>

However, if we want to run our queries and metrics only on which heroes know each other or appear in the same comics, we might want a unimodal projection of this network.

<img src='images/unimodal_pro3.png' width=650>

There are a couple of benefits to a projection like this.

1. Queries from one hero to another are easier
   - Instead of `(hero1)-[:APPEARS_IN]-(comic)-[:APPEARS_IN]-(hero2)` we can just write `(hero1)-[:KNOWS]-(hero2)`
2. We can put a weight on the relationship, which will tell us how many comics the heroes are in together
3. If we want, we can also create a list on the relationship that will tell us the names of the comics they appear in together
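To make the first two benefits concrete, here is a minimal sketch of the same question ("who appears with this hero, and how often?") asked against each model. It assumes the `Hero`, `Comic`, `APPEARS_IN`, and `KNOWS` names used in this write-up, and that the `KNOWS` relationships and their `weight` property already exist in the graph being queried. The aliases `hero` and `shared_comics` are only illustrative.

```cypher
// Bimodal: hop through the comics and count them per co-star
MATCH (h1:Hero {name: 'SHIVA'})-[:APPEARS_IN]-(c:Comic)-[:APPEARS_IN]-(h2:Hero)
RETURN h2.name AS hero, count(c) AS shared_comics
ORDER BY shared_comics DESC;

// Unimodal: one hop, with the count already stored on the relationship
MATCH (h1:Hero {name: 'SHIVA'})-[r:KNOWS]-(h2:Hero)
RETURN h2.name AS hero, r.weight AS shared_comics
ORDER BY shared_comics DESC;
```

If the projection was built correctly, the two result sets should agree, which also makes this pair a handy sanity check.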

### Caveats

One thing to keep in mind when projecting a bimodal network into a unimodal structure is that you'll get a lot of *cliques*.

**Clique**: a subset of nodes where every distinct node is connected to every other distinct node.

There are four cliques in the example above. To make this clearer, let's look at an example from a single comic.

I pulled the comic W2 50 and the heroes in it. Some statistics about this subgraph:

- This is for 1 comic
- There are 9 heroes
- Total of 10 nodes
- There are 9 relationships

<img src='images/clique1.png' width=650>

When we project this to a unimodal structure, the number of relationships multiplies significantly. The statistics for the projected version:

- There are 0 nodes for the comic (we removed it so we could connect heroes directly)
- There are 9 heroes
- Total of 9 nodes
- There are 36 relationships

<img src='images/clique2.png' width=650>
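The jump from 9 to 36 relationships follows from the clique size: n heroes who share a comic end up joined by n(n-1)/2 relationships, so 9 heroes give 9 × 8 / 2 = 36. As a rough sketch, a query along these lines could estimate that number for a comic before projecting it. It assumes the `name` property and `APPEARS_IN` relationship used elsewhere in this write-up; `heroes` and `projected_relationships` are just illustrative aliases.

```cypher
// Count the heroes in one comic and the size of the clique they would form
MATCH (c:Comic {name: 'W2 50'})-[:APPEARS_IN]-(h:Hero)
WITH c, count(DISTINCT h) AS heroes
RETURN c.name AS comic,
       heroes,
       heroes * (heroes - 1) / 2 AS projected_relationships;
```

Dropping the `{name: 'W2 50'}` filter and ordering by `projected_relationships` would show which comics contribute the largest cliques.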

I'm not quite sure what the repercussions of this are for using metrics to find patterns and trends of interest, but that's the question this series of entries will answer.

### Code

#### Find small examples

To find a small subgraph for the examples above, I did a degree count, then picked a name from the list. Generally I just went for something that looked vaguely familiar and was easy to spell. From this query, I picked "DARK CRAWLER".

```
MATCH (h:Hero)-[r]-(o:Comic)
WITH h.name AS h_name, count(r) AS h_degree
WHERE h_degree > 4
RETURN h_name, h_degree
ORDER BY h_degree
```

For this query, I wanted something with a small number of connections, so I counted all relationships, then scrolled to the end of the list. "SHIVA" had nine connections, which seemed like it would fit the bill.

```
MATCH (h:Hero)-[r]-(o)
RETURN h.name, count(r) AS h_degree
ORDER BY h_degree
```

#### Send Hero Dark Crawler’s hero connections to Gephi

```
MATCH path = (h1:Hero {name: 'DARK CRAWLER'})-[:KNOWS]-(h2)
CALL apoc.gephi.add(null, 'workspace1',path,'weight') yield nodes
RETURN *
```

#### Send Hero Dark Crawler’s comic-hero connections to Gephi

```
MATCH path = (h1:Hero {name: 'DARK CRAWLER'})-[:APPEARS_IN]-(c:Comic)-[:APPEARS_IN]-(h2:Hero)
CALL apoc.gephi.add(null, 'workspace2', path) yield nodes
RETURN *
```

#### Send Hero Shiva’s hero connections to Gephi

```
MATCH path = (h1:Hero {name: 'SHIVA'})-[:KNOWS]-(h2)
CALL apoc.gephi.add(null, 'workspace1',path,'weight') yield nodes
RETURN *
```

#### Send Comic connections to Gephi

```
MATCH path = (c:Comic {name: 'W2 50'})-[:APPEARS_IN]-(h:Hero)
CALL apoc.gephi.add(null, 'workspace2', path) yield nodes
RETURN *
```




## Project Bimodal to Unimodal

Now that we know why we want to project to unimodal and at least one challenge to address, we need to make a graph that reflects this structure.

I created two versions: one that has both sets of information and one that removes the comics and their relationships.

### Context

Why did I do this? Why not just use the mixed graph and have it all?

There are a couple of reasons. First, I find that not all functions do what I think they will. For example, I ran the `gds.alpha.degree.stream()` function from the Graph Data Science Library. The [Degree Centrality](https://neo4j.com/docs/graph-data-science/current/algorithms/degree-centrality/) doc page states "Degree centrality measures the number of incoming and outgoing relationships from a node."

However, when I went to use it, the numbers it returned didn't match the number of relationships I expected. Instead, I used the `count()` function, which allowed me to home in on different relationship populations: heroes to comics, heroes to heroes, or the combination of both.

#### Hero to hero degree count

```
MATCH (h:Hero)-[r]-(o:Hero)
RETURN h.name, count(r) AS h_degree
ORDER BY h_degree DESC
```

#### Hero to comic degree count

```
MATCH (h:Hero)-[r]-(o:Comic)
RETURN h.name, count(r) AS h_degree
ORDER BY h_degree DESC
```

#### Hero to all degree count

```
MATCH (h:Hero)-[r]-(o)
RETURN h.name, count(r) AS h_degree
ORDER BY h_degree DESC
```
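When comparing these counts against an algorithm's output, it can help to see all three side by side for each hero. The following is a minimal sketch of one way to do that in a single query, assuming the mixed graph where heroes connect to both comics and other heroes. The aliases `comic_degree`, `hero_degree`, and `total_degree` are illustrative names, not anything from the original queries.

```cypher
// Count hero-to-comic relationships first, then hero-to-hero relationships,
// keeping heroes that have none of either (OPTIONAL MATCH yields a count of 0)
MATCH (h:Hero)
OPTIONAL MATCH (h)-[rc]-(:Comic)
WITH h, count(rc) AS comic_degree
OPTIONAL MATCH (h)-[rh]-(:Hero)
WITH h, comic_degree, count(rh) AS hero_degree
RETURN h.name AS hero,
       comic_degree,
       hero_degree,
       comic_degree + hero_degree AS total_degree
ORDER BY total_degree DESC;
```

Aggregating after each `OPTIONAL MATCH` keeps the two relationship populations from multiplying each other's row counts.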

Because of this mismatch between expectations and results, I tend to test things from multiple angles before trusting my results.

Second, I want multiple versions of the graph to see if it's easier and faster to structure it one way vs another. One of the major roadblocks I've encountered at work is that it takes forever to run some of the queries and algorithms I want. These problem queries end up timing out or temporarily crashing the graph. As expected, our developers and software engineers get rather testy with me when that happens.

### Code

#### Add weighted edges to the graph

```
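// Each shared comic yields one (h1, h2) row; id(h1) < id(h2) keeps a pair from being counted in both directions.
// The first row for a pair creates KNOWS with weight 1; every later row adds 1.
CALL apoc.periodic.iterate(
  'MATCH (h1:Hero)-->(:Comic)<--(h2:Hero) WHERE id(h1) < id(h2) RETURN h1, h2',
  'MERGE (h1)-[r:KNOWS]-(h2) ON CREATE SET r.weight = 1 ON MATCH SET r.weight = r.weight + 1',
  {batchSize: 5000, parallel: false, iterateList: true}
);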
```
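A possible follow-up, not something that was run for this write-up: the list of shared comics mentioned earlier could be stored on the same relationships. The sketch below assumes the `KNOWS` relationships already exist from the query above and that `APPEARS_IN` points from hero to comic. The `comics` property name is hypothetical. It is also unbatched, so on a large graph it may need to be wrapped in `apoc.periodic.iterate` the same way.

```cypher
// Collect the names of the comics each pair shares and store them on KNOWS
MATCH (h1:Hero)-[:APPEARS_IN]->(c:Comic)<-[:APPEARS_IN]-(h2:Hero)
WHERE id(h1) < id(h2)
WITH h1, h2, collect(DISTINCT c.name) AS comics
MERGE (h1)-[r:KNOWS]-(h2)
SET r.comics = comics;
```

Because this only sets `r.comics`, the `weight` values written by the batched query are left untouched.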

#### Reduce to Unimodal network

Create a copy of the mixed graph that was made from the query above. Now we want to remove the comics and all the relationships that connect to the comic nodes. To do this, we just need to `DETACH DELETE` all nodes with the `Comic` label.

```
MATCH (c:Comic)
DETACH DELETE c;
```
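After the delete, a couple of quick checks can confirm the copy really is unimodal. This is only a sketch using the labels and relationship type from this write-up; the aliases are illustrative.

```cypher
// Should return 0 once every Comic node has been removed
MATCH (c:Comic)
RETURN count(c) AS remaining_comics;

// The KNOWS relationships should survive; the directed pattern counts each one once
MATCH (:Hero)-[r:KNOWS]->(:Hero)
RETURN count(r) AS knows_relationships;
```

Comparing `knows_relationships` against the same count in the mixed graph is a simple way to verify that only the comic side was dropped.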