# Entry G14: Global Counts Comparison

The notebook that accompanies this entry is a cleaned-up, condensed version of the three notebooks that accompanied [Entry G6](https://julielinx.github.io/blog/g06_global_counts/), limited to just the global counts for the three graph models.

And as long as I'm cleaning things up, I decided to provide some additional pictures and commentary on global counts. However, this entry is a supplement to [Entry G6](https://julielinx.github.io/blog/g06_global_counts/), not a replacement, so be sure to read that entry first.

## Overview

Now that the info for all three graph models is pulled into the same notebook, we can really start to see how the graph model affects the nodes and relationships in the graph.

The global count metrics I used can be summed up by counting relationships and nodes in the picture below:

<img src='images/global_counts.png'>

The picture has:

- 16 nodes, of which 12 are Hero nodes and 4 are Comic nodes
- 13 relationships (all of which are between a Hero and a Comic, never between two of the same node type)
- 1 isolated Hero node (outlined in orange)
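The three counts boil down to a few lines of plain Python. The sketch below builds a hypothetical graph that matches the totals in the picture (the diagram's exact wiring isn't given, so the edge list is made up) and counts nodes, relationships, and isolates, where an isolate is a node that takes part in no relationship.

```python
# Hypothetical toy graph matching the totals in the picture:
# 12 Hero nodes, 4 Comic nodes, 13 Hero-Comic relationships, 1 isolated Hero.
# The wiring is invented; only the totals match the diagram.
heroes = [f"hero_{i}" for i in range(1, 13)]
comics = [f"comic_{i}" for i in range(1, 5)]

# Each pair is one (Hero)-[:APPEARS_IN]->(Comic) relationship
appears_in = [
    ("hero_1", "comic_1"), ("hero_2", "comic_1"), ("hero_3", "comic_1"),
    ("hero_4", "comic_2"), ("hero_5", "comic_2"), ("hero_6", "comic_2"),
    ("hero_7", "comic_3"), ("hero_8", "comic_3"), ("hero_9", "comic_3"),
    ("hero_10", "comic_4"), ("hero_11", "comic_4"),
    ("hero_1", "comic_2"), ("hero_4", "comic_3"),
]

nodes = heroes + comics
# A node is connected if it shows up at either end of any relationship
connected = {node for rel in appears_in for node in rel}
isolates = [node for node in nodes if node not in connected]

node_count = len(nodes)
relation_ct = len(appears_in)
isolates_count = len(isolates)

print(node_count, relation_ct, isolates)  # 16 13 ['hero_12']
```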

## Node Counts

With the node counts for all three graph models in the same DataFrame, it's easy to see how the graph model affects the nodes in each graph.

The Hero nodes are the same for all three models, and the Comic nodes are the same for the Bimodal and Mixed Models. As far as the nodes are concerned, the only difference is that the Unimodal Model has no Comic nodes because they were removed.

```python
import pandas as pd
from neo4j import GraphDatabase
```

```python
uri = "bolt://localhost:7687"

driver = GraphDatabase.driver(uri, auth=('neo4j', 'password'))

# One Neo4j database per graph model
uni_session = driver.session(database="unimodal")
bi_session = driver.session(database="bimodal")
mix_session = driver.session(database="mixmodal")
```

```python
# apoc.meta.stats() yields a `labels` map of {label: node count}
stats_query = "CALL apoc.meta.stats() YIELD labels"

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
node_cts = pd.concat([
    pd.DataFrame(uni_session.run(stats_query).value()).rename({0: 'unimodal'}),
    pd.DataFrame(bi_session.run(stats_query).value()).rename({0: 'bimodal'}),
    pd.DataFrame(mix_session.run(stats_query).value()).rename({0: 'mixmodal'}),
]).fillna(0)
node_cts['total'] = node_cts['Hero'] + node_cts['Comic']

node_cts
```

| Model | Hero | Comic | total |
|---|---|---|---|
| unimodal | 6439 | 0.0 | 6439.0 |
| bimodal | 6439 | 12651.0 | 19090.0 |
| mixmodal | 6439 | 12651.0 | 19090.0 |


## Relationship Count

The main differences between the three graph models are in the relationships.

When looking at the total relationship counts, each model has a different count:

- Bimodal is smallest with 96,104
- Unimodal is in the middle with 171,644
- Mixed is largest with 267,748

Keep in mind that the Unimodal Model has weighted relationships (for information on weighted relationships see [Entry G4](https://julielinx.github.io/blog/g04_graph_model_rels/)). Even though the Unimodal relationships are projected from the original Bimodal Model, the projection ends up with far more connections than the original. If we include the weights, the Hero to Hero relationships total 579,191 (which we know from the [Entry G13 notebook](https://github.com/julielinx/datascience_diaries/blob/master/graph/13a_nb_weighted_degree_comparison.ipynb)). That's around 6 times the number of connections in the original representation.
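The ratios behind these comparisons are easy to check. This sketch only divides the counts quoted in this entry; treating 579,191 as the sum of the `KNOWS` weights is my reading of "including the weights".

```python
# Relationship counts quoted in this entry
bimodal_rels = 96_104        # APPEARS_IN relationships (Bimodal Model)
unimodal_rels = 171_644      # KNOWS relationships (Unimodal Model)
unimodal_weighted = 579_191  # KNOWS total with weights included (Entry G13 notebook)

knows_ratio = unimodal_rels / bimodal_rels
weighted_ratio = unimodal_weighted / bimodal_rels

print(f"KNOWS vs APPEARS_IN:          {knows_ratio:.2f}x")     # 1.79x
print(f"Weighted KNOWS vs APPEARS_IN: {weighted_ratio:.2f}x")  # 6.03x
```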

```python
# Count relationships by type in each database
rel_query = '''MATCH ()-[r]->()
RETURN type(r) AS rel_type, count(r) AS count'''

def rel_counts(session, model):
    # One row per relationship type; label every row with the model name,
    # then pivot the types into columns so each model becomes a single row
    df = pd.DataFrame(session.run(rel_query).data())
    df.index = [model] * len(df)
    return df.pivot(columns='rel_type', values='count')

rel_cts = pd.concat([
    rel_counts(uni_session, 'unimodal'),
    rel_counts(bi_session, 'bimodal'),
    rel_counts(mix_session, 'mixmodal'),
]).fillna(0)
rel_cts['Total'] = rel_cts['KNOWS'] + rel_cts['APPEARS_IN']
rel_cts
```

| Model | KNOWS | APPEARS_IN | Total |
|---|---|---|---|
| unimodal | 171644.0 | 0.0 | 171644.0 |
| bimodal | 0.0 | 96104.0 | 96104.0 |
| mixmodal | 171644.0 | 96104.0 | 267748.0 |


When we break these down by relationship type, it becomes obvious that the Unimodal and Mixed Models have the same count of `KNOWS` relationships, while the Bimodal and Mixed Models have the same count of `APPEARS_IN` relationships. The total for the Mixed Model is simply the sum of the relationships in the Unimodal and Bimodal Models.
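A quick way to confirm this is to rebuild the relationship table from the counts shown above and check that the Mixed Model row equals the Unimodal and Bimodal rows added together. This small pandas sketch uses only the published numbers, so it runs without a Neo4j connection.

```python
import pandas as pd

# Relationship counts by type, copied from the table above
rel_cts = pd.DataFrame(
    {"KNOWS": [171_644, 0, 171_644], "APPEARS_IN": [0, 96_104, 96_104]},
    index=["unimodal", "bimodal", "mixmodal"],
)

# Mixed Model = Unimodal KNOWS + Bimodal APPEARS_IN, type by type
combined = rel_cts.loc["unimodal"] + rel_cts.loc["bimodal"]
matches = (combined == rel_cts.loc["mixmodal"]).all()
mixed_total = rel_cts.loc["mixmodal"].sum()

print(matches)      # True
print(mixed_total)  # 267748
```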

This reflects how we created the models:

- Started with the Bimodal Model:
  - Hero nodes
  - Comic nodes
  - `APPEARS_IN` relationships
- Created the Mixed Model by adding `KNOWS` relationships between Hero nodes:
  - Hero nodes
  - Comic nodes
  - `APPEARS_IN` relationships
  - `KNOWS` relationships
- Reduced to the Unimodal Model by removing the Comic nodes and all the `APPEARS_IN` relationships:
  - Hero nodes
  - `KNOWS` relationships
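To make the projection step concrete, here is a minimal pure-Python sketch on made-up data (the hero and comic names are hypothetical, not from the real graph). It assumes, as a simplification, that one `KNOWS` relationship links each pair of heroes who appear in at least one common comic and that its weight counts the comics they share. Even on this tiny graph, 8 `APPEARS_IN` relationships become 10 `KNOWS` relationships (11 when weighted), and Hero F, who only appears in a comic with no other heroes, is left without any relationship once the Comic nodes are removed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical Bimodal data: each (hero, comic) pair is one APPEARS_IN relationship
appears_in = [
    ("Hero A", "Comic 1"), ("Hero B", "Comic 1"), ("Hero C", "Comic 1"),
    ("Hero D", "Comic 1"), ("Hero E", "Comic 1"),
    ("Hero A", "Comic 2"), ("Hero B", "Comic 2"),
    ("Hero F", "Comic 3"),
]

# Group the heroes that appear in each comic
heroes_by_comic = {}
for hero, comic in appears_in:
    heroes_by_comic.setdefault(comic, set()).add(hero)

# Assumed projection rule: one KNOWS per pair of heroes sharing a comic,
# weighted by the number of comics they share
knows = Counter()
for heroes in heroes_by_comic.values():
    for pair in combinations(sorted(heroes), 2):
        knows[pair] += 1

# Heroes with no KNOWS relationship after the Comic nodes are dropped
all_heroes = {hero for hero, _ in appears_in}
linked = {hero for pair in knows for hero in pair}
isolated = sorted(all_heroes - linked)

print(len(appears_in), len(knows), sum(knows.values()))  # 8 10 11
print(isolated)  # ['Hero F']
```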

## Isolate Count and Percent

The isolate count and percent just give us more information about the connectedness of the graph. When we look at the isolate count and percent for the Bimodal Model below, we can see that the diagram I used at the beginning of the entry to illustrate the three global counts is actually impossible.

That graph has two node types (so it must be the Bimodal or Mixed Model) and relationships only between Hero and Comic nodes (which rules out the Mixed Model). However, the actual metrics tell us that there are no isolated nodes in the Bimodal Model. Good thing that diagram was for illustration purposes only.

```python
# Node count, relationship count, isolate count, and isolate percentage in one query
isolates_query = '''MATCH (n) WHERE NOT (n)--()
WITH count(DISTINCT n) AS isolates_count
MATCH ()-[r]->()
WITH count(r) AS relation_ct, isolates_count
MATCH (c)
WITH count(DISTINCT c) AS node_count, isolates_count, relation_ct
RETURN node_count, relation_ct, isolates_count,
       round(toFloat(isolates_count) / node_count * 10000) / 100 AS isolates_pct'''

pd.concat([
    pd.DataFrame(uni_session.run(isolates_query).data()).rename({0: 'unimodal'}),
    pd.DataFrame(bi_session.run(isolates_query).data()).rename({0: 'bimodal'}),
    pd.DataFrame(mix_session.run(isolates_query).data()).rename({0: 'mixmodal'}),
])
```

| Model | node_count | relation_ct | isolates_count | isolates_pct |
|---|---|---|---|---|
| unimodal | 6439 | 171644 | 18 | 0.28 |
| bimodal | 19090 | 96104 | 0 | 0.0 |
| mixmodal | 19090 | 267748 | 0 | 0.0 |
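The `isolates_pct` column is the isolate fraction expressed as a percentage and rounded to two decimal places. The sketch below reproduces that arithmetic in Python for the Unimodal and Bimodal rows. The `isolates_pct` function is a hypothetical helper, and it rounds half up with `math.floor(x + 0.5)`, which I'm assuming matches Cypher's `round()` (ties don't arise for these counts anyway).

```python
import math

def isolates_pct(isolates_count: int, node_count: int) -> float:
    # Mirrors round(toFloat(isolates_count)/node_count*10000) / 100 from the Cypher query:
    # scale the fraction to hundredths of a percent, round half up,
    # then divide by 100 to get a percentage with two decimals
    return math.floor(isolates_count / node_count * 10000 + 0.5) / 100

print(isolates_pct(18, 6439))  # 0.28
print(isolates_pct(0, 19090))  # 0.0
```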


# Up Next

Global Density Comparison

# Resources

- [Entry G4: Modeling Relationships](https://julielinx.github.io/blog/g04_graph_model_rels/)
- [Entry G6: Global Graph Counts](https://julielinx.github.io/blog/g06_global_counts/)
- [Entry G13: Weighted Degree Comparison](https://julielinx.github.io/blog/g13_weighted_degree_comparison/)
- [Entry G13 notebook](https://github.com/julielinx/datascience_diaries/blob/master/graph/13a_nb_weighted_degree_comparison.ipynb)