# Data Analysis
Now that the data is uploaded to the Neo4j server, we can use the same py2neo package to query the database.

## Overview
First, let's look at some basic metrics for our data: the total number of nodes, the breakdown of comics vs. heroes, and the number of edges in the graph.

In [1]:
%cd /home/jovyan/work
from py2neo import Graph
import time

# Connect to the Neo4j database
graph = Graph("bolt://neo4j:7687", auth=("neo4j", "1234"))

# Count nodes
total_nodes_count = graph.evaluate("MATCH (n) RETURN count(n)")
hero_nodes_count = graph.evaluate("MATCH (n:hero) RETURN count(n)")
comic_nodes_count = graph.evaluate("MATCH (n:comic) RETURN count(n)")

print(f"Total nodes count: {total_nodes_count}")
print(f"Hero nodes count: {hero_nodes_count}")
print(f"Comic nodes count: {comic_nodes_count}")

# Count edges
total_edges_count = graph.evaluate("MATCH ()-[r]->() RETURN count(r)")
print(f"Total edges count: {total_edges_count}")


/home/jovyan/work
Total nodes count: 19090
Hero nodes count: 6439
Comic nodes count: 12651
Total edges count: 94527
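
The counts above can be cross-checked with a little arithmetic. The sketch below copies the printed numbers into plain Python variables, so it runs without a database connection. It assumes that every edge joins exactly one hero to one comic, which the notebook does not state explicitly; under that assumption, dividing the edge count by either node count gives the average degree on that side of the graph.

```python
# Counts copied from the output above.
total_nodes = 19090
hero_nodes = 6439
comic_nodes = 12651
total_edges = 94527

# If every node carries exactly one of the two labels, the label counts
# should add up to the total node count.
labels_cover_all_nodes = (hero_nodes + comic_nodes == total_nodes)

# Assumption: each edge joins one hero to one comic, so the edge count
# divided by a node count is the mean degree for that label.
comics_per_hero = total_edges / hero_nodes
heroes_per_comic = total_edges / comic_nodes

print(f"Labels cover all nodes: {labels_cover_all_nodes}")
print(f"Average comics per hero: {comics_per_hero:.2f}")
print(f"Average heroes per comic: {heroes_per_comic:.2f}")
```
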


### Timing Function
Let's also define a function to time how long our queries take:

In [2]:
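# Execute a Cypher query and measure how long it takes, in seconds.
# The timing includes fetching and materializing every row via .data().
def execute_query(query):
    start_time = time.perf_counter()  # monotonic clock, suited to measuring intervals
    result = graph.run(query).data()
    completion_time = time.perf_counter() - start_time
    return result, completion_time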
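
A single measurement like the one returned by `execute_query` can be noisy, and the first run of a query may be slower than later runs because of caching. As a minimal sketch, the hypothetical helper `time_repeated` below (not part of the notebook) runs any callable several times and collects the durations. The workload `fake_query` is a stand-in so the example runs without a database; in the notebook it could be replaced by something like `lambda: graph.run(query).data()`.

```python
import statistics
import time


def time_repeated(run_once, repeats=5):
    """Call run_once() `repeats` times; return (last_result, durations)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run_once()
        durations.append(time.perf_counter() - start)
    return result, durations


# Stand-in workload (an assumption for illustration, not a real query).
def fake_query():
    return [{"n": sum(range(10_000))}]


result, durations = time_repeated(fake_query, repeats=5)
print(f"median: {statistics.median(durations):.6f} s")
print(f"max:    {max(durations):.6f} s")
```

Reporting the median rather than a single run makes the numbers less sensitive to one slow outlier.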

## Most Common Heroes
Let's take a look at which heroes appear in the most stories by querying the database for the nodes labeled `hero` that have the most relationships.

In [3]:

# Find the nodes of type 'hero' with the most connections
query = """
MATCH (hero:hero)--(connected)
RETURN hero.name AS hero_name, COUNT(connected) AS connection_count
ORDER BY connection_count DESC
LIMIT 20
"""

result, completion_time = execute_query(query)

print("Heroes with the most comic book appearances:")
for idx, record in enumerate(result, start=1):
    hero_name = record['hero_name']
    connection_count = record['connection_count']
    print(f"   {idx}. Hero: {hero_name}, Comic Count: {connection_count}")

print(f"Execution time: {completion_time:.2f} seconds")

Heroes with the most comic book appearances:
   1. Hero: CAPTAIN AMERICA, Comic Count: 1334
   2. Hero: IRON MAN/TONY STARK, Comic Count: 1150
   3. Hero: THING/BENJAMIN J. GR, Comic Count: 963
   4. Hero: THOR/DR. DONALD BLAK, Comic Count: 956
   5. Hero: HUMAN TORCH/JOHNNY S, Comic Count: 886
   6. Hero: MR. FANTASTIC/REED R, Comic Count: 854
   7. Hero: HULK/DR. ROBERT BRUC, Comic Count: 835
   8. Hero: WOLVERINE/LOGAN, Comic Count: 819
   9. Hero: INVISIBLE WOMAN/SUE, Comic Count: 762
   10. Hero: SCARLET WITCH/WANDA, Comic Count: 643
   11. Hero: BEAST/HENRY &HANK& P, Comic Count: 635
   12. Hero: DR. STRANGE/STEPHEN, Comic Count: 631
   13. Hero: WATSON-PARKER, MARY, Comic Count: 622
   14. Hero: DAREDEVIL/MATT MURDO, Comic Count: 619
   15. Hero: HAWK, Comic Count: 605
   16. Hero: VISION, Comic Count: 603
   17. Hero: CYCLOPS/SCOTT SUMMER, Comic Count: 585
   18. Hero: WASP/JANET VAN DYNE, Comic Count: 581
   19. Hero: JAMESON, J. JONAH, Comic Count: 577
   20. Hero: ANT-MAN/DR
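
To make concrete what the query above computes, here is a toy in-memory version of the same idea. The appearance pairs are made up for illustration and are not taken from the dataset. Counting one entry per (hero, comic) pair mirrors `COUNT(connected)` grouped by hero, and `most_common` plays the role of `ORDER BY ... DESC LIMIT`.

```python
from collections import Counter

# Hypothetical (hero, comic) appearance pairs; not from the real data.
appearances = [
    ("HERO A", "COMIC 1"),
    ("HERO A", "COMIC 2"),
    ("HERO A", "COMIC 3"),
    ("HERO B", "COMIC 1"),
    ("HERO B", "COMIC 2"),
    ("HERO C", "COMIC 3"),
]

# One count per relationship, like COUNT(connected) grouped by hero.
connection_counts = Counter(hero for hero, _ in appearances)

# most_common(n) sorts by count in descending order and keeps the top n,
# like ORDER BY connection_count DESC LIMIT n.
top_heroes = connection_counts.most_common(2)

for idx, (hero_name, count) in enumerate(top_heroes, start=1):
    print(f"   {idx}. Hero: {hero_name}, Comic Count: {count}")
```
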

## Captain America's Social Network
Now let's take a look at our most popular hero: Captain America. We'll query the database to find how many different comics he has appeared in, how many different heroes he has appeared with, and which of those heroes he has appeared with most often.

In [4]:
# 1. How many nodes is 'CAPTAIN AMERICA' connected to
query1 = """
MATCH (ca:hero {name: 'CAPTAIN AMERICA'})--(connected)
RETURN count(DISTINCT connected) AS connected_nodes_count
"""

result1, completion_time1 = execute_query(query1)
connected_nodes_count = result1[0]['connected_nodes_count']
print(f"1. 'CAPTAIN AMERICA' appears in {connected_nodes_count} comics.")
print(f"   Execution time: {completion_time1:.2f} seconds")

# 2. How many unique heroes share at least one comic with 'CAPTAIN AMERICA'
query2 = """
MATCH (ca:hero {name: 'CAPTAIN AMERICA'})--(connected)--(connected_to_connected)
WHERE connected_to_connected <> ca
RETURN count(DISTINCT connected_to_connected) AS unique_connected_nodes_count
"""

result2, completion_time2 = execute_query(query2)
unique_connected_nodes_count = result2[0]['unique_connected_nodes_count']
print(f"2. There are {unique_connected_nodes_count} unique heroes which have appeared in the same comic as Captain America.")
print(f"   Execution time: {completion_time2:.2f} seconds")

# 3. Which heroes appear most frequently among the second-order connections
query3 = """
MATCH (ca:hero {name: 'CAPTAIN AMERICA'})--(connected)--(connected_to_connected)
WHERE connected_to_connected <> ca
WITH connected_to_connected, COUNT(connected_to_connected) AS freq
RETURN connected_to_connected.name AS node, freq
ORDER BY freq DESC
LIMIT 20
"""

result3, completion_time3 = execute_query(query3)
print("3. Captain America's Most Frequent Collaborators:")
for idx, record in enumerate(result3, start=1):
    node_name = record['node']
    frequency = record['freq']
    print(f"   {idx}. Hero: {node_name}, Co-Appearances: {frequency}")

print(f"   Execution time: {completion_time3:.2f} seconds")


1. 'CAPTAIN AMERICA' appears in 1334 comics.
   Execution time: 0.13 seconds
2. There are 1918 unique heroes which have appeared in the same comic as Captain America.
   Execution time: 0.11 seconds
3. Captain America's Most Frequent Collaborators:
   1. Hero: IRON MAN/TONY STARK, Co-Appearances: 440
   2. Hero: VISION, Co-Appearances: 385
   3. Hero: THOR/DR. DONALD BLAK, Co-Appearances: 380
   4. Hero: WASP/JANET VAN DYNE, Co-Appearances: 376
   5. Hero: SCARLET WITCH/WANDA, Co-Appearances: 373
   6. Hero: HAWK, Co-Appearances: 319
   7. Hero: ANT-MAN/DR. HENRY J., Co-Appearances: 289
   8. Hero: JARVIS, EDWIN, Co-Appearances: 246
   9. Hero: WONDER MAN/SIMON WIL, Co-Appearances: 215
   10. Hero: FALCON/SAM WILSON, Co-Appearances: 189
   11. Hero: HERCULES [GREEK GOD], Co-Appearances: 183
   12. Hero: SHE-HULK/JENNIFER WA, Co-Appearances: 172
   13. Hero: THING/BENJAMIN J. GR, Co-Appearances: 170
   14. Hero: BEAST/HENRY &HANK& P, Co-Appearances: 169
   15. Hero: MR. FANTASTIC/REE
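
The two-hop pattern used in queries 2 and 3 (hero to comic to hero) can be illustrated on a toy dataset in plain Python. The sketch below uses made-up appearance pairs and a hypothetical helper, `co_appearances`, neither of which comes from the notebook. The number of distinct keys in the result corresponds to the `count(DISTINCT ...)` in query 2, and the per-hero counts correspond to `freq` in query 3.

```python
from collections import Counter, defaultdict

# Hypothetical (hero, comic) appearance pairs; not from the real data.
appearances = [
    ("HERO A", "COMIC 1"),
    ("HERO A", "COMIC 2"),
    ("HERO A", "COMIC 3"),
    ("HERO B", "COMIC 1"),
    ("HERO B", "COMIC 2"),
    ("HERO C", "COMIC 3"),
    ("HERO D", "COMIC 4"),
]

# Build both directions of the hero/comic adjacency.
comics_of = defaultdict(set)
heroes_in = defaultdict(set)
for hero, comic in appearances:
    comics_of[hero].add(comic)
    heroes_in[comic].add(hero)


def co_appearances(hero):
    """Count, for every other hero, the comics shared with `hero`."""
    counts = Counter()
    for comic in comics_of[hero]:          # first hop: hero -> comic
        for other in heroes_in[comic]:     # second hop: comic -> hero
            if other != hero:              # skip the starting hero
                counts[other] += 1
    return counts


counts = co_appearances("HERO A")
unique_collaborators = len(counts)         # analogue of count(DISTINCT ...)
top_collaborators = counts.most_common(1)  # analogue of ORDER BY freq DESC LIMIT 1

print(f"Unique collaborators: {unique_collaborators}")
print(f"Most frequent collaborator: {top_collaborators}")
```
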