# Neo4j and Cypher Basics with Python: Social Network Analysis

In this notebook, we use Neo4j, a graph database, to model a simple social network. We will cover:
- Connecting to a Neo4j database from Python
- Creating nodes and relationships
- Running Cypher queries

**Create a new project**:
A project in Neo4j is a container for one or more databases. It helps to organize multiple databases that are related to a specific task, theme, or application. For example, we might have a project called "Social Network" that contains databases for development, testing, and production environments.
   - Open Neo4j Desktop.
   - Click on "New" under the "Projects" section.
   - Name the project (e.g., "Social Network").
   
**Create a new database**:
A database within a project is an actual instance where the graph data is stored. Each database has its own data and schema. For example, within the "Social Network" project, we might have a database named "SocialNetworkDB" where we store all user nodes and their relationships.
   - Click on "Add" and choose "Local DBMS".
   - Name the DBMS (e.g., "Social Network DBMS"), set a password (e.g., "SocialNetworkDBMS"), and click "Create".
   - Click "Start" to start the new DBMS.


#### Setting up the connection to Neo4j from Python
First, we need to set up the connection to the Neo4j database. Make sure the Neo4j database is running.

In [1]:
from neo4j import GraphDatabase

# Define the connection details
uri = "bolt://localhost:7687"  # default URI for Neo4j
user = "neo4j"  # default user
password = "SocialNetworkDBMS"

# Create a driver instance
driver = GraphDatabase.driver(uri, auth=(user, password))

# Verify the connection
def test_connection(driver):
    try:
        with driver.session() as session:
            result = session.run("RETURN 'Connection successful' AS message")
            for record in result:
                print(record["message"])
    except Exception as e:
        print(f"Connection failed: {e}")

test_connection(driver)

Connection successful
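
The connection cell hard-codes the password and checks the connection by running a query. As a minimal sketch of an alternative, the helpers below read the settings from environment variables and call the driver's `verify_connectivity()` method. The variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` and both function names are assumptions made for this example; Neo4j Desktop does not set them for you.

```python
import os


def connection_settings(env=None):
    """Read connection details from environment variables.

    The variable names are hypothetical; the URI and user defaults
    mirror the values used in the cell above.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "")
    return uri, (user, password)


def check_connection(driver):
    """Return True if the driver can reach the server, False otherwise."""
    try:
        driver.verify_connectivity()  # raises if the server cannot be reached
        return True
    except Exception as exc:
        print(f"Connection failed: {exc}")
        return False
```

With these helpers, the driver could be created with `uri, auth = connection_settings()` followed by `driver = GraphDatabase.driver(uri, auth=auth)`, and `check_connection(driver)` could stand in for `test_connection(driver)`.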


### Creating nodes
Nodes are the entities in a graph. In our social network, users will be nodes.

In [2]:
# Function to create a user node
def create_user(tx, name, age):
    tx.run("CREATE (u:User {name: $name, age: $age})", name=name, age=age)

# Add some users to the database
with driver.session() as session:
    session.execute_write(create_user, "Alice", 30)
    session.execute_write(create_user, "Bob", 25)
    session.execute_write(create_user, "Charlie", 35)
    session.execute_write(create_user, "Diana", 28)
    session.execute_write(create_user, "Eli", 33)

print("Users created successfully.")

Users created successfully.


##### Explanation of the CREATE query

The `CREATE` query is used to create nodes and relationships in the graph. Here's the structure:

```cypher
CREATE (alias:Label {property1: value1, property2: value2, ...})
```

* `alias`: A variable that refers to the node. Variables in Cypher are used to refer to nodes, relationships, and paths in queries.
* `Label`: The label that categorizes the node.
* `property1, property2, ...`: Properties of the node.

In our example:

```cypher
CREATE (u:User {name: $name, age: $age})
```

* `(u:User ...)` creates a node with the label `User`; `u` is a variable that refers to the new node within the query.
* `{name: $name, age: $age}` sets the properties `name` and `age`. `$name` and `$age` are query parameters, filled in from the keyword arguments passed to `tx.run`.
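
`CREATE` always adds a new node, so re-running the cell above would produce a second `Alice`, a second `Bob`, and so on. A possible variation, sketched here, uses Cypher's `MERGE` clause, which matches an existing node with the given `name` or creates one if none exists. The function name `merge_user` is hypothetical, and the sketch assumes that `name` identifies a user uniquely, which the notebook does not state.

```python
def merge_user(tx, name, age):
    """Create the user if no User node with this name exists, then set its age."""
    result = tx.run(
        """
        MERGE (u:User {name: $name})
        SET u.age = $age
        RETURN u.name AS name, u.age AS age
        """,
        name=name,
        age=age,
    )
    record = result.single()
    return record["name"], record["age"]
```

It is called the same way as `create_user`, for example `session.execute_write(merge_user, "Alice", 30)`.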

### Creating relationships

#### Creating directed relationships
Relationships connect nodes. Relationships can be directed, indicating a one-way connection between nodes. For example, a "follower" relationship where one user follows another.

In [3]:
# Function to create a FOLLOW relationship (directed)
def create_follower(tx, follower, followed):
    tx.run("""
    MATCH (a:User {name: $follower}), (b:User {name: $followed})
    CREATE (a)-[:FOLLOW]->(b)
    """, follower=follower, followed=followed)

# Add some follower relationships to the database
with driver.session() as session:
    session.execute_write(create_follower, "Alice", "Charlie")
    session.execute_write(create_follower, "Bob", "Eli")

print("Follower relationships created successfully.")

Follower relationships created successfully.


##### Explanation of the MATCH and CREATE query

The `MATCH` query is used to find existing nodes, and the `CREATE` query is used to create relationships. Here's the structure:

```cypher
MATCH (alias1:Label {property1: value1}), (alias2:Label {property2: value2})
CREATE (alias1)-[:RELATIONSHIP_TYPE]->(alias2)
```

* `MATCH (alias1:Label {property1: value1})`: Finds nodes that match the given label and property.
* `CREATE (alias1)-[:RELATIONSHIP_TYPE]->(alias2)`: Creates a relationship of type `RELATIONSHIP_TYPE` between the matched nodes found in the `MATCH` query.

In our example:

```cypher
MATCH (a:User {name: $follower}), (b:User {name: $followed})
CREATE (a)-[:FOLLOW]->(b)
```

* `MATCH (a:User {name: $follower}), (b:User {name: $followed})`: Finds users named `$follower` and `$followed`.
* `CREATE (a)-[:FOLLOW]->(b)`: Creates a `FOLLOW` relationship from user `a` to user `b`.
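
If either name does not match a node, `MATCH` produces no rows and the `CREATE` creates nothing, yet the cell still prints its success message. As a hedged sketch, the variant below returns how many relationships were created so the caller can check. The function name `create_follower_checked` is hypothetical; `count(*)` is a standard Cypher aggregate.

```python
def create_follower_checked(tx, follower, followed):
    """Create a FOLLOW relationship and return the number of relationships created."""
    result = tx.run(
        """
        MATCH (a:User {name: $follower}), (b:User {name: $followed})
        CREATE (a)-[:FOLLOW]->(b)
        RETURN count(*) AS created
        """,
        follower=follower,
        followed=followed,
    )
    # With no grouping keys, count(*) yields one row even when MATCH finds nothing.
    return result.single()["created"]
```

A return value of `0` from `session.execute_write(create_follower_checked, ...)` would indicate that at least one of the two users was not found.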

#### Creating bidirectional relationships
Every relationship in Neo4j is stored with a direction. To model a bidirectional relationship, such as friendship, this notebook creates two directed relationships in opposite directions. In our social network, users can be friends with each other.

In [4]:
# Function to create a FRIEND relationship between two users
def create_friendship(tx, name1, name2):
    tx.run("""
    MATCH (a:User {name: $name1}), (b:User {name: $name2})
    CREATE (a)-[:FRIEND]->(b), (b)-[:FRIEND]->(a)
    """, name1=name1, name2=name2)

# Add some friendships to the database
with driver.session() as session:
    session.execute_write(create_friendship, "Alice", "Bob")
    session.execute_write(create_friendship, "Alice", "Charlie")
    session.execute_write(create_friendship, "Bob", "Diana")
    session.execute_write(create_friendship, "Diana", "Eli")

print("Friendships created successfully.")

Friendships created successfully.


##### Explanation of the CREATE query for bidirectional relationships

In our example:

```cypher
MATCH (a:User {name: $name1}), (b:User {name: $name2})
CREATE (a)-[:FRIEND]->(b), (b)-[:FRIEND]->(a)
```

* `MATCH (a:User {name: $name1}), (b:User {name: $name2})`: Finds users named `$name1` and `$name2`.
* `CREATE (a)-[:FRIEND]->(b), (b)-[:FRIEND]->(a)`: Creates two `FRIEND` relationships between users `a` and `b`, one in each direction.
    * `(a)-[:FRIEND]->(b)`: Creates a `FRIEND` relationship from user `a` to user `b`.
    * `(b)-[:FRIEND]->(a)`: Creates a `FRIEND` relationship from user `b` to user `a`.
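
Storing the friendship twice is one modeling choice. Cypher can also match a relationship regardless of its direction by leaving the arrowhead off the pattern, so a single stored `FRIEND` relationship can be read from either side. The sketch below shows such a query; the function name `get_friends_either_direction` is hypothetical, and it is offered as an alternative to consider, not as the approach this notebook takes.

```python
def get_friends_either_direction(tx, name):
    """Return the names of users connected to `name` by FRIEND in either direction."""
    result = tx.run(
        """
        MATCH (u:User {name: $name})-[:FRIEND]-(friend:User)
        RETURN DISTINCT friend.name AS name
        ORDER BY name
        """,
        name=name,
    )
    return [record["name"] for record in result]
```

`DISTINCT` keeps each friend once even when both directions are stored, as they are in this notebook.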

### Adding different types of nodes
We can have different types of nodes representing various entities in the domain model. For example, in a social network, besides User nodes, we might have Post and Comment nodes.

In [5]:
# Function to create a post node
def create_post(tx, id, content, timestamp):
    tx.run("CREATE (p:Post {id: $id, content: $content, timestamp: $timestamp})", id=id, content=content, timestamp=timestamp)

# Function to create a comment node
def create_comment(tx, id, content, timestamp):
    tx.run("CREATE (c:Comment {id: $id, content: $content, timestamp: $timestamp})", id=id, content=content, timestamp=timestamp)

# Add some posts and comments to the database
with driver.session() as session:
    session.execute_write(create_post, 1, "Hello, world!", "2024-07-16T10:00:00Z")
    session.execute_write(create_post, 2, "Good night!", "2024-07-16T11:00:00Z")
    session.execute_write(create_comment, 1, "Great post!", "2024-07-16T12:00:00Z")
    session.execute_write(create_comment, 2, "Very informative.", "2024-07-16T12:30:00Z")

print("Posts and comments created successfully.")

Posts and comments created successfully.


In these queries, we create two types of nodes, `Post` and `Comment`, each with properties such as `id`, `content`, and `timestamp`.

### Creating Relationships between Users, Posts, and Comments
Let's add relationships between users and posts (`POSTED`), between users and comments (`COMMENTED`), and between posts and comments (`HAS_COMMENT`).

In [6]:
# Function to create a POSTED relationship between a user and a post
def create_posted_relationship(tx, user_name, post_id):
    tx.run("""
    MATCH (u:User {name: $user_name}), (p:Post {id: $post_id})
    CREATE (u)-[:POSTED]->(p)
    """, user_name=user_name, post_id=post_id)

# Function to create a COMMENTED relationship between a user and a comment
def create_commented_relationship(tx, user_name, comment_id):
    tx.run("""
    MATCH (u:User {name: $user_name}), (c:Comment {id: $comment_id})
    CREATE (u)-[:COMMENTED]->(c)
    """, user_name=user_name, comment_id=comment_id)
    
# Function to create a HAS_COMMENT relationship between a post and a comment
def create_has_comment_relationship(tx, post_id, comment_id):
    tx.run("""
    MATCH (p:Post {id: $post_id}), (c:Comment {id: $comment_id})
    CREATE (p)-[:HAS_COMMENT]->(c)
    """, post_id=post_id, comment_id=comment_id)
    
# Add relationships to the database
with driver.session() as session:
    # Alice and Bob post something
    session.execute_write(create_posted_relationship, "Alice", 1)
    session.execute_write(create_posted_relationship, "Bob", 2)
    
    # Alice and Charlie comment on posts
    session.execute_write(create_commented_relationship, "Alice", 1)
    session.execute_write(create_commented_relationship, "Charlie", 2)
    
    # Connect comments to posts
    session.execute_write(create_has_comment_relationship, 1, 2)
    session.execute_write(create_has_comment_relationship, 2, 1)

print("Relationships created successfully.")

Relationships created successfully.


### Adding properties to relationships
Relationships can have properties just like nodes. These properties can store various types of data such as strings, numbers, booleans, and arrays. This is useful for adding metadata to the relationships.

In [7]:
# Function to create a LIKES relationship with properties
def create_like(tx, user, post_id, timestamp):
    tx.run("""
    MATCH (u:User {name: $user}), (p:Post {id: $post_id})
    CREATE (u)-[:LIKES {timestamp: $timestamp}]->(p)
    """, user=user, post_id=post_id, timestamp=timestamp)

# Add some likes with properties to the database
with driver.session() as session:
    session.execute_write(create_like, "Alice", 1, "2024-07-16T12:00:00Z")
    session.execute_write(create_like, "Bob", 2, "2024-07-16T12:30:00Z")

print("Likes with properties created successfully.")

Likes with properties created successfully.


In these queries, we create `LIKES` relationships between users and posts with an additional `timestamp` property, which records when the like was made. The property is stored as a string in ISO 8601 format.

* `MATCH (u:User {name: $user}), (p:Post {id: $post_id})`: Finds the user and post nodes based on the specified properties.
* `CREATE (u)-[:LIKES {timestamp: $timestamp}]->(p)`: Creates a LIKES relationship between the user and post with a timestamp property.
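
Relationship properties can also be used to filter results. As a small sketch, the function below returns only the likes made at or after a given moment. It relies on Cypher's `datetime()` function to turn the stored ISO 8601 strings into temporal values before comparing them. The function name `get_likes_since` and the `since` parameter are hypothetical.

```python
def get_likes_since(tx, post_id, since):
    """Return (user name, timestamp string) for likes on a post at or after `since`."""
    result = tx.run(
        """
        MATCH (u:User)-[r:LIKES]->(p:Post {id: $post_id})
        WHERE datetime(r.timestamp) >= datetime($since)
        RETURN u.name AS name, r.timestamp AS liked_at
        """,
        post_id=post_id,
        since=since,  # an ISO 8601 string such as "2024-07-16T00:00:00Z"
    )
    return [(record["name"], record["liked_at"]) for record in result]
```

Storing the property as a temporal value in the first place, for example with `datetime($timestamp)` inside the `CREATE`, would avoid the conversion at query time; which option fits better depends on how the timestamps are used.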

### Deleting nodes and relationships

We can also delete nodes and relationships using the `DELETE` query.

In [8]:
# Function to delete a user node
def delete_user(tx, name):
    tx.run("MATCH (u:User {name: $name}) DETACH DELETE u", name=name)

# Function to delete a FRIEND relationship (only the name1 -> name2 direction)
def delete_friendship(tx, name1, name2):
    tx.run("""
    MATCH (a:User {name: $name1})-[r:FRIEND]->(b:User {name: $name2})
    DELETE r
    """, name1=name1, name2=name2)

# Delete user 'Diana'
with driver.session() as session:
    session.execute_write(delete_user, "Diana")

print("User Diana deleted successfully.")

# Delete friendship between 'Alice' and 'Charlie'
with driver.session() as session:
    session.execute_write(delete_friendship, "Alice", "Charlie")

print("Friendship between Alice and Charlie deleted successfully.")

User Diana deleted successfully.
Friendship between Alice and Charlie deleted successfully.


##### Explanation of the DELETE query

The `DELETE` query is used to remove nodes and relationships from the graph.

* Deleting a node without relationships: Deleting a node removes the node and its properties from the graph. If the node has no relationships, it can be deleted directly:

    ```cypher
    MATCH (u:User {name: $name})
    DELETE u
    ```

* Deleting a node with relationships: Neo4j does not allow deleting a node that still has relationships, because that would leave relationships pointing at a node that no longer exists. The `DETACH DELETE` command removes the node together with all relationships connected to it:

    ```cypher
    MATCH (u:User {name: $name})
    DETACH DELETE u
    ```

    - `DETACH DELETE u`: Ensures that the user and all relationships connected to the user are deleted. This prevents leaving orphaned relationships in the database.

* Deleting relationships only: To delete a relationship without deleting the nodes it connects, match the relationship and then use `DELETE`:

    ```cypher
    MATCH (a:User {name: $name1})-[r:FRIEND]->(b:User {name: $name2})
    DELETE r
    ```

    - `MATCH (a:User {name: $name1})-[r:FRIEND]->(b:User {name: $name2})`: Finds the `FRIEND` relationship from `$name1` to `$name2`. Because the pattern is directed, the reverse `FRIEND` relationship created earlier is not matched and stays in the graph.
    - `DELETE r`: Deletes the matched relationship.
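
Since friendships in this notebook are stored as two relationships, removing a friendship completely means deleting both. One way to do that, sketched below, is to drop the arrowhead from the pattern so it matches `FRIEND` relationships in either direction. The function name `delete_friendship_both_directions` is hypothetical.

```python
def delete_friendship_both_directions(tx, name1, name2):
    """Delete every FRIEND relationship between two users and return how many were removed."""
    result = tx.run(
        """
        MATCH (a:User {name: $name1})-[r:FRIEND]-(b:User {name: $name2})
        DELETE r
        RETURN count(*) AS deleted
        """,
        name1=name1,
        name2=name2,
    )
    # count(*) counts the matched rows, one per deleted relationship.
    return result.single()["deleted"]
```

Called as `session.execute_write(delete_friendship_both_directions, "Alice", "Charlie")`, it would remove whichever `FRIEND` relationships still exist between the two users.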

### Running queries
Now, let's run some Cypher queries to retrieve and analyze data.

In [9]:
# Function to get all users
def get_all_users(tx):
    result = tx.run("MATCH (u:User) RETURN u.name AS name, u.age AS age")
    return [(record["name"], record["age"]) for record in result]

# Retrieve and print all users
print("All users:")
with driver.session() as session:
    users = session.execute_read(get_all_users)
    for name, age in users:
        print(f"User: {name}, Age: {age}")
        
# Function to get all posts
def get_all_posts(tx):
    result = tx.run("MATCH (p:Post) RETURN p.id AS id, p.content AS content, p.timestamp AS timestamp")
    return [(record["id"], record["content"], record["timestamp"]) for record in result]

# Retrieve and print all posts
print("\nAll posts:")
with driver.session() as session:
    posts = session.execute_read(get_all_posts)
    for id, content, timestamp in posts:
        print(f"Post ID: {id}, Content: {content}, Timestamp: {timestamp}")
        
# Function to get all comments
def get_all_comments(tx):
    result = tx.run("MATCH (c:Comment) RETURN c.id AS id, c.content AS content, c.timestamp AS timestamp")
    return [(record["id"], record["content"], record["timestamp"]) for record in result]

# Retrieve and print all comments
print("\nAll comments:")
with driver.session() as session:
    comments = session.execute_read(get_all_comments)
    for id, content, timestamp in comments:
        print(f"Comment ID: {id}, Content: {content}, Timestamp: {timestamp}")

All users:
User: Bob, Age: 25
User: Charlie, Age: 35
User: Alice, Age: 30
User: Eli, Age: 33

All posts:
Post ID: 1, Content: Hello, world!, Timestamp: 2024-07-16T10:00:00Z
Post ID: 2, Content: Good night!, Timestamp: 2024-07-16T11:00:00Z

All comments:
Comment ID: 1, Content: Great post!, Timestamp: 2024-07-16T12:00:00Z
Comment ID: 2, Content: Very informative., Timestamp: 2024-07-16T12:30:00Z


##### Explanation of the MATCH and RETURN query

The `MATCH` query is used to find nodes, and the `RETURN` query is used to get specific data from those nodes. Here's the structure:

```cypher
MATCH (alias:Label)
RETURN alias.property1, alias.property2, ...
```

- `MATCH (alias:Label)`: Finds nodes with the given label.
- `RETURN alias.property1, alias.property2, ...`: Returns the specified properties of the matched nodes.

In our example:
```cypher
MATCH (u:User)
RETURN u.name AS name, u.age AS age
```
- `MATCH (u:User)`: Finds all nodes with the label `User`. `u` is an alias for these nodes.
- `RETURN u.name AS name, u.age AS age`: Returns the `name` and `age` properties of the matched nodes.
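
The users in the output above come back in no particular order, since `MATCH` alone does not guarantee one. As a brief sketch, the query below adds a `WHERE` filter and an `ORDER BY` clause to the same pattern. The function name `get_users_at_least` and the `min_age` parameter are hypothetical.

```python
def get_users_at_least(tx, min_age):
    """Return (name, age) for users whose age is at least `min_age`, youngest first."""
    result = tx.run(
        """
        MATCH (u:User)
        WHERE u.age >= $min_age
        RETURN u.name AS name, u.age AS age
        ORDER BY age, name
        """,
        min_age=min_age,
    )
    return [(record["name"], record["age"]) for record in result]
```

Because it only reads data, it can be run in a read transaction: `session.execute_read(get_users_at_least, 30)`.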

In [10]:
# Function to get friends of a user
def get_friends(tx, name):
    result = tx.run("""
    MATCH (u:User {name: $name})-[:FRIEND]->(friend)
    RETURN friend.name AS name, friend.age AS age
    """, name=name)
    return [(record["name"], record["age"]) for record in result]

# Retrieve and print friends of Alice
with driver.session() as session:
    friends = session.execute_read(get_friends, "Alice")
    print("Alice's friends:")
    for name, age in friends:
        print(f"Friend: {name}, Age: {age}")

Alice's friends:
Friend: Bob, Age: 25


##### Explanation of the MATCH and RETURN query with relationships
In this query, we are also matching relationships. Here's the structure:

```cypher
MATCH (alias1:Label {property1: value1})-[:RELATIONSHIP_TYPE]->(alias2)
RETURN alias2.property1, alias2.property2, ...
```

- `MATCH (alias1:Label {property1: value1})-[:RELATIONSHIP_TYPE]->(alias2)`: Finds nodes and their relationships.
- `RETURN alias2.property1, alias2.property2, ...`: Returns the specified properties of the related nodes.

In our example, we find a user by name and then find their friends through the `FRIEND` relationship:
```cypher
MATCH (u:User {name: $name})-[:FRIEND]->(friend)
RETURN friend.name AS name, friend.age AS age
```
- `MATCH (u:User {name: $name})-[:FRIEND]->(friend)`: Finds users named `$name` and their friends.
    - `MATCH (u:User {name: $name})`: Finds the user node with the specified name. `$name` is a parameter.
    - `[:FRIEND]->(friend)`: Finds nodes that have a `FRIEND` relationship from the matched user. `friend` is an alias for these related nodes.
- `RETURN friend.name AS name, friend.age AS age`: Returns the `name` and `age` properties of the friends.

In [11]:
# Function to get friends of friends
def get_friends_of_friends(tx, name):
    result = tx.run("""
    MATCH (u:User {name: $name})-[:FRIEND]->(:User)-[:FRIEND]->(fof)
    RETURN fof.name AS name, fof.age AS age
    """, name=name)
    return [(record["name"], record["age"]) for record in result]

# Retrieve and print friends of friends of Alice
with driver.session() as session:
    friends_of_friends = session.execute_read(get_friends_of_friends, "Alice")
    print("Alice's friends of friends:")
    for name, age in friends_of_friends:
        print(f"Friend of Friend: {name}, Age: {age}")

Alice's friends of friends:
Friend of Friend: Alice, Age: 30


##### Explanation of the MATCH and RETURN query for friends of friends

This query finds nodes that are connected through two relationships. Here's the structure:

```cypher
MATCH (alias1:Label {property1: value1})-[:RELATIONSHIP_TYPE1]->(:Label)-[:RELATIONSHIP_TYPE2]->(alias2)
RETURN alias2.property1, alias2.property2, ...
```

- `MATCH (alias1:Label {property1: value1})-[:RELATIONSHIP_TYPE1]->(:Label)-[:RELATIONSHIP_TYPE2]->(alias2)`: Finds nodes connected through two relationships.
- `RETURN alias2.property1, alias2.property2, ...`: Returns the specified properties of the nodes at the end of the relationships.

In our example:
```cypher
MATCH (u:User {name: $name})-[:FRIEND]->(:User)-[:FRIEND]->(fof)
RETURN fof.name AS name, fof.age AS age
```
- `MATCH (u:User {name: $name})-[:FRIEND]->(:User)-[:FRIEND]->(fof)`: Finds users named `$name` and their friends' friends.
    - `MATCH (u:User {name: $name})`: Finds the user node with the specified name.
    - `[:FRIEND]->(:User)`: Finds user nodes that are friends of the matched user. The friend nodes are not given an alias here.
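    - `[:FRIEND]->(fof)`: Finds nodes that are friends of the friend nodes. `fof` is an alias for these friends of friends. Because friendships are stored in both directions, the second hop can lead back to the starting user, which is why Alice appears in her own result above.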
- `RETURN fof.name AS name, fof.age AS age`: Returns the `name` and `age` properties of the friends' friends.
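
To make the two-hop traversal concrete, here is a plain-Python sketch that walks the `FRIEND` relationships remaining at this point in the notebook (Diana was removed with `DETACH DELETE`, and only the Alice-to-Charlie direction of that friendship was deleted). The function `friends_of_friends` and the `exclude_self` flag are illustrative and not part of the notebook; the Cypher string at the end shows one way the same filter could be written, using `WHERE fof <> u`.

```python
def friends_of_friends(friend_edges, name, exclude_self=False):
    """Follow two FRIEND hops from `name` over directed (source, target) pairs."""
    found = []
    for start, middle in friend_edges:
        if start != name:
            continue
        for source, target in friend_edges:
            if source != middle:
                continue
            if exclude_self and target == name:
                continue
            found.append(target)
    return found


# FRIEND relationships left after the deletions above, as directed pairs.
remaining = [("Alice", "Bob"), ("Bob", "Alice"), ("Charlie", "Alice")]

print(friends_of_friends(remaining, "Alice"))                     # ['Alice']
print(friends_of_friends(remaining, "Alice", exclude_self=True))  # []

# The same filter expressed in Cypher (a sketch, not executed here).
FOF_EXCLUDING_SELF = """
MATCH (u:User {name: $name})-[:FRIEND]->(:User)-[:FRIEND]->(fof)
WHERE fof <> u
RETURN DISTINCT fof.name AS name, fof.age AS age
"""
```

The path Alice, Bob, Alice uses two different relationships, one in each direction, so Cypher's rule that a relationship is not traversed twice within one pattern does not exclude it.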

In [12]:
# Function to get users who liked a specific post
def get_users_who_liked_post(tx, post_id):
    result = tx.run("""
    MATCH (u:User)-[r:LIKES]->(p:Post {id: $post_id})
    RETURN u.name AS name, r.timestamp AS liked_at
    """, post_id=post_id)
    return [(record["name"], record["liked_at"]) for record in result]

# Retrieve and print users who liked post with ID 1
with driver.session() as session:
    likers = session.execute_read(get_users_who_liked_post, 1)
    print("Users who liked post ID 1:")
    for name, liked_at in likers:
        print(f"User: {name}, Liked at: {liked_at}")


Users who liked post ID 1:
User: Alice, Liked at: 2024-07-16T12:00:00Z


##### Explanation

* `MATCH (u:User)-[r:LIKES]->(p:Post {id: $post_id})`: Finds users who have a LIKES relationship to the post with the specified id.
* `RETURN u.name AS name, r.timestamp AS liked_at`: Returns the name property of the users and the timestamp property of the LIKES relationship.

In [13]:
# Function to get posts made by a specific user
def get_posts_by_user(tx, user_name):
    result = tx.run("""
    MATCH (u:User {name: $user_name})-[:POSTED]->(p:Post)
    RETURN p.id AS id, p.content AS content, p.timestamp AS timestamp
    """, user_name=user_name)
    return [(record["id"], record["content"], record["timestamp"]) for record in result]

# Retrieve and print posts made by Alice
with driver.session() as session:
    posts = session.execute_read(get_posts_by_user, "Alice")
    print("Posts made by Alice:")
    for id, content, timestamp in posts:
        print(f"Post ID: {id}, Content: {content}, Timestamp: {timestamp}")


# Function to get comments made by a specific user
def get_comments_by_user(tx, user_name):
    result = tx.run("""
    MATCH (u:User {name: $user_name})-[:COMMENTED]->(c:Comment)
    RETURN c.id AS id, c.content AS content, c.timestamp AS timestamp
    """, user_name=user_name)
    return [(record["id"], record["content"], record["timestamp"]) for record in result]

# Retrieve and print comments made by Alice
with driver.session() as session:
    comments = session.execute_read(get_comments_by_user, "Alice")
    print("\nComments made by Alice:")
    for id, content, timestamp in comments:
        print(f"Comment ID: {id}, Content: {content}, Timestamp: {timestamp}")
        

# Function to get comments on a specific post
def get_comments_on_post(tx, post_id):
    result = tx.run("""
    MATCH (p:Post {id: $post_id})-[:HAS_COMMENT]->(c:Comment)
    RETURN c.id AS id, c.content AS content, c.timestamp AS timestamp
    """, post_id=post_id)
    return [(record["id"], record["content"], record["timestamp"]) for record in result]

# Retrieve and print comments on post with ID 1
with driver.session() as session:
    comments = session.execute_read(get_comments_on_post, 1)
    print("\nComments on Post ID 1:")
    for id, content, timestamp in comments:
        print(f"Comment ID: {id}, Content: {content}, Timestamp: {timestamp}")

Posts made by Alice:
Post ID: 1, Content: Hello, world!, Timestamp: 2024-07-16T10:00:00Z

Comments made by Alice:
Comment ID: 1, Content: Great post!, Timestamp: 2024-07-16T12:00:00Z

Comments on Post ID 1:
Comment ID: 2, Content: Very informative., Timestamp: 2024-07-16T12:30:00Z


##### Explanation

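* `MATCH (u:User {name: $user_name})-[:POSTED]->(p:Post)`: Finds posts that the user with the specified name has posted.
* `MATCH (u:User {name: $user_name})-[:COMMENTED]->(c:Comment)`: Finds comments that the user with the specified name has written. The query returns the same three properties for each comment.
* `MATCH (p:Post {id: $post_id})-[:HAS_COMMENT]->(c:Comment)`: Finds comments attached to the post with the specified id.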
* `RETURN p.id AS id, p.content AS content, p.timestamp AS timestamp`: Returns the id, content, and timestamp properties of the posts.
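
The three patterns above can also be chained in a single `MATCH`. As an illustrative sketch, the function below finds the posts that a user's comments are attached to by following `COMMENTED` forward and `HAS_COMMENT` backward. The function name `get_posts_commented_on_by_user` is hypothetical.

```python
def get_posts_commented_on_by_user(tx, user_name):
    """Return (post id, comment id, comment text) for each comment a user has written."""
    result = tx.run(
        """
        MATCH (u:User {name: $user_name})-[:COMMENTED]->(c:Comment)<-[:HAS_COMMENT]-(p:Post)
        RETURN p.id AS post_id, c.id AS comment_id, c.content AS comment
        """,
        user_name=user_name,
    )
    return [
        (record["post_id"], record["comment_id"], record["comment"])
        for record in result
    ]
```

With the relationships created earlier, Alice's comment 1 is attached to post 2, so `session.execute_read(get_posts_commented_on_by_user, "Alice")` would be expected to report that pairing.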

### Deleting all nodes and relationships
To completely empty the database, we can delete all nodes and relationships. This is useful when we want to reset the database.

In [14]:
# Function to delete all nodes and relationships
def delete_all(tx):
    tx.run("MATCH (n) DETACH DELETE n")

# Delete all nodes and relationships in the database
with driver.session() as session:
    session.execute_write(delete_all)

print("All nodes and relationships deleted successfully.")

All nodes and relationships deleted successfully.


#### Closing the connection
Finally, close the connection to the database.

In [15]:
# Close the driver connection
driver.close()
print("Connection closed.")

Connection closed.
