# Load the CSV file

In this module, you will learn about:
- Creating nodes and relationships with data from a CSV file
- Assigning properties to nodes and relationships
- The importance of unique identifiers and how to create constraints

You will load a CSV file of "person" data into Person nodes in Neo4j. The CSV file contains the following fields:

- person_tmdbId
- bio
- born
- bornIn
- died
- person_imdbId
- name
- person_poster
- person_url

Download the [persons.csv](https://data.neo4j.com/importing-cypher/persons.csv) file.

In [2]:
import os

from dotenv import load_dotenv

load_dotenv()

import textwrap
from neo4j import GraphDatabase
from utils import execute_query


neo4j_uri = os.getenv("NEO4J_URI")
neo4j_user = os.getenv("NEO4J_USERNAME")
neo4j_pass = os.getenv("NEO4J_PASSWORD")
neo4j_db = os.getenv("NEO4J_DATABASE")


neo4j_driver = GraphDatabase.driver(neo4j_uri,
                                   auth=(neo4j_user,neo4j_pass))


cypher = textwrap.dedent("""
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
RETURN row
""")

res = execute_query(neo4j_driver, cypher)

print(res)

<neo4j._sync.work.result.Result object at 0x7f48789ce6f0>
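Printing `res` shows only the object's repr, not the rows. As a minimal sketch, assuming version 5.x of the official Python driver (whose `Driver.execute_query` returns records, a summary, and keys) rather than the `utils.execute_query` helper, which is not shown here, the rows could be converted into plain dictionaries like this. The name `fetch_rows` and its `limit` parameter are hypothetical.

```python
def fetch_rows(driver, cypher, database=None, limit=5):
    """Run a query and return up to `limit` rows as plain dictionaries.

    Assumes `driver` is a neo4j 5.x Driver; `fetch_rows` is a hypothetical helper.
    """
    # Driver.execute_query fetches every record eagerly and returns
    # a (records, summary, keys) triple.
    records, _summary, _keys = driver.execute_query(cypher, database_=database)
    # Record.data() converts one record into a dict keyed by the RETURN names.
    return [record.data() for record in records[:limit]]
```

With the variables from the cell above, `fetch_rows(neo4j_driver, cypher, neo4j_db)` would return the first five CSV rows. The `limit` only trims the list after the query has run, so for large files it is probably better to add a `LIMIT` to the Cypher statement itself.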


# Create Person nodes

Run the Cypher statement to create the Person nodes:

```cypher
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET
p.imdbId = toInteger(row.person_imdbId),
p.bornIn = row.bornIn,
p.name = row.name,
p.bio = row.bio,
p.poster = row.person_poster,
p.url = row.person_url,
p.born = row.born,
p.died = row.died
```

In [3]:
cypher = textwrap.dedent("""
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET
p.imdbId = toInteger(row.person_imdbId),
p.bornIn = row.bornIn,
p.name = row.name,
p.bio = row.bio,
p.poster = row.person_poster,
p.url = row.person_url,
p.born = row.born,
p.died = row.died
""")

res = execute_query(neo4j_driver, cypher)

print(res)

<neo4j._sync.work.result.Result object at 0x7f4877fa0dd0>


Confirm the data is in the graph by returning the first 25 Person nodes:

In [4]:
cypher = textwrap.dedent("""
MATCH (p:Person) RETURN p LIMIT 25
""")

res = execute_query(neo4j_driver, cypher)

print(res)

<neo4j._sync.work.result.Result object at 0x7f4877fa1790>


# Unique IDs and Constraints



A Neo4j best practice is to use an ID as a unique property value for each node.

Unique IDs help ensure duplicate data is not created. When you load data from CSV files, you rely heavily upon the IDs specified in the file. If the IDs in your CSV file are not unique for the same entity (node), you could create duplicate data. You may also have problems loading the data and creating relationships between nodes.
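One way to catch this before importing is to scan the ID column for repeated values. The following is a minimal sketch using only the Python standard library; the helper name `find_duplicate_ids` is hypothetical, and the sample rows are made up for illustration rather than taken from persons.csv.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(csv_text, id_field):
    """Return the values that occur more than once in the `id_field` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_field] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample rows for illustration only; not taken from persons.csv.
sample = """person_tmdbId,name
1,Person A
2,Person B
2,Person C
"""

print(find_duplicate_ids(sample, "person_tmdbId"))  # prints ['2']
```

The values are compared as strings exactly as they appear in the file, so `2` and `02` would count as different IDs in this sketch.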

You can add constraints to your database to stop the creation of nodes with duplicate IDs.

## Create a unique constraint

The syntax for creating a unique constraint on a property is:

```cypher
CREATE CONSTRAINT [constraint_name] [IF NOT EXISTS]
FOR (n:LabelName)
REQUIRE n.propertyName IS UNIQUE
```

The constraint is for a property on all nodes with a specified label.

The `constraint_name` is optional, but it is good practice to specify one. If you do not specify a constraint name, Neo4j will create one for you.

The `IF NOT EXISTS` clause is also optional. Without it, Neo4j raises an error if the constraint already exists.
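Because the constraint name, label, and property are identifiers, they have to appear in the statement text rather than being passed as query parameters. As a rough sketch, the statement could be assembled in Python as follows; `build_unique_constraint` is a hypothetical helper, and it quotes each identifier with backticks and rejects any name that contains one.

```python
def build_unique_constraint(name, label, prop, if_not_exists=True):
    """Assemble a CREATE CONSTRAINT statement for a unique node property."""
    # Reject empty names and names containing a backtick, since each
    # identifier is wrapped in backticks below.
    for part in (name, label, prop):
        if not part or "`" in part:
            raise ValueError(f"unsupported identifier: {part!r}")
    guard = " IF NOT EXISTS" if if_not_exists else ""
    return (
        f"CREATE CONSTRAINT `{name}`{guard}\n"
        f"FOR (n:`{label}`)\n"
        f"REQUIRE n.`{prop}` IS UNIQUE"
    )


print(build_unique_constraint("Person_tmdbId", "Person", "tmdbId"))
```

The returned string can then be run like any other Cypher statement in this notebook.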

## Person node constraint

The Person nodes you created should all have a unique `tmdbId` property.

You can create a constraint on the `tmdbId` property to ensure that no two Person nodes share the same `tmdbId` value.

Review the following Cypher statement:

```cypher
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (x:Person)
REQUIRE x.tmdbId IS UNIQUE
```

- The constraint name is Person_tmdbId.

- The optional `IF NOT EXISTS` clause is used. Without it, Neo4j would raise an error if the constraint already exists.

- It applies to all nodes with the Person label.

- It requires the tmdbId property to be unique.


Once the constraint exists, trying to create a Person node with a duplicate `tmdbId` value makes Neo4j raise an error:

```cypher
CREATE (p:Person {tmdbId: 3}) RETURN p
```

=>

```
22N80: Data exception - index entry conflict
Index entry conflict: Node(0) already exists with label `Label[0]` and property `PropertyKey[10]` = 3.
```

In [14]:
import pandas as pd

In [5]:
cypher = textwrap.dedent("""
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (x:Person)
REQUIRE x.tmdbId IS UNIQUE
""")

res = execute_query(neo4j_driver, cypher)

<neo4j._sync.work.result.Result object at 0x7f4877fbc7d0>


### Show constraint

You can check that the constraint has been created by running the following Cypher statement:

```cypher
SHOW CONSTRAINTS
```

In [28]:
cypher = textwrap.dedent("""
SHOW CONSTRAINTS
""")

res = execute_query(neo4j_driver, cypher)

res

<neo4j._sync.work.result.Result at 0x7f48761dfad0>

### Drop constraint

If you need to drop a constraint, use the following Cypher statement:

```cypher
DROP CONSTRAINT constraint_name [IF EXISTS]
```