# Short Term Rentals - Data Import

We're going to work with a short term rentals dataset that we got from [InsideAirbnb](http://insideairbnb.com/). We'll be using Neo4j via the popular py2neo library.

We'll start by importing py2neo and the pandas library which we'll be using to play around with the data later on.

In [3]:
from py2neo import Graph
import pandas as pd

In [4]:
graph = Graph("bolt://localhost", auth=("neo4j", "neo"))

Let's create some variables for our import CSV files:

In [6]:
# listings_file = "https://guides.neo4j.com/listings/data/nyc/listings.csv.gz"
# reviews_file = "https://guides.neo4j.com/listings/data/nyc/reviews.csv.gz"

listings_file = "file:///listings.csv.gz"
reviews_file = "file:///reviews.csv.gz"

Now it's time to load the data into Neo4j. This is the graph that we want to create:

"arrows diagram here"

## Listings

In [11]:
constraint_query = """
CREATE CONSTRAINT ON (l:Listing)
ASSERT l.id IS UNIQUE
"""

import_query = """
LOAD CSV WITH HEADERS FROM $listingsFile AS row
WITH row WHERE row.id IS NOT NULL
MERGE (l:Listing {id: row.id})
SET l.name = row.name,
    l.price = toFloat(substring(row.price, 1)),
    l.weeklyPrice = toFloat(substring(row.weekly_price, 1)),
    l.cleaningFee = toFloat(substring(row.cleaning_fee, 1)),
    l.propertyType = row.property_type,
    l.accommodates = toInt(row.accommodates),
    l.bedrooms = toInt(row.bedrooms),
    l.bathrooms = toInt(row.bathrooms),
    l.availability365 = toInt(row.availability_365)
"""

graph.run(constraint_query).summary().counters
graph.run(import_query, {"listingsFile": listings_file}).summary().counters

{'labels_added': 50914, 'nodes_created': 50914, 'properties_set': 450338}

## Neighborhoods

In [12]:
constraint_query = """
CREATE CONSTRAINT ON (n:Neighborhood) 
ASSERT n.id IS UNIQUE
"""

import_query = """
LOAD CSV WITH HEADERS FROM $listingsFile AS row
WITH row WHERE row.id IS NOT NULL
MATCH (l:Listing {id: row.id})
MERGE (n:Neighborhood {id: coalesce(row.neighbourhood_cleansed, "NA")})
ON CREATE SET n.name = row.neighbourhood
MERGE (l)-[:IN_NEIGHBORHOOD]->(n);
"""

graph.run(constraint_query).summary().counters
graph.run(import_query, {"listingsFile": listings_file}).summary().counters

{'labels_added': 224, 'relationships_created': 50914, 'nodes_created': 224, 'properties_set': 439}

## Amenities

In [14]:
constraint_query = """
CREATE CONSTRAINT ON (a:Amenity) 
ASSERT a.name IS UNIQUE;
"""

import_query = """
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM $listingsFile AS row
WITH row WHERE row.id IS NOT NULL
MATCH (l:Listing {id: row.id})
WITH l, split(replace(replace(replace(row.amenities, '{', ''), '}', ''), '\"', ''), ',') AS amenities
UNWIND amenities AS amenity
MERGE (a:Amenity {name: amenity})
MERGE (l)-[:HAS]->(a)
"""

graph.run(constraint_query).summary().counters
graph.run(import_query, {"listingsFile": listings_file}).summary().counters

{'labels_added': 127, 'relationships_created': 981512, 'nodes_created': 127, 'properties_set': 127}

## Hosts

In [15]:
constraint_query = """
CREATE CONSTRAINT ON (h:Host) 
ASSERT h.id IS UNIQUE
"""

import_query = """
LOAD CSV WITH HEADERS FROM $listingsFile AS row
WITH row WHERE row.host_id IS NOT NULL
MERGE (h:Host {id: row.host_id})
ON CREATE SET h.name      = row.host_name,
              h.about     = row.host_abot,
              h.superhost = CASE WHEN row.host_is_super_host = "t" THEN True ELSE False END,
              h.location  = row.host_location,
              h.image     = row.host_picture_url
WITH row, h
MATCH (l:Listing {id: row.id})
MERGE (h)-[:HOSTS]->(l);
"""

graph.run(constraint_query)
graph.run(import_query, {"listingsFile": listings_file}).summary().counters

{'labels_added': 40309, 'relationships_created': 50914, 'nodes_created': 40309, 'properties_set': 201383}

## Reviews

In [21]:
user_constraint_query = """
CREATE CONSTRAINT ON (u:User) 
ASSERT u.id IS UNIQUE
"""

review_constraint_query = """
CREATE CONSTRAINT ON (r:Review) 
ASSERT r.id IS UNIQUE
"""


import_query = """
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM $reviewsFile AS row

// User
MERGE (u:User {id: row.reviewer_id})
SET u.name = row.reviewer_name

// Review
MERGE (r:Review {id: row.id})
SET r.date     = row.date,
    r.comments = row.comments
WITH row, u, r
MATCH (l:Listing {id: row.listing_id})
MERGE (u)-[:WROTE]->(r)
MERGE (r)-[:REVIEWS]->(l);
"""

graph.run(user_constraint_query).summary().counters
graph.run(review_constraint_query).summary().counters
graph.run(import_query, {"reviewsFile": reviews_file}).summary().counters

{'labels_added': 649493, 'relationships_created': 719078, 'nodes_created': 649493, 'properties_set': 3678110}

In the next notebook we'll explore the data we've imported.