### Terminal commands to download and extract the Airbnb data

In [None]:
gsutil cp gs://cs327e-open-access/airbnb.zip .
unzip airbnb.zip

### Set up the CONNECT variable

In [1]:
%env CYPHER=/home/jupyter/neo4j-community-4.1.3/bin/cypher-shell
%env USER=neo4j
%env PW=neopass

env: CYPHER=/home/jupyter/neo4j-community-4.1.3/bin/cypher-shell
env: USER=neo4j
env: PW=neopass


In [2]:
CONNECT="$CYPHER -u $USER -p $PW"

In [3]:
!{CONNECT} "SHOW DATABASES"

+------------------------------------------------------------------------------------------------+
| name     | address          | role         | requestedStatus | currentStatus | error | default |
+------------------------------------------------------------------------------------------------+
| "neo4j"  | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
| "system" | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
+------------------------------------------------------------------------------------------------+

2 rows available after 776 ms, consumed after another 106 ms
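A note on how the `!{CONNECT} "..."` lines work: IPython first replaces `{CONNECT}` with the Python string, and the shell then expands `$CYPHER`, `$USER`, and `$PW` from the `%env` settings. The sketch below approximates both stages in plain Python. It assumes IPython's brace handling behaves like `str.format`, where doubled braces become literal ones, which would explain why the Q4 query writes `{{name:'Washer'}}`. `preview_command` is a hypothetical helper, and the Amenity query is only an illustration, not one of the notebook's cells.

```python
from string import Template

# Values copied from the %env and CONNECT cells above.
ENV = {
    "CYPHER": "/home/jupyter/neo4j-community-4.1.3/bin/cypher-shell",
    "USER": "neo4j",
    "PW": "neopass",
}
CONNECT = "$CYPHER -u $USER -p $PW"


def preview_command(line, py_vars, env):
    """Hypothetical helper: approximate the command the shell ends up running."""
    # Stage 1 (IPython, approximated): {name} becomes the Python variable,
    # while {{ and }} collapse to literal braces.
    interpolated = line.format(**py_vars)
    # Stage 2 (shell, approximated): $NAME becomes the environment variable.
    return Template(interpolated).safe_substitute(env)


# Illustrative query with a Cypher property map, so the braces are doubled.
line = "{CONNECT} \"MATCH (a:Amenity{{name:'Washer'}}) RETURN count(a)\""
print(preview_command(line, {"CONNECT": CONNECT}, ENV))
```

Printing the preview before running a long query is a cheap way to spot a missing brace escape or an unset environment variable.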


### Empty the database

In [4]:
!{CONNECT} "MATCH (n) DETACH DELETE n"

0 rows available after 50 ms, consumed after another 0 ms


In [5]:
# Confirm the database is empty
!{CONNECT} "MATCH (n) RETURN n"

+---+
| n |
+---+
+---+

0 rows available after 29 ms, consumed after another 1 ms


### Load Airbnb data into Neo4j

In [6]:
!cat /home/jupyter/Amaryllis/airbnb/load_data.cypher | {CONNECT} --format plain

COUNT(l)
5835
COUNT(a)
42
COUNT(n)
41
COUNT(h)
4633
COUNT(u)
55917
COUNT(r)
62976


### Verify successful data load (Goal: 129,444 nodes)

In [4]:
!{CONNECT} "MATCH (n) RETURN count(n)"

+----------+
| count(n) |
+----------+
| 129444   |
+----------+

1 row available after 197 ms, consumed after another 1 ms


### Get Node Counts for Unique Node Labels

In [5]:
!{CONNECT} "MATCH (n) RETURN distinct labels(n), count(n)"

+-----------------------------+
| labels(n)        | count(n) |
+-----------------------------+
| ["Listing"]      | 5835     |
| ["Amenity"]      | 42       |
| ["Neighborhood"] | 41       |
| ["Host"]         | 4633     |
| ["User"]         | 55917    |
| ["Review"]       | 62976    |
+-----------------------------+

6 rows available after 70 ms, consumed after another 414 ms
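As a quick cross-check, the per-label counts should add up to the 129,444-node goal. A minimal sketch in plain Python, with the counts copied from the table above (the variable names are illustrative):

```python
# Per-label node counts copied from the cypher-shell output above.
label_counts = {
    "Listing": 5835,
    "Amenity": 42,
    "Neighborhood": 41,
    "Host": 4633,
    "User": 55917,
    "Review": 62976,
}

EXPECTED_TOTAL = 129_444  # goal stated in the verification heading

total = sum(label_counts.values())
print(f"total nodes: {total}, matches goal: {total == EXPECTED_TOTAL}")
```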


### Sample Data: 10 Arbitrary Nodes

In [6]:
!{CONNECT} "MATCH (n) RETURN n LIMIT 10"

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Listing {bedrooms: 1, listing_id: "72635", availability_365: 240, price: 300.0, accommodates: 6, name: "3 Private Bedrooms, SW Austin", property_type: "House", bathrooms: 2})                                      |
| (:Listing {bedrooms: 1, cleaning_fee: 75.0, weekly_price: 600.0, listing_id: "5386323", availability_365: 364, price: 99.0, ac

### Sample Data: 10 Arbitrary Relationships

In [7]:
!{CONNECT} "MATCH ()-[r]->() RETURN r LIMIT 10"

+------------+
| r          |
+------------+
| [:REVIEWS] |
| [:HOSTS]   |
| [:HOSTS]   |
| [:HOSTS]   |
| [:HOSTS]   |
| [:HOSTS]   |
| [:HOSTS]   |
| [:HOSTS]   |
| [:REVIEWS] |
| [:REVIEWS] |
+------------+

10 rows available after 112 ms, consumed after another 127 ms


### Q1. How many hosts are located in "Austin, Texas, United States"?

In [8]:
!{CONNECT} "MATCH (h:Host) WHERE h.location = 'Austin, Texas, United States' RETURN count(h)"

+----------+
| count(h) |
+----------+
| 3774     |
+----------+

1 row available after 77 ms, consumed after another 136 ms


### Q2. Which listings does host_id = "4641823" have? Return the listing name, property_type, price, and availability_365 sorted by price. Limit the results to 10.

In [9]:
!{CONNECT} "MATCH (h:Host)-[r]->(l:Listing) WHERE h.host_id='4641823' RETURN l.name, l.property_type, l.price, l.availability_365 ORDER BY l.price LIMIT 10"

+----------------------------------------------------------------------------------------+
| l.name                                | l.property_type | l.price | l.availability_365 |
+----------------------------------------------------------------------------------------+
| "1BR Convenient Austin Condo "        | "Apartment"     | 93.0    | 354                |
| "1BR Inviting Downtown Condo, Pool"   | "Apartment"     | 99.0    | 335                |
| "2BR/1.5BA Charming House Off SoCo"   | "House"         | 110.0   | 357                |
| "2BR Prime East-Side Downtown"        | "House"         | 121.0   | 341                |
| "1BR SoCo Treehouse Studio"           | "House"         | 129.0   | 327                |
| "1BR/1.5BA East 6th, Colorful 2Story" | "Apartment"     | 134.0   | 344                |
| "3BR Prestigious Home Travis Heights" | "House"         | 138.0   | 0                  |
| "1BR/1.5BA Perfectly Located Casita"  | "House"         | 140.0   | 351                |

### Q3. Which users wrote a review for listing_id = "5293632"? Return the user’s id and name sorted alphabetically by name. Limit the results to 10.

In [10]:
!{CONNECT} "MATCH (u:User)-[w:WROTE]->(r:Review)-[p:REVIEWS]->(l:Listing) WHERE l.listing_id = '5293632' RETURN u.user_id, u.name ORDER BY u.name LIMIT 10" 

+--------------------------------+
| u.user_id  | u.name            |
+--------------------------------+
| "18286390" | "Annie"           |
| "30193020" | "Carole"          |
| "16497582" | "Cory"            |
| "35022795" | "Craig And Trina" |
| "13281665" | "Dianne"          |
| "29601600" | "Hannah"          |
| "11940539" | "Jacob"           |
| "3213433"  | "Jessie"          |
| "41722221" | "Johannes"        |
| "28480571" | "Ju-Ju"           |
+--------------------------------+

10 rows available after 129 ms, consumed after another 57 ms


### Q4. Which users wrote a review for any listing which has the amenities "Washer" and "Dryer"? Return the user’s id and name sorted alphabetically by name. Limit the results to 10.

In [11]:
!{CONNECT} "MATCH (u:User)-[w:WROTE]->(r:Review)-[p:REVIEWS]->(l:Listing) WHERE (:Amenity{{name:'Washer'}})<-[:HAS]-(l)-[:HAS]->(:Amenity{{name:'Dryer'}}) RETURN u.user_id, u.name ORDER BY u.name LIMIT 10"

+-------------------------------------+
| u.user_id  | u.name                 |
+-------------------------------------+
| "6524431"  | "'Ley"                 |
| "8026901"  | "(We Are) Bonnie & Ky" |
| "14689717" | "(email hidden)"       |
| "11495251" | "(email hidden)"       |
| "10251681" | "(email hidden)"       |
| "8293309"  | "(email hidden)"       |
| "15315643" | "(email hidden)"       |
| "12694638" | "(email hidden)"       |
| "13381969" | "(email hidden)"       |
| "12694638" | "(email hidden)"       |
+-------------------------------------+

10 rows available after 501 ms, consumed after another 2341 ms
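Note that user "12694638" appears twice in the result above. The query returns one row per matching path, so a user with more than one qualifying review is listed more than once. In Cypher, `RETURN DISTINCT u.user_id, u.name` would remove the repeats inside the database (and could then change which ten rows fit under the limit). The sketch below only shows the effect on the rows already printed, deduplicating them in Python while keeping their order. The variable names are illustrative.

```python
# Rows copied from the Q4 output above, as (user_id, name) pairs.
rows = [
    ("6524431", "'Ley"),
    ("8026901", "(We Are) Bonnie & Ky"),
    ("14689717", "(email hidden)"),
    ("11495251", "(email hidden)"),
    ("10251681", "(email hidden)"),
    ("8293309", "(email hidden)"),
    ("15315643", "(email hidden)"),
    ("12694638", "(email hidden)"),
    ("13381969", "(email hidden)"),
    ("12694638", "(email hidden)"),
]

# dict.fromkeys keeps the first occurrence of each pair and preserves order.
unique_rows = list(dict.fromkeys(rows))

print(f"{len(rows)} rows returned, {len(unique_rows)} distinct users")
```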


### Q5. Which listings have 3 bedrooms and are located in the Clarksville neighborhood? Return the listing name, property_type, price, and availability_365 sorted by price. Limit the results to 5.

In [12]:
!{CONNECT} "MATCH (l:Listing)-[r]->(n:Neighborhood) WHERE l.bedrooms=3 AND n.name='Clarksville' RETURN l.name, l.property_type, l.price, l.availability_365 ORDER BY l.price LIMIT 5"

+----------------------------------------------------------------------------------------+
| l.name                                | l.property_type | l.price | l.availability_365 |
+----------------------------------------------------------------------------------------+
| "3BR/2.5BA Exquisite Townhouse"       | "House"         | 222.0   | 358                |
| "3BR/2.5BA Tarrytown Duplex, Austin!" | "House"         | 249.0   | 336                |
| "Austin downtown hideaway"            | "House"         | 249.0   | 364                |
| "3BD Luxury Cottage by Lake Austin"   | "House"         | 290.0   | 309                |
| "Entire Adorable Downtown House"      | "House"         | 295.0   | 309                |
+----------------------------------------------------------------------------------------+

5 rows available after 60 ms, consumed after another 19 ms


### Q6. Which amenities are the most common? Return the name of the amenity and its frequency. Sort the results by count in descending order. Limit the results to 5.

In [13]:
!{CONNECT} "MATCH (l:Listing)-[r]->(a:Amenity) RETURN a.name, count(*) as frequency ORDER BY frequency DESC LIMIT 5"

+----------------------------------------+
| a.name                     | frequency |
+----------------------------------------+
| "Air Conditioning"         | 5615      |
| "Wireless Internet"        | 5479      |
| "Heating"                  | 5440      |
| "Kitchen"                  | 5400      |
| "Free Parking on Premises" | 5123      |
+----------------------------------------+

5 rows available after 39 ms, consumed after another 277 ms


### Q7. Which neighborhoods have the highest number of listings? Return each neighborhood’s name and zip code (neighborhood_id) along with its number of listings, sorted by the number of listings in descending order. Limit the results to 5.

In [14]:
!{CONNECT} "MATCH (l:Listing)-[r]->(n:Neighborhood) RETURN n.name, n.neighborhood_id, count(*) as num_of_listings ORDER BY num_of_listings DESC LIMIT 5"

+--------------------------------------------------------+
| n.name           | n.neighborhood_id | num_of_listings |
+--------------------------------------------------------+
| NULL             | "78704"           | 1601            |
| NULL             | "78702"           | 797             |
| "Clarksville"    | "78703"           | 419             |
| "East Riverside" | "78741"           | 414             |
| NULL             | "78745"           | 328             |
+--------------------------------------------------------+

5 rows available after 33 ms, consumed after another 31 ms
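Three of the five neighborhoods above have a NULL name, so the zip code is the only usable label for them. One way to handle this in the query itself would be Cypher's `coalesce(n.name, n.neighborhood_id)`. The sketch below applies the same fallback in Python to the rows already returned, assuming NULL maps to `None`. The variable names are illustrative.

```python
# Rows copied from the Q7 output above: (name, neighborhood_id, num_of_listings).
rows = [
    (None, "78704", 1601),
    (None, "78702", 797),
    ("Clarksville", "78703", 419),
    ("East Riverside", "78741", 414),
    (None, "78745", 328),
]

# Fall back to the zip code whenever the neighborhood name is missing.
labels = [name if name is not None else zip_code for name, zip_code, _ in rows]

for label, (_, _, count) in zip(labels, rows):
    print(f"{label}: {count} listings")
```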
