## Question 4
I set up an SSH tunnel to work with the sample_airbnb database in Python.

#### Server set up

In [18]:
from sshtunnel import SSHTunnelForwarder
from pymongo import MongoClient
from random import sample
from pprint import pprint
from getpass import getpass

In [20]:
MONGO_HOST = "10.10.11.10"
MONGO_DB = "sample_airbnb"
MONGO_USER = "mtweed"
MONGO_PASS = getpass("Enter your password: ")

Enter your password: ········


In [21]:
server = SSHTunnelForwarder(
    MONGO_HOST,
    ssh_username=MONGO_USER,
    ssh_password=MONGO_PASS,
    remote_bind_address=('127.0.0.1', 27017)
)

In [22]:
server.start()

In [23]:
client = MongoClient('127.0.0.1', server.local_bind_port)

In [24]:
db = client[MONGO_DB]

In [26]:
listings = db.listingsAndReviews

I began by examining an example document to see which fields were present and to determine whether, and how, the required queries could be made.

In [28]:
cursor = listings.find()

In [32]:
pprint(cursor.next())

{'_id': '13317720',
 'access': '',
 'accommodates': 4,
 'address': {'country': 'Brazil',
             'country_code': 'BR',
             'government_area': 'Recreio dos Bandeirantes',
             'location': {'coordinates': [-43.48662201316989,
                                          -23.022882730052668],
                          'is_location_exact': True,
                          'type': 'Point'},
             'market': 'Rio De Janeiro',
             'street': 'Rio de Janeiro, Rio de Janeiro, Brazil',
             'suburb': 'Recreio dos Bandeirantes'},
 'amenities': ['TV',
               'Air conditioning',
               'Pool',
               'Kitchen',
               'Free parking on premises',
               'Smoking allowed',
               'Gym',
               'Elevator',
               'Family/kid friendly',
               'Washer',
               'Dryer',
               'Smoke detector',
               'First aid kit',
               'Fire extinguisher',
               '
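
Printing one document only shows the fields that listing happens to have. A minimal sketch (the helper `field_coverage` is hypothetical, not part of pymongo) that counts how many documents in a small sample contain each top-level field, which makes optional fields easy to spot before writing queries against them:

```python
from collections import Counter
from typing import Iterable, Mapping


def field_coverage(docs: Iterable[Mapping]) -> Counter:
    """Count how many of the given documents contain each top-level field.

    A field whose count is lower than the number of documents sampled is
    optional, so a query on it can only ever return the listings that have it.
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return counts


# Usage with the collection handle from above (a sample of 100 is arbitrary):
# pprint(field_coverage(listings.find().limit(100)).most_common())
```

Because a pymongo cursor is iterable, it can be passed straight in; plain dicts work too.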

### Part a

I found three fields that could be searched to find listings that are pet or kid friendly: summary, description, and amenities. The phrases 'pet friendly' and 'kid friendly' appeared in the summary and description fields, so I first used this query to return listings that contain either phrase in either field.

In [46]:
cursor = listings.find({'$or' : [{'summary' : {'$regex' : 'pet friendly'}}, 
                                 {'summary' : {'$regex' : 'kid friendly'}},
                                 {'description' : {'$regex' : 'pet friendly'}}, 
                                 {'description' : {'$regex' : 'kid friendly'}}]},
                                 {'_id':1, 'summary':1,'description':1, 'amenities':1})

In [47]:
pprint(cursor.next())

{'_id': '18250284',
 'amenities': ['TV',
               'Cable TV',
               'Internet',
               'Wifi',
               'Air conditioning',
               'Kitchen',
               'Pets allowed',
               'Pets live on this property',
               'Dog(s)',
               'Heating',
               'Family/kid friendly',
               'Smoke detector',
               'Carbon monoxide detector',
               'First aid kit',
               'Fire extinguisher',
               'Essentials',
               'Shampoo',
               'Lock on bedroom door',
               'Hangers',
               'translation missing: en.hosting_amenity_50',
               'TV',
               'Host greets you'],
 'description': 'Perfect to explore the whole city - central to subways and '
                'easy access to all parts of the city.  Centrally located just '
                'one block north of Central Park in an area now known as South '
                'Harlem.   We have 
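
MongoDB's `$regex` is case-sensitive by default, so a summary that begins "Pet friendly ..." would not match the clauses above. A minimal sketch (the helper `phrase_query` is hypothetical) that builds an equivalent `$or` more compactly: one clause per field, each holding a single case-insensitive alternation of the escaped phrases:

```python
import re


def phrase_query(fields, phrases):
    """Build a filter that matches any of `phrases` in any of `fields`.

    re.escape() makes each phrase literal, the phrases are joined into one
    alternation, and the "i" option makes MongoDB's $regex case-insensitive.
    """
    pattern = "|".join(re.escape(p) for p in phrases)
    return {"$or": [{f: {"$regex": pattern, "$options": "i"}} for f in fields]}


query = phrase_query(["summary", "description"], ["pet friendly", "kid friendly"])
# cursor = listings.find(query, {'_id': 1, 'summary': 1, 'description': 1, 'amenities': 1})
```

Adding a phrase or a field then means editing a list rather than writing another clause.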

This search seemed effective; however, I found some cases suggesting that it was not the best way to query this information. For example, this listing uses 'pet friendly' in its description to say that guests should be comfortable with her cat, but the listing does not actually allow guests to bring pets.

In [58]:
cursor = listings.find({'_id':'18944834'},
                       {'summary':1,'description':1,'amenities':1})

In [59]:
pprint(cursor.next())

{'_id': '18944834',
 'amenities': ['TV',
               'Wifi',
               'Kitchen',
               'Paid parking off premises',
               'Breakfast',
               'Pets live on this property',
               'Elevator',
               'Heating',
               'Washer',
               'Dryer',
               'Smoke detector',
               'Carbon monoxide detector',
               'First aid kit',
               'Essentials',
               'Shampoo',
               'Hangers',
               'Hair dryer',
               'translation missing: en.hosting_amenity_49',
               'translation missing: en.hosting_amenity_50',
               'Hot water',
               'Other'],
 'description': 'spectacular private bedroom in UES! a lot of natural lights! '
                'You can use kitchen and we will share a bathroom you will '
                'have your own private bedroom and we will share bathroom, '
                'kitchen and living room a lot of bars and resta

Another listing uses 'pet friendly' in its description but does not allow guests to bring pets:

In [56]:
cursor = listings.find({'_id':'13839238'},
                       {'summary':1,'description':1,'amenities':1})

In [57]:
pprint(cursor.next())

{'_id': '13839238',
 'amenities': ['Wifi',
               'Air conditioning',
               'Kitchen',
               'Gym',
               'Breakfast',
               'Elevator',
               'Heating',
               'Family/kid friendly',
               'Washer',
               'Dryer',
               'Smoke detector',
               'Fire extinguisher',
               'Essentials',
               'Shampoo',
               'Hair dryer',
               'translation missing: en.hosting_amenity_50'],
 'description': 'Bright room located in a two bedrooms apartment in BedStuy. '
                'New building with elevator, gym, rooftop and laundry in '
                'facilities. There is a cool coffee shop across the street and '
                'also a wine store.  Ideal for solo adventurers. The room is '
                'perfect for someone who wants to discover the city, easy '
                'access to main attractions in Brooklyn and Manhattan. As any '
                'othe
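
Cases like the two above can be flagged automatically by comparing the text match with the amenities list. A minimal sketch, assuming that "Pets allowed" is the amenity meaning guests may bring pets (whereas "Pets live on this property", seen in listing 18944834, describes the host's animals); the function names are hypothetical:

```python
# Assumption: "Pets allowed" is the amenity that means guests may bring pets.
PETS_ALLOWED = "Pets allowed"


def mentions_pet_friendly(doc):
    """True if the summary or description contains 'pet friendly' (any case)."""
    text = " ".join((doc.get(f) or "") for f in ("summary", "description"))
    return "pet friendly" in text.lower()


def pet_text_mismatches(docs):
    """Yield the _id of each listing whose text says 'pet friendly' but whose
    amenities do not include 'Pets allowed'."""
    for doc in docs:
        if mentions_pet_friendly(doc) and PETS_ALLOWED not in doc.get("amenities", []):
            yield doc["_id"]


# Usage: pass a fresh cursor from the text query, projected to
# summary, description and amenities, e.g. list(pet_text_mismatches(cursor)).
```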

Because of this, I decided that the best way to query this information was to use the 'amenities' field. Another advantage of this field is that its entries are all exact phrases. The phrases that indicate pet and kid friendliness are "Pets allowed", "Dog(s)", "Cat(s)", "Other pet(s)", "Family/kid friendly", and "Children’s books and toys". I used the following query to search for these terms.

In [67]:
# $in matches whole array elements exactly, so "(s)" is plain text here rather than a regex group
cursor = listings.find({'amenities' : {'$in' : ['Pets allowed',
                                                'Family/kid friendly',
                                                'Dog(s)',
                                                'Cat(s)',
                                                'Other pet(s)',
                                                'Children’s books and toys']}},
                       {'_id':1, 'summary':1,'description':1, 'amenities':1})

In [68]:
pprint(cursor.next())

{'_id': '13316078',
 'amenities': ['TV',
               'Wifi',
               'Air conditioning',
               'Kitchen',
               'Free parking on premises',
               'Smoking allowed',
               'Gym',
               'Elevator',
               'Family/kid friendly',
               'Washer',
               'Essentials',
               'Shampoo',
               'Lock on bedroom door',
               'Hair dryer',
               'Iron',
               'Laptop friendly workspace'],
 'description': 'Meu espaço é perto de Próximo ao shopping, praia, '
                'supermercado, restaurantes. Você vai amar meu espaço por '
                'causa de o pé-direito alto, a localização e o ambiente. Meu '
                'espaço é bom para famílias (com crianças) e grandes grupos. '
                'Próximo ao estádio das olimpiadas',
 'summary': 'Meu espaço é perto de Próximo ao shopping, praia, supermercado, '
            'restaurantes. Você vai amar meu espaço por caus
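
Since the amenity entries are exact phrases, they can be matched with `$in` instead of `$regex`; this also sidesteps the fact that in a regex the parentheses in 'Dog(s)' form a group, so that pattern looks for "Dogs" rather than the literal text "Dog(s)". A minimal sketch that splits the phrases into pet and kid groups, giving both an "either" and a "both" filter plus a helper reporting which group a returned listing matched. It assumes the stored phrase uses the same curly apostrophe as "Children’s books and toys" above; the names are hypothetical:

```python
PET_TERMS = ["Pets allowed", "Dog(s)", "Cat(s)", "Other pet(s)"]
KID_TERMS = ["Family/kid friendly", "Children’s books and toys"]

# $in on an array field matches when any element equals any listed value,
# so these are exact-phrase matches and "(s)" needs no regex escaping.
pet_or_kid = {"amenities": {"$in": PET_TERMS + KID_TERMS}}
pet_and_kid = {"$and": [{"amenities": {"$in": PET_TERMS}},
                        {"amenities": {"$in": KID_TERMS}}]}


def friendliness(doc):
    """Report which group(s) of amenity phrases a returned listing matched."""
    amenities = set(doc.get("amenities", []))
    return {"pet": not amenities.isdisjoint(PET_TERMS),
            "kid": not amenities.isdisjoint(KID_TERMS)}


# cursor = listings.find(pet_or_kid, {'_id': 1, 'amenities': 1})
```

For the listing printed above (13316078), `friendliness` would report kid friendly but not pet friendly, since its only matching amenity is "Family/kid friendly".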

### Part b

I do not think it would be possible to find, with absolute certainty, every listing that is within 1 km of public transportation using only the data in this database. If we had the coordinates of every public transportation stop, we could use each listing's latitude and longitude to measure its distance to those stops and then return every listing within 1 km of one. Even though that is not very realistic for this dataset, the transit field of many listings (especially those in Europe) contains information about the nearest public transportation. We can write a general query to find any mention of local public transportation by searching the transit field for as many indicator words as possible (I used bus, subway, walking, trolley, tram, metro, underground, and station).

In [73]:
cursor = listings.find({'$or' : [{'transit' : {'$regex':'bus'}}, 
                                 {'transit' : {'$regex':'subway'}}, 
                                 {'transit' : {'$regex':'walking'}}, 
                                 {'transit' : {'$regex':'trolley'}}, 
                                 {'transit' : {'$regex':'tram'}}, 
                                 {'transit' : {'$regex':'metro'}}, 
                                 {'transit' : {'$regex':'underground'}}, 
                                 {'transit' : {'$regex':'station'}}]}, 
                       {'address':1,'transit':1})

In [74]:
pprint(cursor.next())

{'_id': '12396367',
 'address': {'country': 'Spain',
             'country_code': 'ES',
             'government_area': 'Sant Pere, Santa Caterina i la Ribera',
             'location': {'coordinates': [2.18116, 41.38719],
                          'is_location_exact': False,
                          'type': 'Point'},
             'market': 'Barcelona',
             'street': 'Barcelona, CT, Spain',
             'suburb': 'Ciutat Vella'},
 'transit': 'Near to metro Arc de Triomph but you can also approach any barrio '
            'walking: 10min to Gothic, 15min to Raval, 10min to Passeig de '
            'Gracia and shopping zone!!'}
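
The keyword query above matches substrings case-sensitively, so 'bus' also matches "busy" or "business", while "Metro" or "Bus" at the start of a sentence is missed. A minimal sketch using the same indicator words in one case-insensitive pattern with word boundaries (the names are hypothetical; "walking" is kept from the original list even though it also matches walking directions unrelated to transit):

```python
import re

# Same indicator words used above.
TRANSIT_WORDS = ["bus", "subway", "walking", "trolley", "tram",
                 "metro", "underground", "station"]

# \b stops "bus" from matching inside "busy" or "business"; the optional
# s?(?:es)? suffix still accepts plurals such as "trams", "buses" and "busses".
TRANSIT_PATTERN = r"\b(?:" + "|".join(TRANSIT_WORDS) + r")s?(?:es)?\b"

# "i" makes the match case-insensitive, so "Metro" and "Bus" are found too.
transit_query = {"transit": {"$regex": TRANSIT_PATTERN, "$options": "i"}}
# cursor = listings.find(transit_query, {'address': 1, 'transit': 1})
```

The plural suffix matters for this data: the Sydney listing's transit text spells the word "busses".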


Additionally, many of the listings (particularly those in Europe) state the distance to nearby public transport, so we can add an additional constraint to the query: the transit field must also contain '1km'.

In [75]:
cursor = listings.find({'$or' : [{'transit' : {'$regex':'bus'}}, 
                                 {'transit' : {'$regex':'subway'}}, 
                                 {'transit' : {'$regex':'walking'}}, 
                                 {'transit' : {'$regex':'trolley'}}, 
                                 {'transit' : {'$regex':'tram'}}, 
                                 {'transit' : {'$regex':'metro'}}, 
                                 {'transit' : {'$regex':'underground'}}, 
                                 {'transit' : {'$regex':'station'}}],
                       "transit": {'$regex': '1km'}}, 
                       {'address':1,'transit':1})

In [76]:
pprint(cursor.next())

{'_id': '15748956',
 'address': {'country': 'Australia',
             'country_code': 'AU',
             'government_area': 'Sydney',
             'location': {'coordinates': [151.18334, -33.88081],
                          'is_location_exact': True,
                          'type': 'Point'},
             'market': 'Sydney',
             'street': 'Glebe, NSW, Australia',
             'suburb': 'Glebe'},
 'transit': 'Sydney Airport - 9km away or 20mins by car depending on traffic.  '
            'Light Rail - Glebe Station is less than 1km or a 10 minute walk '
            'Darling Harbour - 3km away Sydney CBD - 3.5 km accessible by '
            'foot, busses, light rail  Sydney University and Broadway shopping '
            '- 1km away, easy walk. Blackwater Bay - 1km away to gorgeous park'}
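
Matching the literal string '1km' is a coarse filter: it also matches "11km" or "21km", misses "1 km", "0.5 km" and "500m", and cannot tell what the distance refers to (the Sydney result above matched, but the same text also says the shopping area is "1km away"). A sketch of a tighter pattern; the distance formats it accepts are an assumption about how hosts write distances, not something checked against the whole collection:

```python
import re

# Assumed distance formats: "1km", "1 km", "1.0 km", "0.5km", or up to three
# digits followed by a metre unit ("500m", "200 metres").
# The pattern rejects "11km", "1.5 km" and "2000m".
WITHIN_1KM_PATTERN = (
    r"\b(?:0(?:\.\d+)?|1(?:\.0+)?)\s?km\b"
    r"|\b\d{1,3}\s?(?:m|meters|metres)\b"
)

within_1km_query = {"transit": {"$regex": WITHIN_1KM_PATTERN, "$options": "i"}}
# Drop-in replacement for the "transit": {'$regex': '1km'} condition above:
#   "transit": within_1km_query["transit"]
```

It still only narrows candidates; a human (or the coordinates) must confirm the distance is to a transit stop.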


In [78]:
client.close()
server.stop()

In [79]:
server.is_active

False
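
Part b noted that, given the coordinates of every transit stop, listings could be placed relative to them. A minimal sketch of that idea, assuming such a stop list existed (sample_airbnb does not contain one). It builds a `$geoWithin`/`$centerSphere` filter on `address.location`, which stores GeoJSON points as [longitude, latitude], and includes a haversine helper for checking results client-side. `$geoWithin` does not require a geospatial index, though one would speed it up. The helper names are hypothetical, and running the query needs an open connection:

```python
import math

# MongoDB's $centerSphere takes its radius in radians; its documentation
# converts kilometres by dividing by the equatorial radius, 6378.1 km.
EARTH_RADIUS_KM = 6378.1


def within_km_filter(lon, lat, km):
    """Filter for listings whose address.location is within `km` of (lon, lat).

    Coordinates are [longitude, latitude], the GeoJSON order used by
    address.location.coordinates in this collection.
    """
    return {"address.location": {
        "$geoWithin": {"$centerSphere": [[lon, lat], km / EARTH_RADIUS_KM]}}}


def haversine_km(lon1, lat1, lon2, lat2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km, for checking results client-side."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical: `stops` would be (lon, lat) pairs from a separate transit
# dataset; sample_airbnb does not contain one.
# query = {"$or": [within_km_filter(lon, lat, 1.0) for lon, lat in stops]}
# cursor = listings.find(query, {'address': 1, 'transit': 1})
```

Note that listings with `is_location_exact: False` (such as the Barcelona example) have approximate coordinates, so any distance computed from them is approximate too.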