# Question 2

#### Imports

I downloaded the Yelp academic dataset and, since it is in JSON format, decided to store it in MongoDB. I began by importing all of the JSON files into MongoDB. Only the business collection is needed to answer the question, so only that import is shown.
```
(base) [mtweed@aLinMac yelp_dataset]$ mongoimport --db=yelp --collection=business --file=yelp_academic_dataset_business.json 
2020-12-02T10:58:39.013-0500	connected to: mongodb://localhost/
2020-12-02T10:58:42.014-0500	[########................] yelp.business	52.6MB/146MB (36.1%)
2020-12-02T10:58:45.014-0500	[#################.......] yelp.business	105MB/146MB (72.2%)
2020-12-02T10:58:47.260-0500	[########################] yelp.business	146MB/146MB (100.0%)
2020-12-02T10:58:47.260-0500	209393 document(s) imported successfully. 0 document(s) failed to import.
```
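
The same load could also be scripted from Python instead of `mongoimport`. The sketch below assumes the file is newline-delimited JSON (one document per line), which the import above suggests since it ran without `--jsonArray`. The helper name `iter_json_batches` and the batch size are hypothetical, and a small in-memory sample stands in for the real file.

```python
import io
import json
from itertools import islice


def iter_json_batches(lines, batch_size=1000):
    """Yield lists of parsed documents from an iterable of JSON lines."""
    # Skip blank lines and parse every remaining line as one JSON document
    docs = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return
        yield batch


# Small in-memory stand-in for yelp_academic_dataset_business.json
sample = io.StringIO(
    '{"name": "Big Chicken", "state": "NV"}\n'
    '{"name": "Elite Medical Center", "state": "NV"}\n'
    '{"name": "24 Hour Laser Wash", "state": "OH"}\n'
)
batches = list(iter_json_batches(sample, batch_size=2))
print([len(b) for b in batches])  # [2, 1]
```

With a live connection, each batch could be passed to `db.business.insert_many(batch)`.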

In [1]:
from pymongo import MongoClient, GEOSPHERE 
from bson.regex import Regex
import warnings

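This line suppresses the deprecation warnings raised by cur.count(). (Cursor.count() is deprecated as of PyMongo 3.7 and was removed in PyMongo 4.0, so the cells below assume PyMongo 3.x.)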

In [2]:
warnings.simplefilter('ignore')
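
Ignoring every warning for the whole session can hide unrelated problems. A narrower option, sketched below with only the standard library, is to silence `DeprecationWarning` inside a `with` block. The function `legacy_count` is a hypothetical stand-in for a deprecated call such as cur.count().

```python
import warnings


def legacy_count():
    """Hypothetical stand-in for a deprecated call such as cur.count()."""
    warnings.warn("count is deprecated", DeprecationWarning)
    return 3


# The filter change is undone automatically when the block exits
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    n = legacy_count()

print(n)  # 3
```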

#### DB connection

In [3]:
client = MongoClient('localhost',27017)

In [4]:
db = client['yelp']

In [5]:
db.list_collection_names()

['user', 'business', 'review', 'checkin', 'tip']

#### Example document
The following is an example of a document from the business collection.

In [6]:
db.business.find_one()

{'_id': ObjectId('5fc832cc2720115715f99c27'),
 'business_id': 'f9NumwFMBDn751xgFiRbNA',
 'name': 'The Range At Lake Norman',
 'address': '10913 Bailey Rd',
 'city': 'Cornelius',
 'state': 'NC',
 'postal_code': '28031',
 'latitude': 35.4627242,
 'longitude': -80.8526119,
 'stars': 3.5,
 'review_count': 36,
 'is_open': 1,
 'attributes': {'BusinessAcceptsCreditCards': 'True',
  'BikeParking': 'True',
  'GoodForKids': 'False',
  'BusinessParking': "{'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}",
  'ByAppointmentOnly': 'False',
  'RestaurantsPriceRange2': '3'},
 'categories': 'Active Life, Gun/Rifle Ranges, Guns & Ammo, Shopping',
 'hours': {'Monday': '10:0-18:0',
  'Tuesday': '11:0-20:0',
  'Wednesday': '10:0-18:0',
  'Thursday': '11:0-20:0',
  'Friday': '11:0-20:0',
  'Saturday': '11:0-20:0',
  'Sunday': '13:0-18:0'}}

## Create a GeoJSON object from lat and lng

The original dataset stores each location as separate latitude and longitude fields. To use MongoDB's geospatial query tools, I first built a GeoJSON Point from those coordinates. Note that GeoJSON orders coordinates as [longitude, latitude].

In [7]:
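# Fetch only the fields needed to build the GeoJSON point
cur = db.business.find({}, {'latitude': 1, 'longitude': 1})

# Iterate the cursor directly rather than calling cur.next() cur.count() times
for doc in cur:
    # GeoJSON orders coordinates as [longitude, latitude]
    db.business.update_one({'_id': doc['_id']},
                           {'$set': {'location': {'type': 'Point',
                                                  'coordinates': [doc['longitude'],
                                                                  doc['latitude']]}}})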

In [8]:
db.business.find_one()

{'_id': ObjectId('5fc832cc2720115715f99c27'),
 'business_id': 'f9NumwFMBDn751xgFiRbNA',
 'name': 'The Range At Lake Norman',
 'address': '10913 Bailey Rd',
 'city': 'Cornelius',
 'state': 'NC',
 'postal_code': '28031',
 'latitude': 35.4627242,
 'longitude': -80.8526119,
 'stars': 3.5,
 'review_count': 36,
 'is_open': 1,
 'attributes': {'BusinessAcceptsCreditCards': 'True',
  'BikeParking': 'True',
  'GoodForKids': 'False',
  'BusinessParking': "{'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}",
  'ByAppointmentOnly': 'False',
  'RestaurantsPriceRange2': '3'},
 'categories': 'Active Life, Gun/Rifle Ranges, Guns & Ammo, Shopping',
 'hours': {'Monday': '10:0-18:0',
  'Tuesday': '11:0-20:0',
  'Wednesday': '10:0-18:0',
  'Thursday': '11:0-20:0',
  'Friday': '11:0-20:0',
  'Saturday': '11:0-20:0',
  'Sunday': '13:0-18:0'},
 'location': {'type': 'Point', 'coordinates': [-80.8526119, 35.4627242]}}
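
The coordinate order is easy to get wrong, because GeoJSON expects [longitude, latitude]. The sketch below wraps the conversion in a small helper with range checks; `to_geojson_point` is a hypothetical name. It also builds an update pipeline that could set the field for every document in a single `update_many` call, assuming MongoDB 4.2 or later and a PyMongo version that accepts pipeline updates.

```python
def to_geojson_point(lat, lng):
    """Return a GeoJSON Point for a latitude/longitude pair."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lng <= 180:
        raise ValueError(f"longitude out of range: {lng}")
    # GeoJSON stores longitude first, then latitude
    return {"type": "Point", "coordinates": [lng, lat]}


# Aggregation-pipeline form of the same update; "$longitude" and "$latitude"
# refer to the existing fields of each document.
# Assumed usage (MongoDB 4.2+): db.business.update_many({}, SET_LOCATION_PIPELINE)
SET_LOCATION_PIPELINE = [
    {"$set": {"location": {"type": "Point",
                           "coordinates": ["$longitude", "$latitude"]}}}
]

print(to_geojson_point(35.4627242, -80.8526119))
# {'type': 'Point', 'coordinates': [-80.8526119, 35.4627242]}
```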

## Create a geosphere index on location

Now that the documents have a GeoJSON object, I created a 2dsphere index on the location field, as well as indexes on the 'name' and 'categories' fields. All three fields are used for searching later on.

In [9]:
db.business.create_index([('location', GEOSPHERE)])

'location_2dsphere'

In [10]:
db.business.create_index('name')

'name_1'

In [11]:
db.business.create_index('categories')

'categories_1'

# User Interface

This example user interface performs the query specified in the question.

In [12]:
def UI():
    print('********** Business finder **********\n')
    
    # This while loop reads the user input and repeats until a business is confirmed.
    # It creates the variables (name, location, t, d) that are used in the final search.
    while True: 
        bus = input('Enter a business name: ') 
        t = input('Enter the type of a second business: ')
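        try:
            d = float(input('Enter the maximum distance in miles: '))
        except ValueError:
            # A non-numeric distance restarts the prompts instead of raising
            print('\n!!!The distance must be a number!!!\n')
            continue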

        cur = db.business.find( {'name': Regex(bus, flags='i')}, 
                               {'name':1, 'address':1, 
                                'city':1, 'state':1, 'location':1} )
        res = cur.count()
        if res == 0:
            print('\n!!!No businesses were found with that name!!!\n')
            continue
        if res == 1:
            temp = cur.next()
            _id = temp['_id']
            name = temp['name']
            address = temp['address']
            city = temp['city']
            state = temp['state']
            location = temp['location']
            print('\nOne business found:\n',name,'\n',address,'\n',city,'\n',state,'\n')
            cor = input('Is this correct? [y/n]:').lower()
            if cor == 'y':
                break
            else:
                print('\nWe could not find the requested business.')
                continue
        if res > 1:
            temps = [cur.next() for x in range(res)]
            print(f'\n\n{res} businesses found matching that name.\n')
            for n in range(len(temps)):
                print('*****************************')
                print(f'{n+1}) ',
                      temps[n]['name'],'\n   ',
                      temps[n]['address'],'\n   ',
                      temps[n]['city'],'\n   ',
                      temps[n]['state'],'\n')
                
                # Show one match at a time until the user chooses to select a number
                con = input('\n\nPress enter for more or enter s to select a number: ')
                if con == 's':
                    break
                else:
                    continue
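            while True:
                try:
                    cor = int(input('\nEnter the number of the correct business or 0: '))
                except ValueError:
                    print('Invalid input.')
                    continue
                # Reject numbers outside the list so temps[cor-1] cannot fail or wrap around
                if 0 <= cor <= len(temps):
                    break
                print('Invalid input.')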
            if cor == 0:
                print('\nWe could not find the requested business.')
                continue
            else:
                name = temps[cor-1]['name']
                location = temps[cor-1]['location']
                break
                
    # $nearSphere with a GeoJSON point measures $maxDistance in meters,
    # so the distance entered in miles must be converted to meters
    METERS_PER_MILE = 1609.34
    
    # Final search
    cur = db.business.find({ 'location': 
                            { '$nearSphere': 
                             { '$geometry': location, '$maxDistance': d * METERS_PER_MILE } }, 
                            'categories': Regex(t, flags='i') }, 
                          {'name':1, 'address':1,'city':1,'state':1,'categories':1})
    
    
    # This prints out the results one by one until the user is satisfied.
    # The $nearSphere query returns the results with increasing distance from location
    res = cur.count()
    temp = [cur.next() for n in range(res)]
    print(f'\n\nThe following are the closest businesses to {name}\n',
          'of the specified type. (increasing distance)\n')
    for n in range(len(temp)):
        nxt = temp[n]
        if temp[n]['name'] == name:
            continue
        else:
            print('*************************')
            print(temp[n]['name'],'\n',
                  temp[n]['address'],'\n',
                  temp[n]['city'],'\n',
                  temp[n]['state'],'\n',
                  temp[n]['categories'],'\n')
        con = input('\nPress enter for more or enter q to quit: ')
        if con == 'q':
            break
        else:
            continue
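
The two filters that `UI()` sends to MongoDB can also be built by small pure functions, which makes them testable without a database. The sketch below is one way to do that; `name_filter` and `nearby_filter` are hypothetical names. Unlike the function above, it escapes the user's text with `re.escape`, so input such as `(` is matched literally instead of being read as regex syntax, and it uses the `$regex`/`$options` query operators in place of `bson.regex.Regex`.

```python
import re

METERS_PER_MILE = 1609.34  # same rounded constant as in UI()


def name_filter(field, text):
    """Case-insensitive substring match on `field`, with regex characters escaped."""
    return {field: {"$regex": re.escape(text), "$options": "i"}}


def nearby_filter(location, miles, category):
    """Filter for documents within `miles` of a GeoJSON point that match `category`."""
    query = {
        "location": {
            "$nearSphere": {
                "$geometry": location,
                # $maxDistance is in meters when $geometry is a GeoJSON point
                "$maxDistance": miles * METERS_PER_MILE,
            }
        }
    }
    query.update(name_filter("categories", category))
    return query


point = {"type": "Point", "coordinates": [-80.8526119, 35.4627242]}
query = nearby_filter(point, 5, "hospital")
print(query["location"]["$nearSphere"]["$maxDistance"])
print(name_filter("name", "AT&T (store)"))
```

The resulting dictionaries could be passed as the first argument to `db.business.find(...)`, with the same projections used in `UI()`.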


# Answer

The following shows the results of the query for hospitals within 5 miles of the business Big Chicken.

In [13]:
UI()

********** Business finder **********

Enter a business name: big chicken
Enter the type of a second business: hospital
Enter the maximum distance in miles: 5

One business found:
 Big Chicken 
 4480 Paradise Rd, Ste 1200 
 Las Vegas 
 NV 

Is this correct? [y/n]:y


The following are the closest businesses to Big Chicken
 of the specified type. (increasing distance)

*************************
Paradise Pet Hospital 
 1060 E Flamingo Rd 
 Las Vegas 
 NV 
 Pet Services, Pet Boarding, Pet Sitting, Pets, Pet Groomers, Emergency Pet Hospital, Veterinarians 


Press enter for more or enter q to quit: 
*************************
Elite Medical Center 
 150 E Harmon Ave 
 Las Vegas 
 NV 
 Health & Medical, Hospitals, Emergency Rooms, Medical Centers, Doctors 


Press enter for more or enter q to quit: 
*************************
Desert Springs Hospital Medical Center 
 2075 E Flamingo Rd 
 Las Vegas 
 NV 
 Health & Medical, Medical Centers, Doctors, Hospitals 


Press enter for more or enter q to
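
The first result above is a veterinary clinic because the regex `hospital` matches anywhere in the 'categories' string, including 'Emergency Pet Hospital'. One possible refinement, sketched below, is to split the comma-separated string into individual categories and compare whole category names. The helper names are hypothetical, and the sample strings are copied from the output above.

```python
def split_categories(categories):
    """Split Yelp's comma-separated category string into a list of names."""
    if not categories:
        return []
    return [part.strip() for part in categories.split(",") if part.strip()]


def has_category(categories, wanted):
    """True if `wanted` equals one whole category name, ignoring case."""
    wanted = wanted.strip().lower()
    return any(cat.lower() == wanted for cat in split_categories(categories))


vet = ("Pet Services, Pet Boarding, Pet Sitting, Pets, Pet Groomers, "
       "Emergency Pet Hospital, Veterinarians")
er = "Health & Medical, Hospitals, Emergency Rooms, Medical Centers, Doctors"

print(has_category(vet, "Hospitals"))  # False
print(has_category(er, "Hospitals"))   # True
```

Storing the split list in its own array field would let MongoDB match a whole category directly; that would be a schema change and is not something done in this notebook.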

### Exploration queries

I performed some other queries to explore the function.

In [14]:
UI()

********** Business finder **********

Enter a business name: taco
Enter the type of a second business: car wash
Enter the maximum distance in miles: 2


1153 businesses found matching that name.

*****************************
1)  202 Hometown Tacos 
    407 Lincoln Ave 
    Bellevue 
    PA 



Press enter for more or enter s to select a number: 
*****************************
2)  3 Amigos Taco Express 
    2263 Kresge Dr 
    Amherst 
    OH 



Press enter for more or enter s to select a number: 
*****************************
3)  911 Taco Bar 
     
    Las Vegas 
    NV 



Press enter for more or enter s to select a number: s

Enter the number of the correct business or 0: 2


The following are the closest businesses to 3 Amigos Taco Express
 of the specified type. (increasing distance)

*************************
24 Hour Laser Wash 
 612 N Leavitt Rd 
 Amherst 
 OH 
 Car Wash, Auto Repair, Automotive 


Press enter for more or enter q to quit: 


In [16]:
UI()

********** Business finder **********

Enter a business name: luxor
Enter the type of a second business: fast food
Enter the maximum distance in miles: 4


16 businesses found matching that name.

*****************************
1)  Blue Man Theatre At Luxor 
    3770 S Las Vegas Blvd 
    Las Vegas 
    NV 



Press enter for more or enter s to select a number: 
*****************************
2)  Johnny Rockets - Luxor 
    3900 S Las Vegas Blvd 
    Las Vegas 
    NV 



Press enter for more or enter s to select a number: 
*****************************
3)  Luxor Auto Group 
    909 W Main St 
    Mesa 
    AZ 



Press enter for more or enter s to select a number: 
*****************************
4)  Luxor Auto Group 
    3220 N Scottsdale Rd 
    Scottsdale 
    AZ 



Press enter for more or enter s to select a number: 
*****************************
5)  Luxor Cooling and Heating 
    3336 E Clifton Ave 
    Gilbert 
    AZ 



Press enter for more or enter s to select a number: 
*******

In [17]:
client.close()

### Moving forward

To achieve the desired outcome, this implementation would have to be integrated into a web app. Furthermore, I used a regex to match partial strings. It handles case-insensitive matching well, but it can only match partial strings that are correctly spelled and in the correct order. It would be important to improve the fuzzy/partial string matching but, according to my research, MongoDB is ineffective at this task. It would also have to be determined whether this implementation is adequate, since the initial search returned veterinary hospitals along with human hospitals.
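
One client-side workaround for the spelling limitation is to fetch candidate names and rank them in Python. The sketch below uses `difflib.get_close_matches` from the standard library. The function name `suggest_names`, the cutoff, and the short list of names (taken from the outputs above) are illustrative assumptions, and this approach would not scale to the full collection without first narrowing the candidates.

```python
from difflib import get_close_matches


def suggest_names(query, names, limit=3, cutoff=0.6):
    """Return up to `limit` names that are close to `query`, ignoring case."""
    # Map lowercased names back to their original spelling
    by_lower = {name.lower(): name for name in names}
    matches = get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[m] for m in matches]


names = ["Big Chicken", "Paradise Pet Hospital", "Elite Medical Center",
         "3 Amigos Taco Express", "Luxor Auto Group"]

# The misspelled query would not match the regex search used in UI()
print(suggest_names("big chiken", names))  # ['Big Chicken']
```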