# Python with MongoDB (using pymongo as driver)

### The dataset/collection already exists in the MongoDB database. This notebook connects to the existing MongoDB collection and performs queries and aggregations.

In [1]:
# install pymongo
!pip install pymongo



In [2]:
import pymongo
from pprint import pprint

#### Connecting to the MongoDB Client

In [3]:
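# Replace <username> and <password> with your own credentials; avoid storing real credentials in a notebook.
client = pymongo.MongoClient('mongodb+srv://<username>:<password>@cluster0.wxcci.mongodb.net/test')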
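
One way to keep the connection string out of the notebook is to read it from an environment variable. The sketch below is only an illustration: the variable name `MONGODB_URI`, the local fallback address, and the helper names `get_mongo_uri` and `make_client` are assumptions, not part of the original notebook.

```python
import os


def get_mongo_uri(env_var="MONGODB_URI", default="mongodb://localhost:27017"):
    """Return the connection string from the environment, or a local default.

    The variable name and the default address are assumptions for this sketch.
    """
    return os.environ.get(env_var, default)


def make_client(uri=None):
    """Create a MongoClient from the given URI or from the environment."""
    # Imported here so get_mongo_uri() can be used without pymongo installed.
    import pymongo

    # serverSelectionTimeoutMS makes a bad address fail after 5 seconds
    # instead of waiting for the default timeout.
    return pymongo.MongoClient(uri or get_mongo_uri(), serverSelectionTimeoutMS=5000)
```

With the variable set in the shell before starting Jupyter, `client = make_client()` could stand in for the hard-coded call above.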

#### Data Description

In [4]:
result = client['mydatabase']['house_rent_data_description'].find({},{'_id': False})

for x in result:
    pprint(x)

{'Area Locality': 'Locality of the Houses/Apartments/Flats',
 'Area Type': 'Size of the Houses/Apartments/Flats calculated on either Super '
              'Area or Carpet Area or Build Area',
 'BHK': 'Number of Bedrooms, Hall, Kitchen',
 'Bathroom': 'Number of Bathrooms',
 'City': 'City where the Houses/Apartments/Flats are Located',
 'Floor': 'Houses/Apartments/Flats situated in which Floor and Total Number of '
          'Floors (Example: Ground out of 2, 3 out of 5, etc.)',
 'Furnishing Status': 'Furnishing Status of the Houses/Apartments/Flats, '
                      'either it is Furnished or Semi-Furnished or Unfurnished',
 'Point of Contact': 'Whom should you contact for more information regarding '
                     'the Houses/Apartments/Flats',
 'Rent': 'Rent of the Houses/Apartments/Flats',
 'Size': 'Size of the Houses/Apartments/Flats in Square Feet',
 'Tenant Preferred': 'Type of Tenant Preferred by the Owner or Agent'}


#### Queries done on the **house_rent_data** collection

##### Query 1

The first query returns the first ten documents whose **Furnishing Status** field starts with the letters **un** (irrespective of case) and that have 2 bathrooms, sorted by rent in decreasing order.
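
Before running the query against the database, its logic can be approximated locally in plain Python. This is a rough sketch: the sample documents are made up for illustration, and Python's `re` module stands in for MongoDB's regular expression engine, which should behave the same for a pattern as simple as `^Un`.

```python
import re


def matches_query1(doc):
    """Local approximation of the Query 1 filter for one document (a dict)."""
    status = doc.get('Furnishing Status', '')
    # '^Un' with the 'i' option: the value must start with "un" in any letter case.
    starts_with_un = re.search(r'^Un', status, re.IGNORECASE) is not None
    return starts_with_un and doc.get('Bathroom') == 2


# Hypothetical sample documents, not taken from the real collection.
sample_docs = [
    {'Furnishing Status': 'Unfurnished', 'Bathroom': 2, 'Rent': 15000.0},
    {'Furnishing Status': 'unfurnished', 'Bathroom': 2, 'Rent': 40000.0},
    {'Furnishing Status': 'Semi-Furnished', 'Bathroom': 2, 'Rent': 90000.0},
    {'Furnishing Status': 'Unfurnished', 'Bathroom': 3, 'Rent': 70000.0},
]

# Filter, sort by rent in decreasing order, and keep at most ten documents.
top = sorted(
    (d for d in sample_docs if matches_query1(d)),
    key=lambda d: d['Rent'],
    reverse=True,
)[:10]
rents = [d['Rent'] for d in top]
print(rents)
```

Only the two unfurnished listings with 2 bathrooms survive the filter, and the more expensive one comes first.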

In [5]:
filter1 = {'Furnishing Status': {'$regex': '^Un', '$options': 'i'}, 'Bathroom': 2}
sort = [('Rent', pymongo.DESCENDING)]  # highest rent first

result1 = client['mydatabase']['house_rent_data'].find(filter=filter1, sort=sort, limit=10)

for x in result1:
    pprint(x)

{'Area Locality': 'Vettuvankeni',
 'Area Type': 'Carpet Area',
 'BHK': 2,
 'Bathroom': 2,
 'City': 'Chennai',
 'Floor': '1 out of 1',
 'Furnishing Status': 'Unfurnished',
 'Point of Contact': 'Contact Owner',
 'Posted On': datetime.datetime(2022, 7, 6, 0, 0),
 'Rent': 600000.0,
 'Size': 950.0,
 'Tenant Preferred': 'Bachelors',
 '_id': ObjectId('63128587fd93c813f3257ff9')}
{'Area Locality': 'Bandra West',
 'Area Type': 'Carpet Area',
 'BHK': 2,
 'Bathroom': 2,
 'City': 'Mumbai',
 'Floor': '4 out of 8',
 'Furnishing Status': 'Unfurnished',
 'Point of Contact': 'Contact Agent',
 'Posted On': datetime.datetime(2022, 6, 24, 0, 0),
 'Rent': 210000.0,
 'Size': 1000.0,
 'Tenant Preferred': 'Bachelors',
 '_id': ObjectId('63128586fd93c813f3257404')}
{'Area Locality': 'Avenue S, Santoshpur',
 'Area Type': 'Carpet Area',
 'BHK': 2,
 'Bathroom': 2,
 'City': 'Kolkata',
 'Floor': 'Ground out of 1',
 'Furnishing Status': 'Unfurnished',
 'Point of Contact': 'Contact Owner',
 'Posted On': datetime.datet

##### Query 2

The second query returns the first 20 documents for houses for rent posted in Kolkata whose **Tenant Preferred** field contains the word **family**, irrespective of case, either on its own or combined with other words/letters.
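
Because `$regex` matches anywhere in the string unless the pattern is anchored, the leading and trailing `.*` are not strictly needed for a "contains" match. The following sketch checks this locally with Python's `re` module as a stand-in for the server-side engine; the value `'family only'` is a made-up example, while the other three values appear in the collection.

```python
import re

values = ['Bachelors/Family', 'Family', 'Bachelors', 'family only']

verbose = re.compile(r'.*Family.*', re.IGNORECASE)  # pattern used in the query
simple = re.compile(r'Family', re.IGNORECASE)       # shorter pattern

# Both patterns should agree on every value when searching anywhere in the string.
patterns_agree = all(
    bool(verbose.search(v)) == bool(simple.search(v)) for v in values
)
matched = [v for v in values if simple.search(v)]
print(patterns_agree, matched)
```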

In [6]:
filter2={'City': 'Kolkata', 'Tenant Preferred': {'$regex': '.*Family.*', '$options': 'i'}}


result2 = client['mydatabase']['house_rent_data'].find(filter=filter2, limit=20)
for x in result2:
    pprint(x)

{'Area Locality': 'Bandel',
 'Area Type': 'Super Area',
 'BHK': 2,
 'Bathroom': 2,
 'City': 'Kolkata',
 'Floor': 'Ground out of 2',
 'Furnishing Status': 'Unfurnished',
 'Point of Contact': 'Contact Owner',
 'Posted On': datetime.datetime(2022, 5, 18, 0, 0),
 'Rent': 10000.0,
 'Size': 1100.0,
 'Tenant Preferred': 'Bachelors/Family',
 '_id': ObjectId('63128586fd93c813f32571b1')}
{'Area Locality': 'Phool Bagan, Kankurgachi',
 'Area Type': 'Super Area',
 'BHK': 2,
 'Bathroom': 1,
 'City': 'Kolkata',
 'Floor': '1 out of 3',
 'Furnishing Status': 'Semi-Furnished',
 'Point of Contact': 'Contact Owner',
 'Posted On': datetime.datetime(2022, 5, 13, 0, 0),
 'Rent': 20000.0,
 'Size': 800.0,
 'Tenant Preferred': 'Bachelors/Family',
 '_id': ObjectId('63128586fd93c813f32571b2')}
{'Area Locality': 'Salt Lake City Sector 2',
 'Area Type': 'Super Area',
 'BHK': 2,
 'Bathroom': 1,
 'City': 'Kolkata',
 'Floor': '1 out of 3',
 'Furnishing Status': 'Semi-Furnished',
 'Point of Contact': 'Contact Owner',
 

#### Aggregation pipeline

##### Pipeline 1

The first pipeline groups the documents by **Tenant Preferred** and counts the number of houses listed in each group.

In [7]:
pipe1 = client['mydatabase']['house_rent_data'].aggregate([{'$group': {'_id': '$Tenant Preferred', 'Num_of_Houses_listed': {'$sum': 1}}}])

for x in pipe1:
    pprint(x)

{'Num_of_Houses_listed': 3444, '_id': 'Bachelors/Family'}
{'Num_of_Houses_listed': 830, '_id': 'Bachelors'}
{'Num_of_Houses_listed': 472, '_id': 'Family'}
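
As a rough local analogy, grouping on a field with `{'$sum': 1}` does the same job as counting values with `collections.Counter`. The sample documents below are invented for illustration and do not reproduce the real counts.

```python
from collections import Counter

# Hypothetical sample documents, not taken from the real collection.
sample_docs = [
    {'Tenant Preferred': 'Bachelors/Family'},
    {'Tenant Preferred': 'Bachelors'},
    {'Tenant Preferred': 'Bachelors/Family'},
    {'Tenant Preferred': 'Family'},
]

# One counter entry per distinct value, like one output document per group.
counts = Counter(d['Tenant Preferred'] for d in sample_docs)

# Shape the result like the documents returned by the pipeline.
grouped = [{'_id': key, 'Num_of_Houses_listed': n} for key, n in counts.items()]
print(grouped)
```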


Alternatively, we could use the count accumulator (**$count**), which is available in `$group` from MongoDB 5.0 onward:

In [8]:
pipe1alt = client['mydatabase']['house_rent_data'].aggregate([{'$group': {'_id': '$Tenant Preferred', 'Num_of_Houses_listed': {'$count': {}}}}])

for x in pipe1alt:
    pprint(x)

{'Num_of_Houses_listed': 472, '_id': 'Family'}
{'Num_of_Houses_listed': 3444, '_id': 'Bachelors/Family'}
{'Num_of_Houses_listed': 830, '_id': 'Bachelors'}


##### Pipeline 2

The second pipeline groups the documents by **Furnishing Status** and counts the number of houses listed in each group.

In [9]:
pipe2 = client['mydatabase']['house_rent_data'].aggregate([{'$group': {'_id': '$Furnishing Status', 'Num_of_Houses_listed': {'$sum': 1}}}])

for x in pipe2:
    pprint(x)

{'Num_of_Houses_listed': 1815, '_id': 'Unfurnished'}
{'Num_of_Houses_listed': 2251, '_id': 'Semi-Furnished'}
{'Num_of_Houses_listed': 680, '_id': 'Furnished'}


Alternatively, we could use the count accumulator (**$count**):

In [10]:
pipe2alt = client['mydatabase']['house_rent_data'].aggregate([{'$group': {'_id': '$Furnishing Status', 'Num_of_Houses_listed': {'$count': {}}}}])

for x in pipe2alt:
    pprint(x)

{'Num_of_Houses_listed': 680, '_id': 'Furnished'}
{'Num_of_Houses_listed': 1815, '_id': 'Unfurnished'}
{'Num_of_Houses_listed': 2251, '_id': 'Semi-Furnished'}
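
The outputs above show the same groups coming back in a different order from one run to the next, since `$group` does not guarantee any output order. A minimal sketch of one way to make the order stable is to append a `$sort` stage; the helper name `count_by_field` is hypothetical and only builds the pipeline as a list of stages.

```python
def count_by_field(field, descending=True):
    """Build a pipeline that counts documents per value of `field`, sorted by count."""
    return [
        # Group on the field and count one per document.
        {'$group': {'_id': f'${field}', 'Num_of_Houses_listed': {'$sum': 1}}},
        # Sort the groups by their count so the output order is stable.
        {'$sort': {'Num_of_Houses_listed': -1 if descending else 1}},
    ]


pipeline = count_by_field('Furnishing Status')
print(pipeline)

# Usage against the collection from this notebook (requires a live connection):
# for x in client['mydatabase']['house_rent_data'].aggregate(pipeline):
#     pprint(x)
```

The same helper would cover both pipelines by passing `'Tenant Preferred'` or `'Furnishing Status'` as the field.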
