# FIT3182 Assignment Part A

### Eu Jia Xin (30881676)


## Task 1. MongoDB Data Model 
Design a suitable data model to support efficient querying of the 2 datasets in MongoDB.

### 1. Example of the data model  


The following shows an **embedding** data model. 


```json

// This is an example of how the embedded data model record would look like for 7/3/2021 data.

    {'GHI': 161,
     '_id': ObjectId('627aa67f6be6617d47cf5fe8'),
     'air_temperature_celcius': 19,
     'date': datetime.datetime(2021, 3, 7, 0, 0),
     'hotspot': [{'confidence': 88,
                  'datetime': datetime.datetime(2021, 3, 7, 4, 16, 10),
                  'latitude': -37.7752,
                  'longitude': 141.9086,
                  'surface_temperature_celcius': 64}],
     'max_wind_speed': 20.0,
     'precipitation': 0.0,
     'precipitation_flag': 'I',
     'relative_humidity': 51.5,
     'station': 948701,
     'windspeed_knots': 10.2}
```


### 2. Justification for choosing the data model

Based on the two datasets, the climate data is recorded daily, whereas fire data is recorded based on the occurence of a fire on a particular day. Therefore, we can say that `climate_historic.csv` and `hotspot_historic.csv` has a one-to-many relationship.

Since there is a one-to-many relationship, we can use an **embedding** data model where we will *embed* the hotspot data into the climate data. For example, for every record of the climate data, there will be zero or more hotspot data inside the 'hotspot' list within the record.

The advantages of using an embedding data model is that **reading the embedded results is fast and easy**. In other words, this data model is more efficient as read operations are reduced in an embedding data model compared to a referencing model. Additionally, we will have a single collection in our database called climate, so we don't have to join between different collections.

The disadvantage of using embedding data model may be that it is unsuitable if the length of hotspots for each climate is very massive and long. In our case, this is not a big problem because MongoDB document size limit (16MB) is more than enough, and can hold very large number of hotspot data. Since our data focuses on hotspots in Victoria per day, realistically there won't be that many hotspots such that MongoDB cannot accomodate. 

## Task 2. Querying MongoDB using PyMongo

### Q2.1:  
Write a python program that will read the data from `hotspot_historic.csv` and `climate_historic.csv` and load them to the new database (e.g. fit3182_assignment_db). The collection(s) in fit3182_assignment_db will be based on the document model you have designed in Task A1.


In [16]:
# libraries for pymongo 
import pymongo
from pymongo import MongoClient

# libraries to process csv
import pandas as pd
import json 
from datetime import datetime 
from pprint import pprint

# file paths
CLIMATE_PATH = "datasets/climate_historic.csv"
HOTSPOT_PATH = "datasets/hotspot_historic.csv"

# date formates
DATE_FORMAT = '%Y-%m-%d'
DATETIME_FORMAT = '%Y-%m-%dT%H:%M:%S'

In [17]:
# read CSV data

climate_df = pd.read_csv(CLIMATE_PATH)
hotspot_df = pd.read_csv(HOTSPOT_PATH)

# convert datetimes
climate_df['date'] = pd.to_datetime(climate_df['date'], dayfirst=True)
hotspot_df['datetime'] = pd.to_datetime(hotspot_df['datetime'], dayfirst=True)
hotspot_df['date'] = pd.to_datetime(hotspot_df['date'], dayfirst=True)

# strip whitespaces in column names + rename to GHI for easier JSON read after
climate_df.rename(columns=lambda x: x.strip(), inplace=True)
climate_df.rename(columns={"GHI_w/m2":"GHI"}, inplace=True)

# # pre-process precipitation into value and flag (2 separate columns)
result = climate_df['precipitation'].str.split('(\d.[\d+]*)([A-I])', expand=True)
result = result.loc[:,[1,2]]
result.rename(columns={1:'precipitation', 2:'precipitation_flag'}, inplace=True)


# update climate dataframe with pre-processed precipitation columns
climate_df['precipitation'] = pd.to_numeric(result['precipitation'])
climate_df.insert(loc=7, column='precipitation_flag', value=result['precipitation_flag'].astype(str))

# prior to json conversion, format date properly
climate_df['date'] = climate_df['date'].dt.strftime(DATE_FORMAT)
hotspot_df['date'] = hotspot_df['date'].dt.strftime(DATE_FORMAT)
hotspot_df['datetime'] = hotspot_df['datetime'].dt.strftime(DATETIME_FORMAT)

climate_df.head()

Unnamed: 0,station,date,air_temperature_celcius,relative_humidity,windspeed_knots,max_wind_speed,precipitation,precipitation_flag,GHI
0,948700,2020-12-31,19,56.8,7.9,11.1,0.0,I,154
1,948700,2021-01-02,15,50.7,9.2,13.0,0.02,G,128
2,948700,2021-01-03,16,53.6,8.1,15.0,0.0,G,133
3,948700,2021-01-04,24,61.6,7.7,14.0,0.0,I,186
4,948700,2021-01-05,24,62.3,7.0,13.0,0.0,I,185


In [18]:
# convert climate data to a list of JSON objects
result = climate_df.to_json(orient="records")
document = json.loads(result)

# for every climate data, we will find the corresponding hotspot data (>=0)
for row in document:
    hotspots = hotspot_df.loc[hotspot_df['date'] == row['date']]
    # use a single 'datetime' field (encompasses the 'date' data already)
    hotspots = hotspots[['datetime', 'latitude', 'longitude', 'confidence', 'surface_temperature_celcius']]
    
    # add the embedding hotspot data into climate data
    hotspot_embed = hotspots.to_json(orient='records')
    hotspot_embed = json.loads(hotspot_embed)
    row['hotspot'] = hotspot_embed
    
    # processed current climate data, update to correct date format before inserting into db
    row['date'] = datetime.strptime(row['date'], DATE_FORMAT)
    for hotspot in row['hotspot']:
        hotspot['datetime'] = datetime.strptime(hotspot['datetime'], '%Y-%m-%dT%H:%M:%S')
    

document = sorted(document, key=lambda d: d['date']) 


In [19]:
# connect to MongoClient on default host and port
client = MongoClient() 

# get database
db = client.fit3182_assignment_db

# get climate collection (drop any existing collection to re-populate)
collection = db.climate
collection.drop()

# inserting the embedding document into our db collection
result = collection.insert_many(document)

### Q2.2:  
Write queries to answer the following tasks on `fit3182_assignment_db`.

In [20]:
def print_cursor(cursor):
    """
    Format printing of multiple records within cursor object
    """
    for record in cursor:
        pprint(record)

In [21]:
# 2.2a) Find climate data on 12th December 2021.

# use data query to find exact data
date_query = datetime(2021, 12, 12)
climate_data = collection.find_one({'date': date_query})

pprint(climate_data)

{'GHI': 156,
 '_id': ObjectId('628292f65b3178ab2dcced55'),
 'air_temperature_celcius': 19,
 'date': datetime.datetime(2021, 12, 12, 0, 0),
 'hotspot': [{'confidence': 53,
              'datetime': datetime.datetime(2021, 12, 12, 0, 45, 38),
              'latitude': -37.903,
              'longitude': 145.25,
              'surface_temperature_celcius': 44}],
 'max_wind_speed': 12.0,
 'precipitation': 0.0,
 'precipitation_flag': 'I',
 'relative_humidity': 55.3,
 'station': 948702,
 'windspeed_knots': 6.2}


In [22]:
# 2.2b) Find the latitude, longitude, surface temperature (°C), and confidence
# when the surface temperature (°C) was between 65 °C and 100 °C.

pipeline = [
    # flatten the hotspots 
    {'$unwind': '$hotspot'},
    # filter by matching valid temperature
    {'$match': {'hotspot.surface_temperature_celcius': {"$gte": 65, "$lte": 100}} },
    # project only needed fields
    {"$project": {'_id':False, 'hotspot.latitude':True, 'hotspot.longitude':True, 
                  'hotspot.surface_temperature_celcius':True, 'hotspot.confidence':True}}
     
]

hotspots = collection.aggregate(pipeline=pipeline)

print_cursor(hotspots)

{'hotspot': {'confidence': 94,
             'latitude': -37.2284,
             'longitude': 147.9187,
             'surface_temperature_celcius': 73}}
{'hotspot': {'confidence': 97,
             'latitude': -37.6572,
             'longitude': 142.0703,
             'surface_temperature_celcius': 80}}
{'hotspot': {'confidence': 84,
             'latitude': -37.0193,
             'longitude': 148.1459,
             'surface_temperature_celcius': 71}}
{'hotspot': {'confidence': 100,
             'latitude': -37.4229,
             'longitude': 147.027,
             'surface_temperature_celcius': 99}}
{'hotspot': {'confidence': 80,
             'latitude': -37.0055,
             'longitude': 148.1582,
             'surface_temperature_celcius': 68}}
{'hotspot': {'confidence': 85,
             'latitude': -37.4128,
             'longitude': 147.0242,
             'surface_temperature_celcius': 98}}
{'hotspot': {'confidence': 90,
             'latitude': -34.357,
             'longitude': 141

             'latitude': -37.0769,
             'longitude': 141.042,
             'surface_temperature_celcius': 72}}
{'hotspot': {'confidence': 79,
             'latitude': -36.883,
             'longitude': 142.1637,
             'surface_temperature_celcius': 67}}
{'hotspot': {'confidence': 87,
             'latitude': -36.0973,
             'longitude': 143.4279,
             'surface_temperature_celcius': 92}}
{'hotspot': {'confidence': 100,
             'latitude': -36.1819,
             'longitude': 145.9269,
             'surface_temperature_celcius': 98}}
{'hotspot': {'confidence': 87,
             'latitude': -37.6387,
             'longitude': 142.9032,
             'surface_temperature_celcius': 88}}
{'hotspot': {'confidence': 93,
             'latitude': -37.6439,
             'longitude': 142.913,
             'surface_temperature_celcius': 83}}
{'hotspot': {'confidence': 93,
             'latitude': -37.3165,
             'longitude': 147.4932,
             'surface_tem

             'surface_temperature_celcius': 75}}
{'hotspot': {'confidence': 100,
             'latitude': -35.8182,
             'longitude': 143.8739,
             'surface_temperature_celcius': 92}}
{'hotspot': {'confidence': 100,
             'latitude': -37.9246,
             'longitude': 146.2464,
             'surface_temperature_celcius': 96}}
{'hotspot': {'confidence': 94,
             'latitude': -36.3313,
             'longitude': 141.0017,
             'surface_temperature_celcius': 73}}
{'hotspot': {'confidence': 90,
             'latitude': -36.2857,
             'longitude': 141.3467,
             'surface_temperature_celcius': 67}}
{'hotspot': {'confidence': 92,
             'latitude': -36.0749,
             'longitude': 143.4058,
             'surface_temperature_celcius': 70}}
{'hotspot': {'confidence': 91,
             'latitude': -36.4449,
             'longitude': 140.9836,
             'surface_temperature_celcius': 68}}
{'hotspot': {'confidence': 100,
           

{'hotspot': {'confidence': 100,
             'latitude': -36.8791,
             'longitude': 142.7261,
             'surface_temperature_celcius': 93}}
{'hotspot': {'confidence': 100,
             'latitude': -36.7309,
             'longitude': 142.3633,
             'surface_temperature_celcius': 88}}
{'hotspot': {'confidence': 92,
             'latitude': -36.882,
             'longitude': 142.7108,
             'surface_temperature_celcius': 70}}
{'hotspot': {'confidence': 93,
             'latitude': -37.5135,
             'longitude': 142.7238,
             'surface_temperature_celcius': 72}}
{'hotspot': {'confidence': 94,
             'latitude': -36.2121,
             'longitude': 144.6885,
             'surface_temperature_celcius': 74}}
{'hotspot': {'confidence': 97,
             'latitude': -36.2206,
             'longitude': 144.6856,
             'surface_temperature_celcius': 80}}
{'hotspot': {'confidence': 88,
             'latitude': -38.0132,
             'longitude': 1

{'hotspot': {'confidence': 100,
             'latitude': -38.4349,
             'longitude': 146.3122,
             'surface_temperature_celcius': 93}}
{'hotspot': {'confidence': 89,
             'latitude': -38.4792,
             'longitude': 146.3081,
             'surface_temperature_celcius': 65}}
{'hotspot': {'confidence': 100,
             'latitude': -37.9047,
             'longitude': 141.0945,
             'surface_temperature_celcius': 90}}
{'hotspot': {'confidence': 91,
             'latitude': -36.7685,
             'longitude': 142.7134,
             'surface_temperature_celcius': 69}}
{'hotspot': {'confidence': 91,
             'latitude': -36.7801,
             'longitude': 142.7151,
             'surface_temperature_celcius': 68}}
{'hotspot': {'confidence': 96,
             'latitude': -36.1002,
             'longitude': 142.3405,
             'surface_temperature_celcius': 78}}
{'hotspot': {'confidence': 90,
             'latitude': -36.4325,
             'longitude': 

In [23]:
# 2.2c) Find date, surface temperature (°C), air temperature (°C), 
# relative humidity and max wind speed on 15th and 16th of December 2021.

date1 = datetime(2021, 12, 15)
date2 = datetime(2021, 12, 16)

pipeline = [
    {'$match': {'$or': [{"date":date1}, {"date":date2}] } },
    {'$project': {'_id':False, 
                  'date':True,
                  'hotspot.surface_temperature_celcius':True, 
                  'air_temperature_celcius':True, 
                  'relative_humidity': True, 
                  'max_wind_speed':True, }}
] 
result = collection.aggregate(pipeline=pipeline)

print_cursor(result)

{'air_temperature_celcius': 18,
 'date': datetime.datetime(2021, 12, 15, 0, 0),
 'hotspot': [{'surface_temperature_celcius': 42},
             {'surface_temperature_celcius': 36},
             {'surface_temperature_celcius': 38},
             {'surface_temperature_celcius': 40}],
 'max_wind_speed': 14.0,
 'relative_humidity': 52.0}
{'air_temperature_celcius': 18,
 'date': datetime.datetime(2021, 12, 16, 0, 0),
 'hotspot': [{'surface_temperature_celcius': 43},
             {'surface_temperature_celcius': 33},
             {'surface_temperature_celcius': 54},
             {'surface_temperature_celcius': 73},
             {'surface_temperature_celcius': 55},
             {'surface_temperature_celcius': 75},
             {'surface_temperature_celcius': 55},
             {'surface_temperature_celcius': 66},
             {'surface_temperature_celcius': 56},
             {'surface_temperature_celcius': 60},
             {'surface_temperature_celcius': 73},
             {'surface_temperature_c

In [24]:
# 2.2d) Find datetime, air temperature (°C), surface temperature (°C) and confidence
# when the confidence is between 80 and 100

pipeline = [
    # flatten hotspots
    {'$unwind': '$hotspot'},
    # filter for hotspots with valid confidence level
    {'$match': {'hotspot.confidence': {"$gte": 80, "$lte": 100}} },
    # project needed fields
    {'$project': {'_id':False, 
#                   "hotspot.datetime":{ "$dateToString":{"format":DATETIME_FORMAT, "date":"$hotspot.datetime"}},
                  'hotspot.datetime':True,
                  'air_temperature_celcius':True, 
                  'hotspot.surface_temperature_celcius':True, 'hotspot.confidence':True }},
] 
result = collection.aggregate(pipeline=pipeline)

print_cursor(result)

{'air_temperature_celcius': 20,
 'hotspot': {'confidence': 87,
             'datetime': datetime.datetime(2021, 3, 6, 5, 6, 30),
             'surface_temperature_celcius': 62}}
{'air_temperature_celcius': 20,
 'hotspot': {'confidence': 85,
             'datetime': datetime.datetime(2021, 3, 6, 5, 6, 20),
             'surface_temperature_celcius': 59}}
{'air_temperature_celcius': 19,
 'hotspot': {'confidence': 88,
             'datetime': datetime.datetime(2021, 3, 7, 4, 16, 10),
             'surface_temperature_celcius': 64}}
{'air_temperature_celcius': 23,
 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 3, 9, 13, 23, 40),
             'surface_temperature_celcius': 41}}
{'air_temperature_celcius': 19,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 3, 10, 4, 48, 40),
             'surface_temperature_celcius': 105}}
{'air_temperature_celcius': 19,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datet

 'hotspot': {'confidence': 89,
             'datetime': datetime.datetime(2021, 4, 3, 3, 50, 40),
             'surface_temperature_celcius': 65}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 81,
             'datetime': datetime.datetime(2021, 4, 3, 1, 7, 20),
             'surface_temperature_celcius': 54}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 93,
             'datetime': datetime.datetime(2021, 4, 4, 15, 31),
             'surface_temperature_celcius': 44}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 4, 15, 30, 40),
             'surface_temperature_celcius': 64}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 88,
             'datetime': datetime.datetime(2021, 4, 4, 4, 41, 50),
             'surface_temperature_celcius': 63}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 82,
             'datetime': datetime.datetime(2021, 4, 4, 4, 41, 10),
         

 'hotspot': {'confidence': 93,
             'datetime': datetime.datetime(2021, 4, 5, 5, 15, 30),
             'surface_temperature_celcius': 83}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 93,
             'datetime': datetime.datetime(2021, 4, 5, 3, 40, 50),
             'surface_temperature_celcius': 72}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 95,
             'datetime': datetime.datetime(2021, 4, 5, 3, 40, 30),
             'surface_temperature_celcius': 76}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 81,
             'datetime': datetime.datetime(2021, 4, 5, 3, 40, 20),
             'surface_temperature_celcius': 54}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 89,
             'datetime': datetime.datetime(2021, 4, 5, 3, 39, 30),
             'surface_temperature_celcius': 65}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 82,
             'datetime': datetime.datetime(2021, 4, 5, 3, 39, 30),
       

             'datetime': datetime.datetime(2021, 4, 12, 3, 50),
             'surface_temperature_celcius': 61}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 92,
             'datetime': datetime.datetime(2021, 4, 12, 3, 47, 50),
             'surface_temperature_celcius': 70}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datetime(2021, 4, 12, 3, 47, 50),
             'surface_temperature_celcius': 58}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datetime(2021, 4, 12, 3, 47, 30),
             'surface_temperature_celcius': 58}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 12, 3, 47, 30),
             'surface_temperature_celcius': 106}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 83,
             'datetime': datetime.datetime(2021, 4, 12, 3, 47, 20),
             'surface_temperature_

             'surface_temperature_celcius': 54}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 50),
             'surface_temperature_celcius': 58}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 85,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 50),
             'surface_temperature_celcius': 59}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 40),
             'surface_temperature_celcius': 104}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 90,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 40),
             'surface_temperature_celcius': 66}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 80,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 40),
             'surface_temperature_celcius': 53}}
{'air_temperature_celcius': 16,
 'hotspot': {'con

             'surface_temperature_celcius': 75}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'surface_temperature_celcius': 58}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'surface_temperature_celcius': 58}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'surface_temperature_celcius': 61}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'surface_temperature_celcius': 92}}
{'air_temperature_celcius': 16,
 'hotspot': {'confidence': 83,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'surface_temperature_celcius': 56}}
{'air_temperature_celcius': 16,
 'hotspot': {'conf

 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 4, 18, 4, 56, 20),
             'surface_temperature_celcius': 61}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 82,
             'datetime': datetime.datetime(2021, 4, 18, 4, 55),
             'surface_temperature_celcius': 56}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 91,
             'datetime': datetime.datetime(2021, 4, 18, 4, 54, 40),
             'surface_temperature_celcius': 68}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 90,
             'datetime': datetime.datetime(2021, 4, 18, 4, 53),
             'surface_temperature_celcius': 66}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 4, 18, 4, 52, 20),
             'surface_temperature_celcius': 60}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 18, 4, 52),
            

 'hotspot': {'confidence': 92,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'surface_temperature_celcius': 70}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 96,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'surface_temperature_celcius': 79}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 83,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'surface_temperature_celcius': 56}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 81,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'surface_temperature_celcius': 54}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 98,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'surface_temperature_celcius': 84}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 82,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
 

 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 23, 5, 2, 50),
             'surface_temperature_celcius': 101}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 81,
             'datetime': datetime.datetime(2021, 4, 23, 5, 2, 50),
             'surface_temperature_celcius': 54}}
{'air_temperature_celcius': 19,
 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 4, 24, 4, 8, 20),
             'surface_temperature_celcius': 78}}
{'air_temperature_celcius': 19,
 'hotspot': {'confidence': 94,
             'datetime': datetime.datetime(2021, 4, 24, 4, 8, 20),
             'surface_temperature_celcius': 108}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 82,
             'datetime': datetime.datetime(2021, 4, 25, 5, 2, 30),
             'surface_temperature_celcius': 56}}
{'air_temperature_celcius': 15,
 'hotspot': {'confidence': 87,
             'datetime': datetime.datetime(2021, 4, 29, 4, 33),
       

{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 5, 4, 4, 44, 40),
             'surface_temperature_celcius': 105}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 5, 4, 4, 44, 40),
             'surface_temperature_celcius': 88}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 92,
             'datetime': datetime.datetime(2021, 5, 4, 4, 44, 40),
             'surface_temperature_celcius': 70}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 93,
             'datetime': datetime.datetime(2021, 5, 4, 4, 44, 40),
             'surface_temperature_celcius': 72}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 94,
             'datetime': datetime.datetime(2021, 5, 4, 4, 44, 40),
             'surface_temperature_celcius': 74}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datet

             'surface_temperature_celcius': 54}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 96,
             'datetime': datetime.datetime(2021, 5, 10, 4, 8, 10),
             'surface_temperature_celcius': 78}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 87,
             'datetime': datetime.datetime(2021, 5, 10, 4, 8, 10),
             'surface_temperature_celcius': 62}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 5, 10, 4, 8, 10),
             'surface_temperature_celcius': 61}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 94,
             'datetime': datetime.datetime(2021, 5, 10, 4, 8, 10),
             'surface_temperature_celcius': 74}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 82,
             'datetime': datetime.datetime(2021, 5, 10, 4, 8, 10),
             'surface_temperature_celcius': 55}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence

 'hotspot': {'confidence': 94,
             'datetime': datetime.datetime(2021, 5, 15, 4, 26, 20),
             'surface_temperature_celcius': 74}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 86,
             'datetime': datetime.datetime(2021, 5, 15, 4, 26, 20),
             'surface_temperature_celcius': 60}}
{'air_temperature_celcius': 10,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 5, 15, 0, 12, 10),
             'surface_temperature_celcius': 90}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 80,
             'datetime': datetime.datetime(2021, 5, 22, 4, 43, 30),
             'surface_temperature_celcius': 53}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 5, 22, 4, 34, 10),
             'surface_temperature_celcius': 101}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 91,
             'datetime': datetime.datetime(2021, 5, 22, 4, 32, 20)

             'surface_temperature_celcius': 84}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 9, 23, 12, 47, 12),
             'surface_temperature_celcius': 79}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 88,
             'datetime': datetime.datetime(2021, 9, 23, 4, 59, 13),
             'surface_temperature_celcius': 60}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 87,
             'datetime': datetime.datetime(2021, 9, 23, 4, 59, 11),
             'surface_temperature_celcius': 60}}
{'air_temperature_celcius': 17,
 'hotspot': {'confidence': 80,
             'datetime': datetime.datetime(2021, 9, 23, 4, 59, 11),
             'surface_temperature_celcius': 50}}
{'air_temperature_celcius': 14,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 9, 24, 15, 7, 47),
             'surface_temperature_celcius': 65}}
{'air_temperature_celcius': 14,
 'hotspot': {'co

             'surface_temperature_celcius': 55}}
{'air_temperature_celcius': 26,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 11, 30, 12, 22, 15),
             'surface_temperature_celcius': 65}}
{'air_temperature_celcius': 26,
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 11, 30, 12, 22, 14),
             'surface_temperature_celcius': 71}}
{'air_temperature_celcius': 26,
 'hotspot': {'confidence': 84,
             'datetime': datetime.datetime(2021, 11, 30, 4, 35),
             'surface_temperature_celcius': 59}}
{'air_temperature_celcius': 26,
 'hotspot': {'confidence': 98,
             'datetime': datetime.datetime(2021, 11, 30, 4, 34, 57),
             'surface_temperature_celcius': 83}}
{'air_temperature_celcius': 26,
 'hotspot': {'confidence': 81,
             'datetime': datetime.datetime(2021, 11, 30, 4, 34, 57),
             'surface_temperature_celcius': 57}}
{'air_temperature_celcius': 26,
 'hotspot': {'

In [25]:
# 2.2e) Find the top 10 records with the highest surface temperature (°C).

pipeline = [
    # flatten hotspots
    {'$unwind': '$hotspot'},
    # sort descending order to get highest surface temperature from top
    {'$sort': {'hotspot.surface_temperature_celcius':pymongo.DESCENDING}},
    # only take top 10 records
    {'$limit':10}
]

result = collection.aggregate(pipeline=pipeline)
print_cursor(result)


{'GHI': 122,
 '_id': ObjectId('628292f65b3178ab2dccec67'),
 'air_temperature_celcius': 15,
 'date': datetime.datetime(2021, 4, 18, 0, 0),
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 18, 4, 52),
             'latitude': -38.1665,
             'longitude': 143.062,
             'surface_temperature_celcius': 124},
 'max_wind_speed': 9.9,
 'precipitation': 0.0,
 'precipitation_flag': 'I',
 'relative_humidity': 56.1,
 'station': 948701,
 'windspeed_knots': 5.1}
{'GHI': 140,
 '_id': ObjectId('628292f65b3178ab2dccec59'),
 'air_temperature_celcius': 16,
 'date': datetime.datetime(2021, 4, 4, 0, 0),
 'hotspot': {'confidence': 100,
             'datetime': datetime.datetime(2021, 4, 4, 4, 32, 50),
             'latitude': -36.343,
             'longitude': 142.1986,
             'surface_temperature_celcius': 123},
 'max_wind_speed': 12.0,
 'precipitation': 0.0,
 'precipitation_flag': 'I',
 'relative_humidity': 47.5,
 'station': 948701,
 'windspeed_knots'

In [26]:
# 2.2f) Find the number of fires each day. You are required to only display the total
# number of fires and the date in the output.

pipeline = [
    {'$project': {'_id':False, 
                  "date":{ "$dateToString":{"format":DATE_FORMAT, "date":"$date"}},
                  'count': {'$size': '$hotspot'}}},
]

result = collection.aggregate(pipeline=pipeline)

for record in result:
    formatted_output = {'date':record['date'], 'number_of_fires': record['count']}
    pprint(formatted_output)

{'date': '2020-12-31', 'number_of_fires': 0}
{'date': '2021-01-01', 'number_of_fires': 0}
{'date': '2021-01-02', 'number_of_fires': 0}
{'date': '2021-01-03', 'number_of_fires': 0}
{'date': '2021-01-04', 'number_of_fires': 0}
{'date': '2021-01-05', 'number_of_fires': 0}
{'date': '2021-01-06', 'number_of_fires': 0}
{'date': '2021-01-07', 'number_of_fires': 0}
{'date': '2021-01-08', 'number_of_fires': 0}
{'date': '2021-01-09', 'number_of_fires': 0}
{'date': '2021-01-10', 'number_of_fires': 0}
{'date': '2021-01-11', 'number_of_fires': 0}
{'date': '2021-01-12', 'number_of_fires': 0}
{'date': '2021-01-13', 'number_of_fires': 0}
{'date': '2021-01-14', 'number_of_fires': 0}
{'date': '2021-01-15', 'number_of_fires': 0}
{'date': '2021-01-16', 'number_of_fires': 0}
{'date': '2021-01-17', 'number_of_fires': 0}
{'date': '2021-01-18', 'number_of_fires': 0}
{'date': '2021-01-19', 'number_of_fires': 0}
{'date': '2021-01-20', 'number_of_fires': 0}
{'date': '2021-01-21', 'number_of_fires': 0}
{'date': '

In [27]:
# 2.2g) Find the records of fires where the confidence is below 70

pipeline = [
    # flatten hotspots
    {'$unwind': '$hotspot'},
    # filter for valid confidence levels
    {'$match': { 'hotspot.confidence':{'$lte': 70} } },
    # we only want the record of fire
    {'$project': {'_id':False, 'hotspot':True}}
]

result = collection.aggregate(pipeline=pipeline)
print_cursor(result)

{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 3, 8, 4, 51),
             'latitude': -37.7885,
             'longitude': 141.9352,
             'surface_temperature_celcius': 55}}
{'hotspot': {'confidence': 54,
             'datetime': datetime.datetime(2021, 3, 9, 3, 57),
             'latitude': -37.7171,
             'longitude': 147.5866,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 55,
             'datetime': datetime.datetime(2021, 3, 10, 4, 43),
             'latitude': -36.2544,
             'longitude': 148.0353,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 54,
             'datetime': datetime.datetime(2021, 3, 10, 4, 42, 30),
             'latitude': -37.2197,
             'longitude': 147.9621,
             'surface_temperature_celcius': 43}}
{'hotspot': {'confidence': 56,
             'datetime': datetime.datetime(2021, 3, 13, 23, 58, 50),
             'latitude': -37.0286,
   

             'datetime': datetime.datetime(2021, 4, 1, 4, 2, 40),
             'latitude': -36.2044,
             'longitude': 145.7706,
             'surface_temperature_celcius': 51}}
{'hotspot': {'confidence': 57,
             'datetime': datetime.datetime(2021, 4, 1, 4, 2, 40),
             'latitude': -36.5245,
             'longitude': 143.1556,
             'surface_temperature_celcius': 43}}
{'hotspot': {'confidence': 50,
             'datetime': datetime.datetime(2021, 4, 1, 4, 2, 40),
             'latitude': -36.1002,
             'longitude': 147.6494,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 60,
             'datetime': datetime.datetime(2021, 4, 2, 4, 53, 30),
             'latitude': -36.3872,
             'longitude': 145.3858,
             'surface_temperature_celcius': 40}}
{'hotspot': {'confidence': 63,
             'datetime': datetime.datetime(2021, 4, 2, 4, 45, 10),
             'latitude': -36.494,
             'longitude': 145.

{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 4, 5, 3, 39, 10),
             'latitude': -36.9405,
             'longitude': 142.8627,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 4, 5, 3, 39, 10),
             'latitude': -37.6339,
             'longitude': 143.4108,
             'surface_temperature_celcius': 46}}
{'hotspot': {'confidence': 66,
             'datetime': datetime.datetime(2021, 4, 6, 4, 25, 40),
             'latitude': -37.7069,
             'longitude': 142.1573,
             'surface_temperature_celcius': 53}}
{'hotspot': {'confidence': 50,
             'datetime': datetime.datetime(2021, 4, 6, 4, 25, 40),
             'latitude': -36.0834,
             'longitude': 146.5364,
             'surface_temperature_celcius': 53}}
{'hotspot': {'confidence': 62,
             'datetime': datetime.datetime(2021, 4, 6, 4, 23, 50),
             'latitude': -36.3

             'latitude': -37.7974,
             'longitude': 148.5498,
             'surface_temperature_celcius': 53}}
{'hotspot': {'confidence': 67,
             'datetime': datetime.datetime(2021, 4, 8, 4, 15, 20),
             'latitude': -37.34,
             'longitude': 149.3668,
             'surface_temperature_celcius': 74}}
{'hotspot': {'confidence': 62,
             'datetime': datetime.datetime(2021, 4, 8, 4, 14, 40),
             'latitude': -37.8567,
             'longitude': 147.1576,
             'surface_temperature_celcius': 48}}
{'hotspot': {'confidence': 56,
             'datetime': datetime.datetime(2021, 4, 8, 4, 12, 20),
             'latitude': -37.4368,
             'longitude': 149.1191,
             'surface_temperature_celcius': 52}}
{'hotspot': {'confidence': 58,
             'datetime': datetime.datetime(2021, 4, 8, 4, 11, 40),
             'latitude': -37.9408,
             'longitude': 146.0611,
             'surface_temperature_celcius': 47}}
{'hotspot'

             'latitude': -37.7929,
             'longitude': 143.1197,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 64,
             'datetime': datetime.datetime(2021, 4, 12, 3, 44, 40),
             'latitude': -37.7972,
             'longitude': 143.1153,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 59,
             'datetime': datetime.datetime(2021, 4, 12, 3, 44, 40),
             'latitude': -37.3656,
             'longitude': 143.8206,
             'surface_temperature_celcius': 40}}
{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 4, 13, 4, 36, 30),
             'latitude': -37.8865,
             'longitude': 143.3048,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 70,
             'datetime': datetime.datetime(2021, 4, 13, 4, 36, 20),
             'latitude': -36.5414,
             'longitude': 145.4159,
             'surface_temperature_celcius': 45}}
{'ho

             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'latitude': -37.1089,
             'longitude': 141.8394,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 66,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'latitude': -36.7804,
             'longitude': 145.1698,
             'surface_temperature_celcius': 45}}
{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'latitude': -37.1949,
             'longitude': 142.675,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 62,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'latitude': -37.4784,
             'longitude': 143.015,
             'surface_temperature_celcius': 47}}
{'hotspot': {'confidence': 66,
             'datetime': datetime.datetime(2021, 4, 13, 4, 26, 30),
             'latitude': -37.5358,
             'longitude

             'surface_temperature_celcius': 41}}
{'hotspot': {'confidence': 67,
             'datetime': datetime.datetime(2021, 4, 15, 4, 14, 20),
             'latitude': -36.7315,
             'longitude': 143.9232,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 51,
             'datetime': datetime.datetime(2021, 4, 15, 4, 14, 20),
             'latitude': -37.8937,
             'longitude': 147.1785,
             'surface_temperature_celcius': 38}}
{'hotspot': {'confidence': 63,
             'datetime': datetime.datetime(2021, 4, 15, 4, 14, 20),
             'latitude': -36.1937,
             'longitude': 143.5674,
             'surface_temperature_celcius': 41}}
{'hotspot': {'confidence': 55,
             'datetime': datetime.datetime(2021, 4, 16, 4, 58, 30),
             'latitude': -37.2904,
             'longitude': 143.9209,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 58,
             'datetime': datetime.datetime(2

             'surface_temperature_celcius': 43}}
{'hotspot': {'confidence': 62,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -36.9171,
             'longitude': 143.4749,
             'surface_temperature_celcius': 41}}
{'hotspot': {'confidence': 56,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -37.9373,
             'longitude': 146.0744,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 67,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -38.1758,
             'longitude': 144.489,
             'surface_temperature_celcius': 50}}
{'hotspot': {'confidence': 56,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -38.2234,
             'longitude': 143.9549,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 56,
             'datetime': datetime.datetime(20

             'surface_temperature_celcius': 43}}
{'hotspot': {'confidence': 56,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -36.7564,
             'longitude': 142.343,
             'surface_temperature_celcius': 41}}
{'hotspot': {'confidence': 67,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -36.0924,
             'longitude': 145.5751,
             'surface_temperature_celcius': 47}}
{'hotspot': {'confidence': 67,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -36.1021,
             'longitude': 145.5294,
             'surface_temperature_celcius': 47}}
{'hotspot': {'confidence': 65,
             'datetime': datetime.datetime(2021, 4, 18, 4, 44, 50),
             'latitude': -36.7149,
             'longitude': 142.3934,
             'surface_temperature_celcius': 47}}
{'hotspot': {'confidence': 61,
             'datetime': datetime.datetime(20

             'longitude': 147.6711,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 5, 3, 4, 2, 30),
             'latitude': -36.3891,
             'longitude': 141.0253,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 5, 3, 4, 2, 20),
             'latitude': -36.8763,
             'longitude': 142.7266,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 58,
             'datetime': datetime.datetime(2021, 5, 3, 4, 2, 20),
             'latitude': -35.4619,
             'longitude': 143.5251,
             'surface_temperature_celcius': 40}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 5, 3, 4, 2, 20),
             'latitude': -36.2549,
             'longitude': 141.9908,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 70,
             'da

             'surface_temperature_celcius': 45}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 5, 8, 4, 20, 10),
             'latitude': -36.4226,
             'longitude': 141.6752,
             'surface_temperature_celcius': 45}}
{'hotspot': {'confidence': 51,
             'datetime': datetime.datetime(2021, 5, 8, 4, 20, 10),
             'latitude': -36.0744,
             'longitude': 146.688,
             'surface_temperature_celcius': 38}}
{'hotspot': {'confidence': 50,
             'datetime': datetime.datetime(2021, 5, 8, 4, 20, 10),
             'latitude': -36.2756,
             'longitude': 145.811,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 5, 8, 0, 1, 40),
             'latitude': -38.4068,
             'longitude': 147.0682,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 60,
             'datetime': datetime.datetime(2021, 5,

             'datetime': datetime.datetime(2021, 5, 11, 4, 50, 40),
             'latitude': -36.3909,
             'longitude': 141.1843,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 58,
             'datetime': datetime.datetime(2021, 5, 11, 4, 50, 40),
             'latitude': -36.4713,
             'longitude': 144.4747,
             'surface_temperature_celcius': 40}}
{'hotspot': {'confidence': 70,
             'datetime': datetime.datetime(2021, 5, 11, 0, 33, 30),
             'latitude': -38.5657,
             'longitude': 146.438,
             'surface_temperature_celcius': 45}}
{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 5, 11, 0, 33, 20),
             'latitude': -38.5675,
             'longitude': 146.4563,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 64,
             'datetime': datetime.datetime(2021, 5, 12, 4, 2, 30),
             'latitude': -36.5375,
             'longitude

             'longitude': 143.0946,
             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 52,
             'datetime': datetime.datetime(2021, 5, 15, 4, 26, 20),
             'latitude': -36.5646,
             'longitude': 142.3068,
             'surface_temperature_celcius': 48}}
{'hotspot': {'confidence': 70,
             'datetime': datetime.datetime(2021, 5, 15, 4, 26, 20),
             'latitude': -36.3073,
             'longitude': 144.2678,
             'surface_temperature_celcius': 45}}
{'hotspot': {'confidence': 64,
             'datetime': datetime.datetime(2021, 5, 15, 4, 26, 20),
             'latitude': -36.7126,
             'longitude': 141.6358,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 63,
             'datetime': datetime.datetime(2021, 5, 15, 4, 26, 20),
             'latitude': -36.4347,
             'longitude': 143.5704,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 70,
        

             'surface_temperature_celcius': 39}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 6, 14, 4, 38, 30),
             'latitude': -36.059,
             'longitude': 143.7718,
             'surface_temperature_celcius': 44}}
{'hotspot': {'confidence': 66,
             'datetime': datetime.datetime(2021, 6, 16, 4, 34, 30),
             'latitude': -37.3583,
             'longitude': 143.0203,
             'surface_temperature_celcius': 43}}
{'hotspot': {'confidence': 64,
             'datetime': datetime.datetime(2021, 6, 16, 4, 26, 10),
             'latitude': -36.7084,
             'longitude': 142.7354,
             'surface_temperature_celcius': 42}}
{'hotspot': {'confidence': 70,
             'datetime': datetime.datetime(2021, 6, 18, 4, 14),
             'latitude': -36.7179,
             'longitude': 142.2536,
             'surface_temperature_celcius': 45}}
{'hotspot': {'confidence': 53,
             'datetime': datetime.datetime(2021, 

             'surface_temperature_celcius': 34}}
{'hotspot': {'confidence': 64,
             'datetime': datetime.datetime(2021, 10, 1, 4, 10, 28),
             'latitude': -37.46,
             'longitude': 148.113,
             'surface_temperature_celcius': 52}}
{'hotspot': {'confidence': 50,
             'datetime': datetime.datetime(2021, 10, 2, 23, 44, 31),
             'latitude': -37.466,
             'longitude': 148.1,
             'surface_temperature_celcius': 29}}
{'hotspot': {'confidence': 59,
             'datetime': datetime.datetime(2021, 10, 2, 4, 53, 10),
             'latitude': -37.475,
             'longitude': 148.134,
             'surface_temperature_celcius': 40}}
{'hotspot': {'confidence': 68,
             'datetime': datetime.datetime(2021, 10, 2, 0, 39, 34),
             'latitude': -37.456,
             'longitude': 148.11,
             'surface_temperature_celcius': 40}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 10, 3,

{'hotspot': {'confidence': 61,
             'datetime': datetime.datetime(2021, 11, 30, 15, 38, 32),
             'latitude': -37.38,
             'longitude': 149.334,
             'surface_temperature_celcius': 31}}
{'hotspot': {'confidence': 62,
             'datetime': datetime.datetime(2021, 11, 30, 12, 22, 15),
             'latitude': -37.602,
             'longitude': 149.295,
             'surface_temperature_celcius': 33}}
{'hotspot': {'confidence': 64,
             'datetime': datetime.datetime(2021, 11, 30, 12, 22, 15),
             'latitude': -37.598,
             'longitude': 149.29,
             'surface_temperature_celcius': 32}}
{'hotspot': {'confidence': 69,
             'datetime': datetime.datetime(2021, 11, 30, 4, 34, 57),
             'latitude': -37.61,
             'longitude': 149.279,
             'surface_temperature_celcius': 48}}
{'hotspot': {'confidence': 60,
             'datetime': datetime.datetime(2021, 11, 30, 0, 21),
             'latitude': -37.637

In [28]:
# 2.2h) Find the average surface temperature (°C) for each day. You are required to
# only display average surface temperature (°C) and the date in the output.

pipeline = [
    # don't have to flatten. we want to retain the data where there is no hotspot record
    
    {'$project': {'_id':False, 
                  "date":{ "$dateToString":{"format":"%d/%m/%Y", "date":"$date"}},
                  'avg_surface_temp': {'$avg':'$hotspot.surface_temperature_celcius'}}}
]

result = collection.aggregate(pipeline=pipeline)
print_cursor(result)


{'avg_surface_temp': None, 'date': '31/12/2020'}
{'avg_surface_temp': None, 'date': '01/01/2021'}
{'avg_surface_temp': None, 'date': '02/01/2021'}
{'avg_surface_temp': None, 'date': '03/01/2021'}
{'avg_surface_temp': None, 'date': '04/01/2021'}
{'avg_surface_temp': None, 'date': '05/01/2021'}
{'avg_surface_temp': None, 'date': '06/01/2021'}
{'avg_surface_temp': None, 'date': '07/01/2021'}
{'avg_surface_temp': None, 'date': '08/01/2021'}
{'avg_surface_temp': None, 'date': '09/01/2021'}
{'avg_surface_temp': None, 'date': '10/01/2021'}
{'avg_surface_temp': None, 'date': '11/01/2021'}
{'avg_surface_temp': None, 'date': '12/01/2021'}
{'avg_surface_temp': None, 'date': '13/01/2021'}
{'avg_surface_temp': None, 'date': '14/01/2021'}
{'avg_surface_temp': None, 'date': '15/01/2021'}
{'avg_surface_temp': None, 'date': '16/01/2021'}
{'avg_surface_temp': None, 'date': '17/01/2021'}
{'avg_surface_temp': None, 'date': '18/01/2021'}
{'avg_surface_temp': None, 'date': '19/01/2021'}
{'avg_surface_temp':

In [29]:
# 2.2i) Find the top 10 records with the lowest GHI.

# don't have to aggregate because we can directly sort by GHI
result = collection.find().sort('GHI', pymongo.ASCENDING).limit(10)

print_cursor(result)


{'GHI': 47,
 '_id': ObjectId('628292f65b3178ab2dccecd1'),
 'air_temperature_celcius': 5,
 'date': datetime.datetime(2021, 8, 2, 0, 0),
 'hotspot': [{'confidence': 94,
              'datetime': datetime.datetime(2021, 8, 2, 3, 45, 40),
              'latitude': -37.4796,
              'longitude': 141.9403,
              'surface_temperature_celcius': 87},
             {'confidence': 54,
              'datetime': datetime.datetime(2021, 8, 2, 3, 45),
              'latitude': -37.491,
              'longitude': 141.936,
              'surface_temperature_celcius': 40}],
 'max_wind_speed': 5.1,
 'precipitation': 0.0,
 'precipitation_flag': 'I',
 'relative_humidity': 38.6,
 'station': 948701,
 'windspeed_knots': 1.8}
{'GHI': 48,
 '_id': ObjectId('628292f65b3178ab2dccecb0'),
 'air_temperature_celcius': 5,
 'date': datetime.datetime(2021, 6, 30, 0, 0),
 'hotspot': [{'confidence': 78,
              'datetime': datetime.datetime(2021, 6, 30, 4, 41, 25),
              'latitude': -36.834,
    

In [30]:
# 2.2j) Find the records with a 24-hour precipitation recorded between 0.20 to 0.35.

# 24-hour precipitation -> precipitation flag = G
query = {'$and':[
        {'precipitation_flag': {'$in':['D', 'F', 'G']}},
        {'precipitation': {"$gte": 0.20, "$lte": 0.35}}
    ]}

result = collection.find(query)

print_cursor(result)



{'GHI': 157,
 '_id': ObjectId('628292f65b3178ab2dccec08'),
 'air_temperature_celcius': 19,
 'date': datetime.datetime(2021, 1, 13, 0, 0),
 'hotspot': [],
 'max_wind_speed': 18.1,
 'precipitation': 0.31,
 'precipitation_flag': 'G',
 'relative_humidity': 54.1,
 'station': 948700,
 'windspeed_knots': 11.2}
{'GHI': 146,
 '_id': ObjectId('628292f65b3178ab2dccec53'),
 'air_temperature_celcius': 17,
 'date': datetime.datetime(2021, 3, 29, 0, 0),
 'hotspot': [{'confidence': 69,
              'datetime': datetime.datetime(2021, 3, 29, 0, 48, 40),
              'latitude': -34.2648,
              'longitude': 141.6325,
              'surface_temperature_celcius': 51}],
 'max_wind_speed': 21.0,
 'precipitation': 0.24,
 'precipitation_flag': 'G',
 'relative_humidity': 49.9,
 'station': 948701,
 'windspeed_knots': 12.2}
{'GHI': 166,
 '_id': ObjectId('628292f65b3178ab2dccec69'),
 'air_temperature_celcius': 20,
 'date': datetime.datetime(2021, 4, 20, 0, 0),
 'hotspot': [{'confidence': 84,
           

              'longitude': 149.233,
              'surface_temperature_celcius': 55},
             {'confidence': 100,
              'datetime': datetime.datetime(2021, 11, 30, 12, 22, 15),
              'latitude': -37.642,
              'longitude': 149.263,
              'surface_temperature_celcius': 65},
             {'confidence': 100,
              'datetime': datetime.datetime(2021, 11, 30, 12, 22, 14),
              'latitude': -37.634,
              'longitude': 149.237,
              'surface_temperature_celcius': 71},
             {'confidence': 84,
              'datetime': datetime.datetime(2021, 11, 30, 4, 35),
              'latitude': -37.384,
              'longitude': 149.336,
              'surface_temperature_celcius': 59},
             {'confidence': 73,
              'datetime': datetime.datetime(2021, 11, 30, 4, 35),
              'latitude': -37.389,
              'longitude': 149.311,
              'surface_temperature_celcius': 50},
             {'confidence'