# Task 1

## Data Model

The relationships between the entities are:

- Station-Date: one-to-many
- Date-Climate: one-to-one
- Date-Hotspot: one-to-zero-or-many

The data model is designed accordingly:

```
Station Collection:
    Stations = [
        {
            id: station,
            dates: [id of date]
        }
    ]
Dates Collection:
    Dates = [
        {
            id: date,
            climate: {
                // climate attributes
            },
            hotspots: [id of hotspot]
        }
    ]
Hotspots Collection:
    Hotspots = [
        {
            id: datetime,
            date: date (id of date in Dates),
            other_attributes: ...
        }
    ]
```
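
To make the references between the three collections concrete, here is a minimal sketch with one document per collection written as plain Python dictionaries. The field names follow the schema above, but every value (the station id, the climate reading, the `confidence` attribute) is a hypothetical placeholder rather than a row from the dataset, and `hotspots_for` is an illustrative helper.

```python
import datetime

# Hypothetical sample values, for illustration only.
date_id = datetime.datetime(2022, 3, 1)
hotspot_id = datetime.datetime(2022, 3, 1, 4, 46, 7)

# Station -> Dates: one-to-many, stored as a list of date ids.
station = {"id": 1, "dates": [date_id]}

date = {
    "id": date_id,
    "climate": {"air_temperature": 20},  # Date -> Climate: one-to-one, embedded
    "hotspots": [hotspot_id],            # Date -> Hotspot: one-to-zero-or-many, by id
}

# Each hotspot points back to its date.
hotspot = {"id": hotspot_id, "date": date_id, "confidence": 80}


def hotspots_for(date_doc, hotspot_docs):
    """Resolve the hotspot ids stored in a date document to full documents."""
    by_id = {h["id"]: h for h in hotspot_docs}
    return [by_id[ref] for ref in date_doc["hotspots"]]


resolved = hotspots_for(date, [hotspot])
```

A date without hotspots would simply carry an empty `hotspots` list, in which case `hotspots_for` returns an empty list.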

## Justification
### Benefits
- Decoupling Station from Date allows Dates to be referenced from a Station by id instead of being embedded in it
- Embedding the climate data in Dates allows direct access to the climate for a given date
- Decoupling Hotspots from Dates avoids a transitive dependency: a hotspot is identified by its datetime, and the datetime determines the date (datetime -> date)

### Drawback
- Only the year 2022 contains hotspot data, so date.hotspots is an empty list for many dates


# 2. Querying MongoDB with PyMongo
## 2.1 Read data and create database

### Importing Libraries

In [37]:
from pymongo import MongoClient
import pandas as pd             # pandas version 2.0.1
import datetime

### Connecting to Database

In [2]:
client = MongoClient('mongodb://localhost:27017/')
db = client.fit3182_assignment_db

### Reading Datasets

In [31]:
climate_historic = pd.read_csv("../dataset/climate_historic.csv")
hotspot_historic = pd.read_csv("../dataset/hotspot_historic.csv")

### Creating Dates collection

In [32]:
# PyMongo stores dates as datetime objects, so convert the raw "dd/mm/yyyy" strings
def raw_date_to_datetime(date: str) -> datetime.datetime:
    [dd, mm, yyyy] = date.split("/")
    return datetime.datetime(int(yyyy), int(mm), int(dd))

dates = pd.DataFrame(climate_historic.date).merge(hotspot_historic.date).date   # merge on date (inner join by default: keeps dates present in both datasets)
dates = pd.DataFrame(dates.unique())                                            # keep only the unique dates from the merged result
dates = list(dates[0].apply(raw_date_to_datetime))                              # date:str -> date:datetime; Series -> List
climate_historic.date = climate_historic.date.apply(raw_date_to_datetime)       # date:str -> date:datetime
hotspot_historic.date = hotspot_historic.date.apply(raw_date_to_datetime)       # date:str -> date:datetime
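
The manual split in `raw_date_to_datetime` can also be written with the standard library's `datetime.strptime`, which additionally raises a `ValueError` on malformed input. This is a minimal sketch, assuming the raw dates are always `dd/mm/yyyy` strings as the split above implies; the function name `parse_raw_date` is hypothetical.

```python
import datetime


def parse_raw_date(date: str) -> datetime.datetime:
    """Parse a 'dd/mm/yyyy' string into a datetime at midnight."""
    return datetime.datetime.strptime(date, "%d/%m/%Y")


parsed = parse_raw_date("31/12/2022")

# Input in any other layout is rejected instead of being silently misread.
try:
    parse_raw_date("2022-12-31")
    rejected = False
except ValueError:
    rejected = True
```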

In [38]:
dates_col = [
    {
        "_id": date,
        "climate": {
            "air_temperature": int(climate_historic[climate_historic.date == date]["air_temperature_celcius"].array[0]),
            "relative_humidity": float(climate_historic[climate_historic.date == date]["relative_humidity"].array[0]),
            "windspeed_knots": float(climate_historic[climate_historic.date == date]["windspeed_knots"].array[0]),
            "max_wind_speed": float(climate_historic[climate_historic.date == date]["max_wind_speed"].array[0]),
            "precipitation": climate_historic[climate_historic.date == date]["precipitation"].array[0],
            "ghi": int(climate_historic[climate_historic.date == date]["GHI_w/m2"].array[0])
        },
        "hotspots": [
            {"hotspot_id": hotspot_id}
            for hotspot_id in hotspot_historic[hotspot_historic.date == date]["datetime"].array
        ]
    }
    for date in dates
]
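
The comprehension above filters `hotspot_historic` once for every date. An alternative is to group the hotspot ids by date in a single pass and then look each date up. The sketch below shows the idea on plain Python records rather than DataFrames so that it runs on its own; the sample rows and the helper name `group_hotspots_by_date` are hypothetical.

```python
import datetime
from collections import defaultdict


def group_hotspots_by_date(hotspot_rows):
    """Map each date to its list of {"hotspot_id": ...} entries in one pass."""
    grouped = defaultdict(list)
    for row in hotspot_rows:
        grouped[row["date"]].append({"hotspot_id": row["datetime"]})
    return grouped


# Hypothetical sample rows: two hotspots on the first date, none on the second.
d1 = datetime.datetime(2022, 1, 1)
d2 = datetime.datetime(2022, 1, 2)
hotspot_rows = [
    {"date": d1, "datetime": "2022-01-01T04:46:07"},
    {"date": d1, "datetime": "2022-01-01T05:10:00"},
]

grouped = group_hotspots_by_date(hotspot_rows)

# Dates without hotspots fall back to an empty list.
docs = [{"_id": d, "hotspots": grouped.get(d, [])} for d in (d1, d2)]
```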

### Inserting Lists into MongoDB

In [40]:
from pymongo.errors import BulkWriteError

try:
    db.dates.insert_many(dates_col)
except BulkWriteError:
    print("Duplicated Keys in dates (Data already inserted)")
else:
    print("Data inserted")


Duplicated Keys in dates (Data already inserted)
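
The `except` branch above reports every `BulkWriteError` as a duplicate key. PyMongo exposes the individual failures through the exception's `details` dictionary, where each entry under `writeErrors` carries an error `code`, and MongoDB uses code 11000 for duplicate keys. Below is a minimal sketch of a check that tells the two cases apart. The helper `only_duplicate_keys` and the sample dictionaries are hypothetical, and the function works on a plain dictionary so it runs without a database.

```python
DUPLICATE_KEY = 11000  # MongoDB error code for a duplicate key


def only_duplicate_keys(details: dict) -> bool:
    """Return True if every reported write error is a duplicate-key error."""
    errors = details.get("writeErrors", [])
    return bool(errors) and all(e.get("code") == DUPLICATE_KEY for e in errors)


# Hypothetical `details` payloads, shaped like BulkWriteError.details.
duplicates = {"writeErrors": [{"index": 0, "code": 11000}]}
mixed = {"writeErrors": [{"index": 0, "code": 11000}, {"index": 1, "code": 121}]}

is_rerun = only_duplicate_keys(duplicates)
is_other_failure = not only_duplicate_keys(mixed)
```

In the cell above, this could be called as `only_duplicate_keys(e.details)` after binding the exception with `except BulkWriteError as e`, so that other write failures are not reported as an already-populated collection.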


# Task 2