Import the following libraries and modules:

- `pickle`
- `warnings`
- `itertools`
- `numpy`
- `pandas`
- `folium` (if it is not found in your base environment, run this command once in a cell: `! pip install folium`)
- `StandardScaler` [StandardScaler](https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.StandardScaler.html)
- `DBSCAN` [DBSCAN](https://scikit-learn.org/dev/modules/generated/sklearn.cluster.DBSCAN.html)
- `silhouette_score` [Silhouette score](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.silhouette_score.html)
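
The three scikit-learn names in the list live in different submodules. A minimal sketch of the import cell, assuming the usual `np` and `pd` aliases (`np` is the alias used later in `gridSearch`; `pd` is an assumption). The `folium` import is left commented out in case the package is not installed yet:

```python
import pickle
import warnings
import itertools

import numpy as np
import pandas as pd
# import folium  # uncomment once folium is installed

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
```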

In [148]:
# To suppress the warnings in the notebook
warnings.filterwarnings("ignore")

##### Step 1:

- Read the CSV file `weather_data.csv` and store it in the variable `df`.
- Keep only these columns: `Stn_Name` `Lat` `Long` `Tm` `Tx` `Tn`
- Display the first **5** rows.
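
Step 1 could be sketched as follows. Because `weather_data.csv` is not available here, the sketch reads a tiny made-up CSV from a string; the rows and the extra `Prov` column are hypothetical and only illustrate the column selection:

```python
import io
import pandas as pd

# Hypothetical stand-in for weather_data.csv: made-up rows with one extra column
csv_text = """Stn_Name,Lat,Long,Prov,Tm,Tx,Tn
STATION A,48.9,-123.7,BC,8.2,13.5,0.0
STATION B,49.3,-124.1,BC,7.0,15.0,-3.0
STATION C,45.5,-73.6,QC,-5.1,2.0,-15.2
"""

columns = ["Stn_Name", "Lat", "Long", "Tm", "Tx", "Tn"]

# In the notebook this would be pd.read_csv("weather_data.csv")
df = pd.read_csv(io.StringIO(csv_text))
df = df[columns]      # keep only the six columns of interest
print(df.head(5))     # first 5 rows (only 3 exist in this toy data)
```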

##### Step 2:
- Check the missing values.
    - If you find missing values, drop the **rows** having at least 1 missing value.

- Keep only these columns `Lat` `Long` `Tm` `Tx` `Tn` and store them in a new variable `X`, rather than in `df`.
- Transform the dataframe `X` into a **multidimensional array** using [nan_to_num](https://numpy.org/doc/2.0/reference/generated/numpy.nan_to_num.html)
- **fit_transform** the **ndarray** `X` using **StandardScaler** and store the result in the variable `X`
- display the **ndarray** `X`
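
A minimal sketch of step 2 on made-up data (the station values below are hypothetical). It counts missing values, drops incomplete rows, converts the selected columns to an array, and standardizes each column to zero mean and unit variance:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data with one missing temperature value
df = pd.DataFrame({
    "Stn_Name": ["A", "B", "C", "D"],
    "Lat": [48.9, 49.3, 45.5, 53.5],
    "Long": [-123.7, -124.1, -73.6, -113.5],
    "Tm": [8.2, 7.0, np.nan, -10.4],
    "Tx": [13.5, 15.0, 2.0, 1.5],
    "Tn": [0.0, -3.0, -15.2, -25.0],
})

print(df.isna().sum())          # missing values per column
df = df.dropna(axis=0)          # drop rows with at least one missing value

X = df[["Lat", "Long", "Tm", "Tx", "Tn"]]
X = np.nan_to_num(X)            # array form of X; any remaining NaN would become 0.0
X = StandardScaler().fit_transform(X)
print(X)
```

Since the incomplete rows are dropped first, `nan_to_num` mainly serves here to turn the dataframe into an array.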

##### Step 3:

- Make the map of Canada using folium and store it in a variable `map_canada`. [Folium map](https://python-visualization.github.io/folium/latest/user_guide/map.html)
    - Canada is centered at approximately latitude `56` and longitude `-106`
    - keep `zoom_start=4`. You can even play with **zoom_start** to see the difference.
    
- Now we would like to indicate the weather stations on this map. To do this:
    - Iterate over the columns `Lat` and `Long` of your `df`. You need to `zip` both columns to keep a one-to-one correspondence between each latitude value and its longitude value. [Hint](https://www.python-engineer.com/posts/zip-for-loop/)
        - For each latitude/longitude pair, put a **red** dot on `map_canada`. [Folium CircleMarker](https://python-visualization.github.io/folium/latest/user_guide/vector_layers/circle_and_circle_marker.html)
        - Keep ``radius=1`` ``fill=True`` ``color='red'`` in the arguments

- Display the output of variable `map_canada`

##### Step 4:

To tune the hyperparameters of the `DBSCAN` model, we can either proceed by trial and error and record the results, or use grid search to find the optimum hyperparameters. Here we choose the latter option.

To do grid-search, first you have to make the combinations of hyperparameters.

- Make a sequence between **0.01** and **1** using [linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) and store it in a variable `epsilon`. Keep `num=20`

- Make another sequence between **2** and **25** using [arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) and store it in a variable `min_samples`. Keep `step=2`

- Using `product` from [itertools](https://www.geeksforgeeks.org/python-itertools-product/), make a **list** of combinations of `epsilon` and `min_samples` and store it in a variable `combinations`
- Show the count of combinations in the list `combinations`
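
The grid described above can be sketched as follows. Note that `linspace` includes both end points, while `arange` excludes the stop value, so `min_samples` ends at 24:

```python
import itertools
import numpy as np

epsilon = np.linspace(0.01, 1, num=20)      # 20 evenly spaced values, both ends included
min_samples = np.arange(2, 25, step=2)      # 2, 4, ..., 24 (the stop value is excluded)

# Cartesian product: every epsilon paired with every min_samples value
combinations = list(itertools.product(epsilon, min_samples))
print(len(combinations))                    # 20 * 12 = 240
```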

In [157]:
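def gridSearch(combinations, X):
    scores = []
    valid_params = []  # combinations that actually produced a score

    for i, (eps, num_samples) in enumerate(combinations):

        gs_model = DBSCAN(eps=eps, min_samples=num_samples)
        gs_model.fit(X)
        labels = gs_model.labels_
        unique_labels = set(labels)

        # Skip runs that are all noise or have two or fewer distinct labels (noise counts as a label)
        if unique_labels != {-1} and len(unique_labels) > 2:

            scr = silhouette_score(X, labels, metric="euclidean", sample_size=1000, random_state=200)
            scores.append(scr)
            valid_params.append((eps, num_samples))
            print(f"at iteration {i}, silhouette_score={scr}, we found number of clusters={len(unique_labels)}")

    if not scores:
        raise ValueError("no combination produced a scorable clustering")

    # scores only holds the runs that passed the check, so index valid_params, not combinations
    best_index = np.argmax(scores)
    best_parameter = valid_params[best_index]

    print(f"\nbest values are : eps={best_parameter[0]}, min_samples={best_parameter[1]}")

    return best_parameter[0], best_parameter[1]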

- Call the `gridSearch` function with `combinations` and `X` to find the best hyperparameters

##### Step 5:

Now that we have our best hyperparameter values, we can use them to make a **DBSCAN** model.
- Instantiate `DBSCAN` with the best hyperparameter values and store it in a variable `model`
- Fit the `model` on the **ndarray** `X`.

- Find the unique values of the model's labels using the `set` function. Display these values

- Add a new column `Labels` to the dataframe `df`, holding the model's labels.
- Display the first **5** rows of dataframe `df`.
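
A minimal sketch of step 5 on synthetic data. The two groups of points and the hyperparameter values are made up; in the notebook the values come from `gridSearch`, and `best_eps` and `best_min_samples` are hypothetical names:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Two made-up groups of stations, far apart from each other
rng = np.random.default_rng(0)
west = rng.normal(loc=[49.0, -123.0], scale=0.1, size=(20, 2))
east = rng.normal(loc=[45.0, -74.0], scale=0.1, size=(20, 2))
df = pd.DataFrame(np.vstack([west, east]), columns=["Lat", "Long"])
X = StandardScaler().fit_transform(df[["Lat", "Long"]])

# Hypothetical values; in the notebook these come from gridSearch
best_eps, best_min_samples = 0.3, 5

model = DBSCAN(eps=best_eps, min_samples=best_min_samples)
model.fit(X)

print(set(model.labels_))        # unique labels; -1 marks noise points
df["Labels"] = model.labels_     # one label per row, in the same order as X
print(df.head(5))
```

The assignment to `df["Labels"]` relies on `df` and `X` having the same rows in the same order, which holds when `X` was built from `df` after the missing rows were dropped.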

In [162]:
color_map = {-1:'gray', 0:'red',1:'blue',2:'green',3:'yellow',4:'orange', 5:'pink',6:'purple'}

##### Step 6:

- Iterate over the columns `Lat`, `Long` and `Labels` using `zip` function.
    - Copy and paste `label = folium.Popup('Cluster : '+str(lbl),parse_html=True)`
    - Use `folium.CircleMarker` as you used before.
        - `color=color_map[lbl]`
        - `fill=True`
        - `radius=1`
        - `popup=label`

##### Step 7:

- Save the model with `pickle`, as we always do.
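
One way to save the model is with `pickle`, which is in the import list. In this sketch the file name `dbscan_model.pkl`, the temporary directory, and the hyperparameter values are all hypothetical; the file is read back only to check the round trip:

```python
import os
import pickle
import tempfile
from sklearn.cluster import DBSCAN

model = DBSCAN(eps=0.3, min_samples=5)   # hypothetical hyperparameter values

# Hypothetical file name, written to a temporary directory for this sketch
path = os.path.join(tempfile.mkdtemp(), "dbscan_model.pkl")

with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model.get_params()["eps"], loaded_model.get_params()["min_samples"])
```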

##### Real World Use-Case:

Now that you have trained a clustering model, let us test it on a real-world use case to identify the relevant clusters.

- Read the CSV file `weather_data_rwi.csv` and store it in the variable `df_rwi`. Keep only these columns: `Stn_Name` `Lat` `Long` `Tm` `Tx` `Tn`.
- Check the missing values. If you find missing values, drop the **rows** having at least 1 missing value.
- Keep only these columns `Lat` `Long` `Tm` `Tx` `Tn` and store them in a new variable `X`, rather than in `df_rwi`.
    - Transform the dataframe `X` into a **multidimensional array** using [nan_to_num](https://numpy.org/doc/2.0/reference/generated/numpy.nan_to_num.html)
    - **fit_transform** the **ndarray** `X` using **StandardScaler** and store the result in the variable `X`
    - Display the **ndarray** `X`
- Load the saved model and store it in the variable `loaded_model`.
- Follow step 5 and step 6 with the `loaded_model` to label the new data and visualize it on `map_canada`. Note that `DBSCAN` has no `predict` method, so fit the `loaded_model` on the new `X`.
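
A sketch of the last bullet, under stated assumptions: an in-memory `pickle` round trip stands in for loading the saved file, and the data and hyperparameter values are made up. Since scikit-learn's `DBSCAN` does not define `predict`, the loaded model is refit on the new array with `fit_predict`:

```python
import pickle
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# In-memory round trip standing in for loading the saved file
loaded_model = pickle.loads(pickle.dumps(DBSCAN(eps=0.3, min_samples=5)))

# Made-up new data: two tight groups of points
rng = np.random.default_rng(1)
X_new = np.vstack([
    rng.normal(loc=[49.0, -123.0], scale=0.1, size=(20, 2)),
    rng.normal(loc=[45.0, -74.0], scale=0.1, size=(20, 2)),
])
X_new = StandardScaler().fit_transform(X_new)

print(hasattr(loaded_model, "predict"))    # DBSCAN does not define predict
labels = loaded_model.fit_predict(X_new)   # refits on X_new and returns its labels
print(set(labels))
```

Refitting means the clusters are recomputed from the new data; what the loaded model carries over is its hyperparameters.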