Goal of this notebook: introduce this project and its results.

(What is the k nearest p-median problem, and how can it be solved?)

To tackle the problem, we need to define:
a. What types of input data are supported? (from_cost_matrix/from_geodataframe/cost_dataframe. If from_geodataframe, how are the nearest facilities queried in space?)
b. How are placeholder facilities handled? (A decision variable for each one, and how it works. It accounts for far facilities in a special way: their distances need not be stored, but the model still indicates whether a client needs them.)
c. How are the many NA values arising from keeping only the k nearest facilities handled? (Convert the normal matrix to a sparse matrix, which also reduces storage space.)

Moreover, at the level of programming design, we need to decide:
a. Which methods does the new class need? (update k_list, create_sparse_matrix, solve, etc.)
b. What are the parameters and attributes of the object, and how are they used throughout the process? (The model becomes one of the object's attributes, called 'problem'. Each iteration refreshes the 'problem', the k_list, and the sparse matrix.)
c. What should be exposed to the user, and what can stay internal? (Besides the usual model results, the user should be able to access the latest k_list. Methods such as update k_list and create_sparse_matrix are internal; the user does not need to know about or call them.)

# K Nearest P Median Problem and Implementation

## Introduction

The K Nearest P Median problem is a variant of the classical p-median problem in which each client explicitly considers only its k nearest candidate facilities, which keeps large location-allocation problems tractable. In this notebook, we'll dive into the problem's formulation, explore the role of k nearest facilities, placeholder facilities, and sparse matrices, and showcase the implementation of the `KNearestPMedian` class in the `spopt` package.

### Understanding the K Nearest P Median Problem

The K Nearest P Median problem extends facility location-allocation by treating each client's nearest facilities differently from its remaining (far) facilities. Church (2018) proposed this p-median model, which distinguishes between near and far facilities and uses both explicit and implicit variables for capacity allocations. By implementing this model, the `spopt` package can provide accurate and efficient solutions to spatial optimization problems, particularly those with many data points and large demand volumes, because only the distances from each client to its k nearest facilities have to be stored.

The model can be formulated as:

$\begin{array}{lllll}
       \displaystyle \textbf{Minimize} & \displaystyle \sum_{i \in I}\sum_{j \in k_{i}}{a_i d_{ij} X_{ij}} + \sum_{i \in I}{g_i \left(\max_{j \in k_{i}} d_{ij} + 1\right)} &&& (1) \\
       \displaystyle \textbf{Subject To} & \displaystyle \sum_{j \in k_{i}}{X_{ij}} + g_i = 1 && \forall i \in I & (2) \\
                                         & \displaystyle \sum_{j \in J}{Y_j} = p && & (3) \\
                                         & \displaystyle \sum_{i \in I : j \in k_{i}}{a_i X_{ij}} \leq c_{j} Y_{j} && \forall j \in J & (4) \\
                                         & X_{ij} \leq Y_{j} && \forall i \in I \quad \forall j \in k_{i} & (5) \\
                                         & X_{ij} \in \{0, 1\} && \forall i \in I \quad \forall j \in k_{i} & (6) \\
                                         & Y_j \in \{0, 1\} && \forall j \in J & (7) \\
\end{array}$

$\begin{array}{lllll}
\displaystyle \textbf{Where} && i & = & \textrm{index of demand points/areas/objects in set } I \\
                             && j & = & \textrm{index of potential facility sites in set } J \\
                             && p & = & \textrm{the number of facilities to be sited} \\
                             && a_i & = & \textrm{service load or population demand at client location } i \\
                             && k_{i} & = & \textrm{the set of the } k \textrm{ nearest facilities of client location } i \\
                             && c_{j} & = & \textrm{the capacity of facility } j \\
                             && d_{ij} & = & \textrm{shortest distance or travel time between locations } i \textrm{ and } j \\
                             && X_{ij} & = & \begin{cases}
                                               1, \textrm{if client location } i \textrm{ is served by facility } j \\
                                               0, \textrm{otherwise} \\
                                             \end{cases} \\
                             && Y_j & = & \begin{cases}
                                            1, \textrm{if a facility is sited at location } j \\
                                            0, \textrm{otherwise} \\
                                          \end{cases} \\
                             && g_i & = & \begin{cases}
                                            1, \textrm{if client } i \textrm{ must be served by a facility outside its } k \textrm{ nearest} \\
                                            0, \textrm{otherwise} \\
                                          \end{cases} \\
\end{array}$
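To make objective (1) concrete, here is a small sketch that evaluates it for a fixed set of open sites. It assumes capacities are ignored, so each client is sent to the closest open site among its k nearest; when none of those is open, the client takes its placeholder ($g_i = 1$) and is charged the distance to its k-th nearest facility plus one. The function name `knearest_objective` is hypothetical and is not part of `spopt`.

```python
import numpy as np


def knearest_objective(cost, demand, k_list, open_sites):
    """Evaluate objective (1) for a fixed set of open sites.

    Simplifying assumption: capacities are ignored, so X_ij is simply the
    closest open site among the client's k nearest.
    """
    cost = np.asarray(cost, dtype=float)
    total = 0.0
    on_placeholder = []
    for i, k in enumerate(k_list):
        # the set k_i: indices of the k nearest sites, closest first
        nearest = [int(j) for j in np.argsort(cost[i], kind="stable")[:k]]
        open_near = [j for j in nearest if j in open_sites]
        if open_near:
            j = open_near[0]  # closest open site among the k nearest
            total += demand[i] * cost[i, j]  # a_i * d_ij * X_ij with X_ij = 1
        else:
            # g_i = 1: charge the k-th nearest distance plus one, as in (1)
            total += cost[i, nearest].max() + 1
            on_placeholder.append(i)
    return float(total), on_placeholder


cost = [[1, 4, 9], [5, 2, 7], [8, 6, 3]]
total, on_placeholder = knearest_objective(
    cost, demand=[10, 20, 30], k_list=[2, 2, 1], open_sites={0, 1}
)
print(total, on_placeholder)  # 54.0 [2]
```

Here client 2 only sees its single nearest site (site 2), which is closed, so it falls back on the placeholder at cost 3 + 1 = 4.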

### Input Data Types and Finding the Nearest Facilities with a KD-Tree

To cater to diverse scenarios, the `KNearestPMedian` class supports multiple input data types:

1. **From Cost Matrix:** Utilize precomputed distances between demand points and facility locations.
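2. **From GeoDataFrame:** Leverage geospatial data for dynamic distance calculations based on a chosen metric; the k nearest facilities of each client can be found with a KD-tree query instead of a full distance matrix.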
3. **Cost DataFrame:** Employ a flexible cost matrix that accommodates various distance metrics.
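For the GeoDataFrame route, the k nearest facilities can be found without building the full client-by-facility matrix. The sketch below uses SciPy's `KDTree.query` on projected point coordinates (so distances are Euclidean) and, for comparison, the equivalent row-wise `argsort` on a precomputed cost matrix. Both helper names are hypothetical, and `spopt`'s own query logic may differ.

```python
import numpy as np
from scipy.spatial import KDTree


def k_nearest_from_points(client_xy, facility_xy, k):
    """Return (distances, indices), each of shape (n_clients, k), via a KD-tree."""
    k = min(k, len(facility_xy))  # SciPy pads with inf / n when k exceeds the point count
    tree = KDTree(facility_xy)
    dist, idx = tree.query(client_xy, k=k)
    # With k=1 SciPy returns 1-D arrays; reshape so callers always get (n, k).
    n = len(client_xy)
    return np.asarray(dist).reshape(n, k), np.asarray(idx).reshape(n, k)


def k_nearest_from_matrix(cost, k):
    """Same result from a dense cost matrix: sort each row, keep k columns."""
    cost = np.asarray(cost, dtype=float)
    idx = np.argsort(cost, axis=1, kind="stable")[:, :k]
    dist = np.take_along_axis(cost, idx, axis=1)
    return dist, idx


facilities = [[0, 0], [10, 0], [0, 10]]
clients = [[1, 0], [9, 1]]
dist, idx = k_nearest_from_points(clients, facilities, k=2)
print(idx.tolist())  # [[0, 1], [1, 0]]
```

Because `k_list` can differ by client, one option is to query once with `max(k_list)` and keep only the first `k_i` columns of each row.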

### Placeholder Facilities: Enhancing Solutions

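The K Nearest P Median problem introduces "placeholder facilities" for clients that cannot be served by any of their k nearest facilities, for example because none of those sites is selected or their capacity is used up. Instead of storing distances to every far facility, each client gets a single placeholder variable $g_i$. When $g_i = 1$, the client is charged one unit more than the distance to its k-th nearest facility, and the solution flags that this client needs a facility farther away, which is the signal to enlarge its k and solve again.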
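The planning notes describe an iterative loop: solve, inspect the placeholder variables, enlarge k for the clients that needed a far facility, rebuild the sparse matrix, and solve again. Below is a minimal sketch of that control flow only. It assumes a simple additive update rule (`step`) and a caller-supplied `solve` function that returns each client's $g_i$ value; `update_k_list` and `solve_with_refinement` are hypothetical names, not `spopt` API.

```python
def update_k_list(k_list, placeholder_used, n_facilities, step=1):
    """Grow k for every client whose placeholder was used (assumed additive rule)."""
    return [
        min(k + step, n_facilities) if used else k
        for k, used in zip(k_list, placeholder_used)
    ]


def solve_with_refinement(solve, k_list, n_facilities, step=1, max_iter=50):
    """Re-solve until no client relies on its placeholder, or k cannot grow."""
    g = None
    for _ in range(max_iter):
        g = solve(k_list)  # hypothetical solver: returns g_i for each client
        if not any(g):
            break  # every client is served by one of its k nearest facilities
        new_k = update_k_list(k_list, g, n_facilities, step)
        if new_k == k_list:
            break  # flagged clients already consider every facility
        k_list = new_k  # in the real class, the sparse matrix is rebuilt here
    return k_list, g


# Toy stand-in for a solver: client 0 needs k >= 3 to avoid its placeholder.
fake_solve = lambda ks: [ks[0] < 3, False]
print(solve_with_refinement(fake_solve, [1, 2], n_facilities=5))  # ([3, 2], [False, False])
```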

### Efficiency with Sparse Matrices

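Managing large datasets efficiently is paramount. Because each client keeps distances to only its k nearest facilities, most entries of a full client-by-facility cost matrix would be empty (NA). The `KNearestPMedian` class therefore stores costs in a sparse matrix, which holds only the non-empty entries and greatly reduces memory use for problems with many clients and facilities.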
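A sketch of storing the k-nearest costs in SciPy's compressed sparse row (CSR) format. For readability it starts from a dense cost matrix; in a large problem the (index, distance) pairs would instead come straight from a nearest-neighbor query, so the dense matrix is never built. The helper name `k_nearest_sparse` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def k_nearest_sparse(cost, k_list):
    """Keep only each client's k_i nearest costs in a CSR matrix."""
    cost = np.asarray(cost, dtype=float)
    rows, cols, vals = [], [], []
    for i, k in enumerate(k_list):
        nearest = np.argsort(cost[i], kind="stable")[:k]  # the set k_i
        rows.extend([i] * len(nearest))
        cols.extend(nearest.tolist())
        vals.extend(cost[i, nearest].tolist())
    # Only len(vals) values (plus their column indices and row pointers) are
    # stored, instead of n_clients * n_facilities values for a dense array.
    return csr_matrix((vals, (rows, cols)), shape=cost.shape)


sparse_cost = k_nearest_sparse([[1, 4, 9], [5, 2, 7], [8, 6, 3]], k_list=[2, 2, 1])
print(sparse_cost.nnz)  # 5
```

Reading an unstored entry also returns 0, so code that consumes the matrix should use its sparsity structure (for example each row's `indices`) rather than the value to decide whether a site belongs to a client's k nearest.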

## The `KNearestPMedian` Class Implementation

To exemplify these concepts, let's explore the `KNearestPMedian` class and its key methods:

```python
# Place the code from your implementation here
