# Finding the distance from each point to the closest point in each category of another dataframe

This walkthrough shows how to compute, for every point in one dataframe, the distance to the nearest point of each category in a second dataframe.

---

### Preparing the environment
#### Installing the requirements

This repository relies on common Python data-processing libraries. Before continuing, make sure that all dependencies are installed.

In [None]:
!pip install -r requirements.txt

---

### Importing the core of the application

In [None]:
import closestpoint as cp

---

### Loading dataframes

The `cp.upload_xy_data(filepath)` function takes the path of the .csv file where the static data is stored.<br>
The longitude / East column must be named __X__ and the latitude / North column __Y__, both in capital letters.<br>
There must also be an __'id'__ column that identifies each point.<br><br>
Even if the file contains other columns, the algorithm will only take these three, so a unique id is what lets you join the results back to your original data later.<br><br><br>
The `cp.upload_target_data(filepath)` function takes the path of the .csv file where the target data is stored.<br>

This file must follow the same naming rules for X, Y and id. In addition, its classification column must be named __'category'__, in lowercase. The function will only take these four columns for processing.
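If your source file uses different column names, it can be reshaped with pandas before it is passed to the loaders. The snippet below is a minimal sketch: the raw column names (`venue_id`, `longitude`, `latitude`, `type`, `name`) and the output path are assumptions made up for illustration, not names used by this repository.

```python
import pandas as pd

# Hypothetical raw table: these column names are assumptions for illustration.
raw = pd.DataFrame({
    "venue_id": [1, 2, 3],
    "longitude": [-74.0721, -74.0817, -74.0636],
    "latitude": [4.7110, 4.6097, 4.6584],
    "type": ["park", "school", "park"],
    "name": ["A", "B", "C"],
})

# Rename to the column names the loaders expect.
target = raw.rename(columns={
    "venue_id": "id",
    "longitude": "X",
    "latitude": "Y",
    "type": "category",
})

# Keep only the four columns used for target data.
target = target[["id", "X", "Y", "category"]]

# Write the prepared file (hypothetical output path).
# target.to_csv("venues_prepared.csv", index=False)
```

For static data the same steps apply, minus the `category` column.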

In [None]:
listings = cp.upload_xy_data('test/csv/listing_test.csv')
venues = cp.upload_target_data('test/csv/venues_test.csv')

---

### Calculating the distances

This program uses Euclidean distances in its algorithm (the shortest path between two points is the straight line between them). In this version, it is recommended that the data be entered in __decimal degrees__, following the WGS84 geographic coordinate system (EPSG: 4326).<br><br>
In this first version, be careful with the amount of data you process, both static and target: the algorithm calculates ALL pairwise distances and then selects the smallest, so the computational cost grows with the product of the two dataset sizes.

In [None]:
data = cp.find_closest_points(listings, venues)
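To make the brute-force idea concrete, here is a minimal sketch in plain Python. It is not the code behind `cp.find_closest_points`; the function name `closest_per_category` and the list-of-dicts layout are assumptions chosen so the logic is visible without any dependency.

```python
import math

def closest_per_category(points, targets):
    """For each point, find the smallest Euclidean distance to each category.

    points:  list of dicts with keys 'id', 'X', 'Y'
    targets: list of dicts with keys 'id', 'X', 'Y', 'category'
    Returns {point id: {category: (distance, target id)}}.
    """
    result = {}
    for p in points:
        best = {}  # category -> (smallest distance so far, target id)
        for t in targets:
            # Straight-line distance, in the same units as the input coordinates.
            d = math.hypot(p["X"] - t["X"], p["Y"] - t["Y"])
            if t["category"] not in best or d < best[t["category"]][0]:
                best[t["category"]] = (d, t["id"])
        result[p["id"]] = best
    return result

# Tiny made-up example.
points = [{"id": "a", "X": 0.0, "Y": 0.0}]
targets = [
    {"id": 1, "X": 3.0, "Y": 4.0, "category": "park"},
    {"id": 2, "X": 6.0, "Y": 8.0, "category": "park"},
    {"id": 3, "X": 0.0, "Y": 2.0, "category": "school"},
]
nearest = closest_per_category(points, targets)
# nearest["a"]["park"] is (5.0, 1) and nearest["a"]["school"] is (2.0, 3)
```

Every point is compared with every target, so the number of distance evaluations is `len(points) * len(targets)`. This is why large inputs should be handled with care.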

---

### Saving the data

You can print the variable that holds the generated dataframe to inspect it. To save this dataframe as a .csv file, use the function `cp.save_csv(df, filepath)`.

In [None]:
cp.save_csv(data, 'test/csv/data.csv')
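Because the loaders keep only the id and coordinate columns, you may want to attach the results back to your original table afterwards. The sketch below shows one way to do that with a pandas join on `id`. The layout of the result table, in particular the `park_distance` column, is a hypothetical stand-in and may not match the actual output of `cp.find_closest_points`.

```python
import pandas as pd

# Original listings with an extra column that the loader would drop (made-up values).
original = pd.DataFrame({
    "id": [10, 11],
    "X": [-74.07, -74.08],
    "Y": [4.71, 4.61],
    "price": [120, 95],
})

# Hypothetical result table: the column name 'park_distance' is an assumption,
# not the documented output of cp.find_closest_points.
result = pd.DataFrame({
    "id": [10, 11],
    "park_distance": [0.012, 0.034],
})

# A left join on 'id' keeps every original row and appends the distance columns.
enriched = original.merge(result, on="id", how="left")
```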

---