# Main sandbox for R&D

## Initialization
<p> All imports go here </p>

In [1]:
import pandas as pd
import osmnx as ox
import folium
import geopandas as gpd
from shapely.geometry import Point, Polygon

# ox.config() is deprecated in recent OSMnx releases; set options via ox.settings
ox.settings.log_console = True
ox.settings.use_cache = True



## Raw Data Visualization
<p> First, let's take a peek at the data on the map and see what we are dealing with </p>

In [2]:
# Reading the given csv file
coord_df = pd.read_csv("../data/rides-data.csv", index_col=0).reset_index(drop=True)
coord_df

Unnamed: 0,id,origin_lat,origin_lng,destination_lat,destination_lng
0,8050084674,35.769844,51.366798,35.761581,51.403202
1,8055547548,35.698536,51.490612,35.658817,51.397949
2,8052893242,35.729149,51.554649,35.742706,51.565334
3,8067231026,35.685909,51.419937,35.745754,51.420338
4,8051092783,35.742085,51.438961,35.603508,51.400173
...,...,...,...,...,...
99995,8074092556,35.787411,51.503502,35.795544,51.498772
99996,8064670176,35.763897,51.348629,35.775799,51.347256
99997,8050188722,35.685860,51.414677,35.784729,51.353745
99998,8071806219,35.738853,51.467167,35.745167,51.398777


### Reshaping the data for viewing.
<p> At first glance we don't need to distinguish between the two coordinate types (origin, destination): people who travel to crowded places such as universities and malls will probably also use Snapp to get back. So overall we want to see the density of points around these places, whether rides end there or start from there </p>

In [3]:
# Reshaping the DataFrame
df_long = pd.melt(
    coord_df,
    id_vars=["id"],
    value_vars=["origin_lat", "origin_lng", "destination_lat", "destination_lng"],
    var_name="type",
    value_name="coordinate",
)

# Split 'type' into two columns 'location_type' and 'coord_type'
df_long["location_type"] = df_long["type"].apply(lambda x: x.split("_")[0])
df_long["coord_type"] = df_long["type"].apply(lambda x: x.split("_")[1])

# Pivot the table to get 'lat' and 'long' in separate columns
df_points = df_long.pivot_table(
    index=["id", "location_type"],
    columns="coord_type",
    values="coordinate",
    aggfunc="first",
).reset_index()

# Rename columns for clarity; the second column holds "origin" or "destination"
df_points.columns = ["id", "coord_type", "lat", "long"]

df_points.to_csv("../data/exploded_rides_data.csv")
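
The same long format can also be reached by stacking the origin and destination columns directly. The following is a minimal sketch on a made-up two-ride sample, not the notebook's own method; the sample values and the `location_type` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical two-ride sample with the same columns as rides-data.csv.
rides = pd.DataFrame(
    {
        "id": [1, 2],
        "origin_lat": [35.70, 35.71],
        "origin_lng": [51.40, 51.41],
        "destination_lat": [35.75, 35.76],
        "destination_lng": [51.45, 51.46],
    }
)

parts = []
for kind in ("origin", "destination"):
    # Take one end of every ride and give its columns neutral names.
    part = rides[["id", f"{kind}_lat", f"{kind}_lng"]].copy()
    part.columns = ["id", "lat", "long"]
    part["location_type"] = kind
    parts.append(part)

# One row per ride endpoint: two rides give four points.
points = pd.concat(parts, ignore_index=True)
print(points.shape)  # (4, 4)
```

Each ride contributes two rows, so the result has twice as many rows as the input.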

<p> So by looking at the map, especially the places mentioned in the hint part, we see that there is a high density of points there. We can treat the task of finding the entrances of these crowded places as an unsupervised learning problem:
specifically, we need a clustering mechanism for the points around these places, and the centroids of the clusters can probably give us good estimates of the entrances. </p>
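
To make the idea of "cluster centroids as entrance estimates" concrete, here is a minimal sketch using plain k-means (Lloyd's algorithm). The choice of k-means, the starting centroids, and the sample points (given in meters) are all assumptions for illustration; the text above does not commit to a specific clustering algorithm.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


# Hypothetical pickup/drop-off points (meters) gathered around two entrances.
points = [(0, 0), (2, 1), (1, 2), (100, 50), (102, 51), (101, 49)]
entrances = kmeans(points, centroids=[points[0], points[3]])
print(entrances)  # [(1.0, 1.0), (101.0, 50.0)]
```

In practice the number of entrances is not known in advance, so a density-based method may be a better fit than fixing the number of clusters up front.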

## Point Filtering

<p>
So now we want to try and find the entrances of these places mentioned in the hint part. Let's sit back and think a little about how we can do so. As mentioned we want to perform clustering on the data.
But before doing so, which points should we cluster?

The obvious restriction is that we should not use all the points available to us, because doing so leads to large errors. Malls and universities are also not the only crowded places: there are squares (Enghelab, Azadi), intersections (Valiasr) and more, which don't have entrances.

So first we need to filter the points. But how can we do so?

One solution that comes to mind is to find the center of the place, set a radius (for example 100 meters), keep only the ride points that fall inside that circle, and then perform clustering.

There is a principal flaw in this method: the circle we choose as the boundary may not contain the relevant points or, in the worst case, any points at all, depending on the structure of the place.
Let's illustrate this with some images.

<img src="../images/bad-boundary1.svg" width="400" height="400">
<br>
<small>Bad Boundary Example: The building is too long and the circular boundary does not contain the relevant points</small>
<br><br>
<img src="../images/bad-boundary2.svg" width="400" height="400"><br>
<small>Bad Boundary Example. The center is outside the building structure itself.</small>

We could increase the 100-meter radius, but this may pull in extra points that are not relevant to that specific place. For example, if two malls are close to each other and the radius is too large, the boundary of one mall may contain points from the other (several malls near San'at Square are close to each other, such as Setin, Milad Noor and Lidoma), and that leads to errors in clustering.

Also, some points that are inside the building get included by this approach (entrances are not located inside a place).
As you can see, this method will not work for our case.

So what other options do we have?

How about choosing the points that lie within a certain distance of the boundary of the place (a buffered polygon)?
This is a more intuitive and logical approach, as it is shape-agnostic and does not depend on the structure of the building. It also reduces the chance of overlap between places that are near each other (this of course requires a good distance value from the polygon boundary, a heuristic that needs to be chosen carefully).
Another merit of this approach is that it does not contain the points inside of the building.<br>
<b>NOTE:</b> Judging by the point visualization, there are not many points inside buildings, and these could probably be removed using outlier detection methods, but that is error-prone, so the buffered polygon can be considered the safer and more accurate approach.


So overall, this approach lets us select more relevant points to use in the clustering that finds entrances.


<img src="../images/iranmall-buffer.png" width="488" height="348">
<br>
<small>Buffered Iran Mall Region</small>

So to summarize we discussed two methods for point filtering:
<ul>
    <li>
        <b>Circular Boundary From The Center</b><br>
        <small>Disadvantages</small>
        <ul>
            <li>May not contain all points</li>
            <li>May not contain any points at all</li>
            <li>May overlap with points of other places</li>
            <li>Contains points inside of a place</li>
        </ul>
        <small>Advantages</small>
        <ul>
            <li>Easy to implement and understand</li>
        </ul>
    </li>
    <li>
        <b>Buffered Polygon</b><br>
        <small>Disadvantages</small>
        <ul>
            <li>Harder to implement</li>
        </ul>
        <small>Advantages</small>
        <ul>
            <li>More intuitive</li>
            <li>Contains relevant points</li>
            <li>Excludes points inside of a place</li>
        </ul>
    </li>
</ul>
</p>
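
The difference between the two filters can be illustrated on a toy example. The sketch below uses a hypothetical 400 m by 20 m building and four made-up ride points in a projected (meter-based) coordinate system; none of these values come from the rides data. It also assumes that the buffered region is taken as a ring, i.e. the buffered polygon minus the footprint itself, which is one way to leave out points inside the building (a plain `buffer(50)` on its own still covers the interior).

```python
from shapely.geometry import Point, Polygon

# Hypothetical long building in a projected CRS (units are meters): 400 m x 20 m.
building = Polygon([(0, 0), (400, 0), (400, 20), (0, 20)])

# Hypothetical ride points: two near the short ends, one inside, one far away.
points = {
    "west_door": Point(-10, 10),
    "east_door": Point(410, 10),
    "inside": Point(200, 10),
    "far_away": Point(200, 300),
}

# Method 1: circle of radius 100 m around the centroid of the building.
circle = building.centroid.buffer(100)
in_circle = sorted(name for name, p in points.items() if p.within(circle))

# Method 2: 50 m buffer around the outline, minus the footprint itself (a ring).
ring = building.buffer(50).difference(building)
in_ring = sorted(name for name, p in points.items() if p.within(ring))

print(in_circle)  # ['inside']
print(in_ring)    # ['east_door', 'west_door']
```

With these made-up points, the circle only catches the point inside the building, while the ring catches both doors and drops the interior point.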

<p> Now let's dive into coding </p>

### Finding polygons of places

In [None]:
# Use queries to find polygons
place_query = 'South Terminal, Tehran, Iran'

# Fetch the geometries data
results = ox.features_from_place(place_query, tags={"place": True})
results

<p>
These sources were used for finding the OSM IDs:
<ul>
    <li><a href="https://www.openstreetmap.org">OpenStreetMap</a></li>
    <li><a href="https://nominatim.openstreetmap.org">Nominatim API</a></li>
</ul>
</p>

In [10]:
# OSM IDs obtained by searching through Google Maps and OSM results.
places_meta = [
    {"place": "Opal", "osm_query": "W498492266"},
    {"place": "Koroush", "osm_query": "W320902874"},
    {"place": "Iran Mall", "osm_query": "R8129683"},
    {"place": "Paladium", "osm_query": "W678453222"},
    {"place": "Mehr Abad", "osm_query": "W175770954"},
    {"place": "West Terminal", "osm_query": "W182016096"},
    {"place": "Imam Khomeini Hospital", "osm_query": "W191445129"},
    {"place": "Shariati Hospital", "osm_query": "W438148006"},
    {"place": "Technical Faculties of Tehran University", "osm_query": "W385628505"},
]
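
When `ox.geocode_to_gdf` is called with `by_osmid=True`, each ID has to be prefixed with its element type: `N` (node), `W` (way) or `R` (relation). A small sanity check along these lines can catch a mistyped entry before any network request is made. The helper name `is_valid_osm_id` and the broken sample entry are hypothetical, added only for illustration.

```python
import re

# An OSM ID for by_osmid lookups: element type prefix (N, W or R) followed by digits.
OSM_ID_PATTERN = re.compile(r"^[NWR]\d+$")


def is_valid_osm_id(query: str) -> bool:
    """Return True if the query looks like a type-prefixed OSM ID."""
    return bool(OSM_ID_PATTERN.match(query))


# Two entries from the list above plus one deliberately broken, made-up entry.
sample = [
    {"place": "Opal", "osm_query": "W498492266"},
    {"place": "Iran Mall", "osm_query": "R8129683"},
    {"place": "Broken entry", "osm_query": "498492266"},
]

invalid = [entry["place"] for entry in sample if not is_valid_osm_id(entry["osm_query"])]
print(invalid)  # ['Broken entry']
```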

### Visualizing buffered polygons

In [59]:
# Initialize the map centered on Azadi Square
m = folium.Map(location=[35.699704, 51.337433], zoom_start=15)
entries = []

for place_meta in places_meta:
    # Fetch geometries from OSM
    polygon = ox.geocode_to_gdf(place_meta["osm_query"], by_osmid=True)
    # Project to UTM zone appropriate for Tehran (for accurate distance measurements)
    polygon = polygon.to_crs(epsg=32639)
    # Buffer the polygon by 50 meters
    buffered_polygon = polygon["geometry"].buffer(50)

    # Project back to WGS84 for mapping
    polygon = polygon.to_crs(epsg=4326)
    buffered_polygon = buffered_polygon.to_crs(epsg=4326)

    # Adding the polygon to the map
    folium.GeoJson(polygon, name="Polygons").add_to(m)
    # Add the buffered polygon to the map
    folium.GeoJson(
        buffered_polygon,
        name="Buffers",
        style_function=lambda x: {
            "color": "red",
            "fillColor": "red",
            "fillOpacity": 0.1,
        },
    ).add_to(m)

    # Append data to the GeoDataFrame
    temp_gdf = gpd.GeoDataFrame(
        {
            "name": [place_meta["place"]],
            "original_polygon": [polygon.geometry.iloc[0]],
            "buffered_polygon": [buffered_polygon.geometry.iloc[0]],
        },
        geometry="original_polygon",
        crs="EPSG:4326",
    )
    entries.append(temp_gdf)
# The geoDataFrame containing original and buffered polygons for each place.
gdf = pd.concat(entries, ignore_index=True)

# Visualizing buffered polygons
folium.LayerControl().add_to(m)
m.save("/home/reza/Desktop/snapp-task/buffered_polygons.html")
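
The reason for projecting to UTM before buffering is that a buffer distance is expressed in the units of the coordinate system. In EPSG:4326 those units are degrees, and a degree of longitude covers less ground than a degree of latitude away from the equator. The following back-of-the-envelope sketch shows the effect at roughly Tehran's latitude; the constant of 111,320 m per degree of latitude is a common spherical approximation, and the helper name is hypothetical.

```python
from math import cos, radians

# Rough spherical approximation (an assumption): meters per degree of latitude.
METERS_PER_DEG_LAT = 111_320


def meters_to_degrees(meters, latitude_deg):
    """Return (delta_lat, delta_lng) in degrees spanning `meters` at this latitude."""
    delta_lat = meters / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    delta_lng = meters / (METERS_PER_DEG_LAT * cos(radians(latitude_deg)))
    return delta_lat, delta_lng


d_lat, d_lng = meters_to_degrees(50, 35.7)  # 35.7 is roughly Tehran's latitude
print(f"50 m is about {d_lat:.6f} deg of latitude and {d_lng:.6f} deg of longitude")
```

Because the two values differ, a single buffer distance in degrees would stretch the buffer unevenly, which is why the 50 m buffer is applied in a meter-based CRS.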

In [60]:
gdf

Unnamed: 0,name,original_polygon,buffered_polygon
0,Opal,"POLYGON ((51.35069 35.77713, 51.35075 35.77685...",POLYGON ((51.35036078855301 35.777490128459654...
1,Koroush,"POLYGON ((51.31350 35.73830, 51.31369 35.73827...","POLYGON ((51.3129558755989 35.73837460289069, ..."
2,Iran Mall,"POLYGON ((51.18893 35.75469, 51.18955 35.75195...",POLYGON ((51.188825132572084 35.75512866221741...
3,Paladium,"POLYGON ((51.41324 35.79645, 51.41335 35.79616...","POLYGON ((51.41275653107516 35.79666532156729,..."
4,Mehr Abad,"POLYGON ((51.26210 35.68252, 51.26216 35.68238...","POLYGON ((51.26155839976606 35.68262503484549,..."
5,West Terminal,"POLYGON ((51.33123 35.70714, 51.33157 35.70185...","POLYGON ((51.33085766113199 35.70747283366887,..."
6,Imam Khomeini Hospital,"POLYGON ((51.37803 35.70742, 51.37980 35.70737...","POLYGON ((51.37748040440223 35.70742176466901,..."
7,Shariati Hospital,"POLYGON ((51.38520 35.71957, 51.38697 35.71986...",POLYGON ((51.38465163987453 35.719600464003385...
8,Technical Faculties of Tehran University,"POLYGON ((51.38450 35.72639, 51.38453 35.72506...",POLYGON ((51.38445160385868 35.726836320107886...


### Assigning Points

In [61]:
# Converting points to a GeoDataFrame (x = longitude, y = latitude)
points_geodf = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(df_points["long"], df_points["lat"]),
    crs="EPSG:4326",
)
points_geodf

In [67]:
# Finding which place each point belongs to, based on the buffered polygons
points_geodf["belongs_to"] = None
for idx, row in gdf.iterrows():
    within_mask = points_geodf.within(row["buffered_polygon"])
    points_geodf.loc[
        within_mask & points_geodf["belongs_to"].isnull(), "belongs_to"
    ] = row["name"]
points_geodf = (
    points_geodf[["geometry", "belongs_to"]]
    .dropna(subset=["belongs_to"])
    .reset_index(drop=True)
)
points_geodf.to_csv("../data/points_geodf.csv", index=False)
points_geodf

Unnamed: 0,geometry,belongs_to
0,POINT (51.19345 35.75567),Iran Mall
1,POINT (51.35123 35.77675),Opal
2,POINT (51.33088 35.68904),Mehr Abad
3,POINT (51.35137 35.77664),Opal
4,POINT (51.32251 35.69165),Mehr Abad
...,...,...
2983,POINT (51.35124 35.77662),Opal
2984,POINT (51.31413 35.73798),Koroush
2985,POINT (51.32312 35.69128),Mehr Abad
2986,POINT (51.32174 35.69170),Mehr Abad
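
One property of the assignment loop is worth spelling out: a point is only labeled while its `belongs_to` value is still empty, so if two buffered polygons overlap, the place that comes first in the list wins. The sketch below reproduces that rule on two made-up circular regions in a meter-based coordinate system; the region names, shapes and the `assign` helper are hypothetical.

```python
from shapely.geometry import Point

# Hypothetical buffered regions (meters) that overlap between x=80 and x=120.
regions = [
    ("Mall A", Point(0, 0).buffer(120)),
    ("Mall B", Point(200, 0).buffer(120)),
]


def assign(point, regions):
    """Return the name of the first region containing the point, or None."""
    for name, region in regions:
        if point.within(region):
            return name
    return None


print(assign(Point(100, 0), regions))  # Mall A (the point is also inside Mall B's region)
print(assign(Point(250, 0), regions))  # Mall B
print(assign(Point(500, 0), regions))  # None
```

If nearby places ever produce overlapping buffers, the order of the list therefore decides which place such shared points are attributed to.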


### Ride Points Visualization
<p> Based on buffered polygon </p>

In [64]:
# Visualizing the points on map
for idx, point in points_geodf.iterrows():
    folium.Marker(
        location=[point.geometry.y, point.geometry.x],
        popup=str(point["belongs_to"]),
        icon=folium.Icon(color="green", icon="info-sign"),
    ).add_to(m)
m.save("/home/reza/Desktop/snapp-task/points_inside_buffer.html")


## ML
<p> This is where the magic happens! </p>