<a href="https://colab.research.google.com/github/kelsdoerksen/giga-connectivity/blob/main/GeoCLIP_Extraction_For_Connectivity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## C02 - Use GeoCLIP embeddings

A simple example of how to obtain pretrained GeoCLIP location embeddings. Read the paper here: [https://arxiv.org/abs/2309.16020](https://arxiv.org/abs/2309.16020). First, install the geoclip package (see [https://github.com/VicenteVivan/geo-clip](https://github.com/VicenteVivan/geo-clip)).

In [1]:
!pip install geoclip

Collecting geoclip
  Downloading geoclip-1.2.0-py3-none-any.whl (40.3 MB)

In [2]:
from geoclip import LocationEncoder
import torch
import torch.nn as nn
import pandas as pd
import ast

Load the pretrained model directly.

In [3]:
model = LocationEncoder()

Obtain GeoCLIP location embeddings.

In [4]:
# Get [lon, lat] of schools as a float64 tensor to extract embeddings for

def get_coords(df):
  """
  Return the school locations as a 2D float64 tensor of shape (N, 2),
  with columns in (lon, lat) order, for GeoCLIP embedding extraction.
  """
  # Select both columns at once instead of looping over df.loc[i],
  # which only works when the index is the default 0..N-1 range.
  return torch.tensor(df[['lon', 'lat']].to_numpy(dtype='float64'))

In [5]:
# Processing data for locations for the embeddings to be extracted from
RWA_df = pd.read_csv('RWA_id_info.csv')

In [6]:
# Get coordinates for the area of interest (AOI)
coords = get_coords(RWA_df)

In [7]:
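# Inference only: switch to eval mode and disable gradient tracking
model.eval()
with torch.no_grad():
  # coords is (lon, lat); flip(1) reorders each row to (lat, lon), and
  # float() casts float64 to float32 before encoding
  x = model(coords.flip(1).float()).detach().cpu()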
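
The `flip(1)` call in the cell above reverses the column order, so each (lon, lat) row becomes (lat, lon), which is the order shown in the GeoCLIP README examples. A swapped pair can still look like a valid coordinate, so a quick check before encoding may be worthwhile. Below is a minimal plain-Python sketch of the swap plus a range check. `to_lat_lon` is a hypothetical helper name, not part of geoclip, and the sample coordinates are illustrative values, not rows from `RWA_id_info.csv`.

```python
def to_lat_lon(lon_lat_pairs):
  """Swap (lon, lat) pairs to (lat, lon) and check coordinate ranges.

  Hypothetical helper for illustration; not part of the geoclip package.
  """
  lat_lon = []
  for lon, lat in lon_lat_pairs:
    if not -180.0 <= lon <= 180.0:
      raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
      raise ValueError(f"latitude out of range: {lat}")
    lat_lon.append((lat, lon))
  return lat_lon


# Illustrative (lon, lat) pairs, roughly within Rwanda (assumed values).
sample = [(30.06, -1.94), (29.74, -2.60)]
swapped = to_lat_lon(sample)
print(swapped)
```

The range check only catches values that cannot be a longitude or latitude at all. For a country near the equator, both orderings fall inside the valid ranges, so comparing a few rows against a map remains the more reliable check.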

In [8]:
identifying_info_df = RWA_df[['giga_id_school', 'connectivity', 'lat', 'lon', 'split', 'fid']]
emb_df = pd.DataFrame(x.numpy())

In [9]:
emb_df_labelled = pd.concat([identifying_info_df, emb_df], axis=1)
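
`pd.concat(..., axis=1)` aligns rows by index label, not by position. The cell above therefore relies on `RWA_df` keeping the default 0..N-1 index from `read_csv`, which matches the fresh index of `emb_df`. The sketch below shows what happens when that assumption does not hold, using made-up toy frames (the column names `id` and `e0` are illustrative only).

```python
import pandas as pd

# Toy metadata whose index is NOT the default 0..N-1 (e.g. after filtering rows).
info = pd.DataFrame({'id': ['a', 'b']}, index=[5, 9])
# Toy embeddings with a fresh default index, like pd.DataFrame(x.numpy()).
emb = pd.DataFrame({'e0': [0.1, 0.2]})

# The index labels do not overlap, so concat returns the union of labels
# and fills the gaps with NaN.
misaligned = pd.concat([info, emb], axis=1)

# Resetting the index first restores row-by-row (positional) alignment.
aligned = pd.concat([info.reset_index(drop=True), emb], axis=1)
print(misaligned.shape, aligned.shape)
```

If `RWA_df` is ever filtered or sorted before this step, calling `reset_index(drop=True)` on it first keeps each embedding attached to the right school.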

In [10]:
# Split into Train/Test/Val
emb_train = emb_df_labelled[emb_df_labelled['split'] == 'Train']
emb_test = emb_df_labelled[emb_df_labelled['split'] == 'Test']
emb_val = emb_df_labelled[emb_df_labelled['split'] == 'Val']

In [11]:
# Export each split to CSV
emb_train.to_csv('RWA_GeoCLIP_embeddings_TrainingData.csv')
emb_test.to_csv('RWA_GeoCLIP_embeddings_TestingData.csv')
emb_val.to_csv('RWA_GeoCLIP_embeddings_ValData.csv')
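
By default `to_csv` writes the DataFrame index as an unnamed first column, so each exported file carries the original row labels of its split. The sketch below shows one way to read such a file back, using an in-memory buffer and toy data rather than the real exports.

```python
import io
import pandas as pd

# Toy frame with a non-contiguous index and an integer column name,
# mimicking a filtered split with embedding columns 0, 1, ...
toy = pd.DataFrame({'split': ['Train', 'Test'], 0: [0.1, 0.2]}, index=[3, 7])

buf = io.StringIO()
toy.to_csv(buf)  # the index is written as an unnamed first column
buf.seek(0)

# index_col=0 restores the saved index instead of creating an extra column.
restored = pd.read_csv(buf, index_col=0)
print(restored.index.tolist(), restored.columns.tolist())
```

Note that integer column names such as `0` come back as the string `'0'` after the round trip, which matters when selecting embedding columns by name later.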