<a href="https://colab.research.google.com/github/kelsdoerksen/giga-connectivity/blob/main/CSP_Extraction_For_Connectivity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## C01 - Use CSP embeddings

A simple example of how to obtain pretrained CSP embeddings. Read the paper here: [https://arxiv.org/abs/2305.01118](https://arxiv.org/abs/2305.01118). Note that this notebook needs to be run with a GPU enabled. To do this, go to "Runtime -> Change runtime type".

In [1]:
!rm -r sample_data .config # Empty current directory
!git clone https://github.com/gengchenmai/csp.git . # Clone CSP repository

Cloning into '.'...
remote: Enumerating objects: 86, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 86 (delta 26), reused 75 (delta 15), pack-reused 0
Receiving objects: 100% (86/86), 1.58 MiB | 11.33 MiB/s, done.
Resolving deltas: 100% (26/26), done.


Import required packages.

In [2]:
import numpy as np
import pandas as pd
import torch

import sys
sys.path.append('./main')

from main.utils import *
from main.models import *

Write a helper function to load CSP models from a checkpoint.

In [3]:
def get_csp(path):
    pretrained_csp = torch.load(path, map_location=torch.device('cpu'))

    params = pretrained_csp['params']
    loc_enc = get_model(
        train_locs=None,
        params=params,
        spa_enc_type=params['spa_enc_type'],
        num_inputs=params['num_loc_feats'],
        num_classes=params['num_classes'],
        num_filts=params['num_filts'],
        num_users=params['num_users'],
        device=params['device'])

    model = LocationImageEncoder(
        loc_enc=loc_enc,
        train_loss=params["train_loss"],
        unsuper_loss=params["unsuper_loss"],
        cnn_feat_dim=params["cnn_feat_dim"],
        spa_enc_type=params["spa_enc_type"]).to(params['device'])

    model.load_state_dict(pretrained_csp['state_dict'])

    return model
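
The helper above reads two top-level entries from the checkpoint, `'params'` and `'state_dict'`. As a minimal sketch, a check like the following could be run on the loaded dictionary before building the model. The name `check_checkpoint` and the stand-in dictionary are hypothetical and not part of the CSP repository.

```python
def check_checkpoint(checkpoint, required_keys=("params", "state_dict")):
    """Raise KeyError if the checkpoint lacks a key that get_csp reads."""
    missing = [key for key in required_keys if key not in checkpoint]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    return checkpoint


# Stand-in dictionary with the same top-level layout (illustrative values only).
dummy_checkpoint = {"params": {}, "state_dict": {}}
check_checkpoint(dummy_checkpoint)
```

In `get_csp`, this would sit directly after the `torch.load` call, so a wrong or partially downloaded file fails with a clear message.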

Download pretrained models. For details see here: [https://gengchenmai.github.io/csp-website/](https://gengchenmai.github.io/csp-website/)

In [4]:
!wget -O model_dir.zip 'https://www.dropbox.com/s/qxr644rj1qxekn2/model_dir.zip?dl=1'

--2024-07-22 21:28:58--  https://www.dropbox.com/s/qxr644rj1qxekn2/model_dir.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.81.18, 2620:100:6031:18::a27d:5112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.81.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/rdig1ezmywm9avc8qubi6/model_dir.zip?rlkey=p8k16a5ifi69e08rnvu8rvt86&dl=1 [following]
--2024-07-22 21:28:58--  https://www.dropbox.com/scl/fi/rdig1ezmywm9avc8qubi6/model_dir.zip?rlkey=p8k16a5ifi69e08rnvu8rvt86&dl=1
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc0c9855630aba94f9c2cddfef42.dl.dropboxusercontent.com/cd/0/inline/CXOpDEXIcCenR2SR-ke8SO33rtanhYTiR5R1AQGsmJxOdzfV4KBFpT-GNONUTxUXL_f_bXQFHF-kxNN0MblM90y613cPiSsuW8zPDnYOaufvUqFhbI0rmmKAonEFz0F3Gypsc2zxlpT3u2vlrQDfRQ8p/file?dl=1# [following]
--2024-07-22 21:28:59--  https://uc0c9855630aba94f9c2cddfef42.dl.dropboxuse

In [5]:
!unzip model_dir.zip

Archive:  model_dir.zip
   creating: model_dir/
  inflating: __MACOSX/._model_dir    
  inflating: model_dir/.DS_Store     
  inflating: __MACOSX/model_dir/._.DS_Store  
   creating: model_dir/model_inat_2018/
  inflating: __MACOSX/model_dir/._model_inat_2018  
   creating: model_dir/model_fmow/
  inflating: __MACOSX/model_dir/._model_fmow  
  inflating: model_dir/model_inat_2018/.DS_Store  
  inflating: __MACOSX/model_dir/model_inat_2018/._.DS_Store  
  inflating: model_dir/model_inat_2018/model_inat_2018_gridcell_0.0010_32_0.1000000_1_512_leakyrelu_contsoftmax_ratio0.050_0.000500_1.000_1_1.000_TMP20.0000_1.0000_1.0000.pth.tar  
  inflating: __MACOSX/model_dir/model_inat_2018/._model_inat_2018_gridcell_0.0010_32_0.1000000_1_512_leakyrelu_contsoftmax_ratio0.050_0.000500_1.000_1_1.000_TMP20.0000_1.0000_1.0000.pth.tar  
  inflating: model_dir/model_inat_2018/model_inat_2018_gridcell_0.0010_32_0.1000000_1_512_leakyrelu_UNSUPER-contsoftmax_0.000500_1.000_1_1.000_TMP20.0000_1.0000_1.0000.pt
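
The listing above shows that the archive also contains macOS metadata (`__MACOSX/` entries and `.DS_Store` files). As an optional alternative to `!unzip`, the sketch below uses Python's standard `zipfile` module to extract only the remaining entries. The function name `extract_without_macos_metadata` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_without_macos_metadata(zip_path, dest_dir="."):
    """Extract an archive, skipping __MACOSX entries and .DS_Store files."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            entry = Path(name)
            if "__MACOSX" in entry.parts or entry.name == ".DS_Store":
                continue  # macOS metadata, not needed for loading the models
            archive.extract(name, dest_dir)
            extracted.append(name)
    return extracted


# Example call (assumes model_dir.zip was downloaded by the previous cell):
# extract_without_macos_metadata("model_dir.zip")
```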

Load the CSP model. We use the CSP model whose grid location encoder was pre-trained on the unlabelled fMoW training dataset.

In [6]:
path = '/content/model_dir/model_fmow/model_fmow_gridcell_0.0010_32_0.1000000_1_512_gelu_UNSUPER-contsoftmax_0.000050_1.000_1_0.100_TMP1.0000_1.0000_1.0000.pth.tar'
model = get_csp(path)

  nn.init.xavier_uniform(self.linear.weight)
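
The checkpoint path above is hard-coded to Colab's `/content` directory and a long file name. A minimal sketch for listing the available checkpoints instead, assuming the archive was extracted to a local `model_dir` folder and that checkpoint files end in `.pth.tar`; the helper name `find_checkpoints` is hypothetical.

```python
from pathlib import Path


def find_checkpoints(model_dir="model_dir", pattern="*.pth.tar"):
    """Return sorted checkpoint paths under model_dir, skipping macOS '._' files."""
    return sorted(
        str(path)
        for path in Path(model_dir).rglob(pattern)
        if not path.name.startswith("._")
    )


# Example: print whatever checkpoints are present (empty list if none are).
for checkpoint_path in find_checkpoints():
    print(checkpoint_path)
```

Any of the printed paths can then be passed to `get_csp`.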


In [7]:
# Get [lon, lat] of schools as a float64 tensor to extract embeddings for

def get_coords(df):
  """
  Return the coordinates of the school locations as a
  2D float64 tensor of shape (n_schools, 2), in the order
  lon, lat, to extract CSP embeddings for.
  """

  # Select both columns at once; this does not depend on the DataFrame index
  lon_lat = df[['lon', 'lat']].to_numpy(dtype='float64')
  locations = torch.tensor(lon_lat)

  return locations
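
Because the coordinates are passed in the order lon, lat, a quick range check before extracting embeddings can catch malformed rows. The sketch below is a hypothetical helper (`validate_lon_lat` is not part of the notebook or the CSP code). It flags out-of-range values, and it only detects swapped columns when the value in the longitude position is a valid longitude but not a valid latitude.

```python
def validate_lon_lat(pairs):
    """Check that every (lon, lat) pair lies within valid geographic bounds."""
    for i, (lon, lat) in enumerate(pairs):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"row {i}: longitude {lon} outside [-180, 180]")
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"row {i}: latitude {lat} outside [-90, 90]")
    return True


# Illustrative pair in (lon, lat) order, roughly the location of Kigali.
validate_lon_lat([(30.06, -1.94)])
```

With the tensor returned by `get_coords`, the call would be `validate_lon_lat(coords.tolist())`.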

In [8]:
# Load the locations to extract embeddings for
RWA_df = pd.read_csv('RWA_id_info.csv')

In [9]:
# Get coordinates for the area of interest (AOI)
coords = get_coords(RWA_df)

Use CSP model to obtain location embeddings.

In [10]:
model.eval()
with torch.no_grad():
    x = model.loc_enc(convert_loc_to_tensor(coords.numpy()),return_feats=True).detach().cpu()

In [11]:
identifying_info_df = RWA_df[['giga_id_school', 'connectivity', 'lat', 'lon', 'split', 'fid']]
emb_df = pd.DataFrame(x.numpy())
emb_df_labelled = pd.concat([identifying_info_df, emb_df], axis=1)

In [12]:
# Split into Train/Test/Val
emb_train = emb_df_labelled[emb_df_labelled['split'] =='Train']
emb_test = emb_df_labelled[emb_df_labelled['split'] =='Test']
emb_val = emb_df_labelled[emb_df_labelled['split'] =='Val']
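
The three filters above keep only rows whose `split` value is exactly `'Train'`, `'Test'`, or `'Val'`; a row with any other label (for example a lowercase `'train'`) would silently end up in none of the subsets. A minimal sketch of a check for this, using a hypothetical helper `summarize_splits`:

```python
from collections import Counter


def summarize_splits(labels, expected=("Train", "Test", "Val")):
    """Count rows per split and fail on labels the filters would drop."""
    counts = Counter(labels)
    unexpected = sorted(set(counts) - set(expected), key=str)
    if unexpected:
        raise ValueError(f"unexpected split labels: {unexpected}")
    return {name: counts.get(name, 0) for name in expected}


# Illustrative labels; in the notebook, pass emb_df_labelled['split'] instead.
print(summarize_splits(["Train", "Train", "Val", "Test"]))
```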

In [13]:
# Export to CSV
emb_train.to_csv('RWA_CSP_embeddings_TrainingData.csv')
emb_test.to_csv('RWA_CSP_embeddings_TestingData.csv')
emb_val.to_csv('RWA_CSP_embeddings_ValData.csv')
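
Note that `to_csv` writes the DataFrame index as an unnamed first column by default, and the filtered subsets keep their original row labels. The sketch below uses a small made-up table and an in-memory buffer (both assumptions, in place of the real files) to show how to read such a file back without a stray `Unnamed: 0` column.

```python
import io

import pandas as pd

# Small made-up table standing in for emb_df_labelled.
df = pd.DataFrame({"giga_id_school": ["a", "b", "c"],
                   "split": ["Train", "Test", "Train"]})
train = df[df["split"] == "Train"]  # keeps the original index labels 0 and 2

buffer = io.StringIO()
train.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # the index comes back as a data column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # the index is restored instead

print(list(naive.columns))
print(list(restored.columns))
```

Alternatively, passing `index=False` to `to_csv` omits the index column when the original row labels are not needed.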