<a href="https://colab.research.google.com/github/kelsdoerksen/giga-connectivity/blob/main/CSP_Extraction_For_Connectivity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## C01 - Use CSP embeddings

Simple example of how to obtain pretrained CSP embeddings. Read the paper here: [https://arxiv.org/abs/2305.01118](https://arxiv.org/abs/2305.01118). Note that this notebook needs to be run with a GPU enabled. To do this, go to "Runtime -> Change runtime type".

In [None]:
!rm -r sample_data .config # Empty current directory
!git clone https://github.com/gengchenmai/csp.git . # Clone CSP repository

Import required packages.

In [None]:
import numpy as np
import pandas as pd
import torch

import sys
sys.path.append('./main')

from main.utils import *
from main.models import *

Write a helper function to load CSP models from a checkpoint.

In [None]:
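def get_csp(path):
    """Load a pretrained CSP model from the checkpoint file at `path`."""
    # Load on CPU first so the checkpoint opens regardless of the device it was saved from
    pretrained_csp = torch.load(path, map_location=torch.device('cpu'))

    # The checkpoint stores the training hyperparameters alongside the weights
    params = pretrained_csp['params']

    # Rebuild the location encoder with the saved hyperparameters
    loc_enc = get_model(
        train_locs=None,
        params=params,
        spa_enc_type=params['spa_enc_type'],
        num_inputs=params['num_loc_feats'],
        num_classes=params['num_classes'],
        num_filts=params['num_filts'],
        num_users=params['num_users'],
        device=params['device'])

    # Wrap the location encoder in the full model and move it to the saved device
    model = LocationImageEncoder(
        loc_enc=loc_enc,
        train_loss=params["train_loss"],
        unsuper_loss=params["unsuper_loss"],
        cnn_feat_dim=params["cnn_feat_dim"],
        spa_enc_type=params["spa_enc_type"]).to(params['device'])

    # Restore the pretrained weights
    model.load_state_dict(pretrained_csp['state_dict'])

    return model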

Download pretrained models. For details see here: [https://gengchenmai.github.io/csp-website/](https://gengchenmai.github.io/csp-website/)

In [None]:
!wget -O model_dir.zip 'https://www.dropbox.com/s/qxr644rj1qxekn2/model_dir.zip?dl=1'

In [None]:
!unzip model_dir.zip
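
The checkpoint path passed to `get_csp` depends on how the archive unpacks. As a minimal sketch, assuming the download is a regular zip file, the standard-library `zipfile` module can list the checkpoint files it contains. `list_checkpoints` is a hypothetical helper, and the `.pth.tar` suffix is taken from the checkpoint path used in this notebook.

```python
import zipfile


def list_checkpoints(zip_path, suffix=".pth.tar"):
    """Return the sorted names of archive members ending in `suffix`."""
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() returns every member path stored in the archive
        return sorted(name for name in archive.namelist() if name.endswith(suffix))
```

For example, `list_checkpoints('model_dir.zip')` would show the available checkpoints without unpacking them.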

Load the CSP model. This uses the CSP model with a grid location encoder pre-trained on the unlabelled fMoW training dataset.

In [None]:
path = '/content/model_dir/model_fmow/model_fmow_gridcell_0.0010_32_0.1000000_1_512_gelu_UNSUPER-contsoftmax_0.000050_1.000_1_0.100_TMP1.0000_1.0000_1.0000.pth.tar'
model = get_csp(path)

In [None]:
# Get [lon, lat] of schools as a float64 tensor to extract embeddings for

def get_coords(df):
  """
  Return the school locations in `df` as a 2D float64 tensor
  of shape (n_schools, 2), in (lon, lat) order, to extract
  CSP embeddings for.
  """

  # Select the columns in (lon, lat) order and convert them in one step
  locations = torch.tensor(df[['lon', 'lat']].to_numpy(dtype='float64'))

  return locations

In [None]:
# Load the locations to extract embeddings for
aoi = 'BWA'
split = 'Testing'
aoi_df = pd.read_csv('{}Data_uncorrelated_fixed.csv'.format(split))

In [None]:
# Get coordinates for the area of interest (aoi)
coords = get_coords(aoi_df)
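
Because the coordinates are passed in (lon, lat) order, a swapped or malformed column could go unnoticed. A minimal sketch of a sanity check, assuming longitudes in degrees within [-180, 180] and latitudes within [-90, 90]. `find_invalid_coords` is a hypothetical helper and not part of the CSP repository.

```python
import math


def find_invalid_coords(lons, lats):
    """Return positions whose (lon, lat) pair is missing or out of range."""
    bad = []
    for i, (lon, lat) in enumerate(zip(lons, lats)):
        lon, lat = float(lon), float(lat)
        if math.isnan(lon) or math.isnan(lat):
            # Missing values would otherwise propagate into the embeddings
            bad.append(i)
        elif not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            # Out-of-range values often indicate swapped lon/lat columns
            bad.append(i)
    return bad
```

Calling `find_invalid_coords(aoi_df['lon'], aoi_df['lat'])` would return an empty list when every row is in range.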

Use CSP model to obtain location embeddings.

In [None]:
model.eval()
with torch.no_grad():
    emb = model.loc_enc(convert_loc_to_tensor(coords.numpy()), return_feats=True).detach().cpu()
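
For a large table of locations, encoding every coordinate in a single call may exhaust memory. One option is to encode in batches. The index arithmetic can be sketched as follows, where `batch_slices` is a hypothetical helper and the batch size is an arbitrary choice.

```python
def batch_slices(n_items, batch_size):
    """Yield consecutive slices that together cover range(n_items)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        # The final slice is shortened so it never runs past n_items
        yield slice(start, min(start + batch_size, n_items))
```

Each slice `s` can then index the coordinate tensor (`coords[s]`), and the per-batch embeddings can be joined with `torch.cat`.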

In [None]:
emb.shape

In [None]:
identifying_info_df = aoi_df[['giga_id_school', 'connectivity', 'lat', 'lon']]
emb_df = pd.DataFrame(emb.numpy())

emb_df_labelled = pd.concat([identifying_info_df, emb_df], axis=1)
emb_df_labelled['data_split'] = split

In [None]:
# Export to CSV
emb_df_labelled.to_csv('{}_CSPfMoW_embeddings_{}.csv'.format(aoi, split))