<a href="https://colab.research.google.com/github/lawsonk16/Remote-Sensing-Datasets/blob/main/FAIR1M/FAIR1M_GSD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Estimating GSDs in FAIR1M

FAIR1M does not provide ground sample distance (GSD) values for its imagery. To use GSD-based techniques such as GSD normalization, or to run similar experiments, we have to estimate the GSD from information we already have.

In this notebook, I look at the pixel-size ranges of planes across the dataset, then develop a method for roughly estimating GSD from the known physical sizes of those plane models and their mean and mode sizes in pixels.
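
The idea behind the estimate can be sketched in a few lines: if an object of known real-world size spans a known number of pixels, the GSD is roughly the ratio of the two. The function name `estimate_gsd` and the example numbers are illustrative assumptions, not part of the notebook.

```python
def estimate_gsd(object_length_m: float, object_length_px: float) -> float:
    """Return an approximate GSD in meters per pixel."""
    if object_length_px <= 0:
        raise ValueError("pixel length must be positive")
    return object_length_m / object_length_px

# Illustrative numbers only: a 40 m aircraft that spans 80 pixels
example_gsd = estimate_gsd(40.0, 80.0)
print(example_gsd)  # 0.5
```

The guard against non-positive pixel lengths matters in practice, since degenerate annotations with a zero-length side would otherwise cause a division by zero.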

In [3]:
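import os
import json
import numpy as np  # used for the summary statistics below; imported explicitly rather than via the wildcard import
import pandas as pd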

import sys
notebook_folders = ['/content/drive/MyDrive/Colab Notebooks/scripts/']

for folder in notebook_folders:
    sys.path.append(folder)

from coco_utils.coco_help import *

## Plane Counts
First, we get some basic data about how numerous each plane category is within this dataset, sorted from most to least common.

In [4]:
ann_fp = '/content/drive/MyDrive/Colab Notebooks/Clean Datasets/FAIR1M/fair1m_coco.json'

In [30]:
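cat_counts = get_category_counts(ann_fp)
count_df = pd.DataFrame.from_dict(cat_counts, orient='index').reset_index(drop=False)
count_df.columns = ['category', 'count']

# plane categories are the Boeing models plus every name starting with a capital 'A'
# (A220, A321, A330, A350, ARJ21); this would also match any other category starting with 'A'
is_plane = count_df['category'].str.contains('Boeing') | count_df['category'].str.startswith('A')
plane_df = count_df[is_plane].sort_values('count', ascending=False)
plane_df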

| | category | count |
|---|---|---|
| 12 | A220 | 6057 |
| 9 | Boeing737 | 3949 |
| 11 | A321 | 2505 |
| 10 | Boeing747 | 1673 |
| 8 | Boeing787 | 1669 |
| 21 | A330 | 1599 |
| 18 | Boeing777 | 1532 |
| 20 | A350 | 1064 |
| 22 | ARJ21 | 166 |
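
`get_category_counts` comes from the author's `coco_utils` module, which is not shown here. As a rough guide to what such a helper might do, here is a minimal sketch that counts annotations per category name. The name `category_counts_sketch` is hypothetical, it takes an already-loaded COCO-style dictionary rather than a file path, and the toy data is made up.

```python
from collections import Counter

def category_counts_sketch(coco: dict) -> dict:
    """Count annotations per category name in a COCO-style dictionary."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)

# Tiny made-up example, not FAIR1M data
toy = {
    "categories": [{"id": 1, "name": "A220"}, {"id": 2, "name": "Boeing737"}],
    "annotations": [
        {"id": 0, "category_id": 1},
        {"id": 1, "category_id": 1},
        {"id": 2, "category_id": 2},
    ],
}
toy_counts = category_counts_sketch(toy)
print(toy_counts)  # {'A220': 2, 'Boeing737': 1}
```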


## Plane Sizes

Now, we look at the distribution, in pixels, of the longest side of each plane's bounding box.

In [31]:
with open(ann_fp, 'r') as f:
  json_contents = json.load(f)

In [35]:
anns = json_contents['annotations']

In [62]:
# first, make a dataframe
anns_df = pd.DataFrame.from_dict(anns)

# second, find the longest side of each annotation
# (COCO bboxes are [x, y, width, height], so indices 2 and 3 are the box dimensions)
anns_df['width'] = anns_df.bbox.apply(lambda x: x[2])
anns_df['height'] = anns_df.bbox.apply(lambda x: x[3])
anns_df['longest_side'] = anns_df[["width", "height"]].max(axis=1)
anns_df.head()

| | id | image_id | category_id | area | segmentation | bbox | iscrowd | width | height | longest_side |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2167 | 33 | | [[580, 800], [534, 800], [534, 723], [580, 722... | [534, 722, 46, 78] | 0 | 46 | 78 | 78 |
| 1 | 1 | 2167 | 10 | | [[312, 20], [303, 34], [296, 30], [304, 15], [... | [296, 15, 16, 19] | 0 | 16 | 19 | 19 |
| 2 | 2 | 2167 | 11 | | [[331, 30], [321, 45], [314, 41], [324, 26], [... | [314, 26, 17, 19] | 0 | 17 | 19 | 19 |
| 3 | 3 | 2167 | 11 | | [[353, 32], [343, 48], [337, 44], [347, 28], [... | [337, 28, 16, 20] | 0 | 16 | 20 | 20 |
| 4 | 4 | 2167 | 11 | | [[363, 38], [355, 51], [349, 47], [357, 34], [... | [349, 34, 14, 17] | 0 | 14 | 17 | 17 |
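
The `bbox` values are axis-aligned, so for a rotated object the longest box side can differ from the object's true length. The `segmentation` column appears to hold the corner points of a rotated box (an assumption based on the preview above). Under that assumption, one alternative measurement is the longest edge of the polygon. This sketch is not the method used in this notebook, and `longest_polygon_edge` is a hypothetical helper.

```python
import math

def longest_polygon_edge(points):
    """Length in pixels of the longest edge of a closed polygon [[x, y], ...]."""
    longest = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        longest = max(longest, math.hypot(x1 - x0, y1 - y0))
    return longest

# Made-up rotated rectangle (50 x 10 pixels), not FAIR1M data
rect = [[0, 0], [30, 40], [22, 46], [-8, 6]]
xs = [p[0] for p in rect]
ys = [p[1] for p in rect]
bbox_longest = max(max(xs) - min(xs), max(ys) - min(ys))

print(longest_polygon_edge(rect))  # 50.0
print(bbox_longest)                # 46
```

In this toy case the axis-aligned box reports 46 pixels for an object that is 50 pixels long, which would bias a GSD estimate upward by a few percent.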


In [63]:
# third, map the category ids to the category names
cats = json_contents['categories']
cats_df = pd.DataFrame.from_dict(cats)
cats_dict = cats_df.set_index('id').to_dict()['name']

anns_df['name'] = anns_df['category_id'].map(cats_dict)

# fourth, keep a set of key columns you actually care about
anns_df = anns_df[['longest_side', 'name']]

anns_df.head()

| | longest_side | name |
|---|---|---|
| 0 | 78 | Tennis Court |
| 1 | 19 | Small Car |
| 2 | 19 | Van |
| 3 | 20 | Van |
| 4 | 17 | Van |


In [50]:
# fifth, keep only the annotations that belong to the plane categories
anns_df = anns_df[anns_df['name'].isin(plane_df["category"].values.tolist())]
anns_df.head()

| | longest_side | name |
|---|---|---|
| 244 | 104 | Boeing787 |
| 245 | 67 | Boeing737 |
| 246 | 68 | Boeing737 |
| 247 | 123 | Boeing747 |
| 248 | 70 | A321 |


In [58]:
from scipy import stats as st

In [69]:
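cat_list = plane_df["category"].values.tolist()

# collect the longest-side measurements (in pixels) for each plane category
size_dict = {}
for c in cat_list:
    size_dict[c] = anns_df['longest_side'][anns_df['name'] == c].to_list()

# summarize each category; st.mode(...)[0][0] fails on SciPy 1.11+, where mode returns
# scalars, so use pandas instead (mode() is sorted, so .iloc[0] is the smallest mode)
df_list = []
for c in cat_list:
    sizes = size_dict[c]
    ind_dict = {'name': c, 'max': np.max(sizes), 'min': np.min(sizes), 'mean': np.mean(sizes),
                'mode': pd.Series(sizes).mode().iloc[0], 'std': np.std(sizes)}
    df_list.append(ind_dict)

stats_df = pd.DataFrame(df_list)
# reference size of each plane model in meters, listed in the same order as cat_list
stats_df['plane_length_m'] = [34.90, 39.5, 44.5, 70.6, 60, 63.7, 63.7, 60.7, 35]
# GSD estimate (meters per pixel) = real-world size / size in pixels
stats_df['gsd_mean'] = stats_df['plane_length_m']/stats_df['mean']
stats_df['gsd_mode'] = stats_df['plane_length_m']/stats_df['mode']
# note: this is size / pixel std, a ratio, not a standard deviation of the GSD estimate
stats_df['gsd_std'] = stats_df['plane_length_m']/stats_df['std']
stats_df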

| | name | max | min | mean | mode | std | plane_length_m | gsd_mean | gsd_mode | gsd_std |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A220 | 132 | 0 | 63.183424 | 64 | 10.344194 | 34.9 | 0.55236 | 0.545312 | 3.373873 |
| 1 | Boeing737 | 116 | 8 | 64.365915 | 65 | 9.96032 | 39.5 | 0.613679 | 0.607692 | 3.965736 |
| 2 | A321 | 124 | 0 | 70.33014 | 69 | 11.162597 | 44.5 | 0.63273 | 0.644928 | 3.986527 |
| 3 | Boeing747 | 215 | 79 | 115.303048 | 116 | 18.813194 | 70.6 | 0.6123 | 0.608621 | 3.752686 |
| 4 | Boeing787 | 184 | 63 | 99.928101 | 107 | 17.211116 | 60.0 | 0.600432 | 0.560748 | 3.486119 |
| 5 | A330 | 178 | 65 | 104.804878 | 108 | 16.853693 | 63.7 | 0.607796 | 0.589815 | 3.779587 |
| 6 | Boeing777 | 178 | 66 | 105.741514 | 107 | 15.213314 | 63.7 | 0.602412 | 0.595327 | 4.187122 |
| 7 | A350 | 230 | 77 | 112.192669 | 100 | 17.850399 | 60.7 | 0.541034 | 0.607 | 3.400484 |
| 8 | ARJ21 | 71 | 43 | 53.26506 | 52 | 6.207427 | 35.0 | 0.657091 | 0.673077 | 5.638407 |
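
The per-category estimates above can be combined into a single dataset-level figure. The sketch below shows two possible summaries, the median of the `gsd_mean` values and a mean weighted by annotation count. Both are assumptions about how one might aggregate, not a method taken from this notebook. The numbers are copied from the count and statistics tables.

```python
from statistics import median

# (category, annotation count, gsd_mean) copied from the tables in this notebook
rows = [
    ("A220", 6057, 0.55236),
    ("Boeing737", 3949, 0.613679),
    ("A321", 2505, 0.63273),
    ("Boeing747", 1673, 0.6123),
    ("Boeing787", 1669, 0.600432),
    ("A330", 1599, 0.607796),
    ("Boeing777", 1532, 0.602412),
    ("A350", 1064, 0.541034),
    ("ARJ21", 166, 0.657091),
]

# Median across categories: robust to one or two outlying plane models
median_gsd = median(gsd for _, _, gsd in rows)

# Count-weighted mean: categories with more annotations contribute more
total = sum(count for _, count, _ in rows)
weighted_gsd = sum(count * gsd for _, count, gsd in rows) / total

print(round(median_gsd, 3), round(weighted_gsd, 3))
```

Both summaries land near 0.6 m per pixel. The weighted mean is pulled slightly lower by A220, which is the most numerous category and has one of the smallest estimates.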
