This notebook filters label chunk files by the percentage of their pixels that are labeled as rock. It generates a chunk list file that a SegNet model can use as a data chunk file for training, validation, or testing.
To use it, set the output file name and the rock threshold percentage in the configuration cell below. The directory the file is written to is set separately in output_path_dir.

Configuration cell

In [0]:
output_name = "one_percent" # output file name without extension; the file is written to "output_path_dir"
percent = 1 # value from 0-100
chunk_height = 512 # pixels
chunk_width = 512 # pixels

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
import numpy as np
import os

In [0]:
chunk_options = "/content/drive/My Drive/Metadata/train.txt"
output_path_dir = "/content/drive/My Drive/Metadata/"

with open(chunk_options, 'r') as file:
  chunk_paths = file.readlines()

In [0]:
def percent_to_pixel_value(percent, chunk_height, chunk_width):
  """
  Returns the number of pixels n such that n divided by the total number of
  pixels (chunk_height * chunk_width) equals the input percent, rounded down.
  @param float percent: the desired pixel percentage, from 0 to 100
  @param int chunk_height: chunk height in pixels
  @param int chunk_width: chunk width in pixels

  @returns int the number of pixels that represents the input percent
  @raises ValueError if percent is outside the range 0-100
  """
  if percent < 0 or percent > 100:
    raise ValueError("percent must be between 0 and 100, got {}".format(percent))
  return int(chunk_height * chunk_width * percent / 100.0)
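
As a worked example of the conversion above: with the configured values (1 percent, 512 x 512 chunks) the threshold is 2621 pixels, because 512 * 512 = 262144 and 1 percent of that is 2621.44, which int() truncates. The sketch below re-declares a compact copy of the function so that it runs on its own; the 0.5 and 100 percent inputs are illustrative values, not settings used by this notebook.

```python
def percent_to_pixel_value(percent, chunk_height, chunk_width):
  # Compact copy of the notebook function, repeated so this cell is standalone
  if not 0 <= percent <= 100:
    raise ValueError("percent must be between 0 and 100")
  return int(chunk_height * chunk_width * percent / 100.0)

threshold_1 = percent_to_pixel_value(1, 512, 512)       # 2621.44 truncated
threshold_half = percent_to_pixel_value(0.5, 512, 512)  # 1310.72 truncated
threshold_full = percent_to_pixel_value(100, 512, 512)  # every pixel
print(threshold_1, threshold_half, threshold_full)
# prints: 2621 1310 262144
```

Because the filtering cell compares with a strict ">", a chunk needs at least 2622 rock pixels to qualify at the 1 percent setting.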

In [10]:
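qualifying_chunks = []
chunk_count = 0
qualifying_count = 0
pixel_threshold = percent_to_pixel_value(percent, chunk_height, chunk_width)
for line in chunk_paths:
  chunk_count += 1
  # Each line holds two paths separated by " /"; the second one is the label chunk.
  paths = line.split(" /")
  label_path = "/" + paths[1].strip()  # restore the leading "/" and drop the newline
  # The sum counts rock pixels, assuming rock is stored as 1 and everything else as 0.
  if np.sum(np.load(label_path)) > pixel_threshold:
    qualifying_count += 1
    qualifying_chunks.append(line)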

print("Chunks searched: {}\nQualifying chunks found: {}".format(chunk_count, qualifying_count))

Chunks searched: 5379
Qualifying chunks found: 970
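
The filtering step can be tried without Drive access on synthetic data. The sketch below is an illustration under stated assumptions, not the notebook's actual data: label chunks are .npy arrays in which rock pixels are 1 and all other pixels are 0, each list line has the form "<image path> /<label path>" (inferred from the split on " /"), and paths are POSIX-style absolute paths as on Colab. The file names, the 8 x 8 chunk size, and the helper filter_lines are hypothetical.

```python
import os
import tempfile

import numpy as np


def filter_lines(lines, pixel_threshold):
  """Keep the lines whose label chunk has more than pixel_threshold rock pixels."""
  kept = []
  for line in lines:
    # The second field is the label path; the split removes its leading "/"
    label_path = "/" + line.split(" /")[1].strip()
    if np.sum(np.load(label_path)) > pixel_threshold:
      kept.append(line)
  return kept


tmp_dir = tempfile.mkdtemp()
height, width = 8, 8  # hypothetical tiny chunks, 64 pixels each
rock_counts = {"sparse": 0, "edge": 6, "dense": 32}

lines = []
for name, n_rock in rock_counts.items():
  label = np.zeros(height * width, dtype=np.uint8)
  label[:n_rock] = 1  # mark n_rock pixels as rock
  label_path = os.path.join(tmp_dir, name + "_label.npy")
  np.save(label_path, label.reshape(height, width))
  image_path = os.path.join(tmp_dir, name + "_image.npy")  # never opened here
  lines.append("{} {}\n".format(image_path, label_path))

# 10 percent of 64 pixels is 6.4, truncated to 6; the comparison is strict,
# so the chunk with exactly 6 rock pixels does not qualify
pixel_threshold = int(height * width * 10 / 100.0)
kept = filter_lines(lines, pixel_threshold)
kept_names = [os.path.basename(l.split(" /")[1].strip()) for l in kept]
print(pixel_threshold, kept_names)
# prints: 6 ['dense_label.npy']
```

Only the "dense" chunk survives: the empty chunk and the chunk sitting exactly on the threshold are both dropped.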


In [0]:
output_path = os.path.join(output_path_dir, output_name + ".txt")
with open(output_path, 'w') as out_file:
  out_file.writelines(qualifying_chunks)

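Run this cell to verify that the output was saved correctly. It prints the number of lines in the output file, which should match the qualifying chunk count printed above.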

In [12]:
with open(output_path, 'r') as test_read:
  test = test_read.readlines()

print(len(test))

970
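
A line count only confirms that the expected number of lines was written. A slightly stricter check, sketched here on a temporary file, reads the file back and compares it with the list that was written. The two chunk lines and the temporary directory are hypothetical stand-ins for the real qualifying_chunks and output_path.

```python
import os
import tempfile

# Hypothetical chunk list lines in the "<image path> /<label path>" form
qualifying_chunks = [
  "/data/img_0.npy /data/lbl_0.npy\n",
  "/data/img_1.npy /data/lbl_1.npy\n",
]

output_path = os.path.join(tempfile.mkdtemp(), "one_percent.txt")
with open(output_path, "w") as out_file:
  out_file.writelines(qualifying_chunks)

with open(output_path, "r") as test_read:
  read_back = test_read.readlines()

# True only if every line came back unchanged and in the same order
round_trip_ok = read_back == qualifying_chunks
print(len(read_back), round_trip_ok)
# prints: 2 True
```

In the notebook itself, the same comparison can be run against the existing qualifying_chunks list and output_path instead of the stand-ins.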
