<a href="https://colab.research.google.com/github/naguzmans/opportunistic-atm/blob/master/03_metadata_to_week_array.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mount Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Transform Metadata Into a Week Array

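The goal of this code is to build an array of shape (weeks, 168 hours, 25 features) for training the model later. Each complete Monday-to-Sunday week contributes one block of 7 × 24 = 168 hourly rows, and each row holds the 25 metadata features for that hour.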
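To make the layout concrete, here is a small sketch on synthetic data. It assumes each week block is ordered day-major (Monday 00:00 first), which holds if the hourly `.npy` files sort chronologically by name. `hour_index` and `demo_array` are hypothetical names used only for illustration.

```python
import numpy as np

# Synthetic stand-in for the final array: 2 weeks x 168 hours x 25 features
n_weeks, n_features = 2, 25
demo_array = np.arange(n_weeks * 168 * n_features, dtype=float).reshape(n_weeks, 168, n_features)

def hour_index(day, hour):
  """Hypothetical helper: row of a week block for day (0 = Monday) and hour (0-23)."""
  return 24 * day + hour

# Wednesday (day 2) at 15:00 in the second week
row = demo_array[1, hour_index(2, 15)]

# The same data viewed with days and hours as separate axes (C-order reshape)
by_day = demo_array.reshape(n_weeks, 7, 24, n_features)
print(np.array_equal(row, by_day[1, 2, 15]))  # True
print(by_day.shape)  # (2, 7, 24, 25)
```

Because `reshape` uses row-major order, row `24 * day + hour` of a week block maps directly to `[day, hour]` in the 4-D view, so either indexing style can be used when feeding the model.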

**Lists folders containing data**

In [None]:
import glob

# Home folder
%cd /content/drive/MyDrive/opportunistic-utm

# List of day folders with data (dataset/<month>/<day>/), sorted by date.
# With recursive=False a '**' would behave like '*', so '*' is used explicitly.
folder_list = sorted(glob.glob('dataset/*/*/'))

folder_list[:5]

/content/drive/MyDrive/opportunistic-utm


['dataset/2019-01/2019-01-01/',
 'dataset/2019-01/2019-01-02/',
 'dataset/2019-01/2019-01-03/',
 'dataset/2019-01/2019-01-04/',
 'dataset/2019-01/2019-01-05/']
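The pattern `'dataset/**/**/'` with `recursive=False` works only because, without recursion, `glob` treats `**` as an ordinary `*`. The sketch below checks this on a throwaway folder tree that mimics `dataset/<month>/<day>/` (the folder names are made up for the demo). It also shows that turning recursion on would change the result, and that the trailing `/` restricts matches to directories.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
  # Tiny tree mimicking dataset/<month>/<day>/metadata/
  for day in ['2019-01-01', '2019-01-02']:
    os.makedirs(os.path.join(root, 'dataset', '2019-01', day, 'metadata'))
  # A stray file at the day level: the trailing '/' in the pattern excludes it
  open(os.path.join(root, 'dataset', '2019-01', 'notes.txt'), 'w').close()

  base = glob.escape(root)
  # Without recursion, '**' is just a wildcard for one path segment, like '*'
  double_star = sorted(glob.glob(f'{base}/dataset/**/**/', recursive=False))
  single_star = sorted(glob.glob(f'{base}/dataset/*/*/'))
  # With recursion, '**' spans any depth and also picks up folders such as metadata/
  recursive_hits = glob.glob(f'{base}/dataset/**/**/', recursive=True)

  print(double_star == single_star)  # True
  print(len(recursive_hits) > len(single_star))  # True
```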

**Gets the Monday of every complete 7-day week**

In [None]:
import datetime
import os

import numpy as np

# Maps each day folder to the Monday that starts its week.
# date.weekday() is 0 for Monday, so subtracting it steps back to that Monday.
monday_list = []
for folder in folder_list:
  folder_date = os.path.basename(os.path.normpath(folder))
  date = datetime.datetime.strptime(folder_date, '%Y-%m-%d')
  monday_date = date - datetime.timedelta(days=date.weekday())
  monday_list.append(monday_date)

# Keeps each unique Monday only if all 7 days of its week have a folder
mondays, day_counts = np.unique(np.array(monday_list), return_counts=True)
monday_list = mondays[day_counts == 7].tolist()
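The same bucketing can be checked on made-up dates with the standard library alone. This is a sketch, not the notebook's code: `complete_week_mondays` is a hypothetical helper that takes `'YYYY-MM-DD'` strings (standing in for folder names) and uses `collections.Counter` in place of `np.unique`. Unlike the notebook, it also ignores duplicate dates.

```python
import datetime
from collections import Counter

def complete_week_mondays(folder_dates):
  """Hypothetical helper: sorted Mondays whose week has all 7 days present."""
  counts = Counter()
  for s in set(folder_dates):  # duplicates would otherwise inflate a week's count
    date = datetime.datetime.strptime(s, '%Y-%m-%d')
    # weekday() is 0 for Monday, so this steps back to that week's Monday
    counts[date - datetime.timedelta(days=date.weekday())] += 1
  return sorted(monday for monday, n in counts.items() if n == 7)

# 2019-01-01 is a Tuesday, so its week (starting 2018-12-31) has only 6 days here
start = datetime.date(2019, 1, 1)
dates = [(start + datetime.timedelta(days=k)).isoformat() for k in range(20)]
mondays = complete_week_mondays(dates)
print([m.strftime('%Y-%m-%d') for m in mondays])  # ['2019-01-07', '2019-01-14']
```

The partial first week is dropped, which is the same reason the real dataset yields fewer weeks than its day count divided by 7.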

**Extracts the data from full 7-day weeks**

In [None]:
from tqdm import tqdm

# Builds one (168, 25) block per week by looping each Monday through 7 days
metadata_array = []

# Iterates through the Monday list
for monday in tqdm(monday_list):
  week_hours = []

  # Iterates over the 7 days of that Monday's week
  for day in range(7):
    day_date = monday + datetime.timedelta(days=day)
    day_str = day_date.strftime('%Y-%m-%d')
    month_str = day_date.strftime('%Y-%m')
    files = sorted(glob.glob(f'dataset/{month_str}/{day_str}/metadata/*.npy'))
    # Fails early with a clear message if a day is missing hourly files
    assert len(files) >= 24, f'{day_str}: expected 24 hourly files, found {len(files)}'

    # Loads the first 24 files (one 25-feature vector per hour, in filename order)
    for hour_file in files[:24]:
      week_hours.append(np.load(hour_file))

  # Stacks the 168 hourly vectors once instead of growing an array in the loop
  metadata_array.append(np.stack(week_hours))

# Transforms to a float64 NumPy array of shape (weeks, 168, 25)
metadata_array = np.array(metadata_array, dtype=np.float64)
metadata_array.shape

100%|██████████| 38/38 [20:21<00:00, 32.14s/it]


(38, 168, 25)
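Growing an array with `np.concatenate` inside a loop copies the whole array on every iteration, while collecting rows in a list and calling `np.stack` once copies the data a single time. The sketch below, on random synthetic vectors (not the real metadata), checks that both patterns give the same (168, 25) week block.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one week of hourly metadata: 168 vectors of 25 features
hourly_vectors = [rng.random(25) for _ in range(7 * 24)]

# Pattern A: grow an array one row at a time (copies the array on every step)
grown = np.empty((1, 25))
for vec in hourly_vectors:
  grown = np.concatenate((grown, np.expand_dims(vec, axis=0)), axis=0)
grown = grown[1:]  # drop the uninitialised seed row

# Pattern B: collect rows in a list and stack once at the end
stacked = np.stack(hourly_vectors)

print(grown.shape, stacked.shape)  # (168, 25) (168, 25)
print(np.array_equal(grown, stacked))  # True
```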

# Save Results

**Save Metadata Array**

In [None]:
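# Creates the results folder if needed; the filename records the number of weeks
os.makedirs('dataset/00_results', exist_ok=True)
np.save(f'dataset/00_results/metadata_array_{metadata_array.shape[0]}.npy', metadata_array)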
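A saved array can be reloaded later for training with `np.load`. This round-trip sketch uses a temporary folder and a synthetic zero array (`demo_array`, a hypothetical name) so it does not touch the real dataset; it assumes the same naming scheme, where the week count is embedded in the filename.

```python
import os
import tempfile

import numpy as np

demo_array = np.zeros((3, 168, 25))  # synthetic stand-in: 3 weeks

with tempfile.TemporaryDirectory() as tmp:
  results_dir = os.path.join(tmp, '00_results')
  os.makedirs(results_dir, exist_ok=True)  # np.save does not create missing folders
  path = os.path.join(results_dir, f'metadata_array_{demo_array.shape[0]}.npy')
  np.save(path, demo_array)

  # np.load reads the whole array into memory, so it stays usable after cleanup
  loaded = np.load(path)
  print(os.path.basename(path))  # metadata_array_3.npy
  print(loaded.shape, loaded.dtype)  # (3, 168, 25) float64
```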