# Schedule generation

Now that we have simulation data on the interactions between sensors and mules, we want to generate an upload schedule based on sampling frequency and batched updates.

We will record the following columns:

`schedule.csv`

| sensor_id | mule_id | sample_time | pickup_time | batch_time | data_length |
|:--:|:--:|:--:|:--:|:--:|:--:|
| i = 0, ..., 999 | j = 0, ..., 99 | seconds | seconds | seconds | bytes |

where `sample_time` is the time that the sensor generated the sample, `pickup_time` is the time that the mule walked into the sensor's range and picked up the packet (which is also the upload time for a non-privacy-preserving setup), and `batch_time` is the time that the packet would be uploaded if the upload is delayed and uploads are batched into constant-sized chunks for privacy reasons.
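To make the three timestamps concrete, here is a minimal sketch of how the delay of each upload mode could be derived from them. The rows are made-up illustrative values, not simulation output, and the `add_latencies` helper and the `pickup_latency` / `batch_latency` column names are hypothetical.

```python
import pandas as pd

# Illustrative rows only: these values are invented for the example.
example = pd.DataFrame({
    'sensor_id':   [0, 0, 1],
    'mule_id':     [3, 3, 3],
    'sample_time': [0.0, 10.0, 20.0],
    'pickup_time': [42.5, 42.75, 61.0],
    'batch_time':  [600.0, 600.0, 600.0],
    'data_length': [128, 128, 128],
})

def add_latencies(schedule_df):
    """Return a copy with the delay from sampling to each kind of upload."""
    out = schedule_df.copy()
    # Delay when the mule uploads as soon as it picks the packet up.
    out['pickup_latency'] = out['pickup_time'] - out['sample_time']
    # Delay when the upload waits for the packet's batch.
    out['batch_latency'] = out['batch_time'] - out['sample_time']
    return out

latencies = add_latencies(example)
print(latencies[['sensor_id', 'pickup_latency', 'batch_latency']])
```

Comparing the two latency columns is one way to see what batching costs relative to the baseline.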

We will also have the following parameters that can be tweaked:

- `number of mules` - number (integer) of mules that are included in the simulation
- `number of sensors` - number (integer) of sensors that are included in the simulation
- `advertisement period` - time (in seconds) between each sensor's BLE advertisement used to discover nearby mules
- `connection time` - time (in seconds) needed for a connection to form before data can be transferred
- `ble throughput` - rate (in bytes per second) of data transfer from a sensor to a mule
- `sample period` - time (in seconds) between each sample that a sensor takes
- `sample length` - size (in bytes) of the samples a sensor transfers to a mule
- `batch period` - time (in seconds) between each batch the mule uploads
- `batch length` - size (in bytes) of the batches a mule uploads to the cloud
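As a quick worked example of how these parameters interact, the sketch below derives a few quantities from the default values used later in this notebook. `unused_bytes_per_batch` and `backlog_drain_time` are hypothetical names introduced only for this illustration.

```python
import math

# Default values used in this notebook.
ble_throughput = 125000.0  # bytes per second
sample_period = 10.0       # seconds
sample_length = 128        # bytes
batch_length = 100000      # bytes

# Time needed to move one sample from a sensor to a mule.
time_per_sample = sample_length / ble_throughput

# Whole samples that fit in one batch, and the bytes left over.
samples_per_batch = math.floor(batch_length / sample_length)
unused_bytes_per_batch = batch_length - samples_per_batch * sample_length

# Time to drain one hour of backlog from a sensor that saw no mule.
backlog_samples = int(3600 / sample_period)
backlog_drain_time = backlog_samples * time_per_sample

print(time_per_sample)         # 0.001024 seconds per sample
print(samples_per_batch)       # 781 samples per batch
print(unused_bytes_per_batch)  # 32 bytes of each batch are not sample data
print(backlog_drain_time)      # roughly 0.37 seconds for 360 samples
```

With these defaults the transfer itself is cheap, so the advertisement and connection delays dominate short interactions.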

I think this is all we need for now to evaluate the baseline and Express. We assume that each sensor connects to only one mule at a time, while a mule can connect to an arbitrary number of sensors at a time. After a sensor connects to a mule, it stays connected and transfers data as long as the mule is within range. Once the mule leaves the sensor's range, the sensor immediately starts advertisements and looks for a new mule to form a connection with.
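The advertisement and connection assumptions can be sketched as a small timing function. This mirrors the rounding used in the scheduling loop further down: the sensor only notices a mule on its next advertisement tick, counted from the end of its last action, and then needs `connection_time` before data can flow. The function name `connection_ready_time` is hypothetical.

```python
import math

def connection_ready_time(cur_end_time, interaction_time,
                          advertisement_period=2.0, connection_time=1.5):
    """Earliest time a sensor can start transferring data to a mule.

    cur_end_time: when the sensor finished its previous action.
    interaction_time: when the mule entered the sensor's range.
    """
    # Time spent advertising before the mule showed up (zero if the
    # sensor was still busy when the mule arrived).
    time_passed = max(0.0, interaction_time - cur_end_time)
    # Round up to the next advertisement tick.
    ticks = math.ceil(time_passed / advertisement_period)
    advertisement_time = cur_end_time + ticks * advertisement_period
    return advertisement_time + connection_time

# Sensor idle since t = 0, mule arrives at t = 7.3:
# next tick is at t = 8.0, so data can flow from t = 9.5.
print(connection_ready_time(0.0, 7.3))

# Sensor busy until t = 12.0, mule already in range since t = 10.0:
# it advertises immediately, so data can flow from t = 13.5.
print(connection_ready_time(12.0, 10.0))
```

If the returned time falls at or after the end of the interaction, no data is transferred.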

In [19]:
# Set parameters.
num_mules = 100
num_sensors = 1000
advertisement_period = 2.0 # seconds
connection_time = 1.5 # seconds
ble_throughput = 125000.0 # bytes per second = 1 Mbps
sample_period = 10.0 # seconds
sample_length = 128 # bytes
batch_period = 600.0 # seconds = 10 minutes
batch_length = 100000 # bytes = 100 KB

# Set save file.
save_file = 'prob_data/schedule.csv'

In [20]:
# Import libraries.
import numpy as np
import pandas as pd
import math

In [21]:
# Read in data files and downsample as necessary.
interaction_df = pd.read_csv('prob_data/interactions.csv')
smol_interactions = interaction_df.loc[(interaction_df['sensor_id'] < num_sensors) & 
                                       (interaction_df['mule_id'] < num_mules)]
# Sort the interactions by interaction time.
smol_interactions = smol_interactions.sort_values('interaction_time')

# Calculate some useful numbers.
time_per_sample = sample_length / ble_throughput 
samples_per_batch = math.floor(batch_length / sample_length)

# Start bookkeeping.
next_samples = [0] * num_sensors # Index of the next unsent sample for each sensor.
cur_end_times = [0.0] * num_sensors # Keeps track of the latest action time for each sensor.
next_batches = [0] * num_mules # Keeps track of which batch each mule is on.
next_batch_lengths = [0] * num_mules # Number of samples in each mule's accumulating batch.
schedule = [] # Records our resulting upload schedule.

In [23]:
# Iterate through each row of sensor-mule interactions.
for index, row in smol_interactions.iterrows():
    # Grab the sensor and mule used in this interaction.
    cur_sensor = int(row['sensor_id'])
    cur_mule = int(row['mule_id'])
    
    # Grab some stats for this sensor.
    cur_end_time = cur_end_times[cur_sensor]
    
    # If the sensor has already taken actions beyond this time, ignore this row.
    new_end_time = row['interaction_time'] + row['interaction_duration']
    if cur_end_time >= new_end_time:
        continue
    
    # Otherwise, we advertise and attempt to start a connection.
    time_passed = max(0.0, row['interaction_time'] - cur_end_time)
    new_start_time = cur_end_time + math.ceil(time_passed / advertisement_period) * advertisement_period
    # If there is not enough time for a connection, we waste some time and move on with our lives.
    cur_end_time = new_start_time + connection_time
    if cur_end_time >= new_end_time:
        cur_end_times[cur_sensor] = new_end_time
        continue
    # Otherwise, we successfully connected to the mule and have time to do stuff.
    
    # The first thing we do is dump a bunch of accumulated samples onto the mule as fast as possible
    # as long as the mule is connected.
    next_sample = next_samples[cur_sensor]
    next_batch_length = next_batch_lengths[cur_mule]
    next_batch = next_batches[cur_mule]
    while next_sample * sample_period <= cur_end_time and cur_end_time + time_per_sample <= new_end_time:
        cur_end_time += time_per_sample
        next_batch_length += 1
        # If the current batch of samples is full, move on to the next batch.
        if next_batch_length > samples_per_batch:
            next_batch += 1
            next_batch_length = 1
        # Send the sample.
        schedule.append([cur_sensor,                  # sensor_id
                         cur_mule,                    # mule_id 
                         next_sample * sample_period, # sample_time
                         cur_end_time,                # pickup_time
                         next_batch * batch_period,   # batch_time
                         sample_length])              # data_length
        next_sample += 1
    
    # After we have done that, we continue the connection and send new samples as they come in.
    # !! If sensors are dense, samples are frequent, and samples per batch is small, this may result in
    # !! some sensors having upload priority over other sensors in a way that does not respect the 
    # !! chronological order of samples received by the mule.
    # !! I am not entirely sure how to fix this, so I will ignore it for now. 
    while next_sample * sample_period + time_per_sample <= new_end_time:
        next_batch_length += 1
        # If the current batch of samples is full, move on to the next batch.
        if next_batch_length > samples_per_batch:
            next_batch += 1
            next_batch_length = 1
        # Send the sample.
        schedule.append([cur_sensor,                                    # sensor_id
                         cur_mule,                                      # mule_id 
                         next_sample * sample_period,                   # sample_time
                         next_sample * sample_period + time_per_sample, # pickup_time
                         next_batch * batch_period,                     # batch_time
                         sample_length])                                # data_length
        next_sample += 1
    
    # Finally, once we are done sending all the samples that can be sent, we close out the interaction
    # and update the bookkeeping lists as necessary.
    next_samples[cur_sensor] = next_sample
    cur_end_times[cur_sensor] = new_end_time
    next_batches[cur_mule] = next_batch
    next_batch_lengths[cur_mule] = next_batch_length
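
The ordering caveat flagged in the `!!` comments can at least be detected after the fact. The sketch below is one possible check, not a fix: for each mule it sorts the scheduled rows by `pickup_time` and reports the mule if its `batch_time` values ever go backwards. The helper name `mules_with_out_of_order_batches` and the toy rows are hypothetical.

```python
import pandas as pd

COLUMNS = ['sensor_id', 'mule_id', 'sample_time',
           'pickup_time', 'batch_time', 'data_length']

def mules_with_out_of_order_batches(schedule_rows):
    """Return ids of mules whose batches do not follow pickup order."""
    df = pd.DataFrame(schedule_rows, columns=COLUMNS)
    offenders = []
    for mule_id, group in df.groupby('mule_id'):
        ordered = group.sort_values('pickup_time', kind='stable')
        # is_monotonic_increasing allows ties, so samples that share
        # a batch do not count as out of order.
        if not ordered['batch_time'].is_monotonic_increasing:
            offenders.append(int(mule_id))
    return offenders

# Toy rows (made up): on mule 0, the sample picked up at t = 8.0 lands
# in a later batch than the sample picked up at t = 20.0.
toy_rows = [
    [0, 0, 0.0, 5.0, 0.0, 128],
    [0, 0, 10.0, 20.0, 600.0, 128],
    [1, 0, 0.0, 8.0, 1200.0, 128],
    [2, 1, 0.0, 3.0, 0.0, 128],
    [2, 1, 10.0, 10.0, 0.0, 128],
]
print(mules_with_out_of_order_batches(toy_rows))
```

Running this on the `schedule` list built above would show how often the issue actually occurs for a given parameter choice.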


In [27]:
# Save our results
with open(save_file, 'w') as f:
    # Record parameters.
    f.write('num_mules,num_sensors,advertisement_period,connection_time,ble_throughput,sample_period,sample_length,batch_period,batch_length\n')
    f.write('{},{},{},{},{},{},{},{},{}\n'.format(num_mules,num_sensors,advertisement_period,connection_time,ble_throughput,sample_period,sample_length,batch_period,batch_length))
    
    # Record schedule.
    f.write('sensor_id,mule_id,sample_time,pickup_time,batch_time,data_length\n')
    np.savetxt(f, schedule, delimiter=',')
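
Because the file starts with a two-line parameter block before the schedule header, reading it back takes two passes. The sketch below shows one way to do that with pandas, assuming the layout written above; the `load_schedule` helper is hypothetical, and the demo file contains a single made-up row written in the scientific notation that `np.savetxt` produces by default.

```python
import os
import tempfile
import pandas as pd

def load_schedule(path):
    """Read the parameter block and the schedule table from one file."""
    # First line is the parameter header, second line holds the values.
    params = pd.read_csv(path, nrows=1).iloc[0].to_dict()
    # The schedule header sits on the third line.
    schedule_df = pd.read_csv(path, skiprows=2)
    # Ids and lengths are stored as floats, so cast them back to integers.
    schedule_df = schedule_df.astype(
        {'sensor_id': int, 'mule_id': int, 'data_length': int})
    return params, schedule_df

# Demo file with made-up contents in the same layout.
demo_text = (
    'num_mules,num_sensors,advertisement_period,connection_time,'
    'ble_throughput,sample_period,sample_length,batch_period,batch_length\n'
    '100,1000,2.0,1.5,125000.0,10.0,128,600.0,100000\n'
    'sensor_id,mule_id,sample_time,pickup_time,batch_time,data_length\n'
    '7.000000000000000000e+00,3.000000000000000000e+00,'
    '1.000000000000000000e+01,4.250000000000000000e+01,'
    '0.000000000000000000e+00,1.280000000000000000e+02\n'
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'schedule.csv')
    with open(demo_path, 'w') as f:
        f.write(demo_text)
    params, schedule_df = load_schedule(demo_path)

print(params['batch_period'])
print(schedule_df)
```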
    

In [14]:
row['sensor_id']

999.0
