# Region Generation

This notebook separates the data so that all the updates for a particular project are grouped together.

Take the `tile_placements` data and split it into frames, where a frame is a contiguous subset of the updates.
The data can be split into frames based on time (e.g. one frame is 30 minutes of data) or on the number of updates (e.g. one frame is 1 million updates). Every update belongs to exactly one frame.
After creating the frames, build a graph for each frame in which every pixel is a node. Each pixel node holds a vector of all the updates made to that pixel within the frame.
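A minimal sketch of the two splitting strategies and of the per-pixel update vectors. It assumes each update is a tuple `(ts, user, x, y, color)` with a numeric `ts`, and that the updates are already sorted by time; the function names are hypothetical, not part of the notebook.

```python
from collections import defaultdict

# Assumption: each update is (ts, user, x, y, color) with a numeric ts, sorted by ts.


def frames_by_count(updates, framesize):
    """Split updates into consecutive frames of `framesize` updates each."""
    return [updates[i:i + framesize] for i in range(0, len(updates), framesize)]


def frames_by_time(updates, window):
    """Split updates into frames covering `window` time units each, measured
    from the first timestamp. Windows with no updates produce no frame."""
    if not updates:
        return []
    start = updates[0][0]
    frames = defaultdict(list)
    for u in updates:
        frames[int((u[0] - start) // window)].append(u)
    return [frames[k] for k in sorted(frames)]


def pixel_vectors(frame):
    """Group one frame's updates by pixel: (x, y) -> [(ts, user, color), ...]."""
    pixels = defaultdict(list)
    for ts, user, x, y, color in frame:
        pixels[(x, y)].append((ts, user, color))
    return dict(pixels)


updates = [
    (0, "a", 1, 1, 5),
    (10, "b", 1, 2, 3),
    (20, "c", 1, 1, 7),
    (95, "d", 4, 4, 2),
]
print([len(f) for f in frames_by_count(updates, 2)])  # [2, 2]
print([len(f) for f in frames_by_time(updates, 30)])  # [3, 1]
print(pixel_vectors(updates[:3])[(1, 1)])             # [(0, 'a', 5), (20, 'c', 7)]
```

Both splitters place every update in exactly one frame; splitting by count keeps frame sizes equal, while splitting by time keeps frame durations equal.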

We want to compute a min-cut on the graph so that each partition represents one image. The edge weights should therefore be large between pixels in the same image and small between pixels in different images, so that the cheapest cut falls along image boundaries.
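To make the weighting idea concrete, here is a minimal plain-Python sketch under stated assumptions: the hypothetical `edge_weight` treats two adjacent pixels as part of the same image when their latest colours match, and the cut is computed with the Stoer-Wagner global minimum-cut algorithm. networkx ships the same algorithm as `nx.stoer_wagner(G, weight="weight")`. A single min-cut only splits the graph in two, so separating more than two images would mean cutting recursively.

```python
# Hypothetical edge weight: adjacent pixels whose latest colour matches are assumed
# to belong to the same image (strong edge); otherwise the edge is weak.
def edge_weight(color_a, color_b, same=10, different=1):
    return same if color_a == color_b else different


def stoer_wagner(nodes, weights):
    """Global minimum cut of an undirected weighted graph (Stoer-Wagner).

    weights maps (u, v) -> w; list each undirected edge once.
    Returns (cut_value, one_side_of_the_cut).
    """
    adj = {u: {} for u in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    groups = {u: {u} for u in nodes}  # original nodes merged into each super-node
    best_value, best_side = float("inf"), set()

    while len(adj) > 1:
        # Maximum-adjacency ordering: repeatedly add the most tightly connected node
        start = next(iter(adj))
        order = [start]
        conn = {v: adj[start].get(v, 0) for v in adj if v != start}
        cut_of_phase = 0
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v, w in adj[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = order[-2], order[-1]
        # The last node added is separated from the rest by a cut of weight cut_of_phase
        if cut_of_phase < best_value:
            best_value, best_side = cut_of_phase, set(groups[t])
        # Merge t into s, summing parallel edges
        for v, w in adj[t].items():
            if v != s:
                adj[s][v] = adj[s].get(v, 0) + w
                adj[v][s] = adj[v].get(s, 0) + w
            del adj[v][t]
        del adj[t]
        groups[s] |= groups.pop(t)

    return best_value, best_side


# Toy frame: two 2x2 "images" side by side, red on the left and blue on the right
colors = {(x, y): ("red" if x < 2 else "blue") for x in range(4) for y in range(2)}
weights = {}
for (x, y) in colors:
    for nb in ((x + 1, y), (x, y + 1)):  # right and down neighbours, so each edge appears once
        if nb in colors:
            weights[((x, y), nb)] = edge_weight(colors[(x, y)], colors[nb])

value, side = stoer_wagner(list(colors), weights)
print(value)  # 2: only the two weak red-blue edges are cut
```

Any cut through the middle of a single-colour patch would have to sever at least two strong edges, which is why the weak boundary edges are the ones that get cut.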

After partitioning the graph within each frame, we want to connect the partitions across frames.
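The notebook does not yet say how partitions in consecutive frames should be linked. One possible approach, assumed here purely for illustration, is to match each region (a set of `(x, y)` pixels) to the region in the next frame with the highest Jaccard overlap. `link_regions` and its `threshold` are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two pixel sets: |a & b| / |a | b| (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def link_regions(regions_t, regions_next, threshold=0.5):
    """Match each region in one frame to its most-overlapping region in the next
    frame. Returns (i, j) index pairs; matches below `threshold` are dropped."""
    links = []
    for i, a in enumerate(regions_t):
        best_j, best_score = None, 0.0
        for j, b in enumerate(regions_next):
            score = jaccard(a, b)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            links.append((i, best_j))
    return links


frame0_regions = [{(0, 0), (0, 1)}, {(5, 5)}]
frame1_regions = [{(5, 5), (5, 6)}, {(0, 0), (0, 1), (0, 2)}]
print(link_regions(frame0_regions, frame1_regions))  # [(0, 1), (1, 0)]
```

Chaining these links frame by frame would give one track per region, which is one way to collect all the updates belonging to a single project.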


The ultimate goal is to connect the frames so that all the updates for a single project are grouped together, and then to train a CNN on this data.

## Progress Updates:
##### May 15, 2019  

To start, we split by number of updates rather than by time. There is a large surge of updates near the end, so splitting by time would give the final frames significantly more updates than the early ones.
There are roughly 16 million data points, so we put 100,000 updates in each frame, which gives about 160 frames.
Frames are stored as CSV files in the folder `../data/frames`.
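The frame count is a ceiling division. A quick check of the arithmetic, using 16 million as a round figure (the exact count comes from the data file); `frame_of` is a hypothetical helper mapping an update's row index to its frame:

```python
num_updates = 16_000_000  # round figure quoted above
framesize = 100_000       # updates per frame

# Ceiling division: the last frame holds the remainder when num_updates
# is not an exact multiple of framesize
num_frames = (num_updates + framesize - 1) // framesize
print(num_frames)  # 160

# By contrast, int(n / framesize) + 1 over-counts by one when n divides evenly,
# producing a trailing empty frame
print(int(num_updates / framesize) + 1)  # 161


def frame_of(i, framesize=framesize):
    """0-based frame index of the i-th update (0-based, header row excluded)."""
    return i // framesize


print(frame_of(0), frame_of(99_999), frame_of(100_000))  # 0 0 1
```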

In [2]:
import csv
import networkx as nx

In [3]:
import os

# All the frames will be stored in ../data/frames
framesize = 100000  # Number of updates per frame
filename = "../data/tile_placements.csv"
output_dir = "../data/frames"
os.makedirs(output_dir, exist_ok=True)

# Stream the input in a single pass instead of counting lines first and then
# loading every row into memory with list(reader)
num_updates = 0
filecount = 0
file_out = None
with open(filename, 'r', newline='') as file_in:
    reader = csv.reader(file_in)
    next(reader, None)  # Skip the header row

    for row in reader:
        # Start a new frame file every `framesize` updates
        if num_updates % framesize == 0:
            if file_out is not None:
                file_out.close()
            output_filename = os.path.join(output_dir, f"frame{filecount}.csv")
            file_out = open(output_filename, 'w', newline='')
            writer = csv.writer(file_out, delimiter=",")
            writer.writerow(["ts", "user", "x_coordinate", "y_coordinate", "color"])
            filecount += 1
        writer.writerow(row)
        num_updates += 1

if file_out is not None:
    file_out.close()

print("Num updates: ", num_updates)
print("Num frames: ", filecount)
print("DONE")

In [30]:
from collections import defaultdict

# Create a graph where every pixel updated within a frame is a node, with an edge to each
# updated pixel within distance r (Manhattan distance, so r = 1 gives the 4 direct neighbours)
def create_graph(frame_filename, r=1):

    # First, parse the frame and build one vector per pixel.
    # Each element of the vector represents one update to the pixel.
    # Format of the pixels dictionary:
    '''
        {
            (x, y) : [(ts, user, color), ...]
        }
    '''
    pixels = defaultdict(list)

    with open(frame_filename, 'r', newline='') as file:
        reader = csv.reader(file)
        next(reader, None)  # Skip the header row

        # Each row has format: [ts, user, x_coordinate, y_coordinate, color]
        # (the loop variable is `row` so it no longer shadows the radius parameter `r`)
        for row in reader:
            ts, user, color = row[0], row[1], row[4]
            x, y = int(row[2]), int(row[3])
            pixels[(x, y)].append((ts, user, color))

    # Every pixel becomes a node that stores its update vector as the "updates" attribute
    G = nx.Graph()
    for coordinates, updates in pixels.items():
        G.add_node(coordinates, updates=updates)

    # Connect every pixel to each updated pixel within distance r.
    # Each undirected edge is visited twice; nx.Graph ignores the duplicate.
    for (x, y) in pixels:
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                neighbor = (x + dx, y + dy)
                if 0 < abs(dx) + abs(dy) <= r and neighbor in pixels:
                    G.add_edge((x, y), neighbor)

    return G

In [31]:
G = create_graph("../data/frames/frame0.csv")
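The `r` argument of `create_graph` is meant to control how far apart two pixels can be and still share an edge. One reading of "r away", assumed here, is Manhattan distance: r = 1 then gives exactly the four direct neighbours (left, right, up, down), and a radius-r neighbourhood contains 2r(r+1) offsets. A hypothetical helper that lists those offsets:

```python
def neighbor_offsets(r):
    """All (dx, dy) with 1 <= |dx| + |dy| <= r (the Manhattan ball minus its centre)."""
    return [
        (dx, dy)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        if 0 < abs(dx) + abs(dy) <= r
    ]


print(len(neighbor_offsets(1)))     # 4
print(len(neighbor_offsets(2)))     # 12
print(sorted(neighbor_offsets(1)))  # [(-1, 0), (0, -1), (0, 1), (1, 0)]
```

Looping over `neighbor_offsets(r)` and adding an edge whenever `(x + dx, y + dy)` is one of the frame's pixels would honour `r`. Using Chebyshev distance instead (`max(abs(dx), abs(dy)) <= r`) would give the 8-neighbourhood at r = 1, which also links diagonal pixels.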