## Setup
Runs a bash command (using the IPython `%%bash` cell magic) to install all required dependencies.

All packages (alongside their corresponding versions) are listed in `requirements.txt` in the setup directory.

The cell below assumes a Unix-like OS (Linux or macOS). On other systems, run the following command from the project's root directory to install all required dependencies:

`pip3 install --no-warn-script-location --progress-bar pretty --quiet -r ./setup/requirements.txt`

In [1]:
%%bash
yes | pip3 install --no-warn-script-location --progress-bar pretty --quiet -r ./setup/requirements.txt

Couldn't find program: 'bash'
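
The `Couldn't find program: 'bash'` output above suggests the cell was run on a system without bash. As a minimal sketch of a platform-independent alternative, pip can be invoked through the running interpreter with `subprocess`. The helper names `build_pip_command` and `install_requirements` are hypothetical and not part of this project; the `--progress-bar` option is left out here because its accepted values vary between pip versions.

```python
import subprocess
import sys


def build_pip_command(requirements_path="./setup/requirements.txt"):
    """Build a pip install command bound to the current interpreter."""
    # sys.executable points at the interpreter running this kernel, so the
    # packages are installed into the same environment the notebook uses
    return [
        sys.executable, "-m", "pip", "install",
        "--no-warn-script-location", "--quiet",
        "-r", requirements_path,
    ]


def install_requirements(requirements_path="./setup/requirements.txt"):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(build_pip_command(requirements_path), check=True)
```

Calling `install_requirements()` from the project's root directory would then play the role of the `%%bash` cell.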


## Imports
Contains import statements for all the required libraries and frameworks

In [9]:
import bokeh
import plotnine
import altair
import numpy as np
from pprint import pprint

from generators.scatter import generate_scatter_data
from generators.histogram import generate_histogram_data

## Define Hyperparameters
Specifies the maximum number of graphs to be generated

Additionally, the types of libraries and plots can be specified as well

In [10]:
# number of graphs to be generated
max_num_graphs = 35

# set of libraries generating graphs
libraries = [
    'bokeh',
    'altair',
    'plotnine'
]

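import random

# stand-in generator for plot types that are not implemented yet;
# randomly returns a tuple of either two or three empty series
def placeholder_generator():
    if random.randint(0, 1) == 1:
        return ([], [])
    return ([], [], [])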

# set of plots being generated
graphs = {
    'scatter': { 'generator': generate_scatter_data },
    # 'bar': { 'generator': TODO },
    'line': { 'generator': placeholder_generator },
    'contour': { 'generator': placeholder_generator },
    'kd': { 'generator': placeholder_generator },
    'histogram': { 'generator': generate_histogram_data },
    'errorbar': { 'generator': placeholder_generator },
    'bubble': { 'generator': placeholder_generator },
    'area': { 'generator': placeholder_generator },
    'box': { 'generator': placeholder_generator },
}
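
To illustrate how a placeholder could later be swapped for a real generator, here is a minimal sketch of a line-plot generator. The name `generate_line_data`, its parameters, and the `(X, y)` return shape are assumptions inferred from the placeholder's tuples; the actual signatures in `generators.scatter` and `generators.histogram` are not shown in this notebook.

```python
import random


def generate_line_data(num_points=20, slope=1.0, noise=0.5, seed=None):
    """Hypothetical generator: returns (X, y) for a noisy straight line."""
    rng = random.Random(seed)  # a local RNG keeps results reproducible per seed
    X = list(range(num_points))
    y = [slope * x + rng.uniform(-noise, noise) for x in X]
    return (X, y)


# a generator is registered by pointing the 'generator' key at the function
example_graphs = {"line": {"generator": generate_line_data}}
X, y = example_graphs["line"]["generator"]()
```

Because the dictionary stores the function itself rather than its result, data is only produced when the generator is called.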

## Data Separation
Generates a dict representing the number of graphs that need to be created for each library/graph pair

Each dictionary key is a `(library, graph)` tuple, and its value is the number of graphs to generate for that pair.
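
As a small illustration of this tuple-keyed structure, the sketch below builds such a dictionary for two libraries and three graph types. It uses `divmod` for the even split, which is an assumption made to keep the example free of dependencies; the notebook's own cell uses `np.linspace`, so the position of the remainder may differ. The names `split_evenly` and `build_counts` are hypothetical.

```python
def split_evenly(total, parts):
    """Split total into `parts` integers that differ by at most one."""
    base, remainder = divmod(total, parts)
    # the first `remainder` parts receive one extra item
    return [base + 1 if i < remainder else base for i in range(parts)]


def build_counts(total, libraries, graph_names):
    """Map each (library, graph) tuple to the number of graphs to create."""
    counts = {}
    per_library = split_evenly(total, len(libraries))
    for library, library_total in zip(libraries, per_library):
        per_graph = split_evenly(library_total, len(graph_names))
        for graph, count in zip(graph_names, per_graph):
            counts[library, graph] = count  # same as counts[(library, graph)]
    return counts


counts = build_counts(10, ["bokeh", "altair"], ["scatter", "histogram", "line"])
print(counts["bokeh", "scatter"])  # → 2
```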

In [11]:
import random

def split_number_evenly(n, n_arrays):
    values = np.linspace(n, 0, n_arrays+1).astype(int)
    return [value - values[i+1] for i, value in enumerate(values[:-1])]

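def split_number_randomly(n, n_arrays):
    number, numbers = n, []
    # draw n_arrays - 1 random parts and give the remainder to the last one,
    # so the result always has exactly n_arrays entries that sum to n
    for _ in range(n_arrays - 1):
        random_number = round(random.random()*number)
        numbers.append(random_number)
        number -= random_number
    numbers.append(number)
    # shuffle so that the larger parts are not always at the front
    return random.sample(numbers, len(numbers))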

def generate_occurences_dict(
    num_graphs,
    libraries,
    graphs,
    equal_library_distribution=True,
    equal_graph_distribution=True,
):
    # dict for storing the number of graphs per library / graph to generate
    # (e.g. occurences['bokeh', 'bar'] might return a value of 5)
    occurences = {}
    # split the total across libraries (1-d list, one entry per library)
    split_library_func = split_number_evenly if equal_library_distribution else split_number_randomly
    library_occurences = split_library_func(num_graphs, len(libraries))
    # split each library's share across graph types (one list per library)
    split_graph_func = split_number_evenly if equal_graph_distribution else split_number_randomly
    graph_occurences = [split_graph_func(num_graphs_per_library, len(graphs)) for num_graphs_per_library in library_occurences]
    # add the values generated above to the occurences dict
    for i, num_graphs_per_library in enumerate(graph_occurences):
        for j, num_graph_occurences in enumerate(num_graphs_per_library):
            library, graph = libraries[i], graphs[j]
            occurences[library, graph] = num_graph_occurences
    return occurences

# generate graph occurence dictionary
# (note: 100 is passed here rather than max_num_graphs, and the output below reflects that)
graph_names = list(graphs.keys())
occurences = generate_occurences_dict(100, libraries, graph_names)
pprint(occurences)

{('altair', 'area'): 4,
 ('altair', 'box'): 3,
 ('altair', 'bubble'): 4,
 ('altair', 'contour'): 3,
 ('altair', 'errorbar'): 3,
 ('altair', 'histogram'): 4,
 ('altair', 'kd'): 4,
 ('altair', 'line'): 4,
 ('altair', 'scatter'): 4,
 ('bokeh', 'area'): 4,
 ('bokeh', 'box'): 3,
 ('bokeh', 'bubble'): 4,
 ('bokeh', 'contour'): 4,
 ('bokeh', 'errorbar'): 4,
 ('bokeh', 'histogram'): 3,
 ('bokeh', 'kd'): 4,
 ('bokeh', 'line'): 4,
 ('bokeh', 'scatter'): 4,
 ('plotnine', 'area'): 4,
 ('plotnine', 'box'): 3,
 ('plotnine', 'bubble'): 4,
 ('plotnine', 'contour'): 3,
 ('plotnine', 'errorbar'): 3,
 ('plotnine', 'histogram'): 4,
 ('plotnine', 'kd'): 4,
 ('plotnine', 'line'): 4,
 ('plotnine', 'scatter'): 4}
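
The counts printed above add up to 100 (34 for bokeh, 33 each for altair and plotnine). A quick way to check totals like these is to sum the values per library. The following is a minimal sketch; `totals_per_library` is a hypothetical helper and the sample dictionary is hand-written, not taken from the run above.

```python
from collections import Counter


def totals_per_library(counts):
    """Sum the per-pair counts for each library."""
    totals = Counter()
    for (library, _graph), count in counts.items():
        totals[library] += count
    return dict(totals)


# small hand-written sample in the same shape as the printed dictionary
sample = {
    ("bokeh", "scatter"): 4,
    ("bokeh", "box"): 3,
    ("altair", "scatter"): 4,
    ("altair", "box"): 3,
}
print(totals_per_library(sample))  # → {'bokeh': 7, 'altair': 7}
```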


## Data Generation
Generates the corresponding data for each graph type by calling its registered generator (e.g. `generate_scatter_data()`).

The data is stored under the `data` key of each entry in `graphs`, as a list of `(library, data)` tuples.

In [12]:
# generates an X value (and optional y value) based on graph type
def generate_data(graph):
    data_generator = graphs[graph]['generator']
    data = data_generator()
    return data

# append tuples in the format (library, (X, y)) to each corresponding graph type
# e.g. graphs['bar']['data'] might equal [('bokeh', ([1, 2, 3], ['a', 'b', 'c']))]
for (library, graph) in occurences:
    data = generate_data(graph)
    graphs[graph].setdefault('data', []).append((library, data))

pprint(graphs)

{'area': {'data': [('bokeh', ([], [])),
                   ('altair', ([], [], [])),
                   ('plotnine', ([], [], []))],
          'generator': <function placeholder_generator at 0x000001E61197D670>},
 'box': {'data': [('bokeh', ([], [])),
                  ('altair', ([], [])),
                  ('plotnine', ([], [], []))],
         'generator': <function placeholder_generator at 0x000001E61197D670>},
 'bubble': {'data': [('bokeh', ([], [])),
                     ('altair', ([], [], [])),
                     ('plotnine', ([], [], []))],
            'generator': <function placeholder_generator at 0x000001E61197D670>},
 'contour': {'data': [('bokeh', ([], [], [])),
                      ('altair', ([], [], [])),
                      ('plotnine', ([], [], []))],
             'generator': <function placeholder_generator at 0x000001E61197D670>},
 'errorbar': {'data': [('bokeh', ([], [])),
                       ('altair', ([], [], [])),
                       ('plotnine', ([]
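
Since the stored tuples can hold either two or three series, any code that consumes them has to handle both lengths. The sketch below shows one way to do that with starred unpacking. The helper `unpack_data` and the name `z` for the third series are assumptions; the notebook does not say what the third element represents.

```python
def unpack_data(data):
    """Split a generator result into X, y and an optional third series."""
    X, y, *rest = data
    z = rest[0] if rest else None  # None when only two series are present
    return X, y, z


# hand-written sample mirroring the (library, data) structure printed above
sample_entries = [
    ("bokeh", ([1, 2, 3], [4, 5, 6])),
    ("altair", ([1, 2], [3, 4], [5, 6])),
]
for library, data in sample_entries:
    X, y, z = unpack_data(data)
    print(library, len(X), z is not None)
```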

## Define Hyperparameters
1. Define hyperparameters such as the total number of graphs and which libraries and graph types to use.

## Data Separation
1. Split the dataset up by library / graph type
1. Allow for distribution flags (so the split is not entirely random)
    - `equal_library_distribution` and `equal_graph_distribution` flags
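
As a rough sketch of how such a flag could switch between an even and a random split, the example below uses a single `equal` parameter in place of the notebook's `equal_library_distribution` and `equal_graph_distribution` arguments. The `seed` parameter and the cut-point approach are assumptions added so the random branch is reproducible; they are not part of the notebook's code.

```python
import random


def split_randomly(total, parts, seed=None):
    """Split total into `parts` non-negative integers at random cut points."""
    rng = random.Random(seed)
    # parts - 1 sorted cut points in [0, total] define the part boundaries
    cuts = sorted(rng.randint(0, total) for _ in range(parts - 1))
    bounds = [0] + cuts + [total]
    return [bounds[i + 1] - bounds[i] for i in range(parts)]


def split(total, parts, equal=True, seed=None):
    """Choose between an even and a random split based on the flag."""
    if equal:
        base, remainder = divmod(total, parts)
        return [base + 1 if i < remainder else base for i in range(parts)]
    return split_randomly(total, parts, seed)


print(split(35, 3))  # → [12, 12, 11]
```

Either branch returns exactly `parts` values that sum to `total`, so the caller does not need to know which one was used.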

## Data Generation
1. Generate the corresponding data (e.g. `generate_bar()`)
1. Attach data to graph object (e.g. add `X` attribute to graph object)

## Chart Generation
1. Chart stylization (each library is in charge of its own stylization)
1. Generate graph (e.g. generate graphs using `generate_graph(library, graph)`)
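
One possible shape for this step is a lookup table that maps each library name to its own renderer, so that styling stays inside the library-specific function. This is only a sketch: the stub renderers return strings instead of drawing anything, and `make_stub_renderer`, `RENDERERS`, and the extra `data` argument to `generate_graph` are assumptions that go beyond the `generate_graph(library, graph)` signature in the outline.

```python
def make_stub_renderer(library):
    """Create a stand-in renderer; a real one would build and style a chart."""
    def render(graph, data):
        return f"{library}:{graph}:{len(data)} series"
    return render


# hypothetical registry: one renderer per supported library
RENDERERS = {
    name: make_stub_renderer(name)
    for name in ("bokeh", "altair", "plotnine")
}


def generate_graph(library, graph, data):
    """Look up the library's renderer and let it draw and style the graph."""
    try:
        renderer = RENDERERS[library]
    except KeyError:
        raise ValueError(f"unsupported library: {library!r}") from None
    return renderer(graph, data)


print(generate_graph("bokeh", "scatter", ([1, 2], [3, 4])))
# → bokeh:scatter:2 series
```

Adding a library would then only require registering one more renderer, without touching `generate_graph` itself.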