## Setup
Runs a bash cell (using the IPython `%%bash` cell magic) to install all required dependencies.

All packages, along with their versions, are listed in `requirements.txt` in the `setup` directory.

In [3]:
%%bash
yes | pip3 install --no-warn-script-location --progress-bar pretty --quiet -r ./setup/requirements.txt

You are using pip version 18.1, however version 21.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


## Imports
Contains import statements for all the required libraries and frameworks

In [4]:
import bokeh
import plotnine
import altair
import numpy as np
from pprint import pprint

## Define Hyperparameters
Specifies the maximum number of graphs to be generated.

The plotting libraries and plot types to use are specified here as well.

In [141]:
# number of graphs to be generated
max_num_graphs = 35

# set of libraries generating graphs
libraries = [
    'bokeh',
    'altair',
    'plotnine',
]

# set of plots being generated
graphs = [
    'scatter', 
    'bar', 
    'line', 
    'contour', 
    'kd', 
    'histogram', 
    'errorbar', 
    'bubble', 
    'area', 
    'box'
]

## Data Separation
Generates a dict holding the number of graphs to create for each library/graph pair.

Each key is a `(library, graph)` tuple, and each value is the number of graphs to generate for that pair.
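
To make the shape of that dict concrete, here is a minimal sketch using made-up counts. The name `example_counts` and the numbers in it are illustrative assumptions, not output from the notebook.

```python
from collections import Counter

# Hypothetical counts, shaped like the dict described above:
# keys are (library, graph) tuples, values are how many graphs to generate.
example_counts = {
    ("bokeh", "bar"): 2,
    ("bokeh", "line"): 1,
    ("altair", "bar"): 3,
    ("plotnine", "scatter"): 0,
}

# A tuple key can be looked up with or without the parentheses.
assert example_counts["bokeh", "bar"] == example_counts[("bokeh", "bar")]

# Total number of graphs per library, summed over graph types.
per_library = Counter()
for (library, graph), count in example_counts.items():
    per_library[library] += count

# Grand total across every (library, graph) pair.
total = sum(example_counts.values())
print(dict(per_library), total)
```

Summing the values this way is a quick check that a split still adds up to the requested number of graphs.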

In [153]:
import random

def split_number_evenly(n, n_arrays):
    values = np.linspace(n, 0, n_arrays+1).astype(int)
    return [value - values[i+1] for i, value in enumerate(values[:-1])]

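def split_number_randomly(n, n_arrays):
    number, numbers = int(n), []
    # draw exactly n_arrays - 1 random shares so the result never has
    # more than n_arrays entries (the open-ended loop could overshoot)
    for _ in range(n_arrays - 1):
        random_number = random.randint(0, number)
        numbers.append(random_number)
        number -= random_number
    # the remainder fills the last slot so the shares always sum to n
    numbers.append(number)
    # shuffle so the largest share is not always first
    return random.sample(numbers, len(numbers))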

def generate_occurences_dict(
    num_graphs,
    libraries,
    graphs,
    equal_library_distribution=True,
    equal_graph_distribution=True,
):
    # dict for storing the number of graphs per library / graph to generate
    # (e.g. occurences['bokeh', 'bar'] might return a value of 5)
    occurences = {}
    # split library occurences
    split_library_func = split_number_evenly if equal_library_distribution else split_number_randomly
    library_occurences = split_library_func(num_graphs, len(libraries))
    # split graph occurences
    split_graph_func = split_number_evenly if equal_graph_distribution else split_number_randomly
    graph_occurences = [split_graph_func(num_graphs_per_library, len(graphs)) for num_graphs_per_library in library_occurences]
    # add values generated above to occurence dict
    for i, num_graphs_per_library in enumerate(graph_occurences):
        for j, num_graph_occurences in enumerate(num_graphs_per_library):
            library, graph = libraries[i], graphs[j]
            occurences[library, graph] = num_graph_occurences
    return occurences

# use the hyperparameter defined above rather than a hard-coded count
occurences = generate_occurences_dict(max_num_graphs, libraries, graphs)


## Data Generation
Generates the data for each graph according to its type (e.g. `generate_bar()` for bar charts).

The data is stored in a `generated_graphs` object (i.e. as an `X` attribute on each graph object).

In [None]:
# TODO: make process for generating each type of graph
# TODO: make generated_graphs dict and store data in X variable
# e.g. dict[bokeh, bar] = [{}, {}, {}]
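
One way the TODOs above might be filled in is sketched below. Everything here is an assumption made for illustration: the body of `generate_bar()`, its parameters `n_bars` and `seed`, the `generators` registry, and the layout of the `X` entry are hypothetical, since the notebook does not define them yet.

```python
import random

def generate_bar(n_bars=5, seed=None):
    """Hypothetical generator: random bar heights for n_bars categories."""
    rng = random.Random(seed)
    categories = [f"category_{i}" for i in range(n_bars)]
    heights = [rng.randint(1, 100) for _ in range(n_bars)]
    # the data lives under an "X" key, mirroring the X attribute in the TODO
    return {"X": {"categories": categories, "heights": heights}}

# Hypothetical registry mapping a graph type to its data generator.
generators = {"bar": generate_bar}

# Example counts in the (library, graph) -> count shape; the numbers are made up.
example_counts = {("bokeh", "bar"): 3, ("altair", "bar"): 1}

# One list of generated data dicts per (library, graph) pair.
generated_graphs = {
    (library, graph): [generators[graph]() for _ in range(count)]
    for (library, graph), count in example_counts.items()
}

print(len(generated_graphs["bokeh", "bar"]))
```

Keying `generated_graphs` by the same `(library, graph)` tuples keeps it aligned with the counts dict, so the two can be iterated together.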

## Define Hyperparameters
1. Define hyperparameters such as the total number of graphs and which libraries and graph types to use.

## Data Separation
1. Split the total number of graphs up by library / graph type
1. Allow for distribution flags (so the split is not entirely random)
    - `equal_library_distribution` and `equal_graph_distribution` flags

## Data Generation
1. Generate the corresponding data (e.g. `generate_bar()`)
1. Attach data to graph object (i.e. add `X` attribute to graph object)

## Chart Generation
1. Chart stylization (each library is in charge of its own stylization)
1. Generate graph (e.g. generate graphs using `generate_graph(library, graph)`)
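
The chart generation step is not implemented yet, so the following is only a rough sketch of how `generate_graph` could dispatch on the library. The `STYLES` table, the `render_placeholder` helper, and the extra `data` parameter are hypothetical, and the renderer returns a plain description rather than calling bokeh, altair or plotnine.

```python
# Hypothetical per-library style settings; each library owns its own look.
STYLES = {
    "bokeh": {"title_font_size": 12},
    "altair": {"title_font_size": 14},
    "plotnine": {"title_font_size": 10},
}

def render_placeholder(library, graph, data, style):
    """Stand-in renderer: describes the chart instead of drawing it."""
    return {
        "library": library,
        "graph": graph,
        "style": style,
        "n_points": len(data["X"]),
    }

def generate_graph(library, graph, data):
    # assumption: data is passed in explicitly; the outline only lists
    # (library, graph) as arguments
    if library not in STYLES:
        raise ValueError(f"unsupported library: {library}")
    return render_placeholder(library, graph, data, STYLES[library])

chart = generate_graph("bokeh", "bar", {"X": [3, 1, 4]})
print(chart)
```

Keeping the styles in a per-library table is one way to let each library own its stylization while sharing a single entry point.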