## Imports

In [225]:
import pandas as pd
import matplotlib.pyplot as plt
import subprocess

%matplotlib inline

## Global Variables

In [226]:
TEST_PROGRAM_PATH = "../custom_ds/main"

## Functions

In [284]:
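import os

def run_command(args):
    """Run a command given as a list of arguments and return its stripped stdout."""
    sproc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if sproc.returncode != 0:
        # Surface compiler/runtime errors instead of silently discarding stderr
        raise RuntimeError(f"{' '.join(args)} failed:\n{sproc.stderr}")
    return sproc.stdout.strip()

def compile_cpp(path):
    """Compile <path>.cpp into the executable <path>.ignoreme."""
    return run_command(["g++", f"{path}.cpp", "-o", f"{path}.ignoreme"])

# Positional argument order expected by the test binary
CPP_ARGS = ["file_path", "kmer_size", "hash_map_size", "fp_size", "use_buffer", "run_test"]

def execute_cpp(path, **kwargs):
    """Run <path>.ignoreme with the given parameters passed positionally."""
    # abspath handles relative and absolute paths (a "./" prefix breaks absolute ones)
    args = [os.path.abspath(f"{path}.ignoreme")]
    # Missing parameters are skipped, so only trailing ones may be omitted
    args += [str(kwargs[name]) for name in CPP_ARGS if name in kwargs]
    return run_command(args)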

## Dataframe Helper Functions

In [275]:
def create_dataframe(columns, data=None):
    # None instead of a mutable [] default; pandas builds an empty frame from None
    return pd.DataFrame(columns=columns, data=data)

def plot_graphic(test_name, dataframe, x_axis, y_axis):
    plt.title(test_name)
    plt.xlabel(x_axis)
    plt.ylabel(y_axis)
    plt.plot(dataframe[x_axis], dataframe[y_axis], '-')
    plt.show()

## Test Parameters
We want to define which tests to run (for example, testing how the k-mer size influences this algorithm).
Let's define a test suite structure describing the parameters we want to test. The key is the test name, and the value is a dict with the parameters passed to `run_tests`. Every list is swept, so each combination of values becomes one run of the binary:
```python3
test_suite = {
  <test_name>: {
      'file_path': list[str],
      'kmer_size': list[int],
      'hash_map_size': list[int],
      'fp_size': list[int],
      'use_buffer': list[int],
      'run_test': int,
  }
}
```
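Since every list in a test entry is swept, the number of runs is the product of the list lengths. A minimal sketch of that expansion with `itertools.product`, using the same parameter order as the nested loops in `run_tests`; `expand_test` and `SWEEP_KEYS` are hypothetical helper names, not part of the notebook.

```python
from itertools import product

# Swept parameters; "run_test" is a single flag shared by every run
SWEEP_KEYS = ["file_path", "hash_map_size", "kmer_size", "fp_size", "use_buffer"]

def expand_test(test):
    """Yield one parameter dict per combination of the swept lists."""
    for values in product(*(test[key] for key in SWEEP_KEYS)):
        params = dict(zip(SWEEP_KEYS, values))
        params["run_test"] = test["run_test"]
        yield params

example = {
    "file_path": ["../datasets/dna.5MB"],
    "kmer_size": [8, 10, 12],
    "hash_map_size": [20],
    "fp_size": [3],
    "use_buffer": [0],
    "run_test": 1,
}

combos = list(expand_test(example))
print(len(combos))  # 3: one run per kmer_size
```

Adding a second value to any other list doubles the number of binary invocations, which is worth keeping in mind before launching a large suite.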

In [274]:
from itertools import product

def run_tests(test):
    """Run the binary once per parameter combination; one row per run.

    Relies on the global `columns` list defined in a later cell.
    """
    rows = []
    # Same order as nested loops over file_path, hash_map_size, kmer_size, fp_size, use_buffer
    for file_path, hash_map_size, kmer_size, fp_size, use_buffer in product(
        test['file_path'], test['hash_map_size'], test['kmer_size'],
        test['fp_size'], test['use_buffer'],
    ):
        out = execute_cpp(
            TEST_PROGRAM_PATH,
            file_path=file_path,
            kmer_size=kmer_size,
            hash_map_size=hash_map_size,
            fp_size=fp_size,
            use_buffer=use_buffer,
            run_test=test['run_test'],
        )
        # DataFrame.append was removed in pandas 2.0: collect rows, build one frame
        rows.append([file_path, kmer_size, hash_map_size, fp_size, use_buffer] + out.split())
    return create_dataframe(columns, rows)
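`run_tests` builds each row from the split stdout of the binary, so the measured fields are strings, and `plot_graphic` would plot them as categorical labels rather than numbers. A sketch of casting them before plotting; `to_numeric_columns` is a hypothetical helper and the rows below are placeholders, not measurements.

```python
import pandas as pd

def to_numeric_columns(df, exclude=("file_path",)):
    """Return a copy of df with every column except `exclude` cast to numbers."""
    df = df.copy()
    for col in df.columns:
        if col not in exclude:
            # Raises ValueError if the binary printed something non-numeric
            df[col] = pd.to_numeric(df[col])
    return df

# Placeholder rows shaped like run_tests output: parsed stdout fields are strings
raw = pd.DataFrame(
    columns=["file_path", "kmer_size", "found_kmers"],
    data=[["../datasets/dna.5MB", 8, "1"], ["../datasets/dna.5MB", 10, "2"]],
)
clean = to_numeric_columns(raw)
print(clean.dtypes)
```

Failing loudly on a non-numeric field is deliberate: it usually means the binary's output format drifted from `columns`.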

In [281]:
test_suite = {
    "test_1": {
        "file_path": ["../datasets/dna.5MB"],
        "kmer_size": [8, 10, 12],
        "hash_map_size": [20],
        "fp_size": [3],
        "use_buffer": [0],
        "run_test": 1,
    },
}

In [277]:
columns = [
    "file_path",
    "kmer_size",
    "hash_map_size",
    "fp_size",
    "use_buffer",
    "true_positives", 
    "true_negatives", 
    "false_positives", 
    "false_negatives", 
    "sensibility", 
    "specificity",
    "found_kmers",
]
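The columns pair the four confusion-matrix counts with the rates derived from them: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A sketch that recomputes both as a cross-check of the binary's `sensibility` and `specificity` columns, assuming `sensibility` holds the sensitivity; `add_rates` and the `*_check` column names are hypothetical, and the counts are placeholders.

```python
import pandas as pd

def add_rates(df):
    """Recompute sensitivity and specificity from the confusion-matrix counts."""
    df = df.copy()
    tp = pd.to_numeric(df["true_positives"])
    tn = pd.to_numeric(df["true_negatives"])
    fp = pd.to_numeric(df["false_positives"])
    fn = pd.to_numeric(df["false_negatives"])
    df["sensitivity_check"] = tp / (tp + fn)  # TP / (TP + FN)
    df["specificity_check"] = tn / (tn + fp)  # TN / (TN + FP)
    return df

# Placeholder counts (as strings, like the parsed stdout), not measured results
example = pd.DataFrame({
    "true_positives": ["90"], "true_negatives": ["80"],
    "false_positives": ["20"], "false_negatives": ["10"],
})
print(add_rates(example)[["sensitivity_check", "specificity_check"]])
```

A row with no positives (TP + FN = 0) yields NaN rather than an error, which makes degenerate runs easy to spot.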

## Test

In [285]:
compile_cpp(TEST_PROGRAM_PATH)

''

1. Test k-mer sizes 8, 16, 24, 32.
2. Table size: 2x the file size.
3. The table is also bounded by the k-mer size: if kmer_size = 8, the encoded k-mer is at most a 16-bit value (2 bits per base).
4. Table size is bounded by the number of distinct k-mers.
5. Table size vs. the number of distinct k-mers (1 << (2 * kmer_size)).
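Item 3 implies 2 bits per base (8 bases give a 16-bit value), so there are at most 4^k = 2^(2k) distinct k-mers. A sketch of that packing, assuming the mapping A=0, C=1, G=2, T=3; the mapping and the names `encode_kmer` and `distinct_kmers` are hypothetical, and `custom_ds` may encode bases differently.

```python
# Assumed 2-bit code per nucleotide; the actual mapping in custom_ds may differ
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value

def distinct_kmers(kmer_size):
    """Number of possible k-mers over {A, C, G, T}: 4**k == 1 << (2 * k)."""
    return 1 << (2 * kmer_size)

print(encode_kmer("TTTTTTTT"))  # 65535, the largest 16-bit value
print(distinct_kmers(8))        # 65536
```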

Number of distinct k-mers vs. file size (take the larger of the two).
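The sizing rule above takes the larger of the k-mer space, 4^k under the 2-bit encoding of item 3, and the file size. A sketch of that rule; `table_size` and its `file_factor` parameter (set to 2 for the 2x rule of item 2) are hypothetical, the file size is in bytes assuming one byte per base, and the 5 MiB figure is an illustrative assumption, not the measured size of `dna.5MB`.

```python
def table_size(kmer_size, file_size_bytes, file_factor=1):
    """Larger of the k-mer space (4**k) and file_factor times the file size."""
    distinct = 1 << (2 * kmer_size)  # 4**kmer_size possible k-mers
    return max(distinct, file_factor * file_size_bytes)

FILE_SIZE = 5 * 1024 * 1024  # assumed 5 MiB input, for illustration only

for k in (8, 10, 12):
    print(k, table_size(k, FILE_SIZE))
# 8 5242880
# 10 5242880
# 12 16777216
```

Under this assumption the file size dominates up to k = 11 (4^11 = 4,194,304) and the k-mer space takes over from k = 12.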

Raw test:
- time to build the table
- number of occupied slots
- size of the structure
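The three raw metrics can be illustrated with a pure-Python stand-in: a direct-address presence table with one byte per possible k-mer, filled with a rolling 2-bit encoding. This is not the `custom_ds` structure; it only sketches how build time, occupied slots, and structure size could be measured. The base mapping and the names `build_presence_table` and `raw_metrics` are hypothetical.

```python
from time import perf_counter

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit mapping

def build_presence_table(sequence, kmer_size):
    """Mark every k-mer of `sequence` in a direct-address table of 4**k slots."""
    table = bytearray(1 << (2 * kmer_size))  # one byte per possible k-mer
    mask = (1 << (2 * kmer_size)) - 1        # keeps only the last k bases
    value = 0
    for i, base in enumerate(sequence):
        value = ((value << 2) | BASE_CODE[base]) & mask
        if i >= kmer_size - 1:               # window holds a full k-mer
            table[value] = 1
    return table

def raw_metrics(sequence, kmer_size):
    """Build time, occupied slots, and payload size of the stand-in table."""
    start = perf_counter()
    table = build_presence_table(sequence, kmer_size)
    elapsed = perf_counter() - start
    return {
        "build_seconds": elapsed,
        "occupied_slots": sum(table),   # slots set to 1
        "structure_bytes": len(table),  # payload only, no Python object overhead
    }

metrics = raw_metrics("ACGTACGTAC", 2)
print(metrics["occupied_slots"], metrics["structure_bytes"])  # 4 16
```

For the real structure, the same three numbers would have to come from the C++ binary itself, since Python cannot see its internal layout.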



## Test Sensitivity and Specificity
We will test specificity and sensitivity by varying the values 