streampref/wcimport

Introduction

WCImport is a set of tools for importing data from the 2014 Soccer World Cup and preparing it for experiments with the StreamPref Data Stream Management System (DSMS) prototype. Please see the related publications for more information.

Tools

The first step is to run wcimport.py to download and convert the data. Next, use the tool for the operator of interest to create the environment for the experiments.

In addition, WCImport has the following individual tools:

  • bestseqgen.py: tool for evaluation of BESTSEQ operator (temporal preference operator);
  • seqgen.py: tool for evaluation of SEQ operator (sequence extraction);
  • conseqgen.py: tool for evaluation of CONSEQ operator (subsequences with consecutive tuples);
  • endseqgen.py: tool for evaluation of ENDSEQ operator (subsequences with the last position);
  • maxseqgen.py: tool for evaluation of MAXSEQ operator (filtering by maximum length);
  • minseqgen.py: tool for evaluation of MINSEQ operator (filtering by minimum length);
  • utilgen.py: tool for utility experiments.
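
As a rough illustration of the two length filters, the sketch below keeps only the sequences that satisfy a minimum or maximum length. It assumes a sequence is a plain Python list of tuples and that both bounds are inclusive. The function names `minseq` and `maxseq` are hypothetical and are not taken from the StreamPref code.

```python
from typing import List, Tuple

# Assumption: a sequence is modeled as a list of tuples, one per position.
Sequence = List[Tuple]


def minseq(sequences: List[Sequence], min_len: int) -> List[Sequence]:
    """Keep sequences whose length is at least min_len (inclusive bound assumed)."""
    return [seq for seq in sequences if len(seq) >= min_len]


def maxseq(sequences: List[Sequence], max_len: int) -> List[Sequence]:
    """Keep sequences whose length is at most max_len (inclusive bound assumed)."""
    return [seq for seq in sequences if len(seq) <= max_len]


if __name__ == "__main__":
    # Illustrative data only: three sequences with lengths 1, 2 and 3.
    data = [
        [("pass",)],
        [("pass",), ("shot",)],
        [("pass",), ("pass",), ("shot",)],
    ]
    print([len(s) for s in minseq(data, 2)])
    print([len(s) for s in maxseq(data, 2)])
```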

Algorithms

Except for utilgen.py, all tools generate StreamPref environments for evaluating their operators. Each operator can be evaluated by one or more algorithms and by an equivalent CQL query. The available algorithms for each operator are the following:

  • SEQ
    • Incremental algorithm
    • CQL Equivalence
  • CONSEQ / ENDSEQ
    • Naive algorithm
    • Incremental algorithm
    • CQL Equivalence
  • MINSEQ / MAXSEQ
    • Direct algorithm
    • CQL Equivalence
  • BESTSEQ
    • Naive algorithm with depth search comparison
    • Incremental algorithm with sequences tree
    • Incremental algorithm with sequences tree and pruning
    • CQL Equivalence

The goal of utilgen.py is to run experiments that analyze the utility of the operators. This tool executes experiments using the following combinations of operators:

  • SEQ / BESTSEQ;
  • SEQ / CONSEQ / BESTSEQ;
  • SEQ / CONSEQ / ENDSEQ / BESTSEQ;
  • SEQ / CONSEQ / ENDSEQ / MINSEQ / MAXSEQ / BESTSEQ.

During the experiments, the tool collects information about the sequences sent to the BESTSEQ operator and about the comparisons performed by this operator.

Parameters

The experiment parameters must be updated directly in the source code. The available parameters are the following:

  • ATT: Number of attributes;
  • NSQ: Number of distinct sequences;
  • RAN: Temporal range;
  • SLI: Slide interval;
  • PCT: Percentage of consecutive instants (used only by conseqgen.py);
  • MAX: Maximum valid length (used only by maxseqgen.py);
  • MIN: Minimum valid length (used only by minseqgen.py);
  • RUL: Number of rules (used only by bestseqgen.py);
  • LEV: Maximum preference level (used only by bestseqgen.py);
  • IND: Number of indifferent attributes (used only by bestseqgen.py).

Every parameter is a dictionary with the keys VAR (list of values) and DEF (default value).
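
The VAR/DEF layout can be sketched as follows. The numeric values are illustrative placeholders and not the values used in the repository. The helpers `default_config` and `experiment_configs` are hypothetical. Varying one parameter at a time while holding the others at their defaults is an assumption about how such a layout is typically consumed, not a description of the actual tools.

```python
# Illustrative values only; the real values live in the tools' source code.
PARAMETERS = {
    "ATT": {"VAR": [8, 10, 12], "DEF": 10},   # number of attributes
    "NSQ": {"VAR": [4, 8, 16], "DEF": 8},     # number of distinct sequences
    "RAN": {"VAR": [10, 20, 40], "DEF": 20},  # temporal range
}


def default_config(params):
    """Build a configuration that uses the DEF value of every parameter."""
    return {name: spec["DEF"] for name, spec in params.items()}


def experiment_configs(params):
    """Vary one parameter over its VAR list, keeping the others at DEF.

    Duplicate configurations (the all-defaults one) are emitted only once.
    """
    configs = []
    for name, spec in params.items():
        for value in spec["VAR"]:
            config = default_config(params)
            config[name] = value
            if config not in configs:
                configs.append(config)
    return configs


if __name__ == "__main__":
    for config in experiment_configs(PARAMETERS):
        print(config)
```

Changing an experiment then amounts to editing the VAR list or the DEF value of the relevant entry.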

Command Line

Although WCImport is composed of many tools, all of them share the same command-line options.

gen.py [-h] [-g] [-o] [-r] [-s]
  -h, --help       show the help message and exit
  -g, --gen        Generate files
  -o, --output     Generate query output
  -r, --run        Run experiments
  -s, --summarize  Summarize results
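
For reference, the option set above could be declared with Python's standard `argparse` module roughly as follows. This is a minimal sketch and not the code used by the tools. The function name `get_arguments` is hypothetical, and treating every option as a boolean flag is an assumption based on the usage line.

```python
import argparse


def get_arguments(argv=None):
    """Parse the shared command-line options (sketch; flags assumed boolean)."""
    parser = argparse.ArgumentParser(prog="gen.py")
    parser.add_argument("-g", "--gen", action="store_true",
                        help="Generate files")
    parser.add_argument("-o", "--output", action="store_true",
                        help="Generate query output")
    parser.add_argument("-r", "--run", action="store_true",
                        help="Run experiments")
    parser.add_argument("-s", "--summarize", action="store_true",
                        help="Summarize results")
    # argparse adds -h/--help automatically.
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = get_arguments()
    print(args)
```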
