Skip to content

How To Create Marcotti Databases

Howard Hamilton edited this page Feb 11, 2016 · 8 revisions

There are two types of databases that can be created with the Marcotti data schemas:

  • Club databases, for matches involving club teams
  • National Team databases, for matches involving national selections.

In order to create a Marcotti database, we follow these steps:

  • Complete the database configuration settings
  • Create an empty database
  • Create tables using one of the Marcotti schemas

Setting the Database Configuration

To create a database configuration file, copy local.skel to a Python file and populate it.

For server-based backends (e.g. MS-SQL, MySQL, PostgreSQL, etc), all of the class attributes must be filled:

from config import Config

class LocalConfig(Config):

    DIALECT = 'postgresql'  # dialect name in SQLAlchemy
    DBNAME = 'my-football-db'  # name of database on server
    
    DBUSER = 'dbuser'  # database user login
    DBPASSWD = 'dbpass' # database user password
    HOSTNAME = 'localhost' # database hostname
    PORT = 5432  # database port (check your specific backend)
    
    # Define start and end years in database.
    START_YEAR = 2000
    END_YEAR = 2018   

If you're using SQLite, which is serverless, only the DIALECT and DBNAME attributes are filled:

class LocalConfig(Config):
    DIALECT = 'sqlite'  # dialect name in SQLAlchemy
    DBNAME = '/path/to/football.db'  # name of database file

The dialect names are set by SQLAlchemy:

Dialect Name Database Backend
firebird Firebird
mssql Microsoft SQL Server
mysql MySQL
oracle Oracle
postgresql PostgreSQL
sybase Sybase DB (SAP ASE)
sqlite SQLite

Start and end years are defined in the configuration with the START_YEAR and END_YEAR variables; these are used to load a range of seasons in the database.

You can create multiple configuration settings by subclassing Config and populating the attributes.

Create an Empty Database

In order to create the database tables, the database must exist first. For example, a PostgreSQL database is created by the following command-line statement:

$ createdb my-football-db

The specific command will vary with the database backend, so consult your preferred database's documentation.

Creating an empty SQLite database is as simple as creating an empty file -- the touch command in Unix, or a New File selection in the file browser of your choice.

Create Database Tables

The Marcotti package comes with an interface module that makes a connection to the database, creates tables in the database, and creates sessions that allows you to make insertions, queries, and other changes.

To create tables in the database, import the database configuration, ClubSchema or NatlSchema objects from models\club.py or models\national.py, and the Marcotti class from the interface module to create tables with the schema:

from local import LocalConfig
from marcotti.models.club import ClubSchema
from marcotti.interface import Marcotti


marcotti = Marcotti(LocalConfig())
marcotti.create_db(ClubSchema)

Most of the entities with pre-defined values, such as Confederations, Naming Order, and Fouls, are defined as custom data types. There are four remaining entities that can't be converted to enumerated types as easily -- Countries, Positions, Surfaces, and Timezones -- so these are implemented as lookup tables and loaded in the create_db() method. A fifth table, the Seasons table, is loaded with a range of years to represent calendar- and split-year seasons.

If the interface module is run from the command line, it will create the tables in the database and populate the remaining validation tables:

$ python interface.py

For reference, the validation data is found in the data folder.

Data File Data Model
countries.[csv,json] Countries
positions.[csv,json] Positions
surfaces.[csv,json] Surfaces
timezones.[csv,json] Timezones

Loading Data into Marcotti Database

Marcotti ships with Extract, Transform, and Load (ETL) tools to preprocess and load match data into the Marcotti databases. They currently load CSV data from club league competitions only, but they will be expanded to the full range of competitions modeled by Marcotti.

It's necessary to set the following ETL-related variables in the local configuration class:

class LocalConfig(Config):
    # ...
    VENUE_EFF_DATE = ''
    COMPETITION_NAME = u''
    SEASON_NAME = ''
    DATA_SUPPLIER = u''

    # Define CSV data files
    CSV_DATA_DIR = '/path/to/data'
    CSV_DATA = {
        'Supplier': [
            ('Supplier', 'Suppliers.csv')
        ],
        'Overview': [
            ('Competitions', 'Competitions.csv'),
            ('Clubs', 'Clubs.csv'),
            ('Venues', 'Venues.csv')
        ],
        'Personnel': [
            ('Positions', 'Positions.csv'),
            ('Players', 'Players.csv'),
            ('Managers', 'Managers.csv'),
            ('Referees', 'Referees.csv')
        ],
        'Match': [
            ('Matches', 'Matches.csv'),
            ('Lineups', 'MatchLineups.csv'),
            ('Goals', 'Goals.csv'),
            ('Penalties', 'Penalties.csv'),
            ('Bookables', 'Bookables.csv'),
            ('Substitutions', 'Substitutions.csv'),
            ('PlayerStats', 'PlayerStats.csv')
        ]
    }

Data must be loaded into the table in order: first Suppliers, then Overview and Personnel, and finally the Match group. Within these groups, the models are loaded in the order listed in the CSV_DATA dictionary.

To run the loader, execute the following command line:

$ python loader.py

More information can be found in the ETL section.