How To Create Marcotti Databases
There are two types of databases that can be created with the Marcotti data schemas:
- Club databases, for matches involving club teams
- National Team databases, for matches involving national selections.
In order to create a Marcotti database, we follow these steps:
- Complete the database configuration settings
- Create an empty database
- Create tables using one of the Marcotti schemas
To create a database configuration file, copy local.skel
to a Python file and populate it.
For server-based backends (e.g. MS-SQL, MySQL, PostgreSQL, etc), all of the class attributes must be filled:
from config import Config
class LocalConfig(Config):
DIALECT = 'postgresql' # dialect name in SQLAlchemy
DBNAME = 'my-football-db' # name of database on server
DBUSER = 'dbuser' # database user login
DBPASSWD = 'dbpass' # database user password
HOSTNAME = 'localhost' # database hostname
PORT = 5432 # database port (check your specific backend)
# Define start and end years in database.
START_YEAR = 2000
END_YEAR = 2018
If you're using SQLite, which is serverless, only the DIALECT
and DBNAME
attributes are filled:
class LocalConfig(Config):
DIALECT = 'sqlite' # dialect name in SQLAlchemy
DBNAME = '/path/to/football.db' # name of database file
The dialect names are set by SQLAlchemy:
Dialect Name | Database Backend |
---|---|
firebird | Firebird |
mssql | Microsoft SQL Server |
mysql | MySQL |
oracle | Oracle |
postgresql | PostgreSQL |
sybase | Sybase DB (SAP ASE) |
sqlite | SQLite |
Start and end years are defined in the configuration with the START_YEAR
and END_YEAR
variables; these are used to
load a range of seasons in the database.
You can create multiple configuration settings by subclassing Config
and populating the attributes.
In order to create the database tables, the database must exist first. For example, a PostgreSQL database is created by the following command-line statement:
$ createdb my-football-db
The specific command will vary with the database backend, so consult your preferred database's documentation.
Creating an empty SQLite database is as simple as creating an empty file -- the touch
command in Unix, or a New File
selection in the file browser of your choice.
The Marcotti package comes with an interface
module that makes a connection to the database, creates tables in the
database, and creates sessions that allows you to make insertions, queries, and other changes.
To create tables in the database, import the database configuration, ClubSchema
or NatlSchema
objects from
models\club.py
or models\national.py
, and the Marcotti
class from the interface
module to create tables
with the schema:
from local import LocalConfig
from marcotti.models.club import ClubSchema
from marcotti.interface import Marcotti
marcotti = Marcotti(LocalConfig())
marcotti.create_db(ClubSchema)
Most of the entities with pre-defined values, such as Confederations, Naming Order, and Fouls, are defined as
custom data types. There are four remaining entities that can't be converted to enumerated types as easily --
Countries, Positions, Surfaces, and Timezones -- so these are implemented as lookup tables and loaded
in the create_db()
method. A fifth table, the Seasons table, is loaded with a range of years to represent
calendar- and split-year seasons.
If the interface
module is run from the command line, it will create the tables in the database and populate the
remaining validation tables:
$ python interface.py
For reference, the validation data is found in the data
folder.
Data File | Data Model |
---|---|
countries.[csv,json] | Countries |
positions.[csv,json] | Positions |
surfaces.[csv,json] | Surfaces |
timezones.[csv,json] | Timezones |
Marcotti ships with Extract, Transform, and Load (ETL) tools to preprocess and load match data into the Marcotti databases. They currently load CSV data from club league competitions only, but they will be expanded to the full range of competitions modeled by Marcotti.
It's necessary to set the following ETL-related variables in the local configuration class:
class LocalConfig(Config):
# ...
VENUE_EFF_DATE = ''
COMPETITION_NAME = u''
SEASON_NAME = ''
DATA_SUPPLIER = u''
# Define CSV data files
CSV_DATA_DIR = '/path/to/data'
CSV_DATA = {
'Supplier': [
('Supplier', 'Suppliers.csv')
],
'Overview': [
('Competitions', 'Competitions.csv'),
('Clubs', 'Clubs.csv'),
('Venues', 'Venues.csv')
],
'Personnel': [
('Positions', 'Positions.csv'),
('Players', 'Players.csv'),
('Managers', 'Managers.csv'),
('Referees', 'Referees.csv')
],
'Match': [
('Matches', 'Matches.csv'),
('Lineups', 'MatchLineups.csv'),
('Goals', 'Goals.csv'),
('Penalties', 'Penalties.csv'),
('Bookables', 'Bookables.csv'),
('Substitutions', 'Substitutions.csv'),
('PlayerStats', 'PlayerStats.csv')
]
}
Data must be loaded into the table in order: first Suppliers
, then Overview
and Personnel
, and finally the
Match
group. Within these groups, the models are loaded in the order listed in the CSV_DATA
dictionary.
To run the loader, execute the following command line:
$ python loader.py
More information can be found in the ETL section.
(c) 2015-2016 Soccermetrics Research, LLC. Distributed under the MIT license.