adds a testing framework for databases #25

Merged
merged 64 commits into from
May 12, 2020

havok2063
Collaborator

@havok2063 havok2063 commented Mar 24, 2020

This PR adds an initial testing framework that supports testing against databases in either peewee or sqlalchemy. You can write tests against real databases, or create tests using a general test database. You can create fake tables, or insert fake data temporarily into real ones.
It uses factory_boy, faker, and pytest-factoryboy to build customizable fake-data factories for each database model, and pytest-postgresql to spin up a temporary PostgreSQL test instance.

Full list:

  • tests against existing real databases, with change rollbacks
  • generate and insert fake data into real tables, with change rollbacks
  • test generic database code on a test database
  • skips database tests when no local database is available
  • options to run tests only for peewee or sqlalchemy databases
  • option to switch session/transactions from function to module scope
  • adds example tests against the generic sdss database connection
  • adds example tests for peewee and sqla using the test database + fake data
  • adds example tests for peewee and sqla using fake data on real dbs
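The "with change rollbacks" items above amount to running each test inside a transaction that is rolled back afterwards, so fake rows never persist in the real tables. The PR does this with peewee and SQLAlchemy; as a rough, stdlib-only illustration of the same idea, here is a sketch that uses `sqlite3` as a stand-in. The helper name `rollback_transaction` and the `fake_star` table are made up for the example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rollback_transaction(conn):
    """Run the body inside a transaction that is always rolled back.

    Stand-in for a function-scoped peewee/SQLAlchemy test fixture: anything
    written inside the ``with`` block is discarded when it exits, even if
    the body raises.
    """
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/ROLLBACK above are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE fake_star (id INTEGER PRIMARY KEY, mag REAL)")

with rollback_transaction(conn) as db:
    db.execute("INSERT INTO fake_star (mag) VALUES (?)", (15.2,))
    rows_during = db.execute("SELECT COUNT(*) FROM fake_star").fetchone()[0]

rows_after = conn.execute("SELECT COUNT(*) FROM fake_star").fetchone()[0]
print(rows_during, rows_after)  # 1 0
```

In pytest, the `with` block would typically sit inside a fixture that yields the connection; declaring that fixture with `@pytest.fixture(scope="module")` instead of the default function scope would be one way to get the module-scope behaviour listed above.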

Still to do:

  • update changelog
  • add documentation (see here)

Optionals:

  • look into easier factory creation for a given ModelClass
  • look into mechanism for mapping existing Models onto the test database
  • better test organization?
  • look into easier fake test data generation

@havok2063 havok2063 added the enhancement New feature or request label Mar 24, 2020
@havok2063 havok2063 requested a review from albireox March 24, 2020 19:44
@coveralls

coveralls commented Mar 24, 2020

Pull Request Test Coverage Report for Build 211

  • 0 of 83 (0.0%) changed or added relevant lines in 7 files are covered.
  • 761 unchanged lines in 15 files lost coverage.
  • Overall coverage decreased (-13.08%) to 0.0%

Changes Missing Coverage                      Covered Lines  Changed/Added Lines  %
python/sdssdb/sqlalchemy/__init__.py          0              2                    0.0%
python/sdssdb/sqlalchemy/archive/__init__.py  0              3                    0.0%
python/sdssdb/sqlalchemy/mangadb/datadb.py    0              3                    0.0%
python/sdssdb/sqlalchemy/mangadb/dapdb.py     0              4                    0.0%
python/sdssdb/connection.py                   0              13                   0.0%
python/sdssdb/utils/ingest.py                 0              25                   0.0%
python/sdssdb/sqlalchemy/archive/sas.py       0              33                   0.0%

Files with Coverage Reduction                 New Missed Lines  %
python/sdssdb/misc/__init__.py                1                 0%
python/sdssdb/sqlalchemy/archive/__init__.py  1                 0%
python/sdssdb/utils/__init__.py               3                 0%
python/sdssdb/peewee/sdss5db/__init__.py      5                 0%
python/sdssdb/utils/internals.py              9                 0%
python/sdssdb/utils/schemadisplay.py          11                0%
python/sdssdb/__init__.py                     18                0%
python/sdssdb/misc/color_print.py             19                0%
python/sdssdb/core/exceptions.py              20                0%
python/sdssdb/utils/ingest.py                 24                0%

Totals
Change from base Build 210: -13.08%
Covered Lines: 0
Relevant Lines: 5154

💛 - Coveralls

Member

@albireox albireox left a comment

Overall I like this framework very much. I haven't actually tested it, so I'm going mostly by the code and the documentation, but it seems robust. A few questions:

  • How do the tests run on Travis if they expect a real database (i.e., when not using factory-boy)?
  • Can "fake" data somehow be loaded from CSV files or similar? I can see how creating totally fake data could take a lot of effort, or end up not being useful if you're expecting certain data.
  • Can we create a fixture that runs some sanity checks on models automatically? Something very simple such as making sure they import, that they connect to the database, etc. My guess is that something like that would cover 90% of the relevant checks.
  • We probably need some quite thorough testing of SQLADatabaseConnection and PeeweeDatabaseConnection.

The last two items are general comments, not really intended for this PR.
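A minimal sketch of the import half of the suggested sanity check, assuming the check is handed a list of model module names: it imports each one with `importlib` and collects any errors instead of stopping at the first one. The function name `find_broken_modules` is hypothetical, and the sdssdb module names in the comment are only examples.

```python
import importlib


def find_broken_modules(module_names):
    """Import each module and return {name: "ErrorType: message"} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not only ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical usage against model modules, e.g.
#   find_broken_modules(["sdssdb.peewee.sdss5db.catalogdb",
#                        "sdssdb.peewee.sdss5db.targetdb"])
# Demonstrated here with a standard-library module and a missing one:
print(find_broken_modules(["json", "no_such_module_xyz"]))
```

Parametrizing a test over the module list would give one pass/fail per model module; a connection check could follow the same pattern with a trivial query per model.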

I've added a few comments about style and linting. In general, could you enable your editor's option to trim trailing whitespace?

python/sdssdb/utils/ingest.py (outdated)
Comment on lines 24 to 28
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.declarative import declarative_base, DeferredReflection
from sdssdb import log
from sdssdb.connection import SQLADatabaseConnection
from sdssdb.sqlalchemy import BaseModel
Member

Imports in incorrect order.

Collaborator Author

@havok2063 havok2063 Apr 20, 2020

I updated this file using the isort extension, with default settings. It didn't change these imports much. I think it moved the inflect import down into a new block. If there's a preferred setting or method you're using, I'm happy to adopt it. Just let me know what I need to change.

python/sdssdb/utils/ingest.py (outdated)
@@ -51,3 +51,9 @@ utahdb:
    host: db.sdss.utah.edu
    port: 5432
    domain: db.sdss.utah.edu

slore:
Member

Why slore?

Collaborator Author

I needed a profile for the sdss user on the lore host machine. I already had a lore profile for the read-only marvin database user, which is only relevant for the MaNGA db, so I wasn't really sure what to call this one. What's the policy on what actually goes in this file? Are we supposed to add one profile per host machine, or one profile per database user per host machine?

Comment on lines +79 to +81
self.dbversion = dbversion or self.dbversion
if self.dbversion:
    self.dbname = f'{self.dbname}_{self.dbversion}'
Member

This dbversion thing seems a bit ad hoc and assumes a given format for database versions. I don't love it.

How is this different from having a default version for a database (presumably the latest) and, if you want a different one, calling connect with the new database name?

Collaborator Author

It is a bit wonky, I know. I added this to deal with the archive database. For each data release, a new database is created with a name like archive_20200203 or archive_20190711, but the models and connection don't change. In principle there's nothing wrong with fixing the database name to the latest version, but I didn't like the idea of always having to commit a new change for that and potentially tag a new release.

We don't have a strong policy yet about versioning databases and/or version naming schemes but it might be a good idea to make one. It makes sense to me to somehow separate db names and versions. But I'm open to suggestions.
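To make the trade-off concrete, here is a sketch that supports both approaches: an explicit version keeps the `f'{dbname}_{dbversion}'` scheme from the diff, while no version falls back to the newest dated database, closer to the "default to the latest" suggestion. The helper `resolve_dbname` and the assumption that versions are always `YYYYMMDD` suffixes are for illustration only and are not enforced anywhere in the codebase; the list of available names would have to come from the server (for example `SELECT datname FROM pg_database` in PostgreSQL), which is not shown.

```python
import re


def resolve_dbname(base, version=None, available=()):
    """Pick the database name to connect to (hypothetical helper).

    With an explicit ``version``, follow the ``f'{dbname}_{dbversion}'``
    scheme from the diff. Without one, choose the newest ``<base>_YYYYMMDD``
    entry in ``available``, falling back to ``base`` if none match.
    """
    if version:
        return f"{base}_{version}"
    pattern = re.compile(rf"^{re.escape(base)}_(\d{{8}})$")
    versions = []
    for name in available:
        match = pattern.match(name)
        if match:
            versions.append(match.group(1))
    # YYYYMMDD strings sort chronologically, so max() is the latest release.
    return f"{base}_{max(versions)}" if versions else base


available = ["archive_20190711", "archive_20200203", "postgres"]
print(resolve_dbname("archive", available=available))  # archive_20200203
print(resolve_dbname("archive", version="20190711"))  # archive_20190711
```

Keeping the version out of the model code this way means a new data release only changes which databases exist on the server, not the package itself.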

@albireox
Member

Also, could we move the tests outside the package? I've come to realise it's better to not have them as part of the package because they're not code you want to ship your package with, and it ends up being painful to exclude them when packaging.

@havok2063
Collaborator Author

@albireox I've moved the tests out into the top-level directory. I've also added some responses to your individual comments. Thanks for pointing out the setting for removing trailing whitespace; I've been doing that manually, and those little yellow tildes are quite annoying. Regarding your questions:

  • Currently all tests against real databases are skipped when no database is detected, including on Travis. So Travis only runs tests that don't need a real database (e.g. general database connection code) and tests that use the temporary test database. I set it up this way so people can run tests for the databases they have, without the suite failing outright for everyone who doesn't have every database set up locally.
  • I don't think factory_boy lets you load fake data from files. It works by building a "fake" class, or ORM model, that maps to a real one. For catalogs in catalogdb, I can see how creating fake models would be a lot of work, since you'd need to specify every column. We might be able to come up with a factory that generates fake models from data defined in a file, or perhaps from a schema file.
  • Yeah, I think that should be possible. What do you mean by "check that the models connect to the database"? That they can run a simple query? How is that different from writing a simple test?
  • Yeah, I agree on the tests for SQLADatabaseConnection and PeeweeDatabaseConnection. I've started some in the test_connection.py files.
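One way the "skip when no database is detected" behaviour could be implemented, sketched with the standard library only: probe the database port and skip the real-database tests if nothing is listening. The probe rule, the `database_available` name, and the localhost:5432 default are assumptions for illustration, not necessarily how the PR detects databases.

```python
import socket
import unittest


def database_available(host="localhost", port=5432, timeout=1.0):
    """Return True if something accepts TCP connections at host:port.

    Hypothetical detection rule: an open port counts as "a database is
    present". It does not check credentials or that the server is PostgreSQL.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


@unittest.skipUnless(database_available(), "no local PostgreSQL server found")
class TestRealDatabase(unittest.TestCase):
    def test_placeholder(self):
        # Real-database assertions would go here; they only run when the
        # probe succeeds, so CI without a database reports a skip.
        self.assertTrue(True)
```

With pytest, the same probe could drive `pytest.mark.skipif(not database_available(), reason="no local database")`, so Travis reports these tests as skipped rather than failed.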

@havok2063
Collaborator Author

@albireox I've added some code to create model factories more dynamically for generating fake data. It tries to auto-generate fake data for every column on a database table, so you don't have to do it manually. This can be customized either when you create the factory or from a file definition. See lines 61-65 at https://github.com/sdss/sdssdb/blob/archive/tests/pwdbs/factories.py or the file at https://github.com/sdss/sdssdb/blob/archive/tests/data/models.yml. Currently it can only auto-generate fake data for simple column definitions; I haven't yet implemented anything for columns that are foreign keys pointing to other models, but you can still define those manually.
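For readers unfamiliar with the factory approach, here is a plain-Python sketch of the idea described above: pick a default fake-value generator per column type, let per-column overrides take precedence, and require manual values for columns the generator can't handle, such as foreign keys. This is not factory_boy's or Faker's API; the type labels, `DEFAULT_GENERATORS`, and `make_fake_row` are hypothetical.

```python
import random
import string

# Default fake-value generators keyed by a simplified column type label.
# The labels are hypothetical; real code would inspect the ORM column type.
DEFAULT_GENERATORS = {
    "integer": lambda rng: rng.randint(0, 10_000),
    "float": lambda rng: rng.uniform(0.0, 100.0),
    "text": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)),
    "boolean": lambda rng: rng.random() < 0.5,
}


def make_fake_row(columns, overrides=None, rng=None):
    """Build one fake row for a table described as {column_name: type_label}.

    ``overrides`` maps column names to a fixed value or to a callable that
    receives the RNG. Columns whose type has no default generator (such as
    foreign keys) must be given an override, otherwise ValueError is raised.
    """
    overrides = overrides or {}
    rng = rng or random.Random()
    row = {}
    for name, type_label in columns.items():
        if name in overrides:
            value = overrides[name]
            row[name] = value(rng) if callable(value) else value
        elif type_label in DEFAULT_GENERATORS:
            row[name] = DEFAULT_GENERATORS[type_label](rng)
        else:
            raise ValueError(f"no generator for column {name!r} ({type_label!r})")
    return row


columns = {"pk": "integer", "mag": "float", "name": "text", "survey_pk": "foreign_key"}
row = make_fake_row(columns, overrides={"survey_pk": 1}, rng=random.Random(42))
print(row)
```

A file definition like tests/data/models.yml could supply the `columns` and `overrides` dictionaries instead of hard-coding them, and passing a seeded `random.Random` keeps the generated rows reproducible between test runs.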

The test suite passes locally but fails on Travis due to a strange issue with importing catalogdb (see #29). I also can't write some SQLAlchemy tests for targetdb because of #28.

@havok2063 havok2063 requested a review from albireox April 22, 2020 18:50
@albireox
Member

albireox commented May 4, 2020

This sounds good to me. I think both blocking issues are now fixed.

@havok2063 havok2063 merged commit 6312337 into master May 12, 2020
@havok2063 havok2063 deleted the archive branch May 12, 2020 16:04