-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds a testing framework for databases #25
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this framework very much. I haven't really tested it so I'm talking mostly from a look at the code and reading the documentation, but it seems robust. A few questions:
- How do the test run in Travis if they expect a real database (i.e., when not using
factory-boy
)? - Can "fake" data be somehow loaded from CSV files or such? I can see how create totally fake data can require a lot of effort or be ultimately not useful if you're expecting certain data.
- Can we create a fixture that runs some sanity checks on models automatically? Something very simple such as making sure they import, that they connect to the database, etc. My guess is that something like that would cover 90% of the relevant checks.
- We probably need some quite thorough testing of
SQLADatabaseConnection
andPeeweeDatabaseConnection
.
The last two items are general comments, not really intended for this PR.
I've added a few comments about style and linting. In general, can you enable the option to remove trailing whitespaces?
python/sdssdb/utils/ingest.py
Outdated
from sqlalchemy.ext.automap import automap_base | ||
from sqlalchemy.ext.declarative import declarative_base, DeferredReflection | ||
from sdssdb import log | ||
from sdssdb.connection import SQLADatabaseConnection | ||
from sdssdb.sqlalchemy import BaseModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imports in incorrect order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this file using the isort
extension, with default settings. It didn't change these imports much. I think it moved the inflect
import down into a new block. If there's a preferred setting or method you're using, I'm happy to adopt it. Just let me know what I need to change.
@@ -51,3 +51,9 @@ utahdb: | |||
host: db.sdss.utah.edu | |||
port: 5432 | |||
domain: db.sdss.utah.edu | |||
|
|||
slore: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why slore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed a profile for the sdss
user on the lore
host machine. I already had a lore
profile for the read-only marvin
database user which is only relevant for the manga
db. I wasn't really sure what to call this. And what's the policy here on what actually goes in this file? Are we supposed to put one new profile per host machine? Or one new profile per database user per host machine?
self.dbversion = dbversion or self.dbversion | ||
if self.dbversion: | ||
self.dbname = f'{self.dbname}_{self.dbversion}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dbversion thing seems a bit adhoc and assumes a given format for the versions of the databases. I don't love it.
How is this different from having a default version for a database (supposedly, the latest) and if you want a different one you call connect
with the new database name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit wonky I know. I added this to deal with the archive
database. Each data release, a new database is created with names like archive_20200203
, or archive_20190711
but the models and connection don't change. In principle there's nothing wrong with fixing the database name to the one with the latest version but I didn't like the idea of always having to commit a new change for that and potentially tag a new release.
We don't have a strong policy yet about versioning databases and/or version naming schemes but it might be a good idea to make one. It makes sense to me to somehow separate db names and versions. But I'm open to suggestions.
Also, could we move the tests outside the package? I've come to realise it's better to not have them as part of the package because they're not code you want to ship your package with, and it ends up being painful to exclude them when packaging. |
@albireox I've moved the tests out into the top-level directory. I've also added some repsonses to your individual comments. Thanks for pointing out the setting for removing trailing whitespace. I've been manually doing that and those little yellow tildes are quite annoying. Regarding your questions.
|
@albireox I've added some code to more dynamically create model factories to generate fake data. It will try to auto-generate fake data for every column on a database table so one doesn't have to do it manually. This can be customized either when you create the factory or from a file definition. See lines 61-65 at https://github.com/sdss/sdssdb/blob/archive/tests/pwdbs/factories.py or the file at https://github.com/sdss/sdssdb/blob/archive/tests/data/models.yml. It currently can only auto-generate fake data for simple column definitions. I haven't yet implemented anything for columns that are actually foreign keys that point to other models. But you can still define those manually. The test suite passes locally but fails on travis due to some strange issue with importing catalogdb, see #29. I also can't write some sqlalchemy tests for targetdb because of #28 |
This sounds good to me. I think both blocking issues are now fixed. |
This PR adds an initial testing framework that supports testing against databases in either
peewee
orsqlalchemy
. You can write tests against real databases, or create tests using a general test database. You can create fake tables, or insert fake data temporarily into real ones.Uses factory_boy, faker, and pytest-factoryboy for customizable fake data factories for each db model. It uses pytest-postgresql to generate a test postgres database instance.
Full list:
peewee
andsqla
using the test database + fake datapeewee
andsqla
using fake data on real dbsStill to do:
Optionals: