Unexpected and Much Slower Performace on a Query Using 1.4 vs 1.3 #6302
-
Description of ProblemSome of our application's queries using sqlalchemy have gone from running in under a second and using very little RAM to making taking over a minute each and using gigabytes of ram. I expect we're doing something that violates sqlalchemy's expectation on how queries are built out and used, but I haven't able to find a clear reason for the large performance difference between 1.3 and 1.4. I'm hoping for some clarity on if this performance difference is expected between 1.3 and 1.4 and if so, what changes I need to make in order to get things running again, since the increased time and memory usage is too much for our app or CI/CD pipeline to handle. For background, our application is a custom built Human Resource Management System that has higher education specific features missing in most off the shelf software and most of our reports are built in a similar example to the code samples below. EnvironmentI've ran the example below in a python3 docker container and my laptop running Archlinux using python 3.9. The database is a postgres db ran in a docker container. Running the ExamplePrepare a postgres db in a docker container:
Create a virutalenv with 1.3 so we can confirm the original speed and resource use
Run example inside virtual environment
Create a virutalenv with 1.4 so we can reproduce my query problem
Run example inside virtual environment
Example Python CodeI've shortened this down quite a bit from production, my apologies if it is it is longer than needed. I'm afraid the setup is larger then the query in import flask
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.ext.mutable import MutableDict, MutableList
from sqlalchemy.dialects.postgresql import (JSONB, UUID, ARRAY)
from sqlalchemy_utils import(ArrowType)
from sqlalchemy.ext.hybrid import hybrid_property, hybrid_method
from sqlalchemy.ext.associationproxy import association_proxy
import arrow
from dataclasses import dataclass
from datetime import date, datetime, timedelta
@dataclass
class Config:
TZ: str = 'US/Pacific'
SQLALCHEMY_DATABASE_URI = "postgresql://postgres@localhost/postgres"
config = Config()
app = Flask(__name__)
app.config.from_object(config)
db = SQLAlchemy(app, engine_options={'json_serializer': flask.json.dumps})
#########
#
# Date Helper functions
#
#########
def to_local(column, tz=config.TZ):
# UTC fix for timestamp without timezone
result = db.func.timezone('UTC', column)
return db.func.timezone(tz, result)
def day_of_week(the_date, tz=config.TZ):
local_time = to_local(the_date, tz)
return db.func.extract('ISODOW', local_time).cast(db.Integer)
def is_weekend(the_date, tz=config.TZ):
SATURDAY = 6
SUNDAY = 7
day = day_of_week(the_date, tz)
return db.or_(day == SATURDAY, day == SUNDAY)
def is_weekday(the_date, tz=config.TZ):
return db.not_(is_weekend(the_date, tz))
def day_of_week_from_date(the_date):
return db.func.extract('ISODOW', the_date).cast(db.Integer)
def start_of_week_from_date(the_date, start_day):
day_interval = db.text("Interval '1 days'")
start = the_date
dow = day_of_week_from_date(the_date)
before = dow < start_day
before_math = (start_day - dow).cast(db.Integer)
expr = (
db.case(
[
(
before, start + before_math.op("*")(day_interval)
)
],
else_= (start + (start_day + 7 - dow).cast(db.Integer).op("*")(day_interval))
) - db.text("Interval '7 days'")
)
return db.func.date_trunc('day', expr)
def start_of_week_from_date_str(the_date, start_day, tz=config.TZ):
return db.func.to_char(start_of_week_from_date(the_date, start_day), "YYYY-MM-DD")
class WorkSchedule(db.Model):
__tablename__ = 'work_schedule'
__table_args__ = (db.UniqueConstraint('emplid', 'empl_rcd', name='_one_schedule_per_job'),)
id = db.Column(db.Integer, primary_key=True)
emplid = db.Column(db.String, nullable=False)
empl_rcd = db.Column(db.Integer, nullable=False)
weekly_hours = db.Column(db.Numeric)
########
#
# We use materialized views on postgres to integrate with data from our ERP
#
########
class Department(db.Model):
__tablename__ = 'department_mv'
id = db.Column(db.Text, primary_key=True)
deptid = db.Column(db.Text)
name = db.Column(db.Text)
def __repr__(self):
return f'<Department {self.id}/{self.name}>'
class Job(db.Model):
__tablename__ = 'job_mv'
__table_args__ = (db.PrimaryKeyConstraint('emplid', 'empl_rcd'),
)
emplid = db.Column(db.String )
empl_rcd = db.Column(db.Integer)
exempt = db.Column(db.Boolean)
department_id = db.Column(db.String, db.ForeignKey("department_mv.id"))
title = db.Column(db.String)
jobcode = db.Column(db.String)
position_number = db.Column(db.String)
pay_group = db.Column('paygroup', db.String)
standard_hours = db.Column('Standard Hours', db.Float)
standard_hours_frequency = db.Column('Standard Hours Frequency', db.String)
department = db.relationship('Department', viewonly=True, uselist=False)
schedule = db.relationship('WorkSchedule',
backref=db.backref('job', uselist=False),
viewonly=True,
uselist=False,
foreign_keys=[WorkSchedule.emplid, WorkSchedule.empl_rcd],
primaryjoin="and_(Job.emplid == WorkSchedule.emplid,"
"Job.empl_rcd == WorkSchedule.empl_rcd,)",
)
@hybrid_property
def id(self):
id_ = f"{self.emplid}_{self.empl_rcd}"
return id_
@id.expression
def id(cls):
return db.func.concat(cls.emplid, '_', cls.empl_rcd)
def __repr__(self):
return f'<Job {self.emplid}/{self.empl_rcd}>'
#########
#
# Back to regular tables
#
#########
class PayPeriod(db.Model):
__tablename__ = 'pay_period'
id = db.Column(db.Integer, primary_key = True)
description = db.Column(db.Text, nullable = False)
start = db.Column(db.Date, nullable = False)
end = db.Column(db.Date, nullable = False)
open_ = db.Column('open', db.Boolean, nullable = False, default=False)
# ISO day of week 1=Monday 7=Sunday, Needed for overtime
start_of_week = db.Column(db.Integer, nullable = False)
pay_groups = db.Column(MutableList.as_mutable(ARRAY(db.String)))
@hybrid_property
def is_current(self):
today = arrow.utcnow().to(config.TZ).date()
return today >= self.start and today <= self.end
@is_current.expression
def is_current(cls):
return db.and_(db_date_now >= cls.start, db_date_now <= cls.end)
@hybrid_property
def is_past(self):
today = arrow.utcnow().date()
return today > self.end
@is_past.expression
def is_past(cls):
return db_date_now > cls.end
@hybrid_method
def start_day_of_week(self, tz=config.TZ):
return self.start.isoweekday()
@start_day_of_week.expression
def start_day_of_week(cls, tz=config.TZ):
local_time = db.func.timezone(tz, cls.start)
return db.func.extract('ISODOW', local_time).cast(db.Integer)
@hybrid_method
def end_day_of_week(self, tz=config.TZ):
return self.end.isoweekday()
@end_day_of_week.expression
def end_day_of_week(cls, tz=config.TZ):
local_time = db.func.timezone(tz, cls.end)
return db.func.extract('ISODOW', local_time).cast(db.Integer)
@hybrid_method
def first_week(self, tz=config.TZ):
week_start = None
start = arrow.get(self.start, tz)
dow = self.start_day_of_week(tz)
start_day = self.start_of_week
if dow < start_day:
shift_days = start_day - dow
week_start = start.shift(days=shift_days)
else:
shift_days = (start_day + 7) - dow
week_start = start.shift(days=shift_days)
week_start = week_start.shift(days=-7)
return arrow.get(week_start.date(), tz)
@first_week.expression
def first_week(cls, tz=config.TZ):
day_interval = db.text("Interval '1 days'")
start = to_local(cls.start, tz)
dow = cls.start_day_of_week(tz)
start_day = cls.start_of_week
before = dow < start_day
before_math = (start_day - dow).cast(db.Integer)
expr = (
db.case(
[
(
before, start + before_math.op("*")(day_interval)
)
],
else_= (start + (start_day + 7 - dow).cast(db.Integer).op("*")(day_interval))
) - db.text("Interval '7 days'")
)
return db.func.date_trunc('day',
db.func.timezone(tz, db.func.date_trunc('day', expr))
)
@hybrid_method
def first_week_str(self, tz=config.TZ):
return self.first_week(tz).format("YYYY-MM-DD")
@first_week_str.expression
def first_week_str(cls, tz=config.TZ):
return db.func.to_char(cls.first_week(tz), "YYYY-MM-DD")
@hybrid_method
def last_week(self, tz=config.TZ):
return start_of_week_from_date_object(self.end, self.start_of_week, tz=tz)
@last_week.expression
def last_week(cls, tz=config.TZ):
return start_of_week_from_date(cls.end, cls.start_of_week)
@hybrid_method
def last_week_is_full(self, tz=config.TZ):
days_in_week = arrow.get(self.end, tz) - self.last_week(tz)
return not (days_in_week < timedelta(6))
@last_week_is_full.expression
def last_week_is_full(cls, tz=config.TZ):
return ~((cls.end - cls.last_week(tz)) < timedelta(days=6))
class Timesheet(db.Model):
__tablename__ = 'timesheet'
__table_args__ = (db.UniqueConstraint('pay_period_id', 'emplid', 'empl_rcd', 'employment_id', name='_one_timesheet_per_pay_period'),
)
id = db.Column(db.Integer, primary_key=True)
pay_period_id = db.Column(db.Integer, db.ForeignKey(PayPeriod.id), nullable = False)
type_ = db.Column('type', db.Text, nullable=False)
emplid = db.Column(db.Text, nullable=False)
empl_rcd = db.Column(db.Integer, nullable=True)
employment_id = db.Column(db.Integer, nullable=True)
department_id = db.Column(db.String, nullable=False)
created_at = db.Column(ArrowType, default=arrow.utcnow)
payroll_info = db.Column(MutableDict.as_mutable(JSONB), nullable = False, default = {})
pay_period = db.relationship('PayPeriod', backref = "timesheets", uselist=False)
pay_period_start = association_proxy('pay_period', 'start')
pay_period_end = association_proxy('pay_period', 'end')
job = db.relationship('Job',
backref="timesheets",
foreign_keys=[emplid, empl_rcd],
primaryjoin="and_(Job.emplid == Timesheet.emplid,"
"Job.empl_rcd == Timesheet.empl_rcd)",
uselist=False)
days = db.relationship('Day', backref="timesheet")
department = db.relationship('Department',
foreign_keys=[department_id],
primaryjoin = 'Department.id == Timesheet.department_id')
class Day(db.Model):
__tablename__ = 'day'
id = db.Column(db.Integer, primary_key=True)
timesheet_id = db.Column(db.Integer, db.ForeignKey(Timesheet.id), nullable=False)
date = db.Column(db.Date, nullable = False)
hours = db.relationship('Hour', backref="day")
@hybrid_property
def total_hours(self):
all_hours = sum([i.hours for i in self.hours])
return all_hours
@total_hours.expression
def total_hours(cls):
return db.select([db.func.sum(Hour.hours)]).where(Hour.day_id==cls.id).label('total_hours')
@hybrid_method
def day_of_week(self, tz=config.TZ):
"""
tz is not used since this is naive date, but is kept to keep the api
the same with the other methods of the same name
"""
return self.date.isoweekday()
@day_of_week.expression
def day_of_week(cls, tz=config.TZ):
"""
tz is not used since this is naive date, but is kept to keep the api
the same with the other methods of the same name
"""
return db.func.extract('ISODOW', cls.date).cast(db.Integer)
@hybrid_method
def is_weekend(self, tz=config.TZ):
SATURDAY = 6
SUNDAY = 7
day = self.day_of_week(tz)
return day in [SATURDAY, SUNDAY]
@is_weekend.expression
def is_weekend(cls, tz=config.TZ):
SATURDAY = 6
SUNDAY = 7
day = cls.day_of_week(tz)
return db.or_(day == SATURDAY, day == SUNDAY)
@hybrid_method
def is_weekday(self, tz=config.TZ):
return not self.is_weekend(tz)
@is_weekday.expression
def is_weekday(cls, tz=config.TZ):
return db.not_(cls.is_weekend(tz))
@hybrid_method
def start_of_week(self, start_day, tz=config.TZ):
week_start = None
start = arrow.get(self.date, tz)
dow = self.day_of_week(tz)
if dow < start_day:
shift_days = start_day - dow
week_start = start.shift(days=shift_days)
else:
shift_days = (start_day + 7) - dow
week_start = start.shift(days=shift_days)
week_start = week_start.shift(days=-7)
return arrow.get(week_start.date(), tz)
@start_of_week.expression
def start_of_week(cls, start_day, tz=config.TZ):
"""
tz is not used since this is naive date, but is kept to keep the api
the same with the other methods of the same name
"""
return start_of_week_from_date(cls.date, start_day)
@hybrid_method
def start_of_week_str(self, start_day, tz=config.TZ):
return self.start_of_week(start_day, tz).format("YYYY-MM-DD")
@start_of_week_str.expression
def start_of_week_str(cls, start_day, tz=config.TZ):
return db.func.to_char(cls.start_of_week(start_day, tz), "YYYY-MM-DD")
class HourType(db.Model):
__tablename__ = 'hour_type'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.Text, nullable = False)
leave = db.Column(db.Boolean)
#new
leave_unpaid = db.Column(db.Boolean, nullable=False, default=False)
day_ok = db.Column(db.Boolean, nullable=False, default=True)
shift_ok = db.Column(db.Boolean, nullable=False, default=True)
#new
exempt_ok = db.Column(db.Boolean, nullable=False, default=True)
non_exempt_ok = db.Column(db.Boolean, nullable=False, default=True)
weight = db.Column(db.Float, nullable=False, default=10.0) # Used for sorting
disabled = db.Column(db.Boolean, nullable=False, default=False)
external_code = db.Column(db.Text, nullable=True)
is_work = db.Column(db.Boolean, nullable=False, default=True)
adds_additional_time = db.Column(db.Boolean, nullable=False, default=True)
class Hour(db.Model):
__tablename__ = 'hour'
id = db.Column(db.Integer, primary_key=True)
day_id = db.Column(db.Integer, db.ForeignKey(Day.id), nullable = False)
hour_type_id = db.Column(db.Integer, db.ForeignKey(HourType.id), nullable = False)
hours = db.Column(db.Float, nullable = False)
note = db.Column(db.Text, nullable = True)
# Do I need to make a type model?
type_ = db.relationship('HourType')
def setup_db():
###########
#
# Setup the materialized views with testing data
#
###########
db.session.execute("""
drop materialized view if exists public.job_mv cascade;
create materialized view public.job_mv as
(
SELECT
*
FROM
(
SELECT
'Executive' as flsa_status,
true as exempt,
true as non_faculty_exempt,
'0000101' as emplid,
0 as empl_rcd,
'TESTS_001' as department_id,
'President' as title,
'TESTS_001' as job_deptid,
'President''s Office' as "Job Department",
'JOB_114' as jobcode,
'President' as "Jobcode Title",
'10000001' as position_number,
'President' as position,
'President''s Office' as "Position Department",
'TESTS_001' as position_deptid,
'Active' as "Payroll Status",
'SOMEPLACE' as location,
'Full-Time' as "Full/Part Time",
'EMP' as paygroup,
'R' as reg_temp,
'Active' as "Benefits Employee Status",
'Salaried' as "Employee Type",
1.000000 as fte,
' ' as reports_to,
' ' as supervisor_id,
'2080.00' as "Standard Hours",
'Annually' as "Standard Hours Frequency"
Union select 'Executive', true, true, '0000100' , 0, 'TESTS_002',
'Vice President for Awsome', 'TESTS_002', 'Bureau of Awesome',
'JOB_000', 'Executive Vice President', '10000002',
'Vice President for Awesome', 'Bureau of Awesome', 'TESTS_002' ,
'Active', 'SOMEPLACE', 'Full-Time', 'EMP', 'R', 'Active',
'Salaried', 1.000000, '10000001',' ',
2080.00, 'Annually'
) a
order by emplid
); """)
db.session.execute(""" drop materialized view if exists public.department_mv;
create materialized view public.department_mv as
(
SELECT
*
FROM
(
SELECT 'TESTS_001' AS id, 'TESTS_001' AS deptid, 'President''s Office' as name
UNION SELECT 'TESTS_002','TESTS_002', 'Bureau of Awesome'
) a
order by deptid)
""")
db.session.commit()
# Create the tables for the Acutal Tables
db.create_all()
if __name__ == "__main__":
print("Example")
day_week_starts = 6 # Saturday - ISO day of week
TZ = config.TZ
setup_db()
# An pay period is required for our report
pay_period = PayPeriod( description = "Current",
start=arrow.utcnow().shift(days=-15).to(TZ).date(),
end=arrow.utcnow().shift(days=+15).to(TZ).date(),
open_=True,
pay_groups=['EMP'],
start_of_week=day_week_starts
)
db.session.add(pay_period)
db.session.commit()
# Job record for the timesheet we are setting up
job = Job.query.get(['0000100', 0])
# Create the timesheet
with db.session.no_autoflush:
timesheet = Timesheet()
timesheet.pay_period = pay_period
timesheet.type_ = 'Day'
timesheet.job = job
timesheet.department = job.department
db.session.add(timesheet)
db.session.commit()
########
#
# The actual query that has radically different performace characteristics.
#
# Note: We use the query -> subquery for query reuse a lot in our app
#
########
# Job Standard hours (source depends on if a seperate schedule is setup
has_schedule = db.session.query(WorkSchedule.emplid, WorkSchedule.empl_rcd)
job_standard_weekly_hours = db.session.query(Job.id.label('id'),
Job.emplid.label('emplid'),
Job.empl_rcd.label('empl_rcd'),
Job.pay_group.label('pay_group'),
Job.exempt.label('exempt'),
Job.standard_hours.label('standard_hours'),
Job.standard_hours_frequency.label('standard_hours_frequency'))\
.filter(~db.tuple_(Job.emplid, Job.empl_rcd).in_(has_schedule.subquery()))
schedule_standard_weekly_hours = db.session.query( Job.id.label('id'),
Job.emplid.label('emplid'),
Job.empl_rcd.label('empl_rcd'),
Job.pay_group.label('pay_group'),
Job.exempt.label('exempt'),
WorkSchedule.weekly_hours.label('standard_hours'),
db.literal('Weekly').label('standard_hours_frequency')
).join(WorkSchedule, db.and_(Job.emplid == WorkSchedule.emplid,
Job.empl_rcd == WorkSchedule.empl_rcd))
# Combine the two possible standard hour sources
standard_weekly_hours = job_standard_weekly_hours.union(schedule_standard_weekly_hours)
standard_weekly_hours = standard_weekly_hours.filter(Job.standard_hours_frequency == 'Weekly')
standard_weekly_hours = standard_weekly_hours.filter(Job.exempt == False)
# Speed of query seems okay in both versions
print("Standard weekly hours", standard_weekly_hours.count())
#
# Hours report query
#
hours = db.session.query(Hour.id.label('hour_id'),
Day.id.label('day_id'),
Timesheet.id.label('timesheet_id'),
Timesheet.emplid.label('emplid'),
Timesheet.empl_rcd.label('empl_rcd'),
PayPeriod.id.label('pay_period_id'),
Day.date.label('day_date'),
Day.is_weekend().label('is_weekend'),
Day.is_weekday().label('is_weekday'),
HourType.name.label('type'),
HourType.id.label('type_id'),
HourType.is_work.label('is_work'),
HourType.adds_additional_time.label('adds_additional_time'),
HourType.leave.label('leave'),
HourType.leave_unpaid.label('leave_unpaid'),
Hour.hours.label('hours'),
Day.start_of_week_str(day_week_starts, tz=TZ).label('start_of_week_str'),
Day.start_of_week(day_week_starts, tz=TZ).label('start_of_week'),
db.and_(PayPeriod.last_week() == Day.start_of_week(day_week_starts, tz=TZ),
~PayPeriod.last_week_is_full()).label('partial_last_week'),
)
hours = hours.join(Hour, Hour.day_id == Day.id)\
.join(HourType, Hour.hour_type_id == HourType.id)\
.join(Timesheet, Timesheet.id == Day.timesheet_id)\
.join(PayPeriod, PayPeriod.id == Timesheet.pay_period_id)
# Add padding so there is a day for every day of the pay period
pay_period_days = db.session.query(
db.func.generate_series(pay_period.start,
pay_period.end,
timedelta(days=1)
).cast(db.Date).label('date')).subquery()
padding_days = db.session.query(
db.literal(-1).label('hour_id'),
db.literal(-1).label('day_id'),
pay_period_days.c.date.label('day_date'),
is_weekend(pay_period_days.c.date).label('is_weekend'),
is_weekday(pay_period_days.c.date).label('is_weekday'),
db.literal('Query Padding').label('type'),
db.literal(-1).label('type_id'),
db.literal(False).label('is_work'),
db.literal(False).label('adds_additional_time'),
db.literal(True).label('leave'),
db.literal(False).label('leave_unpaid'),
db.literal(0).label('hours'),
start_of_week_from_date(pay_period_days.c.date, day_week_starts).label('start_of_week'),
start_of_week_from_date_str(pay_period_days.c.date, day_week_starts).label('start_of_week_str'),
)
timesheet_info = db.session.query(Timesheet.id.label('timesheet_id'),
Timesheet.emplid.label('emplid'),
Timesheet.empl_rcd.label('empl_rcd'),
Timesheet.pay_period_id.label('pay_period_id'),
PayPeriod.last_week().label('pay_period_last_week'),
PayPeriod.last_week_is_full().label('pay_period_last_week_is_full'),
).join(Timesheet.pay_period).filter(Timesheet.pay_period_id == pay_period.id)
d = padding_days.subquery()
t = timesheet_info.subquery()
# Cartesian product on purpose wo we can make sure there is a day in our
# query for each day in the pay period.
timesheet_padding = db.session.query(d.c.hour_id,
d.c.day_id,
t.c.timesheet_id,
t.c.emplid,
t.c.empl_rcd,
t.c.pay_period_id,
d.c.day_date,
d.c.is_weekend,
d.c.is_weekday,
d.c.type,
d.c.type_id,
d.c.is_work,
d.c.adds_additional_time,
d.c.leave,
d.c.leave_unpaid,
d.c.hours,
d.c.start_of_week_str,
d.c.start_of_week,
db.and_(t.c.pay_period_last_week == d.c.start_of_week,
~t.c.pay_period_last_week_is_full).label('partial_last_week'),
).join(t, db.true())
hours = hours.union(timesheet_padding)
hours = hours.order_by(Timesheet.emplid,
Timesheet.id,
Day.start_of_week_str(day_week_starts).desc(),
Day.date.desc(),
HourType.leave)
# Speed of query seems okay in both versions
print("Hours", hours.count())
h = hours.subquery()
sh = standard_weekly_hours.subquery()
hours_prep = db.session.query(h.c.emplid,
h.c.pay_period_id,
h.c.timesheet_id,
h.c.timesheet_id,
h.c.empl_rcd,
h.c.start_of_week_str,
h.c.start_of_week,
db.case(
[((pay_period.end - h.c.start_of_week) < timedelta(days=6), False)], # 6 days - 31-25 is 6 even though it is 7 days
else_=True).label('whole_week'), # If the last week of the time period was a full one
h.c.is_work,
h.c.adds_additional_time,
h.c.leave,
h.c.leave_unpaid,
h.c.hour_id,
h.c.hours,
h.c.day_date,
sh.c.standard_hours
)
hours_prep = hours_prep.outerjoin(sh, db.and_(sh.c.emplid == h.c.emplid,
sh.c.empl_rcd == h.c.empl_rcd)
)
print("Hours Prep", hours_prep.count())
h = hours_prep.subquery()
weekly = db.session.query(h.c.emplid,
db.func.string_agg(h.c.pay_period_id.cast(db.Text).distinct(), '; ').label('pay_period_ids'),
db.func.array_agg(h.c.pay_period_id.distinct()).label('pay_period_ids_list'),
db.func.string_agg(h.c.timesheet_id.cast(db.Text).distinct(), '; ').label('timesheet_ids'),
db.func.array_agg(h.c.timesheet_id.distinct()).label('timesheet_ids_list'),
h.c.empl_rcd,
h.c.start_of_week_str,
h.c.start_of_week,
h.c.whole_week,
h.c.standard_hours,
db.func.count(h.c.hour_id).label('hours_count'),
db.func.sum(h.c.hours).label('total_hours'),
(db.func.sum(db.case([(h.c.adds_additional_time == True, h.c.hours)], else_=0)) - h.c.standard_hours).label("additional"),
(db.func.sum(db.case([(db.and_(h.c.adds_additional_time == True, h.c.is_work == False), h.c.hours)], else_=0))).label("additional_hours_not_work"),
(h.c.standard_hours - db.func.sum(h.c.hours) ).label("short"),
(db.func.sum(db.case([(h.c.leave == True and h.c.leave_unpaid == False, h.c.hours)], else_=0))).label('paid_leave_hours'),
(db.func.sum(db.case([(h.c.leave_unpaid == True, h.c.hours)], else_=0))).label('unpaid_leave_hours'),
(db.func.sum(db.case([(h.c.is_work == True, h.c.hours)], else_=0))).label('work_hours'),
)
weekly = weekly.filter( h.c.day_date >= pay_period.first_week(tz=TZ).date(), # not the first day of the pay period, but of first day of the first week of the pay period.
h.c.day_date <= pay_period.end,
)
weekly = weekly.group_by(h.c.emplid,
h.c.empl_rcd,
h.c.start_of_week_str,
h.c.start_of_week,
h.c.whole_week,
h.c.standard_hours,
)
# Only include rows with short or additional time,
weekly = weekly.having( db.and_(
db.or_(
(db.func.sum(h.c.hours) - h.c.standard_hours) > 0,
(h.c.standard_hours - db.func.sum(h.c.hours) ) > 0
)
)
)
# Speed of query seems slightly slower in 1.4
# About 2s and 120MB of ram in version 1.4 if I end at this point compared to
# about 0.7s and about 20MB of ram
print("Weekly", weekly.count())
####
#
# This seems to be what is causing the slow down and memory usage
#
####
# Clean up the output
w = weekly.subquery()
cleanup = db.session.query(
w.c.emplid,
w.c.empl_rcd,
w.c.pay_period_ids,
w.c.pay_period_ids_list,
w.c.timesheet_ids,
w.c.timesheet_ids_list,
#w.c.timesheet_jobs,
w.c.start_of_week_str,
w.c.start_of_week,
w.c.whole_week,
w.c.standard_hours,
w.c.hours_count,
w.c.total_hours,
w.c.work_hours,
db.case([(w.c.additional > 0, w.c.additional)], else_=0).label('additional'),
db.case([(w.c.additional_hours_not_work > 0, w.c.additional_hours_not_work)], else_=0).label('additional_hours_not_work'),
db.case([(w.c.additional > 0, w.c.paid_leave_hours)], else_=0).label('paid_leave_hours'),
db.case([(w.c.additional > 0, w.c.unpaid_leave_hours)], else_=0).label('unpaid_leave_hours'),
#db.case([((w.c.additional - w.c.unpaid_leave_hours) > w.c.standard_hours, (w.c.additional - w.c.unpaid_leave_hours) )], else_=0).label('additional_without_unpaid_leave_hours'),
db.case([ ( db.and_(w.c.unpaid_leave_hours == 0, w.c.additional > 0), w.c.additional ),
( (w.c.additional - w.c.unpaid_leave_hours) >= 0, (w.c.additional - w.c.unpaid_leave_hours) )
], else_=0).label('additional_without_unpaid_leave_hours'),
db.case([(w.c.short > 0, w.c.short)], else_=0).label('short'),
)
# Only include whole weeks so we don't show people as having short time
# just because it is the end of the month.
cleanup = cleanup.filter(w.c.whole_week == True)
print("Cleanup", cleanup.count()) Thank you for your help and I look forward to migrating to v1.4. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 10 replies
-
hi there - this is a very long example that also uses external dependencies, are you able to provide any profiling data using the linked recipe? This will provide any details on whether there is some Python codepath that is much more intensive. Based on the memory results it seems likely however that something is either leaking memory or using way too much of it, and the time spent could be just the Python interpreter growing to that 2G size. For the memory situation, can you try disabling caching . The main thing SQLAlchemy 1.4 does differently with memory is that it's storing query structures in a cache. I can try running later but if you can just experiment with those two things that may provide some clues. |
Beta Was this translation helpful? Give feedback.
hi there -
this is a very long example that also uses external dependencies, are you able to provide any profiling data using the linked recipe? This will provide any details on whether there is some Python codepath that is much more intensive.
Based on the memory results it seems likely however that something is either leaking memory or using way too much of it, and the time spent could be just the Python interpreter growing to that 2G size. For the memory situation, can you try disabling caching . The main thing SQLAlchemy 1.4 does differently with memory is that it's storing query structures in a cache.
I can try running later but if you can just experiment with those two things that m…