Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DailyJobMetrics #1215

Merged

Conversation

@glogiotatidis
Copy link
Member

glogiotatidis commented Oct 30, 2019

This still needs tests but it appears to be working fine on local runs.

Posting here for a code review while I'm writing the tests, to speed up delivery

@glogiotatidis glogiotatidis requested a review from jgmize Oct 30, 2019
@glogiotatidis glogiotatidis force-pushed the glogiotatidis:issue-1208-dailyjobperformance branch from 5573e9d to dd90378 Oct 30, 2019
Copy link
Member

jgmize left a comment

This is a great start @glogiotatidis, thanks for working on this!

snippets/base/models.py Outdated Show resolved Hide resolved
'message_id': job.id,
}
try:
result = redash.query(settings.REDASH_QUERY_ID, bind_data)

This comment has been minimized.

Copy link
@jgmize

jgmize Oct 30, 2019

Member

I'm wondering about the performance implications of doing a separate redash query for each job. For the small number of jobs published each day this probably doesn't matter, but for loading all of the data going back to when we started publish jobs we might want to use a different approach. We can follow up with a separate script for that later, or we may be able to connect to the Redshift instance directly.

This comment has been minimized.

Copy link
@glogiotatidis

glogiotatidis Oct 31, 2019

Author Member

We're going to load historical data starting from Q4, since we have incomplete data for Sept and Q3 anyway. I constructed the command in this way so we can import which day separately. With a loop we will fetch all data within 24 hours which is ok for a one-off.

For the small number of jobs published each day

Each day we need to fetch data for more than 1000 Jobs and we need about 90 minutes to do so.

This comment has been minimized.

Copy link
@glogiotatidis

glogiotatidis Oct 31, 2019

Author Member

As discussed in our 1:1 we can optimize those queries and build a query with all jobs id and iterate through this in python. we'll eval later if needed

This comment has been minimized.

Copy link
@glogiotatidis

glogiotatidis Oct 31, 2019

Author Member

Actually what @jgmize proposed was way smarter and implemented things that way. Queries only take a few seconds per day now.

@glogiotatidis glogiotatidis force-pushed the glogiotatidis:issue-1208-dailyjobperformance branch 2 times, most recently from 6d2d7fd to 0108be3 Oct 31, 2019
@glogiotatidis glogiotatidis marked this pull request as ready for review Oct 31, 2019
@glogiotatidis

This comment has been minimized.

Copy link
Member Author

glogiotatidis commented Oct 31, 2019

Also added a cron job to fetch daily.

Ready for review @jgmize

@glogiotatidis glogiotatidis force-pushed the glogiotatidis:issue-1208-dailyjobperformance branch from 0108be3 to 640108d Oct 31, 2019
@jgmize
jgmize approved these changes Oct 31, 2019
Copy link
Member

jgmize left a comment

Great work @glogiotatidis! I had a tiny readability nit on your latest iteration, but this looks great to merge as-is.

blocks = 0

job_rows = [row for row in rows if row['message_id'] == str(job.id)]
for row in job_rows:

This comment has been minimized.

Copy link
@jgmize

jgmize Oct 31, 2019

Member

NIT: moving the conditional out of the list comprehension would be slightly more readable:

for row in rows:
    if row['message_id'] == str(job.id):
@glogiotatidis

This comment has been minimized.

Copy link
Member Author

glogiotatidis commented Nov 1, 2019

Thanks!

@glogiotatidis glogiotatidis merged commit 3047cfc into mozmeao:master Nov 1, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@@ -1019,7 +1019,7 @@ class JobAdmin(admin.ModelAdmin):
]
fieldsets = [
('ID', {
'fields': ('id', 'job_status', 'snippet_name_linked', 'creator')
'fields': ('id', ('job_status', 'completed_on'), 'snippet_name_linked', 'creator')

This comment has been minimized.

Copy link
@jgmize

jgmize Nov 1, 2019

Member

I missed this while reviewing: #1220

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.