-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recompute job keywords and scores on skills update #62
Conversation
const jobs = await Jobs.find({}); | ||
await forEachAsync(jobs, async (_, jobIdx) => { | ||
this.jobAnalyzer.computeJobKeywordCount(jobs[jobIdx], newSkills); | ||
await jobs[jobIdx].save(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can make this synchronous and save after the for loop (may have better performance, especially when jobs is much larger)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd still need to loop through all jobs and save each, so don't see the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time googling about how .save()
is implemented, but couldn't find an answer. The reason I was concerned is because I imagine it is saving everything in jobs[jobIdx], every time you execute .save()
, so you would be saving everything job.length times... never mind you would have to do jobs.save() if you did it outside, which could be longer cause the .saves() on the inside would be asynchronous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jobs
is a Document[]
and .save()
operates on a Document
.
Doing
loop {
//stuff
await .save()
}
shouldn't have any performance difference compared to
loop {
//stuff
}
loop {
await .save()
}
// Update keyword counts of each job | ||
const jobs = await Jobs.find({}); | ||
await forEachAsync(jobs, async (_, jobIdx) => { | ||
this.jobAnalyzer.computeJobKeywordCount(jobs[jobIdx], newSkills); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a side note: if new jobs are added while new skills are added, this is still broken lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create an issue, author?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.logger.info('Starting to compute job scores...'); | ||
|
||
const jobs = await Jobs.find({}); | ||
const skills = await User._getAllSkills(); | ||
const offset = skillsStart || 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does offset = skillsStart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If skillsStart is not passed in, this evaluates to undefined || 0 = 0 instead of undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it ever be undefined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that you can call this with or without an argument.
i.e.
computeJobScores() // computes for all skills
computeJobScores(5, 12) // computes skills 5 to 11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are trying to do now. Could we set the default of skillsStart
and skillsEnd
to 0 then? I feel like that is easier to read.
async computeJobScores(skillsStart=0, skillsEnd=0)
I am not sure how it works in javascript, but if we do the above, it may be possible to just specify skillsStart and that would lead to an error when we try to split the list. Maybe
async computeJobScores(skillsStart=0, skillsEnd=skillsStart)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such thing as optional arguments in JS.
.slice(skillsStart, undefined)
actually works, because this is the same as .slice(skillsStart)
, which by default will slice until the end.
} else { | ||
jobs[i].keywords[keywordIdx].tfidf = tfidf; | ||
} | ||
jobs[jobIdx].keywords[keywordIdx].tfidf = tf * idf; | ||
|
||
await job.save(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can save all jobs outside the async block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could prob do job.keywords[keywordIdx].save()
. This is very nitpicking, but may make it faster with a lot of users. Would test that there are no errors AND it actually saves though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried, got
mongoose: calling save()
on a subdoc does not save the document to MongoDB, it only runs save middleware.
* Recompute job keywords and scores on skills update * Renamed vars for better clarity
This adds functionality so that when new skills are added, job keyword counts and job scores will be updated.
This will not be merged into master yet, as this computation takes a long time and currently causes the client to wait a long time for it to complete on resume upload. This will need to be solved in a next PR.