Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recompute job keywords and scores on skills update #62

Merged
merged 2 commits into from
Nov 3, 2019

Conversation

wchang22
Copy link
Collaborator

@wchang22 wchang22 commented Nov 2, 2019

This adds functionality so that when new skills are added, job keyword counts and job scores will be updated.

This will not be merged into master yet, as this computation takes a long time and currently causes the client to wait a long time for it to complete on resume upload. This will need to be solved in a next PR.

const jobs = await Jobs.find({});
await forEachAsync(jobs, async (_, jobIdx) => {
this.jobAnalyzer.computeJobKeywordCount(jobs[jobIdx], newSkills);
await jobs[jobIdx].save();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can make this synchronous and save after the for loop (may have better performance, especially when jobs is much larger)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd still need to loop through all jobs and save each, so don't see the difference?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time googling about how .save() is implemented, but couldn't find an answer. The reason I was concerned is because I imagine it is saving everything in jobs[jobIdx], every time you execute .save(), so you would be saving everything job.length times... never mind you would have to do jobs.save() if you did it outside, which could be longer cause the .saves() on the inside would be asynchronous?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobs is a Document[] and .save() operates on a Document.

Doing

loop {
  //stuff
  await .save()
}

shouldn't have any performance difference compared to

loop {
  //stuff
}
loop {
  await .save()
}

// Update keyword counts of each job
const jobs = await Jobs.find({});
await forEachAsync(jobs, async (_, jobIdx) => {
this.jobAnalyzer.computeJobKeywordCount(jobs[jobIdx], newSkills);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a side note: if new jobs are added while new skills are added, this is still broken lol

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an issue, author?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#71

this.logger.info('Starting to compute job scores...');

const jobs = await Jobs.find({});
const skills = await User._getAllSkills();
const offset = skillsStart || 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does offset = skillsStart

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If skillsStart is not passed in, this evaluates to undefined || 0 = 0 instead of undefined.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert statement?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it ever be undefined?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that you can call this with or without an argument.

i.e.

computeJobScores() // computes for all skills
computeJobScores(5, 12) // computes skills 5 to 11

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are trying to do now. Could we set the default of skillsStart and skillsEnd to 0 then? I feel like that is easier to read.
async computeJobScores(skillsStart=0, skillsEnd=0)
I am not sure how it works in javascript, but if we do the above, it may be possible to just specify skillsStart and that would lead to an error when we try to split the list. Maybe
async computeJobScores(skillsStart=0, skillsEnd=skillsStart)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such thing as optional arguments in JS.

.slice(skillsStart, undefined) actually works, because this is the same as .slice(skillsStart), which by default will slice until the end.

} else {
jobs[i].keywords[keywordIdx].tfidf = tfidf;
}
jobs[jobIdx].keywords[keywordIdx].tfidf = tf * idf;

await job.save();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can save all jobs outside the async block

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could prob do job.keywords[keywordIdx].save(). This is very nitpicking, but may make it faster with a lot of users. Would test that there are no errors AND it actually saves though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried, got
mongoose: calling save() on a subdoc does not save the document to MongoDB, it only runs save middleware.

@jacksonx9 jacksonx9 merged commit c0fadd0 into complex-logic-updates Nov 3, 2019
@jacksonx9 jacksonx9 deleted the new-skills branch November 3, 2019 03:04
wchang22 added a commit that referenced this pull request Nov 5, 2019
* Recompute job keywords and scores on skills update

* Renamed vars for better clarity
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants