Feature/c 867 organization enrichment#825
joanreyero left a comment
Great job on this! 💪🏼 It's starting to get there.
I left a few comments on things that should be changed, and some questions 🙂
    FROM "tenants"
    WHERE tenants."plan" IN ('Growth')
    OR (tenants."isTrialPlan" is true AND tenants."plan" = 'Growth')
    ;
Let's try to keep all the DB queries in the repository layer.
      type: DataTypes.ARRAY(DataTypes.TEXT),
      allowNull: true,
    },
    headline: {
What is this? And how is it different from the description? If it's semantically the same (or very similar), we should just use description.
Could you also add comments for all the new fields that are not obvious? Like the ones where I left a comment, and naics, ticker, type...
The headline field, according to PDL, is a brief piece of information about the company. More detailed info can be found in the company's summary. This is what I am using to update our description field. naics is the company's classification according to the NAICS standard. ticker is the company's trading symbol. And type is the kind of entity the company is, e.g. an NGO. I have commented on the fields that need it.
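The field descriptions above could be recorded as doc comments on a type. This is a minimal sketch, assuming a hypothetical `EnrichedOrganizationFields` interface and a hypothetical `missingFields` helper: the field names come from this thread, while the value types and sample values are assumptions and not the PR's actual model.

```typescript
// Hypothetical type for illustration; field names come from the review
// thread, value types are assumptions rather than the PR's actual model.
interface EnrichedOrganizationFields {
  /** Brief piece of information about the company (PDL "headline"). */
  headline: string | null
  /** Classification of the company under the NAICS standard. */
  naics: string[] | null
  /** The company's trading symbol. */
  ticker: string | null
  /** Kind of entity the company is, e.g. an NGO. */
  type: string | null
}

// Lists the enrichment fields that are still empty for an organization.
function missingFields(org: EnrichedOrganizationFields): string[] {
  const keys = Object.keys(org) as Array<keyof EnrichedOrganizationFields>
  return keys.filter((key) => org[key] === null)
}

// Made-up sample values, for illustration only.
const sampleOrg: EnrichedOrganizationFields = {
  headline: 'Tools for community-led growth',
  naics: null,
  ticker: null,
  type: 'private',
}

const stillMissing = missingFields(sampleOrg)
```

Keeping the explanation next to each field means a reader of the model does not need to look up PDL's documentation to tell `headline` apart from `description`.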
      type: DataTypes.TEXT,
      allowNull: true,
    },
    size: {
What is this? And how is it different from employees? If it's semantically the same (or very similar), we should use employeeCount.
employees is the number of workers in the company. size is, well, the size category of the company (small, medium, large, etc.). It is represented as a range.
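To make the difference between an exact count and a range concrete, here is a minimal sketch. It assumes `size` arrives as a string such as `11-50` or `10001+`; that format is an assumption and should be checked against PDL's documentation. `parseSizeRange` and `isConsistent` are hypothetical helpers, not part of this PR.

```typescript
// A size bucket: `max` is null for open-ended buckets such as "10001+".
interface SizeRange {
  min: number
  max: number | null
}

// Parses an assumed size string ("11-50" or "10001+") into a range.
// Returns null when the string does not match either assumed format.
function parseSizeRange(size: string): SizeRange | null {
  const openEnded = /^(\d+)\+$/.exec(size)
  if (openEnded) {
    return { min: Number(openEnded[1]), max: null }
  }
  const bounded = /^(\d+)-(\d+)$/.exec(size)
  if (bounded) {
    const min = Number(bounded[1])
    const max = Number(bounded[2])
    return min <= max ? { min, max } : null
  }
  return null
}

// Checks whether an exact employee count falls inside the size range.
function isConsistent(employees: number, size: string): boolean {
  const range = parseSizeRange(size)
  if (range === null) return false
  return employees >= range.min && (range.max === null || employees <= range.max)
}
```

Under these assumptions the two columns carry different information: `employees` is a single number, while `size` only narrows it down to a bucket.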
      type: DataTypes.ARRAY(DataTypes.JSONB),
      allowNull: true,
    },
    profiles: {
How does this differ from github, linkedin, crunchbase etc?
The profiles field holds all the social network links a company has. It is designed to hold multiple links to the same network. It is part of the requirements in the task document.
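A small sketch of why a list of profiles can hold more than the single-value github, linkedin or crunchbase columns. It assumes each profile is a URL-like string such as `linkedin.com/company/acme`; that format, the sample links, and the helpers `networkOf` and `groupProfilesByNetwork` are all hypothetical.

```typescript
// Derives a network name from a URL-like profile string by taking the
// first label of the host, e.g. "https://www.github.com/acme" -> "github".
function networkOf(profile: string): string {
  const withoutScheme = profile.replace(/^https?:\/\//, '').replace(/^www\./, '')
  const [host = ''] = withoutScheme.split('/')
  const [network = ''] = host.split('.')
  return network.toLowerCase()
}

// Groups profile links by network, keeping every link for a network.
function groupProfilesByNetwork(profiles: string[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {}
  for (const profile of profiles) {
    const network = networkOf(profile)
    const links = grouped[network] ?? []
    links.push(profile)
    grouped[network] = links
  }
  return grouped
}

// Made-up links: two LinkedIn pages and one GitHub organization.
const groupedProfiles = groupProfilesByNetwork([
  'linkedin.com/company/acme',
  'https://www.linkedin.com/company/acme-labs',
  'github.com/acme',
])
```

With a single `linkedin` column, the second LinkedIn page in this example would have nowhere to go.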
    const isValid = new Set(data.filter((org) => org.id).map((org) => org.id)).size !== data.length
    if (isValid) return [] as T

    const orgs = await options.database.organization.bulkCreate(data, {
I might be misunderstanding something, but the function name is bulkUpdate and the operation here is bulkCreate. Is this correct? If so, please add a comment explaining why 🙂
bulkCreate can be used to create or update records in bulk. It updates on conflict with a primary key or unique field when the updateOnDuplicate option is used. Comments have been added to the code to reflect this behavior.
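The update-on-conflict behaviour described here can be illustrated without a database. The following is an in-memory sketch, not Sequelize code: `OrgRow`, `copyField` and `bulkUpsert` are hypothetical stand-ins, and the `updateOnDuplicate` parameter only mimics the idea that rows with an existing primary key have just the listed fields overwritten, while rows with new keys are inserted.

```typescript
// Hypothetical row shape; `id` plays the role of the primary key.
interface OrgRow {
  id: string
  name: string
  headline: string | null
  ticker: string | null
}

// Copies one field from source to target in a type-safe way.
function copyField<K extends keyof OrgRow>(target: OrgRow, source: OrgRow, key: K): void {
  target[key] = source[key]
}

// Returns a new table: incoming rows with a new id are inserted, rows with
// an existing id only have the fields in `updateOnDuplicate` overwritten.
function bulkUpsert(
  table: OrgRow[],
  incoming: OrgRow[],
  updateOnDuplicate: Array<keyof OrgRow>,
): OrgRow[] {
  const result = table.map((row) => ({ ...row }))
  for (const row of incoming) {
    const existing = result.filter((candidate) => candidate.id === row.id)[0]
    if (existing === undefined) {
      result.push({ ...row })
      continue
    }
    for (const field of updateOnDuplicate) {
      copyField(existing, row, field)
    }
  }
  return result
}

// Made-up data: id "1" already exists, id "2" is new.
const before: OrgRow[] = [{ id: '1', name: 'Acme', headline: null, ticker: null }]
const after = bulkUpsert(
  before,
  [
    { id: '1', name: 'Renamed', headline: 'Widgets at scale', ticker: 'ACME' },
    { id: '2', name: 'Globex', headline: null, ticker: null },
  ],
  ['headline', 'ticker'],
)
```

In this sketch the existing row keeps its name because `name` is not in the update list, which is why a create-style call can sit behind a function named bulkUpdate.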
    async queryTenancyOrganizations(): Promise<IEnrichableOrganization[]> {
      const options = await SequelizeRepository.getDefaultIRepositoryOptions()
      const query = `
Same as before, let's keep queries in the repo layer. Feel free to use organizations repo.
        },
      },
    )
    return orgs.map(org => this.selectFieldsForEnrichment(org))
From PDL's docs it looks like we obtain:
- employee_count (we should have this as employees)
- location: it's an object with several interesting fields. I think we should store all of them as a JSONB. Except for the GEO code. Let's make a new column for it.
We are not currently updating employees with employee_count from PDL as it's not in the task requirement document.
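For reference, the reviewer's suggestion could be sketched roughly as follows. This is not what the PR does (as the reply notes, employees is not currently filled from employee_count). The input shape is an assumption: only `employee_count` and a `location` object with a geo code are mentioned in this thread, so the `geo` key name, the sample values, and the helper `mapEnrichment` are hypothetical.

```typescript
// Assumed subset of an enrichment payload; the `geo` key name is a guess.
interface EnrichmentPayload {
  employee_count: number | null
  location: Record<string, string | null>
}

// Target columns as suggested in the review: employees, a JSONB-style
// location object without the geo code, and a separate geo column.
interface MappedColumns {
  employees: number | null
  location: Record<string, string | null>
  geoLocation: string | null
}

function mapEnrichment(payload: EnrichmentPayload): MappedColumns {
  const location: Record<string, string | null> = {}
  let geoLocation: string | null = null
  for (const key of Object.keys(payload.location)) {
    const value = payload.location[key] ?? null
    if (key === 'geo') {
      geoLocation = value // goes to its own column
    } else {
      location[key] = value // everything else stays in the JSONB object
    }
  }
  return { employees: payload.employee_count, location, geoLocation }
}

// Made-up payload, for illustration only.
const mapped = mapEnrichment({
  employee_count: 42,
  location: { name: 'berlin, germany', country: 'germany', geo: '52.52,13.40' },
})
```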
816cab4 to 70db9ad
b27eb60 to bc6c7ce
bc6c7ce to fee472b
…n-enrichment-qa-1351-prioritize-organizations-website
…n-enrichment-qa-1351-always-fill-lastEnriched
…n-enrichment-qa-1349-fill-employees-from-employeesByCountry
…d' into feature/c-867-organization-enrichment
…loyeesByCountry' into feature/c-867-organization-enrichment
Changes proposed ✍️
What
🤖 Generated by Copilot at 5415868
This pull request implements the organization enrichment feature, which allows enriching the organization data of the users using the People Data Labs (PDL) API. It adds and updates the configuration, models, migrations, repositories, feature flags, message types, and services for the feature. It also adds a new worker function for the bulk organization enrichment operation, which is triggered by a service message of type OrganizationBulkEnrichMessage.
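The message-triggered flow in this summary, together with the per-plan monthly limit mentioned under How, can be sketched as below. Everything except the `'enrich-organizations'` service name and the `Growth` plan name is an assumption: the `tenantId` property, the `Essential` plan, the limit numbers, and the helpers `remainingEnrichments` and `dispatch` are hypothetical.

```typescript
// Hypothetical message shape: the PR says the message holds the service
// and the tenant id; the property name `tenantId` is an assumption.
interface ServiceMessage {
  service: string
  tenantId: string
}

// Hypothetical monthly limits per plan; the real values live in PLAN_LIMITS.
const ENRICHMENT_LIMITS: Record<string, number> = {
  Essential: 0,
  Growth: 200,
}

// How many enrichment operations a tenant may still run this month.
function remainingEnrichments(plan: string, usedThisMonth: number): number {
  const limit = ENRICHMENT_LIMITS[plan] ?? 0
  return Math.max(0, limit - usedThisMonth)
}

interface DispatchResult {
  handled: boolean
  budget: number
}

// Routes a service message: only 'enrich-organizations' starts the bulk
// enrichment, and it is given the tenant's remaining monthly budget.
function dispatch(message: ServiceMessage, plan: string, usedThisMonth: number): DispatchResult {
  switch (message.service) {
    case 'enrich-organizations':
      return { handled: true, budget: remainingEnrichments(plan, usedThisMonth) }
    default:
      return { handled: false, budget: 0 }
  }
}
```

Computing the budget before the worker starts keeps the plan check in one place, so the worker only needs to know how many organizations it is allowed to enrich.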
Why
How
- OrganizationEnrichmentConfiguration for the configuration object for the feature in configTypes.ts (link)
- ORGANIZATION_ENRICHMENT_CONFIG object from index.ts in the config folder, which assigns the apiKey property based on the KUBE_MODE environment variable (link, link)
- ORGANIZATION_ENRICHMENT in the PLAN_LIMITS object in isFeatureEnabled.ts, which sets the limit of organization enrichment operations per month for each plan (link)
- OrganizationBulkEnrichMessage in messageTypes.ts, which holds the service and tenant id for the bulk organization enrichment operation (link)
- OrganizationEnrichmentService in organizationEnrichmentService.ts, which provides the logic for the feature, such as validating the plan limit, fetching the data from the PDL API, and updating the database models (link)
- organizationEnrichmentTypes.ts, such as OrganizationEnrichmentData, OrganizationEnrichmentResponse, and OrganizationEnrichmentError (link)
- ORGANIZATION_ENRICHMENT in the FeatureFlag enum and a new property ORGANIZATION_ENRICHMENT_COUNT in the FeatureFlagRedisKey enum in common.ts (link, link)
- BulkorganizationEnrichmentWorker in bulkOrganizationEnrichmentWorker.ts, which performs the bulk organization enrichment operation for a given tenant using the OrganizationEnrichmentService class (link)
- workerFactory function in workerFactory.ts, which handles the 'enrich-organizations' service message and calls the BulkorganizationEnrichmentWorker function (link, link, link)
- organizations and organizationCaches tables in the database, which store the data obtained from the PDL API, such as lastEnrichedAt, employeeCountByCountry, type, ticker, headline, profiles, naics, industry, and founded (link, link, link, link)
- Organization and OrganizationCache models in the database models folder, which have the appropriate data types and constraints (link, link)
- bulkUpdate to the OrganizationRepository and OrganizationCacheRepository classes in the database repositories folder, which allow updating multiple organization instances in bulk using the bulkCreate method with the updateOnDuplicate and returning options (link, link)
Checklist ✅
Feature, Improvement, or Bug.