Skip to content

Feature/c 867 organization enrichment#825

Merged
joanagmaia merged 22 commits intomainfrom
feature/c-867-organization-enrichment
May 16, 2023
Merged

Feature/c 867 organization enrichment#825
joanagmaia merged 22 commits intomainfrom
feature/c-867-organization-enrichment

Conversation

@elayira
Copy link
Copy Markdown
Contributor

@elayira elayira commented May 2, 2023

Changes proposed ✍️

What

🤖 Generated by Copilot at 5415868

This pull request implements the organization enrichment feature, which allows enriching the organization data of the users using the People Data Labs (PDL) API. It adds and updates the configuration, models, migrations, repositories, feature flags, message types, and services for the feature. It also adds a new worker function for the bulk organization enrichment operation, which is triggered by a service message of type OrganizationBulkEnrichMessage.

🤖 Generated by Copilot at 5415868

Sing, O Muse, of the mighty pull request
That enriched the data of organizations
With the secret key of PDL, the best
Of all the APIs of the nations.

Why

How

🤖 Generated by Copilot at 5415868

  • Add a new feature for enriching organization information using the People Data Labs (PDL) API (link, link, link, link, link, link, link, link, link)
  • Define a new interface OrganizationEnrichmentConfiguration for the configuration object for the feature in configTypes.ts (link)
  • Export the ORGANIZATION_ENRICHMENT_CONFIG object from index.ts in the config folder, which assigns the apiKey property based on the KUBE_MODE environment variable (link, link)
  • Define a new property ORGANIZATION_ENRICHMENT in the PLAN_LIMITS object in isFeatureEnabled.ts, which sets the limit of organization enrichment operations per month for each plan (link)
  • Define a new type OrganizationBulkEnrichMessage in messageTypes.ts, which holds the service and tenant id for the bulk organization enrichment operation (link)
  • Define a new class OrganizationEnrichmentService in organizationEnrichmentService.ts, which provides the logic for the feature, such as validating the plan limit, fetching the data from the PDL API, and updating the database models (link)
  • Define some types for the feature in organizationEnrichmentTypes.ts, such as OrganizationEnrichmentData, OrganizationEnrichmentResponse, and OrganizationEnrichmentError (link)
  • Define a new property ORGANIZATION_ENRICHMENT in the FeatureFlag enum and a new property ORGANIZATION_ENRICHMENT_COUNT in the FeatureFlagRedisKey enum in common.ts (link, link)
  • Add a new function BulkorganizationEnrichmentWorker in bulkOrganizationEnrichmentWorker.ts, which performs the bulk organization enrichment operation for a given tenant using the OrganizationEnrichmentService class (link)
  • Add a new case to the switch statement in the workerFactory function in workerFactory.ts, which handles the 'enrich-organizations' service message and calls the BulkorganizationEnrichmentWorker function (link, link, link)
  • Add some new columns to the organizations and organizationCaches tables in the database, which store the data obtained from the PDL API, such as lastEnrichedAt, employeeCountByCountry, type, ticker, headline, profiles, naics, industry, and founded (link, link, link, link)
    • Add the corresponding migration files to the database migrations folder, which drop some unused columns and add the new columns (link, link, link, link)
    • Add the corresponding properties to the Organization and OrganizationCache models in the database models folder, which have the appropriate data types and constraints (link, link)
  • Add a new static method bulkUpdate to the OrganizationRepository and OrganizationCacheRepository classes in the database repositories folder, which allow updating multiple organization instances in bulk using the bulkCreate method with the updateOnDuplicate and returning options (link, link)

Checklist ✅

  • Label appropriately with Feature, Improvement, or Bug.
  • Add screehshots to the PR description for relevant FE changes
  • New backend functionality has been unit-tested.
  • API documentation has been updated (if necessary) (see docs on API documentation).
  • Quality standards are met.

@elayira elayira requested a review from epipav May 2, 2023 11:42
@elayira elayira added the Feature Created by Linear-GitHub Sync label May 3, 2023
@elayira elayira requested a review from joanreyero May 3, 2023 15:29
Copy link
Copy Markdown
Contributor

@joanreyero joanreyero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on this! 💪🏼 It's starting to get there.

I left a few comments of things that should be changed, and some questions 🙂

FROM "tenants"
WHERE tenants."plan" IN ('Growth')
OR (tenants."isTrialPlan" is true AND tenants."plan" = 'Growth')
;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to keep all the DB queries in the repository later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

type: DataTypes.ARRAY(DataTypes.TEXT),
allowNull: true,
},
headline: {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? And how is it different from the description? If it's semantically the same (or very similar), we should just use description.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add comments for all the new fields that are not obvious? Like the ones where I left a comment, and naics, ticker, type...

Copy link
Copy Markdown
Contributor Author

@elayira elayira May 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headline field according to PDL is a piece of brief information about the company. More detailed info can be found in the company's summary. This is what I am using to update our description field. naics is the company's classification according to the NAIC standard. ticker is the company's trading symbol. And type is the kind of entity the company is, ex an NGO. I have commented on the fields that need it.

type: DataTypes.TEXT,
allowNull: true,
},
size: {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? And how is it different from employees? If it's semantically the same (or very similar), we should use employeeCount.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

employees is the number of workers in the company. size is the well, size of the company that's small, medium, large, etc. Size is represented as a range.

type: DataTypes.ARRAY(DataTypes.JSONB),
allowNull: true,
},
profiles: {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from github, linkedin, crunchbase etc?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The profile holds all the social networks a company has. It is designed to hold multiple links to the same network. It is part of the requirement in the task document.

const isValid = new Set(data.filter((org) => org.id).map((org) => org.id)).size !== data.length
if (isValid) return [] as T

const orgs = await options.database.organization.bulkCreate(data, {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding something, but the function name is bulkUpdate and the operation here is bulkCreate. Is this correct? If so, please add a comment explaining why 🙂

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bulkCreate can be used to create/update bulk. It updates on conflict with primary key or unique field when the updateOnDuplicate args is used. Comments have been added to the code to reflect this behavior.


async queryTenancyOrganizations(): Promise<IEnrichableOrganization[]> {
const options = await SequelizeRepository.getDefaultIRepositoryOptions()
const query = `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before, let's keep queries in the repo layer. Feel free to use organizations repo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

},
},
)
return orgs.map(org => this.selectFieldsForEnrichment(org))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From PDL's docs it looks like we obtain:

  • employee_count (we should have this as employees)
  • location: it's an object with several interesting fields. I think we should store all of them as a JSONB. Except for the GEO code. Let's make a new column for it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not currently updating employees with employee_count from PDL as it's not in the task requirement document.

@elayira elayira requested a review from joanreyero May 6, 2023 01:33
@elayira elayira marked this pull request as ready for review May 8, 2023 00:17
@elayira elayira force-pushed the feature/c-867-organization-enrichment branch from 816cab4 to 70db9ad Compare May 14, 2023 18:31
@elayira elayira changed the base branch from main to staging/cycle-25 May 14, 2023 20:03
@elayira elayira changed the base branch from staging/cycle-25 to main May 14, 2023 20:07
@elayira elayira force-pushed the feature/c-867-organization-enrichment branch from b27eb60 to bc6c7ce Compare May 14, 2023 20:19
@elayira elayira force-pushed the feature/c-867-organization-enrichment branch from bc6c7ce to fee472b Compare May 14, 2023 20:24
@elayira elayira changed the base branch from main to staging/cycle-25 May 14, 2023 20:36
@elayira elayira changed the base branch from staging/cycle-25 to main May 14, 2023 20:40
@joanagmaia joanagmaia merged commit 789faf8 into main May 16, 2023
@joanagmaia joanagmaia deleted the feature/c-867-organization-enrichment branch May 16, 2023 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature Created by Linear-GitHub Sync

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants