W-13656292 feat: add import bulk/resume commands #1091
Conversation
`data import bulk` will cache the API version used when it created the job
```ts
isState: true,
filename: BulkImportRequestCache.getFileName(),
stateFolder: Global.SF_STATE_FOLDER,
ttl: Duration.days(7),
```
bulk ingest/query job results are available for 7 days after being created:
https://developer.salesforce.com/docs/atlas.en-us.252.0.salesforce_app_limits_cheatsheet.meta/salesforce_app_limits_cheatsheet/salesforce_app_limits_platform_bulkapi.htm
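For illustration, a 7-day TTL like `Duration.days(7)` above boils down to a timestamp comparison. This is a self-contained sketch, not the plugin's cache code; `isExpired` and the constant are mine:

```typescript
// Hypothetical sketch of a 7-day TTL check matching Duration.days(7) above.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

const isExpired = (createdAtMs: number, nowMs: number = Date.now()): boolean =>
  nowMs - createdAtMs > SEVEN_DAYS_MS;

// A cache entry created 8 days ago is past the Bulk API retention window.
console.log(isExpired(Date.now() - 8 * 24 * 60 * 60 * 1000)); // true
```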
```ts
jobId: string;
processedRecords?: number;
successfulRecords?: number;
failedRecords?: number;
```
JSON output:

- async: job ID
- sync: job ID and number of processed/successful/failed records
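As a sketch of those two shapes (field names are from the diff above; the id value is illustrative, using the 18-char `750` prefix job ids get):

```typescript
// Result shape from the diff above; record counts are only present for sync runs.
type DataImportBulkResult = {
  jobId: string;
  processedRecords?: number;
  successfulRecords?: number;
  failedRecords?: number;
};

// async: job ID only
const asyncResult: DataImportBulkResult = { jobId: '750000000000000AAA' };

// sync: job ID plus record counts
const syncResult: DataImportBulkResult = {
  jobId: '750000000000000AAA',
  processedRecords: 10,
  successfulRecords: 9,
  failedRecords: 1,
};

console.log(asyncResult, syncResult);
```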
```ts
});

try {
  await job.poll(5000, timeout.milliseconds);
```
5s for the polling interval; we could add a `--poll-interval` flag later if users want to customize this.
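As a generic sketch of that polling pattern (self-contained and illustrative, not the jsforce `job.poll` implementation): check on an interval until the job is done or the timeout elapses.

```typescript
// Generic interval/timeout polling loop, the same shape as job.poll(interval, timeout).
const pollUntilDone = async (
  check: () => Promise<boolean>,
  intervalMs: number,
  timeoutMs: number
): Promise<boolean> => {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true; // job reached a terminal state
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out; caller can tell the user to resume later
};
```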
```ts
jobId: jobInfo.id,
processedRecords: jobInfo.numberRecordsProcessed,
successfulRecords: jobInfo.numberRecordsProcessed - (jobInfo.numberRecordsFailed ?? 0),
failedRecords: jobInfo.numberRecordsFailed,
```
The API only gives us the total of records processed (which includes successes and failures) and the failed ones, so we need to calculate the successful quantity.
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_info.htm
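That calculation boils down to the following (a self-contained sketch; the `JobInfo` type here is trimmed to the two fields involved):

```typescript
// Trimmed-down job info: the Bulk API reports processed (success + failure) and failed.
type JobInfo = {
  numberRecordsProcessed: number;
  numberRecordsFailed?: number;
};

// Successful = processed minus failed; failed may be absent early in the job's life.
const successfulRecords = (jobInfo: JobInfo): number =>
  jobInfo.numberRecordsProcessed - (jobInfo.numberRecordsFailed ?? 0);

console.log(successfulRecords({ numberRecordsProcessed: 100, numberRecordsFailed: 7 })); // 93
```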
```ts
ms.stop('failed');
throw messages.createError(
  'error.jobFailed',
  [jobInfo.errorMessage, conn.getUsername(), job.id],
```
`jobInfo.errorMessage` is guaranteed to be present when `state=Failed`:
https://github.com/jsforce/jsforce/blob/160426335c3d6f8efd1c3244eacb0454e755c988/src/api/bulk2.ts#L802
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_info.htm
```ts
if (jobInfo.state === 'Aborted') {
  ms.stop('failed');
  // TODO: replace this msg to point to `sf data bulk results` when it's added (W-12408034)
  throw messages.createError('error.jobAborted', [conn.getUsername(), job.id], [], err as Error);
```
Aborted jobs don't include an error message, so we log the `sf org open ...` message so users can view more details in the org.
```ts
char: 'w',
unit: 'minutes',
summary: messages.getMessage('flags.wait.summary'),
defaultValue: 5,
```
`data import resume` has a default wait time of 5 minutes. This is on purpose: even if you don't specify `--wait`, you still have a good chance the job finishes in time.
`data upsert resume` and `data delete resume` default `--wait` to 0, so on a first run without `--wait` you have to re-run them with a bigger timeout if the job is still running.
```ts
char: 'i',
length: 18,
startsWith: '750',
exactlyOne: ['use-most-recent'],
```
Following @VivekMChawla's comments about specificity:
https://salesforce-internal.slack.com/archives/G02K6C90RBJ/p1722277847033949
Running `sf data import resume` with no flags is invalid; you have to pass either `--job-id` or `--use-most-recent`.
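The `length`/`startsWith` constraints above amount to something like this (an illustrative sketch, not the plugin's actual flag validator):

```typescript
// Bulk job ids are 18 characters and start with the '750' key prefix.
const isValidBulkJobId = (id: string): boolean =>
  id.length === 18 && id.startsWith('750');

console.log(isValidBulkJobId('750000000000000AAA')); // true
console.log(isValidBulkJobId('001000000000000AAA')); // false
```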
src/commands/data/import/resume.ts
```ts
flags['job-id'],
flags['use-most-recent'],
undefined,
undefined
```
The 4th arg is supposed to be `api-version`, but I didn't add that flag here because the API version used when creating the job is cached by `data import bulk`, so we don't need to pass it here.
src/commands/data/import/resume.ts
```ts
const numberRecordsFailed = data?.numberRecordsFailed ?? 0;

if (data?.numberRecordsProcessed) {
  return (data.numberRecordsProcessed - numberRecordsFailed).toString();
```
We default to 0 on L84 for failed records if no API info is available yet (this state update happens while the job is being processed), but only render the "Successful records" block if the API returns `numberRecordsProcessed`.
This makes oclif/mso render a spinner instead of 0 when there's no processed data (the first seconds of a job run, or if it failed).
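That behavior can be sketched like this (names here are illustrative; the key point is that returning `undefined` makes the stage render a spinner instead of a number):

```typescript
// Partial job data: neither count may be available early in the job's life.
type StageData = { numberRecordsProcessed?: number; numberRecordsFailed?: number };

// undefined => the multi-stage output shows a spinner for this cell;
// a string => it shows the computed "successful records" count.
const successfulRecordsCell = (data?: StageData): string | undefined => {
  const numberRecordsFailed = data?.numberRecordsFailed ?? 0;
  if (data?.numberRecordsProcessed) {
    return (data.numberRecordsProcessed - numberRecordsFailed).toString();
  }
  return undefined;
};

console.log(successfulRecordsCell({ numberRecordsProcessed: 10, numberRecordsFailed: 2 })); // 8
console.log(successfulRecordsCell(undefined)); // undefined -> spinner
```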
src/commands/data/import/resume.ts
```ts
const jobInfo = await job.check();

// send last data update so job status/num. of records processed/failed represent the last update
ms.goto('Processing the job', jobInfo);
```
If `job.poll` on L138 throws, the state rendered is from the previous poll update, so we do one last update with fresh data from the org (`job.check()`) so that the last state rendered is accurate (job state, record counters).
QA

- 🔴 bulk import with `--wait`: It works as expected but
- 🟡 bulk import: Question: is it supposed to be async by default?
- 🟢 bulk import with `--async`
- 🟢 resume an async import with `--job-id`
- 🟢 resume an async import with `--use-most-recent`
- 🟡 use invalid id for resume: I would have expected an error about the id not being valid but instead got this. Not sure if that's right or not.
- 🔴 links in a terminal that doesn't support links: you need to provide your own fallback so that terminal-link doesn't insert non-visible whitespace characters, which is what I think causes this issue. See https://github.com/salesforcecli/plugin-deploy-retrieve/blob/main/src/utils/deployStages.ts#L78
- 🟢 async insert and resume with large CSV
- 🟢 handles failures
- should be fixed now (
- yes, same as
- fixed by writing a separate cache resolver for
- added the fallback
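The fallback idea from the QA notes can be sketched without the `terminal-link` package. This is a simplified illustration: the escape sequence is the standard OSC 8 terminal hyperlink, and terminal support is reduced to a boolean here.

```typescript
// When the terminal supports hyperlinks, emit an OSC 8 link; otherwise fall
// back to plain "text: url" so no invisible escape characters leak into output.
const makeLink = (text: string, url: string, supportsLinks: boolean): string =>
  supportsLinks
    ? `\u001B]8;;${url}\u0007${text}\u001B]8;;\u0007`
    : `${text}: ${url}`;

console.log(makeLink('job', 'https://example.com', false)); // job: https://example.com
```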
@mdonnalley this is ready for review/qa again, thanks! Also, please squash-merge the PR when it's good to go (bunch of commits in my branch). EDIT:
`job.poll` is in a try/catch; when a failure happens it throws but also emits `error`. We avoid stopping MSO on `error` because we want to send a last update in the `catch` block.
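A minimal sketch of that pattern (all names here are illustrative, not the plugin's): attach a no-op `error` listener so the duplicate emission doesn't crash the process, and let the `catch` block send the final update.

```typescript
import { EventEmitter } from 'node:events';

// poll() both emits 'error' on the job and throws; the catch block owns the
// final status update, so the 'error' listener intentionally does nothing.
const runWithFinalUpdate = async (
  job: EventEmitter,
  poll: () => Promise<void>,
  finalUpdate: () => void
): Promise<Error | undefined> => {
  job.on('error', () => {}); // swallow the duplicate emission
  try {
    await poll();
    return undefined;
  } catch (err) {
    finalUpdate(); // one last render with fresh data before surfacing the error
    return err as Error;
  }
};
```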
- 🟢 bulk import with `--wait`
- 🟢 use invalid id for resume
- 🟢 links in a terminal that doesn't support links

What does this PR do?
Adds 2 new commands:

`data import bulk`: take a CSV with fields of an object and bulk insert them into the org:
happy path
Screen.Recording.2024-10-16.at.10.57.01.AM.mov
failed to import all records
Screen.Recording.2024-10-16.at.11.07.10.AM.mov
job failure
job aborted
Screen.Recording.2024-10-16.at.11.12.26.AM.mov
`data import resume`: resume an async/timed-out import
Screen.Recording.2024-10-16.at.11.17.31.AM.mov
Testing instructions

Checkout the PR locally and `sf plugins link` it. You can use the following CSV files available in plugin-data for testing:

- `test/test-files/data-project/data/bulkUpsertLarge.csv` (big)
- `test/test-files/data-project/data/bulkUpsert.csv` (smol)

What issues does this PR fix or reference?
@W-13656292@