
Bulk Uploader Design

Sandeep Dolia edited this page Apr 24, 2020 · 21 revisions

Bulk Upload Design

Introduction

As part of the bulk upload process, files (XML or CSV zip) are uploaded through the HMIS Admin application with the help of the bulk upload microservice.

Bulk Upload Service

The bulk upload service is a microservice that exposes APIs for uploading a HUD file for processing. API documentation for the bulk upload API: https://docs.hslynk.com/?urls.primaryName=Bulk%20Upload%20API

S3 Bucket

The uploaded files are saved in a private, secure S3 bucket, where the bulk upload worker process can read them.

Bulk uploader worker process

The bulk uploader worker process validates the file and persists the data into the HMIS Postgres database. It also deduplicates the data and tracks validation errors, which can be viewed in the HMIS Admin application. The workflow stages below track the status of a bulk upload. Once the data is completely loaded, it is accessible via the HMIS APIs.

Bulk Upload Workflow Stages

  • S3 = The file is about to be pushed to S3.
  • INITIAL = The HUD CSV/XML file is in S3 and is ready for the bulk upload worker.
  • INPROGRESS = Worker process is processing client records.
  • ENROLLMENT = Worker process is processing enrollment records.
  • C_CLIENT = Worker process is processing records for all child elements of client.
  • C_EMENT = Worker process is processing records for all child elements of enrollment.
  • EXIT = Worker process is processing exit records.
  • C_EXIT = Worker process is processing records for all child elements of exit.
  • DISAB = Worker process is processing all the disabilities records.
  • LIVE = Worker process has completed processing of the entire file.
  • ERROR = Worker process failed due to an invalid file format.
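A client that polls the status of an upload could model the stages above roughly as follows. This is only a minimal sketch: the stage names and descriptions come from the list above, but the `UploadStage` enum, the `describe` and `is_terminal` helpers, and the idea that LIVE and ERROR are the only final stages are assumptions made for illustration, not part of the HMIS codebase.

```python
from enum import Enum


class UploadStage(Enum):
    """Bulk upload workflow stages (names taken from the list above)."""
    S3 = "File is about to be pushed to S3"
    INITIAL = "File is in S3 and ready for the bulk upload worker"
    INPROGRESS = "Processing client records"
    ENROLLMENT = "Processing enrollment records"
    C_CLIENT = "Processing child elements of client"
    C_EMENT = "Processing child elements of enrollment"
    EXIT = "Processing exit records"
    C_EXIT = "Processing child elements of exit"
    DISAB = "Processing disabilities records"
    LIVE = "Entire file processed"
    ERROR = "Failed due to an invalid file format"


# Assumption: once an upload reaches one of these stages it no longer changes.
TERMINAL_STAGES = {UploadStage.LIVE, UploadStage.ERROR}


def is_terminal(stage: UploadStage) -> bool:
    """Return True when polling for further status changes can stop."""
    return stage in TERMINAL_STAGES


def describe(status: str) -> str:
    """Map a raw status string (e.g. "C_EMENT") to a readable message."""
    stage = UploadStage[status]  # raises KeyError for an unknown status
    return f"{stage.name}: {stage.value}"
```

For example, `describe("C_EMENT")` yields a readable label for the enrollment child-element stage, and `is_terminal(UploadStage.INPROGRESS)` is false, so a poller would keep waiting.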

Open EMPI

The worker process also makes REST calls to the "Client Dedup Microservice", which uses a locally hosted "Open EMPI" application to identify a unique client (homeless person).

HMIS Transaction Database

The database that holds the uploaded data and is accessible via the HMIS APIs.

Sync Worker

Worker process that syncs the data from the HMIS transactional database to the Big Data warehouse (HBase).

Big Data Warehouse

The data is available for reporting once it is in the Big Data warehouse. Typically, the data can be expected to reach the Big Data warehouse within 2 hours after the upload was successfully processed.

HUD versions supported

  • 4.10
  • 4.11
  • 5.1
  • 6.12
  • FY2020

Bulk Upload validations/Tracking

The HMIS bulk uploader process is designed to be fault tolerant. When a system receives data from an external source, the data is likely to contain various abnormalities. Hence the data needs to be staged and validated before processing.

  • Duplicate file validation: the process is terminated immediately when a duplicate file is received.
  • Each HMIS table has an Export_ID associated with it, which makes it easy to roll back a bulk upload if errors were encountered.
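The two points above can be illustrated with a small sketch. It makes several assumptions that are not confirmed by this page: duplicates are detected here by a SHA-256 checksum of the file contents, the table names (`bulk_upload`, `client`, `enrollment`) and helper functions are hypothetical, and an in-memory SQLite database stands in for the HMIS Postgres database so that the example is self-contained.

```python
import hashlib
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory schema in which every table carries an export_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bulk_upload (export_id INTEGER PRIMARY KEY,
                                  checksum TEXT UNIQUE, status TEXT);
        CREATE TABLE client (id INTEGER PRIMARY KEY, export_id INTEGER, name TEXT);
        CREATE TABLE enrollment (id INTEGER PRIMARY KEY, export_id INTEGER,
                                 client_id INTEGER);
    """)
    return conn


def register_upload(conn: sqlite3.Connection, data: bytes) -> Optional[int]:
    """Return a new export_id, or None when the same file was already received."""
    checksum = hashlib.sha256(data).hexdigest()  # assumed duplicate criterion
    row = conn.execute(
        "SELECT export_id FROM bulk_upload WHERE checksum = ?", (checksum,)
    ).fetchone()
    if row is not None:
        return None  # duplicate file: stop before any processing happens
    cur = conn.execute(
        "INSERT INTO bulk_upload (checksum, status) VALUES (?, 'INITIAL')",
        (checksum,),
    )
    conn.commit()
    return cur.lastrowid


def rollback_upload(conn: sqlite3.Connection, export_id: int) -> None:
    """Delete every row that belongs to one upload and mark the upload as ERROR."""
    with conn:  # one transaction: commits on success, rolls back on exception
        for table in ("enrollment", "client"):  # fixed names, children first
            conn.execute(f"DELETE FROM {table} WHERE export_id = ?", (export_id,))
        conn.execute(
            "UPDATE bulk_upload SET status = 'ERROR' WHERE export_id = ?",
            (export_id,),
        )
```

Because every row is tagged with the export_id of the upload that created it, undoing a failed upload reduces to one delete per table, with no need to track individual record IDs.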