
Bulk Uploader Design

Sandeep Dolia edited this page Apr 24, 2020 · 21 revisions

Bulk Upload Design

Introduction

HUD files (XML, or CSV packaged as a zip) can be uploaded through the HMIS Admin application, which relies on the bulk upload microservice.

Bulk Upload Service

The bulk upload service is a microservice that exposes APIs for uploading a HUD file for processing. The API documentation for the bulk upload API is available at https://docs.hslynk.com/?urls.primaryName=Bulk%20Upload%20API

S3 Bucket

The uploaded files are saved in a private, secure S3 bucket, where the bulk upload worker process picks them up.

Bulk uploader worker process

The bulk uploader worker process validates the file and persists the data into the HMIS Postgres database. It also dedupes the data and tracks validation errors, which can be viewed in the HMIS Admin application. Once the data is completely loaded, it is accessible via the HMIS APIs. The worker moves through the workflow statuses summarized below, so the progress of a bulk upload can be tracked.

Bulk Workflow Upload Status

  • S3 = The file is about to be pushed to S3.
  • INITIAL = The HUD CSV/XML file is in S3 and is ready for the bulk upload worker.
  • INPROGRESS = Worker process is processing client records.
  • ENROLLMENT = Worker process is processing enrollment records.
  • C_CLIENT = Worker process is processing records for all child elements of client.
  • C_EMENT = Worker process is processing records for all child elements of enrollment.
  • EXIT = Worker process is processing exit records.
  • C_EXIT = Worker process is processing records for all child elements of exit.
  • DISAB = Worker process is processing all disabilities records.
  • LIVE = Worker process has completed processing the entire file.
  • ERROR = Worker process failed due to an invalid file format.
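
A client that polls the upload status needs to tell a finished upload from one still in flight. The following is a minimal sketch of that check, not the service's actual code: the status names come from the list above, while `UploadStatus`, `is_finished`, and `is_queryable` are hypothetical names, and treating LIVE and ERROR as the only terminal statuses is an assumption.

```python
from enum import Enum


class UploadStatus(Enum):
    """Bulk upload workflow statuses, in the order this page lists them."""
    S3 = "S3"
    INITIAL = "INITIAL"
    INPROGRESS = "INPROGRESS"
    ENROLLMENT = "ENROLLMENT"
    C_CLIENT = "C_CLIENT"
    C_EMENT = "C_EMENT"
    EXIT = "EXIT"
    C_EXIT = "C_EXIT"
    DISAB = "DISAB"
    LIVE = "LIVE"
    ERROR = "ERROR"


# Assumption: after LIVE or ERROR the worker does no further work on the file.
TERMINAL = {UploadStatus.LIVE, UploadStatus.ERROR}


def is_finished(status: str) -> bool:
    """True when the worker has stopped, successfully or not."""
    return UploadStatus(status) in TERMINAL


def is_queryable(status: str) -> bool:
    """True only when the whole file has been loaded."""
    return UploadStatus(status) is UploadStatus.LIVE
```

An unknown status string raises `ValueError`, which is usually preferable to silently treating it as "still running".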

Open EMPI

The worker process also makes REST calls to the "Client Dedup Microservice", which uses a locally hosted "OPEN EMPI" application to identify a unique client (homeless person).

HMIS Transactional Database

The bulk uploaded data is stored in the HMIS transactional database once the upload is completed and is accessible via the APIs.

Sync Worker

A worker process that syncs the data from the HMIS transactional database to the Big Data warehouse (HBase).

Big Data Warehouse

The data is available for reporting once it reaches the Big Data warehouse. Typically, this happens within 2 hours after the upload is successfully processed.

HUD versions supported

  • 4.10
  • 4.11
  • 5.1
  • 6.12
  • FY2020

Admin application screens

  • Bulk upload screen
  • Manage bulk upload screen
  • Statistics on a bulk upload
  • Errors and Validation screen for a bulk upload

Bulk Upload Validations/Tracking

The HMIS bulk uploader process is designed to be fault-tolerant. Data received from an external source is likely to contain abnormalities, so it needs to be staged and validated before processing.

  • The file will not be processed and the bulk upload will be in an ERROR state when the file is not in the HUD-specific format.
  • Each HMIS table has an Export_ID associated with it, which makes it easy to roll back a bulk upload if errors are encountered.
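
The Export_ID rollback can be illustrated with a small sketch. It uses an in-memory SQLite database so that it runs anywhere, whereas the real data lives in Postgres. The table names, the `export_id` column name, and the `rollback_export` helper are all assumptions made for illustration, not the actual HMIS schema or code.

```python
import sqlite3

# Hypothetical subset of tables; the real HMIS schema lives in Postgres.
TABLES = ("client", "enrollment", "exit_record")


def rollback_export(conn: sqlite3.Connection, export_id: str) -> int:
    """Delete every row tagged with export_id and return how many were removed."""
    removed = 0
    with conn:  # commits on success, rolls back if any DELETE raises
        for table in TABLES:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE export_id = ?", (export_id,)
            )
            removed += cur.rowcount
    return removed


def demo() -> tuple[int, int]:
    """Load rows from two uploads, roll one back, and report the counts."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, export_id TEXT)"
        )
    conn.executemany(
        "INSERT INTO client (export_id) VALUES (?)",
        [("good",), ("bad",), ("bad",)],
    )
    conn.execute("INSERT INTO enrollment (export_id) VALUES ('bad')")
    conn.commit()
    removed = rollback_export(conn, "bad")
    remaining = conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]
    return removed, remaining
```

Because every row carries the identifier of the upload that created it, a failed upload can be undone with one filtered delete per table, and rows from other uploads are left untouched.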