DDx Pre-Normalization

Overview

One of the driving concepts behind the Democratic Data Exchange (DDx) is data normalization: the process of transforming incoming survey and field data so that its origin is obfuscated and it can be analyzed consistently, regardless of which organization collected it. Normalization is tedious and takes a large number of human hours to interpret and normalize question and response data, and incoming data cannot be ingested until the process is complete.
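To make the idea concrete, here is a minimal Go sketch of what normalizing a single question might look like. The `RawQuestion` and `NormalizedQuestion` types, the `normalize` helper, and the response vocabulary are all hypothetical and are not DDx's actual schema. The topic is passed in by hand, standing in for the choice an analyst (or, in this project, the model) would make.

```go
package main

import (
	"fmt"
	"strings"
)

// RawQuestion is a hypothetical shape for incoming survey data; the
// Organization field identifies where the data came from.
type RawQuestion struct {
	Organization string
	Text         string
	Responses    []string
}

// NormalizedQuestion is a hypothetical normalized record: the origin is
// dropped and responses are mapped onto a shared vocabulary.
type NormalizedQuestion struct {
	Topic     string
	Responses []string
}

// normalize drops the organization and maps each response through a lookup
// table. Unknown responses are kept (trimmed and lower-cased) so an analyst
// can review them later.
func normalize(q RawQuestion, topic string, vocab map[string]string) NormalizedQuestion {
	out := NormalizedQuestion{Topic: topic}
	for _, r := range q.Responses {
		key := strings.ToLower(strings.TrimSpace(r))
		if v, ok := vocab[key]; ok {
			out.Responses = append(out.Responses, v)
		} else {
			out.Responses = append(out.Responses, key)
		}
	}
	return out
}

func main() {
	q := RawQuestion{
		Organization: "Example Org",
		Text:         "Do you plan to vote in November?",
		Responses:    []string{"Yes", " Definitely ", "Not sure"},
	}
	vocab := map[string]string{"yes": "yes", "definitely": "yes", "no": "no"}
	fmt.Printf("%+v\n", normalize(q, "vote_intent", vocab))
}
```

The output record no longer carries `Organization`, which is the "obfuscate the origin" half of normalization. The vocabulary lookup is the tedious, per-question half that pre-normalization tries to automate.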

During my time at DDx I always wanted a way to "pre-normalize" data so that analysts could spend their time on verification and corner cases rather than on every single incoming question. There are a lot of cute ways to do this with Natural Language Processing, but with the recent general availability of Large Language Models (LLMs), I wanted to see if we could apply something like ChatGPT by writing a rubric, in prose, for how to handle this initial normalization.

This project is a proof-of-concept of that idea and does the following:

Ingests a JSON file of survey questions and responses
Uses a prompt template to query ChatGPT and request normalization
Transforms the returned JSON object into a Golang struct
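The first and last steps, reading the survey JSON and turning the model's JSON reply into Go structs, can be sketched with `encoding/json`. The field names (`id`, `question`, `responses`, `topic`) and the reply format are assumptions, not the project's actual schema. The ChatGPT request itself is omitted: the reply is hard-coded so the sketch runs offline.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// SurveyQuestion is an assumed shape for one entry in the input JSON file.
type SurveyQuestion struct {
	ID        string   `json:"id"`
	Question  string   `json:"question"`
	Responses []string `json:"responses"`
}

// Normalized is an assumed shape for one entry in the model's JSON reply.
type Normalized struct {
	ID        string            `json:"id"`
	Topic     string            `json:"topic"`
	Responses map[string]string `json:"responses"` // original -> normalized
}

// loadQuestions decodes the contents of the input file.
func loadQuestions(data []byte) ([]SurveyQuestion, error) {
	var qs []SurveyQuestion
	err := json.Unmarshal(data, &qs)
	return qs, err
}

// parseReply extracts the JSON array from the model's reply, which may be
// surrounded by prose, and decodes it. It assumes the first '[' starts the
// array and the last ']' ends it.
func parseReply(reply string) ([]Normalized, error) {
	start := strings.Index(reply, "[")
	end := strings.LastIndex(reply, "]")
	if start < 0 || end < start {
		return nil, errors.New("no JSON array found in model reply")
	}
	var out []Normalized
	if err := json.Unmarshal([]byte(reply[start:end+1]), &out); err != nil {
		return nil, fmt.Errorf("decoding model reply: %w", err)
	}
	return out, nil
}

func main() {
	input := []byte(`[{"id":"q1","question":"Do you plan to vote?","responses":["Yep","Nope"]}]`)
	qs, err := loadQuestions(input)
	if err != nil {
		panic(err)
	}
	fmt.Println("loaded", len(qs), "question(s)")

	// Stand-in for a real ChatGPT reply.
	reply := "Here is the result:\n[{\"id\":\"q1\",\"topic\":\"vote_intent\",\"responses\":{\"Yep\":\"yes\",\"Nope\":\"no\"}}]\nLet me know if you need anything else."
	norm, err := parseReply(reply)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", norm)
}
```

Trimming the reply to the outermost array is a defensive choice: chat models often add a sentence before or after the JSON even when asked not to.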

Normally I wouldn't bother putting a toy project like this online, but I think the concept is interesting enough, and I hope an organization might someday find it useful when trying to expedite data ingestion into DDx.

Notes

I've never used Golang before
I'm writing this on an airplane to visit family for the holidays
This is just a proof of concept of how we can use LLMs for data normalization and integrity

m-stafford/ddx-prenorm

Folders and files

Latest commit

History

Repository files navigation

DDx Pre-Normalization

Overview

Notes

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages