Takes a standard LexisNexis input file and flattens it to a data analyzable SQL table using regex and Gemini

ni-xu/document-parser

This project was made by Nicholas Xu during his research assistantship with Dr. Bobby Harris in the Georgia Tech School of Economics.

It takes as input a single file containing a large list of delimited articles, writes basic information about each article to an SQL table, then runs each entry through an LLM. The LLM results are written to a second SQL table. A random sample of articles is checked to guard against hallucination. The result is a flattened, analysis-ready version of the given articles.
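The segmentation, loading, and spot-check steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `End of Document` delimiter, the `articles` table and its columns, and the function names are all assumptions made for the example, and SQLite stands in for whichever SQL database is used.

```python
import random
import re
import sqlite3

# Assumed delimiter: each article is followed by a line reading "End of Document".
ARTICLE_END = re.compile(r"^\s*End of Document\s*$", re.MULTILINE)


def split_articles(raw: str) -> list[str]:
    """Split the raw input file into one string per article."""
    return [part.strip() for part in ARTICLE_END.split(raw) if part.strip()]


def load_articles(conn: sqlite3.Connection, articles: list[str]) -> None:
    """Store basic article information (hypothetical schema: title and body)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    rows = []
    for text in articles:
        # Treat the first line as the title and the rest as the body.
        title, _, body = text.partition("\n")
        rows.append((title.strip(), body.strip()))
    conn.executemany("INSERT INTO articles (title, body) VALUES (?, ?)", rows)
    conn.commit()


def sample_for_review(conn: sqlite3.Connection, k: int, seed=None) -> list[int]:
    """Pick up to k article ids at random for a manual hallucination check."""
    ids = [row[0] for row in conn.execute("SELECT id FROM articles ORDER BY id")]
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(k, len(ids))))
```

The sampled ids would then be compared by hand against the LLM output for the same articles.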

The process runs asynchronously. To avoid rate limits, a semaphore in batch_processor caps the number of tasks that run at the same time.
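A semaphore-bounded batch of this kind can be sketched with `asyncio` as below. The names `process_all`, `worker`, and `max_concurrent` are hypothetical and are not taken from batch_processor; `worker` stands in for whatever coroutine sends one article to the LLM.

```python
import asyncio


async def process_all(items, worker, max_concurrent=5):
    """Run worker(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_llm_call(text):
    """Placeholder for a real API request (assumption for this example)."""
    await asyncio.sleep(0.01)
    return text.upper()


if __name__ == "__main__":
    print(asyncio.run(process_all(["a", "b", "c"], fake_llm_call, max_concurrent=2)))
```

Lowering `max_concurrent` trades throughput for a smaller chance of hitting the provider's rate limit.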

By slightly adjusting the regex segmentation rules and the Gemini prompt, and adding a Gemini API key to your local machine, this script can be repurposed to process any large text file.
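One common way to keep the key on the local machine is an environment variable. The sketch below assumes a variable named `GEMINI_API_KEY` and a hypothetical helper `load_api_key`; the script itself may read the key differently.

```python
import os


def load_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key from the environment, failing early if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key
```

Reading the key at startup keeps it out of the source tree and surfaces a missing key before any articles are processed.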