GitHub - jd-coderepos/proc-tm: An annotated dataset for procedural text mining from product manuals

proc-tm: Prompts for Structured Procedure Extraction from Product Manual Specifications

About

This repository hosts an annotated dataset of prompts for structured procedure extraction, also called procedural text mining (tm), using large language models (LLMs). The source material is domain-specific product manuals: natural-language PDFs that are not machine-actionable.

Repository Structure

.
└── proc-tm                                          <- root directory of the repository
    ├── photography                                  <- data domain 1
    │   ├── manual-chunks                            <- PDF manuals split into procedure-wise page chunks
    │   │   ├── procedure[x].pdf
    │   │   └── ...
    │   ├── type1 - list prompts                     <- first prompt instruction type for procedural tm using LLMs
    │   │   ├── raw                                  <- prompt ChatGPT out-of-the-box
    │   │   │   ├── prompts.txt                      <- prompts collection
    │   │   │   ├── chatgpt-response-example[x].txt  <- response from ChatGPT
    │   │   │   ├── ...
    │   │   │   ├── goldstd-response-example[x].txt  <- human-annotated response
    │   │   │   └── ...
    │   │   ├── 2shot                                <- prompt ChatGPT in a 2-shot setting
    │   │   │   ├── prompts.txt
    │   │   │   └── chatgpt-response.txt
    │   │   ├── ontology                             <- prompt ChatGPT out-of-the-box with an ontology
    │   │   │   ├── prompts.txt
    │   │   │   ├── ...
    │   │   │   └── ...
    │   │   └── ontology+2shot                       <- prompt ChatGPT in a 2-shot setting with an ontology
    │   │       ├── prompts.txt
    │   │       ├── ...
    │   │       └── ...
    │   ├── type2 - count prompts                    <- second prompt instruction type for procedural tm using LLMs
    │   │   ├── raw                                  <- prompt ChatGPT out-of-the-box
    │   │   ├── ...
    │   │   └── ontology+2shot                       <- prompt ChatGPT in a 2-shot setting with an ontology
    │   ├── type3 - comparison prompts               <- third prompt instruction type for procedural tm using LLMs
    │   │   ├── raw                                  <- prompt ChatGPT out-of-the-box
    │   │   ├── ...
    │   │   └── ontology+2shot                       <- prompt ChatGPT in a 2-shot setting with an ontology
    │   ├── type4 - nested_proc prompts              <- fourth prompt instruction type for procedural tm using LLMs
    │   │   ├── raw                                  <- prompt ChatGPT out-of-the-box
    │   │   ├── ...
    │   │   └── ontology+2shot                       <- prompt ChatGPT in a 2-shot setting with an ontology
    │   └── type5 - sequence prompts                 <- fifth prompt instruction type for procedural tm using LLMs
    │       ├── raw                                  <- prompt ChatGPT out-of-the-box
    │       ├── ...
    │       └── ontology+2shot                       <- prompt ChatGPT in a 2-shot setting with an ontology
    ├── agriculture                                  <- data domain 2
    ├── medicine                                     <- data domain 3
    ├── manufacturing                                <- data domain 4
    ├── ontology-procedure.ttl                       <- the procedural ontology structure
    ├── rouge_scorer.py
    └── README.md                                    <- README file for documenting the dataset
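The per-setting layout lends itself to pairing each ChatGPT response with its human-annotated counterpart. The following is a minimal sketch, not code from this repository. It assumes that `[x]` in the file names stands for an example identifier (so `chatgpt-response-example1.txt` pairs with `goldstd-response-example1.txt`), and the helper names `pair_responses` and `unigram_f1` are hypothetical. The score is a simplified ROUGE-1-style unigram F1 with naive tokenization; it is a stand-in and does not reproduce the repository's `rouge_scorer.py`, whose behavior is not documented here.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: "[x]" in the file names is an example identifier, e.g.
# chatgpt-response-example1.txt pairs with goldstd-response-example1.txt.
PAIR_PATTERN = re.compile(r"(chatgpt|goldstd)-response-example(\w+)\.txt$")


def pair_responses(setting_dir):
    """Hypothetical helper: map example id -> (chatgpt_path, goldstd_path)."""
    found = {}
    for path in Path(setting_dir).glob("*-response-example*.txt"):
        match = PAIR_PATTERN.search(path.name)
        if match:
            source, example_id = match.groups()
            found.setdefault(example_id, {})[source] = path
    # Keep only examples that have both a model response and a gold standard,
    # ordered by the id string.
    return {
        example_id: (files["chatgpt"], files["goldstd"])
        for example_id, files in sorted(found.items())
        if "chatgpt" in files and "goldstd" in files
    }


def unigram_f1(prediction, reference):
    """Simplified ROUGE-1-style F1 from clipped unigram overlap."""
    pred_tokens = Counter(re.findall(r"\w+", prediction.lower()))
    ref_tokens = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((pred_tokens & ref_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    setting = Path("photography") / "type1 - list prompts" / "raw"
    for example_id, (model_file, gold_file) in pair_responses(setting).items():
        score = unigram_f1(model_file.read_text(), gold_file.read_text())
        print(f"example {example_id}: unigram F1 = {score:.3f}")
```

Run from the repository root, this prints one score per paired example in the `raw` setting of the first photography prompt type; unpaired files are skipped. For numbers comparable with published ROUGE results, a standard ROUGE implementation with its own tokenizer and stemming should replace `unigram_f1`.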
