wikiverify

This project explores using LLMs to verify that a cited source supports a passage of text in tertiary-source research material such as Wikipedia articles. Many Wikipedia articles cite reliable sources, but do not include the supporting passage of text directly in their reference. Finding the direct supporting material can be tedious when the source is a book, academic paper, or even a news article.

The core idea of the project is to embed chunks of the source text as vectors, then use similarity search to find the chunks most relevant to the passage being verified. Feeding these chunks as context to a generative model (GPT) can yield a concise supporting passage or determine that the source contains no directly supporting passage.
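The retrieval step can be illustrated with a minimal, dependency-free sketch. It is not the notebook's implementation: `embed` here is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and the names `top_k_chunks` and `build_prompt` and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(claim, chunks, k=2):
    """Return the k (score, chunk) pairs most similar to the claim."""
    claim_vec = embed(claim)
    scored = [(cosine_similarity(claim_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(claim, scored_chunks):
    """Assemble a hypothetical prompt for the generative model (assumed wording)."""
    context = "\n---\n".join(chunk for _, chunk in scored_chunks)
    return (
        "Quote the passage from the context that directly supports the claim, "
        "or answer 'no supporting passage'.\n\n"
        f"Claim: {claim}\n\nContext:\n{context}"
    )


chunks = [
    "The bridge opened to traffic in 1937 after four years of construction.",
    "Local cuisine features seafood and sourdough bread.",
    "Construction of the bridge began in 1933.",
]
claim = "The bridge opened in 1937."
results = top_k_chunks(claim, chunks, k=2)
prompt = build_prompt(claim, results)
```

In a real pipeline the count vectors would be replaced by model embeddings and the sorted list by a vector index query; the ranking logic stays the same.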

Using recursive chunking, even sources containing millions of tokens can be efficiently searched.
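As a rough illustration of recursive chunking, the sketch below splits text on progressively finer separators until every chunk fits a size limit, then greedily re-merges neighbouring pieces. It is an assumption-laden sketch rather than the project's code: sizes are measured in characters instead of tokens, and the function names, separator list, and `max_chars` default are hypothetical. LangChain provides a `RecursiveCharacterTextSplitter` that works along similar lines.

```python
def merge_pieces(pieces, sep, max_chars):
    """Greedily join adjacent pieces with sep while staying within max_chars."""
    merged = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged


def recursive_chunks(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars characters.

    Coarse separators (paragraphs) are tried first; finer ones (lines,
    sentences, words) are used only for pieces that are still too long.
    Separators that fall on a chunk boundary are dropped.
    """
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        pieces = []
        for part in text.split(sep):
            pieces.extend(recursive_chunks(part, max_chars, separators[i + 1:]))
        return merge_pieces(pieces, sep, max_chars)
    # No separator left: fall back to a hard split by character count.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]


text = (
    "Wikipedia articles cite sources. Sources can be long. "
    "Finding the supporting passage by hand is tedious.\n\n"
    "Chunking splits a source into pieces. Each piece is embedded and searched."
)
chunks = recursive_chunks(text, max_chars=80)
```

Splitting on paragraph and sentence boundaries first tends to keep each chunk semantically coherent, which matters because each chunk is embedded as a single vector.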

Roadmap:

Use a similarity score cutoff to determine chunks in context
Explore different semantic chunking methods for large sources
Adverse testing: test related but unsupported claims
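One possible shape for the similarity score cutoff is sketched below. Since this is a roadmap item, nothing here reflects existing project code: `select_context`, the `min_score` threshold of 0.5, and the `max_chunks` cap are hypothetical, and the scores in the example are made-up illustrative values, not measurements.

```python
def select_context(scored_chunks, min_score=0.5, max_chunks=5):
    """Keep only chunks whose similarity score reaches min_score.

    scored_chunks is a list of (score, chunk) pairs. An empty result
    signals that no chunk is similar enough to be worth sending to the
    generative model, which is the expected outcome for a related but
    unsupported claim.
    """
    kept = [(score, chunk) for score, chunk in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_chunks]


# Illustrative (made-up) scores for a supported claim.
supported = [(0.82, "chunk A"), (0.31, "chunk B"), (0.57, "chunk C")]
context = select_context(supported)

# Illustrative (made-up) scores for a related but unsupported claim.
unsupported = [(0.41, "chunk A"), (0.22, "chunk B")]
no_context = select_context(unsupported)
```

A suitable threshold would have to be tuned empirically, for example against the adverse test cases listed above, because score distributions differ between embedding models.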

Requirements

langchain v0.0.354
openai v1.6.1
pinecone-client v3.1.0
tiktoken v0.5.2

jsteng19/wikiverify
