A tool that converts documents to clean one sentence per line text files ready for NLP and Generative AI processing

ptarau/sentify

sentify is a simple, fast, open-source Python toolkit that combines into a single step the tedious tasks of fetching documents, converting them to text, and segmenting them into clean, one-sentence-per-line text files.

I put it together because this kind of preprocessing is an often unavoidable "stepping stone" on the way to the really interesting NLP and AI tasks we care about these days.

The collected clean sentences are ready for NLP and ML tasks, including passing them to Generative AI for summarization, relation extraction and QA.
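To make the "one sentence per line" format concrete, here is a minimal sketch that uses only the Python standard library. It is not sentify's own segmenter: the function names `naive_sentences` and `to_sentence_lines` are hypothetical, and the regex rule (split after `.`, `!` or `?` when an uppercase letter follows) is an assumption chosen for brevity.

```python
import re


def naive_sentences(text: str) -> list[str]:
    """Split text into sentences with a naive regex (illustration only)."""
    # Collapse all whitespace runs, including stray line breaks that
    # often survive PDF-to-text conversion, into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    # Split at whitespace that follows ., ! or ? and precedes an
    # uppercase letter. Abbreviations such as "Dr. Smith" will be
    # split incorrectly; a real segmenter handles those cases.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", flat)


def to_sentence_lines(text: str) -> str:
    """Return the text with one sentence per line."""
    return "\n".join(naive_sentences(text))


if __name__ == "__main__":
    sample = "Hello world. This is\n a test! Is it ok?"
    print(to_sentence_lines(sample))
```

A file in this format is easy to consume downstream: each line can be passed to a model or tool independently, with no further parsing.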

It handles local and remote txt and pdf files, URLs, and Wikipedia pages given by their title.
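As a rough illustration of how one entry point could tell these input kinds apart, the sketch below classifies a source string. This is an assumption about a possible design, not a description of how sentify dispatches internally: the function `classify_source` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Guess the kind of document a source string refers to (hypothetical)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Remote source: look at the URL path to spot pdf or txt files.
        path = parsed.path.lower()
        if path.endswith(".pdf"):
            return "remote_pdf"
        if path.endswith(".txt"):
            return "remote_txt"
        return "web_page"
    # Not a web URL: treat it as a local file if the extension matches.
    lower = source.lower()
    if lower.endswith(".pdf"):
        return "local_pdf"
    if lower.endswith(".txt"):
        return "local_txt"
    # Anything else is assumed to be a Wikipedia page title.
    return "wikipedia_title"


if __name__ == "__main__":
    for s in ("https://example.com/paper.pdf", "notes.txt", "Alan Turing"):
        print(s, "->", classify_source(s))
```

For the actual behavior, see the API in sentify/main.py and the use cases in tests/tests.py.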

See code at

https://github.com/ptarau/sentify/blob/main/sentify/main.py

for the simple, all-in-one API.

Get it from GitHub or install it from PyPI with

pip3 install sentify

See tests/tests.py for testing out the API on several use cases.

Enjoy,

Paul Tarau

January, 2024
