Thank you for making this awesome library. I am trying to build a Bengali tafsir reader using your repository.
Here is the code that I tried in Colab:
!pip install gTTS
#!pip install PyPDF2
!pip install playsound
!pip install multilingual-pdf2text==1.1.0
!apt install tesseract-ocr
!apt install libtesseract-dev
!apt-get install poppler-utils
!apt-get install tesseract-ocr-ara
!apt-get install tesseract-ocr-ben
from multilingual_pdf2text.pdf2text import PDF2Text
from multilingual_pdf2text.models.document_model.document import Document
import logging

logging.basicConfig(level=logging.INFO)

def main():
    ## create document for extraction with configurations
    pdf_document = Document(
        document_path='/content/tafsir.pdf',
        language='ben'
    )
    pdf2text = PDF2Text(document=pdf_document)
    content = pdf2text.extract()
    for page in content:
        print(page['text'])

if __name__ == "__main__":
    main()
It takes a long time and appears to hang after printing this:
INFO:multilingual_pdf2text.doc2img.parse_document:Parsing document from pdf to image
INFO:multilingual_pdf2text.ocr.image_to_text:Extracting text from images via OCR
After a few minutes Colab crashes. It seems the notebook exhausts all of Colab's available RAM and is then killed.
The PDF book that I am trying to read with this library is written in Bangla and Arabic. Here is the link to it: https://i-onlinemedia.net/downloads/books/quran-tafsir/tafsir_ibn_kasir/Tafsir_Ibn_Kasir_Part-1-2-3.pdf
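Since the crash looks like memory exhaustion, one possible workaround is to convert and OCR only a few pages at a time instead of the whole book at once. The sketch below is an assumption and not part of multilingual-pdf2text: it calls pdf2image and pytesseract directly (both assumed to be installed alongside poppler and the Tesseract language packs above), and the names page_batches and ocr_pdf_in_batches, the batch size, and the dpi value are hypothetical choices for illustration. It has not been run against this PDF.

```python
def page_batches(total_pages, batch_size):
    """Yield 1-based, inclusive (first_page, last_page) ranges covering the document."""
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def ocr_pdf_in_batches(pdf_path, lang="ben+ara", batch_size=5, dpi=200):
    """Yield (page_number, text) pairs, rendering only one batch of pages at a time."""
    # Imported lazily so page_batches can be used without these packages installed.
    from pdf2image import convert_from_path, pdfinfo_from_path
    import pytesseract

    total_pages = pdfinfo_from_path(pdf_path)["Pages"]
    for first, last in page_batches(total_pages, batch_size):
        # Only the pages in this batch are rendered to images.
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for offset, image in enumerate(images):
            # "ben+ara" asks Tesseract to use both language models (assumed installed).
            yield first + offset, pytesseract.image_to_string(image, lang=lang)
        # Drop the reference so the batch can be garbage collected before the next one.
        del images


# Hypothetical usage in Colab:
# for page_number, text in ocr_pdf_in_batches("/content/tafsir.pdf"):
#     print(page_number, text)
```

A smaller batch_size or a lower dpi should reduce peak memory at the cost of speed or OCR accuracy. Whether this avoids the crash for a book of this size is untested.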