Linux Hacks

NB: I keep experimenting and post the code snippets here whenever I find a process useful. Packages may become outdated, or developers may change how things work, so try these; if something fails, debug it. Thanks!

1. OCR from PDF (PDF to TIFF to TXT)

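Install ImageMagick (it needs Ghostscript to read PDFs) and Tesseract. These are system packages, not pip packages; on Debian/Ubuntu:

sudo apt install imagemagick ghostscript tesseract-ocr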

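Run this to convert the PDFs into 300 DPI TIFF images, which keeps the resolution high enough for OCR. The %05d in the output name numbers the pages 00000.tiff, 00001.tiff, and so on, so they sort in page order. (ImageMagick 7 prefers magick over convert.)

convert -density 300 *.pdf -depth 8 -strip -background white -alpha off %05d.tiff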

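Extract the text into a text file. Tesseract takes an input image and an output base name (it appends .txt itself); -l eng selects English.

tesseract filename.tiff outtext -l eng   # single file, writes outtext.txt

for i in *.tiff; do tesseract "$i" stdout -l eng >> outtext.txt; done   # multiple files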
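The steps above can be wrapped into one reusable function. This is a minimal sketch, assuming Bash, ImageMagick (with Ghostscript) and the tesseract-ocr English data are installed; pdf_to_text is a hypothetical name, not part of either package. It OCRs one PDF at a time into a matching .txt file, keeps the page images in a temporary folder, and uses magick when ImageMagick 7 is present. On some Debian/Ubuntu releases ImageMagick's security policy (policy.xml) refuses to read PDFs; if the conversion fails with a security-policy error, that setting is the usual cause.

```shell
#!/usr/bin/env bash
# Sketch: OCR one PDF into <name>.txt via PDF -> TIFF pages -> text.
# pdf_to_text is a hypothetical helper name (assumption, not a real tool).
pdf_to_text() {
  local pdf="$1"
  [ -f "$pdf" ] || { echo "pdf_to_text: no such file: $pdf" >&2; return 1; }

  local out="${pdf%.pdf}.txt"
  local tmpdir page
  tmpdir="$(mktemp -d)" || return 1

  # ImageMagick 7 ships `magick`; ImageMagick 6 only has `convert`.
  local im=convert
  command -v magick >/dev/null 2>&1 && im=magick

  # Rasterise every page at 300 DPI into a zero-padded sequence,
  # so the glob below visits the pages in order.
  "$im" -density 300 "$pdf" -depth 8 -strip -background white -alpha off \
    "$tmpdir/page-%05d.tiff" || { rm -rf "$tmpdir"; return 1; }

  # OCR each page in order and collect the text in one file.
  : > "$out"
  for page in "$tmpdir"/page-*.tiff; do
    tesseract "$page" stdout -l eng >> "$out"
  done

  rm -rf "$tmpdir"
}

# Usage: OCR every PDF in the current folder.
# for f in *.pdf; do pdf_to_text "$f"; done
```

Writing one .txt per PDF keeps the output of different documents apart, which a single shared outtext file does not.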

2. Clone an Entire Website using Wget

wget --mirror --convert-links --wait=2 https://example.com/
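For a copy that also browses well offline, a few more standard wget flags help. This is a minimal sketch; mirror_site is a hypothetical wrapper name, and --page-requisites, --adjust-extension and --no-parent are optional additions to the original command, not part of it.

```shell
#!/usr/bin/env bash
# Sketch: the wget mirror command plus flags for offline browsing.
# mirror_site is a hypothetical helper name (assumption).
mirror_site() {
  local url="${1:-}"
  [ -n "$url" ] || { echo "usage: mirror_site URL" >&2; return 2; }

  # --mirror            recursion with infinite depth plus timestamping
  #                     (same as -r -N -l inf --no-remove-listing)
  # --convert-links     rewrite links so the local copy works offline
  # --page-requisites   also fetch the CSS, images and scripts pages need
  # --adjust-extension  save HTML pages with a .html suffix
  # --no-parent         never climb above the starting directory
  # --wait=2            pause 2 seconds between requests
  wget --mirror --convert-links --page-requisites --adjust-extension \
       --no-parent --wait=2 "$url"
}

# Usage:
# mirror_site https://example.com/
```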

3. Concatenate PDFs in a Folder

sudo apt install pdftk

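pdftk *.pdf cat output output.pdf

The files are merged in the shell's alphabetical glob order (so 10.pdf comes before 2.pdf), and if output.pdf is already in the folder from an earlier run, *.pdf will pull it into the merge; remove it first or write the result to another folder.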
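A slightly safer variant, sketched under the assumption of Bash 4.4+ (for mapfile -d) and GNU sort: it merges in natural order so 2.pdf comes before 10.pdf, and skips the output file so re-running it never merges the previous result into itself. merge_pdfs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: merge the PDFs in the current folder in natural order.
# merge_pdfs is a hypothetical helper (assumes bash 4.4+ and GNU sort).
merge_pdfs() {
  local out="${1:-output.pdf}" f
  local -a files=()

  for f in *.pdf; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal
    [ "$f" = "$out" ] && continue    # skip the merged file from a previous run
    files+=("$f")
  done
  [ "${#files[@]}" -gt 0 ] || { echo "merge_pdfs: no PDFs to merge" >&2; return 1; }

  # Version sort compares digit runs numerically; NUL separators keep
  # file names with spaces intact.
  mapfile -d '' files < <(printf '%s\0' "${files[@]}" | sort -zV)

  pdftk "${files[@]}" cat output "$out"
}

# Usage (writes output.pdf in the current folder):
# merge_pdfs
```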
