Linux Hacks

NB: I keep trying things out and post the code snippet here when I find the process useful. Packages may expire or developers may update the method. So, try; if it fails, debug. Thanks!

1. OCR from PDF using TIFF2TXT
  1. Install ImageMagick and Tesseract (both are system packages, not pip packages).
sudo apt install imagemagick tesseract-ocr
  2. Convert the PDFs into .tiff files at 300 DPI to keep the resolution intact.
convert -density 300 *.pdf -depth 8 -strip -background white -alpha off %05d.tiff
  3. Extract the text into a text file.
tesseract filename.tiff stdout -l eng > outtext # single file
for i in *.tiff; do tesseract "$i" stdout >> outtext; done # multiple files
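
The steps above could be wrapped into two small shell functions, roughly like this. This is only a sketch: `pdf_to_tiffs` and `ocr_dir` are made-up helper names, the per-PDF output prefix is my own choice (so pages from different PDFs do not share one number sequence), and it assumes the `convert` and `tesseract` commands are on the PATH.

```shell
# Hypothetical helper: render every PDF in a directory to numbered TIFF pages.
pdf_to_tiffs() {
    for pdf in "${1:-.}"/*.pdf; do
        [ -e "$pdf" ] || continue       # the glob matched nothing
        base=${pdf%.pdf}                # strip the extension for the prefix
        # Same options as the one-liner above; %05d is the page number.
        convert -density 300 "$pdf" -depth 8 -strip \
                -background white -alpha off "${base}-%05d.tiff"
    done
}

# Hypothetical helper: OCR every TIFF in a directory into one text file.
ocr_dir() {
    dir=${1:-.}                         # directory holding the .tiff pages
    out=${2:-outtext}                   # combined text file
    : > "$out"                          # start from an empty output file
    for page in "$dir"/*.tiff; do
        [ -e "$page" ] || continue      # the glob matched nothing
        # "stdout" as the output base makes tesseract print the text.
        tesseract "$page" stdout -l eng >> "$out"
    done
}
```

Something like `pdf_to_tiffs . && ocr_dir . outtext` would then run both steps in the current folder.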

2. Clone an Entire Website using Wget
wget --mirror --convert-links --wait=2 https://example.com/
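
If the mirrored pages come out without images or CSS, a slightly longer variant may help. This is a sketch, not the only way to do it: `mirror_site` is a made-up wrapper name, and the extra flags (`--page-requisites`, `--adjust-extension`, `--no-parent`) are standard GNU wget options that I am assuming your wget build supports.

```shell
# Hypothetical wrapper around the wget one-liner above.
#   --page-requisites   also fetch the images, CSS and scripts each page needs
#   --adjust-extension  save HTML pages with a .html suffix
#   --no-parent         never climb above the starting directory
#   --wait=2            pause two seconds between requests, as above
mirror_site() {
    url=$1
    wget --mirror \
         --convert-links \
         --page-requisites \
         --adjust-extension \
         --no-parent \
         --wait=2 \
         "$url"
}
```

Usage would be `mirror_site https://example.com/`.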

3. Concat PDFs in a folder
sudo apt install pdftk
pdftk *.pdf cat output output.pdf
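
Two things can bite with the plain glob: `*.pdf` expands in lexicographic order, so page10.pdf gets merged before page2.pdf, and on a second run output.pdf itself matches the glob. A possible workaround is sketched below. `concat_pdfs` is a made-up helper name, and it assumes GNU `sort` (for `-V`) and file names without newlines.

```shell
# Hypothetical helper: merge the PDFs in the current folder in natural order.
# The body runs in a subshell so the IFS and globbing changes do not leak.
concat_pdfs() (
    out=${1:-output.pdf}
    # List the PDFs, drop the output file itself, and sort "naturally"
    # so that page2.pdf comes before page10.pdf.
    list=$(printf '%s\n' *.pdf | grep -vxF -e "$out" | sort -V)
    set -f                  # no globbing while the list is split
    IFS='
'
    set -- $list            # one positional parameter per file name
    [ "$#" -gt 0 ] && [ -e "$1" ] || { echo "no PDFs to merge" >&2; exit 1; }
    pdftk "$@" cat output "$out"
)
```

Running `concat_pdfs` writes output.pdf; `concat_pdfs merged.pdf` picks another name.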