Topic wise PDFs of Geeks for Geeks articles. (Last updated in October 2018)

stjordanis/geeksforgeeks.pdf

 
 

Geeks for Geeks PDFs

Table of Contents of the Dynamic Programming Book.

Download the PDFs from the releases page.

I started in 2015 from @gnijuohz's repo, but now (in 2018) I've re-written pretty much every part of the process.

Dependencies

  • docopt

    • Basic CLI in scripts
  • requests & requests_cache

    • To download pages and cache the result locally
  • lxml

    • Cleaning of the downloaded pages
  • pandoc & xelatex

    • Convert the cleaned pages to PDF

Running the code

  1. First, find the "topic URL" for what you want to download, e.g.:

    • https://www.geeksforgeeks.org/tag/samsung/
    • https://www.geeksforgeeks.org/category/dynamic-programming/
  2. Create a JSON file containing links to all posts on that topic

    • python3.6 list_links.py https://www.geeksforgeeks.org/tag/samsung/

    • This JSON can now be edited by hand, to remove some links, re-order them etc.

  3. Now fetch the actual posts

    • python3.6 download_html.py JSON/Samsung.json
  4. Finally, convert the HTML to a PDF using Pandoc

    • python3.6 html_to_pdf.py HTML/Samsung.html
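The three commands above can also be chained from a small driver script. The following is only a sketch: the script names and the `JSON/<Topic>.json` and `HTML/<Topic>.html` paths are taken from the examples above, while `pipeline_commands`, `run_pipeline`, and the `topic_name` parameter are hypothetical helpers that are not part of this repo. It assumes each script writes its output where the next step expects it, as the Samsung example suggests.

```python
import subprocess


def pipeline_commands(topic_url, topic_name, python="python3.6"):
    """Build the three pipeline commands for one topic (hypothetical helper).

    Assumes the output paths follow the JSON/<Topic>.json and
    HTML/<Topic>.html pattern shown in the README examples.
    """
    return [
        [python, "list_links.py", topic_url],
        [python, "download_html.py", f"JSON/{topic_name}.json"],
        [python, "html_to_pdf.py", f"HTML/{topic_name}.html"],
    ]


def run_pipeline(topic_url, topic_name):
    """Run each step in order and stop at the first failing step."""
    for cmd in pipeline_commands(topic_url, topic_name):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)


# Example call (not executed here):
# run_pipeline("https://www.geeksforgeeks.org/tag/samsung/", "Samsung")
```

Running all steps in one go skips the hand-editing of the JSON file in step 2, so this is only useful when the link list needs no changes.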

Things will work only if you're really lucky. This project has taught me how fragile my HTML to PDF pipeline really is. There are just too many things that can go wrong.

What could go wrong

  • The PDF engine that pandoc calls may fail!
    • In which case, you should convert the HTML to TeX
    • Then run pandoc on the TeX file in verbose mode
    • and manually fix the TeX file
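The recovery steps above might look like the following sketch. It only builds the two pandoc invocations; `debug_commands` is a hypothetical helper, and the exact options used by `html_to_pdf.py` may differ. The `--standalone`, `--pdf-engine`, and `--verbose` flags are standard pandoc options (`--pdf-engine` assumes pandoc 2.0 or later).

```python
from pathlib import Path


def debug_commands(html_path, engine="xelatex"):
    """Return pandoc commands for a manual HTML -> TeX -> PDF round trip.

    Hypothetical helper: the .tex and .pdf files are placed next to the
    input HTML file, which is an assumption, not the repo's behavior.
    """
    html = Path(html_path)
    tex = html.with_suffix(".tex")
    pdf = html.with_suffix(".pdf")
    return [
        # Step 1: produce a standalone TeX file that can be edited by hand
        ["pandoc", str(html), "--standalone", "-o", str(tex)],
        # Step 2: compile the TeX file verbosely to see the engine's errors
        ["pandoc", str(tex), f"--pdf-engine={engine}", "--verbose", "-o", str(pdf)],
    ]


for cmd in debug_commands("HTML/Samsung.html"):
    print(" ".join(cmd))
```

After fixing the TeX file by hand, rerunning only the second command avoids overwriting the manual edits.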

Topic URLs

List of topic URLs that I've fetched. You can download these from the releases page.

Algorithms

Data Structures

Companies
