Skip to content
master
Switch branches/tags
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 

README.md

GetShorty

by Jake Kara jake@jakekara.com

"Brute force" crawler of url shortening services.

  • GetShorty.py is the library;
  • brute.py is a sample command line tool using GetShorty.py

Basic crawling

To start crawling bitly two-character URLs (and storing in out.json cache file):

 $ python brute.py

If you want to crawl longer URLs, the code can be modified. I didn't bother making it a command line argument.

Note that if you have the out.json file from this repo in the local directory, it won't need to crawl anything, and it'll just tell you that each URL is cached and then exit.

TSV output

To output the cached URLs to in TSV format (like the one in sample_output folder of this repo), use:

   $ python brute.py --csv

To output the TSV to a file:

  $ python brute.py --csv out-file.tsv

Yes, I called it --csv, but it's really in TSV format (tab-separated, not commas). You'll get over it.

Why?

This is just a "proof of concept" of an idea that I've read about a lot lately. I was just curious about how difficult it would be. No real reason.

About

POC for "brute force" crawling url shortening services (bitly by default, but customizable)

Resources

License

Releases

No releases published

Packages

No packages published

Languages