
Exercise: Shell scripts

Saagar Deshpande edited this page Sep 25, 2013 · 4 revisions

Building our Amazon image web scraper

Let's build our scraper that we introduced at our very first meeting! Recall (this command is also located in scraper/

cutpoint=$(echo "" | wc -m | grep '[0-9]\{1,\}' --only-matching); mkdir -p images; curl --silent\&field-keywords\=ocaml | grep '[0-9A-Za-z\.\_\,\%\\-]\{0,\}.jpg' --only-matching | while read image; do suffix=$(echo $image | cut -c $cutpoint-); wget $image -O images/$suffix; done

Let's break this down piece-by-piece and rewrite this more elegantly as a shell script!
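As a rough idea of what "piece-by-piece" could look like, here is a minimal sketch that splits the one-liner into small functions. It is not the exercise solution: `SEARCH_URL` is a hypothetical placeholder, the function names are invented for illustration, and it names files with `basename` instead of the `cut`-based prefix stripping used in the original command.

```shell
#!/bin/sh
# Hypothetical placeholder URL; NOT the search URL used in the exercise.
SEARCH_URL="${SEARCH_URL:-http://example.com/search?q=ocaml}"

# Print every .jpg URL found in the HTML read from stdin, one per line.
extract_image_urls() {
    grep -Eo 'https?://[^" ]+\.jpg'
}

# Read image URLs from stdin and save each one under images/,
# named after the last path component of the URL.
download_images() {
    mkdir -p images
    while read -r image; do
        wget --quiet "$image" -O "images/$(basename "$image")"
    done
}

# Usage (performs network requests, so it is left commented out):
# curl --silent "$SEARCH_URL" | extract_image_urls | download_images
```

Keeping each stage in its own function makes it possible to test the extraction step on a saved HTML file before downloading anything.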

Finding sources to scrape

How did I know where to scrape images from? Go to Amazon, search for ocaml, then right-click a result image and click "Inspect Element":

[Screenshot: inspect element]

This should open the web inspector with the corresponding HTML highlighted.


If you look more carefully, you'll see the source of the image:


It's! If you poke around at the other images, you'll see that they all start with this same prefix as well.
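The shared prefix is what the `cutpoint` variable in the one-liner is for. The sketch below shows the idea with a made-up prefix (`http://example.com/images/` is an assumption for illustration, as is the `strip_prefix` helper name): measure the prefix once, then cut it off every image URL to get a file name.

```shell
#!/bin/sh
# Hypothetical prefix for illustration; the real one is whatever the web inspector shows.
prefix="http://example.com/images/"

# wc -m also counts the trailing newline that echo adds, so the result is
# (length of prefix + 1): the 1-based position of the first character after the prefix.
# Some wc implementations pad the number with spaces; grep -o keeps only the digits.
cutpoint=$(echo "$prefix" | wc -m | grep -o '[0-9]\{1,\}')

# Print the part of a URL that follows the prefix.
strip_prefix() {
    echo "$1" | cut -c "$cutpoint"-
}

strip_prefix "http://example.com/images/cat.jpg"
```

With this prefix the last line prints `cat.jpg`. The POSIX parameter expansion `${image#"$prefix"}` would give the same result without spawning `wc`, `grep`, or `cut`.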

Build the scraper

Make sure you have the bootcamp code repo (same as the scavenger hunt exercise).

git clone git://
cd bootcamp-unix/exercise-scraper

Open up the file in your favorite text editor, and then follow the instructions in the file to finish building the scraper.

When you think you're done, run the scraper ./, and this should create an images directory with the images! To remove the images:

rm -rf images

The solution is located in ./ To see the original scraper from the first week and the prettified version of it:

cd ../scraper

There should be two files (the original version in the demo) and (a more readable version of it).

Finish bootcamp

Go back to the main page.