# CTRL-F
This notebook uses EDGI's `wayback` Python [package](https://www.github.com/edgi-govdata-archiving/wayback) to access snapshots of webpages from the Internet Archive's [Wayback Machine](https://www.archive.org) and count mentions of keywords or terms on those pages. It's like doing a Ctrl-F search, but across archived copies of several pages and for several terms at once!

## First, we have to load some extra code to help us out
Run the cell below by clicking the "Play" button.

In [None]:
# Requirements
import ipywidgets as widgets
from datetime import datetime

# EDGI's web monitoring scripts
!pip install wayback &>/dev/null;
from wayback import WaybackClient

## Next, we need to know what term(s) you want to search for, on what page(s), and between what dates.
We will find the most recent copy of each page (if available) within the timeframe you specify and count how often each key term appears on it.

In [None]:
# Parameters
term_widget = widgets.Text(
    value = 'Climate Change, Climate, Energy Independence',
    placeholder='Climate Change, Climate, Energy Independence',
    disabled=False
)
print("What terms do you want to search for? Separate these by commas, like so: Climate Change, Climate, Energy Independence")
display(term_widget)

page_widget = widgets.Text(
    value = 'epa.gov, epa.gov/climatechange',
    placeholder='epa.gov, epa.gov/climatechange',
    disabled=False
)
print("On what pages do you want to search for these terms? Separate these by commas and a space, like so: epa.gov, epa.gov/climatechange")
display(page_widget)

print("We need a timeframe to search for these terms on these pages. We will start with the most recent date you pick ('Start Date') and work backwards until we find a Wayback Machine snapshot or hit the 'End Date'")

start_date_widget = widgets.DatePicker(
    disabled=False
)
print("What is the most recent date in your search ('Start Date')? For instance: July 1 2016")
display(start_date_widget)


end_date_widget = widgets.DatePicker(
    disabled=False
)
print("What is the earliest date in your search ('End Date')? For instance: January 1 2016")
display(end_date_widget)


## Almost there!
We just need to process and confirm the input from the forms above.

In [None]:
# Getting set up to count
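# Split on commas and trim stray spaces, so "a, b" and "a,b" are treated the same
terms = [t.strip() for t in term_widget.value.lower().split(",") if t.strip()] # The terms we'll look for
pages = [p.strip() for p in page_widget.value.lower().split(",") if p.strip()] # The pages we want to look at
start_date = start_date_widget.value # The most recent day to search
end_date = end_date_widget.value # The earliest day to search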

print(f"Looking for {len(terms)} terms on {len(pages)} pages between {end_date} and {start_date}, starting backwards from {start_date}.")
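
The input handling above can be sketched as two small helpers. This is only an illustration, not part of the `wayback` package: `parse_list` and `check_dates` are hypothetical names, and the sketch assumes the date pickers return `datetime.date` values, or `None` when nothing has been picked.

```python
from datetime import date


def parse_list(raw):
    """Split comma-separated text into lowercase items, ignoring stray spaces."""
    return [item.strip().lower() for item in raw.split(",") if item.strip()]


def check_dates(start, end):
    """Return a description of the problem, or None if the date range is usable."""
    if start is None or end is None:
        return "Please pick both a Start Date and an End Date."
    if end > start:
        return "The End Date (earliest) must not be after the Start Date (most recent)."
    return None


# Example: uneven spacing around the commas does not matter
print(parse_list("Climate Change,Climate ,  Energy Independence"))
print(check_dates(date(2016, 7, 1), date(2016, 1, 1)))
```

Checking the dates before searching makes it easier to spot a forgotten date picker or a Start Date that is earlier than the End Date.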

## Go get snapshots of the page(s) and count the term(s)
We'll go through each page, counting each term, and tell you the results as we go!

In [None]:
for page in pages:
    try:
        with WaybackClient() as client:
            dump = client.search(page, from_date=end_date, to_date=start_date)
            versions = list(dump)
            for version in reversed(versions): # For each version in all the snapshots, starting from the most recent
                if version.status_code in (200, '-', None): # If the IA snapshot was viable (some records have no status code)...
                    url = version.raw_url
                    response = client.get_memento(url)
                    # Replace any bytes that aren't valid UTF-8 rather than giving up on the page
                    content = response.content.decode(errors="replace").lower()
                    for t in terms:
                        page_sum = content.count(t) # Count each term on this page
                        print("Counted " + t + " " + str(page_sum) + " times on " + url)
                    break
            else: 
                # If we've gone through all the snapshots and there's not one we can get...
                print("There's no snapshot we can decode for", page)
    except Exception as error: # Report what went wrong, then move on to the next page
        print("Something went wrong while searching for", page, "-", error)
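
One thing to keep in mind when reading the results: the loop above counts plain substrings in the raw page source, so a short term such as "climate" is also counted inside "climate change" and "climates", and matches inside HTML markup are included. The sketch below contrasts that behaviour with a stricter whole-word count. Both function names are hypothetical and the sample text is made up for illustration.

```python
import re


def count_terms(text, terms):
    """Count case-insensitive substring matches, as the loop above does."""
    lowered = text.lower()
    return {term: lowered.count(term.lower()) for term in terms}


def count_whole_words(text, terms):
    """Count only matches that are not part of a longer word (stricter variant)."""
    counts = {}
    for term in terms:
        # No word character may sit directly before or after the term
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


sample = "Climate change is real. Climates vary. The climate is changing."
print(count_terms(sample, ["climate change", "climate"]))
print(count_whole_words(sample, ["climate change", "climate"]))
```

On this sample the substring count finds "climate" three times, while the whole-word count finds it twice because "Climates" is skipped. Neither approach removes the overlap between "climate" and "climate change", so totals for nested terms should not be added together.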