# SI 330: Homework 4: APIs on AWS


## Due: Friday, February 9, 2018, 11:59:00pm

### Submission instructions
After completing this homework, you will turn in two files via Canvas -> Assignments -> HW 4:

* your notebook, named si330-hw4-YOUR_UNIQUE_NAME.ipynb, and
* the HTML file, named si330-hw4-YOUR_UNIQUE_NAME.html.

### Name:  YOUR NAME GOES HERE
### Uniqname: YOUR UNIQNAME GOES HERE
### People you worked with: [if you didn't work with anyone else write "I worked by myself" here].

## Top-Level Goal
To create a microservice that returns the counts of all bigrams in a text passage.



## Learning Objectives
After completing this homework, you should know how to:
* create an AWS Lambda function that takes a string and returns the counts of all bigrams in that text
* write an AWS API Gateway integration that allows both GET and POST requests to access an AWS Lambda
* write documentation for the microservice that you've created

### Note: See end of notebook for notes about going "Above and Beyond"

### Outline of Steps For Analysis
Here's an overview of the steps that you'll need to do to complete this homework.
1. Upload data to an S3 bucket
2. Create an AWS Lambda function that normalizes and tokenizes text and then creates and counts its bigrams, accepting the text either directly in a POST request or as a URL (e.g. of a file in an S3 bucket) passed in a GET request
3. Create a Python code block in this notebook to demonstrate the functionality of your microservice

Each of these steps is detailed below.

## Step 1: Upload data to an S3 bucket
To prepare for testing the code you write in the next step, upload a text file of **500 or fewer lines** to an S3 bucket; your Lambda will retrieve this file by its URL when handling GET requests. See the description of CORS for an explanation of why we want to put the data in the same domain (amazonaws.com) as the Lambda.

Follow the same approach that we used in the lab to upload a small text file to your S3 bucket, making sure that its permissions allow public access.
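If you would rather script the upload than use the S3 console, the following is a minimal sketch using boto3. It assumes boto3 is installed and AWS credentials are configured, neither of which the lab setup requires, and the helper names are hypothetical. It checks the 500-line limit before uploading and builds a path-style URL in the same form as the example URL used in Step 3. Buckets created with recent default settings block public access and disable ACLs, so the `public-read` ACL may be rejected until you adjust the bucket's Block Public Access and Object Ownership settings (a bucket policy granting public reads is an alternative).

```python
from urllib.parse import quote

MAX_LINES = 500  # line limit stated in Step 1


def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def s3_path_style_url(bucket, key, region):
    """Build a path-style URL such as https://s3.us-east-2.amazonaws.com/bucket/key."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


def upload_public_text(path, bucket, key, region="us-east-2"):
    """Check the line limit, upload with a public-read ACL, and return the URL.

    Hypothetical helper: requires boto3 and configured AWS credentials.
    """
    n = count_lines(path)
    if n > MAX_LINES:
        raise ValueError(f"{path} has {n} lines; the limit is {MAX_LINES}")
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.upload_file(
        path, bucket, key,
        ExtraArgs={"ACL": "public-read", "ContentType": "text/plain"},
    )
    return s3_path_style_url(bucket, key, region)
```

For example, `upload_public_text("dickens-totc.txt", "my-si330-bucket", "dickens-totc.txt")` would return the URL to paste into Q1 (the bucket name here is a placeholder).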

### <font color="magenta">Q1: Enter the URL of your text file</font>

## Grading Rubric

5 points for providing a URL here. (Do not deduct points if the submission is otherwise mostly complete or if the missing URL seems to be an oversight.)

Put your S3 text file's URL here

## Step 2: Create an AWS Lambda function that normalizes, tokenizes, and creates and counts bigrams from text

Similar to what we did in the lab, you're going to create a microservice that consists of two parts: an AWS Lambda and an API Gateway. You can use exactly the same technique that we used in the lab to get started.

You will need to modify the code in the Lambda to handle two types of requests:
1. A GET request with a `url` query string parameter (e.g. `url=http://some.url.goes.here/text.txt`) that specifies the location of the text to be processed, and
2. A POST request with the text to be processed included as the `"text"` value in the JSON body payload.
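With Lambda proxy integration in API Gateway (the setup the starter code below assumes, since it reads `httpMethod`, `queryStringParameters`, and `body` from the event), the two request types arrive as differently shaped events. The sketch below shows trimmed-down example events and a hypothetical helper, `extract_text_source`, that decides where the text comes from. The events contain only the fields the handler reads; real proxy events carry many more.

```python
import json

# Trimmed example events (real proxy-integration events include many more fields).
GET_EVENT = {
    "httpMethod": "GET",
    "queryStringParameters": {"url": "http://some.url.goes.here/text.txt"},
    "body": None,
}
POST_EVENT = {
    "httpMethod": "POST",
    "queryStringParameters": None,
    "body": json.dumps({"text": "It was the best of times."}),  # body arrives as a string
}


def extract_text_source(event):
    """Return ("url", value) for GET or ("text", value) for POST; raise ValueError otherwise."""
    method = event.get("httpMethod")
    if method == "GET":
        params = event.get("queryStringParameters") or {}  # None when no query string
        if "url" in params:
            return "url", params["url"]
        raise ValueError("GET request is missing the 'url' query parameter")
    if method == "POST":
        body = json.loads(event.get("body") or "{}")  # None when the body is empty
        if "text" in body:
            return "text", body["text"]
        raise ValueError("POST body is missing the 'text' field")
    raise ValueError(f"Unsupported method: {method}")


print(extract_text_source(GET_EVENT))   # ('url', 'http://some.url.goes.here/text.txt')
print(extract_text_source(POST_EVENT))  # ('text', 'It was the best of times.')
```

Raising an error for a missing parameter is one design choice; a handler could instead catch it and return a 400 status code so the caller sees what went wrong.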

### The following code block is a reasonable starting point for creating your Lambda. Note that this code is meant as the starting point for your work in the Lambda editor and should not be run in this notebook.

**NOTE** Please see https://stackoverflow.com/questions/21844546/forming-bigrams-of-words-in-list-of-sentences-with-python for hints about how to create bigrams without NLTK.
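The StackOverflow hint boils down to pairing each token with the next one using `zip(tokens, tokens[1:])`. Below is a minimal sketch of that idea with `collections.Counter` doing the counting; the helper name `word_bigram_counts` is hypothetical, and the normalization (lowercasing, then splitting into sentences on `.`, `?`, and `!`) mirrors the starter code rather than being the only valid choice. Calling `str.split()` with no argument splits on any run of whitespace, including newlines, whereas `split(" ")` yields empty tokens for runs of spaces and leaves newlines inside tokens.

```python
import re
from collections import Counter


def word_bigram_counts(text):
    """Count word bigrams within each sentence (hypothetical helper)."""
    counts = Counter()
    # Normalize to lowercase, then split into sentences on ., ?, or !
    for sentence in re.split(r"[.?!]", text.lower()):
        tokens = sentence.split()  # splits on any run of whitespace
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


# ('the', 'cat') appears twice; ('cat', 'sat') and ('cat', 'ran') once each
print(word_bigram_counts("The cat sat. The cat ran!"))
```

Because the text is split into sentences first, no bigram spans a sentence boundary (there is no `('sat', 'the')`).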

## Grading Rubric

30 points
* The Lambda function handles both GET and POST correctly:
* The URL is being dynamically retrieved for GET **(10 points)**
* The text data is correctly being retrieved in the GET method **(10 points)**
* The text is normalized **(2 points)**
* The text is tokenized, by words or by characters, and the choice is explained. Some students may have copied the StackOverflow code verbatim, which returns bigrams of characters. **(5 points)**
* Counts are correct (check using some other text file) **(3 points)**
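To see the word-versus-character distinction mentioned in the tokenization item, note that applying the same `zip` pattern to a raw string instead of a list of words pairs adjacent characters. The sketch below contrasts the two on a short sample and builds a small reference count that could be compared against a Lambda's output for another text file. The function names and sample text are illustrative only, and this reference does not split on sentence punctuation, so it matches the starter code's counts only for text without `.`, `?`, or `!`.

```python
from collections import Counter


def char_bigrams(text):
    # zip over a string pairs adjacent characters, including spaces
    return list(zip(text, text[1:]))


def word_bigrams(text):
    # zip over a list of lowercased words pairs adjacent words
    words = text.lower().split()
    return list(zip(words, words[1:]))


sample = "it was the best"
print(char_bigrams(sample)[:3])  # [('i', 't'), ('t', ' '), (' ', 'w')]
print(word_bigrams(sample))      # [('it', 'was'), ('was', 'the'), ('the', 'best')]

# Reference counts for cross-checking a Lambda's output on another file
reference = Counter(word_bigrams("it was the best of times it was the worst of times"))
print(reference[("it", "was")])  # 2
```

Since the starter Lambda serializes each tuple with `str()`, compare against `{str(k): v for k, v in reference.items()}`.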

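```python
"""
PUT SOME DOCUMENTATION HERE
"""
import json
import re
import urllib.request
from collections import defaultdict

# `from botocore.vendored import requests` is deprecated and no longer works in
# current Lambda Python runtimes, so this version fetches URLs with the standard library.


def fetch_text(url):
    """Download the text at `url` and decode it as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def get_bigrams(text):
    """Lowercase the text, split it into sentences, and count word bigrams."""
    sent_tokens = re.split(r'[.?!]', text.lower())
    bigram_counter = defaultdict(int)
    for sentence in sent_tokens:
        words = sentence.split()  # split on any whitespace, not just " "
        for bigram in zip(words[:-1], words[1:]):
            # JSON object keys must be strings, so store each tuple as str()
            bigram_counter[str(bigram)] += 1
    return bigram_counter


def lambda_handler(event, context):
    method = event['httpMethod']
    d = {"text": ""}  # the bigram counts are returned under the "text" key
    # Handle GET: fetch the text from the URL given as ?url=...
    if method == 'GET':
        params = event.get('queryStringParameters') or {}
        if 'url' in params:
            d['text'] = get_bigrams(fetch_text(params['url']))
    # Handle POST: the text is the "text" value in the JSON body
    elif method == 'POST':
        body = json.loads(event.get('body') or '{}')
        if 'text' in body:
            d['text'] = get_bigrams(body['text'])

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(d),
    }
```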

### <font color="magenta">Q2a: Enter the URL of your Lambda</font>

Put your Lambda's URL here

### <font color="magenta">Q2b: Copy your final Lambda code into the following code block (but do not run it here)</font>

```python
# Copy your lambda code here
```

## Step 3: Demonstrate the GET and POST functionality of your Lambda

### <font color="magenta">Q3: Create a code block that uses `requests` to demonstrate the functionality of your Lambda. You can modify the template below or create your own.</font>

## Grading Rubric

5 points
* There's not much to do here. If the student makes a reasonable effort to print the results clearly, award all 5 points.

```python
import requests
import json

lambdaURL = 'https://fovsbrjx2i.execute-api.us-east-2.amazonaws.com/prod/bigrammer' # change this URL
textURL = 'https://s3.us-east-2.amazonaws.com/abhsarma-week4/dickens-totc.txt' # change this URL

# Demonstrate the GET functionality by passing the URL of your text file in S3 to your Lambda as a GET request.
# Passing it via params= lets requests URL-encode the text file's URL.
response = requests.get(lambdaURL, params={'url': textURL})
bigrams = json.loads(response.text)["text"]
for k, v in bigrams.items():
    print(k, v)
```

Output (truncated):

```
('it', 'was') 10
('was', 'the') 10
('the', 'best') 1
('best', 'of') 1
('of', 'times,') 2
('the', 'worst') 1
('worst', 'of') 1
('the', 'age') 2
('age', 'of') 2
('of', 'wisdom,') 1
('of', 'foolishness,') 1
('the', 'epoch') 2
('epoch', 'of') 2
('of', 'belief,') 1
('of', 'incredulity,') 1
('the', 'season') 2
('season', 'of') 2
('of', 'light,') 1
('of', 'darkness,') 1
('the', 'spring') 1
('spring', 'of') 1
('of', 'hope,') 1
('the', 'winter') 1
('winter', 'of') 1
('of', 'despair,') 1
('we', 'had') 2
('had', 'everything') 1
('everything', 'before') 1
('before', 'us,') 2
('had', 'nothing') 1
('nothing', 'before') 1
('we', 'were') 2
('were', 'all') 2
('all', 'going') 2
('going', 'direct') 2
('direct', 'to') 1
('to', 'heaven,') 1
('direct', 'the') 1
('the', 'other') 1
('other', 'way--') 1
('in', 'short,') 1
('short,', 'the') 1
('the', 'period') 1
('period', 'was') 1
('was', 'so') 1
('so', 'far') 1
('far', 'like') 1
('like', 'the') 1
('the', 'present') 1
('present', 'period,') 1
('period,', 'that
```

```python
# Demonstrate the POST functionality by passing the text as a JSON parameter to requests.post()
# Note that we retrieve the contents of the S3 bucket using requests.get()
s3text = requests.get(textURL).text # get the text from the bucket
d = {"text": s3text}
response = requests.post(lambdaURL, json=d)
bigrams = json.loads(response.text)['text']
for k, v in bigrams.items():
    print(k, v)
```

Output (truncated):

```
('it', 'was') 10
('was', 'the') 10
('the', 'best') 1
('best', 'of') 1
('of', 'times,') 2
('the', 'worst') 1
('worst', 'of') 1
('the', 'age') 2
('age', 'of') 2
('of', 'wisdom,') 1
('of', 'foolishness,') 1
('the', 'epoch') 2
('epoch', 'of') 2
('of', 'belief,') 1
('of', 'incredulity,') 1
('the', 'season') 2
('season', 'of') 2
('of', 'light,') 1
('of', 'darkness,') 1
('the', 'spring') 1
('spring', 'of') 1
('of', 'hope,') 1
('the', 'winter') 1
('winter', 'of') 1
('of', 'despair,') 1
('we', 'had') 2
('had', 'everything') 1
('everything', 'before') 1
('before', 'us,') 2
('had', 'nothing') 1
('nothing', 'before') 1
('we', 'were') 2
('were', 'all') 2
('all', 'going') 2
('going', 'direct') 2
('direct', 'to') 1
('to', 'heaven,') 1
('direct', 'the') 1
('the', 'other') 1
('other', 'way--') 1
('in', 'short,') 1
('short,', 'the') 1
('the', 'period') 1
('period', 'was') 1
('was', 'so') 1
('so', 'far') 1
('far', 'like') 1
('like', 'the') 1
('the', 'present') 1
('present', 'period,') 1
('period,', 'that
```
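The results print in whatever order the JSON object preserves, and each key is the string form of a tuple (the starter Lambda uses `str(bigram)`). One way to present them more readably is to parse each key back into a tuple with `ast.literal_eval` and print the most frequent bigrams first. The sketch below assumes the Lambda's dictionary is passed in as `counts`; the small literal dictionary `sample_bigrams` (a few entries copied from the output above) stands in for it so the block runs on its own, and `top_bigrams` is a hypothetical helper name.

```python
import ast

# Stand-in for the dictionary returned by the Lambda (keys are str(tuple)).
sample_bigrams = {"('it', 'was')": 10, "('was', 'the')": 10, "('the', 'best')": 1, "('of', 'times,')": 2}


def top_bigrams(counts, n=10):
    """Return the n most frequent bigrams as ((w1, w2), count), highest count first."""
    parsed = [(ast.literal_eval(k), v) for k, v in counts.items()]
    # Sort by descending count, then alphabetically so ties print in a stable order
    parsed.sort(key=lambda item: (-item[1], item[0]))
    return parsed[:n]


for (w1, w2), count in top_bigrams(sample_bigrams, n=3):
    print(f"{count:>4}  {w1} {w2}")
```

Because the GET and POST cells process the same text, saving each cell's dictionary under its own name and checking that the two are equal is a quick consistency test.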

## Save your notebook, download it as HTML and submit both the .ipynb and .html files to Canvas

## Notes about going "Above and Beyond"

There are ample opportunities for extending this homework assignment.  You might, for example, decide to break the microservice into three separate ones (normalizing, tokenizing, and creating bigrams).  Alternatively, you might invest time into getting NLTK data into Lambda so you can use its functionality (see https://stackoverflow.com/questions/42394335/paths-in-aws-lambda-with-python-nltk).  Another interesting investigation might be to use the addition of a data file to an S3 bucket as a trigger to run the bigram analysis, perhaps writing the results to another (public) bucket.
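For the S3-trigger idea, a Lambda subscribed to a bucket's object-created events receives an event whose `Records` list names the bucket and object key. The following is a rough sketch under several assumptions: the output bucket name `MY-RESULTS-BUCKET` is hypothetical, the function's IAM role is assumed to allow reading the source object and writing to the results bucket, and `count_bigrams` is a stand-in for whatever bigram function you wrote in Step 2. Object keys in S3 event notifications are URL-encoded (spaces become `+`), hence `unquote_plus`.

```python
import json
import re
from collections import Counter
from urllib.parse import unquote_plus

RESULTS_BUCKET = "MY-RESULTS-BUCKET"  # hypothetical output bucket


def count_bigrams(text):
    """Stand-in bigram counter; keys are str(tuple) to stay JSON-serializable."""
    counts = Counter()
    for sentence in re.split(r"[.?!]", text.lower()):
        tokens = sentence.split()
        counts.update(str(pair) for pair in zip(tokens, tokens[1:]))
    return dict(counts)


def objects_from_event(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    s3 = boto3.client("s3")
    for bucket, key in objects_from_event(event):
        text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # Write results to a different bucket so the output does not re-trigger this function
        s3.put_object(
            Bucket=RESULTS_BUCKET,
            Key=key + ".bigrams.json",
            Body=json.dumps(count_bigrams(text)),
            ContentType="application/json",
        )
    return {"statusCode": 200}
```

Writing to a separate bucket is the design choice that matters most here: if results went back into the triggering bucket, each write could invoke the function again.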

**IF YOU CHOOSE TO GO ABOVE AND BEYOND, YOU _MUST_ CHANGE THE FOLLOWING MARKDOWN BLOCK**

## Above and Beyond

Indicate here why you believe that your work should be considered "above and beyond".