
### AWS Lambda and Chalice

AWS Lambda and Chalice can be used to coordinate work.

* [AWS Lambda](https://aws.amazon.com/lambda/) allows a user to run functions in AWS.
* [Chalice](http://chalice.readthedocs.io/en/latest/) is a framework for building AWS Lambdas in Python.

*Some prerequisites to get started:*

1.  You must have an AWS account.
2.  You need API credentials.
3.  The Lambda role (which Chalice creates) must have a policy with the privileges necessary to call the appropriate AWS services, e.g. S3.
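
As an illustration of the third prerequisite, the sketch below builds a minimal IAM policy document that grants read access to a single S3 bucket. The function name `s3_read_policy` and the bucket name are hypothetical, and the exact actions your Lambda needs will depend on what it does; treat this as a starting point rather than the policy Chalice generates.

```python
import json


def s3_read_policy(bucket_name):
    """Build a minimal IAM policy document allowing reads from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ListBucket applies to the bucket, GetObject to the objects in it.
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


# "example-scrape-bucket" is a made-up name used only for illustration.
print(json.dumps(s3_read_policy("example-scrape-bucket"), indent=2))
```

The resulting JSON could be attached to the Lambda role by whatever means you normally manage IAM policies.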




##### Getting IAM Credentials Setup

* There are detailed instructions on setting up [AWS Credentials Here](http://boto3.readthedocs.io/en/latest/guide/configuration.html)

* Details about [exporting AWS variables here on Windows and Linux](http://docs.aws.amazon.com/amazonswf/latest/awsrbflowguide/set-up-creds.html)


There are many ways to configure credentials, but for users of [virtualenv](https://virtualenv.pypa.io/en/stable/), one trick is to put your AWS credentials into your local virtualenv, inside `bin/activate`:
```bash
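# Add AWS keys
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=xxxxxxxx
AWS_SECRET_ACCESS_KEY=xxxxxxxx
# Only needed when using temporary credentials
AWS_SESSION_TOKEN=xxxxxxxx

# Export keys
export AWS_DEFAULT_REGION
export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN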
```
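
Before running anything against AWS, it can help to confirm that the variables are actually visible to Python. The sketch below is a small check, not part of Chalice or boto3; the helper name `missing_aws_vars` is hypothetical, and the list of required names assumes static credentials (region, access key ID and secret access key), with `AWS_SESSION_TOKEN` left out because it only applies to temporary credentials.

```python
import os

# Assumption: static credentials; add "AWS_SESSION_TOKEN" for temporary ones.
REQUIRED_VARS = ("AWS_DEFAULT_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS variables:", ", ".join(missing))
else:
    print("All required AWS variables are set.")
```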

##### Run chalice new-project command

```bash
!chalice

Usage: chalice [OPTIONS] COMMAND [ARGS]...

Options:
  --version             Show the version and exit.
  --project-dir TEXT    The project directory.  Defaults to CWD
  --debug / --no-debug  Print debug logs to stderr.
  --help                Show this message and exit.

Commands:
  delete
  deploy
  gen-policy
  generate-pipeline  Generate a cloudformation template for a...
  generate-sdk
  local
  logs
  new-project
  package
  url
```


##### Create a timed execution

```python
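from chalice import Chalice, Rate

# App name taken from the log output elsewhere in this document.
app = Chalice(app_name="scrape-yahoo")


@app.schedule(Rate(1, unit=Rate.MINUTES))
def every_minute(event):
    """Scheduled event that runs every minute."""
    # Do web scraping here
    print(event.to_dict())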
```
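
The scraping step itself is left as a placeholder in the scheduled function. As one possible way to fill it in, the sketch below pulls player profile links out of a page's HTML using only the standard library. The names `PlayerLinkParser` and `extract_player_urls` are hypothetical, and the code assumes the links appear as absolute `href` values shaped like the player URLs in this document; the real page may differ.

```python
import re
from html.parser import HTMLParser

# Assumption: player links look like https://sports.yahoo.com/nba/players/<id>/
PLAYER_URL = re.compile(r"^https://sports\.yahoo\.com/nba/players/\d+/$")


class PlayerLinkParser(HTMLParser):
    """Collect href values of <a> tags that match the player URL pattern."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and PLAYER_URL.match(value):
                self.urls.append(value)


def extract_player_urls(html):
    """Return matching URLs in page order (duplicates are kept)."""
    parser = PlayerLinkParser()
    parser.feed(html)
    return parser.urls
```

Duplicates are kept on purpose, so the result mirrors a page that links to the same player more than once.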

#### DEMO:  Test Code Locally With `chalice local`

```bash
(.web_scraping_python) ➜  scrape-yahoo git:(master) ✗ chalice local
Serving on 127.0.0.1:8000
scrape-yahoo - INFO - / Route: for scrape-yahoo
127.0.0.1 - - [12/Dec/2017 03:25:42] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2017 03:25:42] "GET /favicon.ico HTTP/1.1" 403 -
scrape-yahoo - INFO - / Route: for scrape-yahoo
127.0.0.1 - - [12/Dec/2017 03:25:45] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2017 03:25:45] "GET /favicon.ico HTTP/1.1" 403 -
scrape-yahoo - INFO - /player_urls Route: for scrape-yahoo
scrape-yahoo - INFO - Fetching urls from https://sports.yahoo.com/nba/stats/
https://sports.yahoo.com/nba/players/4563/
https://sports.yahoo.com/nba/players/5185/
https://sports.yahoo.com/nba/players/3704/
https://sports.yahoo.com/nba/players/5012/
https://sports.yahoo.com/nba/players/4612/
https://sports.yahoo.com/nba/players/5015/
https://sports.yahoo.com/nba/players/4497/
https://sports.yahoo.com/nba/players/4720/
https://sports.yahoo.com/nba/players/3818/
https://sports.yahoo.com/nba/players/5432/
https://sports.yahoo.com/nba/players/5471/
https://sports.yahoo.com/nba/players/4244/
https://sports.yahoo.com/nba/players/5464/
https://sports.yahoo.com/nba/players/5294/
https://sports.yahoo.com/nba/players/5336/
https://sports.yahoo.com/nba/players/4390/
https://sports.yahoo.com/nba/players/4563/
https://sports.yahoo.com/nba/players/3704/
https://sports.yahoo.com/nba/players/5600/
https://sports.yahoo.com/nba/players/4624/
127.0.0.1 - - [12/Dec/2017 03:25:53] "GET /player_urls HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2017 03:25:53] "GET /favicon.ico HTTP/1.1" 403 -
```
#### DEMO:  Quick Demo of Deploying Chalice App With Attached Lambdas
```bash
(.web_scraping_python) ➜  scrape-yahoo git:(master) ✗ chalice deploy 
Creating role: scrape-yahoo-dev
Creating deployment package.
Creating lambda function: scrape-yahoo-dev
Initiating first time deployment.
Deploying to API Gateway stage: api
https://bt98uzs1cc.execute-api.us-east-1.amazonaws.com/api/
```

#### DEMO Retrieve Links:

Using the [HTTPie `http` CLI](https://github.com/jakubroztocil/httpie):

```bash
(.web_scraping_python) ➜  scrape-yahoo git:(master) ✗ http https://bt98uzs1cc.execute-api.us-east-1.amazonaws.com/api/player_urls
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 941
Content-Type: application/json
Date: Tue, 12 Dec 2017 11:48:41 GMT
Via: 1.1 ba90f9bd20de9ac04075a8309c165ab1.cloudfront.net (CloudFront)
X-Amz-Cf-Id: ViZswjo4UeHYwrc9e-5vMVTDhV_Ic0dhVIG0BrDdtYqd5KWcAuZKKQ==
X-Amzn-Trace-Id: sampled=0;root=1-5a2fc217-07cc12d50a4d38a59a688f5c
X-Cache: Miss from cloudfront
x-amzn-RequestId: 64f24fcd-df32-11e7-a81a-2b511652b4f6

{
    "nba_player_urls": [
        "https://sports.yahoo.com/nba/players/4563/", 
        "https://sports.yahoo.com/nba/players/5185/", 
        "https://sports.yahoo.com/nba/players/3704/", 
        "https://sports.yahoo.com/nba/players/5012/", 
        "https://sports.yahoo.com/nba/players/4612/", 
        "https://sports.yahoo.com/nba/players/5015/", 
        "https://sports.yahoo.com/nba/players/4497/", 
        "https://sports.yahoo.com/nba/players/4720/", 
        "https://sports.yahoo.com/nba/players/3818/", 
        "https://sports.yahoo.com/nba/players/5432/", 
        "https://sports.yahoo.com/nba/players/5471/", 
        "https://sports.yahoo.com/nba/players/4244/", 
        "https://sports.yahoo.com/nba/players/5464/", 
        "https://sports.yahoo.com/nba/players/5294/", 
        "https://sports.yahoo.com/nba/players/5336/", 
        "https://sports.yahoo.com/nba/players/4390/", 
        "https://sports.yahoo.com/nba/players/4563/", 
        "https://sports.yahoo.com/nba/players/3704/", 
        "https://sports.yahoo.com/nba/players/5600/", 
        "https://sports.yahoo.com/nba/players/4624/"
    ]
}
```
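
The same request can be made from Python. The sketch below uses only the standard library and assumes the response body has the shape shown above, with a single `nba_player_urls` key; the function names are hypothetical, and the fetch step needs network access and a deployed endpoint.

```python
import json
from urllib.request import urlopen


def parse_player_urls(body):
    """Extract the URL list from a JSON body with an "nba_player_urls" key."""
    return json.loads(body)["nba_player_urls"]


def fetch_player_urls(endpoint):
    """GET the endpoint and return its list of player URLs."""
    with urlopen(endpoint, timeout=10) as response:
        return parse_player_urls(response.read().decode("utf-8"))


# Usage (requires network access; pass the URL printed by `chalice deploy`
# with the /player_urls route appended):
#   urls = fetch_player_urls(endpoint)
```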


There are many other ways to invoke AWS Lambdas:

* SNS-triggered Lambdas (massive parallelization of work, with "true" concurrency compared to GIL-bound threads in Python)
* Invoking from other [Lambdas or tools in Python](http://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html)
* Scheduled Lambdas, running on timers
* Events, like S3 Events
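
Of the options listed above, invoking a function directly from Python is the simplest to sketch. The helper below wraps the boto3 Lambda client's `invoke` call; the helper name `invoke_lambda` is hypothetical, and the client is passed in so the function can be exercised without AWS access. It assumes the Lambda returns JSON, as the functions in this document do.

```python
import json


def invoke_lambda(client, function_name, payload):
    """Invoke a Lambda synchronously and return its decoded JSON result."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function's return value
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # "Payload" in the response is a stream; read it once, then decode.
    return json.loads(response["Payload"].read())


# Usage (requires boto3 and valid credentials):
#   import boto3
#   client = boto3.client("lambda", region_name="us-east-1")
#   invoke_lambda(client, "scrape-yahoo-dev-return_player_urls", {"cli": "invoke"})
```

Passing `InvocationType="Event"` instead would queue the call asynchronously, in which case there is no return value to decode.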

#### DEMO Commandline Tool:
```bash
(.web_scraping_python) ➜  web_scraping_python git:(master) ✗ ./wscli.py lambda               
Lambda Function invoked from cli:
{"message": "instantiate lambda client", "region_name": "us-east-1", "aws_service": "lambda"}
{"message": "Calling lambda function", "function_name": "scrape-yahoo-dev-return_player_urls", "aws_service": "lambda", "payload": "{\"cli\":\"invoke\"}"}
{"message": null, "ResponseMetadata": {"RequestId": "e86077de-df3b-11e7-8b67-13dd918ec87d", "HTTPStatusCode": 200, "HTTPHeaders": {"date": "Tue, 12 Dec 2017 12:56:47 GMT", "content-type": "application/json", "content-length": "941", "connection": "keep-alive", "x-amzn-requestid": "e86077de-df3b-11e7-8b67-13dd918ec87d", "x-amzn-remapped-content-length": "0", "x-amz-executed-version": "$LATEST", "x-amzn-trace-id": "root=1-5a2fd20d-3b57840f0e4fac761d8c30e6;sampled=0"}, "RetryAttempts": 0}, "StatusCode": 200, "ExecutedVersion": "$LATEST", "Payload": "<botocore.response.StreamingBody object at 0x10d564e10>", "function_name": "scrape-yahoo-dev-return_player_urls", "aws_service": "lambda", "payload": "{\"cli\":\"invoke\"}"}
Lambda Return Value Below:
{
    "nba_player_urls": [
        "https://sports.yahoo.com/nba/players/4563/",
        "https://sports.yahoo.com/nba/players/5185/",
        "https://sports.yahoo.com/nba/players/3704/",
        "https://sports.yahoo.com/nba/players/5012/",
        "https://sports.yahoo.com/nba/players/4612/",
        "https://sports.yahoo.com/nba/players/5015/",
        "https://sports.yahoo.com/nba/players/4497/",
        "https://sports.yahoo.com/nba/players/4720/",
        "https://sports.yahoo.com/nba/players/3818/",
        "https://sports.yahoo.com/nba/players/5432/",
        "https://sports.yahoo.com/nba/players/5471/",
        "https://sports.yahoo.com/nba/players/4244/",
        "https://sports.yahoo.com/nba/players/5294/",
        "https://sports.yahoo.com/nba/players/5464/",
        "https://sports.yahoo.com/nba/players/5336/",
        "https://sports.yahoo.com/nba/players/4390/",
        "https://sports.yahoo.com/nba/players/4563/",
        "https://sports.yahoo.com/nba/players/3704/",
        "https://sports.yahoo.com/nba/players/5600/",
        "https://sports.yahoo.com/nba/players/4624/"
    ]
}
```
#### DEMO Commandline Tool (Invoking Second Lambda Function):

```bash
(.web_scraping_python) ➜  web_scraping_python git:(master) ✗ ./wscli.py lambda --func=scrape-yahoo-dev-birthplace_from_urls --payload '{"url":["https://sports.yahoo.com/nba/players/4624/", "https://sports.yahoo.com/nba/players/5185/"]}'
Lambda Function invoked from cli:
{"message": "instantiate lambda client", "region_name": "us-east-1", "aws_service": "lambda"}
{"message": "Calling lambda function", "function_name": "scrape-yahoo-dev-birthplace_from_urls", "aws_service": "lambda", "payload": "{\"url\":[\"https://sports.yahoo.com/nba/players/4624/\", \"https://sports.yahoo.com/nba/players/5185/\"]}"}
{"message": null, "ResponseMetadata": {"RequestId": "a6049115-df59-11e7-935d-bb1de9c0649d", "HTTPStatusCode": 200, "HTTPHeaders": {"date": "Tue, 12 Dec 2017 16:29:43 GMT", "content-type": "application/json", "content-length": "118", "connection": "keep-alive", "x-amzn-requestid": "a6049115-df59-11e7-935d-bb1de9c0649d", "x-amzn-remapped-content-length": "0", "x-amz-executed-version": "$LATEST", "x-amzn-trace-id": "root=1-5a3003f2-2583679b2456022568ed0682;sampled=0"}, "RetryAttempts": 0}, "StatusCode": 200, "ExecutedVersion": "$LATEST", "Payload": "<botocore.response.StreamingBody object at 0x10ee37dd8>", "function_name": "scrape-yahoo-dev-birthplace_from_urls", "aws_service": "lambda", "payload": "{\"url\":[\"https://sports.yahoo.com/nba/players/4624/\", \"https://sports.yahoo.com/nba/players/5185/\"]}"}
Lambda Return Value Below:
{
    "https://sports.yahoo.com/nba/players/4624/": "Indianapolis",
    "https://sports.yahoo.com/nba/players/5185/": "Athens"
}
```
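
The second function accepts a payload with a `url` list, so the output of the first function can be split into several smaller payloads and handed to separate invocations. The sketch below shows one way to do that split; `batch_payloads` is a hypothetical helper, and the batch size is a tuning choice rather than anything the Lambda requires.

```python
def batch_payloads(urls, batch_size):
    """Split a URL list into payloads of at most batch_size URLs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"url": urls[start:start + batch_size]}
        for start in range(0, len(urls), batch_size)
    ]


example_urls = [
    "https://sports.yahoo.com/nba/players/4624/",
    "https://sports.yahoo.com/nba/players/5185/",
    "https://sports.yahoo.com/nba/players/4563/",
]
for payload in batch_payloads(example_urls, 2):
    print(payload)
```

Each payload has the same shape as the `--payload` argument used above, so it could be sent as-is to `scrape-yahoo-dev-birthplace_from_urls`.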


#### AWS Step Functions

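AWS Step Functions can coordinate multiple Lambdas to create a pipeline, and an execution can be triggered via a "timed" Lambda. The state machine definition below chains the two functions shown earlier: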

```json
{
    "Comment": "Fetch Player Urls",
    "StartAt": "FetchUrls",
    "States": {
      "FetchUrls": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:561744971673:function:scrape-yahoo-dev-return_player_urls",
        "Next": "FetchBirthplaces"
      },
      "FetchBirthplaces": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:561744971673:function:scrape-yahoo-dev-birthplace_from_urls",
        "Next": "Finish"
      },
      "Finish": {
        "Type": "Pass",
        "Result": "Finished",
        "End": true
      }
    }
}
```
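
To see the order in which the states above run, it is enough to follow `StartAt` and then each `Next` link until a state marked `End`. The sketch below does that for a linear definition; it is only a reading aid, not how Step Functions executes a state machine, and it does not handle `Choice` or `Parallel` states. The function name `trace_states` is hypothetical, and the example dictionary repeats the states from the definition above with the `Resource` ARNs left out.

```python
def trace_states(definition):
    """Follow StartAt/Next links and return state names in execution order."""
    states = definition["States"]
    order = []
    current = definition["StartAt"]
    while True:
        if current in order:
            raise ValueError(f"cycle detected at state {current!r}")
        order.append(current)
        state = states[current]
        if state.get("End"):
            return order
        current = state["Next"]


EXAMPLE = {
    "StartAt": "FetchUrls",
    "States": {
        "FetchUrls": {"Type": "Task", "Next": "FetchBirthplaces"},
        "FetchBirthplaces": {"Type": "Task", "Next": "Finish"},
        "Finish": {"Type": "Pass", "Result": "Finished", "End": True},
    },
}

print(trace_states(EXAMPLE))  # ['FetchUrls', 'FetchBirthplaces', 'Finish']
```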

#### Demo:  AWS Step Functions
