# Recycling Project Interview Assessment



# Part 1b.
Explain your HTTP verb choice in 1 to 2 sentences for each of your endpoints.
    
    - Endpoint 1: ('/') 
        - A GET request to the home page queries and lists all records.
        - A POST request to the same route deletes an individual record by ID. I handled deletion here to avoid writing a separate view function and to keep the code short.
        
    - Endpoint 2: ('/new-pet') 
        - A POST request to this endpoint submits form data to create a new pet entry, as the name implies.
    
    - Endpoint 3: ('/pet')
        - A GET request here queries an individual pet by ID.
        - A POST request here updates an individual pet by ID.
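
The verb choices above can be summarized as a small routing table. The sketch below is framework-free and uses only the standard library. It is not the app's actual code: the `pets` table layout, the `ROUTES` dict, the `dispatch` helper, and every handler name are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the app's SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)")

def list_pets(_form):
    return db.execute("SELECT id, name FROM pets ORDER BY id").fetchall()

def delete_pet(form):
    db.execute("DELETE FROM pets WHERE id = ?", (form["id"],))
    return list_pets(form)

def create_pet(form):
    cur = db.execute("INSERT INTO pets (name) VALUES (?)", (form["name"],))
    return cur.lastrowid

def get_pet(form):
    return db.execute(
        "SELECT id, name FROM pets WHERE id = ?", (form["id"],)
    ).fetchone()

def update_pet(form):
    db.execute("UPDATE pets SET name = ? WHERE id = ?", (form["name"], form["id"]))
    return get_pet(form)

# (method, path) -> handler, mirroring the three endpoints described above.
ROUTES = {
    ("GET", "/"): list_pets,
    ("POST", "/"): delete_pet,
    ("POST", "/new-pet"): create_pet,
    ("GET", "/pet"): get_pet,
    ("POST", "/pet"): update_pet,
}

def dispatch(method, path, form=None):
    """Look up the handler for a method and path, then call it with the form data."""
    handler = ROUTES.get((method, path))
    if handler is None:
        raise LookupError(f"no route for {method} {path}")
    return handler(form or {})
```

Reads map to GET and every state change maps to POST, which matches what a plain HTML form can submit.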
    
# Part 2a.
Write a Dockerfile for your Flask app (including the SQLite database). The docker run command will be docker run -p 8881:8881

    - Created working Dockerfile
    - Created working docker-compose.yml

    
    
# Part 2b.
Explain how you would test your backend container locally. Please specify any tools or libraries that you think are relevant, as well as the specific tests that you would perform.

        - Create *.yml files to run with Docker that test specified commands and their outputs
        - Create Bash scripts to test the container for: 
            - Networking (are the sockets reachable from outside the container?)
            - File system (file owners and permissions)
        - Container Structure Tests (CST): an open-source tool developed by Google that provides a predefined set of test types
        - Manual user tests
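
The networking check above could also be scripted in Python rather than Bash. The following is a minimal sketch using only the standard library, assuming the container was started with the `-p 8881:8881` mapping from Part 2a and that `/` returns HTTP 200. The function names `port_is_open`, `http_status`, and `smoke_test` are hypothetical.

```python
import socket
import urllib.error
import urllib.request

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    # An empty ProxyHandler bypasses any configured proxy, since the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code

def smoke_test(host: str = "127.0.0.1", port: int = 8881) -> bool:
    """Check that the published port accepts connections and '/' returns 200."""
    return port_is_open(host, port) and http_status(f"http://{host}:{port}/") == 200
```

Running `smoke_test()` after `docker run` gives a quick pass/fail signal before moving on to more detailed tests.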
        
        


# Part 3
Imagine that sites contains a list of thousands of URLs. Re-write the following code to be more performant.
    
    - Solution 1: Multi-threading with ThreadPoolExecutor
        - Uses one Session object per thread, whose connection pooling reuses connections when the same site is requested multiple times
        - Uses ThreadPoolExecutor to issue requests from multiple threads
        
    - Solution 2: Asynchronous 
        - I used the asyncio package to build the more time-efficient of the two solutions
        - asyncio runs a single-threaded event loop, which avoids thread-switching overhead and was faster than multi-threading in the runs below
        - aiohttp is the asynchronous HTTP client/server package for asyncio 
    

In [32]:
# Solution 1: Multi Threading
import requests
from requests.sessions import Session
from concurrent.futures import ThreadPoolExecutor
from threading import local
import time

sites = [
    "http://www.google.com",
    "http://www.github.com",
    "http://www.youtube.com",
    "http://www.facebook.com",
    "http://www.yahoo.com"
]*3

results = []
thread_local = local()


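def get_session() -> Session:
    # One Session per worker thread: Session objects are not guaranteed to be thread-safe.
    if not hasattr(thread_local, 'session'):
        thread_local.session = requests.Session()
    return thread_local.session

def scrape(site: str):
    session = get_session()
    # A timeout keeps one unresponsive site from blocking a worker indefinitely.
    with session.get(site, timeout=10) as response:
        print(f'Read {len(response.content)} from {site}')
        results.append(response)


def scrape_sites(sites):
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Consume the iterator so exceptions raised in workers surface here instead of being dropped.
        list(executor.map(scrape, sites))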



start = time.time()
scrape_sites(sites)
end = time.time()

print(f'download {len(sites)} links in {end - start} seconds')

Read 14120 from http://www.google.com
Read 80555 from http://www.facebook.com
Read 625920 from http://www.youtube.com
Read 292620 from http://www.github.com
Read 14121 from http://www.google.com
Read 719202 from http://www.yahoo.com
Read 80551 from http://www.facebook.com
Read 627284 from http://www.youtube.com
Read 292620 from http://www.github.com
Read 14123 from http://www.google.com
Read 80552 from http://www.facebook.com
Read 292620 from http://www.github.com
Read 719185 from http://www.yahoo.com
Read 719927 from http://www.yahoo.com
Read 627961 from http://www.youtube.com
download 15 links in 1.9816310405731201 seconds


Solution 1 (multithreading):

    Completed in 1.98 seconds, which is about 2.6x faster than the original code (5.18 seconds).


In [25]:
# Solution 2: Asyncio
import asyncio
import time 
import aiohttp
from aiohttp.client import ClientSession

sites = [
    "http://www.google.com",
    "http://www.github.com",
    "http://www.youtube.com",
    "http://www.facebook.com",
    "http://www.yahoo.com"
]*3

results = []

async def scrape(site:str,session:ClientSession):
    async with session.get(site) as response:
        result = await response.text()
        print(f'Read {len(result)} from {site}')
        return result

async def scrape_sites(sites):
    # Cap concurrent connections so a long URL list does not open every socket at once.
    connector = aiohttp.TCPConnector(limit=7)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [asyncio.create_task(scrape(site=site, session=session)) for site in sites]
        # The await must be nested inside the session block so the session stays open until every request finishes.
        results.extend(await asyncio.gather(*tasks, return_exceptions=True))




start = time.time()
asyncio.run(scrape_sites(sites))
end = time.time()

print(f'download {len(sites)} links in {end - start} seconds')

Read 14139 from http://www.google.com
Read 14141 from http://www.google.com
Read 14098 from http://www.google.com
Read 292549 from http://www.github.com
Read 292550 from http://www.github.com
Read 80514 from http://www.facebook.com
Read 629276 from http://www.youtube.com
Read 292549 from http://www.github.com
Read 711376 from http://www.yahoo.com
Read 637227 from http://www.youtube.com
Read 80509 from http://www.facebook.com
Read 711304 from http://www.yahoo.com
Read 620382 from http://www.youtube.com
Read 80509 from http://www.facebook.com
Read 711788 from http://www.yahoo.com
download 15 links in 1.1307389736175537 seconds


Solution 2 (optimal):

    The asynchronous solution improved run time from 5.18 seconds to 1.13 seconds.
    This is about 4.6x faster than the original baseline.
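
With thousands of URLs, the number of requests in flight usually needs a cap. The sketch below shows one way to bound concurrency with `asyncio.Semaphore`. It is an illustration only: `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real request, and the `limit` and `delay` values are arbitrary assumptions.

```python
import asyncio
import time

async def fake_fetch(site: str, delay: float = 0.05) -> str:
    """Stand-in for a network request; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"body of {site}"

async def fetch_all(sites, limit: int = 5):
    """Fetch every site with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(site):
        async with semaphore:
            return await fake_fetch(site)

    # gather preserves input order, so results[i] corresponds to sites[i].
    return await asyncio.gather(*(bounded(site) for site in sites))

def timed_run(sites, limit: int = 5):
    """Run fetch_all on a fresh event loop and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(sites, limit))
    return results, time.perf_counter() - start
```

Because the waits overlap, total time scales with the number of batches (`len(sites) / limit`) rather than with the number of sites.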


# Part 4
Describe what happens in the steps between merging a pull request and new code running in production. Feel free to reference any relevant technologies or tools that you've used for things such as CICD pipelines, container registries, hosting, etc.

- Hosting: Configure AWS CodePipeline to connect to GitHub via webhooks for continuous integration, and pair the pipeline with AWS CodeDeploy to update and deploy automatically 
    - I've used Heroku to deploy automatically on git merge
    - I've used Amplify and GCP to deploy from the command line or on a GitHub push
    
- CI/CD Pipeline:
    - Jenkins: Build, Deploy, Test, Release
        - Configure Jenkins to receive new code merges via webhooks
        - Configure it to build and deploy the code, run unit tests and analysis, and generate health reports
        - If there are no errors, Jenkins dockerizes the code by building an image and pushing it to a container registry
        - The next step is further testing (database migration, container deployment, security and load testing)
        - If and only if all of the steps above pass, the code is promoted to production deployment
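
The gating logic in the steps above, where each stage runs only if every earlier stage passed, can be sketched in a few lines. This is a simplified illustration, not a Jenkins configuration: the stage names and the `run_pipeline` helper are hypothetical, and each step is a placeholder callable rather than a real build or test command.

```python
from typing import Callable, List, Tuple

# A stage is a (name, step) pair; step returns True on success.
Stage = Tuple[str, Callable[[], bool]]

def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (all_passed, names_of_completed_stages).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed

# Hypothetical stages; real steps would shell out to the build and test tools.
example_stages: List[Stage] = [
    ("build", lambda: True),
    ("unit_tests", lambda: True),
    ("push_image", lambda: True),
    ("integration_tests", lambda: False),  # simulated failure
    ("deploy_production", lambda: True),
]

passed, completed = run_pipeline(example_stages)
print(passed, completed)  # False ['build', 'unit_tests', 'push_image']
```

Because the simulated integration test fails, `deploy_production` never runs, which is the behavior the final bullet describes.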
    

