### **Asynchronous Web Scraping**

### Blocking vs Non-Blocking code

#### 1. Blocking Code Example

In blocking code, each task waits for the previous one to finish before starting.

```python
# Blocking code example with multiple tasks
import time

# Simulate a blocking task
def read_file(file_name):
    print(f"Reading file: {file_name}")
    time.sleep(2)  # Simulate a time delay for reading a file (2 seconds)
    print(f"Finished reading {file_name}")

# Main execution
def main():
    start_time = time.time()
    
    # Simulate multiple blocking tasks
    read_file("file1.txt")
    read_file("file2.txt")
    read_file("file3.txt")
    
    print(f"All tasks finished in {time.time() - start_time:.2f} seconds.")

main()
```
---

#### 2. Non-blocking Code Example

In non-blocking code, a task that is waiting (for example, on a file or network operation) hands control back so that other tasks can run in the meantime. In Python, this is done with asynchronous programming using `asyncio`.

```python
import asyncio

# Non-blocking task using asyncio
async def read_file(file_name):
    print(f"Reading file: {file_name}")
    await asyncio.sleep(2)  # Simulate a time delay without blocking
    print(f"Finished reading {file_name}")

# Main execution
async def main():
    start_time = asyncio.get_running_loop().time()
    
    # Start multiple tasks concurrently
    tasks = [
        read_file("file1.txt"),
        read_file("file2.txt"),
        read_file("file3.txt")
    ]
    
    # Run all tasks concurrently
    await asyncio.gather(*tasks)
    
    print(f"All tasks finished in {asyncio.get_running_loop().time() - start_time:.2f} seconds.")

# Run the async program
asyncio.run(main())
```
---
**Situation:**  
You have 10 kilograms of dirty laundry, and you go to a self-service laundromat. Each machine holds up to 3 kilograms of clothes and takes 1 hour to finish a load. If you use just one machine, you need 4 loads, so it takes 4 hours to finish all your laundry. How can you speed up the process?

**Situation:** You have a date in one hour, and you want to impress your crush with a fresh haircut. However, you also need to do laundry because you don't have any clean clothes to wear for work tomorrow. How would you handle this situation?
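The first situation is plain arithmetic: 10 kg in 3 kg loads needs `math.ceil(10 / 3)` = 4 loads, so one machine needs about 4 hours, while 4 machines running at the same time need about 1 hour. The sketch below models this with `asyncio`, assuming each machine run is a coroutine and scaling 1 hour down to 0.1 seconds so it finishes quickly; the names `run_load` and `HOUR` are illustrative, not part of any library. The second situation follows the same idea: start the washing machine, then get the haircut while it runs.

```python
import asyncio
import math
import time

LAUNDRY_KG = 10
MACHINE_CAPACITY_KG = 3
HOUR = 0.1  # 1 simulated hour = 0.1 real seconds (illustrative scaling)

async def run_load(machine_id):
    # A machine is busy for one "hour"; awaiting lets other machines run meanwhile
    await asyncio.sleep(HOUR)
    return machine_id

async def one_machine(loads):
    # Each load waits for the previous one to finish
    for _ in range(loads):
        await run_load(0)

async def many_machines(loads):
    # One machine per load, all running at the same time
    await asyncio.gather(*(run_load(m) for m in range(loads)))

loads = math.ceil(LAUNDRY_KG / MACHINE_CAPACITY_KG)  # 4 loads

start = time.perf_counter()
asyncio.run(one_machine(loads))
sequential_hours = (time.perf_counter() - start) / HOUR

start = time.perf_counter()
asyncio.run(many_machines(loads))
concurrent_hours = (time.perf_counter() - start) / HOUR

print(f"{loads} loads: one machine ~{sequential_hours:.1f} h, {loads} machines ~{concurrent_hours:.1f} h")
```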

---

**Asynchronous web scraping** lets you send many HTTP requests concurrently without blocking the program. It is well suited to I/O-bound work such as scraping many pages: while one request is waiting for the network, the program can send other requests or handle responses that have already arrived, which reduces total scraping time.

### Key Concepts

- **Event Loop**: Manages asynchronous tasks, allowing one task to run while others wait.
- **Non-blocking I/O**: HTTP requests don't block the program; it continues to send more requests or process other tasks while waiting for responses.
- **Coroutines**: Functions defined with `async def` that can be paused and resumed.
- **`await`**: Pauses a coroutine until a result is available, such as the response from an HTTP request.
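To see these concepts together, here is a minimal sketch (the coroutine `worker` and the `log` list are illustrative names): two coroutines run on a single event loop, and each `await asyncio.sleep(0)` pauses the current coroutine so the loop can resume the other one. No threads are involved; the interleaving comes entirely from the `await` points.

```python
import asyncio

log = []  # records the order in which the coroutines run

async def worker(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        # await pauses this coroutine and hands control back to the event loop
        await asyncio.sleep(0)

async def main():
    # gather wraps both coroutines in tasks and runs them on the same loop
    await asyncio.gather(worker("A", 3), worker("B", 3))

asyncio.run(main())
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```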

### Example Code

```python
import aiohttp
import asyncio

async def fetch(session, url):
    # Reuse the shared session so connections can be pooled
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ["http://example.com/page1", "http://example.com/page2"]
    # One ClientSession for all requests, instead of one per request
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    for result in results:
        print(f"Fetched {len(result)} characters")

asyncio.run(main())
```

### Advantages

1. **Efficiency**: Overlaps the waiting time of many requests, reducing total scraping time.
2. **Low Resource Usage**: Runs on a single thread, avoiding the memory and context-switching overhead of one thread per request.
3. **Scalability**: Can keep many requests in flight at once without the overhead of multithreading.

### Considerations

- **Rate Limiting**: Respect website rate limits to avoid getting blocked.
- **Error Handling**: Ensure proper handling for failed requests and timeouts.
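A minimal sketch of both considerations, assuming a simulated `fake_fetch` coroutine (hypothetical) in place of a real HTTP call so it runs offline: an `asyncio.Semaphore` caps how many requests are in flight, `asyncio.wait_for` enforces a timeout, and each failure is caught and recorded instead of crashing the whole batch. The page names and delays are made up for illustration.

```python
import asyncio

MAX_CONCURRENT = 2   # at most 2 "requests" in flight at once
TIMEOUT = 0.2        # seconds before a request is considered failed

in_flight = 0
peak_in_flight = 0

async def fake_fetch(url, delay):
    # Hypothetical stand-in for an HTTP request; URLs containing "bad" fail
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    try:
        await asyncio.sleep(delay)
        if "bad" in url:
            raise ConnectionError(f"could not reach {url}")
        return f"content of {url}"
    finally:
        in_flight -= 1  # runs on success, error, or cancellation by wait_for

async def safe_fetch(semaphore, url, delay):
    async with semaphore:  # rate limiting: wait for a free slot
        try:
            return await asyncio.wait_for(fake_fetch(url, delay), timeout=TIMEOUT)
        except asyncio.TimeoutError:
            return f"timeout: {url}"
        except ConnectionError as e:
            return f"error: {e}"

async def main():
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    jobs = [("page1", 0.05), ("page2-bad", 0.05), ("page3", 0.5), ("page4", 0.05)]
    return await asyncio.gather(*(safe_fetch(semaphore, u, d) for u, d in jobs))

results = asyncio.run(main())
for r in results:
    print(r)
```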

### Asyncio

```python
"""
Objective: Defining an Async Function
"""
import asyncio
import time


def greet():
    time.sleep(1)
    print("Hello, World!")

async def async_greet():
    # await asyncio.sleep(1)
    print("Hello, World!")

print(f"type(greet) : {type(greet)}")
# TODO: Print the type of async_greet
print(f"type(async_greet) : {type(async_greet)}")

print(f"type(greet()) : {type(greet())}")
# TODO: Print the type of the value returned by async_greet()
coro = async_greet()  # Calling an async function only creates a coroutine; its body does not run
print(type(coro))
# A coroutine that is never awaited triggers
# "RuntimeWarning: coroutine 'async_greet' was never awaited".
# Closing it explicitly avoids the warning when we only want to inspect it.
coro.close()
```

Output:

```text
type(greet) : <class 'function'>
type(async_greet) : <class 'function'>
Hello, World!
type(greet()) : <class 'NoneType'>
<class 'coroutine'>
```


```python
"""
Objective: Executing async function
"""
import asyncio
import nest_asyncio
import tracemalloc

# tracemalloc lets Python report where an un-awaited coroutine was created
tracemalloc.start()

# Jupyter already runs an event loop; nest_asyncio allows asyncio.run() to work inside it
nest_asyncio.apply()

async def add_numbers(a, b):
    return a + b

# TODO: Try to execute add_numbers like a normal function
result1 = add_numbers(1, 2)  # Returns a coroutine object, not 3
print(f"Normal function call result: {result1}")
result1.close()  # Never awaited, so close it to avoid a RuntimeWarning

# TODO: Try to execute add_numbers using asyncio.run()
result2 = asyncio.run(add_numbers(1, 2))
print(f"asyncio.run() result: {result2}")
```

Output:

```text
Normal function call result: <coroutine object add_numbers at 0x000001E250F5DA80>
asyncio.run() result: 3
```


```python
"""
Objective: Executing async function
"""
import asyncio


async def add_numbers(a, b):
    return a + b

def main():
    result = None
    # TODO: Change result value by executing add_numbers
    result = asyncio.run(add_numbers(1, 2))

    print(result)

# TODO: Execute main() function
main()
```

Output:

```text
3
```


```python
"""
Objective: Executing async function
"""
import asyncio


async def add_numbers(a, b):
    return a + b

async def main():
    result = None
    # TODO: Change result value by await add_numbers
    result = await add_numbers(1, 2)

    print(result)

# TODO: Execute main() function
asyncio.run(main())
```

Output:

```text
3
```


```python
"""
Objective: Error Handling in Async Functions
"""
import asyncio

async def divide_numbers(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero!")
    return a / b

# TODO: Create main() function to execute divide_numbers(10, 0) asynchronously
# TODO: Add error handling if b is zero
# TODO: Execute main() function

async def main():
    try:
        result = await divide_numbers(10, 0)
        print(f"Result: {result}")
    except ValueError as e:
        print(f"Error: {e}")

asyncio.run(main())
```

Output:

```text
Error: Cannot divide by zero!
```


```python
"""
Objective: Running Multiple Tasks
"""
import asyncio


async def task_1():
    print("Task 1 started...")
    await asyncio.sleep(2)
    print("Task 1 completed!")
    return "Result from Task 1"

# TODO: Create 2 more functions like the one above
# TODO: Create a main function to execute all task functions asynchronously
# TODO: Execute the main function
# TODO: Analyze the execution flow
```
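One possible answer to the TODOs above, as a sketch: `task_2` and `task_3` are hypothetical copies of `task_1`, and all delays are shortened to fractions of a second so the example runs quickly. `main` awaits the three tasks one after another. Because each `await` waits for its task to finish before the next one starts, the total time is roughly the sum of the delays, just like the blocking example.

```python
import asyncio
import time

async def task_1():
    print("Task 1 started...")
    await asyncio.sleep(0.3)
    print("Task 1 completed!")
    return "Result from Task 1"

async def task_2():  # hypothetical second task
    print("Task 2 started...")
    await asyncio.sleep(0.2)
    print("Task 2 completed!")
    return "Result from Task 2"

async def task_3():  # hypothetical third task
    print("Task 3 started...")
    await asyncio.sleep(0.25)
    print("Task 3 completed!")
    return "Result from Task 3"

async def main():
    start = time.perf_counter()
    # Each await finishes before the next task starts: sequential execution
    results = [await task_1(), await task_2(), await task_3()]
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
    return results, elapsed

results, elapsed = asyncio.run(main())
```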

```python
"""
Objective: Running Multiple Tasks Concurrently
"""
# TODO: Change the main function using asyncio.gather()
```

```python
"""
Objective: Another way to gather many tasks at once
Using *args
"""
# TODO: Improve the previous code by unpacking a list of tasks with *args instead of passing each call one by one
```
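A sketch covering both `asyncio.gather()` TODOs above, assuming a single hypothetical parametrized `task` coroutine in place of separate `task_1`/`task_2`/`task_3` functions, with shortened delays. The coroutines are collected in a list and unpacked into `gather` with `*`, so they run concurrently and the total time is close to the longest delay rather than the sum. `gather` returns results in the order the coroutines were passed, not the order in which they finish.

```python
import asyncio
import time

async def task(n, delay):
    print(f"Task {n} started...")
    await asyncio.sleep(delay)
    print(f"Task {n} completed!")
    return f"Result from Task {n}"

async def main():
    delays = {1: 0.3, 2: 0.2, 3: 0.25}
    start = time.perf_counter()
    # Build the coroutines in a list, then unpack them into gather with *
    coros = [task(n, d) for n, d in delays.items()]
    results = await asyncio.gather(*coros)
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
    return results, elapsed

results, elapsed = asyncio.run(main())
```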

```python
"""
Objective: Using asyncio.create_task()
"""
import asyncio


async def main():
    # Create tasks (task_1 is defined in the "Running Multiple Tasks" cell above)
    t1 = asyncio.create_task(task_1())
    # TODO: Create task for Task 2
    # TODO: Create task for Task 3

    await t1  # Wait for Task 1 to finish
    # TODO: Wait for Task 2 to finish
    # TODO: Wait for Task 3 to finish

asyncio.run(main())
```
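A self-contained sketch of the `create_task()` TODOs, assuming a hypothetical parametrized `task` coroutine and shortened delays. `asyncio.create_task()` schedules each coroutine on the event loop right away, so all three start as soon as `main` reaches its first `await`. Awaiting `t1`, `t2`, `t3` in order only collects the results; the total time is still about the longest delay.

```python
import asyncio
import time

async def task(n, delay):
    print(f"Task {n} started...")
    await asyncio.sleep(delay)
    print(f"Task {n} completed!")
    return f"Result from Task {n}"

async def main():
    start = time.perf_counter()
    # Each create_task call schedules the coroutine on the event loop
    t1 = asyncio.create_task(task(1, 0.3))
    t2 = asyncio.create_task(task(2, 0.2))
    t3 = asyncio.create_task(task(3, 0.25))

    # Awaiting in order collects results; the tasks are already running
    results = [await t1, await t2, await t3]
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```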

```python
"""
Objective: Combining create_task() and gather()
"""
import asyncio


# Simulated web scraping task
async def scrape_page(url):
    print(f"Starting {url}")
    await asyncio.sleep(1)  # Simulate network delay
    print(f"Scraped: {url}")
    return {"url": url, "content": f"Dummy content from {url}"}

# Main function
async def main():
    urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net"
    ]

    # Simulate creating tasks for scraping pages
    tasks = []
    # TODO: Create a list of tasks by combining scrape_page and the urls list

    # Wait for all tasks to complete and gather results
    results = await asyncio.gather(*tasks)
    # TODO: Print the results here

# TODO: Run the main function
```
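One way to complete the TODOs above, as a sketch: one task per URL is created with `asyncio.create_task()`, and `asyncio.gather()` waits for all of them and returns the result dictionaries in the same order as `urls`. The simulated delay is shortened to 0.1 seconds; everything else follows the cell above.

```python
import asyncio

async def scrape_page(url):
    print(f"Starting {url}")
    await asyncio.sleep(0.1)  # Simulate network delay (shortened)
    print(f"Scraped: {url}")
    return {"url": url, "content": f"Dummy content from {url}"}

async def main():
    urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net"
    ]
    # One task per URL; they all start once main reaches the await below
    tasks = [asyncio.create_task(scrape_page(url)) for url in urls]

    # Wait for all tasks to complete; results keep the order of urls
    results = await asyncio.gather(*tasks)
    for result in results:
        print(result)
    return results

results = asyncio.run(main())
```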

```python
"""
Objective: Using asyncio.as_completed for Immediate Results
"""
# TODO: Import the necessary package
# TODO: Create a task function that accepts 2 parameters: task name, delay time
# TODO: Create a main function to create a list of tasks
# TODO: Loop over the awaitables returned by asyncio.as_completed(list of tasks)
# TODO: Await each one to get the result and print it
# TODO: Execute the main function
```
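A sketch of the `as_completed` exercise, with a hypothetical `task(name, delay)` coroutine and short, made-up delays. Unlike `gather`, `asyncio.as_completed()` yields awaitables in the order the tasks *finish*, so each result can be handled as soon as it is ready: here `B` (0.1 s) comes first and `A` (0.3 s) comes last, even though `A` was submitted first.

```python
import asyncio

async def task(name, delay):
    await asyncio.sleep(delay)
    return f"{name} finished after {delay}s"

async def main():
    tasks = [task("A", 0.3), task("B", 0.1), task("C", 0.2)]
    finished = []
    # as_completed yields awaitables in completion order, not submission order
    for coro in asyncio.as_completed(tasks):
        result = await coro
        print(result)
        finished.append(result)
    return finished

finished = asyncio.run(main())
```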

```python
"""
Objective: Simulating web scraping process
"""
import asyncio

async def fetch_data(url):
    await asyncio.sleep(1)  # Simulate fetch
    print(f"Fetched: {url}")
    return f"Data from {url}"

async def process_data(data):
    await asyncio.sleep(0.5)  # Simulate processing
    print(f"Processed: {data}")
    return f"Processed {data}"

async def save_data(data):
    await asyncio.sleep(0.5)  # Simulate saving
    print(f"Saved: {data}")

async def main():
    urls = ["http://example.com", "http://example.org", "http://example.net"]

    # TODO: Fetch all data concurrently
    # TODO: Process data concurrently
    # TODO: Save all data concurrently

# TODO: Run the workflow
```
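A sketch of one possible solution: each stage runs all URLs concurrently with `asyncio.gather()`, and a stage starts only after the whole previous stage has finished. The delays are shortened, and the `saved` list is an illustrative stand-in for real storage. A per-URL pipeline (fetch, process, then save each URL in its own task) would be another valid design that lets fast URLs finish earlier.

```python
import asyncio

async def fetch_data(url):
    await asyncio.sleep(0.1)  # Simulate fetch
    print(f"Fetched: {url}")
    return f"Data from {url}"

async def process_data(data):
    await asyncio.sleep(0.05)  # Simulate processing
    print(f"Processed: {data}")
    return f"Processed {data}"

saved = []  # illustrative stand-in for real storage

async def save_data(data):
    await asyncio.sleep(0.05)  # Simulate saving
    print(f"Saved: {data}")
    saved.append(data)

async def main():
    urls = ["http://example.com", "http://example.org", "http://example.net"]

    # Stage 1: fetch all data concurrently
    raw = await asyncio.gather(*(fetch_data(url) for url in urls))
    # Stage 2: process all fetched data concurrently
    processed = await asyncio.gather(*(process_data(d) for d in raw))
    # Stage 3: save all processed data concurrently
    await asyncio.gather(*(save_data(p) for p in processed))
    return processed

processed = asyncio.run(main())
```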


### HTTPX

httpx is a modern Python library for making HTTP requests. It supports both synchronous and asynchronous programming and offers advanced features such as connection pooling and optional HTTP/2 support (installed with `pip install "httpx[http2]"` and enabled with `http2=True` on the client). It's often described as an async-friendly alternative to requests with a similar API.

---

```bash
pip install httpx
```
---

**Basic Usage**

**Synchronous Request**
```python
import httpx

response = httpx.get('https://example.com')
print(response.status_code)
print(response.text)
```
---

### **When to Use `httpx`**
- **Web Scraping**: Make multiple requests concurrently with async support.
- **APIs**: Communicate with RESTful or GraphQL APIs, optionally over HTTP/2.
- **Proxies**: Handle requests via proxy servers with ease.
- **Modern HTTP Features**: Use HTTP/2, event hooks, and custom transports.

---

```python
"""
Objective: Sending a simple HTTP request using httpx
"""
import httpx

r = httpx.get('https://httpbin.org/get')
# TODO: Try to manipulate the r object above as you would with requests
```

```python
"""
Objective: Sending HTTP requests without a client
httpx.Client() is what you can use instead of requests.Session()
"""
import httpx
import time


start_time = time.time()

# Send the first request (cookies are set, then the redirect is followed)
response_1 = httpx.get("https://httpbin.org/cookies/set?cookie_name=cookie_value", follow_redirects=True)
print("First Request (Set Cookie):", response_1.json())

# Send a second request to check cookies; without a client, cookies are not kept
response_2 = httpx.get("https://httpbin.org/cookies")
print("Second Request (No Session):", response_2.json())

# TODO: Send a third request to check cookies

end_time = time.time()

print(f"Total execution time {end_time - start_time:.2f}")
```

```python
"""
Objective: Sending HTTP request using httpx client
httpx.Client() is what you can use instead of requests.Session()
"""
# TODO: Improve code above by using httpx client
# TODO: Analyze the difference

import httpx
import time


start_time = time.time()

# Using httpx.Client() to maintain session
with httpx.Client(follow_redirects=True) as client:
    # Send the first request
    response_1 = client.get("https://httpbin.org/cookies/set?cookie_name=cookie_value")
    print("First Request (Set Cookie):", response_1.json())

    # Send a second request to check cookies
    response_2 = client.get("https://httpbin.org/cookies")
    print("Second Request (With Session):", response_2.json())

    # Send a third request to check cookies
    response_3 = client.get("https://httpbin.org/cookies")
    print("Third Request (With Session):", response_3.json())

end_time = time.time()
print(f"Total execution time {end_time - start_time:.2f}")
```

Output:

```text
First Request (Set Cookie): {'cookies': {'cookie_name': 'cookie_value'}}
Second Request (With Session): {'cookies': {'cookie_name': 'cookie_value'}}
Third Request (With Session): {'cookies': {'cookie_name': 'cookie_value'}}
Total execution time 3.34
```


```python
"""
Objective: Make asynchronous requests, Using AsyncClient.
"""
# async with httpx.AsyncClient() as client:
#     response_1 = await client.get("https://httpbin.org/get")
#     response_2 = await client.get("https://httpbin.org/get")
#     # TODO: Add another response object from the same site
#     # TODO: Print all response status code

import httpx
import asyncio
import nest_asyncio

# Enable nested event loops for Jupyter
nest_asyncio.apply()

async def main():
    async with httpx.AsyncClient() as client:
        # Each await finishes before the next request is sent, so these three
        # requests run one after another; asyncio.gather() would overlap them
        response_1 = await client.get("https://httpbin.org/get")
        response_2 = await client.get("https://httpbin.org/get")
        response_3 = await client.get("https://httpbin.org/get")  # Added third response

        # Print all response status codes
        print(f"Response 1 status code: {response_1.status_code}")
        print(f"Response 2 status code: {response_2.status_code}")
        print(f"Response 3 status code: {response_3.status_code}")

# Run the async function
asyncio.run(main())
```

Output:

```text
Response 1 status code: 200
Response 2 status code: 200
Response 3 status code: 200
```


```python
"""
Objective: Simulating sending a list of URLs
"""
import httpx
import asyncio
import nest_asyncio

nest_asyncio.apply()

# List of URLs to scrape (use a test URL or public API)
urls = ["https://httpbin.org/get"] * 100  # Sending 100 requests to the same URL

# Send one request and return its status code
async def fetch(url, client):
    print(f"Sending request to {url}")
    response = await client.get(url)
    return response.status_code  # Return the status code to track success

# Main function to send all requests concurrently
async def send_requests():
    async with httpx.AsyncClient() as client:

        # TODO: Use asyncio.gather to send requests concurrently
        tasks = [fetch(url, client) for url in urls]
        results = await asyncio.gather(*tasks)
        print(f"Completed {len(results)} requests")
        return results

# TODO: Run the function
asyncio.run(send_requests())
```

Output (one line per request; the notebook truncated the printed output):

```text
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
...
```

Returned value: a list of 100 status codes, all `200`.

```text
[200, 200, 200, ..., 200]
```

```python
"""
Objective: Monitoring progress as each task completes
"""
# TODO: Improve previous code to monitor progress using asyncio.as_completed()
# TODO: Add a counter to count how many requests have been sent

import httpx
import asyncio
import nest_asyncio

# Enable nested event loops for Jupyter
nest_asyncio.apply()

# List of URLs to scrape
urls = ["https://httpbin.org/get"] * 100  # Sending 100 requests to the same URL

async def fetch(url, client):
    print(f"Sending request to {url}")
    response = await client.get(url)
    return response.status_code

async def send_requests():
    async with httpx.AsyncClient() as client:
        tasks = [fetch(url, client) for url in urls]
        completed = 0

        # Use as_completed to process results as they arrive
        for task in asyncio.as_completed(tasks):
            status_code = await task
            completed += 1
            print(f"Completed {completed}/{len(urls)} requests. Status code: {status_code}")

# Run the async function
asyncio.run(send_requests())
```

Output (truncated in the notebook):

```text
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
...
```

```python
"""
Objective: Limiting Concurrent Requests using Semaphore to avoid overloading the server
"""
# TODO: improve previous code by limiting to max 10 requests using asyncio.Semaphore()

import httpx
import asyncio
import nest_asyncio

# Enable nested event loops for Jupyter
nest_asyncio.apply()

# List of URLs to scrape
urls = ["https://httpbin.org/get"] * 100  # Sending 100 requests to the same URL

async def fetch(url, client, semaphore):
    async with semaphore:  # Wait for a free slot; at most 10 requests run at once
        print(f"Sending request to {url}")
        response = await client.get(url)
        return response.status_code

async def send_requests():
    # Create the semaphore inside the coroutine so it uses the loop started by asyncio.run()
    semaphore = asyncio.Semaphore(10)  # Limit to 10 concurrent requests
    async with httpx.AsyncClient() as client:
        completed = 0
        tasks = [fetch(url, client, semaphore) for url in urls]

        # Process tasks as they complete
        for task in asyncio.as_completed(tasks):
            status_code = await task
            completed += 1
            print(f"Completed {completed}/{len(urls)} requests. Status code: {status_code}")

# Run the async function
asyncio.run(send_requests())
```

Output (truncated in the notebook):

```text
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Completed 1/100 requests. Status code: 200
Sending request to https://httpbin.org/get
Completed 2/100 requests. Status code: 200
Sending request to https://httpbin.org/get
Completed 3/100 requests. Status code: 200
Sending request to https://httpbin.org/get
Completed 4/100 requests. Status code: 200
Sending request to https://httpbin.org/get
Completed 5/100 requests. Status code: 200
Sending request to https://httpbin.org/get
Sending request to https://httpbin.org/get
Completed 6/100 requests. Status code: 200
...
```

```python
"""
Objective: Implement asynchronous requests in your web scraping
"""
# TODO: Create a new branch from your previous web scraping project
# TODO: Implement asynchronous using httpx.AsyncClient
# TODO: Push and put the github link here for grading

import httpx
import asyncio
import pandas as pd
from bs4 import BeautifulSoup
import nest_asyncio

# Enable nested event loops for Jupyter
nest_asyncio.apply()

MAX_CONCURRENT_REQUESTS = 10

async def fetch_page(url, client, semaphore):
    async with semaphore:  # At most MAX_CONCURRENT_REQUESTS downloads in flight
        try:
            response = await client.get(url)
            return response.text
        except Exception as e:
            print(f"Error fetching {url}: {e}")
            return None

def parse_page(html):
    # Parsing is CPU-bound, so a regular function is enough; async would not speed it up
    soup = BeautifulSoup(html, 'html.parser')
    # Add your parsing logic here
    # Example:
    data = {
        'title': soup.title.text if soup.title else '',
        # Add more fields based on your scraping needs
    }
    return data

async def scrape_website():
    # Your list of URLs to scrape
    urls = [
        "https://example.com/page1",
        "https://example.com/page2",
        # Add more URLs
    ]
    # Created inside the coroutine so it uses the loop started by asyncio.run()
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

    async with httpx.AsyncClient() as client:
        # Fetch all pages concurrently
        html_contents = await asyncio.gather(*[
            fetch_page(url, client, semaphore) for url in urls
        ])

    # Parse the pages that were fetched successfully and convert to a DataFrame
    results = [parse_page(html) for html in html_contents if html is not None]
    return pd.DataFrame(results)

# Run the scraper
df = asyncio.run(scrape_website())
print(f"Scraped {len(df)} pages successfully")

# GitHub repository link for grading
github_link = "https://github.com/rudicatsmile/scrpping_tutorial/tree/async-scraping"
print(f"GitHub Link: {github_link}")

# git add .
# git commit -m "Implement async web scraping using httpx"
# git push origin async-scraping
```

Output:

```text
Scraped 2 pages successfully
GitHub Link: https://github.com/rudicatsmile/scrpping_tutorial/tree/async-scraping
```


### **Reflection**
By using asynchronous requests, we can send multiple requests at once. What effect do you think this has on the server side?

When asynchronous code sends many requests to a server at the same time, there are several potential effects on the server side:

1. Increased Server Load
   - The server has to handle many concurrent requests instead of processing them one at a time
   - This can lead to higher CPU and memory usage on the server
2. Resource Consumption
   - More simultaneous connections have to be kept open
   - Database connections may increase if the requests require database access
   - Memory usage can spike while many requests are handled at once
3. Rate Limiting Issues
   - Servers may treat a burst of simultaneous requests as a potential DDoS attack
   - This can trigger rate limiting or IP blocking
   - Some servers respond with 429 (Too Many Requests) errors
4. Bandwidth Usage
   - Network bandwidth consumption increases because requests run in parallel
   - This can affect other users of the same server
5. Server Response Time
   - If the server is not configured for concurrent load, response times may increase
   - Other users' requests may be delayed by the extra load

To mitigate these effects, it's important to:

- Implement rate limiting in your async code
- Use semaphores to control concurrent connections
- Respect the server's robots.txt and terms of service
- Add delays between requests when necessary
- Monitor server responses for rate limiting warnings
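The robots.txt point can be checked with the standard library's `urllib.robotparser`. The sketch below parses an inline, made-up robots.txt so it runs offline (in practice you would call `set_url()` and `read()` to load the site's real file), then uses `can_fetch()` to filter URLs and `crawl_delay()` to get a suggested pause between requests. The user agent `MyScraper` and the URLs are hypothetical.

```python
from urllib import robotparser

# Hypothetical robots.txt content; a real scraper would download the site's file
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT)

USER_AGENT = "MyScraper"
urls = [
    "https://example.com/products",
    "https://example.com/private/admin",
]

# Keep only the URLs that robots.txt allows for our user agent
allowed = [url for url in urls if parser.can_fetch(USER_AGENT, url)]
# Crawl-delay in seconds, usable as the pause between requests (None if not set)
delay = parser.crawl_delay(USER_AGENT)

print("Allowed:", allowed)
print("Crawl delay:", delay)
```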

### **Exploration**
Explore how you can optimize scraping execution time while still controlling the number of requests.

Here are several strategies to optimize web scraping execution time while keeping control over the number of requests:

1. Batch Processing with Semaphores

```python
import httpx
import asyncio
import time

# Configure limits
BATCH_SIZE = 10
TOTAL_URLS = 100
DELAY_BETWEEN_BATCHES = 1  # seconds

async def fetch_with_batches():
    urls = ["https://httpbin.org/get"] * TOTAL_URLS
    # Caps in-flight requests; with batches of BATCH_SIZE it acts as a safety net
    semaphore = asyncio.Semaphore(BATCH_SIZE)
    start_time = time.time()

    async with httpx.AsyncClient() as client:
        async def fetch_url(url):
            async with semaphore:
                response = await client.get(url)
                return response.status_code

        # Requests within a batch run concurrently; batches run one after another
        for i in range(0, len(urls), BATCH_SIZE):
            batch = urls[i:i + BATCH_SIZE]
            tasks = [fetch_url(url) for url in batch]
            results = await asyncio.gather(*tasks)
            print(f"Batch {i // BATCH_SIZE + 1} completed: {len(results)} requests")
            # Pause between batches, but not after the last one
            if i + BATCH_SIZE < len(urls):
                await asyncio.sleep(DELAY_BETWEEN_BATCHES)

    print(f"Total time: {time.time() - start_time:.2f} seconds")

asyncio.run(fetch_with_batches())
```

2. Dynamic Rate Limiting

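```python
import httpx
import asyncio
import time
from collections import deque

class RateLimiter:
    """Sliding window: at most max_requests acquisitions per time_window seconds."""

    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = deque()  # timestamps of recent acquisitions
        self.lock = asyncio.Lock()  # stops concurrent coroutines racing on the deque

    async def acquire(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Remove timestamps that have left the window
                while self.requests and self.requests[0] <= now - self.time_window:
                    self.requests.popleft()
                if len(self.requests) < self.max_requests:
                    break
                # Wait until the oldest request leaves the window, then re-check
                await asyncio.sleep(self.requests[0] + self.time_window - now)
            # Record the time the request is actually allowed to go out
            self.requests.append(time.monotonic())

# Example usage: at most 5 requests per second
async def main():
    limiter = RateLimiter(max_requests=5, time_window=1.0)
    async with httpx.AsyncClient() as client:
        async def fetch(url):
            await limiter.acquire()
            response = await client.get(url)
            return response.status_code

        results = await asyncio.gather(*[fetch("https://httpbin.org/get") for _ in range(10)])
    print(results)

asyncio.run(main())
```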

3. Prioritized Queue System

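```python
import asyncio
import httpx
from asyncio import PriorityQueue

async def priority_scraper(urls_with_priority, num_workers=3):
    queue = PriorityQueue()
    # Priority 1: Critical pages
    # Priority 2: Important pages
    # Priority 3: Normal pages (lower number = fetched first)
    for priority, url in urls_with_priority:
        queue.put_nowait((priority, url))

    async with httpx.AsyncClient() as client:  # One shared client for all workers
        async def worker():
            while True:
                priority, url = await queue.get()
                try:
                    response = await client.get(url)
                    print(f"Processed {url} with priority {priority}: {response.status_code}")
                except httpx.HTTPError as e:
                    print(f"Failed {url}: {e}")
                finally:
                    queue.task_done()

        workers = [asyncio.create_task(worker()) for _ in range(num_workers)]
        await queue.join()  # Wait until every queued URL has been processed
        for w in workers:
            w.cancel()  # Workers loop forever, so stop them explicitly
        await asyncio.gather(*workers, return_exceptions=True)

asyncio.run(priority_scraper([
    (3, "https://httpbin.org/get"),
    (1, "https://httpbin.org/status/200"),
    (2, "https://httpbin.org/uuid"),
]))
```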

Key Optimization Techniques:

1. Reuse a single httpx.AsyncClient() so connections are pooled
2. Implement exponential backoff for retries
3. Monitor and adjust concurrency limits based on server responses (for example, slow down on 429 errors)
4. Cache responses when appropriate
5. Use streaming responses for large payloads
6. Implement proper error handling and recovery
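Exponential backoff can be sketched without any HTTP library: retry a failing coroutine, doubling the wait after each failure (capped at `max_delay`) and adding a little random jitter so many clients don't retry in lockstep. `retry_with_backoff` and `flaky_request` are hypothetical names; `flaky_request` simply fails twice before succeeding. With httpx you would typically retry on exceptions such as `httpx.TransportError` or on 429/5xx status codes instead of `ConnectionError`. The delays are kept tiny so the example runs quickly.

```python
import asyncio
import random

async def retry_with_backoff(make_call, max_attempts=5, base_delay=0.01, max_delay=1.0):
    """Await make_call(), retrying with exponential backoff on ConnectionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller handle it
            # base_delay * 2^(attempt - 1), capped, plus up to 10% random jitter
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            delay += random.uniform(0, delay * 0.1)
            print(f"Attempt {attempt} failed, retrying in {delay:.3f}s")
            await asyncio.sleep(delay)

calls = 0

async def flaky_request():
    # Hypothetical request that fails on the first two calls
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("temporary failure")
    return "OK"

result = asyncio.run(retry_with_backoff(flaky_request))
print(result, "after", calls, "attempts")
```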