# WEEK 4: Async Scraping

## What is Async Scraping
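Async scraping is a way of web scraping in which your script sends multiple requests concurrently instead of waiting for each one to finish before starting the next. Because a scraper spends most of its time waiting on the network, overlapping those waits makes fetching many pages much faster. Headless mode and error handling are separate concerns: async code can be combined with both, but it does not provide either automatically.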
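
As a minimal sketch of the idea, the snippet below runs several fetches concurrently with `asyncio.gather` and caps how many are in flight with a semaphore. The `fake_fetch` coroutine only simulates network latency with `asyncio.sleep`; it, `fetch_all`, `max_concurrency`, and the example URLs are illustrative assumptions, not part of the course material. In a real scraper the fetch coroutine would wrap an async HTTP client.

```python
import asyncio
import time


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request: waits 0.2 s, then returns a fake body."""
    await asyncio.sleep(0.2)
    return f"<html>content of {url}</html>"


async def fetch_all(urls, fetch=fake_fetch, max_concurrency=5):
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]
    start = time.perf_counter()
    pages = asyncio.run(fetch_all(urls))
    elapsed = time.perf_counter() - start
    print(f"Fetched {len(pages)} pages in {elapsed:.2f}s")
```

Because the five simulated requests overlap, the whole run should take roughly as long as one request rather than the sum of all five.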

## GraphQL
https://web-scraping.dev/

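Task: GraphQL exercise. Fetch all reviews from the site's GraphQL API, following the pagination cursor until no pages remain.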

Output: Review, Star, Date

In [1]:
import requests
import csv

url = "https://web-scraping.dev/api/graphql"
headers = {
    "Content-Type": "application/json",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)",
    "Referer": "https://web-scraping.dev/reviews"
}

query = """
query GetReviews($first: Int, $after: String) {
    reviews(first: $first, after: $after) {
        edges {
            node {
                rid
                text
                rating
                date
            }
            cursor
        }
        pageInfo {
            startCursor
            endCursor
            hasPreviousPage
            hasNextPage
        }
    }
}
"""

all_reviews = []
after = None  # pagination cursor; None requests the first page

while True:
    payload = {
        "query": query,
        "variables": {
            "first": 100,
            "after": after
        }
    }
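    # Fail fast on network stalls and HTTP errors instead of parsing a bad response
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    res_json = response.json()

    # GraphQL reports query errors in the response body, often with a 200 status
    if res_json.get("errors"):
        raise RuntimeError(f"GraphQL errors: {res_json['errors']}")

    reviews_data = res_json["data"]["reviews"]
    edges = reviews_data["edges"]
    page_info = reviews_data["pageInfo"]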

    for edge in edges:
        all_reviews.append(edge["node"])
    print(f"Fetched {len(all_reviews)} reviews so far...")

    if not page_info["hasNextPage"]:
        break

    after = page_info["endCursor"]
print(f"Total reviews fetched: {len(all_reviews)}")

with open("reviews.csv", "w", newline='', encoding='utf-8') as csvfile:
    fieldnames = ["rid", "text", "rating", "date"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for review in all_reviews:
        writer.writerow(review)
print("Reviews have been written to reviews.csv")

Fetched 96 reviews so far...
Total reviews fetched: 96
Reviews have been written to reviews.csv
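
The task asks for Review, Star, and Date columns, while the CSV above keeps the raw GraphQL field names. One possible way to rename them is sketched below. The mapping of `text` to Review, `rating` to Star, and `date` to Date is an assumption about what the task intends, and `COLUMN_MAP`, `to_output_rows`, `write_output_csv`, the sample node, and the file name `reviews_output.csv` are all hypothetical.

```python
import csv

# Assumed mapping from GraphQL node fields to the requested output columns
COLUMN_MAP = {"text": "Review", "rating": "Star", "date": "Date"}


def to_output_rows(nodes):
    """Rename node fields to the output columns, dropping unmapped fields such as rid."""
    return [
        {column: node.get(field) for field, column in COLUMN_MAP.items()}
        for node in nodes
    ]


def write_output_csv(nodes, path="reviews_output.csv"):
    """Write the renamed rows to a CSV file with a Review, Star, Date header."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=list(COLUMN_MAP.values()))
        writer.writeheader()
        writer.writerows(to_output_rows(nodes))


if __name__ == "__main__":
    # Made-up sample node, shaped like the entries collected in all_reviews
    sample = [{"rid": "example-1", "text": "Great product", "rating": 5, "date": "2023-01-01"}]
    print(to_output_rows(sample))
```

In the notebook, calling `write_output_csv(all_reviews)` after the fetch loop would produce the renamed file alongside `reviews.csv`.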


# Additional
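### How to Use AI to Improve Data Scraping
AI assistants can help with three parts of a scraper:
- Selector: suggesting a CSS or XPath selector for the target data
- Element: identifying which HTML element holds the data
- URL: spotting the URL or API endpoint that serves the data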

### Example: LinkedIn Scraping
https://github.com/muhfajarags/linkedin-crawler

# Challenge (Contact)

Task: Scrape phone numbers of marketing agencies (any niche is fine), with a minimum of 5 agencies.

Output: tel_contact, website_url.
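
One way to approach the extraction step is sketched below with the standard library only. Many sites mark phone numbers up as `tel:` links, so collecting `href` values that start with `tel:` is a reasonable first pass. The class `TelLinkParser`, the function `extract_contacts`, and the sample HTML and URL are hypothetical; pages that show numbers only as plain text would need a different approach, such as a regular expression.

```python
from html.parser import HTMLParser


class TelLinkParser(HTMLParser):
    """Collects phone numbers from <a href="tel:..."> links."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # Attribute values can be None (e.g. a bare `href`), so fall back to ""
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("tel:"):
            number = href[4:].strip()
            # Skip empty values and numbers already seen on this page
            if number and number not in self.numbers:
                self.numbers.append(number)


def extract_contacts(html, website_url):
    """Return one row per phone number, using the challenge's output columns."""
    parser = TelLinkParser()
    parser.feed(html)
    return [{"tel_contact": n, "website_url": website_url} for n in parser.numbers]


if __name__ == "__main__":
    # Made-up HTML and URL for illustration only
    sample_html = '<footer><a href="tel:+1-555-0100">Call us</a></footer>'
    print(extract_contacts(sample_html, "https://agency.example"))
```

The rows returned by `extract_contacts` can be passed to `csv.DictWriter` with `fieldnames=["tel_contact", "website_url"]`, in the same way the reviews were written earlier.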