WebScraping

A C# web scraping project built with Microsoft Playwright that extracts quotes, authors, author profile links, and tags from https://quotes.toscrape.com/, then saves the results to a JSON file.

Overview

This project launches a headless Chromium browser with Playwright, navigates to the target page, collects quote data, prints the results to the console, and exports the scraped data to scraped_quotes.json.

It demonstrates:

Playwright browser automation
asynchronous programming with async/await
structured scraping with a Page Object Model
JSON serialization in C#

Features

Scrapes:
- quote text
- author name
- author profile link
- quote tags
Uses Playwright with headless Chromium
Uses a simple Page Object Model for selectors
Serializes scraped results to formatted JSON
Prints scraped data to the console

Tech Stack

C#
.NET 10
Microsoft Playwright

Project Structure

WebScraping/
├── Program.cs
├── WebScraping.csproj
├── WebScraping.sln
├── scraped_quotes.json
└── WebScraper/
    ├── Quote.cs
    ├── WebScraper.cs
    └── POMs/
        └── QuotesPage.cs

How It Works

The app creates a scraper instance with the target URL.
Playwright launches a headless Chromium browser.
The scraper navigates to the page and waits for the DOM to load.
Quote containers are collected from the page.
Each quote is parsed into a Quote object.
The results are printed to the console.
The dataset is saved to scraped_quotes.json.

Getting Started

Prerequisites

.NET 10 SDK
PowerShell
Internet connection

Installation

git clone https://github.com/Hibob555556/WebScraping.git
cd WebScraping
dotnet restore
dotnet build
pwsh bin/Debug/net10.0/playwright.ps1 install

Running the Project

dotnet run

Output

When the scraper runs:

prints quotes to the console
creates scraped_quotes.json

Example:

[
  {
    "QuoteText": "The world as we have created it is a process of our thinking.",
    "Author": "Albert Einstein",
    "AuthorAboutLink": "https://quotes.toscrape.com/author/Albert-Einstein",
    "Tags": ["change", "deep-thoughts", "thinking", "world"]
  }
]

Code Highlights

Program.cs: entry point and output handling
WebScraper.cs: browser automation and scraping logic
Quote.cs: data model
QuotesPage.cs: selectors (Page Object Model)

Notes

Targets https://quotes.toscrape.com/.
Selectors may need updating if the site changes.

Future Improvements

CLI arguments for custom URLs
pagination support
CSV export
improved error handling and logging

Author

Cayden Lunt
https://github.com/Hibob555556
https://www.linkedin.com/in/cayden-lunt-test-engineer

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
WebScraper		WebScraper
.gitignore		.gitignore
Program.cs		Program.cs
README.md		README.md
WebScraping.csproj		WebScraping.csproj
WebScraping.sln		WebScraping.sln

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WebScraping

Overview

Features

Tech Stack

Project Structure

How It Works

Getting Started

Prerequisites

Installation

Running the Project

Output

Code Highlights

Notes

Future Improvements

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

WebScraping

Overview

Features

Tech Stack

Project Structure

How It Works

Getting Started

Prerequisites

Installation

Running the Project

Output

Code Highlights

Notes

Future Improvements

Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Packages