From 6c68ac104f2c0bab24c20b32d0f693e2ca9ecada Mon Sep 17 00:00:00 2001
From: Honza Javorek
Date: Wed, 24 Apr 2024 10:07:46 +0200
Subject: [PATCH] fix: be specific about 'something'

---
 .../webscraping/web_scraping_for_beginners/crawling/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sources/academy/webscraping/web_scraping_for_beginners/crawling/index.md b/sources/academy/webscraping/web_scraping_for_beginners/crawling/index.md
index 54dce9c67..8a092764f 100644
--- a/sources/academy/webscraping/web_scraping_for_beginners/crawling/index.md
+++ b/sources/academy/webscraping/web_scraping_for_beginners/crawling/index.md
@@ -20,7 +20,7 @@ In this section, we will take a look at moving between web pages, which we call
 
 ## How do you crawl? {#how-to-crawl}
 
-Crawling websites is a fairly straightforward process. We'll start by opening the first web page and extracting all the links (URLs) that lead to the other pages we want to visit. To do that, we'll use the skills learned in the [Basics of data extraction](../data_extraction/index.md) course. We'll add some extra filtering to make sure we only get the correct URLs. Then, we'll save those URLs, so in case something happens to our scraper, we won't have to extract them again. And, finally, we will visit those URLs one by one.
+Crawling websites is a fairly straightforward process. We'll start by opening the first web page and extracting all the links (URLs) that lead to the other pages we want to visit. To do that, we'll use the skills learned in the [Basics of data extraction](../data_extraction/index.md) course. We'll add some extra filtering to make sure we only get the correct URLs. Then, we'll save those URLs, so in case our scraper crashes with an error, we won't have to extract them again. And, finally, we will visit those URLs one by one.
 
 At any point, we can extract URLs, data, or both. Crawling can be separate from data extraction, but it's not a requirement and, in most projects, it's actually easier and faster to do both at the same time. To summarize, it goes like this: