## What is web scraping?
Web scraping is the process of automatically extracting data from websites.

### How web scraping works:

* When you visit a web page, your browser sends a request to the server hosting it, and the server responds with the HTML, CSS, and JavaScript that the browser uses to render the page.

* A scraper sends the same kind of request, then parses the code the server returns to extract the data it needs.

* Scrapers typically target specific parts of a page, like product listings or tables.

* Scraped data is usually saved as JSON, CSV, or another structured format, which makes it easy to work with programmatically.
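
The steps above can be sketched with nothing but Python's standard library. This is only an illustration under stated assumptions: the `PAGE_HTML` string stands in for the response a server would send, and the table contents and the `TableCellParser` helper are made up for the example. No real request is made.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Stand-in for the HTML a server would return (hypothetical page content).
PAGE_HTML = """
<html><body>
<h1>Prices</h1>
<table>
<tr><td>Apples</td><td>1.20</td></tr>
<tr><td>Oranges</td><td>0.80</td></tr>
</table>
</body></html>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Parse the page and target only the table cells.
parser = TableCellParser()
parser.feed(PAGE_HTML)

# Structure the extracted text as records.
records = [{"name": name, "price": float(price)} for name, price in parser.rows]

# Output the same records as JSON and as CSV.
as_json = json.dumps(records)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(records)
as_csv = buffer.getvalue()

print(as_json)
print(as_csv)
```

Writing a parser by hand like this gets tedious quickly, which is why scraping code usually leans on a dedicated parsing library.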

#### The library
In this notebook, we'll use BeautifulSoup, a Python library for parsing HTML and XML.


## Let's start scraping
To demonstrate web scraping, let's try extracting data from a simple HTML page:

In [None]:
<html>
<body>

<h1>Top 5 Fruits</h1>

<ul>
<li>Apples</li> 
<li>Oranges</li>
<li>Bananas</li>
<li>Grapes</li>
<li>Strawberries</li>
</ul>

</body>
</html>

#### We want to extract the names of the fruits into a list

In [2]:
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup

with open("./3.99-files/fruits.html") as f:
    soup = BeautifulSoup(f, 'html.parser')

This creates a BeautifulSoup object that we can now traverse and search like a parse tree.
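
To give a feel for what traversing and searching the tree looks like, here is a small sketch. It assumes a shortened inline copy of the fruit page (so the cell runs without the file on disk) and uses a separate `soup_demo` variable so the `soup` object above is left untouched.

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the fruit page, used only for this illustration.
html = """
<html><body>
<h1>Top 5 Fruits</h1>
<ul>
<li>Apples</li>
<li>Oranges</li>
</ul>
</body></html>
"""

soup_demo = BeautifulSoup(html, "html.parser")

heading = soup_demo.h1.text                     # attribute access returns the first <h1>
first_item = soup_demo.find("li").text          # find() returns the first matching tag
item_count = len(soup_demo.find_all("li"))      # find_all() returns every matching tag
parent_tag = soup_demo.find("li").parent.name   # .parent walks up to the enclosing <ul>

print(heading, first_item, item_count, parent_tag)
```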


To extract the fruit names, we can select the <span style="color: #00a1ff; background-color: #111111; border-radius: 2px 4px; padding: 0 4px;">\<li></span> tags and loop through them:

In [3]:
fruits = []

for li in soup.select("li"):
    fruits.append(li.text)

print(fruits)

['Apples', 'Oranges', 'Bananas', 'Grapes', 'Strawberries']


And we have successfully scraped the data we wanted!

The same principles apply when scraping any web page. You inspect the HTML, identify patterns and select elements to extract data.
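
As a sketch of that inspect, identify, and select workflow on something closer to a real page, the cell below assumes a hypothetical product listing. The markup and the class names (`product`, `name`, `price`, `ad`) are invented for illustration; on a real site you would find the actual names with your browser's developer tools.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing; the class names are made up for this example.
html = """
<div class="product"><span class="name">Apples</span><span class="price">$1.20</span></div>
<div class="product"><span class="name">Oranges</span><span class="price">$0.80</span></div>
<div class="ad">Sponsored content</div>
"""

listing = BeautifulSoup(html, "html.parser")

products = []
# The repeated pattern is one <div class="product"> per item, so select those
# and skip everything else (such as the ad block).
for card in listing.select("div.product"):
    name = card.select_one(".name").get_text(strip=True)
    price = card.select_one(".price").get_text(strip=True)
    products.append({"name": name, "price": price})

print(products)
```

Selecting by class rather than by tag alone keeps unrelated elements out of the result.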

<hr>

There is a lot more to learn, but this should provide a solid starting point for your web scraping journey!


* For more, I recommend watching <a href="https://youtu.be/7ahUnBhdI5o">the web scraper project with Tiff</a>.


&copy; Created by <a href="https://github.com/mohamedyosef101">Mohamed Yosef</a> | *<a href="https://medium.com/@mohamedyosef101">Medium</a> - <a href="https://linkedin.com/in/mohamedyosef101">LinkedIn</a>*

<hr>