# **Web Scraping**

---



Web scraping is the process of gathering information from the Internet. Even copying and pasting the lyrics of your favourite song is a form of web scraping! In practice, though, the term usually refers to automating this copy-and-paste process. Some websites don’t allow automatic scrapers to gather their data, while others don’t mind.
 
Even if you’re scraping a website for educational purposes, it’s a good idea to do some research on your own and make sure that you’re not violating any Terms of Service before you start a major project.

In this article, we’ll see how to implement web scraping with Python. So let’s get started:


# **How Is Web Scraping Useful?**

---


Let’s suppose you’re surfing the Internet and searching for employment, and you want a job that matches your criteria.
There will be many sites that offer precisely the kinds of jobs you want. Unfortunately, a new position only pops up once in a blue moon, and the site doesn’t provide an email notification service. So you have to keep an eye on the website every day, which doesn’t sound like the most enjoyable or productive way to spend your time.

So, rather than checking the job site every day, you can use Python to automate your job search. Automated web scraping can be a solution that speeds up the data collection process. You write your code once, and it gets the information you want as many times as you need.
 
By now, we have a basic understanding of how web scraping works. So let’s see how to implement it in Python:


# **Doing web scraping**

You can scrape any website on the Internet, but the difficulty of doing so depends on the structure of the website. This article gives you an introduction to web scraping to help you understand the overall process. You can then apply the same process to every website you want to scrape.
 
 **Step 1: Check Your Data Source**

 Before writing any Python code, you need to explore the website that you want to scrape. You’ll need to understand the site structure to extract the information that’s relevant for you. Start by opening the site you want to scrape in your favourite browser.

 **Step 2: Explore the Website**

 Click through the site and interact with it just like any typical user would. For example, you can scroll through the main page of the website. You might also notice that the URL in your browser’s address bar changes when you interact with the website.
 



## **Step 3: Decode the Information in URLs**

 A URL carries a lot of information. Your web scraping journey will be much easier if you first become familiar with how URLs work and what they’re made of.

For this example, we are going to scrape the Flipkart website to extract the price, name, and rating of laptops.

The URL for this page is

https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniqBStoreParam1=val1&wid=11.productCard.PMU_V2.

This URL consists of two components:

* The base URL, everything before the ?, which is the path to the laptop listing on the site.
* The query parameters, everything after the ?, which are key=value pairs separated by & (here sid, uniqBStoreParam1, and wid).
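To see what a URL like this is made of, you can take it apart with Python’s standard library. The snippet below is a minimal sketch: it assumes the trailing period in the URL above is sentence punctuation and leaves it out, and the variable names `base_url` and `query_params` are only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# The Flipkart URL from this article (trailing period assumed to be punctuation)
url = (
    "https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr"
    "?sid=6bo%2Cb5g&uniqBStoreParam1=val1&wid=11.productCard.PMU_V2"
)

parts = urlsplit(url)

# Everything before the "?": scheme, host, and path
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Everything after the "?": parse_qs decodes percent-escapes such as %2C
# and returns a dict that maps each key to a list of values
query_params = parse_qs(parts.query)

print(base_url)
for key, values in query_params.items():
    print(key, "=", values)
```

Changing a value in the query string and reloading the page in your browser is a quick way to learn which parameters actually affect the results.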
 

## **Use Developer Tools**

 Next, you need to learn more about how the data is structured for display. You need to understand this structure to pick out what you want from the HTML response that you’ll collect in one of the upcoming steps.

Developer tools can help you understand the structure of a website. All modern browsers come with developer tools installed. In this section, you’ll see how to work with the developer tools in Chrome. You can open them with the following keyboard shortcuts:

* For Mac: Cmd+Alt+I
* For Windows/Linux: Ctrl+Shift+I

 


## **Step 4: Scrape HTML Content**

 Now that you have an idea of what you’re working with, it’s time to start using Python. First, get the site’s HTML into your Python script so that you can interact with it. For this task, you’ll use Selenium to load the page in a browser and Beautiful Soup to parse the result.
First, create a Python file. To do this, open the terminal in Ubuntu and type gedit followed by a filename with the .py extension, like this:

*gedit web-s.py*


Now let us write the Python code:


In [None]:
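from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)
import pandas as pd


To configure webdriver to use the Chrome browser, we have to set the path to chromedriver. In Selenium 4, this path is passed through a Service object:


In [None]:
driver = webdriver.Chrome(service=Service("/usr/lib/chromium-browser/chromedriver"))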


Now add the code below. It creates lists to hold the results and opens the page:

In [None]:
products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get("https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniq")


So far, we’ve written the code for opening the URL. Now it’s time to extract the data from the website. The data we want to extract is nested in *div* tags, which you can locate with the developer tools. So, we’ll find the div tags with the respective class names, extract the data, and store it in the lists created above.

In [None]:
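content = driver.page_source
soup = BeautifulSoup(content, "html.parser")
# Each product card is an <a> tag; these class names are site-specific and may change
for a in soup.find_all('a', href=True, attrs={'class': '_31qSD5'}):
    name = a.find('div', attrs={'class': '_3wU53n'})
    price = a.find('div', attrs={'class': '_1vC4OE _2rQ-NK'})
    rating = a.find('div', attrs={'class': 'hGSR34 _2beYZw'})
    # Skip cards that lack a field so the three lists stay the same length
    if name is None or price is None or rating is None:
        continue
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)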
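If you’d like to try the extraction logic without opening a browser, you can run the same kind of loop on a small HTML string. The snippet below is only a sketch: the markup, the class names (`card`, `title`, `price`, `rating`), and the helper `extract_products` are made up for illustration and are not Flipkart’s real structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product listing page
SAMPLE_HTML = """
<a class="card" href="/p/1">
  <div class="title">Laptop A</div>
  <div class="price">45,990</div>
  <div class="rating">4.3</div>
</a>
<a class="card" href="/p/2">
  <div class="title">Laptop B</div>
  <div class="price">52,490</div>
</a>
"""


def extract_products(html):
    """Return one dict per product card that has a name, price, and rating."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("a", href=True, class_="card"):
        name = card.find("div", class_="title")
        price = card.find("div", class_="price")
        rating = card.find("div", class_="rating")
        # find() returns None when a tag is missing, so skip incomplete cards
        if name is None or price is None or rating is None:
            continue
        rows.append({
            "name": name.get_text(strip=True),
            "price": price.get_text(strip=True),
            "rating": rating.get_text(strip=True),
        })
    return rows


rows = extract_products(SAMPLE_HTML)
print(rows)
```

The second card has no rating, so it is skipped. Without that check, calling `.text` on the missing tag would raise an AttributeError.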


**Run the Code and Extract the Data**

To run the script, use the command below:


In [None]:
python web-s.py

## **Step 5: Store the Data**

After extracting the data, you might want to store it in a file. The format depends on your requirements. For this example, we will store the extracted data in CSV (Comma-Separated Values) format.


In [None]:
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings}) 
df.to_csv('products.csv', index=False, encoding='utf-8')


Now simply run the script again. You’ll see that a new file, "products.csv", has been created and that it contains the extracted data.
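If you’d rather not depend on pandas for this step, the standard library’s csv module can write the same three columns. This is an alternative sketch, not the approach used above: the helper `write_products_csv` and the sample values are hypothetical, and the data is written to an in-memory buffer so that you can inspect the result right away.

```python
import csv
import io


def write_products_csv(file_obj, products, prices, ratings):
    """Write a header row, then one row per product, to an open text file."""
    writer = csv.writer(file_obj)
    writer.writerow(["Product Name", "Price", "Rating"])
    # zip() stops at the shortest list, so extra trailing items are dropped
    writer.writerows(zip(products, prices, ratings))


# Hypothetical sample data standing in for the scraped lists
buffer = io.StringIO()
write_products_csv(
    buffer,
    ["Laptop A", "Laptop B"],
    ["45,990", "52,490"],
    ["4.3", "4.1"],
)
csv_text = buffer.getvalue()
print(csv_text)
```

Prices that contain a comma are quoted automatically, so they stay in a single column. To write to disk instead, pass a file opened with `open("products.csv", "w", newline="", encoding="utf-8")`.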

# **Conclusion**

---


Selenium lets you load a page in a real browser and hand its HTML to Python, and Beautiful Soup then parses that HTML. For static pages that don’t need a browser, the requests library gives you a user-friendly way to fetch the HTML instead. These packages are trusted and helpful companions for your web scraping adventures. You’ll find that Beautiful Soup caters to most of your parsing needs, including navigation and advanced searching.

I hope that with this article, you’ve learned how to scrape data from the Web using Python, Selenium, and Beautiful Soup.
Thank you!

