## **Web Scraping Product Details on Flipkart**

In the world of e-commerce, staying updated with the latest product information, prices, and customer reviews is crucial for both consumers and businesses. However, manually collecting this data from online platforms can be a time-consuming and tedious process. This is where web scraping comes into play.

**Web scraping** is the process of programmatically extracting data from websites. In this specific task, we aim to scrape product details of iPhones listed on **Flipkart**, one of India's leading e-commerce platforms. Flipkart provides an extensive collection of iPhone products, each with its own set of attributes such as name, price, ratings, and specifications.

**Beautiful Soup** is a Python library that simplifies the parsing and extraction of data from HTML and XML documents. It allows us to navigate the HTML structure of a webpage, locate specific elements, and extract the information we need.
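As a minimal illustration of that navigation, the sketch below parses a small hand-written HTML snippet rather than a live page. The `product-title` and `product-price` class names are made up for the example and are not Flipkart's actual markup.

```python
from bs4 import BeautifulSoup

# Hand-written HTML standing in for a product listing (hypothetical markup).
sample_html = """
<div class="product">
  <div class="product-title">APPLE iPhone 13 (Midnight, 128 GB)</div>
  <div class="product-price">₹52,499</div>
</div>
<div class="product">
  <div class="product-title">APPLE iPhone 14 (Blue, 128 GB)</div>
  <div class="product-price">₹64,999</div>
</div>
"""

page = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag; get_text returns the text inside a tag.
titles = [tag.get_text(strip=True) for tag in page.find_all("div", class_="product-title")]
prices = [tag.get_text(strip=True) for tag in page.find_all("div", class_="product-price")]

print(titles)
```

The same `find_all` pattern applies to a real page once the relevant class names have been read from the page source.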

**The main objectives of this web scraping task are:**

1. **Data Collection:** We will send an HTTP GET request to Flipkart's iPhone search results page with `urllib.request` and retrieve the HTML content, which Beautiful Soup then parses.

2. **Data Extraction:** We will parse the HTML content to locate and extract relevant data, such as product names, prices, ratings, and specifications.

3. **Data Storage:** The extracted data will be stored in an organized format, such as a Pandas DataFrame or a CSV file, for further analysis or reporting.

4. **Automation:** By automating this data collection process, we can periodically update our dataset to ensure it reflects the latest product offerings and prices on Flipkart.

In [44]:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as req

In [45]:
url_f='https://www.flipkart.com/search?q=i%20phone&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'

In [46]:
uclient=req(url_f)
page_html=uclient.read()
# page_html


In [47]:
uclient.close()
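Some sites respond differently to, or reject, requests that carry urllib's default `User-Agent`. The sketch below shows one possible way to send the same GET request with an explicit header, a timeout, and a `with` block that closes the connection automatically. The helper names `build_request` and `fetch_html`, the header value, and the timeout are assumptions for illustration and are not part of the original notebook.

```python
from urllib.request import Request, urlopen

def build_request(url, user_agent="Mozilla/5.0"):
    """Build a GET request with an explicit User-Agent header (hypothetical helper)."""
    return Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()

# Building the request does not touch the network; only fetch_html does.
request = build_request("https://www.flipkart.com/search?q=i%20phone")
print(request.get_method(), request.full_url)
```

With these helpers, `page_html = fetch_html(url_f)` would stand in for the separate open, read, and close cells.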

In [48]:
page_soup=soup(page_html,'html.parser')
# page_soup

In [49]:
# Each product title sits in a div with this class (Flipkart's class names may change over time)
containers=page_soup.find_all('div',{'class':'_4rR01T'})
# containers

In [50]:
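extracted_text=[]
for i in containers:
  html=str(i)
  # Slice from 'APPLE' up to the first ')', e.g. 'APPLE iPhone 13 (Midnight, 128 GB'
  start=html.find('APPLE')
  end=html.find(')')
  model,details=html[start:end].split('(')[:2]
  # Join as 'model,0colour, storage'; the literal '0' becomes a prefix on the colour value
  extracted_text.append(model+',0'+details)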

len(extracted_text)


24
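The slicing above depends on the positions of `APPLE`, `(` and `)` in the tag's HTML. A regular expression is one alternative that works on the title text directly. The sketch below assumes titles follow the layout `<model> (<colour>, <storage>)`, which is inferred from the results in this notebook; `split_title` is a hypothetical helper name.

```python
import re

# Assumed title layout: "<model> (<colour>, <storage>)".
TITLE_PATTERN = re.compile(r"^(?P<model>.+?)\s*\((?P<colour>.+),\s*(?P<storage>[^,]+)\)$")

def split_title(title):
    """Return (model, colour, storage), or None if the title has another layout."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    return match.group("model"), match.group("colour"), match.group("storage")

print(split_title("APPLE iPhone 13 (Midnight, 128 GB)"))  # ('APPLE iPhone 13', 'Midnight', '128 GB')
print(split_title("APPLE iPhone 13"))                     # None
```

Returning `None` for unexpected layouts lets the caller skip or log a title instead of failing with an `IndexError`.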

In [51]:
price_container=page_soup.find_all('div',{"class":"_30jeq3 _1_WHN1"})
# price_container
price_container[0].text

'₹52,499'

In [52]:
price=[]
for i in price_container:
  price.append(i.text)

# price
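The prices are collected as display strings such as `'₹52,499'`. For sorting or arithmetic it can help to convert them to integers first. The following is a small sketch; `parse_price` is a hypothetical helper, and it assumes every price is a whole number of rupees with an optional `₹` sign and comma separators.

```python
def parse_price(text):
    """Convert a price string such as '₹52,499' to an integer number of rupees."""
    digits = text.replace("₹", "").replace(",", "").strip()
    return int(digits)

sample_prices = ["₹52,499", "₹64,999", "₹36,999"]
print([parse_price(p) for p in sample_prices])  # [52499, 64999, 36999]
```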

In [53]:
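# Write with an explicit UTF-8 encoding so the rupee sign is saved correctly on every platform
with open("iphone_data.csv",'w',encoding='utf-8') as iphone_file:
  iphone_file.write('Phone model,Colur,Storage,Rate\n')
  for i,j in zip(extracted_text,price):
    iphone_file.write(str(i)+",")
    # Drop the thousands separator so the price stays in a single CSV field
    iphone_file.write(str(j).replace(',','')+'\n')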
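Building CSV lines by hand works only while no field contains a comma or a quote. Python's standard `csv` module handles quoting automatically, so it is a common alternative. The sketch below writes to an in-memory buffer so it runs without touching the disk; the column names, sample rows, and the `rows_to_csv` helper are illustrative assumptions.

```python
import csv
import io

# Sample rows in the shape produced by the earlier steps (values are illustrative).
rows = [
    {"Phone model": "APPLE iPhone 13", "Colour": "Midnight", "Storage": "128 GB", "Rate": 52499},
    {"Phone model": "APPLE iPhone 14", "Colour": "Blue", "Storage": "128 GB", "Rate": 64999},
]

def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text; csv quotes fields that contain commas."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(rows, ["Phone model", "Colour", "Storage", "Rate"])
print(csv_text)
```

To write to a file instead, the same `DictWriter` can wrap a handle opened with `open(path, 'w', newline='', encoding='utf-8')`.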

In [54]:
import pandas as pd
data=pd.read_csv('iphone_data.csv')
# Blank out the colour of the first row; .loc avoids pandas' chained-assignment warning
data.loc[0,'Colur']=''

In [55]:
data.to_csv('iphone_data_with_price.csv',index=False)

In [59]:
pd.read_csv('iphone_data_with_price.csv').head(10)

| | Phone model | Colur | Storage | Rate |
|---|---|---|---|---|
| 0 | APPLE iPhone 13 | | 128 GB | ₹52499 |
| 1 | APPLE iPhone 13 | 0Midnight | 128 GB | ₹52499 |
| 2 | APPLE iPhone 13 | 0Pink | 128 GB | ₹52499 |
| 3 | APPLE iPhone 13 | 0Starlight | 128 GB | ₹52499 |
| 4 | APPLE iPhone 14 | 0Blue | 128 GB | ₹64999 |
| 5 | APPLE iPhone 11 | 0Black | 64 GB | ₹36999 |
| 6 | APPLE iPhone 12 | 0White | 64 GB | ₹48999 |
| 7 | APPLE iPhone 12 | 0Blue | 64 GB | ₹48999 |
| 8 | APPLE iPhone 14 | 0Starlight | 128 GB | ₹64999 |
| 9 | APPLE iPhone 14 | 0Midnight | 128 GB | ₹64999 |


## **Access Tables from a URL**

In [57]:
import pandas as pd
tables_from_url=pd.read_html('https://en.wikipedia.org/wiki/Economy_of_India')
tables_from_url[1].head()

| | Year | GDP (in Bil. US$PPP) | GDP per capita (in US$ PPP) | GDP (in Bil. US$nominal) | GDP per capita (in US$ nominal) | GDP growth (real) | Inflation rate (in Percent) | Unemployment (in Percent) | Government debt (in % of GDP) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1980 | 371.9 | 532.0 | 189.4 | 271.0 | 5.3% | 11.3% | | |
| 1 | 1981 | 431.5 | 603.2 | 196.5 | 274.7 | 6.0% | 12.7% | | |
| 2 | 1982 | 474.1 | 647.5 | 203.5 | 278.0 | 3.5% | 7.7% | | |
| 3 | 1983 | 528.6 | 705.3 | 222.0 | 296.3 | 7.3% | 12.6% | | |
| 4 | 1984 | 568.6 | 741.4 | 215.6 | 281.1 | 3.8% | 6.5% | | |
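In the output above, the growth and inflation columns appear as text such as `5.3%`, so they cannot be used in calculations as they stand. One way to convert such a column to numbers is sketched below. The small DataFrame is rebuilt by hand from three of the rows shown, so the example runs without a network connection.

```python
import pandas as pd

# A few rows copied from the table above; only two columns are kept for brevity.
gdp = pd.DataFrame({
    "Year": [1980, 1981, 1982],
    "GDP growth (real)": ["5.3%", "6.0%", "3.5%"],
})

# Strip the trailing percent sign and convert the remaining text to floats.
gdp["GDP growth (real)"] = gdp["GDP growth (real)"].str.rstrip("%").astype(float)

print(gdp.dtypes)
```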
