## Web Scraping with BeautifulSoup

This notebook demonstrates web scraping with the `requests` and `BeautifulSoup` libraries in Python. The goal is to extract product data (brand names, descriptions, product URLs, and image URLs) from Flipkart and Amazon listing pages and save it as a CSV file.


In [1]:
import requests
from bs4 import BeautifulSoup

### Scraping Data From Flipkart



1. The `requests` library is imported to send HTTP requests to a web page.
2. The `BeautifulSoup` class from the `bs4` module is imported to parse HTML content.
3. The code starts by making an HTTP GET request to a specified URL using the `requests.get()` function.
4. The response content is then passed to the `BeautifulSoup` constructor along with the specified parser (`html.parser` in this case). This creates a `soup` object, which represents the parsed HTML content.
5. With the `soup` object, you can use various methods and functions provided by `BeautifulSoup` to navigate and search through the HTML structure.
6. The function finds specific HTML elements by tag and class using the `find_all()` method: `div` elements with the class `_2WkVRV` (brand names), `a` elements with the class `IRpwTa` (descriptions), and `img` elements with the class `_2r_T1I` (product images).
7. For each brand and description element, it extracts the text using the `.text` attribute and stores the values in separate lists (`brand_names_list` and `descriptions_list`).
8. It builds the product URLs by concatenating the base URL `https://www.flipkart.com` with the `href` attribute of each description element, and collects the image URLs from the `src` attribute of each image element.
9. Finally, the function returns four lists: brand names, descriptions, product URLs, and image URLs.
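Step 8 joins the base URL and the `href` by plain string concatenation, which works as long as every `href` is a site-relative path. As a minimal sketch, assuming some links might instead be absolute, the standard library's `urllib.parse.urljoin` handles both cases. The helper name `to_absolute` and the sample paths are hypothetical and not taken from a real listing page.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.flipkart.com'

def to_absolute(href: str, base: str = BASE_URL) -> str:
    """Resolve an href against the base URL (hypothetical helper)."""
    # urljoin resolves relative paths against the base and
    # returns absolute URLs unchanged
    return urljoin(base, href)

# Hypothetical hrefs for illustration only
print(to_absolute('/sample-shirt/p/itm123'))
print(to_absolute('https://www.flipkart.com/sample-jeans/p/itm456'))
```

Inside the scraping function, this would replace the `'https://www.flipkart.com' + each['href']` expression with `to_absolute(each['href'])`.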


In [54]:
def scrap_data_from_flipkart(url:str):
    # Set request headers to mimic a web browser ('Accept-Language' is the standard header name)
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36', 'Accept-Language': 'en-US, en;q=0.5'}
    # Send a GET request to the provided URL
    response = requests.get(url,headers=header)
    
    # Create BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find all elements with the class '_2WkVRV' (brand names)
    brand = soup.find_all('div', {'class': '_2WkVRV'})
    
    # Find all elements with the class 'IRpwTa' (descriptions)
    description = soup.find_all('a', {'class': 'IRpwTa'})
    
    # Find all <img> elements with the class '_2r_T1I' (product images)
    image_url = soup.find_all('img', {'class': '_2r_T1I'})
    
    # Extract the text of each brand name and store it in a list
    brand_names_list = [each.text for each in brand]
    
    # Extract the text of each description and store it in a list
    descriptions_list = [each.text for each in description]
    
    # Extract the product URLs by concatenating the base URL with the 'href' attribute of each description element
    product_link_list = ['https://www.flipkart.com' + each['href'] for each in description]
    
    # Extract the image URL from the 'src' attribute of each image element
    image_url_list = [each['src'] for each in image_url]
    
    # Return the lists of brand names, descriptions, product URLs, and image URLs
    return brand_names_list, descriptions_list, product_link_list, image_url_list

### Initializing Empty Lists for Flipkart Data

In [55]:
flipkart_product_brand_names = []
flipkart_product_descriptions = []
flipkart_product_urls = []
flipkart_product_image_urls = []

### Extracting Men's Clothing Data 

#### Topwear

In [56]:
from tqdm import tqdm

# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&otracker=nmenu_sub_Men_0_Top+wear&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls,image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()


100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:18<00:00,  1.60page/s]
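The category cells in this notebook repeat the same loop with a different URL each time. As a sketch of how that loop could be factored out, the hypothetical helper `scrape_pages` below accepts any scraper function that returns the same four lists as `scrap_data_from_flipkart()`, plus a URL template containing `{page}`. The stand-in `fake_scraper` and the example.com URL are assumptions so that the block runs without network access.

```python
from typing import Callable, List, Tuple

# Four parallel lists: brand names, descriptions, product URLs, image URLs
PageResult = Tuple[List[str], List[str], List[str], List[str]]

def scrape_pages(scraper: Callable[[str], PageResult],
                 url_template: str,
                 total_pages: int) -> PageResult:
    """Call `scraper` on pages 1..total_pages and merge the results (hypothetical helper)."""
    merged: PageResult = ([], [], [], [])
    for page in range(1, total_pages + 1):
        # url_template is a plain string (not an f-string) containing '{page}'
        page_result = scraper(url_template.format(page=page))
        for combined, current in zip(merged, page_result):
            combined.extend(current)
    return merged

def fake_scraper(url: str) -> PageResult:
    """Stand-in for the real scraper so the sketch runs offline."""
    return ([f'brand from {url}'], ['description'], [url], [url + '/image.jpg'])

brands, descriptions, urls, image_urls = scrape_pages(
    fake_scraper, 'https://example.com/list?page={page}', total_pages=3)
print(len(brands), urls[-1])
```

With the real function, a call might look like `scrape_pages(scrap_data_from_flipkart, template, 30)`, where `template` is one of the category URLs written without the `f` prefix.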


#### Trackpants

In [5]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/bottomwear/track-pants/men-track-pants/pr?sid=clo%2Cvua%2Cjlk%2C6ql&otracker=categorytree&otracker=nmenu_sub_Men_0_Track+pants&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls,image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:29<00:00,  1.02page/s]


#### Jeans

In [6]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/bottomwear/jeans/men-jeans/pr?sid=clo%2Cvua%2Ck58%2Ci51&otracker=categorytree&otracker=nmenu_sub_Men_0_Jeans&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls, image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00,  1.28page/s]


#### Shorts

In [7]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/bottomwear/shorts/men-shorts/pr?sid=clo%2Cvua%2Ce8g%2Ckc7&otracker=categorytree&otracker=nmenu_sub_Men_0_Shorts&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls, image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:29<00:00,  1.03page/s]


### Extracting Women's Clothing Data 

#### Ethnicwear

In [8]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
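    # Create the URL for the specific page
    # NOTE: this is the Palazzos listing URL, which a later cell requests again, so these pages are scraped twice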
    url = f'https://www.flipkart.com/clothing-and-accessories/ethnic-wear/palazzo/pr?sid=clo%2Ccfv%2Cmn6&otracker=categorytree&otracker=nmenu_sub_Women_0_Palazzos&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls, image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()


100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:31<00:00,  1.05s/page]


#### Lehenga Choli

In [9]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/lehenga-choli/women-lehenga-choli/pr?sid=clo%2Chlg%2Cwrp&otracker=categorytree&otracker=nmenu_sub_Women_0_Lehenga+Choli&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls, image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:26<00:00,  1.12page/s]


#### Palazzos

In [10]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/ethnic-wear/palazzo/pr?sid=clo%2Ccfv%2Cmn6&otracker=categorytree&otracker=nmenu_sub_Women_0_Palazzos&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls, image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:30<00:00,  1.02s/page]


#### Kurtas

In [11]:
# Total number of pages
total_pages = 30

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.flipkart.com/clothing-and-accessories/ethnic-wear/kurtas/women-kurtas-and-kurtis/pr?sid=clo%2Ccfv%2Ccib%2Crkt&q=kurtas+kurtis&otracker=categorytree&otracker=nmenu_sub_Women_0_Kurtas+%26+Kurtis&page={page}'
    
    # Call the `scrap_data_from_flipkart()` function to get brand names, descriptions, and URLs
    brand_names, descriptions, urls, image_urls = scrap_data_from_flipkart(url)
    
    # Extend the respective lists with the obtained data from the current page
    flipkart_product_brand_names.extend(brand_names)
    flipkart_product_descriptions.extend(descriptions)
    flipkart_product_urls.extend(urls)
    flipkart_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:26<00:00,  1.13page/s]


### Scraping Data From Amazon

In [59]:
def scrap_data_from_amazon(url: str):
    # Set the header for the request to mimic a web browser
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US, en;q=0.5'
    }
    
    # Send a GET request to the provided URL with the specified header
    response = requests.get(url, headers=header)
    
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find all elements with class 'a-size-base-plus a-color-base' to extract brand names
    brand = soup.find_all('span', {'class': 'a-size-base-plus a-color-base'})
    
    # Find all elements with class 'a-size-base-plus a-color-base a-text-normal' to extract descriptions
    description = soup.find_all('span', {'class': 'a-size-base-plus a-color-base a-text-normal'})
    
    # Find all <a> tags with class 'a-link-normal s-underline-text s-underline-link-text s-link-style a-text-normal' to extract URLs
    urls = soup.find_all('a', {'class': 'a-link-normal s-underline-text s-underline-link-text s-link-style a-text-normal'})
    
    # Find all <img> tags with class 's-image' to extract image URLs
    image_urls = soup.find_all('img', {'class': 's-image'})
    
    # Extract the text content from the 'brand' elements and store them in a list
    brand_names_list = [each.text for each in brand]
    
    # Extract the text content from the 'description' elements and store them in a list
    descriptions_list = [each.text for each in description]
    
    # Create a list of product URLs by appending the href attribute to 'https://www.amazon.in'
    product_link_list = ['https://www.amazon.in' + each['href'] for each in urls]
    
    # Extract the 'src' attribute from each image element and store the values in a list
    image_urls_list = [each['src'] for each in image_urls]
    
    # Return the lists of brand names, descriptions, product URLs, and image URLs
    return brand_names_list, descriptions_list, product_link_list, image_urls_list


1. Define a function `scrap_data_from_amazon(url: str)` to scrape data from Amazon.
2. Set the header for the request to mimic a web browser, so that the request looks like it comes from a browser rather than a script. This reduces, but does not eliminate, the chance that the site blocks the request.
3. Send a GET request to the provided URL with the specified header using the `requests.get()` function. Store the response in the `response` variable.
4. Create a BeautifulSoup object to parse the HTML content of the response. Pass the `response.content` and `'html.parser'` as arguments to the `BeautifulSoup` constructor. Store the BeautifulSoup object in the `soup` variable.
5. Use the `soup.find_all()` method to find all elements with class 'a-size-base-plus a-color-base' and extract the brand names. Store the results in the `brand` variable.
6. Use the `soup.find_all()` method to find all elements with class 'a-size-base-plus a-color-base a-text-normal' and extract the descriptions. Store the results in the `description` variable.
7. Use the `soup.find_all()` method to find all `<a>` tags with class 'a-link-normal s-underline-text s-underline-link-text s-link-style a-text-normal' and extract the URLs. Store the results in the `urls` variable.
8. Extract the text content from the `brand` elements and store them in a list using a list comprehension. Store the list in the `brand_names_list` variable.
9. Extract the text content from the `description` elements and store them in a list using a list comprehension. Store the list in the `descriptions_list` variable.
10. Create a list of product URLs by appending the href attribute of each `<a>` tag in `urls` to 'https://www.amazon.in'. Store the list in the `product_link_list` variable using a list comprehension.
11. Use the `soup.find_all()` method to find all `<img>` tags with class 's-image', then extract the `src` attribute of each tag into the `image_urls_list` variable.
12. Return the four lists as a tuple: `brand_names_list`, `descriptions_list`, `product_link_list`, `image_urls_list`.


### Initializing Empty Lists for Amazon Data

In [60]:
amazon_product_brand_names = []
amazon_product_descriptions = []
amazon_product_urls = []
amazon_product_image_urls = []

### Extracting Men's Clothing Data 

#### T-Shirts

In [61]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A1968120031%2Cp_36%3A4595084031&page={page}&content-id=amzn1.sym.f5e83e00-a666-492b-b882-5fa6fba3548e&pd_rd_r=21025b0c-ec69-41d2-96e6-ec699afe0fee&pd_rd_w=lDWwe&pd_rd_wg=IQSQx&pf_rd_p=f5e83e00-a666-492b-b882-5fa6fba3548e&pf_rd_r=4SSBTYVCG98DM53DR4PQ&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [01:03<00:00,  1.58s/page]
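The page number in these search URLs travels in the query string. A minimal sketch, assuming the `page` query parameter is what selects the results page, sets that parameter with `urllib.parse` rather than editing the long URL by hand. The helper name `with_page` and the shortened URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page` (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing parameter except any old 'page' value
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != 'page']
    params.append(('page', str(page)))
    return urlunsplit(parts._replace(query=urlencode(params)))

# Shortened, illustrative search URL
base = 'https://www.amazon.in/s?i=apparel&fs=true&page=2'
for page in range(1, 4):
    print(with_page(base, page))
```

Building the URL this way makes it harder to leave a hard-coded page number behind when a URL is copied from the browser.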


#### Jeans

In [16]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A1968076031&fs=true&page={page}&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [01:05<00:00,  1.65s/page]


#### Jeans & Coats

In [17]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A1968088031&fs=true&page={page}&qid=1684610738&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [00:58<00:00,  1.47s/page]


#### Sweaters

In [18]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A1968077031&fs=true&page={page}&qid=1684610897&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [01:00<00:00,  1.52s/page]


### Extracting Women's Clothing Data 

#### Kurtas

In [19]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A1968255031&fs=true&page={page}&qid=1684611025&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [01:03<00:00,  1.58s/page]


#### Western Wear

In [20]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A11400137031&fs=true&page={page}&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [00:59<00:00,  1.48s/page]


#### Salwar Suits

In [21]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A3723380031&fs=true&page={page}&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [01:00<00:00,  1.51s/page]


#### Sarees

In [22]:
# Set the total number of pages
total_pages = 40

# Initialize the progress bar
progress_bar = tqdm(total=total_pages, unit='page')

# Iterate over each page
for page in range(1, total_pages + 1):
    # Create the URL for the specific page
    url = f'https://www.amazon.in/s?i=apparel&rh=n%3A1968256031&fs=true&page={page}&ref=sr_pg_{page}'
    
    # Scrape data from the Amazon page using the provided URL
    brand_names, descriptions, urls, image_urls = scrap_data_from_amazon(url)
    
    # Extend the respective lists with the obtained data from the current page
    amazon_product_brand_names.extend(brand_names)
    amazon_product_descriptions.extend(descriptions)
    amazon_product_urls.extend(urls)
    amazon_product_image_urls.extend(image_urls)
    
    # Update the progress bar
    progress_bar.update(1)

# Close the progress bar
progress_bar.close()

100%|████████████████████████████████████████████████████████████████████████████████| 40/40 [01:03<00:00,  1.58s/page]


## Combining Scraped Data from Amazon & Flipkart

In [40]:
# Combine the brand names from Flipkart and Amazon
combined_brands = flipkart_product_brand_names + amazon_product_brand_names

# Combine the product descriptions from Flipkart and Amazon
combined_product_descriptions = flipkart_product_descriptions + amazon_product_descriptions

# Combine the product URLs from Flipkart and Amazon
combined_product_urls = flipkart_product_urls + amazon_product_urls

# Combine the image URLs from Flipkart and Amazon
combined_image_urls = flipkart_product_image_urls + amazon_product_image_urls
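The Flipkart section requests the same Palazzos listing URL in two different cells, so the combined lists can contain repeated products. As a hedged sketch, rows can be de-duplicated on the product URL while keeping their original order. The helper name `dedupe_by_url` and the sample rows are hypothetical, and the sketch assumes the four lists are aligned row by row.

```python
from typing import List, Tuple

# One product per row: (brand, description, product URL, image URL)
Row = Tuple[str, str, str, str]

def dedupe_by_url(rows: List[Row]) -> List[Row]:
    """Keep the first row seen for each product URL (hypothetical helper)."""
    seen = set()
    unique = []
    for row in rows:
        product_url = row[2]
        if product_url not in seen:
            seen.add(product_url)
            unique.append(row)
    return unique

# Hypothetical sample data; the second row repeats the first product URL
sample_rows = [
    ('BrandA', 'Printed palazzo', 'https://example.com/p/1', 'https://example.com/i/1.jpg'),
    ('BrandA', 'Printed palazzo', 'https://example.com/p/1', 'https://example.com/i/1.jpg'),
    ('BrandB', 'Solid kurta', 'https://example.com/p/2', 'https://example.com/i/2.jpg'),
]
print(len(dedupe_by_url(sample_rows)))
```

With the notebook's data, the rows could be built with `list(zip(combined_brands, combined_product_descriptions, combined_product_urls, combined_image_urls))`; note that `zip` stops at the shortest list.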

## Converting These Lists into a DataFrame

In [41]:
import pandas as pd

In [44]:
# Create a dictionary named 'data' to store the data for the DataFrame
# The keys in the dictionary represent the column names in the DataFrame
# The values are lists containing the data for each column
data = {'brand': combined_brands, 'description': combined_product_descriptions, 'url': combined_product_urls}

# Convert the 'data' dictionary to a DataFrame using the pd.DataFrame() function
# Each key in the 'data' dictionary will become a column in the DataFrame
df = pd.DataFrame(data)
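`pd.DataFrame()` raises a `ValueError` when the lists in the dictionary have different lengths, which can happen if a page yields more matches for one selector than for another. The following is a minimal sketch for checking this before building the DataFrame, using only the standard library. The helper names and sample data are hypothetical. Padding only makes the lengths agree; it does not realign rows that are already out of step.

```python
from typing import Dict, List, Optional

def column_lengths(columns: Dict[str, List[str]]) -> Dict[str, int]:
    """Report how many values each column holds (hypothetical helper)."""
    return {name: len(values) for name, values in columns.items()}

def pad_columns(columns: Dict[str, List[str]]) -> Dict[str, List[Optional[str]]]:
    """Pad shorter columns with None so that all columns share one length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {name: list(values) + [None] * (longest - len(values))
            for name, values in columns.items()}

# Hypothetical sample data with one missing URL
sample = {'brand': ['BrandA', 'BrandB'], 'url': ['https://example.com/p/1']}
print(column_lengths(sample))
print(pad_columns(sample))
```

If `column_lengths` reports unequal counts, it is usually worth inspecting the selectors before padding, since a mismatch suggests that brands, descriptions, and URLs no longer describe the same products.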


## Saving DataFrame as .csv

In [45]:
# Save the DataFrame as a CSV file with the filename 'clothing_data.csv'
# Set the parameter 'index' to False to exclude the row index labels from the CSV file
df.to_csv('clothing_data.csv', index=False)