#### Test: Scraping eBay for listings related to Kirby plush, converting to DataFrame

##### Setting up imports

In [39]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

#### Fetching remote resource, converting to soup

In [40]:
page = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=kirby+plush&_sacat=0")
soup = BeautifulSoup(page.text, 'lxml')
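
A minimal sketch of the same fetch step with the query string built from a keyword and with basic error handling. The helper names `build_search_url` and `fetch_soup` and the 10-second timeout are assumptions for illustration, not part of the original notebook; the query parameters are copied from the URL above.

```python
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

SEARCH_ENDPOINT = "https://www.ebay.com/sch/i.html"


def build_search_url(keyword: str) -> str:
    """Rebuild the search URL used above for an arbitrary keyword (hypothetical helper)."""
    params = {
        "_from": "R40",
        "_trksid": "p2380057.m570.l1313",
        "_nkw": keyword,  # spaces are encoded as "+"
        "_sacat": 0,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download a page and parse it, failing loudly on HTTP errors (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return BeautifulSoup(response.text, "lxml")
```

Calling `fetch_soup(build_search_url("kirby plush"))` should request the same URL as the cell above, while surfacing a blocked or failed request as an exception instead of silently parsing an error page.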

#### Narrowing down to collection of product cards

In [41]:
products = soup.find_all("li", attrs={'class':'s-item'})
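
To see what this filter matches, here is a small sketch on hand-written HTML. The markup and the extra class names (`extra-class`, `other-block`) are made up for illustration; only the `s-item` class comes from the notebook. It shows that the class filter also matches elements that carry additional classes, and that `class_=` and a CSS selector select the same cards.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page
sample_html = """
<ul>
  <li class="s-item extra-class">first card</li>
  <li class="s-item">second card</li>
  <li class="other-block">not a product card</li>
</ul>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Three equivalent ways to select every <li> whose class list contains "s-item"
cards_via_attrs = sample_soup.find_all("li", attrs={"class": "s-item"})
cards_via_class_kw = sample_soup.find_all("li", class_="s-item")
cards_via_select = sample_soup.select("li.s-item")

print([card.text for card in cards_via_attrs])
```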

#### Areas of interest:
* Product Name (span role="heading")
* Product Condition (span class="SECONDARY_INFO")
* Link to image (img class="s-item__image-img")
* Product Price (span class="s-item__price")
* Whether it's a new listing (span class="LIGHT_HIGHLIGHT")
* Whether it's a sponsored link (an interesting hiccup: the sponsored marker is present on sponsored AND non-sponsored listings; the only difference is whether it is visible)
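
The areas of interest above can be written down as a field-to-selector map and tried against a single card. This is only a sketch: the card markup and the `example.com` image URL are invented, and the CSS selectors are a translation of the tag/attribute pairs listed above rather than something the original notebook uses.

```python
from bs4 import BeautifulSoup

# CSS equivalents of the tag/attribute pairs listed above
FIELD_SELECTORS = {
    "Name": 'span[role="heading"]',
    "Condition": "span.SECONDARY_INFO",
    "Image_Link": "img.s-item__image-img",
    "Price": "span.s-item__price",
    "New_Listing": "span.LIGHT_HIGHLIGHT",
}

# Hypothetical product card without a "new listing" marker
sample_card = BeautifulSoup(
    """
    <li class="s-item">
      <img class="s-item__image-img" src="https://example.com/kirby.jpg">
      <span role="heading">Kirby Sitting Plush</span>
      <span class="SECONDARY_INFO">Pre-Owned</span>
      <span class="s-item__price">$10.00</span>
    </li>
    """,
    "html.parser",
)

# select_one returns the first match, or None when the element is absent
found = {field: sample_card.select_one(css) for field, css in FIELD_SELECTORS.items()}
missing = [field for field, tag in found.items() if tag is None]
print(missing)
```

Checking which fields come back as `None` on a few real cards is a quick way to confirm which elements are optional before writing the extraction loop.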


#### Creating DF with headers

In [42]:
df = pd.DataFrame(columns=["Name","Condition","Price","New_Listing","Image_Link"])
df = df.astype({"New_Listing": bool})
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Name         0 non-null      object
 1   Condition    0 non-null      object
 2   Price        0 non-null      object
 3   New_Listing  0 non-null      bool  
 4   Image_Link   0 non-null      object
dtypes: bool(1), object(4)
memory usage: 0.0+ bytes
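
As an alternative sketch, the column dtypes can be declared when the empty frame is created instead of being adjusted afterwards with `astype`. The `schema` dictionary and the name `empty_df` are assumptions introduced here for illustration.

```python
import pandas as pd

# Column name -> dtype, mirroring the df.info() output above
schema = {
    "Name": "object",
    "Condition": "object",
    "Price": "object",
    "New_Listing": "bool",
    "Image_Link": "object",
}

# One empty, typed Series per column yields an empty frame with the dtypes already set
empty_df = pd.DataFrame({col: pd.Series(dtype=dtype) for col, dtype in schema.items()})
print(empty_df.dtypes)
```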


#### Iterating through products, drilling to information, appending to df
* The name span may also contain a nested span with the text "NEW LISTING". To omit it, we select the entire heading span, then take its last child with `.contents[-1]`. If "NEW LISTING" is present, this skips it and grabs the product title; if it is not present, it still grabs the product title.
* Another issue: there is a card that follows the same format as the products but contains "Shop on eBay" and is hidden from view. To avoid adding it, a simple conditional checks whether the title contains "Kirby". This also helps keep unrelated sponsored content out of the DataFrame.
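
The `[-1]` trick described in the first bullet can be checked on two hand-written heading spans. The markup is an assumption about how the page nests the "NEW LISTING" span (placed before the title text, with the `LIGHT_HIGHLIGHT` class); the real page may differ.

```python
from bs4 import BeautifulSoup

# Heading with only the title text
plain = BeautifulSoup(
    '<span role="heading">Kirby Sitting Plush</span>', "html.parser"
).span

# Heading where a nested "New Listing" span precedes the title text (assumed layout)
flagged = BeautifulSoup(
    '<span role="heading"><span class="LIGHT_HIGHLIGHT">New Listing</span>'
    "Kirby Sitting Plush</span>",
    "html.parser",
).span

plain_title = str(plain.contents[-1])      # 'Kirby Sitting Plush'
flagged_title = str(flagged.contents[-1])  # 'Kirby Sitting Plush'
flagged_text = flagged.text                # 'New ListingKirby Sitting Plush'
print(plain_title, flagged_title, flagged_text, sep="\n")
```

Taking the last child gives the bare title in both cases, whereas `.text` on the flagged heading would glue the marker onto the title.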

In [43]:
for product in products:
    # The last child of the heading span is the title, even when a "NEW LISTING" span precedes it
    name = str(product.find("span", attrs={"role": "heading"}).contents[-1])
    condition = product.find("span", class_="SECONDARY_INFO").text
    price = product.find("span", class_="s-item__price").text
    image = product.find("img", class_="s-item__image-img")["src"]
    # The LIGHT_HIGHLIGHT span is only present on new listings
    new_listing = product.find("span", class_="LIGHT_HIGHLIGHT") is not None
    # Keep only Kirby listings; this skips the hidden "Shop on eBay" card and unrelated ads
    if "Kirby" in name:
        row = {"Name": name, "Condition": condition, "Price": price, "New_Listing": new_listing, "Image_Link": image}
        df = pd.concat([df, pd.DataFrame([row])])
df.head()

| | Name | Condition | Price | New_Listing | Image_Link |
|---|---|---|---|---|---|
| 0 | Sanei Kirby 5.5" Plush Stuffed Doll (KP01) - K... | Brand New | $13.94 | False | https://i.ebayimg.com/thumbs/images/g/ITEAAOSw... |
| 0 | Kirby Plush 14 Inch Very Soft Stuffed Animal K... | Brand New | $22.88 | False | https://i.ebayimg.com/thumbs/images/g/3HsAAOSw... |
| 0 | Kirby 5.5" Plush Toy Little Buddy Kirby Advent... | Brand New | $8.96 | False | https://i.ebayimg.com/thumbs/images/g/xToAAOSw... |
| 0 | Kirby Sitting Plush | Pre-Owned | $10.00 | True | https://i.ebayimg.com/thumbs/images/g/6FgAAOSw... |
| 0 | Kirby plush 10" King Dedede Plush Doll | Brand New | $18.88 | False | https://i.ebayimg.com/thumbs/images/g/2X8AAOSw... |
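
A more defensive variant of the loop, sketched on hand-written HTML: it checks the title first, tolerates cards with missing fields, and builds the DataFrame once from a list of rows. The sample markup, the `example.com` URL, and the names `text_or_none`, `rows`, and `sketch_df` are assumptions for illustration, not the notebook's own code.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical results: one hidden "Shop on eBay" card and one real listing
sample_html = """
<ul>
  <li class="s-item"><span role="heading">Shop on eBay</span></li>
  <li class="s-item">
    <img class="s-item__image-img" src="https://example.com/a.jpg">
    <span role="heading">Kirby Sitting Plush</span>
    <span class="SECONDARY_INFO">Pre-Owned</span>
    <span class="s-item__price">$10.00</span>
  </li>
</ul>
"""
cards = BeautifulSoup(sample_html, "html.parser").find_all("li", class_="s-item")


def text_or_none(card, *args, **kwargs):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = card.find(*args, **kwargs)
    return tag.get_text(strip=True) if tag is not None else None


rows = []
for card in cards:
    heading = card.find("span", attrs={"role": "heading"})
    name = str(heading.contents[-1]) if heading is not None and heading.contents else ""
    if "Kirby" not in name:
        continue  # skip the placeholder card before touching its other fields
    image = card.find("img", class_="s-item__image-img")
    rows.append({
        "Name": name,
        "Condition": text_or_none(card, "span", class_="SECONDARY_INFO"),
        "Price": text_or_none(card, "span", class_="s-item__price"),
        "New_Listing": card.find("span", class_="LIGHT_HIGHLIGHT") is not None,
        "Image_Link": image["src"] if image is not None else None,
    })

# Building the frame once gives a default 0..n-1 index
sketch_df = pd.DataFrame(rows)
print(sketch_df)
```

Filtering on the title before reading the other fields means a card without a price or condition cannot raise an `AttributeError`, and collecting rows in a list avoids re-copying the DataFrame on every iteration.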


#### A few cleanup steps - replacing the repeated zero index with a fresh 0..n-1 index using reset_index(drop=True), which discards the old index instead of keeping it as an "index" column

In [44]:
df.reset_index(drop=True, inplace=True)
df.head(10)

| | Name | Condition | Price | New_Listing | Image_Link |
|---|---|---|---|---|---|
| 0 | Sanei Kirby 5.5" Plush Stuffed Doll (KP01) - K... | Brand New | $13.94 | False | https://i.ebayimg.com/thumbs/images/g/ITEAAOSw... |
| 1 | Kirby Plush 14 Inch Very Soft Stuffed Animal K... | Brand New | $22.88 | False | https://i.ebayimg.com/thumbs/images/g/3HsAAOSw... |
| 2 | Kirby 5.5" Plush Toy Little Buddy Kirby Advent... | Brand New | $8.96 | False | https://i.ebayimg.com/thumbs/images/g/xToAAOSw... |
| 3 | Kirby Sitting Plush | Pre-Owned | $10.00 | True | https://i.ebayimg.com/thumbs/images/g/6FgAAOSw... |
| 4 | Kirby plush 10" King Dedede Plush Doll | Brand New | $18.88 | False | https://i.ebayimg.com/thumbs/images/g/2X8AAOSw... |
| 5 | 6" Kirby Super Star Plush Toys Cute Kirby Soft... | Brand New | $9.59 to $9.89 | False | https://i.ebayimg.com/thumbs/images/g/~skAAOSw... |
| 6 | Kirby chef kawasaki plush doll anime video gam... | Pre-Owned | $8.38 | False | https://i.ebayimg.com/thumbs/images/g/iNUAAOSw... |
| 7 | JUMBO Kirby Adventure Run Kirby Plush Toy Supe... | Brand New | $28.99 | False | https://i.ebayimg.com/thumbs/images/g/A5wAAOSw... |
| 8 | NEW Kirby Galacta Knight Plush Doll Kirby Star... | Brand New | $18.15 | False | https://i.ebayimg.com/thumbs/images/g/kqYAAOSw... |
| 9 | Kirby Plush 8" with Sword. NEW with TAGS!!!!  ... | Brand New | $25.00 | False | https://i.ebayimg.com/thumbs/images/g/z-AAAOSw... |
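
The repeated zero index comes from concatenating one-row frames that each carry their own index of 0. A short sketch with toy data (the names `parts`, `repeated`, `renumbered`, and `reset` are illustrative) shows the effect and two ways to get a clean 0..n-1 index.

```python
import pandas as pd

# Three one-row frames, each with its own index of 0
parts = [pd.DataFrame([{"Name": n}]) for n in ("a", "b", "c")]

repeated = pd.concat(parts)                        # index is 0, 0, 0
renumbered = pd.concat(parts, ignore_index=True)   # index is 0, 1, 2
reset = repeated.reset_index(drop=True)            # index is 0, 1, 2, no "index" column

print(list(repeated.index), list(renumbered.index), list(reset.index))
```

Passing `ignore_index=True` at concatenation time would make the later cleanup unnecessary; `reset_index(drop=True)` fixes the index after the fact without leaving an extra column behind.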


#### Outputting as CSV file

In [45]:
df.to_csv("kirby_plush_prices.csv", index=False)
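
To illustrate what `index=False` changes, the sketch below writes a one-row sample frame to an in-memory buffer instead of a file and reads it back. The sample row reuses values from the output above, except for the image URL, which is a placeholder; the variable names are assumptions.

```python
import io

import pandas as pd

sample = pd.DataFrame([
    {"Name": "Kirby Sitting Plush", "Condition": "Pre-Owned", "Price": "$10.00",
     "New_Listing": True, "Image_Link": "https://example.com/a.jpg"},
])

# to_csv() without a path returns the CSV text, so no file is created here
with_index = pd.read_csv(io.StringIO(sample.to_csv()))
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))     # the index comes back as an "Unnamed: 0" column
print(list(without_index.columns))  # only the five data columns
```

Writing with `index=False` keeps the file limited to the data columns, so reading it back does not introduce an extra unnamed column.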