## Data Collection

Here we extract feedback from an eBay profile using the `requests` library.

In [1]:
import requests
import pandas as pd
import json

In [2]:
# ORIGINAL URL:https://www.ebay.ca/fdbk/feedback_profile/chord-91?q=125071219021&_trksid=p2047675.l2560

Let's use four URLs, as there are only 25 reviews per page.

In [3]:
# The four page URLs differ only in page_id, so build them from one template.
base_url = 'https://www.ebay.ca/fdbk/update_feedback_profile?url=username%3Dchord-91%26filter%3Dfeedback_page%253AAll%252Cperiod%253ATWELVE_MONTHS%252Coverall_rating%253AAll%26page_id%3D{page}%26limit%3D200&module=%3Fmodules%3DFEEDBACK_SUMMARY'

urls = [base_url.format(page=page) for page in range(1, 5)]

cont = []

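for url in urls:
    r = requests.get(url)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    c = json.loads(r.content)  # parse the JSON response body
    cont.append(c)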
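
The request URLs above carry their real parameters inside a percent-encoded `url` query parameter, which makes them hard to read. As a rough sketch (using only the standard library, and assuming the first page URL as the input), the nesting can be unpacked like this to see where `page_id` and `limit` live:

```python
from urllib.parse import urlsplit, parse_qs

# First page URL, copied from the notebook cell above.
page_url = (
    "https://www.ebay.ca/fdbk/update_feedback_profile?"
    "url=username%3Dchord-91%26filter%3Dfeedback_page%253AAll%252C"
    "period%253ATWELVE_MONTHS%252Coverall_rating%253AAll"
    "%26page_id%3D1%26limit%3D200&module=%3Fmodules%3DFEEDBACK_SUMMARY"
)

# Outer layer: two parameters, `url` and `module`.
outer = parse_qs(urlsplit(page_url).query)

# Inner layer: the `url` parameter is itself an encoded query string.
inner = parse_qs(outer["url"][0])

for key, values in inner.items():
    print(key, "=", values[0])
```

Only `page_id` changes between the four requests; the other inner parameters stay the same.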


In [4]:
# Collect the comment text of every feedback card on every page.
comments = []

for page in cont:
    feedbackCards = page['modules']['FEEDBACK_SUMMARY']['feedbackView']['feedbackCards']
    for card in feedbackCards:
        comments.append(card['feedbackInfo']['comment']['accessibilityText'])

df = pd.DataFrame({'Feedbacks': comments})
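
The cell above indexes straight into the nested JSON, so a single card without a comment would raise a `KeyError`. The following is a minimal sketch of a more defensive variant. The `sample_page` payload and the `extract_comments` helper are hypothetical: the payload contains only the keys the notebook reads, not the full eBay response.

```python
# Hypothetical, trimmed-down payload: only the keys the notebook reads.
sample_page = {
    "modules": {"FEEDBACK_SUMMARY": {"feedbackView": {"feedbackCards": [
        {"feedbackInfo": {"comment": {"accessibilityText": "AAA+++"}}},
        {"feedbackInfo": {"comment": {"accessibilityText": "Great service"}}},
        {"feedbackInfo": {}},  # a card without a comment, to exercise the fallback
    ]}}}
}

def extract_comments(page):
    """Return the comment text of each feedback card, skipping cards without one."""
    cards = (
        page.get("modules", {})
        .get("FEEDBACK_SUMMARY", {})
        .get("feedbackView", {})
        .get("feedbackCards", [])
    )
    comments = []
    for card in cards:
        text = card.get("feedbackInfo", {}).get("comment", {}).get("accessibilityText")
        if text is not None:
            comments.append(text)
    return comments

comments = extract_comments(sample_page)
print(comments)
```

Whether to skip such cards or fail loudly is a design choice; skipping keeps the scrape running, but it can hide a change in the response format.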

## DataFrame

Once collected, the data can be viewed as a DataFrame. Duplicate feedback is dropped first.

In [5]:
df = df.drop_duplicates()
df

|     | Feedbacks |
| --- | --- |
| 0 | AAA+++ |
| 1 | Okay thanks A1 super fast |
| 3 | Great service |
| 4 | as described, shipped fast . thanks |
| 5 | will add to my collectiobn..thanks |
| ... | ... |
| 622 | very nice metal shapes and came well packed. T... |
| 623 | Fast shipper, would purchase from again |
| 624 | h/recomend |
| 625 | Excellent |
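
The index in the output above jumps from 1 to 3 because `drop_duplicates()` removes repeated rows but keeps the original index labels of the rows that remain. A small sketch with made-up rows (not the scraped data) shows the effect, along with `reset_index(drop=True)` as one way to renumber if a contiguous index is wanted:

```python
import pandas as pd

# Made-up rows for illustration; row 2 repeats row 0.
demo = pd.DataFrame({"Feedbacks": ["AAA+++", "Okay thanks", "AAA+++", "Great service"]})

deduped = demo.drop_duplicates()               # keeps the first occurrence of each row
renumbered = deduped.reset_index(drop=True)    # optional: contiguous 0..n-1 index

print(list(deduped.index))
print(list(renumbered.index))
```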


## Data Cleaning

In [6]:
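# Keep only the feedback text, as a Series of strings.
df = df['Feedbacks'].astype(str)
# Preview the lowercased text (the result is displayed, not stored).
df.str.lower()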

0                                                 aaa+++
1                            okay thanks  a1 super fast 
3                                          great service
4                    as described, shipped fast . thanks
5                     will add to my collectiobn..thanks
                             ...                        
622    very nice metal shapes and came well packed. t...
623              fast shipper, would purchase from again
624                                           h/recomend
625                                           excellent 
626      not the best quality but transaction was smooth
Name: Feedbacks, Length: 542, dtype: object

In [7]:
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

stop = stopwords.words('english')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\mevaa\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [8]:
# Lowercase each comment, split on whitespace, and drop English stopwords.
df2 = df.str.lower().apply(lambda x: ' '.join(w for w in x.split() if w not in stop))
df2

0                                          aaa+++
1                       okay thanks a1 super fast
3                                   great service
4                described, shipped fast . thanks
5                         add collectiobn..thanks
                          ...                    
622    nice metal shapes came well packed. thanks
623                  fast shipper, would purchase
624                                    h/recomend
625                                     excellent
626               best quality transaction smooth
Name: Feedbacks, Length: 542, dtype: object
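
The filtering step can be tried in isolation without downloading the NLTK corpus. This is a sketch only: `STOP` is a small hand-picked subset chosen for illustration (the notebook uses NLTK's full English list), and `remove_stopwords` is a hypothetical helper that mirrors the lambda above. A set is used because membership tests on a set are faster than on a list.

```python
# Illustrative subset only; the notebook uses NLTK's full English stopword list.
STOP = {"as", "not", "the", "but", "was", "will", "to", "my", "very", "and"}

def remove_stopwords(text, stop=STOP):
    """Lowercase, split on whitespace, and drop tokens found in the stopword set."""
    return " ".join(w for w in text.lower().split() if w not in stop)

first = remove_stopwords("Not the best quality but transaction was smooth")
second = remove_stopwords("as described, shipped fast . thanks")
print(first)
print(second)
```

Because tokens come from a plain whitespace split, punctuation stays attached to words (`described,`) and a lone `.` survives as its own token, which matches what the notebook output shows.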

We can now export the cleaned data to a CSV file.

In [9]:
df2.to_csv('eBay_Feedbacks.csv')
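
Note that `to_csv` also writes the index as an unnamed first column. As a sketch of the round trip (using an in-memory buffer and made-up rows instead of the real file), reading the CSV back without `index_col=0` produces an extra `Unnamed: 0` column, while passing it restores the original index:

```python
import io
import pandas as pd

# Made-up Series with a gapped index, like the deduplicated feedback.
s = pd.Series(["aaa+++", "okay thanks", "great service"], index=[0, 1, 3], name="Feedbacks")

buf = io.StringIO()
s.to_csv(buf)            # writes the index as an unnamed first column

buf.seek(0)
naive = pd.read_csv(buf)                   # index comes back as a data column

buf.seek(0)
restored = pd.read_csv(buf, index_col=0)   # first column used as the index again

print(list(naive.columns))
print(list(restored.columns))
```

If the index is not needed downstream, `to_csv(..., index=False)` avoids writing it at all.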