In [48]:
import requests
import json

import pandas

# json_normalize has been exposed at the top level of pandas since version 1.0
from pandas import json_normalize

First, a function that returns the tsId for a given shop URL by calling the trustedshops.com shops API with requests.get(). I lifted the request URL from the API documentation: https://api.trustedshops.com/documentation/public/#!/shops/getShopByURL. It took me a while to work out that appending <em>.json</em> to the end of the URL path returns the response in JSON format.

I also added some error handling in case the API call to fetch the tsId fails for any reason.

In [55]:
def get_tsid(shop_url):
    # Look up the shop by URL; params= lets requests URL-encode the query string.
    response = requests.get("https://api.trustedshops.com/rest/public/v2/shops.json", params={"url": shop_url})
    if response.status_code == 200:
        # Return the tsId of the first matching shop.
        return response.json()["response"]["data"]["shops"][0]["tsId"]
    else:
        print(f"Failed to fetch tsId, Error code {response.status_code}")
        return None

get_tsid("www.zalando.de")

'X1C77CF6EE730D2E88A284D7203D1B20F'

Error handling in the case of a bad URL:

In [56]:
get_tsid("www.zdalando.de")

Failed to fetch tsId, Error code 404
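
The status-code check in get_tsid covers HTTP failures, but a 200 response with an empty shops list would still raise an IndexError on the chained indexing. Here is a minimal sketch of a more defensive lookup, assuming the payload keeps the response, data, shops nesting used in get_tsid. The helper name extract_tsid and the sample payloads are hypothetical and not part of the API.

```python
def extract_tsid(payload):
    """Return the first shop's tsId from a decoded API payload, or None.

    Assumes the response -> data -> shops nesting used in get_tsid;
    a missing key or an empty shops list yields None instead of raising.
    """
    shops = payload.get("response", {}).get("data", {}).get("shops") or []
    if not shops:
        return None
    return shops[0].get("tsId")


# Hypothetical payloads, shaped like the response parsed in get_tsid.
found = {"response": {"data": {"shops": [{"tsId": "X123"}]}}}
empty = {"response": {"data": {"shops": []}}}

print(extract_tsid(found))
print(extract_tsid(empty))
print(extract_tsid({}))
```

Inside get_tsid this would replace the chained indexing with extract_tsid(response.json()). It may also be worth passing timeout= to requests.get, since requests does not time out by default.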


Next, a new function, <em>get_reviews</em>, that returns the reviews from a call to the trustedshops reviews API, using our <em>get_tsid</em> function to retrieve the tsId for a given shop URL.

In [51]:
def get_reviews(shop_url):
    tsid = get_tsid(shop_url)
    # If the tsId lookup failed, fall through and return None.
    if tsid:
        response = requests.get(f"https://api.trustedshops.com/rest/public/v2/shops/{tsid}/reviews.json")
        if response.status_code == 200:
            return response.json()["response"]["data"]["shop"]["reviews"]
        else:
            print(f"Failed to fetch reviews, Error code {response.status_code}")
            return None

get_reviews("www.zalando.de")

[{'changeDate': '2018-08-22T00:36:44+02:00',
  'comment': 'bin sehr zufrieden',
  'confirmationDate': '2018-08-22T00:36:44+02:00',
  'creationDate': '2018-08-22T00:36:44+02:00',
  'criteria': [{'mark': '5',
    'markDescription': 'EXCELLENT',
    'markDescriptionGUILanguage': 'EXCELLENT',
    'type': 'DELIVERY',
    'typeGUILanguage': 'DELIVERY'},
   {'mark': '5',
    'markDescription': 'EXCELLENT',
    'markDescriptionGUILanguage': 'EXCELLENT',
    'type': 'GOODS',
    'typeGUILanguage': 'GOODS'},
   {'mark': '5',
    'markDescription': 'EXCELLENT',
    'markDescriptionGUILanguage': 'EXCELLENT',
    'type': 'SERVICE',
    'typeGUILanguage': 'CUSTOMER SERVICE'}],
  'mark': '5.00',
  'markDescription': 'EXCELLENT',
  'markDescriptionGUILanguage': 'EXCELLENT',
  'UID': '73627b2c28ff91de1fc02d615e7c6ca3'},
 {'changeDate': '2018-08-22T00:23:53+02:00',
  'comment': 'super',
  'confirmationDate': '2018-08-22T00:23:54+02:00',
  'creationDate': '2018-08-22T00:23:54+02:00',
  'criteria': [{'mar

I chose to use pandas to handle the API response as it seemed to be the most straightforward method. I'd be keen to get your feedback on this decision. We pass the result of our API call to <em>json_normalize</em> to flatten the JSON into a pandas dataframe. As an example, I also flattened the nested marking criteria into a separate dataframe, whose values we will later add as new columns to the <em>reviews</em> dataframe. The UID is included as metadata so that we can merge the two dataframes on this value.

In [52]:
r = get_reviews('www.zalando.de')

reviews = json_normalize(r)
review_criteria = json_normalize(r, record_path='criteria', meta='UID')

review_criteria

,mark,markDescription,markDescriptionGUILanguage,type,typeGUILanguage,UID
0,5,EXCELLENT,EXCELLENT,DELIVERY,DELIVERY,73627b2c28ff91de1fc02d615e7c6ca3
1,5,EXCELLENT,EXCELLENT,GOODS,GOODS,73627b2c28ff91de1fc02d615e7c6ca3
2,5,EXCELLENT,EXCELLENT,SERVICE,CUSTOMER SERVICE,73627b2c28ff91de1fc02d615e7c6ca3
3,5,EXCELLENT,EXCELLENT,DELIVERY,DELIVERY,bd7ec905435abb2f3e3835ede3be6d61
4,5,EXCELLENT,EXCELLENT,GOODS,GOODS,bd7ec905435abb2f3e3835ede3be6d61
5,5,EXCELLENT,EXCELLENT,SERVICE,CUSTOMER SERVICE,bd7ec905435abb2f3e3835ede3be6d61
6,5,EXCELLENT,EXCELLENT,DELIVERY,DELIVERY,621f5019061dc388f3cc3d0ab1fcecd5
7,5,EXCELLENT,EXCELLENT,GOODS,GOODS,621f5019061dc388f3cc3d0ab1fcecd5
8,5,EXCELLENT,EXCELLENT,SERVICE,CUSTOMER SERVICE,621f5019061dc388f3cc3d0ab1fcecd5
9,5,EXCELLENT,EXCELLENT,DELIVERY,DELIVERY,4553049da93d938557e301160e8ff444
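
To make the effect of record_path and meta easier to see, here is a small sketch on a hand-written payload. The two sample reviews are hypothetical and only mimic the nesting of the API response. Each entry in a review's criteria list becomes its own row, and the parent's UID is repeated on every row.

```python
import pandas as pd

# Hypothetical two-review payload with the same nesting as the API response.
sample = [
    {"UID": "a1", "mark": "5.00",
     "criteria": [{"type": "GOODS", "mark": "5"},
                  {"type": "DELIVERY", "mark": "4"}]},
    {"UID": "b2", "mark": "3.00",
     "criteria": [{"type": "GOODS", "mark": "3"}]},
]

# One row per entry in each review's 'criteria' list; 'UID' is copied
# from the parent review onto every child row.
criteria = pd.json_normalize(sample, record_path="criteria", meta="UID")

print(criteria.shape)
print(list(criteria["UID"]))
```

Only UID is passed as meta here. The parent-level mark is deliberately left out, because it shares its name with the mark field inside each criterion.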


To merge this into the parent dataframe, we define a function <em>get_criteria_with_type</em> that returns a filtered dataframe containing only the rows for one marking criterion, with the columns renamed accordingly. Then, looping through the three marking criteria (<em>GOODS, DELIVERY, SERVICE</em>), we merge these one by one into the parent dataframe with an inner join on UID. Because the join is inner, a review missing any of the three criteria would be dropped. This gives the desired flattened dataframe, from which we then select the columns of interest.

In [53]:
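def get_criteria_with_type(df, criteria_type):
    # Keep only the rows for this criterion, then prefix its columns with the criterion name.
    subset = df.loc[df['type'] == criteria_type, ['mark', 'markDescription', 'UID']]
    return subset.rename(columns={
        'mark': f'{criteria_type}_Mark',
        'markDescription': f'{criteria_type}_MarkDescription',
    })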

criteria_types = ['GOODS', 'DELIVERY', 'SERVICE']

for criteria_type in criteria_types:
    df = get_criteria_with_type(review_criteria, criteria_type)
    reviews = reviews.merge(df, how='inner', on='UID')
    
reviews_view = reviews[[
    'creationDate', 'comment', 'mark', 'markDescription',
    'GOODS_Mark', 'GOODS_MarkDescription',
    'DELIVERY_Mark', 'DELIVERY_MarkDescription',
    'SERVICE_Mark', 'SERVICE_MarkDescription',
]]
reviews_view

,creationDate,comment,mark,markDescription,GOODS_Mark,GOODS_MarkDescription,DELIVERY_Mark,DELIVERY_MarkDescription,SERVICE_Mark,SERVICE_MarkDescription
0,2018-08-22T00:36:44+02:00,bin sehr zufrieden,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
1,2018-08-22T00:23:54+02:00,super,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
2,2018-08-22T00:19:09+02:00,Top Leistung super schnell,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
3,2018-08-22T00:18:36+02:00,Alles bestens. Die Größen waren nicht immer ga...,4.67,EXCELLENT,4,GOOD,5,EXCELLENT,5,EXCELLENT
4,2018-08-22T00:05:02+02:00,Lieferung sehr schnell. Ware ist ok. Nicht gep...,4.33,GOOD,3,FAIR,5,EXCELLENT,5,EXCELLENT
5,2018-08-21T23:55:51+02:00,Schnelle Lieferung und die Ware so wie dargest...,4.0,GOOD,4,GOOD,4,GOOD,4,GOOD
6,2018-08-21T23:44:04+02:00,Top,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
7,2018-08-21T23:43:03+02:00,Alles perfekt.,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
8,2018-08-21T23:39:27+02:00,Supi,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
9,2018-08-21T23:38:27+02:00,Top wie immer,5.0,EXCELLENT,5,EXCELLENT,5,EXCELLENT,5,EXCELLENT
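
As a possible alternative to the filter-and-merge loop, the long criteria table can be reshaped in one step with DataFrame.pivot. This is a different technique from the one used above, sketched on a hypothetical stand-in for review_criteria. Note that the resulting column names use lower-case suffixes (GOODS_mark rather than GOODS_Mark).

```python
import pandas as pd

# Hypothetical long-format criteria table, shaped like review_criteria.
review_criteria = pd.DataFrame({
    "UID": ["a1", "a1", "a1", "b2", "b2", "b2"],
    "type": ["DELIVERY", "GOODS", "SERVICE"] * 2,
    "mark": ["5", "4", "5", "3", "3", "4"],
    "markDescription": ["EXCELLENT", "GOOD", "EXCELLENT", "FAIR", "FAIR", "GOOD"],
})

# Reshape to one row per UID and one column per (field, criterion type).
wide = review_criteria.pivot(index="UID", columns="type",
                             values=["mark", "markDescription"])

# Flatten the two-level column index into names such as 'GOODS_mark'.
wide.columns = [f"{ctype}_{field}" for field, ctype in wide.columns]
wide = wide.reset_index()

print(sorted(wide.columns))
print(wide.loc[wide["UID"] == "a1", "GOODS_mark"].item())
```

The wide table could then be attached with a single reviews.merge(wide, on="UID", how="left"). A left join keeps reviews that lack one of the criteria, whereas the inner join would drop them. pivot does require each (UID, type) pair to be unique.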


Finally, export the result to CSV.

In [54]:
reviews_view.to_csv('/Users/theoevans/Documents/reviews.csv')
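
By default, to_csv writes the row index as an extra leading column, and the absolute path above only exists on my machine. Here is a hedged sketch of a more portable export, using a hypothetical stand-in dataframe and a temporary directory chosen purely for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for reviews_view.
reviews_view = pd.DataFrame({"comment": ["super", "Top"], "mark": ["5.00", "5.00"]})

# Write into a temporary directory so the sketch runs anywhere.
out_path = Path(tempfile.mkdtemp()) / "reviews.csv"

# index=False omits the row index, so no extra leading column appears on reload.
reviews_view.to_csv(out_path, index=False, encoding="utf-8")

reloaded = pd.read_csv(out_path)
print(list(reloaded.columns))
```

Whether to keep the index depends on how the file is used downstream. For a plain export of review rows, it carries no information.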