# On wine drinking preferences

In this article we take a look at current wine drinking preferences. **MORE**

The data journalism techniques we will cover in this notebook include:

- How to retrieve data from a web API, the [Wine.com Developer API](https://api.wine.com/) in our case.
- How to work with JSON formatted data, including:
 - How to get it from an HTTP request result into Python data structures.
 - How to write into a text file.
 - How to read it back.
 - How to put it into a Pandas data frame.
- How to create an interactive map using our data that we can visualise in a modern web browser using [Bokeh](http://bokeh.pydata.org/en/latest/).

Hopefully you will find this article interesting but, above all, you will learn techniques that you can apply to your own data journalism projects.

## Getting Wine.com API data

In this notebook we will use the [Wine.com Developer API](https://api.wine.com/) to get a catalog of products that we can later use for different analyses. We will use the Python library [Requests](http://www.python-requests.org/en/latest/) to retrieve the data in JSON format. Then we will store that data in a file for later use.

### Loading API key

First of all you need to sign up for a Wine.com developer account. Once you are registered, go to your Dashboard and copy your API key into a file called `apikey` that we can read using the following Python code.

In [1]:
# Read the key from the first line of the file and drop surrounding whitespace.
# The `with` block closes the file for us.
with open('apikey', 'r') as apikey_f:
    apikey = apikey_f.readline().strip()

In [2]:
print(apikey)

45e24313426662fc6b8ab832d9140a16


### Making API requests

The goal of the Wine.com Developer API is to give developers access to Wine.com's extensive catalog of wine and wine-related content in an open and easy-to-use manner. The API is built using [REST principles](https://en.wikipedia.org/wiki/Representational_state_transfer). You can retrieve content in either XML or JSON format. The best way to start is by having a look at [their documentation](https://api.wine.com/wiki) to read how the API works and the conditions of use.

From there we can learn that the base URL for any catalog query is as follows.

In [3]:
base_catalog_url = "http://services.wine.com/api/beta2/service.svc/json/catalog"

This base URL will be followed by a series of parameters and our API key in order to perform an actual query.
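As a rough illustration of what "followed by a series of parameters" means, the sketch below builds such a URL by hand with the standard library. The helper name `build_catalog_url` and the placeholder key `YOUR_API_KEY` are assumptions made for this example only. The parameter names mirror the ones used in the queries later in this notebook, where Requests does this encoding for us.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_catalog_url(base_url, params):
    """Return base_url followed by params encoded as a query string."""
    return "{}?{}".format(base_url, urlencode(params))


# Placeholder values: replace YOUR_API_KEY with the key read from the file.
example_params = {
    "filter": "categories(490)",
    "apikey": "YOUR_API_KEY",
    "size": 0,
    "offset": 0,
}
example_url = build_catalog_url(
    "http://services.wine.com/api/beta2/service.svc/json/catalog",
    example_params,
)
print(example_url)
```

Note that characters such as the parentheses in the filter value are percent-encoded in the resulting URL.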

One of the best ways to query a web API is to use the Python library [Requests](http://www.python-requests.org/en/latest/). In the words of its developers *"Python’s standard [urllib2](https://docs.python.org/2/library/urllib2.html) module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks"*. Let's start by importing the library (it might need [installation](http://docs.python-requests.org/en/latest/user/install/)).

In [4]:
import requests

#### Getting the total number of wines in the catalog

The goal of our first query is to find out how many products the catalog has in total. Since we are using Python [Requests](http://www.python-requests.org/en/latest/), the best way to prepare queries is by using the base URL with a Python dictionary of parameters. For example, the following dictionary asks for zero products, but the API will still give us the total number of products as part of the response.

In [5]:
zero_query_params = {
    'filter': 'categories(490)',
    'apikey': apikey,
    'size': 0,
    'offset': 0
}

Using Requests to [pass request parameters](http://www.python-requests.org/en/latest/user/quickstart/#passing-parameters-in-urls) is super easy. Just call `requests.get`, passing the base URL and the previous dictionary. We then call `json()` on the response to parse the JSON body into a Python dictionary.

In [6]:
zero_request_json = requests.get(base_catalog_url, params=zero_query_params).json()

In [7]:
zero_request_json

{u'Products': {u'List': [], u'Offset': 0, u'Total': 85142, u'Url': u''},
 u'Status': {u'Messages': [], u'ReturnCode': 0}}

There we have an empty list of products and the total we are looking for.

In [8]:
total_wines = zero_request_json['Products']['Total']
total_wines

85142
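Before trusting the total, it may be worth checking the `Status` block of the response. The following is a minimal sketch, assuming that a `ReturnCode` of 0 means success (as in the response shown above; the meaning of other codes is not covered here). The function name `extract_total` is hypothetical, and the sample dictionary simply copies the response printed earlier.

```python
def extract_total(response_json):
    """Return the catalog total, or raise if the API reported a problem."""
    status = response_json.get("Status", {})
    # Assumption: ReturnCode 0 signals success, as in the response shown above.
    if status.get("ReturnCode") != 0:
        raise ValueError("API error: {}".format(status.get("Messages")))
    return response_json["Products"]["Total"]


# Copy of the response printed earlier in this notebook.
sample_response = {
    "Products": {"List": [], "Offset": 0, "Total": 85142, "Url": ""},
    "Status": {"Messages": [], "ReturnCode": 0},
}
print(extract_total(sample_response))
```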

#### Getting the actual products

We can now proceed to get actual products from the catalog. With a [Wine.com Developer account](https://api.wine.com/) we are limited to 1000 hits per day, so we have to fetch the whole list of products in at most 1000 requests. With more than 85K products in total, that means at least 86 products per request if we want to get all of them. Let's define a page size of 500 so that we spend just 171 of our requests. Let's also wait 10 seconds between requests so we don't overload the server.

In [9]:
# Don't make this too small. Be respectful!
inter_request_lapse = 10

# Total products to be requested
max_wines = total_wines # If you don't want all wines, use something smaller like 5000

# Max. products by request
page_size = 500
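The figures quoted above can be checked with a few lines of arithmetic. This sketch hard-codes the total reported by the API and the daily limit mentioned in the text. The names `daily_hit_limit`, `min_page_size`, `requests_needed` and `waiting_minutes` are introduced here for illustration only.

```python
import math

total_wines = 85142        # total reported by the API
daily_hit_limit = 1000     # limit quoted for a developer account
page_size = 500
inter_request_lapse = 10   # seconds to wait after each request

# Smallest page size that fits the whole catalog into the daily limit
min_page_size = int(math.ceil(total_wines / float(daily_hit_limit)))

# Number of requests needed with the chosen page size
requests_needed = int(math.ceil(total_wines / float(page_size)))

# Time spent just waiting between requests, in minutes
waiting_minutes = requests_needed * inter_request_lapse / 60.0

print(min_page_size)
print(requests_needed)
print(waiting_minutes)
```

So with these settings the download takes roughly half an hour of waiting time, plus whatever the requests themselves take.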

We are now ready to get our products using the Wine.com API as follows.

In [10]:
import time

offset = 0
wines_json = []

while (offset < max_wines):
    
    catalog_query_params = {
        'filter': 'categories(490)',
        'apikey': apikey,
        'size': page_size,
        'offset': offset
    }
    catalog_request_json = requests.get(base_catalog_url, params=catalog_query_params).json()
    wines_json.extend(catalog_request_json['Products']['List'])
    print("Read {} wines from Wine.com so far".format(len(wines_json)))
    offset = offset + page_size
    time.sleep(inter_request_lapse)

Read 500 wines from Wine.com so far
Read 1000 wines from Wine.com so far
Read 1500 wines from Wine.com so far
Read 2000 wines from Wine.com so far
Read 2500 wines from Wine.com so far
Read 3000 wines from Wine.com so far
Read 3500 wines from Wine.com so far
Read 4000 wines from Wine.com so far
Read 4500 wines from Wine.com so far
Read 5000 wines from Wine.com so far
Read 5500 wines from Wine.com so far
Read 6000 wines from Wine.com so far
Read 6500 wines from Wine.com so far
Read 7000 wines from Wine.com so far
Read 7500 wines from Wine.com so far
Read 8000 wines from Wine.com so far
Read 8500 wines from Wine.com so far
Read 9000 wines from Wine.com so far
Read 9500 wines from Wine.com so far
Read 10000 wines from Wine.com so far
Read 10500 wines from Wine.com so far
Read 11000 wines from Wine.com so far
Read 11500 wines from Wine.com so far
Read 12000 wines from Wine.com so far
Read 12500 wines from Wine.com so far
Read 13000 wines from Wine.com so far
Read 13500 wines from Wine.com so far
...

We ended up with a list of products, as they were given by the Wine.com Developer API. Let's check how many of them we have.

In [11]:
len(wines_json)

85142
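The paging logic used above can also be written as a small reusable helper. The sketch below is one possible shape, not the only way to do it: `fetch_all`, `fetch_page` and the fake catalog are hypothetical names introduced here, and the fetch function is passed in as an argument so that the loop can be tried without spending any API hits.

```python
import time


def fetch_all(fetch_page, total, page_size, pause=0.0):
    """Collect up to `total` items by calling fetch_page(offset, size) repeatedly."""
    items = []
    for offset in range(0, total, page_size):
        page = fetch_page(offset, page_size)
        if not page:
            # The source returned nothing: stop instead of requesting further pages
            break
        items.extend(page)
        if offset + page_size < total:
            # Only wait when another request is going to follow
            time.sleep(pause)
    return items[:total]


# Stand-in for the real API call, used only to exercise the loop.
fake_catalog = list(range(1234))


def fake_fetch(offset, size):
    return fake_catalog[offset:offset + size]


wines = fetch_all(fake_fetch, total=len(fake_catalog), page_size=500)
print(len(wines))
```

With the real API, `fetch_page` would wrap the `requests.get` call and return the `['Products']['List']` part of the response, and `pause` would be set to `inter_request_lapse`.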

### Writing JSON data into a file

One thing we want to do is to store the list of products in a text file so we can process the data without querying the Wine.com Developer API over and over again. We do this in Python as follows.

In [12]:
import json
with open("data_{}.json".format(max_wines), 'w') as outfile:
    json.dump(wines_json, outfile)

Let's read it back to check that the file was written correctly, and so that we know how to load it later when needed.

In [13]:
with open("data_{}.json".format(max_wines),'r') as inputfile:
    new_data = json.load(inputfile)

In [14]:
len(new_data)

85142
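The same write-then-read round trip can be tried on a tiny sample without touching the real data file. In this sketch the two product records and their field names (`Id`, `Name`) are made up for illustration, and the file is written to a temporary directory rather than next to the notebook.

```python
import json
import os
import tempfile

# Made-up records standing in for the list returned by the API.
sample_wines = [
    {"Id": 1, "Name": "Sample Red"},
    {"Id": 2, "Name": "Sample White"},
]

# Same naming scheme as above, but inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "data_{}.json".format(len(sample_wines)))

with open(path, "w") as outfile:
    json.dump(sample_wines, outfile)

with open(path, "r") as inputfile:
    restored = json.load(inputfile)

# The data read back should be equal to what was written.
print(restored == sample_wines)
```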

## Loading wine data into a Pandas data frame

We will need the following: 

- A data frame of appellations
- A data frame of wines
- A data frame of ratings
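As a starting point, the sketch below shows one way a list of product dictionaries could be split into a wines data frame and an appellations data frame. It is only an illustration under stated assumptions: the records and their field names (`Id`, `Name`, `Appellation`) are hypothetical and would need to be adapted to what the API actually returns. A ratings data frame could be built following the same pattern.

```python
import pandas as pd

# Hypothetical product records; real field names come from the API response.
sample_wines = [
    {"Id": 1, "Name": "Sample Red", "Appellation": {"Id": 10, "Name": "Region A"}},
    {"Id": 2, "Name": "Sample White", "Appellation": {"Id": 11, "Name": "Region B"}},
    {"Id": 3, "Name": "Sample Rose", "Appellation": {"Id": 10, "Name": "Region A"}},
]

# One row per wine, keeping only a reference to its appellation.
wines_df = pd.DataFrame(
    [
        {
            "wine_id": w["Id"],
            "name": w["Name"],
            "appellation_id": w["Appellation"]["Id"],
        }
        for w in sample_wines
    ],
    columns=["wine_id", "name", "appellation_id"],
)

# One row per distinct appellation.
appellations_df = (
    pd.DataFrame([w["Appellation"] for w in sample_wines], columns=["Id", "Name"])
    .drop_duplicates()
    .reset_index(drop=True)
)

print(wines_df)
print(appellations_df)
```

Keeping appellations in their own data frame avoids repeating the same appellation details on every wine row, and the two can be joined back on the appellation id when needed.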

## Visualising wine preferences

Some charts visualising wine preferences. Candidates:
- A map
- Distribution of appellations
- Distribution of ratings

## Conclusions