# JSON

## Wrangling JSON data with Python

One way of creating a JSON object in Python is to write a JSON object in a string:

In [11]:
json_string = '{"Name": "Dimitra","Surname": "Gkatzia", "places_lived": ["Greece", "UK"],"modules_taught": [{"name":"Data Wrangling", "level":"MSc"},{"name": "Introduction to Human Computer Interaction", "level":"UG"}]}'

In [12]:
type(json_string)

str

You might have noticed that arrays in JSON look a lot like Python lists, and JSON objects look like Python dictionaries. Python makes it easy to convert a JSON string into a Python object that is a mixture of lists and dictionaries. You can then work with it using everything you learnt about slicing in unit 1.

"json_string" is just a string. We can convert it to a dictionary with json.loads as follows:

In [6]:
import json
python_string = json.loads(json_string)

In [10]:
type(python_string)

dict
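
Once parsed, the result behaves like any other nested Python structure, so the usual indexing and slicing apply. A minimal sketch, using a shortened copy of the JSON string above so the block runs on its own; the variable names `record`, `first_place`, `module_names`, and `msc_modules` are illustrative and not part of the original notebook:

```python
import json

# Shortened copy of the example string, repeated here so the block is self-contained.
json_string = '{"Name": "Dimitra", "places_lived": ["Greece", "UK"], "modules_taught": [{"name": "Data Wrangling", "level": "MSc"}, {"name": "Introduction to Human Computer Interaction", "level": "UG"}]}'

record = json.loads(json_string)

# JSON objects become dicts: look values up by key.
name = record["Name"]

# JSON arrays become lists: index them from either end as usual.
first_place = record["places_lived"][0]
last_place = record["places_lived"][-1]

# Nested structures chain naturally: loop over a list of dicts.
module_names = [module["name"] for module in record["modules_taught"]]
msc_modules = [m["name"] for m in record["modules_taught"] if m["level"] == "MSc"]

print(name, first_place, last_place)
print(module_names)
print(msc_modules)
```

Square brackets work for both kinds of lookup: a string key selects from a dictionary, and an integer position selects from a list.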

If your JSON has the right shape, for example a list of objects that all share the same keys, we can convert it into a data frame.

In [16]:
import pandas
modules_taught = pandas.DataFrame(python_string['modules_taught'], columns=['name','level'])
modules_taught

,name,level
0,Data Wrangling,MSc
1,Introduction to Human Computer Interaction,UG


We can go backwards and convert a dictionary or an array into JSON as follows:

In [14]:
asjson = json.dumps(python_string)
asjson

'{"Name": "Dimitra", "Surname": "Gkatzia", "places_lived": ["Greece", "UK"], "modules_taught": [{"name": "Data Wrangling", "level": "MSc"}, {"name": "Introduction to Human Computer Interaction", "level": "UG"}]}'

In [15]:
type(asjson)

str
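
`json.dumps` also accepts formatting options. The sketch below uses its standard `indent` and `sort_keys` parameters to produce a more readable string, then checks that parsing the string gives back the original data. The small `person` dictionary is a shortened version of the record above, chosen here for illustration.

```python
import json

person = {"Surname": "Gkatzia", "Name": "Dimitra", "places_lived": ["Greece", "UK"]}

# Default output: everything on a single line.
compact = json.dumps(person)

# indent adds line breaks and indentation; sort_keys orders the keys alphabetically.
pretty = json.dumps(person, indent=2, sort_keys=True)

print(pretty)

# Parsing either string gives back an equal dictionary.
round_trip_ok = json.loads(compact) == person and json.loads(pretty) == person
print(round_trip_ok)
```

The compact form is the better choice for storage or sending over a network; the indented form is easier to read when debugging.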

### I/O with JSON
You are more likely to need to read JSON from a file than to create a JSON object inside your program. To read a JSON file:

In [20]:
with open('json_file.json') as data_file:
    data = json.load(data_file)
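
Reading can fail if the file is missing or does not contain valid JSON. A minimal sketch of a more defensive reader follows; the helper `read_json` and the temporary demo files are assumptions made for illustration and do not appear in the original notebook:

```python
import json
import os
import tempfile


def read_json(path):
    """Return the parsed contents of a JSON file, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as data_file:
            return json.load(data_file)
    except FileNotFoundError:
        print(f"No such file: {path}")
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: {err}")
    return None


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.json")
    bad_path = os.path.join(tmp_dir, "bad.json")
    with open(good_path, "w", encoding="utf-8") as f:
        f.write('{"colors": [{"color": "black"}]}')
    with open(bad_path, "w", encoding="utf-8") as f:
        f.write('{"colors": [')  # truncated on purpose

    good = read_json(good_path)
    bad = read_json(bad_path)
    missing = read_json(os.path.join(tmp_dir, "missing.json"))
```

Returning `None` is only one option; in a longer pipeline you may prefer to let the exception propagate so that the failure is not silently ignored.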

When working with JSON, the standard-library module pprint is useful: it pretty-prints nested Python data structures, such as parsed JSON, in a readable layout:

In [21]:
from pprint import pprint
pprint(data)

{'colors': [{'category': 'hue',
             'code': {'hex': '#000', 'rgba': [255, 255, 255, 1]},
             'color': 'black',
             'type': 'primary'},
            {'category': 'value',
             'code': {'hex': '#FFF', 'rgba': [0, 0, 0, 1]},
             'color': 'white'},
            {'category': 'hue',
             'code': {'hex': '#FF0', 'rgba': [255, 0, 0, 1]},
             'color': 'red',
             'type': 'primary'},
            {'category': 'hue',
             'code': {'hex': '#00F', 'rgba': [0, 0, 255, 1]},
             'color': 'blue',
             'type': 'primary'},
            {'category': 'hue',
             'code': {'hex': '#FF0', 'rgba': [255, 255, 0, 1]},
             'color': 'yellow',
             'type': 'primary'},
            {'category': 'hue',
             'code': {'hex': '#0F0', 'rgba': [0, 255, 0, 1]},
             'color': 'green',
             'type': 'secondary'}]}


You can then turn the Python data structure back into a JSON string in the same way we saw earlier:

In [23]:
json.dumps(data)

'{"colors": [{"color": "black", "category": "hue", "type": "primary", "code": {"rgba": [255, 255, 255, 1], "hex": "#000"}}, {"color": "white", "category": "value", "code": {"rgba": [0, 0, 0, 1], "hex": "#FFF"}}, {"color": "red", "category": "hue", "type": "primary", "code": {"rgba": [255, 0, 0, 1], "hex": "#FF0"}}, {"color": "blue", "category": "hue", "type": "primary", "code": {"rgba": [0, 0, 255, 1], "hex": "#00F"}}, {"color": "yellow", "category": "hue", "type": "primary", "code": {"rgba": [255, 255, 0, 1], "hex": "#FF0"}}, {"color": "green", "category": "hue", "type": "secondary", "code": {"rgba": [0, 255, 0, 1], "hex": "#0F0"}}]}'

Finally, you can write the data to a file as JSON:

In [24]:
with open('json_file2.json', 'w') as outfile:
    json.dump(data, outfile)
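
To make the saved file easier to read in a text editor, `json.dump` takes the same `indent` option as `json.dumps`. The sketch below writes a small sample to a temporary file and reads it back to confirm that nothing was lost. The `sample` dictionary is an assumption standing in for `data`, and the temporary directory is used only so the example leaves no files behind.

```python
import json
import os
import tempfile

# Small stand-in for the `data` dictionary loaded earlier.
sample = {"colors": [{"color": "black", "category": "hue", "type": "primary"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "json_file2.json")

    # Write the structure as indented JSON.
    with open(out_path, "w", encoding="utf-8") as outfile:
        json.dump(sample, outfile, indent=4)

    # Read it back and compare with the original.
    with open(out_path, encoding="utf-8") as infile:
        restored = json.load(infile)

print(restored == sample)
```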

# Web Scraping

## Example: Basic Web Scraping

In [37]:
from bs4 import BeautifulSoup
import requests

# URL of the product page
url = 'http://www.newlook.com/uk/womens/clothing/dresses/black-floral-print-soft-touch-dress/p/563723709?comp=Browse'

# request the page and keep the response
page = requests.get(url)

# parse the HTML with Beautiful Soup
soup = BeautifulSoup(page.content, 'html.parser')

# get the price
price_box = soup.find('span', attrs={'class':'price'})
price = price_box.text.strip()
price

'£7.00'

In [38]:
# Get the name
name_box = soup.find('h1', attrs={'class':'product-description__name'})
name = name_box.text.strip()
name

'Black Floral Print Soft Touch Skater Dress'
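
`soup.find` returns `None` when no element matches, so `price_box.text` raises an `AttributeError` if the page layout changes. A minimal sketch of a guarded lookup follows, run against an inline HTML snippet rather than the live site. The helper `find_text`, the snippet, and the `rating` class are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Inline stand-in for a downloaded product page.
html = """
<html><body>
  <h1 class="product-description__name"> Black Floral Print Soft Touch Skater Dress </h1>
  <span class="price"> £7.00 </span>
</body></html>
"""


def find_text(soup, tag, css_class, default=None):
    """Return the stripped text of the first matching element, or `default`."""
    element = soup.find(tag, attrs={"class": css_class})
    if element is None:
        return default
    return element.get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
name = find_text(soup, "h1", "product-description__name")
price = find_text(soup, "span", "price")
rating = find_text(soup, "span", "rating", default="n/a")  # not present in the snippet

print(name, price, rating)
```

Testing the parsing logic on a saved or inline snippet like this also avoids sending repeated requests to the website while you develop the scraper.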

## Example: Saving a CSV file

In [43]:
import requests
import csv
from bs4 import BeautifulSoup
from datetime import datetime

# specify the url of the page 
target_page = 'http://www.newlook.com/uk/womens/clothing/dresses/black-floral-print-soft-touch-dress/p/563723709?comp=Browse'

page = requests.get(target_page)

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page.content, 'html.parser')

#get name
name_box = soup.find('h1', attrs={'class': 'product-description__name'})
name = name_box.text.strip()

#get price
price_box = soup.find('span', attrs={'class':'price'})
price = price_box.text.strip()

# newline='' lets the csv module handle line endings, avoiding blank rows on Windows
with open('price.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name, price, datetime.now()])
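
Because the file is opened in append mode, every run adds a row, but the file never gets a header. One possible refinement, sketched below, writes a header only when the file is new or empty. The helper `append_row`, the column names, and the temporary path are illustrative assumptions, not part of the original example.

```python
import csv
import os
import tempfile
from datetime import datetime


def append_row(path, name, price):
    """Append one observation to a CSV file, adding a header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        if is_new:
            writer.writerow(["name", "price", "scraped_at"])
        writer.writerow([name, price, datetime.now().isoformat(timespec="seconds")])


# Demo: two appends to a throwaway file give one header plus two data rows.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "price.csv")
    append_row(csv_path, "Black Floral Print Soft Touch Skater Dress", "£7.00")
    append_row(csv_path, "Black Floral Print Soft Touch Skater Dress", "£7.00")

    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        rows = list(csv.reader(csv_file))

print(len(rows))
```

Storing the timestamp in ISO 8601 form keeps the column easy to parse later, for example when loading the file into a data frame.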

## Example: Looping

In [47]:
import requests
import csv
from datetime import datetime
from bs4 import BeautifulSoup

# specify the URLs of the pages to scrape
target_page = ['http://www.newlook.com/uk/womens/clothing/dresses/black-floral-print-soft-touch-dress/p/563723709?comp=Browse', 
'http://www.newlook.com/uk/mens/clothing/jackets-coats/navy-puffer-jacket/p/563683041?comp=Browse']

# The data extraction code now runs inside a for loop, which processes the URLs one by one
# and stores the results in the list `data` as (name, price) tuples.
data = []
for pg in target_page:
    page = requests.get(pg)

    # parse the html using beautiful soup and store in variable `soup`
    soup = BeautifulSoup(page.content, 'html.parser')

    #get name
    name_box = soup.find('h1', attrs={'class': 'product-description__name'})
    name = name_box.text.strip()

    #get price
    price_box = soup.find('span', attrs={'class':'price'})
    price = price_box.text.strip()

    #save the data
    data.append((name,price))

# newline='' lets the csv module handle line endings, avoiding blank rows on Windows
with open('price.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for name, price in data:
        writer.writerow([name, price, datetime.now()])