In [8]:
# make sure the working directory is set to the right path
setwd('C:/Users/zech011/LOCAL-Repos/Intro-To-DS/assignments/01-GabrieldaSilvaZech-JSON-jsonlite')

https://hendrikvanb.gitlab.io/2018/07/nested_data-json_to_tibble/

https://www.rdocumentation.org/packages/jsonlite/versions/1.7.2

https://robotwealth.com/how-to-wrangle-json-data-in-r-with-jsonlite-purr-and-dplyr/

In [4]:
# Load necessary packages
library(jsonlite)

## What's in a JSON file anyways?

In [27]:
# inspect existing files in folder "data/"
list.files("data/")

In [31]:
# print contents of the JSON file
cat(paste0(readLines("data/students_data.json", warn=FALSE), collapse="\n"))

{
    "students": [
        { 
            "id":"014789", 
            "name": "Francesco", 
            "lastname": "Danovi"
        }, 
        { 
            "id":"023657", 
            "name": "Gabriel", 
            "lastname": "da Silva Zech" 
        }
    ],
    
    "teachers": ["Simon Munzer", "Lisa Oswald"],
    
    "ages_students_teachers": [[23, 24],
                               [27, 33]],
    
    "university": {
        "name": "Hertie School of Governance",
        "location": "Berlin",
        "coordinates": [52.51297066351335, 13.389164312650935]
    }
}

## Reading a JSON file with `fromJSON()`

In [33]:
# load json file
json_data <- fromJSON("data/students_data.json")
json_data

id,firstname,lastname
14789,Francesco,Danovi
23657,Gabriel,da Silva Zech

0,1
23,24
27,33
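The printed result mixes several shapes, so it can help to check the class of each top-level element directly. The following is a minimal sketch that uses an inline copy of the file contents shown above (the name `json_text` is only for illustration), so it runs without the file on disk:

```r
library(jsonlite)

# inline copy of the JSON shown above, so the example does not depend on "data/"
json_text <- '{
  "students": [
    {"id": "014789", "name": "Francesco", "lastname": "Danovi"},
    {"id": "023657", "name": "Gabriel", "lastname": "da Silva Zech"}
  ],
  "teachers": ["Simon Munzer", "Lisa Oswald"],
  "ages_students_teachers": [[23, 24], [27, 33]],
  "university": {
    "name": "Hertie School of Governance",
    "location": "Berlin",
    "coordinates": [52.51297066351335, 13.389164312650935]
  }
}'

parsed <- fromJSON(json_text)

# first class of every top-level element
element_classes <- sapply(parsed, function(x) class(x)[1])
element_classes
```

Each top-level key should report a different class here, which is exactly what the next section explains.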


## Why the different data types? It's a *simple* answer

By default, `fromJSON()` turns a JSON object (key-value pairs inside `{}`) into a named R **list**.

JSON arrays, however, **do not always end up as lists**. Depending on how an array is structured, `fromJSON()` automatically converts it to a more specific R class.

This automatic conversion of JSON arrays from a list into a more specific R class is called **simplification** (get the joke?).

There are **3 types** of JSON arrays that get converted to different R classes by default: 
* arrays of **primitives**
* arrays of **objects**
* arrays of **arrays**

Let's take a closer look at these.

### JSON array of primitives (strings, numbers, booleans or null) -> R vector

The `teachers` element in the JSON file is an **array of primitives**. "Primitives" are `strings`, `numbers`, `booleans` or `null` values (they are named so because they are the simplest elements available in a programming language).

> "teachers": ["Simon Munzer", "Lisa Oswald"]

Why? It has multiple `strings` separated by commas `,` within square brackets `[]`.

An array of primitives gets converted to an R `vector` by default. See again:

In [34]:
json_data[["teachers"]] # [[ ]] extracts the element itself, so the output is an R vector
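The same behaviour can be reproduced with inline JSON strings. This is a small sketch, with the variable names `json_primitives`, `teachers` and `ages` chosen only for illustration; it also shows what happens to a JSON `null` inside an array of primitives:

```r
library(jsonlite)

# an array of strings becomes a character vector
json_primitives <- '["Simon Munzer", "Lisa Oswald"]'
teachers <- fromJSON(json_primitives)
class(teachers)

# an array of numbers becomes a numeric vector; null is turned into NA
ages <- fromJSON('[27, 33, null]')
ages
```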

### JSON array of objects (key-value pairs) -> R data frame

In [4]:
# when the JSON data is in the form of an array of objects...
json_objects <- '[
                {"name": "Francesco", 
                 "age": 23}, 
                {"name": "Gabriel", 
                 "age": 25}
              ]'

# ...it gets converted into a data frame
fromJSON(json_objects)

name,age
Francesco,23
Gabriel,25
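The objects in an array do not need to have identical keys for this to work. As a sketch (the string `json_missing` is made up for illustration), a key that is absent from one object shows up as `NA` in the resulting data frame:

```r
library(jsonlite)

# the second object has no "age" key
json_missing <- '[
  {"name": "Francesco", "age": 23},
  {"name": "Gabriel"}
]'

students <- fromJSON(json_missing)
students

# the missing value is filled with NA
is.na(students$age[2])
```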


### JSON array of arrays (equal-length sub-arrays) -> R matrix

In [71]:
# when the JSON data is in the form of an array of arrays...
json_arrays <- '[
                [23, 25, 27],
                [30, 35, 37],
                [40, 43, 48]
              ]'

# ...it gets converted into a matrix
fromJSON(json_arrays)

0,1,2
23,25,27
30,35,37
40,43,48
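The "equal-length" condition in the heading matters. In the following sketch (the string `json_ragged` is a made-up example), the sub-arrays have different lengths, so no matrix can be built and the result should stay a list of vectors:

```r
library(jsonlite)

# sub-arrays of unequal length
json_ragged <- '[
  [23, 25, 27],
  [30, 35]
]'

ragged <- fromJSON(json_ragged)

class(ragged)            # a list, not a matrix
sapply(ragged, length)   # each sub-array keeps its own length
```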


## How to convert JSON elements strictly to lists

It is possible, however, to disable this automatic simplification by passing `simplifyVector = FALSE` to `fromJSON()`.

This makes sure that every JSON array and object is returned as a **list**.

In [76]:
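# json_primitives was not defined above, so define it here (same array as "teachers")
json_primitives <- '["Simon Munzer", "Lisa Oswald"]'
fromJSON(json_primitives, simplifyVector = FALSE)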
fromJSON(json_objects, simplifyVector = FALSE)
fromJSON(json_arrays, simplifyVector = FALSE)
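Simplification does not have to be switched off entirely. `fromJSON()` also accepts `simplifyDataFrame` and `simplifyMatrix`, which by default follow `simplifyVector`. The sketch below redefines the two strings from the earlier cells so that it runs on its own, and turns off only one kind of simplification at a time:

```r
library(jsonlite)

json_objects <- '[{"name": "Francesco", "age": 23}, {"name": "Gabriel", "age": 25}]'
json_arrays  <- '[[23, 25, 27], [30, 35, 37], [40, 43, 48]]'

# keep the array of objects as a list of named lists instead of a data frame
objects_as_list <- fromJSON(json_objects, simplifyDataFrame = FALSE)
str(objects_as_list)

# keep the array of arrays as a list of vectors instead of a matrix
rows_as_list <- fromJSON(json_arrays, simplifyMatrix = FALSE)
str(rows_as_list)
```

This can be useful when only one shape causes trouble downstream, for example when the rows of a matrix should be handled one by one.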

In [7]:
getwd()

In [9]:
# load a json file named "students.json"
json_data <- fromJSON("data/students.json")
json_data
class(json_data)

id,name,lastname
1,Francesco,Danovi
2,Gabriel,da Silva Zech


In [17]:
# load a json file named "students_universities.json"
json_data <- fromJSON("data/students_universities.json")
#json_data <- fromJSON("data/students_universities.json", simplifyVector = FALSE, flatten = FALSE)

json_data
class(json_data)

id,name,lastname
1,Francesco,Danovi
2,Gabriel,da Silva Zech
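The commented-out call above mentions the `flatten` argument. As a rough sketch of what it does, the example below uses a made-up string `json_nested` (it is not the content of `students_universities.json`) in which every object contains a nested object:

```r
library(jsonlite)

# hypothetical data: each student carries a nested "university" object
json_nested <- '[
  {"name": "Francesco", "university": {"name": "Hertie School of Governance", "location": "Berlin"}},
  {"name": "Gabriel",   "university": {"name": "Hertie School of Governance", "location": "Berlin"}}
]'

# default: the "university" column is itself a data frame
nested <- fromJSON(json_nested)
names(nested)

# flatten = TRUE spreads the nested data frame into regular columns
flat <- fromJSON(json_nested, flatten = TRUE)
names(flat)
```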


In [None]:
# this is what the contents of the JSON file look like
{
    "student": [
        { 
            "id":"01", 
            "name": "Francesco", 
            "lastname": "Danovi"
        }, 
        { 
            "id":"02", 
            "name": "Gabriel", 
            "lastname": "da Silva Zech" 
        }
    ],
    "university": {
        "name": "Hertie School of Governance",
        "location": "Berlin",
        "coordinates": [52.51297066351335, 13.389164312650935]
    }
}

In [11]:
# acquired from https://schema.org/JobPosting
# JSON text has to be stored as a string before R can work with it;
# a raw string (R >= 4.0) avoids escaping the double quotes and the apostrophe
job_posting <- r"(
{
    "@context": "https://schema.org",
    "@type": "JobPosting",
    "baseSalary": "100000",
    "jobBenefits": "Medical, Life, Dental",
    "datePosted": "2011-10-31",
    "description": "Description: ABC Company Inc. seeks a full-time mid-level software engineer to develop in-house tools.",
    "educationRequirements": "Bachelor's Degree in Computer Science, Information Systems or related fields of study.",
    "employmentType": "Full-time",
    "experienceRequirements": "Minumum 3 years experience as a software engineer",
    "incentiveCompensation": "Performance-based annual bonus plan, project-completion bonuses",
    "industry": "Computer Software",
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Kirkland",
            "addressRegion": "WA",
            "addressCoordinates": [25.1212, 55.1535]
        }
    },
    "occupationalCategory": "15-1132.00 Software Developers, Application",
    "qualifications": "Ability to work in a team environment with members of varying skill levels. Highly motivated. Learns quickly.",
    "responsibilities": "Design and write specifications for tools for in-house customers Build tools according to specifications",
    "salaryCurrency": "USD",
    "skills": "Web application development using Java/J2EE Web application development using Python or familiarity with dynamic programming languages",
    "specialCommitments": "VeteranCommit",
    "title": "Software Engineer",
    "workHours": "40 hours per week"
}
)"
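JSON text only becomes usable in R once it is stored as a string and passed to `fromJSON()`. The sketch below uses a shortened copy of the job posting (the names `posting_text` and `posting` are only for illustration), checks it with `validate()` and then reaches into the nested objects. Keys that start with `@` are not valid R names, so they need backticks or `[[ ]]`:

```r
library(jsonlite)

# shortened copy of the job posting, stored as a string
posting_text <- '{
  "@type": "JobPosting",
  "title": "Software Engineer",
  "jobLocation": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Kirkland",
      "addressRegion": "WA"
    }
  }
}'

# check that the string is valid JSON before parsing it
is_valid <- validate(posting_text)

posting <- fromJSON(posting_text)

# nested objects become nested lists
posting$jobLocation$address$addressLocality

# keys such as "@type" need backticks (or [[ ]])
posting$`@type`
```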


In [12]:
# JSON Example (stored as a string so that R can parse the cell)
json_example <- '{
  "student": [
     {
        "id":"01",
        "name": "Tom",
        "lastname": "Price"
     },
     {
        "id":"02",
        "name": "Nick",
        "lastname": "Thameson"
     }
  ]
}'

# XML Example (stored as a string for the same reason)
xml_example <- '<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <student>
        <id>01</id>
        <name>Tom</name>
        <lastname>Price</lastname>
    </student>
    <student>
        <id>02</id>
        <name>Nick</name>
        <lastname>Thameson</lastname>
    </student>
</root>'
