#### Working with JSON in Python: A Step-by-Step Guide for Beginners
https://www.datacamp.com/tutorial/json-data-python

Objectives:

- Understand JSON and its pros and cons
- Identify use cases and alternatives to JSON
- Perform JSON serialization and deserialization in Python
- Work with JSON data in Python
- Format JSON data using the `json` library
- Optimize JSON performance in Python

## What is JSON?

JSON (JavaScript Object Notation) is a lightweight, language-independent data interchange format. It's ideal for exchanging data between web-based applications due to its simplicity, efficiency, and support for complex data structures.
It's widely used in web development, APIs, and client-side web applications.



#### Extra Resources:
https://www.w3schools.com/python/python_json.asp
https://www.datacamp.com/tutorial/json-data-python
alt: https://realpython.com/python-json/
https://www.realpythonproject.com/a-cheat-sheet-for-working-with-json-data-in-python/

Valerie's Playlist:
https://bit.ly/3DZlaaY


PCC V3: page 201-207
\\nas\Media\Documents [V]\_CIS Classes\DataEngineering\Python Resources\Python Crash Course - Eric Matthes - 3e files-vlb\chapter_10\storing_data

numberwriter.py
numberreader.py
rememberme.py
greetuser.py


https://www.techiedelight.com/json-introduction/  - Good!
https://techiedelight.com/tools/json


Really Good:
https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON


### JSON Example Below

•  In this example, we have a JSON object representing a person with various properties like name, age, hobbies, and address.  
•  It defines an object with several key-value pairs.  
•  The keys are (MUST BE) strings and the values can be of various types, including strings, numbers, booleans, arrays, and nested objects.  
•  In this particular example, the object represents a person named John Doe with an age of 30, an email address, and a boolean value indicating whether or not they are an employee.  
•  The hobbies property is an array [list] that contains three strings.   
•  The address property is an object with several properties consisting of key-value pairs defining their street, city, state, and zip code.  
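
The properties described in the list above can be read from Python once the JSON text is parsed. The following is a minimal sketch; the variable names `person_json` and `person` are assumptions chosen for this illustration.

```python
import json

# JSON text matching the person object described above (sample data)
person_json = """
{
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@example.com",
  "is_employee": true,
  "hobbies": ["reading", "playing soccer", "traveling"],
  "address": {"street": "123 Main Street", "city": "New York", "state": "NY", "zip": "10001"}
}
"""

person = json.loads(person_json)   # JSON object -> Python dict

print(person["name"])              # a string value
print(person["is_employee"])       # JSON true becomes Python True
print(person["hobbies"][0])        # a JSON array becomes a Python list
print(person["address"]["city"])   # a nested object becomes a nested dict
```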

Note that a JSON object is written as a series of key-value pairs and looks very similar to a Python dictionary: each key is a string, and each value is a string, number, boolean, null, array, or object. The two are not identical, though: JSON is text, while a dictionary is an in-memory Python object.

### The KEY difference (see what I did there?) is that dictionary keys can be any hashable data type, while JSON keys must be strings.
In the context of dictionaries, the term "hashable" refers to an object whose hash value remains constant throughout its lifetime. A hash value is an integer computed from the object's data, and dictionaries and other data structures use it to quickly look up the value associated with a specific key.

Hashable keys are crucial for dictionary performance. When you use a hashable object as a key, the dictionary computes the key's hash value and uses it to find the corresponding value without scanning every entry.
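
As a small illustration of the difference in key types, the sketch below shows what the standard `json` module does with non-string dictionary keys. The dictionary contents are made-up sample data.

```python
import json

# Integer keys are valid in a Python dict, but json.dumps() writes them as strings
scores = {1: "first", 2: "second"}
encoded = json.dumps(scores)
print(encoded)                 # {"1": "first", "2": "second"}

# After a round trip the keys come back as strings, not integers
decoded = json.loads(encoded)
print(list(decoded.keys()))    # ['1', '2']

# A tuple is hashable, so it works as a dict key, but it has no JSON equivalent
try:
    json.dumps({(1, 2): "point"})
except TypeError as error:
    print("Cannot serialize:", error)
```

One practical consequence: a dictionary with integer keys does not compare equal to its own round-tripped copy, so it can be safer to use string keys from the start when the data is headed for JSON.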

### Quote Usage: JSON vs. Dictionary
JSON requires double quotes around keys and string values, whereas Python string literals can use either single or double quotes.
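
The quoting rule can be checked directly with the standard `json` module. This is a minimal sketch; the helper name `is_valid_json` is hypothetical and exists only for this example.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

double_quoted = '{"name": "John"}'    # valid JSON
single_quoted = "{'name': 'John'}"    # looks like a Python dict, but is not valid JSON

print(is_valid_json(double_quoted))   # True
print(is_valid_json(single_quoted))   # False

# json.dumps() always emits double quotes, whichever quotes the Python source used
print(json.dumps({'name': 'John'}))   # {"name": "John"}
```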


```json
{
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@example.com",
  "is_employee": true,
  "hobbies": [
    "reading",
    "playing soccer",
    "traveling"
  ],
  "address": {
    "street": "123 Main Street",
    "city": "New York",
    "state": "NY",
    "zip": "10001"
  }
}
```

## Advantages and Disadvantages of Using JSON
### Pros of JSON

- Lightweight and easy to read
- Interoperable between different systems
- Easy to validate against a schema

### Cons of JSON

- Limited support for complex data structures (like graphs with shared or circular references)
- No built-in schema enforcement, which means that it is possible to store inconsistent or invalid data in a JSON file.
- Limited query and indexing capabilities
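
Because JSON itself does not enforce a schema, any checking has to happen in your own code or in a validation library. The sketch below is a hand-rolled check using only the standard library; the `REQUIRED_FIELDS` rules and the `find_problems` helper are hypothetical names invented for this example, not part of any library.

```python
import json

# Hypothetical rules for this sketch: each required key and the type its value must have
REQUIRED_FIELDS = {"name": str, "age": int}

def find_problems(record):
    """Return a list of readable problems; an empty list means the record passed."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems

# Both strings are valid JSON, so json.loads() accepts them without complaint
good = json.loads('{"name": "John", "age": 30}')
bad = json.loads('{"name": "John", "age": "thirty"}')

print(find_problems(good))   # []
print(find_problems(bad))    # ['age should be int']
```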

## Alternatives to JSON

- XML (Extensible Markup Language): XML is a markup language that uses tags to define elements and attributes to describe the data. It is a more verbose format than JSON, but it has strong support for schema validation and document structure.
- YAML (YAML Ain't Markup Language, originally Yet Another Markup Language): YAML is a human-readable data serialization format that is designed to be easy to read and write. It is a more concise format than XML and has support for complex data types and comments.
- MessagePack (binary serialization)
- Protocol Buffers (binary serialization)
- BSON (Binary JSON, used in MongoDB)

## Python Libraries for Working with JSON

- `json`: Built-in package for encoding and decoding JSON data
- `simplejson`: Fast JSON encoder and decoder
- `ujson`: Ultra-fast JSON encoder and decoder
- `jsonschema`: Validate JSON data against a schema



Python and JSON have different data types, with Python offering a broader range than JSON. While Python can store structures such as sets, tuples, and custom class instances, JSON is limited to strings, numbers, booleans, null, arrays, and objects. Let’s look at how the types correspond:

| Python Data Type | Comparable JSON Data Type |
|------------------|--------------------------|
| int              | Number                   |
| float            | Number                   |
| str              | String                   |
| list             | Array                    |
| tuple            | Array                    |
| dict             | Object                   |
| True             | true                     |
| False            | false                    |
| None             | null                     |
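
The mapping in the table can be observed by serializing one value of each type. This is a small sketch with made-up sample values; it also shows two things the table implies: a tuple comes back as a list, and a set has no JSON counterpart at all.

```python
import json

python_values = {
    "count": 3,        # int   -> Number
    "ratio": 0.5,      # float -> Number
    "label": "ok",     # str   -> String
    "items": [1, 2],   # list  -> Array
    "pair": (1, 2),    # tuple -> Array
    "active": True,    # True  -> true
    "missing": None,   # None  -> null
}

encoded = json.dumps(python_values)
print(encoded)

decoded = json.loads(encoded)
print(type(decoded["pair"]))   # <class 'list'>: the tuple does not survive the round trip

# A set is not in the table, so the json module refuses to serialize it
try:
    json.dumps({"unique": {1, 2}})
except TypeError as error:
    print("Cannot serialize:", error)
```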




## JSON Serialization and Deserialization
## Serializing JSON
Serialization is the process of converting an object or data structure into a JSON string. This process is necessary in order to transmit or store the data in a format that can be read by other systems or programs. JSON serialization is a common technique used in web development, where data is often transmitted between different systems or applications.

## Deserializing JSON
Deserialization, on the other hand, is the process of converting a JSON string back into an object or data structure. This process is necessary to use the data in a program or system. JSON deserialization is often used in web development to parse data received from an API or other source.

In [None]:
""" 1. json.dumps(python_obj) Serialization = Convert Python Object to JSON String
The dumps() function takes a single argument, the Python object, and returns a JSON string. Here's an example: """

import json

python_obj = {'name': 'John', 'age': 30}


""" The Python object python_obj in the given code is a dictionary. In Python, a dictionary is a collection of key-value pairs, where each key is unique and associated with a value. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Dictionaries are also sometimes referred to as "dicts."

the dictionary python_obj has two key-value pairs:

Key: 'name', Value: 'John'
Key: 'age', Value: 30
 """
# We can access the values stored in the dictionary using their corresponding keys. For instance:

name = python_obj['name']  # Retrieves the value 'John'
age = python_obj['age']    # Retrieves the value 30


# Dictionaries are commonly used for storing and retrieving data using meaningful keys, making it easy to organize and access information.

json_string = json.dumps(python_obj)

print(json_string)  # Output: {"name": "John", "age": 30}
print(type(json_string))


print(python_obj)
print(type(python_obj))


{"name": "John", "age": 30}
<class 'str'>
{'name': 'John', 'age': 30}
<class 'dict'>



In [None]:
""" 2. json.loads() Deserialization = parse a JSON string into a Python object.
The loads() function takes a single argument, the JSON string, and returns a Python object. Here's an example:
 """

import json

json_string = '{"name": "John", "age": 30}'
python_obj = json.loads(json_string)
print(python_obj)   # output: {'name': 'John', 'age': 30}
print(type(python_obj))


{'name': 'John', 'age': 30}
<class 'dict'>


Explanation
The code above uses json.loads() from the built-in json module to convert a JSON string into a Python object.
• First, a JSON string is defined and assigned to the variable json_string.
• The json.loads() function is used to convert the JSON string into a Python object.
• This function takes a JSON string as input and returns a Python object.
• In this case, the Python object is assigned to the variable python_obj.
• Finally, the print() function is used to display the Python object on the console.
• The output of this code is a dictionary with two key-value pairs: {'name': 'John', 'age': 30}.

In [None]:
""" 3. json.dump() Serialization = convert python object and write it to a JSON file.
The dump() function takes two arguments, the Python object and the file object. Here's an example:
 """
import json

# serialize Python object and write to JSON file
python_obj = {'name': 'John', 'age': 30}
with open('data.json', 'w') as file:
    json.dump(python_obj, file)



Explanation   
This code uses json.dump() from the json module, which provides methods for working with JSON data.
• It first creates a Python dictionary object called python_obj with two key-value pairs: 'name': 'John' and 'age': 30.
• The with statement is used to open a file called data.json in write mode ('w').
• The json.dump() method is then used to serialize the python_obj dictionary object into JSON format and write it to the data.json file.
• In other words, this code converts a Python dictionary object into a JSON string and saves it to a file.
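
To see exactly what `json.dump()` writes, the file can be read back as plain text. The sketch below is a variation on the example above; writing into a temporary directory and passing `encoding="utf-8"` are choices made for this illustration, not part of the original example.

```python
import json
import tempfile
from pathlib import Path

python_obj = {"name": "John", "age": 30}

# A temporary directory keeps this sketch from leaving files behind
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "data.json"

    with open(path, "w", encoding="utf-8") as file:
        json.dump(python_obj, file)

    # Read the raw text back before the temporary directory is deleted
    file_text = path.read_text(encoding="utf-8")

print(file_text)                             # {"name": "John", "age": 30}
print(file_text == json.dumps(python_obj))   # True: dump() writes the same text dumps() returns
```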


In [None]:
""" 4. json.load() Deserialization = read from a JSON file and parse its contents into a Python Object.
The load() function takes a single argument, the file object, and returns a Python object. Here's an example:
 """

import json

# read JSON file and parse contents
with open('data.json', 'r') as file:
    python_obj = json.load(file)
print(python_obj)  # output: {'name': 'John', 'age': 30}



{'name': 'John', 'age': 30}


Explanation
This code reads a JSON file named "data.json" and parses its contents.
• First, the code imports the json module, which provides methods for working with JSON data.
• Next, the code uses a with statement to open the "data.json" file in read mode ('r') and assigns the resulting file object to the variable file.
• Within the with block, the json.load() method is called with the file object as its argument.
• This method reads the contents of the file and converts it into a Python object, which is then assigned to the variable python_obj.
• Finally, the code prints the value of python_obj, which should be a dictionary with the keys "name" and "age" and their corresponding values.
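
In practice the file may be missing or may not contain valid JSON. One possible way to handle both cases is sketched below; the helper name `load_json_or_default` and the file names are hypothetical, chosen only for this example.

```python
import json
import tempfile
from pathlib import Path

def load_json_or_default(path, default):
    """Return the parsed contents of path, or default if the file is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    except FileNotFoundError:
        return default          # the file does not exist yet
    except json.JSONDecodeError:
        return default          # the file exists but does not hold valid JSON

with tempfile.TemporaryDirectory() as folder:
    good_path = Path(folder) / "data.json"
    good_path.write_text('{"name": "John", "age": 30}', encoding="utf-8")
    missing_path = Path(folder) / "missing.json"

    loaded = load_json_or_default(good_path, default={})
    fallback = load_json_or_default(missing_path, default={})

print(loaded)     # {'name': 'John', 'age': 30}
print(fallback)   # {}
```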


In [None]:
""" json.dumps() with a list: convert a Python list to a JSON-formatted string, which we store in the json_string variable.
In this example, we define a list called my_list with a mix of integers and strings.
"""

import json

my_list = [1, 2, 3, "four", "five"]
json_string = json.dumps(my_list)
print(json_string)  # Output: [1, 2, 3, "four", "five"]

Explanation
json.dumps() converts a Python list into a JSON string.
• First, the json module is imported.
• The json.dumps() function is used to convert the my_list into a JSON string which is assigned to the variable json_string.

### Formatting JSON Data - Indentation and Sorting JSON output

In [None]:
""" 1. Indent
This option specifies the number of spaces to use for indentation in the output JSON string. For example:
 """

import json

data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

json_data = json.dumps(data, indent=2)
print(json_data)
# This will produce a JSON formatted string with an indentation of 2 spaces for each level of nesting

Explanation
• A dictionary data is defined with three key-value pairs.
• The json.dumps() method is then used to convert the data dictionary into a JSON formatted string.
• The indent parameter is set to 2, which adds 2 spaces of indentation for each level of nesting in the JSON data.
• Finally, the JSON formatted string is printed to the console using the print() function.
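
The effect of `indent` is easier to see with nested data. The following sketch uses made-up sample data and compares the default single-line output with an indented version.

```python
import json

data = {
    "name": "John",
    "address": {"city": "New York", "zip": "10001"},
    "hobbies": ["reading", "traveling"],
}

compact = json.dumps(data)            # default: everything on one line
pretty = json.dumps(data, indent=4)   # 4 spaces per nesting level

print(pretty)
print(len(compact.splitlines()), "line vs", len(pretty.splitlines()), "lines")

# Indentation only changes whitespace: both forms parse back to the same object
print(json.loads(compact) == json.loads(pretty))   # True
```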

In [None]:
""" 2. sort_keys
This option specifies whether the keys in the output JSON string should be sorted in alphabetical order. For example:

 """
import json

data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

json_data = json.dumps(data, sort_keys=True)
print(json_data)

# This will produce a JSON formatted string with the keys in alphabetical order:
# {"age": 30, "city": "New York", "name": "John"}


Explanation
This code creates a Python dictionary with three key-value pairs.
• The keys are "name", "age", and "city", and their corresponding values are "John", 30, and "New York", respectively.
• The curly braces {} denote the creation of a dictionary, and the colons : separate the keys from their values.
• The key-value pairs are separated from each other by commas.
• The json.dumps() function is then used to convert the data dictionary into a JSON-formatted string.
• The sort_keys=True argument sorts the keys in the resulting JSON string alphabetically.
• Finally, the JSON string is printed to the console using the print() function.
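
One place where sorted keys help is comparing JSON text produced from dictionaries that were built in different orders. A minimal sketch with made-up data:

```python
import json

first = {"name": "John", "age": 30}
second = {"age": 30, "name": "John"}   # same content, different insertion order

# Without sorting, the output follows insertion order, so the strings differ
same_unsorted = json.dumps(first) == json.dumps(second)
print(same_unsorted)   # False

# With sort_keys=True, both dictionaries serialize to identical text
same_sorted = json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
print(same_sorted)     # True
```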


In [None]:
""" 3. Separators
This option allows you to specify the separators used in the output JSON string. The separators parameter takes a tuple of two strings, (item_separator, key_separator): the first string separates items (the key-value pairs in an object and the elements in an array), and the second string separates each key from its value. For example:
 """

import json

data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

json_data = json.dumps(data, separators=(",", ":"))
print(json_data)

# This will produce a JSON formatted string with a comma separator between key-value pairs and a colon separator between keys and values:
#{"name":"John","age":30,"city":"New York"}


Explanation
• a Python dictionary data is defined with three key-value pairs representing a person's name, age, and city.
• The json.dumps() function is then called with data as the argument, which converts the dictionary to a JSON formatted string.
• The separators parameter is also specified with a comma separator between key-value pairs and a colon separator between keys and values.
• Finally, the resulting JSON string is printed to the console using the print() function.
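
To make the effect of `separators` concrete, the sketch below compares the default output with the compact form. The sample data is made up; note that the first separator applies inside arrays as well as between key-value pairs.

```python
import json

data = {"name": "John", "tags": ["a", "b"]}

default_text = json.dumps(data)                          # default separators are (", ", ": ")
compact_text = json.dumps(data, separators=(",", ":"))   # no space after commas or colons

print(default_text)   # {"name": "John", "tags": ["a", "b"]}
print(compact_text)   # {"name":"John","tags":["a","b"]}
print(len(default_text), "characters vs", len(compact_text), "characters")
```

The compact form carries the same data in fewer characters, which can matter when many documents are stored or sent over a network.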

In [None]:
""" Python Example - JSON data in APIs
More Info: https://www.datacamp.com/tutorial/making-http-requests-in-python (includes a DataCamp workspace)
 """
import requests
import json


url = "https://jsonplaceholder.typicode.com/posts"


response = requests.get(url, timeout=10)  # timeout (in seconds) avoids waiting forever on a stalled connection


if response.status_code == 200:
    data = json.loads(response.text)
    print(data)
else:
    print(f"Error retrieving data, status code: {response.status_code}")



Explanation  
This code uses the requests library and the json library in Python to make a request to the URL "https://jsonplaceholder.typicode.com/posts" and retrieve data.
• The requests.get(url) line makes the actual request and stores the response in the response variable.
• If the response status code is 200 (which means the request was successful), the code parses response.text (a JSON-formatted string) with json.loads() and stores the result in the data variable. For this URL, the result is a list of dictionaries, one per post.
• Otherwise, the code prints an error message that includes the status code.

json.loads() parses JSON-formatted text into a Python data structure. It returns a dictionary, list, or other appropriate Python data type, depending on the JSON content.
• Overall, this code retrieves data from a JSON API and prints it to the console.
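
Once the response text is parsed, the result is ordinary Python data that can be filtered and summarized. The sketch below works offline: `response_text` is a hypothetical, shortened sample shaped like the posts returned by that URL (fields `userId`, `id`, `title`, `body`), not real API output.

```python
import json

# Hypothetical sample standing in for response.text (shortened, made-up values)
response_text = """
[
  {"userId": 1, "id": 1, "title": "first post", "body": "hello"},
  {"userId": 1, "id": 2, "title": "second post", "body": "world"},
  {"userId": 2, "id": 3, "title": "third post", "body": "again"}
]
"""

posts = json.loads(response_text)         # top-level JSON array -> Python list
print(type(posts).__name__, len(posts))   # list 3

# Each element is a dict, so normal list and dict operations apply
titles_by_user_1 = [post["title"] for post in posts if post["userId"] == 1]
print(titles_by_user_1)                   # ['first post', 'second post']
```

With a live response, `response.json()` from the requests library returns the same parsed structure as calling `json.loads(response.text)`.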




## Optimizing JSON Performance in Python
When working with large amounts of JSON data in Python, optimizing the performance of your code is important to ensure that it runs efficiently. Here are some tips for optimizing JSON performance in Python:

- Use the cjson or ujson libraries. These third-party libraries are typically faster than the standard json library in Python and can significantly improve the performance of JSON serialization and deserialization.
- Avoid unnecessary conversions. Converting back and forth between Python objects and JSON data can be expensive in terms of performance. If possible, parse the data once, work with the resulting Python objects, and serialize only when you need to store or send the result.
- Use generators for large JSON data. When working with large amounts of JSON data, using generators can help reduce memory usage and improve performance.
- Minimize network overhead. When transmitting JSON data over a network, minimizing the amount of data transferred can improve performance. Use compression techniques such as gzip to reduce the size of JSON data before transmitting it over a network.
- Use caching. If you frequently access the same JSON data, caching the data can improve performance by reducing the number of requests to load the data.
- Optimize the data structure. The structure of the JSON data can also impact performance. Using a simpler, flatter data structure can improve performance over a complex, nested structure.
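
As one way to apply the generator tip, the sketch below processes records one at a time instead of loading everything into memory. It assumes the data is stored with one JSON document per line (the JSON Lines layout), which is an assumption of this example; `iter_records` is a hypothetical helper name, and `io.StringIO` stands in for a large file.

```python
import io
import json

def iter_records(lines):
    """Yield one parsed record per non-empty line (one JSON document per line)."""
    for line in lines:
        line = line.strip()
        if line:                     # skip blank lines
            yield json.loads(line)

# io.StringIO stands in for a large file opened with open(...)
sample_file = io.StringIO(
    '{"id": 1, "value": 10}\n'
    '{"id": 2, "value": 20}\n'
    '\n'
    '{"id": 3, "value": 30}\n'
)

# Only one record is held in memory at a time while the sum is computed
total = sum(record["value"] for record in iter_records(sample_file))
print(total)   # 60
```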
## Limitations of JSON format
While JSON is a popular format for data exchange in many applications, there are some implementation limitations to be aware of:

- Lack of support for some data types. JSON has no native representation for certain data types, such as binary data, dates, and times. While there are workarounds to represent these types in JSON (for example, storing a date as a string), they make serialization and deserialization more complicated.
- Lack of support for comments. Unlike other formats, such as YAML and XML, JSON does not support comments. This makes it harder to provide context or documentation inside JSON data.
- Limited flexibility for extensions. While JSON does support extensions through custom properties or the $schema property, the format does not provide as much flexibility for extensions as other formats, such as XML or YAML.
- No standard for preserving key order. JSON treats an object as an unordered collection of key-value pairs, so key order is not guaranteed to be preserved, making it harder to compare or merge JSON objects.
- No support for circular references. JSON cannot represent circular references, where an object refers back to itself, and Python's json.dumps() raises a ValueError when it encounters one. This makes it harder to represent some data structures in JSON.

It's important to be aware of these implementation limitations when working with JSON data to ensure that the format is appropriate for your needs and to avoid potential issues with serialization, deserialization, and data representation.
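
Two of these limitations can be demonstrated with the standard `json` module. The sketch below shows one common workaround for dates (storing them as ISO 8601 strings through the `default` parameter of `json.dumps()`) and what happens with a circular reference. The function name `encode_extra_types` and the sample data are assumptions made for this example.

```python
import json
from datetime import datetime

def encode_extra_types(value):
    """Fallback for json.dumps(): convert datetimes to ISO 8601 text."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")

event = {"name": "launch", "when": datetime(2024, 1, 15, 9, 30)}
encoded = json.dumps(event, default=encode_extra_types)
print(encoded)   # {"name": "launch", "when": "2024-01-15T09:30:00"}

# The date comes back as a plain string, so the reader must convert it explicitly
decoded = json.loads(encoded)
restored = datetime.fromisoformat(decoded["when"])
print(restored == event["when"])   # True

# A structure that contains itself cannot be written as JSON
node = {"name": "loop"}
node["self"] = node
try:
    json.dumps(node)
except ValueError as error:
    print("Cannot serialize:", error)
```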

## Conclusion
JSON is a versatile and widely used format for data exchange in modern web development, and Python provides a powerful set of tools for working with JSON data. Whether you are building an API or working with client-side web applications, understanding the basics of JSON in Python is an essential skill for any modern developer. By mastering the techniques outlined in this tutorial, you will be well on your way to working with JSON data in Python and building robust, scalable applications that make good use of this data interchange format.

Later, we will learn how to build pipelines to import data kept in common storage formats. To go further, check out the Streamlined Data Ingestion with pandas course. You’ll use pandas, a major Python library for analytics, to get data from a variety of sources, including a spreadsheet of survey responses, a database of public service requests, and an API for a popular review site.