## Multiple formats of structured data

1. Objects: 

Analogy: Dictionaries, or books (with table of contents, chapters, sections, indexes)
- With the help of book organization such as a table of contents, sections and indexes, readers can navigate and manipulate the underlying data much more easily.
![image.png](attachment:image.png)

Pros:

- Objects provide a structured and intuitive representation of data, allowing for easy access to properties and methods.
- They support encapsulation, allowing data and functionality to be bundled together.
- Objects can represent complex relationships and hierarchies, making them suitable for modeling real-world entities.
- Object-oriented programming languages often provide built-in support for working with objects.

Cons:

- Objects can be memory-intensive, especially when dealing with large datasets or complex object hierarchies.
- Working with objects may require more code and overhead compared to other data formats.
- Objects may not be directly compatible with certain data storage or transmission formats, requiring conversion or serialization.
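
As a minimal sketch of the points above, the hypothetical `Person` class below (it is not part of the original notebook) bundles data with a method, and shows the conversion step that is needed before the object can be stored or transmitted as JSON.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Person:  # hypothetical class, for illustration only
    name: str
    age: int

    def birthday(self):
        # Behaviour lives next to the data it changes (encapsulation)
        self.age += 1


person = Person(name="John", age=30)
person.birthday()
print(person.age)  # 31

# json.dumps(person) would raise TypeError, so the object is first
# converted to plain data (a dict) and then serialized
person_json = json.dumps(asdict(person))
print(person_json)  # {"name": "John", "age": 31}
```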

Example data objects: DataFrames: 

DataFrames are a core data structure provided by the pandas library, commonly used for data manipulation and analysis. Here's an example of working with a DataFrame object:

In [1]:
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [30, 25, 35],
    'City': ['New York', 'London', 'Paris']
}
df = pd.DataFrame(data)

# Accessing DataFrame columns
print(df['Name'])  # Output: 0     John, 1     Jane, 2    Alice

# Accessing DataFrame rows
print(df.loc[1])  # Output: Name     Jane, Age        25, City    London

# Modifying DataFrame data
df.loc[0, 'Age'] = 32
print(df)  # Output:    Name  Age      City, 0   John   32  New York, 1   Jane   25    London, 2  Alice   35     Paris

# Performing operations on DataFrame
average_age = df['Age'].mean()
print(average_age)  # Output: 30.666666666666668

0     John
1     Jane
2    Alice
Name: Name, dtype: object
Name      Jane
Age         25
City    London
Name: 1, dtype: object
    Name  Age      City
0   John   32  New York
1   Jane   25    London
2  Alice   35     Paris
30.666666666666668


2. Arrays

Analogy: Stack of papers. 
- Related documents are put together sequentially. You can easily stack new pages on top or take pages off.

![image-2.png](attachment:image-2.png)

Pros:

- Arrays provide a simple and efficient way to store and access ordered collections of elements.
- In many languages they have a fixed length, making them suitable for scenarios where the number of elements is known in advance (Python lists, used in the example below, can grow and shrink).
- Arrays can be easily iterated over and manipulated using standard looping constructs.
- Many programming languages provide built-in array operations and methods.

Cons:

- Arrays have a rigid structure, meaning they may not be suitable for representing complex relationships or hierarchical data.
- Inserting or removing elements in the middle of an array can be inefficient, as it may require shifting other elements.
- Array indices are typically numeric and sequential, which may limit their flexibility in certain scenarios.

In [None]:
# Creating an array
numbers = [1, 2, 3, 4, 5]

# Accessing array elements
print(numbers[0])  # Output: 1
print(numbers[2])  # Output: 3

# Modifying array elements
numbers[1] = 10
print(numbers)  # Output: [1, 10, 3, 4, 5]

# Iterating over an array
for number in numbers:
    print(number)

# Output:
# 1
# 10
# 3
# 4
# 5
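
The cost of inserting in the middle, listed in the cons above, can be illustrated with a Python list standing in for the array. This is a small sketch rather than a benchmark: `list.insert` has to shift every element after the insertion point, whereas `append` only adds at the end.

```python
letters = ["a", "b", "d", "e"]

# Appending at the end does not move existing elements
letters.append("f")

# Inserting at index 2 shifts "d", "e" and "f" one position to the right
letters.insert(2, "c")
print(letters)  # ['a', 'b', 'c', 'd', 'e', 'f']

# Removing from the middle shifts the later elements back to the left
removed = letters.pop(1)
print(removed)  # b
print(letters)  # ['a', 'c', 'd', 'e', 'f']
```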

3. JSON/XML:

Analogy: file-folder system
- While JSON/XML documents do not inherit from any class or have any methods, they are an organized way of storing and managing information.
![image.png](attachment:image.png)

Pros:

- JSON and XML are widely supported and accepted data interchange formats.
- They are human-readable and self-describing, making them easy to understand and work with.
- JSON and XML can represent complex data structures, including nested objects and arrays.
- They are platform-independent and can be easily parsed and generated in most programming languages.

Cons:

- JSON and XML are more verbose than binary formats, resulting in larger file sizes and increased network bandwidth usage.
- Parsing and generating JSON/XML can be slower than with binary formats because they are text-based.
- JSON and XML may require additional parsing and validation steps to ensure data integrity and security.
- Working with deeply nested structures in JSON and XML can be more complex and may require additional code.
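
To make the nested-structure point concrete, here is a minimal sketch that writes the same hypothetical record (invented for illustration) once as JSON and once as XML, and reads both with Python's standard library (`json` and `xml.etree.ElementTree`).

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, written once as JSON and once as XML
json_text = '{"name": "John", "address": {"city": "New York"}, "tags": ["a", "b"]}'
xml_text = """
<person>
  <name>John</name>
  <address><city>New York</city></address>
  <tags><tag>a</tag><tag>b</tag></tags>
</person>
"""

# JSON parses into nested dicts and lists
person = json.loads(json_text)
print(person["address"]["city"])  # New York
print(person["tags"])             # ['a', 'b']

# XML parses into a tree of elements that is navigated by tag name
root = ET.fromstring(xml_text.strip())
city = root.find("address/city").text
tags = [tag.text for tag in root.iter("tag")]
print(city)  # New York
print(tags)  # ['a', 'b']
```

Both formats carry the same information here; the XML version needs more markup for it, which is the verbosity trade-off noted in the cons.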


## Serialization of JSON

What is deserialization and serialization?
- These two terms appear frequently because in Python we usually skip the intermediate step of handling arrays.

![image-3.png](attachment:image-3.png)
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

- Deserialization combines decoding and denormalization, while serialization combines normalization and encoding.
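
A minimal sketch of this idea, assuming a hypothetical `User` class and two helper functions that are not part of the original notebook: each helper performs its two steps in sequence, so the caller never handles the intermediate array.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:  # hypothetical class, for illustration only
    id: int
    name: str


def deserialize_users(json_str):
    # Decode: JSON text -> list of dicts
    records = json.loads(json_str)
    # Denormalize: list of dicts -> objects
    return [User(**record) for record in records]


def serialize_users(users):
    # Normalize: objects -> list of dicts
    records = [asdict(user) for user in users]
    # Encode: list of dicts -> JSON text
    return json.dumps(records)


json_str = '[{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]'
users = deserialize_users(json_str)
print(users[0].name)           # John
print(serialize_users(users))  # [{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]
```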

## Transformation between different data formats

More technical terms behind the scenes:

![image.png](attachment:image.png)

The flow of transformation between data formats typically involves the following steps:

- JSON / XML is decoded into an array.
- The array is denormalized into an object.
- In other words, we "deserialize" JSON or XML into objects.

What is decoding of JSON / XML?
- Decoding is the process of converting JSON or XML text into a structured format that can be manipulated programmatically. The decoded representation is often an array (a Python list).

In [12]:
import json

# JSON string
json_str = '''
[
  { "id": 1, "name": "John" },
  { "id": 2, "name": "Jane" },
  { "id": 3, "name": "Bob" }
]
'''

# Decode JSON into an array
array = json.loads(json_str)
print(type(json_str))
print(type(array))
print(array)

<class 'str'>
<class 'list'>
[{'id': 1, 'name': 'John'}, {'id': 2, 'name': 'Jane'}, {'id': 3, 'name': 'Bob'}]
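
The JSON example above has an XML counterpart. The sketch below assumes a hypothetical `<users>` document (invented for illustration) and uses the standard library's `xml.etree.ElementTree` to decode it into the same kind of array. XML carries no type information, so `id` has to be converted to `int` by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document holding the same three records
xml_str = """
<users>
  <user id="1"><name>John</name></user>
  <user id="2"><name>Jane</name></user>
  <user id="3"><name>Bob</name></user>
</users>
"""

root = ET.fromstring(xml_str.strip())

# Decode XML into an array: one dict per <user> element
array = [
    {"id": int(user.get("id")), "name": user.findtext("name")}
    for user in root.findall("user")
]
print(type(array))  # <class 'list'>
print(array)
# [{'id': 1, 'name': 'John'}, {'id': 2, 'name': 'Jane'}, {'id': 3, 'name': 'Bob'}]
```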


What is denormalization of arrays?
- Denormalization involves restructuring the array data by merging related elements into a single object. The denormalized object can provide a more intuitive and convenient representation of the data, particularly when dealing with complex relationships or nested structures.

In [14]:
# denormalization of arrays

array = [
  { 'id': 1, 'name': 'John' },
  { 'id': 2, 'name': 'Jane' },
  { 'id': 3, 'name': 'Bob' }
]

# Denormalizing the array into a dictionary
denormalized_dict = {}
for item in array:
    denormalized_dict[item['id']] = item

print(type(array))
print(type(denormalized_dict))
print(denormalized_dict)

<class 'list'>
<class 'dict'>
{1: {'id': 1, 'name': 'John'}, 2: {'id': 2, 'name': 'Jane'}, 3: {'id': 3, 'name': 'Bob'}}


What is normalization of objects?
- Normalization is the opposite process of denormalization. It involves breaking down complex or nested objects into simpler components, often splitting them into multiple tables or entities, to reduce redundancy and improve data integrity. 

In [13]:
# Dictionary
denormalized_dict = {
    1: {'id': 1, 'name': 'John'},
    2: {'id': 2, 'name': 'Jane'},
    3: {'id': 3, 'name': 'Bob'}
}

# Normalize the dictionary into an array (the keys are not needed)
normalized_array = list(denormalized_dict.values())

print(type(denormalized_dict))
print(type(normalized_array))
print(normalized_array)

<class 'dict'>
<class 'list'>
[{'id': 1, 'name': 'John'}, {'id': 2, 'name': 'Jane'}, {'id': 3, 'name': 'Bob'}]
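
The cell above shows the simplest case, where the dictionary is flattened back into an array. The description also mentions splitting nested objects into separate tables. A minimal sketch of that, using hypothetical order data invented for illustration, could look like this:

```python
# Nested objects: each order repeats the full customer record
orders = [
    {"order_id": 100, "item": "book", "customer": {"id": 1, "name": "John"}},
    {"order_id": 101, "item": "pen", "customer": {"id": 1, "name": "John"}},
    {"order_id": 102, "item": "lamp", "customer": {"id": 2, "name": "Jane"}},
]

customers = {}   # one entry per customer, keyed by id
order_rows = []  # orders keep only a reference to the customer

for order in orders:
    customer = order["customer"]
    customers[customer["id"]] = customer
    order_rows.append({
        "order_id": order["order_id"],
        "item": order["item"],
        "customer_id": customer["id"],
    })

customer_rows = list(customers.values())
print(customer_rows)
# [{'id': 1, 'name': 'John'}, {'id': 2, 'name': 'Jane'}]
print(order_rows[0])
# {'order_id': 100, 'item': 'book', 'customer_id': 1}
```

John's record is now stored once instead of twice, which is the redundancy reduction the definition refers to.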


What is encoding of an array?
- Encoding is the process of converting an array or object into a serialized format that can be stored or transmitted. When working with arrays, encoding typically means converting the array elements into a string representation in a specific format, such as JSON. This encoded representation can later be decoded to retrieve the original array.

In [15]:
import json

# The array to encode
array = [{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}, {"id": 3, "name": "Bob"}]

# Convert the array to JSON string
json_str = json.dumps(array)

print(type(array))
print(type(json_str))
print(json_str)

<class 'list'>
<class 'str'>
[{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}, {"id": 3, "name": "Bob"}]


## Serialization in Python

(De)serialization skips the step of explicitly turning the data into an array.
In this example, Python packs the data into JSON before transmitting it to another device.
`json.dumps(data)` serializes a Python dictionary or list of dictionaries into a string. The string is then encoded as bytes and transmitted to the other device.


In [None]:
import json

def serialize_to_json(data):
    serialized_data = json.dumps(data)
    return serialized_data

# Example usage
python_data = {
    "name": "John Doe",
    "age": 30,
    "city": "New York"
}

serialized_data = serialize_to_json(python_data)
print(serialized_data)
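
The transmission step described above can be sketched as follows. This is an illustration under stated assumptions: no real network is involved, and the `payload` variable simply stands in for the bytes that would be sent to and received by the other device.

```python
import json

python_data = {"name": "John Doe", "age": 30, "city": "New York"}

# Sender: serialize to a JSON string, then encode the string as UTF-8 bytes
payload = json.dumps(python_data).encode("utf-8")
print(type(payload))  # <class 'bytes'>

# ... payload travels over the network or is written to a file ...

# Receiver: decode the bytes back to a string, then deserialize
received = json.loads(payload.decode("utf-8"))
print(received == python_data)  # True
```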