<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/IDSNlogo.png" width="300" alt="cognitiveclass.ai logo">
</center>


# Hands-on Lab: Working with different file formats

Estimated time: **40 mins**


# Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#Data-Engineering">Data Engineering</a>
2.  <a href="#Data-Engineering-Process">Data Engineering Process</a>
3.  <a href="#Working-with-different-file-formats">Working with different file formats</a>
4.  <a href="#Data-Analysis">Data Analysis</a>

</font>
</div>


# Data Engineering


**Data engineering** is one of the most critical and foundational skills in any data scientist’s toolkit.


# JSON File Format


**JSON (JavaScript Object Notation)** is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate.

JSON is built on two structures:

1.  A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.

2.  An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
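As a minimal sketch of how these two structures map onto Python, the snippet below parses a small JSON document with the standard `json.loads()` function (the `text` value is made up for this example). The JSON object becomes a `dict` and the JSON array becomes a `list`.

```python
import json

# A small, made-up JSON document: an object whose "tags" member is an array
text = '{"name": "Mark", "tags": ["admin", "editor"]}'

data = json.loads(text)  # parse the JSON string into Python objects

print(type(data))          # the JSON object (name/value pairs) becomes a dict
print(type(data["tags"]))  # the JSON array (ordered list) becomes a list
print(data["tags"][0])     # array order is preserved, so this is "admin"
```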

JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. It is a very common data format with a diverse range of applications.


JSON text consists of quoted strings and other values organized as key-value mappings enclosed in curly braces `{ }`, so a JSON object looks very similar to a Python dictionary.
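The resemblance is close but not exact. As a small illustration (the `record` dictionary is made up for this example), serializing a dict with `json.dumps()` shows the main differences: JSON strings always use double quotes, Python's `True` and `None` are written as `true` and `null`, and a tuple is written as a JSON array.

```python
import json

# A made-up dict whose values are spelled differently in Python and in JSON
record = {'active': True, 'manager': None, 'scores': (90, 85)}

text = json.dumps(record)

print(record)  # Python form: single quotes, True, None, a tuple
print(text)    # JSON form: {"active": true, "manager": null, "scores": [90, 85]}
```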


Python supports JSON through a built-in package called **json**. To use it, we import the json package in our Python script.


```python
import json
```

# Writing JSON to a File

This process is usually called **serialization**: converting an object into a format that is suitable for transmitting over a network or storing in a file or database.

The json package provides two functions for this, **dump()** and **dumps()**, which convert Python objects into their JSON representation. `dump()` writes the JSON text directly to a file, while `dumps()` returns it as a string that you can then write to a file or send elsewhere.
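`dumps()` returns the JSON text as a string, while `dump()` writes that same text to a file-like object. A minimal sketch, using a small made-up dictionary and an in-memory `io.StringIO` buffer as a stand-in for a real file so that nothing is written to disk:

```python
import io
import json

data = {'first_name': 'Mark', 'age': 27}

# dumps() returns the JSON text as a Python string
as_string = json.dumps(data)
print(type(as_string), as_string)

# dump() writes the JSON text to a file-like object instead of returning it;
# io.StringIO behaves like a text file opened for writing
buffer = io.StringIO()
json.dump(data, buffer)

# Same JSON text, different destination
print(buffer.getvalue() == as_string)
```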


```python
import json
person = {
    'first_name' : 'Mark',
    'last_name' : 'abc',
    'age' : 27,
    'address': {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    }
}
```

#### Serialization using the dump() function

The **json.dump()** function writes a Python object to a JSON file.

Syntax: json.dump(dict, file_pointer)

Parameters:

1.  **dictionary** – the dictionary (or other JSON-serializable object) to convert to JSON.
2.  **file pointer** – a file object opened in write or append mode.


```python
with open('person.json', 'w') as f:  # writing JSON object
    json.dump(person, f)
```

#### Serialization using the dumps() function

**json.dumps()** converts a dictionary to a JSON-formatted string.

Here we use two of its parameters:

1.  **dictionary** – the dictionary to convert to JSON.
2.  **indent** – the number of spaces used for each indentation level.
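To see what `indent` changes, here is a small comparison on a made-up dictionary. Without `indent` the output is a single line; with `indent=2` each key goes on its own line, indented by two spaces. Both strings encode exactly the same data, so indentation only affects readability and file size.

```python
import json

data = {'city': 'New York', 'state': 'NY'}

compact = json.dumps(data)           # indent=None (the default): one line
pretty = json.dumps(data, indent=2)  # two spaces per nesting level

print(compact)
print(pretty)

# Parsing either string gives back the same dictionary
print(json.loads(compact) == json.loads(pretty))
```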


```python
# Serializing json
json_object = json.dumps(person, indent=4)

# Writing to sample.json
with open("sample.json", "w") as outfile:
    outfile.write(json_object)
```

```python
print(json_object)
```

```
{
    "first_name": "Mark",
    "last_name": "abc",
    "age": 27,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    }
}
```


Our Python object is now serialized to the file. To deserialize it back into a Python object, we use the load() function.


# Reading JSON from a File


This process is usually called **deserialization**; it is the reverse of serialization. It converts the serialized format back into a usable Python object.
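Deserialization does not always give back an identical object, because JSON has fewer types than Python. A minimal round-trip sketch (the `original` dictionary is made up for this example) shows two common surprises: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Made-up data containing a tuple and an integer key
original = {'name': 'Mark', 'scores': (90, 85), 1: 'one'}

# Serialize to JSON text, then deserialize it again
restored = json.loads(json.dumps(original))

print(restored)              # the tuple is now a list and the key 1 is now '1'
print(restored == original)  # False, because of those two type changes
```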

### Using json.load()

The json package provides the json.load() function, which reads the JSON content of a file and returns it as a Python object; a top-level JSON object becomes a dictionary.

It takes one parameter:

**File pointer**: a file object opened for reading on a JSON file.


```python
import json

# Opening JSON file
with open('sample.json', 'r') as openfile:
    # Reading from json file
    json_object = json.load(openfile)

print(json_object)
print(type(json_object))
```

```
{'first_name': 'Mark', 'last_name': 'abc', 'age': 27, 'address': {'streetAddress': '21 2nd Street', 'city': 'New York', 'state': 'NY', 'postalCode': '10021-3100'}}
<class 'dict'>
```
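`json.load()` reads from a file; its string counterpart, `json.loads()`, parses JSON text that is already in memory, such as a response body received over a network. The short sketch below (with made-up input strings) also shows what happens with malformed input: the standard library raises `json.JSONDecodeError`, a subclass of `ValueError`, which can be caught like any other exception.

```python
import json

# loads() parses JSON text that is already held in a string
person = json.loads('{"first_name": "Mark", "age": 27}')
print(person['age'] + 1)  # the number was parsed as an int, so arithmetic works

# Single-quoted keys look like a Python dict but are not valid JSON
bad_text = "{'first_name': 'Mark'}"

error_message = None
try:
    json.loads(bad_text)
except json.JSONDecodeError as err:
    error_message = err.msg  # the reason, without line/column details
    print('Invalid JSON:', error_message)
```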
