# CSV, JSON, XML

## CSV

Tasks: load, save, change delimiter, header, encoding

**Creating a CSV file:**

In [34]:
%%writefile example.csv
Name,Surname,Age
James,Smith,35
Amelia,Rose,26
Jacob,Black,42

Overwriting example.csv


**Loading a CSV:**

In [35]:
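import csv

# Open the file in a context manager so it is closed automatically;
# newline='' lets the csv module handle line endings itself.
with open('example.csv', encoding="utf-8", newline='') as data:
    csv_data = csv.reader(data)
    data_lines = list(csv_data)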

for line in data_lines:
    print(line)

['Name', 'Surname', 'Age']
['James', 'Smith', '35']
['Amelia', 'Rose', '26']
['Jacob', 'Black', '42']


In [36]:
len(data_lines)

4

**Loading using Pandas:**

In [None]:
import pandas as pd
file = 'dataset/job_postings.csv'
data = pd.read_csv(file, delimiter=',')

**Extracting values from CSV:**

In [37]:
# Extracting names from csv:
all_names = []
for line in data_lines[1:]:
    all_names.append(line[0])
    
all_names

['James', 'Amelia', 'Jacob']

In [39]:
# Extracting full names from csv:
all_full_names = []
for line in data_lines[1:]:
    all_full_names.append(f"{line[0]} {line[1]}")
    
all_full_names

['James Smith', 'Amelia Rose', 'Jacob Black']
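
The loops above pick columns by position. As a hedged alternative, `csv.DictReader` lets you pick them by name instead. The sketch below uses an in-memory copy of the sample data (an assumption, so that it runs without `example.csv` on disk), and the variable names are illustrative only.

```python
import csv
import io

# In-memory stand-in for example.csv so the sketch runs on its own
sample = io.StringIO(
    "Name,Surname,Age\nJames,Smith,35\nAmelia,Rose,26\nJacob,Black,42\n"
)

reader = csv.DictReader(sample)  # the first row becomes the column names
records = list(reader)

names_by_column = [row["Name"] for row in records]
ages = [int(row["Age"]) for row in records]  # csv values are always strings

print(names_by_column)
print(ages)
```

Selecting by name keeps working if the column order in the file changes, which positional indexing does not.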

**Saving values to CSV:**

In [80]:
# newline controls how universal newlines works (it only applies to text
# mode). It can be None, '', '\n', '\r', and '\r\n'. Passing newline=''
# lets the csv module write its own line endings.
with open('example_02.csv', 'w', newline='') as file_to_output:
    csv_writer = csv.writer(file_to_output, delimiter=',')
    csv_writer.writerow(['Name','Surname','Age'])
    csv_writer.writerows([['Jacob','Smith','25'],['Lily','McNeil','36']])
# The file is closed automatically when the with block ends.

In [81]:
with open('example_02.csv', mode='r') as file:
    print(file.read())

Name,Surname,Age
Jacob,Smith,25
Lily,McNeil,36



**Saving values to CSV using Pandas:**

In [None]:
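import pandas as pd

# to_csv is a DataFrame method, so build (or load) a DataFrame first
df = pd.DataFrame({'Name': ['Jacob', 'Lily'], 'Surname': ['Smith', 'McNeil'], 'Age': [25, 36]})
df.to_csv('output.csv', index=False)  # Set index=False to exclude row indices from the CSV file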

**Changing a delimiter:**

**Tip:** the best **delimiter** is a character that rarely occurs in the data, such as the backtick (`` ` ``).

In [93]:
inputFileName = 'example_02.csv'
outputFileName = 'example_02_edit_delimiter.csv'

# newline='' on both files prevents the extra blank line after each row on Windows
with open(inputFileName, mode='r', newline='') as in_file, open(outputFileName, mode='w', newline='') as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, delimiter=';')
    writer.writerows(reader)

with open(outputFileName, mode='r') as file:
    print(file.read())

Name;Surname;Age
Jacob;Smith;25
Lily;McNeil;36




**Changing a header:**

In [135]:
inputFileName = 'example_02.csv'
outputFileName = 'example_02_edit_header.csv'
header = ['A_Name','A_Surname','A_Age']
rows = []

with open(inputFileName, mode='r', newline='') as in_file, open(outputFileName, mode='w', newline='') as out_file:
    reader = csv.DictReader(in_file)
    # DictReader has no .keys(); the original column names live in reader.fieldnames
    for row in reader:
        # Re-key each row: pair the new header names with the row's values, in order
        rows.append(dict(zip(header, row.values())))
    writer = csv.DictWriter(out_file, delimiter=',', fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)

with open(outputFileName, mode='r') as file:
    print(file.read())

rows

A_Name,A_Surname,A_Age
Jacob,Smith,25
Lily,McNeil,36

[{'A_Name': 'Jacob', 'A_Surname': 'Smith', 'A_Age': '25'},
 {'A_Name': 'Lily', 'A_Surname': 'McNeil', 'A_Age': '36'}]

Method 2: Skipping the first row:
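
A minimal sketch of this second approach, assuming the goal is the same header replacement as in the first method: read past the original header with `next()`, write the new header, then copy the remaining rows unchanged. The in-memory buffers stand in for the input and output files and are an assumption made so that the sketch runs on its own.

```python
import csv
import io

# In-memory stand-ins for example_02.csv and the output file (assumption)
in_file = io.StringIO(
    "Name,Surname,Age\r\nJacob,Smith,25\r\nLily,McNeil,36\r\n", newline=""
)
out_file = io.StringIO(newline="")

new_header = ["A_Name", "A_Surname", "A_Age"]

reader = csv.reader(in_file)
next(reader)  # consume and discard the original header row

writer = csv.writer(out_file)
writer.writerow(new_header)  # write the replacement header
writer.writerows(reader)     # copy the remaining data rows unchanged

result = out_file.getvalue()
print(result)
```

Because the rows are streamed straight from the reader to the writer, this variant does not need to hold the whole file in memory.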

**Changing an encoding:**

In [94]:
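inputFileName = 'example_02.csv'
outputFileName = 'example_02_edit_encoding.csv'

# The input is read with the platform default encoding; pass encoding=... to the
# first open() if the source file uses a different one. The output is written as UTF-8.
with open(inputFileName, mode='r', newline='') as in_file, open(outputFileName, mode='w', newline='', encoding='utf-8') as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, delimiter=';')
    writer.writerows(reader)

with open(outputFileName, mode='r', encoding='utf-8') as file:
    print(file.read())

Name;Surname;Age
Jacob;Smith;25
Lily;McNeil;36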
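
The sample data is plain ASCII, so the effect of changing the encoding is not visible in it. The sketch below is an illustration under stated assumptions: it writes a small Latin-1 file with accented characters into a temporary directory, re-encodes it to UTF-8, and compares the raw bytes. The file names and the sample rows are made up for the example.

```python
import csv
import os
import tempfile

rows = [["Name", "City"], ["Zoë", "Málaga"]]  # made-up sample data

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "latin1.csv")
    dst = os.path.join(tmp, "utf8.csv")

    # Create a source file encoded as Latin-1
    with open(src, "w", newline="", encoding="latin-1") as f:
        csv.writer(f).writerows(rows)

    # Re-encode: read with the source encoding, write with the target one
    with open(src, "r", newline="", encoding="latin-1") as in_file, \
         open(dst, "w", newline="", encoding="utf-8") as out_file:
        csv.writer(out_file).writerows(csv.reader(in_file))

    with open(src, "rb") as f:
        latin1_bytes = f.read()
    with open(dst, "rb") as f:
        utf8_bytes = f.read()

# Same text, different byte representation
print(len(latin1_bytes), len(utf8_bytes))
```

The key point is that the encoding passed to the reading `open()` must match the source file, while the encoding passed to the writing `open()` decides the result.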



## JSON

There are a few popular Python packages that you can use to work with JSON files:

- **json.** This is a built-in Python package that provides methods for encoding and decoding JSON data.
- **simplejson.** This package provides a fast JSON encoder and decoder with support for Python-specific types.
- **ujson.** This package is an ultra-fast JSON encoder and decoder for Python.
- **jsonschema.** This package provides a way to validate JSON data against a specified schema.

Tasks: mine nested jsons, from string to json, load, export, loops, lambdas, .get(), with/without comments

In [None]:
my_json_01 = {
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@example.com",
  "is_employee": True,  # Python spells it True; JSON text spells it true
  "hobbies": [
    "reading",
    "playing soccer",
    "traveling"
  ],
  "address": {
    "street": "123 Main Street",
    "city": "New York",
    "state": "NY",
    "zip": "10001"
  }
}
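
Once such a structure is a Python dict, nested values can be reached by chaining keys and list indexes. The sketch below uses a shortened copy of the record above and shows the difference between direct indexing and `.get()`; the `phone` and `country` keys are deliberately absent and only illustrate the fallback.

```python
person = {
    "name": "John Doe",
    "hobbies": ["reading", "playing soccer", "traveling"],
    "address": {"city": "New York", "zip": "10001"},
}

city = person["address"]["city"]    # direct indexing; raises KeyError if a key is missing
first_hobby = person["hobbies"][0]  # list index inside the dict

# .get() returns a default instead of raising when a key is absent
phone = person.get("phone", "n/a")
country = person.get("address", {}).get("country", "unknown")

print(city, first_hobby, phone, country)
```

Passing `{}` as the default for the outer `.get()` keeps the chained call safe even when the whole `address` block is missing.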

**json.dumps()**

The dumps() function takes a Python object as its required argument, plus optional formatting parameters, and returns a JSON string.

In [116]:
import json

# Python object to JSON string
python_obj = {'name': 'John', 'age': 30}

json_string = json.dumps(python_obj)
json_string

'{"name": "John", "age": 30}'

**json.loads()**

The loads() function takes a JSON string as its required argument and returns the corresponding Python object.

In [117]:
# JSON string to Python object
json_string = '{"name": "John", "age": 30}'

python_obj = json.loads(json_string)
python_obj

{'name': 'John', 'age': 30}
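
`json.loads()` raises `json.JSONDecodeError` when the string is not valid JSON. A small sketch of handling that case follows; the helper name `parse_json` is hypothetical and not part of the `json` module.

```python
import json

def parse_json(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return None

valid = parse_json('{"name": "John", "is_employee": true, "manager": null}')
invalid = parse_json("{'name': 'John'}")  # single quotes are not valid JSON

print(valid)
```

Note how JSON `true` and `null` come back as Python `True` and `None`.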

**json.dump()**

This function is used to serialize a Python object and write it to a JSON file. The dump() function takes two arguments, the Python object and the file object.

In [121]:
# serialize Python object and write to JSON file
python_obj = {'name': 'John', 'age': 30}
with open('data.json', 'w') as file:
    json.dump(python_obj, file)
    
with open('data.json', 'r') as file:
    print(file.read())

{"name": "John", "age": 30}


**json.load()**

This function is used to read a JSON file and parse its contents into a Python object. The load() function takes a single argument, the file object, and returns a Python object.

In [122]:
# read JSON file and parse contents
with open('data.json', 'r') as file:
    python_obj = json.load(file)
    print(python_obj)

{'name': 'John', 'age': 30}


**Lists to json:**

In [124]:
my_list = [1, 2, 3, "four", "five"]
json_string = json.dumps(my_list)
json_string

'[1, 2, 3, "four", "five"]'

### Formatting JSON data

**Indent**

This option specifies the number of spaces to use for indentation in the output JSON string.

In [127]:
data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

json_data = json.dumps(data, indent=2)
json_data

'{\n  "name": "John",\n  "age": 30,\n  "city": "New York"\n}'

**Sort keys**

This option specifies whether the keys in the output JSON string should be sorted in alphabetical order.

In [125]:
data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

json_data = json.dumps(data, sort_keys=True)
json_data

'{"age": 30, "city": "New York", "name": "John"}'

**Separators**

This option allows you to specify the separators used in the output JSON string. The separators parameter takes a tuple of two strings, `(item_separator, key_separator)`: the first string separates items (array elements and the key-value pairs of an object), and the second string separates each key from its value. The default without indentation is `(", ", ": ")`, so `(",", ":")` produces the most compact output.

In [126]:
data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

json_data = json.dumps(data, separators=(",", ":"))
json_data


'{"name":"John","age":30,"city":"New York"}'
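
The three options can be combined in a single call. The sketch below reuses the same `data` dict to contrast a human-readable form with a compact one; using `print()` shows the indentation as real line breaks rather than `\n` escapes.

```python
import json

data = {"name": "John", "age": 30, "city": "New York"}

# Readable form: indented, keys in alphabetical order
pretty = json.dumps(data, indent=2, sort_keys=True)

# Compact form: no whitespace after separators
compact = json.dumps(data, separators=(",", ":"))

print(pretty)
print(len(json.dumps(data)), len(compact))  # default length vs compact length
```

The readable form suits files that people edit or review, while the compact form saves bytes when sending data over a network.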

### Python Example - JSON data in APIs

In [115]:
import requests
import json

url = "https://jsonplaceholder.typicode.com/posts"

response = requests.get(url)

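if response.status_code == 200:
    data = json.loads(response.text)  # response.json() is an equivalent shortcut
    #print(data)
else:
    data = None  # keep data defined so the last line does not raise NameError
    print(f"Error retrieving data, status code: {response.status_code}")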
    
data

[{'userId': 1,
  'id': 1,
  'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit',
  'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'},
 {'userId': 1,
  'id': 2,
  'title': 'qui est esse',
  'body': 'est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla'},
 {'userId': 1,
  'id': 3,
  'title': 'ea molestias quasi exercitationem repellat qui ipsa sit aut',
  'body': 'et iusto sed quo iure\nvoluptatem occaecati omnis eligendi aut ad\nvoluptatem doloribus vel accusantium quis pariatur\nmolestiae porro eius odio et labore et velit aut'},
 {'userId': 1,
  'id': 4,
  'title': 'eum et est occaecati',
  'body': 'ullam et saepe reiciendis voluptatem adipisci\nsit amet autem assumenda provid
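
The parsed response is an ordinary list of dicts, so it can be processed with loops, comprehensions and lambdas. The sketch below works offline on a small hand-written list: the first two records reuse the `userId`, `id` and `title` values shown above (bodies omitted), and the third record is made up for illustration.

```python
posts = [
    {"userId": 1, "id": 1,
     "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit"},
    {"userId": 1, "id": 2, "title": "qui est esse"},
    {"userId": 2, "id": 99, "title": "hypothetical post"},  # made-up record
]

# Comprehension: collect one field from every record
titles = [post["title"] for post in posts]

# Lambda with filter(): keep posts whose title is shorter than 20 characters
short_posts = list(filter(lambda post: len(post["title"]) < 20, posts))

# Lambda as a sort key: order posts by title length
by_length = sorted(posts, key=lambda post: len(post["title"]))

# Loop with .get(): count posts per user
counts = {}
for post in posts:
    counts[post["userId"]] = counts.get(post["userId"], 0) + 1

print([post["id"] for post in short_posts])
print(counts)
```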

## XML

Tasks: parsing, with/without comments

### Parsing XML Data

`xml.etree.ElementTree` is part of the Python standard library, so it does not need to be installed with pip (an attempt to do so fails with "No matching distribution found"). Importing it is enough.


In [130]:
import xml.etree.ElementTree as ET
tree = ET.parse('movies.xml')
root = tree.getroot()

ParseError: no element found: line 75, column 0 (<string>)

In [None]:
root.tag
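
Since `movies.xml` fails to parse here, the sketch below shows the same steps on a small XML string instead. The document structure (a `collection` root with `movie` elements, `title` and `year` attributes, and a `genre` child) is a hypothetical stand-in and not the actual content of `movies.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for movies.xml
xml_text = """
<collection>
    <!-- comments are dropped by the default parser -->
    <movie title="Movie A" year="1999">
        <genre>Drama</genre>
    </movie>
    <movie title="Movie B" year="2005">
        <genre>Comedy</genre>
    </movie>
</collection>
"""

root = ET.fromstring(xml_text)  # parse from a string; ET.parse() reads from a file
print(root.tag)

# Attributes are read with .get(), child text with .findtext()
movies = [
    (movie.get("title"), int(movie.get("year")), movie.findtext("genre"))
    for movie in root.findall("movie")
]
print(movies)
```

`ET.fromstring()` returns the root element directly, so there is no separate `getroot()` step as with `ET.parse()`.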