You can use the ```open``` function to create a file object that can read and write files.

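It takes two main arguments: the path to the file and the mode, which is ```"r"``` for reading (the default) or ```"w"``` for writing. Adding ```"b"``` (as in ```"rb"```) opens the file in binary mode.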
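
A minimal sketch of the difference between text and binary mode. It assumes a throwaway file named `sample.txt` in a temporary directory (the name is hypothetical). Text mode decodes the file into a `str`, while binary mode returns the raw encoded `bytes`. Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"  # hypothetical file name

    # "w": open for writing text (creates the file if needed)
    with open(sample, "w", encoding="utf-8") as f:
        f.write("héllo")

    # "r" (the default): read it back as a str
    with open(sample, encoding="utf-8") as f:
        as_text = f.read()

    # "rb": read the same file as raw, still-encoded bytes
    with open(sample, "rb") as f:
        as_bytes = f.read()

print(repr(as_text))  # 'héllo'
print(as_bytes)       # b'h\xc3\xa9llo'  (é is two bytes in UTF-8)
```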

In [39]:
file_path = "some_text.txt"

In [40]:
open_file = open(file_path, "r")

In [41]:
text = open_file.read()

In [42]:
print(text)

Quick six blind smart out burst.
Perfectly on furniture dejection determine my depending an to.
Add short water court fat.
Her bachelor honoured perceive securing but desirous ham required.
Questions deficient acuteness to engrossed as.
Entirely led ten humoured greatest and yourself.
Besides ye country on observe.
She continue appetite endeavor she judgment interest the met.
For she surrounded motionless fat resolution may.



In [43]:
len(text)

429

In [44]:
text[10]

'b'

In [45]:
open_file

<_io.TextIOWrapper name='some_text.txt' mode='r' encoding='UTF-8'>

In [46]:
open_file.close()

***NOTE***: It's a good practice to close a file when you finish working with it.

You can also read a file with ```readlines()```. It splits the file at every \n character and returns a list of lines, and each line keeps its trailing \n.

In [47]:
open_file = open(file_path, "r")

In [48]:
text_lines = open_file.readlines()

In [49]:
text_lines[0]

'Quick six blind smart out burst.\n'

In [50]:
text_lines[5]

'Entirely led ten humoured greatest and yourself.\n'

In [51]:
open_file.close()

A more practical way of opening files is the ```with``` statement, which closes the file automatically once Python exits the indented block.

In [52]:
with open(file_path, "r") as open_file_2:
    text_lines_2 = open_file_2.readlines()

In [53]:
text_lines_2[3]

'Her bachelor honoured perceive securing but desirous ham required.\n'

In [54]:
open_file_2.closed

True
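
The ```with``` block also closes the file when an exception escapes from it. This is its main advantage over calling ```close()``` by hand. The small sketch below raises an error on purpose. The error and the temporary file are invented for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file
    path.write_text("data\n", encoding="utf-8")

    try:
        with open(path, encoding="utf-8") as f:
            f.read()
            raise ValueError("simulated failure while processing")
    except ValueError:
        pass  # the error is handled here, after the with block has exited

    # The file was closed even though the block exited with an error
    closed_after_error = f.closed

print(closed_after_error)  # True
```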

Unix represents line endings as \n, while Windows represents them as \r\n.

In both cases, Python will convert them to \n when you open a file as text.
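
The following sketch shows that translation in action. It writes Windows-style line endings as raw bytes and then reads the file back twice. The first read uses default text mode (```newline=None```), which turns \r\n into \n. The second uses ```newline=""```, which returns the line endings exactly as stored. The file name is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "crlf.txt"  # hypothetical file
    path.write_bytes(b"one\r\ntwo\r\n")  # Windows-style line endings

    # Default text mode: universal newlines, so \r\n becomes \n
    with open(path, encoding="utf-8") as f:
        translated = f.read()

    # newline="": line endings are returned untranslated
    with open(path, encoding="utf-8", newline="") as f:
        untranslated = f.read()

print(repr(translated))    # 'one\ntwo\n'
print(repr(untranslated))  # 'one\r\ntwo\r\n'
```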

When opening binary files such as .jpg images or .pdf documents, append 'b' to the mode (for example 'rb'). Python then opens the file in **binary** mode and returns ```bytes``` instead of ```str```.

***NOTE***: Binary mode performs no text decoding and no line-ending conversion.

In [55]:
image_path = "baptist-standaert-mx0DEnfYxic-unsplash.jpg"

In [56]:
with open(image_path, "rb") as open_image:
    btext = open_image.read()

In [57]:
btext[5]

16

In [58]:
btext[:20]

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00H\x00H\x00\x00'
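
The first bytes shown above, ```\xff\xd8```, are the JPEG start-of-image marker, and they are followed by the ```\xff``` that begins the next marker. Reading in binary mode therefore also lets you check a file's type from its contents rather than its extension. Below is a minimal sketch. The helper name ```looks_like_jpeg``` is hypothetical, and it only checks the signature, not whether the rest of the file is a valid image.

```python
import tempfile
from pathlib import Path

# SOI marker (FF D8) plus the 0xFF that starts the following marker
JPEG_SIGNATURE = b"\xff\xd8\xff"

def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG signature (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read(len(JPEG_SIGNATURE)) == JPEG_SIGNATURE

with tempfile.TemporaryDirectory() as tmp:
    fake_jpeg = Path(tmp) / "fake.jpg"
    fake_jpeg.write_bytes(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")  # header bytes only
    not_jpeg = Path(tmp) / "notes.txt"
    not_jpeg.write_text("plain text", encoding="utf-8")

    results = (looks_like_jpeg(fake_jpeg), looks_like_jpeg(not_jpeg))

print(results)  # (True, False)
```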

The tool direnv is used to set up development environments. It reads configuration from a .envrc file, where you can define environment variables and application runtimes.

Let's create such a file and define some environment variables in it.

In [59]:
env_variables = """export STAGE=PROD
TABLE_ID=token-storage-3"""

with open(".envrc", "w") as opened_file:
    opened_file.write(env_variables)

In [60]:
!ls -a

.				 baptist-standaert-mx0DEnfYxic-unsplash.jpg
..				 characters.xml
.envrc				 sb.config
.ipynb_checkpoints		 service-policy.json
Reading_and_Writing_Files.ipynb  some_text.txt
addresses.csv			 verify-apache.yaml


In [61]:
!cat .envrc

export STAGE=PROD
TABLE_ID=token-storage-3

***NOTE***: In write mode ('w'), ```open``` creates the file if it doesn't already exist and overwrites (truncates) it if it does.

You can use append mode ('a') to add to the end of an existing file instead of overwriting it.

For binary files such as images, use 'wb' or 'ab' so that the ```bytes``` you write are stored unchanged, without any text encoding or line-ending conversion.
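
The difference between these modes is easy to see on a scratch file. The following sketch works on a temporary ```.envrc```-like file, not your real one. It also shows ```"x"``` (exclusive creation) mode, which raises ```FileExistsError``` instead of touching a file that already exists.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    envrc = Path(tmp) / ".envrc"  # scratch copy, not your real .envrc

    with open(envrc, "w") as f:  # create (or truncate) the file
        f.write("export STAGE=PROD\n")
    with open(envrc, "w") as f:  # "w" again: the previous content is discarded
        f.write("export STAGE=DEV\n")
    with open(envrc, "a") as f:  # "a": keep existing content, write at the end
        f.write("export TABLE_ID=token-storage-3\n")

    content = envrc.read_text()

    try:
        with open(envrc, "x") as f:  # "x": fail if the file already exists
            f.write("never written\n")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(content)
print(exclusive_failed)  # True
```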

Let's look at some useful methods for reading and writing files from pathlib's ```Path``` class.

In [62]:
import pathlib

path = pathlib.Path(
    "/workspaces/Dev/Python/Python_For_DevOps/Chapter_1_Python_Essentials_for_DevOps/exceptions.py")
path.read_text()

# NOTE: read_bytes() can be used to read binary data

'print("Exceptions can be caught using a try-except block")\n\nthinkers = ["Plato", "PlayDo", "Gumby"]\nwhile True:\n    try:\n        thinker = thinkers.pop()\n        print(thinker)\n    except IndexError as e:\n        print(e)\n        break\n'

In [63]:
# write_text() creates a new file or overwrites an existing one, and returns the number of characters written
import pathlib

path2 = pathlib.Path(
    "/workspaces/Dev/Python/Python_For_DevOps/Chapter_2_Automating_File_System/sb.config")
path2.write_text("LOG:DEBUG")

9

pathlib and open are useful for opening unstructured text files, but what about structured text files such as JSON?

Let's take a look at this example where we have an AWS IAM Policy:

```json
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": "service-prefix:action-name",
    "Resource": "*",
    "Condition": {
      "DateGreaterThan": { "aws:CurrentTime": "2017-07-01T00:00:00Z" },
      "DateLessthan": { "aws:CurrentTime": "2017-12-31T23:59:59Z" }
    }
  }
}
```

We can use the ```json``` module to parse the file into Python data structures.

In [89]:
import json

with open("service-policy.json", "r") as opened_file:
    policy = json.load(opened_file)

In [90]:
print(policy)

{'Version': '2012-10-17', 'Statement': {'Effect': 'Allow', 'Action': 'service-prefix:action-name', 'Resource': '*', 'Condition': {'DateGreaterThan': {'aws:CurrentTime': '2017-07-01T00:00:00Z'}, 'DateLessthan': {'aws:CurrentTime': '2017-12-31T23:59:59Z'}}}}


In [91]:
# the pprint module pretty-prints nested Python objects (note that it sorts dictionary keys by default)
from pprint import pprint

pprint(policy)

{'Statement': {'Action': 'service-prefix:action-name',
               'Condition': {'DateGreaterThan': {'aws:CurrentTime': '2017-07-01T00:00:00Z'},
                             'DateLessthan': {'aws:CurrentTime': '2017-12-31T23:59:59Z'}},
               'Effect': 'Allow',
               'Resource': '*'},
 'Version': '2012-10-17'}


In [92]:
# Now we can work with the original file structure

# Let's change the resource to S3

policy["Statement"]["Resource"] = "S3"

In [93]:
pprint(policy)

{'Statement': {'Action': 'service-prefix:action-name',
               'Condition': {'DateGreaterThan': {'aws:CurrentTime': '2017-07-01T00:00:00Z'},
                             'DateLessthan': {'aws:CurrentTime': '2017-12-31T23:59:59Z'}},
               'Effect': 'Allow',
               'Resource': 'S3'},
 'Version': '2012-10-17'}


In [94]:
# You can write a Python dictionary to a JSON file with json.dump()

# Let's save the IAM policy we just modified

with open("service-policy.json", "w") as opened_file:
    json.dump(policy, opened_file)

In [95]:
!cat service-policy.json

{"Version": "2012-10-17", "Statement": {"Effect": "Allow", "Action": "service-prefix:action-name", "Resource": "S3", "Condition": {"DateGreaterThan": {"aws:CurrentTime": "2017-07-01T00:00:00Z"}, "DateLessthan": {"aws:CurrentTime": "2017-12-31T23:59:59Z"}}}}
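
By default, ```json.dump()``` writes the whole policy on a single line, as shown above. Passing ```indent``` keeps the file human-readable. ```json.dumps()``` and ```json.loads()``` perform the same conversions on strings instead of files. This is a sketch that uses a trimmed-down copy of the policy:

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": {"Effect": "Allow", "Action": "service-prefix:action-name", "Resource": "*"},
}

# Serialize to a pretty-printed string; json.dump(policy, f, indent=2) does the same to a file
pretty = json.dumps(policy, indent=2)
print(pretty)

# Parsing it back gives an equal dictionary
round_trip_ok = json.loads(pretty) == policy
print(round_trip_ok)  # True
```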

We can also work with YAML files. Ansible uses this format for playbooks, which define the actions you want to automate. The following is an example:

```yaml
---
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum:
      name: httpd
      state: latest
```

For YAML parsing, we'll install PyYAML.

```bash
pip install PyYAML
```

In [85]:
import yaml

with open("verify-apache.yaml", "r") as open_yml_file:
    verify_apache = yaml.safe_load(open_yml_file)

pprint(verify_apache)

[{'hosts': 'webservers',
  'remote_user': 'root',
  'tasks': [{'name': 'ensure apache is at the latest version',
             'yum': {'name': 'httpd', 'state': 'latest'}}],
  'vars': {'http_port': 80, 'max_clients': 200}}]


Note how the YAML file was transformed into Python-compatible data structures.

We can also write Python data to a YAML file.

In [86]:
# Now that we're working with the original file structure, let's change the value for maximum clients
verify_apache[0]["vars"]["max_clients"] = 350
pprint(verify_apache)

[{'hosts': 'webservers',
  'remote_user': 'root',
  'tasks': [{'name': 'ensure apache is at the latest version',
             'yum': {'name': 'httpd', 'state': 'latest'}}],
  'vars': {'http_port': 80, 'max_clients': 350}}]


In [87]:
with open("verify-apache.yaml", "w") as update_yaml_file:
    yaml.dump(verify_apache, update_yaml_file)

In [88]:
!cat verify-apache.yaml

- hosts: webservers
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum:
      name: httpd
      state: latest
  vars:
    http_port: 80
    max_clients: 350


Another popular format for structured data is XML, which you'll find in places such as RSS feeds.

Let's look at this example:

```xml
<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>
```

In [75]:
import xml.etree.ElementTree as ET

tree = ET.parse("characters.xml")

In [76]:
root = tree.getroot()

In [77]:
root

<Element '{http://people.example.com}actors' at 0x7f3f11633950>

In [78]:
# We can walk down the tree by iterating over the child nodes
for child in root:
    print(child.tag, child.attrib)

{http://people.example.com}actor {}
{http://people.example.com}actor {}


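XML supports *namespaces*. Each element name is qualified by a URI, declared with ```xmlns``` (the default namespace) or ```xmlns:prefix``` (as with ```fictional:``` above), so that elements from different vocabularies don't clash. ElementTree's ```find()``` and ```findall()``` accept a dictionary that maps your own prefixes to those URIs.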

In [79]:
ns = {
    "default": "http://people.example.com",
    "role_played": "http://characters.example.com",
}

actors = root.findall("default:actor", ns)

for actor in actors:
    name = actor.find("default:name", ns)
    print(name.text)
    for character in actor.findall("role_played:character", ns):
        print(f"-> {character.text}")

John Cleese
-> Lancelot
-> Archie Leach
Eric Idle
-> Sir Robin
-> Gunther
-> Commander Clement
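
The prefix dictionary is only a convenience. ElementTree stores every namespaced tag in ```{uri}name``` form, as the ```root``` output above shows, so you can also search with the full URI. You can also use ```iter()``` to visit every matching element anywhere in the tree, not just the direct children. The sketch below uses an inline, trimmed copy of characters.xml so it runs on its own.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
    </actor>
</actors>"""

root = ET.fromstring(XML)

# Full "{uri}tag" names work without a prefix mapping
PEOPLE = "{http://people.example.com}"
names = [n.text for n in root.findall(f"{PEOPLE}actor/{PEOPLE}name")]

# iter() walks the whole tree, at any depth
characters = [c.text for c in root.iter("{http://characters.example.com}character")]

print(names)       # ['John Cleese', 'Eric Idle']
print(characters)  # ['Lancelot', 'Sir Robin']
```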


You may also encounter data structured as CSV:

```csv
Name,Last Name,Address,County,City,ZIP Code
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
```

In [80]:
import csv

file_path = "addresses.csv"

with open(file_path, newline="") as csv_file:
    reader = csv.reader(csv_file, delimiter=",")
    for _ in range(6):
        print(next(reader))
# NOTE: the reader yields one row at a time as you iterate over it,
# which is useful when the file is too large to load into memory at once

['Name', 'Last Name', 'Address', 'County', 'City', 'ZIP Code']
['John', 'Doe', '120 jefferson st.', 'Riverside', ' NJ', ' 08075']
['Jack', 'McGinnis', '220 hobo Av.', 'Phila', ' PA', '09119']
['John "Da Man"', 'Repici', '120 Jefferson St.', 'Riverside', ' NJ', '08075']
['Stephen', 'Tyler', '7452 Terrace "At the Plaza" road', 'SomeTown', 'SD', ' 91234']
['', 'Blankman', '', 'SomeTown', ' SD', ' 00298']
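
Values such as ' NJ' keep their leading spaces because the csv module returns everything that follows the delimiter. ```skipinitialspace=True``` drops whitespace immediately after each delimiter. ```csv.DictReader``` uses the header row as dictionary keys. The following sketch inlines two rows of addresses.csv through ```io.StringIO``` so that it runs on its own.

```python
import csv
import io

data = io.StringIO(
    "Name,Last Name,Address,County,City,ZIP Code\n"
    "John,Doe,120 jefferson st.,Riverside, NJ, 08075\n"
    '"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075\n'
)

# Header row becomes the keys; spaces after commas are skipped
reader = csv.DictReader(data, skipinitialspace=True)
rows = list(reader)

for row in rows:
    print(row["Name"], "|", row["City"], "|", row["ZIP Code"])
```

Because every field is read as a string, ZIP codes keep their leading zeros.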


If you do want to load the whole file at once, you can use pandas. This data analysis library provides the DataFrame, a data structure that works much like a database table.

Install pandas with pip:

```bash
pip install pandas
```

In [81]:
import pandas as pd

df = pd.read_csv("addresses.csv")
type(df)

pandas.core.frame.DataFrame

In [82]:
# return the top 3 rows of your file
df.head(3)

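|   | Name | Last Name | Address | County | City | ZIP Code |
|---|------|-----------|---------|--------|------|----------|
| 0 | John | Doe | 120 jefferson st. | Riverside | NJ | 8075 |
| 1 | Jack | McGinnis | 220 hobo Av. | Phila | PA | 9119 |
| 2 | John "Da Man" | Repici | 120 Jefferson St. | Riverside | NJ | 8075 |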


In [83]:
# summary statistics for the numeric columns (here only ZIP Code, which pandas parsed as integers)
df.describe()

|       | ZIP Code |
|-------|----------|
| count | 6.0 |
| mean | 19487.333333 |
| std | 35380.155843 |
| min | 123.0 |
| 25% | 2242.25 |
| 50% | 8075.0 |
| 75% | 8858.0 |
| max | 91234.0 |


In [84]:
# view a single column of data
df["ZIP Code"]

0     8075
1     9119
2     8075
3    91234
4      298
5      123
Name: ZIP Code, dtype: int64
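
pandas inferred ZIP Code as an integer column. As a result, leading zeros were lost (08075 became 8075), and ```describe()``` treated the postal codes as numbers. The sketch below inlines a few rows of the same data through ```io.StringIO```. Passing ```dtype``` keeps the column as text, and ```skipinitialspace=True``` strips the spaces after the commas.

```python
import io

import pandas as pd

csv_text = (
    "Name,Last Name,Address,County,City,ZIP Code\n"
    "John,Doe,120 jefferson st.,Riverside, NJ, 08075\n"
    ",Blankman,,SomeTown, SD, 00298\n"
)

# Read ZIP Code as strings so leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), dtype={"ZIP Code": str}, skipinitialspace=True)

print(df["ZIP Code"].tolist())  # ['08075', '00298']
print(df["City"].tolist())      # ['NJ', 'SD']
```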