# B. Text Files(`.txt`, `.csv`) - Input/Output

**`.txt`** and **`.csv`** are the two text file formats we're most likely to deal with on a daily basis.<br>
Python provides many useful tools for reading and writing text files and for working with their data, which can reduce the burden of our daily work.<br>
In this section, we'll learn how to handle data in `.txt` and `.csv` files.

## _Objective_
1. **`.txt` file**: Understanding Data I/O (Input/Output) for `.txt` files.
2. **`.csv` file**: Understanding Data I/O (Input/Output) for `.csv` files.

# [1. Text File I/O with Python] 

First off, Python provides a number of built-in functions for reading and writing text files. 
Let's read and write a file using them.

## 1. Python `.txt`

- First of all, you need to open the file you're going to work on. Use the `open()` function, passing the `filename` and the `file opening mode`.<br><br>

- The file opening mode specifies how the file will be used once it is opened. The most common options are listed below. <br>

> |File opening mode|Description|
> |:---:|:---|
> |r|Read - opens a file for reading only (default value)|
> |w|Write - opens a file for writing only, creating it if needed and erasing any existing contents|
> |a|Append - opens a file for writing, adding new data to the end|


<br><br>
- When opening a file, you can use a function `open()` as follows. 
 
> ```variable = open(filename, mode)```


※ Once opened with `open()`, the file must be closed with `.close()` when you are done. 

※ Alternatively, you can open a file using a `with` statement. The file stays open inside the `with` block and is closed automatically when the block ends, so no call to `.close()` is needed. 
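
The two styles and the three modes from the table can be sketched as follows. The file name `modes_demo.txt` and the temporary directory are placeholders chosen for this illustration, not files from this tutorial.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# Style 1: explicit open() / close(); try/finally guarantees the close.
file = open(path, "w")          # 'w' creates the file (or empties an existing one)
try:
    file.write("first\n")
finally:
    file.close()

# Style 2: the with statement closes the file when the block ends.
with open(path, "a") as file:   # 'a' keeps existing contents and adds to the end
    file.write("second\n")

with open(path, "r") as file:   # 'r' is the default mode
    contents = file.read()

print(file.closed)   # True: the with block already closed the file
print(contents)
```

The `with` form is usually preferred because the file is closed even if an error occurs inside the block.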

### (1) Reading a `.txt` file 
In order to read a file, open the `.txt` file in read-only mode with <code>open(file_to_read, 'r')</code>.<br>There are then three methods to choose from for reading the file.

#### Method 1 - `.read()`

- `.read()` reads the entire contents of the file into a string.

In [1]:
# Reading a file in a child directory
with open('./data/text/txt_example.txt', 'r') as file:
    data = file.read()
print(data)

This is example of txt file! first line
This is example of txt file! second line
This is example of txt file! third line
This is example of txt file! fourth line


In [4]:
# Check the length of the file contents
len(data)

201

In [5]:
type(data)

str

- Because the entire contents of the file are returned as one string, numbers, words, spaces, and newline characters (`\n`) are all just characters in that string. Nothing is split into lines or converted to another type.
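
If separate lines are needed after `.read()`, the string has to be split explicitly. A minimal sketch, assuming a small scratch file (`read_demo.txt` is a made-up name, not part of this tutorial's data):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")  # hypothetical file
with open(path, "w") as file:
    file.write("line 1\nline 2\nline 3")

with open(path, "r") as file:
    data = file.read()

print(repr(data))          # the newline characters are part of the string
print(len(data))           # counts every character, including each '\n'
lines = data.splitlines()  # split on line boundaries, dropping the '\n'
print(lines)
```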

<br><br><br>

#### Method 2 - `.readline()`
- `.readline()` reads a text file line by line and returns a single line per call.
- Each returned line ends with a newline character (`\n`), except possibly the last line of the file.

In [42]:
file = open('./data/text/txt_example.txt', 'r')
file.readline()

'This is example of txt file! first line\n'

Let's execute one more `.readline()`.

In [43]:
file.readline()

'This is example of txt file! second line\n'

If you want to read the file line by line yet print the entire contents in a single execution, you can use a loop.

In [47]:
file.close()

In [48]:
with open('./data/text/txt_example.txt', 'r') as file:
    while True:
        line = file.readline()
        if not line: 
            break
        print(line)

This is example of txt file! first line

This is example of txt file! second line

This is example of txt file! third line

This is example of txt file! fourth line
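
The blank lines in this output appear because each line already ends with `\n` and `print()` adds another one. A common alternative, sketched here with a hypothetical scratch file, is to iterate over the file object directly and strip the trailing newline before printing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "loop_demo.txt")  # hypothetical file
with open(path, "w") as file:
    file.write("first line\nsecond line\nthird line\n")

stripped = []
with open(path, "r") as file:
    for line in file:                 # a file object yields one line per iteration
        text = line.rstrip("\n")      # drop the trailing newline before printing
        stripped.append(text)
        print(text)
```

Iterating over the file this way reads one line at a time, just like the `while` loop with `.readline()`, but without the explicit end-of-file check.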



<br><br><br><br>


#### Method 3 - `.readlines()`

- `.readlines()` reads and returns a list of lines from the stream.


In [2]:
with open('./data/text/txt_example.txt', 'r') as file:
    lines = file.readlines()
    print(lines)
    print("\n\n`.readlines()` returned the entire contents at a single execution\n\n")
    print("Now, we're going to print the contents line by line using a for loop.\n\n")
    for line in lines:
        print(line)

['This is example of txt file! first line\n', 'This is example of txt file! second line\n', 'This is example of txt file! third line\n', 'This is example of txt file! fourth line']


`.readlines()` returned the entire contents at a single execution


Now, we're going to print the contents line by line using a for loop.


This is example of txt file! first line

This is example of txt file! second line

This is example of txt file! third line

This is example of txt file! fourth line


### (2) Writing a file

In addition to reading a text file, you can also write a file using Python.<br>In order to do so, you need to open a file in write-only mode as 
1. <code>open(file_to_write, 'w')</code>, or 
2. <code>with open(file_to_write, 'w') as file</code>




Let's create a file and save some text in it. 

In [11]:
with open('./data/text/txt_write.txt','w') as file: # txt_write.txt is the file you'll be writing on
    data = "Write a new text!"
    file.write(data) # the string in `data` is written to the file.

Let's read what's written to the file.

In [12]:
with open('./data/text/txt_write.txt', 'r') as file:
    data = file.read()
    print(data)

Write a new text!


If you want to append new data to the end of an existing file, open the file in append mode. <br>
We'll add more data to `txt_example.txt` and save it. <br>

Syntax for append mode: ```file_object = open(file_directory, 'a')```

In [13]:
with open('./data/text/txt_example.txt','a') as file:
    data = "This is example of txt file! fourth line"
    file.write(data)

Let's check the newly saved content.

In [3]:
with open('./data/text/txt_example.txt', 'r') as file:
    data = file.read()
    print(data)

This is example of txt file! first line
This is example of txt file! second line
This is example of txt file! third line
This is example of txt file! fourth line


You can see that "This is example of txt file! fourth line" has been added to the end of the file.
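
Note that `.write()` does not add a newline on its own, so appended text continues the last line unless the file already ends with `\n`. The sketch below uses a hypothetical scratch file (`append_demo.txt`) to show both cases.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")  # hypothetical file
with open(path, "w") as file:
    file.write("first line\nsecond line")      # no trailing newline

with open(path, "a") as file:
    file.write("third line")                   # lands on the same line as "second line"

with open(path, "r") as file:
    print(file.read().splitlines())

with open(path, "a") as file:
    file.write("\nfourth line\n")              # an explicit '\n' starts a new line

with open(path, "r") as file:
    result = file.read().splitlines()
print(result)
```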

### (3) pickle

The `pickle` module implements binary protocols for serializing and de-serializing a Python object structure.<br>
With pickle, you can save and load Python objects such as **lists and class instances**, not just text.<br>
To see how it works, we will first import `pickle` and create a list.

In [15]:
import pickle
sample_list = ['a', 'b', 'c']

Now let's save the list using `pickle`.<br>
When saving or loading data with pickle, you must open the file in binary mode by passing `'wb'` (write) or `'rb'` (read) as the mode argument of `open()`.

In [16]:
with open('list.txt', 'wb') as f:
    pickle.dump(sample_list, f) # `pickle.dump()` takes 2 arguments: the object to be pickled and the file the object will be saved to. 

Let's load the file you just saved.

In [17]:
with open('list.txt', 'rb') as f:
    data = pickle.load(f) # `pickle.load()` reads one pickled object from the file and rebuilds it.
print(data)
data

['a', 'b', 'c']


['a', 'b', 'c']
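
The same pattern works for nested structures. The following is a minimal sketch, assuming a made-up `record` dictionary and a scratch file named `record.pkl`; neither comes from this tutorial.

```python
import os
import pickle
import tempfile

# Hypothetical record; any picklable Python object would work here.
record = {"name": "john", "scores": [90, 85], "point": (1.5, 2.5)}

path = os.path.join(tempfile.mkdtemp(), "record.pkl")  # hypothetical file name
with open(path, "wb") as f:       # pickle writes bytes, hence binary mode
    pickle.dump(record, f)

with open(path, "rb") as f:
    restored = pickle.load(f)     # rebuilds an equal but separate object

print(restored == record)   # True
print(restored is record)   # False
print(type(restored["point"]))
```

Unlike a text format, pickle preserves Python types such as tuples. Only load pickle files from sources you trust, because unpickling can execute arbitrary code.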

## 2. Python `.csv`
- `csv` stands for comma-separated values, in which every value is separated by a comma.<br>
- You can load csv files into Python using the built-in `csv` module.

### (1) Loading a csv file
 You can load a csv file using the `reader()` function of the `csv` module. It takes an open file object and returns a reader object that yields each row as a list of strings. 

In [18]:
# importing csv library
import csv

Let's read tabular data from a csv file.


Open the csv file in read-only mode ('r') and pass the file object into `csv.reader()`.<br>
Iterating over the reader object then prints each row of the file as a list.

In [19]:
with open('./data/text/csv_example.csv','r',encoding='utf-8-sig') as file:
    lines = csv.reader(file)
    for line in lines:
        print(line)

['john', 'male', '23', 'data analyst']
['jake', 'male', '23', 'teacher']
['sally', 'female', '22', 'police officer']
['brian', 'male', '27', 'dancer']
['elise', 'female', '24', 'singer']
['elise', 'female', '24', 'singer']


You can see that `csv.reader()` has read the csv file and returned each line as a list of strings.
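
Every value comes back as a string, including the ages, so numeric columns need an explicit conversion. A minimal sketch, assuming a small scratch file (`people.csv`, a made-up name) with the same four-column layout:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "people.csv")  # hypothetical file
with open(path, "w", newline="", encoding="utf-8") as file:
    file.write("john,male,23,data analyst\nsally,female,22,police officer\n")

people = []
with open(path, "r", newline="", encoding="utf-8") as file:
    for name, sex, age, job in csv.reader(file):   # each row is a list of 4 strings
        people.append({"name": name, "sex": sex, "age": int(age), "job": job})

print(people[0])
print(sum(p["age"] for p in people) / len(people))   # ages are now numbers
```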

### (2) How to edit and save a csv file


The `csv` module's `writer()` function wraps an open file in a writer object, which converts the rows you give it into comma-separated lines and writes them to the file.<br>
Let's append a row, given as a list, using `.writerow()`. 

In [20]:
with open('./data/text/csv_example.csv', 'a', newline = '') as file:
    writer = csv.writer(file)
    writer.writerow(['elise', 'female', 24, 'singer']) # write one row to the file through the `writer` object.

Let's check the line you just wrote to the file.

In [21]:
with open('./data/text/csv_example.csv','r',encoding='utf-8-sig') as file:
    lines = csv.reader(file)
    for line in lines:
        print(line)

['john', 'male', '23', 'data analyst']
['jake', 'male', '23', 'teacher']
['sally', 'female', '22', 'police officer']
['brian', 'male', '27', 'dancer']
['elise', 'female', '24', 'singer']
['elise', 'female', '24', 'singer']
['elise', 'female', '24', 'singer']


You can see a new line is added to the existing data table.
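
Several rows can be written in one call with `.writerows()`, and the writer quotes values that contain commas or quotation marks so they survive a round trip. The sketch below assumes a scratch file (`writer_demo.csv`) and invented rows.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "writer_demo.csv")  # hypothetical file
rows = [
    ["name", "age", "note"],
    ["amy", 31, "likes tea, not coffee"],   # the comma is part of the value
    ["bob", 45, 'says "hi"'],
]

with open(path, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerows(rows)          # writes every row in one call

with open(path, "r", newline="", encoding="utf-8") as file:
    raw = file.read()
print(raw)                          # fields that need it are wrapped in quotes

with open(path, "r", newline="", encoding="utf-8") as file:
    read_back = list(csv.reader(file))
print(read_back)
```

This is the main reason to use the `csv` module rather than joining values with `','` by hand.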

# [2. Text Data I/O with Numpy]

`Numpy` provides a number of tools for reading and writing text data.<br>

## 1. Numpy `.txt`

### (1) Saving a `.txt` file with Numpy

In [22]:
# import Numpy
import numpy as np

Create a Numpy array named `simple_array`.<br>
You can use `np.savetxt()` to save an array to a text file. We're going to save it under the name `write_numpy.txt`.

In [23]:
simple_array = np.array([1,2,3,4,5])
np.savetxt('./data/text/write_numpy.txt',simple_array)

### (2) Loading a `.txt` file with Numpy
The saved text file can be loaded with `np.loadtxt()`, which returns floating-point values by default.

In [24]:
load_array = np.loadtxt('./data/text/write_numpy.txt')
print(load_array)

[1. 2. 3. 4. 5.]


In [25]:
sum(load_array)

15.0
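
The integers came back as floats because both functions default to floating-point text. If integers are wanted on disk and after loading, the format and data type can be set explicitly. A minimal sketch, assuming a scratch file named `ints.txt`:

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "ints.txt")  # hypothetical file
simple_array = np.array([1, 2, 3, 4, 5])

np.savetxt(path, simple_array, fmt="%d")    # write plain integers instead of 1.000000e+00
loaded = np.loadtxt(path, dtype=int)        # parse the text back as integers

print(loaded)
print(loaded.dtype.kind)   # 'i' for a signed integer type
```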

## 2. Numpy `.csv`

### (1) Saving a `csv` file with Numpy

Import the `random` module to generate random numbers.<br>
Create an array to save to a file.

In [26]:
import random
random_array = np.zeros((5,3))
for i in range(5):
    for j in range(3):
        random_array[i][j] = random.randint(1000,2000)
print(random_array)

[[1361. 1021. 1710.]
 [1438. 1176. 1003.]
 [1815. 1938. 1725.]
 [1862. 1748. 1333.]
 [1081. 1001. 1790.]]


Use `np.savetxt()` with `delimiter=','` to save it in csv format.

In [27]:
np.savetxt('./data/text/random_array.txt',random_array, delimiter=',')

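The saved file `random_array.txt` holds one array row per line, with the values separated by commas and written in scientific notation (the default `fmt='%.18e'`).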

### (2) Loading a csv file with Numpy
Use `np.loadtxt()` with the same delimiter to read csv files.

In [28]:
random_array = np.loadtxt('./data/text/random_array.txt',
                          dtype='float',delimiter=',')
print(random_array)

[[1361. 1021. 1710.]
 [1438. 1176. 1003.]
 [1815. 1938. 1725.]
 [1862. 1748. 1333.]
 [1081. 1001. 1790.]]


You can see that the comma-separated data from the file is loaded as a Numpy array.
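
A csv file often starts with a header row. The sketch below is one way to write and skip such a row with Numpy; the column names `a,b,c`, the seed, and the file name are assumptions made for the example, and `rng.integers()` stands in for the nested `random.randint()` loops.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)                         # seeded generator; the seed is arbitrary
random_array = rng.integers(1000, 2001, size=(5, 3))   # integers from 1000 to 2000 inclusive

path = os.path.join(tempfile.mkdtemp(), "random_array.csv")  # hypothetical file
np.savetxt(path, random_array, fmt="%d", delimiter=",",
           header="a,b,c", comments="")                # comments="" avoids the default "# " prefix

loaded = np.loadtxt(path, dtype=int, delimiter=",", skiprows=1)   # skip the header row
print(loaded.shape)
print(np.array_equal(loaded, random_array))
```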

# [3. Text Data I/O with Pandas]

The last part of this session is on how to input and output data using Pandas.<br>

## 1. Pandas `.txt` & `.csv`

Pandas provides tools to read tabular text data (mainly `.csv`) as a DataFrame.<br>

### (1) How to load a csv file with Pandas
`.read_csv()` reads a comma-separated values (csv) file and returns it as a two-dimensional data structure with labeled axes (DataFrame).

In [29]:
import pandas as pd 
data = pd.read_csv("./data/text/pandas_csv_example.csv")
data

| |name|sex|age|salary|
|:---:|:---|:---|:---|:---|
|0|john|male|23|data analyst|
|1|jake|male|23|teacher|
|2|sally|female|22|police officer|
|3|brian|male|27|dancer|


※ **`.read_csv()` does not care whether the file extension is `.txt` or `.csv`; what matters is the delimiter used inside the file.**<br>

In [30]:
data = pd.read_csv("./data/text/pandas_txt_example.txt", delimiter='\t')
data

| |name|sex|age|salary|
|:---:|:---|:---|:---|:---|
|0|john|male|23|data analyst|
|1|jake|male|23|teacher|
|2|sally|female|22|police officer|
|3|brian|male|27|dancer|


By passing `delimiter='\t'`, `.read_csv()` reads the tab-delimited txt file `pandas_txt_example.txt` and displays the data in a DataFrame.
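
The same idea applies to any separator. The sketch below uses in-memory text via `io.StringIO` in place of files on disk (the sample data is invented) and the `sep` keyword, for which `delimiter` is an alias.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
tab_text = "name\tage\njohn\t23\nsally\t22\n"
semi_text = "name;age\njohn;23\nsally;22\n"

tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")
semi_df = pd.read_csv(io.StringIO(semi_text), sep=";")

print(tab_df)
print(tab_df.equals(semi_df))   # same table, only the separator differed
```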

If a file has no header row, you can supply column labels yourself by passing a list of labels to the `names` keyword of `.read_csv()`.

In [31]:
data = pd.read_csv("./data/text/csv_example.csv", names=['A','B','C','D'])
data

| |A|B|C|D|
|:---:|:---|:---|:---|:---|
|0|john|male|23|data analyst|
|1|jake|male|23|teacher|
|2|sally|female|22|police officer|
|3|brian|male|27|dancer|
|4|elise|female|24|singer|
|5|elise|female|24|singer|
|6|elise|female|24|singer|


By passing a list of column labels to `names` when reading a csv file with `.read_csv()`, you can see that the new DataFrame is created with the user-defined column labels in place.
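
To see why `names` matters, compare what happens with and without it on header-less data. This is a minimal sketch with invented in-memory text; `header=None` is passed explicitly to make the intent clear.

```python
import io

import pandas as pd

text = "john,male,23\nsally,female,22\n"   # hypothetical data with no header row

wrong = pd.read_csv(io.StringIO(text))      # first data row is taken as the header
right = pd.read_csv(io.StringIO(text), header=None, names=["name", "sex", "age"])

print(list(wrong.columns))
print(len(wrong), len(right))
print(right)
```

Without `names`, the first record is silently consumed as column labels and the DataFrame ends up one row short.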

### (2) How to save a csv file with Pandas
The `.to_csv()` method writes a DataFrame object to a comma-separated values (csv) file.

In [32]:
df = pd.DataFrame(np.random.randint(0,20, size=(5,4)),
                  columns=['A','B','C','D'])
df

| |A|B|C|D|
|:---:|:---:|:---:|:---:|:---:|
|0|18|13|19|16|
|1|0|17|0|16|
|2|4|15|11|16|
|3|6|6|19|1|
|4|5|17|2|5|


In [37]:
df.to_csv("./data/text/to_csv_example.csv", index = False) # saving the DataFrame to a csv file; by setting 'index = False', the row names or row labels will not be included in the csv file.

Reading `to_csv_example.csv` back shows the following.

In [36]:
pd.read_csv("./data/text/to_csv_example.csv") # opening the file you just wrote.

| |A|B|C|D|
|:---:|:---:|:---:|:---:|:---:|
|0|18|13|19|16|
|1|0|17|0|16|
|2|4|15|11|16|
|3|6|6|19|1|
|4|5|17|2|5|
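
The effect of `index=False` is easiest to see by saving the same DataFrame both ways. The sketch below assumes a small made-up DataFrame and scratch files in a temporary directory.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
folder = tempfile.mkdtemp()                       # hypothetical scratch directory
with_index = os.path.join(folder, "with_index.csv")
without_index = os.path.join(folder, "without_index.csv")

df.to_csv(with_index)                  # default index=True writes the row labels too
df.to_csv(without_index, index=False)

print(list(pd.read_csv(with_index).columns))               # extra 'Unnamed: 0' column
print(list(pd.read_csv(with_index, index_col=0).columns))  # first column used as the index
print(pd.read_csv(without_index).equals(df))
```

If the row labels carry meaning, keep the index when saving and pass `index_col=0` when loading; otherwise `index=False` keeps the file free of the extra column.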
