___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Working with Text Files
In this section we'll cover
 * Working with f-strings (formatted string literals) to format printed text
 * Working with Files - opening, reading, writing and appending text files

## Formatted String Literals (f-strings)

Introduced in Python 3.6, <strong>f-strings</strong> offer several benefits over the older `.format()` string method. <br>For one, you can bring outside variables directly into the string rather than pass them through as keyword arguments:

In [1]:
person = 'Jose'

In [2]:
# Using the old .format() method:
print('My name is {var}.'.format(var=person))

# Using f-strings:
print(f'My name is {person}.')

#Alternative way
print("My name is {}.".format(person))

My name is Jose.
My name is Jose.
My name is Jose.


In [3]:
# If you forget the f prefix, the braces are printed literally
print('My name is {person}.')

My name is {person}.


Add the `!r` conversion to get the <strong>string representation</strong> (the result of `repr()`), which keeps the quotation marks:

In [4]:
print(f'My name is {person!r}')

My name is 'Jose'
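
As a small sketch of what `!r` does, the cell below compares the default formatting with the `!r` conversion and with an explicit `repr()` call. The variable names `plain`, `quoted` and `explicit` are only illustrative.

```python
person = 'Jose'

plain = f'My name is {person}'            # default formatting, same as str(person) here
quoted = f'My name is {person!r}'         # !r applies repr(), so the quotes are kept
explicit = f'My name is {repr(person)}'   # calling repr() yourself gives the same text

print(plain)
print(quoted)
print(quoted == explicit)
```

The `!r` form is mostly useful for debugging, where seeing the quotes makes it clear that a value is a string.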


Be careful not to let quotation marks in the replacement fields conflict with the quoting used in the outer string:

In [5]:
d = {'a':123,'b':456}

print (f"My number is {d['a']}")

My number is 123


In [6]:
print(f'Address: {d['a']} Main Street')

SyntaxError: invalid syntax (<ipython-input-6-62d3f16afd44>, line 1)

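Instead, use different styles of quotation marks. (From Python 3.12 onward, reusing the same quotes inside a replacement field is allowed, but earlier versions raise the `SyntaxError` shown above.)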

In [8]:
print(f"Address: {d['a']} Main Street")

Address: 123 Main Street
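
Here is a short sketch of three ways to avoid the quoting conflict that should work on any Python version with f-strings. The names `line1`, `line2`, `line3` and `number` are illustrative only.

```python
d = {'a': 123, 'b': 456}

# Option 1: double quotes outside, single quotes inside the replacement field
line1 = f"Address: {d['a']} Main Street"

# Option 2: look the value up first, so the field needs no quotes at all
number = d['a']
line2 = f'Address: {number} Main Street'

# Option 3: a triple-quoted outer string leaves single quotes free for the key
line3 = f'''Address: {d['a']} Main Street'''

print(line1)
print(line1 == line2 == line3)
```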


In [9]:
mylist = [0, 1, 2]

print (f"My number is {mylist[1]}")

My number is 1


### Minimum Widths, Alignment and Padding
You can pass arguments inside a nested set of curly braces to set a minimum width for the field, the alignment and even padding characters.

In [10]:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting', 601), ('Feynman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]
library

[('Author', 'Topic', 'Pages'),
 ('Twain', 'Rafting', 601),
 ('Feynman', 'Physics', 95),
 ('Hamilton', 'Mythology', 144)]

In [11]:
for book in library:
    print(book)

('Author', 'Topic', 'Pages')
('Twain', 'Rafting', 601)
('Feynman', 'Physics', 95)
('Hamilton', 'Mythology', 144)


In [12]:
for book in library:
    print(f'Author is {book[0]}')
    
print("----------\n")   
#-----------------Another way----------
print("----------")
for author,topics,pages in library:
    print(f'Author is {author}')

Author is Author
Author is Twain
Author is Feynman
Author is Hamilton
----------

----------
Author is Author
Author is Twain
Author is Feynman
Author is Hamilton


In [13]:
for author,topics in library:
    print(f'Author is {author}')

ValueError: too many values to unpack (expected 2)

In [14]:
for author,topics,pages in library:
    print(f'{author} {topics} {pages}')

Author Topic Pages
Twain Rafting 601
Feynman Physics 95
Hamilton Mythology 144


In [15]:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting in water alone', 601), ('Feynman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]
for author,topics,pages in library:
    print(f'{author} {topics} {pages}')

Author Topic Pages
Twain Rafting in water alone 601
Feynman Physics 95
Hamilton Mythology 144


# f-string literal formatting

In [16]:
for author,topics,pages in library:
    print(f'{author:{10}} {topics:{30}} {pages:{10}}')

Author     Topic                          Pages     
Twain      Rafting in water alone                601
Feynman    Physics                                95
Hamilton   Mythology                             144


In [17]:
for book in library:
    print(f'{book[0]:{10}} {book[1]:{8}} {book[2]:{7}}')

Author     Topic    Pages  
Twain      Rafting in water alone     601
Feynman    Physics       95
Hamilton   Mythology     144


In [18]:
for author,topics,pages in library:
    print(f'{author:{10}} {topics:{30}} {pages:{10}}')

Author     Topic                          Pages     
Twain      Rafting in water alone                601
Feynman    Physics                                95
Hamilton   Mythology                             144


In [19]:
for author,topics,pages in library:
    print(f'{author:{10}} {topics:{30}} {pages:>{10}}')

Author     Topic                               Pages
Twain      Rafting in water alone                601
Feynman    Physics                                95
Hamilton   Mythology                             144


In [20]:
for author,topics,pages in library:
    print(f'{author:{10}} {topics:{30}} {pages:->{10}}')

Author     Topic                          -----Pages
Twain      Rafting in water alone         -------601
Feynman    Physics                        --------95
Hamilton   Mythology                      -------144


In [21]:
for author,topics,pages in library:
    print(f'{author:{10}} {topics:{30}} {pages:.>{10}}')

Author     Topic                          .....Pages
Twain      Rafting in water alone         .......601
Feynman    Physics                        ........95
Hamilton   Mythology                      .......144


In the `In [17]` output, `Pages` follows the default left-alignment for strings while the numbers are right-aligned by default. The page numbers for `Twain` and `Hamilton` are also pushed to the right, because `Rafting in water alone` and `Mythology` exceed the minimum field width of `8`. When setting minimum field widths, make sure to take the longest item into account.

To set the alignment, use the character `<` for left-align,  `^` for center, `>` for right.<br>
To set padding, precede the alignment character with the padding character (`-` and `.` are common choices).
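
The alignment and padding characters can be tried out on a single value. This is a minimal sketch; the variable names are illustrative, and the `|` characters are printed only to make the field edges visible.

```python
word = 'Topic'

left = f'{word:<10}'      # left-align in a field of width 10
center = f'{word:^10}'    # center in a field of width 10
right = f'{word:>10}'     # right-align in a field of width 10
dotted = f'{word:.>10}'   # right-align and pad with dots
dashed = f'{95:-<10}'     # left-align a number and pad with dashes

for field in (left, center, right, dotted, dashed):
    print(f'|{field}|')
```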

Let's make some adjustments:

In [22]:
for book in library:
    print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:.>{7}}') # here .> was added

Author     Topic      ..Pages
Twain      Rafting in water alone ....601
Feynman    Physics    .....95
Hamilton   Mythology  ....144
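
One way to take the longest item into account is to compute each column width from the data instead of hard-coding it. This is only a sketch of that idea, not part of the original notebook; the `widths` list is an assumed helper.

```python
library = [('Author', 'Topic', 'Pages'),
           ('Twain', 'Rafting in water alone', 601),
           ('Feynman', 'Physics', 95),
           ('Hamilton', 'Mythology', 144)]

# Width of each column = length of its longest entry once converted to text
widths = [max(len(str(row[i])) for row in library) for i in range(3)]

rows = []
for author, topic, pages in library:
    # Nested braces let the width come from a variable
    rows.append(f'{author:<{widths[0]}} {topic:<{widths[1]}} {pages:>{widths[2]}}')

print('\n'.join(rows))
```

Because every field is at least as wide as its longest entry, no row can push the following columns out of line.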


### Date Formatting

In [23]:
from datetime import datetime

In [24]:
today = datetime(year=2019, month=2, day=28)

print(f'{today}')

2019-02-28 00:00:00


In [25]:
today

datetime.datetime(2019, 2, 28, 0, 0)

### For time formatting codes, go to https://strftime.org/ 

In [26]:
print(f'{today:%B}')

February


In [27]:
print(f'{today:%B %d}')

February 28


In [28]:
print(f'{today:%B %d, %Y}')

February 28, 2019
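
The format codes after the colon are the same ones accepted by `datetime.strftime()`. The sketch below shows a few more codes and checks that both routes give the same text; the variable names are illustrative.

```python
from datetime import datetime

today = datetime(year=2019, month=2, day=28)

iso_date = f'{today:%Y-%m-%d}'             # four-digit year, zero-padded month and day
clock = f'{today:%H:%M}'                   # 24-hour clock; midnight here, as no time was given
via_fstring = f'{today:%B %d, %Y}'
via_method = today.strftime('%B %d, %Y')   # same codes, called as a method

print(iso_date)
print(clock)
print(via_fstring == via_method)
```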


For more info on formatted string literals visit https://docs.python.org/3/reference/lexical_analysis.html#f-strings

***

# Files

Python uses file objects to interact with external files on your computer. These file objects can be any sort of file you have on your computer, whether it be an audio file, a text file, emails, Excel documents, etc. Note: You will probably need to install certain libraries or modules to interact with those various file types, but they are easily available. (We will cover downloading modules later on in the course).

Python has a built-in open function that allows us to open and play with basic file types. First we will need a file though. We're going to use some IPython magic to create a text file!

## Creating a File with IPython
#### This cell magic is specific to Jupyter notebooks! Alternatively, create a simple .txt file with any text editor, such as Sublime Text.

In [29]:
%%writefile test.txt 
Hello, this is a quick test file.
This is the second line of the file.

Overwriting test.txt


In [30]:
# %%writefile is an IPython cell magic, so it only works in Jupyter/IPython

## Python Opening a File

### Know Your File's Location

It's easy to get an error on this step:

In [31]:
myfile = open('whoops.txt')

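If `whoops.txt` does not exist in the current working directory, this raises a `FileNotFoundError`. To avoid this error, make sure your .txt file is saved in the same location as your notebook. To check your notebook location, use **pwd**: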

In [32]:
pwd

'/home/msjahid/Jupyter /NLP---Natural-Language-Processing-with-Python/00-Python-Text-Basics'
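
Outside a notebook, the same checks can be done in plain Python. The sketch below prints the working directory with `os.getcwd()` and catches the error raised for a missing file. It uses an empty temporary folder so that the file is guaranteed not to exist; the file name `missing_demo.txt` is hypothetical.

```python
import os
import tempfile

# Print the current working directory (the same information pwd shows)
print(os.getcwd())

# An empty temporary folder guarantees the file below does not exist
with tempfile.TemporaryDirectory() as folder:
    missing = os.path.join(folder, 'missing_demo.txt')  # hypothetical file name
    try:
        myfile = open(missing)
    except FileNotFoundError as err:
        error_name = type(err).__name__
        print(f'{error_name}: could not open {os.path.basename(missing)}')
    else:
        myfile.close()
        error_name = None
```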

**Alternatively, to grab files from any location on your computer, simply pass in the entire file path.**

On Windows, use a double `\\` in the path so Python doesn't treat a single `\` as the start of an escape sequence. A file path takes the form:

    myfile = open("C:\\Users\\YourUserName\\Home\\Folder\\myfile.txt")

For MacOS and Linux you use slashes in the opposite direction:

    myfile = open("/Users/YourUserName/Folder/myfile.txt")
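
Two common alternatives to doubling every backslash are raw strings and the standard-library `pathlib` module. This is a sketch only: the folder and file names are the same placeholders used above, and nothing is actually opened.

```python
from pathlib import Path

# A raw string (r"...") leaves backslashes alone, so no doubling is needed
escaped = "C:\\Users\\YourUserName\\Folder\\myfile.txt"
raw = r"C:\Users\YourUserName\Folder\myfile.txt"
print(escaped == raw)

# pathlib joins path parts with the right separator for the current system
target = Path.home() / 'Folder' / 'myfile.txt'   # placeholder location
print(target.name)
```

A `Path` object can be passed straight to `open()` in place of a string.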

In [33]:
# Open the test.txt file we created earlier
myfile = open('test.txt')

In [34]:
myfile

<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>

`myfile` is now an open file object held in memory. We'll perform some reading and writing exercises, and then close the file to release the resources it holds.

### .read() and .seek()

In [35]:
# We can now read the file
myfile.read()

'Hello, this is a quick test file.\nThis is the second line of the file.\n'

In [36]:
# But what happens if we try to read it again?
myfile.read()

''

This happens because the reading "cursor" is at the end of the file after the first read, so there is nothing left to read. We can reset the "cursor" like this:

In [37]:
# Seek to the start of file (index 0)
myfile.seek(0)

0

In [38]:
# Now read again
myfile.read()

'Hello, this is a quick test file.\nThis is the second line of the file.\n'
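
The position of the "cursor" can be inspected with `.tell()`. The sketch below works on a throwaway file in a temporary folder (the name `demo.txt` is hypothetical) so that it does not touch `test.txt`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('Hello\nWorld\n')

    f = open(path)
    first = f.read()       # reads everything and leaves the cursor at the end
    second = f.read()      # nothing left to read, so this is an empty string
    f.seek(0)              # move the cursor back to the start
    position = f.tell()    # report where the cursor is now
    third = f.read()       # the full content is available again
    f.close()

print(repr(second))
print(position)
print(first == third)
```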

In [39]:
myfile.seek(0)

0

In [40]:
content = myfile.read()

In [41]:
print(content)

Hello, this is a quick test file.
This is the second line of the file.



In [42]:
content

'Hello, this is a quick test file.\nThis is the second line of the file.\n'

In [43]:
myfile.close()

### .readlines()
You can read all the lines of a file into a list using the `readlines` method. Use caution with large files, since everything will be held in memory. We will learn how to iterate over large files later in the course.

In [44]:
myfile = open('test.txt')

In [45]:
# Readlines returns a list of the lines in the file
myfile.readlines()

['Hello, this is a quick test file.\n',
 'This is the second line of the file.\n']

In [46]:
myfile.seek(0)

0

In [47]:
mylines = myfile.readlines()
mylines

['Hello, this is a quick test file.\n',
 'This is the second line of the file.\n']

In [48]:
for line in mylines:
    print(line[0])

H
T


In [49]:
for line in mylines:
    print(line.split()[0])

Hello,
This
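
Each string returned by `.readlines()` keeps its trailing `\n`. If that is unwanted, two common options are stripping each line or using `str.splitlines()` on the full text. The sketch below uses a temporary copy of the two-line file (the name `demo.txt` is hypothetical).

```python
import os
import tempfile

text = 'Hello, this is a quick test file.\nThis is the second line of the file.\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write(text)

    with open(path) as f:
        raw_lines = f.readlines()            # each item still ends with '\n'

    stripped = [line.rstrip('\n') for line in raw_lines]

    with open(path) as f:
        split = f.read().splitlines()        # splits on line breaks and drops them

print(stripped)
print(stripped == split)
```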


When you have finished using a file, it is always good practice to close it.

In [50]:
myfile.close()

## Writing to a File

By default, the `open()` function will only allow us to read the file. We need to pass the argument `'w'` to write over the file. For example:

In [51]:
# Add a second argument to the function, 'w' which stands for write.
# Passing 'w+' lets us read and write to the file

myfile = open('test.txt','w+')

<div class="alert alert-danger" style="margin: 20px">**Use caution!**<br>
Opening a file with 'w' or 'w+' *truncates the original*, meaning that anything that was in the original file **is deleted**!</div>

In [52]:
myfile.read()

''

In [53]:
# Write to the file
myfile.write('MY BRAND NEW TEXT')

17

In [54]:
# Read the file
myfile.seek(0)
myfile.read()

'MY BRAND NEW TEXT'

In [55]:
myfile.close()  # always do this when you're done with a file
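
To see the truncation happen without risking a real file, here is a sketch that works on a throwaway file in a temporary folder. The file name `demo.txt` and the variable names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name

    # Create a file that already has some content
    with open(path, 'w') as f:
        f.write('original content')

    # Re-opening with 'w+' empties the file immediately, before anything is written
    f = open(path, 'w+')
    after_open = f.read()
    f.write('new content')
    f.seek(0)
    after_write = f.read()
    f.close()

print(repr(after_open))
print(repr(after_write))
```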

## Appending to a File
Passing the argument `'a'` opens the file and puts the pointer at the end, so anything written is appended. Like `'w+'`, `'a+'` lets us read and write to a file. If the file does not exist, one will be created.

In [56]:
myfile = open('test.txt','a+')
myfile.write('\nThis line is being appended to test.txt')
myfile.write('\nAnd another line here.')

23

In [57]:
myfile.seek(0)
print(myfile.read())

MY BRAND NEW TEXT
This line is being appended to test.txt
And another line here.


In [58]:
myfile.close()
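
The two behaviours described above (creating a missing file, and always writing at the end) can be checked on a throwaway file. This is a sketch under the assumption that `log.txt` is a new, hypothetical file in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')  # hypothetical file name
    existed_before = os.path.exists(path)

    # 'a' creates the file because it does not exist yet
    with open(path, 'a') as f:
        f.write('first line\n')

    # 'a+' keeps the existing content and adds to the end of it
    with open(path, 'a+') as f:
        f.write('second line\n')
        f.seek(0)            # move back to the start before reading
        content = f.read()

print(existed_before)
print(content, end='')
```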

In [59]:
myfile = open('whoops.txt', 'a+')

In [60]:
myfile.write('MY FIRST LINE IN A+ OPENING')

27

In [61]:
myfile.close()

In [62]:
newfile = open('whoops.txt')

In [63]:
newfile.read()

'MY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\nThis is a real new line, on the next lineMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENING'

In [64]:
newfile.write('try to write something with only read permissions')

UnsupportedOperation: not writable

In [65]:
newfile.close()
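
A file object can report what its mode allows through `.readable()` and `.writable()`, and the error above can be caught as `io.UnsupportedOperation`. The sketch below uses a hypothetical `demo.txt` in a temporary folder.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('some text')

    # No mode given, so the file is opened read-only ('r')
    with open(path) as f:
        can_read = f.readable()
        can_write = f.writable()
        try:
            f.write('more text')
        except io.UnsupportedOperation as err:
            message = str(err)

    with open(path) as f:
        unchanged = f.read()   # the failed write left the file as it was

print(can_read, can_write)
print(message)
```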

In [66]:
myfile = open('whoops.txt',mode='a+')

In [67]:
myfile.write('This is an added line, because I used a+ mode')

45

In [68]:
myfile.seek(0) 

0

In [69]:
myfile.read()

'MY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\nThis is a real new line, on the next lineMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode'

In [70]:
myfile.write('\nThis is a real new line, on the next line')

42

In [71]:
myfile.seek(0) 

0

In [72]:
myfile.read() 

'MY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\nThis is a real new line, on the next lineMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\nThis is a real new line, on the next line'

In [73]:
myfile.seek(0)

0

In [74]:
print(myfile.read())

MY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode
This is a real new line, on the next lineMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode
This is a real new line, on the next line


### Appending with `%%writefile`
Jupyter notebook users can do the same thing using IPython cell magic:

In [75]:
%%writefile -a test.txt

This is more text being appended to test.txt
And another line here.

Appending to test.txt


Add a blank line at the top of the cell if you want the appended text to begin on its own line, as `%%writefile` writes the cell contents literally and won't interpret escape sequences like `\n`.

## Aliases and Context Managers
You can assign temporary variable names as aliases, and manage the opening and closing of files automatically using a context manager:

In [76]:
with open('whoops.txt','r') as mynewfile:
    myvariable = mynewfile.readlines()

In [77]:
myvariable

['MY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\n',
 'This is a real new line, on the next lineMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\n',
 'This is a real new line, on the next line']

In [78]:
print(myvariable)

['MY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\n', 'This is a real new line, on the next lineMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGMY FIRST LINE IN A+ OPENINGThis is an added line, because I used a+ mode\n', 'This is a real new line, on the next line']


In [79]:
with open('test.txt','r') as txt:
    first_line = txt.readlines()[0]
    
print(first_line)

MY BRAND NEW TEXT



Note that the `with ... as ...:` context manager automatically closed `test.txt` after assigning the first line of text to `first_line`:

In [80]:
txt.read()

ValueError: I/O operation on closed file.
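
The `.closed` attribute gives a way to check this without triggering the error. The sketch below uses a hypothetical `demo.txt` in a temporary folder and records the state of the file inside and outside the `with` block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('line one\nline two\n')

    with open(path) as txt:
        closed_inside = txt.closed    # still open inside the block
        first_line = txt.readline()

    closed_after = txt.closed         # closed automatically on leaving the block

    try:
        txt.read()
    except ValueError as err:
        message = str(err)

print(closed_inside, closed_after)
print(message)
```

The file is closed even if the code inside the `with` block raises an exception, which is the main reason to prefer this form over calling `.close()` by hand.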

## Iterating through a File

In [81]:
with open('test.txt','r') as txt:
    for line in txt:
        print(line, end='')  # the end='' argument removes extra linebreaks

MY BRAND NEW TEXT
This line is being appended to test.txt
And another line here.
This is more text being appended to test.txt
And another line here.
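
Iterating over the file object reads one line at a time, so the whole file never has to sit in memory. As a sketch, the loop below adds line numbers with `enumerate()` and strips the trailing newline from each line; the file `demo.txt` and its contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('alpha\nbeta\ngamma\n')

    numbered = []
    with open(path) as txt:
        # enumerate() pairs each line with a counter, starting at 1
        for number, line in enumerate(txt, start=1):
            numbered.append(f'{number}: {line.rstrip()}')

print('\n'.join(numbered))
```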


Great! Now you should be familiar with formatted string literals and working with text files.
## Next up: Working with PDF Text