# Introduction to Python Text Basics

- Understand how to open normal `.txt` and `.pdf` files with basic Python libraries.
- Learn some basic regular expressions.

## Working with text files with Python - Part One

Welcome back. In this lecture, part one of working with text files, we're going to focus on some basic print formatting, specifically with the f-string literal.

- Let's go over some basic print formatting (f-string literal).

- We'll also discuss alignment options with f-string literals.

Let's get started.

I'm going to begin by showing you how you had to perform print formatting before Python 3.6. Say you had a variable, for example:

In [1]:
person = "Jose"

And then you wanted to insert that variable into a string you were going to print out. The way you used to do it is by saying something like:

In [2]:
print("my name is {}".format(person))

my name is Jose


The way to do this is to write `my name is` and then a pair of curly braces. After the string you call `.format()` and pass in the variable you want to insert in place of the curly braces.

So if you run this, you see `my name is Jose`. This is the old way, before Python 3.6. Python 3.6 and higher include what's known as **formatted string literals**, or **f-string literals** for short, which simplify this entire process a lot.

So all you need to do is say print, type a single `f` before the opening quote, and then start typing out your string, with the variable name directly inside the curly braces:

In [6]:
print(f"my name is {person}")

my name is Jose


And then when you run this you get back the same result.
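As a quick check, the two approaches can be compared side by side. This is only a minimal sketch; the names `old_style` and `new_style` are illustrative and not part of the lecture notebook.

```python
person = "Jose"

# Pre-3.6 style: the placeholder is filled in by str.format()
old_style = "my name is {}".format(person)

# Python 3.6+ style: the variable is written inside the braces
new_style = f"my name is {person}"

print(old_style)               # my name is Jose
print(old_style == new_style)  # True
```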

Now the other thing I want to point out that's nice about this method is that it also works if you have a Python object such as a dictionary:

In [7]:
d = {'a':123,'b':456}

You can actually perform operations within the f-string literal. Inside the curly braces I can, for example, look up a key:

In [10]:
print(f"my number is {d['a']}")

my number is 123


Or even do arithmetic on the value:

In [11]:
print(f"my number is {d['a']*2}")

my number is 246


It works the same for lists:

In [15]:
mylist = [0,1,2]

I could index an item from my list, such as the one at index 0, and then run that:

In [16]:
print(f"my number is {mylist[0]}")

my number is 0
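To illustrate that the braces accept general expressions and not just lookups, here is a small sketch that reuses `d` and `mylist` from above. The names `total_line` and `list_line` are made up for this example. Note that the dictionary keys use single quotes because the f-string itself is delimited with double quotes; before Python 3.12, reusing the same quote character inside the braces is a syntax error.

```python
d = {'a': 123, 'b': 456}
mylist = [0, 1, 2]

# Arithmetic across two dictionary values
total_line = f"the total is {d['a'] + d['b']}"

# A function call and negative indexing on a list
list_line = f"mylist has {len(mylist)} items and the last one is {mylist[-1]}"

print(total_line)  # the total is 579
print(list_line)   # mylist has 3 items and the last one is 2
```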


### Alignments and Padding

Now the last thing I want to go over is using alignment and padding when you're dealing with multiple items that you're trying to print out. We're actually going to be using this later on in the course, specifically when we start talking about part-of-speech tagging with spaCy.

spaCy often prints out things in a format that doesn't look that nice and is hard to read, but luckily we can format it ourselves.

In [19]:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting', 601), ('Feyman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]

In [21]:
library

[('Author', 'Topic', 'Pages'),
 ('Twain', 'Rafting', 601),
 ('Feyman', 'Physics', 95),
 ('Hamilton', 'Mythology', 144)]

This is just a list of tuples. I have a variable called `library`, which is a list, and inside it each tuple holds three objects: `Author`, `Topic`, and `Pages`. The first tuple holds the column headers, followed by some examples. So what I could do is say:

In [22]:
for book in library:
    print(book)

('Author', 'Topic', 'Pages')
('Twain', 'Rafting', 601)
('Feyman', 'Physics', 95)
('Hamilton', 'Mythology', 144)


I get to see all the tuples being printed out.

The other thing I could do is start using **f-string literals** to get a little fancy here and index into each tuple. Maybe I just want the authors:

In [23]:
for book in library:
    print(f"Author is {book[0]}")

Author is Author
Author is Twain
Author is Feyman
Author is Hamilton


Now I could use tuple unpacking to do the following:

In [24]:
for author,topic,pages in library:
    print(f"Author is {author}")

Author is Author
Author is Twain
Author is Feyman
Author is Hamilton


So what I'm going to do now is print out all three of those items.

In [25]:
for author,topic,pages in library: 
    print(f"{author} {topic} {pages}")

Author Topic Pages
Twain Rafting 601
Feyman Physics 95
Hamilton Mythology 144


Again, I'm just using tuple unpacking here to unpack each tuple and then passing these variables into the curly braces.

Now you may notice right away that the formatting looks slightly sloppy here, and that's because we aren't taking into account any sort of padding or spacing.

So if I come back to this library list and make one of these entries really long, such as `Rafting in water alone`, let me rerun this.

In [26]:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting in water alone', 601), ('Feyman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]

In [27]:
library

[('Author', 'Topic', 'Pages'),
 ('Twain', 'Rafting in water alone', 601),
 ('Feyman', 'Physics', 95),
 ('Hamilton', 'Mythology', 144)]

In [28]:
for author,topic,pages in library: 
    print(f"{author} {topic} {pages}")

Author Topic Pages
Twain Rafting in water alone 601
Feyman Physics 95
Hamilton Mythology 144


So my formatting is even worse.

And there are a variety of ways we can use f-string literal formatting to fix this sort of issue.

So the first thing you can do is pass in a minimum width that each of these so-called columns should take up. Inside the curly braces, after the variable name, you provide a colon, another set of curly braces, and then the minimum number of spaces that should be taken up by this variable.

In [29]:
for author,topic,pages in library: 
    print(f"{author:{10}} {topic:{30}} {pages:{10}}")

Author     Topic                          Pages     
Twain      Rafting in water alone                601
Feyman     Physics                                95
Hamilton   Mythology                             144


So now when you run this you can see that it's already formatted a lot better.
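The nested braces are what let the width come from an expression instead of a hard-coded number; a plain `{author:10}` gives the same result as `{author:{10}}`. As a rough sketch of that idea, the widths below are computed from the data itself. The names `author_w`, `topic_w`, `pages_w`, and `rows` are hypothetical helpers, not part of the lecture notebook.

```python
library = [('Author', 'Topic', 'Pages'),
           ('Twain', 'Rafting in water alone', 601),
           ('Feyman', 'Physics', 95),
           ('Hamilton', 'Mythology', 144)]

# A literal width and a width in nested braces behave the same
print(f"{'Twain':10}" == f"{'Twain':{10}}")  # True

# Hypothetical helpers: the widest entry in each column
author_w = max(len(str(author)) for author, topic, pages in library)
topic_w = max(len(str(topic)) for author, topic, pages in library)
pages_w = max(len(str(pages)) for author, topic, pages in library)

# Each column is padded to the width of its widest entry
rows = [f"{author:{author_w}} {topic:{topic_w}} {pages:{pages_w}}"
        for author, topic, pages in library]
for row in rows:
    print(row)
```

Computing the widths this way keeps the columns as narrow as the data allows, at the cost of one extra pass over the list.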

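You probably noticed that the `Pages` column looks a little strangely aligned here. That's because strings are left-aligned by default while numbers are right-aligned, so the header and the integers end up on opposite sides of the column. To fix this, add a greater-than sign after the colon to right-align the column: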

In [30]:
for author,topic,pages in library: 
    print(f"{author:{10}} {topic:{30}} {pages:>{10}}")

Author     Topic                               Pages
Twain      Rafting in water alone                601
Feyman     Physics                                95
Hamilton   Mythology                             144
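Besides `>`, the format specification also accepts `<` for left alignment and `^` for centering. The snippet below is a small sketch of all three; the names `word`, `number`, `left`, `right`, and `center` are just illustrative.

```python
word = "Pages"
number = 95

left = f"{word:<10}"    # 'Pages     '
right = f"{word:>10}"   # '     Pages'
center = f"{word:^10}"  # centered within 10 characters

# With no alignment character, strings default to left and numbers to right
print(f"{word:10}" == left)               # True
print(f"{number:10}" == f"{number:>10}")  # True

# Show the padding explicitly by wrapping each result in pipes
for label, text in [("left", left), ("right", right), ("center", center)]:
    print(f"{label:6} |{text}|")
```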


You can also pass in a fill character between the colon and the greater-than sign. So I can pass in a dash, and if you run this it's going to fill the padding with dashes.

In [31]:
for author,topic,pages in library: 
    print(f"{author:{10}} {topic:{30}} {pages:->{10}}")

Author     Topic                          -----Pages
Twain      Rafting in water alone         -------601
Feyman     Physics                        --------95
Hamilton   Mythology                      -------144


So now you notice that there are five dashes in the header, plus the five letters of `Pages`, for a total of 10 characters. Any other fill character works the same way, such as a period:

In [32]:
for author,topic,pages in library: 
    print(f"{author:{10}} {topic:{30}} {pages:.>{10}}")

Author     Topic                          .....Pages
Twain      Rafting in water alone         .......601
Feyman     Physics                        ........95
Hamilton   Mythology                      .......144
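The fill character is not tied to right alignment; it can be combined with any of the alignment symbols, and it can come from a variable through nested braces. A short sketch, with `header`, `fill`, and the result names chosen only for illustration:

```python
header = "Pages"

dashed = f"{header:->10}"    # '-----Pages'
padded = f"{header:_<10}"    # 'Pages_____'
starred = f"{header:*^11}"   # '***Pages***'

# The fill character can also be supplied by a variable
fill = "."
dotted = f"{header:{fill}>10}"  # '.....Pages'

for text in (dashed, padded, starred, dotted):
    print(text)
```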


Next, let's look at formatting dates. I'm going to import `datetime` and create a datetime object, which is a specialized object in Python that can hold date and time information.

In [34]:
from datetime import datetime
today = datetime(year=2019,month=2,day=28)

In [36]:
print(f"{today}")

2019-02-28 00:00:00


Run that, and then with an f-string literal I can simply print out `today`, which prints the datetime object.

You'll notice right away that the datetime object is printed in a standard format: year, month, day. It also has a time component: hours, minutes, and seconds. We didn't provide that information, so it just defaulted to 0 hours, 0 minutes, and 0 seconds. Even if you just display the object itself, you can see that it's a datetime object:

In [37]:
today

datetime.datetime(2019, 2, 28, 0, 0)

Often you're going to want to format this yourself so that it looks nicer. You can do that by typing a colon after the variable name of your datetime object, followed by the formatting you want. The formatting uses `strftime` codes, and you can go to [strftime](http://strftime.org/) to check out a table of the directives, which are essentially special codes that start with a percent sign.

In [40]:
print(f"{today:%B %d, %Y}")

February 28, 2019
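A few other directives can be combined in the same way. The sketch below sticks to numeric codes, since month names such as `%B` depend on the locale settings. The names `iso_date`, `clock`, and `same` are illustrative only.

```python
from datetime import datetime

today = datetime(year=2019, month=2, day=28)

# Numeric directives: four-digit year, zero-padded month and day
iso_date = f"{today:%Y-%m-%d}"  # 2019-02-28

# Only the first colon separates the variable from the format;
# the later colons are part of the format itself
clock = f"{today:%H:%M:%S}"     # 00:00:00

# The same codes work with the strftime() method
same = today.strftime("%Y-%m-%d")

print(iso_date, clock)
print(same == iso_date)  # True
```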


So that concludes our discussion on print formatting. Coming up next, we're going to talk about working with text files: creating a file, opening the file, and then reading from and writing to it. We'll see you at the next lecture.