# Working with Text Files - Part Two

What we're going to do now is go over how to read and write basic text files using only the built-in capabilities of the standard Python library.

Let's begin by actually creating a text file. The method I'm going to use is specific to Jupyter notebooks: it's a magic command.


Keep in mind that if you're not using a Jupyter notebook, you can just use a standard text editor such as Sublime Text or Atom to save a text file. Again, the `%%writefile` line written in the cell here only works in Jupyter notebooks. So if you're not using Jupyter, just open a basic text editor, type the two lines, and save the result as a text file.

In [1]:
%%writefile test.txt
Hello, this is a quick test file.
This is the second line of the file.

Writing test.txt
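
If you're not in a notebook, another option is to create the same file from plain Python. This is only a minimal sketch: the scratch folder from `tempfile` and the names `workdir`, `target`, and `written` are my own assumptions for illustration, and mode `'w'` (write) is used here to create the file.

```python
import os
import tempfile

# Hypothetical scratch folder so the sketch does not touch your real files.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "test.txt")

# Mode 'w' creates the file, or overwrites it if it already exists.
outfile = open(target, "w", encoding="utf-8")
outfile.write("Hello, this is a quick test file.\n")
outfile.write("This is the second line of the file.\n")
outfile.close()

# Read the file back to confirm what was written.
infile = open(target, encoding="utf-8")
written = infile.read()
infile.close()
print(written)
```

In your own project you would likely write straight to `'test.txt'` next to your script instead of a scratch folder.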


Now that I have this `test.txt` file, I should be able to open it.

The first thing you should do whenever you're opening a text file, regardless of what library you're using, is make sure you understand the file's location.

To open a file, you first create a variable and then you say it equals the result of opening the file. Python has a built-in function called `open`, and in the parentheses you pass in the path to the file.

```python
myfile = open('')
```

Now suppose you happen to provide a file that isn't there. For example, let's go ahead and make up a file name like:

In [2]:
myfile = open('whoops.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'whoops.txt'

If you run this, you'll get a `FileNotFoundError` that says `[Errno 2] No such file or directory: 'whoops.txt'`. That means you're either providing the wrong file path or you misspelled the actual file name. So to avoid this error, you should have a good awareness of where this notebook is currently located as well as where your text file is currently located.
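
If you'd rather have your program keep going instead of stopping on this error, one common approach is to wrap the call in `try`/`except`. This is just a sketch under my own assumptions: `missing_path` and `opened` are hypothetical names, and the file is placed in a fresh scratch folder only so that it is guaranteed not to exist.

```python
import os
import tempfile

# Hypothetical path inside a fresh, empty scratch folder,
# so the file is guaranteed to be missing.
missing_path = os.path.join(tempfile.mkdtemp(), "whoops.txt")

opened = False
try:
    myfile = open(missing_path)
except FileNotFoundError as err:
    # err.strerror holds the operating system's message for the failure.
    print("Could not open the file:", err.strerror)
    # Relative file names are resolved against this folder.
    print("Current working directory:", os.getcwd())
else:
    # Only runs when open() succeeded.
    opened = True
    myfile.close()

print(opened)
```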

To figure out where your notebook is currently located, you can simply type `pwd` into a cell, run it, and it will tell you where you're currently located.

In [3]:
pwd

'/Users/marcosaguilerakeyser/Projects/advanced-nlp-with-spacy'
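
Like `%%writefile`, a bare `pwd` is a notebook convenience. In a plain Python script you can get the same information from the standard library. Here is a small sketch; the names `current_dir` and `txt_files` are just my own choices for illustration.

```python
import os
from pathlib import Path

# The folder Python resolves relative file names against.
current_dir = os.getcwd()
print(current_dir)

# List the .txt files sitting in that folder, if there are any.
txt_files = sorted(p.name for p in Path(current_dir).glob("*.txt"))
print(txt_files)
```

If `test.txt` doesn't show up in that list, opening it by its bare name from this location will fail.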

So now let's actually grab the test text file that we wrote.

In [4]:
myfile = open('test.txt')

When you run this, you now have your file open. If you look at the `myfile` object, it tells you that this is a text wrapper ready to go for this text file, and that by default the file is opened with mode `r`, which means reading.

In [5]:
myfile

<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>

But keep in mind that if you ever get `Errno 2`, it means you're either passing in the wrong file path or you misspelled the actual file name.

In general, the easiest way to go about this is to make sure your text files are in the same location as your notebook, and you can always figure out the location of your notebook with `pwd`.

If you want to open files that are in another location on your computer, you can just pass in the entire file path.

For Windows you need to use a double backslash (`\\`) so Python doesn't treat a single backslash as the start of an escape sequence. A file path is in the form:

```python
myfile = open("C:\\Users\\YourUserName\\Home\\Folder\\myfile.txt")
```

For MacOS and Linux you use slashes in the opposite direction:

```python
myfile = open("/Users/YourUserName/Home/Folder/myfile.txt")
```
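
If typing double backslashes gets tedious, two alternatives that I find handy are raw strings and the `pathlib` module. This is only a sketch: the folder names are the same placeholders used above, and `PureWindowsPath` / `PurePosixPath` are used so the example builds paths without touching the disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The escaped form from above and a raw string (note the r prefix)
# describe exactly the same path.
escaped = "C:\\Users\\YourUserName\\Home\\Folder\\myfile.txt"
raw = r"C:\Users\YourUserName\Home\Folder\myfile.txt"
print(escaped == raw)

# pathlib joins the pieces with the right separator for each system.
win_path = PureWindowsPath("C:/Users/YourUserName") / "Home" / "Folder" / "myfile.txt"
posix_path = PurePosixPath("/Users/YourUserName") / "Home" / "Folder" / "myfile.txt"
print(win_path)
print(posix_path)
```

In real code you would usually use `pathlib.Path`, which picks the right flavor for the machine you're on, and `open()` accepts such path objects directly.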

Always keep in mind that some text files don't explicitly carry the `.txt` extension; the name may be just `myfile`.

OK, so now that we've opened the file, we have it saved as `myfile`, and it's just a `TextIOWrapper` object. What we can do is read the file, and the way you do that is simply by:

In [6]:
myfile.read()

'Hello, this is a quick test file.\nThis is the second line of the file.\n'

And notice what happened here. It says "Hello, this is a quick test file." and then we get this `\n`. That `\n` is an indicator for a new line.

Now what's interesting is if I were to try to read this file again it suddenly shows up blank:

In [7]:
myfile.read()

''

This is often a point of confusion for beginners with Python, or with reading text files in Python: you can't just call `read()` multiple times on a file and get the same result each time.

Essentially what's going on is that you have a cursor at the very beginning of the text file. After you call `read()`, the cursor goes through the entire text file and the entire file is returned as a string.

As we can see here, the cursor is then sitting at the end of the file, which means that when you call `read()` again it reads from the cursor to the end of the file. Since the first call already read everything, there's nothing left there, just an empty string.

To fix this, what we need to do is call `myfile.seek(0)`, which moves the cursor to index position 0 and essentially resets it.

In [8]:
myfile.seek(0)

0

It reports back `0`: OK, the cursor is now at index 0.

That means the cursor is now at the beginning of the text file and I can call `myfile.read()` again. And here I see the entire string again:

In [9]:
myfile.read()

'Hello, this is a quick test file.\nThis is the second line of the file.\n'
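
If you want to watch the cursor move, file objects also have a `tell()` method that reports the current position. The sketch below is my own illustration: it writes its own scratch copy of the file (the `path` variable and the scratch folder are assumptions) so it can run on its own.

```python
import os
import tempfile

# Hypothetical scratch copy of test.txt so this sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
setup = open(path, "w", encoding="utf-8")
setup.write("Hello, this is a quick test file.\nThis is the second line of the file.\n")
setup.close()

myfile = open(path, encoding="utf-8")
start = myfile.tell()    # cursor position right after opening: 0
first = myfile.read()    # reads everything, moving the cursor to the end
end = myfile.tell()      # cursor is now past the last character
second = myfile.read()   # nothing left to read, so this is ''
myfile.seek(0)           # reset the cursor to the beginning
third = myfile.read()    # the whole file again
myfile.close()

print(start, end)
print(repr(second))
print(third == first)
```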

So `.read()` is really useful for smaller files where you just want to grab everything and save it as a string.

So let's go ahead and seek to zero again so I can say `content` is equal to `myfile.read()`:

In [12]:
myfile.seek(0)

0

In [13]:
content = myfile.read()

And now `content` is the saved string of the file:

In [14]:
content

'Hello, this is a quick test file.\nThis is the second line of the file.\n'

Since that string contains `\n`, you may want to actually print the content:

In [15]:
print(content)

Hello, this is a quick test file.
This is the second line of the file.



`print` takes the escape characters into account and displays the text as intended, printing out an actual new line. Keep in mind that `content` is now the literal string, so I no longer need to worry about seeking and re-reading the file. OK, so that's how you can read in the file as a string.

In [16]:
content

'Hello, this is a quick test file.\nThis is the second line of the file.\n'

The last thing you should know is you should always close the file once you're done working with it. So you can close the file simply by:

In [17]:
myfile.close()

and now that file object has been closed.

The reason you want to make sure to do that is that if you try opening that text file in another program while you still have it open in Python, it may cause issues with your operating system.

If you've ever tried to pull out a USB drive and been told that some files are still in use, that's the sort of thing that happens if you don't close a file. The system will basically say: Python is using this text file, I can't open it right now. So always make sure to close your files.
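
Since it's easy to forget `close()`, a common pattern is to open the file in a `with` block, which closes it for you when the block ends, even if an error happens inside. Here is a minimal sketch; it writes its own scratch copy of the file (the `path` variable and scratch folder are my assumptions) so it can run on its own.

```python
import os
import tempfile

# Hypothetical scratch copy of test.txt so this sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as setup:
    setup.write("Hello, this is a quick test file.\nThis is the second line of the file.\n")

# The file is open only inside the indented block.
with open(path, encoding="utf-8") as myfile:
    content = myfile.read()

# After the block, the file has been closed automatically.
print(myfile.closed)
print(content)
```

The `content` string is still available after the block, because it is an ordinary string and no longer depends on the file being open.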

Now, we just saw how the `.read()` method of a file object returns the entire text file as one large string. Often it would be nice if we could iterate through each line in the text file. Notice that the lines are essentially separated by `\n`. Luckily, Python has a `readlines()` method in addition to `read()`, which reads in each line as a separate item in a Python list. Let's go ahead and see what that looks like.

In [18]:
myfile = open('test.txt')

In [19]:
myfile.readlines()

['Hello, this is a quick test file.\n',
 'This is the second line of the file.\n']

So I just opened the text file, and instead of typing `read` I'm going to hit Tab here and notice that I have this `readlines`. That's another really useful method: if you run it, it reads in each line as a separate item in the list, and each item keeps its `\n` as its last character.
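
Each string in the list above still ends with `\n`. If you don't want those trailing newlines, one option is to strip them with `rstrip`. This is only a sketch; `clean_lines` is a name I made up, and the example writes its own scratch copy of the file so it can run on its own.

```python
import os
import tempfile

# Hypothetical scratch copy of test.txt so this sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
setup = open(path, "w", encoding="utf-8")
setup.write("Hello, this is a quick test file.\nThis is the second line of the file.\n")
setup.close()

myfile = open(path, encoding="utf-8")
mylines = myfile.readlines()   # each item still ends with '\n'
myfile.close()

# Remove only the trailing newline from each line.
clean_lines = [line.rstrip("\n") for line in mylines]
print(clean_lines)
```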

So now what I can do is go back and seek to 0.

In [20]:
myfile.seek(0)

0

In [21]:
mylines = myfile.readlines()

In [22]:
mylines

['Hello, this is a quick test file.\n',
 'This is the second line of the file.\n']

Then I can say `mylines` is equal to `myfile.readlines()`, run that, and now I have this `mylines` object, which is a list with every line as a string, and I can iterate through it. For example, I can grab the first character of each line:

In [23]:
for line in mylines:
    print(line[0])

H
T


Maybe you want to split each string into separate words and then grab the first item. That returns essentially the first set of characters before a space for each line:

In [24]:
for line in mylines:
    print(line.split()[0])

Hello,
This


You can begin to experiment, but that's why `readlines()` is also a very useful method, and it's often more common to see `readlines()` than just `read()`. But it really depends on the situation.
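
One more option worth knowing: a file object can be looped over directly, which hands you one line at a time without building the whole list first. This can help with larger files. The sketch below repeats the first-word example that way; `first_words` is a name I made up, and the scratch copy of the file is only there so the example runs on its own.

```python
import os
import tempfile

# Hypothetical scratch copy of test.txt so this sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
setup = open(path, "w", encoding="utf-8")
setup.write("Hello, this is a quick test file.\nThis is the second line of the file.\n")
setup.close()

first_words = []
myfile = open(path, encoding="utf-8")
for line in myfile:            # the file yields one line per iteration
    words = line.split()
    if words:                  # skip blank lines, which have no words
        first_words.append(words[0])
myfile.close()

print(first_words)
```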

>So `read()` grabs everything as one giant string, while `readlines()` grabs every line as a separate string in a list.