# Introduction - File Opening

## Opening a file

<p>To open a file, use the built-in open() function. open() returns a file object and is most commonly called with two arguments.<br/>

The syntax is:<br/>
file_object = open(filename, mode)<br/>
where file_object is the variable that receives the file object. The second argument, mode, describes how the file will be used: 'r' for reading (the default), 'w' for writing, and 'a' for appending.
</p>

In [6]:
file = open('Data/Basics_NLP/gaur.txt', 'r')
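
The mode string is what separates reading from writing. Here is a minimal sketch of the three most common modes. It assumes a hypothetical scratch file `demo.txt` in a temporary directory, which is not part of the data used in this notebook.

```python
import os
import tempfile

# Hypothetical scratch file; not part of the Data/ folder used in this notebook.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')   # 'w': write; creates the file or truncates an existing one
f.write('first line\n')
f.close()

f = open(path, 'a')   # 'a': append; new data goes to the end of the file
f.write('second line\n')
f.close()

f = open(path, 'r')   # 'r': read-only; this is the default mode
contents = f.read()
f.close()

print(contents)
```

Opening an existing file with 'w' discards its previous contents, so 'a' is the safer choice when earlier data must be kept.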

**Note:** The call open('newfile.txt', 'r') doesn't return the contents of the file. It creates something called a "file object." You can think of a file object as an old tape drive of the kind used with mainframe computers in the 1950s, or as a DVD player from today. You can move around inside the file and then "read" from it, but just as the DVD player is not the DVD, the file object is not the file's contents.
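
The "moving around" in the analogy corresponds to the file object's current position, which tell() reports and seek() changes. The sketch below illustrates this on a hypothetical temporary file `demo.txt`. The sample text is borrowed from gaur.txt, but the file itself is an assumption made so the example is self-contained.

```python
import os
import tempfile

# Hypothetical scratch file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('Love is everlasting Forgiveness.\n')

f = open(path, 'r')
start = f.tell()     # position right after opening: the beginning of the file
first = f.read(4)    # reads 'Love' and advances the position
f.seek(0)            # rewind to the beginning, like rewinding a tape
again = f.read(4)    # the same four characters are read again
f.close()

print(start, first, again)
```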

## Reading a file

If you want to return a string containing all characters in the file, you can
use file.read().

In [1]:
file = open('Data/Basics_NLP/gaur.txt', 'r')
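
Opening the file does not read anything yet. The call to read() is what produces the string. Here is a minimal sketch, assuming a hypothetical temporary file in place of gaur.txt so the example runs anywhere:

```python
import os
import tempfile

# Hypothetical stand-in for gaur.txt.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('Love is everlasting Forgiveness.\n')

# The with-block closes the file automatically when it ends.
with open(path, 'r') as f:
    text = f.read()   # one string holding every character in the file

print(text)
```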

We can also limit how many characters are returned by calling
file.read(n), where "n" is the maximum number of characters to read.<br/>

The cell below reads the first 7 characters of the file and returns them as a string.

In [2]:
file = open('Data/Basics_NLP/gaur.txt', 'r')

print(file.read(7))

Love is
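
Each call to read(n) continues from where the previous call stopped, and read() returns an empty string once the end of the file is reached. The sketch below shows this on a hypothetical temporary file containing the first line of gaur.txt. The file name and the variable names are illustrative.

```python
import os
import tempfile

# Hypothetical stand-in for gaur.txt.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('Love is everlasting Forgiveness.\n')

with open(path, 'r') as f:
    head = f.read(7)     # first 7 characters
    middle = f.read(12)  # the next 12 characters, not the first 12
    rest = f.read()      # everything that is left
    at_end = f.read()    # nothing left: empty string

print(repr(head), repr(middle), repr(rest), repr(at_end))
```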


## Reading multiple files from a folder

The function os.listdir(path) returns a list containing the names of the entries in the directory given by path. The list is in arbitrary order and does not include the special entries '.' and '..', even if they are present in the directory.

In [3]:
import os

for fileName in os.listdir("Data/Basics_NLP/dirForBasics_NLP"):
    if fileName.endswith(".txt"):
        print(fileName)

test2.txt
test1.txt
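
Because listdir() returns bare names in arbitrary order, a common pattern is to sort the names and join each one with the directory path before opening it. The sketch below assumes a hypothetical temporary folder with made-up files in place of dirForBasics_NLP.

```python
import os
import tempfile

# Hypothetical folder with two .txt files and one file that should be skipped.
folder = tempfile.mkdtemp()
for name, text in [('test1.txt', 'one'), ('test2.txt', 'two'), ('notes.md', 'skip me')]:
    with open(os.path.join(folder, name), 'w') as f:
        f.write(text)

contents = {}
for file_name in sorted(os.listdir(folder)):         # sort: listdir order is arbitrary
    if file_name.endswith('.txt'):
        full_path = os.path.join(folder, file_name)  # listdir returns names, not paths
        with open(full_path, 'r') as f:
            contents[file_name] = f.read()

print(contents)
```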


## Reading a file with size greater than RAM

If a file is larger than the available memory, reading it in a single call can exhaust RAM and make the system hang. Instead, load the file lazily by reading the data in chunks.

In [2]:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


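# Open in binary mode: a PDF is not plain text.
with open('Data/Basics_NLP/nitai.pdf', 'rb') as f:
    total_bytes = 0
    for piece in read_in_chunks(f):
        # Process each chunk here; as a placeholder, count its bytes.
        total_bytes += len(piece)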
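
To see what the generator yields, the sketch below runs the same read_in_chunks() function over a hypothetical 5000-byte temporary file and records the size of each chunk. The file `big.bin` and its contents are assumptions made for illustration.

```python
import os
import tempfile


def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


# Hypothetical 5000-byte file standing in for a very large one.
path = os.path.join(tempfile.mkdtemp(), 'big.bin')
with open(path, 'wb') as f:
    f.write(b'x' * 5000)

chunk_sizes = []
with open(path, 'rb') as f:
    for piece in read_in_chunks(f):
        chunk_sizes.append(len(piece))  # only one chunk is in memory at a time

print(chunk_sizes)
```

Only the final chunk is shorter than chunk_size, and memory use stays bounded by the chunk size regardless of the file size.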

## Lazy loading of a gzip file

In [3]:
import gzip
f = gzip.open('Data/Basics_NLP/errors.css.gz', 'rb')
file_content = f.read()  # note: read() loads the entire decompressed content at once
f.close()
# print(file_content)
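
A gzip file object can also be iterated line by line, which decompresses the data incrementally instead of loading everything with a single read(). Here is a minimal sketch, assuming a hypothetical temporary archive `demo.txt.gz` created just for this example:

```python
import gzip
import os
import tempfile

# Hypothetical small archive so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt.gz')
with gzip.open(path, 'wb') as f:
    f.write(b'line one\nline two\nline three\n')

lines = []
with gzip.open(path, 'rb') as f:
    for line in f:          # yields one decompressed line (as bytes) at a time
        lines.append(line)

print(len(lines))
```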

### Reading the Jupyter Way

In [4]:
a = !ls Data/Basics_NLP/dirForBasics_NLP
    
for files in a:
    f = open('Data/Basics_NLP/dirForBasics_NLP/'+files).read()
    print(f[:10])

void show_
void datat


### Practice

 1. How do you read a whole file into memory and split it into lines?
 2. How do you read a file into memory one line at a time?
 
 Hint : `splitlines() or split()`, `with .. as ..`

### ANSWERS

In [12]:
"""
1. To read a file into the memory and split it line by line
"""
file = open('Data/Basics_NLP/gaur.txt', 'r')

print(file.read().splitlines())

['Love is everlasting Forgiveness.', "Having free time is not an opulence, it's a danger.", 'Krishna is the Supreme Personality of Godhead. ']


In [19]:
"""
2. To read line by line to the memory
"""
with open('Data/Basics_NLP/gaur.txt', 'r') as f:
    for line in f:
        print(line)

Love is everlasting Forgiveness.

Having free time is not an opulence, it's a danger.

Krishna is the Supreme Personality of Godhead. 
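
The blank lines in the output above appear because each line read from the file still ends with its newline character. The sketch below compares splitlines() and split('\n') and shows one way to strip the trailing newline. It works on an in-memory string that stands in for the contents of gaur.txt, and the variable names are illustrative.

```python
# Stand-in for the text returned by file.read().
text = 'Love is everlasting Forgiveness.\nKrishna is the Supreme Personality of Godhead. \n'

by_splitlines = text.splitlines()   # drops line endings, no trailing empty item
by_split = text.split('\n')         # leaves an empty string after the final newline

# splitlines(True) keeps the line endings, like iterating over a file object.
stripped = [line.rstrip('\n') for line in text.splitlines(True)]

print(by_splitlines)
print(by_split)
print(stripped)
```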
