# File Handling

To open a file, use the built-in `open()` function with a statement of the following form:

`file_object = open(filename, mode)`

When you call `open()`, it returns a **file object**.
File objects provide methods and attributes that can be used to read from, write to, or otherwise manipulate the file you opened.

The `open()` function is most commonly called with two arguments:
- **filename** is the name of the file to be opened.
- **mode** describes how the file should be opened. The most commonly used modes are **read**, **write** and **append**. To open the file in
    - **read** mode, pass `'r'`,
    - **write** mode, pass `'w'`,
    - **append** mode, pass `'a'`.

For other modes, refer to the Python documentation: https://docs.python.org/3/tutorial/inputoutput.html
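
The three modes listed above can be compared side by side. The following is a minimal sketch, assuming Python 3; the temporary directory and the file name `modes_demo.txt` are made up for illustration and are not part of the original example.

```python
import os
import tempfile

# A scratch file in a temporary directory (hypothetical name, for illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")

# 'w' creates the file, or empties it if it already exists.
f = open(path, "w")
f.write("first line\n")
f.close()

# 'a' keeps the existing contents and adds new data at the end.
f = open(path, "a")
f.write("second line\n")
f.close()

# 'r' opens the file for reading only.
f = open(path, "r")
contents = f.read()
f.close()

print(contents)
```

Opening the same file with `'w'` a second time would discard both lines, which is why `'a'` is the safer choice when existing data must be kept.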

The following is an example:

In [27]:
fPtr = open('WhipHoverTweets.txt', 'r')
print(fPtr)

<open file 'WhipHoverTweets.txt', mode 'r' at 0x7f55b8138e40>


**Note:** The command `open('README.md', 'r')` doesn't return the contents of the file. It creates a file object (https://docs.python.org/3/glossary.html#term-file-object), through which the contents can then be read.
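
To make the idea of a file object more concrete, the sketch below inspects a few of its attributes. It assumes Python 3, and the file `example.txt` is a throwaway file created only for this illustration.

```python
import os
import tempfile

# Create a small throwaway file to open (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w")
f.write("hello\n")
f.close()

f = open(path, "r")
print(f.name)    # the path that was passed to open()
print(f.mode)    # the mode the file was opened with
was_closed_while_open = f.closed
print(was_closed_while_open)   # False: the file is still open here
f.close()
print(f.closed)  # True once close() has been called
```

The file object keeps track of its own state, so `f.closed` can be checked before attempting further reads.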

## Reading a file

The method `file.read()` reads the content of the file and returns a string containing all characters in the file.

**Note 1:** The file could be too large to fit in main memory. In such cases, one can specify the number of characters to read, or read one line at a time.

**Note 2:** It's always better to `close()` the file once you are done processing its contents.
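
Both notes can be addressed together. The sketch below, assuming Python 3, reads a file in fixed-size chunks instead of all at once, and uses a `with` statement so that the file is closed automatically. The file name and the chunk size of 16 characters are arbitrary choices made for this illustration.

```python
import os
import tempfile

# Create a small sample file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "big_file_demo.txt")
with open(path, "w") as f:
    f.write("0123456789" * 5)  # 50 characters of sample data

chunk_size = 16  # arbitrary value chosen for this illustration
chunks = []

# The with statement closes the file when the block ends, even if an error occurs.
with open(path, "r") as f:
    while True:
        chunk = f.read(chunk_size)
        if chunk == "":  # read() returns an empty string at end of file
            break
        chunks.append(chunk)

print(len(chunks))
print(f.closed)
```

In a real program each chunk would be processed and discarded rather than collected in a list; the list is kept here only so that the result can be inspected.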

In [28]:
fPtr = open('WhipHoverTweets.txt', 'r')

print(fPtr.read())

fPtr.close()

1500128720 Happy_Birthday_to_You Germanic_strong_verb Human_voice Middle_class 
1500132526 Politics Partisan_(political) 
1500215327 Happy_Birthday_to_You Friendship Chairman Manufacturing Caucus 
1500218055 Happy_Birthday_to_You Friendship Chairman Australian_Democrats Party_leaders_of_the_United_States_House_of_Representatives Task_force Poverty Income Economic_inequality Opportunity_(rover) 
1500227651 File_sharing Narrative Impact_event 
1500297327 History_of_the_United_States_Republican_Party History_of_the_United_States_Republican_Party Arab_Spring Impact_event United_States Family United_States Familie 
1500308034 Today_(U.S._TV_program) Floor United_States_House_of_Representatives United_States_House_of_Representatives Act_of_Parliament 
1500308105 United_States_federal_budget Complete_game 
1500309658 Happy_Birthday_to_You Friendship Family Social_vulnerability 
1500320032 United_States Patient_Protection_and_Affordable_Care_Act Film_adaptation Walter_Cronkite Insurance 



To read only a specific number of characters from a file, pass an **int** argument to the `read()` method as follows:

In [29]:
fPtr = open('WhipHoverTweets.txt', 'r')

print(fPtr.read(7))

fPtr.close()

1500128


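To read a file one line at a time, iterate over the file object. Each line keeps its trailing newline, so `strip()` is used here to remove it along with any other leading or trailing whitespace:

In [30]:
fPtr = open('WhipHoverTweets.txt', 'r')

for line in fPtr:
    print(line.strip())

fPtr.close()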

1500128720 Happy_Birthday_to_You Germanic_strong_verb Human_voice Middle_class
1500132526 Politics Partisan_(political)
1500215327 Happy_Birthday_to_You Friendship Chairman Manufacturing Caucus
1500218055 Happy_Birthday_to_You Friendship Chairman Australian_Democrats Party_leaders_of_the_United_States_House_of_Representatives Task_force Poverty Income Economic_inequality Opportunity_(rover)
1500227651 File_sharing Narrative Impact_event
1500297327 History_of_the_United_States_Republican_Party History_of_the_United_States_Republican_Party Arab_Spring Impact_event United_States Family United_States Familie
1500308034 Today_(U.S._TV_program) Floor United_States_House_of_Representatives United_States_House_of_Representatives Act_of_Parliament
1500308105 United_States_federal_budget Complete_game
1500309658 Happy_Birthday_to_You Friendship Family Social_vulnerability
1500320032 United_States Patient_Protection_and_Affordable_Care_Act Film_adaptation Walter_Cronkite Insurance


## Reading multiple files from a folder

The `os.listdir(path)` function returns a list of the names of the entries in the directory given by `path`.
The list is in arbitrary order and does not include the special entries **.** and **..**.
To use `listdir()`, first import the **os** module.

In [31]:
import os

for fileName in os.listdir("./"):
    # getting the list of text files only...
    if fileName.endswith(".txt"): 
        print(fileName)

WhipHoverTweets.txt
aguilarpeteTweets.txt
WhipHoverTweets_copy.txt


## Writing to a file

To write contents to a file, use the file object's `write()` method, which accepts a string as its argument.
Here is an example that copies the contents of one file to another, line by line:

In [32]:
fPtrWrite = open('WhipHoverTweets_copy.txt', 'w')
fPtrRead = open('WhipHoverTweets.txt', 'r')

for line in fPtrRead:
    fPtrWrite.write(line)
    
fPtrWrite.close()
fPtrRead.close()

# check if the contents are copied to the new file...
fPtrRead = open('WhipHoverTweets_copy.txt', 'r')

for line in fPtrRead:
    print(line.strip())
    
fPtrRead.close()

1500128720 Happy_Birthday_to_You Germanic_strong_verb Human_voice Middle_class
1500132526 Politics Partisan_(political)
1500215327 Happy_Birthday_to_You Friendship Chairman Manufacturing Caucus
1500218055 Happy_Birthday_to_You Friendship Chairman Australian_Democrats Party_leaders_of_the_United_States_House_of_Representatives Task_force Poverty Income Economic_inequality Opportunity_(rover)
1500227651 File_sharing Narrative Impact_event
1500297327 History_of_the_United_States_Republican_Party History_of_the_United_States_Republican_Party Arab_Spring Impact_event United_States Family United_States Familie
1500308034 Today_(U.S._TV_program) Floor United_States_House_of_Representatives United_States_House_of_Representatives Act_of_Parliament
1500308105 United_States_federal_budget Complete_game
1500309658 Happy_Birthday_to_You Friendship Family Social_vulnerability
1500320032 United_States Patient_Protection_and_Affordable_Care_Act Film_adaptation Walter_Cronkite Insurance
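
In the copy example, every line read from the source file already ends with a newline. When writing other data, it helps to remember that `write()` accepts only strings and does not add a newline by itself. The sketch below assumes Python 3; the file name and the sample data are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.txt")  # hypothetical file name
scores = [("alice", 90), ("bob", 85)]                  # made-up sample data

out = open(path, "w")
for name, score in scores:
    # write() needs a string: convert the number and add the newline explicitly.
    out.write(name + " " + str(score) + "\n")
out.close()

# Read the file back to check what was written.
inp = open(path, "r")
lines = inp.readlines()
inp.close()

print(lines)
```

Passing the integer `score` directly to `write()` would raise a `TypeError`, and leaving out `"\n"` would put both records on a single line.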


## split() method

`someString.split(' ')` is one of the most commonly used methods for processing the contents of a file.
The `split()` method accepts a **delimiter** string.
The following example **splits** each line on a **space**:

In [33]:
fPtr = open('WhipHoverTweets.txt', 'r')

for line in fPtr:
    fields = line.split(' ')
    print(fields)
    
fPtr.close()

['1500128720', 'Happy_Birthday_to_You', 'Germanic_strong_verb', 'Human_voice', 'Middle_class', '\n']
['1500132526', 'Politics', 'Partisan_(political)', '\n']
['1500215327', 'Happy_Birthday_to_You', 'Friendship', 'Chairman', 'Manufacturing', 'Caucus', '\n']
['1500218055', 'Happy_Birthday_to_You', 'Friendship', 'Chairman', 'Australian_Democrats', 'Party_leaders_of_the_United_States_House_of_Representatives', 'Task_force', 'Poverty', 'Income', 'Economic_inequality', 'Opportunity_(rover)', '\n']
['1500227651', 'File_sharing', 'Narrative', 'Impact_event', '\n']
['1500297327', 'History_of_the_United_States_Republican_Party', 'History_of_the_United_States_Republican_Party', 'Arab_Spring', 'Impact_event', 'United_States', 'Family', 'United_States', 'Familie', '\n']
['1500308034', 'Today_(U.S._TV_program)', 'Floor', 'United_States_House_of_Representatives', 'United_States_House_of_Representatives', 'Act_of_Parliament', '\n']
['1500308105', 'United_States_federal_budget', 'Complete_game', '\n']


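The `split()` method returns a **list** of the values obtained by splitting the line on the **space** delimiter. Because each line in this file ends with a space followed by a newline, the last element of every list is `'\n'`.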
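
The trailing `'\n'` element is usually unwanted. The sketch below shows two common ways to avoid it, using one line copied from the output above as a literal string. The variable names `first_field` and `remaining_fields` are hypothetical and chosen only for this illustration.

```python
# One line from the file, including its trailing space and newline.
line = "1500132526 Politics Partisan_(political) \n"

# Splitting on a single space keeps the newline as its own element.
raw_fields = line.split(' ')
print(raw_fields)

# Option 1: strip leading and trailing whitespace first, then split on a space.
fields = line.strip().split(' ')
print(fields)

# Option 2: call split() with no argument, which splits on any run of
# whitespace and never produces empty or whitespace-only elements.
fields_no_arg = line.split()
print(fields_no_arg)

# The cleaned list can then be unpacked by position.
first_field = fields[0]
remaining_fields = fields[1:]
```

Calling `split()` without an argument is often the more forgiving choice, since it also copes with tabs and repeated spaces between values.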