# Python File I/O

In this session, you'll learn about Python file methods, how to delete files, and how to work with common file types.

## Python File Methods



Here is the complete list of methods in text mode with a brief description:

| Method | Description |
|:----| :--- |
| **`close()`** |   Closes an opened file. It has no effect if the file is already closed.   | 
| **`detach()`** |   Separates the underlying binary buffer from the **`TextIOBase`** and returns it.   | 
| **`fileno()`** |   Returns an integer number (file descriptor) of the file.   | 
| **`flush()`** |   Flushes the write buffer of the file stream.   | 
| **`isatty()`** |   Returns **`True`** if the file stream is interactive.   | 
| **`read(n)`** |   Reads at most `n` characters from the file. Reads till end of file if it is negative or `None`.   | 
| **`readable()`** |   Returns **`True`** if the file stream can be read from.   | 
| **`readline(n=-1)`** |   Reads and returns one line from the file. Reads at most **`n`** characters if specified.   | 
| **`readlines(n=-1)`** |   Reads and returns a list of lines from the file. If **`n`** is specified, no more lines are read once the total size read so far exceeds **`n`** bytes/characters.   | 
| **`seek(offset, whence=SEEK_SET)`** |   Changes the file position to **`offset`**, relative to **`whence`** (start, current position, or end of the file).   | 
| **`seekable()`** |   Returns **`True`** if the file stream supports random access.   | 
| **`tell()`** |   Returns the current file location.   | 
| **`truncate(size=None)`** |   Resizes the file stream to **`size`** bytes. If **`size`** is not specified, resizes to the current location.   | 
| **`writable()`** |   Returns **`True`** if the file stream can be written to.   | 
| **`write(s)`** |   Writes the string **`s`** to the file and returns the number of characters written.   | 
| **`writelines(lines)`** |   Writes a list of **`lines`** to the file. Line separators are not added.   | 
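
As a rough illustration of several methods from the table, the sketch below writes a small file, reads it back in pieces, and truncates it. The file name `demo.txt` and the temporary directory are assumptions made only so the example cleans up after itself.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        written = f.write("first line\n")  # returns the number of characters
        f.writelines(["second line\n", "third line\n"])  # no separators added

    with open(path, "r+", encoding="utf-8") as f:
        first = f.readline()   # one line, including its newline
        position = f.tell()    # position just after the first line
        rest = f.readlines()   # the remaining lines as a list
        f.seek(0)              # go back to the start of the file
        head = f.read(5)       # at most 5 characters
        f.seek(position)       # return to the end of the first line
        f.truncate()           # drop everything after the current position

    # Reopen the file to see what is left after truncating.
    with open(path, encoding="utf-8") as f:
        remaining = f.read()

print(written)
print(repr(first))
print(rest)
print(head)
print(repr(remaining))
```

In text mode, the value returned by `tell()` is best treated as an opaque bookmark: passing it back to `seek()` is safe, while arbitrary offsets may land in the middle of a multi-byte character.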

## Deleting Files

To delete a file, use **`os.remove()`** from the **`os`** module. It raises **`FileNotFoundError`** if the file does not exist, so it is common to check with **`os.path.exists()`** first.



In [1]:
import os
os.remove('example.txt')

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'example.txt'

In [2]:
import os
if os.path.exists('./files/example.txt'):
    os.remove('./files/example.txt')
else:
    print('The file does not exist')

The file does not exist
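
An alternative to checking for the file first is to attempt the deletion and handle the error, which avoids the file disappearing between the check and the call. The following is a minimal sketch of that approach; the helper name `delete_file` and the temporary directory are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def delete_file(path):
    """Delete path and return True, or return False if it did not exist.

    Hypothetical helper, not part of the standard library.
    """
    try:
        Path(path).unlink()
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "example.txt"
    target.write_text("temporary content", encoding="utf-8")
    first_attempt = delete_file(target)   # the file exists, so it is removed
    second_attempt = delete_file(target)  # the file is already gone

print(first_attempt, second_attempt)
```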


## File Types

### File with txt Extension

A file with the **txt** extension is a very common form of data, and we covered it in the previous section. Let us move on to JSON files.

### File with json Extension

**JSON** stands for **J**ava**S**cript **O**bject **N**otation. It is a text format: a JSON document looks like a JavaScript object or a Python dictionary written out as a string.

In [3]:
# dictionary
person_dct = {
    "name":"Ajantha",
    "country":"India",
    "city":"Chennai",
    "skills":["Python", "MATLAB","R"]
}
# JSON: a string form of a dictionary (JSON requires double quotes around strings)
person_json = '{"name": "Ajantha", "country": "India", "city": "Chennai", "skills": ["Python", "MATLAB", "R"]}'

# we use triple quotes and split it over multiple lines to make it more readable
person_json = '''{
    "name":"Ajantha",
    "country":"India",
    "city":"Chennai",
    "skills":["Python", "MATLAB","R"]
}'''

### Changing JSON to Dictionary

To change a JSON string to a dictionary, first import the **`json`** module and then call its **`loads`** function.

In [4]:
import json
# JSON
person_json = '''{
    "name":"Ajantha",
    "country":"India",
    "city":"Chennai",
    "skills":["Python", "MATLAB","R"]
}'''
# let's change JSON to dictionary
person_dct = json.loads(person_json)
print(type(person_dct))
print(person_dct)
print(person_dct['name'])

<class 'dict'>
{'name': 'Ajantha', 'country': 'India', 'city': 'Chennai', 'skills': ['Python', 'MATLAB', 'R']}
Ajantha
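
When the input is not valid JSON, **`json.loads`** raises **`json.JSONDecodeError`**. A common cause is single quotes around strings, which Python accepts in a dictionary literal but JSON does not. The sketch below shows one way to handle this; the helper name `parse_person` is hypothetical.

```python
import json


def parse_person(text):
    """Return the parsed dictionary, or None if text is not valid JSON.

    Hypothetical helper used only for this illustration.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


valid = parse_person('{"name": "Ajantha", "skills": ["Python", "R"]}')
invalid = parse_person("{'name': 'Ajantha'}")  # single quotes are not valid JSON

print(valid)
print(invalid)
```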


### Changing Dictionary to JSON

To change a dictionary to a JSON string, we use the **`dumps`** function from the **`json`** module.

In [5]:
import json
# python dictionary
person = {
    "name":"Ajantha",
    "country":"India",
    "city":"Chennai",
    "skills":["Python", "MATLAB","R"]
}
# let's convert it to JSON
person_json = json.dumps(person, indent=4) # indent (commonly 2 or 4) pretty-prints the JSON
print(type(person_json))
print(person_json)

# when you print it, the quotes are not shown, but it is still a string
# JSON is not a separate Python type; json.dumps returns a str

<class 'str'>
{
    "name": "Ajantha",
    "country": "India",
    "city": "Chennai",
    "skills": [
        "Python",
        "MATLAB",
        "R"
    ]
}


### Saving as JSON File

To save a dictionary as a JSON file, open the file in write mode and pass the file object to **`json.dump`**.



In [6]:
import json
# python dictionary
person = {
    "name":"Ajantha",
    "country":"India",
    "city":"Chennai",
    "skills":["Python", "MATLAB","R"]
}
with open('json_example.json', 'w', encoding='utf-8') as f:
    json.dump(person, f, ensure_ascii=False, indent=4)
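
To read a saved JSON file back into a dictionary, **`json.load`** takes an open file object (as opposed to **`json.loads`**, which takes a string). The sketch below writes and then reloads the same data; it uses a temporary directory, which is an assumption made so the example does not touch your working directory.

```python
import json
import os
import tempfile

person = {
    "name": "Ajantha",
    "country": "India",
    "city": "Chennai",
    "skills": ["Python", "MATLAB", "R"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "json_example.json")

    # Save the dictionary as a JSON file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(person, f, ensure_ascii=False, indent=4)

    # Load the JSON file back into a Python dictionary.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(type(loaded))
print(loaded == person)
```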

### File with csv Extension

**CSV** stands for comma-separated values, a simple text format for tabular data. For example, create **csv_example.csv** in your working directory with the following contents:

```csv
"name","country","city","skills"
"Ajantha","India","Chennai","Python"
```

In [7]:
import csv
with open('csv_example.csv', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',') # we use the reader function to read the CSV file
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are :{", ".join(row)}')
            line_count += 1
        else:
            print(
                f'\t{row[0]} is a Researcher. She lives in {row[1]}, {row[2]}.')
            line_count += 1
    print(f'Number of lines:  {line_count}')

Column names are :name, country, city, skills
	Ajantha is a Researcher. She lives in India, Chennai.
Number of lines:  2
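
If you would rather access columns by name than by index, **`csv.DictReader`** uses the header row as keys. The sketch below is one possible variant of the cell above; it reads the same contents from an in-memory string (an assumption made so it runs without the file on disk).

```python
import csv
import io

# Same contents as csv_example.csv, kept in memory for this sketch.
csv_text = (
    '"name","country","city","skills"\n'
    '"Ajantha","India","Chennai","Python"\n'
)

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)  # each row maps column name -> value

print('Column names are:', ', '.join(reader.fieldnames))
for row in rows:
    print(f'{row["name"]} lives in {row["city"]}, {row["country"]}.')
```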


### File with xlsx Extension

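To read Excel files, we need to install the **`xlrd`** package. Note that recent versions of **`xlrd`** (2.0 and later) read only the older **.xls** format; **.xlsx** files need another package such as **`openpyxl`**. We will cover this after we cover package installation using **pip**.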

```py
import xlrd
excel_book = xlrd.open_workbook('sample.xls')
print(excel_book.nsheets)        # number of sheets in the workbook
print(excel_book.sheet_names())  # sheet_names is a method, so it must be called
```

### File with xml Extension

**XML** is another structured data format, and it looks like HTML. Unlike HTML, XML tags are not predefined; you choose your own. In the example below, saved as **xml_example.xml**, the first line is the XML declaration, the `person` tag is the root element, and it has a `gender` attribute.

```xml
<?xml version="1.0"?>
<person gender="female">
  <name>Ajantha</name>
  <country>India</country>
  <city>Chennai</city>
  <skills>
    <skill>AI</skill>
    <skill>ML</skill>
    <skill>Python</skill>
  </skills>
</person>
```

In [8]:
import xml.etree.ElementTree as ET
tree = ET.parse('xml_example.xml')
root = tree.getroot()
print('Root tag:', root.tag)
print('Attribute:', root.attrib)
for child in root:
    print('field: ', child.tag)

Root tag: person
Attribute: {'gender': 'female'}
field:  name
field:  country
field:  city
field:  skills
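
Beyond listing tag names, you will usually want the text inside the elements. The sketch below shows one way to do that with **`find`**, **`findall`**, and **`get`**; it parses the same XML from a string with **`ET.fromstring`** (an assumption made so it runs without the file on disk).

```python
import xml.etree.ElementTree as ET

# Same contents as the XML example above, kept in memory for this sketch.
xml_text = """<?xml version="1.0"?>
<person gender="female">
  <name>Ajantha</name>
  <country>India</country>
  <city>Chennai</city>
  <skills>
    <skill>AI</skill>
    <skill>ML</skill>
    <skill>Python</skill>
  </skills>
</person>"""

root = ET.fromstring(xml_text)

name = root.find('name').text    # text of the first <name> child
gender = root.get('gender')      # value of the gender attribute
skills = [skill.text for skill in root.findall('skills/skill')]

print(name, gender)
print(skills)
```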


## 💻 Exercises ➞ <span class='label label-default'>Files</span>

### Exercises ➞ <span class='label label-default'>Level 1</span>

1. Write a function that counts the number of lines and the number of words in a text.
  - a) Read the **[speech_barack_obama.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_barack_obama.txt)** file and count the number of lines and words
  - b) Read the **[speech_michelle_obama.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_michelle_obama.txt)** file and count the number of lines and words
  - c) Read the **[speech_donald_trump.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_donald_trump.txt)** file and count the number of lines and words
  - d) Read the **[speech_melina_trump.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_melina_trump.txt)** file and count the number of lines and words
    
2. Read the **[countries_data.json](https://github.com/milaan9/05_Python_Files/blob/main/countries_data.json)** data file in the data directory and create a function that finds the ten most spoken languages.

    - ```py
   # Your output should look like this:
   print(most_spoken_languages('./countries_data.json', 10))
   [(91, 'English'),
   (45, 'French'),
   (25, 'Arabic'),
   (24, 'Spanish'),
   (9, 'Russian'),
   (9, 'Portuguese'),
   (8, 'Dutch'),
   (7, 'German'),
   (5, 'Chinese'),
   (4, 'Swahili'),
   (4, 'Serbian')]
   # Your output should look like this:
   print(most_spoken_languages('./countries_data.json', 3))
   [(91, 'English'),
   (45, 'French'),
   (25, 'Arabic')]
   ```

3. Read the **[countries_data.json](https://github.com/milaan9/05_Python_Files/blob/main/countries_data.json)** data file and create a function that creates a list of the ten most populated countries

    - ```py
   # Your output should look like this:
   print(most_populated_countries('./countries_data.json', 10))
   [
   {'country': 'China', 'population': 1377422166},
   {'country': 'India', 'population': 1295210000},
   {'country': 'United States of America', 'population': 323947000},
   {'country': 'Indonesia', 'population': 258705000},
   {'country': 'Brazil', 'population': 206135893},
   {'country': 'Pakistan', 'population': 194125062},
   {'country': 'Nigeria', 'population': 186988000},
   {'country': 'Bangladesh', 'population': 161006790},
   {'country': 'Russian Federation', 'population': 146599183},
   {'country': 'Japan', 'population': 126960000}
   ]
   # Your output should look like this:
   print(most_populated_countries('./countries_data.json', 3))
   [
   {'country': 'China', 'population': 1377422166},
   {'country': 'India', 'population': 1295210000},
   {'country': 'United States of America', 'population': 323947000}
   ]
   ```


### Exercises ➞ <span class='label label-default'>Level 2</span>

1. Extract all incoming email addresses as a list from the **[email_exchanges_big.txt](https://github.com/milaan9/05_Python_Files/blob/main/email_exchanges_big.txt)** file.

2. Find the most common words in the English language. Name your function **`find_most_common_words`**; it takes two parameters, a string or a file and a positive integer indicating the number of words, and returns a list of tuples in descending order of frequency. Check the output:

    - ```py
    # Your output should look like this
    print(find_most_common_words('sample.txt', 10))
    [(10, 'the'),
    (8, 'be'),
    (6, 'to'),
    (6, 'of'),
    (5, 'and'),
    (4, 'a'),
    (4, 'in'),
    (3, 'that'),
    (2, 'have'),
    (2, 'I')]

    # Your output should look like this
    print(find_most_common_words('sample.txt', 5))

    [(10, 'the'),
    (8, 'be'),
    (6, 'to'),
    (6, 'of'),
    (5, 'and')]
    ```

3. Use the function **`find_most_common_words`** to find:
  - a) The ten most frequent words used in **[Barack Obama's Speech.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_barack_obama.txt)**
  - b) The ten most frequent words used in **[Michelle Obama's Speech.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_michelle_obama.txt)**
  - c) The ten most frequent words used in **[Donald Trump's Speech.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_donald_trump.txt)**
  - d) The ten most frequent words used in **[Melania Trump's Speech.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_melina_trump.txt)**
  
4. Write a Python application that checks the similarity between two texts. It takes files or strings as parameters and evaluates how similar the two texts are. For instance, check the similarity between the transcripts of **[Michelle Obama's Speech.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_michelle_obama.txt)** and **[Melania Trump's Speech.txt](https://github.com/milaan9/05_Python_Files/blob/main/speech_melina_trump.txt)**. You may need a few functions: one to clean the text (**`clean_text`**), one to remove support words (**`remove_support_words`**), and one to check the similarity (**`check_text_similarity`**). Use this list of **[support_words](https://github.com/milaan9/05_Python_Files/blob/main/support_words.py)**.

5. Find the 10 most repeated words in the **[romeo_and_juliet.txt](https://github.com/milaan9/05_Python_Files/blob/main/romeo_and_juliet.txt)**.

6. Read the **[hacker_news.csv](https://github.com/milaan9/05_Python_Files/blob/main/hacker_news.csv)** file and find out:
  - a) The number of lines containing **`python`** or **`Python`**
  - b) The number of lines containing **`JavaScript`**, **`javascript`** or **`Javascript`**
  - c) The number of lines containing **`Java`** and not **`JavaScript`**