## File Handling

So far we have seen different Python data types. We usually store our data in different file formats. In addition to handling files, this section also covers several file formats (.txt, .json, .xml, .csv, .tsv, .xlsx). First, let us get familiar with handling files using the most common format, .txt.

File handling is an important part of programming: it allows us to create, read, update and delete files. In Python, we handle files with the built-in _open()_ function.

```py
# Syntax
open('filename', mode) # mode is a string such as 'r', 'a', 'w' or 'x', optionally combined with 't' or 'b' (e.g. 'rb')
```

- "r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist
- "a" - Append - Opens a file for appending, creates the file if it does not exist
- "w" - Write - Opens a file for writing, creates the file if it does not exist and overwrites its content if it does
- "x" - Create - Creates the specified file; raises an error if the file already exists
- "t" - Text - Default value. Text mode
- "b" - Binary - Binary mode (e.g. images)
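
The modes above can be tried out safely in a temporary directory. The following is only a minimal sketch; the file name `demo.txt` is an assumption made for illustration:

```python
import os
import tempfile

# Work inside a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    # 'r' raises FileNotFoundError when the file does not exist
    try:
        open(path, 'r')
        read_missing_failed = False
    except FileNotFoundError:
        read_missing_failed = True

    # 'x' creates the file, and raises FileExistsError if it is already there
    open(path, 'x').close()
    try:
        open(path, 'x')
        create_existing_failed = False
    except FileExistsError:
        create_existing_failed = True

    # 'w' replaces the content, 'a' adds to the end
    with open(path, 'w') as f:
        f.write('first')
    with open(path, 'a') as f:
        f.write(' second')
    with open(path) as f:  # no mode given, so the default 'rt' is used
        content = f.read()

print(read_missing_failed, create_existing_failed, content)
```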

### Opening Files for Reading

The default mode of _open_ is reading, so we do not have to specify 'r' or 'rt'. I have created and saved a file named reading_file_example.txt in the files directory. Let us see how it is done:

```py
f = open('./files/reading_file_example.txt')
print(f) # <_io.TextIOWrapper name='./files/reading_file_example.txt' mode='r' encoding='UTF-8'>
```

As you can see in the example above, printing the opened file gives some information about it. An opened file has different reading methods: _read()_, _readline()_ and _readlines()_. An opened file has to be closed with the _close()_ method.

- _read()_: reads the whole text as a string. To limit the number of characters read, pass an int to the method: *read(number)*.

```py
f = open('./files/reading_file_example.txt')
txt = f.read()
print(type(txt))
print(txt)
f.close()
```

```sh
# output
<class 'str'>
This is an example to show how to open a file and read.
This is the second line of the text.
```

Instead of printing all the text, let us print the first 10 characters of the text file.

```py
f = open('./files/reading_file_example.txt')
txt = f.read(10)
print(type(txt))
print(txt)
f.close()
```

```sh
# output
<class 'str'>
This is an
```

- _readline()_: reads a single line. On a freshly opened file this is the first line; each further call returns the next line

```py
f = open('./files/reading_file_example.txt')
line = f.readline()
print(type(line))
print(line)
f.close()
```

```sh
# output
<class 'str'>
This is an example to show how to open a file and read.
```
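
Calling _readline()_ repeatedly walks through the file one line at a time, and looping over the file object does the same. Here is a small sketch of that behaviour; the file `two_lines.txt` and its content are made up for the example and written to a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'two_lines.txt')  # hypothetical file
    with open(path, 'w') as f:
        f.write('first line\nsecond line\n')

    with open(path) as f:
        first = f.readline()   # first call returns the first line
        second = f.readline()  # second call returns the next line
        at_end = f.readline()  # an empty string signals the end of the file

    # Iterating over the file object also yields one line at a time
    with open(path) as f:
        stripped = [line.rstrip('\n') for line in f]

print(repr(first), repr(second), repr(at_end))
print(stripped)
```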

- _readlines()_: reads all the text line by line and returns a list of lines

```py
f = open('./files/reading_file_example.txt')
lines = f.readlines()
print(type(lines))
print(lines)
f.close()
```

```sh
# output
<class 'list'>
['This is an example to show how to open a file and read.\n', 'This is the second line of the text.']
```

Another way to get all the lines as a list is using _splitlines()_:

```py
f = open('./files/reading_file_example.txt')
lines = f.read().splitlines()
print(type(lines))
print(lines)
f.close()
```

```sh
# output
<class 'list'>
['This is an example to show how to open a file and read.', 'This is the second line of the text.']
```

After we open a file, we should close it, and it is easy to forget to do so. Opening a file with the _with_ statement solves this: the file is closed automatically when the block ends. Let us rewrite the previous example using _with_:

```py
with open('./files/reading_file_example.txt') as f:
    lines = f.read().splitlines()
    print(type(lines))
    print(lines)
```

```sh
# output
<class 'list'>
['This is an example to show how to open a file and read.', 'This is the second line of the text.']
```
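
To convince ourselves that _with_ really closes the file, we can check the `closed` attribute of the file object. This is a rough sketch using a temporary file; the name `example.txt` and the simulated error are assumptions for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')  # hypothetical file

    with open(path, 'w') as f:
        f.write('some text')
        closed_inside = f.closed  # still open inside the block
    closed_after = f.closed       # closed as soon as the block ends

    # The file is closed even if an error happens inside the block
    try:
        with open(path) as g:
            raise ValueError('simulated failure')
    except ValueError:
        closed_after_error = g.closed

print(closed_inside, closed_after, closed_after_error)
```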

### Opening Files for Writing and Updating

To write to a file, we must pass a mode as the second argument to the _open()_ function:

- "a" - append - appends to the end of the file; if the file does not exist, it creates a new file.
- "w" - write - overwrites any existing content; if the file does not exist, it creates a new file.

Let us append some text to the file we have been reading:

```py
with open('./files/reading_file_example.txt','a') as f:
    f.write('This text has to be appended at the end')
```

Opening a file in 'w' mode creates a new file if the file does not exist (and overwrites it if it does):

```py
with open('./files/writing_file_example.txt','w') as f:
    f.write('This text will be written in a newly created file')
```
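
One detail that is easy to miss: _write()_ does not add a line break by itself, so appended text ends up on the same line unless we write `'\n'` explicitly. A short sketch, with a made-up file `notes.txt` in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')  # hypothetical file

    # Without '\n' the appended text is glued to the previous text
    with open(path, 'w') as f:
        f.write('line one')
    with open(path, 'a') as f:
        f.write('line two')
    with open(path) as f:
        glued = f.read()

    # With '\n' each write ends its own line
    with open(path, 'w') as f:
        f.write('line one\n')
    with open(path, 'a') as f:
        f.write('line two\n')
    with open(path) as f:
        separated = f.read().splitlines()

print(glued)
print(separated)
```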

### Deleting Files

In the previous section we saw how to make and remove a directory using the _os_ module. To remove a file, we use the _os_ module again.

```py
import os
os.remove('./files/example.txt')
```

If the file does not exist, the remove method will raise an error, so it is good to use a condition like this:

```py
import os
if os.path.exists('./files/example.txt'):
    os.remove('./files/example.txt')
else:
    print('The file does not exist')
```
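
Another common pattern is to simply try the removal and handle the error, instead of checking first. The helper `remove_if_exists` below is a hypothetical name introduced only for this sketch:

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete the file at path; return True if it was removed, False if it was not there."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')  # hypothetical file
    open(path, 'w').close()                      # create an empty file

    first_attempt = remove_if_exists(path)   # the file exists, so it is removed
    second_attempt = remove_if_exists(path)  # the file is already gone

print(first_attempt, second_attempt)
```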

## File Types

### File with txt Extension

A file with the _txt_ extension is a very common form of data, and we have covered it in the previous section. Let us move on to JSON files.

### File with json Extension

JSON stands for JavaScript Object Notation. In practice, it is a text (string) representation of data that looks much like a JavaScript object or a Python dictionary.

_Example:_

```py
# dictionary
person_dct = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
# JSON: a string form of a dictionary (JSON requires double quotes)
person_json = '{"name": "Asabeneh", "country": "Finland", "city": "Helsinki", "skills": ["JavaScript", "React", "Python"]}'

# we use three quotes and make it multiple lines to make it more readable
person_json = '''{
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}'''
```
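
Note that JSON requires double quotes around keys and strings, so the text we get from `str()` on a dictionary (which uses single quotes) is not valid JSON. A small sketch with the standard `json` module; the shortened dictionary is just for illustration:

```python
import json

person_dct = {"name": "Asabeneh", "city": "Helsinki"}

python_repr = str(person_dct)        # uses single quotes: not valid JSON
valid_json = json.dumps(person_dct)  # uses double quotes: valid JSON

# json.loads rejects the single-quoted text
try:
    json.loads(python_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(repr_is_json)
print(json.loads(valid_json) == person_dct)
```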

### Changing JSON to Dictionary

To change JSON to a dictionary, we first import the json module and then use its _loads_ function.

```py
import json
# JSON
person_json = '''{
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}'''
# let's change JSON to dictionary
person_dct = json.loads(person_json)
print(type(person_dct))
print(person_dct)
print(person_dct['name'])
```

```sh
# output
<class 'dict'>
{'name': 'Asabeneh', 'country': 'Finland', 'city': 'Helsinki', 'skills': ['JavaScript', 'React', 'Python']}
Asabeneh
```

### Changing Dictionary to JSON

To change a dictionary to JSON, we use the _dumps_ function from the json module.

```py
import json
# python dictionary
person = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
# let's convert it to JSON
person_json = json.dumps(person, indent=4) # indent sets the spaces per nesting level (e.g. 2 or 4) and makes the JSON easier to read
print(type(person_json))
print(person_json)
```

```sh
# output
# when you print it, the quotes are not shown, but it is actually a string
# JSON is not a separate Python type; the result of dumps is a str
<class 'str'>
{
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": [
        "JavaScript",
        "React",
        "Python"
    ]
}
```

### Saving as JSON File

We can also save our data as a JSON file. To write a JSON file we use the json.dump() function; it takes the dictionary and the output file object, plus optional arguments such as ensure_ascii and indent.

```py
import json
# python dictionary
person = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
with open('./files/json_example.json', 'w', encoding='utf-8') as f:
    json.dump(person, f, ensure_ascii=False, indent=4)
```

In the code above, we use encoding and indentation. Indentation makes the json file easy to read.
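
To read the file back we can use json.load(), the counterpart of json.dump(). The sketch below writes to a temporary directory rather than `./files`, and the city name `Jyväskylä` is just an assumed example of non-ASCII text to show what ensure_ascii changes:

```python
import json
import os
import tempfile

person = {"name": "Asabeneh", "city": "Helsinki", "skills": ["Python"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'json_example.json')

    # Save the dictionary as a JSON file
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(person, f, ensure_ascii=False, indent=4)

    # Load it back into a dictionary
    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)

# By default non-ASCII characters are escaped; ensure_ascii=False keeps them as they are
escaped = json.dumps({"city": "Jyväskylä"})
unescaped = json.dumps({"city": "Jyväskylä"}, ensure_ascii=False)

print(loaded == person)
print(escaped)
print(unescaped)
```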

### File with csv Extension

CSV stands for comma separated values. CSV is a simple file format used to store tabular data, such as a spreadsheet or database. CSV is a very common data format in data science.

**Example:**

```csv
"name","country","city","skills"
"Asabeneh","Finland","Helsinki","JavaScript"
```

**Example:**

```py
import csv
with open('./files/csv_example.csv') as f:
    csv_reader = csv.reader(f, delimiter=',') # we use the reader function to read csv
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are :{", ".join(row)}')
            line_count += 1
        else:
            print(
                f'\t{row[0]} is a teacher. He lives in {row[1]}, {row[2]}.')
            line_count += 1
    print(f'Number of lines:  {line_count}')
```

```sh
# output:
Column names are :name, country, city, skills
        Asabeneh is a teacher. He lives in Finland, Helsinki.
Number of lines:  2
```
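
If we prefer to access columns by name instead of by index, the csv module also provides `DictReader`, which uses the first row as keys. A minimal sketch, reading the same two lines from an in-memory string (via `io.StringIO`) rather than from a file:

```python
import csv
import io

csv_text = (
    '"name","country","city","skills"\n'
    '"Asabeneh","Finland","Helsinki","JavaScript"\n'
)

# io.StringIO lets us treat a string like an opened text file
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(f'{row["name"]} lives in {row["city"]}, {row["country"]}.')
print('Column names:', reader.fieldnames)
```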

### File with xlsx Extension

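To read Excel files we can use the _xlrd_ package, which has to be installed first; we will come back to this after covering package installation with pip. Note that xlrd version 2.0 and later reads only the older .xls format; for .xlsx files a package such as openpyxl is commonly used instead.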

```py
import xlrd
excel_book = xlrd.open_workbook('sample.xls')
print(excel_book.nsheets)
print(excel_book.sheet_names())
```

### File with xml Extension

XML is another structured data format which looks like HTML. In XML the tags are not predefined. In the example below, the first line is an XML declaration, the person tag is the root of the XML, and person has a gender attribute.

**Example: XML**

```xml
<?xml version="1.0"?>
<person gender="female">
  <name>Asabeneh</name>
  <country>Finland</country>
  <city>Helsinki</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>React</skill>
    <skill>Python</skill>
  </skills>
</person>
```

For more information on how to read an XML file, check the [documentation](https://docs.python.org/3/library/xml.etree.elementtree.html).

```py
import xml.etree.ElementTree as ET
tree = ET.parse('./files/xml_example.xml')
root = tree.getroot()
print('Root tag:', root.tag)
print('Attribute:', root.attrib)
for child in root:
    print('field: ', child.tag)
```

```sh
# output
Root tag: person
Attribute: {'gender': 'female'}
field:  name
field:  country
field:  city
field:  skills
```
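
Besides listing the tags, we usually want the values inside them. The following sketch parses the same XML from a string with `ET.fromstring` (instead of a file) and reads the text of nested elements using `find` and `findall`:

```python
import xml.etree.ElementTree as ET

xml_text = '''<?xml version="1.0"?>
<person gender="female">
  <name>Asabeneh</name>
  <country>Finland</country>
  <city>Helsinki</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>React</skill>
    <skill>Python</skill>
  </skills>
</person>'''

root = ET.fromstring(xml_text)  # root is the <person> element

name = root.find('name').text                               # text of the first <name> child
skills = [s.text for s in root.findall('./skills/skill')]   # text of every nested <skill>
gender = root.get('gender')                                 # value of the gender attribute

print(name, gender)
print(skills)
```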


In [1]:
open(r"E:\Besant Python\my_text.txt.txt", 'r')

<_io.TextIOWrapper name='E:\\Besant Python\\my_text.txt.txt' mode='r' encoding='cp1252'>

In [2]:
open(r"E:\Besant Python\my_text.txt.txt", 'a')

<_io.TextIOWrapper name='E:\\Besant Python\\my_text.txt.txt' mode='a' encoding='cp1252'>

In [3]:
open(r"E:\Besant Python\new_text.txt", 'a')

<_io.TextIOWrapper name='E:\\Besant Python\\new_text.txt' mode='a' encoding='cp1252'>

In [4]:
a = open(r"E:\Besant Python\my_text.txt.txt", 'r')

In [5]:
print(a)

<_io.TextIOWrapper name='E:\\Besant Python\\my_text.txt.txt' mode='r' encoding='cp1252'>


In [6]:
a.close()

In [7]:
f = open('read_open.txt')
txt = f.read(20)
print(type(txt))
print(txt)
f.close()

<class 'str'>
Hello, this is a dem


In [8]:
f = open('read_open.txt')
text = f.readline()
print(text)
f.close()

Hello, this is a demo file for file handling in Python.


In [14]:
a = open('read_open.txt')
text = a.read(10)
print(text)
a.close()

Hello, thi


In [13]:
print(open('read_open.txt').readline()) # Avoid this: the file object is never assigned, so it cannot be closed explicitly.

Hello, this is a demo file for file handling in Python.


In [16]:
print(open('read_open.txt').readlines()) # For multiple lines.

['Hello, this is a demo file for file handling in Python.\n', 'This is the 2nd line.\n', 'This is the third line.']


In [18]:
# Print using single line of code
print(open('read_open.txt').read().splitlines()[0])

Hello, this is a demo file for file handling in Python.


In [23]:
# Read wikipedia file
f = open(r"E:\Besant Python\Wikipedia.txt", 'r')
text = f.readlines()
print(text)
f.close()

["The Category consists of more than 200 images, thus divided to two pages. But not only the image list: the category list is also divided into two parts. The second part can be seen on the second page, including Astronomical object's category. With the custom sorting also make it more chaotic, it's impossible to find it for people who don't know the system and its logic well enough."]


In [25]:
# Popular syntax for reading
with open(r"E:\Besant Python\Wikipedia.txt", 'r') as f:
    text = f.readlines()
    print(text[0])

The Category consists of more than 200 images, thus divided to two pages. But not only the image list: the category list is also divided into two parts. The second part can be seen on the second page, including Astronomical object's category. With the custom sorting also make it more chaotic, it's impossible to find it for people who don't know the system and its logic well enough.


In [26]:
# Appending
with open('append.txt','a') as f:
    f.write('This is my sentence.')

In [27]:
import os # To get current working directory.

os.getcwd() # append.txt will be saved in this path.

'E:\\Besant Python'

In [29]:
# Appending - Adds line to existing lines.
with open(r"E:\Besant Python\Wikipedia.txt", 'a') as f:
    f.write('My name is Naveen.')

In [30]:
# Reading
with open(r"E:\Besant Python\Wikipedia.txt", 'r') as f:
    text = f.read().splitlines()
    print(text)

["The Category consists of more than 200 images, thus divided to two pages. But not only the image list: the category list is also divided into two parts. The second part can be seen on the second page, including Astronomical object's category. With the custom sorting also make it more chaotic, it's impossible to find it for people who don't know the system and its logic well enough.My name is Naveen.My name is Naveen."]


In [32]:
# Writing - Overwrites everything
with open(r"E:\Besant Python\Wikipedia.txt", 'w') as f:
    f.write('My name is Naveen.')

In [33]:
# Reading
with open(r"E:\Besant Python\Wikipedia.txt", 'r') as f:
    text = f.read().splitlines()
    print(text)

['My name is Naveen.']


In [34]:
# Creating
a = open('create.txt', 'x')
a.close()

In [35]:
import os
os.getcwd()

'E:\\Besant Python'

In [36]:
# Deleting
import os
os.remove('create.txt')

In [1]:
import pandas as pd

In [2]:
pd.read_csv(r"C:\Users\navi\Downloads\SampleData.csv")

Unnamed: 0,OrderDate,Region,Rep,Item,Units,Unit Cost,Total
0,1/6/21,East,Jones,Pencil,95,1.99,189.05
1,1/23/21,Central,Kivell,Binder,50,19.99,999.5
2,2/9/21,Central,Jardine,Pencil,36,4.99,179.64
3,2/26/21,Central,Gill,Pen,27,19.99,539.73
4,3/15/21,West,Sorvino,Pencil,56,2.99,167.44
5,4/1/21,East,Jones,Binder,60,4.99,299.4
6,4/18/21,Central,Andrews,Pencil,75,1.99,149.25
7,5/5/21,Central,Jardine,Pencil,90,4.99,449.1
8,5/22/21,West,Thompson,Pencil,32,1.99,63.68
9,6/8/21,East,Jones,Binder,60,8.99,539.4


In [25]:
js = '{"Name": "Naveen", "subject": "Python"}'

In [26]:
js

'{"Name": "Naveen", "subject": "Python"}'

In [27]:
import json

In [28]:
dict1 = json.loads(js)

In [30]:
dict1['Name']

'Naveen'

In [31]:
type(dict1)

dict

In [43]:
js = json.dumps(dict1)

In [42]:
print(json.dumps(dict1,indent = 3))

{
   "Name": "Naveen",
   "subject": "Python"
}


In [44]:
type(js)

str

In [46]:
dict1

{'Name': 'Naveen', 'subject': 'Python'}

In [49]:
with open('My_json.json', 'w') as f:
    json.dump(dict1, f)

In [None]:
import pandas as pd
df = pd.read_csv('path') # 'path' is a placeholder for the location of the CSV file

In [1]:
import xml.etree.ElementTree as ET
tree = ET.parse(r"E:\Besant Python\file.xml")
root = tree.getroot()
print('Root tag:', root.tag)
print('Attribute:', root.attrib)
for child in root:
    print('field: ', child.tag)

Root tag: catalog
Attribute: {}
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book
field:  book


In [3]:
# Open in binary mode; the with block closes the file when done
with open(r"E:\Besant Python\Image2.jpg", 'rb') as f:
    for i in f:
        print(i)

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00H\x00H\x00\x00\xff\xe2\x03\xbcICC_PROFILE\x00\x01\x01\x00\x00\x03\xacKCMS\x02\x10\x00\x00mntrRGB XYZ \x07\xce\x00\x0c\x00\x01\x00\x12\x00:\x00\x15acspMSFT\x00\x00\x00\x00KODAROMM\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf6\xd6\x00\x01\x00\x00\x00\x00\xd3+KODA\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0ccprt\x00\x00\x01\x14\x00\x00\x00Hdesc\x00\x00\x01\\\x00\x00\x00\x83wtpt\x00\x00\x01\xe0\x00\x00\x00\x14rTRC\x00\x00\x01\xf4\x00\x00\x00\x0egTRC\x00\x00\x01\xf4\x00\x00\x00\x0ebTRC\x00\x00\x01\xf4\x00\x00\x00\x0erXYZ\x00\x00\x02\x04\x00\x00\x00\x14gXYZ\x00\x00\x02\x18\x00\x00\x00\x14bXYZ\x00\x00\x02,\x00\x00\x00\x14dmnd\x00\x00\x02@\x00\x00\x00ndmdd\x00\x00\x02\xb0\x00\x00\x00\xd1mmod\x00\x00\x03\x84\x00\x00\x00(text\x00\x00\x00\x00Copyright (c) Eastman Kodak Company, 1999, all r
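
The output above starts with the bytes `\xff\xd8\xff`, which is how JPEG files begin. Instead of printing the whole file, we can read just the first few bytes in binary mode. This is only a sketch: `looks_like_jpeg` is a hypothetical helper, and the files are small fakes created in a temporary directory, not real images:

```python
import os
import tempfile

JPEG_MAGIC = b'\xff\xd8\xff'  # JPEG files start with these three bytes

def looks_like_jpeg(path):
    """Return True if the file at path starts with the JPEG signature."""
    with open(path, 'rb') as f:
        return f.read(3) == JPEG_MAGIC

with tempfile.TemporaryDirectory() as tmp_dir:
    # A fake file that only has a JPEG-like header
    fake_jpeg = os.path.join(tmp_dir, 'fake.jpg')
    with open(fake_jpeg, 'wb') as f:
        f.write(b'\xff\xd8\xff\xe0' + b'\x00' * 16)

    # A plain text file for comparison
    text_file = os.path.join(tmp_dir, 'plain.txt')
    with open(text_file, 'w') as f:
        f.write('plain text')

    jpeg_result = looks_like_jpeg(fake_jpeg)
    text_result = looks_like_jpeg(text_file)

print(jpeg_result, text_result)
```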

In [4]:
from PIL import Image

In [5]:
img = Image.open(r"E:\Besant Python\Image2.jpg")

In [6]:
print(img)

<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1000x1250 at 0x10A305E0CA0>


In [7]:
img.show()

In [5]:
import os # Changing working directory

In [6]:
os.getcwd()

'E:\\Datasets\\archive (4)'

In [7]:
os.chdir(r"E:\Datasets\archive (4)")

In [8]:
os.getcwd()

'E:\\Datasets\\archive (4)'