### Some Theory

##### Types of data used for I/O:
- Text - '12345' stored as a sequence of Unicode characters ('1', '2', '3', '4', '5')
- Binary - 12345 stored as the raw bytes of its binary value

##### Hence there are 2 file types to deal with
- Text files - program source files (.py), .txt, .csv, etc.
- Binary files - images, music, video, .exe files
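To see the difference concretely, here is a small sketch comparing the two representations of 12345. UTF-8 for the text form and a 2-byte big-endian integer for the binary form are assumptions made for illustration; other encodings and widths are possible.

```python
n = 12345

# Text: each digit character becomes one byte in UTF-8
text_bytes = str(n).encode('utf-8')
print(text_bytes, len(text_bytes))        # b'12345' 5

# Binary: the number's own bit pattern, packed into 2 big-endian bytes
bin_bytes = n.to_bytes(2, byteorder='big')
print(bin_bytes.hex(), len(bin_bytes))    # 3039 2

# Converting the binary form back gives the same integer
print(int.from_bytes(bin_bytes, byteorder='big'))  # 12345
```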

### How File I/O is done in most programming languages

- Open a file
- Read/Write data
- Close the file

### Writing to a file


In [25]:
# case 1 - if the file is not present, 'w' mode creates it

f = open('sample.txt','w')
f.write('hello world')
f.close()
# the file is now closed, so any further write raises ValueError
f.write('hello2')

ValueError: I/O operation on closed file.

In [1]:
# write a multi-line string: '\n' starts a new line
f = open('sample2.txt','w')
f.write('hello world')
f.write('\nhow are you')
f.close()

In [None]:
# case 2 - if the file is already present, 'w' erases its old content first
f = open('sample.txt','w')
f.write('salman khan')
f.close()

In [None]:
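# how exactly does open() work?
# open() asks the operating system for access to the file and returns a file object;
# data written through that object is buffered in memory and handed to the
# operating system on flush() or close()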
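A rough way to see what `open()` hands back: a file object (for text mode, an `io.TextIOWrapper`) whose writes sit in a buffer until `flush()` or `close()`. The sketch writes into a temporary directory, an assumption made so it does not touch `sample.txt`.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')
print(isinstance(f, io.TextIOWrapper), f.mode, f.closed)   # True w False

f.write('hello')          # stored in the object's buffer first
f.flush()                 # hand the buffered data to the OS now

with open(path) as g:     # a second file object can already see it
    content = g.read()
print(content)            # hello

f.close()
print(f.closed)           # True
```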


In [None]:
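# problem with 'w' mode: it erases whatever the file already contained
# introducing the append mode ('a'): existing content is kept and new data is written at the end

f = open('sample.txt','a')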
f.write('\nI am fine')
f.close()
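The difference between the two modes can be checked side by side. This sketch uses a throwaway file in a temporary directory (an assumption) and a small hypothetical helper `read_all`.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

def read_all(p):
    # hypothetical helper: return the whole content of a file
    with open(p) as f:
        return f.read()

with open(path, 'w') as f:
    f.write('first')
with open(path, 'w') as f:     # 'w' truncates: 'first' is gone
    f.write('second')
after_w = read_all(path)
print(after_w)                 # second

with open(path, 'a') as f:     # 'a' keeps existing content, writes at the end
    f.write(' third')
after_a = read_all(path)
print(after_a)                 # second third
```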

In [2]:
# writelines() writes each string of the list as-is; it does not add '\n' for you

L = ['hello\n','hi\n','how are you\n','I am fine']

f = open('sample2.txt','w')
f.writelines(L)
f.close()
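Because `writelines()` adds no separators, the newlines have to come from the strings themselves. A small sketch (temporary file path assumed) showing both cases:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
words = ['hello', 'hi', 'how are you']

with open(path, 'w') as f:
    f.writelines(words)                      # no separators added
with open(path) as f:
    glued = f.read()
print(repr(glued))                           # 'hellohihow are you'

with open(path, 'w') as f:
    f.writelines(w + '\n' for w in words)    # add the newline ourselves
with open(path) as f:
    separate = f.read()
print(repr(separate))                        # 'hello\nhi\nhow are you\n'
```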

In [None]:
# read() returns the whole file as a single string
f = open('sample2.txt','r')
s = f.read()
print(s)
f.close()

hello
hi
how are you
I am fine


In [None]:
# read(n) reads up to n characters ('\n' counts as one character)
f = open('sample2.txt','r')
s = f.read(10)
print(s)
f.close()

hello
hi
h


In [None]:
# readline() reads one line at a time; the line keeps its trailing '\n'
f = open('sample2.txt','r')
print(f.readline(),end='')
print(f.readline(),end=' ')
f.close()

hello
hi
 

In [None]:
# read the entire file with readline(); it returns '' once the end of the file is reached
f = open('sample2.txt','r')

while True:

    data = f.readline()

    if data=='':
        break
    else:
        print(data,end='')

f.close()

hello
hi
how are you
I am fine
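The `while True` / `readline()` loop above can be written more idiomatically: a file object is itself an iterator over its lines, and `readlines()` returns them all as a list. A sketch, assuming a temporary copy of the same four lines:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('hello\nhi\nhow are you\nI am fine')

# Iterating over the file yields one line at a time; no '' check needed
collected = []
with open(path) as f:
    for line in f:
        print(line, end='')
        collected.append(line)
print()   # finish the last line, which has no trailing '\n'

# readlines() returns all lines at once as a list (each keeps its '\n')
with open(path) as f:
    all_lines = f.readlines()
print(all_lines)   # ['hello\n', 'hi\n', 'how are you\n', 'I am fine']
```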

### Using Context Manager (With)

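- It's a good idea to close a file after use: closing flushes any buffered writes and frees the file handle held by the operating system
- If we don't close it, the file is only closed when the file object is garbage-collected, and we can't rely on when that happens
- The `with` statement closes the file automatically as soon as its block ends, even if an exception is raised inside it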
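Roughly speaking, `with` does the same job as a `try`/`finally` that calls `close()`. The sketch below (temporary file path assumed) shows that equivalence and checks that the file is closed even when the block raises.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

# Roughly what `with open(path, 'w') as f: f.write('hello')` does
f = open(path, 'w')
try:
    f.write('hello')
finally:
    f.close()          # runs whether or not write() raised

# The with statement closes the file even when the block raises
try:
    with open(path) as g:
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass
print(f.closed, g.closed)   # True True
```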


In [3]:
#with

with open('sample2.txt','a') as f :
    f.write('\nselmon bhai')

In [None]:
# read the file back to check the appended line
with open('sample2.txt','r') as f :
    print(f.read())

hello
hi
how are you
I am fine
selmon bhai


In [None]:
# moving within a file: each read(10) continues from where the previous read stopped
with open('sample2.txt','r') as f :
    print(f.read(10),end='')
    print(f.read(10))

hello
hi
how are you


In [None]:
# benefit? -> read(n) lets us process a big file in small chunks
# instead of loading the whole file into memory at once
big_L = ['helloworld' for i in range(20)]

print(len(big_L))
with open('big.txt','w') as f:
    f.writelines(big_L)


20


In [None]:
with open('big.txt','r') as f:
    # read 10 characters at a time; read() returns '' once the file is exhausted
    chunk_size = 10
    s = f.read(chunk_size)
    while len(s) > 0:
        print(s)
        s = f.read(chunk_size)

helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
helloworld
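The same chunked loop can be written more compactly with the assignment expression `:=` (Python 3.8+). This sketch wraps it in a hypothetical helper `read_in_chunks` that only counts chunks and characters, and uses a temporary copy of `big.txt` (an assumption).

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=10):
    """Return (number of chunks, total characters) without loading the whole file."""
    chunks = 0
    total = 0
    with open(path) as f:
        # := assigns and tests in one step; '' (end of file) ends the loop
        while chunk := f.read(chunk_size):
            chunks += 1
            total += len(chunk)
    return chunks, total

path = os.path.join(tempfile.mkdtemp(), 'big.txt')
with open(path, 'w') as f:
    f.write('helloworld' * 20)

print(read_in_chunks(path))   # (20, 200)
```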


In [None]:
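# seek(offset) moves the file pointer; tell() returns its current position
# in text mode the position is an opaque number, and only seek(0) or a value
# from tell() is reliable; the outputs below were produced on Windows, where each
# '\n' is stored as '\r\n' (2 bytes), so seek(12) lands on the 'o' of "how"
# and tell() reports 12 after reading 10 characters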

with open('sample2.txt','r') as f:
    f.seek(12)
    print(f.read(10))
    print(f.tell())
    f.seek(0)
    print(f.read(10))
    print(f.tell())

ow are you
22
hello
hi
h
12
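In binary mode, positions are plain byte offsets, so the numbers are predictable on every platform, and seeking relative to the end of the file is allowed. A sketch using a temporary file (an assumption) with the same opening text:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek.bin')
with open(path, 'wb') as f:
    f.write(b'hello\nhi\nhow are you')

with open(path, 'rb') as f:
    f.seek(9)                  # byte offset 9 is the 'h' of "how"
    word = f.read(3)
    pos = f.tell()
    f.seek(-3, os.SEEK_END)    # 3 bytes before the end (binary mode only)
    tail = f.read()

print(word, pos, tail)         # b'how' 12 b'you'
```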


In [None]:
# seek during write: after seek(0), 'x' overwrites the first character, so the file ends up as 'xello'
with open('sample.txt','w') as f:
    f.write('hello')
    f.seek(0)
    f.write('x')

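### Problems with working in text mode

- can't work with binary files like images
- not good for other data types like int/float/list/tuple: write() only accepts str, so they must be converted to strings first and come back as plain strings when read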

In [None]:
# working with a binary file: copy an image byte-for-byte using 'rb' and 'wb'
with open('ss.png','rb') as f :
    with open('ss1.png','wb') as wf:
        wf.write(f.read())
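In binary mode, `read()` returns `bytes` rather than `str`, and writing a `str` is rejected because no text encoding takes place. A sketch with a temporary file (an assumption) holding the 4-byte PNG signature prefix:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'tiny.bin')

# The first four bytes of every PNG file: 0x89 followed by b'PNG'
with open(path, 'wb') as f:
    f.write(bytes([0x89, 0x50, 0x4E, 0x47]))

with open(path, 'rb') as f:
    data = f.read()
print(type(data), data)   # <class 'bytes'> b'\x89PNG'

# Binary mode does no text encoding, so writing a str is an error
try:
    with open(path, 'wb') as f:
        f.write('hello')
except TypeError as e:
    print('TypeError:', e)
```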

In [None]:
# working with other data types: write() only accepts str

with open('sample.txt','a') as f:
    f.write(5)

TypeError: write() argument must be str, not int

In [None]:
# more complex data: convert it to str before writing

d = {
    'name' : 'nitish',
    'age' : 33,
    'gender':'male'
    }

with open('sample.txt','a') as f :
    f.write(str(d))
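Writing `str(d)` works, but the structure is lost on the way back: reading the file returns one long string, not a dict. A quick check using a temporary file (an assumption):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'dict.txt')
d = {'name': 'nitish', 'age': 33, 'gender': 'male'}

with open(path, 'w') as f:
    f.write(str(d))

with open(path) as f:
    loaded = f.read()

print(type(loaded))                    # <class 'str'>: the dict structure is lost
print(loaded == str(d), loaded == d)   # True False
```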

### Serialization and Deserialization

- Serialization - converting Python data types (list, dict, ...) into a format such as JSON that can be stored in a file or sent over a network
- Deserialization - converting JSON back into Python data types
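Both directions can also be done in memory: `json.dumps()` and `json.loads()` work with strings instead of files. The `data` dict here is a made-up example; note how `False` and `None` become `false` and `null`.

```python
import json

data = {'name': 'nitish', 'age': 33, 'skills': ['python', 'sql'], 'married': False, 'email': None}

text = json.dumps(data)   # Python dict -> JSON string
print(text)
# {"name": "nitish", "age": 33, "skills": ["python", "sql"], "married": false, "email": null}

back = json.loads(text)   # JSON string -> Python dict
print(back == data)       # True
```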

In [None]:
# serialization using JSON module
# list

import json

l = [1,2,3,4]

with open('demo.json','w') as f:
    json.dump(l,f)

In [None]:
# dict (indent=4 pretty-prints the JSON)
import json

d = {
    'name' : 'nitish',
    'age' : 33,
    'gender':'male'
    }

with open('demo.json','w') as f:
    json.dump(d,f,indent = 4)

In [None]:
# deserialization: json.load() turns the JSON back into a Python dict
import json

with open('demo.json','r') as f:
    d = json.load(f)
    print(d)
    print(type(d))

{'name': 'nitish', 'age': 33, 'gender': 'male'}
<class 'dict'>


In [None]:
# serialize a tuple: JSON has no tuple type, so it is stored as a JSON array
# and json.load() gives it back as a list
import json

t = (1,2,3,4,5)

with open('demo1.json','w') as f:
    json.dump(t,f)
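A round trip through JSON makes the tuple-to-list conversion visible; types with no JSON equivalent, such as `set`, are rejected outright. A minimal in-memory sketch:

```python
import json

t = (1, 2, 3, 4, 5)
back = json.loads(json.dumps(t))
print(back, type(back))      # [1, 2, 3, 4, 5] <class 'list'>
print(tuple(back) == t)      # True: convert back by hand if you need a tuple

# Types with no JSON equivalent, such as set, raise TypeError
try:
    json.dumps({1, 2, 3})
except TypeError as e:
    print('TypeError:', e)
```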

### Serializing and Deserializing custom objects

In [6]:
class Person :

    def __init__(self,fname,lname,age,gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

# target output format:
# Nitish Singh age -> 33 gender -> male

In [7]:
person = Person('Nitish','Singh',33,'male')

In [8]:
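# As a string
import json

def show_object(person):
    # json.dump() calls this for any object it cannot serialize by itself
    if isinstance(person,Person):
        return "{} {} age -> {} gender -> {}".format(person.fname,person.lname,person.age,person.gender)
    raise TypeError('Object of type {} is not JSON serializable'.format(type(person).__name__))

with open('demo2.json','w') as f:
    json.dump(person,f,default=show_object)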

In [33]:
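# As a dict
import json

def show_object(person):
    # return a dict, which json knows how to serialize
    if isinstance(person,Person):
        return {'name':person.fname + ' ' + person.lname,'age':person.age,'gender':person.gender}
    raise TypeError('Object of type {} is not JSON serializable'.format(type(person).__name__))

with open('demo.json','w') as f:
    json.dump(person,f,default=show_object,indent=4)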

In [34]:
# deserializing: the result is a plain dict, not a Person object
import json

with open('demo.json','r') as f:
  d = json.load(f)
  print(d)
  print(type(d))

{'name': 'Nitish Singh', 'age': 33, 'gender': 'male'}
<class 'dict'>
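To get a `Person` back instead of a dict, `json.loads()` accepts an `object_hook` that is called for every decoded JSON object. This is a sketch under assumptions: it stores `fname` and `lname` separately, and the `'__person__'` marker key plus the helpers `person_to_dict` and `dict_to_person` are hypothetical names chosen here.

```python
import json

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

def person_to_dict(obj):
    # called by json for objects it can't serialize; '__person__' is our own marker
    if isinstance(obj, Person):
        return {'__person__': True, 'fname': obj.fname, 'lname': obj.lname,
                'age': obj.age, 'gender': obj.gender}
    raise TypeError('Object of type {} is not JSON serializable'.format(type(obj).__name__))

def dict_to_person(d):
    # called by json.loads for every decoded JSON object (dict)
    if d.get('__person__'):
        return Person(d['fname'], d['lname'], d['age'], d['gender'])
    return d

text = json.dumps(Person('Nitish', 'Singh', 33, 'male'), default=person_to_dict)
p = json.loads(text, object_hook=dict_to_person)
print(type(p).__name__, p.fname, p.lname, p.age)   # Person Nitish Singh 33
```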


### Pickling
`Pickling` is the process whereby a Python object hierarchy is converted into a byte stream, and `unpickling` is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.


### Pickle Vs Json

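- Pickle lets the user store data in a binary format that only Python can read, and it can handle almost any Python object, including instances of your own classes. JSON lets the user store data in a human-readable text format that any language can read, but it only supports basic types (dict, list, str, int, float, bool, None).
- Only unpickle data you trust: loading a malicious pickle can execute arbitrary code.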
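A small in-memory comparison: `pickle.dumps()`/`pickle.loads()` round-trip a tuple and a set unchanged, while `json.dumps()` fails on the set. The `data` dict is a made-up example.

```python
import json
import pickle

data = {'point': (1, 2), 'tags': {'python', 'io'}}

blob = pickle.dumps(data)        # bytes, not human-readable text
restored = pickle.loads(blob)
print(type(blob).__name__)       # bytes
print(restored == data)          # True: the tuple and the set survive

try:
    json.dumps(data)             # JSON has no set type
except TypeError as e:
    print('json failed:', e)
```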

In [35]:
class Person:

  def __init__(self,name,age):
    self.name = name
    self.age = age

  def display_info(self):
    print('Hi my name is',self.name,'and I am ',self.age,'years old')

In [36]:
p = Person('nitish',33)

In [37]:
# pickle dump
import pickle
with open('person.pkl','wb') as f:
  pickle.dump(p,f)

In [38]:
# pickle load (the Person class must be defined before loading)
import pickle
with open('person.pkl','rb') as f:
  p = pickle.load(f)

p.display_info()

Hi my name is nitish and I am  33 years old


### HW

In [None]:
#1
def get_final_line(filename):
    # keep overwriting final_line while iterating; after the loop it holds the last line
    final_line = ""
    with open(filename,'r') as f:
        for line in f:
            final_line = line
    return final_line

print(get_final_line("sample2.txt"))

selmon bhai


In [15]:
# each line keeps its '\n' and print() adds another, hence the blank lines below
with open('sample2.txt','r') as f:
    for i in f:
        print(i)

hello

hi

how are you

I am fine

selmon bhai


In [17]:
#2
def vowels_count(filename):
    # count each lowercase vowel; uppercase letters (like 'I') are not counted
    counts = {v: 0 for v in 'aeiou'}
    with open(filename,'r') as f:
        for current_line in f:
            for ch in current_line:
                if ch in counts:
                    counts[ch] += 1
    return counts

vowels_count("sample2.txt")

{'a': 3, 'e': 4, 'i': 3, 'o': 4, 'u': 1}

In [27]:
# write five lines of the form "i<TAB>i+1"
with open("hw1.txt",'w') as f:
    for i in range(1,11,2):
        f.write("{}\t{}\n".format(i,i+1))

with open("hw1.txt",'r') as f :
    lines = f.read().splitlines()
    print(lines)

total = 0

# rewrite the file with the product of each pair appended, then the total
with open("hw1.txt",'w') as f:
    for line in lines:
        a,b = line.split("\t")
        res = int(a)*int(b)
        total += res
        f.write("{}\t{}\t{}\n".format(a,b,res))
    f.write("total\t"+str(total))

['1\t2', '3\t4', '5\t6', '7\t8', '9\t10']


In [40]:
def new_file(in_file,out_file):
    # write every line of in_file, reversed, into out_file
    with open(in_file,'r') as f1, open(out_file,'w') as f2:
        for line in f1.read().splitlines():
            f2.write(line[::-1]+"\n")

# ask the user for two lines, each made of two strings
with open("hw4.txt",'w') as f:
    for i in range(2):
        first = input("Enter first string: ")
        second = input("Enter second string: ")
        f.write(first+" "+second+"\n")

new_file("hw4.txt","hw4-1.txt")


In [2]:
strings = """Alice was beginning to get very tired of sitting by her sister
            on the bank, and of having nothing to do:  once or twice she had
            peeped into the book her sister was reading, but it had no
            pictures or conversations in it, `and what is the use of a book,'
            thought Alice `without pictures or conversation?'

            So she was considering in her own mind (as well as she could,
            for the hot day made her feel very sleepy and stupid), whether
            the pleasure of making a daisy-chain would be worth the trouble
            of getting up and picking the daisies, when suddenly a White
            Rabbit with pink eyes ran close by her.

            There was nothing so VERY remarkable in that; nor did Alice
            think it so VERY much out of the way to hear the Rabbit say to
            itself, `Oh dear!  Oh dear!  I shall be late!'  (when she thought
            it over afterwards, it occurred to her that she ought to have
            wondered at this, but at the time it all seemed quite natural);
            but when the Rabbit actually TOOK A WATCH OUT OF ITS WAISTCOAT-
            POCKET, and looked at it, and then hurried on, Alice started to
            her feet, for it flashed across her mind that she had never
            before seen a rabbit with either a waistcoat-pocket, or a watch to
            take out of it, and burning with curiosity, she ran across the
            field after it, and fortunately was just in time to see it pop
            down a large rabbit-hole under the hedge."""

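import pickle
import re

word_list = ['alice', 'wonder', 'natural']

# count every word: lowercase the text and extract runs of letters, so that
# punctuation ("natural);") and newlines ("alice\n") don't stick to the words
word_counts = {}
for word in re.findall(r"[a-z]+", strings.lower()):
    word_counts[word] = word_counts.get(word, 0) + 1

# save the counts with pickle, then load them back
with open("word_counthw.pkl",'wb') as f:
    pickle.dump(word_counts,f)

with open("word_counthw.pkl",'rb') as f:
    word_count = pickle.load(f)

# words that never appear get 0
for s in word_list:
    print(s, word_count.get(s, 0))


alice 4
wonder 0
natural 1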
