<h2>Python class init</h2>

Beginners learning Python soon come across the __init__ method, which they often don't fully understand at first.

In [1]:
# Understanding python class init function
# Let’s see a short code snippet and see what we’re trying to understand
class Student(object):

    def __init__(self, name):
        print("Init called.")
        self.name = name

    def method(self):
        return self.name 

my_object = Student('Harsha')
my_object.method()

Init called.


'Harsha'

<b>What does the python init method do?</b>

When a new instance of a Python class is created, the __init__ method is called automatically. It is the natural place to set up the object's initial state right after it has been created.

This means that __init__ runs whenever we create a new instance of the class, for example:

In [2]:
my_object = Student('Harsha')

Init called.


<b>Is __init__ the constructor?</b>

In practice, yes, although strictly speaking __init__ is an initializer: Python first creates the object (through __new__) and then immediately calls __init__ on it. __init__ therefore plays the role that a constructor plays in other object-oriented languages. Let's look again at the pattern we used in the snippet above:

In [3]:
def __init__(self, something):
    self.something = something

In [4]:
# Using self is important. If you leave it out and write the method like this:
def __init__(self, something):
    _something = something
# then the value is only stored in a local variable, which is discarded as soon as __init__ returns.
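
The difference can be checked directly. The following is a minimal sketch; the class names WithSelf and WithoutSelf are hypothetical and exist only for this illustration.

```python
class WithSelf:
    def __init__(self, something):
        # Stored on the instance, so it survives after __init__ returns.
        self.something = something


class WithoutSelf:
    def __init__(self, something):
        # Only a local name; it disappears when __init__ returns.
        _something = something


kept = WithSelf("Harsha")
lost = WithoutSelf("Harsha")

print(kept.something)                # Harsha
print(hasattr(lost, "something"))    # False
print(hasattr(lost, "_something"))   # False
```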


<b>How does __init__ work with inheritance?</b>

In [9]:
#When we have a class inheriting from a superclass, __init__ method works the same way. 
#Let us try to demonstrate what happens when we try to initialise a child class:
class User(object):
    def __init__(self, something):
        print("User Init called.")
        self.something = something

    def method(self):
        return self.something 

class Student(User):
    def __init__(self, something):
        User.__init__(self, something)
        print("Student Init called.")
        self.something = something

    def method(self):
        return self.something 

my_object = Student('Harsha')
my_object.method()

User Init called.
Student Init called.


'Harsha'

In [None]:
# In the code above, creating the Student object produced the output shown.
# The parent class's __init__ ran before the rest of the child's __init__, because Student.__init__ calls User.__init__ first.
# You can control this order by changing where the child's __init__ calls the parent's __init__.
# To summarise, Python's __init__ plays the role of what is called a constructor in other OOP languages such as C++ and Java.
# The basic idea is that it is a special method which is called automatically when an object of that class is created.
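
In Python 3 the parent initializer is more commonly reached through super() than by naming the parent class explicitly. The sketch below assumes Python 3; the calls attribute is hypothetical and is only there to record the order in which the initializers run.

```python
class User:
    def __init__(self, something):
        self.calls = ["User"]          # record that the parent initializer ran
        self.something = something


class Student(User):
    def __init__(self, something):
        super().__init__(something)    # run the parent's __init__ first
        self.calls.append("Student")   # then continue with the child's own setup


my_object = Student("Harsha")
print(my_object.calls)       # ['User', 'Student']
print(my_object.something)   # Harsha
```

Moving the super().__init__() call below the append would reverse the recorded order, which is what "controlling the order" amounts to.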

In [None]:
<h2>Packages and Modules with PIP</h2>

<h3>Python Modules</h3>

A module is a file containing Python statements and definitions.

A file containing Python code, for example example.py, is called a module, and its module name is example.

We use modules to break down large programs into small manageable and organized files. Furthermore, modules provide reusability of code.

We can define our most used functions in a module and import it, instead of copying their definitions into different programs.

# Python Module example
def add(a, b):
   """This function adds two
   numbers and returns the result"""
   result = a + b
   return result

1. Create a file called example.py in the current folder with the above definition.
2. Now we can import this module into our Python notebook.

import example
example.add(4,5.5)

Python has hundreds of standard modules available. You can import them in the same way.

<h3>Ways to import python module</h3>

# import statement example
# to import standard module math

import math
print("The value of pi is", math.pi)

# Import with renaming
import math as m
print("The value of pi is", m.pi)

#Python from...import statement
from math import pi
print("The value of pi is", pi)

#Import all names
from math import *
print("The value of pi is", pi)

<h3>The dir() built-in function</h3>

We can use the dir() function to find out names that are defined inside a module

dir(math)

import math
math.__name__

dir()

import sys
sys.path

import example
example.__file__

import re
re.__file__

<h2>Python Package</h2>

Suppose you have developed a very large application that includes many modules. As the number of modules grows, it becomes difficult to keep track of them all if they are dumped into one location. This is particularly so if they have similar names or functionality. You might wish for a means of grouping and organizing them.

Packages allow for a hierarchical structuring of the module namespace using dot notation. In the same way that modules help avoid collisions between global variable names, packages help avoid collisions between module names.

Creating a package is quite straightforward, since it makes use of the operating system’s inherent hierarchical file structure. Consider the following arrangement:

![pkg1.9af1c7aea48f.png](attachment:pkg1.9af1c7aea48f.png)

#mod1.py
def foo():
    print('[mod1] foo()')

class Foo:
    pass

#mod2.py
def bar():
    print('[mod2] bar()')

class Bar:
    pass

import pkg.mod1, pkg.mod2
from pkg.mod1 import foo

from pkg.sub_pkg1.mod1 import foo

![image.png](attachment:image.png)

import pkg.sub_pkg1.mod1

from pkg.sub_pkg1 import mod2

from pkg.sub_pkg2.mod3 import baz
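
To try the package mechanism without creating files by hand, a throwaway package can be built in a temporary directory. This is only a sketch: the package name demo_pkg is hypothetical (chosen so it does not clash with the pkg example above), and the module returns its message instead of printing it so the result is easy to check.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/__init__.py and demo_pkg/mod1.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
    f.write("")  # an empty __init__.py marks the directory as a regular package

with open(os.path.join(pkg_dir, "mod1.py"), "w", encoding="utf-8") as f:
    f.write("def foo():\n    return '[mod1] foo()'\n")

# Make the parent directory importable, then import using dot notation.
sys.path.insert(0, root)
importlib.invalidate_caches()
from demo_pkg.mod1 import foo

result = foo()
print(result)  # [mod1] foo()
```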

<h2>Install Packages with PIP</h2>

#Install a package from PyPI:
pip install SomePackage
#Install a package that’s already been downloaded from PyPI or obtained from elsewhere. This is useful if the target machine does not have a network connection:
pip install SomePackage-1.0-py2.py3-none-any.whl
#Show what files were installed:
pip show --files SomePackage
#List what packages are outdated:
pip list --outdated
#Upgrade a package:
pip install --upgrade SomePackage
#Uninstall a package:
pip uninstall SomePackage

<h2>Exceptions in Python</h2>

Python has many built-in exceptions that force your program to output an error when something in it goes wrong.

When an exception occurs, the current function stops and the exception is passed up to the function that called it, and so on, until it is handled. If it is never handled, the program crashes.

For example, suppose function A calls function B, which in turn calls function C, and an exception occurs in function C. If it is not handled in C, the exception passes to B and then to A.

If it is never handled, an error message is printed and the program comes to a sudden, unexpected halt.
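
The A, B, C chain described above can be sketched as follows. The function names are hypothetical and simply mirror the letters used in the text.

```python
def function_c():
    # The error starts here and is not handled in C.
    return 1 / 0


def function_b():
    return function_c()   # not handled in B either


def function_a():
    try:
        return function_b()
    except ZeroDivisionError as err:
        # The exception travelled C -> B -> A before being handled.
        return "handled in A: " + type(err).__name__


message = function_a()
print(message)  # handled in A: ZeroDivisionError
```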

<h2>Catching Exceptions in Python</h2>

# import module sys to get the type of exception
import sys

randomList = ['a', 0, 2]

for entry in randomList:
    try:
        print("The entry is", entry)
        r = 1/int(entry)
        break
    except:
        print("Oops!",sys.exc_info()[0],"occurred.")
        print("Next entry.")
        print()
print("The reciprocal of",entry,"is",r)

#If no exception occurs, the except block is skipped and normal flow continues. But if any exception occurs,

#it is caught by the except block.

#Here, we print the type of the exception using the exc_info() function from the sys module and move on to the next entry.
#We can see that 'a' causes a ValueError and 0 causes a ZeroDivisionError.

<h2>Catching Specific Exceptions in Python</h2>

#Catching every exception with a bare except clause, as in the previous example, is not good programming practice, because it handles every case in the same way.
#We can specify which exceptions an except clause will catch.

#A try clause can have any number of except clauses to handle different exceptions differently, but only one of them will be executed
#when an exception occurs.

#We can use a tuple of values to specify multiple exceptions in an except clause. Here is an example in pseudocode.

try:
   # do something
   pass
except ValueError:
   # handle ValueError exception
   pass
except (TypeError, ZeroDivisionError):
   # handle multiple exceptions
   # TypeError and ZeroDivisionError
   pass
except:
   # handle all other exceptions
   pass
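
As a concrete version of the pseudocode, here is a small sketch that applies specific except clauses to the reciprocal calculation used earlier. The helper name reciprocal and the returned messages are hypothetical.

```python
def reciprocal(entry):
    """Return 1/int(entry), or a short message describing what went wrong."""
    try:
        return 1 / int(entry)
    except ValueError:
        # int('a') cannot be parsed as a number
        return "not a number"
    except (TypeError, ZeroDivisionError) as err:
        # int(None) raises TypeError, 1/0 raises ZeroDivisionError
        return "cannot divide: " + type(err).__name__


results = [reciprocal(entry) for entry in ["a", 0, 2, None]]
print(results)
# ['not a number', 'cannot divide: ZeroDivisionError', 0.5, 'cannot divide: TypeError']
```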

<h2>Raising Exceptions</h2>

#In Python, exceptions are raised when the corresponding errors occur at run time,
#but we can also raise them deliberately using the keyword raise.

#We can also optionally pass a value to the exception to clarify why it was raised.

>>> raise KeyboardInterrupt
Traceback (most recent call last):
...
KeyboardInterrupt
>>> raise MemoryError("This is an argument")
Traceback (most recent call last):
...
MemoryError: This is an argument
>>> try:
...     a = int(input("Enter a positive integer: "))
...     if a <= 0:
...         raise ValueError("That is not a positive number!")
... except ValueError as ve:
...     print(ve)
...
Enter a positive integer: -2
That is not a positive number!

<h2>try and finally</h2>

The try statement in Python can have an optional finally clause. This clause is executed no matter what, and is generally used to release external resources.

For example, we may be connected to a remote data center through the network or working with a file or working with a Graphical User Interface (GUI).

In all these circumstances, we must clean up the resource once used, whether it was successful or not. These actions (closing a file, GUI or disconnecting from network) are performed in the finally clause to guarantee execution.

# open the file before the try block, so that f is always defined when finally runs
f = open("test.txt", encoding='utf-8')
try:
   # perform file operations
   pass
finally:
   f.close()
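
The sketch below shows that the finally clause runs even when an exception is raised in the try block. It writes to a temporary directory so that it can be run anywhere; the events list and the RuntimeError are hypothetical and only there to make the order of execution visible.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
events = []

f = open(path, "w", encoding="utf-8")
try:
    f.write("Hello All\n")
    raise RuntimeError("something went wrong while writing")
except RuntimeError:
    events.append("error handled")
finally:
    f.close()                      # runs whether or not an exception occurred
    events.append("file closed")

print(events)     # ['error handled', 'file closed']
print(f.closed)   # True
```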

<h2>Python User Defined Exceptions</h2>

Python has many built-in exceptions that force your program to output an error when something in it goes wrong.

However, sometimes you may need to create custom exceptions that serve your purpose.

In Python, users can define such exceptions by creating a new class. This exception class has to be derived, either directly or indirectly, from the Exception class. Most of the built-in exceptions are also derived from this class.

>>> class CustomError(Exception):
...     pass
...
>>> raise CustomError
Traceback (most recent call last):
...
__main__.CustomError
>>> raise CustomError("An error occurred")
Traceback (most recent call last):
...
__main__.CustomError: An error occurred

Here, we have created a user-defined exception called CustomError which is derived from the Exception class. This new exception can be raised, like other exceptions, using the raise statement with an optional error message.

When we are developing a large Python program, it is a good practice to place all the user-defined exceptions that our program raises in a separate file. Many standard modules do this. They define their exceptions separately as exceptions.py or errors.py (generally but not always).

A user-defined exception class can do everything a normal class can, but we generally keep such classes simple and concise. Most implementations declare a custom base class and derive the other exception classes from it. This concept is made clearer in the following example.

<h2>User-Defined Exception in Python</h2>

# define Python user-defined exceptions
class Error(Exception):
   """Base class for other exceptions"""
   pass
class ValueTooSmallError(Error):
   """Raised when the input value is too small"""
   pass
class ValueTooLargeError(Error):
   """Raised when the input value is too large"""
   pass
# our main program
# user guesses a number until he/she gets it right
# you need to guess this number
number = 10
while True:
   try:
       i_num = int(input("Enter a number: "))
       if i_num < number:
           raise ValueTooSmallError
       elif i_num > number:
           raise ValueTooLargeError
       break
   except ValueTooSmallError:
       print("This value is too small, try again!")
       print()
   except ValueTooLargeError:
       print("This value is too large, try again!")
       print()
print("Congratulations! You guessed it correctly.")

# File Operations 

<h3>What is a file?</h3>

<b>A file is a collection of information or data stored on a computer's storage device. You already know about different kinds of files, such as music files, video files and text files. Python gives you easy ways to manipulate them. Generally we divide files into two categories: text files and binary files. Text files contain plain, human-readable text, whereas binary files contain raw bytes that are meant to be interpreted by a program rather than read directly.</b>

<h3>End of Line Character in a text file</h3>

<b>Each line of a text file is terminated with a special character, called the EOL or End of Line character. There are several conventions, but the most common is the newline character (\n); Windows text files typically use a carriage return followed by a newline (\r\n). It ends the current line and signals that a new one has begun.

The backslash is used as an escape character in Python string literals: it tells the interpreter that the character following it has a special meaning, so \n stands for a new line. This is useful when you want to put a line break into a string without breaking the line in the code itself.</b>

<b>Ex: Harsha.txt, config.py, param.xml, data.csv</b>
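
End-of-line characters are normally invisible, so a short sketch can help. It uses repr() to display them and str.splitlines() to split text on them; the sample strings are made up for the illustration.

```python
text = "Hello All\nGood Morning\n"

# repr() makes the otherwise invisible newline characters visible.
print(repr(text))             # 'Hello All\nGood Morning\n'

# splitlines() understands both Unix (\n) and Windows (\r\n) line endings.
unix_lines = text.splitlines()
windows_lines = "Hello All\r\nGood Morning\r\n".splitlines()
print(unix_lines)                    # ['Hello All', 'Good Morning']
print(unix_lines == windows_lines)   # True
```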

<h3>Directories and file management in python</h3>

<b>A directory is simply a folder, a location that can hold files and subfolders.
To access folders and files in Python, we use the os module.</b>
<br>

<b>The os module has different functions for handling both directories and files in Python.</b>

<h3>Create, delete and rename directories in Python using the os module. Navigating directories in Python</h3>

In [3]:
import os
#get current working directory
os.getcwd()

'D:\\Data_Science\\Batch1_Lessons'

'D:\\Data_Science\\Batch1_Lessons' --> Each '\\' is an escape sequence that stands for a single backslash in the file path. To display the path with single backslashes, print it using the print() function.
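
A short sketch of how backslashes in Windows paths can be written in Python source. The path is the one from the output above; nothing is read from disk, so this runs on any platform.

```python
escaped = "D:\\Data_Science\\Batch1_Lessons"   # each \\ in the source is one backslash
raw = r"D:\Data_Science\Batch1_Lessons"        # raw string: backslashes are taken literally

print(escaped == raw)   # True
print(escaped)          # D:\Data_Science\Batch1_Lessons
print(repr(escaped))    # 'D:\\Data_Science\\Batch1_Lessons'
print(len("\\"))        # 1
```

Raw strings are often the more readable choice for Windows paths, since an unescaped sequence such as "\t" inside a normal string would silently become a tab character.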

In [4]:
print(os.getcwd())

D:\Data_Science\Batch1_Lessons


In [11]:
#Finding list of files in the directory 
os.listdir("D:\\Work")
#it lists all the files and folders in the given directory (here D:\Work)

['Ashwinv_status_Harsha.pptx',
 'GE_Ship_Building',
 'Henkel_Video_Analytics',
 'L&T_Ship_Building_IPMS',
 'LNT_Data_Analytics',
 'PDF_Text_Classification',
 'Predective_Maintenance',
 'Video_Analytics']

In [4]:
#Finding list of files in a specific directory
os.listdir("D:\\Data_Science")

['Anaconda',
 'Analytics_Vidya_Hackathon',
 'Analytics_Vidya_Hackathon.zip',
 'Assignments_DonorsChoose_2018',
 'Assignments_DonorsChoose_2018-20190618T112223Z-002.zip',
 'Batch1_Lessons',
 'Blindness_Detection',
 'DS_Materials',
 'DS_Task',
 'DS_Task_Mohan',
 'Machine_Hack_Hackathon',
 'Twitter_Sentiment_Anaysis']

In [5]:
#Change the current working directory
os.chdir('D:\\Data_Science')
print(os.getcwd())

D:\Data_Science


In [14]:
os.chdir('D:\\Data_Science\\Batch1_Lessons')
print(os.getcwd())

D:\Data_Science\Batch1_Lessons


In [381]:
#Create a directory
os.mkdir('test')
#by default, if we don't specify the fully qualified path, the folder is created in the current working directory

In [6]:
#Create a directory under specific folder
os.mkdir('D:\\Data_Science\\DS_Task\\test1')

In [20]:
os.listdir()

['.ipynb_checkpoints',
 'example.py',
 'img1.PNG',
 'img2.PNG',
 'test',
 'Topics_16_hrs.xlsx',
 'Untitled.ipynb',
 'Week2.ipynb',
 'Week_1.ipynb',
 '__pycache__',
 '~$Topics_16_hrs.xlsx']

In [384]:
#Rename a folder
os.rename('test','test_changed')
os.listdir()

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'test' -> 'test_changed'

In [386]:
os.listdir("D:\\Data_Science\\Batch1_Lessons\\test_changed")

['abc.txt']

In [387]:
#Delete a folder, file 
os.rmdir('test_changed')
#You can't delete a non-empty directory with rmdir; delete the files inside it first and then delete the folder

OSError: [WinError 145] The directory is not empty: 'test_changed'

In [388]:
os.remove('D:\\Data_Science\\Batch1_Lessons\\test_changed\\abc.txt')

In [389]:
#Delete a folder, file 
#os.chmod('D:\\Data_Science\\Batch1_Lessons\\test_changed', 0o777)
os.rmdir('D:\\Data_Science\\Batch1_Lessons\\test_changed')

In [390]:
os.listdir("D:\\Data_Science\\Batch1_Lessons")

['.ipynb_checkpoints',
 '100 Sales Records.csv',
 'example.py',
 'img1.PNG',
 'img2.PNG',
 'ML OCT Agenda.pdf',
 'my_file.txt',
 'test.txt',
 'test1.txt',
 'Topics_16_hrs.xlsx',
 'Untitled.ipynb',
 'Week2.ipynb',
 'Week_1.ipynb',
 '__pycache__',
 '~$Topics_16_hrs.xlsx']

In [52]:
#To remove a directory together with all the files inside it, we can use the shutil module
import shutil
shutil.rmtree('test_changed')

In [53]:
os.listdir()

['.ipynb_checkpoints',
 'example.py',
 'img1.PNG',
 'img2.PNG',
 'Topics_16_hrs.xlsx',
 'Untitled.ipynb',
 'Week2.ipynb',
 'Week_1.ipynb',
 '__pycache__',
 '~$Topics_16_hrs.xlsx']
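
The whole create, rename and delete sequence can also be tried safely in a temporary directory. This is a sketch under that assumption: the scratch location comes from tempfile.mkdtemp() rather than the D: drive used above, and the variable names are hypothetical.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()              # scratch directory, safe to experiment in

os.mkdir(os.path.join(base, "test"))
os.rename(os.path.join(base, "test"), os.path.join(base, "test_changed"))

# Put one file inside so the directory is no longer empty.
with open(os.path.join(base, "test_changed", "abc.txt"), "w", encoding="utf-8") as f:
    f.write("hello\n")

try:
    os.rmdir(os.path.join(base, "test_changed"))
    rmdir_failed = False
except OSError:
    rmdir_failed = True                # rmdir refuses to delete a non-empty directory

shutil.rmtree(os.path.join(base, "test_changed"))   # removes the directory and its contents
remaining = os.listdir(base)

print(rmdir_failed)   # True
print(remaining)      # []

os.rmdir(base)        # clean up the now-empty scratch directory
```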

<h2>How to read data from csv</h2>

<b>What Is a CSV File?</b>

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value.

#For example , this is the table data
mydt

#You want to represent above table data in a csv format as below

ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,HCL,78

#In general, the separator character is called a delimiter, and the comma is not the only one used. 
#Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters. 
#Properly parsing a CSV file requires us to know which delimiter is being used.

<h3>Using Built-in CSV Library to parse CSV files</h3>

import csv
with open('100 Sales Records.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'''Region:{row[0]}--Country:{row[1]}--Item Type:{row[2]}--Order Date:{row[5]}--Total Profit:{row[12]} \n''')
            line_count += 1
    print(f'Processed {line_count} lines.')

#Loading csv data into a dictionary
import csv
with open('100 Sales Records.csv', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'Region:{row["Region"]}--Country:{row["Country"]}--Item Type:{row["Item Type"]}--Order Date:{row["Order Date"]}--Total Profit:{row["Total Profit"]} \n')
        line_count += 1
    print(f'Processed {line_count} lines.')

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

-->delimiter specifies the character used to separate each field. The default is the comma (',').

-->quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote (' " ').

-->escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.
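
A small sketch of the delimiter and quotechar parameters in action. The semicolon-separated sample data is made up for the illustration, and io.StringIO stands in for a file on disk.

```python
import csv
import io

# Hypothetical semicolon-delimited data; the quoted field contains the delimiter.
data = 'ID;first_name;company\n11;David;"Aon; London"\n12;Jamie;TCS\n'

reader = csv.reader(io.StringIO(data), delimiter=";", quotechar='"')
rows = list(reader)

print(rows[0])   # ['ID', 'first_name', 'company']
print(rows[1])   # ['11', 'David', 'Aon; London']
```

Because the field "Aon; London" is wrapped in the quote character, the semicolon inside it is treated as data rather than as a separator.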

import pandas
df = pandas.read_csv('100 Sales Records.csv')
df.head(10)

#pandas treats the first line as the header
#By default pandas creates a zero-based index unless an index column is specified
#Date columns have to be handled specially in pandas; otherwise they are read as strings
import pandas
df = pandas.read_csv('100 Sales Records.csv', index_col='Region')
df.head(10)


df.columns

#Handling date columns in pandas
import pandas
df = pandas.read_csv('100 Sales Records.csv', index_col='Region', parse_dates=['Order Date'])
df.head()

#Giving custom header names
import pandas
df = pandas.read_csv('100 Sales Records.csv', 
            index_col='Region', parse_dates=['Order_Date','Ship_Date'], 
            header=0, 
            names=['Region','Country', 'Item_Type', 'Channel', 'Order_Priority', 'Order_Date',
       'ID', 'Ship_Date', 'Units_Sold', 'Unit_Price', 'Unit_Cost',
       'Total_Revenue', 'Cost', 'Profit'])
df.head()

<h2>File operations in python</h2>

In [None]:
#Opening a file
#Syntax : file_object = open("filename", "mode", encoding='utf-8')
#filename : Name of the file to open.
#file_object : open() returns a file object, which has built-in methods for file handling.
#mode   :   It tells the interpreter and the developer how the file will be used.
#encoding : The default encoding is platform dependent. On Windows it is typically 'cp1252', while on Linux it is typically 'utf-8'.
#Modes of opening a file in Python
# 'r'  : Open a file for reading. (default)
# 'w'  : Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
# 'x'  : Open a file for exclusive creation. If the file already exists, the operation fails.
# 'a'  : Open for appending at the end of the file without truncating it. Creates a new file if it does not exist.
# 't'  : Open in text mode. (default)
# 'b'  : Open in binary mode.
# 'r+'  :  Open a file for both reading and writing. The file must already exist.
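
The behaviour of the 'x', 'a' and 'w' modes can be compared in a few lines. This is a sketch that works in a temporary directory; the file name modes.txt and the variable names are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "x", encoding="utf-8") as f:   # 'x' creates the file; it must not exist yet
    f.write("first\n")

try:
    open(path, "x", encoding="utf-8")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True                    # a second 'x' fails: the file already exists

with open(path, "a", encoding="utf-8") as f:   # 'a' keeps the content and adds to the end
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

with open(path, "w", encoding="utf-8") as f:   # 'w' truncates the file before writing
    f.write("third\n")

with open(path, encoding="utf-8") as f:        # the mode defaults to 'r'
    after_write = f.read()

print(exclusive_failed)      # True
print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'third\n'
```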

<h3>Create a text file</h3>

In [69]:
file = open('test.txt','w') 
 
file.write('Hello All\n') 
file.write('Welcome to Verzeo online machine learning internship program\n') 
file.write('This course will help u in learning ML\n') 
file.write('Happy learning!\n') 
file.close() 

In [8]:
#Another way of writing a file 
with open("test1.txt",'w',encoding = 'utf-8') as f:
   f.write("Hello All\n")
   f.write("Good Morning\n\n")
   f.write("Welcome to the session\n")

#This program will create a new file named 'test1.txt' if it does not exist. If it does exist, it is overwritten.
#The advantage of using 'with' is that you do not need to close the file explicitly.

In [393]:
#Reading/opening a file
#The default mode is read-only, i.e. if you do not provide a mode, the file is opened for reading only
file = open('test.txt', 'r') 
print(file.read())


Hello All
Welcome to Verzeo online machine learning internship program
This course will help u in learning ML
Happy learning!
New line 1

New line 2

New line 1

New line 2




In [None]:
#A file operation takes place in the following order.
#Open a file
#Read or write (perform operation)
#Close the file

#An opened file should always be closed. Otherwise it keeps holding system resources, and if the same thing happens for many files,
#they keep adding up and the program could eventually crash

In [417]:
file.close()

In [9]:
#Reading individual lines/ reading with size
file = open("test1.txt",'r',encoding = 'utf-8')

In [10]:
# read the first 3 characters
file.read(3)
#once the whole file has been read, read() returns an empty string

'Hel'

In [11]:
#Tell and Seek
file.tell() # tell() returns the current position of the cursor in the file.

3

In [12]:
file.read(5)

'lo Al'

In [13]:
file.tell()

8

In [14]:
#seek() moves the cursor to the required position; 0 is the start of the file
file.seek(0)

0

In [15]:
print(file.read(5))
file.tell()

Hello


5

In [427]:
file.seek(5)
print(file.read(5))

 All
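
The read, tell and seek steps above can be repeated on a file whose size is known exactly. This sketch opens the file in binary mode so that positions are plain byte counts on every platform; the temporary path and the variable names are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "wb") as f:              # binary mode: bytes are written exactly as given
    f.write(b"Hello All\nGood Morning\n")

with open(path, "rb") as f:
    first = f.read(3)        # b'Hel'
    pos_after_3 = f.tell()   # 3
    second = f.read(5)       # b'lo Al'
    pos_after_8 = f.tell()   # 8
    f.seek(0)                # back to the start of the file
    again = f.read(5)        # b'Hello'
    f.seek(0, os.SEEK_END)   # jump to the end of the file
    size = f.tell()          # 23
    at_end = f.read()        # b'' because nothing is left to read

print(first, pos_after_3, second, pos_after_8, again, size, at_end)
```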



In [429]:
#Reading lines in a loop
file1 = open("test1.txt",'r',encoding = 'utf-8')
for x in file1:
    print(x, end='')
file1.close()

Hello All
Good Morning

Welcome to the session


In [430]:
with open('test1.txt') as f1:
    for line in f1:
        print(line)

Hello All

Good Morning



Welcome to the session



In [16]:
#Readline, Readlines
file = open('test1.txt',encoding = 'utf-8')

In [19]:
#readline() reads one line per call until the file ends; if we specify a size, it reads at most that many characters of the line
print(file.readline())
print(file.tell())



27


In [444]:
#readlines() returns all the remaining lines as a list; if all lines have already been read, it returns an empty list
file.readlines()

['Hello All\n', 'Good Morning\n', '\n', 'Welcome to the session\n']

In [33]:
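#Append data to a file (in text mode "\n" alone is enough; on Windows, writing "\r\n" leaves an extra blank line after each line when the file is read back)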
f=open("test.txt", "a+")
for i in range(2):
     f.write("New line %d\r\n" % (i+1))
f.close()

In [446]:
with open('test.txt') as f1:
    for line in f1:
        print(line)

Hello All

Welcome to Verzeo online machine learning internship program

This course will help u in learning ML

Happy learning!

New line 1



New line 2



New line 1



New line 2



New line 1



New line 2
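A sketch of appending without the extra blank lines, assuming a scratch file in a temporary directory rather than the real `test.txt`. It writes a plain `"\n"` and lets text mode translate it to the platform's line ending.

```python
import os
import tempfile

# Hypothetical scratch file standing in for test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello All\n")

# Mode "a" appends to the end; existing content is kept.
with open(path, "a", encoding="utf-8") as f:
    for i in range(2):
        f.write(f"New line {i + 1}\n")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```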





In [447]:
def main():
    f = open("my_file.txt", "w+")
    for i in range(10):
        f.write("This is line %d\r\n" % (i+1))
    f.close()
    #Open the file again and read the contents
    f = open("my_file.txt", "r")
    if f.mode == 'r':
        contents = f.read()
        print(contents)
    #readlines() reads the remaining lines into a list. read() above has already
    #reached the end of the file, so the list is empty here and nothing more is printed;
    #call f.seek(0) first to read the lines again.
    fl = f.readlines()
    for x in fl:
        print(x)
    f.close()
if __name__ == "__main__":
    main()

This is line 1

This is line 2

This is line 3

This is line 4

This is line 5

This is line 6

This is line 7

This is line 8

This is line 9

This is line 10




In [None]:
#List of file functions
# close()          : Close an open file. It has no effect if the file is already closed.
# detach()         : Separate the underlying binary buffer from the TextIOBase and return it.
# fileno()         : Return an integer number (file descriptor) of the file.
# flush()          : Flush the write buffer of the file stream.
# isatty()         : Return True if the file stream is interactive.
# read(n)          : Read at most n characters from the file. Reads till end of file if n is negative or None.
# readable()       : Returns True if the file stream can be read from.
# readline(n=-1)   : Read and return one line from the file. Reads at most n characters if specified.
# readlines(n=-1)  : Read and return a list of lines from the file. Stops once about n bytes/characters have been read, if specified.
# seek(offset,whence=SEEK_SET) : Change the file position to offset, relative to whence (start, current, end).
# seekable()       : Returns True if the file stream supports random access.
# tell()           : Returns the current file location.
# truncate(size=None) : Resize the file stream to size bytes. If size is not specified, resize to current location.
# writable()       : Returns True if the file stream can be written to.
# write(s)         : Write string s to the file and return the number of characters written.
# writelines(lines) : Write a list of lines to the file.
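A few of the methods listed above, shown together in a short sketch. The file name `methods.txt` and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "methods.txt")

with open(path, "w", encoding="utf-8") as f:
    # A file opened with "w" can be written and repositioned, but not read.
    write_caps = (f.readable(), f.writable(), f.seekable())
    # writelines() does not add newlines; each string must carry its own.
    f.writelines(["one\n", "two\n", "three\n"])
    f.flush()                 # push buffered data to the operating system
    is_terminal = f.isatty()  # a regular file is not an interactive terminal

with open(path, "r", encoding="utf-8") as f:
    read_caps = (f.readable(), f.writable())
    lines = f.readlines()

print(write_caps, read_caps, is_terminal)
print(lines)
```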

<h2>How to read data from csv</h2>

<b>What Is a CSV File?</b>

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value.

In [448]:
#For example, consider this table data
mydt

,ID,first_name,company,salary
0,11,David,Aon,74
1,12,Jamie,TCS,76
2,13,Steve,Google,96
3,14,Stevart,RBS,71
4,15,John,HCL,78


In [None]:
#The same table written in CSV format looks like this

ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,HCL,78

#In general, the separator character is called a delimiter, and the comma is not the only one used. 
#Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters. 
#Properly parsing a CSV file requires us to know which delimiter is being used.
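To show why the delimiter matters, the sketch below parses a tab-separated version of the table above twice: once with the correct delimiter and once with the default comma. The text is held in `io.StringIO` instead of a file, which is an assumption made to keep the example self-contained.

```python
import csv
import io

# The sample table, separated by tabs instead of commas.
tab_text = (
    "ID\tfirst_name\tcompany\tsalary\n"
    "11\tDavid\tAon\t74\n"
    "12\tJamie\tTCS\t76\n"
)

# Correct delimiter: each row is split into four fields.
rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))

# Default delimiter (comma): each row comes back as one unsplit field.
wrong = list(csv.reader(io.StringIO(tab_text)))

print(rows[1])
print(wrong[1])
```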

<h3>Using Built-in CSV Library to parse CSV files</h3>

In [205]:
import csv
with open('100 Sales Records.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            #Total Profit is the 14th column (index 13); index 12 is Total Cost
            print(f'''Region:{row[0]}--Country:{row[1]}--Item Type:{row[2]}--Order Date:{row[5]}--Total Profit:{row[13]} \n''')
            line_count += 1
    print(f'Processed {line_count} lines.')

Column names are Region, Country, Item Type, Sales Channel, Order Priority, Order Date, Order ID, Ship Date, Units Sold, Unit Price, Unit Cost, Total Revenue, Total Cost, Total Profit
Region:Australia and Oceania--Country:Tuvalu--Item Type:Baby Food--Order Date:5/28/2010--Total Profit:951410.50 

Region:Central America and the Caribbean--Country:Grenada--Item Type:Cereal--Order Date:8/22/2012--Total Profit:248406.36 

Region:Europe--Country:Russia--Item Type:Office Supplies--Order Date:5/2/2014--Total Profit:224598.75 

Region:Sub-Saharan Africa--Country:Sao Tome and Principe--Item Type:Fruits--Order Date:6/20/2014--Total Profit:19525.82 

Region:Sub-Saharan Africa--Country:Rwanda--Item Type:Office Supplies--Order Date:2/1/2013--Total Profit:639077.50 

Region:Australia and Oceania--Country:Solomon Islands--Item Type:Baby Food--Order Date:2/4/2015--Total Profit:285087.64 

Region:Sub-Saharan Africa--Country:Angola--Item Type:Household--Order Date:4/23/2011--Total Profit:693911.51 



In [210]:
#Reading each CSV row as a dictionary keyed by column name
import csv
with open('100 Sales Records.csv', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'Region:{row["Region"]}--Country:{row["Country"]}--Item Type:{row["Item Type"]}--Order Date:{row["Order Date"]}--Total Profit:{row["Total Profit"]} \n')
        line_count += 1
    print(f'Processed {line_count} lines.')

Column names are Region, Country, Item Type, Sales Channel, Order Priority, Order Date, Order ID, Ship Date, Units Sold, Unit Price, Unit Cost, Total Revenue, Total Cost, Total Profit
Region:Australia and Oceania--Country:Tuvalu--Item Type:Baby Food--Order Date:5/28/2010--Total Profit:951410.50 

Region:Central America and the Caribbean--Country:Grenada--Item Type:Cereal--Order Date:8/22/2012--Total Profit:248406.36 

Region:Europe--Country:Russia--Item Type:Office Supplies--Order Date:5/2/2014--Total Profit:224598.75 

Region:Sub-Saharan Africa--Country:Sao Tome and Principe--Item Type:Fruits--Order Date:6/20/2014--Total Profit:19525.82 

Region:Sub-Saharan Africa--Country:Rwanda--Item Type:Office Supplies--Order Date:2/1/2013--Total Profit:639077.50 

Region:Australia and Oceania--Country:Solomon Islands--Item Type:Baby Food--Order Date:2/4/2015--Total Profit:285087.64 

Region:Sub-Saharan Africa--Country:Angola--Item Type:Household--Order Date:4/23/2011--Total Profit:693911.51 

Reg
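`csv.DictReader` returns every value as a string, so numeric columns need an explicit conversion before any arithmetic. A minimal sketch, assuming a cut-down two-row sample with only three of the columns, held in `io.StringIO` rather than read from the sales file:

```python
import csv
import io

# Hypothetical reduced sample: three columns, two data rows.
sample = (
    "Region,Country,Total Profit\n"
    "Australia and Oceania,Tuvalu,951410.50\n"
    "Central America and the Caribbean,Grenada,248406.36\n"
)

reader = csv.DictReader(io.StringIO(sample))
records = list(reader)           # each record maps column name -> string value

# Convert the profit column to float before summing it.
total_profit = sum(float(r["Total Profit"]) for r in records)

print(reader.fieldnames)
print(records[0]["Country"], type(records[0]["Total Profit"]).__name__)
print(round(total_profit, 2))
```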

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

-->delimiter specifies the character used to separate each field. The default is the comma (',').

-->quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote (' " ').

-->escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.
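The effect of `quotechar` and `escapechar` can be seen in a short sketch. The sample rows below are invented for illustration: the first input protects an embedded comma with quotes, the second with a backslash escape.

```python
import csv
import io

# A field containing the delimiter, protected by the quote character.
quoted = 'name,company\n"Smith, John",Aon\n'
rows_quoted = list(csv.reader(io.StringIO(quoted), delimiter=",", quotechar='"'))

# The same field without quotes; a backslash escapes the embedded comma.
escaped = "name,company\nSmith\\, John,Aon\n"
rows_escaped = list(
    csv.reader(io.StringIO(escaped), quoting=csv.QUOTE_NONE, escapechar="\\")
)

print(rows_quoted[1])
print(rows_escaped[1])
```

In both cases the comma inside the name is kept as data instead of being treated as a field separator.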

In [213]:
import pandas
df = pandas.read_csv('100 Sales Records.csv')
df.head(10)

,Region,Country,Item Type,Sales Channel,Order Priority,Order Date,Order ID,Ship Date,Units Sold,Unit Price,Unit Cost,Total Revenue,Total Cost,Total Profit
0,Australia and Oceania,Tuvalu,Baby Food,Offline,H,5/28/2010,669165933,6/27/2010,9925,255.28,159.42,2533654.0,1582243.5,951410.5
1,Central America and the Caribbean,Grenada,Cereal,Online,C,8/22/2012,963881480,9/15/2012,2804,205.7,117.11,576782.8,328376.44,248406.36
2,Europe,Russia,Office Supplies,Offline,L,5/2/2014,341417157,5/8/2014,1779,651.21,524.96,1158502.59,933903.84,224598.75
3,Sub-Saharan Africa,Sao Tome and Principe,Fruits,Online,C,6/20/2014,514321792,7/5/2014,8102,9.33,6.92,75591.66,56065.84,19525.82
4,Sub-Saharan Africa,Rwanda,Office Supplies,Offline,L,2/1/2013,115456712,2/6/2013,5062,651.21,524.96,3296425.02,2657347.52,639077.5
5,Australia and Oceania,Solomon Islands,Baby Food,Online,C,2/4/2015,547995746,2/21/2015,2974,255.28,159.42,759202.72,474115.08,285087.64
6,Sub-Saharan Africa,Angola,Household,Offline,M,4/23/2011,135425221,4/27/2011,4187,668.27,502.54,2798046.49,2104134.98,693911.51
7,Sub-Saharan Africa,Burkina Faso,Vegetables,Online,H,7/17/2012,871543967,7/27/2012,8082,154.06,90.93,1245112.92,734896.26,510216.66
8,Sub-Saharan Africa,Republic of the Congo,Personal Care,Offline,M,7/14/2015,770463311,8/25/2015,6070,81.73,56.67,496101.1,343986.9,152114.2
9,Sub-Saharan Africa,Senegal,Cereal,Online,H,4/18/2014,616607081,5/30/2014,6593,205.7,117.11,1356180.1,772106.23,584073.87


In [216]:
#pandas treats the first line as the header by default
#pandas creates a zero-based integer index unless a column is specified with index_col
#Date columns must be handled explicitly (see parse_dates); otherwise pandas reads them as strings
import pandas
df = pandas.read_csv('100 Sales Records.csv', index_col='Region')
df.head(10)


Region,Country,Item Type,Sales Channel,Order Priority,Order Date,Order ID,Ship Date,Units Sold,Unit Price,Unit Cost,Total Revenue,Total Cost,Total Profit
Australia and Oceania,Tuvalu,Baby Food,Offline,H,5/28/2010,669165933,6/27/2010,9925,255.28,159.42,2533654.0,1582243.5,951410.5
Central America and the Caribbean,Grenada,Cereal,Online,C,8/22/2012,963881480,9/15/2012,2804,205.7,117.11,576782.8,328376.44,248406.36
Europe,Russia,Office Supplies,Offline,L,5/2/2014,341417157,5/8/2014,1779,651.21,524.96,1158502.59,933903.84,224598.75
Sub-Saharan Africa,Sao Tome and Principe,Fruits,Online,C,6/20/2014,514321792,7/5/2014,8102,9.33,6.92,75591.66,56065.84,19525.82
Sub-Saharan Africa,Rwanda,Office Supplies,Offline,L,2/1/2013,115456712,2/6/2013,5062,651.21,524.96,3296425.02,2657347.52,639077.5
Australia and Oceania,Solomon Islands,Baby Food,Online,C,2/4/2015,547995746,2/21/2015,2974,255.28,159.42,759202.72,474115.08,285087.64
Sub-Saharan Africa,Angola,Household,Offline,M,4/23/2011,135425221,4/27/2011,4187,668.27,502.54,2798046.49,2104134.98,693911.51
Sub-Saharan Africa,Burkina Faso,Vegetables,Online,H,7/17/2012,871543967,7/27/2012,8082,154.06,90.93,1245112.92,734896.26,510216.66
Sub-Saharan Africa,Republic of the Congo,Personal Care,Offline,M,7/14/2015,770463311,8/25/2015,6070,81.73,56.67,496101.1,343986.9,152114.2
Sub-Saharan Africa,Senegal,Cereal,Online,H,4/18/2014,616607081,5/30/2014,6593,205.7,117.11,1356180.1,772106.23,584073.87


In [217]:
df.columns

Index(['Country', 'Item Type', 'Sales Channel', 'Order Priority', 'Order Date',
       'Order ID', 'Ship Date', 'Units Sold', 'Unit Price', 'Unit Cost',
       'Total Revenue', 'Total Cost', 'Total Profit'],
      dtype='object')

In [222]:
#Handling date columns in pandas
import pandas
df = pandas.read_csv('100 Sales Records.csv', index_col='Region', parse_dates=['Order Date'])
df.head()

Region,Country,Item Type,Sales Channel,Order Priority,Order Date,Order ID,Ship Date,Units Sold,Unit Price,Unit Cost,Total Revenue,Total Cost,Total Profit
Australia and Oceania,Tuvalu,Baby Food,Offline,H,2014-05-08,669165933,05-08-2014,9925,255.28,159.42,2533654.0,1582243.5,951410.5
Central America and the Caribbean,Grenada,Cereal,Online,C,2014-07-05,963881480,07-05-2014,2804,205.7,117.11,576782.8,328376.44,248406.36
Europe,Russia,Office Supplies,Offline,L,2014-05-08,341417157,05-08-2014,1779,651.21,524.96,1158502.59,933903.84,224598.75
Sub-Saharan Africa,Sao Tome and Principe,Fruits,Online,C,2014-07-05,514321792,07-05-2014,8102,9.33,6.92,75591.66,56065.84,19525.82
Sub-Saharan Africa,Rwanda,Office Supplies,Offline,L,2013-02-06,115456712,02-06-2013,5062,651.21,524.96,3296425.02,2657347.52,639077.5
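To check what `parse_dates` changes, the sketch below reads the same small sample twice and compares the type of the date column. It assumes pandas is installed and uses a made-up two-row extract held in `io.StringIO`, not the full sales file.

```python
import io

import pandas as pd

# Hypothetical two-row extract with month/day/year dates.
csv_text = (
    "Region,Country,Order Date,Total Profit\n"
    "Australia and Oceania,Tuvalu,5/28/2010,951410.50\n"
    "Europe,Russia,5/2/2014,224598.75\n"
)

# Without parse_dates the date column is read as text.
as_text = pd.read_csv(io.StringIO(csv_text), index_col="Region")

# With parse_dates the column is converted to datetime values.
as_dates = pd.read_csv(
    io.StringIO(csv_text), index_col="Region", parse_dates=["Order Date"]
)

text_is_datetime = pd.api.types.is_datetime64_any_dtype(as_text["Order Date"])
dates_is_datetime = pd.api.types.is_datetime64_any_dtype(as_dates["Order Date"])
years = as_dates["Order Date"].dt.year.tolist()

print(text_is_datetime, dates_is_datetime, years)
```

Once the column holds datetime values, accessors such as `.dt.year` become available for filtering and grouping.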


In [225]:
#Giving custom header names
import pandas
df = pandas.read_csv('100 Sales Records.csv', 
            index_col='Region', parse_dates=['Order_Date','Ship_Date'], 
            header=0, 
            names=['Region','Country', 'Item_Type', 'Channel', 'Order_Priority', 'Order_Date',
       'ID', 'Ship_Date', 'Units_Sold', 'Unit_Price', 'Unit_Cost',
       'Total_Revenue', 'Cost', 'Profit'])
df.head()

Region,Country,Item_Type,Channel,Order_Priority,Order_Date,ID,Ship_Date,Units_Sold,Unit_Price,Unit_Cost,Total_Revenue,Cost,Profit
Australia and Oceania,Tuvalu,Baby Food,Offline,H,2014-05-08,669165933,2014-05-08,9925,255.28,159.42,2533654.0,1582243.5,951410.5
Central America and the Caribbean,Grenada,Cereal,Online,C,2014-07-05,963881480,2014-07-05,2804,205.7,117.11,576782.8,328376.44,248406.36
Europe,Russia,Office Supplies,Offline,L,2014-05-08,341417157,2014-05-08,1779,651.21,524.96,1158502.59,933903.84,224598.75
Sub-Saharan Africa,Sao Tome and Principe,Fruits,Online,C,2014-07-05,514321792,2014-07-05,8102,9.33,6.92,75591.66,56065.84,19525.82
Sub-Saharan Africa,Rwanda,Office Supplies,Offline,L,2013-02-06,115456712,2013-02-06,5062,651.21,524.96,3296425.02,2657347.52,639077.5
