## <font color=grey>Lecture Notes: Day 03 - File System & Process </font>

## Topics covered
- Files
- Read Inputs
- CSV Files 
- Excel Spreadsheet
- Time 

## <font color=red>** Files **</font> 

**What**: To store data permanently, you have to put it in a **file**. When there are a large number of files, they are often organized into **directories** (also called "folders"). 

The following are different **modes** for files: 

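* **w**: open the file for writing (creates the file if it does not exist, and empties it if it does)
* **r**: open the file for reading (the default)
* **a**: open the file for appending (creates the file if it does not exist)
* **+**: added to another mode (for example `r+` or `w+`) to open the file for both reading and writing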
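
A minimal sketch of how these modes behave, assuming a throwaway file named `modes_demo.txt` (a made-up name, not one of the course files) created in a temporary directory:

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "modes_demo.txt")  # hypothetical file name

# "w" creates the file (or empties it if it already exists).
f = open(demo_path, "w")
f.write("first\n")
f.close()

# "a" keeps the existing contents and adds to the end.
f = open(demo_path, "a")
f.write("second\n")
f.close()

# "r+" opens an existing file for both reading and writing.
f = open(demo_path, "r+")
after_append = f.read()
f.close()

# Opening with "w" again throws the old contents away.
f = open(demo_path, "w")
f.close()
f = open(demo_path, "r")
after_truncate = f.read()
f.close()

print(repr(after_append))
print(repr(after_truncate))
```

The second `print` shows an empty string, which is why `"w"` should be used with care on files that already hold data.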

In [1]:
help (open)


Help on built-in function open in module __builtin__:

open(...)
    open(name[, mode[, buffering]]) -> file object
    
    Open a file using the file() type, returns a file object.  This is the
    preferred way to open a file.  See file.__doc__ for further information.



In [6]:
f = open("months.txt", "r")
type (f)   # shows that f is a file object
#help(f)   # file is a class; uncomment this line to list the methods of the file object

file

In [8]:
f.name


'months.txt'

In [13]:
f = open ("months.txt", "r")
#file_contents = f.read()       # returns the whole file as one string
file_contents = f.readlines()   # returns a list of strings, one per line
print (type(file_contents))
print (file_contents)
f.close()

<type 'list'>
['January\n', 'February\n', 'March\n', 'April\n', 'May\n', 'June\n', 'July\n', 'August\n', 'September\n', 'October\n', 'November\n', 'December']


### Opening a file - Writing in it - Closing the file 

** Opening the file ** 

* To open a file and write in it, the syntax is: 
* **Note**: if there is no file with that name, it will be created automatically 
* **Note**: in code, we refer to the file with the **variable name** NOT the file name 

![Opening%20a%20file.png](attachment:Opening%20a%20file.png)

In [16]:
# Opening a file - write in the file - close the file
# SYNTAX: open("file_name", "mode")

f = open("test.txt","w")
# if no file name "test.txt" exists, it will be created.

** Writing in the file ** 

* To write in the file, the syntax is: 
![write%20in%20file.png](attachment:write%20in%20file.png)

In [None]:
# Putting data in the file with write method on the file object

f.write("This is first sentence\nwelcome\n")
f.write("This is second sentence\ngood bye\n") 

** Closing the file ** 

* Remember we refer to files with the variable name (assigned when opening the file) **NOT** the file name (what is stored in the computer) 
* To close the file, the syntax is: 
![close%20file.png](attachment:close%20file.png)

In [None]:
# Closing the file

f.close() 
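
The open / write / close steps above can also be written with a `with` block, which closes the file automatically when the block ends, even if an error occurs inside it. A small sketch, using a temporary directory and the made-up file name `with_demo.txt`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file name

# The file is closed automatically when the with block ends.
with open(path, "w") as f:
    f.write("This is first sentence\nwelcome\n")

print(f.closed)  # the variable still exists, but the file is closed

with open(path, "r") as f:
    text = f.read()

print(text)
```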

### Reading a file 

** Using the f.read() method ** 

In [5]:
# Read method reads the entire contents of the file

f = open("test.txt","r")
text = f.read() 
print (text) 

f.close()

This is first sentence
welcome
This is second sentence
good bye



** Using the f.readlines() method ** 
* This method returns all of the lines as a list of strings
* Each string keeps its trailing newline character (`\n`)
* When the list is printed, each string is shown in quotation marks

In [None]:
f = open("test.txt","r")
text = f.readlines()

print text
f.close()

** Using a for loop to process the file line by line ** 
* This means we do not load the whole file into memory at once 

In [8]:
f = open("test.txt","r")
print type(f)
for line in f:    
    print line
f.close()

<type 'file'>
This is first sentence

welcome

This is second sentence

good bye
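
The blank lines in the output above appear because each line read from the file still ends with `\n`, and printing adds another newline. A small sketch that strips the trailing newline before printing; the file is recreated in a temporary directory so the example is self-contained:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("This is first sentence\nwelcome\n")

cleaned = []
with open(path, "r") as f:
    for line in f:
        cleaned.append(line.rstrip("\n"))  # drop the trailing newline only

for line in cleaned:
    print(line)
```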



** Separating lines into different variables ** 

* The readline method reads all the characters up to and including the next newline character (one entire line from the file) 

In [9]:
f = open("test.txt","r")

text1 = f.readline()
text2 = f.readline()
text3 = f.readline()
f.close()

print ("First line: " + text1)
print ("Second line: " + text2)
print ("Third line: " + text3)

First line: This is first sentence

Second line: welcome

Third line: This is second sentence



# <font color = red> Read Inputs </font>

## - Accessing Command Line Arguments Inside Script

In [2]:
import sys

print "This is the name of the script: ", sys.argv[0]
print "Number of arguments: ", len(sys.argv)
print "The arguments are: " , str(sys.argv)

This is the name of the script:  /usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py
Number of arguments:  3
The arguments are:  ['/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py', '-f', '/home/bala/.local/share/jupyter/runtime/kernel-c6e1da60-c0cd-4266-a337-2be0a8ae6dd1.json']
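
Inside a notebook, `sys.argv` holds the kernel launcher's arguments, as the output above shows. When a script is run from a terminal, the items after the script name are the user's own arguments. A sketch of separating the two, where `summarize_args` and the sample argument list are made-up names used only for illustration:

```python
def summarize_args(argv):
    """Split an argv-style list into the script name and its arguments."""
    script_name = argv[0]
    arguments = argv[1:]
    return script_name, arguments

# Simulated command line: python copy.py source.txt target.txt
name, args = summarize_args(["copy.py", "source.txt", "target.txt"])
print(name)
print(args)

# In a real script: import sys, then call summarize_args(sys.argv)
```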


## - Create File Using Python

In [16]:
# Besides "w", the other common modes are "r" for read and "a" for append
f = open("newFile.txt", "w") 
f.close()  # close the new, empty file

## - Append to File
Open the file in append mode (`"a"`) and use the write() method to add text to the end of the file.

In [17]:
f = open("newFile.txt", "a") # remember: a stands for append
for i in range(5):
    f.write("Appended line %d\n" % (i+1))  # "\n" is enough; text mode handles the platform line ending
        
f.close() # close file

## - Copy Contents From File to Another File 

In [18]:
f_old = open("newFile.txt","r")
# "w+" overwrites finalFile.txt; use "a" instead to append to its existing contents
f_new = open("finalFile.txt", "w+")

for line in f_old:  # read one line at a time
    f_new.write(line)

f_new.close()
f_old.close()

## <font color=red>** CSV (Comma Separated Values) File **</font> 

CSV is a simple file format used to store tabular data, such as a spreadsheet or database. Files in the CSV format can be imported to and exported from programs that store data in tables, such as Microsoft Excel or OpenOffice Calc. 

**Easy way: **

In [39]:
#write to csv 
import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ') #elements of the csv file will be separated by spaces 
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

In [40]:
#csv reader:
import csv
with open('eggs.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ')
    for row in spamreader:
        print(row)

['Spam', 'Spam', 'Spam', 'Spam', 'Spam', 'Baked Beans']
['Spam', 'Lovely Spam', 'Wonderful Spam']


**Harder way:** Treat the rows of the CSV file as dictionaries. This way is **preferred** because it lets you access each column by its name. 

In [28]:
#Writer:
import csv

#name of csv file
name_of_file = "class.csv"
#open in write mode
csvfile = open(name_of_file, 'w') 

#set column headers in a list
fieldnames = ['student_name', 'attendance']
#configuration to write to csv
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
#invoke method to write the header row
writer.writeheader()
#visual cue for header written
print("header written")
#set data to be written in a dictionary
writer.writerow({'student_name': 'Bala', 'attendance': 'present'})
writer.writerow({'student_name': 'Bala1', 'attendance': 'absent'})
#visual cue for rows written
print("row written")
#close the file so the rows are flushed to disk
csvfile.close()

header written
row written


In [2]:
#Reader: 
import csv

#name of csv file
name = "class.csv"
#open in read mode
csvfile = open(name, 'r') 

#configure the reader; the first row of the file supplies the column names
reader = csv.DictReader(csvfile)

for row in reader:
    print row
    print row["student_name"] #shows that you can print the data of a single column

csvfile.close()




{'student_name': 'Bala', 'attendance': 'present'}
Bala
{'student_name': 'Bala1', 'attendance': 'absent'}
Bala1
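
A sketch of the same write-then-read round trip using `with` blocks, so the files are closed automatically. It assumes Python 3, where CSV files are opened in text mode with `newline=""`, and it writes to a temporary directory instead of the course folder:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "class.csv")
fieldnames = ["student_name", "attendance"]

# Assumption: Python 3, so the file is opened in text mode with newline="".
with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({"student_name": "Bala", "attendance": "present"})
    writer.writerow({"student_name": "Bala1", "attendance": "absent"})

with open(path, "r", newline="") as csvfile:
    rows = list(csv.DictReader(csvfile))

# Collect a single column by its header name.
names = [row["student_name"] for row in rows]
print(names)
```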


## <font color=red>** Excel Spreadsheet **</font> 

** Read and Write to Excel Spreadsheet ** 

Note: You will first have to do the following commands to install a third party module: 
        pip install pyexcel
        pip install pyexcel-xls
        
Full documentation: http://pyexcel-io.readthedocs.io/en/latest/

** Write excel file **

In [3]:
#Write
import pyexcel
import pyexcel_xls #Convention: The name is pyexcel-xls but you must import pyexcel_xls. Hyphens get turned to underscores!!

#Standard, one-sheet method: 
array = [["col 1", "col 2", "col 3"], [1, 2, 3], [4, 5, 6], [7, 8, 9]]
# "output.csv" "output.xlsx" "output.ods" "output.xlsm" show the different file types you can save as. 
sheet = pyexcel.Sheet(array)
sheet.save_as("output.xls")

** Read excel file as json **

In [58]:
#Read 
import json
# "example.csv","example.xlsx","example.xlsm"
sheet = pyexcel.get_sheet(file_name="output.xls")
print(json.dumps(sheet.to_array()))

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


** Read excel file as a dictionary **

In [4]:
#Read as a dictionary 
sheet = pyexcel.get_sheet(file_name="output.xls", name_columns_by_row=0)
print (sheet)

columns = sheet.to_dict()  # maps each column name to the list of values in that column
print (columns["col 1"])

pyexcel sheet:
+-------+-------+-------+
| col 1 | col 2 | col 3 |
| 1     | 2     | 3     |
+-------+-------+-------+
| 4     | 5     | 6     |
+-------+-------+-------+
| 7     | 8     | 9     |
+-------+-------+-------+
[1, 4, 7]
2
5
8


** Write multi-sheet excel file **

In [75]:
#Write multiple-sheet excel file 
content = {
    'Sheet 1':
        [
            [1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]
        ],
    'Sheet 2':
        [
            ['X', 'Y', 'Z'],
            [1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]
        ],
    'Sheet 3':
        [
            ['O', 'P', 'Q'],
            [3.0, 2.0, 1.0],
            [4.0, 3.0, 2.0]
        ]
}
book = pyexcel.get_book(bookdict=content)
book.save_as("multisheet.xls")

** Read multi-sheet file **

In [14]:
#Read multi-sheet file
book = pyexcel.get_book(file_name="multisheet.xls")
sheets = book.to_dict()
print book["Sheet 1"]  
print book["Sheet 2"]
print book["Sheet 3"]
print book["Sheet 3"]

Sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
Sheet 2:
+---+---+---+
| X | Y | Z |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Sheet 3:
+---+---+---+
| O | P | Q |
+---+---+---+
| 3 | 2 | 1 |
+---+---+---+
| 4 | 3 | 2 |
+---+---+---+
Sheet 3:
+---+---+---+
| O | P | Q |
+---+---+---+
| 3 | 2 | 1 |
+---+---+---+
| 4 | 3 | 2 |
+---+---+---+


## <font color=red>** Time **</font> 

** Suspend Execution for Given Number of Seconds ** 
Import time module and use time.sleep() to suspend the execution for the desired time.

In [None]:
import time

while True:
    print("Tick tock")
    time.sleep(5)
    print("5 seconds passed")
    break

## - Get Current Time

In [None]:
from datetime import datetime
print(datetime.now())
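
`datetime.now()` returns a `datetime` object, which can be turned into text in a chosen layout with `strftime`. A sketch using a fixed, made-up timestamp so the result is predictable:

```python
from datetime import datetime

# A fixed example value; in real use this would be datetime.now().
moment = datetime(2018, 3, 5, 14, 30, 0)

date_text = moment.strftime("%Y-%m-%d")   # year-month-day
time_text = moment.strftime("%H:%M:%S")   # 24-hour clock
print(date_text)
print(time_text)
```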

## - Get CPU Usage & System Memory Stats

In [None]:
# install library first
#!conda install psutil
#!pip2.7 install psutil

In [None]:
import psutil
print(psutil.cpu_percent()) #cpu usage

In [None]:
print(psutil.virtual_memory().percent) #system memory

# <font color =red>EXERCISE</font>

## Case Model: 
#### 1. Run a program every 5 seconds
#### 2. Use psutil to collect metrics about CPU & memory
#### 3. Write the metrics into a CSV file

### 1. Run a program every 5 seconds


In [None]:
# install library
!conda install psutil 

In [None]:
# import library
import csv
import time
from datetime import datetime
import psutil

In [None]:
"""
This program's job is to run a function every 2 seconds. 
"""

import time

def tick():
    print('Tick! The time is: %s' % datetime.now())

print("starting...")
while True:
    time.sleep(2)
    tick()


In [None]:
def collect_cpu_metrics():
    print('CPU Usage: %s%%' % psutil.cpu_percent())

def collect_memory_metrics():
    print('Memory Usage: %s%%' % psutil.virtual_memory().percent)
    
collect_cpu_metrics()
collect_memory_metrics()

In [None]:
#VERSION 1
#HOW TO WRITE HEADER TO CSV
filepath = "C:\\Users\\issguest\\Downloads\\workspace\\stackup_python_course_code\\notebooks\\day03\\"
name = "cpu_and_memory.csv"
full_name = filepath + name
csvfile = open(full_name, 'w')

fieldnames = ['unix_time', 'cpu_percent', 'memory_percent', 'current_time']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)    
writer.writeheader()
print("header written")
csvfile.close()  # VERSION 1 has to close the file explicitly

#VERSION 2
#HOW TO WRITE HEADER TO CSV
filepath = "C:\\Users\\issguest\\Downloads\\workspace\\stackup_python_course_code\\notebooks\\day03\\"
name = "cpu_and_memory.csv"
full_name = filepath + name
with open(full_name, 'w') as csvfile:
    fieldnames = ['unix_time', 'cpu_percent', 'memory_percent', 'current_time']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)    
    writer.writeheader()
    print("header written")
    

In [None]:
import csv
import time
from datetime import datetime
import psutil
# WINDOWS
filepath = "C:\\Users\\issguest\\Downloads\\workspace\\stackup_python_course_code\\notebooks\\day03\\"
# LINUX / MAC
#filepath = "/home/std-user01/workspace/play/data-science/casemodel1/"


def collect_cpu_and_memory_metrics():
    cpu_percent    = psutil.cpu_percent()
    memory_percent = psutil.virtual_memory().percent
    current_time   = datetime.now()
    unix_time      = time.mktime(current_time.timetuple())
    print('%.f' % unix_time, '%s' % cpu_percent, memory_percent, '%s' % current_time)
    write_metrics_to_csv_file('%.f' % unix_time, '%s' % cpu_percent, memory_percent, '%s' % current_time)
    
def write_metrics_to_csv_file(col1, col2, col3, col4):
    #filepath = "/home/std-user01/workspace/play/data-science/casemodel1/"
    # Further improvement : create a file for every day
    name = "cpu_and_memory.csv"
    full_name = filepath + name
    f = open(full_name, 'at')
    try:
        writer = csv.writer(f)
        writer.writerow( (col1, col2, col3, col4) )
    finally:
        f.close()

collect_cpu_and_memory_metrics()
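
The comment "Further improvement : create a file for every day" could be handled by building the file name from the date. A sketch, where `daily_csv_path` and the naming pattern are assumptions made for illustration, not part of the course code:

```python
import os
from datetime import datetime

def daily_csv_path(folder, when):
    """Return a path such as <folder>/cpu_and_memory_2018-03-05.csv."""
    # Hypothetical naming pattern: base name plus the date.
    name = "cpu_and_memory_%s.csv" % when.strftime("%Y-%m-%d")
    # os.path.join inserts the right separator for the operating system.
    return os.path.join(folder, name)

example = daily_csv_path("data", datetime(2018, 3, 5, 14, 30))
print(os.path.basename(example))
```

Passing `datetime.now()` as `when` would give each day its own file, because the name changes when the date does.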

# Case Model: Code Answers

In [None]:
"""
This program's job is to run a function every 5 seconds. 
"""

import csv
import time
from datetime import datetime
import psutil

# WINDOWS
filepath = "C:\\Users\\issguest\\Downloads\\workspace\\stackup_python_course_code\\notebooks\\day03\\"
# LINUX / MAC
#filepath = "/home/std-user01/workspace/play/data-science/casemodel1/"


def collect_cpu_and_memory_metrics():
    cpu_percent    = psutil.cpu_percent()
    memory_percent = psutil.virtual_memory().percent
    current_time   = datetime.now()
    unix_time      = time.mktime(current_time.timetuple())
    print('%.f' % unix_time, '%s' % cpu_percent, memory_percent, '%s' % current_time)
    write_metrics_to_csv_file('%.f' % unix_time, '%s' % cpu_percent, memory_percent, '%s' % current_time)
    
def write_metrics_to_csv_file(col1, col2, col3, col4):

    # Further improvement : create a file for every day
    name = "cpu_and_memory.csv"
    full_name = filepath + name
    f = open(full_name, 'at')
    try:
        writer = csv.writer(f)
        writer.writerow( (col1, col2, col3, col4) )
    finally:
        f.close()

print("starting to collect data every 5 seconds...")
   
while True:
    time.sleep(5)
    collect_cpu_and_memory_metrics()



# <font color = red> Folder</font>

## - Check If File Exists Based on Absolute Path

In [None]:
import os.path

# For Linux/Mac
my_file = '/home/bala/workspace/stackup_python_course_code/task/file_process_code.ipynb'

# For Windows
# my_file = "Users/issguest/Downloads/workspace/stackup_python_course_code/notebooks/day01/Lectures/Lecture01.ipynb"

# os.path.exists(my_file) returns True if the path exists
file_exists_or_not = os.path.exists(my_file)
if(file_exists_or_not):
    print("file is found")
else:
    print("file is not found")

## - Check Current Working Directory

In [None]:
import os
print os.getcwd()

In [None]:
# print working directory in command line
!pwd

## - Check If Folder Exists

In [None]:
import os

#os.path.join joins os.getcwd() and "new_folder" with the correct path separator
#os.path.isdir(folder) returns True if the folder exists
check_folder_exist = os.path.isdir(os.path.join(os.getcwd(), 'new_folder'))

if(check_folder_exist):
    print("Yes, the folder exists in current directory")
else:
    print("No, the folder does NOT exist in current directory")
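
A sketch that goes one step further than the check above and creates the folder when it is missing. It works inside a temporary directory (a stand-in for the current working directory) so nothing is left behind in the course folder:

```python
import os
import tempfile

base = tempfile.mkdtemp()                   # stand-in for os.getcwd()
folder = os.path.join(base, "new_folder")

existed_before = os.path.isdir(folder)
if not existed_before:
    os.makedirs(folder)                     # create the missing folder
exists_after = os.path.isdir(folder)

print(existed_before)
print(exists_after)
```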

## - Searching For File in Folder

In [None]:
import glob
import os

os.chdir('/home/bala/workspace/stackup_python_course_code/')
for file_name in glob.glob("*.md"):
    print(file_name)
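
`os.chdir` changes the working directory for the rest of the session. A sketch of an alternative that leaves the working directory alone by putting the folder into the search pattern; the folder and file names here are made up for the demonstration:

```python
import glob
import os
import tempfile

folder = tempfile.mkdtemp()
for name in ["README.md", "notes.md", "script.py"]:   # made-up sample files
    open(os.path.join(folder, name), "w").close()

# The pattern contains the folder, so no chdir is needed.
pattern = os.path.join(folder, "*.md")
matches = sorted(os.path.basename(p) for p in glob.glob(pattern))
print(matches)
```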

## - Backup Folder by tar/zip

In [None]:
import os
import shutil

zip_name = 'newZip'
directory_name = os.getcwd()

# Creates newZip.zip in the current working directory, containing the contents of directory_name
shutil.make_archive(zip_name, 'zip', directory_name)
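
A sketch that writes the archive to a separate location, so the zip file does not end up inside the folder being backed up. The folder and file names are made up for the demonstration; `shutil.make_archive` returns the path of the archive it created:

```python
import os
import shutil
import tempfile
import zipfile

source_dir = tempfile.mkdtemp()          # folder to back up (demo data)
backup_dir = tempfile.mkdtemp()          # where the archive is written
with open(os.path.join(source_dir, "notes.txt"), "w") as f:
    f.write("keep me safe\n")

# base_name has no extension; make_archive adds ".zip" itself.
base_name = os.path.join(backup_dir, "newZip")
archive_path = shutil.make_archive(base_name, "zip", source_dir)

# Open the archive to check what was stored in it.
archive = zipfile.ZipFile(archive_path)
archived_names = archive.namelist()
archive.close()
print(archived_names)
```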

# <font color = red>EXERCISE</font>
## Walkthrough: File Search

## - Open a file and see if a word exists or not

In [None]:
file_name = raw_input('Name of file to read: ')
search_words = raw_input('Words to search for: ').split()

f = open(file_name, 'r')
contents = f.read()
f.close()

for word in search_words:
    # find() returns -1 when the text is not present
    if contents.find(word) != -1:
        print "The word '%s' is found" % word
    else:
        print "The word '%s' is not found" % word
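
`find` looks for a substring, so searching for `cat` also reports a match inside `concatenate`. A sketch of the difference between substring matching and whole-word matching, on a made-up sentence:

```python
text = "we concatenate two strings"   # made-up sample text

substring_found = text.find("cat") != -1    # matches inside "concatenate"
whole_word_found = "cat" in text.split()    # only matches the separate word "cat"

print(substring_found)
print(whole_word_found)
```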

## - Read from a file and build a word dictionary, with words and their occurrences

In [None]:
import sys

file_name = raw_input('Name of file to read: ')
search_words = raw_input('Words to search for: ').split()

file = open(file_name, 'r').read()

dictionary = {}

for word in file.split():
    if dictionary.get(word, None):
        dictionary[word] += 1
    else:
        dictionary[word] = 1

for word in search_words:
    print "There are %d occurrences of '%s'" % (dictionary.get(word, 0), word)

## - Search for a word in that file much faster using the dictionary you've just created

In [None]:
import sys

class WordDictionary(dict):
    def __missing__(self, key):
        return 0

file_name = raw_input('Name of file to read: ')

file = open(file_name, 'r').read()
dictionary = WordDictionary()

for word in file.split():
    dictionary[word] += 1

print dictionary
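
The standard library's `collections.Counter` behaves much like the `WordDictionary` class above: a word that was never seen counts as 0 instead of raising a `KeyError`. A sketch on a made-up sentence:

```python
from collections import Counter

text = "the cat and the hat and the bat"   # made-up sample text
counts = Counter(text.split())

print(counts["the"])
print(counts["dog"])            # a missing word counts as 0
print(counts.most_common(1))    # the most frequent word with its count
```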

# <font color = red> urllib2 </font>

In [None]:
import urllib2
# Load JSON text from the network
request = urllib2.Request('http://date.jsontest.com/')
response = urllib2.urlopen(request)
response_string = response.read()
print(response_string)

# <font color = red> Subprocess </font>

The subprocess module provides a consistent interface to creating and working with additional processes. It offers a higher-level interface than some of the other available modules, and is intended to replace functions such as os.system(), os.spawn*(), os.popen*(), popen2.*() and commands.*().

## - Call Subprocess and Run Command

To run an external command without interacting with it, as one would do with os.system(), use the call() function.

In [None]:
import subprocess

# List files using the ls command (arguments passed as a list, no shell needed)
subprocess.call(["ls", "-1"])

# Make a new file using touch (a single command string run through the shell)
subprocess.call("touch newfile.txt", shell=True)

The command line arguments are passed as a list of strings, which avoids the need for escaping quotes or other special characters that might be interpreted by the shell.

Setting the shell argument to a true value causes subprocess to spawn an intermediate shell process, and tell it to run the command. The default is to run the command directly.
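
A sketch of the two calling styles described above: a list of arguments run directly, and a single command string run through a shell. To stay portable, the first call runs the Python interpreter itself (`sys.executable`) instead of `ls`; that substitution is an assumption made for the example.

```python
import subprocess
import sys

# Style 1: a list of strings, no shell. Each item is passed as one argument.
status_list = subprocess.call([sys.executable, "-c", "import sys; sys.exit(0)"])

# Style 2: a single command string, run through an intermediate shell.
status_shell = subprocess.call("exit 3", shell=True)

# call() returns the exit status of the command.
print(status_list)
print(status_shell)
```

Combining a list with `shell=True` is best avoided: on Linux and Mac only the first item is treated as the command, and the remaining items are handed to the shell rather than to the command.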

## - Collect Output From Subprocess

The standard input and output channels for the process started by call() are bound to the parent’s input and output. That means the calling program cannot capture the output of the command. Use check_output() to capture the output for later processing.

In [None]:
string = subprocess.check_output(["echo", "Hello World!"])
print (string)

o = subprocess.check_output("cat /etc/passwd", shell=True)  # with shell=True, pass one command string
f = open('passwd', 'w')
f.write(o)
f.close()
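
On Python 3, `check_output()` returns bytes rather than a string, so the result usually needs decoding before further processing. A sketch, running the Python interpreter (`sys.executable`) as the child process so that it does not depend on `echo` being available:

```python
import subprocess
import sys

raw = subprocess.check_output([sys.executable, "-c", "print('Hello World!')"])

# decode() turns bytes into text; strip() removes the trailing newline.
text = raw.decode("utf-8").strip()
print(text)
```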

# <font color = red> JSON Parsing </font>

## - Parse JSON Output

In [None]:
import json, urllib2
from pprint import pprint

# Load json from network and parse
request = urllib2.Request('http://date.jsontest.com/')
response = urllib2.urlopen(request)
response_string = response.read()
data = json.loads(response_string)

# Print entire json response
pprint(data)

# Prints date field from parsed json
print data['date']

# # Load json from file and print entire response
# data_file = open('data.json')
# file_contents = data_file.read()
# data2 = json.loads(file_contents)
# data_file.close()

# pprint(data2)
# print data2['name']
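
The commented-out lines above read JSON from a file. A sketch of that round trip without a network connection, using a made-up record and a temporary file:

```python
import json
import os
import tempfile

record = {"name": "Bala", "attendance": ["present", "absent"]}   # made-up data
path = os.path.join(tempfile.mkdtemp(), "data.json")

# json.dump writes a Python object to an open file as JSON text.
with open(path, "w") as data_file:
    json.dump(record, data_file)

# json.load reads it back into Python dictionaries and lists.
with open(path, "r") as data_file:
    data2 = json.load(data_file)

print(data2["name"])
```

`json.load` takes an open file, while `json.loads` (used in the cell above) takes a string that has already been read.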

# <font color = red> EXERCISE </font>
## Walkthrough

* Download a file from the following url [url to be provided]
* Save the file
* Use subprocess to run the file using Python 
* Collect the output & parse it using JSON
* Write the response into file

In [None]:
import json
import urllib2
import subprocess

# Get the remote file
contents = urllib2.urlopen("https://goo.gl/OVrc1z").read()

# Save downloaded data as a local python file
filepy = open('runme.py', 'w')
filepy.write(contents)
filepy.close()

# Execute python file as subprocess
output = subprocess.check_output("python2.7 runme.py", shell=True)
json_output = json.loads(output)

# Get value of json key
url = (json_output["url"])

# Write value to new file
fileout = open('output.txt', 'w')
fileout.write(url)
fileout.close()

In [None]:
import csv
with open('class.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row["student_name"]

In [None]:

data = [('ngoc', 18),('david', 24)]
head = []