# Opening and Reading Files

So far we've discussed how to open files manually, one by one. Let's explore how we can open files programmatically.

_____

### Review: Understanding File Paths

In [2]:
pwd

'C:\\Users\\Marcial\\Pierian-Data-Courses\\Complete-Python-3-Bootcamp\\12-Advanced Python Modules'

### Create Practice File

We will begin by creating a practice text file that we will be using for demonstration.

In [3]:
f = open('practice.txt','w+')

In [4]:
f.write('test')
f.close()

### Getting Directories

Python has a built-in [os module](https://docs.python.org/3/library/os.html) that allows us to use operating system dependent functionality.

You can get the current directory:

In [1]:
import os

In [6]:
os.getcwd()

'C:\\Users\\Marcial\\Pierian-Data-Courses\\Complete-Python-3-Bootcamp\\12-Advanced Python Modules'

### Listing Files in a Directory

You can also use the os module to list the contents of a directory.

In [7]:
# In your current directory
os.listdir()

['.ipynb_checkpoints',
 '00-Collections-Module.ipynb',
 '01-Datetime-Module.ipynb',
 '01-Opening-and-Reading-Files.ipynb',
 '02-Math-and-Random-Module.ipynb',
 '03-Python Debugger (pdb).ipynb',
 '04-Timing your code - timeit.ipynb',
 '05-Overview-of-Regular-Expressions.ipynb',
 '06-Unzipping-and-Zipping-Files.ipynb',
 '07-OS-Module.ipynb',
 '08-Advanced-Python-Module-Exercise',
 'comp_file.zip',
 'Example_Top_Level',
 'extracted_content',
 'new_file.txt',
 'new_file2.txt',
 'practice.txt']

In [8]:
# In any directory you pass
os.listdir("C:\\Users")

['admin.DESKTOP-O64BPTC',
 'All Users',
 'Default',
 'Default User',
 'defaultuser0',
 'desktop.ini',
 'Marcial',
 'Public']
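
os.listdir() returns plain names and does not tell you which entries are files and which are folders. Here is a minimal sketch of one way to separate them with os.path.isfile() and os.path.isdir(). The temporary directory and the names `notes.txt` and `subfolder` are made up for illustration, so the example doesn't depend on what is on your machine.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on your machine.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subfolder"))          # hypothetical folder
    with open(os.path.join(tmp, "notes.txt"), "w") as f:  # hypothetical file
        f.write("test")

    # listdir() makes no promise about order, so sort for a stable result.
    entries = sorted(os.listdir(tmp))

    # listdir() returns bare names; join them with the directory before testing.
    files = [e for e in entries if os.path.isfile(os.path.join(tmp, e))]
    folders = [e for e in entries if os.path.isdir(os.path.join(tmp, e))]

print(entries)   # ['notes.txt', 'subfolder']
print(files)     # ['notes.txt']
print(folders)   # ['subfolder']
```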

### Moving Files 

You can use the built-in **shutil** module to move files to different locations. Keep in mind that permission restrictions apply: for example, if you are logged in as User A, you won't be able to make changes to the top-level Users folder without the proper permissions ([more info](https://stackoverflow.com/questions/23253439/shutil-movescr-dst-gets-me-ioerror-errno-13-permission-denied-and-3-more-e)).

In [9]:
import shutil

In [10]:
shutil.move('practice.txt','C:\\Users\\Marcial')

'C:\\Users\\Marcial\\practice.txt'

In [11]:
os.listdir()

['.ipynb_checkpoints',
 '00-Collections-Module.ipynb',
 '01-Datetime-Module.ipynb',
 '01-Opening-and-Reading-Files.ipynb',
 '02-Math-and-Random-Module.ipynb',
 '03-Python Debugger (pdb).ipynb',
 '04-Timing your code - timeit.ipynb',
 '05-Overview-of-Regular-Expressions.ipynb',
 '06-Unzipping-and-Zipping-Files.ipynb',
 '07-OS-Module.ipynb',
 '08-Advanced-Python-Module-Exercise',
 'comp_file.zip',
 'Example_Top_Level',
 'extracted_content',
 'new_file.txt',
 'new_file2.txt']

In [12]:
shutil.move('C:\\Users\\Marcial\\practice.txt',os.getcwd())

'C:\\Users\\Marcial\\Pierian-Data-Courses\\Complete-Python-3-Bootcamp\\12-Advanced Python Modules\\practice.txt'

In [13]:
os.listdir()

['.ipynb_checkpoints',
 '00-Collections-Module.ipynb',
 '01-Datetime-Module.ipynb',
 '01-Opening-and-Reading-Files.ipynb',
 '02-Math-and-Random-Module.ipynb',
 '03-Python Debugger (pdb).ipynb',
 '04-Timing your code - timeit.ipynb',
 '05-Overview-of-Regular-Expressions.ipynb',
 '06-Unzipping-and-Zipping-Files.ipynb',
 '07-OS-Module.ipynb',
 '08-Advanced-Python-Module-Exercise',
 'comp_file.zip',
 'Example_Top_Level',
 'extracted_content',
 'new_file.txt',
 'new_file2.txt',
 'practice.txt']
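
The paths above only exist on the author's machine. If you want to try shutil.move() without touching your own folders, here is a small sketch that works inside a temporary directory. The folder names `source` and `target` are assumptions chosen for this example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical folders inside the throwaway directory.
    source_dir = os.path.join(tmp, "source")
    target_dir = os.path.join(tmp, "target")
    os.mkdir(source_dir)
    os.mkdir(target_dir)

    # Create a practice file in the source folder.
    source_file = os.path.join(source_dir, "practice.txt")
    with open(source_file, "w") as f:
        f.write("test")

    # Moving a file into an existing folder keeps its name;
    # shutil.move() returns the new path.
    new_path = shutil.move(source_file, target_dir)

    moved_name = os.path.basename(new_path)
    source_after = os.listdir(source_dir)
    target_after = os.listdir(target_dir)

print(moved_name)    # practice.txt
print(source_after)  # []
print(target_after)  # ['practice.txt']
```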

### Deleting Files
____
**NOTE: Python provides 3 functions for permanently deleting files and folders:**
* os.unlink(path), which deletes the file at the path you provide
* os.rmdir(path), which deletes the folder at the path you provide (the folder must be empty)
* shutil.rmtree(path), which is the most dangerous, as it removes the folder at the path along with all files and folders contained in it

**None of these can be reversed, which means that if you make a mistake you won't be able to recover the file. Instead we will use the send2trash module, a safer alternative that sends deleted files to the trash bin instead of permanently removing them.**
___
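
To make the difference between the three permanent options concrete, here is a minimal sketch that runs entirely inside a throwaway directory created with tempfile, so nothing of yours is at risk. The file and folder names are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical names, used only for this demonstration.
    file_path = os.path.join(tmp, "scratch.txt")
    empty_dir = os.path.join(tmp, "empty_folder")
    full_dir = os.path.join(tmp, "full_folder")

    with open(file_path, "w") as f:
        f.write("test")
    os.mkdir(empty_dir)
    os.mkdir(full_dir)
    with open(os.path.join(full_dir, "inner.txt"), "w") as f:
        f.write("test")

    os.unlink(file_path)   # permanently deletes a single file
    os.rmdir(empty_dir)    # permanently deletes an EMPTY folder

    # os.rmdir() refuses to delete a folder that still has content.
    try:
        os.rmdir(full_dir)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    shutil.rmtree(full_dir)  # deletes the folder and everything inside it
    remaining = os.listdir(tmp)

print(rmdir_failed)  # True
print(remaining)     # []
```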

Install the send2trash module with:

    pip install send2trash
    
at your command line.

In [14]:
import send2trash

In [15]:
os.listdir()

['.ipynb_checkpoints',
 '00-Collections-Module.ipynb',
 '01-Datetime-Module.ipynb',
 '01-Opening-and-Reading-Files.ipynb',
 '02-Math-and-Random-Module.ipynb',
 '03-Python Debugger (pdb).ipynb',
 '04-Timing your code - timeit.ipynb',
 '05-Overview-of-Regular-Expressions.ipynb',
 '06-Unzipping-and-Zipping-Files.ipynb',
 '07-OS-Module.ipynb',
 '08-Advanced-Python-Module-Exercise',
 'comp_file.zip',
 'Example_Top_Level',
 'extracted_content',
 'new_file.txt',
 'new_file2.txt',
 'practice.txt']

In [16]:
send2trash.send2trash('practice.txt')

In [17]:
os.listdir()

['.ipynb_checkpoints',
 '00-Collections-Module.ipynb',
 '01-Datetime-Module.ipynb',
 '01-Opening-and-Reading-Files.ipynb',
 '02-Math-and-Random-Module.ipynb',
 '03-Python Debugger (pdb).ipynb',
 '04-Timing your code - timeit.ipynb',
 '05-Overview-of-Regular-Expressions.ipynb',
 '06-Unzipping-and-Zipping-Files.ipynb',
 '07-OS-Module.ipynb',
 '08-Advanced-Python-Module-Exercise',
 'comp_file.zip',
 'Example_Top_Level',
 'extracted_content',
 'new_file.txt',
 'new_file2.txt']

### Walking through a directory

Often you will just need to "walk" through a directory, that is, visit every file and folder in it, check whether a particular file is there, and then perhaps do something with that file. Recursively walking through every file and folder by hand would be quite tricky to program, but luckily the os module has a function for exactly this: os.walk(). Let's explore how it works.

In [18]:
os.getcwd()

'C:\\Users\\Marcial\\Pierian-Data-Courses\\Complete-Python-3-Bootcamp\\12-Advanced Python Modules'

In [19]:
os.listdir()

['.ipynb_checkpoints',
 '00-Collections-Module.ipynb',
 '01-Datetime-Module.ipynb',
 '01-Opening-and-Reading-Files.ipynb',
 '02-Math-and-Random-Module.ipynb',
 '03-Python Debugger (pdb).ipynb',
 '04-Timing your code - timeit.ipynb',
 '05-Overview-of-Regular-Expressions.ipynb',
 '06-Unzipping-and-Zipping-Files.ipynb',
 '07-OS-Module.ipynb',
 '08-Advanced-Python-Module-Exercise',
 'comp_file.zip',
 'Example_Top_Level',
 'extracted_content',
 'new_file.txt',
 'new_file2.txt']

In [2]:
for folder , sub_folders , files in os.walk("Example_Top_Level"):
    
    print("Currently looking at folder: "+ folder)
    print('\n')
    print("THE SUBFOLDERS ARE: ")
    for sub_fold in sub_folders:
        print("\t Subfolder: "+sub_fold )
    
    print('\n')
    
    print("THE FILES ARE: ")
    for f in files:
        print("\t File: "+f)
    print('\n')
    
    # os.walk() continues into each subfolder automatically on the next iteration

Currently looking at folder: Example_Top_Level


THE SUBFOLDERS ARE: 
	 Subfolder: Mid-Example-One
	 Subfolder: Mid-Example-Two


THE FILES ARE: 
	 File: Mid-Example.txt


Currently looking at folder: Example_Top_Level\Mid-Example-One


THE SUBFOLDERS ARE: 
	 Subfolder: Bottom-Level-One
	 Subfolder: Bottom-Level-Two


THE FILES ARE: 
	 File: Mid-Level-Doc.txt


Currently looking at folder: Example_Top_Level\Mid-Example-One\Bottom-Level-One


THE SUBFOLDERS ARE: 


THE FILES ARE: 
	 File: One_Text.txt


Currently looking at folder: Example_Top_Level\Mid-Example-One\Bottom-Level-Two


THE SUBFOLDERS ARE: 


THE FILES ARE: 
	 File: Bottom-Text-Two.txt


Currently looking at folder: Example_Top_Level\Mid-Example-Two


THE SUBFOLDERS ARE: 


THE FILES ARE: 




___
Excellent, you should now be aware of how to work with a computer's files and folders in whichever directory they are in. Remember that the os module works on any operating system that supports Python, which means these commands will work across Linux, macOS, or Windows. Only the paths themselves (such as the C:\\Users examples above) are specific to one operating system.
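
As a small illustration of "doing something" with the files that os.walk() visits, here is a sketch that collects every file with a given extension. The helper `find_files` and the folder layout are hypothetical and exist only for this example. It builds paths with os.path.join(), which picks the right separator for your operating system.

```python
import os
import tempfile

def find_files(top, extension):
    """Return sorted paths (relative to top) of files ending in extension."""
    matches = []
    for folder, sub_folders, files in os.walk(top):
        for name in files:
            if name.endswith(extension):
                full_path = os.path.join(folder, name)
                matches.append(os.path.relpath(full_path, top))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    # Build a small made-up tree: top.txt, mid/mid.txt, mid/bottom/image.png
    os.makedirs(os.path.join(tmp, "mid", "bottom"))
    for parts in [("top.txt",), ("mid", "mid.txt"), ("mid", "bottom", "image.png")]:
        with open(os.path.join(tmp, *parts), "w") as f:
            f.write("test")

    found = find_files(tmp, ".txt")

# The two .txt files are found; the separator shown depends on your OS.
print(found)
```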

# datetime module

Python has the datetime module to help deal with timestamps in your code. Time values are represented with the time class. Times have attributes for hour, minute, second, and microsecond. They can also include time zone information. The arguments to initialize a time instance are optional, but the default of 0 is unlikely to be what you want.

## time
Let's take a look at how we can extract time information from the datetime module. We can create a timestamp by specifying datetime.time(hour,minute,second,microsecond)

In [1]:
import datetime

t = datetime.time(4, 20, 1)

# Let's show the different components
print(t)
print('hour  :', t.hour)
print('minute:', t.minute)
print('second:', t.second)
print('microsecond:', t.microsecond)
print('tzinfo:', t.tzinfo)

04:20:01
hour  : 4
minute: 20
second: 1
microsecond: 0
tzinfo: None


Note: A time instance only holds values of time, and not a date associated with the time. 

We can also check the min and max values a time of day can have in the module:

In [2]:
print('Earliest  :', datetime.time.min)
print('Latest    :', datetime.time.max)
print('Resolution:', datetime.time.resolution)

Earliest  : 00:00:00
Latest    : 23:59:59.999999
Resolution: 0:00:00.000001


The min and max class attributes reflect the valid range of times in a single day.

## Dates
datetime (as you might suspect) also allows us to work with date timestamps. Calendar date values are represented with the date class. Instances have attributes for year, month, and day. It is easy to create a date representing today’s date using the today() class method.

Let's see some examples:

In [3]:
today = datetime.date.today()
print(today)
print('ctime:', today.ctime())
print('tuple:', today.timetuple())
print('ordinal:', today.toordinal())
print('Year :', today.year)
print('Month:', today.month)
print('Day  :', today.day)

2020-06-10
ctime: Wed Jun 10 00:00:00 2020
tuple: time.struct_time(tm_year=2020, tm_mon=6, tm_mday=10, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=162, tm_isdst=-1)
ordinal: 737586
Year : 2020
Month: 6
Day  : 10


As with time, the range of date values supported can be determined using the min and max attributes.

In [4]:
print('Earliest  :', datetime.date.min)
print('Latest    :', datetime.date.max)
print('Resolution:', datetime.date.resolution)

Earliest  : 0001-01-01
Latest    : 9999-12-31
Resolution: 1 day, 0:00:00


Another way to create new date instances uses the replace() method of an existing date. For example, you can change the year, leaving the day and month alone.

In [5]:
d1 = datetime.date(2015, 3, 11)
print('d1:', d1)

d2 = d1.replace(year=1990)
print('d2:', d2)

d1: 2015-03-11
d2: 1990-03-11


## Arithmetic
We can perform arithmetic on date objects to check for time differences. For example:

In [6]:
d1

datetime.date(2015, 3, 11)

In [7]:
d2

datetime.date(1990, 3, 11)

In [8]:
d1-d2

datetime.timedelta(9131)

This gives us the difference between the two dates as a timedelta object, here 9131 days. You can use the timedelta class to specify durations in various units of time (days, hours, minutes, etc.).
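
Here is a short sketch of working with timedelta directly, reusing the two dates from above. The variable names `one_week` and `ninety_minutes` are just illustrative.

```python
import datetime

d1 = datetime.date(2015, 3, 11)
d2 = datetime.date(1990, 3, 11)

# Subtracting two dates gives a timedelta; .days holds the whole days.
difference = d1 - d2
print(difference.days)  # 9131

# A timedelta can be built from several units and added to a date.
one_week = datetime.timedelta(weeks=1)
next_week = d1 + one_week
print(next_week)  # 2015-03-18

# total_seconds() converts the whole duration to seconds.
ninety_minutes = datetime.timedelta(hours=1, minutes=30)
print(ninety_minutes.total_seconds())  # 5400.0
```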

Great! You should now have a basic understanding of how to use datetime with Python to work with timestamps in your code!

# Math and Random Modules

Python comes with a built-in math module and random module. In this lecture we will give a brief tour of their capabilities. Usually you can simply look up the function you need in the online documentation.

* [Math Module](https://docs.python.org/3/library/math.html)

* [Random Module](https://docs.python.org/3/library/random.html)

We won't go through every function available in these modules since there are so many, but we will show some useful ones.

## Useful Math Functions

In [3]:
import math

In [9]:
help(math)

Help on built-in module math:

NAME
    math

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)
        
        Return the arc cosine (measured in radians) of x.
    
    acosh(...)
        acosh(x)
        
        Return the inverse hyperbolic cosine of x.
    
    asin(...)
        asin(x)
        
        Return the arc sine (measured in radians) of x.
    
    asinh(...)
        asinh(x)
        
        Return the inverse hyperbolic sine of x.
    
    atan(...)
        atan(x)
        
        Return the arc tangent (measured in radians) of x.
    
    atan2(...)
        atan2(y, x)
        
        Return the arc tangent (measured in radians) of y/x.
        Unlike atan(y/x), the signs of both x and y are considered.
    
    atanh(...)
        atanh(x)
        
        Return the inverse hyperbolic tangent of x.
    
    ceil(...)
        ceil(x)
        
 

### Rounding Numbers

In [5]:
value = 4.35

In [6]:
math.floor(value)

4

In [7]:
math.ceil(value)

5

In [8]:
round(value)

4
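
One detail worth knowing: in Python 3 the built-in round() sends values that are exactly halfway to the nearest even integer, while math.floor() and math.ceil() always go down or up. A quick sketch with a few halfway values (chosen because they are stored exactly as floats):

```python
import math

halves = (0.5, 1.5, 2.5, 3.5)

for value in halves:
    # floor always rounds down, ceil always rounds up,
    # round() picks the even neighbour when the value is exactly halfway.
    print(value, math.floor(value), math.ceil(value), round(value))

rounded = [round(value) for value in halves]
print(rounded)  # [0, 2, 2, 4]
```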

### Mathematical Constants

In [20]:
math.pi

3.141592653589793

In [21]:
from math import pi

In [22]:
pi

3.141592653589793

In [23]:
math.e

2.718281828459045

In [24]:
math.tau

6.283185307179586

In [25]:
math.inf

inf

In [26]:
math.nan

nan

### Logarithmic Values

In [10]:
math.e

2.718281828459045

In [15]:
# Log Base e
math.log(math.e)

1.0

In [12]:
# Will produce an error if the value does not exist mathematically
math.log(0)

ValueError: math domain error

In [13]:
math.log(10)

2.302585092994046

In [17]:
math.e ** 2.302585092994046

10.000000000000002

### Custom Base

In [18]:
# math.log(x,base)
math.log(100,10)

2.0

In [19]:
10**2

100
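
For the common bases 10 and 2, the math module also has dedicated functions, math.log10() and math.log2(). Here is a small sketch comparing them with the two-argument form. Because floating point results of math.log(x, base) can differ from the exact answer in the last digits, the comparison uses math.isclose() rather than ==.

```python
import math

# Dedicated functions for base 10 and base 2.
log10_result = math.log10(1000)
log2_result = math.log2(8)
print(log10_result)  # 3.0
print(log2_result)   # 3.0

# The general two-argument form; the last digits may not be exact.
general_result = math.log(1000, 10)
print(general_result)

# Compare floats with a tolerance instead of ==.
print(math.isclose(general_result, 3.0))  # True
```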

### Trigonometric Functions

In [30]:
# Radians
math.sin(10)

-0.5440211108893698

In [31]:
math.degrees(pi/2)

90.0

In [32]:
math.radians(180)

3.141592653589793
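
Since the trigonometric functions expect radians, a common pattern is to convert from degrees first. A minimal sketch, with the 90 degree angle picked arbitrarily:

```python
import math

angle_degrees = 90
angle_radians = math.radians(angle_degrees)  # convert before calling sin/cos

sine = math.sin(angle_radians)
cosine = math.cos(angle_radians)

print(sine)    # 1.0
print(cosine)  # a tiny number very close to 0, not exactly 0

# Floating point results are best compared with a tolerance.
print(math.isclose(cosine, 0.0, abs_tol=1e-12))  # True
```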

# Random Module

The random module allows us to create random numbers. We can even set a seed to produce the same random set every time.

The explanation of how a computer attempts to generate random numbers is beyond the scope of this course since it involves higher-level mathematics. But if you are interested in this topic, check out:
* https://en.wikipedia.org/wiki/Pseudorandom_number_generator
* https://en.wikipedia.org/wiki/Random_seed

## Understanding a seed

Setting a seed allows us to start from a seeded pseudorandom number generator, which means the same random numbers will show up in a series. Note that if you're using Jupyter, the seed needs to be set in the same cell as the random calls to guarantee the same results each time. Getting the same set of random numbers can be important in situations where you will be trying different variations of functions and want to compare their performance on random values fairly (so you need the same set of random numbers each time).

In [34]:
import random

In [41]:
random.randint(0,100)

62

In [42]:
random.randint(0,100)

10

In [45]:
# The value 101 is completely arbitrary, you can pass in any number you want
random.seed(101)
# You can run this cell as many times as you want, it will always return the same number
random.randint(0,100)

74

In [46]:
random.randint(0,100)

24

In [48]:
# The value 101 is completely arbitrary, you can pass in any number you want
random.seed(101)
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))

74
24
69
45
59
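
If you'd rather not reset the module-wide seed, one option is a separate generator object created with random.Random(seed). The sketch below wraps that in a hypothetical helper, `draw_numbers`, to show that the same seed always reproduces the same series. The seed values 101 and 202 are arbitrary.

```python
import random

def draw_numbers(seed, count=5):
    """Draw `count` integers from 0 to 100 using a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent of the module-level generator
    return [rng.randint(0, 100) for _ in range(count)]

first_run = draw_numbers(101)
second_run = draw_numbers(101)
other_seed = draw_numbers(202)

print(first_run)
print(second_run)
print(first_run == second_run)  # True: same seed, same series
print(other_seed)               # a different seed starts a different series
```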


### Random Integers

In [49]:
random.randint(0,100)

6

### Random with Sequences

#### Grab a random item from a list

In [70]:
mylist = list(range(0,20))

In [71]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [72]:
random.choice(mylist)

12

In [73]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

### Sample with Replacement

Take a sample of a given size, allowing elements to be picked more than once. Imagine a bag of numbered lottery balls: you reach in to grab a random lotto ball, then after marking down the number, **you place it back in the bag**, then continue picking another one.

In [77]:
random.choices(population=mylist,k=10)

[15, 14, 17, 8, 17, 2, 19, 17, 6, 1]

### Sample without Replacement

Once an item has been randomly picked, it can't be picked again. Imagine a bag of numbered lottery balls: you reach in to grab a random lotto ball, then after marking down the number, you **leave it out of the bag**, then continue picking another one.

In [78]:
random.sample(population=mylist,k=10)

[17, 19, 11, 14, 1, 3, 4, 10, 5, 15]
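
A quick way to see the difference between the two is to ask for more items than the list contains. This sketch assumes a 20-item list like `mylist` above. With replacement that works fine (and must produce repeats), while without replacement it raises a ValueError.

```python
import random

population = list(range(20))

# With replacement: 30 picks from 20 items, so repeats are unavoidable.
with_replacement = random.choices(population, k=30)
print(len(with_replacement), len(set(with_replacement)))

# Without replacement: every picked item is unique.
without_replacement = random.sample(population, k=10)
print(len(without_replacement), len(set(without_replacement)))

# Asking for more items than exist is an error without replacement.
try:
    random.sample(population, k=30)
    sample_too_large = False
except ValueError:
    sample_too_large = True

print(sample_too_large)  # True
```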

### Shuffle a list

**Note: This affects the object in place!**

In [79]:
# Don't assign this to anything!
random.shuffle(mylist)

In [80]:
mylist

[9, 11, 7, 12, 10, 16, 0, 2, 18, 13, 3, 5, 17, 1, 15, 6, 14, 19, 4, 8]
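
Because random.shuffle() works in place, it returns None, which is why you shouldn't assign its result. If you need a shuffled copy and want to keep the original order, one option is random.sample() with k equal to the length of the list. A small sketch with made-up variable names:

```python
import random

numbers = list(range(10))

# shuffle() reorders the list itself and returns None.
result = random.shuffle(numbers)
print(result)  # None

# To keep the original untouched, sample every element instead.
ordered = list(range(10))
shuffled_copy = random.sample(ordered, k=len(ordered))

print(ordered)        # still [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(shuffled_copy)  # same items in a random order
```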

### Random Distributions

#### [Uniform Distribution](https://en.wikipedia.org/wiki/Uniform_distribution)

In [82]:
# Continuous: random picks a value between a and b, and each value has an equal chance of being picked.
random.uniform(a=0,b=100)

23.852305703497635

#### [Normal/Gaussian Distribution](https://en.wikipedia.org/wiki/Normal_distribution)

In [83]:
random.gauss(mu=0,sigma=1)

-0.21390381464435643
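
A single draw doesn't tell you much about a distribution. Here is a rough sketch that draws many values and summarizes them with the built-in statistics module. The seed 101 and the sample size of 10,000 are arbitrary choices for this example, and the printed averages will only be approximately 50, 0 and 1.

```python
import random
import statistics

rng = random.Random(101)  # seeded so the run is repeatable

uniform_draws = [rng.uniform(0, 100) for _ in range(10_000)]
gauss_draws = [rng.gauss(0, 1) for _ in range(10_000)]

# Uniform on [0, 100]: the average should land near 50.
uniform_mean = statistics.mean(uniform_draws)
print(uniform_mean)

# Gaussian with mu=0, sigma=1: mean near 0, standard deviation near 1.
gauss_mean = statistics.mean(gauss_draws)
gauss_stdev = statistics.stdev(gauss_draws)
print(gauss_mean, gauss_stdev)
```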

Final Note: If you find yourself using these libraries a lot, take a look at the NumPy library for Python, which covers all these capabilities with extreme efficiency. We cover this library and a lot more in our data science and machine learning courses.