We will cover the following modules:
- Collections
- OS and Datetime
- Math and Random
- Python Debugger
- Timeit
- Regular Expressions
- Unzipping and Zipping

# Collections Module

The first object we will go over is the Counter class.

In [1]:
from collections import Counter

Imagine you have a list in which a handful of distinct values each repeat many times, and you want a **count** of how often each distinct value occurs. You could loop through the list and keep track of the counts in a dictionary, which is slightly complicated. Counter does it in a single call.

In [2]:
mylist = [1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3]

In [4]:
Counter(mylist)

Counter({1: 6, 2: 7, 3: 6})

Technically Counter is a dictionary subclass, which is why it looks like one. 
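
For comparison, the manual approach mentioned earlier (looping and tracking counts in a dictionary) might look like the following minimal sketch. The names `sample`, `manual_counts`, and `auto_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

sample = [1, 1, 1, 2, 2, 3]

# Manual approach: loop and keep a running count per value
manual_counts = {}
for item in sample:
    manual_counts[item] = manual_counts.get(item, 0) + 1

# Counter does the same bookkeeping in one call
auto_counts = Counter(sample)

print(manual_counts)                        # {1: 3, 2: 2, 3: 1}
print(dict(auto_counts) == manual_counts)   # True
```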

Counter also works with strings.

In [5]:
Counter('asadsfnokemeraasssdkmfeur')

Counter({'a': 4,
         's': 5,
         'd': 2,
         'f': 2,
         'n': 1,
         'o': 1,
         'k': 2,
         'e': 3,
         'm': 2,
         'r': 2,
         'u': 1})

In [6]:
sentence = 'this is a random sentence with words and a word words jared hello'

In [7]:
Counter(sentence.split())

Counter({'this': 1,
         'is': 1,
         'a': 2,
         'random': 1,
         'sentence': 1,
         'with': 1,
         'words': 2,
         'and': 1,
         'word': 1,
         'jared': 1,
         'hello': 1})

In [9]:
letters = 'aaaaabbbbbccccccccddddddddd'

In [10]:
c = Counter(letters)

In [11]:
c

Counter({'a': 5, 'b': 5, 'c': 8, 'd': 9})

Counter objects support the usual dictionary attributes and methods, and they also have some unique ones of their own.

In [12]:
c.most_common()

[('d', 9), ('c', 8), ('a', 5), ('b', 5)]

In [13]:
list(c) # list of all the unique values

['a', 'b', 'c', 'd']
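
A few other Counter behaviors are worth knowing. The sketch below rebuilds the same counter from the `letters` string: `most_common(n)` limits the result to the top n entries, looking up a missing key returns 0 rather than raising a KeyError, and `update()` adds counts from another iterable.

```python
from collections import Counter

c = Counter('aaaaabbbbbccccccccddddddddd')

print(c.most_common(2))   # [('d', 9), ('c', 8)]
print(c['z'])             # 0 (missing keys count as zero)
print(sum(c.values()))    # 27 (total number of items counted)

# update() adds counts from another iterable
c.update('aab')
print(c['a'], c['b'])     # 7 6
```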

Now let's talk about the default dictionary. We are going to compare what happens in a normal dictionary with what happens in a default dictionary.

In [14]:
from collections import defaultdict

In [15]:
# Normal dict
d = {'a': 10}

In [16]:
d

{'a': 10}

In [17]:
d['a']

10

Everything is good so far. If we try to look up a key that doesn't exist in a normal dictionary, though, we get a KeyError. In certain situations, you may want to provide a *default* value in cases where a KeyError would have occurred.

In [19]:
d['b']

KeyError: 'b'

In [20]:
# we have to tell it what the default value is going to be
default_d = defaultdict(lambda: 0) 

In [21]:
default_d['correct'] = 100 # works like a normal dict

In [22]:
default_d['correct']

100

In [23]:
default_d['wrong'] # this key doesn't exist

0

Notice we got our default value instead of a KeyError. 
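
A common use of this behavior is grouping or tallying without first checking whether a key exists. The sketch below is illustrative (the `words` list and the variable names are assumptions, not from the notebook). It passes `list` and `int` as the default factories, where `int` behaves like the `lambda: 0` above.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Group words by first letter; missing keys start as an empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# Tally word lengths; int() returns 0, just like lambda: 0
length_counts = defaultdict(int)
for word in words:
    length_counts[len(word)] += 1

print(dict(by_letter))
print(dict(length_counts))

# Note: merely reading a missing key inserts it with the default value
print('z' in by_letter)   # False
by_letter['z']
print('z' in by_letter)   # True
```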

Now let's go over namedtuple.

In [24]:
mytuple = (10,20,30)

In [25]:
mytuple[0]

10

In certain cases, you may have a very large tuple, or you may not remember the position of a certain value. A named tuple gives each position a name, so instead of calling mytuple[0], we can access the value through a descriptive field name.

In [26]:
from collections import namedtuple

In [27]:
Dog = namedtuple('Dog', ['age','breed','name'])

In [28]:
Dog

__main__.Dog

In [29]:
daisy = Dog(age=4, breed = 'Schnoodle', name='Daisy')

In [30]:
type(daisy)

__main__.Dog

In [31]:
daisy

Dog(age=4, breed='Schnoodle', name='Daisy')

In [32]:
daisy.age

4

In [33]:
daisy[0]

4
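
Named tuples also come with a few helper methods. The following sketch reuses the `Dog` definition from above; `_asdict()` and `_replace()` are standard namedtuple methods, and because tuples are immutable, `_replace()` returns a new object. The name `older_daisy` is only for illustration.

```python
from collections import namedtuple

Dog = namedtuple('Dog', ['age', 'breed', 'name'])
daisy = Dog(age=4, breed='Schnoodle', name='Daisy')

# Unpack like a regular tuple
age, breed, name = daisy

# Convert to a dictionary keyed by field name
print(daisy._asdict())

# Fields cannot be reassigned; _replace builds a modified copy instead
older_daisy = daisy._replace(age=5)
print(older_daisy)   # Dog(age=5, breed='Schnoodle', name='Daisy')
print(daisy.age)     # 4 (the original is unchanged)
```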

# Shutil and OS Modules

We know how to open individual files with Python, but we still don't know how to do a few things:
- what if we have to open every file in a directory?
- what if we want to actually move files around on our computer?

Let's have a quick review first. 

In [35]:
pwd # current working directory - this is sort of a special Jupyter/command line item

'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules'

In [36]:
f = open('practice.txt', 'w+')
f.write('This is a test string')
f.close()

The OS module is really useful because it allows you to do things like get the current working directory or list all the files. 

In [37]:
import os

In [39]:
os.getcwd() # will work in any python script

'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules'

In [40]:
os.listdir()

['.ipynb_checkpoints', 'Advanced Modules.ipynb', 'practice.txt']

We can also list items in a different directory. 

In [42]:
os.listdir('C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python')

['.ipynb_checkpoints',
 '00 Projects',
 '06 Functions',
 '08 OOP',
 '09 Modules',
 '10 Errors and Exception Handling',
 '12 Decorators',
 '13 Generators',
 '14 Advanced Modules',
 'Generators',
 'Input Output Files Basics.ipynb',
 'Projects']
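
To visit every file in a directory, including files in subfolders, `os.walk()` can be used. The sketch below builds a throwaway folder with `tempfile` so it can run anywhere; the folder layout and file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a small example tree: root/a.txt and root/sub/b.txt
    os.makedirs(os.path.join(root, 'sub'))
    for rel_path in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(root, rel_path), 'w') as f:
            f.write('example')

    # os.walk yields (folder, subfolders, files) for each folder in the tree
    found = []
    for folder, sub_folders, files in os.walk(root):
        for file_name in files:
            full_path = os.path.join(folder, file_name)
            found.append(os.path.relpath(full_path, root))

    found.sort()
    print(found)
```

Each entry in `found` is a path relative to the starting folder, so the same loop could open or process every file in turn.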

Now let's move some files.

In [43]:
import shutil

In [44]:
shutil.move('practice.txt', 'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules\\new_folder')

'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules\\new_folder\\practice.txt'

In [45]:
os.listdir('C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules\\new_folder')

['practice.txt']

In [46]:
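import send2trash # third-party package (pip install send2trash); sends files to the trash instead of deleting them permanently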

In [48]:
shutil.move('C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules\\new_folder\\practice.txt',os.getcwd())

'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\practice.txt'

In [49]:
send2trash.send2trash('practice.txt')
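
The hard-coded Windows paths above only work on one machine. As a more portable sketch, paths can be built with `os.path.join()`. The example below runs inside a temporary directory, and the file and folder names are placeholders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    source = os.path.join(base, 'practice.txt')
    target_dir = os.path.join(base, 'new_folder')

    with open(source, 'w') as f:
        f.write('This is a test string')

    # Create the destination folder first; if it did not exist,
    # shutil.move would rename the file to that path instead
    os.makedirs(target_dir, exist_ok=True)
    new_path = shutil.move(source, target_dir)

    moved_ok = os.path.exists(new_path) and not os.path.exists(source)
    print(os.listdir(target_dir))   # ['practice.txt']
    print(moved_ok)                 # True
```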

# Datetime

In [50]:
import datetime

In [51]:
mytime = datetime.time(7,10)

In [52]:
mytime.minute

10

In [53]:
mytime.hour

7

In [54]:
print(mytime)

07:10:00


In [55]:
today = datetime.date.today()

In [56]:
print(today)

2021-01-27


In [57]:
today.ctime()

'Wed Jan 27 00:00:00 2021'

In [58]:
from datetime import datetime # note: the name datetime now refers to the class, not the module

In [61]:
mydatetime = datetime(2021,10,3,14,20,1) # year, month, day, hour, minute, second

In [62]:
print(mydatetime)

2021-10-03 14:20:01


In [64]:
mydatetime.replace(year=2022) # doesn't happen inplace

datetime.datetime(2022, 10, 3, 14, 20, 1)

In [65]:
# Date

from datetime import date

In [66]:
date1 = date(2021,2,8)
date2 = date(2021,9,8)

In [67]:
date2 - date1

datetime.timedelta(days=212)

In [68]:
result = date2 - date1

In [69]:
type(result)

datetime.timedelta

In [70]:
result.days

212

In [71]:
# Datetime

In [76]:
datetime1 = datetime(2021,2,8,22,0)
datetime2 = datetime(2022,9,15,12,30)

In [77]:
datetime2 - datetime1

datetime.timedelta(days=583, seconds=52200)

In [78]:
mydiff = datetime2 - datetime1

In [79]:
mydiff.total_seconds()

50423400.0
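
Two more operations that often come up are shifting a datetime by a timedelta and formatting it as text. The short sketch below reuses the values from the cells above; `timedelta` and `strftime` are part of the standard datetime module, while the names `later` and `diff` are illustrative.

```python
from datetime import datetime, timedelta

mydatetime = datetime(2021, 10, 3, 14, 20, 1)

# Adding a timedelta returns a new datetime
later = mydatetime + timedelta(days=30)
print(later)   # 2021-11-02 14:20:01

# strftime formats a datetime as a string
print(mydatetime.strftime('%Y-%m-%d %H:%M'))   # 2021-10-03 14:20

# total_seconds() is days * 86400 plus the leftover seconds
diff = datetime(2022, 9, 15, 12, 30) - datetime(2021, 2, 8, 22, 0)
print(diff.days * 86400 + diff.seconds == diff.total_seconds())   # True
```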

# Math and Random

In [80]:
import math

In [81]:
value = 4.25

In [82]:
math.floor(value)

4

In [83]:
math.ceil(value)

5

In [84]:
round(value)

4

In [85]:
round(4.5)

4

In [86]:
round(5.5)

6

Why the difference? Python's built-in round() sends ties (values ending in exactly .5) to the nearest even integer, so 4.5 rounds down to 4 while 5.5 rounds up to 6. If you always rounded .5 down, your totals would drift lower than they should be over time; if you always rounded .5 up, they would drift higher. Basing the rule on whether the neighboring integer is even or odd lets the errors cancel out over a long run of values.
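
The tie-breaking rule can be checked directly. The sketch below also shows one possible alternative for cases where ties must always round up: the standard `decimal` module with `ROUND_HALF_UP`. The name `half_up` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round(): ties go to the nearest even integer
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])   # [0, 2, 2, 4]

# decimal can round ties away from zero when that is required
half_up = Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP)
print(half_up)   # 3
```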

In [87]:
math.pi

3.141592653589793

In [88]:
math.e

2.718281828459045

In [89]:
math.inf

inf

In [90]:
math.log(math.e)

1.0

In [91]:
math.log(100,10) # value, base

2.0

In [92]:
math.sin(10)

-0.5440211108893698

In [93]:
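math.degrees(math.sin(10)) # degrees() converts radians to degrees; sin() returns a ratio rather than an angle, so this only demonstrates the call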

-31.17011361997944
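
Note that the trigonometric functions in `math` work in radians. A more typical pairing of these functions, sketched here with illustrative values, converts degrees to radians before calling `sin()` and converts the result of an inverse function back to degrees.

```python
import math

# Trig functions expect radians, so convert degrees first
print(math.sin(math.radians(90)))   # 1.0

# degrees() converts an angle in radians to degrees
print(math.degrees(math.pi))        # 180.0

# Inverse trig functions return radians
angle = math.asin(1.0)
print(math.degrees(angle))          # 90.0
```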

In [94]:
import random

In [95]:
random.randint(0,100)

5

Setting a seed initializes the pseudo-random number generator to a known state. Every run that uses the same seed then produces the same sequence of random numbers, which lets us reproduce experiments.

In [96]:
random.seed(101) # the number is arbitrary

random.randint(0,100)

74
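
Reproducibility can be verified by seeding twice and comparing the results. In this small sketch the names `first_run` and `second_run` are illustrative.

```python
import random

random.seed(101)
first_run = [random.randint(0, 100) for _ in range(5)]

random.seed(101)   # reset to the same starting state
second_run = [random.randint(0, 100) for _ in range(5)]

print(first_run == second_run)   # True
```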

In [97]:
mylist = list(range(0,20))

In [98]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [99]:
random.choice(mylist)

6

We can sample from a list with or without replacement. 

In [100]:
# sample with replacement
random.choices(population=mylist, k=10)

[18, 10, 7, 0, 10, 12, 18, 9, 15, 6]

In [101]:
# sample without replacement
random.sample(population=mylist, k=10)

[14, 2, 8, 6, 5, 1, 7, 12, 17, 15]

In [102]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [103]:
# shuffle changes the list inplace
random.shuffle(mylist)

In [104]:
mylist

[0, 8, 17, 11, 2, 9, 5, 18, 13, 4, 16, 14, 1, 3, 19, 12, 7, 10, 15, 6]

In [105]:
random.uniform(a=0, b=100) # every value in the range has the same likelihood of being chosen

22.911748605196948

In [106]:
# normal / gaussian
random.gauss(mu=0, sigma=1)

1.463104539539442

# Debugger

In [107]:
x = [1,2,3]
y = 2
z = 3

In [108]:
import pdb

In [110]:
x = [1,2,3]
y = 2
z = 3

result1 = y + z

pdb.set_trace()

result2 = x + y

--Return--
None
> <ipython-input-110-fc6d2b526938>(7)<module>()
      5 result1 = y + z
      6 
----> 7 pdb.set_trace()
      8 
      9 result2 = x + y

ipdb> x
[1, 2, 3]

ipdb> y
2

ipdb> x+y
*** TypeError: can only concatenate list (not "int") to list

ipdb> q

BdbQuit: 

# Regular Expressions

Regular expressions allow us to search for patterns or structures of text, rather than exact text. For example, if we were searching for email addresses in a large list, we know that they take a structure like "text" + "@" + "text" + ".com".

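The re library allows us to create specialized pattern strings and then search for matches within text. The syntax is unusual, so it's best to focus on knowing how to look up the pieces you need to build a regular expression. For example...
- Phone number: (555)-555-5555
- Regex pattern: r"\(\d\d\d\)-\d\d\d-\d\d\d\d"
- Regex pattern: r"\(\d{3}\)-\d{3}-\d{4}"   (same thing)
- The literal parentheses are escaped as \( and \) because unescaped parentheses create a capture group instead of matching the characters themselves.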
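
As a minimal sketch of using such a pattern with the re library, the example below searches a made-up sentence for phone numbers. The parentheses in the pattern are escaped so they match literal characters, and the `text` string is an assumption for illustration.

```python
import re

text = 'Call (555)-555-5555 or (123)-456-7890 for details.'
pattern = r'\(\d{3}\)-\d{3}-\d{4}'

# search() returns the first match (or None if nothing matches)
first = re.search(pattern, text)
print(first.group())   # (555)-555-5555
print(first.span())    # (5, 19)

# findall() returns every non-overlapping match
print(re.findall(pattern, text))   # ['(555)-555-5555', '(123)-456-7890']
```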

# Timing Your Code

In [111]:
def func1(n):
    return [str(num) for num in range(n)]

In [112]:
func1(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [113]:
def func2(n):
    return list(map(str,range(n)))

In [114]:
func2(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [116]:
import timeit

In [125]:
stmt = '''
func1(100)
'''

setup = '''
def func1(n):
    return [str(num) for num in range(n)]
'''

timeit.timeit(stmt, setup, number = 1000000)

16.998414500000763

In [127]:
stmt = '''
func2(100)
'''

setup = '''
def func2(n):
    return list(map(str,range(n)))
'''

timeit.timeit(stmt, setup, number = 1000000)

12.241486400000213
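
Repeating each function definition inside a setup string can be avoided. As a sketch, `timeit.timeit()` also accepts a callable, or a `globals` argument that tells it where to look up names. The loop count of 10000 is arbitrary, and the timings printed will vary by machine.

```python
import timeit

def func1(n):
    return [str(num) for num in range(n)]

def func2(n):
    return list(map(str, range(n)))

# Pass a callable directly
t1 = timeit.timeit(lambda: func1(100), number=10000)

# Or pass a statement string plus a namespace that defines func2
t2 = timeit.timeit('func2(100)', globals={'func2': func2}, number=10000)

print(t1, t2)
```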

In [129]:
# The %%timeit magic command only works in Jupyter/IPython

In [130]:
%%timeit
func1(100)

17 µs ± 662 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [131]:
%%timeit
func2(100)

12.7 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


# Zipping Files

In [1]:
f = open('fileone.txt', 'w+')
f.write('ONE FILE')
f.close()

In [2]:
f = open('filetwo.txt', 'w+')
f.write('TWO FILE')
f.close()

In [3]:
import zipfile

In [5]:
comp_file = zipfile.ZipFile('comp_file.zip', 'w')

In [6]:
comp_file.write('fileone.txt', compress_type = zipfile.ZIP_DEFLATED)

In [7]:
comp_file.write('filetwo.txt', compress_type = zipfile.ZIP_DEFLATED)

In [8]:
comp_file.close()

In [9]:
# unzipping
zip_obj = zipfile.ZipFile('comp_file.zip', 'r')

In [10]:
zip_obj.extractall('extracted_content')
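
The same steps can be written with `with` blocks so that files and archives are closed automatically. This is a sketch under a few assumptions: it works inside a temporary directory, uses `arcname` to control the name stored in the archive, and reads the contents back with `namelist()` and `read()` rather than extracting to disk.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as base:
    file_path = os.path.join(base, 'fileone.txt')
    with open(file_path, 'w') as f:
        f.write('ONE FILE')

    zip_path = os.path.join(base, 'comp_file.zip')

    # The with block closes the archive automatically
    with zipfile.ZipFile(zip_path, 'w') as comp_file:
        # arcname controls the name stored inside the archive
        comp_file.write(file_path, arcname='fileone.txt',
                        compress_type=zipfile.ZIP_DEFLATED)

    with zipfile.ZipFile(zip_path, 'r') as zip_obj:
        names = zip_obj.namelist()
        content = zip_obj.read('fileone.txt').decode()

print(names)     # ['fileone.txt']
print(content)   # ONE FILE
```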

In [11]:
# Usually we are zipping or unzipping entire folders, not individual files
# The shutil (shell utilities) module is better for that
import shutil

In [14]:
# We are going to zip the entire extracted_content folder
dir_to_zip = 'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules\\extracted_content'

In [15]:
output_filename = 'example'

In [16]:
shutil.make_archive(output_filename,'zip', dir_to_zip)

'C:\\Users\\Jared\\Documents\\GitHub\\python_data_structs_algorithms\\00 Python\\14 Advanced Modules\\example.zip'

In [18]:
shutil.unpack_archive('example.zip', 'final_unzip','zip')