# Advanced Modules

Python ships with several built-in modules that we have yet to explore.

In this document we will dive deeper into some useful built-in modules and explore their use cases.

Modules covered:
1. Collections
2. OS (with shutil)
3. Datetime
4. Math and Random
5. Python Debugger
6. Regular Expressions
7. Timeit
8. Unzipping and Zipping Files

## 1. Collections Module

The collections module is a built-in module that implements specialized container data types, providing alternatives to Python’s general-purpose built-in containers (dict, list, set, and tuple).

### Counter
Sometimes you need to count the objects in a given data source to know how often they occur. In other words, you need to determine their frequency. For example, you might want to know how often a specific item appears in a list or sequence of values. When your list is short, counting the items can be straightforward and quick. However, when you have a long list, counting things can be more challenging.

---

***Counter*** is a `dict` subclass in which elements are stored as dictionary keys and their counts as dictionary values.

Let's see how it can be used:

In [1]:
from collections import Counter

In [2]:
# Counter with lists
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1]
Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

In [3]:
# Counter with strings
Counter('aabsbsbsbhshhbbsbs')

Counter({'a': 2, 'b': 7, 's': 6, 'h': 3})

In [5]:
# Counter with words in a sentence
s = 'How many times does each word show up in this sentence word times each each word'

words = s.split()

words

Counter(words)

Counter({'How': 1,
         'many': 1,
         'times': 2,
         'does': 1,
         'each': 3,
         'word': 3,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

In [6]:
# Methods with Counter()
c = Counter(words)

# most_common returns the elements with the highest counts
c.most_common(2) # we specified how many elements to return

[('each', 3), ('word', 3)]
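
As a rough sketch of what `Counter` saves you, the snippet below counts the same words by hand and with `Counter`, then shows two behaviours worth knowing: a missing key reports a count of 0 instead of raising an error, and `update()` adds to the existing counts. The sample sentence and variable names are made up for illustration.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

# Manual counting with a plain dictionary
manual_counts = {}
for word in words:
    manual_counts[word] = manual_counts.get(word, 0) + 1

# The same result in one call
counts = Counter(words)
print(dict(counts) == manual_counts)   # True

# Missing keys report 0 instead of raising KeyError
print(counts["dog"])                   # 0

# update() adds to the existing counts instead of replacing them
counts.update(["cat", "cat", "cat"])
print(counts.most_common(1))           # [('cat', 4)]
```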

### defaultdict
In a normal Python dictionary, looking up a key that doesn't exist raises a `KeyError`.

A ***defaultdict*** instead assigns a default value whenever a missing key is accessed.

Let's look at this example:

In [1]:
from collections import defaultdict

In [12]:
# If a key that doesn't exist in a dictionary is called
# It assigns a value that is defined in the defaultdict 

d = defaultdict(lambda: 0)

# Alternatively, the factory can print a message. Note that print()
# returns None, so None is the value stored for the missing key:
# d = defaultdict(lambda: print("Key doesn't exist"))

#### Why is a lambda expression used here?
1. So that we do not need to define a named function. A lambda is an anonymous function, so `lambda: 0` is equivalent to a function that takes no arguments and returns 0:

>`def myfunc():`

>>`return 0`

2. The first argument to defaultdict needs to be a callable (or `None`): it can be a lambda, a regular named function, or even a class. When a missing key is accessed, the callable is called with no arguments and the value it returns is paired with that key.

In a regular dictionary, if you try to access a key that doesn't exist, you will get a `KeyError`. In a defaultdict, if you've provided the callable, the value returned by that callable is assigned to the key and added to the dictionary, so you won't get an error.

A ***callable*** is any object that can be called like a function.

In [8]:
d["correct"] = 100

In [9]:
d["correct"]

100

In [10]:
d["Wrong Key"]

0

In [11]:
d

defaultdict(<function __main__.<lambda>()>, {'correct': 100, 'Wrong Key': 0})
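
Because the factory can be any callable, built-in types such as `list` work too. The sketch below (sample data and names are illustrative, not from the notebook) groups words by their first letter with `defaultdict(list)`. It also shows that `in` and `.get()` do not create missing keys; only indexing with `d[key]` does.

```python
from collections import defaultdict

animals = ["ant", "bear", "bee", "cat", "ape"]

# list() returns [], so every new key starts with an empty list
by_letter = defaultdict(list)
for name in animals:
    by_letter[name[0]].append(name)

print(dict(by_letter))
# {'a': ['ant', 'ape'], 'b': ['bear', 'bee'], 'c': ['cat']}

# Membership tests and .get() do not insert the missing key
print("z" in by_letter)     # False
print(by_letter.get("z"))   # None
print(len(by_letter))       # 3

# Indexing does insert it
by_letter["z"]
print(len(by_letter))       # 4
```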

### namedtuple
The standard tuple uses numerical indexes to access its members.
For simple use cases, this is usually enough. 

On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. 

***A namedtuple assigns names, as well as the numerical index, to each member.***

Each kind of namedtuple is represented by its ***own class, created by using the namedtuple() factory function.*** The arguments are the ***name of the new class and a list of strings containing the names of the elements.***

You can basically think of namedtuples as a very quick way of creating a new object/class type with some attribute fields. 

For example:



In [13]:
from collections import namedtuple

In [14]:

Dog = namedtuple('Dog',['age','breed','name'])

# Instances of the Dog object

sam = Dog(age=2,breed='Lab',name='Sammy')

frank = Dog(age=2,breed='Shepard',name="Frankie")

In [15]:
type(sam)

__main__.Dog

In [17]:
# The output is basically a tuple with an attached class/type name
sam

Dog(age=2, breed='Lab', name='Sammy')

In [18]:
# we can call the value using its named variable
sam.age

2

In [19]:
sam.breed

'Lab'

In [22]:
# we can also call the value using its index
sam[0]

2

In [23]:
sam[1]

'Lab'
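
A few more things a namedtuple can do, sketched with the same `Dog` example: it unpacks like a regular tuple, converts to a dictionary with `_asdict()`, and is immutable, so `_replace()` returns a new instance rather than changing the original. The variable names `older_sam` and `error_message` are only for illustration.

```python
from collections import namedtuple

Dog = namedtuple("Dog", ["age", "breed", "name"])
sam = Dog(age=2, breed="Lab", name="Sammy")

# Unpack like a regular tuple
age, breed, name = sam
print(name)                 # Sammy

# Convert to a dictionary keyed by field name
as_dict = sam._asdict()
print(as_dict["breed"])     # Lab

# _replace returns a new instance; the original is unchanged
older_sam = sam._replace(age=3)
print(older_sam.age, sam.age)   # 3 2

# Assigning to a field is not allowed
try:
    sam.age = 5
except AttributeError as err:
    error_message = str(err)
    print("Cannot modify a namedtuple field")
```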

## 2. OS Module
The os module is useful because it allows you to do things like get the current working directory or list all files in a directory.

Let's look at these examples:

In [26]:
import os

In [27]:
# get current working directory
os.getcwd()

'/Users/adedze/Documents/SE Docs/My Jupyter Notes /Python/Everything I Know About Python/09. Advanced Modules'

We could have used `pwd`, but that only works in Jupyter or on the command line, whereas `os.getcwd()` works within any Python script.


In [28]:
# Lists all items in current directory
# We can specify which directory by adding its location as an argument
os.listdir()


['#9 Advanced Modules.ipynb', '.ipynb_checkpoints']

### Moving files around
We import the ***shutil (shell utilities) module***, which can be used to move files to different locations.

In [29]:
import shutil

In [31]:
# this takes in the file location and where you want to move it to
shutil.move("source", "destination")

### Deleting Files

1. `os.unlink("file path")` - deletes files
2. `os.rmdir("folder path")` - deletes empty folders
3. `send2trash.send2trash("filepath")` - sends file to trash bin

- `send2trash` is a third-party package, so you need to install it with `pip install send2trash` before importing it.
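
To try the deletion functions safely, the sketch below works inside a throwaway directory created with `tempfile.mkdtemp()` (the use of `tempfile` and the file name `practice.txt` are assumptions made for this example, not part of the original notes). It only covers the standard-library calls; `send2trash` is left out because it needs a separate install.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing important is touched
base = tempfile.mkdtemp()
file_path = os.path.join(base, "practice.txt")

with open(file_path, "w") as f:
    f.write("delete me")

print(os.path.exists(file_path))   # True

os.unlink(file_path)               # permanently removes the file
print(os.path.exists(file_path))   # False

os.rmdir(base)                     # works because the folder is now empty
print(os.path.exists(base))        # False
```

Both calls delete permanently; nothing goes to the trash bin.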

### Walking through a directory

Often you will just need to "walk" through a directory, that is, visit every file and folder and check whether a particular file is there, and then perhaps do something with that file. Recursively walking through every file and folder by hand would be quite tricky to program, but luckily the os module has a function for this called `os.walk()`.

`os.walk` visits every folder under the given path and yields a `(folder, sub_folders, files)` tuple for each one, so you can add your own logic (***you can use a for loop, tuple unpacking and if statements***).

Let's explore how it works:

In [38]:
os.getcwd()

'/Users/adedze/Documents/SE Docs/My Jupyter Notes /Python/Everything I Know About Python/09. Advanced Modules'

In [39]:
file_path = '/Users/adedze/Documents/SE Docs/My Jupyter Notes /Everything I Know About Python/walk'


In [40]:
for folder, sub_folders, files in os.walk(file_path):
    
    print("Currently looking at folder: "+ folder)
    print('\n')
    print("THE SUBFOLDERS ARE: ")
    
    for sub_folder in sub_folders:
        print("\t Subfolder: "+sub_folder )
    
    print('\n')
    
    print("THE FILES ARE: ")
    for f in files:
        print("\t File: "+f)
    print('\n')
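
Since the path above only exists on the author's machine, here is a self-contained sketch of the same idea with some logic added. It builds a small throwaway tree with `tempfile` (an assumption for this example) and uses `os.walk` with an `if` statement to collect every `.txt` file, however deeply nested.

```python
import os
import tempfile

# Build a small throwaway tree: root/a.txt, root/sub/b.txt, root/sub/c.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel_path in ["a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.py")]:
    with open(os.path.join(root, rel_path), "w") as f:
        f.write("sample")

# Collect every .txt file, however deep it is
txt_files = []
for folder, sub_folders, files in os.walk(root):
    for file_name in files:
        if file_name.endswith(".txt"):
            txt_files.append(os.path.join(folder, file_name))

print(sorted(os.path.basename(p) for p in txt_files))   # ['a.txt', 'b.txt']
```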
    
    

## 3. Datetime Module
The datetime module allows you to create objects that hold information on dates, times, and timezones, and to perform operations between datetime objects, such as finding how many seconds or days have passed.

In [41]:
import datetime

In [42]:
# creating datetime object
# time takes in arguments like hour, minute, second.....
mytime = datetime.time(2,35)

In [43]:
print(mytime)

02:35:00


In [44]:
mytime.hour

2

In [45]:
mytime.minute

35

In [46]:
# date objects
# we can use the method today() from date to get current date 
# instead of giving it the input
today = datetime.date.today()

In [47]:
print(today)

2023-07-19


In [48]:
today.year

2023

In [31]:
# when you want both date information and time information
# we import datetime from datetime
from datetime import datetime

In [32]:
mydatetime = datetime(2022,6,10,2)

In [33]:
print(mydatetime)

2022-06-10 02:00:00


In [34]:
# if we made a mistake we can use the replace( ) method and specify
# what you want to replace

mydatetime = mydatetime.replace(year = 2023)

In [36]:
print(mydatetime)

2023-06-10 02:00:00


### Arithmetic
We can perform arithmetic on date objects to check for time differences. For example:

In [47]:
d1 = datetime(2004,1,17)
d2 = datetime(2023,1,17)

In [48]:
d2-d1

datetime.timedelta(days=6940)

In [49]:
result = d2-d1

In [54]:
# checking the type returns a timedelta object which has its 
# methods and attributes
type(result)

datetime.timedelta

In [55]:
result.days

6940
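
A short sketch of a few other things a `timedelta` supports, using the same two dates: `total_seconds()` converts the difference to seconds, a `timedelta` can be added to a datetime to move it forward, and datetimes can be compared directly.

```python
from datetime import datetime, timedelta

d1 = datetime(2004, 1, 17)
d2 = datetime(2023, 1, 17)
result = d2 - d1

# Convert the whole difference to seconds
print(result.total_seconds())     # 599616000.0

# Add a timedelta to a datetime to move it forward
later = d1 + timedelta(days=30)
print(later)                      # 2004-02-16 00:00:00

# datetime objects can be compared directly
print(d2 > d1)                    # True
```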

## 4. Math and Random Module


In [56]:
import math

In [59]:
# if you want to discover what the math module has
help(math)

In [58]:
# rounding number
value = 4.35

In [60]:
# rounds down to the nearest integer that is less than or equal to the value
math.floor(value)

4

In [61]:
# rounds up to the next integer, no matter how close the value
# is to the integer below it
math.ceil(value)

5

In [62]:
# Calling math constants
math.pi

3.141592653589793

In [63]:
math.e

2.718281828459045

The module also provides logarithmic and trigonometric functions, among others.
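
As a small sample of those other functions (a selection chosen for this sketch, not a complete list):

```python
import math

print(math.sqrt(16))            # 4.0
print(math.log10(100))          # 2.0
print(math.log(math.e))         # natural logarithm by default
print(math.sin(math.pi / 2))    # 1.0
print(math.degrees(math.pi))    # 180.0
```

The trigonometric functions work in radians; `math.degrees()` and `math.radians()` convert between the two units.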

## Random Module

 

In [64]:
import random
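
The original notes stop at the import, so what follows is only a minimal sketch of commonly used `random` functions. The seed value and the sample list are arbitrary choices for illustration. No expected outputs are shown because they depend on the seed and the Python version.

```python
import random

# Fixing the seed makes the sequence of "random" values repeatable
random.seed(101)

# Integer between 1 and 10, both ends included
number = random.randint(1, 10)
print(number)

colours = ["red", "green", "blue", "yellow"]

# Pick one item
print(random.choice(colours))

# Pick two different items; the original list is left untouched
picked = random.sample(colours, k=2)
print(picked)

# Shuffle the list in place (the function itself returns None)
random.shuffle(colours)
print(colours)

# Float between 0 and 1
print(random.uniform(0, 1))
```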

## 5. Python Debugger

When trying to track down errors in your code, a common approach is to use the `print()` function to print out variables at different points in a script.

A better way of doing this is by using Python's built-in debugger module (`pdb`). The `pdb` module implements an interactive debugging environment for Python programs. It includes features to let you pause your program, look at the values of variables, and watch program execution step-by-step, so you can understand what your program actually does and find bugs in the logic.

In [1]:
# Trying to debug with print statement

x = [1,2,3]
y = 2
z = 3

# running this cell leads to an error 
# we try using print function to trace the error

result1 = y + z
print(result1)
result2 = x + y
print(result2)

# we can see below that result1 prints its value, meaning the error
# is somewhere after result1; tracing errors this way can be tedious

5


TypeError: can only concatenate list (not "int") to list

In [2]:
# Using the Python Debugger
import pdb

In [None]:

x = [1,2,3]
y = 2
z = 3

result1 = y + z



# set a trace before the error line
# This will allow us to basically pause the code at 
# the point of the trace and check if anything is wrong.
pdb.set_trace()

result2 = x + y


--Return--
None
> /var/folders/vj/vn07m31j1n7fs08rkytvgvj80000gn/T/ipykernel_76551/496148161.py(12)<module>()
     10 # This will allow us to basically pause the code at
     11 # the point of the trace and check if anything is wrong.
---> 12 pdb.set_trace()
     13 
     14 result2 = x + y

ipdb> x+y
*** TypeError: can only concatenate list (not "int") to list
ipdb> y+z
5


## 6. Regular Expressions

***Regular Expressions (sometimes called regex for short)*** allow a user to search for strings using almost any sort of rule they can come up with. 
For example, finding all capital letters in a string, or finding a phone number in a document.

Regular expressions are notorious for their seemingly strange syntax. This strange syntax is a byproduct of their flexibility. Regular expressions have to be able to filter out any string pattern you can imagine, which is why they have a complex string pattern format.

### Searching for Basic Patterns
Let's imagine that we have the following string:

In [1]:
text = "The person's phone number is 408-555-1234. Call soon!"

We'll start off by trying to find out if the string "phone" is inside the text string. Now we could quickly do this with:

In [2]:
'phone' in text

True

But let's show the format for regular expressions, because later on we will be searching for patterns that won't have such a simple solution.

In [3]:
# importing regular expression built-in module
import re

In [4]:
pattern = 'phone'

In [5]:
re.search(pattern,text)

<re.Match object; span=(13, 18), match='phone'>

In [8]:
pattern = "NOT IN TEXT"

In [9]:
re.search(pattern,text)

Now we've seen that `re.search()` takes the pattern, scans the text, and returns a Match object.

If the pattern is not found, ***None*** is returned (in Jupyter Notebook this just means that nothing is output below the cell).

Let's take a closer look at this ***Match object***.

In [10]:
pattern = 'phone'

In [11]:
match = re.search(pattern,text)

In [14]:
match

<re.Match object; span=(13, 18), match='phone'>

Notice that the span holds ***start and end index information***.

We can use the `start()` and `end()` methods to get the respective values.

`span()` is itself a method on the Match object; it returns both indexes as a tuple.
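
A quick sketch of these three methods on the earlier phone-number sentence. It also shows why it is worth checking for `None` before calling a method on the result of `re.search()`.

```python
import re

text = "The person's phone number is 408-555-1234. Call soon!"
match = re.search("phone", text)

print(match.span())    # (13, 18)
print(match.start())   # 13
print(match.end())     # 18

# The indexes can be used to slice the original string
print(text[match.start():match.end()])   # phone

# re.search returns None when nothing matches, so check before using it
missing = re.search("NOT IN TEXT", text)
if missing is None:
    print("no match")
```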

### A scenario where the pattern occurs more than once

In [17]:
text = "my phone is a new phone"

In [18]:
match = re.search("phone",text)

In [19]:
match.span()

(3, 8)

- Notice it only matches the first occurrence of the pattern.
- If we want a list of all matches, we use `re.findall()` instead of `re.search()`.

In [20]:
matches = re.findall("phone",text)

In [21]:
matches

['phone', 'phone']

In [22]:
len(matches)

2

In [27]:
# To get a Match object for every occurrence, we iterate over re.finditer()

for matched_item in re.finditer("phone",text):
    print(matched_item.span())

(3, 8)
(18, 23)


If you want the actual text that matched, you can use the `.group()` method.

Calling `group()` on a match object returns the string that matched the pattern (here, the word "phone").

In [30]:
for matched_item in re.finditer("phone",text):
    print(f"{matched_item.group()} --> {matched_item.span()}")

phone --> (3, 8)
phone --> (18, 23)


### Using character identifier syntax for pattern searching

### dws-DWS
<img src = "../img/identifiers.png"
     height="400px"
     width="720px"
     >
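
In case the image is unavailable, here is a sketch of the identifiers the heading appears to refer to, assuming the table covers the standard ones: `\d`, `\w`, `\s` and their uppercase negations `\D`, `\W`, `\S`. The sample strings are made up for illustration.

```python
import re

sample = "Room 42, Floor_3!"

# \d matches a digit, \D matches anything that is not a digit
print(re.findall(r"\d", sample))      # ['4', '2', '3']
print(re.findall(r"\D", "a1b2"))      # ['a', 'b']

# \w matches letters, digits and underscore; \W matches everything else
print(re.findall(r"\w+", sample))     # ['Room', '42', 'Floor_3']
print(re.findall(r"\W", "a-b c"))     # ['-', ' ']

# \s matches whitespace, \S matches non-whitespace
print(re.findall(r"\s", "a b\tc"))    # [' ', '\t']
print(re.findall(r"\S+", "a b\tc"))   # ['a', 'b', 'c']
```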

In [70]:
# Examples
text = "My telephone number is 408-555-1234"
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d',text)
phone.group()

'408-555-1234'

Notice the repetition of `\d`. Writing patterns this way is tedious and error-prone, especially when we are looking for very long strings of numbers. We can use quantifiers to shorten these patterns.

Let's explore the possible quantifiers.


<img src = "../img/quantifiers.png"
     height="400px"
     width="720px"
     >
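
In case the image is unavailable, the sketch below assumes the table lists the standard quantifiers: `+` (one or more), `{n}` (exactly n), `{m,n}` (between m and n), `{n,}` (n or more), `?` (zero or one) and `*` (zero or more). The sample strings are made up for illustration.

```python
import re

pets = "7 cats, 42 dogs, 1234 fish"

print(re.findall(r"\d+", pets))       # one or more digits: ['7', '42', '1234']
print(re.findall(r"\d{3}", pets))     # exactly three digits: ['123']
print(re.findall(r"\d{2,4}", pets))   # two to four digits: ['42', '1234']
print(re.findall(r"\d{3,}", pets))    # three or more digits: ['1234']

# ? makes the preceding character optional
print(re.findall(r"colou?r", "color and colour"))   # ['color', 'colour']

# * allows zero or more repetitions
print(re.findall(r"ab*", "a ab abbb"))              # ['a', 'ab', 'abbb']
```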

In [71]:
# rewriting phone pattern using quantifiers
quan_pattern = re.search(r'\d{3}-\d{3}-\d{4}',text)
quan_pattern

<re.Match object; span=(23, 35), match='408-555-1234'>

In [72]:
quan_pattern.group()

'408-555-1234'

## Breaking down matched patterns

What if we wanted to do two tasks: find phone numbers, and also quickly extract their area code (the first three digits)? We can use groups for any task that involves grouping parts of a regular expression together (so that we ***can later break them down***).

Using the phone number example, we can separate groups of regular expressions using parentheses:

In [73]:
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [74]:
results = re.search(phone_pattern,text)

In [75]:
# The entire result
# using group without specifying an index gives you the entire 
# result
results.group()

'408-555-1234'

In [76]:
# captured groups are numbered from 1; group(0) is the entire match
results.group(1)

'408'

### Understanding the code above:

1. The parentheses `( )` in the regular expression pattern `(\d{3})-(\d{3})-(\d{4})` are used to create capturing groups. 

- The pattern `\d{3}` matches three consecutive digits.
- Placing `\d{3}` inside parentheses `(\d{3})` creates a capturing group, which means the matched digits will be captured and made available as separate groups.

In this specific case, the regular expression pattern is designed to match a phone number in the format `###-###-####`. The three sets of parentheses create three capturing groups: the first two capture three digits each and the last captures four.

2. The `re.compile()` function was used in this case to pre-compile the regular expression pattern into a reusable object. It is not required for accessing the matched groups; you can pass the pattern string directly to `re.search()`.

3. The `results.group()` function is used to retrieve the entire matched string. 

- When you call `results.group()` without any arguments, it returns the entire match as a string.
- In this case, `results.group()` would return the complete phone number that matched the pattern.

Additionally, if you want to access the individual matched groups (the two sets of three digits and the final set of four digits), you can use `results.group(n)`, where `n` corresponds to the group number.

For example:

```python
results.group(0)  # Returns the entire match
results.group(1)  # Returns the first captured group (first three digits)
results.group(2)  # Returns the second captured group (second three digits)
results.group(3)  # Returns the third captured group (last four digits)
```

Using `results.group(n)` allows you to extract specific portions of the matched string based on the capturing groups defined in the regular expression pattern.
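
When you want all captured groups at once, `groups()` returns them as a tuple. A short sketch on the same phone number (the names `area_code`, `prefix` and `line_number` are only for illustration):

```python
import re

text = "My telephone number is 408-555-1234"
results = re.search(r"(\d{3})-(\d{3})-(\d{4})", text)

# All captured groups as a tuple
print(results.groups())      # ('408', '555', '1234')

# Tuple unpacking gives each part a name
area_code, prefix, line_number = results.groups()
print(area_code)             # 408

# group() also accepts several indexes at once
print(results.group(1, 3))   # ('408', '1234')
```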


### Understanding compile( )
The purpose of compiling a regular expression pattern using `re.compile()` is to convert the pattern from a string representation into a compiled pattern object.

Here are a few reasons why compiling a regular expression pattern can be beneficial:

1. Reusability: Compiling a pattern allows you to store it as a reusable object. You can assign it to a variable and use it multiple times in your code without the need to recompile the pattern each time. This can improve performance when working with large amounts of text or when the pattern needs to be used repeatedly.

2. Efficiency: A compiled pattern skips the step of parsing the pattern string on each call. In practice the gain is often small, because the `re` module caches recently used patterns passed to its module-level functions.

3. Readability: By compiling the pattern, you separate the process of pattern creation from its usage. This can make your code more readable and maintainable, especially when working with complex regular expressions.

Compiling a regular expression pattern into a compiled pattern object provides reusability, efficiency, and improved code organization, making it a useful technique when working with regular expressions.

When we say "converting the pattern from a string representation into a compiled pattern object," we mean that the regular expression pattern, which is initially written as a string, is processed to create a specialized object that represents the pattern in a more efficient and usable form.

Here's a simplified analogy to help explain the concept:

Imagine you have a recipe written in a language you don't understand. To make sense of it and cook the dish, you need to translate the recipe into a language you understand. In this analogy, the recipe is the regular expression pattern, and the translated version is the compiled pattern object.

So, by compiling the pattern, you're essentially taking the written recipe (the string representation of the pattern) and translating it into a form that the computer can efficiently understand and use to perform operations like searching, matching, or replacing text.

Once the pattern is compiled into an object, you can use it multiple times without the need to recompile it each time. It's like having the translated recipe ready to use whenever you want to cook that particular dish.

In summary, compiling a regular expression pattern involves transforming the pattern from a string form into a specialized object that the computer can interpret and use for efficient text manipulation operations.
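
A minimal sketch of reusing one compiled pattern across several texts. The `messages` list is invented for this example. A compiled pattern object has its own `search()` and `findall()` methods, so the pattern does not need to be passed in again.

```python
import re

phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

messages = [
    "Call 408-555-1234 today",
    "No number here",
    "Office: 212-555-0000, home: 212-555-9999",
]

all_numbers = []
for message in messages:
    # With capturing groups, findall returns one tuple per match
    found = phone_pattern.findall(message)
    all_numbers.append(found)
    print(found)

# The same object can be used for search as well
first = phone_pattern.search(messages[0])
print(first.group())   # 408-555-1234
```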

### From Q/A


***Is re.compile( ) necessary?***

If I replace:

phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

with:

phone_pattern = r'(\d{3})-(\d{3})-(\d{4})'

it works just as fine. You can still extract certain groups like phone.group(1)

So my question is: is re.compile( ) function necessary?



***Caio Badran Kalil — Teaching Assistant***

compile() is recommended whenever you think you'll reuse the code.

It's more efficient in terms of speed and system resources.



***Alexander***

Reusing the code - you mean when it is a function, for example, which is being called several times?


***Abu Sayeed — Teaching Assistant***

Reuse mean, you want to use the same code again and again. You want to use the same pattern for multiple text. If you compile it once, you can save some time for your program.



***Eric***
"so why do we need re.compile()"

The answer is, you don't. re.compile is really only useful if you use many different patterns over and over many times. Check out the answers in this StackOverflow thread.
https://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile

## Additional Regex Syntax
### 1. or operator:
Use the ***pipe operator ( `|` )*** to create an `or` statement. 

For example:





In [77]:
re.search(r"man|woman","This man was here.")

<re.Match object; span=(5, 8), match='man'>

In [78]:
re.search(r"man|woman","This woman was here.")

<re.Match object; span=(5, 10), match='woman'>

### 2. wildcard character:

Use a "wildcard" as a placeholder that will match any character placed there. You can use a simple period `.` for this.

In other words, the period acts as a ***wildcard***.


A ***wildcard*** is a symbol that stands in for other characters. In a regular expression, each period matches exactly one character (any character except a newline, by default).


For example:


In [79]:
# The period matches any character followed by the letters "at"

re.findall(r".at","The cat in the hat sat here.")

['cat', 'hat', 'sat']

A single period ( . ) matches exactly one character, so a pattern with one period before "at" captures only the one character that precedes it.

Look at this scenario below:


In [80]:
# we will get "lat" instead of "splat"
re.findall(r".at","The bat went splat")

['bat', 'lat']

In [81]:
# we will get "plat" instead of "splat"
re.findall(r"..at","The bat went splat")

[' bat', 'plat']
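
One way to capture the whole word regardless of its length is to replace the fixed number of periods with `\S+` (one or more non-whitespace characters). This is a sketch of one option among several. It also shows that the period does not match a newline unless the `re.DOTALL` flag is passed.

```python
import re

text = "The bat went splat"

# \S+ means "one or more non-whitespace characters"
whole_words = re.findall(r"\S+at", text)
print(whole_words)    # ['bat', 'splat']

# By default the period does not match a newline character
print(re.findall(r"a.b", "a\nb a-b"))              # ['a-b']

# With re.DOTALL it does
print(re.findall(r"a.b", "a\nb a-b", re.DOTALL))   # ['a\nb', 'a-b']
```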

### Starts with and Ends With

We can use the caret sign `^` to signal starts with, and the dollar sign `$` to signal ends with:


In [82]:
# Starts with a number
re.findall(r'^\d','1 is the loneliest number.')

['1']

In [83]:
# Ends with a number
re.findall(r'\d$','This ends with a number 2')

['2']

***Note that this is for the entire string, not individual words!***

### Exclusion

To exclude characters, we can use the caret `^` symbol in conjunction with a set of brackets `[ ]`. Anything inside the brackets is excluded. 

Syntax:

`[^pattern]`

For example:



In [84]:
phrase = "there are 3 numbers 34 inside 5 this sentence."

In [85]:
re.findall(r'[^\d]',phrase)

['t',
 'h',
 'e',
 'r',
 'e',
 ' ',
 'a',
 'r',
 'e',
 ' ',
 ' ',
 'n',
 'u',
 'm',
 'b',
 'e',
 'r',
 's',
 ' ',
 ' ',
 'i',
 'n',
 's',
 'i',
 'd',
 'e',
 ' ',
 ' ',
 't',
 'h',
 'i',
 's',
 ' ',
 's',
 'e',
 'n',
 't',
 'e',
 'n',
 'c',
 'e',
 '.']

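To get the words back together, add a `+` quantifier after the brackets so that runs of consecutive non-digit characters are matched as one string: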

In [86]:
re.findall(r'[^\d]+',phrase)

['there are ', ' numbers ', ' inside ', ' this sentence.']

In [87]:
# We can use this to remove punctuation from a sentence.

test_phrase = "This is a string! But it has punctuation. How can we remove it?"


In [88]:
re.findall('[^!.? ]+',test_phrase)

['This',
 'is',
 'a',
 'string',
 'But',
 'it',
 'has',
 'punctuation',
 'How',
 'can',
 'we',
 'remove',
 'it']

In [89]:
clean = ' '.join(re.findall('[^!.? ]+',test_phrase))

In [90]:
clean

'This is a string But it has punctuation How can we remove it'
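
An alternative sketch that reaches the same result in one step: `re.sub()` replaces every match of a pattern with a replacement string, so substituting an empty string deletes the punctuation directly. This is offered as another option, not as the notebook's original method.

```python
import re

test_phrase = "This is a string! But it has punctuation. How can we remove it?"

# Replace each !, . or ? with an empty string
clean = re.sub(r"[!.?]", "", test_phrase)
print(clean)   # This is a string But it has punctuation How can we remove it
```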

### Square Brackets For Grouping

We can use square brackets to group together options, for example if we wanted to find hyphenated words:

In [91]:
text = 'Only find the hypen-words in this sentence. But you do not know how long-ish they are'


In [92]:
# identifier placed inside square bracket 
# a "+" sign is added to match one or more characters
re.findall(r'[\w]+-[\w]+',text)

['hypen-words', 'long-ish']

By placing the plus sign outside the brackets (`[\w]+`), we indicate that we want to match one or more occurrences of the character class. The full pattern therefore matches a run of word characters (letters, digits, or underscores), followed by a hyphen (-), followed by another run of one or more word characters.

### Using Parentheses and the Or Operator ( | ) for Multiple Options

If we have multiple options for matching, we can use parentheses to list out these options. For example:

In [71]:
# Find words that start with cat and end with one of these options: 'fish','nap', or 'claw'
text = 'Hello, would you like some catfish?'
texttwo = "Hello, would you like to take a catnap?"
textthree = "Hello, have you seen this caterpillar?"

In [72]:
re.search(r'cat(fish|nap|claw)',text)

<re.Match object; span=(27, 34), match='catfish'>

In [73]:
re.search(r'cat(fish|nap|claw)',texttwo)

<re.Match object; span=(32, 38), match='catnap'>

In [74]:
# None returned
re.search(r'cat(fish|nap|claw)',textthree)
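
Because the parentheses also form a capturing group, `group(1)` tells you which option matched. The sketch below uses shortened sample sentences invented for this example. It also shows a side effect worth knowing: `re.findall()` returns only the captured part unless the group is made non-capturing with `(?:...)`.

```python
import re

pattern = r"cat(fish|nap|claw)"

endings = []
for sentence in ["some catfish?", "take a catnap?", "this caterpillar?"]:
    match = re.search(pattern, sentence)
    if match:
        # group() is the whole match, group(1) is the option that matched
        endings.append(match.group(1))
        print(match.group(), "->", match.group(1))
    else:
        endings.append(None)
        print("no match in:", sentence)

# findall returns only the captured group...
print(re.findall(pattern, "catfish and catnap"))                   # ['fish', 'nap']

# ...unless the group is non-capturing
print(re.findall(r"cat(?:fish|nap|claw)", "catfish and catnap"))   # ['catfish', 'catnap']
```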

## 7. Timing Your Code

There are usually multiple solutions to a single problem or task, and you may find yourself trying to figure out the most efficient approach.

An easy way to do this is to time the code's performance.

Sometimes it's important to know how long your code is taking to run, or at least know if a particular line of code is slowing down your entire project. Python has a built-in timing module to do this.

There are three ways of timing your code:
1. Simply tracking the time elapsed before and after calling a function (unreliable if the function is very fast, because the elapsed time is too small to measure accurately)
2. Using the timeit module
3. Using the special %%timeit "magic" (only in Jupyter Notebook)


### Test Functions
Here we have two functions that do the same thing, but in different ways. How can we tell which one is more efficient? Let's time it!

In [18]:
def func_one(n):
    '''
    Given a number n, returns a list of string integers
    ['0', '1', '2', ..., str(n - 1)]
    '''
    # using list comprehension
    return [str(num) for num in range(n)]

In [19]:
def func_two(n):
    '''
    Given a number n, returns a list of string integers
    ['0', '1', '2', ..., str(n - 1)]
    '''
    # using map function
    return list(map(str,range(n)))

In [20]:
import time

In [21]:
# Take current time before function

start_time = time.time() # takes time from OS

# Run function

result = func_one(1000000)

# Take time after running code

end_time = time.time()

# Calculate elapsed time

elapsed_time = end_time - start_time

print(elapsed_time)

0.2624070644378662


In [22]:
# Take current time before function

start_time = time.time() # takes time from OS

# Run function

result = func_two(1000000)

# Take time after running code

end_time = time.time()

# Calculate elapsed time

elapsed_time = end_time - start_time

print(elapsed_time)

0.2160499095916748
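
For the manual approach, `time.perf_counter()` is a clock intended for measuring short durations, and it is the default timer that `timeit` uses. A minimal sketch of the same measurement with it (the function body is repeated here so the block runs on its own):

```python
import time

def func_one(n):
    # list comprehension version from above
    return [str(num) for num in range(n)]

start = time.perf_counter()
result = func_one(1_000_000)
elapsed = time.perf_counter() - start

print(f"{elapsed:.4f} seconds")
```

A single run like this is still noisy; the number changes from run to run depending on what else the machine is doing.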


### Using the timeit module, which is designed specifically for timing code

In [23]:
# we import timeit
import timeit

In [24]:
timeit.timeit

<function timeit.timeit(stmt='pass', setup='pass', timer=<built-in function perf_counter>, number=1000000, globals=None)>

### timeit.timeit ( stmt , setup, number )

`timeit.timeit()` takes in two strings, a ***statement (stmt)*** and a ***setup***. It runs the setup code once, then runs the stmt code `number` times and returns the total time in seconds for all of those runs (divide by `number` to get the average per run).

***NB: the statement and setup are passed in as strings***

In [25]:
stmt1 = "func_one(100)"

***The setup (anything that needs to be defined beforehand, such as def functions.)***

In [26]:
setup1 = """
def func_one(n):
    '''
    Given a number n, returns a list of string integers
    ['0', '1', '2', ..., str(n - 1)]
    '''
    # using list comprehension
    return [str(num) for num in range(n)]
"""

In [27]:
# number specifies the number of times it runs over and over again
timeit.timeit(stmt1,setup1,number = 100000)

1.986692957999999

In [28]:
stmt2 = "func_two(100)"

In [29]:
setup2 = """
def func_two(n):
    '''
    Given a number n, returns a list of string integers
    ['0', '1', '2', ..., str(n - 1)]
    '''
    # using map function
    return list(map(str,range(n)))
"""

In [30]:
timeit.timeit(stmt2,setup2,number = 100000)

1.7004305839999958

***Comparing the two results, we can see that function two is faster than function one.***
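
Copying each function into a setup string can be avoided. As a sketch of two alternatives: the `globals` parameter (visible in the signature shown earlier) lets the statement see functions already defined in the current namespace, and `timeit.timeit()` also accepts a callable in place of a string. The run count and the variable names below are arbitrary choices for this example.

```python
import timeit

def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

runs = 10_000

# Option 1: statement as a string, names looked up in the current namespace
total_one = timeit.timeit("func_one(100)", globals=globals(), number=runs)

# Option 2: pass a callable instead of a string
total_two = timeit.timeit(lambda: func_two(100), number=runs)

# timeit returns the total for all runs, so divide to get a per-call figure
print(f"func_one: {total_one / runs * 1e6:.2f} µs per call")
print(f"func_two: {total_two / runs * 1e6:.2f} µs per call")
```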

### Using jupyter's built-in special %%timeit "magic"

In [33]:
%%timeit
func_one(100)

20 µs ± 100 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [34]:
%%timeit
func_two(100)

20.6 µs ± 52.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


## 8. Unzipping and Zipping Files

A zip file is a compressed file that saves space.

In this section we will...
1. create a zip file
2. compress a file (text file) and insert it into a zip file
3. unzip the information

In [36]:
# zipping individual files

# creating a sample file
# the with statement closes the file automatically
with open("fileone.txt","w+") as f:
    f.write("ONE FILE")

In [37]:
# the with statement closes the file automatically
with open("filetwo.txt","w+") as f:
    f.write("TWO FILE")

In [38]:
# import zipfile
# this lets you compress files by creating a zip file
# and then inserting files into it
import zipfile

In [39]:
# creating zip file
comp_file = zipfile.ZipFile("comp_file.zip","w")

In [41]:
# next we write to the compressed file 
# we specify file we want to insert and specify the compression type
# most standard type is ZIP_DEFLATED.

comp_file.write("fileone.txt",compress_type=zipfile.ZIP_DEFLATED)

In [42]:
# inserting the other file
comp_file.write("filetwo.txt",compress_type=zipfile.ZIP_DEFLATED)

In [43]:
comp_file.close()

### Extracting items from zip file


In [44]:
# create a variable , call zipfile library and point it to
# zipfile that you are going to extract from

zip_obj = zipfile.ZipFile("comp_file.zip","r")

In [45]:
# we can extract all or specific files by using extract ("path")
# extractall("folder for extracted files")
zip_obj.extractall("extracted_content")
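
`ZipFile` objects also work as context managers, which closes the archive automatically. The sketch below runs the whole round trip inside a temporary directory (the use of `tempfile`, the `out` folder name and the `arcname` argument are choices made for this example). It also shows `namelist()`, which lists an archive's contents without extracting, and `extract()` for a single file.

```python
import os
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "fileone.txt")
zip_path = os.path.join(work_dir, "comp_file.zip")

with open(source_path, "w") as f:
    f.write("ONE FILE")

# arcname controls the name stored inside the archive
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.write(source_path, arcname="fileone.txt")

with zipfile.ZipFile(zip_path, "r") as archive:
    # List the contents without extracting anything
    names = archive.namelist()
    # Extract a single file into the "out" folder
    archive.extract("fileone.txt", os.path.join(work_dir, "out"))

print(names)   # ['fileone.txt']

with open(os.path.join(work_dir, "out", "fileone.txt")) as f:
    content = f.read()
print(content)   # ONE FILE
```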

In [47]:
pwd

'/Users/adedze/Documents/SE Docs/My Jupyter Notes /Everything I Know About Python'

### Compressing & Extracting folders with shutil

### zipping folders

In [46]:
import shutil

In [48]:
# point out directory/folder you want to turn into a zip file
dir_to_zip = '/Users/adedze/Documents/SE Docs/My Jupyter Notes /Everything I Know About Python/extracted_content'

In [49]:
output_filename = "example"

In [50]:
shutil.make_archive(output_filename,"zip",dir_to_zip)

'/Users/adedze/Documents/SE Docs/My Jupyter Notes /Everything I Know About Python/example.zip'

### extracting zipped folders

In [51]:
shutil.unpack_archive("example.zip","final_unzip","zip")
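
Since the paths above only exist on the author's machine, here is a self-contained sketch of the same folder round trip inside a temporary directory (the folder and file names are invented for this example). Note that `make_archive()` adds the `.zip` extension itself and returns the path of the archive it created.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()

# Create a folder with one file in it
folder = os.path.join(work_dir, "to_zip")
os.makedirs(folder)
with open(os.path.join(folder, "note.txt"), "w") as f:
    f.write("hello")

# Zip the folder; the base name is given without the extension
archive_path = shutil.make_archive(os.path.join(work_dir, "example"), "zip", folder)
print(os.path.basename(archive_path))   # example.zip

# Unpack it into a new folder
target = os.path.join(work_dir, "final_unzip")
shutil.unpack_archive(archive_path, target, "zip")
print(os.listdir(target))               # ['note.txt']
```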