## Unit #10  Misc. Topics

* Overview
* More Practice with Functions
* Timing our Code
* Creating our own Modules

<font color=blue>---------------------------------------------------------------</font>

## 10.1 More Practice with Functions


## Activity: Adding a Function to Existing Code

Earlier, we looked at writing a program to output the counts for several words.  
Take that program (called _Wednesday_activity.py_ in Collab under DH Learning Python Resources > Codes) and create a function that will read a file and perform the pre-processing activities.


<font color=blue>---------------------------------------------------------------</font>

## 10.2 Timing our Code


So far, we have worked with small text files, so our programs have run very quickly.  But what happens as we increase the size of our text files?  How long would it take to read a file and pre-process it?

We can determine how long a chunk of code takes to run by recording the current time just before the chunk runs and again just after it finishes.  The difference between those two times gives us an estimate of the elapsed time in seconds.

```
import time

# ... other code ...
start_time = time.time()
# ... run the code you want to time here ...
stop_time = time.time()

elapsed_time = stop_time - start_time
print("The elapsed time is {:8.5f} seconds.".format(elapsed_time))
```
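As an alternative to `time.time()`, the standard library also provides `time.perf_counter()`, a higher-resolution clock intended for measuring short durations. The sketch below follows the same start/stop pattern; the workload (summing squares) is only a made-up stand-in for the code you want to time.

```python
import time

start = time.perf_counter()
# Stand-in workload (an assumption for illustration): sum the squares of 0..99,999.
total = sum(i * i for i in range(100_000))
stop = time.perf_counter()

elapsed = stop - start
print("The elapsed time is {:8.5f} seconds.".format(elapsed))
```

Either clock works for the rough estimates in this unit; the rest of these notes keep using `time.time()`.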

Let's time how long it takes to read in "emma_chapter_one.txt" and do some pre-processing on it.

```
import nltk
import time

start_time = time.time()
#-----
with open("emma_chapter_one.txt") as f:
    raw_text = f.read()
text = nltk.word_tokenize(raw_text.lower())
#-----
stop_time = time.time()
elapsed = stop_time - start_time
print(elapsed)
```

Output:

```
0.04687380790710449
```


Note:  The time will vary from run to run and from computer to computer.  To get a better estimate, we usually average the time over 3-5 runs.

We will also need to run the code on data of different sizes.

```
import nltk
import time

num_runs = 5
start_time = time.time()
#-----
for i in range(num_runs):
    with open("emma_chapter_one.txt") as f:
        raw_text = f.read()
    text = nltk.word_tokenize(raw_text.lower())
#-----
stop_time = time.time()
elapsed = stop_time - start_time
print(elapsed / num_runs)   # average time per run
```

Output:

```
0.04767231941223145
```
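To see how the running time changes with the size of the data, the averaging idea above can be wrapped in a function and applied to files of different sizes. This is only a sketch under some assumptions: the names `read_and_preprocess` and `average_time` are made up for illustration, the sample files are generated by repeating one sentence, and `str.split()` stands in for `nltk.word_tokenize()` so that the example runs without NLTK installed.

```python
import tempfile
import time
from pathlib import Path


def read_and_preprocess(filename):
    """Read a text file and return a list of its lowercased words."""
    with open(filename) as f:
        raw_text = f.read()
    # Assumption: str.split() is a simple stand-in for nltk.word_tokenize().
    return raw_text.lower().split()


def average_time(filename, num_runs=5):
    """Return the average time (in seconds) of one read-and-preprocess run."""
    start_time = time.time()
    for _ in range(num_runs):
        read_and_preprocess(filename)
    stop_time = time.time()
    return (stop_time - start_time) / num_runs


# Build sample files of increasing size in a scratch folder.
sentence = "Emma Woodhouse, handsome, clever, and rich. "
scratch_dir = Path(tempfile.mkdtemp())
results = {}
for copies in (100, 1_000, 10_000):
    path = scratch_dir / "sample_{}.txt".format(copies)
    path.write_text(sentence * copies)
    results[copies] = average_time(path)
    print("{:6d} copies: {:8.5f} seconds per run".format(copies, results[copies]))
```

The exact numbers will differ on every computer; what matters is how the time grows as the file gets larger.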


## 10.3 Creating our own Modules

When we use `import` in our code, we are telling Python to load code (such as function definitions) from another file, called a module.
The beauty of this is that we can create modules with our own functions so that we don't have to keep copying and pasting the function definitions into new programs.

To do this, we place our functions in a separate file (e.g., my_tools.py).  NOTE:  It is very important that you give the file the ".py" extension and save it where Python can find it, such as the same folder as your program.  Then, in your code, include the line
```
import my_tools
```
This line will load all of the functions that you have put into the "my_tools.py" file.  To access those functions, you simply type `my_tools.` before the function name.
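Here is a minimal sketch of the whole round trip. The function `count_words` is a made-up example, not one from earlier in the course. To keep the sketch self-contained, it writes `my_tools.py` into a scratch folder and adds that folder to `sys.path`; normally you would just save `my_tools.py` next to your program and skip those steps.

```python
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical module file my_tools.py.
MODULE_SOURCE = '''
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())
'''

# Write the module to a scratch folder (only needed to make this sketch runnable as-is).
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_tools.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)  # tell Python where to look for the module

import my_tools

# Functions in the module are reached with the "my_tools." prefix.
num_words = my_tools.count_words("Emma Woodhouse, handsome, clever, and rich")
print(num_words)
```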

<font color=blue>---------------------------------------------------------------</font>


## Activity: Creating a Module

Earlier, we looked at writing a module `mytools` in which we included a function `read_file()`.  Let's expand on that by adding another function to the file.  You can name the function whatever you want, but it should accept a filename as an input parameter and return a bag of words with the stop words removed.

When that is done, create a Python program that reads in the file "emma_chapter_one.txt" and uses the new function in "mytools" to convert the text to a bag of words.  Then, print the length of the bag of words to the screen.


<font color=blue>---------------------------------------------------------------</font>

### An Aside

How can we capture the results that are displayed to our console in Spyder?

Suppose your program is in a file called _myProg.py_ and you want to save the results in a file called _results_1.out_.  Then, in the IPython console, type `!python myProg.py > results_1.out`
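If you would rather capture the output from inside a Python program, one possible approach (not the method described above) is `contextlib.redirect_stdout` from the standard library. In this sketch the printed line is a made-up stand-in for your program's real output, and the file is written to a scratch folder.

```python
import contextlib
import tempfile
from pathlib import Path

# Scratch location for the results file (an assumption for this sketch).
results_path = Path(tempfile.mkdtemp()) / "results_1.out"

with open(results_path, "w") as out_file:
    # Everything printed inside this block goes to the file instead of the console.
    with contextlib.redirect_stdout(out_file):
        print("Number of words: 6")  # stand-in for the program's real output

# Read the file back to confirm what was saved.
saved_text = results_path.read_text()
```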