# Practical Python for Scientists and Engineers

Welcome!  The goal of these tutorials is to help you get familiar with basic aspects of Python that will allow you to be more productive in everyday work.  We will work on skills that will let you graph, manipulate, and manage data.  Our goal will be to take things one step at a time, learning only what is needed to accomplish a specific task.  The philosophy behind these tutorials is learning by doing, rather than learning to let you do something later.  Hopefully you will start learning tools right from day one that will be useful in other settings.  By the end of these tutorials, you will be able to make complicated applications that load and save data to and from files, manipulate data, run numerical simulations, make complex visualizations and more! 

Before we get started, recall that we will need some useful tools from the NumPy and Matplotlib modules; in this tutorial we import them in the code cells where they are first needed.

## Tutorial 3: Functions (Part 2)
In Part 1 of Tutorial 3, we learned about built-in functions in Python and how we can use modules to extend the functionality of standard Python by loading in many more functions that can help us do things like data analysis, mathematical computations, and visualization.  In __Part 2__ of the tutorial, you will learn how to create your own functions.

<u>Part 1</u>
- basic introduction to using functions
- built-in Python functions
- using Python modules

<u>Part 2</u>
- creating your own custom functions
- creating your own modules to store functions
- using global versus local variables


### Step 1: Creating your own custom functions
It is easy to create your own functions in Python.  You do this using a function definition or `def` statement.  

### def <i>function_name</i>(<i>input_arg1, input_arg2, ...</i>):
<b>     multiple lines of commands</b><br>
<b>     return output_arg1, output_arg2, ...</b>

<i>A function definition consists of one or more lines containing the following:
- line 1: the __def__ keyword, followed by the function name, parentheses containing any input arguments, and a colon.  If the function takes no input arguments, the parentheses are left empty.
- lines 2 to N-1: commands that the function will complete, including definition of variables, calculations, plotting, etc.  These lines must be indented (by convention, four spaces) so that Python knows they belong to the function.
- line N: the __return__ keyword followed by a comma-separated list of output arguments for the function.
</i>

Python requires at least one indented statement after line 1 (even if it is just the placeholder `pass`), and since it doesn't really make sense to make a function that doesn't actually do anything, you should expect to have at least a few lines of code in your function.  Line N is required only if your function will return a result.  In this case, you *must* create the variables listed after __return__ in the body of the function (i.e., on lines 2 to N-1).  Apart from global variables (see Step 3), only the values that follow the __return__ keyword are output by the function.  Any other variables you create to store calculations or data within the function are destroyed and lost from the computer's memory once the function ends.
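The template above allows several outputs after __return__.  As a minimal sketch (the function name `min_and_max` is hypothetical and is not part of this tutorial's module), a function can return two values, which Python packs together into a tuple that you can unpack into separate variables:

```python
def min_and_max(values):
    """Return the smallest and largest entries of a list of numbers."""
    smallest = min(values)    # local variable, discarded when the function ends
    largest = max(values)     # local variable, discarded when the function ends
    return smallest, largest  # both values are returned together as a tuple

# unpack the two returned values into two separate variables
low, high = min_and_max([3.2, 1.5, 4.8, 2.0])
print(low, high)   # prints 1.5 4.8

# or keep them together as a single tuple
both = min_and_max([3.2, 1.5, 4.8, 2.0])
print(both)        # prints (1.5, 4.8)
```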

Let's try a simple example: a function that calculates the square of a number or array, $y=x^2$.
We will name the function *xsquare*.

In [None]:
def xsquare(x):
    y=x**2
    return y

When you run the code block above, it seems like nothing has happened.  In fact, the function `xsquare()` was created, and we can now use it:

In [None]:
xsquare(4)

Notice that in the code cell above we did not assign the output of `xsquare()` to a variable, so the result was just output directly to the screen.  We can, of course, assign the output of our function to a variable.

In [None]:
my_output = xsquare(4)
print(my_output)

As shown here, there is no need to assign the output of `xsquare()` to a variable named `y`, even though that is the variable name we used when defining the function.  Also notice that after the function runs, we can no longer access the information that was stored in variable `y`:

In [None]:
print(y)

The error you see above says that *"name 'y' is not defined"*.

What that means is that `y` does not exist as a variable outside of the function.  This error shows us that "what happens in a function, stays in a function" (just like Vegas!), unless you explicitly tell Python to output that information using the __return__ keyword.

Here is another example of a custom function, which asks the user for their name and then uses it to print a personalized greeting:



In [None]:
#define the function:
def hello():
    name = input('Enter your name:')
    print('Hello ' + name+'! It is nice to meet you!')
    
#now use the function (note that we don't need to be in a different code cell from where the function was defined) 
hello()

The last example illustrates a couple of interesting things.  First, notice that the function takes no input arguments and returns no output when it terminates.  Second, in this case the `input()` function provides an alternative way to insert custom information to be used within the function.  A third way to provide information to a function would be to read in data from a file.
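For comparison, here is a minimal sketch of a variant that receives the name as an input argument and returns the greeting instead of printing it (the function name `make_greeting` is hypothetical).  Because it does not call `input()`, it can be reused anywhere, for example in a loop over several names:

```python
def make_greeting(name):
    """Return a greeting string for the given name."""
    greeting = 'Hello ' + name + '! It is nice to meet you!'
    return greeting

message = make_greeting('Ada')
print(message)   # prints Hello Ada! It is nice to meet you!

# the same function works on a whole list of names
for person in ['Grace', 'Alan']:
    print(make_greeting(person))
```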

One last point: you can verify that the function provides no output (it returns `None`) and that the `name` variable exists only within the function itself and is destroyed after the function finishes running.

In [None]:
check_output = hello()

In [None]:
print(check_output)

In [None]:
print(name)

Let's try creating a function that is not quite so basic.  Imagine a case where you want to plot data from different experiments using the same format for all of your plots.  For this example, let's assume that the data were obtained from three different experiments that produce data following these three equations:

- Experiment #1: $data = 0.5\,t$
- Experiment #2: $data = t\,\sin(t/2)$
- Experiment #3: $data = 10\,t\,e^{-t}$

where $t$ is the observation time (in minutes), and we observed each experiment every minute from 0 to 10 minutes.

Before we think about plotting, let's "create" our experimental data.

<i>(Note: remember that if we are going to be using arrays to store our data and then plot those data, we will need to import the NumPy module and the PyPlot module from Matplotlib.)</i>


In [None]:
#first we need to load the modules that we will use for this problem
#it is good practice to import all the modules that you will need as the first step of your code so that
#people will know what dependencies are present
import numpy as np
import matplotlib.pyplot as plt  #pyplot is the module within matplotlib package that contains the plot function

#create an array for all of the observation times (np.arange excludes the end value, so 11 gives 0, 1, ..., 10 minutes):
t = np.arange(0,11)

#create data points we "observed" for each experiment:
data1 = 0.5*t           #experiment #1
data2 = t*np.sin(t/2)   #experiment #2
data3 = 10*t*np.exp(-t)    #experiment #3

Now that all of our hypothetical data have been created, we can focus on how we want to plot them.  Let's say that for each experiment we want to create a plot that:
- plots the data as black dots connected by a green line
- has a unique title for the experiment
- has the axis labels "Observation Time (min)" for x and "Observed Data" for y


In [None]:
#remember that we need to tell python to look in plt for the plot function

#let's start by plotting the first data set:
plt.plot(t,data1,'-g')   #a single format string can't give the points and the line different colors, so first we plot the green line
plt.plot(t,data1,'ok')   #now we plot the black dots on top of the green line
plt.xlabel('Observation Time (min)')  #now we add the axis labels we want for all of the plots
plt.ylabel('Observed Data') 
plt.title('Results from Experiment #1')

#next let's plot the second data set:
plt.figure()             #we use the figure function here to create a new figure window, otherwise all data would plot on the same graph
plt.plot(t,data2,'-g')   #a single format string can't give the points and the line different colors, so first we plot the green line
plt.plot(t,data2,'ok')   #now we plot the black dots on top of the green line
plt.xlabel('Observation Time (min)')  #now we add the axis labels we want for all of the plots
plt.ylabel('Observed Data') 
plt.title('Results from Experiment #2')

#next let's plot the third data set:
plt.figure()  
plt.plot(t,data3,'-g')   #a single format string can't give the points and the line different colors, so first we plot the green line
plt.plot(t,data3,'ok')   #now we plot the black dots on top of the green line
plt.xlabel('Observation Time (min)')  #now we add the axis labels we want for all of the plots
plt.ylabel('Observed Data') 
plt.title('Results from Experiment #3')
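The comments above note that a single format string such as `'-g'` or `'ok'` cannot give the line and the points different colors, which is why each data set is plotted twice.  As an alternative sketch, Matplotlib's `plot()` function also accepts keyword arguments such as `linestyle`, `color`, `marker`, `markerfacecolor`, and `markeredgecolor`, which let a single call draw a green line with black dots (shown here for experiment #1 only):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 11)   # observation times, 0 to 10 minutes
data1 = 0.5 * t        # experiment #1

plt.figure()
# one call: solid green line, with circle markers filled and outlined in black
lines = plt.plot(t, data1, linestyle='-', color='g', marker='o',
                 markerfacecolor='k', markeredgecolor='k')
plt.xlabel('Observation Time (min)')
plt.ylabel('Observed Data')
plt.title('Results from Experiment #1')
```

Either approach gives the same look; the two-call version keeps each format string short, while the keyword version avoids drawing the data twice.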


That worked great, but now let's see how we would do it by creating a function to plot our data.  First, we need to think about what inputs our figure will need.  In this example, we need to provide the x and y data series as well as a unique text string for the title.  So our function should have three input arguments, e.g., x, y, and title_text.

In [None]:
def my_plot(x,y,title_text):
    plt.figure()  
    plt.plot(x,y,'-g')   #a single format string can't give the points and the line different colors, so first we plot the green line
    plt.plot(x,y,'ok')   #now we plot the black dots on top of the green line
    plt.xlabel('Observation Time (min)')  #now we add the axis labels we want for all of the plots
    plt.ylabel('Observed Data') 
    plt.title(title_text)


Now that you have defined this function, you can simply call it for each data set.

In [None]:
my_plot(t,data1,'Experiment #1')
my_plot(t,data2,'Experiment #2')
my_plot(t,data3,'Experiment #3')

We get the same plots (apart from the slightly shorter titles) using only a small fraction of the code!  Plus, it is much easier to understand what we are doing with our data when using the function.

Using the function has several benefits: 
- functions often make it easier to read and understand complex code
- the amount of code you need to create the graphs is minimal
- you are less likely to make errors because you only write the plotting code once
- if you change your mind about a detail of the function, you will only need to make the change in one place
- if your data change (e.g., you remove outliers), you can rerun your code without changing the plotting commands
- you can reuse your function over and over again
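The last benefit is easy to see with a loop.  Assuming the data are generated as above, a sketch like the following creates all three figures by calling `my_plot()` once per data set; it repeats the function definition and the data so that it runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

def my_plot(x, y, title_text):
    plt.figure()
    plt.plot(x, y, '-g')   # green line
    plt.plot(x, y, 'ok')   # black dots on top of the line
    plt.xlabel('Observation Time (min)')
    plt.ylabel('Observed Data')
    plt.title(title_text)

t = np.arange(0, 11)
all_data = [0.5*t, t*np.sin(t/2), 10*t*np.exp(-t)]   # experiments #1, #2, #3

# pair each data set with its experiment number, starting the count at 1
for number, data in enumerate(all_data, start=1):
    my_plot(t, data, 'Experiment #' + str(number))
```

Adding a fourth experiment would only require appending its data to the `all_data` list.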


Recall that the `info()` function provides details about what a function does.  If we use `info()` on our `my_plot()` function, we find that it doesn't tell us much: it shows the function's name and input arguments followed by "None", because we didn't provide any help text to let the user know how to use our function.

*(Note: The `info()` function is actually part of the NumPy module, so you will need to refer to it as `np.info()` since that is how we have imported NumPy in this tutorial.)*

In [None]:
np.info(my_plot)

There is one additional part of a function that we didn't discuss earlier that is important for giving users (and you!) some help about how to use your function.  You can document the function by adding a block of text surrounded by triple quotation marks (`"""`), called a *docstring*, as the first statement inside the function, as shown below:

In [None]:
def my_plot(x,y,title_text):
    """    
    Summary: This function plots a set of x versus y experimental data using a custom format.
    
    Inputs:
    -------
    x: observation times in the experiment (can be a single value, list, or array of values)
    y: data observed at each observation time (must be the same size as x)
    title_text: a text string containing the text for the title of the plot
    
    Outputs:
    --------
    No outputs are provided by this function
    
    
    The my_plot function creates a plot of x versus y with the title given by the string title_text.
    The plot will show the data as black dots connected by a green line.  The x-axis label is 
    "Observation Time (min)" and the y-axis label is "Observed Data".
    
    The following dependencies exist (these modules must have been loaded for the function to work):
      pyplot from matplotlib
      numpy
    
    Function created by S.Moysey, 3/1/2021, v0.1
    """
    plt.figure()  
    plt.plot(x,y,'-g')   #a single format string can't give the points and the line different colors, so first we plot the green line
    plt.plot(x,y,'ok')   #now we plot the black dots on top of the green line
    plt.xlabel('Observation Time (min)')  #now we add the axis labels we want for all of the plots
    plt.ylabel('Observed Data') 
    plt.title(title_text)

In [None]:
np.info(my_plot)
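Besides `np.info()`, Python itself stores the docstring in the function's `__doc__` attribute, and the built-in `help()` function displays it.  The docstring must be the first statement in the function body; a string placed anywhere else is just an unused expression.  A short sketch with two hypothetical functions:

```python
def documented(x):
    """Return x doubled."""
    return 2 * x

def not_documented(x):
    y = 2 * x
    """This string comes after other code, so it is NOT a docstring."""
    return y

print(documented.__doc__)      # prints Return x doubled.
print(not_documented.__doc__)  # prints None
```

Running `help(documented)` would show the function's signature together with the same docstring text.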

It is strongly recommended that you always provide this help information whenever you write a function.  The format above is a good one because it ensures that you communicate:
- a quick summary of what the function does
- the inputs required for the function to run
- the outputs produced by the function
- details on how to use the function (in many cases examples are really helpful too!)
- dependencies that are required to run the function
- who wrote the function, when it was created, and what version of the function it is (especially important as you revise functions over time)

If you get in the habit of always communicating these things now, you will be much better off in the long run.

### Step 2: Creating your own modules to store functions
It is extremely easy to create your own modules that can store special functions that you use all of the time or are created to accomplish some special purpose (e.g., processing of a specific kind of data).

To create a module, all you have to do is create a text file that:
1. has a filename that ends in .py, as in *module_name.py*
2. imports any other modules used by the functions in the module you are creating
3. contains a `def` statement for each function that you want to include in the module (just like we have done throughout this tutorial)  
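As a minimal sketch, the contents of a module file following these three rules might look like the example below.  The filename `my_tools.py` and the functions `xcube` and `rms` are hypothetical, chosen only for illustration; saving this text as `my_tools.py` in the same folder as your notebook would let you run `import my_tools`:

```python
# my_tools.py: a hypothetical module holding reusable functions

# rule 2: import the modules that the functions below depend on
import numpy as np

# rule 3: one def statement per function
def xsquare(x):
    """Return x squared; x can be a number or a NumPy array."""
    y = x**2
    return y

def xcube(x):
    """Return x cubed; x can be a number or a NumPy array."""
    y = x**3
    return y

def rms(data):
    """Return the root-mean-square of a list or NumPy array of numbers."""
    return np.sqrt(np.mean(np.asarray(data)**2))
```

In Jupyter, putting `%%writefile my_tools.py` on the first line of a code cell saves the rest of that cell to a file with that name, which is a convenient way to create a module without leaving the notebook.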

For an example, take a look at the text file called *my_module.py* that is available for download from this link: [my_module.py](https://www.dropbox.com/s/h8q1bbpm53qe7rh/my_module.py?dl=0)

Download the *my_module.py* file to your computer.  Then upload the *my_module.py* file to the Jupyter server just like you uploaded the notebook file you are working in now (i.e., go back to the main Jupyter tab of your browser where you can see the list of your notebooks, then click the upload button and select the *my_module.py* file).  The module file needs to be in the same folder as your notebook so that `import` can find it.


In [None]:
import my_module as my

This custom module contains all of the functions we have written in this tutorial.  You can really see the power of modules here: once the module exists, you can reproduce everything that we have done in this tutorial with just a few lines of code (not including creating your experimental data).  

Pretty impressive!  

In [None]:
my.my_plot(t,data1,'Experiment #1')
my.my_plot(t,data2,'Experiment #2')
my.my_plot(t,data3,'Experiment #3')

In [None]:
a = my.xsquare(np.array([1,2,3,4]))
print(a)

In [None]:
my.hello()
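One practical detail: Python imports a module only once per session, so if you edit *my_module.py* after importing it, rerunning `import my_module as my` will not pick up your changes.  The standard library function `importlib.reload()` re-executes the module file.  The sketch below demonstrates this with a throwaway module; the module name `demo_module` and the temporary folder are assumptions made only so that the example runs on its own:

```python
import importlib
import os
import sys
import tempfile

folder = tempfile.mkdtemp()   # temporary folder standing in for your notebook folder
path = os.path.join(folder, 'demo_module.py')

with open(path, 'w') as f:    # first version of the module
    f.write("VERSION = 1\n")

sys.path.insert(0, folder)    # let Python find modules in that folder
importlib.invalidate_caches() # make sure the newly created file is noticed
import demo_module
print(demo_module.VERSION)    # prints 1

with open(path, 'w') as f:    # "edit" the module file
    f.write("VERSION = 22\n")

import demo_module            # does nothing: the module is already loaded
before_reload = demo_module.VERSION
print(before_reload)          # still prints 1

importlib.reload(demo_module) # re-run the edited file
print(demo_module.VERSION)    # prints 22
```

In this tutorial, the equivalent step after editing the module would be `importlib.reload(my)`.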

### Step 3: Local versus Global Variables
Earlier we said that the variables created within a function are destroyed after the function runs.  This means that they are *local variables* that only 'live' in the scope of the function; they cannot be accessed anywhere else in the program.  You can change this behavior with the `global` keyword, which tells Python that a variable used inside the function is the same variable that exists in the main workspace.

Variables you create in the main workspace can be read within a function even if they are not passed in as input arguments.  However, a function cannot assign a new value to such a variable unless the variable is declared `global` within the function.

Consider the following example in detail to understand how global and local variables act differently.

In [None]:
x = 'test x'  #this variable is defined in the main workspace and can be accessed anywhere in a program (including a function)

#define a function that reads and prints x from the main workspace
def foo():
    print(x)  #this line will print the value of the variable x as read from the main workspace

#run the function foo
foo()  

Note that even though we didn't pass x into the function as an input argument, the function was still able to read it.

See what happens when we now try to change the value of x:

In [None]:
#define a function that reads and prints x from the main workspace
def foo():
    print(x)  #this line now raises an error, because the assignment below makes x a local variable
    x = 'updated x'
    print(x)
    
    
#run the function foo
foo()  

We receive an error because the line `x = 'updated x'` creates a new *local* variable called x within the scope of the function.  So now there are two different x variables: one within the foo function (local) and one within the main workspace (global).  Inside the function, the local variable hides the global one.  

The reason we receive an error in this case is that Python sees that x is assigned somewhere inside the function and therefore treats x as a local variable everywhere in that function.  When the first `print(x)` command runs, Python says 'wait a minute, you haven't assigned a value to this local variable yet, how can I print the value of x when no value exists?'
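The specific error Python raises here is `UnboundLocalError`, which is a special kind of `NameError`.  Python decides that x is local to `foo` when it compiles the function, because x is assigned somewhere in its body, so even the first `print(x)` refers to the not-yet-assigned local x.  A small sketch that catches the error instead of stopping:

```python
x = 'test x'   # variable in the main workspace

def foo():
    print(x)          # x is local here because of the assignment on the next line
    x = 'updated x'
    print(x)

try:
    foo()
except UnboundLocalError as err:
    caught = err      # keep a reference to the error for inspection
    print('Caught:', type(err).__name__)   # prints Caught: UnboundLocalError

print(issubclass(UnboundLocalError, NameError))   # prints True
print(x)   # the workspace x is unchanged, so this prints test x
```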

Let's try one more time, but this time removing the first print command:

In [None]:
def foo():
    x = 'updated x'
    print(x)
       
#run the function foo
foo()  
print(x)

In the example above, everything now runs without an error.  You can see though that x has two different values depending on whether we are inside or outside of the function (really this is two different variables both called x).

Now let's see what happens if we go back to having the first `print(x)` command that previously gave us an error, but this time we first tell Python that x is a global variable:

In [None]:
def foo():
    global x
    print(x)  #this line will print the value of the variable x as read from the main workspace
    x = 'updated x'
    print(x)
    
    
#run the function foo
foo()  
print(x)

In this case we didn't get an error.  Though let's dive in to understand what happened.  By making x a global variable, it no longer exists only within the scope of the function.  In other words, the function can read and write to the x variable that exists in the main workspace.  So now when the function runs, the first `print(x)` statement reads the pre-existing value of x in the main workspace.  The line `x = 'updated x'` overwrites the value of x in the main workspace.  The second `print(x)` command in the function again prints the value of x from the main workspace, but this has now been updated to be the text *updated x*.  Finally, after the function is finished running, we use the `print(x)` command outside of the function and we can see that even though the function is long gone, the value of x is still the updated value we set inside of the function.

While global variables are handy and sometimes essential, it is best to avoid them as much as possible so that you don't accidentally do something like overwrite your data!  It is a __best practice__ to always try to use input and output arguments to pass information into or out of a function and let all the variables in the function be local variables.  
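Here is a minimal sketch of that best practice applied to the example above: instead of declaring x global, pass the old value in as an input argument and return the new value, so the only place the workspace variable changes is the visible assignment in the calling code (the function name `update_text` is hypothetical):

```python
def update_text(old_text):
    """Return a new string built from old_text; no global variables needed."""
    print(old_text)                                # read the value that was passed in
    new_text = old_text.replace('test', 'updated') # local variable
    print(new_text)
    return new_text

x = 'test x'
x = update_text(x)   # the assignment here makes the change explicit
print(x)             # prints updated x
```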

This might be a bit confusing now, but as you continue to get experience in Python you will better understand this issue and will eventually run into cases where the use of global variables is of key importance.
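One related subtlety: the rule that a function cannot change a workspace variable without `global` applies to *assigning a new value* to the name.  If the variable holds a mutable object such as a list or a NumPy array, a function can still change that object's contents in place, even without `global`.  A short sketch (the names `readings`, `offsets`, and the three functions are hypothetical):

```python
import numpy as np

readings = [1.0, 2.0]           # a list in the main workspace
offsets = np.array([5.0, 6.0])  # a NumPy array in the main workspace

def append_reading():
    readings.append(3.0)        # modifies the existing list in place; no global needed

def shift_offsets():
    offsets[:] = offsets - 5.0  # overwrites the array's contents in place

def replace_readings():
    readings = [0.0]            # creates a NEW local list; the workspace list is untouched
    return readings

append_reading()
shift_offsets()
replace_readings()
print(readings)   # prints [1.0, 2.0, 3.0]
print(offsets)    # prints [0. 1.]
```

This is another reason to prefer input and output arguments: a function that silently modifies its data in place can be just as surprising as one that uses `global`.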

### Challenge Problem
Create your own custom module containing functions that allow you to subtract the average value from an arbitrary data set before plotting the data using a custom format.  

You can use the following example dataset:

x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

y = 6.0, 21.6, 22.5, 27.0, 23.8, 30.5, 26.6, 34.2, 38.0, 39.3

