<a href="https://colab.research.google.com/github/kerryback/2022-BUSI520/blob/main/Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Code cells and markdown cells in Jupyter notebooks

* Execute either with the run icon or Shift-Enter
* Can use LaTeX in markdown cells
* Can make bulleted lists with *, large font with # or ## or ###, and much more
* Double-click a markdown cell to put it in edit mode, and press Shift-Enter to return it to viewing mode
* In a code cell, everything on a line after a # is ignored by the python interpreter
* Use # to add comments to a file for yourself or others to read
* Can also have multi-line comments using triple-quoted strings (later)
* If the last expression in a code cell is not assigned to a variable, its value will be printed below the cell.

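As a minimal sketch of the points about comments and cell output (the variable name total is only illustrative), the cell below mixes comments with code. In a notebook, the final line would display 5 below the cell; run as a script, nothing is displayed unless you call print.

```python
# Everything after a # on a line is ignored by the interpreter.
total = 2 + 3  # a comment can also follow code on the same line

"""
A triple-quoted string that is not assigned to anything
is evaluated and discarded, so it can serve as a multi-line comment.
"""

total  # last expression in the cell: a notebook displays its value
```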
### Jupyter notebooks vs .py scripts

You can run python from a Jupyter notebook or by executing a script.  A script is just a text file with a sequence of python commands (a program).  The conventional file extension for a python script is .py.  

Notebooks are interactive and are better for exploring data and developing code.  Notebooks can be confusing, because you can skip around and execute cells in a different order than they appear in the file.  When your code is complete (until you have to revise your paper), it can be better to put it in script form so it is clearer when you come back to it.  If I have something that takes hours or days to execute, I usually write it as a script and run it from a terminal window on the JGSB server using tmux.

### IDEs

In addition to notebook vs .py, you have a choice of IDE (Integrated Development Environment).  It is possible to write python code in a text editor like Notepad and run scripts in a terminal window, but there are better choices.  Possibilities include

* Jupyter Notebook or JupyterLab installed by Anaconda
* stand-alone JupyterLab
* PyCharm
* VS Code (Visual Studio Code)
* Spyder
* JupyterHub on the JGSB server
* Cloud servers including Colab and Paperspace Gradient

PyCharm and VS Code can run either notebooks or scripts.  They have the best code completion and syntax highlighting.  The extra versatility means the learning curve for them is a little steeper.

If you ever want to do deep learning (neural networks) you will probably want to use a cloud server that provides GPUs.

### Modules

Python consists of an interpreter and modules (or libraries or packages).  If you install from python.org, you get the interpreter and core modules.  If you install from Anaconda, you get the interpreter, core modules, and standard scientific modules.

If you install from Anaconda, you by default work in a "conda environment."  It is best to use the Anaconda Navigator or "conda install" in a terminal window (in the conda environment) to install new modules when possible.  Not all modules have been catalogued by conda.  For those, you need to use "pip install" in a terminal window (in the conda environment).

New modules are created daily and published to PyPI for downloading with pip.  It is actually quite easy to create and publish your own modules.  There is no quality control, but standard packages have millions of users, so bugs are quickly reported.  When you run into problems, search Google and especially Stack Overflow.

We can issue operating system commands from inside a Jupyter notebook by prefacing them with !.  So, we can bypass using the terminal.  The following will install pandas-datareader.  It is already installed on Colab, and you can use conda install instead with Anaconda.

In [72]:
!pip install pandas-datareader



### Start by importing modules or things from modules

For clarity, it is a good idea to import everything you need at the top of a notebook or script.  

* "import ..." or "import ... as" provides access to everything in the module by prefixing with the module name or alias (e.g., np.whatever)
* "from ... import ..." or "from ... import ... as ..." provides direct access to whatever is imported using its name without a prefix
* The reason for the prefix style is to avoid having duplicate names in the workspace (only one would work).

In [73]:
import numpy as np
import pandas as pd
from pandas_datareader import DataReader as pdr
from pandas_datareader.famafrench import get_available_datasets as gad 
from seaborn import load_dataset
from math import sqrt, exp, log
from scipy.stats import norm

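The two import styles can be compared side by side using only the standard math module. This is a small sketch; the aliases m and ln are hypothetical names chosen for illustration.

```python
import math                  # prefix style: names are accessed as math.<name>
import math as m             # same module under a shorter alias
from math import sqrt        # direct access without a prefix
from math import log as ln   # direct access under an alias

root_a = math.sqrt(16)
root_b = m.sqrt(16)
root_c = sqrt(16)
one = ln(math.e)             # natural log of e

print(root_a, root_b, root_c, one)
```

All three calls to sqrt reach the same function; only the name used to reach it differs.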
### Assignment statements

* a=b means "evaluate b and assign the result to a variable named a"
* spaces around = are optional, and are optional in most other places as well (with one important exception we will discuss)
* variable names can contain letters, digits, and underscores, must start with a letter or underscore, and cannot be a word reserved by python (such as if or for)
* can have multiple assignments on a single line


In [74]:
x = 3
x = x + 1
x

4

In [75]:
x, y, z = 3, 4, 5
z

5

### Basic object types

In [76]:
# type(3)
# type(2.718)
# type('some text')
# type(['a', 'b', 'c'])
# type(('a', 'b', 'c'))
# type({'a': 1, 'b': 2})
# type(sqrt)
# type(True)
# type(False)
# type(None)

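To see the answers without uncommenting each line, the sketch below collects the same objects in a list and prints the name of each type. The names examples and type_names are illustrative only.

```python
from math import sqrt

examples = [3, 2.718, 'some text', ['a', 'b', 'c'], ('a', 'b', 'c'),
            {'a': 1, 'b': 2}, sqrt, True, None]

# type(obj).__name__ is the short name of the object's type
type_names = [type(obj).__name__ for obj in examples]

for obj, name in zip(examples, type_names):
    print(repr(obj), '->', name)
```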
### Some basic functions

In [77]:
# round(3.14, 1)
# int(3.14)
# int('3')
# str(3)

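A short sketch of what these functions return, with two extra cases (a negative number and an exact half) added for illustration because they often surprise people:

```python
rounded = round(3.14, 1)   # 3.1
truncated = int(3.99)      # 3: int() drops the fractional part, it does not round
negative = int(-3.99)      # -3: truncation is toward zero
parsed = int('3')          # 3: a string of digits is converted to an integer
as_text = str(3)           # '3'
half_even = round(2.5)     # 2: exact halves are rounded to the nearest even integer

print(rounded, truncated, negative, parsed, as_text, half_even)
```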
### Print statements


In [78]:
x = 3.14
y = 2.718
print(x)
print(x, y)
print("x is", x, "and y is", y)


3.14
3.14 2.718
x is 3.14 and y is 2.718


### Working with lists

* Counting starts from zero.  
* Ranges int1:int2 start with int1 and go to but not including int2.
* Consequently, the subset has int2 - int1 elements
* Last item can be accessed as -1, next-to-last as -2, etc.
* Can use range int1:int2:int3 to go from int1 to int2 stepping by int3
* In range int1:int2:int3, int3 can be negative and int1>int2
* Concatenate lists with +



In [79]:
x = ['a', 'b', 'c', 'd']
y = ['m', 'n', 'o', 'p', 'q']

# len(x)
# x[0]
# x[1]
# x[:2]
# x[0:2]
# x[1:4]
# x[1:4:2]
# x[4:1:-1]
# x[-1]
# x[:-1]
# x[-2]
# x[-3:]
# x[-3:-1]
# x[1]='w'
# x[1:3] = ['y','z']
# x.append('u')
# indx = x.index('a')
# x.remove('a')
# x.insert(indx, 's')
# x.reverse()
# x + y
# 3 * x
# 7 * [0]

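The slicing rules above can be checked with a small worked example. The variable names other than x are illustrative; the comments show the value of each slice.

```python
x = ['a', 'b', 'c', 'd']

first_two = x[:2]       # ['a', 'b']       (indexes 0 and 1)
middle = x[1:4]         # ['b', 'c', 'd']  (4 - 1 = 3 elements)
stepped = x[1:4:2]      # ['b', 'd']       (every second element)
backwards = x[4:1:-1]   # ['d', 'c']       (the start is clipped to the last index)
inner = x[-3:-1]        # ['b', 'c']

x[1:3] = ['y', 'z']     # replace a slice in place
extended = x + ['u']    # concatenation builds a new list and leaves x unchanged

print(first_two, middle, stepped, backwards, inner)
print(x, extended)
```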
### Working with strings

* Can use either single or double quotes
* Concatenate with +
* Strings behave like sequences of characters, so indexing and slicing work as they do for lists; unlike lists, strings cannot be modified in place

In [80]:
string1 = 'This is some text'
string2 = "This is some different text"

# string1[:4]
# string1[-4:]
# string1 + ". " + string2 + "."
# string1.split(" ")
# string1.split(" ")[0] + " " + string1.split(" ")[3]
# string1.title()
# string1.upper()



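A brief sketch of a few of these operations with their results. It also shows one way strings differ from lists: assigning to a position raises a TypeError, so a changed string has to be built as a new one. The names words, outcome, and lowered are illustrative.

```python
string1 = 'This is some text'

words = string1.split(" ")                   # ['This', 'is', 'some', 'text']
first_and_last = words[0] + " " + words[3]   # 'This text'
titled = string1.title()                     # 'This Is Some Text'

# Item assignment works on lists but not on strings.
try:
    string1[0] = 't'
    outcome = 'modified'
except TypeError:
    outcome = 'TypeError'

lowered = 't' + string1[1:]                  # build a new string instead

print(words, first_and_last, titled, outcome, lowered)
```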
### Logical conditions

A single = is an assignment, so to test for equality we use ==

In [81]:
# 3 == 3
# 3 == 5
# 3 != 5
# not(3==5)
# 3 > 5
# (2<4) and (3>5)
# (2<4) & (3>5)
# (2<4) or (3>5)
# (2<4) | (3>5)
# 1 in [3, 4]
# 1 not in [3, 4]
# 1 * (3>5)
# 1 * (2<4)


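The parentheses in the & and | examples matter. As a sketch of why (the numbers are chosen only for illustration): & binds more tightly than comparison operators, while and does not, so the two can give different answers when the parentheses are dropped.

```python
with_parens = (3 > 2) & (1 > 0)    # True & True

# Without parentheses this is read as 3 > (2 & 1) > 0, which is 3 > 0 > 0.
without_parens = 3 > 2 & 1 > 0

with_and = 3 > 2 and 1 > 0         # "and" is applied after the comparisons

as_int = 1 * (2 < 4)               # True counts as 1 in arithmetic

print(with_parens, without_parens, with_and, as_int)
```

This prints True False True 1.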
### Ternary operator




In [82]:
# "yes" if 3>5 else "no"
# 10 if 3>5 else 100
# 10 if (2<4) or (3>5) else 100
# 10 if 0 else 100
# 10 if None else 100
# 10 if 16 else 100

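The last three lines of the cell above rely on the fact that any object can be used as a condition. A minimal sketch, using a hypothetical helper named pick: zero, None, and empty strings or lists count as false, and most other values count as true.

```python
def pick(condition):
    # Returns 10 when the condition is treated as true, 100 otherwise.
    return 10 if condition else 100

candidates = [0, None, 16, '', 'text', [], [0]]
results = [pick(value) for value in candidates]

for value, result in zip(candidates, results):
    print(repr(value), '->', result)
```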
### List comprehensions

In [83]:
letters = ['a', 'b', 'c', 'd']
# [x + '1' for x in letters]
# [x + '1' for x in letters if not x=='c']


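For comparison, here is a sketch of the second expression written both ways: as a one-line expression in brackets and as an explicit loop. The names with_brackets and with_loop are illustrative.

```python
letters = ['a', 'b', 'c', 'd']

# One-line form: expression, then the loop, then an optional filter.
with_brackets = [x + '1' for x in letters if x != 'c']

# Equivalent explicit loop.
with_loop = []
for x in letters:
    if x != 'c':
        with_loop.append(x + '1')

print(with_brackets)   # ['a1', 'b1', 'd1']
print(with_loop)       # ['a1', 'b1', 'd1']
```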
### Range objects

In [84]:
# [i for i in range(6)]
# [i for i in range(1,7)]
# [i for i in range(2,8,2)]
# [i for i in range(6,2,-1)]

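A range follows the same start, stop, step rules as list slices. It produces its values only when they are requested, so one simple way to inspect a range is to convert it to a list, as in this sketch (variable names are illustrative):

```python
up_to_six = list(range(6))          # [0, 1, 2, 3, 4, 5]
one_to_six = list(range(1, 7))      # [1, 2, 3, 4, 5, 6]
evens = list(range(2, 8, 2))        # [2, 4, 6]
countdown = list(range(6, 2, -1))   # [6, 5, 4, 3]

print(up_to_six, one_to_six, evens, countdown)
```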
### Assign by reference or by value

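"By reference" means the memory location is assigned to a variable.  Multiple variables can be assigned to the same location, and in-place changes made through any of them are visible through all of them.  Integers, floats, and strings cannot be changed in place, so they behave as if assigned by value: a statement like x = x + 1 creates a new object and points x at it without affecting other variables.  Lists and other mutable objects (dictionaries, sets, arrays) are assigned by reference.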

Lists have a copy method that creates a copy at a new memory location.

In [85]:
x = 3
y = x
x = x + 1
y

3

In [86]:
x = ['a', 'b', 'c']
y = x
x.append('d')
y

['a', 'b', 'c', 'd']

In [87]:
x = ['a', 'b', 'c']
y = x.copy()
x.append('d')
y

['a', 'b', 'c']

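One way to check whether two variables share a memory location is the is operator. The sketch below (names alias, clone, nested, and shallow are illustrative) also shows a caveat: copy makes a new outer list only, so lists stored inside it are still shared.

```python
x = ['a', 'b', 'c']
alias = x            # same object as x
clone = x.copy()     # new list with the same contents

same_object = alias is x      # True
copied_object = clone is x    # False
equal_contents = clone == x   # True

# copy() is shallow: inner lists are shared between the original and the copy.
nested = [[1, 2], [3]]
shallow = nested.copy()
nested[0].append(99)

print(same_object, copied_object, equal_contents)
print(shallow)   # [[1, 2, 99], [3]]
```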
### Defining functions

Why functions?  Modularized code is easier to test, maintain, and reuse.

* The def keyword starts the function definition.
* Arguments are enclosed in parentheses and followed by a colon.
* The return keyword indicates the value returned by the function.
* Functions can return numbers, lists, strings, ...
* Indentation is crucial.  All lines within the function definition must be indented the same number of spaces, unless there is a reason the line must be indented further (more later) or it is within parentheses or braces.  A good IDE will prompt you to indent and tell you when you have indentation wrong.


In [100]:
def double(x):
    return 2*x

double(3)

6

### Passing arguments by name

In [101]:
def exponentiate(base, exponent):
    return base ** exponent

# exponentiate(2, 5)
# exponentiate(base=2, exponent=5)
# exponentiate(exponent=5, base=2)
# exponentiate(5, 2)

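A sketch of what the calls above return. Named arguments can be given in any order, while unnamed arguments are matched by position, so swapping them changes the result. The last call, which mixes the two styles, is an added illustration.

```python
def exponentiate(base, exponent):
    return base ** exponent

by_position = exponentiate(2, 5)                 # 32
by_name = exponentiate(exponent=5, base=2)       # 32: order does not matter with names
swapped = exponentiate(5, 2)                     # 25: position determines the meaning
mixed = exponentiate(2, exponent=5)              # 32: positional first, then named

print(by_position, by_name, swapped, mixed)
```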
### Returning tuples


In [102]:
def f(x):
    return 2*x, 3*x 

a, b = f(2)
b

6

### Defining classes

Why classes?  To store data so we don't have to input it repeatedly into functions.

* Main takeaways: objects have attributes and methods
* Attributes are data (in a general sense - not necessarily numbers) that are stored in the object
* Methods are functions that operate on the data and possibly on other arguments
* A class definition is initiated with the class keyword
* The __init__ method runs when an instance of the class is created.  It usually defines the attributes.
* Note that the lines following the class keyword must be indented, and method definitions must be further indented.
* In general, indentation is sequential.  Each class / function / for or while block / if-else block must be further indented.


In [103]:
class multiplier():
    def __init__(self,x) :
        self.factor = x
    def multiply(self, y) :
        return y * self.factor

x = multiplier(3)
# x.multiply(4)
# x.factor

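To illustrate that each instance stores its own data, the sketch below creates two objects from the same class (the instance names tripler and doubler are illustrative) and then changes an attribute on one of them.

```python
class multiplier():
    def __init__(self, x):
        self.factor = x          # attribute stored on this instance

    def multiply(self, y):
        return y * self.factor   # method uses the stored attribute

tripler = multiplier(3)
doubler = multiplier(2)

tripled = tripler.multiply(4)    # 12
doubled = doubler.multiply(4)    # 8

tripler.factor = 10              # attributes can be reassigned
rescaled = tripler.multiply(4)   # 40; doubler is unaffected

print(tripled, doubled, rescaled, doubler.factor)
```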
### Sets

Sets are unordered collections with no repeated items.

In [104]:
x = set([1, 1, 2, 3])
x

{1, 2, 3}

In [105]:
x.issubset([1, 2, 2, 3, 4, 4])

True

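Beyond issubset, sets support the usual set operations. A short sketch with illustrative names; sorted is used when printing because sets have no guaranteed order.

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b          # {1, 2, 3, 4}
common = a & b         # {3}
only_in_a = a - b      # {1, 2}
has_two = 2 in a       # True

# A common use: removing duplicates from a list.
unique = sorted(set(['b', 'a', 'b']))   # ['a', 'b']

print(sorted(union), sorted(common), sorted(only_in_a), has_two, unique)
```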
### Dictionaries

* Dictionaries are collections of key/value pairs (since Python 3.7 they keep the order in which items were inserted)
* Sometimes called look-up tables or hash tables 
* Compared to normal dictionaries, key $\sim$ word and value $\sim$ definition.
* Created with dict function or by enclosing key/value pairs in {}
* Values can be any type of object; keys must be hashable (numbers, strings, and tuples work, lists do not)

In [106]:
x = {'a': 1, 'b': 2}
x['a']

1

In [107]:
x = dict(a=1, b=2)
# x['a']
# list(x.keys())
# list(x.values())

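A sketch of common dictionary operations, including a few not shown above (adding a key, get with a fallback value, and looping over items). Names other than x are illustrative.

```python
x = dict(a=1, b=2)

keys = list(x.keys())       # ['a', 'b']
values = list(x.values())   # [1, 2]

x['c'] = 3                  # add a new key/value pair
x['a'] = 10                 # overwrite the value for an existing key

missing = x.get('z', 0)     # 0: get returns a fallback instead of raising KeyError

# items() yields (key, value) pairs
pairs = [key + '=' + str(value) for key, value in x.items()]

print(keys, values, missing, pairs)
```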
### Loops

* A loop is a block of code that is executed repeatedly, for a given number of times (for loop) or until some condition is met (while loop).
* It is not common to need a while loop.
* On the other hand, zip and enumerate are often useful (especially zip) in for loops.
* Indentation is again crucial.  

In [129]:
for i in range(5):
    print(i)

0
1
2
3
4


In [130]:
for ltr in ['a', 'b', 'c']:
    print(ltr)

a
b
c


In [132]:
lst1 = ['a', 'b', 'c']
lst2 = ['1', '2', '3']
for a, b in zip(lst1, lst2):
    print(a+b)

a1
b2
c3


In [133]:
for i, ltr in enumerate(lst1):
    print(ltr+lst2[i])

a1
b2
c3


In [134]:
i = 0
while i<3:
    print(i)
    i = i + 1

0
1
2


### Conditional execution

* An indented block following an if statement is executed only if the condition evaluates to True.
* Often but not always there is an else with another indented block following the if block.
* There can also be one or more elif (else if) blocks based on additional conditions.

In [135]:
def f(number):
    if number < 10:
        return 'small' 
    elif number < 100:
        return 'medium' 
    else:
        return 'large' 

f(20)

'medium'

In [136]:
def g(number):
    return 'small' if number<10 else ('medium' if number<100 else 'large')

g(20)

'medium'

### Default values for function arguments


In [137]:
def f(x, y=3):
    return x*y

print(f(2, 3), f(2), f(2, 4))

6 6 8


In [138]:
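# tips = load_dataset('tips')  # assumption: tips is the seaborn example dataset, loaded with load_dataset imported above
# tips.head()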
# tips.head(3)
# tips.head(n=3)
# help(tips.head)

### Local and global variables

You can reuse variable names inside a function definition without affecting variables with the same name outside the definition (variables are local to the function).

In [139]:
y = 2

def f(x):
    y = x + 1
    return y

z = f(3)
print(y, z)

2 4


In [140]:
lst = ['a', 'b', 'c']

def f(x):
    lst = [x]
    return lst

z = f('d')
print(lst, z)

['a', 'b', 'c'] ['d']


Inside a function definition, you can use variables that are defined outside.

In [141]:
y = 2

def f(x):
    return y+x

z = f(3)
print(y, z)

2 5


In [142]:
lst = ["a", "b", "c"]

def f(x): 
    return lst + [x]

z = f('d')
print(lst, z)

['a', 'b', 'c'] ['a', 'b', 'c', 'd']


Inside a function, you can change variables that are defined outside the function.

In [143]:
lst = ['a', 'b', 'c']

def f(x):
    lst.append(x)
    return lst

z = f('d')
print(lst, z)

['a', 'b', 'c', 'd'] ['a', 'b', 'c', 'd']


In [144]:
y = 2

def f(x):
    global y
    y = y + x
    return y

z = f(3)
print(y, z)

5 5

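As a sketch of why the global statement is needed in the last example: assigning to a name anywhere inside a function makes that name local to the function, so reading it before the assignment fails with an UnboundLocalError. The function names below are hypothetical.

```python
y = 2

def read_only(x):
    return y + x              # reading an outside variable needs no declaration

def assign_without_global(x):
    try:
        y = y + x             # the assignment makes y local, so the read fails
    except UnboundLocalError as err:
        return type(err).__name__
    return y

def assign_with_global(x):
    global y                  # y now refers to the variable outside the function
    y = y + x
    return y

a = read_only(3)              # 5
b = assign_without_global(3)  # 'UnboundLocalError'
c = assign_with_global(3)     # 5, and the outside y is now 5 as well

print(a, b, c, y)
```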