# Introduction to Python
by Nick Blauch and Matt Clapp

Python is one of the most popular general-purpose programming languages, and its scientific libraries allow it to serve as a complete replacement for numerical computing languages like MATLAB. In this tutorial we will cover some of the basic principles of programming in Python, along with simple use cases for some of the most useful packages for scientific computing. However, **we can only scratch the surface here** and we encourage you to do a lot more research on your own time.

## General themes 

- Python is an object-oriented language
    - this means that "objects" - defined by their class methods and properties - are given a special status in Python. They are, according to the Python creator Guido van Rossum, "first class"
    - rather than simply writing functions that operate on generic inputs, you should try to organize your code to utilize objects, with classes that have their own functions (methods). More on this later.
    - this allows you to take advantage of concepts such as inheritance and encapsulation, which we will discuss later
- There are several ways to execute Python code. 
    - You can see the setup guide we developed for more information on that. 
    - For the uPNC we will be using JupyterLab and primarily Jupyter Notebooks.
    - When it comes to writing code to do more intensive analyses, or for larger projects, we do not recommend Jupyter Notebooks. Rather, you should write Python files that can depend on each other, allowing you to cleanly organize your code.
- Python is excellent for data science
- Python is excellent for machine learning
- Python is excellent for neuroscience

## What Jupyter notebooks are and are not:

Jupyter notebooks are a great tool for:
- simple analyses
- prototyping new analyses
- visualizing your data
- developing interactive exercises and tutorials

Jupyter notebooks are not:
- a complete replacement for .py files
- easily tracked with version control (since their underlying representation is not simply human-readable text)
- meant for intense analyses that need to maximally take advantage of parallel computation

Now that we know what Jupyter Notebooks are and are not, we can take advantage of their amazing functionality and ease-of-use for this tutorial. We will also be using them for many of the exercises to come in the uPNC.

## Variables and different built in object types

Let's start by covering the basic variable types, how to use them, and some simple operations that can be done on them.

In [1]:
# we can store values in variables
a = 12 # defaults to int
# we can do some math on those variables
b = a*12
print(type(b))
# in Python 3, division automatically converts to float
print(b/12)
print(type(b/12))

<class 'int'>
12.0
<class 'float'>
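
As a small supplementary sketch (not part of the original notebook), the difference between true division `/`, floor division `//`, and the remainder operator `%` can be checked directly. The variable names here are only illustrative.

```python
b = 144

# true division always returns a float, even when the result is a whole number
true_div = b / 12
# floor division rounds down and keeps an int when both operands are ints
floor_div = b // 12
# the modulo operator gives the remainder of the division
remainder = b % 5

print(true_div, type(true_div))    # 12.0 <class 'float'>
print(floor_div, type(floor_div))  # 12 <class 'int'>
print(remainder)                   # 4
```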


### Multi-element arrays: lists and tuples

In [2]:
# we can store multiple values in a list
my_list = [12, 14, a, b]
print(my_list)
# we can compose a list of lists
my_double_list = [my_list, my_list]
print(my_double_list)
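# rebinding the name my_list to a new list does not change the lists already stored in my_double_list
my_list = [12, 14]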
print(my_double_list)

[12, 14, 12, 144]
[[12, 14, 12, 144], [12, 14, 12, 144]]
[[12, 14, 12, 144], [12, 14, 12, 144]]


In [3]:
# indexing into lists - 
first_el = my_list[0] # Python is 0-based
last_el = my_list[-1] #the last element can be acquired with index -1
# we can use colon indexing (slicing) to get multiple elements.
first_two_els = my_list[0:2] # the stop index is exclusive, so 0:2 gives elements 0 and 1
print(first_el)
print(last_el)
print(first_two_els)

12
14
[12, 14]


In [4]:
# tuples are another form of multi-element array created with () notation
my_tuple = (12, 14.0) # like lists, they can store elements of different types

### Mutability

Mutability is an important concept and defines the difference between lists and tuples.

- Mutable objects can be changed after creation
    - built-in types: list, dict, set
- Immutable objects cannot be changed after creation
    - built-in types: int, float, complex, str, tuple, frozenset


In [5]:
# if we construct a list that includes an int via the name a, rebinding a to a new value afterwards does not affect the list, which still refers to the original (immutable) int object
a = 12
my_list = [12, 14, a]
a = 100
print(my_list)
# however the list itself can be modified
my_list[-1] = a
print(my_list)

[12, 14, 12]
[12, 14, 100]
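
A related subtlety is worth a quick sketch: because lists are mutable, an in-place change is visible through every name that refers to the same list, whereas rebinding a name is not. The names `inner`, `outer`, and `independent` below are made up for this illustration.

```python
# two elements of outer refer to the very same list object
inner = [12, 14]
outer = [inner, inner]

# mutating the list in place is visible through every reference to it
inner.append(100)
print(outer)  # [[12, 14, 100], [12, 14, 100]]

# rebinding the name creates a new object and leaves the old list untouched
inner = [0]
print(outer)  # [[12, 14, 100], [12, 14, 100]]

# copy() gives an independent list that can be changed safely
independent = outer[0].copy()
independent.append(-1)
print(outer[0], independent)  # [12, 14, 100] [12, 14, 100, -1]
```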


In [6]:
# Python also has tuples, which are immutable. replace [] of lists with () for tuple
my_tuple = (12, 14, a)
print(my_tuple)
my_tuple[-1] = 12 # can't change elements of a tuple! 

(12, 14, 100)


TypeError: 'tuple' object does not support item assignment

### Dictionaries

Dictionaries are one of the most useful built-in types in Python. 

They are essentially look up tables of key:value pairs.

As of Python 3.7, dictionaries are guaranteed to preserve insertion order.

In [7]:
# dictionaries can be created with curly bracket syntax or the built-in dict function
my_dict = {'best_language': 'Python', 'best_research_field': 'computational neuroscience'}
my_dict_2 = dict(best_language='Python', best_research_field='computational neuroscience')
# and we can index into values using square brackets indexed by the key
print(my_dict['best_research_field'])
print()

computational neuroscience



In [8]:
# looping directly over a dictionary iterates over its keys
for key in my_dict:
    print(key)

best_language
best_research_field


In [9]:
# using .items() gives both the key and value for each iteration
for key, val in my_dict.items():
    print(f'{key}: {val}')

best_language: Python
best_research_field: computational neuroscience


In [10]:
# we can also use the built-in enumerate function to get the index of each iteration
for ii, (key, val) in enumerate(my_dict.items()):
    print(f'{ii}-th item is {key}: {val}')
    

0-th item is best_language: Python
1-th item is best_research_field: computational neuroscience


In [11]:
# dictionary comprehensions are also possible, similar to list comprehensions
inverse_dict = {val: key for key, val in my_dict.items()}
print(inverse_dict)

{'Python': 'best_language', 'computational neuroscience': 'best_research_field'}


## Basics of object-oriented programming

In object-oriented languages such as Python, everything is an object, which is defined by its class. Different classes have different properties and methods, and classes can *inherit* properties and methods from other classes.

In [12]:
# everything is an object, defined by a class. what is 4?
a = 4
print(type(a))

<class 'int'>


In [13]:
# the methods and properties of a class can be accessed with the built-in dir function
print(dir(a))
print() # we can just call print to get an extra line

['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']



In [14]:
# even dir is an object, technically
print(type(dir))
print(dir(dir))
print()

<class 'builtin_function_or_method'>
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__text_signature__']



In [15]:
# of course, lists are also objects
l = [12, 14, 16]
print(type(l))
print(dir(l))

<class 'list'>
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [16]:
# the help function can also be used, and is sometimes cleaner and more helpful ;) 
help(l)

Help on list object:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /

In [17]:
# an object's methods are accessed using dot syntax
x = l.pop() # pops the last element in the list
print(x)
# note that general functions, such as dir, must exist in the namespace (either built-in, imported, or defined), and are called directly with objects as input
# sometimes, these functions look for a specific class method defined for the object, which is what dir does
print(dir(l) == sorted(l.__dir__()))

16
True


### Underscores in names
- in general, an object will have many methods or attributes with single or double underscores prepending and/or appending the name
    - e.g., `list.__len__`
- this syntax is sometimes purely conventional, and sometimes functionally meaningful
    - this guide provides a helpful analysis of this syntax, which we will cover here (https://dbader.org/blog/meaning-of-underscores-in-python)
- `_var`:
    - hint to another programmer that a variable or method starting with a single underscore is intended for internal use
    - only functional role is that these methods will not be imported with the wildcard syntax (`from module import *`), which you should avoid anyway
- `var_`:
    - used if the relevant name is already taken, to avoid naming conflicts and allow laziness
- `__var`:
    - triggers "name mangling", which essentially allows a variable to be defined for a class and not overridden when a subclass is created
    - prevents direct access as `object.__var` from outside the class; the name is instead changed to `object._ClassName__var`
- `__var__`:
    - indicates special methods defined by the Python language
    - should be avoided for naming methods, except when implementing methods for which Python has defined a general purpose, such as `__init__`, `__call__`, or `__iter__` for your own classes
    - no name mangling
- `_` :
    - temporary or insignificant variable
    - gives the result of the last expression in a Python interpreter (including Jupyter Notebooks)
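
To make the single- and double-underscore behavior concrete, here is a minimal sketch. The class `Base` and its attributes are hypothetical and exist only for this illustration.

```python
class Base:
    def __init__(self):
        # single leading underscore: a convention only, nothing is enforced
        self._internal = 'for internal use'
        # double leading underscore: stored under the mangled name _Base__hidden
        self.__hidden = 'name-mangled'

obj = Base()
print(obj._internal)             # accessible; the underscore is only a hint
print(hasattr(obj, '__hidden'))  # False: the attribute is not stored under this name
print(obj._Base__hidden)         # the mangled name still gives access
```

Name mangling is therefore not a security feature. It mainly prevents accidental name clashes between a class and its subclasses.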

### Creating our first custom class

Using custom classes is an essential aspect of effective programming in Python.

Let's create a simple class that we can use to represent people. This class must have an `__init__` method, and any other methods that we think would be important. 

In [18]:
class Person(object): # (object) means that Person inherits from object, the most generic object type. in Python 3 this is implicit, so writing `class Person:` is equivalent
    """
    A Person object is defined by its name, age, occupation, and (favorite programming) language
    
    The __init__ function defines the input parameters. Our function takes 4 inputs:
    
    name (str): the name of the person
    age (int): the age of the person in years
    occupation (str): the occupation of the person
    language (str): the person's favorite programming language (default = 'Python')
    
    """
    def __init__(self, name: str, age: int, occupation: str, language: str = 'Python'):
        super().__init__()
        self.name = name
        self.age = age
        self.occupation = occupation
        self.language = language
    
    def get_name(self):
        # can you implement this?
        raise NotImplementedError()
        
    def get_age(self):
        # can you implement this?
        raise NotImplementedError()    
        
    def get_occupation(self):
        # can you implement this?
        raise NotImplementedError()
        
    def get_language(self):
        # can you implement this?
        raise NotImplementedError()
        
## question: is there any reason to implement the functions that we listed above? if so, what? 
 

### Inheritance

As mentioned earlier, all classes inherit from some other class. If we are defining a completely new class, the most basic class to inherit from is `object`. However, we can also inherit from more specific classes.

Let's create the `GradStudent` class to demonstrate inheritance from the `Person` class. Writing this class will be quick since it will inherit most of its functionality from the `Person` class. The one difference is that the occupation is not passed in directly, but is computed automatically from the field of study and year of study.

In [19]:
class GradStudent(Person):
    """
    A grad student is a Person with occupation graduate student. 
    We can also specify the field of study and years of experience to make a more specific grad student
    """
    def __init__(self, name: str, age: int, language: str = 'Python', field_of_study=None, years_of_experience=None):
        # each optional part carries its own trailing space, so omitting one leaves no gap or stray 'None'
        field = f'{field_of_study} ' if field_of_study is not None else ''
        experience = f'Year {years_of_experience} ' if years_of_experience is not None else ''
        occupation = f'{experience}{field}graduate student'
        super().__init__(name, age, occupation, language)
        
nick = GradStudent('nick', 25, language='Python', field_of_study='neural computation', years_of_experience=2)
print(f'{nick.name} is a {nick.occupation}')

nick is a Year 2 neural computation graduate student


### Creating a simple function

We have already seen class-specific functions (i.e. methods). But functions can also be created independently of classes, and used to operate on arbitrary inputs. 

In [20]:
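def multiply(l: list, a: int) -> list:
    """
    takes a list l and multiplies each element by an int or float a
    """
    print(type(a))
    x = [a*b for b in l]
    return x

# type hints are not enforced at runtime, so passing a str for a still runs (and repeats the string)
multiply([2,3,4], 'hello')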

<class 'str'>


['hellohello', 'hellohellohello', 'hellohellohellohello']

### `*args` and `**kwargs`

You will often see `__init__` function definitions include `*args` and/or `**kwargs`. This allows additional unnamed (positional) arguments to be grouped into a tuple, here called `args` (the name itself is arbitrary), and additional named (keyword) arguments to be grouped into a dictionary, here called `kwargs`.

Some uses of this:
- **Raw functionality**: writing a function which itself may depend on a variable number of inputs
- **Cleaner code**: writing a function which depends on other functions that accept a variable number of inputs


In [21]:
# Raw functionality - we can replace args with a more informative name here
def get_grad_student_names(*grad_students):
    names = []
    for grad_student in grad_students:
        names.append(grad_student.name)
    return names
print(get_grad_student_names(
    Person('Nick', 25, 'Neural computation graduate student', 'Python'),
    Person('Matt', 23, 'Neural computation graduate student', 'Python'),
                            )
     )

['Nick', 'Matt']


In [22]:
# more functionality AND cleaner code
class uPNCStudent(Person):
    """
    a uPNCStudent is a Person who is defined to be an undergraduate student.
    
    These students may also be allowed to have extra properties defined by the programmer
    
    """
    def __init__(self, name, age, language, **kwargs):
        super().__init__(name, age, 'undergraduate student', language)
        # this loop assigns all key, val pairs in kwargs directly to the uPNCStudent object
        for key, val in kwargs.items():
            setattr(self, key, val)

joe = uPNCStudent('Joe', 19, 'Java', favorite_ice_cream='mint chip')
print(joe.favorite_ice_cream)

mint chip


We can also use **unpacking** of a dictionary with `**` to specify the kwargs (or of a tuple with `*` to specify the args).

In [23]:
joe = uPNCStudent('Joe', 19, 'Java', **{'favorite_sport': 'baseball'})
print(joe.favorite_sport)

baseball
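
The tuple case works the same way. As a minimal sketch, the hypothetical function `describe` below simply returns whatever it receives, which makes it easy to see how `*` and `**` distribute the values.

```python
def describe(*args, **kwargs):
    """Return the positional and keyword arguments exactly as they were received."""
    return args, kwargs

positional = ('Joe', 19)
named = {'language': 'Java', 'favorite_sport': 'baseball'}

# * unpacks the tuple into positional arguments, ** unpacks the dict into keyword arguments
args, kwargs = describe(*positional, **named)
print(args)    # ('Joe', 19)
print(kwargs)  # {'language': 'Java', 'favorite_sport': 'baseball'}
```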


## String formatting

Formatting strings is crucial for programmatically constructing names (e.g. of files) and messages from the values of variables.

There are lots of ways to format strings. We will cover a few here.

Let's assume we have some variables that we want to use to print a message. Perhaps you are looping through different people and want to know their occupation.

In [24]:
# let's create a few objects of type Person
nick = Person('Nick', 25, 'Neural computation graduate student', 'Python')
linda = Person('Linda', 50, 'Professor', 'Matlab')
matt = Person('Matt', 23, 'Neural computation graduate student', 'Python')

In [25]:
# f-strings (new and very convenient)
print(f"{nick.name}'s occupation is {nick.occupation}")

Nick's occupation is Neural computation graduate student


In [26]:
# .format() notation, somewhat similar to sprintf in MATLAB. compatible with older versions of Python
print("{}'s favorite programming language is {}".format(linda.name, linda.language))

Linda's favorite programming language is Matlab


In [27]:
# string addition
print("Nick thinks that " + matt.name + " is " + str(matt.age) + " years old")

Nick thinks that Matt is 23 years old
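
Since string formatting is often used to build file names, here is a small sketch of f-string format specifiers. The subject name, session number, and accuracy value are invented for the example.

```python
subject = 'smith'
session = 3
accuracy = 0.91234

# :02d zero-pads an integer to 2 digits, which keeps file names sorted correctly
filename = f'{subject}_session-{session:02d}.csv'
# :.3f rounds a float to 3 decimal places
summary = f'accuracy = {accuracy:.3f}'

print(filename)  # smith_session-03.csv
print(summary)   # accuracy = 0.912
```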


## For-loops and list comprehensions

Everybody probably knows about for loops. They are a great way to iterate through operations, but should be avoided when array operations are possible, since vectorized array operations run in optimized compiled code and are typically much faster (we will get to this).

In [28]:
# in Python, we can specify a range of values using the range function
for ii in range(10):
    print(ii)

0
1
2
3
4
5
6
7
8
9


Remember that Python is 0-based. We can also specify a non-zero starting point with `range(start, stop)`. The stop value is exclusive, so the last number produced is `stop - 1`; this is why `range(10)` gives a 10-element range running from 0 to 9. An optional third argument sets the step size. If we want to iterate through the range of even positive integers less than 10, what would we do?

In [29]:
for ii in range(2,10,2):
    print(ii)

2
4
6
8
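
A quick way to check what a given `range` produces is to convert it to a list. The calls below are a short sketch of the start, stop, and step arguments, including a negative step.

```python
zero_to_nine = list(range(10))       # stop only: starts at 0, stop is exclusive
two_to_nine = list(range(2, 10))     # start and stop
evens = list(range(2, 10, 2))        # start, stop, and step
countdown = list(range(10, 0, -3))   # a negative step counts down

print(zero_to_nine)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(two_to_nine)   # [2, 3, 4, 5, 6, 7, 8, 9]
print(evens)         # [2, 4, 6, 8]
print(countdown)     # [10, 7, 4, 1]
```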


The `enumerate` function is a great Python trick for quickly getting indices of a for loop iteration. If we want to know what index we are at for each iteration of the last loop, we can do the following:

In [30]:
for ii, num in enumerate(range(2,10,2)):
    print(f'the number at index {ii} is {num}')

the number at index 0 is 2
the number at index 1 is 4
the number at index 2 is 6
the number at index 3 is 8


### List comprehensions
List comprehensions may be new to you if you are coming from another language.

They are a bit quirky at first, but most people grow to love them for their simplicity and brevity.

In [31]:
# let's assume we have a list and want to iterate some operation over it
my_list = [1, 2, 3, 4]
my_doubled_list = [2*a for a in my_list] # yes, there are other ways to do this...to come
print(my_doubled_list)

[2, 4, 6, 8]
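
For comparison, the sketch below shows the same doubling written as an explicit for loop, along with a comprehension that filters elements using an `if` clause. The variable names are only illustrative.

```python
my_list = [1, 2, 3, 4]

# the explicit loop that the comprehension replaces
doubled_loop = []
for a in my_list:
    doubled_loop.append(2 * a)

# the equivalent list comprehension
doubled_comp = [2 * a for a in my_list]

# an optional if clause keeps only the elements that pass a test
even_squares = [a ** 2 for a in my_list if a % 2 == 0]

print(doubled_loop == doubled_comp)  # True
print(even_squares)                  # [4, 16]
```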


# Python for data science
The data science "stack" includes some key packages:
- `NumPy` - numerical arrays and matrix computation
- `pandas` - dataframes (similar to those in R)
- `SciPy` - scientific functions, including statistics
- `scikit-learn` - machine learning powerhouse
- `Matplotlib` - powerful plotting library
- `seaborn` - wrapper for Matplotlib that makes it easy to create attractive, highly informative plots

## numpy

In [32]:
# many packages have standard shorthand import names, such as np for numpy
import numpy as np

array = np.zeros((100,100)) # 100x100 array of zeros
array[0, :] = 1 # set all elements in the first row to 1
array[2, :] = 1 # set all elements in the 3rd row to 1
array[1,1] = 10 # set the 2nd row, 2nd column element to 10
print(array)

[[ 1.  1.  1. ...  1.  1.  1.]
 [ 0. 10.  0. ...  0.  0.  0.]
 [ 1.  1.  1. ...  1.  1.  1.]
 ...
 [ 0.  0.  0. ...  0.  0.  0.]
 [ 0.  0.  0. ...  0.  0.  0.]
 [ 0.  0.  0. ...  0.  0.  0.]]


In [34]:
# we can do simple things like compute sums, means and standard deviations of rows, columns, or the whole array
mean_rows = np.mean(array, 1) # take the mean over the column dimension to give the mean for each row
sum_cols = np.sum(array, 0) 
std_array = np.std(array) # if no axis is specified, the array is flattened and then the mean/sum/std is taken 
print(mean_rows)
print(sum_cols)
print(std_array)

[1.  0.1 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
 0.  0.  0.  0.  0.  0.  0.  0.  0.  0. ]
[ 2. 12.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
0.1719273102215003
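
Returning to the list-doubling example from the list comprehension section, here is a sketch of the array-based alternative, plus a small array on which the effect of the `axis` argument is easy to read. The names `my_array` and `grid` are made up for this illustration.

```python
import numpy as np

my_list = [1, 2, 3, 4]
my_array = np.array(my_list)

# arithmetic on an array applies to every element, with no explicit loop
doubled = 2 * my_array
print(doubled)  # [2 4 6 8]

# a boolean mask selects the elements that satisfy a condition
large = my_array[my_array > 2]
print(large)  # [3 4]

# on a 2x3 array, axis=0 collapses the rows and axis=1 collapses the columns
grid = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]
col_sums = grid.sum(axis=0)
row_sums = grid.sum(axis=1)
print(col_sums)  # [3 5 7]
print(row_sums)  # [ 3 12]
```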


## organizing your data with pandas dataframes

## plotting with matplotlib and seaborn

## Scientific functionality with scipy

## Machine learning with scikit-learn 

## Deep learning with torch

# A good code organization for projects

Projects can easily get disorganized if you do not begin with a well-thought-out organization scheme. Taking the time up front to be organized will save you time later when you inevitably have to refactor your project to make it more readable for others, either those who might start working on the project with you, or others interacting with your code in order to e.g. reproduce the results of your paper.

Python code is meant to be organized hierarchically in modules. This will get you pretty far in code organization. Here is a totally made up example of some code that you might organize during this bootcamp. Don't read too much into it, as most of the assignments you will be working on will be simple enough to complete in 1-2 files, and so they won't need a fancy organization. We are trying to help you think ahead to your research. 
```
README.md
upnc/
upnc/__init__.py
upnc/ephys/
upnc/ephys/__init__.py
upnc/ephys/loading.py
upnc/ephys/temporal_processing.py
upnc/behavior/parse_eyelink.py
upnc/behavior/parse_behavior.py
upnc/behavior/__init__.py
scripts/analyze_smith_data.py
scripts/plot_smith_data.py
notebooks/plot_smith_data.ipynb
```
The main themes are: a `README.md` file which explains the code base (markdown renders nicely automatically on GitHub, try this out), a module `upnc` containing sub-modules `ephys` and `behavior`, and a `scripts` directory containing analysis scripts that call functions organized neatly in the modules. Separating scripts from functions generally keeps a code base better organized. Finally, you might also want to separate notebooks and scripts, where notebooks are more exploratory and focused on plotting, and scripts are more for crunching numbers and saving outputs to disk.