# Software Carpentry with Python: Part 1

## Intro to Python Concepts

For November 21, 2019. 

This lesson starts with an overview of Python and Anaconda, as in the [Data Carpentry: Python Ecology Lesson](https://datacarpentry.org/python-ecology-lesson/00-before-we-start/index.html).

## What is Python 

Python is a general-purpose programming language that supports rapid development of data analytics applications. The word “Python” refers both to the programming language and to the tool that executes scripts written in the Python language.

Its main advantages are:

* Free
* Open-source
* Available on all major platforms (macOS, Linux, Windows)
* Supported by the Python Software Foundation and a large community
* Supports multiple programming paradigms (procedural, object-oriented, functional) and many uses: data analysis, application development, machine learning
* Rich ecosystem of third-party packages

So, why use Python for data analysis?
* Accessible: as a language, it is easy for new members of the community to get up to speed.
* Supports reproducibility: reproducibility is the ability to obtain the same results using the same dataset(s) and analysis.
  * A data analysis written as a Python script can be reproduced on any platform. Moreover, if you collect more data or correct existing data, you can quickly and easily re-run your analysis! An increasing number of journals and funding agencies expect analyses to be reproducible, so knowing Python will give you an edge with these requirements.
* Versatile: Python can read text files, connect to databases, and work with many other data formats, on your computer or on the web.
* Interdisciplinary and extensible: Python provides a framework that allows anyone to combine approaches from different research disciplines (and beyond) to best suit their analysis needs.


## Knowing your way around Anaconda
**Everyone open Anaconda Navigator.**

The Anaconda distribution of Python includes many popular packages and tools, such as the IPython console, Jupyter Notebook, and the Spyder IDE. Take a quick look around Anaconda Navigator. You can launch programs from the Navigator or from the command line.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that combine code, graphs, and narrative text. Spyder is an Integrated Development Environment (IDE) that lets you write Python scripts and interact with Python from within a single interface.

Anaconda also comes with a package manager called conda, which makes it easy to install and update additional packages from the command line. 

## Opening up a notebook
1. Click the Launch button under Jupyter Notebook (not JupyterLab).
2. Navigate to your Desktop.
3. Create a new folder called python-lesson.
4. Open that folder.
5. Click the New > Notebook button.


## Variables and Assignment 
(from [SWC Plotting and Programming with Python](https://swcarpentry.github.io/python-novice-gapminder/02-variables/index.html))

### Use variables to store values.
* Variables are names for values.
* In Python the = symbol assigns the value on the right to the name on the left.
* The variable is created when a value is assigned to it.


```python
size = 35
building = "Gelman Library"
```

Variable names
* can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
* cannot start with a digit
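One way to check a candidate name against these rules is the string method `str.isidentifier()`. This is a small sketch that is not part of the original lesson, and the example names are made up. Note that `isidentifier()` also accepts non-ASCII letters, and it does not reject reserved words such as `class`; the standard-library function `keyword.iskeyword()` checks for those.

```python
import keyword

# isidentifier() applies the letters/digits/underscore and no-leading-digit rules
print("building_name".isidentifier())  # True
print("2nd_place".isidentifier())      # False: starts with a digit
print("room-number".isidentifier())    # False: contains a hyphen

# Reserved words pass isidentifier() but cannot be used as variable names
print("class".isidentifier(), keyword.iskeyword("class"))  # True True
```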


### Use print to display values.
* Python has a built-in function called print that prints things as text.
* Call the function (i.e., tell Python to run it) by using its name.

```python
print(size)
```
```
35
```


* Provide values to the function (i.e., the things to print) in parentheses.
* The values passed to the function are called arguments

```python
print("Workshop in", building, "has", size, "people.")
```
```
Workshop in Gelman Library has 35 people.
```


### Variables must be created before they are used.
If a variable doesn’t exist yet, or if the name has been mis-spelled, Python reports an error. (Unlike some languages, which “guess” a default value.)

```python
print(university)
```
```
NameError: name 'university' is not defined
```

### Variables Persist Between Cells
Be aware that it is the order of execution of cells that is important in a Jupyter notebook, **not the order in which they appear**. Python will remember all the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. 

If you define variables lower down the notebook and then (re)run cells further up, those defined lower down will still exist. As an example, create two cells with the following content, in this order:

```python
# will get an error the first time
print(myval)
```
```
NameError: name 'myval' is not defined
```

```python
myval = 1
```

If you execute this in order, the first cell will give an error. However, if you run the first cell after the second cell it will print out 1. To prevent confusion, it can be helpful to use the Kernel -> Restart & Run All option which clears the interpreter and runs everything from a clean slate going top to bottom.

### Variables can be used in calculations.
* We can use variables in calculations just as if they were values.
* Remember, we assigned the value 35 to size a few lines ago.

```python
size = size + 3
print('Size when three more people attend:', size)
```
```
Size when three more people attend: 38
```


### A few more points about variables
* Python is case-sensitive: it treats upper- and lower-case letters as different, so `Name` and `name` are different variables.
* There are conventions for using upper-case letters at the start of variable names, so we will use lower-case letters for now.
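A quick check of case sensitivity; the names and values here are made up for illustration. The two variables differ only in their first letter, yet Python keeps them separate.

```python
# Two variables whose names differ only in capitalization
name = "Ada"
Name = "Grace"

print(name)  # Ada
print(Name)  # Grace
```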

### Use an index to get a single character from a string.
* The characters (individual letters, numbers, and so on) in a string are ordered. For example, the string 'AB' is not the same as 'BA'. Because of this ordering, we can treat the string as a list of characters.
* Each position in the string (first, second, etc.) is given a number. This number is called an index.
* Indices are numbered from 0.
* Use the position’s index in square brackets to get the character at that position.

```python
atom_name = 'helium'
print(atom_name[0])
```
```
h
```


### Use a slice to get a substring.
* A part of a string is called a substring. A substring can be as short as a single character.
* An item in a list is called an element. Whenever we treat a string as if it were a list, the string’s elements are its individual characters.
* A slice is a part of a string (or, more generally, any list-like thing).
* We take a slice by using `[start:stop]`, where start is replaced with the index of the first element we want and stop is replaced with the index of the element just after the last element we want.
* The difference between stop and start is the slice’s length.
* Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.

```python
print(atom_name[0:3])
```
```
hel
```
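A slice that does not start at 0 makes the points above easier to see: the slice stops just before index `stop`, its length is `stop - start`, and the original string is untouched. The variable names `start`, `stop`, and `piece` are just illustrative.

```python
atom_name = 'helium'

start = 1
stop = 4
piece = atom_name[start:stop]    # characters at indices 1, 2 and 3

print(piece)                     # eli
print(len(piece), stop - start)  # 3 3
print(atom_name)                 # helium (unchanged)
```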


Use the built-in function `len` to find the length of a string.

```python
print(len('helium'))
```
```
6
```


### Exercise: 
What is the final value of `position` in the program below? (Try to predict the value without typing in and running the program, then check your prediction.)

```python
initial = 'left'
position = initial
initial = 'right'
```

## Types and Conversion

### Every value has a type.
* Every value in a program has a specific type.
* Integer (`int`): represents positive or negative whole numbers like 3 or -512.
* Floating point number (`float`): represents real numbers like 3.14159 or -2.5.
* Character string (usually called “string”, `str`): text.
  * Written in either single quotes or double quotes (as long as they match).
  * The quote marks aren’t printed when the string is displayed.

### Use the built-in function type to find the type of a value.
* Use the built-in function type to find out what type a value has.
* Works on variables as well.
* But remember: the value has the type — the variable is just a label.

```python
print(type(52))
```
```
<class 'int'>
```


```python
title = "Programming with Python"
print(type(title))
```
```
<class 'str'>
```
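To see that the type belongs to the value and not to the variable, assign values of two different types to the same name in turn. The name `label` is only an example.

```python
# The same variable name can label values of different types
label = 42
print(type(label))   # <class 'int'>

label = "forty-two"
print(type(label))   # <class 'str'>
```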


### Types control what operations (or methods) can be performed on a given value.
A value’s type determines what the program can do to it.


```python
print(5 - 3)
```
```
2
```


```python
print('hello' - 'h')
```
```
TypeError: unsupported operand type(s) for -: 'str' and 'str'
```

* You can use the “+” and “*” operators on strings.
* “Adding” character strings concatenates them.

```python
full_name = 'Laura' + ' ' + 'Wrubel'
print(full_name)
```
```
Laura Wrubel
```


Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times, since multiplication is repeated addition.

```python
pattern = "ABC"
print(pattern * 3)
```
```
ABCABCABC
```


* Strings have a length, but numbers don’t (not even zero).
* The built-in function `len` counts the number of characters in a string.

```python
print(len(full_name))
```
```
12
```


```python
print(len(size))
```
```
TypeError: object of type 'int' has no len()
```

* You must convert numbers to strings or vice versa when operating on them together.
* You cannot add numbers and strings.

**Why is this important? You may have imported data that you need to change from a string to a number, or vice versa.**

```python
print(size + '2')
```
```
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

```python
print(size + int('2'))
print(str(size) + '2')
```
```
40
382
```


Integers and floating-point numbers can be mixed in arithmetic.
Python automatically converts integers to floats as needed. 
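A small sketch of mixing the two types; the variable names and values are hypothetical. Multiplying an `int` by a `float` gives a `float`, and division with `/` also produces a `float`, even when both operands are integers.

```python
people = 38      # an int
hours = 2.5      # a float

total = people * hours
print(total, type(total))   # 95.0 <class 'float'>

# Division with / always produces a float, even for two ints
print(10 / 2)               # 5.0
```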


**Variables only change value when something is assigned to them.**
* If we make one cell in a spreadsheet depend on another, and update the latter, the former updates automatically.
* This does not happen in programming languages.

```python
first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)
```
```
first is 2 and second is 5
```


Note that `second` did not change when we changed `first`: its value was calculated once, when `first` was still 1.
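Unlike a spreadsheet, nothing is recalculated automatically. If we want `second` to reflect the new value of `first`, we have to assign to it again:

```python
first = 1
second = 5 * first
first = 2

# second still holds 5; to update it, assign to it again
second = 5 * first
print('first is', first, 'and second is', second)  # first is 2 and second is 10
```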

### Exercise:
What type of value is 3.4? How can you find out?

```python
type(3.4)
```
```
float
```

## Built-in functions

We have seen some functions already (`print`, `len`, `type`); now let’s take a closer look.
* Always use parentheses, even if they’re empty, so that Python knows a function is being called.
* A function may take zero or more arguments.
* An argument is a value passed into a function.
  * `len` takes exactly one.
  * `int`, `str`, and `float` create a new value from an existing one.
  * `print` takes zero or more. (`print` with no arguments prints a blank line.)
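A short sketch of the conversion functions and of `print` with no arguments; the values and names are only examples. Each conversion builds a new value of the requested type from an existing one.

```python
count = int("42")      # str -> int
ratio = float("3.5")   # str -> float
label = str(7)         # int -> str

print(count + 1)       # 43
print(ratio * 2)       # 7.0
print(label + "7")     # 77
print()                # no arguments: prints a blank line
```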

Some other common built-in functions include `max`, `min`, and `round`.
* Use `max` to find the largest of one or more values.
* Use `min` to find the smallest.
* Both work on character strings as well as numbers. “Larger” and “smaller” use the order 0-9, then A-Z, then a-z to compare characters.

```python
print(max(1, 2, 3))
print(min('a', 'A', '0'))
```
```
3
0
```
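A few more string comparisons following the order described above (digits, then uppercase, then lowercase). Strings are compared character by character from the left, so the first differing character decides; the example words are arbitrary.

```python
# Comparison order: digits, then uppercase, then lowercase
print(max("9", "A", "a"))       # a
print(max("apple", "Banana"))   # apple: 'a' sorts after 'B'
print(min("Zebra", "apple"))    # Zebra: 'Z' sorts before 'a'
```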


* Use the built-in function `help()` to get help for a function.
* Every built-in function has online documentation.

```python
help(round)
```
```
Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
```
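Each sentence of that docstring can be checked with a call to `round`; the numbers below are arbitrary examples.

```python
print(round(3.7))          # 4     (ndigits omitted: result is an int)
print(round(3.14159, 2))   # 3.14  (same type as the number: a float)
print(round(1234, -2))     # 1200  (negative ndigits rounds to hundreds)
```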



The Jupyter Notebook has two ways to get help:
1. Place the cursor anywhere in the function invocation (i.e., the function name or its parameters), hold down **Shift**, and press **Tab**.
2. Type a function name with a question mark after it.

```python
round?
```
```
Signature: round(number, ndigits=None)
Docstring:
Round a number to a given precision in decimal digits.

The return value is an integer if ndigits is omitted or None.  Otherwise
the return value has the same type as the number.  ndigits may be negative.
Type:      builtin_function_or_method
```


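```python
# Place the cursor inside the parentheses and press Shift+Tab to see the signature.
# Running this cell as-is raises a TypeError, because round() requires a number.
round()
```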

## Libraries
* Most of the power of a programming language is in its libraries.
* A library is a collection of files (called modules) that contains functions for use by other programs.
* May also contain data values (e.g., numerical constants) and other things.
* The Python standard library is an extensive suite of modules that comes with Python itself.
* Many additional libraries are available from PyPI (the Python Package Index).


```python
# math is part of the standard library
import math

print('pi is', math.pi)
```
```
pi is 3.141592653589793
```


```python
help(math)
```
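Functions in a module are used the same way as its values, with the `module.name` syntax. A minimal sketch using three functions that `help(math)` lists; the inputs are arbitrary.

```python
import math

print(math.sqrt(16))       # 4.0
print(math.floor(2.7))     # 2
print(math.cos(math.pi))   # -1.0
```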

## Lists

Doing calculations with a hundred variables created one-by-one called temp_001, temp_002, etc., would be at least as slow as doing them by hand.

Use a list to store many values together. A list is a collection of items, written inside square brackets `[...]`, with values separated by commas.


```python
temps = [45, 55, 68, 61, 52]
print('temperatures:', temps)
```
```
temperatures: [45, 55, 68, 61, 52]
```


```python
len(temps)
```
```
5
```

Use an item’s index to fetch it from a list. This is similar to what we did with strings earlier. 

```python
sat = temps[0]
print(sat)
```
```
45
```


```python
sun = temps[1]
```

We can overwrite/change a value by assigning a new value to its index / position in the list. So, to correct the temperature from Saturday, the first value in the list: 

```python
temps[0] = 35
print(temps)
```
```
[35, 55, 68, 61, 52]
```


Appending items to a list lengthens it. Use `list_name.append(value)` to add items, one at a time, to the end of a list. `append` takes exactly one argument.

```python
temps.append(56)
print(temps)
```
```
[35, 55, 68, 61, 52, 56]
```


`append()` is a method of a list object. A method is like a function, but it belongs to a particular type of object (for example, string methods and list methods).

The `object.method()` syntax deliberately resembles the way we refer to things in a library, such as `math.pi`. Use `help(list)` for a preview of the many things you can do to a list.


```python
#help(list)
```

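The help output describes an `extend()` method, which is similar to `append()` but combines two lists by adding each item of the second list to the first. First, see what happens if we pass a list to `append()`: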

```python
temps.append([90, 91, 92])
print(temps)
```
```
[35, 55, 68, 61, 52, 56, [90, 91, 92]]
```


```python
# get rid of that using pop()
temps.pop()
```
```
[90, 91, 92]
```

```python
print(temps)
```
```
[35, 55, 68, 61, 52, 56]
```


```python
next_week = [75, 78, 72]
temps.extend(next_week)
```
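The cell above extends `temps` without printing it. This self-contained comparison, using two throwaway lists with hypothetical names, shows the difference side by side: `append` adds the whole list as a single nested item, while `extend` adds each of its items.

```python
with_append = [1, 2]
with_append.append([3, 4])   # adds the whole list as ONE item

with_extend = [1, 2]
with_extend.extend([3, 4])   # adds each item of the list

print(with_append, len(with_append))   # [1, 2, [3, 4]] 3
print(with_extend, len(with_extend))   # [1, 2, 3, 4] 4
```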

### Slicing lists

Remember how we sliced strings at the beginning of the lesson, to get at particular items? We can do something similar with lists, again using square brackets. 

Let's take a look at what temps is like at this point:

```python
print(temps)
```
```
[35, 55, 68, 61, 52, 56, 75, 78, 72]
```


```python
len(temps)
```
```
9
```

If I want the first five elements in that list, I need to slice, providing the first position and the position just past the end of the slice I want. The element at the stop position is never part of the slice. If I leave out the start position, the slice starts at index 0.

So, to get the first 5 elements in the list (let's say that's last week):

```python
last_week = temps[:5]
print(last_week)
```
```
[35, 55, 68, 61, 52]
```


Slicing hasn't changed the `temps` list; it's still the same.
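Either end of a slice can be left out: an omitted start means "from the beginning" and an omitted stop means "through the end". A sketch using the same values as `temps` above, where `this_week` is a hypothetical name for the remaining elements:

```python
temps = [35, 55, 68, 61, 52, 56, 75, 78, 72]

last_week = temps[:5]   # omitted start: from index 0
this_week = temps[5:]   # omitted stop: through the end

print(last_week)    # [35, 55, 68, 61, 52]
print(this_week)    # [56, 75, 78, 72]
print(len(temps))   # 9: slicing did not change temps
```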

#### Exercise:
Given this list:

`months = ["Aug", "Sep", "Oct", "Nov", "Dec"]`

Create a slice that contains only Nov and Dec.
Assign the result to a variable called `winter` and print the variable.

```python
months = ["Aug", "Sep", "Oct", "Nov", "Dec"]
winter = months[3:]
print(winter)
```
```
['Nov', 'Dec']
```


**Bonus:** add the months Jan and Feb to `winter` and check the length of the list. 

```python
winter.extend(["Jan", "Feb"])
print(len(winter))
```
```
4
```


**Note:** So far, each of our lists has held items of a single type: `temps` holds integers and `winter` holds strings. However, lists can contain integers, floats, strings, other lists, and more, and the items don't all have to be the same type.
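A small example of a list that mixes types, built from values used earlier in this lesson. A nested list counts as a single item, and a second index reaches inside it.

```python
mixed = [42, 3.5, "Gelman Library", ["Nov", "Dec"]]

print(len(mixed))       # 4: the inner list counts as one item
print(type(mixed[1]))   # <class 'float'>
print(mixed[3][0])      # Nov: index into the inner list
```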

Tomorrow, we'll come back to more complicated things you can do with lists, such as using them in loops. But next we're going to move on to some data analysis with pandas.