# B - First steps in python

## Jupyter notebooks

___jupyter notebook___ is a tool for developing and presenting projects that involve programming in ___python___, but also in ___julia___ or ___R___.

Indeed, the name ___jupyter___ refers to these three languages (julia, python and R), although many other languages can be used through additional kernels.

Two characteristics make notebooks very attractive:

* For programming, the possibility to run smaller pieces of code and test them interactively, instead of having a single, big program.
* For presentation or sharing, the rich formatting possibilities, as well as the option to export the complete project as a PDF or HTML file.

In interactive mode, commands are interpreted and executed immediately, one after the other. For this we need an interpreter: python has its own, and other options are the IPython console and jupyter notebooks.

### Interface

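The Anaconda distribution already includes jupyter notebook.

It can be launched either from the Anaconda Navigator interface, or by typing _jupyter notebook_ in the Windows start menu:

<img src='../../misc/img/jupyter_win_menu.png' width='300'>

Inside a notebook, the IPython command `pwd` prints the current working directory, which is where files are read from and saved to by default:

In [None]:
pwd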

When jupyter is launched, it starts a local server and opens its interface in a web browser. Notebooks are then operated and edited in the browser:

<img src='../../misc/img/delete_notebook.png' width='1000'>

The main screen shows a file and directory tree. From here, it is possible to navigate folders and open files. While text files and images (among others) can also be opened in the browser, the main use of this dashboard is to create, rename, delete, start and stop _notebooks_. To perform these actions, check the box to the left of the file(s) and then choose the desired action at the top of the screen. Folders can also be created and deleted from this screen.

Notebooks are the documents in which we write code, together with its output, formatted text and images.

### Cells

Notebooks are made of ___cells___, which are units that can be executed independently, although it is also possible to run them all, one after the other.

___Cells___ can be of three types:
* Code
* Markdown
* Raw text

Basically, the three types of cells are used for:
* Code → Writing pieces of a program that will be executed by python 3
* Markdown → Rich format, explanations, text, titles, images, equations
* Raw text → Text without any format, in a raw form, that won't be interpreted as format or code

The type can be changed using the cell menu:

<img src='../../misc/img/cells1.png' width='700'>

or with the toolbar:

<img src='../../misc/img/cells2.png' width='700'>

or using keyboard shortcuts in ___command mode___ (press <kbd>Esc</kbd> to leave the editing mode):
* <kbd>y</kbd> → Make this cell ___code___
* <kbd>m</kbd> → Make this cell ___markdown format___
* <kbd>r</kbd> → Make this cell ___raw input___

### Coding

Each cell in a jupyter notebook can run python code, provided that the python kernel is installed and running correctly, and that the cell is configured as `code`.

By default, new cells are created as `code` cells.

A cell can be run either using a combination of keys, or using the buttons in the toolbar:

* <kbd>CTRL</kbd>+<kbd>ENTER</kbd> → to run the code in the current cell
* <kbd>Shift</kbd>+<kbd>ENTER</kbd> → to run the code in the current cell and select next cell below
* <kbd>ALT</kbd>+<kbd>ENTER</kbd> → to run the code in the current cell and insert a new empty cell below it

<img src='../../misc/img/run_buttons.png' width='300'>

One interesting feature is that cells don't need to be run in order: we can run a cell further down, and then one further up, and the values of variables or imported libraries are preserved between them.

In [1]:
print( a )

NameError: name 'a' is not defined

In [None]:
a = 2

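Running the first cell before the second one raises a `NameError`, because `a` does not exist yet; once `a = 2` has been run, running the first cell again prints `2`. While running cells out of order can be useful sometimes, it leads to untidy and messy code.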

It is a good practice to structure the code to be run from beginning to end.

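To run the complete notebook, we can use the menus (`Restart & Run All` first clears all variables, so it is the best check that the notebook runs from beginning to end):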
* `Cell` → `Run All`
* `Kernel` → `Restart & Run All`

## Python

### _import_ and _print_

___print___ is a function that takes any number of arguments (separated by commas), and prints them on the screen.

___print___ is one of the major differences between ___python 2___ and ___python 3___: in ___python 2___ it is a statement instead of a function.

In [None]:
print( 'Hello world!' ) # <--- string

In [None]:
print( 5 ) # <--- number
print( 5.4 ) #  <--- number

In [None]:
print( 5, 5.4 )

In [None]:
print( 'Addition:',  5 + 5.4 ) # <--- combined types

In [None]:
print( '-'*50 )
print( 'Addition:',  5 + 5.4 ) # <--- combined types
print( '-'*50 )
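
Besides the values to print, ___print()___ accepts the keyword arguments `sep` (the separator placed between the values, a single space by default) and `end` (what is written after the last value, a newline by default). A minimal sketch; the strings are just examples:

```python
# by default, the values are separated by a single space
print( 'Addition:', 5 + 5.4 )

# sep= changes the separator placed between the values
print( 'east', 'west', 'north', sep=' | ' )   # east | west | north

# end= changes what is written after the last value,
# so the next print continues on the same line
print( 'pH', end=' -> ' )
print( 7.2 )                                  # pH -> 7.2
```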

In [None]:
print( '-'*50 )
print( 'pi:',  np.pi ) # <--- NameError: numpy has not been imported (as np) yet
print( '-'*50 )

___import___ loads an installed library, or a part of it, so that we can use it. It can be written in several forms:

In [None]:
# Imports the complete numpy library
import numpy

In [None]:
numpy.pi

In [None]:
# Imports the complete numpy library with a pseudonym
import numpy as np

In [None]:
np.pi

In [None]:
# Imports only a part of the numpy library
from numpy import pi

In [None]:
pi

In [None]:
print( '-'*50 )
print( 'pi:',  np.pi ) # <--- 3.1416
print( '-'*50 )
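
The same three forms work with any library. As an illustration that needs no extra installation, here is a minimal sketch with the standard-library `math` module (chosen instead of numpy only for that reason):

```python
import math                 # the whole module, used as math.<name>
import math as m            # the whole module, under a shorter pseudonym
from math import pi, sqrt   # only the names we need, used directly

print( math.pi, m.pi, pi )  # the same value, reached in three ways
print( sqrt( 16 ) )         # 4.0
```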

### Data structures: _list_ & _dictionary_

#### Variables 

Variables can be defined on the fly: unlike in some other programming languages, they don't need to be declared beforehand.

In [None]:
a = 3
print( a )
print( a+1 )
b = 5
print( a+b )
c = 'foo' + 'bar'
print( c )

Even though the variables are not declared explicitly, they still have a type:

In [None]:
print( type( a ) )
print( type( b ) )
print( type( c ) )

print( a+c ) # <--- TypeError: an int and a str cannot be added

We can check the type of variables with the function ___type()___ or with logical comparisons:

In [None]:
print( type( a ) == int )
print( type( b ) == int )
print( type( c ) == int )
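
A common alternative to comparing the result of ___type()___ is the built-in ___isinstance()___, which also accepts a tuple of types. A short sketch, redefining the variables from above so that the cell runs on its own; the conversion with ___str()___ at the end shows how to avoid the `TypeError` of `a+c`:

```python
a = 3
b = 5
c = 'foo' + 'bar'

print( isinstance( a, int ) )             # True
print( isinstance( c, int ) )             # False
print( isinstance( c, str ) )             # True
print( isinstance( 5.4, (int, float) ) )  # True: 5.4 is an int or a float

# converting a to a string makes the concatenation possible
print( str( a ) + c )                     # 3foobar
```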

#### Lists

Lists are a type of variable, similar to _arrays_ and _vectors_ in other programming languages.

In Python, they deserve particular attention. They are widely used due to their flexibility and a number of functions and routines that encourage their usage. They can make programs very efficient and provide neat solutions to otherwise complex problems.

Lists are created either by using square brackets:

In [None]:
ll1 = [ 1, 2, 3, 4 ]
print( ll1, type(ll1) )

Or by using the function *list*:

In [None]:
ll2 = list( [ 1, 2, 3, 4 ] )
print( ll2, type(ll2) )

In [None]:
ll3 = list( ( 1, 2, 3, 4 ) )
print( ll3, type(ll3) )

In all three cases, the result is the same:

In [None]:
print( ll1 == ll2 == ll3 )

(don't bother with the parentheses in ll3 for now... If you *do* want to bother, they define a *tuple*, look it up!)
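The function ___list()___ accepts any *iterable*, so it can also build lists from a tuple, a string or a ___range___. A small sketch:

```python
from_tuple = list( ( 1, 2, 3, 4 ) )  # [1, 2, 3, 4]
from_string = list( 'abc' )          # ['a', 'b', 'c']
from_range = list( range( 1, 5 ) )   # [1, 2, 3, 4]

print( from_tuple, from_string, from_range )
print( from_tuple == from_range )    # True
```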

Lists can contain variables, not only explicit values:

In [None]:
ll = [ a, b, c, [ 1,2,3 ] ]

print( ll )

Notice that the elements of a list don't need to be of the same type!

We access particular elements in a list by _indexing_ using square brackets:

In [None]:
print( ll )

In [None]:
print( ll[1] )

Notice that python uses 0-based indexing in lists: `ll[0]` is the first element, so `ll[1]` returns the second one!

To select a range of elements (a *slice*) inside a list, we use a colon:

In [None]:
print( ll )

In [None]:
print( ll[1:3] )

In [None]:
print( ll[:2] )

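Notice that the element at the stop index is *not* included in the returned range: `ll[1:3]` returns the elements at positions 1 and 2, and `ll[:2]` stops just before 'foobar' (position 2)!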
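Indices can also be negative, counting from the end of the list, and a slice accepts a third value, the *step*. A sketch with the same list written out explicitly, assuming `a = 3`, `b = 5` and `c = 'foobar'` as above:

```python
ll = [ 3, 5, 'foobar', [ 1, 2, 3 ] ]

print( ll[-1] )    # last element: [1, 2, 3]
print( ll[-2:] )   # last two elements: ['foobar', [1, 2, 3]]
print( ll[::2] )   # every second element: [3, 'foobar']
print( ll[::-1] )  # reversed copy: [[1, 2, 3], 'foobar', 5, 3]
print( ll[1:3] )   # the stop index 3 is excluded: [5, 'foobar']
```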

We can add elements to a list, either at the beginning, at the end, or at some point in between.

Let's look at how to do it:

##### '+' operator

We can concatenate lists using the ___'+'___ operator. It returns a new list made of the elements of its operands, in order:

In [None]:
print( ll )

In [None]:
print( ll[:1] + [ '000' ] + ll[1:] )

In [None]:
print( [ '000' ] + ll + [ 'xxx', 'yyy' ] )

In [None]:
print( ll )

Two things to notice:
* The elements to concatenate ___must___ all be lists, hence the square brackets in `['000']`
* The original list is ___not___ modified by the operation; to keep the result, we need to assign it to the list again:

In [None]:
print( ll )
ll = [ '000' ] + ll
print( ll )
ll = [ '000' ] + ll + [ 'xxx', 'yyy' ]
print( ll )

We can also use the *methods* of the *class* *list*, which we access with a dot '.'. Unlike '+', a method such as ___append()___ modifies the list in place:

In [None]:
print( ll )

In [None]:
ll.append( 'xxx' )
print( ll )

Other useful methods are *insert*, *pop* and *remove*. In many consoles you can type a '.' after a variable name and press <kbd>TAB</kbd> to get a list of the available methods.

In [None]:
ll. # place the cursor right after the dot and press TAB (running this cell gives a SyntaxError)

In IPython and jupyter, you can look up the documentation of an object by appending a question mark '?' (in a plain python console, use `help()` instead):

In [None]:
ll.append?
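
The methods *insert*, *pop* and *remove* mentioned above can be sketched on a small example list (the letters are arbitrary). Like ___append()___, they modify the list in place:

```python
ll = [ 'a', 'b', 'c' ]

ll.insert( 1, 'x' )   # insert 'x' at position 1 -> ['a', 'x', 'b', 'c']
last = ll.pop()       # remove and return the last element -> 'c'
first = ll.pop( 0 )   # remove and return the element at position 0 -> 'a'
ll.remove( 'x' )      # remove the first occurrence of the value 'x'

print( ll, last, first )   # ['b'] c a
```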

Some functions and commands that are useful when working with lists include:
* ___len()___
* ___sorted()___
* ___set()___
* ___reversed()___
* ___in___
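
A short sketch of these functions on an example list of numbers. Note that ___sorted()___ and ___reversed()___ do not modify the original list, and that ___reversed()___ returns an iterator, so it is wrapped in ___list()___ to print it:

```python
values = [ 3, 1, 2, 3, 1 ]

print( len( values ) )               # number of elements: 5
print( sorted( values ) )            # new sorted list: [1, 1, 2, 3, 3]
print( set( values ) )               # unique elements: {1, 2, 3}
print( list( reversed( values ) ) )  # [1, 3, 2, 1, 3]
print( 2 in values, 7 in values )    # membership tests: True False
print( values )                      # the original list is unchanged
```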

#### Dictionaries

We can think of dictionaries as an enhanced type of list, where the indices are not necessarily numbers but arbitrary labels, called *keys*.

In terms of syntax, the main differences are the use of curly brackets, and that the key must be specified for each element, as `key: value`.

In [None]:
ll = [ 'apple', 'pear', 'quince' ]
print( ll[2] )

In [None]:
dd = { 'a': 'apple', 'b': 'pear', 'c':'quince' }
print( dd['c'] )

A more realistic example can look like this:

In [None]:
coord = { 'Chapingo': { 'lat':19.483, 'lon':-98.883 },
          'Berlin':   { 'lat':52.516, 'lon':13.388 } }

In [None]:
coord[ 'Chapingo' ][ 'lat' ]
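
A few common operations on the same `coord` dictionary, as a sketch: assigning to a new key adds an entry (the 'Origin' entry with latitude and longitude 0 is only a placeholder), ___in___ tests the keys, ___get()___ returns a default instead of raising a `KeyError`, and ___items()___ iterates over key/value pairs:

```python
coord = { 'Chapingo': { 'lat':19.483, 'lon':-98.883 },
          'Berlin':   { 'lat':52.516, 'lon':13.388 } }

# add a new entry by assigning to a key (placeholder values)
coord[ 'Origin' ] = { 'lat':0.0, 'lon':0.0 }

print( 'Berlin' in coord )                  # True: in looks at the keys
print( coord.get( 'Paris', 'not found' ) )  # not found

# iterate over (key, value) pairs
for city, latlon in coord.items():
    print( city, latlon[ 'lat' ], latlon[ 'lon' ] )
```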

### Control flow: _if_ & _for_

We will see only the ___if___ conditional and the ___for___ loop statements, because they are the most widely used.

Both can be used either to control which lines of the program are executed, or to build lists.

We will cover both uses for both statements.

#### *if* conditionals

##### First case: Change the program execution line according to a logical condition.

In [None]:
a = 3
if a<5:
    print( 'a is less than 5!' )
else:
    print( 'a is not less than 5!' )

Notice:
* The indentation blocks!!! → Very... VERY important, since python defines the execution blocks by their indentation: all the lines in a block must have the same indentation. Other languages use, for example, curly brackets ({}) or keywords (END) to delimit the blocks. Python uses the indentation, which is both a common source of errors (for example, because we used 3 spaces instead of 4 in some line) and a source of beauty (because it forces us to write very ordered, good-looking code)
* The ___else___ statement and the corresponding indented block
* The colon ___':'___ before an execution block

It is possible to define more than two possible ways to execute the program by using _elif_:

In [None]:
a = 5
b = 2

if a<5:
    print( 'a is less than 5!' )
elif a==5:
    print( 'a is exactly 5!' )
    print( '****' )
    if b<3:
        print( 'yes, b is less than 3' )
else:
    print( 'a is more than 5!' )

The ___if___ block is executed if the first condition is true.

Each ___elif___ block (there can be more than one) is executed if its condition is true *and* none of the previous conditions were met.

The ___else___ block (there can be only one) is executed only if none of the previous conditions were met.
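In other words, only the first branch whose condition is true runs, even if a later condition would also be true. A short sketch; the variable `n` and its thresholds are only for illustration:

```python
n = 7

if n > 5:
    label = 'more than 5'
elif n > 0:        # also true for n = 7, but never reached
    label = 'positive'
else:
    label = 'zero or negative'

print( label )     # more than 5
```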

Lastly, it is possible to combine conditions using the logical operators ___and___, ___or___ and ___not___:

In [None]:
cost = 5.2
place = 'Chapingo'
day = 'Sunday'

if cost>0 and place=='Chapingo' and not (day=='Saturday' or day=='Sunday'):
    print( 'We are in', place, 'on a nice', day )
else:
    print( '***' )

A last example, using the ___in___ operator with lists (as well as the list method ___index()___):

(information taken from https://strawberryplants.org/strawberry-varieties/)

The following are four strawberry (*Fragaria x ananassa*) varieties:

In [None]:
varieties = [ 'Valley Sunset', 'Kent', 'Benicia', 'Mojave' ]

and the corresponding cultivation (production) season:

In [None]:
seasons = [ 'Very Late Season', 'Midseason', 'Short-day June-bearing', 'Short-day June-bearing' ]

In [None]:
print( varieties )
print( seasons )

In [None]:
season_to_search = 'Midseason'

if season_to_search in seasons:
    ii = seasons.index( season_to_search )
    print( ii )
    print() # print an empty line
    print( varieties[ii], 'is a', season_to_search, 'variety' )

This is a very simple way of searching in lists (note that ___index()___ returns only the position of the *first* match); we will learn about more sophisticated strategies in later sessions.

However, the use of conditionals and the overall logic remain the same.

##### Second case: Building lists according to a logical condition.

→ Building lists on logical conditions is easier to understand together with loops; see the ___for___ section below.

#### *for*

##### First case: Repeat the execution of a block of code.

___for___ loops repeat the indented block once for each element of a sequence (for example a list or a ___range___).

The following code repeats the indented block, three calls to ___print()___, 4 times:

In [None]:
for i in range(4):
    print( '*' )
    print( 'x' )
    print( '\n' )

But the repeated code doesn't need to do _exactly_ the same: the loop variable `i` takes a different value in each repetition:

In [None]:
for i in range(4):
    print( i, '*' )

The ___range()___ function generates a sequence of integer numbers over which we can iterate with ___for___.

It takes start, stop and step parameters (only stop is mandatory, and the stop value itself is never included). Some examples of its usage:

In [None]:
for i in range( 1, 10, 2 ):
    print( i )

In [None]:
for i in range( 0, 10, 2 ):
    print( i )

In [None]:
for i in range( 10, 0, -2 ):
    print( i )

In [None]:
for i in range( 10, 0, -2.5 ): # <--- TypeError: range() only accepts integers
    print( i )

In [None]:
range?
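
Since a ___range___ produces its numbers one at a time, wrapping it in ___list()___ shows them all at once. Because ___range()___ only accepts integers, a non-integer step such as the -2.5 above can be emulated by iterating over integers and scaling them, as in this sketch (one option among several):

```python
print( list( range( 4 ) ) )          # [0, 1, 2, 3]: the stop value 4 is excluded
print( list( range( 10, 0, -2 ) ) )  # [10, 8, 6, 4, 2]: 0 is excluded

# emulate range( 10, 0, -2.5 ) with an integer counter k
steps = []
for k in range( 4 ):
    steps.append( 10 - 2.5 * k )

print( steps )                       # [10.0, 7.5, 5.0, 2.5]
```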

Another very useful way to use ___for___ loops is to iterate _over a list_, which is to say: for each element present in the list, do what the indented block says:

In [None]:
print( varieties )

In [None]:
for v in varieties:
    print( '*', v, '*' )

If we need the position of each variety (index), we can use ___enumerate()___:

In [None]:
for i, v in enumerate(varieties):
    print( '*', i, ': ', v, '\t\t\t-', seasons[i] )
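
___enumerate()___ also helps with a limitation of ___index()___ seen above: it only returns the *first* match, while two varieties share the season 'Short-day June-bearing'. A minimal sketch that collects every matching variety:

```python
varieties = [ 'Valley Sunset', 'Kent', 'Benicia', 'Mojave' ]
seasons = [ 'Very Late Season', 'Midseason', 'Short-day June-bearing', 'Short-day June-bearing' ]

season_to_search = 'Short-day June-bearing'
print( seasons.index( season_to_search ) )   # only the first match: 2

found = []
for i, s in enumerate( seasons ):
    if s == season_to_search:
        found.append( varieties[i] )

print( found )   # ['Benicia', 'Mojave']
```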

If we need to nest loops, we need to take care of the indentation: The inner loop needs a second level of indentation.

In [None]:
for i in range( 2, 5 ):
    for j in range( 10, 6, -1 ):
        print( 'i=', i, '\t j=', j )
    print( '_'*15 )
print( '-end-' )

##### Second case: Building lists using if and for.

We use this technique, called _list comprehension_, to create a list starting from another, existing one.

Two applications are:
* to apply a function to each element in a list
* to select a subset of a previous list according to a logical condition

Apply a function to each element in an existing list...

Given a list of numbers, round each element to two decimals:

In [None]:
numbers1 = [ 1.66666, 3.2, 5.89, -1.11, 100000.1 ]

numbers2 = [ round(item,2) for item in numbers1 ]

print( numbers1 )
print( numbers2 )
print( '\n' )

for i in range( len( numbers1 ) ): # using len() we can add or remove items without bothering about changing the loop
    print( numbers1[i], ' → ', numbers2[i] )
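
The same pattern works with strings. For example, changing the case of every element with the string methods ___upper()___ and ___title()___ (the list of names is just an example):

```python
names = [ 'east', 'West', 'NORTH' ]

upper = [ item.upper() for item in names ]   # ['EAST', 'WEST', 'NORTH']
title = [ item.title() for item in names ]   # ['East', 'West', 'North']

print( upper )
print( title )
```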


The second application is to select certain elements according to a condition. 

In the last numeric example, we can select only the positive numbers (and still round) with:

In [None]:
numbers1 = [ 1.66666, 3.2, 5.89, -1.11, 100000.1 ]

numbers2 = [ round(item,2) for item in numbers1 if item>0 ]

# using len() we can add or remove items without bothering about changing the loop
for i in range( len( numbers2 ) ): 
    print( numbers2[i] )

Or we can select numbers in a range:

In [None]:
numbers1 = [ 1.66666, 3.2, 5.89, -1.11, 100000.1 ]

numbers2 = [ round(item,2) for item in numbers1 if ( item>0 and item<1000 ) ]

# using len() we can add or remove items without bothering about changing the loop
for i in range( len( numbers2 ) ): 
    print( numbers2[i] )
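
A list comprehension with a condition is a compact form of a ___for___ loop that contains an ___if___ and calls ___append()___. This sketch builds the same list both ways and checks that they match:

```python
numbers1 = [ 1.66666, 3.2, 5.89, -1.11, 100000.1 ]

# list comprehension
numbers2 = [ round(item,2) for item in numbers1 if ( item>0 and item<1000 ) ]

# the same result with an explicit loop, a condition and append()
numbers3 = []
for item in numbers1:
    if item>0 and item<1000:   # could also be written as 0 < item < 1000
        numbers3.append( round(item,2) )

print( numbers2 )
print( numbers2 == numbers3 )   # True
```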

## Exercise

Let's suppose that we want to take soil samples and check their pH and N-content. Let's say we have 5 sampling points: 'east', 'west', 'north', 'south' and 'central'. In each point, we have 2 sampling depths: '10cm' and '30cm'. Lastly, we do it three times, monthly from April to June.

Create a list for each factor in this hypothetical problem. Use nested ___for___ loops to print all the combinations of measurements that will be needed, in a table-like fashion similar to the one shown before.

In [None]:
samples = [ 'pH', 'N' ]
points = [ 'east', 'west', 'north', 'south', 'central' ]
depths = [ '10cm', '30cm' ]
months = [ 'April', 'May', 'June' ]

for s in samples:
    for p in points:
        for d in depths:
            for m in months:
                print( '| ' + s + ' | ' + p + ' | ' + d + ' | ' + m )
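
As an alternative to writing four nested loops (a different technique from the one the exercise asks for), the standard-library function ___itertools.product()___ yields every combination of several lists, in the same order as the nested loops above. A sketch that also counts the measurements:

```python
import itertools

samples = [ 'pH', 'N' ]
points = [ 'east', 'west', 'north', 'south', 'central' ]
depths = [ '10cm', '30cm' ]
months = [ 'April', 'May', 'June' ]

# every (sample, point, depth, month) combination, last list varying fastest
rows = list( itertools.product( samples, points, depths, months ) )

for s, p, d, m in rows:
    print( '| ' + ' | '.join( [ s, p, d, m ] ) )

print( len( rows ), 'measurements' )   # 2 * 5 * 2 * 3 = 60
```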

Modify the previous piece of code to print a __(!)__ mark after each measurement in the central sampling point.

In [None]:
samples = [ 'pH', 'N' ]
points = [ 'east', 'west', 'north', 'south', 'central' ]
depths = [ '10cm', '30cm' ]
months = [ 'April', 'May', 'June' ]

for s in samples:
    for p in points:
        for d in depths:
            for m in months:
                if p=='central':
                    print( '| ' + s + ' | ' + p + ' | ' + d + ' | ' + m + ' !' )
                else:
                    print( '| ' + s + ' | ' + p + ' | ' + d + ' | ' + m )
                

Print the __(!)__ mark only after each measurement in the central sampling point that is to be taken at 30cm.

In [None]:
samples = [ 'pH', 'N' ]
points = [ 'east', 'west', 'north', 'south', 'central' ]
depths = [ '10cm', '30cm' ]
months = [ 'April', 'May', 'June' ]

for s in samples:
    for p in points:
        for d in depths:
            for m in months:
                if p=='central' and d=='30cm':
                    print( '| ' + s + ' | ' + p + ' | ' + d + ' | ' + m + ' !' )
                else:
                    print( '| ' + s + ' | ' + p + ' | ' + d + ' | ' + m )
                