# Jupyter ~Python~ for APS Students

Python is more than Jupyter; it is a whole language!


Python tutorial: https://docs.python.org/3/tutorial/ \
Jupyter Notebook: https://jupyter-notebook.readthedocs.io/en/stable/ \
Jupyter documentation: https://docs.jupyter.org/en/latest/ \
Markdown cheat sheet: https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf

**Troubleshooting**

* Google it!
* Use Stack Overflow.
* Use the documentation.

**Some shortcuts in Jupyter:**
- Markdown mode: Esc + M
- Code mode: Esc + Y
- Insert cell above: A
- Insert cell below: B
- Run current cell: Ctrl + Enter
- Delete current cell: D, D (press D twice)
- Comment or uncomment a line: Ctrl + /

## Jupyter vs. Spyder or PyCharm?

- It depends on you. Notebooks let you make reports that are ready for presentation, and they are more transferable than Spyder/PyCharm.
- There are extensions that let you list the current variables and add other functions from Spyder.
- It is not so easy to run a single line in Jupyter.
- My answer: JupyterLab, which includes single-line execution.

## How about Colab? (Google's version of Jupyter)
- Colab can open Jupyter Notebook (`.ipynb`) files without installing Python or the main packages.
- You may need to install custom or specific packages using `pip install`.
- It may ask for your consent to access files in your Drive.
- Free usage is limited to X hours a month; beyond that you have to pay.
- You can choose whether to run on a GPU or a CPU.

# Packages
Install packages mainly with `!pip install mypackage` from a Jupyter Notebook cell, *or* with `conda install -c conda-forge mypackage`.
Conda seems to work better most of the time, but sometimes you have to try both. **Just Google it!**
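
Before installing anything, it can help to check whether a package is already available in the current environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of any package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Use the distribution name given to pip, e.g. "scikit-learn" rather than "sklearn"
for name in ["numpy", "pandas", "scikit-learn"]:
    print(name, installed_version(name))
```

A result of `None` means the package still needs to be installed with one of the commands above.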

## Some of the main packages:

| Package | Use | Source |
| :-- | :-- | :-- |
| `numpy` | Fundamental package for array computing | https://numpy.org/ | 
| `pandas` | Data analysis and manipulation | https://pandas.pydata.org/ |
| `matplotlib` | Static, animated, and interactive visualizations | https://matplotlib.org/ |
| `scikit-learn` | Robust for data analysis and machine learning | https://scikit-learn.org/stable/ |
| `tensorflow` | Implementing neural networks | https://www.tensorflow.org/ |
| `scipy` | Scientific and mathematical functions built on `NumPy` | https://scipy.org/ |
| `statsmodels` | Hardcore statistics | https://www.statsmodels.org/stable/index.html |
| `seaborn` | Statistical data visualization | https://seaborn.pydata.org/ |
| `scikit-image` | Algorithms for image processing | https://scikit-image.org/ |
| `opencv` | Computer vision, machine learning, and image processing | https://pypi.org/project/opencv-python/ |




# Markdown

In [None]:
# This is a comment in a code mode cell

In [None]:
This is an error

simple text

## This is a subtitle

Some **bold text**
Some _italic text_

Some text with `code` in it

An equation using *LaTeX*:

$y = mx + b$

Another equation:

$$P = G + E + G \times E$$

First table:

| Row 0 Column 0 | Row 0 Column 1 | Row 0 Column 2 |
| :-: | :-: | :-: |
| 10 | 11 | 12 |
| 20 | 21 | 22 |
| 30 | 31 | 32 |


Second table:

|Left aligned column | Centered column | Right aligned column|
|:--|:-:|--:|
|**Bold data**|*Italic data*| Normal data|
| 20 | 21 | 22 |

Third table? The separator row is missing a `|`, so this one does not render as a table:

| Column 0 | Column 1 | Column 2 |
| :-:  :-: | :-: |
| 00 | 01 | 02 |

# Python as a Calculator
Numbers, Strings, Lists

## Numbers

In [None]:
2 + 2

In [None]:
50 - 5*6

In [None]:
(50 - 5*6) / 4

In [None]:
8 / 5  # division always returns a floating point number

The integer numbers (e.g. 2, 4, 20) have type `int`, the ones with a fractional part (e.g. 5.0, 1.6) have type `float`.
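
The distinction between `int` and `float` can be checked directly with `type()`. This short sketch also shows floor division (`//`) and the remainder operator (`%`), which are not used in the cells above; the variable names are only illustrative.

```python
# Division (/) always returns a float, even when the result is a whole number
print(type(2), type(5.0), type(8 / 4))

# Floor division (//) discards the fractional part; % gives the remainder
quotient = 17 // 3
remainder = 17 % 3
print(quotient, remainder)

# Mixing int and float produces a float
mixed = 4 * 3.75 - 1
print(mixed, type(mixed))
```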

In [None]:
5 ** 2  # 5 squared

In [None]:
2 ** 7  # 2 to the power of 7

The equal sign (=) is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt:

In [None]:
width = 20
height = 5 * 9
width * height

If a variable is not “defined” (assigned a value), trying to use it will give you an error:



In [None]:
n  # try to access an undefined variable

## Strings

Strings can be enclosed in single quotes ('...') or double quotes ("..."). 

\ can be used to escape quotes:

In [None]:
'spam eggs'  # single quotes

In [None]:
'doesn't'  # SyntaxError: the apostrophe ends the string early

In [None]:
'doesn\'t'  # use \' to escape the single quote...

In [None]:
"doesn't"  # ...or use double quotes instead

In [None]:
'"Yes," they said.'

In [None]:
"\"Yes,\" they said."

In [None]:
'"Isn\'t," they said.'


Strings can be concatenated (glued together) with the + operator, and repeated with *:

In [None]:
# 3 times 'un', followed by 'ium'
3 * 'un' + 'ium'

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

In [None]:
word = 'Python'

In [None]:
word[0]  # character in position 0, (1st element)

In [None]:
word[5]  # character in position 5 (6th element)

Indices may also be negative numbers, to start counting from the right:

In [None]:
word[-1]  # last character

In [None]:
word[-2]  # second-last character

In [None]:
word[-6]

Note that since -0 is the same as 0, negative indices start from -1.

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

In [None]:
word[0:2]  # characters from position 0 (included) to 2 (excluded)

In [None]:
word[2:5]  # characters from position 2 (included) to 5 (excluded)

In [None]:
word[:2]   # characters from the beginning to position 2 (excluded)

In [None]:
word[4:]   # characters from position 4 (included) to the end

In [None]:
word[-2:]  # characters from the second-last (included) to the end

Note how the start is always included, and the end always excluded. This makes sure that s[:i] + s[i:] is always equal to s:

In [None]:
word[:2] + word[2:]

In [None]:
word[:4] + word[4:]

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:

```python
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
```

The first row of numbers gives the position of the indices 0…6 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.
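
One consequence of the "edges" picture is that slice bounds outside the string are simply clipped, while an out-of-range index raises an error. A small sketch to try, with illustrative variable names:

```python
word = 'Python'

# Out-of-range slice bounds are handled gracefully
tail = word[4:42]     # stops at the end of the string
empty = word[42:]     # nothing there, so the result is an empty string
print(repr(tail), repr(empty))

# Indexing (not slicing) past the end raises an error instead
try:
    word[42]
except IndexError as err:
    print('IndexError:', err)

# len() gives the number of characters, i.e. the right-most edge in the diagram
print(len(word))
```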

## Lists
Python knows a number of compound data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

In [None]:
squares = [1, 4, 9, 16, 25]
squares

Like strings, lists can be indexed and sliced:

In [None]:
squares[0]  # indexing returns the item

In [None]:
squares[-1]

In [None]:
squares[-3:]  # slicing returns a new list

In [None]:
# Concatenation
squares + [36, 49, 64, 81, 100]

Unlike strings, which are immutable, lists are a mutable type, i.e. it is possible to change their content:

In [None]:
word[2] = 'g'  # TypeError: strings are immutable
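
Because a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. The sketch below contrasts that with the same assignment on a list; the names `new_word` and `letters` are only illustrative.

```python
word = 'Python'

# Assigning to a position in a string raises TypeError
try:
    word[0] = 'J'
except TypeError as err:
    print('TypeError:', err)

# Build a new string instead; the original is left untouched
new_word = 'J' + word[1:]
print(new_word, word)

# The same assignment works on a list, because lists are mutable
letters = list(word)
letters[0] = 'J'
print(''.join(letters))
```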

In [None]:
cubes = [1, 8, 27, 65, 125]  # something's wrong here
4 ** 3  # the cube of 4 is 64, not 65!

In [None]:
cubes[3] = 64  # replace the wrong value
cubes

You can also add new items at the end of the list by using the `append()` method:

In [None]:
cubes.append(216)  # add the cube of 6
cubes

In [None]:
cubes.append(7 ** 3)  # and the cube of 7
cubes

Assignment to slices is also possible, and this can even change the size of the list or clear it entirely:

In [None]:
# print() is used because a cell only displays its last expression
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(letters)

# replace some values
letters[2:5] = ['C', 'D', 'E']
print(letters)

# now remove them
letters[2:5] = []
print(letters)

# clear the list by replacing all the elements with an empty list
letters[:] = []
letters

It is possible to nest lists (create lists containing other lists), for example:



In [None]:
a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n]
x

In [None]:
x[0]

In [None]:
x[0][1]
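
A detail worth knowing about nested lists: the outer list stores references to the inner lists, not copies. The sketch below shows the effect and one way around it with `copy.deepcopy` from the standard library; the variable `y` is only illustrative.

```python
import copy

a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n]

# x holds references to a and n, so a change made through a is visible through x
a.append('d')
print(x[0])

# copy.deepcopy creates an independent nested copy
y = copy.deepcopy(x)
a.append('e')
print(x[0])   # follows a
print(y[0])   # unaffected by the second append
```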

# First Steps Towards Programming

In [None]:
# Fibonacci series:
# the sum of two elements defines the next
a, b = 0, 1
while a < 10:
    print(a)
    a, b = b, a+b

- The first line contains a multiple assignment.
- The while loop executes as long as the condition (here: a < 10) remains true.
- The body of the loop is indented: indentation is Python’s way of grouping statements.
- The print() function writes the value of the argument(s) it is given.

The keyword argument `end` can be used to avoid the newline after the output, or end the output with a different string:

In [None]:
a, b = 0, 1
while a < 1000:
    print(a, end=',')
    a, b = b, a+b

Avoid infinite loops!
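
A `while` loop never ends if its condition never becomes false, for example when the line that updates `a` is forgotten. One defensive pattern, sketched here with a hypothetical `max_steps` limit, is to count iterations and stop once the limit is reached. In a notebook, a runaway cell can also be stopped from the menu with Kernel > Interrupt.

```python
max_steps = 1000   # hypothetical safety limit, chosen only for illustration
steps = 0
a, b = 0, 1
values = []

while a < 10:
    values.append(a)
    a, b = b, a + b
    steps += 1
    if steps >= max_steps:
        print('Stopped: safety limit reached')
        break

print(values)
```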

## What's wrong here?

In [None]:
a, b = 0, 1
while a < 10
    print(a)
    a, b = b, a+b

In [None]:
a = 0
b = 1
while a < 10:
    print(a)
     a, b = b, a+b

## Control Flow Tools

### `if` Statements

In [None]:
x = int(input("Please enter an integer: "))

if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
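
The same chain can be tried without typing input by wrapping it in a function. This is a sketch, not the tutorial's own code: the function name `describe_integer` is hypothetical, and it returns a label for negative numbers instead of resetting `x` to zero.

```python
def describe_integer(x):
    """Return a label for x using an if/elif/else chain."""
    if x < 0:
        return 'Negative'
    elif x == 0:
        return 'Zero'
    elif x == 1:
        return 'Single'
    else:
        return 'More'


# Only the first branch whose condition is true runs
for value in [-5, 0, 1, 42]:
    print(value, describe_integer(value))
```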

### `for` Statements

In [None]:
# Measure some strings:
words = ['cat', 'window', 'defenestrate']
for w in words:
    print(w, len(w))

### The `range()` Function
If you do need to iterate over a sequence of numbers, the built-in function range() comes in handy. It generates arithmetic progressions:

In [None]:
for i in range(5):
    print(i)

In [None]:
list(range(5, 10))

In [None]:
list(range(0, 10, 3))

In [None]:
list(range(-10, -100, -30))

In [None]:
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print(i, a[i])
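
When both the position and the item are needed, the built-in `enumerate()` is usually more convenient than `range(len(a))`. A short sketch; the name `pairs` is only illustrative.

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']

# enumerate() yields (index, item) pairs directly
for i, word in enumerate(a):
    print(i, word)

# The pairs can also be collected into a list
pairs = list(enumerate(a))
print(pairs[:2])
```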

In [None]:
range(10)

In many ways the object returned by range() behaves as if it is a list, but in fact it isn’t. It is an object which returns the successive items of the desired sequence when you iterate over it, but it doesn’t really make the list, thus saving space.

In [None]:
sum(range(4))  # 0 + 1 + 2 + 3
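
The space saving can be seen by comparing a `range` object with the list built from it. This sketch uses `sys.getsizeof`, which reports the size of the container only; the exact byte counts depend on the Python build, so none are shown here.

```python
import sys

lazy = range(100_000)
materialized = list(lazy)

# The range object stores only start, stop and step
print(sys.getsizeof(lazy), 'bytes for the range object')
print(sys.getsizeof(materialized), 'bytes for the list (container only)')

# It still supports len(), indexing and membership tests
print(len(lazy), lazy[-1], 99_999 in lazy)
```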

# Defining Functions

In [None]:
def fib(n):    # write Fibonacci series up to n
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

# Now call the function we just defined:
fib(2000)

In [None]:
fib

In [None]:
f = fib
f(100)

In [None]:
def fib2(n):  # return Fibonacci series up to n
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)    # see below
        a, b = b, a+b
    return result

In [None]:
f100 = fib2(100)    # call it
f100
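
Functions can also take arguments with default values, and the docstring written under `def` is stored on the function itself. The sketch below is a variation on `fib2`; the name `fib_list` and the `start` parameter are assumptions made for illustration.

```python
def fib_list(n, start=(0, 1)):
    """Return Fibonacci-like numbers below n, beginning from the pair `start`."""
    result = []
    a, b = start
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


print(fib_list(50))                 # default starting pair (0, 1)
print(fib_list(50, start=(2, 1)))   # same rule, different starting pair
print(fib_list.__doc__)             # the docstring is available at run time
```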

Another tutorial for blob (or seed) detection

# Data Wrangling with Pandas
Source: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html

Pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

In [None]:
# Import packages
import numpy as np
import pandas as pd

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [None]:
dates = pd.date_range("20130101", periods=6)
dates

In [None]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

Creating a DataFrame by passing a dictionary of objects that can be converted into a series-like structure:

In [None]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

df2

In [None]:
df2.dtypes

In [None]:
# Use tab autocomplete
df2.duplicated

In [None]:
# df2.<TAB>  (type `df2.` and press Tab to list its attributes; not runnable as written)

## Viewing data


In [None]:
df.head()

In [None]:
df.tail(3)

In [None]:
df.index

In [None]:
df.columns

In [None]:
df.to_numpy()

In [None]:
df2.to_numpy()

In [None]:
df.describe()

In [None]:
df.T

In [None]:
df.sort_values(by="B")

## Selection

In [None]:
df["A"]

In [None]:
df[0:3]

In [None]:
df["20130102":"20130104"]

### Selection by label

In [None]:
df.loc[dates[0]]

In [None]:
df.loc[:, ["A", "B"]]

In [None]:
df.loc["20130102":"20130104", ["A", "B"]]

In [None]:
df.loc["20130102", ["A", "B"]]

In [None]:
df.loc[dates[0], "A"]

### Selection by position

In [None]:
df.iloc[3]

In [None]:
df.iloc[3:5, 0:2]

In [None]:
df.iloc[[1, 2, 4], [0, 2]]

In [None]:
df.iloc[1:3, :]

In [None]:
df.iloc[:, 1:3]

In [None]:
df.iloc[1, 1]
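
One difference between the two selection styles is easy to miss: label slices with `.loc` include both endpoints, while position slices with `.iloc` exclude the end, like Python lists. The sketch below uses a small stand-in frame named `demo` (a hypothetical name) so it does not overwrite `df`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label-based slicing includes BOTH endpoints
by_label = demo.loc["20130102":"20130104", "A":"B"]

# Position-based slicing excludes the end position
by_position = demo.iloc[1:4, 0:2]

# Both selections pick the same 3 rows and 2 columns
print(by_label.equals(by_position))
print(by_label.shape)
```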

### Boolean indexing

In [None]:
df[df["A"] > 0]

In [None]:
df[df > 0]

Using the `isin()` method for filtering:

In [None]:
df2 = df.copy()

In [None]:
df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2

In [None]:
df2[df2["E"].isin(["two", "four"])]

### Missing data

In [None]:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1

In [None]:
df1.loc[dates[0] : dates[1], "E"] = 1
df1

In [None]:
df1.dropna(how="any")

In [None]:
df1.fillna(value=5)

In [None]:
pd.isna(df1)

### Operations

In [None]:
df.mean()

In [None]:
df.mean(axis=1)  # mean of each row

## Merge

### Concat

In [None]:
df = pd.DataFrame(np.random.randn(10, 4))
df

In [None]:
pieces = [df[:3], df[3:7], df[7:]]
pieces

In [None]:
pd.concat(pieces)

### Join

In [None]:
left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
left

In [None]:
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
right

In [None]:
pd.merge(left, right, on="key")

In [None]:
left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
left

In [None]:
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
right

In [None]:
pd.merge(left, right, on="key")
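
The two merges above differ only in their keys, yet they return different numbers of rows. The sketch below repeats them side by side under new, illustrative names and adds the `validate` argument of `pd.merge`, which raises an error when the keys are not unique as expected.

```python
import pandas as pd

# Duplicate keys: every left "foo" pairs with every right "foo" (2 x 2 = 4 rows)
left_dup = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right_dup = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
many = pd.merge(left_dup, right_dup, on="key")

# Unique keys: each key matches exactly once (2 rows)
left_uniq = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right_uniq = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
one = pd.merge(left_uniq, right_uniq, on="key")

print(len(many), len(one))

# validate= checks the key relationship and raises MergeError if it does not hold
checked = pd.merge(left_uniq, right_uniq, on="key", validate="one_to_one")
print(checked)
```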

## Grouping

In [None]:
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": np.random.randn(8),
        "D": np.random.randn(8),
    }
)

df

In [None]:
df.groupby("A")[["C", "D"]].sum()  # sum only the numeric columns

In [None]:
df.groupby(["A", "B"]).sum()

## Plotting

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.close("all")

In [None]:
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts

In [None]:
ts = ts.cumsum()
ts

In [None]:
ts.plot();

In [None]:
plt.show();

In [None]:
df = pd.DataFrame(
    np.random.randn(1000, 4), index=ts.index, columns=["A", "B", "C", "D"]
)
df

In [None]:
df = df.cumsum()
df

In [None]:
plt.figure();
df.plot();
plt.legend(loc='best');

## Getting data in/out

Check current directory

In [None]:
%pwd

Write csv

In [None]:
df.to_csv("foo.csv")

Read csv

In [None]:
pd.read_csv("foo.csv")
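
Reading the file back this way adds an `Unnamed: 0` column, because `to_csv` also writes the index. The sketch below shows the effect and one way to avoid it with `index_col=0`; it writes to an in-memory buffer instead of a file, and the names `demo`, `with_extra`, and `restored` are only illustrative.

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])

# Write to an in-memory buffer so nothing is left on disk
buffer = io.StringIO()
demo.to_csv(buffer)

# Default read: the saved index comes back as an extra column
buffer.seek(0)
with_extra = pd.read_csv(buffer)

# index_col=0 restores the first column as the index
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(with_extra.columns))
print(list(restored.columns))
```

Another option is to skip the index when writing, with `demo.to_csv(buffer, index=False)`.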

Write Excel file

In [None]:
df.to_excel("foo.xlsx", sheet_name="Sheet1")

Read Excel file

In [None]:
pd.read_excel("foo.xlsx", "Sheet1", index_col=None, na_values=["NA"])

## More Pandas

In [None]:
df = pd.read_csv("foo.csv")
df

In [None]:
# Genotypes = ['G01'] * 3 + ['G02'] * 3 + ['G03'] * 3 + ['G04'] * 3
# Genotypes

In [None]:
Genotypes = ['G01'] * 250 + ['G02'] * 250 + ['G03'] * 250 + ['G04'] * 250
# Genotypes

In [None]:
df['Genotypes'] = Genotypes

In [None]:
df

In [None]:
df = df.rename(columns={"A": "Env01", "B": "Env02", "C": "Env03", "D": "Env04"}, errors="raise")

In [None]:
df = df.drop(['Unnamed: 0'], axis=1)

In [None]:
df.columns

In [None]:
df

In [None]:
df2 = pd.melt(df, id_vars=['Genotypes'], 
                value_vars=['Env01','Env02','Env03','Env04'],
                var_name='Environment', value_name='Phenotype')

In [None]:
df2
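
What `pd.melt` does is easier to see on a tiny table. In this sketch, the frame `wide` and its values are made up for illustration: each environment column becomes rows in a long table, and `pivot` reverses the operation.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "Genotypes": ["G01", "G02"],
        "Env01": [1.0, 2.0],
        "Env02": [3.0, 4.0],
    }
)

# Each (genotype, environment) pair becomes its own row
long = pd.melt(wide, id_vars=["Genotypes"], value_vars=["Env01", "Env02"],
               var_name="Environment", value_name="Phenotype")
print(long)

# pivot() goes back from the long format to the wide format
back = long.pivot(index="Genotypes", columns="Environment", values="Phenotype")
print(back)
```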

In [None]:
# Change a specific category
df2['Environment'] = np.where(df2['Environment']=='Env03','Greenhouse', df2['Environment'])

### Missing `ggplot2`???

In [None]:
from plotnine import ggplot, aes, geom_boxplot

In [None]:
ggplot(df2, aes(x='Environment', y='Phenotype', 
               fill='Genotypes')) + geom_boxplot()  

In [None]:
ggplot(df2, aes(x='Genotypes', y='Phenotype', 
               fill='Environment')) + geom_boxplot()  

# Image Processing in Python


Images are commonly represented as NumPy arrays, because an image is just a matrix of numbers. Almost all computations in scientific Python are done on NumPy arrays. \
Arrays can be n-dimensional. Gray images are usually 2D arrays. RGB images are usually 3D arrays.
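
To make the idea concrete, here is a minimal sketch that builds a tiny grayscale image and an RGB image by hand with NumPy. The sizes and pixel values are arbitrary choices for illustration.

```python
import numpy as np

# A tiny 4 x 6 grayscale image: one intensity value per pixel
gray = np.zeros((4, 6), dtype=np.uint8)
gray[1:3, 2:4] = 255          # a white 2 x 2 square in the middle

# The same size in RGB: a third axis holds the red, green and blue channels
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = gray            # copy the square into the red channel only

print(gray.shape, gray.ndim)
print(rgb.shape, rgb.ndim)
print(gray)
```

Either array can be displayed with `plt.imshow`, in the same way as the images loaded from `skimage.data` below.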


## Example using an edge detector (Sobel)

In [None]:
# Import specific functions from a package
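from skimage import data, filters
import matplotlib.pyplot as plt

In [None]:
image = data.coins()
# ... or any other NumPy array!
image

In [None]:
edges = filters.sobel(image)
# plt.imshow is used instead of io.imshow, which recent scikit-image releases deprecate
plt.imshow(edges, cmap='gray')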

## Example using thresholding

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

from skimage import data
from skimage.filters import threshold_otsu
from skimage.segmentation import clear_border
from skimage.measure import label, regionprops
from skimage.morphology import closing, square
from skimage.color import label2rgb

In [None]:
# load a subset of the image
image = data.coins()[50:-50, 50:-50]
image

In [None]:
# Apply threshold (Otsu)
thresh = threshold_otsu(image)
thresh

In [None]:
# Apply closing
bw = closing(image > thresh, square(3))
# plt.imshow(bw)
plt.imshow(bw, cmap='gray')

In [None]:
# Remove artifacts connected to image border
cleared = clear_border(bw)
plt.imshow(cleared)

In [None]:
# label image regions
label_image = label(cleared)
label_image
# plt.imshow(label_image)

In [None]:
# to make the background transparent, pass the value of `bg_label`,
# and leave `bg_color` as `None` and `kind` as `overlay`
image_label_overlay = label2rgb(label_image, image=image, bg_label=0)
plt.imshow(image_label_overlay)


## Region Properties
Requires a labeled image

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
ax.imshow(image_label_overlay)

for region in regionprops(label_image):
    # take regions with large enough areas
    if region.area >= 100:
        # draw rectangle around segmented coins
        minr, minc, maxr, maxc = region.bbox
        rect = mpatches.Rectangle((minc, minr), maxc - minc, maxr - minr,
                                  fill=False, edgecolor='red', linewidth=2)
        ax.add_patch(rect)

ax.set_axis_off()
plt.tight_layout()
plt.show()

In [None]:
Props = regionprops(label_image)
# rp

In [None]:
# Get data with list comprehensions
Labels = [rp.label for rp in Props]
Areas = [rp.area for rp in Props]

In [None]:
Labels

In [None]:
Image_df = pd.DataFrame()
Image_df['Label'] = Labels
Image_df['Area'] = Areas
Image_df
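
scikit-image also provides `regionprops_table`, which returns the requested properties as a dictionary of columns that pandas accepts directly. The sketch below runs on a small synthetic image so that it is self-contained; the names `binary`, `labeled`, and `blobs_df` are illustrative, and `binary` stands in for the `cleared` image above.

```python
import numpy as np
import pandas as pd
from skimage.measure import label, regionprops_table

# Synthetic binary image with two separate blobs
binary = np.zeros((8, 8), dtype=bool)
binary[1:3, 1:3] = True    # 2 x 2 blob, area 4
binary[5:8, 4:7] = True    # 3 x 3 blob, area 9

labeled = label(binary)

# One call collects the chosen properties for every region
table = regionprops_table(labeled, properties=("label", "area"))
blobs_df = pd.DataFrame(table)
print(blobs_df)
```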