# Lecture 2: Up and Running with Python

## Jupyter Notebooks, expressions, and data types
<br>
<center><img src="./logos.png"></center>

# Programming languages help extract information from data

* Python is popular for data science and software development.
* Learn through practice!
* Learn just enough to use it, as you need it!
* Follow along at [datahub.ucsd.edu](https://datahub.ucsd.edu)
* See the documentation for references: [Jupyter Documentation](https://jupyter-notebook.readthedocs.io/)

# Notebooks mix text with code

* Perfect for experimenting with code
    - Annotate code experimentation for others
* Perfect for presentations about data
    - Annotate data analysis with explanations
* It's not perfect for everything!
    - Software development, big projects, compiled languages...

# Notebook cells have two types: 
* `markdown`
* `code`
### (DEMO)

# Header &#8604; look at my source to see how I'm made!

* list item 1
    - sublist item1
    - sublist item2
* list item 2

---

## Sub-Header

You can link to websites:

[Jupyter Documentation](https://jupyter-notebook.readthedocs.io/)

You can display images:

![this is an image](./codedoge.jpeg)

## You can also use math equations:

Embed math such as $e^{i\pi} + 1 = 0$ inline in text, or on its own line:

$$e^{i\pi} + 1 = 0$$

## You can also write tables easily:

|header 1|header 2|header 3|
|--------|--------|--------|
|value 1 |value 2 |value 3 |
|value 4 |value 5 |value 6 | 


# Getting started with Python: expressions

Write an expression in a "code cell" and either hit "Shift-Enter" or press the "Run" button to evaluate the code!

# REPL (read-eval-print loop)
* Type an **expression** into a code cell.
* The Python interpreter **evaluates** the expression.
* The notebook **displays the value** of the (last) expression in the cell.

In [1]:
print('Hello world!')

Hello world!


In [2]:
2 * 3

6

In [3]:
# this is a comment
1 + 2 # this code demonstrates addition

3

In [4]:
# two expressions in one cell: only the value of the last one is displayed
3
4

4

# Numbers and Arithmetic

![](./arithmetic_table.png)

## Python uses the standard order of operations

In [5]:
3*2**2

12

In [6]:
(3*2)**2

36
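
Two precedence cases often surprise newcomers. The sketch below is an illustrative addition, not part of the original demo: exponentiation is applied before a leading minus sign, and chained exponents group from right to left.

```python
# Exponentiation is applied before the unary minus.
neg_square = -2**2        # same as -(2**2)
paren_square = (-2)**2    # parentheses force the negation first

# Chained exponents group right to left: 2**(3**2), not (2**3)**2.
chained = 2**3**2

print(neg_square, paren_square, chained)  # -4 4 512
```

When in doubt, add parentheses; they cost nothing and make the intent explicit.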

# Assignment: names and variables

$$ \overbrace{\texttt{myvariable}}^{\text{name}} = \overbrace{\texttt{2 + 3}}^{\text{any expression}} $$

* Assignment statements like the one above don't have a value.
* An assignment statement changes the meaning of the name to the left of the `=` symbol.
* `myvariable` is bound to `5` (value) not `2 + 3` (expression).

In [7]:
more_than_1 = 2 + 3

In [8]:
more_than_1

5

In [9]:
more_than_1 * 2

10

### Aside: press ```tab``` to autocomplete a name that has already been assigned

In [10]:
more_than_1

5

### A name's value is set at the time of assignment

In [11]:
x = 2
y = 3 + x
y

5

In [12]:
x = 3

In [13]:
y

5

# Call Expressions
* Call expressions invoke functions
* Functions are called in Python just like in standard mathematics:
$$ y = f(x) $$

In [16]:
abs(- 12)

12

### Some functions can take a variable number of arguments

In [17]:
min(3, -4)

-4

In [18]:
max(2, -3, -6, 10, -4)

10

### Use ```?``` after a function name to see its documentation
* or use the `help` function.

In [19]:
# round
my_number = 1.22
round(my_number)

1

In [20]:
round?

In [21]:
round(1.22222, 3)

1.222
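
The `?` syntax is a notebook convenience and is not plain Python. As a small sketch of the alternative, the documentation text is also stored on the function itself, and `help` prints it in any Python session:

```python
# help(round) prints the documentation in any Python interpreter.
# The same text is available as a string on the function:
doc = round.__doc__
print(doc)
```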

### What functions are available for use? [builtin functions](https://docs.python.org/3/library/functions.html)

<img src="./python_builtins.png"  width="50%" align="middle"/>

In [22]:
print?

## Import functions from Python modules
* Modules are, roughly, collections of Python functions.
* Access these functions via an *import statement*.
* Call the functions using `module.function()` syntax.

### Import the `math` module and look around
* sqrt, log, etc...

In [24]:
import math

In [25]:
math.sqrt(9)

3.0

In [26]:
# what base is log?
math.log?
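
To answer the question in the comment: with one argument, `math.log` is the natural logarithm (base $e$). A short sketch of the common variants follows; the variable names are only for illustration.

```python
import math

natural = math.log(math.e)     # one argument: natural logarithm (base e)
base_2 = math.log(8, 2)        # optional second argument sets the base
base_10 = math.log10(1000)     # dedicated base-10 function

print(natural, base_2, base_10)  # approximately 1, 3, and 3
```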

In [None]:
# tab completion for browsing
math.

# Data Types
* Every value in Python has a type (use the `type` function!)
* All data analyzed in this class is stored as a Python data type.
* Understanding the data often requires understanding how the data was stored.
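
As a quick illustration of the `type` function mentioned above, here is a minimal sketch applied to three values that look similar but are stored differently (the variable names are hypothetical):

```python
whole = type(2)        # an integer
decimal = type(2.0)    # a floating-point number
text = type('2')       # a string of text

print(whole)     # <class 'int'>
print(decimal)   # <class 'float'>
print(text)      # <class 'str'>
```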

## Type inference: #MachineFail
<br>
<center><img src="./type_inference_1.jpg"  width="300"/></center>

## Type inference: #MachineFail
<br>
<center><img src="./type_inference_2.png"  width="700"/></center>

# Two data types: ```float``` and ```int```
* ```int``` : an integer of any size
* ```float```: a number with an optional fractional part

### ```int```
* integer arithmetic: `+`, `-`, `*`, `**`

In [27]:
2 + 3

5

In [28]:
2**5

32

In [29]:
2**4000

1318204093430943100103889794236591363184019161093272769092803450241756928112834455107975212317212203314094075648071682303844681769424058128173106245251218403854467444438688895632897064277199393003658655292424951448883218338941583237562000928492260894611103857875407791326544091858312558605043164728460363649082385000782681167246890021068910448808948534719215270882011976500612594485839776187466930127874523350479658699451405443521705380373270324028340081592616934836479947271609457689400724316866256888660306583248683060612501764335646973240725287456721773369482423667532334175568183922195469382045607202025388437122682684485863619421287513956658744539006801474797581397174811477043924882668866712923795412855584187446066572963049265860017933827257911002088122876736120060347897312016889399757435372765399896922309279825570166606797269890623692162876477283791552608646438916157053461695670374484050297527909408758729896842351653162609089838935144902005685122107904896671887894330923207197857563987720

### ```float```
* a float is specified using a decimal point
* a float might be printed using scientific notation

In [30]:
2.0 + 3.2

5.2

In [31]:
3.0**400

7.055079108655333e+190

### ```float```
* floats have limited size (but the limit is huge)
* floats have limited precision: about 15-16 significant decimal digits
* after arithmetic, the final few decimal places can be wrong (limited precision!)

In [32]:
3.0*4.2

12.600000000000001
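
Because of this rounding noise, comparing floats with `==` is fragile. One common workaround, sketched here as a suggestion rather than as part of the original demo, is to compare within a tolerance using `math.isclose`, or to round before displaying:

```python
import math

product = 3.0 * 4.2
print(product == 12.6)               # False: the last digit is off
print(math.isclose(product, 12.6))   # True: equal within a small tolerance
print(round(product, 2))             # 12.6
```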

In [33]:
3.0**4000

OverflowError: (34, 'Result too large')

## Type coercion between ```int``` and ```float```
* by default, Python converts an int to a float in a mixed expression
* a value can be explicitly coerced using the ```int``` and ```float``` functions
* division of two integers automatically returns a float value

In [34]:
2.0 + 3

5.0

In [35]:
2/1

2.0

In [36]:
# want an integer back
int(2/1)

2

In [37]:
# int truncates a float toward zero (it drops the fractional part)
int(3.9)

3
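
The difference between truncating, rounding down, and rounding to the nearest integer only shows up clearly with negative numbers. A minimal sketch, using `math.floor` from the standard library for comparison:

```python
import math

truncated_pos = int(3.9)       # 3
truncated_neg = int(-3.9)      # -3: truncation moves toward zero, not down
floored_neg = math.floor(-3.9) # -4: floor always rounds down
nearest = round(3.9)           # 4: round goes to the nearest integer

print(truncated_pos, truncated_neg, floored_neg, nearest)  # 3 -3 -4 4
```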

### Be careful converting between ```int``` and ```float```

In [38]:
2.51 * 100

250.99999999999997

In [39]:
int(2.51 * 100)

250
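
One way to avoid this off-by-one result, sketched here under the assumption that the intended value is the nearest whole number, is to call `round` instead of `int`:

```python
cents = 2.51 * 100            # stored as 250.99999999999997

truncated = int(cents)        # drops the fractional part
rounded = round(cents)        # rounds to the nearest integer instead

print(truncated, rounded)     # 250 251
```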

### The consequences of `float` to `int` conversion error

The Ariane 5 rocket exploded shortly after launch in 1996 due to an error converting a floating-point number to an integer:
[see story here](https://itsfoss.com/a-floating-point-error-that-caused-a-damage-worth-half-a-billion/)

<center><img src="ariane.jpg" width="400"/></center>

# Text, Strings, and Types

## A string value is a snippet of text of any length
* enclose a string in either single or double quotes

In [40]:
'a'

'a'

In [41]:
"word"

'word'

In [42]:
"here is a full sentence. Here is another sentence."

'here is a full sentence. Here is another sentence.'

In [43]:
"12.0"

'12.0'

### String arithmetic

In [44]:
s1 = 'hello'
s2 = 'world'

In [45]:
s1 + s2

'helloworld'

In [46]:
s1 + ' ' + s2

'hello world'

In [47]:
s1*3

'hellohellohello'

### string methods
* Strings are associated with certain functions called *string methods*.
* Access string methods with a `.` after the string.
* e.g. `.upper()`, `.replace()`,...

In [48]:
my_cool_string = 'data science is super cool!'

In [51]:
my_cool_string = my_cool_string.upper()

In [52]:
my_cool_string.replace('super', 'super-duper')

'DATA SCIENCE IS SUPER COOL!'
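
The output above is unchanged because `.replace()` is case-sensitive: after `.upper()`, the lowercase text `'super'` no longer appears in the string. The sketch below (with hypothetical variable names) also shows that string methods return a new string and leave the original untouched:

```python
original = 'data science is super cool!'

shouted = original.upper()                           # returns a new string
no_match = shouted.replace('super', 'super-duper')   # lowercase 'super' is not found
matched = shouted.replace('SUPER', 'SUPER-DUPER')    # matching case is found

print(original)   # data science is super cool!
print(no_match)   # DATA SCIENCE IS SUPER COOL!
print(matched)    # DATA SCIENCE IS SUPER-DUPER COOL!
```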

### Special characters in strings
* apostrophes, quotes, newlines, etc...

In [53]:
'my string's full of apostrophes!'

SyntaxError: invalid syntax (<ipython-input-53-50d82291aa71>, line 1)

In [54]:
"my string's full of apostrophes!"

"my string's full of apostrophes!"

In [None]:
# escape the apostrophe with a backslash!
'my string\'s "full" of apostrophes!'

In [55]:
print('my string\'s "full" of apostrophes!')

my string's "full" of apostrophes!


## Digression: ```print()```
* By default, a Jupyter notebook displays the "raw" value of the expression on the last line of a cell.
* The ```print``` function displays a value as human-readable text when it is evaluated.

In [56]:
12 # 12 won't be displayed
23

23

In [57]:
print(12)
print(23)

12
23


In [58]:
my_newline_str = 'here is a string with two lines.\nhere is the second line'  # '\n' inserts a new line
my_newline_str

'here is a string with two lines.\nhere is the second line'

In [59]:
print(my_newline_str)  # notice the quotes disappear!

here is a string with two lines.
here is the second line


# Type conversion to and from strings

### Type conversion causes messy data!

Genomics data (string-to-date):
> geneticists use MARCH1 as shorthand for membrane associated ring-CH-type finger 1. But Excel interprets MARCH1 as a date, automatically converting it to 1-Mar or another designation for the first of March.


### Type conversion causes messy data!

Genomics data (string-to-float):

> Excel misidentifies some other gene names as coordinates or floating points. You might be able to suss out that 1-Mar is actually MARCH1, but how about 2.31E+13? That's how Excel converts the RIKEN identifier 2310009E13.

### Type conversion causes messy data!

Personal data (string-to-int/float):

|name|address|city|state|zip|phone|
|----|-------|----|-----|---|-----|
|john|123 state street|Bangor|ME|4402|2074239878.0|
|`str`|`str`|`str`|`str`|`int`|`float`|
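
In the table, the ZIP code lost its leading zero when stored as an `int`, and the phone number gained a `.0` when stored as a `float`. A minimal repair sketch follows; it assumes ZIP codes are always five digits, and the variable names are hypothetical.

```python
zip_as_int = 4402               # leading zero was lost when read as an int
phone_as_float = 2074239878.0   # phone number was read as a float

zip_code = str(zip_as_int).zfill(5)   # pad with zeros to five characters
phone = str(int(phone_as_float))      # drop the spurious '.0'

print(zip_code, phone)   # 04402 2074239878
```

Storing identifiers like these as strings from the start avoids the problem entirely.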

## Type conversion to and from strings
* Any value can be converted to a string using ```str```
* Strings can be converted to ```int``` and ```float``` when possible

In [60]:
str(3)

'3'

In [61]:
float('3')

3.0

In [62]:
int('4')

4
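
"When possible" matters: a string that does not look like a valid number raises a `ValueError`. The sketch below shows one failing case and a two-step workaround; the variable names are only for illustration.

```python
as_float = float('3.5')    # '3.5' is a valid float
as_int = int(as_float)     # convert to float first, then truncate to 3

try:
    int('3.5')             # '3.5' is not a valid integer string
    failed = False
except ValueError as err:
    failed = True
    print('conversion failed:', err)
```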

# Sequences: Arrays and Ranges

# Lists

- We can make a sequence of items with a Python list
- Can hold `int`s, `float`s, `str`s, etc.

In [None]:
[1, 2, 3, 4, 5]

# Arrays

* But lists are slow for numerical work on large amounts of data.
* Solution: use NumPy *arrays*
* To make an array, pass a list to the `np.array` function:


In [None]:
import numpy as np               # 'numerical python library' for working with arrays
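
A minimal sketch of creating arrays from lists. Unlike a list, an array stores a single data type, so mixing ints and floats coerces everything to float (the names `ints` and `mixed` are hypothetical):

```python
import numpy as np

ints = np.array([1, 2, 3])        # list of ints -> integer array
mixed = np.array([1, 2.5, 3])     # ints are coerced to float

print(type(ints))     # <class 'numpy.ndarray'>
print(mixed.dtype)    # float64
```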

# Array Arithmetic

- Adding, subtracting, multiplying, or dividing an array by a number *broadcasts* the operation to every element.

In [None]:
sd_temps_1990 = np.array([62, 63, 64, 64, 67, 68, 70, 70, 72, 69, 68, 67])
sd_temps_1990

### use ```[ ]``` or ```.item()``` to access an array element by index
* Warning: array indices start with zero!

In [None]:
sd_temps_1990[0]

In [None]:
sd_temps_1990.item(0)

## Arrays make working with data easy
* add them, subtract them, multiply, divide, exponentiate
* arrays have to be the same size!

In [None]:
a1 = np.array([1,2,3])
a2 = np.array([3,2,1])

In [None]:
a1

In [None]:
a2

In [None]:
a1 + a2

In [None]:
a1 - a2

In [None]:
a1 * a2

In [None]:
a1/a2

In [None]:
a1**a2
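
To see why the sizes have to match, here is a sketch of what happens when two arrays of different lengths are combined. The array `short` is a hypothetical example, not part of the original demo.

```python
import numpy as np

a1 = np.array([1, 2, 3])
short = np.array([1, 2])   # hypothetical array with a different length

try:
    a1 + short             # lengths 3 and 2 cannot be paired up elementwise
    mismatch_error = False
except ValueError as err:
    mismatch_error = True
    print('cannot combine:', err)
```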

## Arrays for basic statistics: newborn birth weight

In [None]:
# four girls with weight in kg: g1 = 3.405, g2 = 3.207, g3 = 2.42, g4 = 3.984

g1 = 3.405 
g2 = 3.207
g3 = 2.42
g4 = 3.984

# average weight of a newborn girl (in kg): 3.3
girl_av_weight = 3.3

### Load the weights into an array of floats

In [None]:
weights_kg_g = np.array([g1, g2, g3, g4]) 

weights_kg_g

### Calculate the deviation of weights from the average weight
* Subtracting a number from an array subtracts the number from each element.

In [None]:
weights_kg_g - girl_av_weight

### Convert the weights to pounds (about 2.2 lb/kg)

In [None]:
weights_lbs_g = weights_kg_g * 2.2
weights_lbs_g

### How many girls are recorded in the array?

In [None]:
len(weights_lbs_g)

## Arrays for basic statistics: daily temperatures

### Below is an array of daily high temperatures in San Diego from August 2018

In [None]:
temps = np.array([86, 85, 85, 84, 85, 86, 91, 89, 90, 88, 88, 85, 83, 82, 79, 81, 82,
                   83, 82, 79, 81, 83, 83, 79, 80, 80, 79, 80, 82, 82, 80])

Number of days in August for which temperatures were collected:

In [None]:
temps.size

### temperature statistics (mean, min, max)

In [None]:
temps.sum() / temps.size  # use sum and size

In [None]:
temps.mean() # use the array's built-in mean method

In [None]:
min(temps), max(temps) # Python's built-in functions also work on arrays

In [None]:
temps.min(), temps.max() # the array has its own min/max methods (faster)

### Sort the temperatures / calculate differences

In [None]:
np.sort(temps)

In [None]:
np.diff(temps)

# Ranges
* A range is an array of evenly spaced numbers (consecutive integers by default)
* ```np.arange(end)```: An array of increasing integers from 0 up to end
* ```np.arange(start, end)```: An array of increasing integers from start up to end
* ```np.arange(start, end, step)```: A range with step between consecutive values
* The range always includes start but excludes end (i.e. a half-open interval)

In [None]:
np.arange(5)

In [None]:
np.arange(3, 9)

In [None]:
np.arange(3, 30, 5)

In [None]:
np.arange(-3, 2, 0.5)

In [None]:
np.arange(1, -3)

In [None]:
np.arange(1, -3, -1)
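
For reference, the sketch below shows what several of these calls produce, converted to plain lists with `.tolist()` for easy reading. Note how the end value is always excluded, and how a range that counts down needs a negative step.

```python
import numpy as np

print(np.arange(5).tolist())           # [0, 1, 2, 3, 4]
print(np.arange(3, 9).tolist())        # [3, 4, 5, 6, 7, 8]
print(np.arange(3, 30, 5).tolist())    # [3, 8, 13, 18, 23, 28]
print(np.arange(1, -3).tolist())       # []: the default step of 1 never reaches -3
print(np.arange(1, -3, -1).tolist())   # [1, 0, -1, -2]
```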

### Discussion Question

Assume you have run the following statements:||Choose the expression that will cause an error:
---|---|---
`x = np.array([2,3,4])`||`A.                    x + y`
`y = np.arange(2,3,4)` ||`B.                    x + z`
`z = np.arange(3)`     ||`C.    x.item(0) + y.item(0)`
                       ||`D.    x.item(1) + y.item(1)`
