# Basic Use of Python

> Justin Post (Some notes modified from Dr. Eric Chi)

---

In preparation for dealing with big data, we need to learn a programming language and choose a good coding environment. We'll learn `python` and code in Google Colab/JupyterLab.

We choose python due to its popularity and the ease of programming in `spark` through `pyspark`.

We use [`JupyterLab`](https://jupyter.org/) because it is widely used software for creating `python` notebooks. Google `Colab` is built on Jupyter notebooks!

---

## Getting Started

When you open a new notebook in `colab`, it will use python by default to run any 'code cells' (this can be changed in the 'notebook settings' under the View -> 'Notebook info' menu).

There are two types of cells:
- Code cells: allow you to submit code
- Text cells: allow you to write text using 'markdown' (we'll learn more about that shortly!)

Cells can be added with the buttons in the top left of the notebook (`+ Code` and `+ Text`). Below is a python code cell. A code cell can be run by pressing 'shift-enter' while in the cell.

In [None]:
#A comment - this text is not evaluated
5 + 6
10 * 2
5**2

25

- Only the value of the last line of code in a cell is displayed, unless you explicitly print the others. We'll do this much of the time with the `print()` function.

In [None]:
# % is mod (the remainder), // is floor division
print(10 / 3)
print(10 % 3)
print(10 // 3)

3.3333333333333335
1
3
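
To see how `//` and `%` fit together, here is a small sketch (the numbers and variable names are illustrative, not part of the original notes). The quotient and remainder always rebuild the original number, and the built-in `divmod()` returns both at once.

```python
# Floor division and mod are two halves of the same division
quotient = 17 // 5
remainder = 17 % 5
print(quotient, remainder)                 # 3 2
print(quotient * 5 + remainder == 17)      # True
print(divmod(17, 5))                       # (3, 2)
# With a negative number, // rounds down (toward negative infinity)
print(-7 // 2, -7 % 2)                     # -4 1
```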


- Operators with the same precedence are applied left to right, except for exponentiation, which is applied right to left

In [None]:
3 + 4 - 5

2

In [None]:
(3 + 4) - 5

2

In [None]:
3**2**4

43046721

In [None]:
#interpreted this way
3**(2**4)

43046721

In [None]:
#not this
(3**2)**4

6561
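
The grouping rules above can be checked with a few more expressions. This is a minimal sketch with made-up values; the comment on each line shows how python groups the expression.

```python
# Multiplication binds tighter than addition, so it is evaluated first
a = 2 + 3 * 4        # 2 + (3 * 4)
# Operators of equal precedence run left to right
b = 100 / 10 / 5     # (100 / 10) / 5
# Exponentiation is the exception: it groups right to left
c = 2 ** 3 ** 2      # 2 ** (3 ** 2)
# Exponentiation also binds tighter than a leading minus sign
d = -2 ** 2          # -(2 ** 2)
print(a, b, c, d)    # 14 2.0 512 -4
```

When in doubt, adding parentheses makes the intended order explicit.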

---

## Creating Variables

You can assign variables to reference an object using `=`

In [None]:
x = "Hello! "
y = 'How are you?'
print(x)
print(x + y)

Hello! 
Hello! How are you?


- Strings can be concatenated using the `+` operator. As with most programming languages, the backslash `\` is an escape character that gives the character after it a special meaning. For instance, `\n` is a line break. These appear differently depending on whether you print the string or just view the object.

In [None]:
x = "Hello! \n"
y = 'Then I asked, "How are you?"'
x

'Hello! \n'

In [None]:
print(x)

Hello! 



In [None]:
x + y

'Hello! \nThen I asked, "How are you?"'

In [None]:
print(x + y)

Hello! 
Then I asked, "How are you?"
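
The difference between viewing and printing a string can be reproduced with `repr()`, which returns the quoted form a notebook displays. The sketch below uses an illustrative string and a few other common escape sequences.

```python
greeting = "Line one\nLine two"
shown = repr(greeting)   # the quoted form a notebook displays when you view the object
print(shown)             # 'Line one\nLine two'
print(greeting)          # prints on two lines
print("a\tb")            # \t is a tab
print("C:\\temp")        # \\ is a single literal backslash: C:\temp
print(len("a\nb"))       # 3, since \n counts as one character
```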


- Variables can be used to simplify and generalize your code

In [None]:
degrees_celsius = 26.0
print(9 / 5 * degrees_celsius + 32)
degrees_celsius = 100
print(9 / 5 * degrees_celsius + 32)

78.80000000000001
212.0
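
As one way to generalize the conversion further, the sketch below loops over several temperatures instead of reassigning the variable by hand. The list of temperatures and the names `converted` and `degrees_fahrenheit` are illustrative. `round()` is used only to tidy the display of values like `78.80000000000001`, which come from floating point arithmetic.

```python
converted = []
for degrees_celsius in [0, 26.0, 100]:
    degrees_fahrenheit = 9 / 5 * degrees_celsius + 32
    # round to one decimal place for display
    converted.append(round(degrees_fahrenheit, 1))
    print(degrees_celsius, "C is", round(degrees_fahrenheit, 1), "F")
```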


## Object Types

There are a number of built-in objects you can create. Some important ones are listed below:

- Text Type: `str`

In [None]:
y = "text string"
type(y)

- Numeric Types: `int`, `float`

In [None]:
y = 10
print(type(y))
x = 10.4
print(type(x))

- Boolean Type: `bool`

In [None]:
y = True
type(y)

- Sequence Types: `list`, `tuple`

In [None]:
z = [10, "a", 11.5, True]
type(z)

- Mapping Type: `dict`

In [None]:
w = {"key1": "value1",
     "key2": ["value2", 10]}
type(w)
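
The cells above create a `list` but not a `tuple`. Here is a short sketch of a tuple, along with a few built-in conversions between types. The variable `point` and the values are illustrative only.

```python
point = (3, 4)                    # a tuple: like a list, but it cannot be modified
print(type(point))                # <class 'tuple'>
print(isinstance(point, tuple))   # True
# Built-in functions convert between types
print(int("42") + 1)              # 43
print(float(3))                   # 3.0
print(str(10) + "!")              # 10!
print(list(point))                # [3, 4]
```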

---

## Multiple Assignment

- Assigning multiple variables on one line is easy

In [None]:
x, y, z = "Orange", "Banana", "Cherry"
print(x)
print(y)
print(z)

In [None]:
x = y = z = "Orange"
print(x)
print(y)


Using `*` lets you 'pack' the remaining values into one object (a list). Placement of the `*` is important here!

In [None]:
x, *y = "Orange", "Banana", "Cherry"
print(x)
print(y)
type(y)

Orange
['Banana', 'Cherry']


list

In [None]:
*x, y = "Orange", "Banana", "Cherry"
print(x)
print(y)

['Orange', 'Banana']
Cherry
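
The `*` can also sit in the middle, and it can end up packing nothing at all. The sketch below uses an extra value (`"Kiwi"`) that is not in the original notes.

```python
first, *middle, last = "Orange", "Banana", "Cherry", "Kiwi"
print(first)    # Orange
print(middle)   # ['Banana', 'Cherry']
print(last)     # Kiwi

# If nothing is left over, the starred variable is an empty list
head, *rest = ["Orange"]
print(rest)     # []

# Multiple assignment also gives a one-line swap
a, b = 1, 2
a, b = b, a
print(a, b)     # 2 1
```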


---

## `_` Variable

When using python interactively (as with a JupyterLab notebook), the last evaluated expression is assigned to the variable `_`. This carries across code cells.

In [None]:
x, y, z = "Orange", "Banana", "Cherry"
x

'Orange'

In [None]:
_

'Orange'

In [None]:
x

'Orange'

In [None]:
#print doesn't count toward the _!
print(y)

Banana


In [None]:
_

'Orange'

In [None]:
y

'Banana'

In [None]:
_

'Banana'

We'll use this `_` variable when doing computations where we don't need to save things. For instance,

In [None]:
degrees_celsius = 100
(9 / 5) * degrees_celsius + 32

In [None]:
_ - 10

In [None]:
(9 / 5) * degrees_celsius + 32 - 10

In [None]:
_ * 10

In [None]:
sum_numbers = 0
#no need to create a variable for the index
for _ in range(1,101):
  sum_numbers += _
sum_numbers

5050
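
Outside of interactive use, `_` is also a common convention for a value you do not intend to use at all. The sketch below shows that convention with illustrative data. One caution, stated as an assumption about IPython-based notebooks: once you assign to `_` yourself, it may stop tracking the last output until you run `del _`.

```python
# By convention, _ marks a value we do not intend to use
name, _, price = ("apple", "fruit", 0.5)
print(name, price)   # apple 0.5

# Repeat an action three times without needing the loop counter
stars = ""
for _ in range(3):
    stars += "*"
print(stars)         # ***
```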

---

## Copying vs Referencing

Be *careful* when modifying elements of a compound object: 'assignment statements do not copy objects, they create bindings between a target (a variable name) and an object'!

If you come from R, this is very different behavior!

In [None]:
#Changing the original modifies both variables
x = [1, 2, 3, "Cats Rule!"] #create a 'list' of four values
y = x                       #Make y an alias for x (reference the same memory)
print(x, y)

[1, 2, 3, 'Cats Rule!'] [1, 2, 3, 'Cats Rule!']


In [None]:
x[3] = "Dogs rule!" #note that this modifies y!
print(x, y)

[1, 2, 3, 'Dogs rule!'] [1, 2, 3, 'Dogs rule!']


- If you want to avoid this behavior, you can create a copy of the object instead of a reference

In [None]:
#Can create a (shallow) copy of the object rather than point to the same object in memory
y = x.copy()
x[2] = 10
x[3] = "No cats rule!"
print(x, y)

[1, 2, 10, 'No cats rule!'] [1, 2, 3, 'Dogs rule!']
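
The word 'shallow' matters once a list contains another list. In the sketch below (illustrative data, using the standard library `copy` module), `.copy()` makes a new outer list but still shares the inner list, while `copy.deepcopy()` copies everything.

```python
import copy

nested = [1, 2, ["Cats", "Dogs"]]
shallow = nested.copy()          # new outer list, same inner list
deep = copy.deepcopy(nested)     # new outer list and new inner list

nested[0] = 100                  # changes only nested
nested[2][0] = "Birds"           # the inner list is shared with shallow
print(nested)    # [100, 2, ['Birds', 'Dogs']]
print(shallow)   # [1, 2, ['Birds', 'Dogs']]
print(deep)      # [1, 2, ['Cats', 'Dogs']]
# 'is' checks whether two names refer to the same object
print(nested is shallow, nested[2] is shallow[2])   # False True
```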


---

## Variable Names

Variable names can use letters, digits, and the underscore symbol (but cannot start with a digit).

Ok variable names:

- `X`, `species5618`, and `degrees_celsius`

Bad variable names:

- `777` (begins with a digit)
- `no-way!` (includes punctuation)
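
These rules can be checked in code with the string method `isidentifier()` and the standard library `keyword` module. This is a sketch rather than a complete validator. The name `"class"` is an extra example added here to show that reserved words are also not allowed.

```python
import keyword

# "class" is an added example: it follows the character rules but is a reserved word
candidates = ["X", "species5618", "degrees_celsius", "777", "no-way!", "class"]
results = {}
for name in candidates:
    # valid if it follows the character rules and is not a reserved word
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, results[name])
```

The first three names come back `True` and the last three `False`.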


---

## Augmented Assignment

Python has lots of shorthand notation!

- Quite often we want to take a value, add to it, and replace the old value

In [None]:
winnings = 100
winnings = winnings + 20
winnings

120

- 'Augmented assignment' gives a shorthand

In [None]:
winnings = 100
winnings += 20
winnings

120

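- This works for all the binary arithmetic operators (`-=`, `*=`, `/=`, `**=`, and so on). Negation takes only one value, so it has no augmented form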

In [None]:
winnings
winnings -= 30
winnings

In [None]:
winnings *= 40
winnings

In [None]:
winnings **= 1/2
winnings

### Augmented Assignment Execution

An augmented assignment is executed in the following way:

1. Evaluate the expression on the right of the `=` sign to produce a value

2. Apply the operator to the variable on the left and the value produced

3. Assign the result back to the variable on the left of the `=`.

This means the operator is applied _after_ the entire expression on the right is evaluated.

In [None]:
winnings = 100
winnings += 100*10
winnings
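
The order of evaluation matters most with operators other than `+`. The sketch below (illustrative values) contrasts `*=` with writing the expression out by hand, and then shows one consequence for lists: `+=` extends the existing list in place, so an alias sees the change.

```python
winnings = 100
winnings *= 2 + 3            # right side first: 100 * (2 + 3)
print(winnings)              # 500

other = 100
other = other * 2 + 3        # not the same thing: (100 * 2) + 3
print(other)                 # 203

# With lists, += modifies the existing object, so aliases see it
x = [1, 2]
y = x
x += [3]
print(y)                     # [1, 2, 3]
x = x + [4]                  # builds a new list and rebinds x; y is unchanged
print(x, y)                  # [1, 2, 3, 4] [1, 2, 3]
```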

---

## Continuing a Line of Code

- For long lines of code, we can break the code across multiple lines using `\` or by wrapping the code in `()`

In [None]:
10 + 20 - 100 * 60 \
/ 20

-270.0

In [None]:
(10 + 20 - 100 * 60
/20)

-270.0
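
Square brackets and curly braces allow line breaks in the same way as parentheses. A minimal sketch with illustrative values:

```python
# Parentheses let an expression span several lines
total = (10
         + 20
         - 100 * 60 / 20)

# Lists (and dicts) can be written one item per line
fruits = [
    "Orange",
    "Banana",
    "Cherry",
]

# Adjacent string literals inside parentheses are joined into one string
message = ("Hello! "
           "How are you?")

print(total, len(fruits), message)   # -270.0 3 Hello! How are you?
```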

---

## Functions & Methods

Two major ways to do an operation on a variable/object:

- Functions: `function_name(myvar, other_args)`
- Methods: `myvar.method(other_args)`

In [None]:
myList = [1, 10, 100, 1000]
#len function
len(myList)

4

In [None]:
#max function
max(myList)

1000

In [None]:
#pop method
myList.pop(3)

1000

In [None]:
myList

[1, 10, 100]

In [None]:
myList.append(100000)
myList

[1, 10, 100, 100000]
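
Some operations exist as both a function and a method, and the two can behave differently. In the sketch below (illustrative data), the `sorted()` function returns a new list, while the `.sort()` method changes the list in place and returns `None`.

```python
scores = [100, 1, 1000, 10]

ordered = sorted(scores)     # function: returns a new list, scores is unchanged
print(ordered, scores)       # [1, 10, 100, 1000] [100, 1, 1000, 10]

result = scores.sort()       # method: sorts in place and returns None
print(result, scores)        # None [1, 10, 100, 1000]

text = "hello"
print(text.upper(), text)    # HELLO hello (string methods return a new string)
```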

---

## Video Demo

This quick video shows how to open a new Google Colab notebook and run some basic python code. I recommend popping the video out into the Panopto player using the arrow icon in the bottom right.

In [7]:
from IPython.display import IFrame
IFrame(src = 'https://ncsu.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=bae161a8-bac0-4c44-a7a1-b0ef0163e90d&autoplay=false&offerviewer=true&showtitle=true&showbrand=true&captions=false&interactivity=all', width = '620', height = '380')

---

# Recap

- Create variables with `=`

- Many built-in data structures

- Python shorthands (multiple assignment, `_` variable, augmented assignment)

- Careful when *copying* a variable

- Functions and Methods