# STA 141B Data & Web Technologies for Data Analysis

### Lecture 1, 9/28/23, Basics of Python

### Today's topics

- Course Organization
- Basics of Python

### Course Organization

This course covers topics of data acquisition and processing. 
We will learn how to automatically retrieve information from publicly available sources on the internet. 
This includes processing these data so that they can then be studied statistically. 

The course consists of three parts: 

1. Introduction to Python
2. Data acquisition
3. Visualisation

The final grade is determined by 
- six homeworks (40%),
- one exam on the basics of Python on October 19th (each 20%),
- project due December 15th (35%).

For comprehensive and updated information about the course, please consult [Canvas](https://canvas.ucdavis.edu/courses/823714).  

The project will be collaborative work in groups of two to three members. You will use the methods learned in this class to procure a data set, preferably from multiple sources, and process it to make it accessible for further investigation. This involves displaying its properties by visual means, so that statistical hypotheses can be formed. 

The groups have to be formed by October 20th. A project proposal is due November 4th. 

All material of this course will be made available on [GitHub](https://github.com/kramlinger/sta141b). Use [Piazza](https://piazza.com/class/lmwfo497gbh5xs) for any inquiries regarding organization, homework or lectures. We will monitor this site M-F during business hours. Please do not write emails! Screen recordings and all other class-related administrative information will be made available on [Canvas](https://canvas.ucdavis.edu/courses/823714). 

Office hours: 
* Peter Kramlinger: R 10:45-11:45 AM, MSB 1143
* Xiangbo Mo: T.B.A.
* Jingwei Xiong: F 1-2 PM, [Zoom](https://canvas.ucdavis.edu/courses/823714/external_tools/8022)

#### Ethics

This is a programming class. Using assistance is part of programming and is encouraged. This can be AI-based, or from online sources (e.g., [stackoverflow](https://stackoverflow.com/questions)). 

However, you will be graded on your proficiency in coding. In all assignments, make sure that you display your own contribution. Submitting AI-generated code, answers from online sources, or even classmates' solutions will not be enough to pass the course. Furthermore, if you pass off someone else's work as your own, then you are engaging in academic misconduct. 


### Basics of Python

For this course, we will use Python to retrieve data. Due to its simplicity, it is one of the most popular programming languages. Today and next Tuesday we will introduce and review some basic aspects. 

In [None]:
import this

#### Arithmetic operations

Python is a fancy calculator that allows for all basic arithmetic operations. The first notable difference to R is that not every result is printed: a notebook cell displays only the value of its last expression. 

In [None]:
12 + 4
20 - 4 
2 * 8
32 / 2
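
To see all four results rather than only the last one, each value can be printed explicitly. The following is a minimal sketch; the names `results` and `value` are introduced here for illustration only. It also shows that `/` returns a <kbd>float</kbd> even when the result is a whole number.

```python
# Each expression is evaluated, but a notebook only displays the last one.
# Printing every value makes all four results visible.
results = [12 + 4, 20 - 4, 2 * 8, 32 / 2]
for value in results:
    print(value, type(value).__name__)
```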

In [None]:
4 ** 2 # exponentiation

In [None]:
4 ^ 2 # ^ is bitwise XOR in Python, not exponentiation; we won't use it here
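
The difference between `**` and `^` is easy to overlook when coming from R. A short sketch, with the illustrative names `power` and `xor`, puts the two side by side:

```python
power = 4 ** 2   # exponentiation: 4 * 4
xor = 4 ^ 2      # bitwise XOR: 0b100 ^ 0b010 == 0b110
print(power, xor, bin(xor))
```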

In [None]:
33 % 17 # modulus

In [None]:
(1 + 12 / 4) ** (3 + 2) / 8 ** 2

The assignment operator in Python is `=`. 

In [None]:
x = 4 

In [None]:
x

Python is all about cleanliness! Augmented assignment operators, which combine an arithmetic operation with an assignment, are very useful for avoiding redundant code. 

In [None]:
x += 3
x

In [None]:
x -= 2
x

In [None]:
x *= x + 1
x

In [None]:
x /= 6
x **= 2
x %= 5
x
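
The cells above can be traced in one piece. The sketch below repeats the same operations starting from `x = 4` and notes the long form and the intermediate value of each step in a comment.

```python
x = 4
x += 3       # same as x = x + 3        -> 7
x -= 2       # same as x = x - 2        -> 5
x *= x + 1   # same as x = x * (x + 1)  -> 30
x /= 6       # same as x = x / 6        -> 5.0 (division returns a float)
x **= 2      # same as x = x ** 2       -> 25.0
x %= 5       # same as x = x % 5        -> 0.0
print(x)
```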

Comparison and Boolean operators are: 

In [None]:
4 == 5

In [None]:
4 != 5

In [None]:
4 >= 5

In [None]:
4 >= 5 and True

In [None]:
4 >= 5 or True

In [None]:
4 <= 5 or not True

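Parentheses are important for nesting statements. Without them, comparisons are chained: `4 <= 5 == True` is evaluated as `4 <= 5 and 5 == True`. 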

In [None]:
4 <= 5 == True

In [None]:
(4 <= 5) == True 

In [None]:
(4 <= 5) is True
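
The following sketch spells out how Python reads the comparison without parentheses. The variable names are chosen here for illustration.

```python
chained = 4 <= 5 == True                 # read as: (4 <= 5) and (5 == True)
spelled_out = (4 <= 5) and (5 == True)   # the same thing, written out
grouped = (4 <= 5) == True               # compares the result True with True
print(chained, spelled_out, grouped)
```

Since `5 == True` is `False`, the chained version is `False`, while the grouped version is `True`.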

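The distinction between `==` and `is` is not merely semantic. `==` tests whether two values are equal, whereas `is` tests whether two names refer to the same object: 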

In [None]:
x = 3
y = 3.0
x == y # equality

In [None]:
x is y # identical
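
The same distinction holds for objects of the same type. As a small sketch, using lists (introduced later in this lecture) and the illustrative names `a`, `b` and `c`:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with equal contents
c = a           # a second name for the very same list

print(a == b)   # equal values
print(a is b)   # but two distinct objects
print(a is c)   # the same object under two names
```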

#### Syntax

Python code is user friendly and much cleaner than, e.g., R. We want to keep it that way. Therefore, let's adhere to some principles: 

Instead of curly brackets, Python uses indentation to delimit blocks of code. The `if` clause below enters the indented chunk if the condition is `True`. 

In [None]:
if 4 > 0: 
    print('Four is strictly greater than zero.')
else: 
    print('Four is not strictly greater than zero.')
print('Indeed.')

We will learn more about loops and `if` statements next lecture. Indentation works as long as at least one space is used. However, it is advised to use four spaces per level. In any case, be consistent throughout your code. 

In [None]:
k = 0
for i in range(1,100):
    k = k + i
    print(k)

Indentation really matters in Python!

In [None]:
k = 0
for i in range(1, 100): # 1, 2, 3, ..., 99
    k = k + i
print(k)
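
To make the effect of indentation visible in a single cell, the sketch below uses a shorter range and stores the values instead of printing them. The names `running_totals` and `final_total` are introduced here for illustration.

```python
running_totals = []
k = 0
for i in range(1, 5):            # 1, 2, 3, 4
    k = k + i
    running_totals.append(k)     # indented: executed in every iteration
final_total = k                  # not indented: executed once, after the loop

print(running_totals)
print(final_total)
```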

Indentation is necessary wherever other languages would put curly brackets...

In [None]:
def square_it(x):
    return x ** 2

In [None]:
square_it(123)

While statements can be separated using `;`, it's better to put every statement on a new line. Screen space doesn't cost anything!

In [None]:
y = 2; x = y + 1; print(y % x); (y ** 2) == 2 # legal, but poor style!

If you don't want to break down lengthy formulae into smaller steps, use parentheses to spread them over several lines: 

In [None]:
x = (2 # imagine a full line of code...
     + 3 # ...and a second one... 
     - 4) # .. until you're done

Keep the operators at the beginning of the line to increase readability. 

Variable names must not contain `.`, as the dot accesses an attribute of the considered object. 

In [None]:
x.y = 3

In [None]:
x_y = 3

Cleanliness and convention are very important in Python. Make sure to follow the [PEP 8 style guide](https://peps.python.org/pep-0008/) in your code. 

#### Types

So far we have used <kbd>int</kbd>, <kbd>float</kbd> and <kbd>bool</kbd>. We will use the following basic types: 
- Numeric: <kbd>int</kbd>, <kbd>float</kbd>, <kbd>complex</kbd>
- Boolean: <kbd>bool</kbd>
- String: <kbd>str</kbd>
- Sequence: <kbd>list</kbd>, <kbd>tuple</kbd>, <kbd>range</kbd>
- Mapping: <kbd>dict</kbd>
- Set: <kbd>set</kbd>

We can check the type by calling `type`. 

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type(1 + 1j) # not i, but j! 

In [None]:
type(True) 

In [None]:
type('Hello World!')

Values can be cast to another type using the type constructors.  

In [None]:
x = 1
type(x)

In [None]:
id(x)

In [None]:
x = float(x)
type(x)

In [None]:
id(x)

Although recasting makes things easy to work with, always be aware of all the work you require your interpreter to do! As the different `id` values show, the cast creates a new object. 

In [None]:
x = bool(x)
type(x)

In [None]:
x 

In [None]:
bool(-1)

In [None]:
bool(0)

In [None]:
bool(1)

In [None]:
bool(0.1)
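
The pattern behind these results is that zero converts to `False` and every other number converts to `True`. The sketch below collects the cases in one place. The two string values are an addition for comparison and are not part of the cells above; only the empty string converts to `False`.

```python
values = [-1, 0, 1, 0.1, 0.0, "", "False"]
truthiness = [bool(v) for v in values]
for v, t in zip(values, truthiness):
    print(repr(v), "->", t)
```

Note that `bool("False")` is `True`, because the string is not empty.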

In [None]:
x = str(x)
x

In [None]:
x + ' or False?'

We can force printing by using `print`. 

In [None]:
x = bool(x)
print(x)
complex(x)

You can check what a function does by running `help`. 

In [None]:
help(print)

We are using Python because it is the industry standard, not because it is superior to, e.g., R. It is the industry standard because it is so easy and simple. We should keep it that way. 

__Adhere to the principles of proper programming!__

- K.I.S.S. (Keep It Simple, Stupid): Functions should perform one task, and one task only. 
- Rule of Three (avoid code duplication): Duplication is a bad programming habit because it makes code harder to maintain. 
- Clarity before Efficiency: Never sacrifice clarity for some perceived efficiency. Donald Knuth: "Premature optimization is the root of all evil."
- Naming: Stick to consistency and conventions. 

#### Sequence

##### Range

We have already used a <kbd>range</kbd> object in `for` loops. The function `range(start, stop, step)` creates a <kbd>range</kbd> type object. Note that it starts at `start` and stops before reaching `stop`; with the default step of 1, the last value is `stop - 1`.

In [None]:
x = range(1, 10) 
len(x)

In [None]:
x = range(1, 10)
for n in x:
    print(n)

In [None]:
type(x)
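
The roles of `start`, `stop` and `step` are easiest to see by converting a few ranges to lists. This is a small sketch with illustrative values and names:

```python
default_step = list(range(1, 10))      # the stop value 10 is excluded
every_third = list(range(1, 10, 3))    # step 3
counting_down = list(range(10, 0, -2)) # a negative step counts down

print(default_step)
print(every_third)
print(counting_down)
```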

`range(0, 100)` creates an iterable object and should be used in, e.g., `for` loops to iterate over. It does not instantiate a vector of length $100$, as that would take up too much space. 

In [None]:
x = range(0, 100**100)

In [None]:
import sys
sys.getsizeof(x)

In [None]:
sys.getsizeof(100**100) # just the largest value of that range...
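
As a rough sketch of the same point, the size of a <kbd>range</kbd> object can be compared with the size of the list it would produce. The exact byte counts depend on the Python version and platform, so none are stated here; the names `small`, `large` and `as_list` are illustrative.

```python
import sys

small = range(0, 10)
large = range(0, 10**6)
as_list = list(large)   # materializes one million entries

# A range stores only start, stop and step, so both take the same space.
print(sys.getsizeof(small), sys.getsizeof(large))
# The list has to hold a reference to every element.
print(sys.getsizeof(as_list))
```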

##### Tuple

A <kbd>tuple</kbd> is an ordered collection of values. Think of coordinates. A <kbd>tuple</kbd> is immutable, which means it can't be changed after it is created.

In [None]:
x = (1, 3.0, "horse") # parentheses are optional, but should be used for clarity 
x

In [None]:
type(x)

There are three ways to get elements from a tuple:
- Indexing with `[]` 
- Slicing with `[a:b]`: a slice `a:b` gets the elements from position `a` to position `b - 1`
- Unpacking: assign to a same-shape tuple of variables on the left-hand side

Note that objects in Python are indexed starting with zero!

In [None]:
x[0]

In [None]:
type(x[2])

In [None]:
x

In [None]:
x[3]

In [None]:
x[0:2]

In [None]:
x[0:1] 

In [None]:
type(x[:1]) # leaving the a-position blank assumes the first entry at position zero
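
The difference between indexing and slicing can be summarized in a short sketch. It redefines the tuple `x` from above so that it runs on its own; the other names are illustrative.

```python
x = (1, 3.0, "horse")

first = x[0]        # indexing returns the element itself
head = x[0:1]       # slicing returns a tuple, here with one element
pair = x[0:2]       # elements at positions 0 and 1
tail = x[1:]        # leaving the b-position blank runs to the end

print(first, head, pair, tail)
```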

Python has the ability to assign multiple variables at once! This is called unpacking. 

In [None]:
u, v, w = x
print(u)
print(v)
print(w)

In [None]:
type(v)

For unpacking, the number of variables on the left-hand side must match the number of elements. 

In [None]:
u, v = x
print(v)
print(w)
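
The cell above stops with an error. A sketch that catches the error, so that the rest of the code still runs, could look as follows; the name `message` is introduced here for illustration.

```python
x = (1, 3.0, "horse")

try:
    u, v = x                      # three values, but only two names
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

u, v, w = x                       # a matching number of names works
print(u, v, w)
```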

Once created, tuples can't be changed!

In [None]:
x[2] = 'horsies'
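
The assignment above raises a `TypeError`. As a sketch of one possible workaround, a modified copy can be built as a new tuple while the original stays untouched; the name `x_new` is illustrative.

```python
x = (1, 3.0, "horse")

try:
    x[2] = "horsies"              # item assignment is not supported
except TypeError as err:
    print("TypeError:", err)

# A changed version has to be built as a new tuple instead.
x_new = x[:2] + ("horsies",)
print(x)
print(x_new)
```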

This is a feature, not a shortcoming, of <kbd>tuple</kbd>. Since tuples can be neither changed nor appended to, they are more economical than a <kbd>list</kbd>. 

##### Lists

A <kbd>list</kbd> is the mutable counterpart of a <kbd>tuple</kbd>. Lists are instantiated with square brackets. 

In [None]:
y = [100, 3.0, "horse"]
type(y)

In [None]:
sys.getsizeof(x)

In [None]:
sys.getsizeof(y)

Unlike tuples, however, lists are mutable. 

In [None]:
y[2] = "horsies"
y

Accessing lists works just like for tuples. 

In [None]:
y[0]

In [None]:
y[3]

In [None]:
y[1:]

In [None]:
y[2:] # a list with a single element

In [None]:
y[2:][0]

There are two ways to read the first and third elements of `y`. 

In [None]:
z = [y[i] for i in range(0, 3, 2)] # more on that syntax next lecture! 
z

Alternatively, we can slice:

In [None]:
z = y[0:3:2] # start at 0, stop at 3 - 1, use step size 2
z
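
Both approaches give the same result, which the following sketch checks directly. It redefines `y` so that it runs on its own; the shortened slice `y[::2]` is an addition that relies on the default values for start and stop.

```python
y = [100, 3.0, "horsies"]

by_comprehension = [y[i] for i in range(0, 3, 2)]
by_slice = y[0:3:2]
by_short_slice = y[::2]    # omitted start and stop cover the whole list

print(by_comprehension, by_slice, by_short_slice)
```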

Just as for tuples, we can use unpacking. 

In [None]:
u, v, w = y
print(u)
print(v)
print(w)

In [None]:
del y[1]
y

Some other important methods for <kbd>list</kbd> are: 

In [None]:
y.append(4) # appends argument to list, does not return anything

In [None]:
y

In [None]:
y.append([6,7]) # appends the list itself as a single element!
y
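
If the goal is to add the elements `6` and `7` individually rather than as a nested list, the list method `extend` can be used instead. It is not covered above; the sketch below contrasts the two with the illustrative names `nested` and `flat`.

```python
nested = [100, "horsies", 4]
nested.append([6, 7])     # one new element, which is itself a list

flat = [100, "horsies", 4]
flat.extend([6, 7])       # two new elements

print(nested, len(nested))
print(flat, len(flat))
```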

In [None]:
y.index('horsies') # returns index of argument

In [None]:
y.index(4)

In [None]:
y.pop(1) # removes the element at the given index and returns it

In [None]:
y

In [None]:
y.reverse() # reverses the order of elements, returns nothing
y
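
Because the cells above modify `y` in place, its state depends on every previous step. The sketch below replays the whole sequence in one piece, with the intermediate state noted in the comments; `position` and `removed` are illustrative names.

```python
y = [100, 3.0, "horse"]
y[2] = "horsies"                 # [100, 3.0, 'horsies']
del y[1]                         # [100, 'horsies']
y.append(4)                      # [100, 'horsies', 4]
y.append([6, 7])                 # [100, 'horsies', 4, [6, 7]]
position = y.index("horsies")    # 1
removed = y.pop(1)               # 'horsies'; y is now [100, 4, [6, 7]]
y.reverse()                      # [[6, 7], 4, 100]
print(position, removed, y)
```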

#### Mapping

<kbd>dict</kbd> type objects map keys to values: each key is associated with exactly one value. In other words, you use a key to look up a value. Dictionaries are mutable. They are instantiated with curly brackets `{}` and colons `:`. 

In [None]:
x = {"hello": 1, 3: 5.0}
x

We can access the values of a dictionary with indexing or its `get` method. 

In [None]:
x['hello']

In [None]:
x[3]

In [None]:
x.get('hello')

In [None]:
x.get('house') # returns None, so nothing is displayed

In [None]:
x.get('house', 'not here') # second argument is returned if first argument is not in dictionary
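
The practical difference between indexing and `get` shows up when a key is missing. A minimal sketch, with the illustrative names `missing` and `fallback`:

```python
x = {"hello": 1, 3: 5.0}

try:
    x["house"]                            # indexing a missing key fails
except KeyError as err:
    print("KeyError:", err)

missing = x.get("house")                  # None
fallback = x.get("house", "not here")     # the supplied default
print(missing, fallback, "house" in x)
```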

The keys of a dictionary must be unique and of immutable type, e.g., numeric, boolean, string or tuple. 

In [None]:
x.keys() #  returns a dict_keys type object

In [None]:
x['new key'] = 8
x

In [None]:
x[[1,2]] = "lists are mutable and can't be keys"
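
By contrast, a tuple with the same entries is accepted as a key. The following sketch shows both cases and catches the error raised for the list:

```python
x = {"hello": 1, 3: 5.0}

x[(1, 2)] = "tuples are immutable and can be keys"

try:
    x[[1, 2]] = "lists are mutable and can't be keys"
except TypeError as err:
    print("TypeError:", err)

print(x[(1, 2)])
```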

Therefore, only the value, not the key, can be changed. 

In [None]:
x['hello'] = 5.0 ** 0.5
x

The <kbd>dict</kbd> type is useful, since values can be looked up efficiently by their key. 

#### Set

A <kbd>set</kbd> is an unordered collection of unique items. It is instantiated with curly brackets. Just like dictionary keys, the items must be immutable!

In [None]:
x = {"apple", True, 2} # display order changed, they are unordered
x

In [None]:
{"apple", [2,3], 2}

Sets are unordered. Hence, they do not support indexing. 

In [None]:
x[1] 

In [None]:
x.add("new item")
x

In [None]:
x.add("new item") # the items are unique
x

In [None]:
x.remove("new item")
x
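
The set operations above can be replayed in one piece. The sketch below also uses `discard`, a set method not covered above, which removes an item without raising an error if it is missing; `size_after_adding` and `deduplicated` are illustrative names.

```python
x = {"apple", True, 2}

x.add("new item")
x.add("new item")            # adding a duplicate has no effect
size_after_adding = len(x)

x.remove("new item")
x.discard("new item")        # discard() ignores missing items

try:
    x.remove("new item")     # remove() raises an error for missing items
except KeyError as err:
    print("KeyError:", err)

# Converting a list to a set drops the duplicates.
deduplicated = sorted({3, 1, 3, 2, 1})
print(size_after_adding, "apple" in x, deduplicated)
```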