## COMM 187: Data Science in Communication Research -- Fall 2024

## Coding Lab #4: 
**Wednesday, Oct 23, 2024**

Welcome to Coding Lab #4 for COMM 187: Data Science in Communication Research!

Two weeks ago, we learnt about types of variables, lists, libraries, and functions.

Today's lesson plan:
 - Review of Coding Assignment #2 and #3
 - Conditionals in Python
 - Intro to Numpy
   - Numpy arrays
   - Operations using numpy arrays
     - itemwise adding two `numpy` arrays
     - array creation using `array`, `zeros` and `ones`

Today's lessons are based on the following online resources (feel free to try them out yourselves too!):
 - https://wesmckinney.com/book/numpy-basics
 - https://wesmckinney.com/book/python-basics

### Conditionals in Python

Programming, fundamentally, is telling computers a set of tasks to do. When thinking about complex tasks, it is helpful to first think about how we would do them, in order to break them down into discrete tasks for the computer. 

Now, imagine you're deciding what to wear based on the weather. 

If it's sunny 🔆, you might choose shorts 🩳 \
If it's raining 💦, perhaps a raincoat 🧥, or you would choose to carry an umbrella ☔

So, based on the *condition* of the weather, you will decide to either wear shorts, or wear a raincoat/carry an umbrella. 

This decision-making process is called **conditionals** in programming. \
In Python, we use `if`, `else`, and `elif` statements to execute different blocks of code based on various conditions. 
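
As a preview, the weather decision above might be sketched in Python as follows. The variable names `weather` and `outfit` and the fallback choice are made up for illustration; the syntax itself is explained step by step in the rest of this section.

```python
# Hypothetical variable holding today's weather; try "raining" or "snowing" too.
weather = "sunny"

if weather == "sunny":
    outfit = "shorts"
elif weather == "raining":
    outfit = "raincoat"
else:
    # Any weather we did not plan for
    outfit = "regular clothes"

print(outfit)
```

Only one of the three blocks runs, depending on which condition is the first to be `True`.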

**if, else, and elif**

The `if` statement is one of the most well-known conditional statement types. It checks a condition and, if the condition is `True`, runs the block of code that follows.

```
if <condition>:
    <block of code>
```

Here, `<condition>` is an expression that evaluates to a `bool` value. If the value of `<condition>` is `True`, the block of code below it is executed. If the value of `<condition>` is `False`, the block is not executed; it is *skipped*.
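
To see what "an expression with a `bool` value" means, here is a small sketch that evaluates a comparison on its own, outside of any `if` statement. The variable name `is_negative` is just an illustrative choice.

```python
x = -5

# A comparison produces a value of type bool: either True or False
is_negative = x < 0
print(is_negative)        # True
print(type(is_negative))  # <class 'bool'>

# The same comparison with a different value of x
x = 5
is_negative_now = x < 0
print(is_negative_now)    # False
```

An `if` statement simply looks at this `True` or `False` value to decide whether to run its block.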

For example, consider the following code:

In [None]:
x = -5
if x < 0: 
    print("It's negative")

Notice, again, the syntax of writing an `if` statement. 

It starts with `if` followed by a space, and then the condition expression which can be `True` or `False`. In this case, that condition expression is `x < 0`. 

The condition is then followed by a colon `:`.

After the colon `:`, on a new line, comes the code block that should run if the condition is `True`.

**IMPORTANT:** Notice the gap at the beginning of the print statement. That is called an **indent**, and can be achieved by pressing the "tab" button on your keyboard. Every line that has an **indent** after the `if <condition>:` line is the "block of code" that will run if the condition is `True`.

**NOTE:** You need to use an indent in the next line after using a conditional or loop statement. If you do not do so, your computations might be wrong or you might encounter errors.

**Example:** Consider the previous example again, this time without indent. Execute the following cell and observe the output. Is it different from the output of the previous example? Do you get an error? What does the error say?

In [None]:
x = -5
if x < 0: 
print("It's negative")

**Example:** Consider the following two cells of code. Execute each cell and observe the output. Do they have the same output? If not, why?

In [None]:
x = 5
y = 1
z = 0
if x < 0:
    y = x
    z = x + y
print(z)

In [None]:
x = 5
y = 1
z = 0
if x < 0:
    y = x
z = x + y
print(z)

An `if` statement can optionally be followed by an `else` block, which runs when the condition is `False`.

In [None]:
x = 5
if x < 0:
    print("It's negative")
else:
    print("It's zero or positive")

Notice the syntax for writing an else statement. 

Notice the colon `:` at the end of `else`. 

Note that there is no indent with the `else:` statement, but the statement(s) that follow must have an indent, just like with the `if` statement.

The **elif** statement is simply a combination of the `else` and `if` statements: its condition is checked only when the conditions before it were `False`.

Let us say I want to instruct the computer with the following steps about a variable `x`:

 - If x is less than 0, then print "It's negative"
 - Else, if x is less than 3, then print "Less than 3"
 - Else, if x is less than 10, then print "Less than 10 but more than 3"

This is how I would write it in Python:

In [None]:
x = 2

if x < 0:
    print("It's negative")
elif x < 3:
    print("Less than 3")
elif x < 10:
    print("Less than 10 but more than 3")

Change the value of `x` in the cell above and see how the output changes.
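
To make the "combination of `else` and `if`" idea concrete, the sketch below writes the same decision twice: once with `elif`, and once with an `if` nested inside an `else`. The function names and the final message "3 or more" are illustrative choices, not part of the lab's cells.

```python
def describe_with_elif(x):
    if x < 0:
        return "It's negative"
    elif x < 3:
        return "Less than 3"
    else:
        return "3 or more"


def describe_nested(x):
    if x < 0:
        return "It's negative"
    else:
        # This nested if/else is what elif is shorthand for
        if x < 3:
            return "Less than 3"
        else:
            return "3 or more"


# Both versions give the same answer for every value we try
for value in [-1, 2, 7]:
    print(value, describe_with_elif(value), describe_nested(value))
```

The `elif` version avoids the extra level of indentation, which is why it is usually preferred when there are several conditions to check in order.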

### Numpy

**NumPy** is short for "Numerical Python," and it is one of the most important libraries for numerical computing in Python. Advanced libraries in Python (some of which we will learn about in this course) are built on top of NumPy! This makes it a very important library to learn and master for conducting Data Science research using Python.

In [None]:
import numpy

After importing numpy, type `numpy.` (including the dot) in the cell below and press the "Tab" key on your keyboard.

In [None]:
### Your code below this line

It should look something like this:

<img src="./images/Lab2TabOptionsNumpy.png" alt="Tab Options" width="300"/>


Scroll through these items. You will notice that a lot of these are labeled `f` on the right and `function` on the left. These are functions included in the `numpy` library. You can use these functions after importing numpy.
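
As a small sketch of using functions from the library, the cell below calls two of them, `numpy.sqrt` and `numpy.mean`. These two were picked only as examples; the variable names are illustrative.

```python
import numpy

# Functions from the library are called as numpy.<function name>(...)
root = numpy.sqrt(16)
average = numpy.mean([1, 2, 3, 4])

print(root)     # 4.0
print(average)  # 2.5
```

Without the `import numpy` line, both calls would fail with a `NameError`, because Python would not know what `numpy` refers to.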

#### Numpy Arrays

Thus far, we have learned about lists, which are sequences of values that can be stored in one variable, and can be indexed based on their position in the sequence.

An **array** is a vector containing values typically belonging to the *same data type*. In the physical hardware, these values are stored in contiguous memory locations, so indexing is typically faster than in lists.
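
The "same data type" idea can be checked with the `dtype` attribute of a NumPy array. In this sketch (the variable names are illustrative), mixing whole numbers with a decimal number makes NumPy store every value as a decimal number.

```python
import numpy

ints = numpy.array([1, 2, 3])
mixed = numpy.array([1, 2.5, 3])

# Every value in an array shares one data type, reported by .dtype
print(ints.dtype)   # an integer type such as int64 (depends on your computer)
print(mixed.dtype)  # float64
print(mixed)        # the whole numbers now appear as decimals
```

A list, by contrast, would keep `1` and `3` as integers and `2.5` as a float.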

One key difference between lists and arrays is that the size of an array is typically fixed, whereas the size of a list is not. Thus, inserting or deleting items is more costly with arrays than with lists.
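
A minimal sketch of this difference, with illustrative variable names: a list grows in place when you append to it, while `numpy.append` leaves the original array untouched and returns a new, larger array.

```python
import numpy

scores_list = [1, 2, 3]
scores_list.append(4)   # the list itself grows in place
print(scores_list)      # [1, 2, 3, 4]

scores_array = numpy.array([1, 2, 3])
bigger_array = numpy.append(scores_array, 4)  # builds and returns a NEW array
print(scores_array)     # [1 2 3]  (unchanged)
print(bigger_array)     # [1 2 3 4]
```

Because a whole new array has to be built each time, repeatedly appending to an array is slower than appending to a list.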

Numpy has a function called **array** and a data type called **ndarray**. The function *array* converts lists to an array of type *ndarray*. These arrays are built specifically for numerical computations.

***Note:*** To call the **array** function, you will have to use the following syntax: `numpy.array(<input list goes here>)`

Let us try out an example:

In [None]:
list1 = [1,2,3,4]
list2 = [10,20,30,40]

In [None]:
numpy_list1 = numpy.array(list1)
numpy_list2 = numpy.array(list2)

Now, we have two versions of the same pair of lists: we have `list1` and `list2`, and we have `numpy_list1` and `numpy_list2`. Print them out below to see any differences in how they appear.

In [None]:
### Your code below this line

Now compute an addition (`+`) operation on the two lists, and then the two arrays. See what you get.

In [None]:
### Your code below this line 

What were the differences? Now, try out the other operations discussed [here](#operators) earlier in the lab.

In [None]:
### Your code below this line

What were the differences? Were you able to use all the operators with lists? What about with numpy arrays? In numpy arrays, are the operators operating item by item?

Now, consider a third numpy array (run the next cell):

In [None]:
numpy_list3 = numpy.array(list1 + list2)
numpy_list3

Now, try using the same operations, but this time between `numpy_list3` and either `numpy_list1` or `numpy_list2`.

In [None]:
### Your code below this line

What did you observe? What was the output? Why did you get that output?

#### Functions in `numpy` to create arrays

Take a look at the following short list of standard array creation functions in `numpy`. 

Try out each of these functions for yourself. Make use of the Numpy documentation [here](https://numpy.org/doc/stable/reference/generated/numpy.array.html) to understand these functions better.

| Function              | Description                                                                                                                                                                            |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `array`               | Convert input data (list or other sequence type) to an `ndarray` either by inferring a data type or explicitly specifying a data type; copies the input data by default |
| `asarray`             | Convert input to `ndarray`, but do not copy if the input is already an `ndarray`                                                                                                           |
| `arange`              | Like the built-in range but returns an `ndarray` instead of a list                                                                                                                       |
| `ones`, `ones_like`   | Produce an array of all 1s with the given shape and data type; `ones_like` takes another array and produces a ones array of the same shape and data type                                 |
| `zeros`, `zeros_like` | Like `ones` and `ones_like` but producing arrays of 0s instead                                                                                                                           |
| `empty`, `empty_like` | Create new arrays by allocating new memory, but without populating them with any values, unlike `ones` and `zeros`                                                                     |

What feels difficult about using these functions? What feels easy? Discuss on Canvas with your classmates, and bring your questions to office hours.
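
As a starting point, here is a short sketch that calls several of the functions from the table above. The variable names and the particular shapes are illustrative choices; the NumPy documentation describes the full set of options.

```python
import numpy

counting = numpy.arange(5)      # like range(5), but as an array
print(counting)                 # [0 1 2 3 4]

three_zeros = numpy.zeros(3)    # decimal zeros by default
print(three_zeros)              # [0. 0. 0.]

grid_of_ones = numpy.ones((2, 3))  # a shape of 2 rows and 3 columns
print(grid_of_ones.shape)          # (2, 3)

template = numpy.array([10, 20, 30])
ones_copy = numpy.ones_like(template)    # same shape and data type as template
zeros_copy = numpy.zeros_like(template)
print(ones_copy)                # [1 1 1]
print(zeros_copy)               # [0 0 0]

# asarray does not copy when its input is already an ndarray
same = numpy.asarray(template)
print(same is template)         # True
```

`empty` is left out of this sketch because the values in the array it returns are arbitrary until you fill them in yourself.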

## Practice

**Question 1a.** Make two new lists, `quiz1` and `quiz2`.
Assume that each list holds the scores of 10 students. Each quiz is out of 100 points, so randomly assign points to each student for `quiz1` and `quiz2`.

For instance, `quiz1` could look like:
`quiz1 = [45,67,90,99,100,34,89,70,25,0]`

In [None]:
### Your code below this line

**Question 1b.** Now, convert these lists to numpy arrays. Make new variable names for them!

In [None]:
### Your code below this line

**Question 1c.** Using these two numpy arrays, make a third numpy array that holds each student's total score across the two quizzes. \
*Tip:* you can simply add the two numpy arrays and store the result in a new numpy array!

In [None]:
### Your code below this line

**Question 1d.** Calculate the total percentage score for each student. Remember, each quiz is out of 100, so the total is out of 200.

In [None]:
### Your code below this line