## COMM 187 (160DS): Data Science in Communication Research -- Spring 2024

## Coding Lab #4: Python Basics
**Wednesday, April 24, 2024**

Welcome to Coding Lab #4 for COMM 187 (160DS): Data Science in Communication Research!

(This is technically the third coding lab, but we are in week four, so we are calling it Coding Lab #4)

In the last Coding Lab, we learnt about some more Python basics, including types of variables, functions and libraries, lists, and numpy arrays.

Today's lesson plan:
 - How to find help in Python
 - Review of lists and arrays
 - Conditionals and Loops in Python
 - Operations using numpy arrays
   - sum, prod, all, any, count_nonzero
   - round, exp, sort
   - char.lower, char.upper, char.strip, char.isalpha, char.isnumeric
   - char.count, char.find, char.rfind, char.startswith

Today's lessons are based on the following online resources (feel free to try them out yourselves too!):
 - https://wesmckinney.com/book/numpy-basics 
 - https://inferentialthinking.com/chapters/05/1/Arrays.html

### How to find help in Python

There are two main ways to look for help in Python.

1. In a Jupyter notebook, type `?` followed, without a space, by the function or term you need help with. For instance, if you needed help with the function `abs`, you would type `?abs`.
2. The obvious one -- search on the internet! Searching for solutions on Google (and now ChatGPT!) and using those solutions successfully for your own code is also a skill. Practice this skill with challenging assignment questions. 
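
The `?` shortcut is a feature of Jupyter notebooks rather than of Python itself. As a minimal sketch of an alternative that also works outside a notebook, the built-in `help` function and the `__doc__` attribute give access to the same documentation. The variable name `doc_text` is only an illustration.

```python
# Built-in ways to read documentation; these also work outside Jupyter.
help(abs)                 # prints the documentation for the abs function

doc_text = abs.__doc__    # the same documentation, stored as a plain string
print(doc_text)
```
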

### Review of lists and arrays

Let us say that you wanted to store more than one value in one variable. How can you do that?

We can make a **list** using the following expression: `[value1, value2, value3, ...]` \
where `value1`, `value2`, and so on are values that you want to store in a sequence in the list.

Let us look at some examples.

In [None]:
a = [1,2,3,4,5]
a

**New topic: RANGES!**

You can also make a **range**, which is an ordered sequence of evenly spaced integers, and then turn it into a list.

The syntax for making a range is: `range(<start>, <stop>, <step>)`

 - *start* is the *starting integer* for the range.
 - *stop* is where the range ends; the range stops *before* reaching this value. \
   NOTE that with the default step of `1`, the *stop* needs to be one more than the integer you would like to stop at.
 - *step* is the difference, or *step increment* between every consecutive integer. The default value is `1`.

For instance, if you want to make a sequence of 1,2,3,4,5, ..., 20, you can simply write:

In [None]:
range(1, 21)

Notice that the `<step>` value was not specified, so the default value of `1` was used for `<step>`.

What does the output look like? Does it look like a list? If not, that is because `range` is its own data type, just like `int`, `float`, `str`, `list`, etc.

To convert it to a list, typecast it to a list:

In [None]:
list(range(1, 21))
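
To make the roles of `<start>`, `<stop>`, and `<step>` more concrete, here is a small sketch with an explicit step. The variable names are made up for illustration, and the negative step in the last example goes slightly beyond what is covered above.

```python
numbers = range(1, 21)
print(type(numbers))      # <class 'range'>
print(len(numbers))       # 20

# An explicit step of 5: start at 0 and stop before 21
by_fives = list(range(0, 21, 5))
print(by_fives)           # [0, 5, 10, 15, 20]

# A negative step counts downwards: start at 10 and stop before 0
countdown = list(range(10, 0, -2))
print(countdown)          # [10, 8, 6, 4, 2]
```
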

**Question:** Make a list of all the integers starting from 35 and ending at 72.

In [None]:
### Your code below this line

**Question:** Make a list of integers starting from -56 and ending at 85, with step increments of 3.

In [None]:
### Your code below this line

#### Numpy Arrays

An **array** is a vector containing values typically belonging to the *same data type*.

Numpy has a function called **array** and a data type called **ndarray**. The function *array* converts lists to an array of type *ndarray*. These arrays are built specifically for numerical computations.

To call the **array** function, you will 
1. have to import the numpy library by executing `import numpy`, and then
2. have to use the following syntax: `numpy.array(<input list goes here>)`

For example:

In [None]:
import numpy

In [None]:
list1 = [1,2,3,4]
numpy_list1 = numpy.array(list1)
numpy_list1

Ranges can also be typecast into `numpy` arrays. For instance, to convert `range(1,100,2)` into a `numpy` array, we just need to write:

In [None]:
numpy.array(range(1,100,2))
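
Two consequences of arrays holding a single data type and being built for numerical computation can be sketched as follows. The names `mixed`, `plain_list`, and `as_array` are only illustrative.

```python
import numpy

# Mixing integers and a float: numpy stores everything as floats
mixed = numpy.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# Multiplying a list repeats it; multiplying an array multiplies each value
plain_list = [1, 2, 3]
as_array = numpy.array(plain_list)
print(plain_list * 2)     # [1, 2, 3, 1, 2, 3]
print(as_array * 2)       # [2 4 6]
```
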

#### Indexing

To retrieve a value from a list or an array, you can use its *index*. Indices count from the first item: 0, 1, ..., n-1, where n is the length of the list. The item at a given index can be retrieved using the following syntax: \
`<list name>[<index>]`

For instance, the 4th item in the list can be accessed as `a[3]`

In [None]:
a[3]

**Question:** Access the 3rd item in `numpy_list1`

In [None]:
numpy_list1[2]

#### Negative indexing!

**Question**: What if you want to index the last item in a list or an array?

You can index using negative numbers! 

`-1` indexes the last item, `-2` indexes the second to last item, and so on and so forth.

Try it out!

In [None]:
list1[-1]

**Question:** Access the second to last item in `numpy_list1`

In [None]:
numpy_list1[-2]

#### String indexing

**Question**: What if you want to index a character in a ***string***?

You can index it just like with lists! The first character in a string is at index 0, the second character at index 1, so on and so forth.

The syntax is `<string name>[<index>]`

Try it out!

In [None]:
str_var = "Communication"
str_var

In [None]:
str_var[5]

#### Range indexing

**Question**: What if you want to index a range of indices from a list or array?

Instead of indexing with just an integer, you can index with the following syntax: 
```
<list or array name>[<start index> : <stop index>]
```

Just like with the `range` function, here `[<start index> : <stop index>]` includes everything from the value at index `start` to one index before `stop`.

Try it out!

In [None]:
list1[0:3]
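
As a small sketch that goes slightly beyond the syntax above: either index can be left out, in which case Python starts at the beginning or runs to the end. The same syntax also works on strings. The variables `letters` and `word` are made up for this example.

```python
letters = ["a", "b", "c", "d", "e"]

first_three = letters[:3]    # same as letters[0:3]
print(first_three)           # ['a', 'b', 'c']

from_third = letters[2:]     # from index 2 to the end of the list
print(from_third)            # ['c', 'd', 'e']

word = "Communication"
prefix = word[0:6]           # characters at indices 0 through 5
print(prefix)                # Commun
```
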

**Question:** Make a new numpy array `numpy_array2` which has all the integers from 1 to 1000, with an increment of 12. Then, show the values in the array from position 3 to position 15.

NOTE: Position 1 is index 0. 

In [None]:
numpy_array2 = numpy.array(range(1,1001, 12))
numpy_array2[2:15]

### Conditionals and Loops in Python

#### Conditionals

Programming, fundamentally, is telling computers a set of tasks to do. When thinking about complex tasks, it is helpful to first think about how we would do them, in order to break them down into discrete tasks for the computer. 

Now, imagine you're deciding what to wear based on the weather. If it's sunny, you might choose shorts; if it's raining, perhaps a raincoat, or you would choose to carry an umbrella. So, based on the *condition* of the weather, you will decide to either wear shorts, or wear a raincoat/carry an umbrella. This decision-making process is what we call **conditionals** in programming. In Python, we use `if`, `else`, and `elif` statements to execute different blocks of code based on various conditions.

**if, else, and elif**

The `if` statement is one of the most well-known conditional statement types. It checks a condition and, if the condition is `True`, runs the code in the block that follows.

```
if <condition>:
    <block of code>
```

Here, `<condition>` is an expression with a `bool` output. If the value of `<condition>` is `True`, then the block of code below it is executed. If the value of `<condition>` is `False`, the block is not executed; it is *skipped*.
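
To see that a condition is just an expression that evaluates to `True` or `False`, here is a minimal sketch. The variable name `condition` is only for illustration.

```python
x = -5

condition = x < 0         # a comparison produces a bool value
print(condition)          # True
print(type(condition))    # <class 'bool'>

print(x > 0)              # False
print(x == -5)            # True
```
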

For example, consider the following code:

In [None]:
x = -5
if x < 0: 
    print("It's negative")

Notice, again, the syntax of writing an `if` statement. 

It starts with `if` followed by a space, and then the condition expression which can be `True` or `False`. In this case, that condition expression is `x < 0`. 

The condition is then followed by a colon `:`.

After the colon `:`, in a new line is the code block that should run if the condition is True. 

**IMPORTANT:** Notice the gap at the beginning of the print statement. That is called an **indent**, and can be achieved by pressing the "tab" button on your keyboard. Every line that has an **indent** after the `if <condition>:` line is the "block of code" that will run if the condition is `True`.

**NOTE:** You need to use an indent in the next line after using a conditional or loop statement. If you do not do so, your computations might be wrong or you might encounter errors.

**Question:** Consider the following two cells of code. Execute each cell and observe the output. Do they have the same output? If not, why?

In [None]:
x = 5
y = 1
z = 0
if x < 0:
    y = x
    z = x + y
print(z)

In [None]:
x = -5
y = 1
z = 0
if x < 0:
    y = x
z = x + y
print(z)

An `if` statement can be optionally followed by an `else` block, which runs when the condition is `False`.

In [None]:
x = 5
if x < 0:
    print("It's negative")
else:
    print("It's positive")

Notice the syntax for writing an else statement. 

Notice the colon `:` at the end of `else`. 

Note that there is no indent with the `else:` statement, but the statement(s) that follow must have an indent, just like with the `if` statement.
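
The `elif` statement mentioned earlier (short for "else if") lets you check a second condition when the first one is `False`. Here is a hedged sketch of how it could extend the example above so that zero gets its own message. The function name `describe_sign` is hypothetical and not part of the lab.

```python
def describe_sign(x):
    # Conditions are checked from top to bottom; only the first match runs
    if x < 0:
        return "It's negative"
    elif x == 0:
        return "It's zero"
    else:
        return "It's positive"

print(describe_sign(-5))   # It's negative
print(describe_sign(0))    # It's zero
print(describe_sign(5))    # It's positive
```

Like `else`, the `elif` line has no indent and ends with a colon, and the block below it is indented.
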

#### Loops

Let us say you needed to check if a condition is true over and over, many times. 

Let us take the rainy day example again. Instead of checking for rain once on a specific day, let us say you need to check every day for a week. So, every day, you check if it is raining. If yes, then you pick an umbrella or raincoat. If not, shorts it is. So basically, you are running an `if` statement **FOR** each day in the week.

That is a `for` loop. Instead of running a block of code *IF* a condition is true, we are running a block of code *FOR* each value in a list of values.

Typical syntax for a `for` loop in Python:
```
for <value> in <collection>:
    <do something with value>
```

Here, the `<collection>` can be a list, numpy.array, or a range!

Let us first try with a list.

In [None]:
list2 = [10,12,14,16]
for i in list2:
    print("The value of i is:", i)

Let us try the same with range.

In [None]:
for i in range(10,17,2):
    print("The value of i is:", i)
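
The rainy-week scenario described at the start of this section can be sketched by putting an `if`/`else` inside a `for` loop. The list `weekly_weather` and its values are invented for illustration.

```python
# Hypothetical weather for each of the seven days in a week
weekly_weather = ["sunny", "rainy", "sunny", "sunny", "rainy", "sunny", "sunny"]

outfits = []                       # collects one outfit per day
for weather in weekly_weather:
    if weather == "rainy":
        outfit = "raincoat"
    else:
        outfit = "shorts"
    outfits.append(outfit)         # add this day's outfit to the end of the list
    print(weather, "->", outfit)
```

Notice the two levels of indentation: the `if` and `else` lines are indented once because they belong to the `for` loop, and the lines inside them are indented twice.
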

### Operations using `numpy`

#### Basic mathematical and statistical functions on arrays in `numpy`

`numpy` provides a set of mathematical functions that compute statistics over an entire array or along one axis of it. These aggregations (sometimes called reductions), such as `sum`, `mean`, and `std` (standard deviation), can be called using this syntax:
```
numpy.<function name>(<array name>)
```

Try out each of these functions for yourself. Make use of the Numpy documentation [here](https://numpy.org/doc/stable/reference/generated/numpy.array.html) to understand these functions better.

| Function              | Description                                                                                                                                                                            |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `sum`                  | Sum of all the elements in the array |
| `mean`                 | Arithmetic mean |
| `std`, `var`           | Standard deviation and variance, respectively|
| `min`, `max`           | Minimum and maximum |
| `prod`                 | Multiply all elements together |
| `cumsum`               | Cumulative sum of elements starting from 0 |
| `cumprod`              | Cumulative product of elements starting from 1 |
| `all`              | Test whether *all* elements are true values (non-zero numbers are true) |
| `any`              | Test whether *any* elements are true values (non-zero numbers are true) |
| `count_nonzero`              | Count the number of non-zero elements |
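
As a minimal sketch of how these aggregations behave, here are a few of them applied to two tiny arrays whose answers are easy to check by hand. The arrays `values` and `flags` are made up for this example.

```python
import numpy

values = numpy.array([1, 2, 3, 4])
print(numpy.sum(values))       # 10
print(numpy.mean(values))      # 2.5
print(numpy.prod(values))      # 24
print(numpy.cumsum(values))    # [ 1  3  6 10]

flags = numpy.array([0, 1, 2, 0])
print(numpy.all(flags))             # False, because some elements are zero
print(numpy.any(flags))             # True, because at least one is non-zero
print(numpy.count_nonzero(flags))   # 2
```
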

What is feeling difficult about using these functions? What is feeling easy? Discuss on Canvas, among your classmates, and bring your questions to office hours.

In [None]:
numpy_list3 = numpy.random.standard_normal(20)+5 # an array of 20 values drawn from the standard normal distribution, each shifted by +5

In [None]:
### Try out the functions mentioned above here

#### Mathematical functions which take an array and output an array in `numpy`

Each of the following mathematical functions takes an input array and returns a new output array of results.

Try out each of these functions for yourself. Make use of the Numpy documentation [here](https://numpy.org/doc/stable/reference/generated/numpy.array.html) to understand these functions better.

| Function              | Description                                                                                                                                                                            |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `diff`                  | Difference between adjacent elements |
| `round`                 | Round each number to the nearest integer (whole number) |
| `cumprod`           | A cumulative product: for each element, multiply all elements so far |
| `cumsum`           | A cumulative sum: for each element, add all elements so far |
| `exp`                 | Exponentiate each element |
| `log`              | Take the natural logarithm of each element |
| `sqrt`              | Take the square root of each element |
| `sort`                 | Return a sorted copy of the array, in increasing order (the array's own `.sort()` method sorts in place) |
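
Here is a short sketch of the "array in, array out" pattern, using small arrays invented for illustration. It also shows that `numpy.sort` leaves the input array as it was.

```python
import numpy

values = numpy.array([9.0, 1.0, 4.0, 16.0])

differences = numpy.diff(values)     # one element fewer: -8, 3, 12
roots = numpy.sqrt(values)           # 3, 1, 2, 4
sorted_values = numpy.sort(values)   # a new array: 1, 4, 9, 16
print(differences)
print(roots)
print(sorted_values)
print(values)                        # still in the original order

rounded = numpy.round(numpy.array([1.4, 2.6, -0.7]))   # 1, 3, -1
print(rounded)

# exp and log undo each other, up to tiny floating-point error
round_trip = numpy.log(numpy.exp(values))
print(round_trip)
```
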

In [None]:
### Try out the functions mentioned above here

#### Functions for arrays of strings in `numpy`

The following set of `numpy` functions take an input array of `str` values and return a new output array based on the specific function.

Try out each of these functions for yourself. Make use of the Numpy documentation [here](https://numpy.org/doc/stable/reference/generated/numpy.array.html) to understand these functions better.

| Function              | Description                                                                                                                                                                            |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `char.lower`                  | Lowercase each element |
| `char.upper`                 | Uppercase each element |
| `char.strip`           | Remove spaces at the beginning or end of each element |
| `char.isalpha`           | Whether each element is only letters (no numbers or symbols) |
| `char.isnumeric`                 | Whether each element is only numeric (no letters) |

In [None]:
numpy_list4 = sample_strings = numpy.array([
    "Hello World", 
    "Python3.8", 
    "Data Science 101", 
    "   padded   ", 
    "UPPERCASE", 
    "lowercase", 
    "12345", 
    "alpha123",
    "New-York", 
    "multiple words example",
    "One More Test", 
    "with symbols !@#", 
    "spaces in front ", 
    "noSpaces",
    "CapItalized",
    "alllower",
    "ALLUPPER",
    "123 numbers",
    "EndsWithSpace ",
    "startswithCaps",
    "MiXeD CaSe StRiNg"
])

In [None]:
### Try out the functions mentioned above here

#### Functions to SEARCH in arrays of strings in `numpy`

The following set of `numpy` functions take an input array of `str` values and a search string, and return a new output array with one result per element.

Try out each of these functions for yourself. Make use of the Numpy documentation [here](https://numpy.org/doc/stable/reference/generated/numpy.array.html) to understand these functions better.

| Function              | Description                                                                                                                                                                            |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `char.count`                  | Count the number of times a search string appears within each element of an array |
| `char.find`                 | The position within each element where a search string is first found (`-1` if it is not found) |
| `char.rfind`           | The position within each element where a search string is last found (`-1` if it is not found) |
| `char.startswith`           | Whether each element starts with the search string |

In [None]:
### Try out the functions mentioned above here