## Week 6 Lecture `.ipynb` File

#### Author: Mahmoud Harding

## Jupyter Notebook

A Jupyter Notebook is an open-source interactive computing environment that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It supports multiple programming languages, with Python being the most commonly used. Jupyter Notebooks are widely used for data analysis, scientific research, machine learning, and education because they provide a convenient way to run code, visualize results, and document findings in a single, easily shareable format.

### Code Cells

A code cell in a Jupyter Notebook is a section where users can write and execute programming code. When run, the code within the cell is processed, and the output (if any) is displayed directly below the cell. Code cells allow for iterative development, as users can modify and re-run cells independently without restarting the entire notebook, making them ideal for tasks like data analysis, testing algorithms, and exploring code interactively.

In [None]:
"Hello World"

In [None]:
## Arithmetic
3 + 2

In [None]:
## Arithmetic
4 ** 2

In [None]:
## 1e6 means 1 * 10^6, or 1,000,000
1e6

In [None]:
"Hello World"
4 ** 2
1e6

The first line (`"Hello World"`) evaluates to a string, and the second line (`4 ** 2`) calculates $4^2 = 16$. Neither result is displayed, because Jupyter shows only the value of the last line in a cell. The third line (`1e6`), which represents 1,000,000, is the last line, so only its result appears in the output: `1000000.0`.

If you want to display all three values, use a separate `print()` statement for each one:

In [None]:
print("Hello World")
print(4 ** 2)
print(1e6)

### Markdown (Text) Cells

Markdown cells in a Jupyter Notebook allow users to add formatted text, explanations, images, equations, and more, making the notebook easier to read and understand. These cells support Markdown syntax, enabling the use of headings, lists, links, bold and italic text, and even LaTeX for mathematical expressions. Markdown cells are useful for adding context to code, documenting workflows, and creating a narrative alongside the code and its outputs, helping make notebooks more informative and presentable.

Click **[here](https://www.markdownguide.org/basic-syntax/)** to view the Basic Syntax page of the Markdown Guide.

## Assignment Statements

In Python, an assignment statement is used to assign a value to a variable using the equal sign (`=`). For example, `x = 5` assigns the value `5` to the variable `x`. Python assignment statements always use `=` for assignment, regardless of data types (integers, strings, etc.).

In contrast, R uses a different syntax for assignment. While you can use `=` in R, the preferred operator is the left arrow (`<-`). For example, `x <- 5` is the standard way to assign the value `5` to `x` in R.

Thus, the key difference is that Python uses `=` for assignment, while R conventionally uses `<-` for the same purpose.

In [None]:
## Assignment statement
x = 5

In [None]:
## Display the value of x
x

In [None]:
## Print the value of x
print(x)

## Data Types

In Python, the basic data types `int`, `float`, and `bool` (Boolean) are used to represent numbers and logical values:

- **int**: Represents whole numbers, such as `5` or `-3`.

- **float**: Represents decimal numbers, such as `3.14` or `0.001`.

- **Boolean**: Represents logical values, either `True` or `False`.

These data types have close counterparts in R:

- **int**: R has an `integer` type for whole numbers, but a number is stored as an integer only when requested, for example with the `L` suffix (`5L`). A plain `5` is stored as `numeric` (double-precision) by default.

- **float**: R's `numeric` type (stored internally as `double`) holds floating-point numbers, so R has no separate type named `float` as Python does.

- **Boolean**: Logical values in R are represented by `TRUE` or `FALSE`, which are equivalent to Python’s `True` and `False`.

In [None]:
## Assigning an integer value to the variable 'my_int'
my_int = 5

## Assigning a float value to the variable 'my_float'
my_float = 5.0

## Assigning a boolean value to the variable 'my_bool'
my_bool = True

## The type() function returns the data type of the variable
print("The data type for the variable my_int is", type(my_int), ".")
print("The data type for the variable my_float is", type(my_float), ".")
print("The data type for the variable my_bool is", type(my_bool), ".")
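Python can also convert a value from one of these types to another with the built-in `int()`, `float()`, and `bool()` functions. The short sketch below (the variable names are illustrative) shows one conversion of each kind. Note that `int()` drops the decimal part instead of rounding, so `int(3.9)` is `3`.

```python
## Convert between the basic data types with built-in functions
as_float = float(5)     ## int -> float: 5.0
as_int = int(3.9)       ## float -> int: the decimal part is dropped, giving 3
as_bool = bool(0)       ## 0 converts to False; any nonzero number converts to True

print(as_float, type(as_float))
print(as_int, type(as_int))
print(as_bool, type(as_bool))
```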

## Data Structures

In Python, lists, tuples, and dictionaries are common data structures used to store and organize data:

- **List**: A list is an ordered, mutable collection of items. Lists are created using square brackets `[]`, and you can add, remove, or modify elements. Lists can contain different data types.

  ```python
  my_list = [1, 2, 3, "apple"]
  ```
<br>

- **Tuple**: A tuple is similar to a list but is immutable, meaning its elements cannot be changed after creation. Tuples are created using parentheses `()`. They are useful when you want to ensure the data remains unchanged.

  ```python
  my_tuple = (1, 2, 3, "apple")
  ```
<br>

- **Dictionary**: A dictionary is a collection of key-value pairs in which keys must be unique. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Dictionaries are created using curly braces `{}`. They allow quick lookups of values based on their keys.

  ```python
  my_dict = {"name": "Alice", "age": 25}
  ```
<br>
Each of these data structures serves different purposes depending on the need for mutability, order, or the relationship between keys and values.

In [None]:
## Creating a list that contains both integers and a string
my_list = [1, 2, 3, "apple"]

In [None]:
## Creating a tuple that contains both integers and a string
my_tuple = (10, 20, 30, "pear")

In [None]:
## Creating a dictionary with key-value pairs: 
## name (string) and age (integer)
my_dict = {"name": "Alice", 
           "age": 25}
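The difference between mutable and immutable structures is easiest to see by trying to change each one. In this sketch, which recreates the example list and tuple from the explanation above, the list is modified in place, while the attempt to modify the tuple raises a `TypeError` that the `try`/`except` block catches and prints.

```python
## Lists are mutable: elements can be replaced and added
my_list = [1, 2, 3, "apple"]
my_list[0] = 100             ## replace the first element
my_list.append("banana")     ## add an element to the end
print(my_list)               ## [100, 2, 3, 'apple', 'banana']

## Tuples are immutable: assigning to an element raises a TypeError
my_tuple = (1, 2, 3, "apple")
try:
    my_tuple[0] = 100
except TypeError as err:
    print("Cannot modify a tuple:", err)

print(my_tuple)              ## unchanged: (1, 2, 3, 'apple')
```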

### Indexing and Slicing

You can access elements in both a list and a tuple by their index, starting from 0 for the first element. Additionally, you can use negative indexing to access elements from the end of the list or tuple, where `-1` refers to the last element.

#### Indexing

Used to extract specific values.

In [None]:
## Access the first element (index 0)
## Output: 1
print("This is the first element in the list:", my_list[0])

In [None]:
## Access the last element (index -1)
## Output: apple
print("This is the last element in the list:", my_list[-1])

In [None]:
## Access the first element (index 0)
## Output: 10
print("This is the first element in the tuple:", my_tuple[0])

In [None]:
## Access the last element (index -1)
## Output: pear
print("This is the last element in the tuple:", my_tuple[-1])

#### Slicing

Used to extract subsets.

In [None]:
## Extracts a sublist from index 1 (inclusive) 
## to index 3 (exclusive)
my_list[1:3]

In [None]:
## Extracts all elements from index 2 to the end of the list
my_list[2:]

In [None]:
## Extracts all elements from the start of the list up to 
## (but not including) index 2
my_list[:2]
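Slices also accept negative positions and an optional third value, the step, in the form `sequence[start:stop:step]`; the same rules apply to tuples. The sketch below reuses the values of `my_list`. Note that a slice whose bounds run past the end is quietly shortened, whereas indexing a single position that does not exist raises an `IndexError`.

```python
my_list = [1, 2, 3, "apple"]

## Negative positions work in slices: the last two elements
print(my_list[-2:])     ## [3, 'apple']

## A step of 2 takes every second element
print(my_list[::2])     ## [1, 3]

## A step of -1 walks backwards, returning a reversed copy
print(my_list[::-1])    ## ['apple', 3, 2, 1]

## Out-of-range slice bounds are clipped instead of raising an error
print(my_list[2:100])   ## [3, 'apple']

## Indexing a single position that does not exist does raise an error
try:
    my_list[10]
except IndexError as err:
    print("IndexError:", err)
```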

#### Dictionaries

Dictionaries store data as key-value pairs, and you access values by their keys, not by index.

In [None]:
my_dict

In [None]:
## Access the value associated with the key name
## Output: Alice
print("The value associated with the key 'name' is", my_dict["name"])  

## Access the value associated with the key age
## Output: 25
print("The value associated with the key 'age' is", my_dict["age"])
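Square brackets raise a `KeyError` when the requested key does not exist. Below is a minimal sketch of two common alternatives: the `.get()` method, which returns `None` (or a default value you supply) for a missing key, and assignment with square brackets, which updates an existing value or adds a new key. The `"city"` key and its value are hypothetical, used only for illustration.

```python
my_dict = {"name": "Alice", "age": 25}

## Square brackets raise a KeyError for a missing key
try:
    my_dict["city"]
except KeyError as err:
    print("KeyError:", err)

## .get() returns None, or the default you supply, instead of raising an error
print(my_dict.get("city"))               ## None
print(my_dict.get("city", "unknown"))    ## unknown

## Assignment updates an existing key or adds a new one
my_dict["age"] = 26
my_dict["city"] = "Boston"    ## hypothetical value for illustration
print(my_dict)                ## {'name': 'Alice', 'age': 26, 'city': 'Boston'}
```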

The `.keys()` method retrieves all the keys from the dictionary `my_dict`, allowing you to access them without their associated values. This can be useful when you need to iterate over or manipulate the keys separately. Since `.keys()` returns a view object, it dynamically reflects any changes made to the dictionary.

**Note:** A dynamic view object is a special way of looking at a dictionary's data in Python. It updates automatically whenever the dictionary changes.

In [None]:
my_dict.keys()
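To see the "dynamic" part of a view object in action, the sketch below saves the views returned by `.keys()` and `.values()`, then adds entries to the dictionary (the `"major"` and `"year"` keys and their values are hypothetical). The saved views reflect every change, whereas a list made with `list()` is a fixed snapshot.

```python
my_dict = {"name": "Alice", "age": 25}

## Save the view objects before changing the dictionary
keys_view = my_dict.keys()
values_view = my_dict.values()
print(keys_view)        ## dict_keys(['name', 'age'])

## Add a new key-value pair (hypothetical value)
my_dict["major"] = "Statistics"

## The views saved earlier now include the new entry
print(keys_view)        ## dict_keys(['name', 'age', 'major'])
print(values_view)      ## dict_values(['Alice', 25, 'Statistics'])

## list() takes a snapshot that does NOT change later
keys_snapshot = list(my_dict.keys())
my_dict["year"] = 2
print(keys_snapshot)    ## ['name', 'age', 'major']
print(list(keys_view))  ## ['name', 'age', 'major', 'year']
```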

In [None]:
## Looping through each key in the dictionary my_dict
for key in my_dict.keys():
    
    ## Print the current key
    print(key)

The `.values()` method retrieves all the values from the dictionary `my_dict`, allowing you to access them without their corresponding keys. It returns a dynamic view object that reflects any changes made to the dictionary. This is useful when you need to iterate over or manipulate the values independently.

In [None]:
my_dict.values()

In [None]:
## Looping through each value in the dictionary my_dict
for value in my_dict.values():
    
    ## Print the current value
    print(value)

The `.items()` method returns each key together with its corresponding value. Inside the `for` loop below, the `print()` function displays the key and the value together in the format

```
key : value
```

This is a useful way to see both the key and its value from a dictionary, instead of just one or the other.

In [None]:
my_dict.items()

In [None]:
## Looping through each key-value pair
## in the dictionary 'my_dict'
for key, value in my_dict.items():
    
    ## Print the current key-value pair
    print(key, ":", value)

### Other List Operations

#### Membership

In [None]:
3 in [1, 2, 3]

#### Concatenation 

In [None]:
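## This raises a TypeError: a list can only be concatenated
## with another list, not with a number
[1, 2, 3] + 2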

In [None]:
[1, 2, 3] + [4, 5, 6]

#### Repetition

In [None]:
[1, 2, 3] * 2
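Because `+` and `*` mean concatenation and repetition for lists, applying arithmetic to every element of a plain Python list requires visiting each element, for example with a list comprehension. A short sketch contrasting the two (the name `numbers` is illustrative); this per-element work is what `NumPy`'s vectorized operations handle for you.

```python
numbers = [1, 2, 3]

## * on a list repeats the list
print(numbers * 2)                ## [1, 2, 3, 1, 2, 3]

## To double each element, apply the operation to every item
print([x * 2 for x in numbers])   ## [2, 4, 6]

## The same pattern adds 2 to each element (numbers + 2 raises a TypeError)
print([x + 2 for x in numbers])   ## [3, 4, 5]
```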

## Vectorized Operations

### NumPy

#### What is `NumPy`?

- `NumPy` is a Python library used for working with arrays.

- It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices.

- `NumPy` was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

- `NumPy` stands for Numerical Python.

#### Why Use `NumPy`?

- In Python we have lists that serve the purpose of arrays, but they are slow to process.

- `NumPy` aims to provide an array object that is up to 50x faster than traditional Python lists.

- The array object in `NumPy` is called `ndarray`; it comes with many supporting functions that make working with arrays easy.

Arrays are very frequently used in data science, where speed and resources are very important.

**Source:** [W3Schools](https://www.w3schools.com/python/numpy/numpy_intro.asp)

`NumPy` is not part of Python's standard library, so it must be installed (for example, with `pip install numpy`, unless your environment already includes it) and then imported before use. While Python provides built-in data structures like lists, `NumPy` offers more specialized, efficient tools for numerical and scientific computing.

In [1]:
## Import the NumPy library and give it the alias np for 
## easier reference.
## If you don't use the alias, you would need to type numpy 
## each time you call a NumPy function.
## For example, without the alias, you'd write numpy.array()
## instead of np.array().
import numpy as np

In [None]:
one_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
one_to_ten

To create a `NumPy` array, use the `np.array()` function with a list as its argument. You can type the list directly or pass the name of a list that has already been defined, such as `one_to_ten`. Unlike Python lists, `NumPy` arrays support vectorized operations and array-specific methods for computation and data manipulation.

In [None]:
## Creating a NumPy array from a list of numbers (1 to 10)
np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [None]:
np.array(one_to_ten)
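Typing out the numbers is not the only way to build this array. As a sketch, `np.arange(start, stop)` creates evenly spaced integers directly; like Python's built-in `range()`, it excludes the stop value. The snippet also checks two attributes every array has: `.shape` (its size along each dimension) and `.dtype` (the type of its elements, which can differ by platform).

```python
import numpy as np

one_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
arr = np.array(one_to_ten)

## np.arange(start, stop) builds the same values directly;
## the stop value (11) is excluded
arr_range = np.arange(1, 11)

print(np.array_equal(arr, arr_range))   ## True
print(arr.shape)   ## (10,): a 1-dimensional array with 10 elements
print(arr.dtype)   ## the integer type NumPy chose (platform-dependent)
```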

Now we can perform arithmetic operations on every element in the array at once.

Run the code cell below. What do you notice?

In [None]:
np.array(one_to_ten) + 2

Run the code cell below. What do you notice?

In [None]:
np.array(one_to_ten) * 2
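The cells above apply one number to every element. As a further sketch (it redefines `one_to_ten` and `arr` so it runs on its own), the same element-wise behavior applies when combining two arrays of equal length and when comparing with `>`, in contrast to `*` on a plain list, which repeats the list instead.

```python
import numpy as np

one_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
arr = np.array(one_to_ten)

## Arithmetic with a single number is applied to every element
print(arr * 2)

## Compare: * on a plain list repeats it (20 elements)
print(one_to_ten * 2)

## Two arrays of the same length combine element by element
print(arr + arr)     ## same values as arr * 2
print(arr * arr)     ## element-wise product: 1, 4, 9, ..., 100

## Comparisons are vectorized too, producing an array of booleans
print(arr > 5)
```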

#### Basic `NumPy` Functions

In [None]:
## Convert the 'one_to_ten' list into a NumPy array and store it in the variable arr.
arr = np.array(one_to_ten)
print("The original array:", arr)

## Use np.square() to compute the square of each element in the array.
print("\nThe square of each element in the array:")
print(np.square(arr))

## Use np.sqrt() to compute the square root of each element in the array.
print("\nThe square root of each element in the array:")
print(np.sqrt(arr))

## Use np.log() to compute the natural logarithm of each element in the array.
print("\nThe natural log of each element in the array:")
print(np.log(arr))
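The functions above act on each element separately. An `ndarray` also has summary methods, such as `.sum()`, `.mean()`, `.min()`, and `.max()`, that reduce the whole array to a single value. A short sketch on the same values from 1 to 10:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

## Summary methods collapse the whole array to a single number
print(arr.sum())              ## 55
print(arr.mean())             ## 5.5
print(arr.min(), arr.max())   ## 1 10

## Most of these are also available as functions, e.g. np.sum(arr)
print(np.sum(arr) == arr.sum())   ## True
```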

## World Population

The `world_population.csv` dataset contains world population estimates for the years 1950 to 2024.

**Example 1.** Load the `world_population.csv` file (located in the data directory) into a 1-dimensional NumPy array using the `loadtxt()` function from the `NumPy` module.

Remember, to use functions from the `NumPy` module, we must reference the alias that was set up during the import statement.

**Note:** Replace the ellipses (`...`) with your code.

In [None]:
wp = ...

Let’s examine the first 10 observations. We can access elements in a `NumPy` array with bracket notation `[ ]`, just as we access elements in an R vector. Unlike R, however, `NumPy` indexing starts at 0, so `wp[0]` is the first element.

**Example 2.** Use a `for` loop to print the year and its corresponding population from the `wp` array.

In [None]:
## Initialize the variable 'year' with the starting value of 1950.
year = 1950 

## Loop through the first 10 indices (i.e., the first 10 years) 
## using the range() sequence.
for i in range(10):
    ## For each iteration, print the year (starting from 1950) 
    ## and the corresponding population value from 'wp' array.
    ## The "\t" adds a tab space between the year and the population
    ## value for better formatting.
    print(year + i, "\t", wp[i])

**Note:** Although `range()` looks like a function call, `range` is actually a built-in type, and calling it returns a range object: an immutable sequence type rather than a list or tuple. The general form is

```python
range(start, stop, step)
```

which produces the numbers from `start` up to, but not including, `stop`, counting by `step`. If only one argument is given, as in `range(5)`, it is treated as `stop`, with `start` defaulting to `0` and `step` to `1`. Rather than storing every number in a list, the range object produces each number on demand.

Run the cell below.

In [None]:
for i in range(5):
    print(i)

In [5]:
print("Three - that's the magic number.")
for i in range(3, 33, 3):
    print(i)

Three - that's the magic number.
3
6
9
12
15
18
21
24
27
30
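Because a range object produces its numbers on demand, printing it directly shows only something like `range(0, 5)`. Wrapping it in `list()` reveals every value. The sketch below also shows a negative step, which counts down, and that range objects support `len()`, indexing, and `in` like other sequences (the name `r` is illustrative).

```python
## Convert range objects to lists to see every value they produce
print(list(range(5)))           ## [0, 1, 2, 3, 4]
print(list(range(2, 10, 2)))    ## [2, 4, 6, 8]  (the stop value 10 is excluded)
print(list(range(10, 0, -3)))   ## [10, 7, 4, 1] (a negative step counts down)

## A range object stores only start, stop, and step, yet supports
## len(), indexing, and membership tests like other sequences
r = range(3, 33, 3)
print(len(r))    ## 10
print(r[-1])     ## 30
print(12 in r)   ## True
```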


**Example 3.** Repeat the code from **Example 2.**, but modify it to display the last 10 years in the `wp` dataset.

**Note:** Replace the ellipses (`...`) with your code.

In [None]:
...

**Example 4.** Write code to calculate the population difference between the first and last year in the dataset.

**Note:** Replace the ellipses (`...`) with your code.

In [None]:
...

`NumPy` offers many built-in functions for processing data in both single and multi-dimensional arrays.

If we want to calculate the difference in population between successive years, we might initially think to write a `for` loop like this:

```python
for i in range(len(wp) - 1):
    print(wp[i+1] - wp[i])
```
Run the code cell below to see how it works.

In [None]:
## Determine the number of years in the dataset, 
## subtracting 1 to avoid an out-of-bounds error
n = len(wp) - 1

## Loop through the years, stopping one year before the last
for i in range(n):
    
    ## Calculate the difference between consecutive 
    ## population values
    pop_diff = wp[i+1] - wp[i]
    
    ## Only print the first 10 differences
    if i < 10:
        print(pop_diff)

However, `NumPy` provides a built-in function, `np.diff()`, that efficiently calculates the differences between consecutive elements in an array and returns a new array with the results. The manual approach above only prints each difference; to keep the results, it would also need to create an empty array before the loop and store each difference in it individually.

Run the cell below.

In [None]:
np.diff(wp)
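To compare the two approaches directly, the sketch below uses a small made-up array (toy numbers, not values from `world_population.csv`). The loop version allocates a results array with `np.empty()` and fills it one difference at a time; `np.diff()` produces the same result in a single call. The names `values`, `manual_diff`, and `builtin_diff` are illustrative.

```python
import numpy as np

## Toy values for illustration only (not the real population data)
values = np.array([10.0, 13.0, 19.0, 20.0, 26.0])

## Manual approach: allocate an array for the results, then fill it in a loop
manual_diff = np.empty(len(values) - 1)
for i in range(len(values) - 1):
    manual_diff[i] = values[i + 1] - values[i]

## Built-in approach: one function call
builtin_diff = np.diff(values)

print(manual_diff)    ## [3. 6. 1. 6.]
print(builtin_diff)   ## [3. 6. 1. 6.]
print(np.array_equal(manual_diff, builtin_diff))   ## True
```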

In general, it's better to use a function or method instead of a `for` loop when available, as it is often more efficient and concise.