# Run the cell below

To run a code cell (i.e., execute the Python code inside a Jupyter notebook), click the play button that looks like ▶| on the ribbon underneath the name of the notebook, or hold down `Shift` + `Return`.

Before you begin, run the code cell below.

In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("dsc201_001_003_a5.ipynb")

## Jupyter Notebooks

This webpage is called a Jupyter notebook. A notebook is a place to write programs and view their results. We can also write and format text using Markdown.

Notebooks are made up of cells: each rectangle containing text or code is called a **cell**. There are two types of cells:

 - **code** cells that hold executable code.
 
 - **markdown** cells that hold a special kind of text that follows the Markdown syntax. To get familiar with the Markdown syntax, take a look at this [Markdown Guide](https://www.markdownguide.org/basic-syntax/). Markdown cells (like this one) can be edited by double-clicking on them. After you edit a markdown cell, click the "Run cell" button at the top that looks like ▶| or hold down `Shift` + `Return` to confirm any changes. 

   **Note:** Try not to delete the instructions of the assignment.

## Markdown Cells

Markdown cells are sometimes referred to as *text* cells. In the text cell below, enter your name, section, and the date.

**Note:** After you make changes to the text cell, don't forget to click the "Run cell" button at the top that looks like ▶| or hold down `Shift` + `Return` to view the changes.

**Name:** 

**Section:** 

**Date:**

## This Week's Assignment

In this week's assignment, you'll learn how to:

- use assignment statements.

- create a list, a tuple and a dictionary.

Let's get started!

## Edit Mode vs. Command Mode 

Jupyter Notebook has a modal user interface. This means that the keyboard does different things depending on which mode the Notebook is in. There are two modes: "Edit mode" and "Command mode". Edit mode allows you to type into the cells like a normal text editor. Command mode allows you to edit the notebook as a whole, but not type into individual cells.

<img src='images/edit-mode.png' height=200 width=500>

## The Kernel

The kernel is a program that executes the code inside your notebook and outputs the results. In the top right of your window, you can see the name `(DSC201)` and a circle that indicates the status of your kernel. If the circle is empty (⚪), the kernel is idle and ready to execute code. If the circle is filled in (⚫), the kernel is busy running some code. 

Next to every code cell, you'll see some text that says `In [...]`. Before you run the cell, you'll see `In [ ]`. When the cell is running, you'll see `In [*]`. If you see an asterisk (\*) next to a cell that doesn't go away, it's likely that the code inside the cell is taking too long to run, and it might be a good time to interrupt the kernel (discussed below). When a cell is finished running, you'll see a number inside the brackets, like so: `In [1]`. The number corresponds to the order in which you run the cells; so, the first cell you run will show a 1 when it's finished running, the second will show a 2, and so on. 

You may run into problems where your kernel is stuck for an excessive amount of time, your notebook is very slow and unresponsive, or your kernel loses its connection. If this happens, try the following steps:

1. At the top of your screen, click **Kernel**, then **Interrupt Kernel**. Try running your code again.

1. If that doesn't help, click **Kernel**, then **Restart Kernel**. If you do this, you will have to run your code cells from the start of your notebook up until where you paused your work.

## Errors

Whenever you write code, you'll make mistakes.  When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong. Errors are okay; even experienced programmers make many errors.  When you make an error, you just have to find the source of the problem, fix it, and move on.

There is an error in the next cell.  Run it and see what happens.

In [None]:
print("This line is missing something."

**Note:** In the toolbar, there is the option to click `Run > Run All Cells`, which will run all the code cells in this notebook in order. However, the notebook stops running code cells if it hits an error, like the one in the cell above.

You should see something like this (minus our annotations):

<img src="images/error.jpg" />

The last line of the error output attempts to tell you what went wrong.  The *syntax* of a language is its structure, and this `SyntaxError` tells you that you have created an illegal structure.  "`incomplete input`" means "something is missing", so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell.

**Question 1.** Fix the code below so that you can run the cell and see the intended message instead of an error.

In [None]:
print("This line is missing something."

## Variables

Variables are used to store information in Python programs. To store a value in a variable, we use an assignment statement, which has a name on the left side of an `=` sign and an expression to be evaluated on the right. Variables are particularly helpful when you need to calculate many different things in a row, and also when you need one value to help calculate another.

- Variable names can contain uppercase and lowercase letters, the digits 0-9, and underscores.

- Variable names cannot start with a digit.

- Variable names are case-sensitive.

- Using a variable that has not been assigned a value yet causes an error.
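
The rules above can be sketched in a few lines. This is only an illustration: every name below is hypothetical and does not appear elsewhere in the assignment, and the `try`/`except` statement is used only so that the cell keeps running after the error.

```python
# Hypothetical names chosen for illustration only.
hours_per_day = 24
days_in_week = 7

# One value helps calculate another.
hours_in_week = hours_per_day * days_in_week
print(hours_in_week)  # 168

# Names are case-sensitive: Total and total are two different variables.
Total = 1
total = 2
print(Total, total)   # 1 2

# Using a name that has never been assigned a value raises a NameError.
try:
    print(undefined_name)
except NameError as err:
    message = str(err)
    print("NameError:", message)
```

A name such as `2nd_place` would be rejected with a `SyntaxError` because it starts with a digit.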

## Data Types

- Integers (`int`) are numbers without decimals.

- Floats (`float`) are numbers with decimals.

- Strings (`str`) are used to store text.
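
As a quick sketch, the built-in `type()` function reports which of these data types a value has. The variable names and values below are hypothetical and are not part of the assignment.

```python
# Hypothetical example values for illustration only.
whole_number = 42        # int: no decimal point
decimal_number = 42.0    # float: has a decimal point
text = "42"              # str: text inside quotes

print(type(whole_number))    # <class 'int'>
print(type(decimal_number))  # <class 'float'>
print(type(text))            # <class 'str'>
```

Note that `42`, `42.0`, and `"42"` look similar but are three different types.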

**Question 2.** Assign a value to each variable. Be sure to use the appropriate data type based on the comments and the name of the variable.

In [None]:
# An integer
my_int = ...

# A float (decimal)
my_float = ...

# A string
my_string = ...

# Print output using the print function
print(my_int)
print(my_float)
print(my_string)

In [None]:
grader.check("q2")

**Question 3.** Set the variable `pi` to the value 3.14; the variable `r` is already set to 5. On a following line, assign to the variable `sa` a Python expression that evaluates to the surface area of a sphere with radius 5. **Do not** use the numbers `3.14` or `5` in your expression for `sa`.

**Hint**: The formula for the surface area of a sphere with radius $r$ is: 

$$\text{Surface Area} = 4 \pi r^2$$
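
In Python, the `**` operator raises a number to a power. The sketch below applies it to a different formula, the area of a circle ($\pi r^2$), with hypothetical names and values that are deliberately not the ones used in this question.

```python
# Hypothetical values, deliberately different from those in Question 3.
approx_pi = 3.14159
radius = 2

# ** is evaluated before *, so this computes approx_pi * (radius squared).
circle_area = approx_pi * radius ** 2
print(circle_area)
```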

In [None]:
pi = ...
r = 5

sa = ...
sa

In [None]:
grader.check("q3")

**Question 4.** Assign the name `seconds_in_a_decade` to the number of seconds between midnight January 1, 1990 and midnight January 1, 2000.

In [None]:
seconds_in_a_decade = ...

# The last line in this cell will print the value
# you've given to seconds_in_a_decade when you run it.  
# You don't need to change this.
seconds_in_a_decade

In [None]:
grader.check("q4")

## Data Structures

- List

- Tuple

- Dictionary 

### List

- A list is used to store multiple values in a single variable. 

- To create a list, use square brackets.

- Lists can contain elements of different types.

- List items are indexed beginning with 0.
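
A minimal sketch of these points, using a hypothetical list that is not part of the assignment:

```python
# A hypothetical list mixing a string, an integer, and a float.
mixed = ["apple", 3, 2.5]

print(mixed[0])    # apple (the first item is at index 0)
print(mixed[2])    # 2.5   (the third item is at index 2)
print(len(mixed))  # 3     (the number of items in the list)
```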

In [None]:
# List of random words
random_words = ["apple", "banana", "cherry", "doggie", "elephant", 
                "flower", "grape", "hamburger", "ice cream"]

The first item in the list has index position 0. Run the cell below to see.

In [None]:
# The first item in the random_words list with index value 0
random_words[0]

The second item in the list has index position 1. Run the cell below to see.

In [None]:
# The second item in the random_words list with index value 1
random_words[1]

## `for` Loop

A `for` loop is used for iterating over a collection of items (such as a list, a tuple, a dictionary, a set, or a string). This is less like the `for` keyword in other programming languages, and works more like an iterator method as found in other object-oriented programming languages.

With a `for` loop, we can execute a set of statements once for each item in a list, tuple, set, etc.
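
As a small sketch with hypothetical data, the same loop syntax works on a string and on a tuple, not just on a list:

```python
# Hypothetical sample data for illustration only.
letters = []
for ch in "abc":          # a string yields one character at a time
    letters.append(ch)

total = 0
for n in (10, 20, 30):    # a tuple yields one element at a time
    total = total + n

print(letters)  # ['a', 'b', 'c']
print(total)    # 60
```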

In [None]:
for i in random_words:
    print(i)

In [None]:
len(random_words)

In [None]:
range(9)

In [None]:
range(len(random_words))

In [None]:
for i in range(len(random_words)):
    print(i, random_words[i], '\t', i + 1)

Lists are a flexible data structure that allows us to store and process collections of data. However, Python lists have arithmetic limitations. Let's take a look.

In [None]:
# List of numbers from 1 to 10
one_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
one_to_ten + 4

In [None]:
one_to_ten * 2
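
The two cells above behave differently from ordinary arithmetic. The sketch below repeats both operations on a shorter, hypothetical list so the output is easy to read; the `try`/`except` statement is only there so the cell keeps running after the error.

```python
# A short hypothetical list keeps the output easy to read.
small = [1, 2, 3]

# Adding a number to a list is not defined, so Python raises a TypeError.
try:
    small + 4
except TypeError as err:
    print("TypeError:", err)

# Multiplying by an integer repeats the list; it does not double each item.
repeated = small * 2
print(repeated)  # [1, 2, 3, 1, 2, 3]
```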

## NumPy

### What is `NumPy`?

- `NumPy` is a Python library used for working with arrays.

- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

- `NumPy` was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

- `NumPy` stands for Numerical Python.

### Why Use `NumPy`?

- In Python we have lists that serve the purpose of arrays, but they are slow to process.

- `NumPy` aims to provide an array object that is up to 50x faster than traditional Python lists.

- The array object in `NumPy` is called `ndarray`. It comes with many supporting functions that make working with `ndarray` very easy.

Arrays are very frequently used in data science, where speed and resources are very important.

**Source:** [W3Schools](https://www.w3schools.com/python/numpy/numpy_intro.asp)
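
One rough way to see the speed difference on your own machine is to time the same operation on a list and on an array with the standard `timeit` module. This is only a sketch: the workload (doubling 100,000 numbers) and all variable names are assumptions made for illustration, and the measured times will vary with the machine, the data size, and the operation.

```python
import timeit

import numpy as np

# Hypothetical workload: double 100,000 numbers, repeated 10 times.
numbers = list(range(100_000))
array = np.arange(100_000)

list_seconds = timeit.timeit(lambda: [x * 2 for x in numbers], number=10)
array_seconds = timeit.timeit(lambda: array * 2, number=10)

print("list :", list_seconds, "seconds")
print("array:", array_seconds, "seconds")
```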


In [None]:
# Import NumPy using the alias np
import numpy as np

In [None]:
one_to_ten

We can create a `NumPy` array by calling the `np.array` function with a list as its argument. We can enter the list manually, or we can pass the name of a list that has already been defined.

In [None]:
np.array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [None]:
np.array(one_to_ten)

Now we can perform arithmetic operations on each item in the array.

In [None]:
np.array(one_to_ten) + 2

In [None]:
np.array(one_to_ten) * 2

In [None]:
arr = np.array(one_to_ten)

print("The square of each element in the array", arr)
print(np.square(arr))
print("\n")

print("The square root of each element in the array", arr)
print(np.sqrt(arr))
print("\n")

print("The natural log of each element in the array", arr)
print(np.log(arr))

For this part of our active demonstration, we will use world population estimates from 1951 through 2023.

**Question 5.** Load the `world_population.csv` file into a 1-dimensional `NumPy` array using the `loadtxt()` function from the `NumPy` module. 

**Note:** To use functions from the `NumPy` module, we need to use the alias we set up in our import statement.
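
As a sketch of how `loadtxt()` behaves, the example below reads three made-up numbers from an in-memory text buffer instead of a real file. The buffer, its contents, and the variable names are hypothetical; they only stand in for a file that has one number per line.

```python
import io

import numpy as np

# Hypothetical stand-in for a file with one number per line.
fake_file = io.StringIO("10.5\n20.0\n30.25\n")

values = np.loadtxt(fake_file)
print(values)
print(values.ndim)  # 1, meaning the array is 1-dimensional
```

`loadtxt()` also accepts a file name as a string in place of the buffer.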

In [None]:
world_population = ...

In [None]:
grader.check("q5")

Let's look at the first 10 observations. We can access items in a `NumPy` array the same way we access items in an `R` vector: by using bracket notation `[ ]`. Keep in mind that Python starts counting positions at 0, whereas `R` starts at 1.
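
Here is a brief sketch of bracket notation on a small, hypothetical array. In a slice such as `[0:3]`, the start position is included and the end position is excluded.

```python
import numpy as np

# Hypothetical values for illustration only.
sample = np.array([10, 20, 30, 40, 50])

print(sample[0])    # 10, the item at position 0
print(sample[0:3])  # [10 20 30], positions 0, 1, and 2
```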

In [None]:
world_population[0:10]

**Question 6.** Which years are represented in the above output?

_Type your answer here, replacing this text._

**Question 7.** Use the `np.diff` function to find the difference in population between each consecutive year. Save the result to a 1-dimensional array named `diff_world_population`.
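
To illustrate what `np.diff` computes, here is a sketch on a small, hypothetical array. Each output value is an item minus the item before it, so the result has one fewer item than the input.

```python
import numpy as np

# Hypothetical values for illustration only.
totals = np.array([1, 4, 9, 16])

changes = np.diff(totals)
print(changes)       # [3 5 7]
print(len(changes))  # 3, one fewer than the 4 input values
```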

In [None]:
diff_world_population = ...
diff_world_population[0:10]

In [None]:
grader.check("q7")

### Tuple

- A tuple is used to store multiple values in a single variable. 

- To create a tuple, use parentheses.

- Tuples can contain elements of different types.

In [None]:
tuple_1 = ("NC", "State", "Wolfpack")
tuple_2 = ("Fall", 2023)
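
Tuple items are accessed by index just like list items, but a tuple cannot be changed after it is created. The sketch below uses a hypothetical tuple named `term`; the `try`/`except` statement is only there so the cell keeps running after the error.

```python
# Hypothetical tuple for illustration only.
term = ("Fall", 2023)

print(term[0])  # Fall
print(term[1])  # 2023

# Trying to replace an item in a tuple raises a TypeError.
try:
    term[1] = 2024
except TypeError as err:
    print("TypeError:", err)
```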

### Dictionary

- A collection of key-value pairs.

- A key is used to look up values.

- A value can be a number, a string, a list, or even another dictionary.
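
As a minimal sketch, the hypothetical dictionary below maps each key (a color) to a value (a list of fruits). Placing a key inside square brackets looks up its value.

```python
# Hypothetical dictionary for illustration only.
produce = {
    "red": ["apple", "cherry"],
    "yellow": ["banana"],
}

print(produce["red"])     # ['apple', 'cherry']
print(produce["red"][1])  # cherry, the second item of that list
```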

In [None]:
# Dictionary of NFL teams in the AFC Conference. The key is the division name as a string and
# the value is a list with the name of each team in that division.
afc = {
    "East":  ["Buffalo Bills", "Miami Dolphins", "New England Patriots", "New York Jets"],
    "North": ["Baltimore Ravens", "Cincinnati Bengals", "Cleveland Browns", "Pittsburgh Steelers"],
    "South": ["Houston Texans", "Indianapolis Colts", "Jacksonville Jaguars", "Tennessee Titans"],
    "West":  ["Denver Broncos", "Kansas City Chiefs", "Las Vegas Raiders", "Los Angeles Chargers"],
}

# Dictionary of NFL teams in the NFC Conference. The key is the division name as a string and
# the value is a list with the name of each team in that division.
nfc = {
    "East":  ["Dallas Cowboys", "New York Giants", "Philadelphia Eagles", "Washington Football Team"],
    "North": ["Chicago Bears", "Detroit Lions", "Green Bay Packers", "Minnesota Vikings"],
    "South": ["Atlanta Falcons", "Carolina Panthers", "New Orleans Saints", "Tampa Bay Buccaneers"],
    "West":  ["Arizona Cardinals", "Los Angeles Rams", "San Francisco 49ers", "Seattle Seahawks"],
}

We can access all the keys with the `keys()` method and we can access all the values with the `values()` method.

In [None]:
afc.keys()

In [None]:
afc.values()

**Question 8.** Access the values from the `nfc` dictionary that correspond to the South division. Save this to a list named `nfc_south`. 

In [None]:
nfc_south = ...
nfc_south

In [None]:
grader.check("q8")

**Question 9.** Access the Carolina Panthers name from the `nfc_south` list. Save the name to `panthers`.

In [None]:
panthers = ...
panthers

In [None]:
grader.check("q9")

## SchoolHouse Rock

As a kid growing up in the 80s, one of my favorite things to do on Saturday morning was watch [SchoolHouse Rock](https://en.wikipedia.org/wiki/Schoolhouse_Rock!). My favorite character/episode was [Verb!](https://www.schoolhouserock.tv/Verb.html). 

Run the cell below.

In [None]:
from IPython.display import YouTubeVideo

# The YouTube video ID
video_id = 'IrfZCvTe-Ko'

# Embed the YouTube video
YouTubeVideo(video_id)

Run the cell below to read the lyrics.

In [None]:
# Open the verb.txt file in the data directory as read-only
with open('data/verb.txt', 'r') as file:
    
    # Read the entire content of the file into a string
    f = file.read()

# Print the contents of the file
print(f)

**Question 10.** Suppose you wanted to count the number of occurrences of each word and store that information in a data structure. What kind of data type and data structure would you consider using? Type your response in the markdown cell below. Limit your discussion to no more than a paragraph of fewer than 300 words.

**Note:** There are no wrong answers. I just want to know your thoughts about how you would solve this problem.

_Type your answer here, replacing this text._

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

When the export is done, download the .zip file by `SHIFT`-clicking on the file name and selecting **Save Link As**. Alternatively, find the .zip file on the left side of the screen, right-click it, and select **Download**. You'll submit this .zip file to Gradescope, through the assignment in Moodle, for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)