## Data lesson 2

Today we will learn about more advanced data structures: lists, dictionaries, and arrays.

Go ahead and import `numpy` (as `np`) and `math` for today's lesson.

In [None]:
# Add import statements

#### **Finishing up lesson 1**

You can use the built-in `help()` function to get more information about a module or a specific function.

In [None]:
help(math)

You should see a lot of information about the `math` module, and also a link to the online documentation.

If you are done looking at the help information for now, but would like to keep it handy for later, you can click the vertical blue bar to the left of the output to collapse the output.  If you click it a second time, it will expand again.

*Go back up to your `help(math)` cell and collapse the output.*

You can also use `help()` on an individual function.

*Try out the example below.  What does `log10()` do?*

In [None]:
help(np.log10)

It is convenient to use exponential notation for expressing very large or very small numbers.  To do so, use `eX` to indicate "times 10^X".  

In [None]:
bignumber = 1.75e10
bignumber

In [None]:
smallnumber = 1.75e-10
smallnumber
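
As a quick check of the notation, the sketch below compares two `e`-notation literals with the same values written out as powers of ten. The variable names are hypothetical, and `math.isclose` is used for the comparison because floating-point results can differ by tiny rounding errors.

```python
import math

# 1.75e10 means 1.75 times 10 to the power 10
big_example = 1.75e10
big_written_out = 1.75 * 10**10

# 1.75e-10 means 1.75 times 10 to the power -10
small_example = 1.75e-10
small_written_out = 1.75 * 10**-10

print(math.isclose(big_example, big_written_out))
print(math.isclose(small_example, small_written_out))
```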

#### **Lists**

Lists are one way of storing multiple values in a single variable. They are defined using brackets `[]` with individual **elements** separated by commas.

In [None]:
masses = [1.01, 4.00, 6.94, 9.01, 10.81, 14.01]
masses

Lists are very flexible: a single list can hold multiple data types, including other lists.

In [None]:
mix_list = ["Berkeley", masses, 6.022e23, "gobears"]
mix_list

You can fetch a particular element using its **index**, which indicates its position in the list.

The syntax for this is `list_name[index]`

In [None]:
masses[1]

In [None]:
mix_list[2]

You may have noticed that index 1 gave us the second element in `masses`, and index 2 gave us the third element in `mix_list`.

This is because **index numbering starts with 0!**

(It's like how the 2nd floor of Latimer is the same level as the 1st floor of Lewis.  Lewis uses 0-indexing and Latimer uses 1-indexing)
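
Here is a small sketch of zero-based indexing, using a hypothetical list called `floors` that is not part of the lesson data.

```python
# Hypothetical list used only for this illustration
floors = ["ground", "first", "second", "third"]

first_item = floors[0]   # index 0 is the first element
second_item = floors[1]  # index 1 is the second element
last_item = floors[3]    # a list of 4 elements has indices 0 through 3

print(first_item, second_item, last_item)
```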

*Fetch the element* `"gobears"` *from* `mix_list`

In [None]:
# Add code here

Keep in mind that ordering is very important for lists.  Lists remember the order of elements they contain, and you use this position to access elements individually

You can retrieve multiple elements from a list simultaneously using list **slicing**.  

The syntax for this is `[start:stop]`, where `start` is included and `stop` is excluded.

If you ask for elements `[1:4]`, you will get the elements with indices 1, 2, and 3.

In [None]:
masses[2:5]

In [None]:
masses[0:3]

It is also valid to do:

* `[:stop]` : provides all elements up until the stop index (not inclusive)

* `[start:]` : provides all elements from the start index (inclusive) through the end of the list
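
The three slicing forms can be compared side by side. This sketch uses a hypothetical list called `letters` so that the exercises below are left for you to solve.

```python
# Hypothetical list used only for this illustration
letters = ["a", "b", "c", "d", "e", "f"]

middle = letters[1:4]     # indices 1, 2, and 3
up_to_stop = letters[:3]  # indices 0, 1, and 2
from_start = letters[3:]  # index 3 through the end of the list

print(middle)      # ['b', 'c', 'd']
print(up_to_stop)  # ['a', 'b', 'c']
print(from_start)  # ['d', 'e', 'f']
```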

*Retrieve the atomic masses for all elements above helium (4)*

In [None]:
# Add code here

*Retrieve the atomic masses for all elements below nitrogen (14)*

In [None]:
# Add code here

We can measure the length of a list using the built-in function `len`

In [None]:
len(masses)

Lists also have functions associated with them for performing common tasks.

These functions which are specifically attached to lists are called list **methods**.

One very useful one is `append()` which is used to add a new element to the end of a list

Carbon is missing from our original list of masses. Let's add it in now.

In [None]:
masses.append(12.01)
masses

Now our list is out of order, which could be a problem for us.  We can use `sort()` to put it in order

In [None]:
masses.sort()
masses

You can also delete a specific element with `remove()`. Note that you pass it the value you want to delete, not the index.
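
As a rough illustration, `remove()` searches the list for a value and deletes the first match. The list `colors` below is hypothetical. If the value is not in the list at all, Python raises a `ValueError`.

```python
# Hypothetical list used only for this illustration
colors = ["red", "green", "blue", "green"]

colors.remove("green")  # removes only the first "green"
print(colors)           # ['red', 'blue', 'green']
```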

*Remove H from the list*

In [None]:
# Add code here

You can find other list methods here: https://www.w3schools.com/python/python_lists_methods.asp

For our purposes, it is important to know that:
* Lists are a basic container that we can use to store values
* They are flexible: they can store different data types and can change sizes
* We can access individual elements of a list using indexing
* `append` allows us to add new elements to a list.  This will become useful when we encounter `for` loops!

*Create a list containing the names of at least 4 different fruits*

*Print the element corresponding to your favorite fruit using its list index*

*Sort your list of fruit alphabetically*

*Remove the element corresponding to your least favorite fruit*

#### **Dictionaries**

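Dictionaries are also data structures that allow us to store multiple values.

Unlike lists, they are not indexed by position. (Dictionaries have preserved insertion order since Python 3.7, but you still look up values by key rather than by index.)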

We keep track of different values by associating them with keys; these duos are called **key:value pairs**

Dictionaries are useful when we want to access stored values with a key instead of using positional order

In [None]:
mass_dict = {'H':1.01, 'He':4.00, 'Li':6.94, 'Be':9.01, 'B':10.81, 'C':14.01, 'N':14.01, 'O':16.00, 'F':19.00, 'Ne':20.18}
mass_dict

We can see that each value is explicitly linked to its key.

Similar to how we retrieve an element in a list using its index, we retrieve a value from a dictionary using its key

In [None]:
mass_dict['O']

We can also overwrite values and add new key:value pairs

In [None]:
mass_dict['C'] = 12.01
mass_dict

In [None]:
mass_dict['Bk'] = 247
mass_dict

If you don't know the keys ahead of time, there is a method for that

In [None]:
mass_dict.keys()
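
The sketch below shows two common ways to inspect the keys of a dictionary, using a hypothetical dictionary called `glassware` with made-up counts. `keys()` returns a view of the keys, and the `in` operator checks whether a particular key is present.

```python
# Hypothetical dictionary used only for this illustration
glassware = {"flasks": 12, "beakers": 8, "pipettes": 30}

# keys() returns a view of the keys; list() turns it into a regular list
names = list(glassware.keys())
print(names)  # ['flasks', 'beakers', 'pipettes']

# The `in` operator checks whether a key is present
has_flasks = "flasks" in glassware
has_funnels = "funnels" in glassware
print(has_flasks, has_funnels)  # True False
```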

While keys are often strings, they can also be numbers.

Values can be any data type, including lists.

In [None]:
atomic_numbers = {1: "H", 2: "He", 3: "Li", 4: "Be"}
atomic_numbers

In [None]:
periodic_table = {'group_one' : ['H', 'Li', 'Na', 'K', 'Rb', 'Cs', 'Fr'], 'group_eight':['He', 'Ne', 'Ar', 'Kr', 'Xe', 'Rn']}
periodic_table
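
When a value is itself a list, you can chain a key lookup with a list index. The dictionary `lab_sections` and the names in it are hypothetical and serve only to show the syntax.

```python
# Hypothetical dictionary used only for this illustration
lab_sections = {"monday": ["Ana", "Ben", "Cy"], "friday": ["Dee", "Eli"]}

monday_students = lab_sections["monday"]      # the whole list stored under "monday"
second_on_monday = lab_sections["monday"][1]  # index 1 within that list

print(monday_students)   # ['Ana', 'Ben', 'Cy']
print(second_on_monday)  # Ben
```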

*Use the dictionary above to find the element in row 4 and group 8 of the periodic table*

In [None]:
# Add code here

*Create a new dictionary containing element names as keys and their atomic symbols as values.*

In [None]:
# Add code here

#### **Arrays**

Arrays are (biased opinion) my favorite data structure!

They can store multidimensional data and are in general excellent for scientific calculations.

Array operations are performed using `numpy`, which we have already met.

Creating a 1-dimensional array is similar to making a list.  Note though that each array can only hold elements with a single data type.

In [None]:
data = np.array([0.5,1.7,0.6,0.2,0.4])
data

You can also make an array using a list that is already defined

In [None]:
int_list = [0,1,2,3,4]
int_array = np.array(int_list)
int_array

We can also use functions to generate arrays.  Here are two examples that generate sequences.

In [None]:
# Arguments are start, stop (included), number of elements
a1 = np.linspace(0,10,11)
a1

In [None]:
# Arguments are start, stop (excluded), step size
a2 = np.arange(0,11,1)
a2
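
The main difference between the two functions is how they treat the stop value. The sketch below, with hypothetical variable names, builds a similar sequence both ways: `np.linspace` includes the stop value by default, while `np.arange` stops before it.

```python
import numpy as np

# linspace: start, stop, number of points (stop is included by default)
by_count = np.linspace(0, 1, 5)

# arange: start, stop, step size (stop is excluded)
by_step = np.arange(0, 1, 0.25)

print(by_count)  # 5 values: 0, 0.25, 0.5, 0.75, 1
print(by_step)   # 4 values: 0, 0.25, 0.5, 0.75
```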

*Create an array of numbers from 17 to 19 with a step size of 0.2*

In [None]:
# Add code here

The reason that arrays are so popular is that they allow you to propagate operations through all elements in the array

In [None]:
a2 * np.pi

We can also do element-wise operations using multiple arrays: each element is combined with the element at the same position in the other array.

In [None]:
a1 + a2

In [None]:
a1 * a2

Imagine doing this with a list.  You would have to step through each element individually and do the addition 11 times, once per element.

While you could use a `for` loop (as we will learn about later), it is still much more efficient to use array operations.
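
To make the contrast concrete, here is a short sketch with hypothetical variable names. Note that `+` does not add lists element by element; it joins them end to end.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]

# With lists, + joins the two lists end to end
joined = list_a + list_b

# With arrays, + adds matching elements
summed = np.array(list_a) + np.array(list_b)

print(joined)  # [1, 2, 3, 10, 20, 30]
print(summed)  # [11 22 33]
```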

As in matrix math, arrays must have compatible shapes for array operations to work. The cell below raises an error because `a3` has 5 elements and `a1` has 11.

In [None]:
a3 = np.ones(5)
print(a3)

a3 + a1
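
One way to diagnose this kind of error is to compare the shapes of the two arrays before combining them. The sketch below uses hypothetical variable names and a `try`/`except` block (not yet covered in this lesson) only so that the example keeps running after the failed addition.

```python
import numpy as np

short = np.ones(5)
longer = np.linspace(0, 10, 11)
print(short.shape, longer.shape)  # (5,) (11,)

# Adding arrays of different lengths raises a ValueError
try:
    short + longer
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# Arrays with matching shapes add element by element
matching = np.zeros(5)
total = short + matching
print(total)
```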

*Create a new array using `linspace()` that you can add to `a3`.*

In [None]:
# Add code here

Arrays can also be indexed or sliced similar to lists

In [None]:
# select a single element
a1[3]

In [None]:
# select an interval 
a1[2:5]


NumPy has numerous functions that operate on arrays.  Here is a list of useful ones to be familiar with.

 - `np.sum(a)` computes the sum of all values in the array
 - `np.min(a)` finds the minimum value in the array
 - `np.argmin(a)` finds the position (index) of the minimum value in the array
 - `np.max(a)` finds the maximum value in the array
 - `np.argmax(a)` finds the position (index) of the maximum value in the array
 - `np.unique(a)` returns the unique elements of the array, sorted
 - `np.sort(a)` returns a sorted copy of the array, from the minimum to the maximum value
 - `np.mean(a)` and `np.std(a)` compute the mean and standard deviation of the array values
 - `np.median(a)` computes the median value of the array
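
The functions listed above can be tried on a small array. The array `readings` below is hypothetical and contains made-up numbers chosen so the results are easy to verify by hand.

```python
import numpy as np

# Hypothetical measurements used only for this illustration
readings = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

print(np.sum(readings))     # 14.0
print(np.min(readings))     # 1.0
print(np.argmin(readings))  # 1 (index of the first occurrence of the minimum)
print(np.argmax(readings))  # 4
print(np.unique(readings))  # [1. 3. 4. 5.]
print(np.sort(readings))    # [1. 1. 3. 4. 5.] (a sorted copy, smallest first)
print(np.mean(readings))    # 2.8
print(np.median(readings))  # 3.0
```

`np.sort` leaves `readings` itself unchanged, unlike the list method `sort()`, which reorders the list in place.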

*Create a new array containing your guess for the peak temperature on each of the past 4 days.*

In [None]:
# Add code here

*Print the average, minimum, and maximum temperature from your array*

In [None]:
# Add code here

Arrays can also have multiple dimensions.

In [None]:
array2d = np.array([a1, a1*a2, a1+a2])
array2d

Arrays have **attributes** that describe their structure.

*What does each of the following tell you?*

In [None]:
print(array2d.ndim)
print(array2d.shape)
print(array2d.size)

There are two valid syntaxes for indexing in a two-dimensional array.  

In both cases, the first index refers to the row, and the second index to the column.

The first method is usually preferred for efficiency

In [None]:
# Brackets + comma
array2d[1,5]

In [None]:
# Two sets of brackets
array2d[1][5]

You can select an entire row or an entire column using a colon `:`

In [None]:
# all columns of row 1
array2d[1,:]

In [None]:
# all rows of column 1
array2d[:,1]
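
The indexing rules are easier to check on a small array. The sketch below uses a hypothetical 2 x 3 array called `grid`, not the `array2d` from this lesson.

```python
import numpy as np

# Hypothetical 2 x 3 array used only for this illustration
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

single = grid[1, 2]    # row 1, column 2
same = grid[1][2]      # same element, selected in two steps
row_zero = grid[0, :]  # all columns of row 0
col_one = grid[:, 1]   # all rows of column 1

print(single, same)  # 6 6
print(row_zero)      # [1 2 3]
print(col_one)       # [2 5]
```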

*Retrieve the element in column 4 and row 2 of array2d*

In [None]:
# Add code here

*Find the position of the maximum value in array2d*

In [None]:
# Add code here

*Find the sum of all elements in array2d*

In [None]:
# Add code here

#### **Other data types**

We will not go into detail here, but be aware that other data types exist.

* Booleans: used for Boolean logic (True/False)
* Tuples: like a list but immutable
* Sets: like a list but unordered, and each element can only appear once
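
For reference, here is a minimal sketch of what each of these types looks like. The variable names and values are hypothetical.

```python
is_noble_gas = True                # Boolean: either True or False
coordinates = (1.0, 2.5)           # tuple: elements cannot be reassigned
unique_symbols = {"H", "He", "H"}  # set: the duplicate "H" is dropped

print(type(is_noble_gas))   # <class 'bool'>
print(coordinates[0])       # 1.0
print(len(unique_symbols))  # 2
```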