# Lecture 5



In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

---

# 1. Arrays

Arrays are ordered "lists" of elements that can be directly accessed by location.

In [None]:
my_array = make_array(1, 2, 3, 4)
my_array

In [None]:
my_array ** 2

In [None]:
my_array+make_array(1, 2, 3, 4)

In [None]:
my_array+make_array(1, 2, 3)

In [None]:
my_array*make_array(2)
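
The four cells above follow NumPy's element-wise rules. Here is a minimal sketch of those rules, assuming `np.array` behaves like `make_array` for these inputs. The variable names are illustrative and not part of the lecture.

```python
import numpy as np

# np.array stands in for make_array here (assumption: same behavior for these inputs)
four = np.array([1, 2, 3, 4])

# Same length: elements are combined position by position.
same_length_sum = four + np.array([1, 2, 3, 4])

# Length-1 array: its single value is reused for every position.
stretched_product = four * np.array([2])

# Lengths 4 and 3: NumPy cannot line the elements up and raises ValueError.
try:
    four + np.array([1, 2, 3])
    mismatch_error = None
except ValueError as error:
    mismatch_error = error

print(same_length_sum, stretched_product)
print("Mismatch:", mismatch_error)
```

Catching the error is only for demonstration. In a notebook you would normally let the error message appear and then fix the array lengths.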

## Arrays of Different Data Types

**Exercise:** Arrays can be any type. Make an array of `Strings` called `string_array`:

<details><summary>Solution</summary>
   
```python
string_array = make_array("cat", "dog", "bird")
string_array
```
    
</details></br></br>


**Exercise:** Mixing types (Strings, Numbers, Booleans).  Make an array of multiple types:

<details><summary>Solution</summary>
   
```python
weird_array = make_array("cat", 3, True)
weird_array
```
    
</details></br></br>

What is the type of `weird_array`?

In [None]:
weird_array = make_array("cat", 3, True)
weird_array
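
One way to inspect what happened to the mixed values is to look at the array's `dtype`. This is a small sketch that assumes `np.array` coerces this input the same way `make_array` does. The name `mixed` is illustrative.

```python
import numpy as np

# np.array stands in for make_array (assumption: same coercion for this input)
mixed = np.array(["cat", 3, True])

print(mixed)                  # every element is displayed as a string
print(mixed.dtype.kind)       # 'U' marks a Unicode string dtype
print(type(mixed.item(1)))    # the former number 3 is now a str
```

An array holds a single element type, so NumPy converts the number and the Boolean to strings rather than keeping three different types.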

</br></br>
### Ranges

We use ranges to make arrays of number sequences easily.  The NumPy `np.arange(start, stop, step)` function produces an array starting at `start` and ending *before* `stop`, in increments of `step`.
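
A few illustrative calls, chosen so they do not give away the exercises below. The variable names are made up for this sketch.

```python
import numpy as np

evens = np.arange(2, 11, 2)       # start at 2, stop before 11, step by 2
first_five = np.arange(5)         # one argument: start defaults to 0, step to 1
countdown = np.arange(3, 0, -1)   # a negative step counts down, still excluding stop

print(evens)
print(first_five)
print(countdown)
```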

**Exercise:** Make an array of the numbers 0 through 6:

Can we write it shorter?

**Exercise:** What will the following produce:

```python
np.arange(40, -1, -5) 
```

</br></br></br>

## Accessing Elements

For this exercise, let's start with this array of strings.

In [None]:
string_array = make_array("cat", "dog", "bird")
string_array

You can use `array_name.item( NUMBER )` to get an element from an array.

**Exercise:** What will the following expression return?

```python
string_array.item(1)
```

**Bonus!** This is called **array indexing**.  There is a shorter "equivalent" syntax that people will often use. However, for this class you only need to know about `.item()`.

```python
string_array[ INDEX ]
```

**Negative indexing.** A negative index counts back from the end of the array, so `-1` refers to the last element.

```python
string_array[ -1 ]
```

In [None]:
string_array[ -1 ]
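
The word "equivalent" is in quotes above because the two forms return the same value but not quite the same kind of object. Here is a small sketch with a numeric array, where the difference is easier to see. The array `numbers` is made up for illustration.

```python
import numpy as np

numbers = np.array([10, 20, 30])

first_item = numbers.item(0)    # .item() returns a plain Python int
first_index = numbers[0]        # [] returns a NumPy integer scalar
last_index = numbers[-1]        # -1 counts back from the end

print(first_item, type(first_item))
print(first_index, type(first_index))
print(last_index)
```

For the arithmetic in this class the two behave the same, which is why knowing `.item()` is enough.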

**Exercise:** Use the `len` function to determine the length of the string array.

<details><summary>Solution</summary>
   
```python
len(string_array)
```
    
</details></br></br>

Arrays also have a **member variable** `array_name.size` that contains the size of the array.  

**Exercise:** Use the size **member variable** to check the size of the array:

<details><summary>Solution</summary>
   
```python
string_array.size
```
    
</details></br></br>

</br></br></br>

## Aggregation Operations

You will often need to compute summaries of an array like the `sum`, `max`, or the `min`.  These are all **member functions** of an array.  Here is the documentation on all the **[member functions](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)** for arrays.

In [None]:
cool_numbers = make_array(0, 1, 42, np.pi, np.e)
cool_numbers

**Exercise:** Use the `sum`, `min`, `mean`, and `max` operations to summarize the cool numbers array.

<details><summary>Solution</summary>
   
```python
print("sum", cool_numbers.sum())
print("min", cool_numbers.min())
print("mean", cool_numbers.mean())
print("max", cool_numbers.max())
```
    
</details></br></br>

You can also use NumPy's built-in library of math functions on arrays.  Here we compute the mean (in two ways) and the `log`:

In [None]:
print("np.average", np.average(cool_numbers))
print("np.mean", np.mean(cool_numbers))
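
The member function and the NumPy functions should agree. A quick sketch that checks this, assuming `np.array` can stand in for `make_array` (the result names are illustrative):

```python
import numpy as np

cool_numbers = np.array([0, 1, 42, np.pi, np.e])

method_mean = cool_numbers.mean()      # member function on the array
function_mean = np.mean(cool_numbers)  # NumPy function applied to the array
average = np.average(cool_numbers)     # with no weights, the same as the mean

print(method_mean, function_mean, average)
```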

In [None]:
print("np.log", np.log(cool_numbers))
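
Because `cool_numbers` contains `0`, the cell above is likely to print a warning along with its result. The sketch below shows the same computation with that warning silenced through `np.errstate`, which is an addition of this note and not part of the lecture.

```python
import numpy as np

cool_numbers = np.array([0, 1, 42, np.pi, np.e])

# errstate silences the divide-by-zero warning inside this block only
with np.errstate(divide="ignore"):
    logs = np.log(cool_numbers)

print(logs)   # the first entry is -inf because log(0) is undefined
```

`np.log` is applied to each element separately, so one problematic element does not stop the rest from being computed.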

</br></br></br>

## Doing math with arrays

You can do mathematical operations on arrays:

In [None]:
a = make_array(1, 2, 3, 4)
b = make_array(10, 20, 30, 40)
print("The a array:", a)
print("The b array:", b)

**Exercise:** Add and multiply the arrays: 

<details><summary>Solution</summary>
   
```python
print("Adding Arrays", a + b)
print("Multiplying Arrays", a * b)
```
    
</details></br></br>

You can also add a scalar to an array or multiply an array by a scalar.
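
A minimal sketch of scalar arithmetic, using `np.array` in place of `make_array` (assumed equivalent here). The names `shifted` and `doubled` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

shifted = a + 10   # adds 10 to every element
doubled = a * 2    # multiplies every element by 2

print("Adding a scalar:", shifted)
print("Multiplying by a scalar:", doubled)
```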

</br></br></br>

## Common Bugs

**Exercise:** What happens if we run the following:

```python
bigger_array = make_array(1,2,3,4,5)
a * bigger_array
```

In [None]:
bigger_array = make_array(1,2,3,4,5)
a * bigger_array

**Exercise:** What happens if I run the following:

```python
uhoh = make_array(0,1,2,3)
a / uhoh
```

In [None]:
# uhoh = make_array(0,1,2,3)
# a / uhoh
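
Dividing by an array that contains a zero does not stop the computation. The sketch below uses `np.array` in place of `make_array` and silences the warning with `np.errstate`. Both choices are assumptions of this note, made so the example runs quietly.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
uhoh = np.array([0, 1, 2, 3])

# Without errstate, NumPy prints a RuntimeWarning but still returns a result
with np.errstate(divide="ignore"):
    quotient = a / uhoh

print(quotient)   # the first entry is inf because 1 / 0 has no finite value
```

An `inf` in a result is easy to miss, and it propagates into later sums and means, so it is worth checking for zeros before dividing.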

</br></br></br></br>

---

# Tables are Made of Arrays

We are covering arrays because an array is what we get back when we extract a single column from a table. Here we load a table of NBA salaries from a local file `nba_salaries.csv`.

In [None]:
nba = Table.read_table('nba_salaries.csv')
nba

Let's focus on the **Golden State Warriors**.

**Exercise:** Use the `my_table.where` function to select the rows where team is the `"Golden State Warriors"`.

In [None]:
warriors = nba.where('team', "Golden State Warriors")

<details><summary>Solution</summary>
   
```python
warriors = nba.where("team", "Golden State Warriors")
warriors
```
    
</details></br></br>

We can also select columns by name. 

**Exercise**: Make a table with just the `"name"` and `"salary"` columns. 


In [None]:
warriors.select(['name','salary'])

<details><summary>Solution</summary>
   
```python

warriors.select("name", "salary")
```
    
</details></br></br>

**Exercise:** Compute the average salary of the Warriors.  Which of the following works?

*Option (A):*
```python
warriors.mean()
```

*Option (B):*
```python
warriors.select("salary").mean()
```

*Option (C):*
```python
warriors.column("salary").mean()
```



In [None]:
warriors.select("salary").mean()

**Exercise:** Would the following work?

```python
np.average(warriors.select("salary"))
```

What about?

```python
np.average(warriors.column("salary"))
```

In [None]:
np.average(warriors.column("salary"))

Why?

**Exercise:** Use `np.average` to compute the average salary of the Warriors:

<details><summary>Solution</summary>
   
```python
np.average(warriors.column("salary"))
```
    
</details></br></br>

**Exercise:** Compute the difference in the average salaries of the Warriors and the `"Los Angeles Lakers"`.

<details><summary>Solution</summary>
   
```python
lakers = nba.where('team', 'Los Angeles Lakers')
warriors.column('salary').mean() - lakers.column('salary').mean()
```
    
</details></br></br>

## Creating a Table from Arrays

Let's start with an array of street names.

In [None]:
streets = make_array('Bancroft', 'Durant', 'Channing', 'Haste')
streets

We can make an empty table (no rows, no columns, no problems ...).

The `Table()` function makes an empty table.

In [None]:
empty_table = Table()
empty_table

**Exercise:** Check that the empty table has 0 rows and 0 columns

<details><summary>Solution</summary>
   
```python
print("Rows:", empty_table.num_rows)
print("Cols:", empty_table.num_columns)
```
    
</details></br></br>

**Exercise:** Use the `table.with_column` function to add a column to the table and save the new table as `southside`.

In [None]:
southside = empty_table.with_column(<>,<> )
southside

<details><summary>Solution</summary>
   
```python
southside = empty_table.with_column("Streets", streets)
southside
```
    
</details></br></br>

**Exercise:** Extend the southside table to include the blocks from campus (use `np.arange`). ([map](https://goo.gl/maps/7QcNgpRC52NHbM6o7))

Bancroft, Durant, Channing, and Haste streets are respectively 0, 1, 2, and 3 blocks away from the campus.

**Hint:** Use `with_column`.

<details><summary>Solution</summary>
   
```python
southside = southside.with_column('Blocks from campus', np.arange(4))
southside
```
    
</details></br></br>

**Exercise:** Build the entire table with blocks from campus in one call to the **`table.with_columns`** function.

Note that the **with_columns** function is different from the **with_column** function: it accepts several label and values pairs in a single call.

In [None]:
help(Table.with_columns)

<details><summary>Solution</summary>
   
```python
Table().with_columns(
    'Streets', streets,
    'Blocks from campus', np.arange(4)
)
```
    
</details></br></br>

# Case Study: Understanding the [W. E. B. Du Bois](https://en.wikipedia.org/wiki/W._E._B._Du_Bois) Visualization

![Picture from Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg/167px-W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg)

**From Wikipedia:**  *William Edward Burghardt Du Bois (/djuːˈbɔɪs/ dew-BOYSS; February 23, 1868 – August 27, 1963) was an American sociologist, socialist, historian, and Pan-Africanist civil rights activist. Born in Great Barrington, Massachusetts, Du Bois grew up in a relatively tolerant and integrated community. After completing graduate work at the University of Berlin and Harvard University, where he was the first African American to earn a doctorate, he became a professor of history, sociology, and economics at Atlanta University. Du Bois was one of the founders of the National Association for the Advancement of Colored People (NAACP) in 1909.*

For more context on the visualization in lecture, check out [Du Bois’ Data Portraits Tell A Story About Black Life In Georgia And Beyond](https://www.wabe.org/du-bois-data-portraits/).



In [None]:
du_bois = Table.read_table('du_bois.csv')
du_bois

**Exercise:** Compute the actual amount of money spent on food and add it to the table as `"FOOD $"`:

**Hint:** Use the `with_columns` and `sort` functions.

<details><summary>Solution</summary>
   
```python
du_bois = du_bois.with_columns(
    "FOOD $", du_bois.column('ACTUAL AVERAGE') * du_bois.column('FOOD'))
du_bois
```
    
</details></br></br>

</br></br></br>


**Exercise:** Use the table functions we learned this week to find the income bracket ("class") that spent the most money on rent.

<details><summary>Solution</summary>
   
```python

du_bois = (
    du_bois
        .with_columns("RENT $", 
            du_bois.column("RENT") * du_bois.column("ACTUAL AVERAGE"))
        .sort("RENT $", descending = True)
)

```
    
</details></br></br>
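
Both Du Bois exercises rest on the same array arithmetic: multiply an income array by a share array, element by element. Here is a NumPy-only sketch with hypothetical numbers, not values from `du_bois.csv`. It assumes the share is stored as a fraction of income, as the solutions above imply. `np.argmax` is used here as a stand-in for sorting the table.

```python
import numpy as np

# Hypothetical numbers for illustration; not taken from du_bois.csv
average_income = np.array([100.0, 200.0, 400.0])
rent_share = np.array([0.20, 0.25, 0.10])    # assumed: fraction of income spent on rent

rent_dollars = average_income * rent_share   # element-wise product, one value per row
top_row = int(np.argmax(rent_dollars))       # position of the largest rent amount

print(rent_dollars)
print("Row with the highest rent:", top_row)
```

In this made-up data the highest income does not pay the most rent, which is why the dollar amounts have to be computed before the rows are compared.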

In [None]:
du_bois.select("ACTUAL AVERAGE", "RENT $", "FOOD $").iscatter("ACTUAL AVERAGE", s=12)

In [None]:
?Table.iscatter