# Experimenting with Lists and NumPy
In this section, we'll do some **very basic** experimenting with NumPy.
NumPy stands for "numerical Python".  We'll learn about:
- Creating NumPy arrays
- Random distributions in NumPy
- Python topics: random distributions, list comprehensions, and types

### 1 --> Edit this cell to make the three things you are learning about into a "bullet list".  Web search for a  "markdown cheat sheet" to see how to do that.

### 2 --> Below this cell, add a new cell to ```import numpy as np```


In [1]:
import numpy as np

---
We will now use Python to create a list variable and then use it to explore list comprehensions a bit.  List comprehensions are going to be invaluable for managing and working with data.

#### 3 --> Add a new cell below this one to build the list we will use in our list comprehensions. The cell should:
* create a range from 0 to 9.
* convert that range to a list using the list() type cast function
* assign that to variable `counts`

I'm going to usually provide the answer, embedded as an HTML comment, that is invisible when you run the cell. It's okay to look at the answer and type it.  Don't get stuck - if you have no clue, go ahead and peek!

**Double click here**

<!--
counts = list(range(10))
counts
-->

In [6]:
counts = list(range(10))
counts

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#### 4 --> Use a list comprehension

Let's make a new variable `doubledcounts` that is a list of all of the values in the counts list, but doubled.

The syntax looks like:

> mynewvariable = `[`   *expression with iteratorvariable* `for` *iteratorvariable* `in` *list-like-thing* `]`

**Double click to see answer**
<!--
doubledcounts = [  2 * x    for    x    in   counts  ]
# and then to display the doubledcounts variable on the interactive console:
doubledcounts
-->

In [10]:
doubledcounts = [number*2 for number in counts]
doubledcounts

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
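
A list comprehension is shorthand for a loop that appends to a new list. The sketch below builds the doubled list both ways to show they agree; the names `doubled_comprehension` and `doubled_loop` are only illustrative.

```python
counts = list(range(10))

# The comprehension form: expression, then the loop part
doubled_comprehension = [number * 2 for number in counts]

# The same work written as an explicit loop
doubled_loop = []
for number in counts:
    doubled_loop.append(number * 2)

print(doubled_comprehension == doubled_loop)  # True
```

The comprehension is usually preferred when the loop body is a single expression, as it is here.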

#### 5 --> Conditional "if" expressions

Syntax for a conditional expression is in the code below.  

> `( True if x >= 1000 else False )`

Note that the values True and False could be any value or any other expression:

> `( x**2 if x >= 1000 else x )`

> `( "Whoa even" if x % 2 == 0 else "odd.")`

Explore it.  Try running the cell several times by clicking on the cell and then clicking the *run* button.
    

In [34]:
# A conditional expression always needs its else part

import random
x = random.randint(0, 2000)
isBigNum =  ( True if x >= 1000 else False )
if isBigNum:
    print("Whoa, it's big!")
else:
    print("It's not THAT big.")

It's not THAT big.
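
As a small sketch of the point that the two branches can be any values, the hypothetical helper `describe` below (not part of the notebook) uses two conditional expressions that produce strings. It also shows that a comparison such as `x >= 1000` is already a boolean, so wrapping it in `True if ... else False` is optional.

```python
def describe(x):
    # Each conditional expression picks one of two values
    size = "big" if x >= 1000 else "small"
    parity = "even" if x % 2 == 0 else "odd"
    return f"{x} is {size} and {parity}"

print(describe(1500))  # 1500 is big and even
print(describe(7))     # 7 is small and odd

# The comparison on its own already evaluates to True or False
is_big = 1500 >= 1000
print(is_big)  # True
```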


#### 6 --> Conditional "if expressions" in list comprehensions

Now use a **conditional** expression in a list comprehension to make a 
list of **boolean flags** - a vector of True / False values - to indicate which numbers are evenly divisible by 3 in the original counts.

Use the mod operator % in python to test for divisibility by 3.

Your result should look like this:

`[True, False, False, True, False, False, True, False, False, True]`

**Double Click for answer**
<!--
[ (True if x % 3 == 0 else False) for x in counts ]
-->

In [33]:
# A conditional expression still needs its else here; only a filter (an if after the loop part) can use if without else

boolin = [(True if number%3 == 0 else False) for number in counts]
boolin

[True, False, False, True, False, False, True, False, False, True]

What Python types are there?  What is the type of `True`?

In [26]:
type(True)

bool
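
To explore the question about types a little further, here is a short sketch that asks `type()` about a handful of sample values. The values in the list are arbitrary examples chosen for illustration.

```python
values = [True, 7, 7.0, "seven", [7], (7,), None]

# Print each value next to the name of its type
for value in values:
    print(repr(value), "->", type(value).__name__)

# bool is a subclass of int, so True behaves like 1 in arithmetic
print(isinstance(True, int))  # True
print(True + True)            # 2
```

That last property is why you can count matches by summing a list of boolean flags.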

### List comprehension filters

You can also use a list comprehension to filter values, by putting an `if` condition after the loop part.  Unlike a conditional expression, this `if` takes no `else`.

> mynewvariable = `[` *expression with iter_variable* `for` *iter_variable* `in` *list-like-thing*  `if` *condition* `]`

Here's an example to get a list of perfect squares:

```
import math
[ x for x in range(0,100)  if math.sqrt(x) - int(math.sqrt(x)) < 0.01 ]
```
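
The tolerance test above works for numbers below 100, but it is fragile for larger ranges. A possible alternative, assuming Python 3.8 or later where `math.isqrt` is available, is an exact integer check. This is a sketch, not the notebook's original method.

```python
import math

# Exact integer test: x is a perfect square when isqrt(x) squared gives x back
squares = [x for x in range(0, 100) if math.isqrt(x) ** 2 == x]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# The 0.01 tolerance test lets a non-square through once numbers get larger
root = math.sqrt(2501)
print(root - int(root) < 0.01)         # True, although 2501 is not a perfect square
print(math.isqrt(2501) ** 2 == 2501)   # False
```
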
#### 7 --> Use a conditional list comprehension to get a new list of all numbers in the counts list that are evenly divisible by 3. 

**Double click for answer**
<!--
[ x for x in counts if x % 3 == 0]
-->


In [32]:
comprehensive = [x for x in counts if x%3 == 0]
comprehensive

[0, 3, 6, 9]
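
The two kinds of `if` give differently shaped results: a conditional expression yields one flag per input, while a filter drops the inputs that fail the test. A minimal sketch of the difference, with `flags` and `kept` as illustrative names:

```python
counts = list(range(10))

# Conditional test as the expression: one boolean per element
flags = [x % 3 == 0 for x in counts]

# Filter after the loop part: only the matching elements survive
kept = [x for x in counts if x % 3 == 0]

print(len(flags), len(kept))  # 10 4

# Pairing each value with its flag recovers the filtered list
print([x for x, keep in zip(counts, flags) if keep])  # [0, 3, 6, 9]
```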

---
# Now for Numpy

Be sure and click the save button.

Now let's NumPy.

Remember we imported numpy as np, which is sort of the standard shortcut name used for numpy.

A Python list can hold anything, even a mix of types.  NumPy has no "list" per se; instead it lets you define and use an **array**, and an **array holds things that are all the same type**.

Lists are heterogeneous.
Arrays are homogeneous (in most languages, and in NumPy).

#### 8 --> Use the np.zeros function in the numpy package to get an array of ten zeros.  

You don't need to assign it to a value.

**Double click here for answer**
<!--
np.zeros(10)
-->

In [41]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

By default, NumPy makes arrays of floats.  You can also have int and bool.

#### 9 --> Try adding a *named parameter* `dtype='int'` to get a 1d array of ten ints.

Note how it will look a little different.  No decimal place.

**Double click here for answer**
<!--
np.zeros(10, dtype="int")
-->

In [51]:
np.zeros(10, dtype = 'int')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

#### 10 --> Use (3,4)  - a tuple - to make a 3 by 4 array of ones. 

The first parameter can be just a number (10 in the cells above), which gives the size of a single dimension array.  But you can also put in a tuple for a shape.  For a 2d array, the tuple defines the number of rows and columns.

The reason why NumPy is used so heavily in Machine Learning and data science is that the operations on homogeneous arrays of numbers can be made very fast.

Use ones, instead of zeros.

**Double click here for answer**
<!--
np.ones( (3,4), dtype='int')
-->

In [55]:
np.ones((3,4), dtype = 'int')

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])
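
Part of why array operations can be fast is that arithmetic applies to the whole array at once, with no Python-level loop. The sketch below redoes the earlier doubling exercise both ways; it only checks that the results agree and does not measure speed.

```python
import numpy as np

counts = list(range(10))

# Pure Python: the comprehension visits one element at a time
doubled_list = [2 * x for x in counts]

# NumPy: one expression applies the multiplication to every element
counts_array = np.array(counts)
doubled_array = 2 * counts_array

print(doubled_array)
print(doubled_array.tolist() == doubled_list)  # True
```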

#### 11 -->  Make a three dimensional 2,3,4 array of booleans with the zeros function.


How do you think you might get a three dimensional array of True values?

A **matrix** in math is a 2d array.
The math equivalent of a multidimensional array is a **tensor**.

**Double Click Here for answer**
<!--
np.zeros( (2,3,4),dtype="bool")
-->

In [57]:
np.zeros((2,3,4), dtype = 'bool')

array([[[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]]])
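
For the question about a three dimensional array of True values, here are a few possible approaches, sketched with illustrative variable names. `np.full` fills an array of a given shape with one value.

```python
import numpy as np

# ones with a bool dtype: every 1 becomes True
all_true_ones = np.ones((2, 3, 4), dtype=bool)

# full lets you name the fill value directly
all_true_full = np.full((2, 3, 4), True)

# ~ flips every boolean, so all-False becomes all-True
all_true_flipped = ~np.zeros((2, 3, 4), dtype=bool)

print(all_true_ones.shape, all_true_ones.dtype)  # (2, 3, 4) bool
print(all_true_ones.all())                       # True
```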

---
# Standard NumPy Array Builders

### Standard Arrays 
```
np.zeros(  shape,  dtype=int/float/bool )
np.ones(  shape,  dtype=int/float/bool )
np.arange(  start, uptobutexclude,  step )
np.linspace( start, stop, numberofvalues )
np.eye(  numberofrows  )
np.array(  some python lists  )
np.random.normal(  mean, stddev,  numberofvalues )
np.random.random(  numberofvalues  )
```

There are other ways to make NumPy arrays.  Try them out below.


## The arange function

What it does:
* Build an array starting at some value, and go up to but don't include a last value, by a step that defaults to 1.
* The type comes from the type of the parameters.

Example:
> np.arange(0, 20, 4)
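
As a quick sketch of how the result type follows the parameters, the example below compares an all-integer call with one that uses a float step. The variable names are illustrative.

```python
import numpy as np

# All-integer parameters give an integer array
int_steps = np.arange(0, 20, 4)
print(int_steps.tolist())    # [0, 4, 8, 12, 16]
print(int_steps.dtype.kind)  # i  (an integer type)

# A float step gives a float array; the stop value 2 is still excluded
float_steps = np.arange(0, 2, 0.5)
print(float_steps.tolist())  # [0.0, 0.5, 1.0, 1.5]
print(float_steps.dtype)     # float64
```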

#### 12 --> Try it.
* Build an array that starts at 0, goes up to (but does not include) 1.0, and goes by 0.05.


**Double Click Here for answer**
<!--
np.arange(0, 1, 0.05)
-->


In [69]:
#good for building axes on plots

np.arange(0, 1.0, 0.05)


array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])

## The linspace function

What it does:
* Build an array of a certain number of values evenly spaced out between a start and stop value.
* The start value will be the first value, and the stop value will be the last value.

Example to build 5 values that start at 0 and end at 2:
> np.linspace( 0, 2, 5)

#### 13 --> Try it - build an array of 12 values that go from 0 up to 2*pi.  You can get pi with np.pi

**Double Click Here for answer**
<!--
np.linspace( 0, 2*np.pi, 12)
-->



In [66]:
np.linspace(0,2*np.pi,12)


array([0.        , 0.57119866, 1.14239733, 1.71359599, 2.28479466,
       2.85599332, 3.42719199, 3.99839065, 4.56958931, 5.14078798,
       5.71198664, 6.28318531])
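
The main difference from `arange` is that `linspace` counts values and includes the stop value, while `arange` counts steps and excludes it. A small sketch of that contrast, using the `endpoint` parameter of `np.linspace`:

```python
import numpy as np

# Five values, with both 0 and 2 included
five = np.linspace(0, 2, 5)
print(five.tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]

# endpoint=False leaves out the stop value, much like arange does
half_open = np.linspace(0, 2, 4, endpoint=False)
print(half_open.tolist())  # [0.0, 0.5, 1.0, 1.5]
print(np.allclose(half_open, np.arange(0, 2, 0.5)))  # True
```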

## Identity matrix - np.eye()
Super important.  Super simple.  An identity matrix is always square (the same number of rows and columns), so you just tell it how many rows / columns.

#### 14 --> Try it - make a 3x3 identity matrix.

**Double Click Here for answer**
<!--
np.eye(3)
-->



In [68]:
np.eye(3,3, dtype = 'int')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
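
What makes the identity matrix important is that multiplying by it leaves a matrix unchanged, the way multiplying a number by 1 does. A minimal check, using an arbitrary 3x3 example matrix and the `@` matrix multiplication operator:

```python
import numpy as np

identity = np.eye(3)

# An arbitrary 3x3 matrix holding the values 0..8
matrix = np.arange(9).reshape(3, 3)

# Multiplying on either side gives the original matrix back
print(np.array_equal(identity @ matrix, matrix))  # True
print(np.array_equal(matrix @ identity, matrix))  # True
```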

## Casting Python list to numpy array

You cast, or convert, a Python list to a NumPy array with a call that looks like other casts in Python:

> `np.array( some python list )`

#### 15 --> Try it.  You still have `counts` in your IPython session - make it into a NumPy array.

What happens if you try to make an array out of lists that have uneven dimensionality?

What happens if you try to make an array out of a non-homogeneous list?

```
[ [1,2,3], [4,5] ]

[ True, 4, 5.0 ]
# This will make something different:
[ True, 4, 5.0, "SIX" ]

```
**Double Click Here for answer**
<!--
np.array(counts)

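# With mixed scalar types, NumPy does the best it can and picks one common type.
# A ragged list only becomes an array of "object" if you ask for it;
# recent NumPy versions raise a ValueError without dtype=object.
np.array([ [1,2,3],  4, 5.0, True ], dtype=object)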
-->


In [71]:
np.array(counts)


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
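
To see how NumPy keeps arrays homogeneous, you can inspect the `dtype` it picks for mixed input. The sketch below uses the lists from the questions above; passing `dtype=object` for the ragged case is an assumption made so the code also runs on recent NumPy versions, which refuse ragged input otherwise.

```python
import numpy as np

# bool, int, and float are all converted to one common numeric type
mixed_numbers = np.array([True, 4, 5.0])
print(mixed_numbers.dtype)     # float64
print(mixed_numbers.tolist())  # [1.0, 4.0, 5.0]

# Adding a string turns every element into text
mixed_with_text = np.array([True, 4, 5.0, "SIX"])
print(mixed_with_text.dtype.kind)  # U  (unicode strings)

# Uneven rows: a 1d array holding two Python lists
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(ragged.shape)  # (2,)
```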

## Numpy Random

The normal distribution is critical for a huge number of applications.  The NumPy random module can create an array of numbers sampled from many different distributions, including the normal distribution:

> `np.random.normal(  mean,  std-deviation,  shape )`

Example:  Normal distribution mean 0, std dev 10, 5 values:
```
    samples = np.random.normal(0,10, 5)
    samples
```

#### 16 --> In a cell - with markdown above it, show the IQs of 10 people.

IQs have mean 100, and standard deviation 15.

BONUS!  Sample 1000 people. How many people have above average IQ?

How many people have IQs below 85?  These people are quite mentally challenged, by the way.  And though IQ is a very loose and very crude measure of "brain power," that set of people are going to find it challenging to, for example, understand how congressional districts are drawn.  When that number gets to 75, that's the set of people that are likely going to need assistance getting through most tasks in life.  That's just interesting.
 
Look at all the wonderful things the random module can do!

> [Random sampling (numpy.random) — NumPy v1.15 Manual](https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.random.html)

**Double Click Here for answer**
<!--
print(np.random.normal( 100, 15, 10 ))
# or how about:
print([f"{d:.1f}" for d in np.random.normal( 100, 15, 10 )])
iqlimit = 85
count = sum( [1 for x in np.random.normal(100,15,1000) if x <= iqlimit] )
print(f"Of 1000 people, {count} have an IQ of {iqlimit} or less. ")
-->


[129.60805737  85.95426811 135.19354687 124.13327793 120.9530737
 101.58126873  90.22778947  84.85182628 116.38843833 109.42854479]
['98.6', '89.5', '90.8', '84.7', '86.6', '128.9', '99.3', '91.2', '75.4', '111.1']


Of 1000 people, 153 have an IQ of 85 or less. 
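
One possible way to answer the BONUS questions without a list comprehension is to compare the whole array at once. A comparison on an array gives an array of booleans, and summing it counts the True values. This is a sketch with illustrative names; the counts change on every run because no seed is set.

```python
import numpy as np

# Sample 1000 IQs with mean 100 and standard deviation 15
iqs = np.random.normal(100, 15, 1000)

# Each comparison yields 1000 booleans; sum counts the True ones
above_average = (iqs > 100).sum()
below_85 = np.count_nonzero(iqs < 85)

print(f"Of 1000 people, {above_average} have an above average IQ.")
print(f"Of 1000 people, {below_85} have an IQ below 85.")
```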


### The Random Seed
Random numbers aren't truly random. They come from a built-in "random number generator."  One important property of most testing and analysis is that you need to control what happens, so that if you run a piece of code, it will do **exactly** the same thing the next time you run it.

Random number generators use a mathematical function to pull a number from what might as well be a very very long list of numbers that meet certain properties of "randomness".  

The *seed* tells the generator where to start in that list.  So if you *seed* a random number generator with a fixed value, you'll always get the *same* result. So it's not random at all.  But the result will still have all the properties of randomness other than the fact that it is the same every time.

You use   `np.random.seed(somevalue)` to seed NumPy's random generator.

* You only do this ONCE in your code.
* You only do it in the testing / development of your code.
* If you want to reset it to being somewhat random again, you can call `np.random.seed()` with no parameter.

#### 17 --> Seed the random generator with 10 and generate 10 normal IQ numbers.

The last one should be 97.38099684 every time.

**Double Click Here for answer**
<!--
###
np.random.seed(10)
np.random.normal( 100, 15, 10)
-->






array([119.97379756, 110.72918462,  76.81899562,  99.87424225,
       109.32003961,  89.19871659, 103.98267379, 101.62822789,
       100.06437146,  97.38099684])
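
To check the claim that seeding makes results repeatable, the sketch below seeds twice with the same value and compares the samples. The last lines use `np.random.default_rng`, a newer generator interface that is assumed to be available (NumPy 1.17 or later); it produces different numbers than `np.random.seed` for the same seed value.

```python
import numpy as np

# Same seed, same call: identical samples
np.random.seed(10)
first = np.random.normal(100, 15, 10)
np.random.seed(10)
second = np.random.normal(100, 15, 10)
print(np.array_equal(first, second))  # True

# Without reseeding, the generator moves on to new numbers
third = np.random.normal(100, 15, 10)
print(np.array_equal(second, third))  # False

# Newer style: a generator object that carries its own seed
rng = np.random.default_rng(10)
rng_samples = rng.normal(100, 15, 10)
print(rng_samples.shape)  # (10,)
```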

## Random integers

Syntax:

> `np.random.randint(  UpToButExcludingValue,  size=(ArrayDimensions) )`

#### 18 --> Random int play

* Use the `size` named parameter and a tuple to initialize a 2,3 array of random numbers between 0 and 9 using the `np.random.randint` function.
* Assign that to variable x3.
* Print it.

**Double Click Here for answer**
<!--
x3 = np.random.randint(10, size=(2,3)) #two dimension
print(x3)
-->


[[1 8 4]
 [1 3 6]]


In [19]:
# And then show the dimensions of your array.
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim: 2
x3 shape: (2, 3)
x3 size:  6
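
One detail worth checking is the upper bound: `np.random.randint` excludes it, while Python's own `random.randint` (used in the conditional expression cell earlier) includes it. A small sketch that draws many values with each and looks at the range; the sample size of 1000 is arbitrary.

```python
import random
import numpy as np

# NumPy: randint(10) draws from 0..9, so 10 never appears
np_draws = np.random.randint(10, size=1000)
print(np_draws.min() >= 0, np_draws.max() <= 9)  # True True

# Standard library: randint(0, 9) includes both 0 and 9
py_draws = [random.randint(0, 9) for _ in range(1000)]
print(min(py_draws) >= 0, max(py_draws) <= 9)  # True True

# Both a low and a high bound can be given explicitly in NumPy
x3 = np.random.randint(0, 10, size=(2, 3))
print(x3.shape)  # (2, 3)
```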
