# NumPy

The `numpy` package (short for _"numerical Python"_) provides an efficient interface for working with arrays of values in Python. As you complete this notebook, I **strongly suggest** that you read [this chapter](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html) in the _Python Data Science Handbook_, which describes the package in detail (you will need to read it to identify the appropriate syntax). The NumPy array is at the heart of many data science packages and operations, so understanding how to compute values efficiently with this data type is key to doing data science projects.

In this notebook, make sure to complete all code sections and answer all questions, such as this one:

**Question**: Why might we use NumPy arrays instead of the built-in `array` module in Python?
We use NumPy arrays instead of the built-in `array` module because, while both store fixed-type data efficiently, NumPy arrays also provide efficient operations on that data. The advantage becomes more pronounced as the arrays grow larger.
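
One practical difference can be sketched as follows, assuming only the standard-library `array` module and NumPy (the variable names are illustrative). Multiplying a built-in array repeats its contents, while multiplying a NumPy array scales each element.

```python
import array

import numpy as np

# Built-in typed array: compact storage, but `*` means sequence repetition.
builtin_arr = array.array("i", [1, 2, 3])
repeated = builtin_arr * 2

# NumPy array: `*` is applied to every element.
numpy_arr = np.array([1, 2, 3])
scaled = numpy_arr * 2

print(repeated.tolist())  # [1, 2, 3, 1, 2, 3]
print(scaled.tolist())    # [2, 4, 6]
```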


## Creating Arrays
In this section, you'll create arrays using various numpy methods. See [this section](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html#Creating-Arrays-from-Python-Lists) for reference.

In [7]:
# Import numpy as the conventional "np"
import numpy as np

In [8]:
# Create a NumPy array with 5 integers in it.
array = np.array([1,2,3,4,5])
# Other ways to create arrays: np.linspace, np.arange, np.random

In [9]:
# Create an array of 5 zeros, specifying the data type as "int"
zeroarr = np.zeros(5, dtype=int)

In [10]:
# Create an array of all odd integers between 1 and 50.
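# np.arange already returns a NumPy array, so no list comprehension is needed
odd = np.arange(1, 51, 2)

# tolist() is used only to print the values on a single line
print(odd.tolist())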

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49]


In [11]:
# Create an array of 101 values that are evenly spaced between 0 and 10 (i.e., 0, .1, .2....., 9.9, 10)
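# np.linspace takes the number of points and includes the endpoint
arr2 = np.linspace(0, 10, 101)

# tolist() is used only to print the values as a plain list
print(arr2.tolist())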

[0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0, 1.1, 1.2000000000000002, 1.3, 1.4000000000000001, 1.5, 1.6, 1.7000000000000002, 1.8, 1.9000000000000001, 2.0, 2.1, 2.2, 2.3000000000000003, 2.4000000000000004, 2.5, 2.6, 2.7, 2.8000000000000003, 2.9000000000000004, 3.0, 3.1, 3.2, 3.3000000000000003, 3.4000000000000004, 3.5, 3.6, 3.7, 3.8000000000000003, 3.9000000000000004, 4.0, 4.1000000000000005, 4.2, 4.3, 4.4, 4.5, 4.6000000000000005, 4.7, 4.800000000000001, 4.9, 5.0, 5.1000000000000005, 5.2, 5.300000000000001, 5.4, 5.5, 5.6000000000000005, 5.7, 5.800000000000001, 5.9, 6.0, 6.1000000000000005, 6.2, 6.300000000000001, 6.4, 6.5, 6.6000000000000005, 6.7, 6.800000000000001, 6.9, 7.0, 7.1000000000000005, 7.2, 7.300000000000001, 7.4, 7.5, 7.6000000000000005, 7.7, 7.800000000000001, 7.9, 8.0, 8.1, 8.200000000000001, 8.3, 8.4, 8.5, 8.6, 8.700000000000001, 8.8, 8.9, 9.0, 9.1, 9.200000000000001, 9.3, 9.4, 9.5, 9.600000000000001, 9.700000000000
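
For reference, here is a small sketch of how `np.arange` and `np.linspace` treat the end of a range. It uses a coarser grid than the exercise so that every value is exact in floating point. `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the endpoint by default.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)     # step of 0.25, stop value excluded
by_count = np.linspace(0, 1, 5)     # 5 points, endpoint included

print(by_step.tolist())    # [0.0, 0.25, 0.5, 0.75]
print(by_count.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

When the number of points is what matters, as with the 101 values above, `np.linspace` avoids having to pad the stop value.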

In [12]:
# Create an array of 10 normally distributed values with a mean of 100 and a standard deviation of 5.
mu, sigma = 100, 5 # mean and standard deviation
s = np.random.normal(mu, sigma, 10)
print(s)

[ 98.36247882  98.5444068  102.6875241  103.9036032  113.73417147
 102.68200029 101.53802619 107.24819681  97.5204911  100.03613634]
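
The values above change on every run. As a hedged sketch, one way to make such draws repeatable is NumPy's `default_rng` generator with a fixed seed. The seed value 42 and the variable names are arbitrary choices for illustration, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary
samples = rng.normal(loc=100, scale=5, size=10)

# Re-creating the generator with the same seed reproduces the same draws.
repeat = np.random.default_rng(seed=42).normal(loc=100, scale=5, size=10)

print(samples.shape)                     # (10,)
print(np.array_equal(samples, repeat))   # True
```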


**Question**: How many data types can one NumPy array contain? How many can a list contain?

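A NumPy array can contain only one data type: every element shares the array's `dtype`, and mixed inputs are upcast to a common type where possible. A Python list can contain any number of data types, because each item is a full Python object that carries its own type information.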
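
A short sketch of this difference, using made-up values: the list keeps each element's own type, while the array converts everything to a single `dtype`.

```python
import numpy as np

mixed_list = [1, 2.5, "three"]          # a list keeps each element's own type
list_types = [type(v).__name__ for v in mixed_list]
print(list_types)                       # ['int', 'float', 'str']

ints_and_floats = np.array([1, 2.5])    # the integer is upcast to float64
print(ints_and_floats.dtype)            # float64

with_string = np.array([1, "three"])    # everything becomes a Unicode string
print(with_string.dtype.kind)           # U
```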

## Accessing and Slicing Arrays

**Question**: What number does the indexing for a NumPy array start at?

0

In [7]:
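# Using your array of random values, access the element in the 3rd position in two ways:
# 1) using positive indices (indexing starts at 0, so the 3rd element is at index 2)
print(s[2])
# 2) using negative indices (for the 10-element array `s`, index -8 is the same element)
print(s[-8])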

**Question**: What is array slicing? Are all three arguments (start, stop, step) required to use it? 

Slicing selects a subarray with the notation `a[start:stop:step]`, and it extends Python's basic concept of slicing to N dimensions. Not all three arguments are required: an omitted `start` defaults to the beginning of the array, an omitted `stop` to the end, and an omitted `step` to 1. Here are some of the ways it can be used:
a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # the whole array (for a NumPy array this is a view, not a copy)
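
The forms above, plus the optional step, can be tried on a small array. This is a minimal sketch; the array `a` is just an example.

```python
import numpy as np

a = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]

print(a[2:5].tolist())     # [2, 3, 4]
print(a[7:].tolist())      # [7, 8, 9]
print(a[:3].tolist())      # [0, 1, 2]
print(a[1:8:3].tolist())   # [1, 4, 7]
print(a[::-1].tolist())    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```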


In [8]:
# Create and store a one dimensional array with the integers 0-9. 
# Use a built in functionality of NumPy instead of specifying each number individually.
x = np.arange(10)

print(x)

In [9]:
# Use slicing to print out a subarray of the previous array with every 3rd element in it.
print(x[::3])

**Question**: Subarrays are "no-copy" views of an original array. If you want a slice that you can modify without the modification of the slice changing an original array, what can you do?

Slices taken with the notation `x[start:stop:step]` are views, so modifying a slice also modifies the original array. To get a slice that can be modified independently, make an explicit copy with the `copy()` method, for example `x_sub = x[:3].copy()`. Changes to `x_sub` then leave `x` untouched.
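
A minimal sketch of the difference between a view and a copy, using an example array `x` (the other variable names are illustrative):

```python
import numpy as np

x = np.arange(10)

view = x[:3]                  # a view: shares memory with x
view[0] = 99
changed_value = x[0]
print(changed_value)          # 99, the original changed

x = np.arange(10)
independent = x[:3].copy()    # an explicit copy owns its own data
independent[0] = 99
print(x[0])                   # 0, the original is untouched
```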



## Computation on NumPy Arrays

**Question**: Can you use basic arithmetic operators on NumPy arrays?

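Yes. Arithmetic operators such as `+`, `-`, `*`, `/`, and `**` work on NumPy arrays and are applied element by element.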
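
As a brief sketch with example values, each operator acts on every element, and `+` gives the same result as the function `np.add`:

```python
import numpy as np

values = np.array([5, 3, 2, 7, 3])

print((values + 1).tolist())    # [6, 4, 3, 8, 4]
print((values * 2).tolist())    # [10, 6, 4, 14, 6]
print((values ** 2).tolist())   # [25, 9, 4, 49, 9]

# The operator and the function form produce the same array.
same = np.array_equal(values + 1, np.add(values, 1))
print(same)                     # True
```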

In [14]:
# Create an array of 5 values, then add the scalar "1" to the array
arr = np.array([5, 3, 2, 7, 3])
arr+=1

print(arr)

[6 4 3 8 4]


In [16]:
# Take the __natural log__ (a logarithm to the base e, which is about 2.718)
# of all values in any one-dimensional array.
arr_log = np.log(arr)
arr_log

array([1.79175947, 1.38629436, 1.09861229, 2.07944154, 1.38629436])

In [19]:
# Using the built-in methods for a numpy array, compute the min, max, standard deviation, mean, and variance
# of your array
print(np.min(arr))
print(np.max(arr))
print(np.std(arr))
print(np.mean(arr))
print(np.var(arr))

3
8
1.7888543819998317
5.0
3.2
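
Note that `np.std` and `np.var` divide by N by default (the population formulas). If the sample versions are wanted instead, the `ddof` argument controls the divisor. A small sketch using the same five values as above:

```python
import numpy as np

arr = np.array([6, 4, 3, 8, 4])

population_var = np.var(arr)        # ddof=0 (default): divide by N
sample_var = np.var(arr, ddof=1)    # ddof=1: divide by N - 1
sample_std = np.std(arr, ddof=1)

print(population_var)   # 3.2
print(sample_var)       # 4.0
print(sample_std)       # 2.0
```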


## Comparison operators

**Question**: How are the letter-based `and` and `or` operators and the `|` and `&` operators different in Python?

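The keywords `and` and `or` evaluate the truth or falsehood of each operand as a whole object, so they cannot be applied to multi-element arrays. The `&` and `|` operators are bitwise operators that work on the bits within each object, and on NumPy boolean arrays they are applied element by element. For element-wise comparisons on arrays, use `&` and `|`.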
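
A minimal sketch of the difference, using a few made-up grades: `&` combines two boolean arrays element by element, while `and` raises an error because the truth value of a whole array is ambiguous.

```python
import numpy as np

grades = np.array([2.9, 3.2, 3.4, 3.8])

# `&` combines the two boolean arrays element by element.
in_range = (grades > 3.0) & (grades < 3.5)
print(in_range.tolist())   # [False, True, True, False]

# `and` asks for the truth value of a whole array, which NumPy refuses to guess.
try:
    (grades > 3.0) and (grades < 3.5)
    raised = False
except ValueError:
    raised = True
print(raised)              # True
```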

In [41]:
# Make a one dimensional numpy array with 20 random values between 0.0 and 4.0 to represent student grades in a course.
# Then, round the values so that there is only one decimal place

# Note: np.linspace produces 20 evenly spaced (not random) values between 0.0 and 4.0
array = np.linspace(0.0,4.0,20)
array = np.round(array,1)
array

array([0. , 0.2, 0.4, 0.6, 0.8, 1.1, 1.3, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5,
       2.7, 2.9, 3.2, 3.4, 3.6, 3.8, 4. ])
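
The values above are evenly spaced rather than random. As a hedged sketch, random grades in the same range could be drawn from a uniform distribution. The generator, the seed, and the name `random_grades` are assumptions for illustration, and the cells that follow still use the evenly spaced array.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # seed chosen arbitrarily for repeatability
random_grades = np.round(rng.uniform(0.0, 4.0, size=20), 1)

print(random_grades.shape)             # (20,)
print(random_grades.min() >= 0.0, random_grades.max() <= 4.0)
```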

In [42]:
# How many of the students' grades are not between a 3.0 and 3.8?
# Hint: use a comparison operator to return true/false values, and then sum() them
count = (array < 3.0) | (array > 3.8)

countval = sum(count)

countval


16
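
The same count can be written as the negation of "between 3.0 and 3.8" using the `~` operator. This sketch rebuilds the grades array under the name `grades` and counts with `np.count_nonzero`, which is one alternative to the built-in `sum`.

```python
import numpy as np

grades = np.round(np.linspace(0.0, 4.0, 20), 1)

outside = (grades < 3.0) | (grades > 3.8)
also_outside = ~((grades >= 3.0) & (grades <= 3.8))   # negated "inside" mask

print(np.array_equal(outside, also_outside))   # True
print(np.count_nonzero(outside))               # 16
```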

In [43]:
# How many of the students got a grade between 3.0 and 3.5?

count1 = (array > 3.0) & (array < 3.5)
count1val = sum(count1)
count1val

2

In [44]:
# Did any students get (exactly) a 3.8?
count2 = (array == 3.8)

count2val = sum(count2)
count2val


1
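
For a yes/no question like this one, `np.any` returns a single boolean instead of a count. Because exact equality on floating-point values can be fragile, a tolerance-based check with `np.isclose` is a common alternative. A sketch, with the grades rebuilt under the name `grades`:

```python
import numpy as np

grades = np.round(np.linspace(0.0, 4.0, 20), 1)

exact_match = np.any(grades == 3.8)               # exact floating-point equality
close_match = np.any(np.isclose(grades, 3.8))     # equality within a small tolerance
close_count = np.count_nonzero(np.isclose(grades, 3.8))

print(exact_match)
print(close_match)   # True
print(close_count)   # 1
```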

In [51]:
# Using a boolean array as a mask, select all of the grades that are between 3.0 and 3.5.
count1 = (array > 3.0) & (array < 3.5)
print(array[count1])


[3.2 3.4]
