# Numpy Lesson Plan and Hacks
> The outline for our lesson plan and hacks
- toc: true
- categories: []
- type: ap
- week: 29

## Lesson Plan

- The lesson covers NumPy and how to use it to perform mathematical operations and analyze data, including:
    - How to create arrays
    - How to manipulate arrays
- There will also be multiple "popcorn" hacks throughout the lesson

# 1. Intro to NumPy and its features

NumPy is the fundamental package for **scientific computing** in Python. It **provides multidimensional array objects, performs mathematical operations on them efficiently, and makes data analysis much easier**. These features make NumPy especially useful for data analysis, because it can apply a mathematical operation to an entire array at once and can read data directly from files.

If you don't already have NumPy installed, you can install it with ```conda install numpy``` or ```pip install numpy```.

Once that is complete, to import numpy in your code, all you must do is:

In [3]:
import numpy as np

# 2. Using NumPy to create arrays
The array is the central **data structure** of the NumPy library. Arrays are **containers** that store multiple items (normally all of the same type) at once. The ```np.array``` function creates an array from a Python list, and passing nested lists creates a multidimensional array.

Shown below is how to create a 1D array:

In [4]:
a = np.array([1, 2, 3])
print(a) 
# this creates a 1D array

[1 2 3]
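The 1D example extends to more dimensions by nesting lists. As a small sketch (the name `m` is just for illustration), here is a 2D array along with the `ndim`, `shape`, and `dtype` attributes NumPy uses to describe it:

```python
import numpy as np

# Each inner list becomes one row, so all rows must have the same length
m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.ndim)   # number of dimensions -> 2
print(m.shape)  # (rows, columns) -> (2, 3)
print(m.dtype)  # element type NumPy inferred (an integer type; its exact size depends on the platform)
```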


How could you create a 3D array based on knowing how to make a 1D array?

In [5]:
# create 3D array here

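Arrays can also be rearranged into a different shape. A 2D array prints as rows and columns, but the ```reshape``` method returns the same elements arranged into a new shape, as long as the total number of elements stays the same (a 3-by-3 array has 9 elements, so it can become 1 row of 9).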

In [6]:
c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(c.reshape(1, 9)) # rearranges the 3x3 array into 1 row of 9 columns (still 2D, hence the double brackets)

[[1 2 3 4 5 6 7 8 9]]
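`reshape` accepts any shape whose sizes multiply to the original element count, and passing `-1` for one dimension asks NumPy to work that size out itself. A minimal sketch (the names `flat` and `column` are only illustrative):

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

flat = c.reshape(-1)      # -1 lets NumPy infer the length: a true 1D array of 9 items
column = c.reshape(9, 1)  # 9 rows, 1 column
print(flat)               # -> [1 2 3 4 5 6 7 8 9]
print(column.shape)       # -> (9, 1)

try:
    c.reshape(2, 5)       # 2 * 5 = 10 elements, but c only has 9, so this fails
except ValueError as err:
    print("cannot reshape:", err)
```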


We can also use slicing to select specific rows and columns of an array, which is useful when we only want to analyze part of the data.

In [7]:
print(c[1:, :2])
# 1: selects from row index 1 (the second row, since counting starts at 0) through the last row
# :2 selects column indices 0 and 1 (the first two columns)

[[4 5]
 [7 8]]
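The same `[rows, columns]` syntax can pick out a single element, a whole row, a whole column, or every other item. A short sketch using the same 3x3 array:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(c[0, 2])      # single element at row 0, column 2 -> 3
print(c[2])         # all of row 2 -> [7 8 9]
print(c[:, 1])      # column 1 from every row -> [2 5 8]
print(c[::2, ::2])  # a step of 2 keeps rows 0 and 2, columns 0 and 2 -> [[1 3] [7 9]]
```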


# 3. Basic array operations

The most basic operations you can perform on arrays are **arithmetic operations**. With NumPy, **you can add, subtract, multiply, and divide** arrays just as you would regular numbers. NumPy applies these operations element-wise, meaning it performs the operation on each element separately: combining an array with a single number applies that number to every element, and combining two arrays of the same shape pairs up the elements at the same position. Because this work runs in NumPy's compiled code rather than in a Python loop, it stays fast even on large amounts of data.

In [119]:
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
a = a * 2  # multiplies every element by 2
z = a/4    # divides every element by 4; "/" always produces floats

print(a)
print(z)

[ 2  4  6  8 10 12]
[0.5 1.  1.5 2.  2.5 3. ]
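Other Python arithmetic operators also work element-wise. A quick sketch with the original six-element array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

print(a ** 2)  # squares each element -> [ 1  4  9 16 25 36]
print(a // 4)  # floor division keeps integers -> [0 0 0 1 1 1]
print(a % 2)   # remainder of each element divided by 2 -> [1 0 1 0 1 0]
```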


In [129]:
b = np.array([1, 2, 3])
c = np.array([4, 5, 6])
print(b + c) # adds the elements at the same position: 1+4, 2+5, 3+6
print(b - c) # subtracts element by element
print(b * c) # multiplies element by element
print(b / c) # divides element by element (the result is a float array)


[5 7 9]
[-3 -3 -3]
[ 4 10 18]
[0.25 0.4  0.5 ]
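The examples above pair up arrays of the same length. NumPy's broadcasting rules also let arrays of different but compatible shapes combine, such as adding a 1D row to every row of a 2D array. A minimal sketch (the names `grid` and `row` are illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])             # shape (3,)

# row is reused for every row of grid, so each row gets 10, 20, 30 added
total = grid + row
print(total)  # -> [[11 22 33] [14 25 36]]

# A length-2 array cannot line up with 3 columns, so NumPy raises an error
try:
    grid + np.array([1, 2])
except ValueError as err:
    print("shapes do not match:", err)
```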


In [130]:
d = np.exp(b)   # e raised to the power of each element
e = np.sqrt(b)  # square root of each element
print(d)
print(e)

[ 2.71828183  7.3890561  20.08553692]
[1.         1.41421356 1.73205081]


Now that you have seen functions beyond the four basic operations, such as exponent and square root, can you calculate the three main trigonometric functions (sin, cos, tan), the natural log, and log base 10 of a 1D array?

In [None]:
# calculate sin
# calculate cos
# calculate tan
# calculate natural log
# calculate log10

# 4. Data analysis using numpy
NumPy provides a convenient and powerful way to perform data analysis on **large datasets**. Some of the most common tasks are finding the **mean, median, and standard deviation** of a dataset, and NumPy has a function for each. The mean is the average value of the data; the median is the middle value once the data is sorted (or the average of the two middle values when there is an even number of values); and the standard deviation measures how spread out the data is around the mean. NumPy also provides functions to find the minimum and maximum values. Together, these functions give quick insight into the properties of a dataset and are useful for a wide range of analysis tasks.

In [9]:
data = np.array([2, 5, 12, 13, 19])
print(np.mean(data)) # finds the mean of the dataset
print(np.median(data)) # finds the median of the dataset
print(np.std(data)) # finds the standard deviation of the dataset
print(np.min(data)) # finds the min of the dataset
print(np.max(data)) # finds the max of the dataset

10.2
12.0
6.04648658313239
2
19
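Two options are worth knowing for these functions. By default `np.std` divides by N (the population standard deviation), while `ddof=1` divides by N - 1 (the sample standard deviation). On a 2D array, the `axis` argument chooses whether to summarize down each column (`axis=0`) or across each row (`axis=1`). A short sketch; the `scores` array is made up for illustration:

```python
import numpy as np

data = np.array([2, 5, 12, 13, 19])

print(np.std(data))          # population standard deviation (divides by N)
print(np.std(data, ddof=1))  # sample standard deviation (divides by N - 1)

# Hypothetical 2D data: 2 rows, 3 columns
scores = np.array([[1, 2, 3], [4, 5, 6]])
print(np.mean(scores, axis=0))  # mean of each column -> [2.5 3.5 4.5]
print(np.mean(scores, axis=1))  # mean of each row -> [2. 5.]
```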


Now that you have seen these functions, can you find another way to calculate the sum or product of a dataset, different from the methods we covered earlier?

In [None]:
# create a different way of solving the sum or products of a dataset from what we learned above

Numpy also has the ability to handle CSV files, which are commonly used to store and exchange large datasets. By importing CSV files into numpy arrays, we can easily perform complex operations and analysis on the data, making numpy an essential tool for data scientists and researchers.

```genfromtxt``` and ```loadtxt``` are two functions in the numpy library that can be used to read data from text files, including CSV files.

```genfromtxt``` is a more advanced function that can be used to read text files that have more complex structures, including CSV files. ```genfromtxt``` can handle files that have missing or invalid data, or files that have columns of different data types. It can also be used to skip header lines or to read only specific columns from the file. 
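Here is a small sketch of those options. An in-memory CSV built with `io.StringIO` stands in for a real file (its contents are made up), and the second data row is missing a value:

```python
import io
import numpy as np

# Made-up CSV text; row "B" has an empty HR field
csv_text = io.StringIO("Name,HR,RBI\nA,1,2\nB,,4\nC,5,6\n")

stats = np.genfromtxt(
    csv_text,
    delimiter=",",
    skip_header=1,     # skip the "Name,HR,RBI" header line
    usecols=(1, 2),    # read only the HR and RBI columns
    filling_values=0,  # replace the missing HR value with 0 instead of nan
)
print(stats)  # a 3x2 float array with 0. where the value was missing
```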

In [108]:
import numpy as np

padres = np.genfromtxt('files/padres.csv', delimiter=',', dtype=str, encoding='utf-8')
# delimiter=',' tells NumPy that commas separate the columns
# genfromtxt reads the csv file itself
# dtype=str reads every value as text (a string); dtype=None would let NumPy guess each column's type

print(padres)

[['Name' ' Position' ' Average' ' HR' ' RBI' ' OPS' ' JerseyNumber']
 ['Manny Machado' ' 3B' ' .298' ' 32' ' 102' ' .897' ' 13']
 ['Tatis Jr' ' RF' ' .281' ' 42' ' 97' ' .975' ' 23']
 ['Juan Soto' ' LF' ' .242' ' 27' ' 62' ' .853' ' 22']
 ['Xanger Bogaerts' ' SS' ' .307' ' 15' ' 73' ' .833' ' 2']
 ['Nelson Cruz' ' DH' ' .234' ' 10' ' 64' ' .651' ' 32']
 ['Matt Carpenter' ' DH' ' .305' ' 15' ' 37' ' 1.138' ' 14']
 ['Cronezone' ' 1B' ' .239' ' 17' ' 88' ' .722' ' 9']
 ['Ha-Seong Kim' ' 2B' ' .251' ' 11' ' 59' ' .708' ' 7']
 ['Trent Grisham' ' CF' ' .184' ' 17' ' 53' ' .626' ' 1']
 ['Luis Campusano' ' C' ' .250' ' 1' ' 5' ' .593' ' 12']
 ['Austin Nola' ' C' ' .251' ' 4' ' 40' ' .649' ' 26']
 ['Jose Azocar' ' OF' ' .257' ' 0' ' 10' ' .630' ' 28']]
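Because `dtype=str` keeps every cell as text, and the values in `padres.csv` appear to be separated by a comma plus a space, every cell after the first column starts with a space (for example `' 32'`). To do math on a column, the text has to be cleaned and converted first. A minimal sketch using `np.char.strip`, with a few rows copied from the output above standing in for the full array:

```python
import numpy as np

# Stand-in for the padres array: header row plus two players, first four columns
table = np.array([
    ["Name", " Position", " Average", " HR"],
    ["Manny Machado", " 3B", " .298", " 32"],
    ["Tatis Jr", " RF", " .281", " 42"],
])

header = np.char.strip(table[0])               # remove the leading spaces from the labels
hr = np.char.strip(table[1:, 3]).astype(int)   # skip the header row, take column 3 (HR) as integers
print(header)           # -> ['Name' 'Position' 'Average' 'HR']
print(hr, hr.sum())     # -> [32 42] 74
```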


```loadtxt``` is a simpler function that can be used to read simple text files that have a regular structure, such as files that have only one type of data (such as all integers or all floats). ```loadtxt``` can be faster than ```genfromtxt``` because it assumes that the data in the file is well-structured and can be easily parsed.
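When every value is numeric, `loadtxt` can return numbers directly. A minimal sketch, assuming a hypothetical all-numeric CSV (held in `io.StringIO`) containing the HR and RBI values of three players from the padres table:

```python
import io
import numpy as np

# Hypothetical numeric-only CSV: HR and RBI columns
numbers = io.StringIO("HR,RBI\n32,102\n42,97\n27,62\n")

# skiprows=1 skips the header; with no dtype given, values are parsed as floats
stats = np.loadtxt(numbers, delimiter=",", skiprows=1)
print(stats.shape)        # -> (3, 2)
print(stats[:, 0].max())  # highest HR -> 42.0

# loadtxt expects every field to be filled in, so a blank value raises an error
try:
    np.loadtxt(io.StringIO("1,2\n,4\n"), delimiter=",")
except ValueError as err:
    print("loadtxt could not parse the data:", err)
```

This is the trade-off described above: `loadtxt` is simpler, but missing values need `genfromtxt`.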

In [110]:
import numpy as np

padres = np.loadtxt('files/padres.csv', delimiter=',', dtype=str, encoding='utf-8')
print(padres)

[['Name' ' Position' ' Average' ' HR' ' RBI' ' OPS' ' JerseyNumber']
 ['Manny Machado' ' 3B' ' .298' ' 32' ' 102' ' .897' ' 13']
 ['Tatis Jr' ' RF' ' .281' ' 42' ' 97' ' .975' ' 23']
 ['Juan Soto' ' LF' ' .242' ' 27' ' 62' ' .853' ' 22']
 ['Xanger Bogaerts' ' SS' ' .307' ' 15' ' 73' ' .833' ' 2']
 ['Nelson Cruz' ' DH' ' .234' ' 10' ' 64' ' .651' ' 32']
 ['Matt Carpenter' ' DH' ' .305' ' 15' ' 37' ' 1.138' ' 14']
 ['Cronezone' ' 1B' ' .239' ' 17' ' 88' ' .722' ' 9']
 ['Ha-Seong Kim' ' 2B' ' .251' ' 11' ' 59' ' .708' ' 7']
 ['Trent Grisham' ' CF' ' .184' ' 17' ' 53' ' .626' ' 1']
 ['Luis Campusano' ' C' ' .250' ' 1' ' 5' ' .593' ' 12']
 ['Austin Nola' ' C' ' .251' ' 4' ' 40' ' .649' ' 26']
 ['Jose Azocar' ' OF' ' .257' ' 0' ' 10' ' .630' ' 28']]


In [60]:
for i in padres:
    print(",".join(i))  # joins each row's cells back into one comma-separated line

Name, Position, Average, HR, RBI, OPS, JerseyNumber
Manny Machado, Third Base, .298, 32, 102, .897, 13
Fernando Tatis Jr, Right Field, .281, 42, 97, .975, 23
Juan Soto, Left Field, .242, 27, 62, .853, 22
Xanger Bogaerts, Shortstop, .307, 15, 73, .833, 2
Nelson Cruz, Designated Hitter, .234, 10, 64, .651, 32
Matt Carpenter, Designated Hitter, .305, 15, 37, 1.138, 14
Jake Cronenworth, First Base, .239, 17, 88, .722, 9
Ha-Seong Kim, Second Base, .251, 11, 59, .708, 7
Trent Grisham, Center Field, .184, 17, 53, .626, 1
Luis Campusano, Catcher, .250, 1, 5, .593, 12
Austin Nola, Catcher, .251, 4, 40, .649, 26
Jose Azocar, Outfield, .257, 0, 10, .630, 28
