# Using Python Packages

Now that we have covered how to write functions, imagine how tedious it would be to write a function for every little thing we want to do, such as adding two numbers or reading in a file from your computer. That is a lot of work, and it can be daunting when you are just starting out. But fear not: there is hope in the form of something called Python packages.

A Python package (also called a library) is essentially a collection of functions that someone else has written and that you can import into your own code. So what does this mean? Well, for example, there is a Python library called Numpy that deals with numerical math. It contains tons of useful functions for common math operations, and it also includes a feature called arrays, which we will cover later in this notebook and which makes operations on lists of numbers much smoother.

The cool thing is that someone has already coded up these functions. All you need to do is import the library into Python, and BAM, all the functions from that library are available for you to use.

# Some Noteworthy Packages:

1. Numpy: Mathematical analysis done easy
2. Scipy: statistical functions as well as wave analysis
3. matplotlib: Python's Plotting Package
4. pandas: Data analysis and dataframe
5. astropy: For dealing with astronomy related data

## Import vs From Statements

In [None]:
#example with Numpy
import numpy as np

In [None]:
#because of the "as np" alias, we can write np instead of numpy whenever we reference the package

#Example of using numpy to take the absolute value of a number
np.abs(-5)

The from statement, in contrast, imports only one specific function from the library into your Python program:

from library import function as short_name #the as part is optional

In [None]:
#here we are only importing from numpy the greater_equal function and giving it the shorter name g_eq
from numpy import greater_equal as g_eq
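
As a rough sketch of how the two import styles differ in use (the input values here are made up for illustration): with the import statement every function is reached through the np prefix, while a function brought in with the from statement is called directly by its short name.

```python
import numpy as np
from numpy import greater_equal as g_eq

# with "import numpy as np" we go through the np prefix
via_prefix = np.greater_equal(5, 3)

# with "from numpy import greater_equal as g_eq" we call the function directly
via_short_name = g_eq(5, 3)

# both names point to the same numpy function, so the results agree
print(via_prefix, via_short_name)

# greater_equal also works element by element on arrays
elementwise = g_eq(np.array([1, 5, 10]), 5)
print(elementwise)
```

The from style saves typing, but the np prefix makes it obvious where a function comes from, which is why most of this notebook uses it.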

## How to know what functions are available in the Python package!!

### Method 1: Googling

When you are stuck on a problem in your code, or wondering whether someone has already figured out the solution, type it into Google. Most likely someone has had the same problem and posted a solution, and through those solutions you can learn about functions within a package.

### Method 2: Package Documentation

To find out which functions are available once we import a Python package, we go to the package's website and read the documentation for all the functions it has to offer.

This can be a little daunting, as the websites list a lot of functions to choose from, but once you have looked up a few functions you become more familiar with the layout and the documentation is actually quite helpful.

For Numpy we would go to https://numpy.org/doc/1.26/user/index.html

For Scipy we would go to https://docs.scipy.org/doc/scipy/reference/ 

If we look at the Scipy website we can see that Scipy is divided into submodules such as scipy.optimize, scipy.stats, and scipy.interpolate. Each submodule focuses on one specific topic. So if we were to look in scipy.stats, it would give us what we need to do statistics, such as functions for summary statistics, statistical distributions and more.

# 1. Numpy

Numpy is by far the most important package we will use in astronomy. It allows us to perform mathematical operations on large amounts of data, and most of that is made possible by a Numpy feature called arrays. Arrays are akin to vectors: you can multiply an array by a scalar and every element of the array will be multiplied by that scalar, and the same holds for all the other mathematical operations. You can even perform mathematical operations between two arrays, provided both arrays have the same shape. We will cover these in detail in the next few cells.

In [2]:
#importing numpy
import numpy as np

In [None]:
#Array definition
array = np.array([])

#this makes an empty array

In [None]:
array

In [None]:
non_empty_arr = np.array([12, 34, 232, 54, 3, 4, 123, 34, 35345])

In [None]:
non_empty_arr

In [None]:
mix_match_arr = np.array(['What Up', 34, 232, False, 3, 4, True, 34, 35345])

In [None]:
mix_match_arr

In [None]:
np.pi

## Differences between Arrays and Containers

Unlike lists, where you can have different datatypes within the same list, all the elements of an array have to be the same datatype. If you insert a string anywhere in the array, numpy will automatically convert every element to a string. If you put Boolean values into an array with numbers, those Boolean values are converted to numbers using the convention True = 1 and False = 0.
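
The conversion rules above can be checked with a small sketch. The input values are made up for illustration; the dtype attribute reports which single datatype numpy settled on.

```python
import numpy as np

# a string anywhere in the input turns every element into a string
with_string = np.array(['What Up', 34, 3.5])
print(with_string.dtype.kind)   # 'U' means unicode string

# booleans mixed with numbers become 1 (True) and 0 (False)
with_bools = np.array([True, 2, False, 7])
print(with_bools)

# integers mixed with floats are all stored as floats
with_floats = np.array([1, 2, 3.5])
print(with_floats.dtype)
```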

# Math with arrays vs other containers

## Problem: Give me a list with every entry in the list multiplied by 2


In [None]:
#Solution using Lists

#when working with lists we would need to go through each and every element and multiply it by 2
number_list = [1, 4, 6, 10, 40, 100]

#for loop that goes through the index values of the number_list list
for i in range(len(number_list)):
    
    #We replace the current value at index i with the value multiplied by 2
    number_list[i] = number_list[i] * 2

print(number_list)

In [None]:
#With arrays it is as simple as multiplying the array by 2
number_list = [1, 4, 6, 10, 40, 100]

#To do this we simply convert the list to an array using np.array()
num_arr = np.array(number_list)

#then multiply the array by 2
print(2*num_arr)

#Wow!!! Super easy and no need for for-loops :D!!!

This same ease applies to all the other mathematical operations, such as division, addition, subtraction, and raising to a power.

In [None]:
#division

#with one line we can divide the ENTIRE array by 2
num_arr/2

In [None]:
#adding

#with one line we can add 2 to EVERY element of the array
num_arr + 2

In [None]:
#subtracting

#with one line we can subtract 2 from EVERY element of the array
num_arr - 2

In [None]:
#raising to a power

#with one line we can square the ENTIRE array
num_arr**2

## Exercise

Make your own numpy array and do the following: 

- a) multiply the array by 3
- b) divide the array by 100
- c) compute the line equation 3*x + 5 where x is your array

In [None]:
########## Code Here ###########


# Array math between multiple arrays

Array math between two or more arrays is done element-wise: the entries at the same location undergo the operation being performed. Say you have two arrays with three numbers each, arr1 = np.array([1, 5, 10]) and arr2 = np.array([10, 20, 30]), and you add them. The operation is applied to matching indexes, so 1 is added to 10, 5 is added to 20 and 10 is added to 30, making the resulting array np.array([11, 25, 40]). The same method applies if you add another array into the mix: entries at matching indexes have the operation applied to them. See the example below of this in action.


NOTE: The arrays $\textbf{must}$ have the same shape for this to work. If one array is shorter or longer than the other you will get an error (the one exception is an array of length 1, which numpy treats like a scalar).
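
A minimal sketch of both cases, using the two arrays from the text plus a hypothetical shorter array (too_short is made up for this example) to show what a mismatch looks like:

```python
import numpy as np

arr1 = np.array([1, 5, 10])
arr2 = np.array([10, 20, 30])

# same shape: entries at matching indexes are added together
total = arr1 + arr2
print(total)

# a shorter array (hypothetical, made up for this sketch) cannot be lined up
too_short = np.array([1, 2])

try:
    arr1 + too_short
    mismatch_failed = False
except ValueError as err:
    mismatch_failed = True
    print('numpy refused the operation:', err)
```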

In [None]:
arr1 = np.arange(1, 11, 1)
arr2 = np.arange(21, 31, 1)
arr3 = np.arange(101, 111, 1)

In [None]:
print(arr1)
print(arr2)
print(arr3)

In [None]:
arr1+arr2

In [None]:
arr1+arr2+arr3

In [None]:
arr1*arr2

In [None]:
arr1*arr2*arr3

In [None]:
arr1/arr2

In [None]:
arr1/arr2/arr3

In [None]:
arr1-arr2

In [None]:
arr1-arr2-arr3

In [None]:
arr2//arr1

In [None]:
arr2%arr1

# Generating Arrays

Numpy has many built-in functions that allow you to quickly generate a large array between a starting and ending number. Some of those functions are np.linspace(), np.arange() and np.logspace(). These functions are really great when we need to generate an array of X-values for our plots. We will cover all three of them in detail in the next few cells.

In short, np.linspace() produces an array of equally spaced values between a starting and ending number, and you choose how many values you want.

np.arange() takes a step size instead: give it a starting point, an ending point and a step, and you get an array of values separated by that step. This is really good when you are doing histograms, as the step argument of np.arange() gives you full control over the bin size.


np.logspace() is similar to np.linspace(), but the values it generates are equally spaced in log-space, and you give it the starting and ending exponents rather than the numbers themselves.

In [None]:
#quickly generating an array of 1000 equally spaced values between 1-100
#Note how the ending number 100 is included
np.linspace(1, 100, 1000)

In [None]:
#quickly generating an array of 10 equally spaced values between 1-100
np.linspace(1, 100, 10)

In [None]:
#let us try to create the same array as above but with np.arange
np.arange(1, 100, 11)

In [None]:
#np.arange is exclusive of the ending number so if we want it to end at 100 we need to increase the ending number
np.arange(1, 101, 11)

In [None]:
#np.logspace works in log-space: the starting and ending values are exponents of 10
#so a value of -2 means 10^-2, a value of 4 means 10^4, and a value of 0 means 10^0 = 1
#this generates an array that is equally spaced in log-space, which is different from linear spacing
np.logspace(0, 2, 10)
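
To make the difference between the two spacings concrete, here is a small sketch with only 3 values each (the numbers were chosen for this example so the result is easy to read). Taking the log10 of the logspace output recovers the equally spaced exponents.

```python
import numpy as np

# 3 values equally spaced in log-space between 10^0 and 10^2
log_values = np.logspace(0, 2, 3)
print(log_values)

# 3 values equally spaced in linear space between 1 and 100
lin_values = np.linspace(1, 100, 3)
print(lin_values)

# taking log10 of the logspace values recovers equally spaced exponents
exponents = np.log10(log_values)
print(exponents)
```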

# Exercise 

Using linspace, logspace or building your own arrays, make 5 arrays of length 10:

- Mass1 Array: An array of masses in units of kg, between $10^{29}$ and $10^{34}$ kg
- Mass2 Array: An array of masses in units of kg, between $10^{32}$ and $10^{34}$ kg
- Distance Array: An array of distances in meters, of order $10^{10}$ to $10^{30}$ m
- Radius1 Array: An array of radii in meters, of order $10^{7}$ to $10^{10}$ m
- Radius2 Array: An array of radii in meters, of order $10^{7}$ to $10^{10}$ m


Once you have created the arrays please do the following: 

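1. Compute the gravitational force between each pair of stars in Mass1 and Mass2, using the Distance array for $d$ and the gravitational constant $G = 6.674 \times 10^{-11}$ m$^3$ kg$^{-1}$ s$^{-2}$: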

$F = \frac{G m_1 m_2}{d^2}$

2. Compute the escape velocity of each of the stars:

$v_{esc} = \sqrt{\frac{2 G M}{R}}$


3. Compute the average of each of the Mass, Radius and Distance arrays



In [None]:
########## Code Here ###########


# Useful Numpy Functions

In [None]:
example_array = np.array([-10, -9, -30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
                           -134, 332, 324, -3312, 23213, -423321, 12321])

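In [None]:
#np.where returns the indexes (not the values) where the condition is True
np.where(example_array > 0)

In [None]:
#array of 10 ones
np.ones(10)

In [None]:
#array of 10 zeros
np.zeros(10)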

# Boolean Masking

A very cool thing about arrays is that we can evaluate a conditional expression across an entire array. This results in an array of True and False values, which we can use to subselect the relevant values from a larger array. Let us see this in action below.

In [None]:
#100 random numbers drawn from a normal distribution with mean 2 and standard deviation 1
random_numbers = np.random.normal(loc = 2, scale = 1, size = 100)

print(random_numbers)
print()
print('Length: ', len(random_numbers))

In [None]:
#making a conditional to only select numbers between 2 and 4
range_2_to_4 = ((random_numbers > 2) &
                (random_numbers < 4))

print(range_2_to_4)

In [None]:
#let us now apply this boolean mask to the random numbers to subselect the values within the range 2 to 4

subsample = random_numbers[range_2_to_4]

print(subsample)
print()
print('Length: ', len(subsample))

# Note:

Boolean masks are a great way to subselect data from an array with ease, but this comes at a cost: applying the mask returns a new, smaller array. We started off with an array of length 100, and the masked result only contains the values that passed the condition. This has important implications if you are working with several arrays that need to be the same size, for plotting for example. The fix is to apply the same mask to the other arrays as well, so that they stay the same size and their entries still match up for any subsequent analysis and plotting.
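
A minimal sketch of that fix, assuming two hypothetical arrays x_values and y_values that belong together (the names and numbers are made up for this example):

```python
import numpy as np

# two hypothetical arrays that belong together, e.g. x and y values for a plot
x_values = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
y_values = np.array([10, 20, 30, 40, 50])

# the mask is built from x_values only
mask = (x_values > 2) & (x_values < 4)

# applying the same mask to both arrays keeps the pairs lined up
x_selected = x_values[mask]
y_selected = y_values[mask]

print(x_selected, y_selected)
print('Lengths: ', len(x_selected), len(y_selected))

# the original arrays are unchanged
print('Original length: ', len(x_values))
```

Because both arrays were indexed with the same mask, the entry at each position of y_selected still corresponds to the entry at the same position of x_selected.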

# 2D Arrays

In the next few notebooks we will encounter ND-arrays, which are N-dimensional arrays. To introduce them we will work with 2D arrays, but the same concepts carry over to higher-dimensional arrays. 2D arrays are the typical data structure for images, which will be the focus of the Photometry Module, so knowing how to manipulate 2D arrays and perform mathematical operations on them is crucial for that module.

2D arrays are made by turning a list of lists into an array. Take, for example, the list of lists [[1, 2, 3, 4], [10, 11, 12, 13]]. We can turn this into a 2D array using the np.array() function.

    twoD_array = np.array([[1, 2, 3, 4], [10, 11, 12, 13]])
    
Numpy makes the first list, [1, 2, 3, 4], the first row of the 2D array and the second list, [10, 11, 12, 13], the second row. This produces a matrix with rows and columns. In this example it makes a 2x4 array with 2 rows and 4 columns. Let us apply this in Python.

In [None]:
#defining a 2D array with 2 rows and 4 columns
twoD_array = np.array([[1, 2, 3, 4],[10, 11, 12, 13]])

print(twoD_array)

# Indexing 2D Arrays

Indexing 2D arrays builds on indexing 1D arrays, except that we now have two locations to specify: the index along the rows and the index along the columns of the value we want. Let us say we want to get the value 12. By eye we can see that it is in the second row and in the third column. All the rules of indexing still apply, including slicing, striding and negative indexing. In this simple example of getting 12 we would do the following:

In [None]:
#getting the value of 12, 2nd row is index 1 and 3rd column is at index 2
twoD_array[1, 2]

In general to get a single value from a 2D array requires us to use the syntax below:

twoD_array[Row_index, Column_index]


This can be generalized to include slicing:

twoD_array[start_row_index:ending_row_index, start_col_index:ending_col_index]

Note that the ',' is what tells Python which index is for rows and which is for columns.

If you want all the rows of a column the syntax for that is as follows:

twoD_array[:, col_index]

If you want all the columns from a row you can use either of the two methods shown below:

1. twoD_array[row_index]
2. twoD_array[row_index, :]

Both will give you the entire row corresponding to the row_index. 
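
The syntax patterns above can be tried out on the small 2x4 array defined earlier, where the values are easy to check by eye. This is only a sketch; the variable names on the left-hand side are made up for this example.

```python
import numpy as np

twoD_array = np.array([[1, 2, 3, 4], [10, 11, 12, 13]])

# single value: row index 1, column index 2
single_value = twoD_array[1, 2]

# all the rows of the column at index 0
first_column = twoD_array[:, 0]

# all the columns of the row at index 1, written both ways
second_row_a = twoD_array[1]
second_row_b = twoD_array[1, :]

# slicing: both rows, columns at index 1 and 2 (the ending index is excluded)
middle_block = twoD_array[0:2, 1:3]

print(single_value)
print(first_column)
print(second_row_a, second_row_b)
print(middle_block)
```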

Let's apply this to a larger 2D array to see this in action.

In [None]:
big_2d_arr = np.random.normal(loc = 2, 
                              scale = 1, 
                              size = (10, 12), #this is how you tell numpy you want a 10x12 array of values
                              )

print(big_2d_arr)

In [None]:
#let's select the entire 5th row
print(big_2d_arr[4])
print('or')
print(big_2d_arr[4, :])

In [None]:
#let's select the entire 5th column
print(big_2d_arr[:, 4])

In [None]:
#let us select rows 2 - 5 and columns 6 - 10
#(slices exclude the ending index, so 1:5 covers indexes 1-4 and 5:10 covers indexes 5-9)

    #row indexing , column indexing
big_2d_arr[1:5    ,  5:10]

# Math with 2D Arrays and Scalars

Everything we covered about scalar arithmetic with 1D arrays still applies to 2D arrays: you can take any number and apply any math operator to a 2D array, and every single element will have that operation applied.


In [None]:
twod_arr = np.ones(shape = (4, 5))
print(twod_arr)

In [None]:
twod_arr + 5

In [None]:
twod_arr - .5

In [None]:
twod_arr / 2

In [None]:
twod_arr * 5

In [None]:
(2*twod_arr)**2

# Math between 2D Arrays and 1D Arrays

You can also perform mathematical operations between 2D arrays and 1D arrays, but there is a criterion that must be met: the length of the 1D array $\textbf{MUST}$ equal the number of columns of the 2D array. So if you have a 10 x 12 array, the 1D array needs to have length 12.

The operation is applied to every row of the 2D array, matching each element of the row with the element at the same position in the 1D array. To explain this with an example, let us take the 2D array np.array([[1, 2, 3], [2, 4, 8], [3, 6, 9]]) and the 1D array np.array([2, 2, 2]), and let's say we need to multiply these two arrays.

Numpy takes the first row of the 2D array and performs the multiplication [1, 2, 3] * [2, 2, 2]. Since these are two 1D arrays of the same length, the operation is performed element-wise. The output is [2, 4, 6], and this becomes the first row of the output 2D array. The second row then undergoes the same operation: [2, 4, 8] * [2, 2, 2] = [4, 8, 16], which becomes the second row, and so on. The resulting 2D array from this operation is np.array([[2, 4, 6], [4, 8, 16], [6, 12, 18]]).


In [None]:
ex_2darr = np.array([[1, 2, 3], [2, 4, 8], [3, 6, 9]])
ex_1darr = np.array([2, 2, 2])

In [None]:
ex_2darr + ex_1darr

In [None]:
ex_2darr - ex_1darr

In [None]:
ex_2darr * ex_1darr

In [None]:
ex_2darr / ex_1darr
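
As a sketch of the length requirement, the block below repeats the multiplication from the text and then tries a hypothetical 1D array of the wrong length (wrong_length is made up for this example) to show the error numpy raises.

```python
import numpy as np

ex_2darr = np.array([[1, 2, 3], [2, 4, 8], [3, 6, 9]])
ex_1darr = np.array([2, 2, 2])

# every row of the 2D array is multiplied element-wise by the 1D array
product = ex_2darr * ex_1darr
print(product)

# a 1D array whose length does not match the 3 columns (hypothetical example)
wrong_length = np.array([2, 2])

try:
    ex_2darr * wrong_length
    raised_error = False
except ValueError as err:
    raised_error = True
    print('numpy refused the operation:', err)
```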

# Math Between 2D Arrays and 2D Arrays

You can also perform mathematical operations between two or more 2D arrays. This also has a strict criterion that must be met: the $\textbf{shape}$ of the arrays $\textbf{MUST}$ be the same. The number of rows and columns must match exactly for mathematical operations to be performed between any number of 2D arrays. The math is done element-wise, so items with the same row and column index have the operation applied to them. The first-row, first-column elements of all the arrays are combined, then the first-row, second-column elements, and so on. Let's see an example of this in use.

In [None]:
twod_arr1 = np.random.randint(1, 100, size = (5, 7))
twod_arr2 = np.random.random(size = (5, 7))

In [None]:
#you can check the shape of an array using the shape attribute
twod_arr1.shape

In [None]:
#you can check the shape of an array using the shape attribute
twod_arr2.shape

In [None]:
twod_arr1 + twod_arr2

In [None]:
twod_arr1 - twod_arr2

In [None]:
twod_arr1 * twod_arr2

In [None]:
twod_arr1 / twod_arr2

# Boolean Masking for 2D Arrays

Everything we covered about masking for 1D arrays is transferable to 2D arrays. We just have an added dimension to work with. The main thing to note is that applying a 2D boolean mask to a 2D array does not keep the 2D shape: the selected values come back as a flattened 1D array. A 1D boolean mask applied along the rows or the columns, on the other hand, keeps the array 2D and removes whole rows or columns.

In [None]:
ex_2d_arr = np.random.normal(loc = 2, scale = 1, size = (5, 7))

ex_2d_arr > 3

In [None]:
bool_mask = ex_2d_arr > 3

ex_2d_arr_masked = ex_2d_arr[bool_mask]

print(ex_2d_arr_masked)
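
One detail worth checking by hand is the shape of the result when the mask itself is 2D. The sketch below uses a small hypothetical array (small_2d is made up for this example) instead of random numbers so the output is easy to follow.

```python
import numpy as np

# small hypothetical 2D array so the result is easy to follow by eye
small_2d = np.array([[1, 5, 2],
                     [7, 3, 9]])

# the mask has the same 2D shape as the array
mask_2d = small_2d > 4
print(mask_2d.shape)

# the selected values come back as a 1D array, read row by row
selected = small_2d[mask_2d]
print(selected)
print(selected.shape)

# the number of selected values equals the number of True entries in the mask
print(mask_2d.sum())
```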

In [None]:
#you can apply a 1D boolean mask so long as its length matches the number of rows or columns you are trying to mask

#this will mask out the 2nd and last row of the 2D array
bool_mask1d = np.array([True, False, True, True, False])

print(ex_2d_arr[bool_mask1d])
print()
print(ex_2d_arr[bool_mask1d].shape)

In [None]:
#you can apply a 1D boolean mask so long as its length matches the number of rows or columns you are trying to mask

#this will mask out the 2nd and 5th columns of the 2D array
bool_mask1d = np.array([True, False, True, True, False, True, True])

print(ex_2d_arr[:, bool_mask1d])
print()
print(ex_2d_arr[:, bool_mask1d].shape)