# Packages

We've seen how functions make it easy to write general-purpose routines that we can reuse, saving us from rewriting the code. You can imagine that there are a vast number of functions that might be useful for our coding purposes. Some of these come with Python by default. Things like addition, writing to files, and appending elements to lists are all functionality that someone wrote and made accessible to you when you launch the Python executable. However, despite the large set of functions packaged with Python by default, there are many potentially useful functions that are not included. We wouldn't want the Python install to be a terabyte, after all.

Since Python doesn't include all the functions we want, we could always just write what we need ourselves. But writing software, especially efficient, reliable, and complicated software, is time-consuming and requires a lot of background knowledge. It might even require writing in other programming languages.

For example, we might want a function that takes 2 lists and creates an image showing the data plotted as pairs of coordinates on a set of axes. We could spend weeks or months toiling away trying to write our own function to do this, or we could just let someone else do it and copy their function (with their permission, of course). That was the beauty of functions, after all: they're portable code blocks that aren't tied to any specific use case or program. This is where packages come in.

Python packages are essentially libraries of functions written by others and kindly distributed to us for free. So how does it work? Well, first we need to download and install these fancy libraries. Then we simply load them into our environment and we're free to use them to our heart's content. It's as easy as that. 

## Some Noteworthy Packages:
1. NumPy: Elaborate suite of math and array functions. Maybe the most common package in Python
2. SciPy: Extensive set of scientific functions useful for statistics, math, and more
3. Matplotlib: An excellent package for creating custom plots and data visualizations
4. Pandas: A data analysis package built around implementing DataFrame objects
5. AstroPy: A utility package for all things astronomy

Note that even these packages use packages! In fact, #2-5 are all built on top of NumPy!

There's genuinely nothing special about the code contained in packages. You could write it yourself given enough time. Packages just save us from redoing things other people already did, allowing us to focus on the fun science. They also help maintain interoperability. Many packages work seamlessly together because they are all built on NumPy. You can imagine how complicated things would get if everyone did everything their own way.

## Let's import a package

Before running the next cell, ensure that you have NumPy installed. If you're using a conda distribution of Python, go to the terminal, activate your desired environment, and run 'conda install numpy'. If that fails or you are not using conda, go to a terminal and run 'pip install numpy'.

Remember, packages are libraries *external* to Python. We have to download and install them in order to use them. We'll talk more about package managers at the end of this notebook, but you've been reading for a while already so let's run some code!

In [None]:
# To load a package for use in your program, simply use the 'import' keyword 
#  followed by the name of the package
import numpy
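
If you are not sure whether the install step worked, you can ask Python whether it can find a package before you try to import it. The sketch below uses 'importlib.util.find_spec' from the standard library. The helper name 'is_installed' is made up for this example and is not part of any package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    # find_spec returns None when no package with this name can be found
    return importlib.util.find_spec(package_name) is not None


# Report on a few of the packages mentioned above
for name in ["numpy", "scipy", "matplotlib"]:
    status = "installed" if is_installed(name) else "NOT installed"
    print(f"{name}: {status}")
```

If a package shows up as not installed, go back to the terminal and repeat the install step for that package.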

Now that we've loaded in NumPy, we have access to tons of new functions. Suppose NumPy contains a function with the same name as a built-in function, though (it does). How can Python know which one to use? 

By default, Python makes a package's functions accessible via the '.' syntax on the package name. So to access the 'add' function from NumPy, we would write 'numpy.add()' rather than simply 'add()', even though there is no default Python function named add().

In [None]:
# This runs the add function implemented in NumPy
print(numpy.add(1,2))

# Python has a sum() function built in for adding all the numbers in a list
print(sum([1,2,3]))

# NumPy also has a sum() function which can do the same thing
print(numpy.sum([1,2,3]))

#Note that the two functions achieve the same result, but they are actually running different code.
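
The same idea can be seen without NumPy. As a small stand-in example, the standard library's 'math' module has a pow() function that shares its name with the built-in pow(). The '.' syntax keeps the two apart, and the different return types show that they really are different code.

```python
import math

# Built-in pow(): integer inputs give an integer result
builtin_result = pow(2, 3)

# math.pow(): always converts to floating point
module_result = math.pow(2, 3)

print(builtin_result, type(builtin_result))   # 8 <class 'int'>
print(module_result, type(module_result))     # 8.0 <class 'float'>
```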

Sometimes package names are long, so it can get annoying writing them every time we want to use one of their functions. Luckily, Python has aliasing built-in to allow us to use whatever name we want to reference a package. 

Unusual aliases can make your code less readable, so many packages have standard abbreviations that almost everyone uses (NumPy is almost always imported as 'np', for example). This makes it easier to understand what someone else's code does without having to think too hard.

In [None]:
# Let's import NumPy with an alias
# Simply use the 'as' keyword (just like we did with file I/O)
import numpy as np

# Now 'np' will be interpreted as numpy and can be used in its place
np.add(1,2)

# Note that the alias is the only name this import statement creates.
# In a fresh session, 'import numpy as np' does not define 'numpy', so
#  numpy.add() would raise a NameError and we would have to use np.add().
# (Here 'numpy' still works only because we ran 'import numpy' earlier.)
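
To see this for yourself, here is a minimal sketch using the standard library's 'calendar' module as a stand-in. It assumes 'calendar' has not already been imported under its own name in your session. The flag 'original_name_defined' is just a variable made up for this example.

```python
import calendar as cal

# The alias works as expected
print(cal.isleap(2024))   # True

# The original name was never defined, so using it raises a NameError
try:
    calendar.isleap(2024)
    original_name_defined = True
except NameError as err:
    original_name_defined = False
    print("NameError:", err)
```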


Like we said, NumPy is a very large library of functions. Often we only need 1 or 2 of them, and writing the package name in front of every call gets tedious. In that case, Python allows us to import specific functions from within a library by using the 'from' keyword. Note that this only controls which names appear in our program. Python still loads the whole package behind the scenes, so this does not save memory.

In [None]:
# If we only want to import the add() function, we just use this call
from numpy import add

# Now we can use add() without the numpy.add syntax
add(1,2)

In [None]:
# We could also alias just the function
from numpy import add as addition

addition(1,2)

In [None]:
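# 'sum?' is Jupyter/IPython syntax for displaying a function's documentation
# At this point, 'sum' still refers to Python's built-in function
sum?

In [None]:
# If we import a function whose name is already defined, the new function takes precedence
from numpy import sum
sum?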
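
If you replace a built-in name this way, the original function is not lost. A minimal sketch, again using 'math' from the standard library as a stand-in for NumPy: the built-in version remains reachable through the 'builtins' module.

```python
import builtins
from math import pow   # the name 'pow' now refers to math.pow

# The imported function has taken over the name
shadowed_result = pow(2, 3)
print(shadowed_result)   # 8.0

# The built-in version is still available through the builtins module
original_result = builtins.pow(2, 3)
print(original_result)   # 8
```

This is one reason the dotted 'package.function()' style is often the safer default: nothing gets replaced without you noticing.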

Ok, so packages have a ton of useful functions, and there are tons of useful packages. How can you find what packages to use and what functions they contain?

### Good ol' Google

One of the first things you can do when you're stuck in your programming, or you feel like there must be a better way, is to simply Google how to do a particular thing. The internet is bursting with info on coding, and the most popular packages have been used and reused in almost every way imaginable. You would be hard-pressed to encounter an issue that someone else hasn't already had and solved.

### Documentation

Any package that is meant to be useful to others will have some sort of online documentation. Most packages have their documented source code publicly available on GitHub. Larger packages like NumPy and AstroPy have dedicated sites for hosting extensive documentation, examples, and troubleshooting. 

This can be daunting as the packages generally have a lot of functions, but if you home in on specific functions at first, you'll see that the documentation is no harder to read than the docstrings we have already worked with. Also, you can usually scroll down to see examples that you can even copy-paste to run/modify just to get a feel for the usage.
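
You can also explore a package without leaving Python. The sketch below uses the standard library's 'statistics' module as a stand-in so that it runs without any installs. The same calls work on 'numpy' or any other imported package. The variable names 'public_names' and 'summary' are just for illustration.

```python
import inspect
import statistics

# dir() lists every name a module defines; names starting with '_' are internal
public_names = [name for name in dir(statistics) if not name.startswith("_")]
print(public_names[:10])

# inspect.getdoc() returns a function's docstring with the indentation cleaned up
summary = inspect.getdoc(statistics.mean).splitlines()[0]
print(summary)
```

In a notebook, 'help(statistics.mean)' or 'statistics.mean?' shows the full docstring.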


# Challenge

Go to one of the documentation sites below and find a function you like or think would be useful. Read over the docs for that function and how to use it, and come to the next class ready to briefly present the function to the class.

NumPy: https://numpy.org/doc/stable/

SciPy: https://docs.scipy.org/doc/scipy/

## Extras

If you spend some time on the docs for NumPy or similar packages, you'll find that the functions are often categorized under certain banners. These are submodules contained within one package. For example, SciPy has a stats module, a linalg module, an optimize module, etc. To import or access submodules or their functions, we need to use the '.' syntax again.

In [None]:
# Suppose I want to use SciPy's curve fitting function
# Since it is in the optimize submodule, I need to tell Python to look there for it

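import scipy
# Depending on your SciPy version, a bare 'import scipy' may not load submodules.
# If the next line raises an AttributeError, import the submodule explicitly
#  as shown in the next cell.
scipy.optimize.curve_fit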

In [None]:
# Alternatively, I can import just the submodule or just the function itself as before
import scipy.optimize
scipy.optimize.curve_fit

In [None]:
from scipy.optimize import curve_fit
curve_fit

In [None]:
from scipy import optimize
optimize.curve_fit

While the above are all valid ways of loading the curve_fit() function, it is good to consider readability and reliability when writing code. Importing the function directly as curve_fit() may be less typing, but since it could overwrite other functions or variables in the namespace, it is often better to use the dotted name to alleviate possible confusion.
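
Whichever style you pick, you end up with the very same function. Only the name you use to reach it changes. Here is a small sketch using the standard library's 'os.path' submodule as a stand-in for 'scipy.optimize', so that it runs without SciPy installed.

```python
import os.path              # import the submodule by its full dotted name
from os import path         # import just the submodule
from os.path import join    # import just the function

# All three names point at one and the same function object
same_function = (os.path.join is path.join) and (path.join is join)
print(same_function)   # True
```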

In [15]:
#Solution using Lists

#when working with lists we would need to go through each and every element and multiply it by 2
number_list = [1, 4, 6, 10, 40, 100]

#for loop that goes through the index values of the number_list list
for i in range(len(number_list)):
    
    #We replace the current value at index i with the value multiplied by 2
    number_list[i] = number_list[i] * 2

print(number_list)

[2, 8, 12, 20, 80, 200]


### Be careful with aliases

Note that Python puts virtually no limits on the alias you assign to a package. This means you can overwrite a name that already exists in the namespace. This example should illustrate the danger.

In [None]:
# We'll alias numpy using print which is a function in default Python
import numpy as print

In [None]:
# Before running this cell, guess what you think the result will be
print("hi")

In [None]:
# Take a guess what will happen when you run this cell
print.arange(10)
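
Once you have made your guesses and run the cells above, you will want your print() function back. A minimal sketch of one way to recover, using 'math' as a stand-in for NumPy so that it runs without any installs: the real function is still stored in the 'builtins' module.

```python
import builtins
import math as print   # 'print' now names the math module, not the function

try:
    print("hi")        # a module is not callable, so this fails
    error_type = None
except TypeError as err:
    error_type = type(err).__name__
    builtins.print("Calling the alias failed:", err)

# Point the name back at the real built-in function
print = builtins.print
print("print works again")
```

In a notebook, running 'del print' in a cell or restarting the kernel also removes the bad alias.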

## Making Our Own Packages

There's nothing particularly special about the code in very popular packages like NumPy or Matplotlib. We are fully capable of writing our own packages, for either our own or others' use. Packaging your most used functions in separate files makes them easier to access, less likely to get corrupted, and keeps your programs cleaner. 

To write our own packages, the only requirement is that we use .py files rather than Jupyter notebooks. Once we have some functions written into a .py file, we can import it just like any other package using the file name (without the .py extension), assuming the file is in the working directory or in another directory where Python looks for packages.
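
As a self-contained sketch of that workflow, the code below writes a tiny .py file into a temporary directory, tells Python to look there, and imports it. The file name 'my_tools.py' and the function 'double' are hypothetical and are not the course files. Normally you would simply save the file next to your program instead of generating it from code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The text we want inside our hypothetical my_tools.py file
module_source = '''
def double(values):
    """Return a new list with every element multiplied by 2."""
    return [2 * v for v in values]
'''

with tempfile.TemporaryDirectory() as tmp:
    # Save the functions into a .py file
    Path(tmp, "my_tools.py").write_text(module_source)

    # Add the directory to the list of places Python searches for packages
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        import my_tools   # the file name without the .py extension
        result = my_tools.double([1, 4, 6])
    finally:
        sys.path.remove(tmp)

print(result)   # [2, 8, 12]
```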

Take a look at the files in the Week_06 directory called *my_package.py* and *import_example.py*. Does the syntax in these files (and their #2 counterparts) make sense to you? There should only be 1 part which seems foreign, and it's in *my_package2.py*. So what do you think this code does?:

In [None]:
if __name__ == "__main__":
    pass  # replace 'pass' with the code you want to run

if statements shouldn't be new, but you might be confused by this strange double underscore syntax. The astute might also notice this is referencing a variable we haven't defined. Try running *my_package.py*. Does it run? What does this tell us about what Python is doing behind-the-scenes when we run a program?

Based on the behavior of those 4 files (the two import examples and the two my_package files), when is the above **if** condition true?

Hopefully, you've guessed that the **if** statement allows us to have a program alongside function definitions that only gets executed if we run the file directly. This can be useful if we want to let our functions be imported by other programs without also running our main code. A cleaner solution is to separate your library into its own file and import it into the file you intend to run directly, but sometimes this is an important structure to know.
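
Putting it together, a file that works both as a library and as a program might look like the sketch below. This is a hypothetical 'my_tools.py' written for illustration, not the contents of the course files.

```python
# Hypothetical contents of a file called my_tools.py


def double(values):
    """Return a new list with every element multiplied by 2."""
    return [2 * v for v in values]


def main():
    """Code we only want to run when the file is executed directly."""
    print(double([1, 4, 6]))


if __name__ == "__main__":
    # True when we run 'python my_tools.py'.
    # False when another program runs 'import my_tools', because Python
    #  then sets __name__ to "my_tools" instead of "__main__".
    main()
```

Another program can now write 'from my_tools import double' and use the function without triggering main().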