<a href="https://colab.research.google.com/github/poepping/hello-world/blob/main/Code0_PythonBasics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preliminary: This is a Jupyter Notebook!

Notebooks allow the author to include both text and code in the same file. This part is text, but the next part is code:

In [None]:
print("Hello World!")

Hello World!


Notice how the output also showed up. This allows for **reproducible research**: the reader does not need to search around to find which code was used to calculate the values that appear in a paper or report. Increasingly, scientific papers include Jupyter notebooks as supplementary material. This makes peer review much easier and makes it easier for other researchers to build on the work.

Code cells must be run from within the notebook, and their output is then saved as part of the notebook. When a notebook is opened, the saved output is shown. If you change the code, the notebook will still show the old saved output until the code is run again. This is convenient, since code that takes a long time to run doesn't need to be rerun every time you open the notebook.

Since this is my notebook, your changes won't be saved. That means you're free to add as many code blocks as you like without breaking anything! You're free (and **STRONGLY** encouraged) to experiment with anything and everything. If you want to keep any changes, you'll need to save a copy to your own Google Drive folder.

To view the associated video, click [here.](https://www.youtube.com/watch?v=cElPzxEwIA4)

# Basic Introduction to Python for Data Science

Python is a general-purpose language. There are tons of people using it for tons of different things in tons of different ways. We're going to go over some of the topics required for data science, but be warned that you'll find a wide variety of answers when you put your questions through the Google machine.

Before we begin, some extraordinarily important rules for learning to code: 

1. NEVER COPY/PASTE
    - You learn nothing unless you've written it yourself.
2. ALWAYS CHECK THE RESULTS
    - Python is interactive; interact with it! Look at what you created, find out what you can do with it, and learn its properties.
3. SAVE OFTEN
    - Should be self-explanatory. Python doesn't auto-save the way Word does.
4. BREAK IT
    - At this level, there's very, very little that you can do that will affect your computer at all. As long as you're SAVING YOUR CODE OFTEN, you can just restart the program and fix any changes that you can't undo.
5. TEACH IT
    - Teaching it forces you to evaluate what you know and how well you know it. It also helps to create your own examples so that you can't just copy/paste code from what you've seen.
    - I suggest keeping track of the things that you found tricky, then making your own Jupyter Notebook with explanations that make sense to you.

### Lists and 0-Indexing

Python has many data types. We're going to focus on data frames (similar to Excel spreadsheets) later, but lists are a good introduction to how Python works. In the code below, we make a list and immediately display it.

In [None]:
["1", 2, 3.0, True]

['1', 2, 3.0, True]

This list contains the number 1 (as a **string**, i.e. text), the number 2 (as an **integer**, which has no decimal part), the number 3 (as a **float**, which can have a decimal part), and the value True (a **boolean**, i.e. either `True` or `False`). Every **element** of this list has a different **type**, and a Python list is happy to hold them all without converting anything.
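If you ever want to check a type for yourself, Python's built-in `type()` function will tell you. This small sketch asks for the type of each of the four values used in the list above:

```python
# Ask Python for the type of each value from the list above
print(type("1"))   # <class 'str'>   (quotes make it text, even though it looks like a number)
print(type(2))     # <class 'int'>   (a whole number, no decimal point)
print(type(3.0))   # <class 'float'> (the decimal point makes it a float)
print(type(True))  # <class 'bool'>  (either True or False)
```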

Python has already forgotten about that list. We created it, displayed it, then moved on; we never told Python to keep track of it. In order to do so, we **assign** the list to a name (a **variable**), so that we can refer to the object again later.

In [None]:
my_list = ["1", 2, 3.0, True]
# Type the name of an object to see what it contains:
my_list

['1', 2, 3.0, True]

Now Python will remember this list. The next time we want to see what's in the list, we can just type the name `my_list` in a code block and run that block. 

In order to access the individual elements, we can use square brackets. 

In [None]:
my_list[1]

2

The first element is... 2?

Python uses **0-indexing**: the first element has index 0, so `my_list[1]` is actually the second element. Humans usually count from 1 and would read `my_list[1]` as the first element, but that is not how Python counts (although some other programming languages, such as R and MATLAB, do use 1-indexing).

Computers see it this way: the list is stored as 1s and 0s in your RAM, with each element in its own slot, kind of like spaces on a board game. The name `my_list` points to the space on the board where the first element is. From there, `my_list[0]` tells the computer "go to `my_list` and move 0 spaces"; `my_list[1]` says "go to the space that `my_list` is on and then move one more space," and so on. (Strictly speaking, a Python list stores a reference to each element in those slots rather than the element itself, but the counting works the same way.)
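To try the board-game picture out, the sketch below re-creates `my_list` and moves 0 and 3 spaces from the start. Because the list has four elements, the valid indices are 0 through 3; asking Python to move 4 spaces walks off the end of the board and raises an `IndexError`:

```python
my_list = ["1", 2, 3.0, True]

print(my_list[0])    # 1     -> move 0 spaces: the first element
print(my_list[3])    # True  -> move 3 spaces: the fourth (last) element
print(len(my_list))  # 4     -> four elements, so the valid indices are 0, 1, 2, 3

# Moving 4 spaces goes past the last element
try:
    my_list[4]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```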

# Lists with One Kind of Element

In data science, we'll usually be working with list-like objects rather than proper lists. These list-like objects are based on lists, but all of the elements are the same type. 

For this, we'll be using a **package** called NumPy. To bring in a package, we use the line of code `import numpy`. To use its functions, we then write the package name first, e.g. `numpy.array()` to make a NumPy array. We are far too lazy to type `numpy` every time, so instead we import it with an **alias**.
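An alias is only a second, shorter name for the same package. As a quick check, this sketch imports NumPy both ways and confirms that `np` and `numpy` refer to the very same thing. Using `np` is a widespread convention, but the alias itself is just a name we pick:

```python
import numpy
import numpy as np

# Both names point at the same package, so these two calls do exactly the same thing
print(np is numpy)             # True
print(numpy.array([1, 2, 3]))  # [1 2 3]
print(np.array([1, 2, 3]))     # [1 2 3]
```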


In [None]:
import numpy as np

# Create an array, which is a list where all of the 
# elements are the same type
np.array(["1", 2.0, 3, True])

array(['1', '2.0', '3', 'True'], dtype='<U4')

As you can see, we gave NumPy a list with a string, a float, an integer, and a boolean, and the array came back as all strings. This is because anything can be turned into a string, whereas not every string can be turned back into a number or a boolean. Another example: all integers can become floats, but not all floats can become integers without losing their decimal part. The hierarchy is as follows: booleans can always become integers (with `False == 0` and `True == 1`), integers can become floats, and floats can become strings, but we can't always go backwards in the hierarchy.
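You can watch NumPy climb this hierarchy by adding one "higher" type at a time and checking the array's `dtype` attribute. In each case NumPy picks the lowest type that can hold every element. The exact size of the integer type (e.g. `int64` vs `int32`) can depend on your platform and NumPy version, so the sketch only checks the general kind where that matters:

```python
import numpy as np

# Only booleans: stays boolean
print(np.array([True, False]).dtype)             # bool
# Boolean + integer: everything becomes an integer (True -> 1)
print(np.array([True, 2]).dtype)                 # an integer type, e.g. int64
# Add a float: everything becomes a float
print(np.array([True, 2, 3.5]).dtype)            # float64
# Add a string: everything becomes a string ("U" means Unicode text)
print(np.array([True, 2, 3.5, "a"]).dtype.kind)  # U
```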

In [None]:
np.array(["1", 2.0, 3, True], dtype = "int")

array([1, 2, 3, 1])

The `dtype` argument told NumPy to force everything to be integers. The `"1"` is a string, which might have caused an error, but the text `"1"` can be converted (**cast**) to the integer `1`, so this worked. Similarly, `2.0` became `2` and `True` became `1`.
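Plain Python can do the same kind of casting one value at a time with the built-in functions `int()`, `float()`, and `str()`. This sketch repeats the conversions NumPy made above, plus one step up the hierarchy in each direction that never loses information:

```python
# Casting single values with Python's built-in functions
print(int("1"))        # 1    (string -> integer)
print(int(2.0))        # 2    (float -> integer)
print(int(True))       # 1    (boolean -> integer)
print(float(3))        # 3.0  (integer -> float)
print(str(3.0))        # 3.0  (float -> string; print() hides the quotes)
print(repr(str(3.0)))  # '3.0' (repr() shows that it really is a string)
```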

**Homework:** What would have happened if we included a float or string that could not be converted into an integer, e.g. if there were a 2.4 or "two point four" in the list?

In the next code cell, I've dumped a bunch of syntax all over the screen. I've started adding comments to describe what's going on, but I got lazy. Your **homework** is to add descriptive comments to this notebook so that you know what's going on.

In [None]:
my_ints = np.array(["1", 2.0, 3, True, 7, 9, "-2", -1], dtype = "int")
print(my_ints)
# I've added print() statements so that Python shows you the output of each line.
# Without the print statements, Python evaluates each line and moves on to the next one.
print(my_ints[1]) # second element only (one step from the start of the array). The output is a single number, NOT an array.
print(my_ints[1:4]) # The second to the fourth element (indices 1, 2, and 3).
# The slice starts AT index 1 but stops BEFORE index 4, so index 4 itself is not included. Python is weird.
print(my_ints[:4]) # 
print(my_ints[:]) # The whole list
print(my_ints[0:6:2]) # 
print(my_ints[[1, 2, 4]]) # 

[ 1  2  3  1  7  9 -2 -1]
2
[2 3 1]
[1 2 3 1]
[ 1  2  3  1  7  9 -2 -1]
[1 3 7]
[2 3 7]


# Functions, Methods, and Attributes

We'll be doing a lot with arrays (and other objects) in the future, and there are three main ways we interact with them.

- **Functions** take in 0 or more arguments and return something.
    - `functionname(argument1, argument2, ...)` or `functionname()`
- **Methods** are like functions that are tied to a particular object.
    - `object.methodname(argument1, argument2, ...)`
- **Attributes** look like methods, but don't take arguments (and don't have parentheses). Each object has attributes associated with it, and accessing an attribute just shows you what they are; it generally doesn't calculate anything new.
    - `object.attribute`
        - Notice that there are no parentheses.

In [None]:
# The max function works on many different types of objects
print(max(my_ints)) 
# Arrays have a "max" method (not all objects do)
print(my_ints.max())
# the size attribute is just how long the array is (NOT the maximum element)
print(my_ints.size) 

9
9
8
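Here is a slightly longer sketch of the same three patterns. It re-creates `my_ints` by typing the same integer values in directly, and uses a few extra array methods (`min`, `sum`) and attributes (`shape`, `ndim`). The last part shows a common mistake: putting parentheses after an attribute, which tries to call a plain number like a function and raises a `TypeError`:

```python
import numpy as np

# The same values as my_ints above, typed in directly as integers
my_ints = np.array([1, 2, 3, 1, 7, 9, -2, -1])

# Function: function name first, object inside the parentheses
print(len(my_ints))   # 8

# Methods: object, dot, method name, then parentheses
print(my_ints.min())  # -2
print(my_ints.sum())  # 20

# Attributes: object, dot, attribute name, and NO parentheses
print(my_ints.shape)  # (8,)
print(my_ints.ndim)   # 1

# Common mistake: my_ints.size is just the number 8,
# and a number can't be called like a function
try:
    my_ints.size()
except TypeError as err:
    print("TypeError:", err)
```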


Finally, you can get help on any function or method by passing it to the `help()` function.

In [None]:
help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



In [None]:
help(my_ints.max)

Help on built-in function max:

max(...) method of numpy.ndarray instance
    a.max(axis=None, out=None, keepdims=False, initial=<no value>, where=True)
    
    Return the maximum along a given axis.
    
    Refer to `numpy.amax` for full documentation.
    
    See Also
    --------
    numpy.amax : equivalent function
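`help()` works best when you already know the name you're looking for. To discover names in the first place, the built-in `dir()` function lists everything attached to an object, and the built-in `callable()` tells you whether something can be called with parentheses (like a method) or not (like a plain attribute). A small sketch, again re-creating `my_ints` with the same values:

```python
import numpy as np

my_ints = np.array([1, 2, 3, 1, 7, 9, -2, -1])

# dir() returns a (long) list of every name attached to the object
names = dir(my_ints)
print("max" in names)   # True
print("size" in names)  # True

# callable() is True for methods and False for plain attributes
print(callable(my_ints.max))   # True  -> a method, so use my_ints.max()
print(callable(my_ints.size))  # False -> an attribute, so use my_ints.size
```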



This concludes the introduction. Most of what you'll learn in this course is:

- Programming concepts, like `for` loops.
- New object types, such as data frames and objects that are the result of models.
- New functions
- New methods

I recommend keeping a running list of objects and functions/methods that you encounter, along with a brief example. Create your own Jupyter notebook so that you can explain everything to yourself.

# Further Reading

The following links provide a little more information, but are not necessary. All of these links are from W3Schools, which is a fantastic reference but is very dry, so it would be difficult to use as your main learning resource. Remember the rules from above!

- https://www.w3schools.com/python/python_casting.asp Converting from one type to another (a.k.a. **casting**)
- https://www.w3schools.com/python/python_datatypes.asp Other data types
    - Not required for this module, but will be important for you eventually.
- https://www.w3schools.com/python/numpy_intro.asp Intro to Numpy


