# Python and Jupyter Basics
### Workshop 0 of DASIL's series on "Data Science with Python"
### Created by Martin Pollack

Welcome to the series!

Right now you are looking at a Jupyter Notebook. This is a popular filetype for doing Data Science in Python, and we will be using it often.

Jupyter Notebooks consist of two kinds of blocks, or groupings of text.

In "code" blocks you can write your actual Python code, and outputs coming from your code are printed out nicely at the end of each block. To run your code, just hit the "play button" or right-pointing arrow in the upper left corner of a selected code block.

Then there are also "markdown" blocks where you can write normal text to describe or give context for your code, like the one you are reading right now! There are lots of things you can do in these to customize your text, but we will only briefly discuss one of them: headers.

# If you put a single "#" before text, it will be a large header
## With two "#" symbols, you get a medium-sized header
### Three of them means you get a small header

Blocks can be edited by double-clicking on them. They can be deleted by double-clicking on them and then clicking on the trash can icon in the top-right corner of the block.

You can also create new blocks by clicking on the block you want to be before your new one and then selecting either "+ Code" or "+ Text" in the top-left corner of Google Colab.

And that's basically all you need to know about Jupyter Notebooks. In time you'll learn to love them as much as I do.

# Now let's get to Python!

## Commenting your code

Besides writing code in our code blocks, it is important to comment our code. This means giving short descriptions of what our code does. This makes it so that when we come back and look at our code later, or if we share our code with someone else, it is properly understood.

If you write a `#` symbol at the start of a line in a code block, everything after that symbol will be interpreted by Python as a comment. This means that it will not try to run the text as code. As you can see below, if you forget the `#` symbol, you will get an error message because Python tries to run your sentence as code.

In [None]:
# This is a comment
This is not a comment

SyntaxError: invalid syntax (762433041.py, line 2)

## Numbers

Python can do basically anything a basic calculator can do.

Typing numbers and basic arithmetic operations (+, -, *, /) we can do things like

In [4]:
1 + 1

2

And the result of our addition is displayed below our code block.

Some more special operations are ** for exponentiation, // for quotient division (divide and then round down to the nearest integer), and % for remainder.

In [7]:
2 ** 3

2 // 3
4 // 3

10 % 3

1

Notice that in this last code block we typed four expressions, each on their own line.

However, Jupyter by default only returns the result of the last line of a code block. To make sure we see the results of doing all expressions, we can use Python's `print()` function. See below.

In [8]:
print(2 ** 3)

print(2 // 3)
print(4 // 3)

print(10 % 3)

8
0
1
1
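
The `//` and `%` operators fit together: multiplying the quotient by the divisor and then adding the remainder gives back the number you started with. Here is a small sketch of that relationship, with 10 and 3 chosen only as illustrative numbers.

```python
# quotient and remainder when dividing 10 by 3
print(10 // 3)
print(10 % 3)

# quotient times divisor, plus remainder, gives back the original number
print((10 // 3) * 3 + (10 % 3))
```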


So far we have only used integers, or numbers without decimal portions.

But of course Python can also deal with numbers with decimals, and these numbers are called `floats`.

In [14]:
print(1.7 + 0.01)

1.71
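
One detail worth knowing is that the `/` operator always gives back a float, even when two integers divide evenly, while `//` on two integers gives back an integer. A short sketch with illustrative numbers:

```python
# regular division gives a float, even when the result is a whole number
print(4 / 2)

# quotient division of two integers gives an integer
print(4 // 2)
```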


#### Exercise #1
Use the `print()` function to output the result of dividing 5 by 7.

In [None]:
# Your code here

## Strings

Another common data type in Python is strings, or sequences of characters. These are surrounded by double quotes `""` (single quotes `''` work the same way).
Below are some examples.

In [19]:
print("123")
print("abc")

123
abc


You can create new strings by putting together two other strings. This is done with the `+` operator.

In [23]:
print("String1" + "&" + "String2")

String1&String2


Python also makes it easy to choose specific characters from a string.

This is done with square brackets `[]` right after a string.

You can select an individual character by enclosing the index of that character in the brackets.

NOTE: Python uses zero-indexing, meaning the first element has index 0, the second element has index 1, etc.

In [31]:
print("abcdefgh"[0])
print("abcdefgh"[3])

a
d


You can also select a range of characters. 

In the square brackets type the index of the first character you want to choose, then a colon `:`, then one more than the index of the last character you want.

This ensures that the difference of the second number you type and the first number you type is the number of characters returned.

In [33]:
print("abcdefgh"[0:2])
print("abcdefgh"[3:7])

ab
defg
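
To check the counting rule above, you can use the built-in `len()` function, which returns the number of characters in a string. This is only a quick sketch using the same example string:

```python
# the range 3:7 should contain 7 - 3 = 4 characters
print("abcdefgh"[3:7])
print(len("abcdefgh"[3:7]))
```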


If you want your range to start from the beginning of the string, leave the first number blank. Similarly, if you want your range to go until the end of the string, leave the second number blank.

In [37]:
print("abcdefgh"[5:])
print("abcdefgh"[:2])
print("abcdefgh"[:])

fgh
ab
abcdefgh


Also, you can start counting from the back of a string using a negative sign.

So, for example, an index of -1 refers to the last character, and -2 refers to the second to last character.

In [62]:
print("abcdefgh"[1:-1])
print("abcdefgh"[1:-6])
print("abcdefgh"[-3:-1])

bcdefg
b
fg
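
Negative indices also work for picking out a single character, and they can be combined with a blank second number. A brief sketch on the same example string:

```python
# last character and second-to-last character
print("abcdefgh"[-1])
print("abcdefgh"[-2])

# the last three characters
print("abcdefgh"[-3:])
```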


#### Exercise #2
Print out the result of joining the strings "Hello" and " World" together and then selecting the first five characters of the new combined string.

In [None]:
# Your code here

## Booleans

Another important data type is the boolean, which is either the value `True` or the value `False`.
These are typed without quotes, differentiating them from strings.

In [41]:
print(True)
print(False)

True
False


Booleans are usually seen as the result of some sort of test.

To test for equality of numbers and strings, make sure to use two equal signs `==`. Then testing if things are NOT equal uses the operator `!=`.

We can also compare these types using the following symbols: `<`, `<=`, `>`, `>=`. For numbers the meanings of these comparators are straightforward. For strings, Python compares character by character using each character's numeric code. This mostly matches alphabetical order, but case matters: digits come before uppercase letters, and all uppercase letters come before all lowercase letters.

In [3]:
print(1 == 2)
print(1 != 2)

print(2 < 1)

print("c" == "c")

print("a" < "b")
print("A" < "b")

print("1" <= "A")

False
True
False
True
True
True
True
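
Behind these string comparisons, Python looks at the numeric code of each character, which the built-in `ord()` function reveals. The sketch below uses a few illustrative characters to show why case matters when comparing strings.

```python
# character codes for a digit, an uppercase letter, and a lowercase letter
print(ord("1"))
print(ord("A"))
print(ord("a"))

# uppercase letters have smaller codes than lowercase ones, so case matters
print("a" < "B")
print("A" < "b")
```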


Sometimes it is also helpful to chain together multiple tests into one large test. For that we can use the logical operators `and` as well as `or`.

Then `(test1) and (test2)` is `True` only if both `test1` and `test2` are true. If at least one of `test1` or `test2` is false, then the overall test is `False`.

Next consider `(test1) or (test2)`. The overall test is `True` if `test1`, `test2`, or both are `True`. The overall test is `False` only if both `test1` and `test2` are `False`.

In [80]:
print((1 == 1) and (2 != 1))
print((5 > 1) and (5 < 1))

print(("Test" == "Test") or ("Test" != "Test"))
print(("1" > "2") and ("1" >= "2"))

True
False
True
False
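
The rules for `and` and `or` can also be checked directly on the values `True` and `False`. This is just a small sketch listing a few of the possible combinations:

```python
# "and" is True only when both sides are True
print(True and True)
print(True and False)
print(False and False)

# "or" is False only when both sides are False
print(True or False)
print(False or False)
```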


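When comparing booleans, there is one more option.

The operators `==` and `!=` still work, but you will often see the keyword `is` for equality and the two keywords `is not` for inequality. These check whether two things are the very same object, which works for booleans because Python has only one `True` and one `False`.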

In [4]:
print(True is True)
print(True is not False)

True
True
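
For comparison, here is a short sketch showing that `==` and `!=` give the same answers on booleans, and that the result of a test can itself be checked with `is`.

```python
# == and != compare values and work for booleans too
print(True == True)
print(True != False)

# the result of a test is itself a boolean
print((1 == 1) is True)
```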


## Variables

So far we have only used various data types once. But for most applications we will want to save values to use later.

This is what variables are for. They give a name to a piece of data so you can refer to or alter it later.

To create the variable just put the name of the variable you want to create, an equal sign, and the piece of data you want to be referenced by your variable.

You can then reference your variable later by typing its name.

In [63]:
# create var1 variable
var1 = 1

# change var1 variable so that it refers to its old value plus 1
var1 = var1 + 1

# create var2 variable
var2 = 3

# create total variable which is the result of adding var1 and var2
# (the name "sum" is avoided because Python already has a built-in sum() function)
total = var1 + var2

# print out total variable
print(total)

5


Python is a dynamically typed language, meaning you do not have to explicitly say what type of data a variable references. This also means that a variable is very flexible and can change its type whenever you want.

In [5]:
num = 1
print(num)
num = "one"
print(num)

1
one
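
If you are ever unsure what type of data a variable currently refers to, the built-in `type()` function will tell you. A minimal sketch, reusing the `num` variable from above:

```python
num = 1
# num currently refers to an integer
print(type(num))

num = "one"
# now num refers to a string
print(type(num))
```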


#### Exercise #3
Save the result of multiplying 2 by 9 to a variable called `result`. Then, print out `result`.

In [6]:
# Your code here

## Collections of data: Lists

So far we have only looked at single pieces of data at a time, be that a single number or a single sequence of characters in the form of a string.

But many times, especially for Data Science, we want to look at collections of multiple individual pieces of data.

One of the simplest forms of collections is a list. It is a one-dimensional collection with a finite number of values that are ordered. The same value can appear multiple times in a list.

Lists are created by typing square brackets `[]` with the individual values separated by commas.

In [94]:
# make a list of just numbers
nums = [1, 2, 5, 7]
print(nums)

# make list of different types
otherList = [1, "one", "1"]
print(otherList)

[1, 2, 5, 7]
[1, 'one', '1']


Accessing and changing elements of a list uses a fairly similar strategy to accessing individual characters in a string from before.

Directly after an actual list or a variable referencing a list, use square brackets and indices to select values from the list.

In [92]:
# get first element from list created above
print(nums[0])

# get second element from a new list
print(["apple", "banana", "cherry"][1])

1
banana
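
Changing an element works the same way as accessing it, except that the square brackets go on the left side of an equal sign. Here is a small sketch; the list `fruits` is a hypothetical name used only for this example.

```python
# hypothetical list used only for this example
fruits = ["apple", "banana", "cherry"]

# replace the second element
fruits[1] = "blueberry"
print(fruits)

# negative indices work for lists too
print(fruits[-1])
```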


#### Exercise #5
Create a list containing the strings "USA", "Iowa", and "Grinnell" and save it to a variable. Then select the first two elements using square brackets `[]` and a colon `:`.

Hint: look back at the section on strings to remember how to select multiple indices at once.

In [None]:
# Your code here

## Functions

Functions are very important in programming. They take various inputs, run lines of code, and potentially return an output.

They allow you to write code once and then use it over and over again with just a simple call to the function. Here we will only be using functions other people have already written, so let's focus on using a function.

We have actually done this already. `print()` is a function that takes an input, like a number, and then prints it out at the bottom of a code block.

So to call a function, start by writing the name of the function and an open parenthesis `(`. Then type the various inputs separated by commas, and lastly type a close parenthesis `)`. The order in which you type the inputs matters, because Python matches them up by position. But if you want to make sure that an input is doing what you think, you can put the name of the input before the value, separated by an equal sign `=`.

An example is below. We are using the `round()` function, which has two parameters. First we type the number that is being rounded, and second we type to how many digits we are rounding. The name of this second input is `ndigits`, so to be sure things are matched up well we type `ndigits=1` inside our parentheses. But by default the second input is `ndigits`, so we really do not need to be that specific here.

In [19]:
# make sure things are matched up with ndigits
round(5.1234, ndigits=1)

5.1

In [7]:
# however, things are matched up on their own
round(5.1234, 1)

5.1
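
A few more function calls, sketched with illustrative values, show how optional and named inputs behave. The variable names `whole` and `two_places` are chosen only for this example.

```python
# leaving out ndigits rounds to the nearest whole number
whole = round(5.1234)
print(whole)

# naming the input makes the intent explicit
two_places = round(5.1234, ndigits=2)
print(two_places)

# print() accepts several inputs, plus a named input sep for the separator
print("a", "b", "c", sep="-")
```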

## Classes and Objects

Python is an object-oriented programming language. This means that the language makes use of classes and objects.

A class is like a template or blueprint for a data type. It has fields, or values, and methods, or functions that can be called on things of the class.

Below is an example of a class called `SimpleClass`. This class has two fields, `x` and `y`, and one method, `sum`, that adds the two fields together with an input `z` and returns the result. Don't worry about all the details here. We will not be building classes but rather using ones others have built.

In [41]:
class SimpleClass:
    x = 1
    y = 2

    def sum(self, z):
        return self.x + self.y + z

Then an object is a filled-in version of the class. We use the template of the class to build an actual object whose fields we can access and whose methods we can call.

Below we create an object from our `SimpleClass` and save that object to the variable titled `object`.

In [42]:
object = SimpleClass()

The field of an object can be accessed by typing the variable containing the object, then a period `.`, and then the name of the field you want.

Below we access the `x` and then the `y` field of our object.

In [43]:
print(object.x)
print(object.y)

1
2


Then methods can be run by typing the variable containing the object, then a period `.`, then the name of the method, followed by parentheses containing any inputs.

Below we call our `sum` method twice. The first time we give an input of 0 for `z`, so the result is just `x+y`. The second time we give an input of 3, so the result is `x+y+3`.

In [45]:
print(object.sum(0))
print(object.sum(3))

3
6
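
Fields of an object can also be changed with an equal sign, and methods then use the new values. A minimal sketch, repeating the `SimpleClass` definition so the block runs on its own; the variable name `other` is chosen only for this example.

```python
class SimpleClass:
    x = 1
    y = 2

    def sum(self, z):
        return self.x + self.y + z

# build a second object and change one of its fields
other = SimpleClass()
other.x = 10

# the method now uses the new value of x: 10 + 2 + 0
print(other.sum(0))
```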


Some of the data types we have come across already are actually objects stemming from classes.

For example, there is a string class, and every time we type characters between double quotes we are actually creating an object.

In [53]:
stringObject = "aBc,123"

Strings actually have a lot of useful methods, a few of which we show below.

In [56]:
# makes all characters lowercase
print(stringObject.lower())

# makes all characters uppercase
print(stringObject.upper())

# split the string on its commas, which was specified as an input
# Returns a list containing the various parts after splitting
print(stringObject.split(","))

# Returns the index of the character "a" in the string object
print(stringObject.find("a"))

abc,123
ABC,123
['aBc', '123']
0
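
Two more things are worth seeing in a quick sketch: string methods return a new string rather than changing the original, and method calls can be chained. The variable name `text` is used only for this example.

```python
text = "aBc,123"

# replace every comma with a dash
print(text.replace(",", "-"))

# methods return new strings, so the original is unchanged
print(text)

# methods can be chained one after another
print(text.lower().split(","))
```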


Lists are also a class, meaning actual lists we create are objects, which have methods.

In [61]:
# create list object
ls = [1, 4, 52]

# add the number 4 to the end of the list
ls.append(4)
print(ls)

# Returns the number of 4's in the list
ls.count(4)

# Sorts the list in descending order
ls.sort(reverse = True)
print(ls)

[1, 4, 52, 4]
[52, 4, 4, 1]
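
Because Jupyter only shows the last expression of a block, the result of `ls.count(4)` above is not displayed unless it is printed. The sketch below prints it, using a hypothetical list named `values` with the same contents.

```python
values = [1, 4, 52, 4]

# count how many times 4 appears
print(values.count(4))

# len() gives the total number of elements
print(len(values))

# sort() with no inputs sorts in ascending order
values.sort()
print(values)
```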


#### Exercise #6

Create an empty list and save it to a variable. An empty list is just square brackets with nothing in them `[]`.

Then use the append method on your variable to add the numbers 5, 6, and 7 in that order to your list.

Finally, print out your list.

In [None]:
# Your code here

## Loops

Sometimes we want to do a similar task over and over again. One option is to write out each line of code separately, making the slight adjustment we need each time.

For example, we can print out the numbers 1 to 5 the following way:

In [None]:
print(1)
print(2)
print(3)
print(4)
print(5)

1
2
3
4
5


But what if we want to print out the numbers 1 to 20? Then typing out 20 commands would be really time consuming, and it is also pretty easy to make a typo this way.

Instead, Python allows us to make loops, which perform certain actions a specified number of times. We create a variable and assign it to each number in a range, each time running a specific command.

For example, let's use a loop to print out the numbers 1 to 5.

We use the keyword `for` followed by the variable we want to create, in this case `i`. We then use the keyword `in` followed by the `range()` function, which tells Python what range of numbers we want to consider.

If you only give `range()` one input, then your variable `i` will take on values from 0 to that input minus 1.

Then you write a colon, with the statement you want repeated below what you have already written and indented. This statement will be repeated a number of times equal to your input, meaning in this case our statement will run 5 times. But each time it runs, `i` will be the next number in the range. 

First `i` will be 0, then 1, then 2, then 3, and lastly 4. To make it so the numbers 1 to 5 are printed instead of 0 to 4, we add a `+1` inside our print statement.

In [None]:
for i in range(5):
    print(i+1)

1
2
3
4
5
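
As an alternative to adding `+1`, `range()` also accepts two inputs: a starting number and a number to stop before. A loop can step through the elements of a list as well. Both are sketched below with illustrative values.

```python
# range(1, 6) starts at 1 and stops before 6
for i in range(1, 6):
    print(i)

# a loop can also step through the elements of a list
for name in ["apple", "banana", "cherry"]:
    print(name)
```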


#### Exercise #4

Use a for loop to print out the string "Hello" 7 times.

In [None]:
# Your code goes here

## Importing Modules

Sometimes we have to write our own functions and classes. But usually the things we want have already been written by someone else. It saves us a lot of time to reuse other people's code.

This is where modules come into play. Modules are collections of classes, objects, and functions that are all ready to use. All you have to do to use the things in them is import the module. This is done by typing the keyword `import` followed by the name of the module.

In [2]:
import numpy

Then every time you want to use a function or object from the module you imported, you have to type the name of the module, a dot `.`, and then the name of the thing in the module you want to use.

In [21]:
# create a NumPy array object. 
# You will learn more about this in the first session
numpy.array([1,2,4])

array([1, 2, 4])

Typing out the full name of the module each time you want to use something in it can get a little annoying. If you want to give your module a new name (maybe a shorter one) that you can use to refer to it, you can use the `as` keyword after the import statement.

In [22]:
# import with abbreviation np
import numpy as np
# create another NumPy array object using abbreviation
np.array([1,2,4])

array([1, 2, 4])

Lastly, some modules can be really big, and you may only need a few things from them. In that case, you can specify exactly which things should be taken from the module.

Just type `from` followed by the module name, and then type `import` followed by the specific things you want.

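The benefit is that you can then use these things directly, without typing the name of the module in front. The only problem is that an imported name replaces anything else with the same name, so below NumPy's `sum` takes the place of Python's built-in `sum` function.

In [14]:
# just get the array and sum functions from NumPy
from numpy import array, sum
# Find the sum of a NumPy array
sum(array([1,2,4]))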

7
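
The three import styles can be compared side by side. The sketch below uses Python's built-in `math` module instead of NumPy so that it runs without installing anything; the abbreviation `m` is chosen only for this example.

```python
# import the whole module and use its full name
import math
print(math.sqrt(16))

# import the module under a shorter name
import math as m
print(m.floor(2.7))

# import one function and call it without the module name
from math import sqrt
print(sqrt(25))
```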