<a href="https://colab.research.google.com/github/grahamachristie/GGE6505/blob/main/Introduction_to_Python_Part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GGE-5405/6505 Big Data

# **Intro to Python - Part 1**





---



# **1. Introduction to Google Colab**
---- *adapted from the original notebook* [Overview of Colaboratory Features](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)

Why Google Colab? 

*   Can access GPU for enhanced processing
*   Many common Python libraries pre-installed
*   Built on top of Jupyter Notebook
*   Basic use is free of charge
*   Supports Bash commands
*   Can store on Drive or GitHub, good for collaboration








**NOTE**: Check out the [Google Colab FAQ](https://research.google.com/colaboratory/faq.html) page for a good overview of the platform and a description of its uses and capabilities.




## Cells
A notebook is a list of cells. Cells contain either explanatory text or executable code and its output. Click a cell to select it.

### Code cells
Below is a **code cell**. Click in the cell to select it and execute the contents in the following ways:

* Click the **Play icon** in the left gutter of the cell;
* Type **Cmd/Ctrl+Enter** to run the cell in place;
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or
* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.

There are additional options for running some or all cells in the **Runtime** menu.

In [None]:
a = 10
a

Note that we have just created a variable called 'a' which now shows up in the 'Variables' tab to the left.

### Text cells
This is a **text cell**. You can **double-click** to edit this cell. Text cells
use markdown syntax. To learn more, see our [markdown
guide](/notebooks/markdown_guide.ipynb).

### Adding and moving cells
You can add new cells by using the **+ CODE** and **+ TEXT** buttons that show when you hover between cells. These buttons are also in the toolbar above the notebook where they can be used to add a cell below the currently selected cell.

You can move a cell by selecting it and clicking **Cell Up** or **Cell Down** in the top toolbar. 

Consecutive cells can be selected by "lasso selection" by dragging from outside one cell and through the group.  Non-adjacent cells can be selected concurrently by clicking one and then holding down Ctrl while clicking another.  Similarly, using Shift instead of Ctrl will select all intermediate cells.

## Setting the GPU Accelerator

With Google Colab, you have the option to use CPU or GPU. By default, Google Colab uses CPU. To change to GPU, do the following:

```
1. Click on ‘Edit’ > ‘Notebook Settings’ > ‘Hardware Accelerator’ > ‘GPU’.

OR

2. Click on ‘Runtime’ > ‘Change Runtime Type’ > 'Hardware Accelerator' > ‘GPU’.

```

When running a cell, make sure the runtime is connected. The notebook shows a green check and ‘Connected’ in the top right corner.
There are various runtime options in ‘Runtime’.


## Coding in Google Colab

Long-running Python processes can be interrupted. Run the following cell and select Runtime -> Interrupt execution (hotkey: Cmd/Ctrl-M I) to stop execution.

In [1]:
import time
print("Sleeping")
time.sleep(30) # sleep for a while; interrupt me!
print("Done Sleeping")

Sleeping


KeyboardInterrupt: ignored

Bash commands can be used in code cells by prefixing them with '!'.

In [2]:
# check what path we are in
!pwd

/content


In [3]:
# list contents
!ls

sample_data


In [4]:
# make a new directory
!mkdir demo2
!ls

demo2  sample_data


In [5]:
# check python version
!python --version

Python 3.8.10


## Saving a Colab notebook to GitHub

You can save your notebook on Google Drive, download it to your machine, or save a copy to GitHub. For this course, we will be using GitHub to share content. If you do not have a GitHub account, you can go to https://github.com/join and follow the instructions to create an account. 


To create a repository:


1. Open your GitHub account.
2. Click on the + sign in the top right corner of the page.
3. Create a name for your repository and add a description.
4. Choose whether the repository will be public or private.
5. Select the 'Add a README file' option to initialize the repo.
6. Click 'Create Repository'.

Now that you have a repository created for the class, you can save your Colab notebooks to GitHub and share them with others. Note that if your repo is private, you will need to add members to it before they can see your notebooks and other content.




---



---



# **2. Introduction to Python Programming Language**

This tutorial offers a basic introduction to Python syntax, variable assignment, and basic operations.



Just for fun, try reading over the code below and predicting what it's going to do when run. (If you have no idea, that's fine!)



Python is a high-level, dynamic programming language. Python code is often said to be almost like **pseudocode**, since it allows you to express very powerful ideas in very few lines of code while being very readable. 


Python has two major versions that you may encounter: 2 and 3. Somewhat confusingly, Python 3.0 introduced many backwards-incompatible changes to the language, so code written for 2.7 may not work under 3.x and vice versa. Python 2 reached end of life in January 2020 and is no longer supported. For this laboratory all code will use Python 3.7+. The latest Python version can be found here: https://www.python.org/downloads/


## **Variables and Types**
Python is dynamically typed rather than statically typed. This means you do not have to declare a variable or its type before you use it; a variable is created the moment you first assign a value to it.

```
A variable is a symbolic name that 'points' to an object.
```

Let's go through some basic syntax when working with variables:


```

*   To assign a variable with a specific value, use `=`

      *  A variable name must start with a letter or the underscore character.
      *  A variable name cannot start with a number.
      *  A variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _).
      *  Variable names are case-sensitive (name, Name and NAME are three different variables).
      *  Reserved words (keywords) cannot be used as variable names. https://realpython.com/lessons/reserved-keywords/


```
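
The naming rules above can be checked from Python itself. The sketch below is only illustrative: the variable names are made up for this example, `str.isidentifier()` applies Python's own character rules (which also accept non-ASCII letters), and `keyword.iskeyword()` reports whether a word is reserved.

```python
import keyword

station_count = 12   # valid: letters and an underscore
_station2 = 7        # valid: starts with an underscore, digits allowed later
Station_Count = 99   # valid, and distinct from station_count (case-sensitive)

# str.isidentifier() checks the character rules for a candidate name
ok_name = "station_count".isidentifier()
starts_with_digit = "2nd_station".isidentifier()
has_hyphen = "station-count".isidentifier()

# keyword.iskeyword() reports whether a word is reserved by the language
reserved = keyword.iskeyword("for")

print(ok_name, starts_with_digit, has_hyphen, reserved)
```

A name is usable as a variable only if it passes the character rules and is not a keyword.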





```
*   To test whether a variable has a specific value, use the comparison operators, which return a boolean:
        equal: `==`
        not equal: `!=`
        greater-than: `>`
        less-than: `<`
```
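
A common early mistake is confusing `=` (assignment) with `==` (comparison). The short sketch below uses a hypothetical variable named `threshold` to show both side by side, storing each comparison result in its own variable.

```python
threshold = 10                # '=' assigns the value 10 to the name threshold

is_ten = threshold == 10      # '==' compares and produces a boolean
is_not_ten = threshold != 10  # '!=' is True when the values differ
is_large = threshold > 100    # '>' tests greater-than
is_small = threshold < 100    # '<' tests less-than

print(is_ten, is_not_ten, is_large, is_small)
```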

```
*   comments: You can also add helpful comments to your code with the `#` symbol. Any line starting with a `#` is not executed. 
```

In [None]:
# This is a comment
# If you execute this cell, nothing will happen!

### **1. Numeric** 

Common numeric data types supported by Python include integers and floating point numbers. Integers are whole numbers (e.g. 9), while floats have a fractional part (e.g. 9.321). You can also convert integers to floats, and vice versa, but be aware that converting a float to an integer discards the fractional part.


**Follow along with the code below.**

*Note: A green check to the left of the code cell indicates that the cell has been run successfully.*

In [None]:
# Create a variable named 'intnum' and assign it the integer value 7
intnum = 7

In [None]:
# Check the data type using type()
type(intnum)

In [None]:
# Create a variable named 'floatnum' and assign it the floating point value 7.4
floatnum = 7.4
type(floatnum)

In [None]:
# Or you could convert the integer you already have
int_to_float = float(intnum)
int_to_float

In [None]:
# Check the new data type
type(int_to_float)

In [None]:
# Now see what happens when you convert a float to an int
float_to_int = int(7.3)
float_to_int
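
To make the risk of converting floats to integers concrete, here is a small sketch with made-up values. It shows that `int()` truncates toward zero rather than rounding, and that the built-in `round()` is the function to use when the nearest integer is wanted.

```python
positive = 7.9
negative = -7.9

truncated_pos = int(positive)  # drops the fractional part
truncated_neg = int(negative)  # truncates toward zero, not downward
rounded_pos = round(positive)  # rounds to the nearest integer instead

print(truncated_pos, truncated_neg, rounded_pos)
```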

We can use print() to display the value of a variable on the screen. 

In [None]:
intnum = 7
floatnum = 7.4

print(intnum)
print(floatnum)

This is especially useful to display multiple things at once. Note that when using print(), text must be inside quotation marks.

In [None]:
intnum = 7
floatnum = 7.4

print(intnum, "is an integer and",floatnum,"is a floating point number.")

Other ways to format your print() command:

In [None]:
intnum = 3
print("My lucky number is: {}".format(intnum))
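
Another formatting option, available in Python 3.6 and later, is the f-string: prefix the string with `f` and put the variable name directly inside the braces. The variable names below are illustrative only.

```python
lucky = 3
ratio = 2 / 3

# The expression inside {} is evaluated and inserted into the string
message = f"My lucky number is: {lucky}"

# A format specifier after the colon controls the display; .2f means two decimals
rounded = f"Ratio to two decimals: {ratio:.2f}"

print(message)
print(rounded)
```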

If a variable is assigned a value more than once, it will take the last value assigned.

In [None]:
intnum = 7
intnum = 8

print(intnum)

We can also assign values to two variables at the same time.

In [None]:
intnum, floatnum = 3, 3.
print(intnum, floatnum)

In [None]:
type(intnum)

In [None]:
type(floatnum)

#### **Arithmetic operators**
As you would expect, you can use the various mathematical operators with numbers (both integers and floats).

In [None]:
# Does Python follow the order of operations hierarchy?
num1 = 1 + 2 * 3 / 2.0
num2 = (1 + 2) * 3 / 2.0
print(num1, num2)

In [None]:
# The modulo (%) returns the integer remainder of a division
remainder = 11 % 3
print(remainder)

In [None]:
one = 1
two = 2
three = one + two
print(three)

In [None]:
# Two asterisks (**) denote a power operation
squared = 7 ** 2
print(squared)
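
Division deserves a closer look because Python 3 has two division operators. The sketch below, using arbitrary example numbers, compares true division (`/`), floor division (`//`), and the modulo operator (`%`), and checks how the last two relate to each other.

```python
dividend = 7
divisor = 2

true_quotient = dividend / divisor    # true division always returns a float
floor_quotient = dividend // divisor  # floor division rounds down to a whole number
remainder = dividend % divisor        # what is left over after floor division

print(true_quotient, floor_quotient, remainder)

# The floor quotient and remainder together reconstruct the dividend
print(floor_quotient * divisor + remainder == dividend)

# Even when the result is a whole number, '/' still gives a float
print(6 / 2)
```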

### **2. Booleans**
The Boolean data type has exactly two possible values, `True` and `False`, which represent the truth values of logic. Comparison expressions evaluate to one of these two values.

In [None]:
print(4 < 7)

In [None]:
a = 6
print(a > 4)

In [None]:
a == 7

In [None]:
a < 7

### **3. Strings**
Strings are the Python term for text. You can define these in either single or double quotes.



In [None]:
mystring = "Hello Class and Happy Tuesday!"
print(mystring)

You can also apply simple operators to your string variables, or assign multiple variables simultaneously.


In [None]:
# In this example, we concatenate two string variables to form a new variable
hello = "Hello,"
world = "World!"
helloworld = hello + " " + world
print(helloworld)


Note, though, that mixing variable types causes problems.

In [None]:
one = 1
two = 2
# adding integers to a string raises a TypeError
print(one + two + hello)

Python will throw an error when you make a mistake like this and the error will give you as much detail as it can about what just happened. This is extremely useful when you're attempting to "debug" your code.

In this case, you're told: `TypeError: unsupported operand type(s) for +: 'int' and 'str'`

And the context should make it clear that you tried to combine two integer variables with a string.
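
One way to resolve this kind of error is to convert the number to a string explicitly with `str()` before concatenating. This is a minimal sketch that redefines the same three variables so it can run on its own.

```python
one = 1
two = 2
hello = "Hello,"

# Do the arithmetic first, then convert the result to text before joining
total_text = str(one + two) + " " + hello
print(total_text)
```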

You can also pass values of different types to print() as separate arguments; it converts each one to text and joins them with spaces:



In [None]:
print(one, two, hello)

You can use len() to get the length of your variable (e.g., number of characters in a string)

In [None]:
a_string = "Hello, World!"
len(a_string)

In [None]:
print("String length: {}".format(len(a_string)))

Some more string operations:

In [None]:
# You've already seen arithmetic concatenations of strings
helloworld = "Hello," + " " + "World!"
print(helloworld)

In [None]:
# You can also multiply strings to form a repeating sequence
manyhellos = "Hello " * 10
print(manyhellos)

In [None]:
# But don't get carried away. Not everything will work.
nohellos = "Hello " / 10
print(nohellos)

In [None]:
# upper() and lower() are methods that work on strings
# unlike a function, a 'method' is used on an object
sent = "The weather is Nice"
sent = sent.lower()
sent

In [None]:
sent = sent.upper()
sent

In [None]:
# you can use a boolean operation for verification
print(sent.isupper())

In [None]:
# or include the entire operation within the print statement
print(sent.upper().isupper())

In [None]:
# print(sent.upper())
print(sent.lower())
print(sent.lower().isupper())
print(len(sent))

In [None]:
# Access certain characters using their indices (known as slicing)
sent[0:3] # square brackets are used to extract indices/positions of elements in the string

In [None]:
# return the location of a character
sent.index("i")

In [None]:
# we need to change back to lower case
sent = sent.lower()
sent.index("i")

In [None]:
# count how many times a character is present
sent.count("e")

In [None]:
# if a character is present more than once, the first position will be returned
sent.index("e")

In [None]:
# find a substring and return the position where it first starts
sent.find("is")

In [None]:
# replace a substring; replace() returns a new string and leaves sent unchanged
sent.replace("the", "In")

For more string operations and to get more practice, check out the following link: https://www.w3schools.com/python/python_ref_string.asp


### **4. Lists**
Lists are used to store multiple items in a single variable.

`Lists` are ordered collections of elements. Each element in a list has an index value associated with it that identifies its position in the list. You can **combine as many variables as you like**, and they can even be of multiple types. Ordinarily, unless you have a specific reason to do otherwise, a list will contain variables of one type.

You can also **iterate** over a list (use each item in a list in sequence).

A list is placed between square brackets: `[]`

In [None]:
# created using square brackets
mylist = [1,2,3]
print(mylist)

In [None]:
# append items to the list
# note that 'append()' changes the variable and a new variable assignment is not necessary

mylist.append(4)
mylist.append(5)
mylist.append(6)

print(mylist)

Like with strings, you can also extract subsets of the data in a list.

In [None]:
# The first element in a Python list starts at position 0
# similar to strings, we can use [] to extract list elements by index number
# Here, 1:4 indicates a range; from position 1 up to BUT NOT INCLUDING position 4

print(mylist[1:4])

In [None]:
# The last item in a Python list can be addressed as -1. 
# This is helpful when you don't know how long a list is likely to be.
print(mylist[-1])

In [None]:
# adding a third argument to the range = step size
print(mylist[1:-1:2])

If you try to access an item in a list that isn't there, you'll get an error.

In [None]:
print(mylist[10])
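
One way to avoid this `IndexError` is to compare the position against `len()` before indexing. The sketch below assumes a non-negative position and uses hypothetical names; it also shows that slices behave differently and do not raise an error when the end is out of range.

```python
values = [1, 2, 3, 4, 5, 6]
position = 10  # hypothetical, non-negative position to look up

# Valid positions run from 0 to len(values) - 1
in_range = position < len(values)
if in_range:
    print(values[position])
else:
    print("No item at position", position)

# Slices are more forgiving: an out-of-range end is cut off at the end of the list
tail = values[4:10]
print(tail)
```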

In [None]:
# We can concatenate two lists using '+'
even_numbers = [2, 4, 6, 8]
uneven_numbers = [1, 3, 5, 7]
all_numbers = uneven_numbers + even_numbers
print(all_numbers)

In [None]:
# You can also repeat sequences of lists
print([1, 2, 3] * 3)

We can also create a list of strings:

In [None]:
friends = ["Mary", "George" , "Ali", "Sher"]
print(friends)

In [None]:
friends[1]

In [None]:
friends[0:2]

In [None]:
friends.append("Alicia")
print(friends)

In [None]:
friends.remove("Sher")
print(friends)

In [None]:
# find the index of an element (raises a ValueError if the element is not in the list)
friends.index("Ali")

In [None]:
friends.count("Ali")

In [None]:
# Insert a value in a specified index position
friends.insert(0, "")
print(friends)

In [None]:
friends.sort()


In [None]:
print(friends)
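
Note that `sort()` reorders the list in place and returns `None`. If you want to keep the original order, the built-in `sorted()` returns a new sorted list instead. A small sketch with a made-up list:

```python
names = ["Mary", "George", "Ali"]

# sorted() leaves the original list alone and returns a new one
ordered = sorted(names)
print(names)
print(ordered)

# sort() changes the list itself; its return value is None
result = names.sort()
print(result)
print(names)
```

Because `sort()` returns `None`, writing `names = names.sort()` would lose the list.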


We can also iterate through a list using a 'for' statement to perform some function on each element individually. (more on looping below!)

Note that this is the first time that we need to run several lines of code together as a block. Python knows which lines belong to the same block based on their indentation. This can be either tabs or spaces (but do not mix the two).

In [None]:
# Note that `x` is a new variable which takes on the value of each item in the list in order.
for x in friends:  
  print(x)
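
If you also need the position of each item while looping, the built-in `enumerate()` supplies the index alongside the value. This is a sketch with a shortened, made-up list; `lines` is only there to collect the output.

```python
friends = ["Mary", "George", "Ali"]
lines = []

# enumerate() yields (index, item) pairs, starting from index 0
for position, name in enumerate(friends):
    line = f"{position}: {name}"
    lines.append(line)
    print(line)
```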

Be careful when modifying data in place: if two variables point to the same list, modifying an item through one variable will modify that item as seen through the second variable as well. If that is undesirable, you can make a copy instead.

In [None]:
yellow_fruit = ["lemon", "banana", "pineapple"]
fruit = yellow_fruit # both variables point to same list
fruit.append("cherry")
print(yellow_fruit)

In [None]:
yellow_fruit = ["lemon", "banana", "pineapple"]
fruit = list(yellow_fruit) # creates a copy of the list
fruit.append("cherry")
print(yellow_fruit)
print(fruit)
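
You can check whether two variables point to the same list with the `is` operator, which compares identity rather than contents. The names `alias` and `fruit_copy` below are illustrative.

```python
yellow_fruit = ["lemon", "banana", "pineapple"]
alias = yellow_fruit             # a second name for the same list
fruit_copy = list(yellow_fruit)  # a new list with the same items

same_object = alias is yellow_fruit
copy_is_same_object = fruit_copy is yellow_fruit
copy_has_same_items = fruit_copy == yellow_fruit

print(same_object, copy_is_same_object, copy_has_same_items)
```

`==` asks whether the contents match, while `is` asks whether both names refer to one and the same object.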

#### **Nested (2D) List**

Sometimes, items in a list are composed of multiple values. For such a case, nested lists can be used (i.e., lists inside a list).

In [None]:
numbers = [
         [1,2,3],
         [4,5,6],
         [7,8]
]
numbers

In [None]:
numbers = [[1,2,3], [4,5,6],[7,8]]
numbers

Extracting subsets from a nested list is a bit different.

In [None]:
# extract elements from 2D list
numbers[0]

In [None]:
numbers[0][0]

With 2D lists, we can use 'for' loops to iterate through the rows and columns.

In [None]:
# nested for loop

for row in numbers:
  for col in row:
    print(col)
    

*Note that looping through lists can be very inefficient; we will make use of the NumPy library in future labs.*
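
A nested loop can do more than print each value. As a minimal sketch, the example below reuses the same nested list and accumulates a total for each row; `row_totals` and `total` are names chosen for this illustration.

```python
numbers = [[1, 2, 3], [4, 5, 6], [7, 8]]
row_totals = []

for row in numbers:          # outer loop: one inner list at a time
    total = 0
    for value in row:        # inner loop: each value within that row
        total = total + value
    row_totals.append(total)

print(row_totals)
```

The rows do not need to be the same length; the inner loop simply runs over however many values each row holds.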

### **5. Tuples**


A tuple is a collection of Python elements separated by commas. In some ways a tuple is similar to a list in terms of indexing, nested objects and repetition, but a tuple is **immutable**, unlike lists, which are mutable. 

Once a tuple is created, you cannot change, add, or delete its elements.

In [None]:
coordinates = (118, 441)

In [None]:
# Access the elements 
coordinates[0]

In [None]:
# Can we change the items?
coordinates[1] = 6669796685

In [None]:
# use the built-in function list() to convert the tuple to a list
print(coordinates)
print(list(coordinates))


In [None]:
coordinates2 = [(1,2), (3,7), (7,5)]
coordinates2[1]
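
Because a tuple cannot be modified, the usual pattern is to unpack its values into variables and build a new tuple when something needs to change. The sketch below assumes the two values are an x and a y coordinate, which is only a guess for illustration.

```python
coordinates = (118, 441)

# Unpacking assigns each element of the tuple to its own variable
x, y = coordinates

# Build a new tuple rather than modifying the original
moved = (x + 10, y)

print(coordinates)
print(moved)
```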