<a href="https://colab.research.google.com/github/henrikalbihn/henrikalbihn/blob/main/intro2python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Crash Course in Python 🐍

### *Welcome* to Google Colab! 🎉

Colab (short for Colaboratory) is a hosted Python 3 environment that runs on GCE (Google Compute Engine) virtual machines. It supports Jupyter notebooks (.ipynb files), which are stored in your Google Drive.

> Why Colab over another Python environment?

I like Colab because it requires basically no setup and Google makes it super easy to share code using Google Drive. That being said, you should still install [Python](https://www.python.org/downloads/) and [Anaconda](https://www.anaconda.com/products/individual) on your local machine.

> Why Jupyter notebooks over .py files?

I like using Jupyter notebooks because they make it easy to explain what your code does using ***markdown***, and you can embed images, code cells, ***LaTeX***, and more in the same document. That being said, .py files have their use cases (like automation), and we use both here at Westcliff. Colab also lets you download your notebooks as .py files, so you can always start with a notebook and convert it to a .py file later.

> See links for more info on [Google Colab](https://research.google.com/colaboratory/faq.html) and [project Jupyter](https://jupyter.org/).



**The below modules are to give you some basic understanding of Python functionality whether you're a complete beginner or you have some experience with programming in another language (like [R](https://ftp.osuosl.org/pub/cran/)).**

## Hello, World! 👋 🌎

As with any intro to Python, we will start by calling the most basic built-in function: `print()`.

If it's not already obvious, `print()` prints a message to the console. We will begin with the tradition used by computer scientists since [1974](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program)!

> **"Hello, world!"**

To run the code cell below, click the play button on the left side of the cell, or click inside the cell and press CMD/CTRL + ENTER.

In a Jupyter notebook, the output appears directly below the code cell instead of in a separate console.

In [None]:
# call print() function and pass a string "Hello, world!" as the argument.
print("Hello, world!")

Hello, world!


The code cell begins with a ***comment*** (the line starting with `#`). Comments are not executed; they explain what your code does.
> ***Note on commenting:***
> Use readable, easily understandable comments whenever possible. When someone else uses your code, good comments are often the quickest way for them to make sense of what it does. The same goes for your own code months or years later, so please ***write good comments***!

### Variable Assignment
Notice that the function prints the characters *within* the double quotes `" "` of the `string`. You can also pass a ***variable*** to the `print()` function. Here we will create a variable called `message` and assign it the `string` value `"Hello, world!"`. This is called ***variable assignment***. A single `=` is used only for assigning variables, not for checking whether two values are equal (use `==` for that).

In [None]:
# single and double quotes are interchangeable,
# just make sure to use the same on both sides
message = 'Hello, world!'
print(message)

Hello, world!


Notice that the output is the ***same***, even though the code in the two cells is ***different***!

> In Python, there are many different ways to code the same thing!
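To make that concrete, here is a small sketch (not part of the original notebook) that builds the same greeting three different ways. The variable names `name`, `v1`, `v2`, and `v3` are just illustrative.

```python
# Three different ways to build the same string
name = "world"

v1 = "Hello, " + name + "!"     # string concatenation with +
v2 = "Hello, {}!".format(name)  # the str.format() method
v3 = f"Hello, {name}!"          # an f-string (Python 3.6+)

print(v1)
print(v2)
print(v3)
print(v1 == v2 == v3)  # chained comparison: True when all three are equal
```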

## Data Types/Structures

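Python supports many data types; we will cover just the basics. Run the cell below to see a summary table of the most common ones (it reads a CSV file from a mounted Google Drive, so it only runs if that file exists in your Drive):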

In [None]:
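# run this cell to see the different data types in Python
# the CSV lives in Google Drive, so mount Drive at /content/drive first
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd # more on this later
pd.read_csv('/content/drive/MyDrive/Colab Data/pythonDataTypes.csv')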

| | Name | Type | Description | Example |
|---|---|---|---|---|
| 0 | Integers | int | Whole numbers | 3 300 200 |
| 1 | Floating point | float | Decimal point number | 2.3 4.6 100.0 |
| 2 | Strings | str | Ordered sequence of characters | "hello" 'Sammy' "2000" |
| 3 | Lists | list | Ordered sequence of objects | [10, "hello", 200.3] |
| 4 | Dictionaries | dict | Key:Value pairs (kept in insertion order since Python 3.7) | {"mykey" : "value", "name" : "Frankie"} |
| 5 | Tuples | tuple | Ordered immutable sequence of objects | (10, "hello", 200.3) |
| 6 | Sets | set | Unordered collection of unique objects | {"a", "b"} |
| 7 | Booleans | bool | Logical value | True False |


Notice that the first row of data has an index of zero instead of one (in R, for example, indexes begin at one).

Python is what's known as a *zero-indexed* language, meaning indexes for sequences like lists, strings, and tuples begin at zero.
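Here is a quick sketch of zero-based indexing on a list and a string. It also shows negative indexes and slicing, which build on the same idea; the `letters` and `word` values are just examples.

```python
letters = ["a", "b", "c", "d"]

print(letters[0])    # first item: 'a'
print(letters[3])    # fourth (last) item: 'd'
print(letters[-1])   # negative indexes count from the end: 'd'
print(len(letters))  # 4 items, so valid positive indexes are 0 through 3

# A slice includes the start index but excludes the stop index
print(letters[0:2])  # ['a', 'b']

word = "Python"
print(word[0])       # strings are indexed the same way: 'P'
```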

> The `type` function

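You can use the `type` function to check the data type of any object. In the code cell below, click a line you would like to run, then press CMD/CTRL + / to comment or uncomment it (you can also highlight multiple lines). Only the value of the last line in a cell is displayed automatically, so uncomment one `type()` call at a time (along with any assignment it depends on), or wrap each call in `print()`.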

In [None]:
# type(1) # int (integer)
# type(2.3) # float
# type("hello") # str (string)
# l = [1, 2, 3] # assign list
# type(l) # list
# d = {"key":"value","name":"Henrik"} # assign dictionary
# type(d) # dictionary (dict)
# t = (10, "hello", 200.3) # assign tuple
# type(t) # tuple
# s = {"a","b"} # assign set
# type(s) # set
# type(True) # bool (boolean)
# type(message) # remember our message variable? Remember the type?
# # You guessed it, it's a str (string).

float
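Because only the last expression in a cell is displayed automatically, one way to see several types at once is to loop over the values and `print()` each type. A minimal sketch; the `values` list simply mirrors the commented examples above.

```python
values = [1, 2.3, "hello", [1, 2, 3], {"key": "value"},
          (10, "hello", 200.3), {"a", "b"}, True]

for value in values:
  # type() returns the class; .__name__ gives just its name, e.g. 'int'
  print(repr(value), "->", type(value).__name__)
```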

Notice the *case* of the Boolean value `True`: it starts with a capital T, but the rest of the letters are lowercase.

> Python is a ***case-sensitive*** language.

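`True` is not the same as `TRUE`, `true`, or `"True"`. `TRUE` and `true` are undefined names, so using them raises a `NameError`, and `"True"` is a string, not a boolean. On the other hand, `True` and `1` ***are*** treated as equal (and so are `False` and `0`), because Python's `bool` type is a subclass of `int`. More on this in the logical operators section below.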
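A short sketch you can run to check these claims yourself: `True` compares equal to `1`, the string `"True"` does not, and a miscapitalized name raises a `NameError`.

```python
print(True == 1)       # True: bool is a subclass of int, so True equals 1
print(False == 0)      # True: likewise, False equals 0
print(True + True)     # 2: booleans even behave like integers in arithmetic
print(True == "True")  # False: "True" is a string, not a boolean

try:
  TRUE                 # Python is case-sensitive, so this name is undefined
except NameError as err:
  print("NameError:", err)
```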

## Logical Operators and Loops 🔃

### Logical Operators

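Python supports comparison and logical operators. The comparison operators (1-6 below) return a boolean (`True` or `False`), and the logical operators `and`, `or`, and `not` (7-9) combine or flip those results. Here ***LHS*** and ***RHS*** stand for the left-hand side and right-hand side of the operator.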

1. `>` = `True` if ***LHS*** is *greater than* ***RHS***
2. `<` = `True` if ***LHS*** is *less than* ***RHS***
3. `>=` = `True` if ***LHS*** is *greater than or equal to* ***RHS***
4. `<=` = `True` if ***LHS*** is *less than or equal to* ***RHS***
5. `==` = `True` if ***LHS*** is *equal to* ***RHS***
6. `!=` = `True` if ***LHS*** is *NOT equal to* ***RHS***
7. `and` = `True` if both sides are `True`.
8. `or` = `True` if at least one side (***LHS*** or ***RHS***) is `True`.
9. `not` = flips a value: `not True` is `False`, and `not False` is `True`.
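Here is one example of each operator in the list, using two arbitrary numbers `x` and `y`:

```python
x = 5
y = 3

print(x > y)    # True
print(x < y)    # False
print(x >= 5)   # True: 5 is equal to 5
print(x <= y)   # False
print(x == y)   # False
print(x != y)   # True

print(x > 0 and y > 0)  # True: both sides are True
print(x > 10 or y > 0)  # True: at least one side is True
print(not x > y)        # False: comparisons run first, then not flips True to False
```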

Let's check if `1` and `True` are the same using the `==` (equality) operator and our first `if`/`else` statement.

In [None]:
# Check if 1 and True are the same
t = 1
if t == True: # remember == is for checking equivalency
  print("They're the same!")
else:
  print("Hmm, not quite.")

They're the same!


### `if`, `else` Statements

Notice the syntax and indentation of the `if`/`else` statement. We often use pseudocode to explain the logic behind our code.

> ***Pseudocode*** is a plain-language description of what we want our code to do. During development, we often use pseudocode to hash out our ideas before turning them into executable code.

The syntax for an `if`, `elif`, `else` statement in pseudocode is like so:

```
if logical condition is True:
  do something
elif different condition is True:
  do something different
else:
  do something else
```

The `if` condition is evaluated first. If it evaluates as `True`, Python executes the indented block below it (`do something`). If it evaluates as `False`, Python moves on to the next `elif` condition. `elif` is short for *else if*; it works just like `if`, and you can have as many `elif` statements as you like.

> If the `if` statement and all subsequent `elif` statements evaluate as `False`, then the `else` statement is executed and we exit and move on.

Integers and floats can be evaluated as `True` / `False` as well: any nonzero number (positive or negative, `int` or `float`) is treated as `True`, and zero (`0` or `0.0`) is treated as `False`.

In [None]:
# try changing the value of c so that the else branch runs (hint: zero is False).
a = 10
b = -12
c = 1.1

if a and b and c:
  print("All the numbers have boolean value as True")
else:
  print("At least one number has boolean value as False")
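The built-in `bool()` function shows the truth value Python uses when a number appears in an `if` condition. A quick sketch with the same numbers as the cell above, plus zero:

```python
print(bool(10))   # True: nonzero int
print(bool(-12))  # True: negative numbers are nonzero too
print(bool(1.1))  # True: nonzero float
print(bool(0))    # False
print(bool(0.0))  # False: zero as a float is also False
```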

Let's put together some logical operators and an `if`, `elif`, `else` statement.

In [None]:
# try changing the value of b so you can see how the output changes
a = 33
b = 10

if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")
else:
  print("b is less than a")

> Also notice the ***indentation***.

**Whitespace is very important in Python.** If you run the same code without indenting the blocks, Python raises an `IndentationError`. Try it out:

In [None]:
# this code will raise an IndentationError (the print lines are not indented)
a = 33
b = 10
if b > a:
print("b is greater than a")
elif a == b:
print("a and b are equal")
else:
print("b is less than a")

### `for` Loops

`for` loops repeat a block of code once for each item in a sequence, such as a list, a string, or a `range()` of numbers.
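A minimal sketch of a `for` loop: it walks through a list and runs the indented block once per item. `range()` is a built-in that generates a sequence of integers; the `fruits` list is just an example.

```python
fruits = ["apple", "banana", "cherry"]

# Run the indented block once for each item in the list
for fruit in fruits:
  print(fruit)

# range(5) produces 0, 1, 2, 3, 4 (it starts at 0 and stops before 5)
total = 0
for i in range(5):
  total += i
print(total)  # 10
```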

### `while` Loops

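`while` loops repeat a block of code as long as a condition evaluates to `True`. Make sure something inside the loop eventually makes the condition `False`, or the loop will run forever.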
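A small countdown sketch of a `while` loop. The line that decreases `count` is what eventually makes the condition `False` and ends the loop.

```python
count = 3

# Keep looping while the condition is True
while count > 0:
  print(count)
  count -= 1  # without this line, count never changes and the loop never ends

print("Liftoff!")
```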

## Dependencies: Modules/Packages/Libraries 📚

Python comes with a standard library of built-in modules, and many more are available as third-party packages you can install (modules are also called "packages" or "libraries"; I will use the terms interchangeably).

They are essentially collections of pre-written functions, data, and more that you can access and use in your code. This lets you build on top of very complicated implementations instead of writing them from scratch.

(P.S. The one we will focus on for data analysis is called `pandas` 🐼.)

To use a package, start with:

>`import <packagename>`

`import` is used to make functions, data, classes, and more available in the current python environment.
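For example, `math` is part of Python's standard library, so it can be imported without installing anything. This sketch shows that a module can provide both functions and data:

```python
import math  # standard library module, so no install is needed

print(math.sqrt(16))  # 4.0: call a function from the module with module.function()
print(math.pi)        # 3.141592653589793: modules can also provide data (constants)
```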

*If you've used R, you may have seen `library('tidyverse')` used to load the `tidyverse` package.

The package **must** first be installed in your python environment. You cannot use a package without installing and importing it first.

To install a package, run:

>`%pip install <packagename>`

*In R, you would run `install.packages('tidyverse')` for example.

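In Google Colab, packages you install only last for the current runtime session, so to use a package that is not installed by default, you must run `%pip install <packagename>` again each time you start a new session or open a new notebook. Try it out with `seaborn` (if it is already installed, pip will simply report that the requirement is already satisfied):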

In [None]:
# In a local Python environment, you would usually run `pip install seaborn` in a terminal;
# in Colab (and Jupyter), the %pip magic installs the package into the notebook's environment.
%pip install seaborn

In Google Colab, many packages are already installed.

Here's a short script to list the packages that are currently installed, along with their version numbers:

In [None]:
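from importlib.metadata import distributions # standard library (Python 3.8+); replaces the deprecated pkg_resources
import pprint as pp # This one is called 'pretty print'

# build a sorted list of "name==version" strings for every installed package
installed_packages_list = sorted(
  (f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()),
  key=str.lower,
)
pp.pprint(installed_packages_list) # Print the list with each item on a new line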

> Aliases

In the Python community, there are conventional aliases for many packages (e.g. `import pandas as pd` or `import numpy as np`). This is so you can reference the package (or a function from within it) with less typing.

So to read in a .csv you can use `pd.read_csv('example.csv')` instead of `pandas.read_csv('example.csv')`.

With `pandas` you only save 4 letters with the alias, but some package names are very long.

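You may also encounter `from`, which imports one specific name (such as a class or function) from a module so you can use it without the module prefix, e.g. `from datetime import datetime`. Python still loads the whole module either way; `from` only changes which names are available in your code, not how much memory is used.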
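Here is a sketch comparing the three import styles, using the `date` class from the standard `datetime` module (the date itself is arbitrary). One gotcha worth knowing: after `from datetime import datetime`, the name `datetime` refers to the class, not the module.

```python
# Option 1: import the module and use the module prefix
import datetime
d1 = datetime.date(2021, 12, 21)

# Option 2: import one name from the module and use it directly
from datetime import date
d2 = date(2021, 12, 21)

# Option 3: import the module under an alias (like pd or np)
import datetime as dt
d3 = dt.date(2021, 12, 21)

print(d1 == d2 == d3)  # True: all three use the same date class
```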

Below I have imported several useful packages with their (conventional) aliases:



In [None]:
from datetime import datetime # datetime class within datetime module
import os # good for portability (operating system)
import pandas as pd # more on this below
import numpy as np # stands for numerical python 
import matplotlib.pyplot as plt # plotting library for visualizations
import seaborn as sns # another good plotting library
import sklearn # scikit-learn is for machine learning/statistical analysis

## Functions $f(x)$

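Functions are reusable blocks of code. You define one with `def`, give it a name and (optionally) parameters, and then call it by name whenever you need it.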

In [None]:
# yo dawg: define a function that takes one parameter (name) and prints a greeting
def wassup(name):
  print(f'Wassup, {name}?!')

wassup("Trent")

Wassup, Trent?!
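`wassup()` prints its greeting but gives nothing back to the caller. Functions can also `return` a value and give parameters default values. A sketch using a hypothetical `greet()` function:

```python
def greet(name, greeting="Wassup"):
  """Return a greeting string instead of printing it."""
  return f"{greeting}, {name}?!"

msg = greet("Trent")                   # uses the default greeting
print(msg)                             # Wassup, Trent?!
print(greet("Jason", greeting="Hey"))  # Hey, Jason?!
```

Returning the string (rather than printing it) means the caller can store it, combine it, or test it.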


# Data Analysis with `pandas` 🐼


### Reading Data

Okay, all of that was pretty long-winded, so let's finally read in our first bit of data. First we assign a file path string to the variable `file`, then we pass it as the first argument of `pd.read_csv` and assign the output to a variable called `df` (short for ***DataFrame***). If you've used R, you're probably already familiar with ***DataFrames***.

In [None]:
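import pandas as pd # repeated here so this cell runs on its own
file = '/content/sample_data/california_housing_test.csv' # sample dataset that ships with Colab
df = pd.read_csv(file)

df.head(5) # the last expression in a cell is displayed as a formatted table
# df.columns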
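Once data is in a DataFrame, a few operations come up constantly: checking its size, selecting a column, and filtering rows. Here is a self-contained sketch; it reads a tiny made-up CSV from a string (via `io.StringIO`) so it runs anywhere, and the names and cities are purely illustrative.

```python
import io
import pandas as pd

# A tiny, made-up CSV held in a string
csv_text = """name,age,city
Ana,34,Irvine
Ben,28,Austin
Cara,41,Irvine
"""

df_small = pd.read_csv(io.StringIO(csv_text))  # read_csv also accepts file-like objects

print(df_small.shape)             # (3, 3): rows, columns
print(df_small.columns.tolist())  # ['name', 'age', 'city']
print(df_small["age"].mean())     # average of the age column

# Boolean filtering: keep only the rows where the condition is True
irvine = df_small[df_small["city"] == "Irvine"]
print(irvine)
```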

# Working with Google Drive using `gspread` ⚡

This section is about how to connect to and read data from Google Sheets (and other files) in your Google Drive.

It is important that you run the Authentication code before trying any other operations.
> You must authenticate your notebook under your Google account to access any of the files within it.

## Authentication

The below code cell must be copy-and-pasted into your notebook and run to authenticate the notebook under your Google account.

Running the cell asks you to grant the notebook access to your Google account. Depending on your Colab version, this is either a pop-up you accept directly, or a link that gives you a verification code to paste into the input box below the cell (then press ***Enter***). Once access is granted, your notebook can read your Google Drive files.

In [None]:
# Import PyDrive and associated libraries.
# This only needs to be done once per notebook.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import gspread

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


## Find a Sheet by its name and read data

How to find a Google Spreadsheet by its `fileName`:

In this case, I named the spreadsheet `"211221 gspread test"`. It was created on December 21st, 2021, so I named it using this date.

This follows Westcliff naming conventions: a two-digit `<year><month><day>` prefix (YYMMDD), then `<a name describing your file>`.
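If you want to generate the YYMMDD prefix in code rather than typing it, `strftime` can format a date that way. A minimal sketch; the date and the description `"gspread test"` are taken from the example above.

```python
from datetime import date

# %y = two-digit year, %m = two-digit month, %d = two-digit day
prefix = date(2021, 12, 21).strftime("%y%m%d")
file_name = f"{prefix} gspread test"
print(file_name)  # 211221 gspread test

# For today's date, use date.today().strftime("%y%m%d") instead
```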

[Link to the spreadsheet we're referencing here.](https://docs.google.com/spreadsheets/d/1tEvRRpn7B176igUdoni6MGeUjYnRaekgAFe7SOY6m1M/edit#gid=0)


In [None]:
# pass the name of the file as a string
fileName = "211221 gspread test"

gc = gspread.authorize(GoogleCredentials.get_application_default())

sh = gc.open(fileName).sheet1

# get_all_values gives a list of rows.
rows = sh.get_all_values()

# and convert to a Pandas DataFrame
pd.DataFrame.from_records(rows)

| | 0 |
|---|---|
| 0 | Hello world! |
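`get_all_values()` returns a list of rows, and when a sheet has a header row, `DataFrame.from_records(rows)` treats that header as ordinary data. A sketch of one common fix: use the first row as column names. The `sample_rows` below are hypothetical stand-ins for what a sheet might return; sheet values typically come back as strings, so numeric columns are converted explicitly.

```python
import pandas as pd

# Hypothetical example of get_all_values() output: a list of rows,
# where each row is a list of cell values as strings
sample_rows = [
  ["name", "score"],
  ["Ana", "90"],
  ["Ben", "85"],
]

# Use the first row as column names and the rest as data
df_sheet = pd.DataFrame(sample_rows[1:], columns=sample_rows[0])

# Convert the text column to numbers so you can do math on it
df_sheet["score"] = pd.to_numeric(df_sheet["score"])
print(df_sheet)
```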


## Print all files in Google Drive root folder:

Here's a snippet for printing all files in your Google Drive root directory (assuming you've authenticated).

In [None]:
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file1 in file_list:
  print(f"Title:  {file1['title']}\n   ID:    {file1['id']}")

## Sharing a Google Sheet with another user:

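***Try it out:*** change the string assigned to the `email` variable to your Westcliff email address to share the spreadsheet with yourself (you'll receive a notification email).

In [None]:
email = 'henrikalbihn@westcliff.edu' # who do you want to share with?
role = 'writer' # 'writer' makes them an editor; 'reader' gives view-only access

# share() belongs to the Spreadsheet, not the Worksheet (sheet1) stored in sh
spreadsheet = gc.open(fileName)

try:
  spreadsheet.share(email, perm_type='user', role=role, notify=True,
                    email_message='Sharing this spreadsheet from Colab.')
  print(f'Successfully shared {role} access to user: {email}.')
except Exception as e:
  print(f'Share failed.\n {e} ')

With the `notify` argument of the `share()` method set to `True`, Google sends a notification email to the share-ee, and `email_message` sets the text of that email.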

> Notice the `try:` and `except:` block. This is called *exception handling*: code that might fail goes in `try`, and if an error occurs, Python runs the `except` block instead of stopping the cell with an error.
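The sharing cell catches every error with `except Exception`. When you know which error to expect, catching just that one is usually clearer, because unexpected bugs still surface. A sketch with a hypothetical `safe_divide()` helper:

```python
def safe_divide(a, b):
  """Divide a by b, returning None instead of crashing when b is zero."""
  try:
    return a / b
  except ZeroDivisionError as err:  # catch only the error you expect
    print(f"Could not divide: {err}")
    return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # prints the error message, then None
```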

# The Zen of Python 🧘

> There is ***good*** coding style and there is ***bad*** coding style.

We want our code to be ***pythonic***.

Pythonic is a term that means writing your code as simply and legibly as possible while still being explicit about what each line does. Basically, it means writing **good** code. You can read a poem about Python's guiding principles, written by ***Tim Peters***, a longtime Python core developer, by running the code cell below (it's an Easter egg: `import this`).

In [None]:
# Run this cell to read 'The Zen of Python'
# a poem about the guiding principles of Python
# written by longtime Python core developer Tim Peters.
import this
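To make "pythonic" a bit more concrete, here is a sketch of the same task written two ways. Both versions work; the second relies on the built-in `enumerate()` and a list comprehension, which most Python developers find easier to read. The `people` list is just an example.

```python
people = ["Jason", "Trent", "Henrik"]

# Less pythonic: loop over index numbers and look each item up
for i in range(len(people)):
  print(i, people[i])

# More pythonic: enumerate() yields (index, item) pairs directly
for i, person in enumerate(people):
  print(i, person)

# A list comprehension builds a new list in one readable line
lengths = [len(person) for person in people]
print(lengths)  # [5, 5, 6]
```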

## Congrats, you're now a Pythonista 🐍💃
No, I'm not kidding. That's the [real term](https://pythonistaplanet.com/pythonista/) for someone who codes in Python. Enjoy this fun and rewarding language.
> ***Happy coding!***

I know this is a long one, so if you've made it this far, thank you! This took a long time to put together but it was worth it. This is meant to be a resource for you to check back on whenever you have a question about Python. 

I hope it will be valuable to you in your journey to learn Python and Data Science. Thanks for reading!

In [None]:
# change the list to your name and see just how thankful I am.
names = ["Jason", "Trent"]

# function and parameter names use snake_case, per Python style conventions (PEP 8)
def thank_you_msg(names_list):
  for name in names_list:
    print(f"Thanks for reading, {name}!")

thank_you_msg(names)

### ~ *Henrik*

# Additional Resources

Below are just a few additional resources you may find useful when you have questions or need practice.

*   [Stack Overflow](https://stackoverflow.com/) is a Q&A site where you can ask coding questions and get answers from experts. Most questions I've asked are answered the same day. Make sure to tag your questions with `python` and the packages in question.
> *Stack Overflow is public, so be sure not to post any sensitive data.
*   [GitHub](https://github.com/) is a hosting service for `git` repositories. You can back up your code and keep it under version control. I recommend you set up an account and learn how to use it. We do not currently use GitHub as a department, but would like to in the future.
> *Same goes for GitHub: make your repositories private and don't push anything sensitive.
*   [DataCamp](https://www.datacamp.com/) is an interactive learning platform focused on coding, Data Analytics, and Data Science. You can try it for free, but the membership was $25/month at the time of writing. I recommend it mainly because I've used it extensively. There are free alternatives like [Free Code Camp](https://www.freecodecamp.org/learn/data-analysis-with-python/), but I haven't used them much.

## Package Documentation

Various documentation by package:
* [gspread](https://docs.gspread.org/en/latest/api.html): API reference
* [gspread-pandas](https://github.com/aiguofer/gspread-pandas): experimental package GitHub page
* [gspread-dataframe](https://github.com/robin900/gspread-dataframe): experimental package GitHub page
* [matplotlib](https://matplotlib.org/stable/api/index): API reference
* [numpy](https://numpy.org/doc/stable/reference/): API reference
* [pandas](https://pandas.pydata.org/docs/reference/index.html): API Reference
* [pandas](https://pandas.pydata.org/docs/getting_started/index.html#coming-from): Getting started if you're coming from R, SQL, Excel, and more.
* [seaborn](https://seaborn.pydata.org/api.html#): API reference

## More cool stuff about Colab
[Link to notebook on Data Table](https://colab.research.google.com/notebooks/data_table.ipynb#scrollTo=JgBtx0xFFv_i): Data Table is a cool interactive display kind of like Excel for Colab notebooks.