<a href="https://colab.research.google.com/github/junting-huang/data_storytelling/blob/main/case0_intro_to_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Description

This notebook serves as an introduction to coding in Python for class xxxx. We'll begin by presenting the outcome of today's session. After that, we'll delve into some fundamental concepts that will lead us to that final outcome.

## Learning Goals

- To understand the basics of Python "grammar".
- To grasp fundamental concepts of IPython Notebooks.
- To demonstrate that even a complex goal can be achieved step by step.

By the end of this workshop, students should be able to comprehend the structure of a basic Python script and conduct rudimentary data analysis. They should also gain confidence in coding.

## Requirements

To participate in this workshop, you'll need a computer and a Google account (to access Colab). There's no need to install Python or any other software.

Note: This course is tailored for literature majors. We simplify Python terminology and omit more advanced concepts to facilitate easier understanding. As a result, this workshop might differ significantly from typical introductory Python workshops. Students from STEM backgrounds might find it less comprehensive or detailed.

Acknowledgement and reference: This notebook is based on [Introduction to Python workshop](https://hbctraining.github.io/Training-modules/Python/) by [hbctraining](https://github.com/hbctraining)

-----
# Section 1: Colab interface

---
Please follow the instructions below to set up for this workshop:
**Create your own copy of this Python notebook (the workshop materials)**: Click "File" in the menu bar -> Click "Save a copy in Drive". The copy will be automatically named "Copy of case0_intro_to_python" and saved to your Google Drive. You can rename it if you want.

The platform that we will use during this workshop is Google Colab. Colab is a free cloud service based on the [Jupyter notebook](https://jupyter.org/) environment. It does not require users to install Python locally, and the code runs entirely on a cloud server.

### Vocabulary

| Term                       | Description                                                                                     | How to do in Google Colab                  |
|----------------------------|-------------------------------------------------------------------------------------------------|--------------------------------------------|
| Code Cell                  | A section in an IPython notebook where you can write and execute programming code.               | Click the `+ Code` button to add.          |
| Text Cell                  | A section in an IPython notebook used for writing descriptive text, explanations, or notes.      | Click the `+ Text` button to add.          |
| Creating and Deleting Cell | The process of adding a new cell to an IPython notebook or removing an existing one.            | Use `+ Code` or `+ Text` to add; hover over the cell and click the trash bin icon to delete.|
| Saving the Document        | Storing the current state of an IPython notebook so that changes are preserved for later access.| Click on `File` > `Save`, or use `Ctrl + S`.|

---

### Exercise

Exercise 1: Document Your Day


    Task: Create a new text cell and briefly describe your learning objective

---

# Section 2: Basic Python terminology (20 min)

## Python Variables
A variable is a named storage location whose value can change while your program runs. To assign a value to a variable, Python uses the `=` sign. Variable names in Python must follow these rules:
- A variable name must start with a letter or an underscore (`_`).
- Only letters, digits, and underscores (A-Z, a-z, 0-9, _) are allowed within the variable name.
- The name must not be one of Python's reserved keywords; a full list is available [here](https://docs.python.org/3.8/reference/lexical_analysis.html#keywords).
- Variable names are case-sensitive, which means `Year` and `year` are two separate variables.


## Example of Valid and Invalid Variable Names  
| Valid Variable Names                  | Invalid Variable Names           | Reasons for Invalidity                                    |
|---------------------------------------|----------------------------------|------------------------------------------------------------|
| `volumes_InSearchOfLostTime`          | `7volumesOfLostTime`             | Starts with a number, not allowed.                         |
| `averageRating_Ulysses`               | `average-rating`                 | Hyphens are not allowed, should be an underscore.          |
| `title_Raven`                         | `"TheRaven"`                     | Quotes are not allowed in variable names.                   |
| `nobelPrizeWinner_ToniMorrison`       | `nobelPrizeWinner Toni Morrison` | Spaces are not allowed in variable names.                   |
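
If you are ever unsure whether a name is allowed, you can ask Python itself. The sketch below is one possible way to do this, not part of the original workshop: the helper `is_valid_name` is a hypothetical function written for this illustration, built on the string method `isidentifier()` and the standard `keyword` module. Note that Python also accepts some non-English letters in names, which the simplified rules above leave out.

```python
import keyword

# Names taken from the table above, plus one reserved keyword ("class")
candidates = [
    "volumes_InSearchOfLostTime",
    "7volumesOfLostTime",
    "average-rating",
    "nobelPrizeWinner Toni Morrison",
    "class",
]

def is_valid_name(name):
    # Hypothetical helper: a name is usable if it follows the naming rules
    # and is not one of Python's reserved keywords
    return name.isidentifier() and not keyword.iskeyword(name)

results = {}
for name in candidates:
    results[name] = is_valid_name(name)
    print(name, "->", results[name])
```

Only the first name in the list passes the check; the others start with a digit, contain a hyphen or spaces, or are a reserved keyword.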



In [6]:
# Number of volumes in Proust's "In Search of Lost Time"
volumes_InSearchOfLostTime = 7

# Average rating of a critically acclaimed novel out of 10
averageRating_Ulysses = 9.3

# Title of a famous poem
title_Raven = "The Raven"

# Flag to check if the author is a recipient of the Nobel Prize in Literature
nobelPrizeWinner_ToniMorrison = True


## Assigning Values and Inspecting Variables

Variables such as `volumes_InSearchOfLostTime`, `averageRating_Ulysses`, `title_Raven`, and `nobelPrizeWinner_ToniMorrison` are now stored in the current Python session. To inspect them, open the variable inspector pane, which in Colab is typically reached from the left sidebar.

To display the value of a variable, use the `print()` function. For instance, executing `print(volumes_InSearchOfLostTime)` shows the number of volumes in Proust's "In Search of Lost Time."
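
As a small illustration of `print()`, the sketch below re-creates two of the variables from above so that it can run on its own. The label text `"Volumes:"` is only an example of printing several items at once.

```python
# Re-create two variables from the earlier cell
volumes_InSearchOfLostTime = 7
title_Raven = "The Raven"

# Print each value on its own line
print(volumes_InSearchOfLostTime)   # 7
print(title_Raven)                  # The Raven

# print() accepts several items and separates them with a space
print("Volumes:", volumes_InSearchOfLostTime)   # Volumes: 7
```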

In interactive environments, like Jupyter notebooks, simply typing the variable name and executing the cell will display the value of that variable. For example:


In [7]:
volumes_InSearchOfLostTime

7



## Exercise 1: Variable Assignment and Correction

Objective: Create valid variable names and assign appropriate values of different data types, then correct the provided invalid variable names.

Valid Assignment:
- Assign an integer value to a variable named `century_FirstNovelPublished` that represents the century when the first novel is believed to have been published.
- Assign a floating-point value to `averageWordCount_ShakespeareSonnet` representing the average word count in a Shakespearean sonnet.
- Assign a string to `theme_CommonLitFiction` that holds a common theme found in literary fiction.
- Assign a boolean to `isEpic_Gilgamesh` indicating whether "The Epic of Gilgamesh" is considered an epic.

Write your Python code for the above tasks here:


Invalid Correction:
Below are some incorrectly named variables. Correct each variable name and assign a suitable value related to comparative literature.

    19CenturyGenres = ?
    total#Pages = ?
    author-Name = ?
    fiction Or Nonfiction = ?

Write your corrected variable names and assignments here:

## Data types
Data comes in different types. For example, the variables `volumes_InSearchOfLostTime` and `averageRating_Ulysses` created above are both numeric. `volumes_InSearchOfLostTime` is a whole number, so its data type is `int`, or "integer"; `averageRating_Ulysses`, on the other hand, is a number with decimal places, so its data type is `float`.

> These are similar to the "integer" and "numeric" data types in R, respectively.

We can use the `type()` function to check what data type a given variable has.

In [None]:
# Check the data type for 'volumes_InSearchOfLostTime' and 'averageRating_Ulysses'
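
For comparison, here is a minimal sketch of `type()` applied to two other numbers. The variable names and values are made up for this illustration.

```python
# Illustrative values only
chapters = 18        # a whole number
hours_to_read = 2.5  # a number with decimal places

chapters_type = type(chapters)
hours_type = type(hours_to_read)

print(chapters_type)   # <class 'int'>
print(hours_type)      # <class 'float'>
```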


Another commonly used data type is `str`, short for "string". A string stores a sequence of characters and is created by enclosing the characters in single quotation marks `''` or double quotation marks `""`.
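
The sketch below shows both quoting styles. The variable names are chosen for this illustration; the last line shows one reason to pick double quotes, namely when the text itself contains an apostrophe.

```python
# Single and double quotes create the same kind of object
opening = 'Call me Ishmael.'
author = "Herman Melville"

# Double quotes are convenient when the text contains an apostrophe
volume_title = "Swann's Way"

print(type(opening))   # <class 'str'>
print(volume_title)    # Swann's Way
```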

> This is similar to the "character" data type in R.

In [None]:
# Generate a str variable called 'text', with the value 'hello world!'. Check its data type


The last data type we introduce here is `bool`. A boolean value is either `True` or `False`, and it records whether an expression is true or false. We will cover this data type in the conditional statement section.
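
A boolean usually comes from a comparison. The sketch below uses made-up comparisons to show how the result can be stored in a variable; `>` means "greater than" and `==` means "is equal to".

```python
# The result of a comparison is a boolean
is_longer = 7 > 1                 # True, because 7 is greater than 1
same_title = "Raven" == "raven"   # False, because strings are case-sensitive

print(is_longer)         # True
print(same_title)        # False
print(type(is_longer))   # <class 'bool'>
```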

> This is similar to the "logical" data type in R.

In [None]:
# Generate a boolean variable called 'test', which judges whether 10 is smaller than 8. Check its data type


## Recap
In this section, we introduced some basic terms in Python. We learned **how to assign variables**, and what rules to follow. We also described several important **data types** - `int`, `float`, `str`, `bool`. Hope it has been fun so far!

| **Data type** | **Examples** |
| :---: | :---: |
| int (numeric) | 2 |
| float (numeric) | 3.5 |
| str | 'hello world!' |
| bool | True, False|

In the next section, we will focus on one important concept - Python lists. This will be something you use all the time in Python.

---

# Section 3: Python List (40 min)

## Create a List
Let's talk about data structures. In Python, a data structure determines how a variable organizes the data it holds. A frequently used data structure is the `list`: an ordered collection of items enclosed in square brackets `[]` and separated by commas.

A list has the following features:
- order of its elements matters
- can store mixed data types that we introduced above
- can even contain a sublist
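
The three features above can be sketched as follows. The list names and contents are invented for this illustration and are separate from the exercise cells below.

```python
# A list of strings
novels = ["Ulysses", "Beloved", "Dracula"]

# A list can mix data types: str, int, float, bool
mixed = ["Ulysses", 1922, 9.3, True]

# A list can contain sublists
nested = [["Ulysses", 1922], ["Beloved", 1987]]

# Order matters: the same items in a different order form a different list
same_order = novels == ["Ulysses", "Beloved", "Dracula"]
different_order = novels == ["Beloved", "Ulysses", "Dracula"]

print(same_order)        # True
print(different_order)   # False
```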

> Note: There are other Python data structures, including `tuple`, `dict` (dictionary), and `set`. We will not cover those in this workshop, but they can be very useful in some situations. If you are interested in learning more about them, this [website](https://thomas-cokelaer.info/tutorials/python/data_structures.html) has more information.

In [None]:
# Create an empty list called 'empty'


In [None]:
# Create a list called 'species', containing three strings: ecoli, human, corn.


In [None]:
# Create a list called 'glengths', containing three numeric values that correspond to genome length (in Mb): 4.6, 3000, 2500


In [None]:
# Create a list called 'combined', containing all three species and corresponding genome lengths as pairs


In [None]:
# Create a list called 'combined2', with each species and genome length pair as a sublist


## Subsetting a single element from a list
Now that we created a list, how do we access the data from it?

We can do so by specifying the "index" number - the location of the data within the list. **Python index starts from 0** (it is not intuitive, we know! Please just bear with it).

The first element of a list is `list[0]`. Alternatively, we can use a negative index (`-`) to count from the end of the list. The last element of a list is `list[-1]`. The image below illustrates the index for each element.

<p align="center">
<img src="https://github.com/hbctraining/Training-modules/blob/master/Python/img/list1.png?raw=true" width="500"/>
</p>
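
Here is a minimal sketch of indexing, using a made-up list of author names rather than the lists from the exercise cells.

```python
authors = ["Proust", "Joyce", "Poe", "Morrison"]

first = authors[0]     # index 0 is the first element
third = authors[2]     # index 2 is the third element
last = authors[-1]     # index -1 is the last element

print(first, third, last)   # Proust Poe Morrison

# With sublists, one index returns a whole sublist;
# a second index then reaches inside that sublist
pairs = [["Proust", 7], ["Joyce", 1]]
second_pair = pairs[1]          # ['Joyce', 1]
second_author = pairs[1][0]     # 'Joyce'
```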



In [None]:
# Get the 3rd element from list 'combined'


In [None]:
# Get the 3rd element from list 'combined2'. Notice that the result is a sublist!


In [None]:
# Get the 3rd from the last element from the list 'combined'


## Subsetting multiple elements from a list
Now, what if we want to access multiple elements in a list?

Here we introduce the slicing `:` operator. The syntax of "slicing" is `[start:stop:step]`. *start* is the index where the slice begins. *stop* is the index of the first element just **after** the end of the slice, so the element at *stop* is not included. *step* is the spacing between selected elements.
> Note: You don't have to specify all three values; when one is omitted, Python uses a default: **it starts from the first element, continues through the last element, and uses a step of 1**.

<p align="center">
<img src="https://github.com/hbctraining/Training-modules/blob/master/Python/img/list2.png?raw=true" width="500"/>
</p>
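
The slicing rules can be sketched on a made-up list of six author names. Each line leaves out a different part of `[start:stop:step]` to show the defaults at work.

```python
authors = ["Proust", "Joyce", "Poe", "Morrison", "Woolf", "Borges"]

middle = authors[1:4]        # indexes 1, 2, 3 (index 4 is not included)
first_two = authors[:2]      # start omitted: begins at the first element
last_two = authors[-2:]      # stop omitted: continues through the last element
every_other = authors[::2]   # step of 2: indexes 0, 2, 4

print(middle)        # ['Joyce', 'Poe', 'Morrison']
print(first_two)     # ['Proust', 'Joyce']
print(last_two)      # ['Woolf', 'Borges']
print(every_other)   # ['Proust', 'Poe', 'Woolf']
```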

In [None]:
# Get the first two elements from the list 'combined' - method 1: specify both start and stop position


In [None]:
# Get the first two elements from the list 'combined' - method 2: specify only stop position


In [None]:
# Get the last two elements from the list 'combined' - method 1: use normal index


In [None]:
# Get the last two elements from the list 'combined' - method 2: use negative index


In [None]:
# Get every other element from the list 'combined'


## Recap
In this section, we introduced **what a list is** and **how to create a list in Python**. We learned **how to access one or more elements in a list**. Sometimes there are multiple ways to achieve this goal.

In the next section, we are going to introduce tools that make Python programming truly powerful - functions.

---

# Section 4: Functions (20 min)

## Built-in function
A function is a piece of reusable code that performs a particular task. Python has a set of built-in [functions](https://docs.python.org/3/library/functions.html). For example, `max()` returns the largest value in a list of numbers.
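
As a sketch of how built-in functions are called, the example below applies `max()` and two related built-ins, `min()` and `len()`, to a list of made-up numbers. The list name and values are illustrative only.

```python
# Made-up word counts for three short poems
word_counts = [120, 85, 240]

largest = max(word_counts)        # the largest value in the list
smallest = min(word_counts)       # the smallest value in the list
how_many = len(word_counts)       # the number of elements in the list

print(largest, smallest, how_many)   # 240 85 3
```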

In [None]:
# Use the max() function to return maximum value of the list 'glengths'


Let's take another example: `round()`. This function rounds a number to a given number of decimal places. By default, the output is a whole number.

In [None]:
# define a variable with the value of pi, and then output the corresponding whole number using the round() function


What if we want to round the value to a specific number of decimal places? In that case, we would have to pass additional *arguments* to the function.

To check the available arguments and usage information for a function, one can use the `help()` function. However, we would recommend that you search the web for the function you want to use. For instance, [this webpage](https://www.programiz.com/python-programming/methods/built-in/round) shows some nice examples for the `round()` function. You can easily find similar resources online for most other functions.

In [None]:
# Use the `help()` function to display the usage of the `round()` function


We now know that we can specify the number of decimal places using the `ndigits` argument of `round()`. Let's try that with pi!
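
To show the argument in action without giving away the pi exercise, the sketch below rounds a different number. The variable name `e_value` is chosen for this illustration. The second argument can be given by position or by its name, `ndigits`.

```python
e_value = 2.71828

whole = round(e_value)                      # no second argument: a whole number
two_places = round(e_value, 2)              # second argument given by position
three_places = round(e_value, ndigits=3)    # same argument, given by name

print(whole)          # 3
print(two_places)     # 2.72
print(three_places)   # 2.718
```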

In [None]:
# round the value of pi to 2 decimal places


### Exercise
Another useful built-in function is `sorted()`, which returns the elements of a given list sorted in a specific order. Use this function to reorder the `glengths` list in **descending** order. Check [here](https://www.programiz.com/python-programming/methods/built-in/sorted) if you are not sure what argument to use.

In [None]:
# Sort the glengths list in descending order
#### Insert your code below ####


## Object-specific function
Python has a lot of functionality beyond the basic built-in functions. Recall the data types and data structures that we learned earlier? They are all called Python **objects**. Depending on the object type, there are functions to perform object-specific tasks.

Let's take a concrete example. One function for a Python string is `count`. `count` searches a string for a given substring and returns how many times that substring occurs. The syntax is `string.count(substring)`.
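
As an illustration, the sketch below counts a letter and a whole word in a line of verse. The variable names are made up; note that `count` is case-sensitive, so `"w"` and `"W"` are counted separately.

```python
line = "Once upon a midnight dreary, while I pondered, weak and weary"

w_count = line.count("w")           # occurrences of the letter w
weary_count = line.count("weary")   # occurrences of the word weary
capital_w = line.count("W")         # there is no capital W in this line

print(w_count, weary_count, capital_w)   # 3 1 0
```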

In [None]:
# Count number of T in a DNA sequence 'ACTGAT'


Pretty handy, right? We have just touched the tip of the iceberg so far. There are many more [functions](https://docs.python.org/3/library/stdtypes.html#string-methods) for strings in Python. Below we list a few more functions that you will likely see or may use.

| **Function** | **Description** | **Example** | **Output** |
| :---: | :---: | :---: | :---: |
| capitalize | Converts the first character to upper case | 'atgc'.capitalize() | 'Atgc' |
| count | Returns the number of times a specified value occurs in a string | 'atgc'.count('c') | 1 |
| islower | Returns True if all letters in the string are lower case | 'atgc'.islower() | True |
| join | Joins the elements of a list into one string, placing the original string between them as a separator | ''.join(['a', 't', 'g', 'c']) | 'atgc' |
| replace | Returns a string where an old value is replaced with a new value | 'atgc'.replace('a', 'g') | 'gtgc' |
| split | Splits the string at the specified separator, and returns a list | 'hello world'.split() | ['hello', 'world'] |
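
The same string functions can be tried on text rather than DNA letters. The sketch below uses made-up example strings; every function returns a new value and leaves the original string unchanged.

```python
title = "the raven"

capitalized = title.capitalize()              # 'The raven'
all_lower = title.islower()                   # True
joined = " ".join(["The", "Raven"])           # 'The Raven' (a space is the separator)
replaced = "The Raven".replace("Raven", "Crow")   # 'The Crow'
words = "Quoth the Raven".split()             # ['Quoth', 'the', 'Raven']

print(capitalized, all_lower, joined, replaced, words)
print(title)   # the original string is unchanged: the raven
```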

---

# Section 5: Packages (10 min)

A Python package contains a collection of pre-defined scripts for specific tasks. It allows us to directly use these scripts to accomplish a task of interest, without having to write everything from scratch.

We need to install a Python package if it is not already present. [pip](https://pip.pypa.io/en/stable/) is the package installer for Python. To install a package in a notebook, we can use the `!pip install package_name` command; the leading `!` tells the notebook to run the line as a shell command rather than as Python code. For example, to install [scanpy](https://scanpy.readthedocs.io/en/stable/), a popular package for single-cell RNA-seq analysis, we could use `!pip install scanpy`.


In [None]:
# error when using numpy package without importing first
numpy.array([2, 3, 4, 5]) + numpy.array([1, 10, 100, 1000])


Many popular Python libraries are already installed on Colab, so we do not need to install any packages for now. However, we need to `import` a package before we can use it, and this import step is required every time we start a new Python session. As a result, we usually place `import` statements at the beginning of a script.

Sometimes we give a package an alias, using the `import package_name as alias` syntax. This way, we only need to type the alias when calling a function from the package, which is convenient if we use the package often. For some popular packages, there are widely followed conventions on what alias to use.
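
The alias syntax can be sketched with `math`, a module that ships with Python and needs no installation. The alias `m` is chosen only for this illustration and is not a widely used convention.

```python
# Import the built-in math module under a short alias
import math as m

# Functions from the package are called as alias.function_name(...)
root = m.sqrt(49)           # square root
rounded_down = m.floor(9.3) # largest whole number not above 9.3

print(root)           # 7.0
print(rounded_down)   # 9
```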

In [None]:
# import numpy library, and name it as np
import numpy as np

How would you re-write the code from above (the one that caused an error)? Give it a try below.

In [None]:
# use numpy package after importing (note: you need to use "np", instead of "numpy", because np is the alias we set earlier)


Notice the pattern here: anything that is not available in Python by default must first be imported, and its functions are then called through the package name or alias, as in `np.array()`.

# Section 8 Exercise