# Lecture 1. Introduction to Python Programming

## 1.1 Logistics

- Name: Jiantao Huang 
 * PhD in Finance from London School of Economics
 * Research interests: Empirical Asset Pricing & Bayesian Econometrics <br>
<br>
- Office: KKL Room 723 <br>
<br>
- Email address: huangjt@hku.hk <br>
<br>
- Format: 
 * Weekly Lecture (three hours, with two 10-min breaks)
 * No Lecture in the reading week
 * Office hours: 10:30 am - 11:30 am (Thursday, in person; please book a time slot via email)
 * HKU Moodle: Includes ALL code materials (*some large datasets will be shared by Dropbox*) <br>
<br>
- TA Information   
 * Frankie HO, with email address: cfhoad@hku.hk <br>
<br>
- Reference books:
 * Python for Data Analysis, 3E
 * Online version of this book: https://wesmckinney.com/book/preliminaries.html  <br>
<br>
- Performance Evaluation:
 * Attendance and participation (5%)
 * Homework assignments (45%): Three assignments, each of which accounts for 15%
 * Final exam (50%)
 * Cheating of any form will not be allowed!!!  <br>

---

## 1.2 What Is This Course About?

* Python programming you need for data analysis

### Why Python?

* Python is one of the most important programming languages in data analysis, especially given its growing popularity in machine learning and AI. <br>
<br>
* Compared to R, MATLAB, SAS, Stata, and others, Python is a more general language, i.e., used in general-purpose software engineering. <br>
<br>
* It is relatively easy to run Python computations on GPUs (this is considerably harder in R, MATLAB, SAS, Stata, etc.). <br>
<br>
* It solves the “Two-Language” Problem. 
 * The old approach: test new ideas in a more specialized computing language like SAS or R, and later port them into a larger production system written in, say, Java or C++.
 * Python can do both tasks reasonably well. 

### Why Not Python?

* Python is slower than Java or C++. When the main objective is time efficiency (e.g., high-frequency trading), we need C++. <br>
<br> 
* Python can be a challenging language for building highly concurrent, multithreaded applications, particularly applications with many CPU-bound threads. The reason for this is that it has what is known as the **global interpreter lock (GIL)**. 

### Course Outline

- Weeks 1 - 3. Introduction to Python 
 * Week 1. Get started with Python, including assignment, numerical types, while and for loops, bisection search
 * Week 2. Functions, tuple, list, strings, dictionaries, set
 * Week 3. Testing, debugging, exceptions, classes <br>
   * ***Assignment 1 after week 3***.
<br>
- Weeks 4 - 11. Data Analysis 
 * Week 4. NumPy Basics
 * Week 5. Introduction to Pandas
 * Week 6. Data Loading, Storage, File Formats, and Introduction to SQL
 * Week 7. Data Cleaning and Wrangling
   * ***Assignment 2 after week 7***.
 * Week 8. Plotting, Data Aggregation, and Group Operations 
 * Week 9. Time-Series and Advanced Pandas
 * Week 10. Data Analysis Examples I
 * Week 11. Data Analysis Examples II
   * ***Assignment 3 after week 11***. <br>
<br>
- Week 12. Advanced topics and Revision
<br>

---

## 1.3 Installation and Setup

* Please use **Anaconda** distribution: https://docs.anaconda.com/free/anaconda/install/.  <br>
<br>
* Installing or updating Python packages (from the command line):
 * In general, packages can be installed with the following command: ```conda install package_name```.
 * If this does not work, use the pip package management tool: ```pip install package_name```.
 * Updating packages is similar: ```conda update package_name``` or ```pip install --upgrade package_name```.<br>
<br>
* Throughout this course, we use only Jupyter notebook. 
 * To start up Jupyter, run the command ```jupyter notebook``` in a terminal.
 * On many platforms, Jupyter will automatically open in your default web browser. Otherwise, navigate to the HTTP address printed when you started the notebook, typically http://localhost:8888/.

---

## 1.4 The Basic Elements of Python

One basic command is the ```print``` function.

### 1.4.1 Objects, Expressions, and Numerical Types

* Objects are the core things that Python programs manipulate. <br>
<br>
* Every object has a type (either scalar or non-scalar) that defines the kinds of things that programs can do with objects of that type. <br>
<br>
* Python has four types of scalar objects:
 * ```int``` is used to represent integers (e.g., 2390).
 * ```float``` is used to represent real numbers (e.g., $\pi =$ 3.1415926535).
 * ```bool``` is used to represent the Boolean values ```True``` and ```False```.
 * ```NoneType``` is a type with a single value, ```None``` (used to represent a null value, or no value at all). <br>
<br>
* Objects and **operators** can be combined to form **expressions**, each of which evaluates to an object of some type. 
  * We will refer to this as the **value** of the expression.
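
The built-in ```type``` function reports the type of any object, which makes the scalar types listed above easy to inspect. The following is a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
# Inspect the type of a few scalar objects with the built-in type() function.
an_int = 2390
a_float = 3.1415926535
a_bool = True
nothing = None

print(type(an_int))    # <class 'int'>
print(type(a_float))   # <class 'float'>
print(type(a_bool))    # <class 'bool'>
print(type(nothing))   # <class 'NoneType'>

# An expression combines objects and operators and evaluates to a value.
value = an_int + 10
print(value)           # 2400
```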

### 1.4.2 Operators on types ```int``` and ```float```:

* ```i+j```, ```i-j```, and ```i*j``` denote the sum of ```i``` and ```j```, ```i``` minus ```j```, and the product of ```i``` and ```j```. 
 * If ```i``` and ```j``` are both of type ```int``` (```float```), the result is an ```int``` (```float```).
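* ```i//j``` is integer (floor) division: it returns the quotient rounded down, e.g., the value of ```6//4``` is ```1```. 
* ```i/j``` is ```i``` divided by ```j```. The result is always a ```float```, e.g., the value of ```6/4``` is ```1.5```. 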
* ```i%j``` is the remainder when the ```int``` ```i``` is divided by ```int``` ```j```.
* ```i**j``` is ```i``` raised to the power ```j```. 
 * If ```i``` and ```j``` are both of type ```int``` and ```j``` is nonnegative, the result is an ```int```. If either is a ```float```, or ```j``` is a negative ```int```, the result is a ```float```.
* The comparison operators are ```==``` (equal), ```!=``` (not equal), ```>``` (greater), ```>=``` (at least), ```<``` (less), and ```<=``` (at most).
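
The operators above can be tried directly on a pair of integers. This is a small sketch with illustrative values for ```i``` and ```j```; the comments show the value of each expression.

```python
i = 7
j = 2

print(i + j)     # 9
print(i - j)     # 5
print(i * j)     # 14
print(i // j)    # 3    (floor division)
print(i / j)     # 3.5  (true division gives a float)
print(i % j)     # 1    (remainder)
print(i ** j)    # 49
print(i + 2.0)   # 9.0  (mixing int and float gives a float)

# Comparison operators evaluate to a bool.
print(i >= j)    # True
print(i == j)    # False
```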

### 1.4.3 The operators on type ```bool``` are:

* ```a and b``` is ```True``` if both ```a``` and ```b``` are ```True```, and ```False``` otherwise.
* ```a or b``` is ```True``` if at least one of ```a``` or ```b``` is ```True```, and ```False``` otherwise.
* ```not a``` is ```True``` if ```a``` is ```False```, and ```False``` if ```a``` is ```True```.
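
A short sketch of the three Boolean operators follows. The values of ```a```, ```b```, and ```x``` are illustrative. Because comparisons evaluate to ```bool``` values, they combine naturally with ```and```, ```or```, and ```not```.

```python
a = True
b = False

print(a and b)   # False
print(a or b)    # True
print(not a)     # False

# Comparisons produce bool values, so they can be combined.
x = 5
in_range = x > 0 and x < 10
print(in_range)          # True
print(not (x == 5))      # False
```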

### 1.4.4 Variables and Assignment

Variables provide a way to associate names with objects.

In [None]:
pi = 3
radius = 11
area = pi * (radius**2) 

In [None]:
radius = 14


![Screenshot%202023-07-30%20at%2010.44.09%20AM.png](attachment:Screenshot%202023-07-30%20at%2010.44.09%20AM.png)


* In Python, **a variable is just a name**, nothing more. Remember this: it is important. <br>
<br>
* An **assignment** statement associates the name to the left of the = symbol with the object denoted by the expression to the right of the =. <br>
<br>
* An object can have one, more than one, or no name associated with it.
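
The two cells above can be combined into one sketch to see what "a variable is just a name" means in practice: rebinding ```radius``` does not change the object that ```area``` is already bound to. The name ```other_name``` is a hypothetical extra name added for illustration.

```python
pi = 3
radius = 11
area = pi * (radius**2)   # area is bound to the int 363

radius = 14               # rebinds the name radius; area is not recomputed
print(area)               # 363

# One object can have more than one name.
other_name = area
print(other_name is area) # True: both names refer to the same object
```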

### 1.4.5 Make Your Code Understandable

Which is more understandable, code fragment I or II? 

In [None]:
### Code fragment I
a = 3.14159
b = 11.2
c = a*(b**2)

### Code fragment II
pi = 3.14159
diameter = 11.2
area = pi*(diameter**2)

Some rules and tips for variable names:

* Variable names can contain uppercase and lowercase letters, digits (but they cannot start with a digit), and the special character _.
 * Python variable names are case-sensitive, e.g., ```Jiantao``` and ```jiantao``` are different.  <br>
<br>
* Avoid using ***reserved words*** (sometimes called ***keywords***) as the variable names. 
 * E.g., ```break```, ```lambda```, ```and```, etc. <br>
<br>
* Another good way to enhance the readability of code is to add comments. 
 * Text following the symbol ```#``` is not interpreted by Python.
```python
# subtract area of square s from area of circle c 
areaC = pi*radius**2
areaS = side*side
difference = areaC-areaS
```

#### Multiple assignment
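
Python allows several names to be bound in a single statement. All expressions on the right-hand side are evaluated before any name is rebound, which makes swapping two variables straightforward. A minimal sketch with illustrative values:

```python
# Bind two names in one statement.
x, y = 2, 3
print(x, y)    # 2 3

# Swap the two bindings without a temporary variable.
x, y = y, x
print(x, y)    # 3 2
```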

### 1.4.6 Branching Programs

* Until now, we have seen only **straight-line programs** (execute one statement after another). <br>
<br>
* Branching programs are more interesting. The simplest branching statement is a conditional. The following conditional statement has three parts:
 * a test, i.e., an expression that evaluates to either ```True``` or ```False```;
 * a block of code that is executed if the test evaluates to ```True```; and
 * an optional block of code that is executed if the test evaluates to ```False```.

![Screenshot%202023-07-30%20at%2011.02.49%20AM.png](attachment:Screenshot%202023-07-30%20at%2011.02.49%20AM.png)

In Python, a conditional statement has the form
```python
if Boolean expression: 
    block of code
else:
    block of code
```

In [None]:
x = 4

if x%2 == 0:
    print('Even')
else:
    print('Odd')
print('Done with conditional')

#### How about checking whether ```x``` is divisible by 2 and/or 3?
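
One possible answer uses a nested conditional together with ```elif``` (short for "else if"). This is a sketch rather than the only solution; the value of ```x``` and the name ```result``` are illustrative.

```python
x = 6   # illustrative value; try 4, 9, and 7 as well

if x % 2 == 0:
    # x is even; check divisibility by 3 inside this branch.
    if x % 3 == 0:
        result = 'Divisible by 2 and 3'
    else:
        result = 'Divisible by 2 and not by 3'
elif x % 3 == 0:
    result = 'Divisible by 3 and not by 2'
else:
    result = 'Not divisible by 2 or 3'

print(result)
```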

### Exercise
Write a program that examines three variables (```x```, ```y```, and ```z```) and prints the largest odd number among them. If none of them is odd, it should print a message to that effect.

### 1.4.7 Strings and Input

* Objects of type ```str``` are used to represent strings of characters. <br>
<br>
* Literals of type ```str``` can be written using either single or double quotes, e.g., 'abc' or "abc". <br>

In [10]:
print('a')

a


In [11]:
print(3*4)
print(3*'a')    # equivalent to 'a'+'a'+'a'

12
aaa


In [12]:
print('a'+'a'+'a')

aaa


* The operator ```+``` is said to be **overloaded**: It has different meanings depending upon the types of the objects to which it is applied. For example, it means addition when applied to two numbers and concatenation when applied to two strings. The operator ```*``` is also overloaded. <br>
<br>
* How about the following?
```python 
'a'*'a'
```
* **Type checking** is a good thing: It turns careless (and sometimes subtle) mistakes into errors that stop execution, rather than errors that lead programs to behave in mysterious ways. <br>
<br>
* Python checks types only at run time, so its type checking is less strict than in some other programming languages that also check types before a program runs (e.g., Java). <br>
<br>
* The length of a string can be found using the ```len``` function.

In [None]:
print(len('abc'))

* **Indexing** can be used to extract individual characters from a string. In Python, all indexing is **zero-based**.

In [None]:
print('abc'[0])   # how about 'abc'[3]?

* Negative numbers are used to index from the end of a string.

In [None]:
print('abc'[-1])

* **Slicing** is used to extract substrings of arbitrary length.
 * If ```s``` is a string, the expression ```s[start:end]``` denotes the substring of ```s``` that starts at index ```start``` and ends at index ```end-1```.

* If the value before the colon is omitted, it defaults to 0. 

* If the value after the colon is omitted, it defaults to the length of the string.
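
The slicing rules above can be checked on a short string. The string ```'abcdef'``` is an illustrative choice.

```python
s = 'abcdef'

print(s[1:3])   # bc      (indices 1 and 2)
print(s[:3])    # abc     (start defaults to 0)
print(s[3:])    # def     (end defaults to len(s))
print(s[:])     # abcdef  (a copy of the whole string)
print(s[-2:])   # ef      (negative indices count from the end)
```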

### Input

In [1]:
name = input('Enter your name: ')
print('Are you really', name, '?')

Enter your name: Jiantao
Are you really Jiantao ?


In [None]:
n = input('Enter an int: ')
print(type(n))

* ```input``` always returns a string. We can convert the string into a numeric value via **type conversions**. 
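
A type conversion is written by using the name of a type as a function. In the sketch below, the literal ```'42'``` stands in for the string that ```input``` would return, so the example runs without user interaction. The names ```n_int``` and ```n_float``` are illustrative.

```python
n = '42'                  # stands in for the string returned by input()
print(type(n))            # <class 'str'>

n_int = int(n)            # str -> int
n_float = float(n)        # str -> float
print(n_int + 1)          # 43
print(n_float / 4)        # 10.5

print(str(n_int) + '!')   # 42!  (int -> str, then concatenation)
print(int(3.9))           # 3    (int() truncates a float toward zero)
```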

### Some methods on strings

* ```s.count(s1)``` counts how many times the string ```s1``` occurs in ```s```.
* ```s.find(s1)``` returns the index of the first occurrence of the substring ```s1``` in ```s```, and ```-1``` if ```s1``` is not in ```s```.
* ```s.rfind(s1)``` is the same as ```find```, but starts from the end of ```s``` (the "r" in ```rfind``` stands for reverse).
* ```s.index(s1)``` is the same as ```find```, but raises an exception if ```s1``` is not in ```s```.
* ```s.rindex(s1)``` is the same as ```index```, but starts from the end of ```s```.
* ```s.lower()``` returns a copy of ```s``` with all uppercase letters converted to lowercase.
* ```s.replace(old, new)``` returns a copy of ```s``` with all occurrences of the string ```old``` replaced by the string ```new```.
* ```s.rstrip()``` returns a copy of ```s``` with trailing white space removed.
* ```s.split(d)``` splits ```s``` using ```d``` as a delimiter and returns a list of substrings of ```s```. 
 * For example, the value of ```'David Guttag plays basketball'.split(' ')``` is ```['David', 'Guttag', 'plays', 'basketball']```. If ```d``` is omitted, the substrings are separated by arbitrary strings of whitespace characters (space, tab, newline, return, and formfeed).
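
The sketch below tries several of these methods on the example string, with two trailing spaces added so that ```rstrip``` has something to remove. The name ```trimmed``` is illustrative. Strings are immutable, so every method returns a new object and leaves ```s``` unchanged.

```python
s = 'David Guttag plays basketball  '   # note the two trailing spaces

print(s.count('a'))                  # 5
print(s.find('a'))                   # 1
print(s.rfind('a'))                  # 26
print(s.find('z'))                   # -1
print(s.lower())
print(s.replace('plays', 'loves'))

trimmed = s.rstrip()                 # copy of s without trailing white space
print(len(s), len(trimmed))          # 31 29
print(trimmed.split(' '))            # ['David', 'Guttag', 'plays', 'basketball']
print(s.split())                     # same list: no delimiter means split on whitespace
```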

### 1.4.8 Iteration

![Screenshot%202023-07-30%20at%201.59.31%20PM.png](attachment:Screenshot%202023-07-30%20at%201.59.31%20PM.png)

In [None]:
# Square an integer, a stupid way
x=3
ans = 0
itersLeft = x

while (itersLeft != 0): 
    ans = ans + x
    itersLeft = itersLeft - 1
print(str(x) + '*' + str(x) + ' = ' + str(ans))

![Screenshot%202023-07-30%20at%202.13.05%20PM.png](attachment:Screenshot%202023-07-30%20at%202.13.05%20PM.png)

#### A simple algorithm to find the cube root: ```ans**3=x```

In [16]:
x = int(input('Enter an integer: ')) 



Enter an integer: 16
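
The cell above only reads the input. One possible way to finish the search, sketched here under the assumption that we simply try ```ans = 0, 1, 2, ...``` until ```ans**3``` reaches ```abs(x)```, is shown below. The value of ```x``` is hard-coded for illustration so that the example runs without user input.

```python
x = 27   # illustrative value; use int(input('Enter an integer: ')) interactively

# Try 0, 1, 2, ... until the cube of ans is at least abs(x).
ans = 0
while ans**3 < abs(x):
    ans = ans + 1

if ans**3 != abs(x):
    print(x, 'is not a perfect cube')
else:
    if x < 0:
        ans = -ans   # the cube root of a negative number is negative
    print('Cube root of', x, 'is', ans)
```

With the input ```16``` shown above, the loop stops at ```ans = 3``` (because ```27 >= 16```) and the program reports that 16 is not a perfect cube.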


Whenever you write a loop, you should think about an appropriate **decrementing function**. This is a function that has the following properties:
* It maps a set of program variables into an integer.
* When the loop is entered, its value is nonnegative.
* When its value is <=0, the loop terminates.
* Its value is decreased every time through the loop.

The decrementing function in the last example is ```abs(x) - ans**3```.

* The algorithmic technique used in this program is a variant of **guess and check** called **exhaustive enumeration**. We enumerate all possibilities until we get to the right answer or exhaust the space of possibilities.  <br>
<br>
* At first blush, this may seem like an incredibly stupid way to solve a problem. Surprisingly, however, exhaustive enumeration algorithms are often the most practical way to solve a problem, because modern computers are amazingly fast.
 * Try to find the cube root of ```1957816251``` and ```7406961012236344616```.

### Exercise

Write a program that asks the user to enter an integer and prints two integers, ```root``` and ```pwr```, such that ```1 < pwr < 6``` and ```root**pwr``` is equal to the integer entered by the user. If no such pair of integers exists, it should print a message to that effect.

### 1.4.9 For Loops

The general form of a for statement is:
```python
for variable in sequence: 
    code block
```
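The variable following ```for``` is bound to the first value in the sequence, and the code block is executed. The variable is then bound to the second value, and the code block is executed again. The process continues until the sequence is exhausted or a ```break``` statement is executed within the code block.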

### ```range``` function

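The ```range``` function takes up to three integer arguments: ```start```, ```stop```, and ```step```. It produces the progression ```start```, ```start + step```, ```start + 2*step```, etc. If ```start``` is omitted, it defaults to 0; if ```step``` is omitted, it defaults to 1.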

If ```step``` is positive, the last element is the largest integer ```start + i*step``` less than ```stop```. 

If ```step``` is negative, the last element is the smallest integer ```start + i*step``` greater than ```stop```.

In [17]:
x = 10
for i in range(0, x, 3):
    print(i)

0
3
6
9


In [18]:
x = 10
for i in range(x, 0, -3):
    print(i)

10
7
4
1


### Using ```break``` to stop the loop

In [None]:
#Find the cube root of a perfect cube
x = int(input('Enter an integer: ')) 

for ans in range(0, abs(x)+1):
    if ans**3 >= abs(x): 
        break
if ans**3 != abs(x):
    print(x, 'is not a perfect cube')
else:
    if x < 0:
        ans = -ans
    print('Cube root of', x,'is', ans)

The ```for``` statement can be used to conveniently iterate over characters of a string.

In [None]:
total = 0
for c in '123456789':
    total = total + int(c) 
print(total)

### Exercise

Let ```s``` be a string that contains a sequence of decimal numbers separated by commas, e.g., ```s = '1.23,2.4,3.123'```. Write a program that prints the sum of the numbers in ```s```.

### 1.4.10 Caveats About Using Floats

In [20]:
x = 0.0
for i in range(10):
    x = x + 0.1 
#print(x)

In [None]:
if x == 1.0:
    print(x, '= 1.0')
else:
    print(x, 'is not 1.0')

* In real life, we use the decimal system. 
 * When you first learned about decimal numbers, i.e., numbers base 10, you learned that a decimal number is represented by a sequence of the digits ```0123456789```. The rightmost digit is the $10^0$ place, the next digit towards the left is the $10^1$ place, etc. For example, the sequence of decimal digits ```302``` represents ```3*100 + 0*10 + 2*1```. <br>
<br>
* Modern computers use **binary numbers** (numbers base 2).
 * A binary number is represented by a sequence of digits, each of which is either 0 or 1. These digits are often called bits. The rightmost digit is the $2^0$ place, the next digit towards the left is the $2^1$ place, etc. For example, the sequence of binary digits ```101``` represents ```1*4 + 0*2 + 1*1 = 5```. <br>
 * The decimal fraction ```0.1``` has no exact finite binary representation: in binary it is the infinitely repeating fraction ```0.000110011001100...```. A ```float``` keeps only 53 significant bits of this expansion, so the stored value is not exactly 0.1. As a result, when we add ```0.1``` ten times in Python, the result ```x``` is not exactly equal to 1.0.
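
The exact value that Python stores for ```0.1``` can be inspected with the standard-library ```decimal``` module: constructing a ```Decimal``` from a ```float``` converts the stored binary value without rounding. This is a small sketch, and the name ```stored``` is illustrative.

```python
from decimal import Decimal

# Decimal(0.1) shows the exact value of the float closest to 0.1.
stored = Decimal(0.1)
print(stored)
# 0.1000000000000000055511151231257827021181583404541015625

# The same rounding effect appears in other simple sums.
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False
```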

In [None]:
x = 0.0
for i in range(10):
    x = x + 0.1 
if abs(x - 1.0) < 0.00000001:
    print(x, '= 1.0')
else:
    print(x, 'is not 1.0')
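
Comparing floats with a tolerance, rather than with ```==```, is the usual remedy. The standard library also provides ```math.isclose``` for this purpose; the sketch below compares it with a hand-written absolute-difference test. The tolerance ```1e-8``` is an illustrative choice.

```python
import math

x = 0.0
for i in range(10):
    x = x + 0.1

print(x == 1.0)               # False: exact comparison fails
print(abs(x - 1.0) < 1e-8)    # True: absolute-difference test
print(math.isclose(x, 1.0))   # True: relative tolerance, default rel_tol=1e-09
```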

### 1.4.11 Bisection Search

* Suppose that someone asks you to write a program that finds the square root of any number larger than 1 (without relying on outside packages). <br>
<br>
* We know that the square root of 2 is not a rational number: its value cannot be represented exactly with finitely many digits. <br>
<br>
* Therefore, we should find an approximation to the square root, i.e., an answer that is close enough to the actual square root to be useful.

In [22]:
import timeit

In [25]:
start = timeit.default_timer()

x = 1.21
epsilon = 0.001
step = epsilon**2
numGuesses = 0
ans = 0.0

while abs(ans**2 - x) >= epsilon and ans <= x:
    ans += step
    numGuesses += 1
print('numGuesses =', numGuesses)

if abs(ans**2 - x) >= epsilon:
    print('Failed on square root of', x)
else:
    print(ans, 'is close to square root of', x)

print(f'Total configuration execution time: {(timeit.default_timer() - start):.4f}s.', flush=True)

numGuesses = 1099546
1.0995459999997288 is close to square root of 1.21
Total configuration execution time: 0.2214s.


### Do we have a faster algorithm? Yes, bisection search!
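
The idea of bisection search is to keep an interval ```[low, high]``` that contains the square root and to halve it at every step, depending on whether the midpoint squared is too small or too large. The following is a minimal sketch of that idea for the same ```x``` and ```epsilon``` as above; the variable names ```low``` and ```high``` are assumptions of this sketch.

```python
x = 1.21
epsilon = 0.001
numGuesses = 0

# The square root of x lies between 0 and max(1, x).
low = 0.0
high = max(1.0, x)
ans = (high + low) / 2.0

while abs(ans**2 - x) >= epsilon:
    numGuesses += 1
    if ans**2 < x:
        low = ans     # the guess is too small: search the upper half
    else:
        high = ans    # the guess is too large: search the lower half
    ans = (high + low) / 2.0

print('numGuesses =', numGuesses)
print(ans, 'is close to square root of', x)
```

Because the interval is halved at every iteration, the number of guesses grows only logarithmically with the size of the search interval, instead of linearly as in the exhaustive search with a fixed step.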

### Exercise 

What would have to be changed in a bisection search program that approximates square roots so that it finds an approximation to the cube root of both negative and positive numbers? (Hint: think about changing ```low``` to ensure that the answer lies within the region being searched.)

---

# END