# Introduction to Python
## June 6, 2019
## Jie Heng

In [40]:
print("Hello, world!")

Hello, world!


In [41]:
print("Hello, reader!")

Hello, reader!


In [1]:
import pandas as pd
import nltk
import numpy as np

Three fundamental concepts found in nearly any programming language:

- Variables
- Types
- Expressions

### I. Variables

A pretty fundamental part of writing a computer program is keeping track of certain information throughout the lifetime of the program. For example, if you were writing a simple program to compute the average of a series of measurements, you would need to keep track of the running total of those measurements. This kind of information is stored in your computer’s memory while a program is running.

However, you will rarely (if ever) have to interact with your computer’s memory directly. Instead, most programming languages provide a convenient abstraction for storing information: variables. A variable is a symbolic name representing a location in the computer’s memory. You can store a value, such as a number or a piece of text, in a variable and retrieve that value later on.

In Python, you can assign a value to a variable like this:

In [45]:
variable = 9

In [46]:
print(variable)

9
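A variable can also be updated. The right-hand side is evaluated first using the current value, and the result is stored back under the same name, which is how a running total (as in the averaging example above) would be kept. A minimal sketch:

```python
# Reassigning a variable replaces its old value with the new one
variable = 9
variable = variable + 1   # read the current value (9), add 1, store the result
print(variable)           # prints 10
```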


### II. Types

In the above example, the variable named variable contained a number (9). However, variables can also contain other types of data, such as the piece of text ("Hello, world!") we printed at the start. Most programming languages (including Python) support at least three basic types:

- Numbers: usually encompassing both integer numbers and real numbers.
- Strings: strings are how we refer to “text” in most programming languages (in the sense that text is a “string” of characters). We’ve actually already seen an example of a string: Hello, world! (the character H followed by the character e followed by l, etc.).
- Booleans: used to represent truth values (True or False).
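The built-in type() function reports which type a value has. In Python, integer numbers have type int and real numbers have type float. A quick sketch with one value of each kind:

```python
# type() returns the type (class) of any value
int_type = type(9)
float_type = type(3.5)
str_type = type("Hello, world!")
bool_type = type(True)
print(int_type, float_type, str_type, bool_type)
# <class 'int'> <class 'float'> <class 'str'> <class 'bool'>
```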

### 1. Numbers and Booleans

In [4]:
9 + 2 

11

In [5]:
0 == 2

False

In [6]:
6 % 2

0

In [7]:
5 * 8

40

5 < 3

False

In [33]:
True == 1

True

False == 0

True

In [34]:
5 >= 3

True

In [47]:
2 ** 3 

8

In [49]:
x = 8
y = 9
x * y 

72
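The % operator used above gives the remainder of a division. Python has two division operators as well, and booleans behave like the integers 1 and 0 (which is why True == 1 holds). A short sketch with arbitrary example values:

```python
a = 7
b = 2
quotient = a / b     # true division: always gives a float, 3.5
floor_q = a // b     # floor division: rounds down to a whole number, 3
remainder = a % b    # remainder of the division, 1
print(quotient, floor_q, remainder)   # 3.5 3 1

# The three are related: (a // b) * b + a % b == a
print(floor_q * b + remainder == a)   # True

# Booleans behave like the integers 1 and 0 in arithmetic
print(True + True)   # 2
```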

### 2. Strings

In [138]:
s1 = "foobar"
s2 = 'foobar'

In [139]:
s1[0]

'f'

In [140]:
s1[1:4]

'oob'
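In a slice s1[start:end], the character at start is included and the one at end is not, which is why s1[1:4] gives the characters at indexes 1, 2 and 3. Either bound can be omitted, and negative indexes count from the end. A small sketch on the same string:

```python
s1 = "foobar"
last = s1[-1]          # negative indexes count from the end: 'r'
first_three = s1[:3]   # omitted start means "from the beginning": 'foo'
rest = s1[3:]          # omitted end means "to the end": 'bar'
backwards = s1[::-1]   # a step of -1 walks the string backwards: 'raboof'
print(last, first_three, rest, backwards)   # r foo bar raboof
```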

Strings are immutable, so we cannot modify individual characters in a string:

In [141]:
s1[0] = 'F'

TypeError: 'str' object does not support item assignment

#### Question: How can we modify a string, then? Is there an alternative way?

We can use the find method to determine whether a given string is contained in another. If the substring provided to find is found in the string, the method will return the index of the first occurrence of that substring in the string. If the string does not contain the provided substring, then the method returns -1.

In [142]:
s1.find("oo")

1

In [143]:
s1.find("baz")

-1
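When we only need a yes/no answer, the in operator is a more direct alternative to checking whether find returns -1. find also accepts an optional second argument, the index at which to start searching. A brief sketch:

```python
s1 = "foobar"
has_oo = "oo" in s1      # True
has_baz = "baz" in s1    # False
print(has_oo, has_baz)   # True False

# Search for "o" starting at index 2 (index 1 is skipped)
print(s1.find("o", 2))   # 2
```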

In [36]:
# upper/lower and other operations

In [37]:
"TOPOER".lower()

'topoer'

In [39]:
"TOPdssdfsdf".upper()

'TOPDSSDFSDF'

In [144]:
"hello world".capitalize()

'Hello world'

In [148]:
"1.000.000".replace(".", ",").replace("1","9")

'9,000,000'

In [146]:
"...".join(["hello", "world"])

'hello...world'

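The split method breaks a string into a list of substrings at each occurrence of the separator passed to it. The parameter to split is actually optional; if we omit it, split treats any run of whitespace characters (spaces, tabs, newlines, etc.) as a single separator. For example: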

In [149]:
s = "foo,bar,baz"

In [150]:
values = s.split(",")

In [151]:
values

['foo', 'bar', 'baz']

In [152]:
phrase = "The quick   brown  fox     jumps  over   the  lazy            dog"

In [153]:
phrase.split()

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [155]:
sep = "|"

In [156]:
sep.join(values)

'foo|bar|baz'

The strip() method returns a copy of the string with the given characters removed from its beginning and end; if no argument is passed, it removes leading and trailing whitespace.

In [185]:
line = "Leo is watching TV.\n"

In [186]:
line.strip('\n').lower()

'leo is watching tv.'
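strip() has two one-sided variants, lstrip() and rstrip(), and with an argument it removes any of the given characters from the ends. A sketch on a padded copy of the sentence above (repr() is used only to make the remaining whitespace visible):

```python
padded = "   Leo is watching TV.   \n"
both = padded.strip()     # no argument: remove whitespace (spaces, "\n", ...) at both ends
left = padded.lstrip()    # only at the beginning
right = padded.rstrip()   # only at the end
print(repr(both))    # 'Leo is watching TV.'
print(repr(left))    # 'Leo is watching TV.   \n'
print(repr(right))   # '   Leo is watching TV.'

# With an argument, strip removes those characters from both ends
print("xxhixx".strip("x"))   # hi
```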

### III. Statements

### 1. Conditional statements

In [53]:
n = 5
if n > 0:
    print(n, "is a positive number")

5 is a positive number


In [54]:
n = 5
if n > 0:
    print(n, "is a positive number")
    n = -n
    print("And now the number is negative:", n)

5 is a positive number
And now the number is negative: -5


In [55]:
n = 5
if n > 0:
    print(n, "is a positive number")
    n = -n
print("And now the number is:", n)

5 is a positive number
And now the number is: -5
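Compared with the cell before it, the final print is no longer indented, so it is not part of the if block and runs whether or not the condition holds. Starting from a negative value (-3 is just an example) makes the difference visible:

```python
n = -3
if n > 0:
    print(n, "is a positive number")
    n = -n
print("And now the number is:", n)   # not indented, so it runs even though n > 0 is False
# Output: And now the number is: -3
```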


In [56]:
n = 6
if (n % 2) == 1:
    print(n, "is odd")
else:
    print(n, "is even")

6 is even


In [57]:
n = 5
if (n % 2) == 1:
    print(n, "is odd")
    print("This concludes the odd branch.")
else:
    print(n, "is even")
    print("This concludes the even branch.")

5 is odd
This concludes the odd branch.


In [60]:
n = 17
if n < 0:
    print(n, "is negative")
elif n > 0:
    print(n, "is positive")
else:
    print(n, "is zero")

17 is positive


In [61]:
n = 17
if n < 0:
    print(n, "is negative")
elif n > 0:
    print(n, "is positive")
elif n % 2 == 1:
    print(n, "is odd")
elif n % 2 == 0:
    print(n, "is even")

17 is positive


In [62]:
n = 17
if n < 0:
    print(n, "is negative")
if n > 0:
    print(n, "is positive")
if n % 2 == 1:
    print(n, "is odd")
if n % 2 == 0:
    print(n, "is even")

17 is positive
17 is odd
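The last two cells differ only in elif versus separate if statements. With an if/elif chain, at most one branch runs (the first whose condition is true); with independent if statements, every condition is checked. A sketch that records which branches run, using lists to collect the labels:

```python
n = 17

# if/elif chain: only the first branch whose condition is True runs
elif_labels = []
if n < 0:
    elif_labels.append("negative")
elif n > 0:
    elif_labels.append("positive")
elif n % 2 == 1:
    elif_labels.append("odd")
elif n % 2 == 0:
    elif_labels.append("even")

# independent ifs: every condition is checked
if_labels = []
if n < 0:
    if_labels.append("negative")
if n > 0:
    if_labels.append("positive")
if n % 2 == 1:
    if_labels.append("odd")
if n % 2 == 0:
    if_labels.append("even")

print(elif_labels)   # ['positive']
print(if_labels)     # ['positive', 'odd']
```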


In [63]:
n = 7
if n > 0:
    print(n, "is positive")
    n = -n
elif n < 0:
    print(n, "is negative")

7 is positive


### 2. Loops

Loops provide a mechanism for doing repeated work in a program. 

In [64]:
# for loops have the following syntax:

In [65]:
for p in [10, 25, 5, 70, 10]:
    print("The price is", p)

The price is 10
The price is 25
The price is 5
The price is 70
The price is 10


In [66]:
prices = [10, 25, 5, 70, 10]

for p in prices:
    print("The price is", p)

The price is 10
The price is 25
The price is 5
The price is 70
The price is 10


In [67]:
prices = [10, 25, 5, 70, 10]

total_tax = 0

for p in prices:
    tax = 0.10 * p
    total_tax = total_tax + tax

print("The total tax is", total_tax)

The total tax is 12.0


#### Quiz: Calculate the tax only for the prices that are higher than 10

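Loops and conditional statements can be combined. For example, range(1, 11) produces the numbers 1 through 10, and the if statement inside the loop classifies each of them:

for n in range(1, 11):
    if (n % 2) == 1:
        print(n, "is odd")
    else:
        print(n, "is even")

1 is odd
2 is even
3 is odd
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is odd
10 is even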
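range can take one, two or three arguments: a stop value, a start and a stop, or a start, a stop and a step. The stop value itself is never included. A quick sketch (list() is used only to display the numbers):

```python
first_five = list(range(5))                # starts at 0, stops before 5
one_to_ten = list(range(1, 11))            # starts at 1, stops before 11
multiples_of_five = list(range(0, 20, 5))  # third argument is the step
print(first_five)          # [0, 1, 2, 3, 4]
print(one_to_ten)          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(multiples_of_five)   # [0, 5, 10, 15]
```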

#### Quiz: Using a for loop and conditional statements, write a function that prints all the numbers from 3 to 100 that are divisible by 4

While loops are a more general type of loop that, instead of repeating an action for each element in a sequence, will repeat an action while a condition is true. The condition is expressed using a boolean expression, which can allow us to express much more complex loops than for loops.

In [69]:
N = 10
i = 1
total = 0   # named "total" rather than "sum" so the built-in sum() function is not shadowed

while i <= N:
    total = total + i
    i = i + 1

print(total)

55


So when should we use a for loop or a while loop? As a rule of thumb, any time you have to iterate over a sequence of values, a for loop is typically the best option. A while loop can still get the job done, but it is more error prone: for example, forgetting to update the loop variable (i = i + 1 above) produces a loop that never ends. There are, however, certain algorithms where the loop cannot naturally be stated as iterating over a sequence of values, so we need the more general mechanism provided by a boolean condition.
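One such case, sketched here as an illustration: finding the smallest power of 2 that is at least N. We do not know in advance how many doublings are needed, so the loop is driven by a condition rather than by a sequence (N = 100 is an arbitrary example value):

```python
N = 100
power = 1
# Keep doubling until we reach N; the number of steps is not known in advance
while power < N:
    power = power * 2
print(power)   # 128
```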

### IV. Functions

In [74]:
def multiply(a, b):
    n = a * b
    return n

In [75]:
multiply(123, 456)

56088

Let’s break down the above code:

The def keyword indicates that we are def-ining a function. It is followed by the name of the function (multiply).

The name of the function is followed by the names of the parameters. These names appear in parentheses, with the parameters separated by commas. The parameters are the input to the function. In this case, we are defining a function to multiply two numbers, so we need to define two parameters (the numbers that will be multiplied). Sometimes we refer to these names as the formal parameters of the function to distinguish them from the actual values provided when the function is used.

After a colon, we have the body of the function, with one level of indentation below the def line. This block is the actual code that defines what the function will do. Notice how the code operates on the parameters. As we’ll see later on, the parameters will take on specific values when we actually run the function. At this point, we are just defining the function, so none of the code in the function is being run yet; it is simply being associated with a function called multiply.

Notice how the body of the function contains a return statement. This statement is used to specify the return value of the function (in this case, n, a variable that contains the product of parameters a and b). The last line of the body of the function is typically a return statement but, as we’ll see later on, this statement is not strictly required.
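When a function has no return statement, calling it still produces a value: the special value None. A minimal sketch (greet is a hypothetical example function):

```python
def greet(name):
    # No return statement: the function only prints
    print("Hello,", name)

result = greet("reader")   # prints: Hello, reader
print(result)              # None
```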

In [80]:
def absolute(x):
    if x < 0:
        return -x
    else:
        return x

In [81]:
absolute(3)

3

In [82]:
absolute(-3)

3

In [83]:
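# add_one is called below but is not defined elsewhere in this notebook;
# a minimal definition is assumed here so that the cell runs
def add_one(x):
    x = x + 1
    return x

def f():
    y = 5
    print("The value of y before the call to add_one is", y)
    z = add_one(y)
    print("The value returned by the call to add_one is", z)
    print("The value of y after the call to add_one is", y)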

#### Question: How do we run the function above?

### V. Data Structures

### 1. Lists

In [84]:
# creating lists

In [11]:
ls1 = [1,2,3,4]

In [12]:
ls2 = [4,5,6]

In [85]:
# creating empty lists

In [86]:
ls4 = []

In [87]:
ls4

[]

In [88]:
print(len(ls1), len(ls2), len(ls4))

4 3 0


#### What would happen if I run the following lines?

In [None]:
ls5 = [0] * 10

In [None]:
ls6 = [0, 1] * 10

In [89]:
# Accessing Elements in a List

Once we have a list, we can access and use individual values within that list. To do so, we write the variable containing the list followed by the position, or index, of the element we want to access in square brackets. Perhaps counterintuitively, indexes are numbered from zero, so if we wanted to access the third element in the list, we would actually use index 2.

In [90]:
lang = ['C', 'C++', 'Python', 'Java']

In [91]:
lang[0]

'C'

In [92]:
lang[-1]

'Java'

In [95]:
lang[len(lang)]

IndexError: list index out of range

In [96]:
lang[len(lang)-1]

'Java'

In [97]:
for i in range(len(lang)):
    print(i)

0
1
2
3
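Looping over range(len(lang)) gives only the indexes. When both the index and the element are needed, the built-in enumerate() yields them as pairs. A short sketch:

```python
lang = ['C', 'C++', 'Python', 'Java']
# enumerate() yields (index, element) pairs
for i, name in enumerate(lang):
    print(i, name)
# 0 C
# 1 C++
# 2 Python
# 3 Java
```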


In [98]:
lang[1:3]

['C++', 'Python']

In [99]:
lang[1:len(lang)-1]

['C++', 'Python']

In [100]:
lang[:2]

['C', 'C++']

In [13]:
ls1 + ls2

[1, 2, 3, 4, 4, 5, 6]

In [14]:
set(ls1) - set(ls2)

{1, 2, 3}

In [15]:
list(set(ls1) - set(ls2))

[1, 2, 3]

In [16]:
ls3 = list(set(ls1) - set(ls2))

In [17]:
ls3

[1, 2, 3]

In [18]:
ls3.append(0)

In [19]:
ls3

[1, 2, 3, 0]

In [20]:
ls3 - ls1

TypeError: unsupported operand type(s) for -: 'list' and 'list'

In [21]:
set(ls3) - set(ls1)

{0}

In [22]:
ls3[0] = 100

In [23]:
ls3

[100, 2, 3, 0]

In [24]:
ls3.insert(0, 0.5)

In [25]:
ls3

[0.5, 100, 2, 3, 0]

In [101]:
ls3.extend(ls1)

In [102]:
ls3

[0.5, 100, 2, 3, 0, 1, 2, 3, 4]
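extend is easy to confuse with append. append adds its argument as a single new element (even if that argument is a list), while extend adds each element of its argument. A sketch with two small throwaway lists:

```python
a = [1, 2]
a.append([3, 4])   # adds the whole list as ONE element
print(a)           # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])   # adds each element of the argument
print(b)           # [1, 2, 3, 4]
```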

In [103]:
ls3.pop(2)

2

In [133]:
ls3

[0.5, 100, 3, 0, 1, 2, 3, 4]

In [105]:
ls3.pop()

4

In [106]:
ls3

[0.5, 100, 3, 0, 1, 2, 3]

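In [134]:
ls3_copy = ls3.copy()   # work on a copy so that ls3 is unchanged for the cells below
del ls3_copy[1]         # del removes the element at index 1 without returning it

In [135]:
ls3_copy

[0.5, 3, 0, 1, 2, 3]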
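pop and del both remove an element by its position. To remove an element by its value instead, lists also provide remove, which deletes only the first matching element. A sketch with a throwaway list (letters is just an example name):

```python
letters = ['a', 'b', 'c', 'b']
letters.remove('b')   # remove by value: deletes only the FIRST 'b'
print(letters)        # ['a', 'c', 'b']
```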

#### Quiz: Create two lists of strings, get the union of the two lists, save it, and insert a number as the second item of the resulting list

In [76]:
# Run the following: an example of a for loop inside a function.
def addtwonumbers(ls_num):
    '''
    Given a list of numbers, the function returns the sum of the list.
    '''
    rs = 0
    for num in ls_num:
        rs += num
    return rs

In [77]:
addtwonumbers([1,2,3,4])

10

In [78]:
print("2 x 3 =", multiply(2,3))

2 x 3 = 6


In [108]:
ls3

[0.5, 100, 3, 0, 1, 2, 3]

In [107]:
min(ls3)

0

In [109]:
max(ls3)

100

In [112]:
ls3.count(3)

2

In [113]:
ls3.count(32)

0

In [114]:
ls3.count(0)

1

In [115]:
ls3.reverse()

In [116]:
ls3

[3, 2, 1, 0, 3, 100, 0.5]

In [117]:
sorted(ls3)

[0, 0.5, 1, 2, 3, 3, 100]

In [118]:
ls3

[3, 2, 1, 0, 3, 100, 0.5]

In [119]:
ls3_s = sorted(ls3)

In [121]:
ls3_s

[0, 0.5, 1, 2, 3, 3, 100]

In [123]:
ls3_s_r = sorted(ls3, reverse = True)

In [124]:
ls3_s_r

[100, 3, 3, 2, 1, 0.5, 0]
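sorted() leaves the original list untouched and returns a new one. Lists also have a sort() method, which reorders the list itself and returns None, so its result should not be assigned to a variable. A small sketch:

```python
nums = [3, 1, 2]
result = nums.sort()     # sort() reorders nums itself (in place)
print(nums)              # [1, 2, 3]
print(result)            # None: sort() does not return the list

nums.sort(reverse=True)  # the same reverse option that sorted() accepts
print(nums)              # [3, 2, 1]
```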

#### Quiz: Write a function that tests whether a number is prime

In [125]:
m = [ [1,2,3,4], [5,6,7,8], [9,10,11,12] ]

In [None]:
m[0]

In [None]:
m[1][2]
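Because each element of m is itself a list (a row), a for loop over m visits the rows, and an inner loop visits the values in each row. A sketch that totals each row of the same matrix:

```python
m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

row_totals = []
for row in m:              # each row is an inner list
    row_total = 0
    for value in row:      # each value in that row
        row_total += value
    row_totals.append(row_total)

print(row_totals)   # [10, 26, 42]
```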

### 2. Tuples

Tuples are another data structure available in Python that are very similar to lists. We can use them to store sequences of values, and we can create them in the same way we create lists, except we use parentheses instead of square brackets:

In [126]:
t = (100, 200, 300)

We cannot assign a new value to an element of the tuple. Nor can we append to the tuple, delete from the tuple, or carry out any operations that would modify the tuple in-place:

In [128]:
t.append(2)

AttributeError: 'tuple' object has no attribute 'append'

In [130]:
t[1] = 0

TypeError: 'tuple' object does not support item assignment

In [136]:
salaries = [ ("Alice", 5000), ("John", 4000), ("Carol", 4500) ]
for item in salaries:
    name = item[0]
    salary = item[1]
    print(name, "has a salary of", salary)

Alice has a salary of 5000
John has a salary of 4000
Carol has a salary of 4500
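Instead of indexing into each tuple with item[0] and item[1], Python can unpack a tuple directly into several variables, both in a for loop and in an ordinary assignment. The same loop, sketched with unpacking:

```python
salaries = [("Alice", 5000), ("John", 4000), ("Carol", 4500)]

# Each tuple is unpacked directly into the two loop variables
for name, salary in salaries:
    print(name, "has a salary of", salary)

# The same unpacking works in an ordinary assignment
who, amount = salaries[0]
print(who, amount)   # Alice 5000
```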


### 3. Dictionaries

Suppose we were working with data on contributions to political campaigns, with each individual contribution including the following information:

- First name of contributor
- Last name of contributor
- ZIP code where contributor lives
- Campaign that received the contribution
- Contribution amount

In [157]:
contributions = [
    ["John", "Doe", "60637", "Kang for President 2016", 27.50],
    ["Jane", "Doe", "60637", "Kodos for President 2016", 100.00],
    ["James", "Roe", "07974", "Kang for President 2016", 50.00]
    ]

In [160]:
john = {"first_name": "John",
     "last_name": "Doe",
     "zip_code": "60637",
     "campaign": "Kang for President 2016",
     "amount": 27.50}

In [161]:
john["zip_code"]

'60637'

In [162]:
contributions = [
   {"first_name": "John",
    "last_name": "Doe",
    "zip_code": "60637",
    "campaign": "Kang for President 2016",
    "amount": 27.50},
   {"first_name": "Jane",
    "last_name": "Doe",
    "zip_code": "60637",
    "campaign": "Kodos for President 2016",
    "amount": 100.00},
   {"first_name": "James",
    "last_name": "Roe",
    "zip_code": "07974",
    "campaign": "Kang for President 2016",
    "amount": 50.00}
]

In [164]:
def total_contributions_candidate(contributions, campaign):
    total = 0
    for contribution in contributions:
        if campaign == contribution["campaign"]:
            total += contribution["amount"]
    return total

In [165]:
total_contributions_candidate(contributions, "Kang for President 2016")

77.5

In [None]:
# modify dictionary

In [166]:
john["zip_code"]

'60637'

In [167]:
john["zip_code"] = "60616"

In [168]:
john

{'first_name': 'John',
 'last_name': 'Doe',
 'zip_code': '60616',
 'campaign': 'Kang for President 2016',
 'amount': 27.5}

In [169]:
john["registered_voter"] = True

In [170]:
john

{'first_name': 'John',
 'last_name': 'Doe',
 'zip_code': '60616',
 'campaign': 'Kang for President 2016',
 'amount': 27.5,
 'registered_voter': True}

In [173]:
# examine if a key is in a dictionary

In [174]:
john.get("middle_name")

In [176]:
a = john.get("middle_name")

#### Question: What is the value of a?

In [184]:
john.get("middle_name", 0)

0

In [175]:
john.get("first_name")

'John'

In [177]:
"first_name" in john

True

In [179]:
"name" in john

False
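The reason to prefer get() or an in check for keys that may be missing: looking up a missing key with square brackets raises a KeyError. A sketch using a small separate dictionary (person is a hypothetical name, chosen so that john above is left unchanged):

```python
person = {"first_name": "John", "last_name": "Doe"}   # small example dictionary

try:
    person["middle_name"]            # square brackets: a missing key raises KeyError
except KeyError as err:
    error_message = "KeyError: " + str(err)
print(error_message)                 # KeyError: 'middle_name'

print(person.get("middle_name"))     # get(): a missing key gives None instead
```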

In [172]:
# create an empty dictionary

In [171]:
d = {}

In [180]:
# loop over the keys
for k in john:
    print(k)

first_name
last_name
zip_code
campaign
amount
registered_voter


In [181]:
# loop over the values
for v in john.values():
    print(v)

John
Doe
60616
Kang for President 2016
27.5
True


In [182]:
for k, v in john.items():
    print(k, v)

first_name John
last_name Doe
zip_code 60616
campaign Kang for President 2016
amount 27.5
registered_voter True


In [183]:
def total_contributions_per_campaign(contributions):
    rv = {}
    for contribution in contributions:
        campaign = contribution["campaign"]
        rv[campaign] = rv.get(campaign, 0) + contribution["amount"]
    return rv
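The function above is defined but never called in this notebook. A self-contained sketch that calls it on the same three contributions (the name totals is just for illustration); rv.get(campaign, 0) supplies 0 the first time a campaign is seen:

```python
def total_contributions_per_campaign(contributions):
    rv = {}
    for contribution in contributions:
        campaign = contribution["campaign"]
        # get(campaign, 0) returns 0 the first time a campaign is seen
        rv[campaign] = rv.get(campaign, 0) + contribution["amount"]
    return rv

contributions = [
    {"first_name": "John", "last_name": "Doe", "zip_code": "60637",
     "campaign": "Kang for President 2016", "amount": 27.50},
    {"first_name": "Jane", "last_name": "Doe", "zip_code": "60637",
     "campaign": "Kodos for President 2016", "amount": 100.00},
    {"first_name": "James", "last_name": "Roe", "zip_code": "07974",
     "campaign": "Kang for President 2016", "amount": 50.00},
]

totals = total_contributions_per_campaign(contributions)
print(totals)
# {'Kang for President 2016': 77.5, 'Kodos for President 2016': 100.0}
```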

### VI. Read/write txt files

In [None]:
# outputfilename, d and sort_w are defined elsewhere (not shown in this notebook)
with open(outputfilename, 'w') as file:
    file.write(" ".join(d[sort_w]) + '\n')

In [None]:
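# filename and MIN_WORD_LEN are defined elsewhere (not shown in this notebook)
with open(filename) as fp:
    for line in fp:
        w = line.strip('\n').lower()        # drop the newline and normalize case
        if w and len(w) >= MIN_WORD_LEN:    # skip empty lines and short words
            ...  # (the processing of w is omitted in the original)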
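The two cells above depend on names defined elsewhere. As a self-contained sketch, the following writes a few words to a file and reads them back line by line; the file name words.txt and the word list are hypothetical, and tempfile is used only so the example leaves no file behind:

```python
import os
import tempfile

words = ["apple", "Banana", "cherry"]   # hypothetical example data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "words.txt")   # hypothetical file name

    # 'w' mode creates the file (or overwrites it); write one word per line
    with open(path, "w") as f:
        for word in words:
            f.write(word + "\n")

    # Iterating over an open file yields one line at a time, including its "\n"
    loaded = []
    with open(path) as f:
        for line in f:
            loaded.append(line.strip("\n").lower())

print(loaded)   # ['apple', 'banana', 'cherry']
```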

In [None]:
# Exercise: could you read "english-words-235k.txt" line by line, so that each line becomes a string element in a list?