**<font color='red' size="5pt">How to submit your homework:</font>**

After you complete this notebook:

**Print the page, then print as pdf**

Then upload the .pdf file to BlackBoard.

1-5 BCAAC
6-10 DCCBB
11-15 CABBD
16-20 CDAAA
21-25 CADBA
26-30 CDBCD

# Defining functions

Let's write a very simple function that converts a proportion to a percentage by multiplying it by 100.  For example, the value of `to_percentage(.5)` should be the number 50.  (No percent sign.)

A function definition has a few parts.

##### `def`
It always starts with `def` (short for **def**ine):

    def

##### Name
Next comes the name of the function.  Let's call our function `to_percentage`.
    
    def to_percentage

##### Signature
Next comes something called the *signature* of the function.  This tells Python how many arguments your function should have, and what names you'll use to refer to those arguments in the function's code.  `to_percentage` should take one argument, and we'll call that argument `proportion` since it should be a proportion.

    def to_percentage(proportion)

We put a colon after the signature to tell Python it's over.

    def to_percentage(proportion):

##### Documentation
Functions can do complicated things, so you should write an explanation of what your function does.  For small functions, this is less important, but it's a good habit to learn from the start.  Conventionally, Python functions are documented by writing a triple-quoted string:

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
    
    
##### Body
Now we start writing code that runs when the function is called.  This is called the *body* of the function.  We can write anything we could write anywhere else.  First let's give a name to the number we multiply a proportion by to get a percentage.

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
        factor = 100

##### `return`
The special instruction `return` in a function's body tells Python to make the value of the function call equal to whatever comes right after `return`.  We want the value of `to_percentage(.5)` to be the proportion .5 times the factor 100, so we write:

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
        factor = 100
        return proportion * factor

**<font color='red'>Question:</font>** Define `to_percentage` in the cell below.  Call your function to convert the proportion .2 to a percentage.  Name that percentage `twenty_percent`.

In [6]:
def to_percentage(proportion):
    """ ... """
    factor = 100
    return proportion * factor

twenty_percent = to_percentage(0.2)
twenty_percent

20.0

Like the built-in functions, you can use named values as arguments to your function.

**<font color='red'>Question:</font>** Use `to_percentage` again to convert the proportion named `a_proportion` (defined below) to a percentage called `a_percentage`.

*Note:* You don't need to define `to_percentage` again!  Just like other named things, functions stick around after you define them.

In [8]:
a_proportion = 2**(.5) / 2
a_percentage = to_percentage(a_proportion)
a_percentage

70.71067811865476

As we've seen with the built-in functions, functions can also take strings (or arrays, or tables) as arguments, and they can return those things, too.

**<font color='red'>Question:</font>** Define a function called `disemvowel`.  It should take a single string as its argument.  (You can call that argument whatever you want.)  It should return a copy of that string, but with all the characters that are vowels removed.  (In English, the vowels are the characters "a", "e", "i", "o", and "u".)

*Hint:* To remove all the "a"s from a string, you can use `that_string.replace("a", "")`.  And you can call `replace` multiple times.

In [9]:
def disemvowel(a_string):
    return a_string.replace("a","").replace("e","").replace("i","").replace("o","").replace("u","")

# An example call to your function.  (It's often helpful to run
# an example call from time to time while you're writing a function,
# to see how it currently works.)
disemvowel("Can you read this without vowels?")

'Cn y rd ths wtht vwls?'

##### Calls on calls on calls
Just as you write a series of lines to build up a complex computation, it's useful to define a series of small functions that build on each other.  Since you can write any code inside a function's body, you can call other functions you've written.

If a function is a like a recipe, defining a function in terms of other functions is like having a recipe for cake telling you to follow another recipe to make the frosting, and another to make the sprinkles.  This makes the cake recipe shorter and clearer, and it avoids having a bunch of duplicated frosting recipes.  It's a foundation of productive programming.

For example, suppose you want to count the number of characters *that aren't vowels* in a piece of text.  One way to do that is this to remove all the vowels and count the size of the remaining string.

**<font color='red'>Question:</font>** Write a function called `num_non_vowels`.  It should take a string as its argument and return a number.  The number should be the number of characters in the argument string that aren't vowels.

*Hint:* The function `len` takes a string as its argument and returns the number of characters in it.

In [15]:
def num_non_vowels(a_string):
    """The number of characters in a string, minus the vowels."""
    str = a_string.replace("a","").replace("e","").replace("i","").replace("o","").replace("u","")
    return len(str)

# Try calling your function yourself to make sure the output is what
# you expect. 

4

# Loops and condition

**<font color='red'>Question:</font>** Find all possible 4-digit numbers satisfying:

$$abcd\times e = dcba$$

the initial number can not be 0, and $a$, $b$, $c$, $d$ are different numbers. 

Print them out.

Hint: you can transfer between string and integer:

```Python
>>str(1234)
[out]: "1234"
>>int("1234")
[out]: 1234
```

In [46]:
for i in range(1000,10000):
    a = str(i)[0]
    b = str(i)[1]
    c = str(i)[2]
    d = str(i)[3]
    if (a != b) and (a != c) and (a != d)and (b != c) and (b != d)and (c != d):
        for e in range(2,10):
            if (i*e == int(d+c+b+a)):
                print(i)
        

1089
2178


**<font color='red'>Question:</font>**
Write a function `summation` that evaluates the following summation for $n \geq 1$:

$$\sum_{i=1}^{n} i^3 + 3 i^2$$

In [57]:
def summation(n):
    """Compute the summation i^3 + 3 * i^2 for 1 <= i <= n."""
    sum = 0
    for i in range(1,n+1):
        sum += (i**3+3*i**2)
    return sum
    
# test your function:
summation(500)

15812937750

**<font color='red'>Question:</font>**
Recall the formula for population variance below:

$$\sigma^2 = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N}$$

Complete the functions below to compute the population variance of `population`, an list of numbers. For this question, **do not use built in NumPy functions, such as `np.var`.** 


In [55]:
def mean(population):
    """
    Returns the mean of population (mu)
    
    Keyword arguments:
    population -- a list of numbers
    """
    # Calculate the mean of a population
    sum = 0
    for i in population:
        sum += i
    return sum / len(population)

def variance(population):
    """
    Returns the variance of population (sigma squared)

    Keyword arguments:
    population -- a list of numbers
    """
    # Calculate the variance of a population
    a = 0
    u = mean(population)
    for i in range(0, len(population)):
        a += (population[i] - u) ** 2
    return a / len(population)

In [57]:
# to test the function, at first we generate a list of variables
#that follow a normal distribution:
import numpy as np
population_generated = list(np.random.normal(1,0.3,500))
# test
# note that in each run, the results will differ a bit, 
#because the population we generated each time is ramdom samples

print("The mean of the population is ", mean(population_generated))
print("The variance of the population is ", variance(population_generated))

The mean of the population is  0.9935153846187327
The variance of the population is  0.08278228757330813


**<font color='red'>Question:</font>**
The **GPA**, or **Grade Point Average**, is a number that indicates how well or how high you scored in your courses on average.
Based on SUSTech's grading system, the scores, Grade and GPA have the following relationship:

| Grade | GPA | Score |
|:----:|:----:|:--------:|
| A+   | 4.00 | 97~100   |
| A    | 3.94 | 93~96    |
| A-   | 3.85 | 90~92    |
| B+   | 3.73 | 87~89    |
| B    | 3.55 | 83~86    |
| B-   | 3.32 | 80~82    |
| C+   | 3.09 | 77~79    |
| C    | 2.78 | 73~76    |
| C-   | 2.42 | 70~72    |
| D+   | 2.08 | 67~69    |
| D    | 1.63 | 63~66    |
| D-   | 1.15 | 60~62    |
| F    | 0    | <60      |

Define a function ``Score2GradeGPA`` to convert a score (of total 100) to Grade and GPA according to the table above:

In [58]:
def Score2GradeGPA(score):
    """
    Returns the Grade and GPA into a tuple.
    
    Keyword arguments:
    score -- an integer or float value in the range [0,100]
    """
    grade = ""
    gpa = ""
    if score>=96.5 and score<=100 :
        grade="A+"
        gpa = "4.00"
    if score>=92.5 and score<96.5 :
        grade="A"
        gpa="3.94"
    if score>=89.5 and score<92.5 :
        grade="A-"
        gpa="3.85"
    if score>=86.5 and score<89.5 :
        grade="B+"
        gpa="3.73"
    if score>=82.5 and score<86.5:
        grade="B"
        gpa="3.55"
    if score>=79.5 and score<82.5 :
        grade="B-"
        gpa="3.32"
    if score>=76.5 and score<79.5 :
        grade="C+"
        gpa="3.09"
    if score>=72.5 and score<76.5 :
        grade="C"
        gpa="2.78"
    if score>=69.5 and score<72.5:
        grade="C-"
        gpa="2.42"
    if score>=66.5 and score<69.5 :
        grade="D+"
        gpa="2.08"
    if score>=62.5 and score<66.5 :
        grade="D"
        gpa="1.63"
    if score>=59.5 and score<62.5 :
        grade="D-"
        gpa="1.15"
    if score>=0.5 and score<59.5 :
        grade="F"
        gpa="0"
    return grade,gpa

# test
# here is the scores of 10 students:
Scores = [50, 60.1, 71, 74.2, 99, 88, 79.5, 89, 83, 94]
# apply the function:
[Score2GradeGPA(score) for score in Scores]

[('F', '0'),
 ('D-', '1.15'),
 ('C-', '2.42'),
 ('C', '2.78'),
 ('A+', '4.00'),
 ('B+', '3.73'),
 ('B-', '3.32'),
 ('B+', '3.73'),
 ('B', '3.55'),
 ('A', '3.94')]

# Set

The **Jaccard index**, also known as the **Jaccard similarity coefficient**, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Paul Jaccard, and independently formulated again by T. Tanimoto. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A,B)=\frac{|A \cap B|}{|A \cup B|}=\frac{|A \cap B|}{|A|+|B|-|A \cap B|}$$

Now we have list A_l and list B_l:

In [59]:
A_l = ['h','e','r','a','m','m','y','w','a','r','d','i','s','a','n','a','w','a','r','d',
     'p','r','e','s','e','n','t','e','d','b','y','t','h','e','e','c','o','r','d','i',
     'n','g','c','a','d','e','m','y','t','o','r','e','c','o','g','n','i','z','e','a',
     'c','h','i','e','v','e','m','e','n','t','i','n','t','h','e','m','u','s','i','c',
     'i','n','d','u','s','t','r','y','h','e','t','r','o','p','h','y','d','e','p','i']

B_l = ['s','o','r','i','g','i','n','a','l','a','n','u','a','r','y','d','a','t','e','d',
     'u','e','t','o','t','h','e','i','m','p','a','c','t','o','f','t','h','e','p','a',
     'n','d','e','m','i','c','o','n','t','h','e','m','u','s','i','c','i','n','d','u',
     's','t','r','y','i','n','a','n','d','a','r','o','u','n','d','t','h','e','o','s',
     'n','g','e','l','e','s','o','n','v','e','n','t','i','o','n','e','n','t','e','r']

**<font color='red'>Question:</font>** Try to define set A and set B based on the corresponding lists A_l and B_l, for set A (or B), the elements are unique members in list A_l (or B_l):

In [4]:
A_l = ['h','e','r','a','m','m','y','w','a','r','d','i','s','a','n','a','w','a','r','d',
     'p','r','e','s','e','n','t','e','d','b','y','t','h','e','e','c','o','r','d','i',
     'n','g','c','a','d','e','m','y','t','o','r','e','c','o','g','n','i','z','e','a',
     'c','h','i','e','v','e','m','e','n','t','i','n','t','h','e','m','u','s','i','c',
     'i','n','d','u','s','t','r','y','h','e','t','r','o','p','h','y','d','e','p','i']

B_l = ['s','o','r','i','g','i','n','a','l','a','n','u','a','r','y','d','a','t','e','d',
     'u','e','t','o','t','h','e','i','m','p','a','c','t','o','f','t','h','e','p','a',
     'n','d','e','m','i','c','o','n','t','h','e','m','u','s','i','c','i','n','d','u',
     's','t','r','y','i','n','a','n','d','a','r','o','u','n','d','t','h','e','o','s',
     'n','g','e','l','e','s','o','n','v','e','n','t','i','o','n','e','n','t','e','r']
A = set(A_l) 
B = set(B_l) 
print(A)
print(B)

{'y', 'r', 'z', 'b', 'm', 'n', 's', 'w', 'g', 'i', 'u', 'v', 'h', 't', 'a', 'd', 'p', 'o', 'e', 'c'}
{'y', 'r', 'm', 'n', 's', 'g', 'i', 'u', 'v', 'h', 't', 'a', 'l', 'd', 'p', 'o', 'f', 'e', 'c'}


**<font color='red'>Question:</font>** Try to calculate the Jaccard index of A and B, $J(A,B)$:

In [5]:
print(len(A & B)/(len(A)+len(B)-len(A & B)))

0.7727272727272727


# String

The Grammy Award, is an award presented by the Recording Academy to recognize achievement in the music industry. The trophy depicts a gilded gramophone. The annual award ceremony features performances by prominent artists and presentation of awards that showcase achievements made by industry recording artists. The first Grammy Awards ceremony was held on May 4, 1959, to honor the musical accomplishments of performers for the year 1958. After the 2011 ceremony, the Academy overhauled many Grammy Award categories for 2012. The 63rd Annual Grammy Awards were held on March 14, 2021 (after it was postponed from its original January 31, 2021 date due to the impact of the COVID-19 pandemic on the music industry), in and around the Los Angeles Convention Center. 

We have the full transcripts of this ceremony, see attached file "Transcripts_63RD_ANNUAL_GRAMMY_AWARDS_2021.txt".

**<font color='red'>Question:</font>** Read the text from the file, extract all words and save them into a list called **word_list**, remove all non-alphabet characters from each word.

In [61]:
f = open("./Transcripts_63RD_ANNUAL_GRAMMY_AWARDS_2021.txt",errors = 'ignore')
data = f.read()
word_list = data.split()
for i in range(len(word_list)):
    newstr = ""
    for j in word_list[i]:
        if j.isalpha():
            newstr += j
        word_list[i] = newstr
print(word_list)
f.close()

['WELCOME', 'TO', 'THE', 'rd', 'ANNUAL', 'GRAMMY', 'AWARD', 'IM', 'TREVOR', 'NOAH', 'AND', 'IM', 'YOUR', 'HOST', 'AS', 'WE', 'CELEBRATE', 'THE', 'LAST', 'TEN', 'YEARS', 'OF', 'MUSIC', 'THAT', 'GOT', 'US', 'THROUGH', 'ABOUT', 'TEN', 'YEARS', 'OF', 'CORONAVIRUS', 'I', 'KNOW', 'IT', 'IS', 'ONE', 'YEAR', 'BUT', 'IT', 'FEELS', 'LIKE', 'TEN', 'WE', 'HAVE', 'MADE', 'THE', 'DECISION', 'TO', 'SOCIALLY', 'DISTANCE', 'FROM', 'THE', 'PRESTIGIOUS', 'TO', 'SOCIALLY', 'DISTANCE', 'FROM', 'THE', 'PRESTIGIOUS', 'TO', 'SOCIALLY', 'DISTANCE', 'FROM', 'THE', 'PRESTIGIOUS', 'TO', 'SOCIALLY', 'DISTANCE', 'FROM', 'THE', 'PRESTIGIOUS', 'TO', 'SOCIALLY', 'DISTANCE', 'FROM', 'THE', 'PRESTIGIOUS', 'PEER', 'VOTED', 'TROPHIES', 'IN', 'MUSIC', 'GIVING', 'SHIEBEE', 'NEW', 'GRAMMY', 'AWARDS', 'LIVE', 'THROUGHOUT', 'THE', 'EVENING', 'BUT', 'WE', 'HAVE', 'TO', 'DO', 'IT', 'QUICKLY', 'BECAUSE', 'TOMORROW', 'THIS', 'TENT', 'IS', 'RESERVED', 'FOR', 'AN', 'OUTDOOR', 'WEDDING', 'IN', 'MALIBU', 'AND', 'I', 'DO', 'IN', 'THE',

**<font color='red'>Question:</font>** Count the frequency of each unique word in **word_list**, and save the results into a dictionary called **word_count**: each key in the dictionary is a word, and the corresponding value is the frequency.

In [5]:
word_count = {}
for i in word_list:
    word_count[i] = word_list.count(i)
for a in word_count:
    print(a,':',word_count[a])

WELCOME : 21
TO : 447
THE : 679
rd : 8
ANNUAL : 1
GRAMMY : 86
AWARD : 30
IM : 161
TREVOR : 4
NOAH : 2
AND : 543
YOUR : 55
HOST : 3
AS : 39
WE : 147
CELEBRATE : 6
LAST : 14
TEN : 5
YEARS : 19
OF : 233
MUSIC : 64
THAT : 196
GOT : 59
US : 35
THROUGH : 21
ABOUT : 37
CORONAVIRUS : 1
I : 569
KNOW : 78
IT : 208
IS : 200
ONE : 28
YEAR : 66
BUT : 83
FEELS : 5
LIKE : 129
HAVE : 58
MADE : 18
DECISION : 1
SOCIALLY : 5
DISTANCE : 5
FROM : 34
PRESTIGIOUS : 5
PEER : 1
VOTED : 1
TROPHIES : 1
IN : 183
GIVING : 7
SHIEBEE : 1
NEW : 19
AWARDS : 24
LIVE : 12
THROUGHOUT : 2
EVENING : 2
DO : 42
QUICKLY : 3
BECAUSE : 28
TOMORROW : 2
THIS : 173
TENT : 2
RESERVED : 1
FOR : 185
AN : 35
OUTDOOR : 1
WEDDING : 2
MALIBU : 1
WANT : 70
LOSE : 5
MY : 233
SECURITY : 2
DEPOSIT : 1
REST : 8
ASSURED : 1
EVERYONE : 11
HERE : 66
FOLLOWING : 2
COVID : 2
PROTOCOLS : 1
GUIDE : 1
LINES : 3
WILL : 45
BE : 91
SHOW : 13
WHERE : 23
WHITE : 10
STUFF : 1
GOING : 29
UP : 132
PEOPLES : 1
NOSESES : 1
COTTON : 1
SWABS : 1
LOOKS : 1
DIFFER

THANKING : 1
NONE : 1
GROW : 1
GOAL : 1
RODEO : 1
DESTINYS : 1
CHILD : 1
CARRY : 1
WHACK : 1
WOULDBE : 2
YONS : 2
ENCOURAGING : 1
RESPECT : 2
CANNED : 1
THT : 1
SEAU : 1
TIED : 3
ALLTIME : 2
WINS : 2
SINGER : 2
MALE : 2
QUEEN : 3
ACHIEVEMENT : 2
FILM : 4
EXTENDED : 3
VERSION : 2
AVAILABLE : 1
CHARMING : 1
CHARACTER : 1
HITS : 1
JAMMING : 1
TERRIBLE : 1
DECENT : 1
EXPRESS : 1
EXTROVERTED : 1
SUPER : 1
SHOCKED : 1
TERRIFYING : 1
ARENA : 1
HYPED : 1
EASIER : 2
BUCK : 2
WHENEVER : 1
MUS : 1
IG : 1
HAPPEN : 1
HOUSEHOLD : 1
SING : 6
BLESS : 1
AUTO : 1
COUPLE : 1
BLESSED : 1
JOB : 5
HOLLYWOODS : 3
BLEEDING : 5
VAMPIRES : 1
FEEDIN : 1
DARKNESS : 1
TURNS : 1
DUST : 1
EVERYONES : 1
ONES : 1
CHASE : 1
RIDIN : 1
DYIN : 1
SLEEP : 4
LIVIN : 1
SCARED : 3
LOSIN : 3
FOUND : 3
REASONS : 2
OUTSIDE : 1
WINTER : 1
TURNIN : 1
GREY : 1
CITY : 9
ASH : 2
RAINS : 1
HOWL : 1
MOON : 1
DRUGS : 1
FADE : 1
BLOCKING : 1
SUN : 1
SHADES : 1
PULSE : 1
SEEM : 2
DYING : 1
WHOD : 1
FUNERAL : 1
CLOSIN : 1
SHARPEN : 1
TEETH 

**<font color='red'>Question:</font>** Based on **word_count**, what's the top 10 frequently used words in this ceremony? Try to print the top 10 words and their frequency below:

``For example:
    word    frequency
    of      100
    happy   20
    ...
``

In [25]:
word_count_ordered = sorted(word_count.items(),key = lambda d:d[1],reverse=True)
word_count_ordered.pop(0)
print("word"+ '{0:>14}'.format('frequency'))
for i in range(10):
    print(word_count_ordered[i][0],'\t',str(word_count_ordered[i][1]))

word     frequency
THE 	 679
I 	 569
AND 	 543
TO 	 447
YOU 	 434
A 	 378
ME 	 243
OF 	 233
MY 	 233
IT 	 208


**<font color='red'>Question:</font>** For the top 100 frequently used words, what's top 5 frequently used alphabets? Try to print them out with the percentage they appear in the 100 words:

``For example:
    alphabet percentage
    a   10.25%
    b   20.01%
    ...
``

In [52]:
word1 = ''
for i in range(100):
    word1 += word_count_ordered[i][0]
lowstr = word1.lower()
wordlist = list(lowstr)
number = len(lowstr)
dict1 = {}
for key in wordlist:
    dict1[key] = dict1.get(key,0)+ 1
order = sorted(dict1.items(),key = lambda d:d[1],reverse = True )
print("alphabet percentage")
for i in range(5):
    percentage = "{:.2%}".format(round(order[i][1]/number,2))
    print('{:<4}{}'.format(order[i][0],percentage))

alphabet percentage
e   11.00%
t   9.00%
a   9.00%
o   9.00%
h   6.00%


# Flip a coin

Python's built-in ``random`` module, can do the pseudorandom number generation. 
You can get more info. about it here:https://docs.python.org/3/library/random.html

You can import the module:

In [110]:
import random

 The method ``random`` in the module ``random`` will return a random values between the interval 0 and 1. For example:

In [111]:
random.random() # you can run this code multiple times to see how it works.

0.8416061247926858

**<font color='red'>Question:</font>** Define a function called ``flip_coin`` that return 0 or 1. Each time we call the function, it will return 0 with probability $p=0.2$, and or return 1 with probability $1-p=0.8$. Try to use the method ``random.random()``:

In [114]:
def flip_coin():
    """
    Return 0 with probability 0.2, or return 1 with probability 0.8
    """
    p = 0.2
    a = random.random()
    if a>=0 and a<0.2:
        return 0
    else:
        return 1



In [116]:
# this code is used for verifying your flip coin function
import random
def flip_coin():
    """
    Return 0 with probability 0.2, or return 1 with probability 0.8
    """
    p = 0.2
    a = random.random()
    if a>=0 and a<0.2:
        return 0
    else:
        return 1
flip_times = 10000
flip_coin_records = [flip_coin() for i in range(flip_times)]
flip_1 = sum(flip_coin_records)
print ("Flip coins {0} times, in which {1:.3%} are 0, and {2:.3%} are 1.".format(
    flip_times,1-flip_1/flip_times, flip_1/flip_times))

Flip coins 10000 times, in which 20.200% are 0, and 79.800% are 1.
