<div style="text-align: right">
    <i>
        LIN 537: Computational Linguistics 1 <br>
        Fall 2019 <br>
        Alëna Aksënova
    </i>
</div>

# Notebook 1: Basic IO, variables, boolean expressions

This notebook introduces basic programming terminology such as *functions*, *arguments*, *variables*, and *IO*. It shows how to implement basic input-output operations using `print` and `input`. Then it explains the fundamental data types `str`, `int`, `float` and `bool`. Finally, it demonstrates how to create and evaluate Boolean expressions.

## 1. The Jupyter environment

In this class we are going to use Jupyter notebooks for coding. They are minimalistic and easy to read, they make it easy to write explanations in Markdown, and they let us work with the code in separate cells.

To avoid the hassle of installing Jupyter notebooks locally, we will be using [Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb), which makes it easy to upload and work with the notebooks.

## 2. Functions and their arguments

The notion of functions and arguments is probably not new to you, especially if you took formal semantics classes. Roughly, _John sleeps_ can be expressed as a function `sleep` getting `John` as an argument and therefore yielding `sleep(John)`.

In Python, and in programming in general, **functions** can be thought of as descriptions of actions. They always return some value. For example,
 * a function that reverses strings returns another string;
 * a function that adds two numbers together returns their sum;
 * a function that calculates the number of symbols in a sentence returns that number.
 
However, sometimes it might seem that a function is not returning anything. Spoiler: it returns something, and this something is *nothing*, or `None`.

In order to perform the intended chain of actions, a function needs zero or more **arguments**, i.e. objects that the function requires in advance. For example,
 * a function that reverses strings needs 1 argument: a string to reverse;
 * a function that removes the first _n_ words from a sentence needs 2 arguments: the number of words to remove, and the sentence itself;
 * a function that prints "Hello world!" needs 0 arguments: we know exactly what we are printing.


**Question:** what arguments can a function that draws a circle have?

The simplest function in Python is `print`. It simply displays its argument(s) on the screen:

In [None]:
print("Hello world!")

In [None]:
print("Mary", "John")

In [None]:
print()

In [None]:
print("September", 9)

Put simply, the `print` function displays on the screen whatever it has in the parentheses. But in the previous cell, we see that the two arguments of `print` are shown in different colors. The reason is that the Python syntax highlighting marks them as belonging to _different data types_.

## 3. Basic data types: int, float, str, bool

We will start exploring Python data types by looking at integers, floats, strings, and booleans.

**Integers** (`int`) are numbers without a fractional component. For example, `8`, `0`, `-1` and `-9248` are integers, whereas `3.14` or `-1.333` are not. 


**Floating point numbers** (`float`) are numbers with a fractional component, e.g. `9.8`, `4.3958` or `-8.000001`.

This is a very important distinction, since integers and floating point numbers are stored differently in the memory of the computer. A function that conveniently shows the type of its argument is `type`. With it, we can verify that `8` is an integer, whereas `8.5` is not:

In [None]:
type(8)

In [None]:
type(8.5)

Note that when `8` is written as `8.0`, it is a `float` and not an integer!

In [None]:
type(8.0)
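One visible consequence of floats being stored differently is that some decimal fractions can only be approximated. The cell below is a small illustrative sketch; the variable names `int_sum` and `float_sum` are chosen here for the example only.

```python
# Integers are stored exactly.
int_sum = 1 + 2
print(int_sum)        # 3

# Floats are stored in binary, so 0.1 and 0.2 are only approximations.
float_sum = 0.1 + 0.2
print(float_sum)      # 0.30000000000000004
```

This is expected behavior for floating point numbers in general, not a Python bug.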

We can perform arithmetic operations with integers and floats.

In [None]:
# addition
6 + 9

In [None]:
# subtraction
99 - 0.5

In [None]:
# multiplication
5 * 2

In [None]:
# division that returns a floating point number ("classic division")
115 / 2

In [None]:
# division that rounds the result down to the nearest integer ("floor division")
115 // 2
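The difference between the two kinds of division is easiest to see on a few extra cases. The sketch below uses made-up variable names; note in particular that "rounding down" means rounding toward the smaller number, which matters for negative values.

```python
# Classic division always returns a float, even when the result is whole.
classic = 4 / 2
print(classic, type(classic))     # 2.0 <class 'float'>

# Floor division rounds down, which matters for negative numbers.
positive_floor = 7 // 2
negative_floor = -7 // 2
print(positive_floor)             # 3
print(negative_floor)             # -4
```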

In [None]:
# exponentiation
2 ** 10

**Practice:** how to calculate the square root of `1024` using Python knowledge that we already have?

In [None]:
# your code

On a separate note, notice two things:
 * there is an orange _Out\[number\]_ right next to the outputs of every cell, and
 * we didn't use `print`, and still saw the results of the operations!
 
This happens because when we run a cell, the output of its last operation is displayed on the screen. How can we check that only the last operation is displayed and not all of them?

In [None]:
5 + 8   # 13
9 + 2   # 11

If we want to make sure that every output is displayed, we should use the `print` function.

In [None]:
print(5 + 8)   # 13
print(9 + 2)   # 11

**Strings** (`str`) are sequences of characters: `"apple"`, `'Hello world!'` or `"My phone number is 123."`. **Strings are always surrounded by quotes!** These quotes can be either single or double, just use them consistently.

In [None]:
type("My phone number is 123.")

When a number is surrounded by quotes, it is not an integer or a float, but it is a string!

In [None]:
print(type(5))
print(type("5"))

In the previous code cell, we see the following line: `print(type(5))`. It simply means that the output of `type(5)` is passed to the `print` function as an argument, i.e. `type(5)` tells that the type of `5` is an integer, and the `print` function catches that output and displays it on the screen.
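The same idea of catching a function's output can be used to look at what `print` itself returns. As mentioned earlier, a function that seems to return nothing actually returns `None`. A minimal sketch (the variable name `result` is just for illustration):

```python
# print displays its argument, and its own return value is None.
result = print("Hello world!")
print(result)          # None
print(type(result))    # <class 'NoneType'>
```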

For strings, the `+` operator performs concatenation.

In [None]:
"artificial" + "ly"

In [None]:
# "15" and "1" are strings, not integers!
"15" + "1"

**Practice:** what will happen if we add string "15" and integer 1?

In [None]:
# your code

A frequent task is to convert a value from one type to another, which is called _typecasting_. If we want to convert a string to a number, for example, to be able to perform arithmetic with a number that was represented as a string, we can use the `int` or `float` functions.

In [None]:
number = "55"
print("Old type:", type(number))
number = int(number)
print("New type:", type(number))
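The `float` function works the same way for strings that contain a fractional number. The sketch below uses made-up values; it also shows that `int` applied to a float drops the fractional part.

```python
price = "3.5"                 # a string, not a number
price = float(price)          # now a float
print(price + 1)              # 4.5

# int() applied to a float drops the fractional part.
whole_part = int(4.7)
print(whole_part)             # 4
```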

A value can be converted from another type to a string by using the `str` function.

In [None]:
number2 = 4.7
print("Old type:", type(number2))
number2 = str(number2)
print("New type:", type(number2))
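Converting to a string is handy when a number has to be concatenated with text, since `+` only joins strings with strings. A small sketch with hypothetical variable names:

```python
count = 3
# str(count) turns the integer into "3" so that it can be concatenated.
message = "I have " + str(count) + " apples."
print(message)         # I have 3 apples.
```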

Finally, **booleans** (`bool`) are the two values `True` and `False`. In arithmetic, they behave like `1` and `0`.

In [None]:
print(type(True))
print(type(False))
print(type("False"))
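The connection between booleans and the numbers `1` and `0` can be checked with the typecasting functions from above. This is only an illustrative sketch with made-up variable names:

```python
true_as_int = int(True)
false_as_int = int(False)
print(true_as_int)      # 1
print(false_as_int)     # 0

# In arithmetic, True counts as 1.
total = True + True
print(total)            # 2
```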

Booleans are the "answers" to questions such as the following.
 * Does this phrase contain the word "linguistics"?
 * Is the sum of these two numbers bigger than 17?
 * Have we already seen this sentence before?
 
We will see very soon how extremely useful booleans are.

## 4. Variables

The way to store some value in the memory of the computer is to define a _variable_ that refers to that value. In some sense, the variable is the name of the value. As soon as we _declare_ that variable, we can use it to refer to its value.

For example, we can define a variable `name` and then use it if we want to greet someone:

In [None]:
name = "John"
print("Hello,", name)

The value of the variable can be of any data type.

In [None]:
var1 = "banana"
var2 = 9
var3 = 0.2
var4 = True

print("The type of var1 is", type(var1))
print("The type of var2 is", type(var2))
print("The type of var3 is", type(var3))
print("The type of var4 is", type(var4))

If there are several lines where the same variable name is defined, only the last definition matters.

In [None]:
name = "John"
name = "Alice"
print(name)

**Laws of variable names**
 * Variable names are not strings: they are not surrounded by quotes!
 * They cannot start with a digit.
 * They cannot contain spaces or special symbols such as $, !, ~, etc. (The underscore is fine though!)
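One way to test these rules without causing an error is the built-in string function `str.isidentifier`, which is not otherwise covered in this notebook. It checks whether a string would be a syntactically valid name. The candidate names below are made up for illustration.

```python
valid_name = str.isidentifier("my_var")          # letters and underscore
starts_with_digit = str.isidentifier("2nd_var")  # starts with a digit
contains_space = str.isidentifier("my var")      # contains a space
contains_symbol = str.isidentifier("cost$")      # contains a special symbol

print(valid_name)          # True
print(starts_with_digit)   # False
print(contains_space)      # False
print(contains_symbol)     # False
```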

**Warning:** never (unless you are doing it on purpose!) name a variable after something that already has a meaning in Python (`print`, `int`, `type`, etc.). It is possible, but it will break _a lot_ of things.

We can store a result of an operation in a variable.

In [None]:
hello = type("Hello!")
print(hello)

In [None]:
large_number = 193425 + 32532513
print(large_number)

In [None]:
parent1 = "Mary"
parent2 = "John"
parents = parent1 + " and " + parent2
print(parents)

If the variable value needs to be updated with respect to its old value, we can use the following operators:
 * `var += some_value` (same as `var = var + some_value`);
 * `var -= some_value` (same as `var = var - some_value`);
 * `var *= some_value` (same as `var = var * some_value`);
 * `var /= some_value` (same as `var = var / some_value`).

In [None]:
var1 = 15
print("Old value:", var1)
var1 += 1
print("New value:", var1)

In [None]:
var2 = "mild"
print("Old value:", var2)
var2 += "ly"
print("New value:", var2)

In [None]:
var3 = 19
print("Old value:", var3)
var3 /= 2
print("New value:", var3)

## 5. Basic IO

We already know that the way to display values on the screen is to `print` them. However, many applications, especially chatbots, rely on input from a user. In Python, `input` takes care of it!

The `input` function asks users to enter the information and returns the string containing the user input, so in most of the cases, it is useful to save the results of the input into some variable.

If `input` is called without any arguments (i.e. as `input()`, don't forget the parentheses!), it simply waits for the user to type in some information.

In [None]:
user_input = input()
print("The user input is:", user_input)

However, if `input` is called with an argument, this argument is displayed next to the input window.

In [None]:
name = input("What is your name? ")
print("My name is", name)
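Because `input` always returns a string, digits typed by the user are still text until they are converted. In the sketch below, a fixed string stands in for what `input()` would return, so that the cell runs without waiting for the user; the variable names are hypothetical.

```python
# A fixed string stands in for the value that input() would return.
answer = "12"
print(type(answer))        # <class 'str'>

# With strings, + concatenates.
repeated = answer + answer
print(repeated)            # 1212

# After conversion with int, arithmetic works as expected.
doubled = int(answer) * 2
print(doubled)             # 24
```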

## 6. Boolean expressions

Boolean expressions are expressions that evaluate to `True` or `False`. There are multiple logical operators that help us form them.

The operator `==` checks whether its left and right sides are equal.

In [None]:
10 + 5 == 15

In [None]:
10 == 15

In [None]:
"Apple" == "apple"

In [None]:
(10 + 5 == 20) == False

The opposite of `==` is `!=`, which checks for inequality:

In [None]:
1 != 10

Operators `>`, `>=`, `<` and `<=` are defined as well.

In [None]:
7 < 9

In [None]:
8 >= 8

The operator `in` checks if the left-hand side object is contained within the right-hand side one.

In [None]:
"world" in "Hello world!"

In [None]:
"Apple" in "I love apples"

The operator `not` reverses the truth value to the opposite one.

In [None]:
not True

In [None]:
not False

In [None]:
"peach" not in "I love apples"

In [None]:
not (10 + 5 == 15)

Apart from the above listed operators, there are _complex operators_ `and` and `or`. Boolean expressions can be combined using these operators.
* **`and`** returns `True` if both expressions it combines evaluate to `True`;
* **`or`** returns `True` if at least one of the expressions it combines evaluates to `True`.

_Beware of the scope_: `(A and B) or C` is not the same thing as `A and (B or C)`!

In [None]:
True and True

In [None]:
True and False

In [None]:
False or True

In [None]:
(False and True) or True

In [None]:
False and (True or True)

In [None]:
("apple" in "apples") and (1 + 1 == 2)

In [None]:
("apple" in "apples") or (1 + 1 == 5)

In [None]:
(("apple" in "apples") and (1 + 1 == 3)) or (5 < 10)
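When the parentheses are left out, Python evaluates `and` before `or`, so `A and B or C` is read as `(A and B) or C`. The sketch below compares the three variants for one choice of values; the variable names are made up for illustration.

```python
without_parentheses = False and True or True
grouped_left = (False and True) or True
grouped_right = False and (True or True)

print(without_parentheses)   # True
print(grouped_left)          # True
print(grouped_right)         # False
```

Writing the parentheses explicitly is usually the safer choice, since it makes the intended scope visible to the reader.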

## 7. More magic with strings

The **bag-of-words** model of meaning assumes that the meaning of a text can be represented by the words found in the text and their frequencies. Indeed, if the text is about pets, we expect words such as "cat" and "dog" to be more frequent in it, and if the text is about politics, words such as "president", "market", and "GDP" will occur more often.

However, there are words that are frequent in all types of texts: "and", "of", "the", "a(n)", "there", and so on. These words are called **stop words**, and since they are not informative for modeling the meaning of the text, they are frequently removed from it.
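The idea can be sketched in a few lines of Python. The example relies on tools that go beyond this notebook (`str.split`, a list comprehension, and `Counter` from the standard `collections` module), and both the sentence and the tiny stop-word list are invented for illustration.

```python
from collections import Counter

# A made-up sentence and a tiny, hypothetical stop-word list.
text = "the cat and the dog chased the other cat"
stop_words = {"the", "and", "a", "an", "of", "there"}

# Split the text on whitespace and keep only the non-stop words.
words = str.split(text)
content_words = [word for word in words if word not in stop_words]

# Count how often each remaining word occurs.
bag = Counter(content_words)

print(bag["cat"])   # 2
print(bag["dog"])   # 1
print(bag["the"])   # 0, because stop words were removed
```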

For many linguistic tasks, capitalization of the words does not matter. For example, when the task is to get rid of the stop words, we want to get rid of them independently of the capitalization ("THE", "the", "The", etc.) However, for Python, "the" and "The" are completely different words.

In [None]:
"the" == "The"

There is a way to map all the versions of "the" with different capitalizations to "the": `str.lower("ThE")`.

In [None]:
str.lower("ThE")

In [None]:
"the" == str.lower("ThE")

Similarly, there are functions `upper` and `title` that convert a string to uppercase or to title case.

In [None]:
print("The uppercase of 'the' is '" + str.upper("the") + "'.")

In [None]:
print("The title version of 'hello world' is '" + str.title("hello world") + "'.")

In the examples above, we print quotation marks inside a string by alternating their types (i.e. if double quotes mark the string, single quotes are used inside it, or vice versa).

However, another way to do it is to put a special _escape symbol_ `\` (backslash) before the quotation mark that we want to have as part of the string.

In [None]:
print("Both single quotations \' and double quotations \" are interpreted literally this way.")

Other important special symbols are `\n` (new line) and `\t` (tabulation).

In [None]:
print("This line \ncontinues on the second line, and tabulation is here\tas well.")

There are also functions that check whether a string is uppercase, lowercase, or title case:
 * `str.isupper("Hello")` checks if a string is uppercase;
 * `str.islower("Hello")` checks if a string is lowercase;
 * `str.istitle("Hello")` checks if a string is in title case, i.e. every word starts with a capital letter.

In [None]:
str.isupper("HELLO WORLD!")

In [None]:
str.islower("hello world!")

In [None]:
str.istitle("Hello World!")
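These checks return `False` as soon as the string does not fit the pattern. The sketch below applies them to the mixed-case string `"Hello"` from the list above; the variable names are hypothetical.

```python
upper_check = str.isupper("Hello")        # not all letters are uppercase
lower_check = str.islower("Hello")        # not all letters are lowercase
title_check = str.istitle("Hello")        # the word starts with a capital
mixed_check = str.istitle("hello World")  # the first word is not capitalized

print(upper_check)   # False
print(lower_check)   # False
print(title_check)   # True
print(mixed_check)   # False
```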

Another very useful function is `len`. Can you guess what it does based on the outputs of the following cells?

In [None]:
len("Hello world!")

In [None]:
len("computational linguistics")

In [None]:
len(25)

# Homework 1

**Due on Saturday, September 14th, 11.59pm**

Send your notebook (don't forget to save your solutions!) to <alena.aksenova@stonybrook.edu> with the subject **\[CompLing1\] Homework 1**.

**Problem 1.** You are given the following paragraph.

In [None]:
text = "A glance around her studio reveals some of the complexity. The place is packed chockablock " \
       "with clusters of objects grouped by type: alarm clocks (maybe two dozen), antique books, model " \
       "clipper ships, African masks, birdcages, globes, painted wood watermelon slices, the Mexican " \
       "healing charms known as milagros and so-called mammy dolls piled on a chair."

Write code that asks the user to enter a word and then checks if this word is contained in the text given above.

**Problem 2.** Write a condition that checks if the word "idiot" is present in the user input. Make sure that it works independently of the capitalization!

**Problem 3.** Ask the user for the year in which they were born, and print the age of the user. (Assume that the user's birthday is always January 1 so that the calculation is simple.)

_Hint:_ the `int` function might be useful here.

**Problem 4.** With string concatenation you can also play a round of *Mad Libs*.
If you aren't familiar with the game, here's how it works: you have a predetermined text where certain words have been replaced by their part of speech, for example *verb*, *noun*, *adjective*.
For each gap, you ask a friend to say a word of that part of speech.
You then put those words in the gap and read out the text aloud.
Ideally, hilarity ensues.

Here's an example adapted from the very first Mad Libs book:

~~~
"[exclamation]! He said [adverb] as he jumped into his convertible
[noun] and drove off with his [adjective] friend."

"Ron! He said better as he jumped into his convertible
Tesla and drove off with his irritating friend."
~~~

Write a program that allows the user to play a single round of Mad Libs with the computer.

**Problem 5.** Ask the user for the minimum number of characters a word must have to be considered long. Then ask for another input, this time a word. Afterwards, write a boolean expression that checks if the word provided by the user is long or not.