<div style="text-align: right">
    <i>
        LIN 537: Computational Linguistics 1 <br>
        Fall 2019 <br>
        Alëna Aksënova
    </i>
</div>

# Notebook 1: Basic IO, variables, boolean expressions

This notebook introduces basic programming terminology such as *functions*, *arguments*, *variables*, and *IO*. It shows how to implement basic input-output operations using `print` and `input`. Then it explains the fundamental data types `str`, `int`, `float` and `bool`. Finally, it demonstrates how to create and evaluate Boolean expressions.

In [0]:
import this


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## 1. The Jupyter environment

In this class we are going to use Jupyter notebooks for coding. They are minimalistic and easy to read, they make it easy to write explanations in Markdown, and they let us work with the code in separate cells.

In order to avoid the hassle of installing Jupyter notebooks locally, we will be using [Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb), which makes it easy to upload and work with the notebooks.

## 2. Functions and their arguments

The notion of functions and arguments is probably not new to you, especially if you took formal semantics classes. Roughly, _John sleeps_ can be expressed as a function `sleep` getting `John` as an argument and therefore yielding `sleep(John)`.

In Python, and in programming in general, **functions** can be thought of as descriptions of actions. They always return some value. For example,
 * a function that reverses strings returns another string;
 * a function that adds two numbers together returns their sum;
 * a function that calculates the number of symbols in a sentence returns that number.
 
However, sometimes it might seem that a function is not returning anything. Spoiler: it returns something, and this something is *nothing*, or `None`.
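
One way to see this is to keep whatever `print` gives back and inspect it. The following is only a small sketch: the name `result` is chosen here for illustration (storing values under names is covered in section 4).

```python
# print displays its argument on the screen, but the value it returns is None
result = print("Hello world!")

# Inspect the returned value and its type
print(result)        # None
print(type(result))  # <class 'NoneType'>
```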

In order to perform the intended chain of actions, a function needs zero or more **arguments**, that is, objects that the function requires in advance. For example,
 * a function that reverses strings needs 1 argument: a string to reverse;
 * a function that removes the first _n_ words from a sentence needs 2 arguments: the number of words to remove, and the sentence itself;
 * a function that prints "Hello world!" needs 0 arguments: we know exactly what we are printing.


**Question:** what arguments can a function that draws a circle have?

The simplest function in Python is `print`. It displays its argument(s) on the screen:

In [0]:
print("Hello world!")

Hello world!


In [0]:
print("Mary", "John")

Mary John


In [0]:
print()




In [0]:
print("September", 9)

September 9


Put simply, the `print` function displays on the screen whatever is inside its parentheses. But in the previous cell, the two arguments of `print` are shown in different colors. The reason is that Python syntax highlighting marks them as belonging to _different data types_.

## 3. Basic data types: int, float, str, bool

We will start exploring Python data types by looking at integers, floats, strings, and booleans.

**Integers** (`int`) are numbers without a fractional component. For example, `8`, `0`, `-1` and `-9248` are integers, whereas `3.14` or `-1.333` are not.


**Floating point numbers** (`float`) are numbers with a fractional component, e.g. `9.8`, `4.3958` or `-8.000001`.

This is a very important distinction, since integers and floating point numbers are stored differently in the memory of the computer. The `type` function conveniently shows the type of its argument. With it, we can verify that `8` is an integer, whereas `8.5` is not:

In [0]:
type(8)

int

In [0]:
type(8.5)

float

Note that when `8` is written as `8.0`, it is a `float` and not an integer!

In [0]:
type(8.0)

float

We can perform arithmetic operations with integers and floats.

In [0]:
# addition
6 + 9

15

In [0]:
# subtraction
99 - 0.5

98.5

In [0]:
# multiplication
5 * 2

10

In [0]:
# division that returns a floating point number ("classic division")
115 / 2

57.5

In [0]:
# division that rounds the result down to the nearest integer ("floor division")
115 // 2

57
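
"Rounds down" is worth taking literally: with negative numbers, floor division moves away from zero. The sketch below illustrates this, together with the remainder operator `%`, which is not used elsewhere in this notebook; the variable names are chosen here for illustration.

```python
# Floor division rounds toward negative infinity, not toward zero
positive = 115 // 2     # 57, since 57.5 is rounded down to 57
negative = -115 // 2    # -58, since -57.5 is rounded down to -58

# The remainder operator % gives what is left over after floor division
remainder = 115 % 2     # 1, because 115 = 57 * 2 + 1

print(positive, negative, remainder)
```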

In [0]:
# exponentiation
2 ** 10

1024

**Practice:** how to calculate the square root of `1024` using Python knowledge that we already have?

In [0]:
1024 ** 0.5

32.0

On a separate note, notice two things:
 * there is an orange _Out\[number\]_ right next to the outputs of every cell, and
 * we didn't use `print`, and still saw the results of the operations!
 
This happens because when we run a cell, the output of its last operation is displayed on the screen. How can we check that only the last operation is displayed and not all of them?

In [0]:
5 + 8   # 13
9 + 2   # 11

11

If we want to make sure that every output is displayed, we should use the `print` function.

In [0]:
print(5 + 8)   # 13
print(9 + 2)   # 11

13
11


**Strings** (`str`) are sequences of characters: `"apple"`, `'Hello world!'` or `"My phone number is 123."`. **Strings are always surrounded by quotes!** These quotes can be either single or double, just use them consistently.

In [0]:
type("My phone number is 123.")

str

When a number is surrounded by quotes, it is not an integer or a float, but it is a string!

In [0]:
print(type(5))
print(type("5"))

<class 'int'>
<class 'str'>


In the previous code cell, we see the following line: `print(type(5))`. It means that the output of `type(5)` is passed to the `print` function as an argument: `type(5)` reports that `5` is an integer, and the `print` function takes that output and displays it on the screen.

For strings, the `+` operator performs concatenation.

In [0]:
"artificial" + "ly"

'artificially'

In [0]:
# "15" and "1" are strings, not integers!
"15" + "1"

'151'

**Practice:** what will happen if we add the string "15" and the integer 1?

In [0]:
# your code
'15'+1


TypeError: ignored
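
The error arises because Python will not guess whether `+` should mean concatenation or addition here. The sketch below catches the error so that the cell keeps running, and then shows the two ways of making the types agree. The `try`/`except` construction and the variable names are not part of this notebook and are used only for illustration.

```python
# Adding a string and an integer raises a TypeError
try:
    result = "15" + 1
except TypeError as error:
    result = None
    print("TypeError:", error)

# Option 1: treat both values as numbers
as_number = int("15") + 1      # 16

# Option 2: treat both values as strings
as_string = "15" + str(1)      # '151'

print(as_number, as_string)
```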

A frequent task is to convert a value from one type to another, that is, to perform _typecasting_. If we want to change the type of a string to a number, for example to be able to perform arithmetic with a number that was represented as a string, we can use the `int` or `float` functions.

A value can be converted from another type to a string by using the `str` function.

In [0]:
number = "55"
print("Old type:", type(number))
number = int(number)
print("New type:", type(number))

Old type: <class 'str'>
New type: <class 'int'>


In [0]:
number2 = 4.7
print("Old type:", type(number2))
number2 = str(number2)
print("New type:", type(number2))

Old type: <class 'float'>
New type: <class 'str'>
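
Conversions do not always succeed, and they do not always keep the full value. The following sketch shows a few cases worth knowing; the `try`/`except` construction and the variable names are assumptions made for this illustration.

```python
# int() on a float drops the fractional part (it does not round)
truncated = int(4.7)            # 4

# float() can read a string that contains a decimal point
parsed = float("4.7")           # 4.7

# int() cannot read such a string directly and raises a ValueError
try:
    broken = int("4.7")
except ValueError:
    broken = None

# Converting in two steps works: string -> float -> int
two_steps = int(float("4.7"))   # 4

print(truncated, parsed, broken, two_steps)
```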


Finally, **booleans** (`bool`) are the two values `True` and `False`, which behave like `1` and `0` when used as numbers.

In [0]:
print(type(True))
print(type(False))
print(type("False"))

<class 'bool'>
<class 'bool'>
<class 'str'>
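
The relation between booleans and the numbers `1` and `0` can be checked directly. This is a small sketch with variable names chosen for illustration; it uses the equality operator `==`, which is explained in section 6.

```python
# True and False compare equal to 1 and 0 ...
print(True == 1)      # True
print(False == 0)     # True

# ... and take part in arithmetic as those numbers
total = True + True   # 2
print(total)

# bool() converts other values: zero and the empty string give False
zero_as_bool = bool(0)        # False
empty_as_bool = bool("")      # False
text_as_bool = bool("False")  # True, because the string is not empty

print(zero_as_bool, empty_as_bool, text_as_bool)
```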


Booleans are the "answers" to questions such as the following.
 * Does this phrase contain the word "linguistics"?
 * Is the sum of these two numbers bigger than 17?
 * Have we already seen this sentence before?
 
We will see very soon how extremely useful booleans are.

## 4. Variables

The way to store some value in the memory of the computer is to define a _variable_ that refers to that value. In some sense, the variable is the name of the value. As soon as we _define_ that variable, we can use it to refer to its value.

For example, we can define a variable `name` and then use it if we want to greet someone:

In [0]:
name = "John"
print("Hello,", name)

Hello, John


The value of the variable can be of any data type.

In [0]:
var1 = "banana"
var2 = 9
var3 = 0.2
var4 = True

print("The type of var1 is", type(var1))
print("The type of var2 is", type(var2))
print("The type of var3 is", type(var3))
print("The type of var4 is", type(var4))

The type of var1 is <class 'str'>
The type of var2 is <class 'int'>
The type of var3 is <class 'float'>
The type of var4 is <class 'bool'>


If there are several lines where the same variable name is defined, only the last definition matters.

In [0]:
name = "John"
name = "Alice"
print(name)

Alice


**Laws of variable names**
 * Variable names are not strings: they are not surrounded by quotes!
 * They cannot start with a digit.
 * They cannot contain spaces or special symbols such as $, !, ~, etc. (The underscore is fine though!)
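
These rules can be tested without actually creating any variables: the string function `isidentifier` reports whether a piece of text would be accepted as a name. The candidate names below are made up for this sketch.

```python
# str.isidentifier reports whether a string would be a legal name in Python
valid_plain = str.isidentifier("parent1")           # True
valid_underscore = str.isidentifier("first_name")   # True
starts_with_digit = str.isidentifier("1parent")     # False
has_space = str.isidentifier("first name")          # False
has_symbol = str.isidentifier("price$")             # False

print(valid_plain, valid_underscore, starts_with_digit, has_space, has_symbol)
```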

**Warning:** never (unless you are doing it on purpose!) define a variable using a name that already means something in Python (`print`, `int`, `type`, etc.). It is possible, but it will break _a lot_ of things.

We can store the result of an operation in a variable.

In [0]:
hello = type("Hello!")
print(hello)

<class 'str'>


In [0]:
large_number = 193425 + 32532513
print(large_number)

32725938


In [0]:
parent1 = "Mary"
parent2 = "John"
parents = parent1 + " and " + parent2
print(parents)

Mary and John


If the variable value needs to be updated with respect to its old value, we can use the following operators:
 * `var += some_value` (same as `var = var + some_value`);
 * `var -= some_value` (same as `var = var - some_value`);
 * `var *= some_value` (same as `var = var * some_value`);
 * `var /= some_value` (same as `var = var / some_value`).

In [0]:
var1 = 15
print("Old value:", var1)
var1 += 1  # same as var1 = var1 + 1
print("New value:", var1)

Old value: 15
New value: 16


In [0]:
var2 = "mild"
print("Old value:", var2)
var2 += "ly"
print("New value:", var2)

Old value: mild
New value: mildly


In [0]:
var3 = 19
print("Old value:", var3)
var3 /= 2
print("New value:", var3)

Old value: 19
New value: 9.5


## 5. Basic IO

We already know that the way to display values on the screen is to `print` them. However, many programs, chatbots in particular, rely on input from a user. In Python, `input` takes care of it!

The `input` function asks the user to enter some information and returns a string containing what the user typed, so in most cases it is useful to save the result in a variable.

If `input` is called without any arguments (i.e. as `input()`, don't forget the parentheses!), it simply waits for the user to type in some information.

In [0]:
user_input = input()
print("The user input is:", user_input)

hi
The user input is: hi


However, if `input` is called with an argument, this argument is displayed as a prompt next to the input field.

In [0]:
name = input("What is your name? ")
print("My name is", name)

What is your name? Karina
My name is Karina
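
Because `input` always returns a string, digits typed by the user are text until they are converted. The sketch below uses a fixed string as a stand-in for the user's answer so that it runs without waiting for input; the prompt in the comment and the variable names are hypothetical.

```python
# Stand-in for: answer = input("How many apples? ")
# input always returns a string, even if the user types only digits
answer = "5"

# With a string, * means repetition, not multiplication
repeated = answer * 3         # '555'

# Convert to int first in order to do arithmetic
tripled = int(answer) * 3     # 15

print(type(answer), repeated, tripled)
```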


## 6. Boolean expressions

Boolean expressions are expressions that evaluate to `True` or `False`. There are multiple operators that help us form them.

The operator `==` checks whether its left and right sides are equal.

In [0]:
10 + 5 == 15

True

In [0]:
10 == 15  #check for equality

False

In [0]:
"Apple" == "apple"

False

In [0]:
(10 + 5 == 20) == False

True

The opposite of `==` is `!=`, which checks for inequality:

In [0]:
1 != 10

True

Operators `>`, `>=`, `<` and `<=` are defined as well.

In [0]:
7 < 9

True

In [0]:
8 >= 8

True

The operator `in` checks if the left-hand side object is contained within the right-hand side one.

In [0]:
"world" in "Hello world!"

True

In [0]:
"Apple" in "I love apples"

False

The operator `not` flips a truth value to its opposite.

In [0]:
not True

False

In [0]:
not False

True

In [0]:
"peach" not in "I love apples"

True

In [0]:
not (10 + 5 == 15)

False

Apart from the operators listed above, there are the _complex operators_ `and` and `or`, which combine Boolean expressions.
* **`and`** returns `True` if both of the expressions it combines evaluate to `True`;
* **`or`** returns `True` if at least one of the expressions it combines evaluates to `True`.

_Beware of the scope_: `(A and B) or C` is not the same thing as `A and (B or C)`!

In [0]:
True and True

True

In [0]:
True and False

False

In [0]:
False or True

True

In [0]:
(False and True) or True

True

In [0]:
False and (True or True)

False

In [0]:
("apple" in "apples") and (1 + 1 == 2)

True

In [0]:
("apple" in "apples") or (1 + 1 == 5)

True

In [0]:
(("apple" in "apples") and (1 + 1 == 3)) or (5 < 10)

True
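
When the parentheses are left out, Python still has to pick a grouping: `not` is applied first, then `and`, then `or`. The sketch below, with variable names chosen for illustration, shows why writing the parentheses explicitly is the safer habit.

```python
# Without parentheses, `and` is evaluated before `or`
no_parentheses = True or False and False       # read as: True or (False and False)
explicit_default = True or (False and False)   # True
other_grouping = (True or False) and False     # False

print(no_parentheses, explicit_default, other_grouping)

# `not` is evaluated before `and`
negated = not False and False                  # read as: (not False) and False
print(negated)                                 # False
```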

## 7. More magic with strings

The **bag-of-words** model of meaning assumes that the meaning of a text can be represented by the words found in the text and their frequencies. Indeed, if the text is about pets, we expect words such as "cat" and "dog" to be more frequent in it, and if the text is about politics, words such as "president", "market", and "GDP" will occur more often.

However, there are words that are frequent in all types of texts: "and", "of", "the", "a(n)", "there", and so on. These words are called **stop words**, and since they are not informative for modeling the meaning of the text, they are frequently removed from it.
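
As a rough illustration of both ideas, the sketch below counts the words of a tiny made-up text and then drops the stop words. It relies on tools that this notebook has not introduced (`Counter` from the standard library, `split`, and a loop-like expression), and the stop-word list is a small ad hoc one rather than a standard resource.

```python
from collections import Counter

# A tiny, made-up text that is already lowercase
text = "the cat chased the dog and the dog chased the cat"

# Split the text into words at the spaces
words = text.split()

# Bag of words: each word paired with its frequency
bag = Counter(words)
print(bag["the"], bag["cat"], bag["and"])   # 4 2 1

# A small, illustrative stop-word list (not a standard one)
stop_words = {"the", "and", "of", "a", "an", "there"}

# Keep only the words that are not stop words
content_bag = Counter(word for word in words if word not in stop_words)
print(content_bag)
```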

For many linguistic tasks, the capitalization of words does not matter. For example, when the task is to get rid of the stop words, we want to get rid of them independently of the capitalization ("THE", "the", "The", etc.). However, for Python, "the" and "The" are completely different strings.

In [0]:
"the" == "The"

False

There is a way to map all the versions of "the" with different capitalizations to "the": `str.lower("ThE")`.

In [0]:
str.lower("ThE")

'the'

In [0]:
"the" == str.lower("ThE")

True
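
The same operation is very often written in a second, equivalent form, with the string placed before the function name. The sketch below compares the two spellings; the variable names are chosen for illustration.

```python
# The same operation written in two equivalent ways
function_style = str.lower("ThE")
method_style = "ThE".lower()

print(function_style, method_style)          # the the
print(function_style == method_style)        # True

# A capitalization-independent comparison of two words
same_word = "THE".lower() == "The".lower()
print(same_word)                             # True
```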

Similarly, the functions `upper` and `title` convert a string to uppercase or to title case (every word capitalized).

In [0]:
print("The uppercase of 'the' is '" + str.upper("the") + "'.")

The uppercase of 'the' is 'THE'.


In [0]:
print("The title version of 'hello world' is '" + str.title("hello world") + "'.")

The title version of 'hello world' is 'Hello World'.


In the examples above, we print quotation marks inside a string by alternating their types (i.e. if double quotes mark the string, single quotes are used inside, or vice versa).

However, another way to do it is to put the special _escape symbol_ `\` before the quotation mark that we want to have as a part of the string.

In [0]:
print("Both single quotations \' and double quotations \" are interpreted literally this way.")

Both single quotations ' and double quotations " are interpreted literally this way.


Other important special symbols are `\n` (new line) and `\t` (tabulation).

In [0]:
print("This line \ncontinues on the second line, and tabulation is here\tas well.")

This line 
continues on the second line, and tabulation is here	as well.
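
Since the backslash has a special role, a backslash that should itself appear in the output has to be escaped as well. The sketch below shows this, and also checks that escaping a quotation mark and alternating the quote types produce the same string. The example sentences and variable names are made up.

```python
# A backslash that should appear in the output is written as two backslashes
backslash_text = "A single backslash: \\"
print(backslash_text)          # A single backslash: \

# Escaping quotes and alternating quote types give the same string
escaped = "She said \"hi\""
alternated = 'She said "hi"'
same_string = escaped == alternated

print(escaped)                 # She said "hi"
print(same_string)             # True
```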


There are also functions that check whether a string is uppercase, lowercase, or title case:
 * `str.isupper("Hello")` checks if a string is uppercase;
 * `str.islower("Hello")` checks if a string is lowercase;
 * `str.istitle("Hello")` checks if a string is in title case.

In [0]:
str.isupper("HELLO WORLD!")

True

In [0]:
str.islower("hello world!")

True

In [0]:
str.istitle("Hello World!")

True
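
The cells above only show cases where the answer is `True`. The sketch below adds a few cases where the outcome may be less obvious; the example strings and variable names are made up for illustration.

```python
# A string with mixed capitalization is neither uppercase nor lowercase
mixed_upper = str.isupper("Hello")            # False
mixed_lower = str.islower("Hello")            # False

# Without any letters, both checks give False
no_letters_upper = str.isupper("123!")        # False
no_letters_lower = str.islower("123!")        # False

# Digits and spaces are ignored when at least one letter is present
with_digits = str.isupper("ROOM 101")         # True

# In title case, every word has to start with a capital letter
partly_title = str.istitle("Hello world!")    # False

print(mixed_upper, mixed_lower, no_letters_upper, no_letters_lower)
print(with_digits, partly_title)
```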

Another very useful function is `len`. Can you guess what it does based on the outputs of the following cells?

In [0]:
len("Hello world!")

12

In [0]:
len("computational linguistics")

25

In [0]:
len(25)

TypeError: ignored
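
The last cell fails because an integer has no length. If the goal is to count the digits of a number, one option is to convert it to a string first. The sketch below shows this; the `try`/`except` construction and the variable names are assumptions made for the illustration.

```python
# len works on strings, so convert the number to a string first
digits = len(str(25))         # 2

# Calling len on an integer raises a TypeError
try:
    length = len(25)
except TypeError:
    length = None

# Spaces and punctuation are counted as characters too
with_space = len("a b!")      # 4

print(digits, length, with_space)
```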

# Homework 1

**Due on Saturday, September 14th, 11.59pm**

Send your notebook (don't forget to save your solutions!) to <alena.aksenova@stonybrook.edu> with the subject **\[CompLing1\] Homework 1**.

**Problem 1.** You are given the following paragraph.

In [0]:
text = "A glance around her studio reveals some of the complexity. The place is packed chockablock " \
       "with clusters of objects grouped by type: alarm clocks (maybe two dozen), antique books, model " \
       "clipper ships, African masks, birdcages, globes, painted wood watermelon slices, the Mexican " \
       "healing charms known as milagros and so-called mammy dolls piled on a chair."

Write code that asks the user to enter a word. Then check if this word is contained in the text given above.

In [0]:
print('Please enter a word:')
# input already returns a string, so no conversion is needed
user_input = input()
text = "A glance around her studio reveals some of the complexity. The place is packed chockablock " \
       "with clusters of objects grouped by type: alarm clocks (maybe two dozen), antique books, model " \
       "clipper ships, African masks, birdcages, globes, painted wood watermelon slices, the Mexican " \
       "healing charms known as milagros and so-called mammy dolls piled on a chair."

# `in` checks whether the entered word occurs anywhere in the text
print('The word is contained in the text:', user_input in text)



Please enter a word:
gherf
The word is contained in the text: False


**Problem 2.** Write a condition that checks if the word "idiot" is present in the user input. Make sure that it works independently of the capitalization!

In [0]:
var1 = input("Input any condition about idiot:")
# lowercase the user input so that the check ignores capitalization
print("idiot" in str.lower(var1))

Input any condition about idiot:IDIOT
True


**Problem 3.** Ask the user for the year in which they were born, and print the age of the user. (Assume that the user's birthday is always January 1 so that the calculation is simple.)

_Hint:_ the `int` function might be useful here.

In [0]:
birth_year = int(input("Which year were you born:"))
# the current year (2019, when this course takes place) is hard-coded
age = 2019 - birth_year
print("Your age is:", age)

Which year were you born:2017
Your age is: 2


**Problem 4.** With string concatenation you can also play a round of *Mad Libs*.
If you aren't familiar with the game, here's how it works: you have a predetermined text where certain words have been replaced by their part of speech, for example *verb*, *noun*, *adjective*.
For each gap, you ask a friend to say a word of that part of speech.
You then put those words in the gap and read out the text aloud.
Ideally, hilarity ensues.

Here's an example adapted from the very first Mad Libs book:

~~~
"[exclamation]! He said [adverb] as he jumped into his convertible
[noun] and drove off with his [adjective] friend."

"Ron! He said better as he jumped into his convertible
Tesla and drove off with his irritating friend."
~~~

Write a program that allows the user to play a single round of Mad Libs with the computer.

In [0]:
# input already returns a string, so no conversion is needed
exclamation = input("The exclamation is:")
adverb = input("The adverb is:")
noun = input("The noun is:")
adjective = input("The adjective is:")
print(exclamation + "! He said " + adverb + " as he jumped into his convertible " + noun + " and drove off with his " + adjective + " friend.")

The exclamation is:Ron
The adverb is:quickly
The noun is:cat
The adjective is:happy
Ron! He said quickly as he jumped into his convertible cat and drove off with his happy friend.


**Problem 5.** Ask the user what the minimal number of characters is for a word to be considered long. Then ask for another input, this time a word. Afterwards, write a boolean expression that checks if the word provided by the user is long or not.

In [0]:
min_length = int(input("What is the minimal number of characters for a word to be considered long?"))
word = input("Please enter a word:")
# a word is long if it has at least the minimal number of characters
print('The word is long:', len(word) >= min_length)

What is the minimal number of characters for a word to be considered long?20
Please enter a word:tkt
The word is long: False
