# Python intro
### LING 242

Python is one of the most useful programming languages to know for computational linguistics and NLP. It's relatively simple to use, and many of the major machine learning and NLP libraries are written for Python. This notebook is a quick intro to Python and some basics of programming for those who haven't done much coding before.

## Data types

As in most modern programming languages, all data in Python are categorized into 'types.' There are four 'primitive' types:

 - **Int** Short for 'integer.' A whole-number value such as 1, 10, -15, etc.
 - **Float** Short for 'floating point number.' A decimal value like 0.001, -10.75, 3.14125, etc. Since computers only have a finite amount of memory, a float can only store a limited number of digits, so floats sometimes have issues with precision. This can cause problems when working with very small decimal numbers, like probabilities, if you're not careful!
 - **String** Text data, such as "example" or "12345" or 'hello' or "False". Single and double quotes can usually be used interchangeably. You can think of a string as a list of characters.
 - **Bool** Short for 'boolean.' A true or false value, and written as True or False.
 
There are also more complex types, like lists, sets, and dictionaries, which are covered later in this notebook.
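
As a rough illustration of the four types and of the float precision issue mentioned above, the short sketch below uses the built-in `type()` function and the standard `math` module. The variable names `total` and `tiny` are just placeholders for this example.

```python
import math

# type() reports the type of any value
print(type(10))        # <class 'int'>
print(type(0.001))     # <class 'float'>
print(type('hello'))   # <class 'str'>
print(type(True))      # <class 'bool'>

# floats are stored in binary, so some decimals are only approximated
total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare floats with a tolerance instead

# multiplying very small numbers (like probabilities) can underflow to zero
tiny = 1e-200 * 1e-200
print(tiny)                      # 0.0
```

This is one reason NLP code often works with sums of log probabilities rather than products of raw probabilities.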
 
## Variables
You can assign or update a variable with `=`, check equality with `==`, and compare with `<`, `<=`, `>`, and `>=`.

In [18]:
example = 1
print(example + 12)

# these two are equivalent
example = example + 1
example += 1

print(example == 12)
print(example < 12)

# you can use + and += to concatenate strings
txt = 'hel' + 'lo'
txt += ' world'
print(txt + '!')

13
False
True
hello world!


`+=` is an example of 'syntactic sugar': a bit of syntax that doesn't add anything new to the language, but makes code a little easier and quicker to write. In the same vein, you can use `-=`, `*=`, `/=`, and `**=` to update variables (subtraction, multiplication, division, and exponentiation).
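
Here is a small sketch of the other update operators in action. The variable name `count` is just an example; note that `/=` produces a float even when the starting value is an int.

```python
count = 10

count -= 4      # same as count = count - 4
print(count)    # 6

count *= 3      # same as count = count * 3
print(count)    # 18

count /= 4      # same as count = count / 4; division always gives a float
print(count)    # 4.5

count **= 2     # same as count = count ** 2
print(count)    # 20.25
```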

## Converting types
You can use functions like `str()`, `int()`, `float()`, and `bool()` to convert data from one type to another, as long as there is a sensible way to convert it.

In [23]:
print(float(10))
print(int(7 / 3))

print(int("12345") - 1)

print(bool('False'))
print(bool('example'))
print(bool(''))

10.0
2
12344
True
True
False
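
A few more conversions, sketched with made-up variable names (`n_tokens` and `label`). This shows `str()`, which is needed before an int can be joined to a string with `+`, and the fact that `int()` drops the decimal part rather than rounding.

```python
n_tokens = 42

# str() turns the int into text so it can be concatenated
label = 'tokens: ' + str(n_tokens)
print(label)                 # tokens: 42

# strings that look like numbers can be converted to numbers
print(float('3.5') * 2)      # 7.0

# int() truncates toward zero; it does not round
print(int(2.9))              # 2
print(int(-2.9))             # -2
```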


## Other useful data types

### Lists
Lists are arbitrarily-sized collections of data. They are ordered, and can be indexed, updated, and traversed. Since strings are more or less lists of characters, much of the same syntax (indexing, slicing, `len()`) works for both lists and strings. One important difference is that a string cannot be changed in place, while a list can.

 - To check the length of a list, use the function `len()`
 - Lists and strings have member functions (also called methods). For lists, `.append()` adds an item to the end and `.remove()` removes the first item equal to its input. For both lists and strings, `.index()` returns the position of the input element, and so on.
 - To get the list item at position `i`, use square brackets: `example_list[i]`. Note that lists are zero-indexed, so the first element of a list is at position 0. Therefore, the last position in the list is `len(example_list) - 1`. You can also count from the end with negative indices: `example_list[-1]` is the last item, `example_list[-2]` is the second-to-last, and so on.
 - 'Slicing' is a method for getting sub-lists. For example, `example_list[1:10]` will return a list consisting of the elements from position 1 up to, but not including, position 10. `example_list[1:10:2]` will get every other item in that range.

In [16]:
example_list = [1, 2, 3, 4, 5]
print('Length of the list:', len(example_list))
example_list.append(6)
example_list.remove(2)
print(example_list)

Length of the list: 5
[1, 3, 4, 5, 6]


In [15]:
example_str = 'hello world'
print(example_str[0])
print(example_str[1:7])
print(example_str.index('o'))

h
ello w
4
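
The indexing and slicing rules for lists and strings can be sketched as follows. The names `letters` and `word` are only examples; the key point is that the end position of a slice is not included.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[0])                  # a  (first item)
print(letters[-1])                 # f  (last item)
print(letters[len(letters) - 1])   # f  (same thing, the long way)

print(letters[1:4])                # ['b', 'c', 'd']  (positions 1, 2, 3)
print(letters[::2])                # ['a', 'c', 'e']  (every other item)
print(letters[1:6:2])              # ['b', 'd', 'f']  (every other item from 1)

# the same syntax works on strings
word = 'linguistics'
print(word[:4])                    # ling
print(word[-3:])                   # ics
```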


### Sets
Sets are un-ordered and contain unique values. One simple way of getting only the unique items in a list is to turn it into a set and back, e.g. `list(set(example_list))`. Note that this may not keep the items in their original order. The set data type supports the same operations as sets in mathematics.

 - To create a union, use `A | B` or `A.union(B)`. To create an intersection, use `A & B` or `A.intersection(B)`.
 - It's faster to search for an element in a set than in a list.
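
A minimal sketch of these set operations, using made-up word lists. Because sets are un-ordered, the example sorts the results before printing so the output is predictable.

```python
words = ['the', 'cat', 'saw', 'the', 'dog']

# turning a list into a set drops the duplicate 'the'
vocab = set(words)
print(len(words), len(vocab))   # 5 4

a = {'cat', 'dog', 'bird'}
b = {'dog', 'fish'}

print(sorted(a | b))            # ['bird', 'cat', 'dog', 'fish']  (union)
print(sorted(a & b))            # ['dog']                         (intersection)

# 'in' checks membership
print('cat' in vocab)           # True
print('fish' in vocab)          # False
```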

### Dictionaries
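Dicts are collections of key/value pairs, where each key is unique and maps to one value. (Since Python 3.7, dicts keep their keys in the order they were inserted.) For example: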

In [2]:
example_dict = {1: {'x': 10, 'y': 11},
                2: {'x': 12, 'y': 10}}

In this case, `example_dict` is a nested dictionary: its keys are ints, and its values are dicts! You can access values by using square brackets or the `.get()` function. To get all of the keys or all of the values, use `.keys()` or `.values()`.

In [5]:
print(example_dict[1])
print(example_dict[1]['x'])
print(example_dict.values())

{'x': 10, 'y': 11}
10
dict_values([{'x': 10, 'y': 11}, {'x': 12, 'y': 10}])
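
The difference between square brackets and `.get()` shows up when a key is missing. The sketch below uses a hypothetical word-count dictionary called `counts`.

```python
counts = {'the': 2, 'cat': 1}

print(counts.get('cat'))       # 1
print(counts.get('dog'))       # None  (counts['dog'] would raise a KeyError)
print(counts.get('dog', 0))    # 0     (the second argument is a default value)

# .keys() can be turned into a regular list
print(list(counts.keys()))     # ['the', 'cat']

# square brackets also add or update entries
counts['dog'] = 1
counts['the'] += 1
print(counts)                  # {'the': 3, 'cat': 1, 'dog': 1}
```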


### None
`None` is a special value that represents an "empty" reference, i.e. the absence of a value. For example, a variable that doesn't have a meaningful value yet, or no longer needs one, can be set to `None`.

In [5]:
null = 'abcdef'
print(not null)

null = None
print(not null)

False
True
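
Note that `not` also returns `True` for other "empty" values, such as the empty string. One common way to check specifically for `None` is the `is` operator, sketched here with example variable names `value` and `empty`.

```python
value = None
print(value is None)    # True
print(type(value))      # <class 'NoneType'>

empty = ''
print(not empty)        # True   (an empty string also counts as "false")
print(empty is None)    # False  (but it is not None)
```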


## Exercises
### 1. Type conversion
In the code block below, try finding things to convert from one data type to another. See if you can find any unexpected/interesting behavior, or if you cause any errors!