# Introduction to Python

September 29, 2020

Shane Steinert-Threlkeld

(building on Scott Farrar, Gina-Anne Levow, and Ryan Georgi)

# Roadmap

* **Background**
* Getting Started
* Learning and Practicing
* NLTK + Resources

# Background

* Python is _interpreted_ in most implementations

```python
:~$ python
>>> print('hello world!')
hello world!
```

* `echo "print('hello world')" > hello.py && python hello.py`

* Can be compiled into bytecode (\*.pyc) and can interface with C
    * See [Cython](https://cython.org/)
    * So: can be a very efficient language
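
To make the bytecode point concrete, here is a minimal sketch using the standard-library `dis` module. It assumes the CPython implementation, and the function `greet` is a hypothetical name used only for illustration.

```python
import dis


def greet(name):
    """Hypothetical example function; not part of the original slides."""
    return 'hello ' + name


# CPython compiles the function body to bytecode before running it;
# the compiled code object is stored on the function itself.
bytecode = greet.__code__.co_code
print(type(bytecode))   # <class 'bytes'>

# dis.dis prints a human-readable listing of those instructions.
dis.dis(greet)
```

The exact instruction listing differs between Python versions, so it is not reproduced here.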

# Background

* Python has a very active user community, a useful package index ([PyPI](http://pypi.org)), and a package manager (`pip`)
    * **N.B.** best used in concert with _virtual environments_
* Many scientific computing packages:
    * **`numpy`**, `scipy`
    * `nltk`
    * `scikit-learn`
    * Lingua franca of deep learning: `tensorflow`, `pytorch`

# Roadmap

* Background
* **Getting Started**
* Learning and Practicing
* NLTK + Resources

# Installing Python

* **macOS**: MacPorts / homebrew
    * `port install python37`
    * `brew install python`
* **Linux**
    * `apt-get install python3`
    * `yum install python3`
* **Windows**
    * [http://python.org/downloads/windows](http://python.org/downloads/windows)

# Installing via Anaconda

Alternatively, use [Anaconda](http://anaconda.org).  Comes with:
* lots of scientific computing packages
* great command-line tools (`conda`) for managing virtual environments
* Tip: use `wget` to download the installer if on a headless machine

# Other Python Tools

* **PyCharm**
    * Integrated Development Environment (IDE)
    * Professional version free for students
    * [https://www.jetbrains.com/pycharm/](https://www.jetbrains.com/pycharm/)
    
(I personally just use `vim` with some packages for syntax highlighting / linting.  Though I've recently been streamlining my workflow in [VSCode](https://code.visualstudio.com/) with vim keybindings.)

# Other Python Tools

* Jupyter Notebooks
    * "Literate programming" paradigm
    * Create distributable "notebooks" mixing markdown (incl. LaTeX) inline with code
    * E.g.: _these slides_!
    
Caveat: can encourage bad practices. See [Joel Grus' slides](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1)

For a notebook that tries to overcome some of these bad practices, see [https://nbdev.fast.ai/](https://nbdev.fast.ai/).

# Roadmap

* Background
* Getting Started
* **Learning and Practicing**
* NLTK + Resources

# This Tutorial:

[https://github.com/shanest/python-tutorial-clms](https://github.com/shanest/python-tutorial-clms)

# Basics: Built-In Types

In [None]:
# basics (this is a single-line comment)
an_int = 1
a_float = 1.2
a_bool = True

# Basics: Built-In Types

In [None]:
# strings
string1 = 'CLMS rules!'  # this is a comment
string2 = "Some people prefer double-quotes."
string3 = '''If you use three quotes, 
the string can include
line breaks.'''

""" here is
a multi-line comment 
but not really"""

print(string3)

# Gotcha 1: Duck-typing

In [None]:
# We'll return to this later

an_int = 1
a_float = 1.2
a_bool = True

variable = an_int and a_float and a_bool and string1
print(variable)
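
The printed value may be surprising because `and` does not return a boolean. A short sketch of the rule, restating the variables from the cells above so it runs on its own (the names `all_truthy`, `short_circuit`, and `fallback` are illustrative only):

```python
# `and` returns one of its operands, not necessarily True/False:
# the first falsy operand, or the last operand if all are truthy.
an_int = 1
a_float = 1.2
a_bool = True
string1 = 'CLMS rules!'

all_truthy = an_int and a_float and a_bool and string1
print(all_truthy)        # CLMS rules!

# A falsy value (0, 0.0, '', None, empty containers) short-circuits the chain.
short_circuit = an_int and 0.0 and string1
print(short_circuit)     # 0.0

# `or` mirrors this: it returns the first truthy operand.
fallback = '' or 'default'
print(fallback)          # default
```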

# Basics: Built-in Types

In [None]:
# sequences 
a_list = [1, 2, 3, 1,]  # mutable
a_tuple = (1, 2, 3, 1,)  # immutable
a_set = {1, 2, 3, 1,}  # no duplicates, no order

a_tuple[-2] = 4  # raises TypeError: tuples are immutable

In [None]:
a_tuple + (5, 3)
a_tuple
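
The two cells above illustrate immutability. The sketch below shows the same behavior in a form that runs to completion by catching the error; the name `longer` is introduced here only for illustration.

```python
a_tuple = (1, 2, 3, 1)

# Item assignment on a tuple raises TypeError because tuples are immutable.
try:
    a_tuple[-2] = 4
except TypeError as err:
    print('TypeError:', err)

# Concatenation builds a new tuple; the original is unchanged
# unless the result is assigned back to a name.
longer = a_tuple + (5, 3)
print(longer)    # (1, 2, 3, 1, 5, 3)
print(a_tuple)   # (1, 2, 3, 1)

# The same assignment works on a list, which is mutable.
a_list = [1, 2, 3, 1]
a_list[-2] = 4
print(a_list)    # [1, 2, 4, 1]
```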

# Basics: Built-in Types

In [None]:
# dictionaries = hash-tables
a_dict = {'key1': 'value1',
          'key2': 'value2',
          3: 4.4}

a_dict[3]
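
A few more common dictionary operations, as a sketch built on the same `a_dict`. The key `'key3'` and the default `'missing'` are made up for the example.

```python
a_dict = {'key1': 'value1',
          'key2': 'value2',
          3: 4.4}

# Keys can be any hashable value, so strings and ints can be mixed.
print(a_dict[3])                      # 4.4

# Indexing a missing key raises KeyError; .get returns a default instead.
print(a_dict.get('key3', 'missing'))  # missing

# Membership tests check keys, not values.
print('key1' in a_dict)               # True
print('value1' in a_dict)             # False

# Assignment adds or overwrites an entry; .items() iterates over pairs.
a_dict['key3'] = 'value3'
for key, value in a_dict.items():
    print(key, '->', value)
```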

# Basics: Methods

In [None]:
def hello(string):
    output = 'Hello ' + string
    print(output)

In [None]:
hello('world')

# Gotcha 2: White Space

In [None]:
def hello(string):
output = 'Hello ' + string
return output
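
The cell above fails because the function body is not indented. Here is a sketch of the corrected version, followed by one way to observe the error without stopping the script. The variable `broken_source` is a hypothetical name holding the unindented code as a string.

```python
def hello(string):
    # The body is indented one level (four spaces by convention, per PEP 8).
    output = 'Hello ' + string
    return output


print(hello('world'))   # Hello world

# Compiling the unindented version shows the error without crashing the script.
broken_source = "def hello(string):\noutput = 'Hello ' + string\nreturn output\n"
try:
    compile(broken_source, '<example>', 'exec')
except IndentationError as err:
    print('IndentationError:', err.msg)
```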

# Type Annotations

In Python 3.5+, you can add type annotations ([https://docs.python.org/3/library/typing.html](https://docs.python.org/3/library/typing.html)):

In [None]:
def hello(string: str):
    return 'Hello ' + string

hello(2)
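
Annotations are not enforced when the code runs, which is why `hello(2)` only fails once the string concatenation is attempted. A minimal sketch of that behavior (the `-> str` return annotation is an addition for illustration):

```python
def hello(string: str) -> str:
    return 'Hello ' + string


# Annotations are stored as metadata; Python does not check them at runtime.
print(hello.__annotations__)

# hello(2) therefore fails only when 'Hello ' + 2 is evaluated.
try:
    hello(2)
except TypeError as err:
    print('TypeError:', err)

# A static checker such as mypy is designed to flag a call like hello(2)
# before the program is run.
```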

# Basics: Classes

In [None]:
class Student:
    # class variable
    program = 'CLMS'
    
    def __init__(self, name):
        self.name = name
        
    def set_name(self, new_name):
        self.name = new_name
    
    @classmethod
    def class_method(cls, blah):
        cls.blah = blah

In [None]:
shane = Student('Shane')
shane.name
shane.set_name('Shania')
shane.name
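
The difference between class variables and instance variables can be sketched as follows. The method name `set_program`, the second student `gina`, and the values `'PhD'` and `'Linguistics'` are hypothetical and used only for illustration.

```python
class Student:
    # class variable, shared by all instances
    program = 'CLMS'

    def __init__(self, name):
        self.name = name  # instance variable, one per object

    @classmethod
    def set_program(cls, program):  # hypothetical method name
        cls.program = program


shane = Student('Shane')
gina = Student('Gina')

# Instances read the class variable unless they shadow it.
print(shane.program, gina.program)   # CLMS CLMS

# Assigning through an instance creates an attribute on that object only.
shane.program = 'PhD'
print(shane.program, gina.program)   # PhD CLMS

# A classmethod changes the shared value, visible to instances
# that have not shadowed it.
Student.set_program('Linguistics')
print(shane.program, gina.program)   # PhD Linguistics
```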

# Basics: Control Flow

In [None]:
a = 1
if a is None:
    a = 1
    print('None no more')
elif a == 3:
    a = 1
else:
    a = None
    
a = 3 if 2 + 2 == 5 else 4
a

# Basics: Control Flow

In [None]:
a_list = [1, 2, 3, 1,]  # mutable
a_tuple = (1, 2, 3, 1,)  # immutable
a_set = {1, 2, 3, 1,}  # no duplicates, no order

total = 0
for num in a_set:
    total += num
    
print(total)
print(sum(a_list))

for num in range(5):
    print(num)
    
# comprehensions
added = [num + 1 for num in a_list]
print(added)
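
Comprehensions are shorthand for a loop that builds a collection. A brief sketch of the equivalence, plus filtering and the set and dict variants (all names other than `a_list` and `added` are illustrative):

```python
a_list = [1, 2, 3, 1]

# A list comprehension is shorthand for building a list in a for loop.
added_loop = []
for num in a_list:
    added_loop.append(num + 1)
added = [num + 1 for num in a_list]
print(added == added_loop)           # True

# An optional `if` clause filters items.
odds = [num for num in a_list if num % 2 == 1]
print(odds)                          # [1, 3, 1]

# Braces give set and dict comprehensions.
unique_squares = {num ** 2 for num in a_list}
print(unique_squares == {1, 4, 9})   # True
squares_by_num = {num: num ** 2 for num in a_list}
print(squares_by_num)                # {1: 1, 2: 4, 3: 9}
```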

# Basics: Control Flow

In [None]:
num = 5
while num > 0:
    num -= 1
    print(num)

# Basics: Files

In [None]:
with open('dummy.txt', 'r') as f:  # always open files in a `with`!
    for line in f:
        print(line)
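
Since `dummy.txt` may not exist on your machine, the sketch below first writes a small file into a temporary directory and then reads it back. The file contents are invented for the example.

```python
import os
import tempfile

# Write a small file to read back (hypothetical contents, in a temp directory).
path = os.path.join(tempfile.mkdtemp(), 'dummy.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

# Each line keeps its trailing newline, so print(line) would show blank
# lines between rows; rstrip('\n') removes it.
lines = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:
        lines.append(line.rstrip('\n'))

print(lines)       # ['first line', 'second line']
print(f.closed)    # True: the with block closed the file on exit
```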

# Regular Expressions

* Useful for searching / matching patterns in text (e.g. corpora)
* In Python: `re` module
    * a module is a collection of functions, class definitions, etc.
    * every file roughly defines a module (though more complicated structures, such as packages, exist)

# Regular Expressions

In [None]:
import re

word = 'raced'
re.search('ed$', word)
re.split('ed$', word)
re.sub('ed$', 'er', word)

In [None]:
pattern = re.compile('ed$')
if pattern.search(word):
    print('maybe past')
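
In a script, unlike a notebook cell, the return values above are not displayed unless printed. A sketch showing what each call returns for the same `word`:

```python
import re

word = 'raced'

match = re.search('ed$', word)      # a Match object, or None if no match
print(match.group(), match.span())  # ed (3, 5)

print(re.split('ed$', word))        # ['rac', '']
print(re.sub('ed$', 'er', word))    # racer

# A compiled pattern offers the same operations as methods.
pattern = re.compile('ed$')
print(pattern.search('race') is None)   # True
```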

# Regular Expressions

In [None]:
# find digits
string = 'LING 571'
pattern = re.compile('[0-9]')
pattern.search(string)

In [None]:
# find float-like
pattern2 = re.compile('[0-9]\.[0-9]')  # what's wrong with this?
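
One possible answer to the question in the comment, offered as a sketch rather than the intended solution: the pattern matches only a single digit on each side of the dot, and a raw string is the safer way to write the backslash. The names `single_digit` and `float_like` and the test strings are illustrative.

```python
import re

# Raw string so the backslash reaches the regex engine unchanged.
single_digit = re.compile(r'[0-9]\.[0-9]')
print(single_digit.search('pi is 3.14159').group())    # 3.1

# Adding + lets each side match one or more digits.
float_like = re.compile(r'[0-9]+\.[0-9]+')
print(float_like.search('pi is 3.14159').group())      # 3.14159
print(float_like.findall('12.5 and 0.75, but not 7'))  # ['12.5', '0.75']
```

This still ignores signs, exponents, and forms such as `.5`, so it is a starting point only.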

# Text Processing

In [None]:
string = 'quick brown fox'
string.split(' ')

In [None]:
string.replace(' ', ', ')

In [None]:
'quick' in string
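
A few related string operations, sketched on the same `string` (the names `words` and `replaced` are illustrative):

```python
string = 'quick brown fox'

words = string.split()          # no argument: split on any run of whitespace
print(words)                    # ['quick', 'brown', 'fox']

# str.join is the inverse of split.
print(', '.join(words))         # quick, brown, fox

# Strings are immutable: replace returns a new string.
replaced = string.replace(' ', ', ')
print(replaced)                 # quick, brown, fox
print(string)                   # quick brown fox

# `in` tests for a substring, not a whole word.
print('row' in string)          # True
```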

# Roadmap

* Background
* Getting Started
* Learning and Practicing
* **NLTK + Resources**

# NLTK

* Natural Language ToolKit [http://nltk.org](http://nltk.org)
* Large collection of NLP tools, corpora, algorithms:
    * tokenizers, stemmers
    * parsers
    * semantic analysis
    * corpus fragments
    * ML components
* Open source (Apache 2.0 license)
* Pedagogically oriented: online book (better than docs), examples
* Heavily used in 571, useful elsewhere

# Using Python

* Minimum: 2.7
    * Last version of 2, being phased out
    * May be required for legacy code
* Recommended: 3.4+ (3.8.6 is the newest release as of September 2020)
* On `patas`
    * default: `2.7.5`
    * 3.6: use `/opt/python36` (you can make an `alias` in `~/.bashrc` to make this the default)
    * best: for projects, always use `conda` environments!

# Some other tools

* Linting:
    - pylint
    - pep8 (since renamed `pycodestyle`)
    - mypy (static type checker that reasons about code using type hints)
* Formatting:
    - PEP8
    - [black](https://black.readthedocs.io/en/stable/) (my recommendation)
 

# Python Resources

* Books:
    * Lutz and Ascher, _Learning Python_, O'Reilly
    * Martelli, _Python in a Nutshell_, O'Reilly
    * Beazley, _Python Essential Reference_, Developers Library
* Online
    * Mark Pilgrim, _Dive Into Python 3_ [http://www.diveintopython3.net/](http://www.diveintopython3.net/)
        * for experienced programmers
    * [http://python.org](http://python.org)
    * [NLTK book](http://www.nltk.org/book)