# Python: the Language and Ecosystem


Shane Steinert-Threlkeld

# Roadmap

* **Background**
* Getting Started
* Language Basics
* Best Practices / Ecosystem / Further Resources

# Background

* Python is _interpreted_ in most implementations (e.g. the reference implementation, CPython)

```python
:~$ python
>>> print('hello world!')
hello world!
```

* Or run a script file: `echo "print('hello world')" > hello.py && python hello.py`

* Source code is compiled into bytecode (cached as `*.pyc` files), which the interpreter then executes; Python can also interface with C
    * See [Cython](https://cython.org/)
    * So: performance-critical code can be very efficient (more on this later)
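
To peek at the bytecode step, the standard-library `dis` module disassembles a function into the instructions CPython executes. A minimal sketch; `add_one` is a hypothetical function, and the exact instructions printed differ between Python versions:

```python
import dis

def add_one(x: int) -> int:
    return x + 1

# Print the bytecode instructions CPython compiled this function into
dis.dis(add_one)

# The raw bytecode lives on the function's code object, as bytes
print(type(add_one.__code__.co_code))  # <class 'bytes'>
```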

# Background

* Python has a very active user community, a package index ([PyPI.org](https://pypi.org)), and a package manager (`pip`)
    * **N.B.** best used in concert with _virtual environments_ (again: more later)
* Many scientific computing packages:
    * **`numpy`**, `scipy`
    * `nltk`
    * `scikit-learn`
    * Lingua franca of deep learning: `tensorflow`, `pytorch`
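
Since virtual environments come up repeatedly, here is one way to check from inside Python whether you are running in one. This sketch assumes a `venv`/`virtualenv`-style environment, where `sys.prefix` points at the environment and `sys.base_prefix` at the base installation; conda environments are separate installations, so this check does not detect them:

```python
import sys

def in_virtualenv() -> bool:
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the Python it was created from
    return sys.prefix != sys.base_prefix

print(sys.prefix)
print(in_virtualenv())
```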

# Roadmap

* Background
* **Getting Started**
* Language Basics
* Best Practices / Ecosystem / Further Resources

# Installing Python

Global / system-wide installation (more on this later):

* **macOS**: MacPorts / Homebrew
    * `port install python311`
    * `brew install python`
* **Linux**
    * `apt-get install python3`
    * `yum install python3`
* **Windows**
    * [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/)
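
After installing, it is worth confirming which interpreter a `python` command actually runs, since a machine may have several. A small sketch; the `(3, 8)` minimum is an arbitrary example, not a course requirement:

```python
import platform
import sys

# Which interpreter is running, and which version it is
print(sys.executable)
print(platform.python_version())  # e.g. 3.11.4

# Stop early if the interpreter is too old (example threshold only)
REQUIRED = (3, 8)
if sys.version_info < REQUIRED:
    raise SystemExit(f'Python {REQUIRED[0]}.{REQUIRED[1]}+ required, found {platform.python_version()}')
```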

# Installing via Anaconda

Alternatively, use [Anaconda](https://www.anaconda.com) or [miniconda](https://docs.conda.io/projects/miniconda/en/latest/). These come with:

* lots of scientific computing packages (Anaconda; miniconda is a minimal install to which you add packages as needed)
* great command-line tools (`conda`) for managing virtual environments
    * highly encouraged!
    * great for custom/local python installs on `patas`
* use `wget` to download the installer if on a headless machine (e.g. a remote server)

# Editing Python

* **PyCharm**
    * Integrated Development Environment (IDE)
    * Professional version free for students
    * [https://www.jetbrains.com/pycharm/](https://www.jetbrains.com/pycharm/)

* `vim`: my old faithful, with packages
    * worth learning `vim` or `emacs` for powerful text editing
 
* [VSCode](https://code.visualstudio.com) (with vim keybindings of course)
    * great plugins / community
    * built-in git support
    * highly extensible

# Editing Python

* Jupyter Notebooks
    * "Literate programming" paradigm
    * Create distributable "notebooks" mixing markdown (incl. LaTeX) inline with code
    * E.g.: _these slides_!
    
Caveat: notebooks can encourage bad practices (e.g. hidden state from running cells out of order). See [Joel Grus' slides](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1)

For a notebook-based workflow that tries to overcome some of these bad practices, see [nbdev](https://nbdev.fast.ai/).

More useful editing tools later in the slides!

# Roadmap

* Background
* Getting Started
* **Language Basics**
* Best Practices / Ecosystem / Further Resources

# This Tutorial:

[https://github.com/shanest/python-tutorial-clms](https://github.com/shanest/python-tutorial-clms)

# Basics: Built-In Types

```python
# basics (this is a single-line comment)
an_int = 1
a_float = 1.2
a_bool = True
```
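
To check the type of a value, use the built-ins `type()` and `isinstance()`. A short illustration, including one surprise worth knowing (`nothing` is just an example name):

```python
an_int = 1
a_float = 1.2
a_bool = True
nothing = None  # None is the single value of its own type, NoneType

# type() returns the exact class of a value
print(type(an_int), type(a_float), type(a_bool))  # <class 'int'> <class 'float'> <class 'bool'>

# isinstance() is usually preferred for checks, since it also accepts subclasses
print(isinstance(a_float, float))  # True
print(isinstance(a_bool, int))     # True: bool is a subclass of int
print(nothing is None)             # True
```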

# Basics: Built-In Types

```python
# strings
string1 = 'CLMS rules!'  # this is a comment
string2 = "Some people prefer double-quotes."
string3 = '''If you use three quotes,
the string can include
line breaks.'''

"""
This is a
multi-line comment
(strictly, an unused string literal).
"""

print(string3)
```
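
Strings also behave like (immutable) sequences and have many useful methods, and f-strings (used later in `find_token`) interpolate expressions directly. A brief sketch; f-strings need Python 3.6+:

```python
course = 'LING'
number = 571

# f-strings evaluate the expressions inside {...}
label = f'{course} {number}'
print(label)  # LING 571

# Indexing, slicing and len() work as for other sequences
print(course[0], course[-1], course[1:3], len(course))  # L G IN 4

# Methods return new strings; the original is never modified in place
print(course.lower(), label.split(' '), '-'.join(['a', 'b']))  # ling ['LING', '571'] a-b
print(course)  # LING
```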

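# Gotcha 1: Duck-typing and Truthiness

```python
# We'll return to this later

an_int = 1
a_float = 1.2
a_bool = True
a_string = ""

# Values of different types mix freely in boolean expressions;
# this evaluates to '' (the empty string is "falsy")
an_int and a_bool and a_string
```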
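
The cell above relies on _truthiness_: any value can be used where a boolean is expected, and `and`/`or` return one of their operands rather than `True`/`False`. A sketch of the rules and a common pitfall; `describe` is a hypothetical helper:

```python
# Zero, empty containers, None and False are "falsy"; almost everything else is truthy
falsy_values = [0, 0.0, '', [], (), {}, set(), None, False]
print([bool(v) for v in falsy_values])  # nine False values

# `and` returns the first falsy operand (or the last one); `or` the first truthy one
print(repr(1 and True and ''))  # ''
print(0 or 'default')           # default

# Pitfall: `if not count:` treats 0 and None the same; test for None explicitly
def describe(count):  # hypothetical helper, for illustration only
    if count is None:
        return 'unknown'
    return 'no items' if count == 0 else f'{count} items'

print(describe(None), describe(0), describe(3))  # unknown no items 3 items
```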

# Basics: Built-in Types

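```python
# sequences
a_list = [1, 2, 3, 1,]  # mutable
a_tuple = (1, 2, 3, 1,)  # immutable
a_set = {1, 2, 3, 1,}  # no duplicates, no order

a_tuple + (3, 4,)  # builds a new tuple: (1, 2, 3, 1, 3, 4)
a_tuple  # unchanged: (1, 2, 3, 1)

a_list.extend([3, 4])  # modifies the list in place (returns None)
a_list  # [1, 2, 3, 1, 3, 4]
```

```python
a_list[1:3]  # from index 1 up to (not including) 3: [2, 3]
```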
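
Indexing and slicing are worth practising early. A sketch using the value `a_list` has after the `extend` call:

```python
a_list = [1, 2, 3, 1, 3, 4]

# Indexing is zero-based; negative indices count from the end
print(a_list[0], a_list[-1])  # 1 4
# Slices are [start:stop:step], with `stop` excluded
print(a_list[:2])    # [1, 2]
print(a_list[::2])   # [1, 3, 3]
print(a_list[::-1])  # [4, 3, 1, 3, 2, 1]

# Tuples can't be changed in place...
a_tuple = (1, 2, 3, 1)
try:
    a_tuple[0] = 99
except TypeError as err:
    print('TypeError:', err)
# ...but they (like lists) can be unpacked
first, *rest = a_tuple
print(first, rest)  # 1 [2, 3, 1]
```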

# Basics: Built-in Types

```python
# dictionaries = hash-tables
a_dict = {'key1': 'value1',
          'key2': 'value2',
          3: 4.4}  # keys can be any hashable type

a_dict[3]  # 4.4
a_dict['key2']  # 'value2'
```
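
Indexing with a missing key raises `KeyError`. A few other everyday dictionary operations, sketched with the same `a_dict`:

```python
a_dict = {'key1': 'value1', 'key2': 'value2', 3: 4.4}

try:
    a_dict['missing']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'missing'

# .get() returns a default instead of raising
print(a_dict.get('key3'))         # None
print(a_dict.get('key3', 'n/a'))  # n/a
print('key1' in a_dict)           # True: membership tests check the keys

# Adding or updating an entry
a_dict['key3'] = 'value3'

# Iterate over key-value pairs (insertion order is preserved in Python 3.7+)
for key, value in a_dict.items():
    print(key, value)
```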

# Basics: Time Complexity

* Think about data structures and what operations you will be performing often
    * [Time complexity of Python data structures](https://wiki.python.org/moin/TimeComplexity)
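* Generally:
    * lists (and list-likes) are good for appending at the end and indexing by position; bad for membership tests (`x in a_list` scans the list, O(n))
    * sets and dicts are good for look-up (average O(1) membership tests and key access)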
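
To see the difference concretely, the standard-library `timeit` module can time membership tests. A rough sketch; absolute numbers depend on the machine, but for large `n` the set should be much faster:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the element is at the very end

# `in` on a list compares elements one by one (O(n));
# on a set it is a hash look-up (O(1) on average)
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)
print(f'list: {list_time:.4f}s  set: {set_time:.6f}s')
```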

# Basics: Functions

```python
def hello(string):
    output = 'Hello ' + string
    print(output)
```

```python
hello('world')  # prints: Hello world
```
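
Functions can also take default and keyword arguments, and return several values at once (as a tuple). A short sketch with hypothetical `greet`, `min_max` and `add_item` functions:

```python
def greet(name: str, greeting: str = 'Hello') -> str:
    # `greeting` has a default, so callers may leave it out
    return f'{greeting} {name}'

print(greet('world'))                    # Hello world
print(greet('world', greeting='Howdy'))  # Howdy world (keyword argument)

def min_max(numbers):
    # "returning several values" really returns one tuple, unpacked below
    return min(numbers), max(numbers)

low, high = min_max([3, 1, 4, 1, 5])
print(low, high)  # 1 5

def add_item(item, items=None):
    # Avoid `items=[]` as a default: one list would be shared by every call
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item('a'), add_item('b'))  # ['a'] ['b']
```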

# Gotcha 2: White Space

```python
def hello(string):
    # whitespace is meaningful!
    output = 'Hello ' + string
    return output  # this version returns the string instead of printing it
```
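
Because indentation delimits blocks, mis-indented code is rejected before anything runs. A sketch that uses the built-in `compile()` to show this without breaking the notebook:

```python
bad_source = "def hello(string):\noutput = 'Hello ' + string\n"

try:
    compile(bad_source, '<example>', 'exec')
    caught = False
except IndentationError as err:
    # The exact wording of the message differs between Python versions
    print('IndentationError:', err.msg)
    caught = True
```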

# Basics: Classes

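```python
class Student:
    # class variable: shared by all instances
    program = 'CLMS'

    def __init__(self, name):
        # instance variable: one per Student object
        self.name = name

    def set_name(self, new_name):
        self.name = new_name

    @classmethod
    def set_program(cls, program):
        # receives the class itself, so this changes the value for every instance
        cls.program = program

    @staticmethod
    def check_name(name):
        # receives neither the instance nor the class
        return isinstance(name, str)
```

```python
shane = Student('Shane')
shane.name  # 'Shane'
shane.set_name('Shania')
shane.name  # 'Shania'
```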
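
A self-contained variant of the class, showing how class methods and static methods are called and how a class variable is shared. `set_program`, `__repr__` and the second student are illustrative choices, not part of the original example:

```python
class Student:
    program = 'CLMS'  # class variable: shared by every instance

    def __init__(self, name: str):
        self.name = name  # instance variable: one per object

    @classmethod
    def set_program(cls, program: str) -> None:
        # cls is the class itself, so every instance sees the change
        cls.program = program

    @staticmethod
    def check_name(name) -> bool:
        # no self/cls: just a function namespaced inside the class
        return isinstance(name, str)

    def __repr__(self) -> str:
        # controls how instances are displayed (e.g. in the REPL)
        return f'Student({self.name!r}, program={self.program!r})'

a, b = Student('Shane'), Student('Ada')
Student.set_program('Linguistics')
print(a.program, b.program)    # Linguistics Linguistics
print(Student.check_name(42))  # False
print(a)                       # Student('Shane', program='Linguistics')
```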

# Basics: Control Flow

```python
a = 1
if a is None:
    a = 1
    print('None no more')
elif a == 3:
    a = 1
else:
    a = None  # this branch runs, since a == 1

# conditional ("ternary") expression
a = 3 if 2 + 2 == 5 else 4
a  # 4
```
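
The cell uses both `is` (in `a is None`) and `==` (in `a == 3`); they are not interchangeable. A quick sketch of the difference:

```python
# `==` compares values; `is` compares identity (the very same object)
x = [1, 2]
y = [1, 2]
z = x
print(x == y)  # True: equal contents
print(x is y)  # False: two distinct list objects
print(z is x)  # True: z is another name for x's list

# Idiom: use `is` for the singleton None, `==` for value comparisons
a = None
print(a is None)  # True
```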

# Basics: Control Flow

```python
a_list = [1, 2, 3, 1,]  # mutable
a_tuple = (1, 2, 3, 1,)  # immutable
a_set = {1, 2, 3, 1,}  # no duplicates, no order

total = 0
for num in a_set:
    total += num

print(total)  # 6 (the set is {1, 2, 3})
print(sum(a_list))  # 7

for num in range(5):  # 0, 1, 2, 3, 4
    print(num)

# comprehensions
added = [num + 1 for num in a_list]
print(added)  # [2, 3, 4, 2]

{n: n + 1 for n in a_list}  # dict comprehension: {1: 2, 2: 3, 3: 4}
```
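
Two built-ins make loops more idiomatic than manual index counters: `enumerate()` and `zip()`. Comprehensions can also filter and build sets. A sketch (the `words` list is made up for illustration):

```python
a_list = [1, 2, 3, 1]
words = ['one', 'two', 'three', 'one']

# enumerate() yields (index, element) pairs
for idx, num in enumerate(a_list):
    print(idx, num)

# zip() walks several sequences in parallel
pairs = list(zip(words, a_list))
print(pairs)  # [('one', 1), ('two', 2), ('three', 3), ('one', 1)]

# Comprehensions can filter with `if`, and can also build sets
evens_squared = [n ** 2 for n in range(10) if n % 2 == 0]
print(evens_squared)  # [0, 4, 16, 36, 64]
unique_words = {w.upper() for w in words}
print(sorted(unique_words))  # ['ONE', 'THREE', 'TWO']
```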

# Basics: Control Flow

```python
num = 5
while num > 0:
    num -= 1
    print(num)  # prints 4, 3, 2, 1, 0
```
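
Loops can also be steered from inside: `continue` skips to the next iteration and `break` exits the loop. A small sketch:

```python
num = 0
seen = []
while True:
    num += 1
    if num % 2 == 0:
        continue  # skip even numbers
    if num > 7:
        break     # stop once we pass 7
    seen.append(num)

print(seen)  # [1, 3, 5, 7]
```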

# Basics: Files

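```python
with open('dummy.txt', 'r', encoding='utf-8') as f:  # always open files in a `with`!
    for line in f:
        # each line keeps its trailing newline; strip it to avoid blank lines
        print(line.rstrip('\n'))
```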
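
Writing works the same way, with mode `'w'`. A sketch that creates its own throwaway file in a temporary directory, so it does not depend on `dummy.txt` existing:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'dummy.txt'  # created here just for the demo
    # 'w' creates or overwrites the file; an explicit encoding avoids platform defaults
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')
    # iterating over a file yields lines *including* their trailing newline
    with open(path, encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)  # ['first line', 'second line']
```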

# Regular Expressions

* Useful for searching / matching patterns in text (e.g. corpora)
* In Python: `re` module
    * a _module_ is a collection of functions, class definitions, etc.
    * roughly, every `.py` file defines a module (packages allow more complicated structures)
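
A few common ways to import from a module, plus the idiom that lets one `.py` file work both as an importable module and as a script. A sketch; `main` is a hypothetical entry point:

```python
import math                # use as math.sqrt(...)
from re import search      # bring a single name into scope
import collections as col  # give the module a shorter alias

print(math.sqrt(16))                            # 4.0
print(search(r'[0-9]+', 'LING 571').group())    # 571
print(col.Counter('banana').most_common(1))     # [('a', 3)]

def main() -> None:  # hypothetical entry point
    print('running as a script')

# True only when this file is executed directly, not when it is imported
if __name__ == '__main__':
    main()
```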

# Regular Expressions

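```python
import re

word = 'raced'
# raw strings (r'...') keep backslashes literal: a good habit for patterns
re.search(r'ed$', word)  # a match object for 'ed', span=(3, 5)
re.split(r'ed$', word)  # ['rac', '']
re.sub(r'ed$', 'er', word)  # 'racer'
```

```python
# compile once, reuse many times
pattern = re.compile(r'ed$')
if pattern.search(word):
    print('maybe past')
```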
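
Parentheses create capture groups, which let you pull pieces out of a match. A naive suffix-splitting sketch; note it turns _raced_ into _rac_, not _race_, which is one reason real stemmers and lemmatizers (e.g. in NLTK) are more involved:

```python
import re

# ^ and $ anchor the whole word; the two groups capture stem and suffix
pattern = re.compile(r'^(\w+)(ed|ing)$')
stems = {}
for word in ['raced', 'racing', 'race']:
    match = pattern.search(word)
    stems[word] = match.groups() if match else None
    print(word, '->', stems[word])
# raced -> ('rac', 'ed'); racing -> ('rac', 'ing'); race -> None
```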

# Regular Expressions

```python
# find digits
string = 'LING 571'
pattern = re.compile('[0-9]')
pattern.search(string)  # matches only the first digit: '5'
```

```python
# find float-like
pattern2 = re.compile('[0-9]\.[0-9]')  # what's wrong with this?
```
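
One possible answer to the question above: the pattern allows only a single digit on each side of the decimal point, so longer numbers match only partially. Writing it as a raw string (`r'...'`) also avoids Python's warning about the invalid escape `\.` inside a normal string (recent versions emit a `SyntaxWarning`). A sketch; it still ignores cases like `.5`, signs, or exponents:

```python
import re

text = 'LING 571 meets for 1.5 hours; pi is about 3.14159'

# [0-9]\.[0-9] allows exactly one digit on each side of the point
print(re.findall(r'[0-9]\.[0-9]', text))    # ['1.5', '3.1']
# `+` allows one or more digits on each side
print(re.findall(r'[0-9]+\.[0-9]+', text))  # ['1.5', '3.14159']
```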

# Text Processing

```python
string = 'quick brown fox'
string.split(' ')  # ['quick', 'brown', 'fox']
```

```python
string.replace(' ', ', ')  # 'quick, brown, fox'
```

```python
'quick' in string  # substring test: True
```
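
`split(' ')` splits on every single space, while `split()` with no argument splits on any run of whitespace and drops leading/trailing whitespace, which is usually what you want for messy text. A sketch:

```python
string = 'quick  brown fox\n'  # note the double space and trailing newline

print(string.split(' '))         # ['quick', '', 'brown', 'fox\n']
print(string.split())            # ['quick', 'brown', 'fox']
print(' '.join(string.split()))  # quick brown fox
print(string.strip().upper())    # QUICK  BROWN FOX
```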

# Roadmap

* Background
* Getting Started
* Language Basics
* **Best Practices / Ecosystem / Further Resources**

# Type Hinting

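In Python 3.5+ you can add type annotations to functions (and, from 3.6, to variables); see [https://docs.python.org/3/library/typing.html](https://docs.python.org/3/library/typing.html):

```python
def hello(string: str) -> str:
    a = string + 2  # deliberate bug: a type checker (e.g. mypy) flags str + int before the code runs
    return 'Hello ' + string

an_int: int = 2
```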
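
Annotations can also describe the contents of containers and values that may be `None`. A sketch with hypothetical `count_tokens` and `first_token` functions; the `list[str]` / `dict[str, int]` syntax assumes Python 3.9+:

```python
from typing import Optional

# Built-in collection types can be parameterized directly (Python 3.9+)
def count_tokens(tokens: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return counts

# Optional[str] means "a str or None"
def first_token(tokens: list[str]) -> Optional[str]:
    return tokens[0] if tokens else None

print(count_tokens(['a', 'b', 'a']))  # {'a': 2, 'b': 1}
print(first_token([]))                # None

# Annotations are not enforced at runtime: this runs without error
# (a string is iterable), but a static checker such as mypy would flag it
print(count_tokens('aba'))  # {'a': 2, 'b': 1}
```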

# Type Hinting

You should _always_ (pretty much) add type hints. Why?
* Readability! Code is for people, not just machines.
* Static analysis:
    * mypy: catch type errors before runtime
    * works like a very good linter!
* Editor tools:
    * code completion, etc., can use type hints in very helpful ways

# Code Formatting

Writing clean, consistent code will be extremely valuable for you, your peers, colleagues, future self, etc.

But: it can be a PITA.

**Use a code formatter!**

* [black](https://black.readthedocs.io/en/stable/)
* [yapf](https://github.com/google/yapf)

# Comments and Docstrings

Write detailed comments and docstrings!

I try to follow [Google's Python Style Guide](https://google.github.io/styleguide/pyguide.html) for this.

```python
def find_token(sentence, token, sep=" "):
    for idx, element in enumerate(sentence.split(sep)):
        if element == token:
            return idx
    raise KeyError(f"Token {token} not found in sentence.")

print(find_token("Hello world my name is Shane", "name"))
```

```python
def find_token(sentence: str, token: str, sep: str = " ") -> int:
    """Finds the position of a specified token in a provided sentence.

    If the token occurs, returns the index in the sentence of its first occurrence.
    If not, raises an error.

    Args:
        sentence: the sentence to search
        token: the token to search for
        sep: a separator by which to split the sentence into tokens

    Returns:
        the index of the first occurrence of `token` in `sentence`, if it exists

    Raises:
        KeyError, if `token` is not found in `sentence`, when split by `sep`
    """
    # split the sentence by the separator, and enumerate through the tokens by index
    for idx, element in enumerate(sentence.split(sep)):
        # return the index if token is found
        if element == token:
            return idx
    # end of sentence reached, token not found, so raise an error
    raise KeyError(f"Token {token} not found in sentence.")
```
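
Docstrings are available at runtime (e.g. via `help()`), and usage examples written in them in `>>>` form can be checked automatically with the standard-library `doctest` module. A sketch with a hypothetical `count_token` function; the `Examples:` section is one common convention, not a requirement:

```python
import doctest

def count_token(sentence: str, token: str, sep: str = " ") -> int:
    """Counts how many times a token occurs in a sentence.

    Args:
        sentence: the sentence to search
        token: the token to count
        sep: a separator by which to split the sentence into tokens

    Returns:
        the number of occurrences of `token` in `sentence`, when split by `sep`

    Examples:
        >>> count_token("the cat saw the dog", "the")
        2
    """
    return sentence.split(sep).count(token)

# The docstring is stored on the function (help(count_token) shows it too)
print(count_token.__doc__.splitlines()[0])

# Run the `>>>` examples; doctest prints a report only if an example fails
doctest.run_docstring_examples(count_token, {"count_token": count_token}, name="count_token")
```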

# Useful Packages

* Natural Language ToolKit [http://nltk.org](http://nltk.org)
    * Large collection of NLP tools, corpora, algorithms:
        * tokenizers, stemmers
        * parsers
        * semantic analysis
        * corpus fragments
    * Pedagogically oriented: online book (better than docs), examples
    * Heavily used in 571, useful elsewhere
* **[numpy](https://numpy.org)!!**
    * wrapper around very fast C code for numerical computation
    * learn to _vectorize_ numerical code as much as possible
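
As a sketch of what _vectorizing_ means: the same sum of squares computed with a Python loop and with a single numpy call. Both give the same result; on large arrays the vectorized version is typically much faster (time it with `timeit` on your own data). The function names are hypothetical:

```python
import numpy as np

# Loop version: Python-level iteration over each element (slow for big arrays)
def sum_of_squares_loop(xs) -> float:
    total = 0.0
    for x in xs:
        total += x * x
    return float(total)

# Vectorized version: one call; the loop runs inside numpy's compiled code
def sum_of_squares_vec(xs: np.ndarray) -> float:
    return float(np.dot(xs, xs))

small = np.array([1.0, 2.0, 3.0])
print(sum_of_squares_loop(small), sum_of_squares_vec(small))  # 14.0 14.0

# Arithmetic and comparisons apply elementwise to whole arrays
print(small * 2 + 1)       # [3. 5. 7.]
print(small[small > 1.5])  # [2. 3.]
```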

# Writing High-performance Python

* Use new versions when possible! (e.g. CPython 3.11 brought major speed-ups over 3.10)
* Vectorization with numpy (and TF, PyTorch, ...)
* JIT compilation with [numba](https://numba.pydata.org)
* ...

# Python Resources

* Books:
    * Lutz and Ascher, _Learning Python_, O'Reilly
    * Martelli, _Python in a Nutshell_, O'Reilly
    * Beazley, _Python Essential Reference_, Developers Library
* Online
    * Mark Pilgrim, _Dive Into Python 3_ [http://www.diveintopython3.net/](http://www.diveintopython3.net/)
        * for experienced programmers
    * [http://python.org](http://python.org)
    * [NLTK book](http://www.nltk.org/book)