# An introduction to Python

## Join the flying Python circus

Python started as a hobby project over Christmas of '89. Its creator said that he wanted "a descendant of ABC that would appeal to Unix/C hackers". (ABC was an attempt to make the BASIC programming language less like [broken English in screaming CAPS](https://en.wikipedia.org/wiki/BASIC#Examples).)

Python looks a lot like pseudocode, the kind of code you'd write in a notebook on a hot summer afternoon. For example, `["Spoiled!" if apple["spoiled"] else "Fresh!" for apple in bunch]` is an example of Python's looser syntax. It relies more on words and indentation than on brackets and semicolon line terminators.

Python is not for everyone. It's a loose and dynamic interpreted programming language. It assumes a lot, more than a language like C++, and it doesn't run especially fast. It is not something you want to use in a high-stakes, high-speed situation.

Python has good things going for it. It can interface with fast and powerful libraries like NumPy and TensorFlow, giving Python speed without sacrificing friendliness. Python has also been the most popular beginner programming language for quite some time, so it has a good following of beginners and grey-beards alike.

To get you started in Python, I'll go over the basics:

1. **Importing libraries** Importing libraries is how you can use pre-written code in Python. If you're doing data science, statistical learning, or deep learning, you'll likely be importing a library. For example, people have already developed TensorFlow to run deep learning models, and you can import TensorFlow into your Python project to quickly set up neural networks yourself.
2. **Manipulating strings** Strings in Python are text. They may be a few letters, words, lines, sentences, or a simple report nicely aligned and evenly spaced. You can use tools in Python to manipulate strings to your liking or make them more presentable on the screen.
3. **Using lists** Lists store data, but as the name indicates, lists are *lists*. Python can read through them naturally, allowing you to do whatever you want with the items.
4. **Using dictionaries** Just like a real dictionary, when you give a dictionary a word, you get something in return, whether it's the definition of a word, the phone number of a person, or the sales value of a product. You may need dictionaries in your project or you might not: dictionaries have their uses.
5. **Opening files** Since these guides are targeted at data analysts, it's bound to happen that a file needs opening somewhere. I'll cover opening text files, including CSVs. This should at least get you started.

## Importing libraries

To import a Python library/package means to include either Python code or some pre-compiled programs into your own Python script. The important thing to keep track of here is the *namespace*.

That said, the main thing `import` does is include pre-existing code in your Python script. This code may be active (it does something as soon as it is imported), or it may be passive (it only defines things for you to call later). Here's an example of an `import` [that does something](https://www.python.org/dev/peps/pep-0020/).

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


More often, you'll use `import` to get access to libraries like scikit-learn and their premade statistical models. These models will be given their own space in your code, and you can call the models when you decide to: this is the namespace.

As an example of a namespace, suppose you've written a Python function called `fit()` to fit a model to some data. Suppose you also want to import a library that already has a function called `fit()`. How will you be able to call each function separately? That's the main concern when dealing with namespaces. With a good import command, you don't have to worry about these things.
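
Here's a minimal sketch of that situation. To keep it runnable with only the standard library, I've swapped the `fit()` of the paragraph above for `sqrt()`: the home-made `sqrt()` below is a made-up function for illustration, while `math.sqrt()` is the real one from Python's `math` module.

```python
import math  # the library's names stay behind the "math." prefix


def sqrt(x):
    """A hypothetical home-made square root, rounded to 2 decimals."""
    return round(x ** 0.5, 2)


mine = sqrt(2)         # our own function, from our own namespace
theirs = math.sqrt(2)  # the library's function, reached through its namespace

print("Mine: {0}".format(mine))
print("Theirs: {0}".format(theirs))
```

Both functions share the name `sqrt`, but because the library's version lives under `math.`, neither one overwrites the other.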

### Method 1: import

Here is a way of importing a library that keeps it completely separate.

In [2]:
import sklearn.linear_model

This imports all of scikit-learn's `linear_model` library and lets you use these models in your program by calling `sklearn.linear_model` followed by the model you want to use. `sklearn.linear_model` is the namespace and it is kept separate.

In [3]:
lm = sklearn.linear_model.LinearRegression()
lm

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

### Method 2: import as

You can also import a library under a name of your choosing.

In [4]:
import sklearn.linear_model as hotdog

This will let you call the linear models library using only the name you picked, here `hotdog`.

In [5]:
lm = hotdog.LinearRegression()
lm

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

### Method 3: import selectively

You can also import only some parts of a library, to keep things clean.

In [6]:
from sklearn.linear_model import LinearRegression

Now the `LinearRegression` class will be available for you to use on its own. It is now part of your own namespace, as if you had written the model yourself.

In [7]:
lm = LinearRegression()
lm

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

### What you should do

Look at some code examples online and see how people usually import a library. It's better to use the common method so that people reading your code see familiar things. For example, the library `numpy` is almost always imported as

In [8]:
# The official way
import numpy as np

## Strings

In Python a [string](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) is text. This is the format used for things such as column names, dictionary keys (more later), and text data.

A very useful built-in function in Python is the [`print()`](https://docs.python.org/3/library/functions.html#print) function!

In [9]:
print("Hello world!")

Hello world!


Use the print function often to display data to your Jupyter session or your Python console.

Here are a few neat things you can do with strings.

In [10]:
print("Strings can be concatenated by " + "adding them together.")

print("A string can also be " + (5 * "multiplied "))

print("The str() function can turn numbers into a string: " + str(100))

Strings can be concatenated by adding them together.
A string can also be multiplied multiplied multiplied multiplied multiplied 
The str() function can turn numbers into a string: 100


Another useful set of string methods is [`split()`](https://docs.python.org/3/library/stdtypes.html#str.split), `join()`, and `replace()` (these last two are documented in [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods)).

In [11]:
print("The split function will separate delimited substrings.".split(" "))

print(" ".join(["The", "join", "method", "does", "the", "opposite."]))

print("The replace method does a find and replace.".replace("h", "HOTDOG"))

['The', 'split', 'function', 'will', 'separate', 'delimited', 'substrings.']
The join method does the opposite.
THOTDOGe replace metHOTDOGod does a find and replace.


The most important string method you will come across is the [`.format()`](https://docs.python.org/3/library/string.html#formatstrings) method. If you find the Python documentation obtuse, there is also [this webpage](https://pyformat.info/) devoted to `.format()`.

In [12]:
# The first mode of .format() is positional
print("Johnny ate {0} cakes while Timmy ate {1}.".format(5, 15.1))

Johnny ate 5 cakes while Timmy ate 15.1.


The `.format()` method can also receive specific formatting options. For example, `{:5d}` will reserve a 5-character space to print an integer. `{:<10}` will left-align a string within a 10-character space. Finally, `{:07.2f}` will reserve 7 characters to hold a float with 2 decimal places, padding it with zeros on the left.

In [13]:
# The second mode is more specific
print("Johnny ate {:5d} cakes while {:<10} ate {:07.2f}.".format(5, "Cthulhu", 151.2))

# A more realistic example
print("Epoch {:5d} complete, cost {:10.6f}, accuracy {:5.2f}%".format(11, 0.0231532, 97.231))

Johnny ate     5 cakes while Cthulhu    ate 0151.20.
Epoch    11 complete, cost   0.023153, accuracy 97.23%
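
If you're on Python 3.6 or later, the same format options also work inside f-strings, where the variable name goes in front of the colon. Here's a small sketch reusing the "realistic example" above; the variable names `epoch`, `cost`, and `accuracy` are mine, picked for illustration.

```python
epoch, cost, accuracy = 11, 0.0231532, 97.231

# The .format() version from the cell above
with_format = "Epoch {:5d} complete, cost {:10.6f}, accuracy {:5.2f}%".format(epoch, cost, accuracy)

# The f-string version: same format options, variable names inline
with_fstring = f"Epoch {epoch:5d} complete, cost {cost:10.6f}, accuracy {accuracy:5.2f}%"

print(with_fstring)
print(with_format == with_fstring)  # both approaches build the same string
```

Which one you use is mostly a matter of taste; `.format()` is handy when the template string is stored separately from the data.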


## Using lists

Lists in Python are really just flexible data containers. You can store any kind of data in them, and you can manipulate them any way you want. Lists are also easy to iterate over with for-loops.

You can find the documentation on lists [here](https://docs.python.org/3/tutorial/datastructures.html).

In [14]:
# A list is declared with square brackets
mylist = ["a", "b", "c", 1, 2, 3]
print(mylist)

# They can also be created by splitting strings
my_delimited_list = "a, b, c, 1, 2, 3".split(", ")
print(my_delimited_list)

['a', 'b', 'c', 1, 2, 3]
['a', 'b', 'c', '1', '2', '3']


You can "slice" lists various ways. All of these are pretty useful. You can find the official Python slicing guide [here](https://docs.python.org/3/tutorial/introduction.html#lists).

**Important**: Python starts counting list indices at 0.

**Exception**: If Python is counting backwards, it starts at -1.

In [15]:
print("All indices: {0}".format(mylist[:]))
print("Individual index: {0}".format(mylist[1]))
print("Range of indices: {0}".format(mylist[0:2]))
print("Last index: {0}".format(mylist[-1]))

All indices: ['a', 'b', 'c', 1, 2, 3]
Individual index: b
Range of indices: ['a', 'b']
Last index: 3
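
Slices can also count from the end or take a step. Here's a short sketch of a few more slices on the same `mylist`; the variable names `last_two`, `every_other`, and `backwards` are just for illustration.

```python
mylist = ["a", "b", "c", 1, 2, 3]

last_two = mylist[-2:]     # from the second-to-last item to the end
every_other = mylist[::2]  # from start to end, in steps of 2
backwards = mylist[::-1]   # a reversed copy; mylist itself is unchanged

print("Last two: {0}".format(last_two))
print("Every other: {0}".format(every_other))
print("Backwards: {0}".format(backwards))
```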


You can also perform some operations on lists. Here are some useful ones.

In [16]:
mylist = ["a", "b", "c", 1, 2, 3]
my_delimited_list = "a, b, c, 1, 2, 3".split(", ")

# The append method adds a single item to the end of a list in place (it returns None, not the new list)
mylist.append("a")
print(mylist)
mylist.append((1, 2, 3))
print(mylist)
mylist.append(my_delimited_list)
print(mylist)

# You can also use the addition operator
print(my_delimited_list + ["apple", "orange", "banana"])

['a', 'b', 'c', 1, 2, 3, 'a']
['a', 'b', 'c', 1, 2, 3, 'a', (1, 2, 3)]
['a', 'b', 'c', 1, 2, 3, 'a', (1, 2, 3), ['a', 'b', 'c', '1', '2', '3']]
['a', 'b', 'c', '1', '2', '3', 'apple', 'orange', 'banana']
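
Notice that appending a list nests it inside the other list as a single item. If you'd rather add the items one by one, the `extend()` method does that. Here's a small sketch comparing the two; the `letters` and `numbers` lists are made up for the example.

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

nested = list(letters)   # list() makes a copy so letters stays untouched
nested.append(numbers)   # the whole list becomes one item
print(nested)

flat = list(letters)
flat.extend(numbers)     # each item is added separately
print(flat)
```

Like `append()`, `extend()` changes the list in place, whereas the `+` operator builds a new list.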


A very nice thing about lists is that they're iterable. This means that they can be fed to a for-loop and processed one-by-one.

In [17]:
mylist = ["1", "2", "3", 1, 2, 3]

for i in mylist:
    print(i)

1
2
3
1
2
3


You can call `i` whatever you want, like Ricky. `Ricky` will become a temporary variable representing the current element of `mylist`, and you can do whatever you want with it.

In [18]:
for Ricky in mylist:
    if isinstance(Ricky, str):
        print("This is a string. {0}".format(Ricky))
    else:
        print("This isn't a string. {0}".format(Ricky))

This is a string. 1
This is a string. 2
This is a string. 3
This isn't a string. 1
This isn't a string. 2
This isn't a string. 3


Because the loop variable (`Ricky`, or `i` below) is just a temporary name for the current element, assigning a new value to it doesn't affect the original list.

In [19]:
newlist = [1, 2, 3, 4, 5]

print("Before {0}".format(newlist))

for i in newlist:
    i += 1 # the += is shorthand for making an addition in-place

print("After {0}".format(newlist))

Before [1, 2, 3, 4, 5]
After [1, 2, 3, 4, 5]


What you can do instead is use the list indexes in the for loop. This will change the original list.

The [`range()`](https://docs.python.org/3/library/functions.html#func-range) function comes standard in Python, and it's a way of quickly generating sequences of numbers.

In [20]:
newlist = [1, 2, 3, 4, 5]

print("Before {0}".format(newlist))

for i in range(len(newlist)):
    newlist[i] += 1 # the += is shorthand for making an addition in-place

print("After {0}".format(newlist))

Before [1, 2, 3, 4, 5]
After [2, 3, 4, 5, 6]
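
Another way to write the same loop is with the built-in `enumerate()` function, which hands you the index and the value together. This is only a sketch of an alternative, not a replacement for the `range(len(...))` version above.

```python
newlist = [1, 2, 3, 4, 5]

print("Before {0}".format(newlist))

# enumerate() yields (index, value) pairs for each element
for index, value in enumerate(newlist):
    newlist[index] = value + 1  # write the new value back by index

print("After {0}".format(newlist))
```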


A more advanced technique is called "list comprehension", which is similar to Python's [`map()`](https://docs.python.org/3/library/functions.html#map) function or the [`apply()`](http://stat.ethz.ch/R-manual/R-devel/library/base/html/apply.html) function in R.

In [21]:
# The list comprehension will apply a function on all the elements in mylist
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_other_list = [str(x) for x in mylist]
print(my_other_list)

# The list comprehension can also take conditions
my_shorter_list = [str(x) for x in mylist if x > 5]
print(my_shorter_list)

['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
['6', '7', '8', '9', '10']


A list comprehension can be useful in some situations, but a for-loop can do the same in a few more lines.
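
For comparison, here is a sketch of the conditional list comprehension from the last cell written out as a plain for-loop.

```python
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Same result as [str(x) for x in mylist if x > 5]
my_shorter_list = []
for x in mylist:
    if x > 5:
        my_shorter_list.append(str(x))

print(my_shorter_list)
```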

## Dictionaries

Dictionaries can be pretty useful, especially if you're working with data but don't want a very complicated retrieval mechanism.

If you've used JSON before, you'll quickly get the idea behind Python dictionaries. Dictionaries are also known as hashtables and associative arrays.

You use dictionaries to retrieve a value using a key, like so:

In [22]:
christmas_presents = {"Billy" : "waffle-maker", "Betty" : "belt-sander", "Bonny" : "clown costume"}

print("Billy wants a {0}.".format(christmas_presents["Billy"]))

Billy wants a waffle-maker.


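The difference with lists is that dictionaries are accessed by key rather than by position: they store values someplace in memory and retrieve them when given the right key. (Since Python 3.7 dictionaries do remember the order in which keys were added, but you still can't ask for an item by its index.)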

So adding a key/value pair to a dictionary is simply a matter of assigning a value to a new key.

In [23]:
christmas_presents["Bart"] = "tarantula"

print("Bart wants a {0}.".format(christmas_presents["Bart"]))

Bart wants a tarantula.


When using dictionaries, you will get an error (a `KeyError`) if you attempt to retrieve a key that does not exist. You can avoid this with the [`.get()`](https://stackoverflow.com/questions/11041405/why-dict-getkey-instead-of-dictkey) method, which can return a default value.

In [24]:
print("Bort wants a {0}.".format(christmas_presents.get("Bort", "pair of socks")))

Bort wants a pair of socks.
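
If you'd rather check for a key before using it, or handle the error yourself, here's a minimal sketch of both. It uses a shortened copy of `christmas_presents`, and the `present` variable is mine.

```python
christmas_presents = {"Billy": "waffle-maker", "Betty": "belt-sander"}

# The "in" operator tests whether a key exists
print("Bort" in christmas_presents)
print("Billy" in christmas_presents)

# A missing key raises KeyError, which you can catch
try:
    present = christmas_presents["Bort"]
except KeyError:
    present = "pair of socks"

print("Bort wants a {0}.".format(present))
```

For a simple fallback value `.get()` is shorter; the `try`/`except` form is more useful when a missing key means you need to do something else entirely.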


Finally, if you want to see what a dictionary contains, you can use the `.keys()` and `.values()` methods.

In [25]:
print(christmas_presents.keys())
print(christmas_presents.values())

dict_keys(['Billy', 'Betty', 'Bonny', 'Bart'])
dict_values(['waffle-maker', 'belt-sander', 'clown costume', 'tarantula'])


If you want to iterate over the keys in a dictionary, you can simply do it as `for key in dict:`. Python will feed the keys into the `key` variable.

In [26]:
for key in christmas_presents:
    print("{0} wants a {1}.".format(key, christmas_presents[key]))

Billy wants a waffle-maker.
Betty wants a belt-sander.
Bonny wants a clown costume.
Bart wants a tarantula.
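
When you need both the key and the value, the `.items()` method gives you the pair directly. Here's a sketch of the same loop written that way; the loop variable names `name` and `present` are mine.

```python
christmas_presents = {"Billy": "waffle-maker", "Betty": "belt-sander",
                      "Bonny": "clown costume", "Bart": "tarantula"}

lines = []
# .items() yields (key, value) pairs, so no second lookup is needed
for name, present in christmas_presents.items():
    lines.append("{0} wants a {1}.".format(name, present))

for line in lines:
    print(line)
```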


## Opening files

Finally, here is how [you open and read files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). I'll only give you the simplest example since it's usually all you ever need to use.

The [`with`](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement) statement is a way of closing the file automatically as soon as you're finished with it. The `with` statement is used a lot in TensorFlow.

In [27]:
with open("Principio.txt", 'r') as f:
    for line in f.readlines():
        print(line)

# The extra newlines below come from the ends of the lines in the file

Urbem Romam a principio reges habuere; libertatem et consulatum L. Brutus instituit.

Dictaturae ad tempus sumebantur; neque decemviralis potestas ultra biennium, neque tribunorum militum consulare ius diu valuit.

Non Cinnae, non Sullae longa dominatio; et Pompei Crassique potentia cito in Caesarem, Lepidi atque Antonii arma in Augustum cessere, qui cuncta discordiis civilibus fessa nomine principis sub imperium accepit.


But of course, the best solutions use a list comprehension. ;-)

In [28]:
[print(line) for line in open("Principio.txt", 'r').read().splitlines()] ; # The semicolon will keep Jupyter quiet

Urbem Romam a principio reges habuere; libertatem et consulatum L. Brutus instituit.
Dictaturae ad tempus sumebantur; neque decemviralis potestas ultra biennium, neque tribunorum militum consulare ius diu valuit.
Non Cinnae, non Sullae longa dominatio; et Pompei Crassique potentia cito in Caesarem, Lepidi atque Antonii arma in Augustum cessere, qui cuncta discordiis civilibus fessa nomine principis sub imperium accepit.
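
Joking aside, you can get the clean output and still let `with` close the file for you by stripping the newline from each line. Here's a minimal sketch; since `Principio.txt` isn't available here, it first writes a small stand-in file (the path and its contents are made up for the example).

```python
import os
import tempfile

# Write a small stand-in file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

cleaned = []
with open(path, "r") as f:
    for line in f:                         # a file can be looped over line by line
        cleaned.append(line.rstrip("\n"))  # drop the newline at the end of the line

for line in cleaned:
    print(line)
```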


Let's try with a CSV. This example will use things we covered elsewhere in the guide.

Here is what the code below does:
1. The `Apples.txt` file is opened temporarily under the name `f`
2. The first line of the csv file is split by `,` and saved as a list
3. A new dictionary `dict()` object is created to hold the incoming data
4. For each remaining line, the line is split by `,` and the data fed to `bunch[apple]` as a nested dictionary.

In [29]:
with open("Apples.txt", 'r') as f:
    # Get header
    header = f.readline().replace("\n", "").split(",")
    # Instantiate a new dictionary
    bunch = dict()
    for line in f.readlines():
        dataline = line.replace("\n", "").split(",")
        bunch[dataline[0]] = {header[1] : dataline[1], header[2] : int(dataline[2]), header[3] : dataline[3]}

We now have a tree-style dataset. Whenever you want to access information about a particular apple, you just have to supply the right sequence of keys to get to the data.

In [30]:
print(bunch["Alfa"])

print(bunch["Alfa"]["Freshness"])

{'Variety': 'Spartan', 'Weight': 161, 'Freshness': 'Fresh'}
Fresh
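
Python's standard library also has a `csv` module that does the splitting for you. Here's a sketch of the same idea using `csv.DictReader`. Since `Apples.txt` isn't shown in this guide, the sample data below is a stand-in: the `Name` column header and the second row are my assumptions, and only the `Alfa` row matches the output above.

```python
import csv
import io

# Stand-in for Apples.txt; the "Name" header and the Delta row are assumptions
sample = io.StringIO(
    "Name,Variety,Weight,Freshness\n"
    "Alfa,Spartan,161,Fresh\n"
    "Delta,Gala,148,Spoiled\n"
)

bunch = {}
for row in csv.DictReader(sample):      # each row comes back keyed by the header
    name = row.pop("Name")              # the first column becomes the outer key
    row["Weight"] = int(row["Weight"])  # csv reads everything as strings
    bunch[name] = row

print(bunch["Alfa"])
print(bunch["Alfa"]["Freshness"])
```

With a real file you would pass the open file object `f` to `csv.DictReader` instead of `sample`.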


We can check on our apples' freshness by using a list comprehension. In Jupyter you can use a `;` at the end of a line to suppress output. Here I only want my `print()` output to appear.

In [31]:
[print("{:<10} is {:>10}".format(apple, bunch[apple]["Freshness"])) for apple in bunch.keys()] ;

Alfa       is      Fresh
Bravo      is      Fresh
Charlie    is      Fresh
Delta      is    Spoiled
Echo       is    Spoiled
Foxtrot    is      Fresh
Golf       is    Spoiled
Hotel      is      Fresh
India      is    Spoiled
Juliett    is      Fresh
Kilo       is      Fresh
Lima       is      Fresh
Mike       is    Spoiled
November   is      Fresh
Oscar      is      Fresh
Papa       is      Fresh
Quebec     is      Fresh
Romeo      is      Fresh
Sierra     is      Fresh
Tango      is      Fresh
Uniform    is      Fresh
Victor     is      Fresh
Whiskey    is      Fresh
X-ray      is    Spoiled
Yankee     is      Fresh
Zulu       is      Fresh
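
A list comprehension is arguably a better fit when you want to keep the result rather than just print it. As a sketch, here is one way to collect the names of the spoiled apples. The small `bunch` below is a cut-down stand-in with only the `Freshness` field, using three of the apples from the output above.

```python
# A cut-down stand-in for the bunch dictionary built earlier
bunch = {
    "Alfa": {"Freshness": "Fresh"},
    "Delta": {"Freshness": "Spoiled"},
    "Echo": {"Freshness": "Spoiled"},
}

# Keep only the names whose nested "Freshness" value is "Spoiled"
spoiled = [name for name, apple in bunch.items() if apple["Freshness"] == "Spoiled"]

print(spoiled)
print("{0} of {1} apples are spoiled.".format(len(spoiled), len(bunch)))
```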


Thanks for reading this Python intro!