# Getting Started with Python

Python is a popular high-level programming language. It's a simple language, designed with an emphasis on code readability. If you already have programming experience, Python is easy to learn.

# Installing Python

First, if you don't have Python installed on your system, you'll need to download and install it. We recommend using [Anaconda](https://store.continuum.io/cshop/anaconda/), which includes not only Python but also many useful Python libraries.

Instructions for installing Anaconda depend on your operating system:

[Install Anaconda on OS X](http://docs.continuum.io/anaconda/install#os-x-install)

[Install Anaconda on Linux](http://docs.continuum.io/anaconda/install#linux-install)

[Install Anaconda on Windows](http://docs.continuum.io/anaconda/install#windows-install)

<b>Important:</b> You will need to use Python 2.7. GraphLab Create does not currently support Python 3.

Once you have Anaconda, start an IPython session. IPython is a powerful interactive shell for executing Python code. You can start an IPython session by running "ipython" from the command line.

This tutorial is written as an IPython notebook, so you can download it and run it on your own machine, either as a notebook (.ipynb) or as a Python file (.py).

# Python Basics

Now it's time to execute our first Python command. We can use the <b>print</b> keyword to print a string.

In [1]:
print 'Hello World!'

Hello World!


In Python, single-line comments start with a <b>#</b>.

In [2]:
# this is a comment!

Python doesn't have built-in support for multiline comments. However, you can get the same effect by creating a multiline string and not assigning it to anything. Multiline strings begin and end with three single quotes or three double quotes. (Single and double quotes are equivalent in Python.)

In [3]:
'''
This is technically just
 a multiline string but
 usually it's used as a
 multiline comment.
'''

"\nThis is technically just\n a multiline string but\n usually it's used as a\n multiline comment.\n"
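As the output shows, a triple-quoted string is an ordinary string that contains newline characters. The short sketch below checks this, along with the claim that single and double quotes are interchangeable. The variable name `text` is illustrative, and the `print(...)` calls take a single argument, so they behave the same under Python 2.7 and Python 3.

```python
text = '''first line
second line'''

# A triple-quoted string is an ordinary str with an embedded newline
print(text == 'first line\nsecond line')   # True

# Single and double quotes produce identical strings
print('abc' == "abc")                       # True

# Double quotes are handy when the text itself contains an apostrophe
print("it's" == 'it\'s')                    # True
```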

Python has several built-in data types. The simple built-in types are bool, str, int, and float, which are short for boolean, string, integer, and floating-point number.

Below are examples of creating each type.

In [4]:
b = True                              # bool
s = 'This is a string'                # str
i = 4                                 # int
f = 4.1                               # float
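To check which type a value has, you can pass it to the built-in `type()` function, or test it with `isinstance()`. The sketch below reuses the four variables above. It uses single-argument `print(...)` calls, which work the same way in Python 2.7 and Python 3.

```python
b = True                              # bool
s = 'This is a string'                # str
i = 4                                 # int
f = 4.1                               # float

# type() returns the type object; __name__ is its short name
for value in (b, s, i, f):
    print(type(value).__name__)       # bool, str, int, float (one per line)

# isinstance() is the usual way to test a type inside a condition
print(isinstance(f, float))           # True

# One subtlety: bool is a subclass of int, so True also counts as an int
print(isinstance(b, int))             # True
```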

Python also has built-in compound types (types composed of other types). The most common are list, dict (short for dictionary), and tuple.

Below are examples of creating these types and accessing their elements.

In [5]:
d = {'foo': 1, 'bar': 2}              # dict
l = [3,2,1]                           # list
t = (1,2,3)                           # tuple

print d['foo']
print l[2]
print t[1]

1
1
2


Tuples are like lists, except that they are immutable: once a tuple is created, its elements cannot be changed. Strings are also immutable.
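Here is a small sketch of what immutability means in practice. Assigning to an element of a list works, but doing the same to a tuple or a string raises a `TypeError`. The exact error message differs between Python versions, so the sketch catches the exception and prints its own message instead.

```python
l = [3, 2, 1]            # list: mutable
t = (1, 2, 3)            # tuple: immutable
s = 'This is a string'   # str: immutable

# Lists can be changed in place
l[0] = 10
print(l)                 # [10, 2, 1]

# Assigning to an element of a tuple or a string raises TypeError
for immutable in (t, s):
    try:
        immutable[0] = 'x'
    except TypeError:
        print('cannot modify a %s' % type(immutable).__name__)

# "Changing" a string really means building a new one
s2 = 't' + s[1:]
print(s2)                # this is a string
```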

Python also has a special value called <b>None</b>, which represents the absence of a value. Any variable can be set to None, whatever type of value it held before.

In [6]:
b = None
s = None
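A common follow-up is checking whether a variable is still None. The idiomatic test uses `is`, which checks identity, rather than `==`. A minimal sketch:

```python
s = None

# None is the single instance of the type NoneType
print(type(s).__name__)        # NoneType

# Test for None with "is" (an identity check) rather than "=="
if s is None:
    print('s has no value yet')

s = 'now it has one'
print(s is None)               # False
```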

You can insert the value of a variable into a string by using the <b>%</b> operator and placing <b>%s</b> inside the string wherever the value should appear. For example:

In [7]:
print "Our float value is %s. Our int value is %s." % (f, i)

Our float value is 4.1. Our int value is 4.
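Besides `%s`, the `%` operator accepts other conversion specifiers, such as `%d` for integers and `%.2f` for a float rounded to two decimal places. The sketch below also shows `str.format()`, an alternative available in Python 2.6+ and Python 3. The variable names `msg`, `one`, and `alt` are only illustrative.

```python
i = 4
f = 4.1

# %d formats an integer; %.2f formats a float with 2 decimal places
msg = 'int: %d, float to 2 places: %.2f' % (i, f)
print(msg)   # int: 4, float to 2 places: 4.10

# A single value does not need to be wrapped in a tuple
one = 'Our int value is %s.' % i
print(one)   # Our int value is 4.

# str.format() numbers its placeholders instead of using %
alt = 'Our float value is {0}. Our int value is {1}.'.format(f, i)
print(alt)   # Our float value is 4.1. Our int value is 4.
```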


You create a function by using the <b>def</b> keyword. Here is an example of a function called <i>add2</i> that takes a value called <i>x</i> and returns that value plus two.

In [8]:
def add2(x):
    return x + 2

add2(10)

12
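Functions can also take several parameters, and a parameter can have a default value that is used when the caller leaves it out. The function `add` below is a hypothetical generalization of `add2` (it is not part of the original tutorial): with its default `y=2`, `add(x)` behaves exactly like `add2(x)`.

```python
def add(x, y=2):
    """Return x + y; y defaults to 2, so add(x) matches add2(x)."""
    return x + y

print(add(10))          # 12, using the default y=2
print(add(10, 5))       # 15
print(add(y=1, x=3))    # 4, arguments passed by keyword
```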

Like most programming languages, Python has <b>if</b> and <b>else</b> statements. The <b>elif</b> keyword is used for else-if statements. Unlike many programming languages, Python treats whitespace as meaningful: the body of an if-statement must be indented relative to its test expression. Python doesn't use braces to mark blocks.

You can use the <b>and</b> and <b>or</b> keywords to combine several tests into one.

In [9]:
if i == 1 and f > 4:
    print "The value of i is 1 and f is greater than 4."
elif i > 4 or f > 4:
    print "i is greater than 4 or f is greater than 4."
else:
    print "Both i and f are less than or equal to 4."

i is greater than 4 or f is greater than 4.
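To see why the elif branch ran, it helps to try other inputs. The sketch below wraps the same tests in a hypothetical function `describe` that returns the message instead of printing it. With i = 4 and f = 4.1, the first test fails because i is not 1. The second test succeeds because `4.1 > 4` is True, even though `4 > 4` is False.

```python
def describe(i, f):
    """Return the message the if/elif/else cell above would print for i and f."""
    if i == 1 and f > 4:
        return "The value of i is 1 and f is greater than 4."
    elif i > 4 or f > 4:
        return "i is greater than 4 or f is greater than 4."
    else:
        return "Both i and f are less than or equal to 4."

print(describe(4, 4.1))   # elif branch: only f > 4 is True
print(describe(1, 5.0))   # if branch: both tests joined by "and" are True
print(describe(2, 3.0))   # else branch: neither test joined by "or" is True
```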


Python has two types of loops, <b>for</b> loops and <b>while</b> loops.

In a for-loop, there is one iteration for each element in the sequence (here, the list <i>l</i>). Note that <i>i</i> is the current element, not its index.

In [10]:
for i in l:
    print i

3
2
1
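If you need the index as well as the element, the built-in `enumerate()` yields `(index, element)` pairs. A short sketch using the same list:

```python
l = [3, 2, 1]

# enumerate() yields (index, element) pairs
for index, value in enumerate(l):
    print('%d: %d' % (index, value))   # 0: 3, then 1: 2, then 2: 1

# An index-based loop also works, but is less idiomatic
for index in range(len(l)):
    print('%d: %d' % (index, l[index]))
```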


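While-loops execute as long as the given expression is True. In the cell below, <i>i</i> starts at 1 because the for-loop above left <i>i</i> bound to the last element of <i>l</i>.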

In [11]:
while i < 10:
    print i
    i += 1

1
2
3
4
5
6
7
8
9


Notice the use of "+=" to increment <i>i</i>. Unlike many programming languages, Python does not have increment or decrement operators such as "++" and "--".
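Augmented assignments such as `+=`, `-=`, and `*=` are the usual way to update a variable. One gotcha: `++i` is still legal Python, but it is parsed as two unary plus signs, `+(+i)`, so it leaves `i` unchanged. A quick sketch:

```python
i = 5
i += 1       # same as i = i + 1; i is now 6
i *= 2       # other operators work the same way; i is now 12
i -= 6       # back to 6

# ++i is legal syntax, but it means +(+i): two unary plus signs, not an increment
j = ++i
print(i)     # 6 (unchanged)
print(j)     # 6
```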

# GraphLab Create Basics

First, download and install GraphLab Create by following these directions: https://dato.com/download/

To use a library, you first need to <b>import</b> it, like so:

In [1]:
import graphlab
graphlab.canvas.set_target('ipynb')  # show Canvas visualizations inside the notebook
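`import` has a few common forms. The sketch below shows them with the standard-library `math` module, so it runs without GraphLab installed. The same syntax applies to any library. For example, `import graphlab as gl` would let you write `gl.SFrame` instead of `graphlab.SFrame`.

```python
# Import the whole module and access its names through it
import math
print(math.sqrt(16))     # 4.0

# Import the module under a shorter alias
import math as m
print(m.pi)

# Import a single name directly into the current namespace
from math import sqrt
print(sqrt(16))          # 4.0
```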

Using GraphLab Create, we can easily read in a comma-separated values (CSV) file.

In [None]:
sf = graphlab.SFrame.read_csv('http://s3.amazonaws.com/dato-datasets/coursera/toy_datasets/people-example.csv')

# SFrame Basics

In [37]:
sf # you can view the contents

| First Name | Last Name | Country | age |
|---|---|---|---|
| Bob | Smith | United States | 24 |
| Alice | Williams | Canada | 23 |
| Malcolm | Jone | England | 22 |
| Felix | Brown | USA | 23 |
| Alex | Cooper | Poland | 23 |
| Tod | Campbell | United States | 22 |
| Derek | Ward | Switzerland | 25 |


In [None]:
# you can explore summaries of the data
sf['age'].show(view='Categorical')

Suppose we just wanted to look at a single column.

In [39]:
sf['Country']

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

You can also add new columns.

In [40]:
# add a new column called "Full Name":
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']
sf

| First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|
| Bob | Smith | United States | 24 | Bob Smith |
| Alice | Williams | Canada | 23 | Alice Williams |
| Malcolm | Jone | England | 22 | Malcolm Jone |
| Felix | Brown | USA | 23 | Felix Brown |
| Alex | Cooper | Poland | 23 | Alex Cooper |
| Tod | Campbell | United States | 22 | Tod Campbell |
| Derek | Ward | Switzerland | 25 | Derek Ward |


In [41]:
# You can filter to find all rows that match a logical condition
sf[sf['Full Name'] == 'Felix Brown']

| First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|
| Felix | Brown | USA | 23 | Felix Brown |
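To build intuition for what the column addition and the filter compute, here is an analogy in plain Python. It stores the table above column by column in a dictionary of lists. This is only an illustration and not how SFrame is implemented. The names `columns`, `mask`, and `filtered` are hypothetical. Roughly, the comparison produces one true/false value per row, and the filter keeps the rows where that value is true.

```python
# The example data stored column by column, copied from the table above
columns = {
    'First Name': ['Bob', 'Alice', 'Malcolm', 'Felix', 'Alex', 'Tod', 'Derek'],
    'Last Name': ['Smith', 'Williams', 'Jone', 'Brown', 'Cooper', 'Campbell', 'Ward'],
    'Country': ['United States', 'Canada', 'England', 'USA', 'Poland',
                'United States', 'Switzerland'],
    'age': [24, 23, 22, 23, 23, 22, 25],
}

# Adding a column: combine two existing columns element by element
columns['Full Name'] = [first + ' ' + last
                        for first, last in zip(columns['First Name'], columns['Last Name'])]

# Filtering, step 1: evaluate the condition once per row to get a True/False mask
mask = [name == 'Felix Brown' for name in columns['Full Name']]

# Filtering, step 2: in every column, keep only the positions where the mask is True
filtered = {name: [v for v, keep in zip(values, mask) if keep]
            for name, values in columns.items()}

print(filtered['Country'])   # ['USA']
print(filtered['age'])       # [23]
```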


In [42]:
# You can do math
print sf['age']
print sf['age'].mean()
print sf['age'].std()
print sf['age']*2
print sf['age']+2*sf['age']

[24, 23, 22, 23, 23, 22, 25]
23.1428571429
0.989743318611
[48, 46, 44, 46, 46, 44, 50]
[72, 69, 66, 69, 69, 66, 75]
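The printed statistics can be checked by hand. The plain-Python sketch below uses only the standard library, and the names `mean`, `variance`, and `std` are illustrative. The computed value matches the output 0.989743318611, which suggests that `std()` here is the population standard deviation: it divides by N rather than N - 1. Dividing by N - 1 would give about 1.069 instead. The element-wise products are reproduced with list comprehensions.

```python
import math

ages = [24, 23, 22, 23, 23, 22, 25]

# float() avoids integer division under Python 2
mean = float(sum(ages)) / len(ages)

# Population standard deviation: average the squared deviations over N
variance = sum((a - mean) ** 2 for a in ages) / len(ages)
std = math.sqrt(variance)

print(round(mean, 6))   # 23.142857
print(round(std, 6))    # 0.989743

# Element-wise arithmetic, as in sf['age']*2 and sf['age']+2*sf['age']
doubled = [a * 2 for a in ages]
tripled = [a + 2 * a for a in ages]
```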


In [43]:
sf['Country']

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

Looking at the countries, notice that two values mean the same thing: "United States" and "USA".

To fix this, we can apply a function that transforms 'USA' into 'United States'.

In [44]:
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

In [45]:
sf['Country'].apply(transform_country)

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']

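Note that apply returns a new SArray and leaves sf unchanged. To update the column, the result must be assigned back to sf['Country']. We could also have used a <b>lambda</b> function in the apply. Lambdas are inline, unnamed functions. They don't have explicit return statements: the value their expression evaluates to is returned automatically.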

In [None]:
sf['Country'] = sf['Country'].apply(lambda cur_value: 'United States' if cur_value == 'USA' else cur_value)
sf.show()
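To see that the lambda and `transform_country` are equivalent, the sketch below applies both to the same country values in plain Python, without GraphLab. `map()` stands in for `SArray.apply` here, and the variable names are illustrative. The lambda body uses Python's conditional expression, `x if condition else y`, because a lambda can contain only a single expression.

```python
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

countries = ['United States', 'Canada', 'England', 'USA', 'Poland',
             'United States', 'Switzerland']

# Named function applied to every element
with_def = [transform_country(c) for c in countries]

# Unnamed lambda; list() is needed because map() returns an iterator in Python 3
with_lambda = list(map(lambda c: 'United States' if c == 'USA' else c, countries))

print(with_def == with_lambda)   # True
```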

For more about GraphLab Create, see our [Getting Started with GraphLab Create Notebook](https://dato.com/learn/gallery/notebooks/getting_started_with_graphlab_create.html) or our [Introduction to SFrame Notebook](https://dato.com/learn/gallery/notebooks/introduction_to_sframes.html).