<hr><hr>

# Data Science Summer School - Split '16

## Day 1 - Python for data analysis fundamentals 
### *Numpy, Pandas, Matplotlib*

(c) 2016 Damir Pintar

*version: 0.1* 


`kernel: Python 2.7`

<hr><hr>

# Part 0 - Basic instructions / Python refresher

### Basic Instructions for using Jupyter Notebooks

This is a Jupyter Notebook. It is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media.

Below you can see a portion of executable code, visually set apart from "regular" text. You can run this code by selecting the cell and pressing SHIFT + ENTER. Do it now.

In [None]:
print "Hello!"

As you can see, the code was run behind the scenes and its result is shown in the notebook right under the code.

A code cell can hold more than one line of code. Try running the cell below, but this time use ALT+ENTER instead of SHIFT+ENTER.

In [None]:
a = 5
print a

As you can see, ALT+ENTER runs the code but also creates an additional code cell under the result. This is useful if you want to experiment further with the code in a certain part of the document. If you don't want this additional cell, you can delete it by pressing the "scissors" symbol in the toolbar above. Also, by choosing `Insert -> Insert Cell Above / Insert Cell Below` you can create additional cells anywhere in the document. You can even put "markdown" text in them instead of code, by typing text and then changing the cell type above from `Code` to `Markdown`, but for the purposes of this Notebook we will presume that you will be filling out exclusively the code portions of the Notebook.

IMPORTANT: The code portions you see above aren't standalone; the instructions you run are "remembered" in later code snippets. You can think of this notebook as one big interactive session in which all the code snippets are part of one big script, even though they look visually disconnected. We can prove this by printing the variable `a` again.

In [None]:
print a

How about printing x?

In [None]:
print x

We are getting an error - variable `x` is not defined. Try to define it in an earlier code cell, run that cell and then try to run the above one again. Notice that the error is not present anymore.

If at any time you want to clear all the results from a cell (or reset cells in the entire Notebook), choose `Cell -> Current Output -> Clear` or `Cell -> All Outputs -> Clear`. Notice that this only clears the *printed results*; the session is still running behind the scenes, and all the variables, functions and imported packages are still present. If you want to clear the session as well, choose `Kernel -> Restart & Clear Output`. Also, remember that at any time you can save the current state of the Notebook by choosing `File -> Save and Checkpoint` or create a backup copy by choosing `File -> Make a Copy..`. You can also try to convert this Notebook into additional formats, such as `html` or `pdf`, by choosing `Download as..`. Note that certain formats might require that you install some additional packages first.

Feel free to explore and experiment with additional options you are provided through the toolbar and the menu options. It might be wise though to first create a backup copy so all potentially unwanted changes you make aren't permanent.

This is it for the appetizer - now on to the main course! :)

<hr>

# Using Python for data analysis 

## Quick Python refresher

Python is a general-purpose, object-oriented, high-level programming language that encourages clean, concise, highly readable code. It supports several programming paradigms, such as object-oriented, imperative, functional and procedural programming. Python is an interpreted language which can be used interactively (running each instruction separately in an interactive session) or by writing and running scripts (which may or may not be compiled in their entirety before running). Python supports organizing code into "packages" and "modules", which encourages modular programming and code reuse. In recent years, the addition of polished, high-quality packages oriented towards data analysis has established Python as one of the leading open, flexible platforms for statistical analysis, machine learning and data mining tasks.

Some characteristics of the Python language (in short):
- gentle learning curve
- implicit (dynamic) data typing
- strict rules of code formatting (enforced block indentation)
- "everything is an object"

Let's do a lightning refresher of Python basics.

In [None]:
# primitive data types
a = 5         # int
b = 3.2       # float
c = 10e2      # float, scientific notation (10 * 10**2 = 1000.0)
d = "Pero"    # str
e = True      # bool
f = None      # NoneType, variable holds "no specific value"

# complex data types
x = [1, 2, True, "Ana", None, [1,2,3,4]]      # list - ordered collection of elements of mixed types
d = {'jedan':1, 'dva':"dva", 'tri':[3,3,3]}   # dictionary - stores mixed-type values under keys (key-value pairs); note that this rebinds d, which held a str above
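
If you are ever unsure what type a value has, the built-in `type()` function will tell you. The sketch below uses a made-up list called `samples` (so it does not overwrite any of the variables above) and collects the type name of each value.

```python
# Hypothetical sample values mirroring the ones defined above.
samples = [5, 3.2, 10e2, "Pero", True, None, [1, 2], {'jedan': 1}]

# type(v).__name__ gives the name of the type as a string.
type_names = []
for v in samples:
    type_names.append(type(v).__name__)
print(type_names)

# 10e2 is a float even though its value is a whole number.
print(10e2 == 1000.0)

# bool is a subclass of int, so True behaves like 1 in arithmetic.
print(isinstance(True, int))
print(True + 1)
```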


We access the values from stored variables in the following ways:

In [None]:
b  # similar to `print b`: the notebook displays the value of the last expression in a cell (the so-called `autoprint` feature)

In [None]:
x   # x is a list

In [None]:
x[2] # accessing the third element of the list (remember, first index is 0!)

In [None]:
d["dva"]  # accessing a dictionary value by key
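
A few more access patterns are worth knowing. The sketch below works on made-up copies named `items` and `lookup` (so the variables above stay untouched); the key `'cetiri'` is deliberately one that is not in the dictionary.

```python
# Hypothetical copies of the list and dictionary defined earlier.
items = [1, 2, True, "Ana", None, [1, 2, 3, 4]]
lookup = {'jedan': 1, 'dva': "dva", 'tri': [3, 3, 3]}

last_item = items[-1]                # negative indexes count from the end
nested = items[5][0]                 # index into the nested list
has_key = 'cetiri' in lookup         # membership test looks at the keys
fallback = lookup.get('cetiri', 0)   # .get returns a default instead of raising KeyError

print(last_item)
print(nested)
print(has_key)
print(fallback)
```

Asking for `lookup['cetiri']` directly would raise a `KeyError`, which is why `.get` is handy when a key may be missing.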

#### List management

Various ways for list creation:

In [None]:
x = list(range(10))                  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = list(range(1, 10, 2))            # [1, 3, 5, 7, 9]
w = 5 * [0]                          # [0, 0, 0, 0, 0]
v = [0.1 * p for p in range(5)]      # approximately [0.0, 0.1, 0.2, 0.3, 0.4] (floating-point rounding applies) - this is a so-called "list comprehension"
z = list()                           # empty list


# try printing out the above variables to see what they contain
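
Since list comprehensions show up a lot in data analysis code, here is a small sketch of how one relates to an ordinary loop. The variable names are chosen just for this illustration.

```python
# A list comprehension...
squares = [n * n for n in range(5)]

# ...is shorthand for this explicit loop.
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)

# An optional trailing `if` keeps only the elements that pass the test.
even_squares = [n * n for n in range(10) if n % 2 == 0]

print(squares)
print(squares == squares_loop)
print(even_squares)
```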


Cutting, merging and modifying lists:

In [None]:
# addition operator (+) is used for list merging (concatenating)
x = [1,2,3] + [4,5,6]
x 
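
The `+` operator always builds a new list. If you would rather grow an existing list in place, the list methods `extend()` and `append()` do that. A short sketch with made-up names `left` and `right`:

```python
left = [1, 2, 3]
right = [4, 5, 6]

merged = left + right      # new list; left and right are unchanged
left_before = left[:]      # remember what left looked like at this point

left.extend(right)         # in place: adds each element of right to left
right.append([7, 8])       # in place: adds ONE element, here a nested list

print(merged)
print(left_before)
print(left)
print(right)
```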

In [None]:
#    0  1  2  3  4  5  6  7  8  9              list indexes
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x[1:3]    # retrieve indexes from 1 (inclusive) to 3 (exclusive)

In [None]:
x[:-5]    # an omitted start index means "from the beginning"; a negative index counts from the end (-1 is the last element)
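
Slices also accept an optional third value, the step. The sketch below collects a few common slicing idioms on a made-up list `nums` that has the same contents as the list above.

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical copy of the list above

head = nums[:3]              # first three elements
tail = nums[-3:]             # last three elements
every_other = nums[::2]      # step of 2
backwards = nums[::-1]       # a negative step walks the list in reverse
clipped = nums[8:100]        # slice bounds past the end are clipped, no error

print(head)
print(tail)
print(every_other)
print(backwards)
print(clipped)
```

Note that slicing never modifies the list it is applied to; each slice is a new list.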

In [None]:
x[3:5] = [10, 20]       # modifying list elements
x

In [None]:
del x[:2]      # deleting list elements (del is a statement, so no parentheses are needed)
x

In [None]:
y = x[:]      # list copying; if we simply do y = x we will just get two references to the same object
y
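
The difference between a second reference and a copy is easiest to see by modifying both. In this sketch the names `original`, `alias` and `shallow` are made up for illustration. It also shows one caveat: a slice copy is shallow, so nested lists are still shared.

```python
original = [1, 2, [3, 4]]

alias = original         # a second name for the SAME list object
shallow = original[:]    # a new outer list (shallow copy)

alias.append(5)          # visible through `original` too
shallow.append(6)        # affects only the copy

same_object = alias is original        # True
separate_object = shallow is original  # False

# Shallow: the nested list [3, 4] is shared by original and shallow.
shallow[2].append(99)

print(same_object)
print(separate_object)
print(original)
print(shallow)
```

If you need the nested lists to be independent as well, the standard library function `copy.deepcopy()` is one way to get a fully separate copy.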

This is it for this quick Python refresher. If you want, you can try the additional exercises provided below.

<hr> <hr> <hr>

## <font color = "blue">Exercises</font>


In [None]:
# Python can sort lists. You can use the function sorted()
a = [12, 5, 20, 7, 99, 4, 8, 69]
print("Sorted list:" + str(sorted(a)))     # (str converts the sorted list to a string)

# but this doesn't change the list itself, instead it makes a sorted copy.

# If you want to modify the list so it becomes sorted, use the method sort() of the list itself.
# Call the method sort() and print out the results.







In [None]:
# Why not reverse the list with the method reverse()? Print out the result.




In [None]:
# Now remove the first and last element from the list using only one instruction. Print out the result.




In [None]:
# Check out the length of the list by using the function len()




In [None]:
# Is the number 99 still in the list? Check this without printing out the list itself. Use the operator 'in'





In [None]:
# Print out the smallest and largest number in this list by using functions min() and max()





In [None]:
# Finally, using conditional list comprehension, change all the even numbers from this list to 0
# you can use the following template as a guide

# a = [ _____ if _____ else ____ for i in a  ]





<hr><hr><hr>

## Additional Resources

[1] *The Python tutorial*, official Python documentation, https://docs.python.org/3/tutorial/ , last accessed 2016/08/26

[2] *Introducing IPython Notebook*, OpenTechSchool tutorial, http://opentechschool.github.io/python-data-intro/core/notebook.html , last accessed 2016/08/27

[3] *Problem Solving with Algorithms and Data Structures using Python*, Brad Miller and David Ranum, Luther College, http://interactivepython.org/runestone/static/pythonds/index.html , last accessed 2016/08/27