# Introduction to Python

This introduction to Python will focus on the first of two main topics: basic Python syntax.
The second will cover tools for data manipulation, analysis, and plotting.

The Python syntax we will cover is:
- loading libraries and modules,
- manipulating variables,
- printing information,
- opening and reading files,
- functions,
- for loops, and
- control statements (if statements).

IPython notebooks are organized into "cells." Each cell can have its own code and can be run independently and in any order (although cells are usually run top to bottom in a notebook). To run a cell and move to the next cell, press `Shift+Enter`. To run a cell and stay on that cell, press `Control+Enter`.

Questions to be discussed in groups are highlighted in <font color='green'>green</font>. If you don't understand a function that is used, try googling something like "python function-name".

## IPython Notebook
We'll be using this notebook format for the tutorial. The notebook is organized in a series of 'cells'. You can run each cell individually by selecting it with your mouse or arrow keys and hitting Shift+Enter.

This notebook will lead you through some basic Python syntax, and then you'll load some data at the end.

## Loading CSVs

Many times, data will come to us in a format called "CSV" or "comma-separated values." Generally, these files contain a "header" row with the column names, followed by a number of rows containing the data entries.

### Loading Libraries
We won't need many tools for this. We'll just use the 'csv' library that comes with Python. You can read about the functionality in the library and see some example code here:
https://docs.python.org/2/library/csv.html.
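To see what `csv.reader` produces before touching a real file, here is a minimal sketch (assuming Python 3) that feeds it a small, made-up CSV string through `io.StringIO`. The column names and values are hypothetical, not taken from `student_data.csv`.

```python
import csv
import io

# Hypothetical CSV text (not the real student_data.csv): a header row
# followed by two data rows. io.StringIO lets csv.reader read a string
# as if it were an open file.
sample_text = "name,score\nPat,0.9\nShelly,0.75\n"

reader = csv.reader(io.StringIO(sample_text))
sample_rows = list(reader)

print(sample_rows)  # [['name', 'score'], ['Pat', '0.9'], ['Shelly', '0.75']]
# Note that every field comes back as a string, including the numbers.
```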

In [None]:
# Import the csv library. Also, this is how you make a comment!
# Run me with `Shift+Enter`.
import csv

You can find out about a library by adding a question mark at the end and running the cell (Shift+Enter). You can close the window that pops up.

In [None]:
csv?
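The `?` suffix is an IPython feature. In plain Python (for example, a script run from a terminal), the built-in `help()` function prints similar documentation:

```python
import csv

# help() is built into Python; it prints the docstring of a module,
# class, or function. Try help(csv) to see the whole library.
help(csv.reader)
```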

### Variable Types and Assignment
Almost everything you will do when programming is some kind of 'data manipulation.' To manipulate data, we need to assign it to some variable.

For instance, we might need the name of a csv file so we can later open and read it.

In [None]:
file_name = 'student_data.csv'
print(file_name)
print(type(file_name))

Here we can see that the variable called 'file_name' has been set to the string 'student_data.csv'. In an assignment, the variable name on the left is given the value of the expression on the right.
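Because the right-hand side is evaluated first, the same variable can appear on both sides of an assignment. A short sketch with a hypothetical variable `count`:

```python
# The right-hand side (count + 1) is computed first using the old value,
# and the result is then stored back under the name count.
count = 10
count = count + 1
print(count)  # 11
```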

In addition to string data (text, stored as a sequence of characters), we can also have numerical data.

In [None]:
n_students = 10
print(n_students)
print(type(n_students))

In [None]:
passing_grade = .6
print(passing_grade)
print(type(passing_grade))

The 'int' type stands for integers and 'float' for real numbers (with decimals). One important difference between ints and floats is how division works. Try dividing different combinations of ints and floats.
<font color='green'>
1. What is the result of a float divided by a float?<br>
2. An int divided by an int?<br>
3. An int divided by a float?
</font>

In [None]:
# Do division here:


## Collections of Things
We often want to have collections of things. There are three common collections: lists, tuples, and dictionaries.
### Lists
Python lists are similar to the common notion of a list. They are an ordered collection of things that you can add to or take away from. The items in a list are often similar to each other.

In [None]:
vegetables = ['tomatoes', 'broccoli', 'carrots']
print(vegetables)

In [None]:
vegetables = vegetables + ['cabbage']
print(vegetables)
# What happens if you run this more than once?

In [None]:
# Shortcut syntax!
vegetables += ['kale']
print(vegetables)
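Besides `+` and `+=`, lists have methods that add or remove items in place. A short sketch using a separate, made-up list `fruits`, so the `vegetables` cells below are not disturbed:

```python
fruits = ['apple', 'banana']

# append() adds one item to the end of the list, changing it in place.
fruits.append('cherry')

# remove() deletes the first item equal to the given value.
fruits.remove('banana')

# Items are numbered starting from 0, so fruits[0] is the first item.
print(fruits[0])    # apple
print(len(fruits))  # 2
print(fruits)       # ['apple', 'cherry']
```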

### Tuples
Python tuples are similar to lists, but they are generally used when you have a fixed/known number of elements. Mathematical coordinates are a good example (x, y, z). They are also often used when the items are more heterogeneous (name, age, hobbies).

In [None]:
location = (.5, 4., 3.)
print(location)

In [None]:
person = ('Pat', 19, ['swimming', 'painting'])
print(person)
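Tuple elements are read with the same indexing as lists, and a tuple can also be "unpacked" into one variable per element. A sketch with a hypothetical `point`:

```python
point = (0.5, 4.0, 3.0)

# Indexing starts at 0, just like lists.
print(point[0])    # 0.5
print(len(point))  # 3

# Unpacking: each element is assigned to its own name in one step.
# The number of names must match the number of elements.
x, y, z = point
print(y)  # 4.0
```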

An important technical difference between lists and tuples is that the elements of a list can be changed, but the elements of a tuple cannot.

In [None]:
print(vegetables)
vegetables[0] = 'chicken'
print(vegetables)

In [None]:
print(person)
# Expect a TypeError here: tuple elements cannot be reassigned.
person[0] = 'chicken'

### Dictionaries
Dictionaries, like a language dictionary, are made up of pairs of things (like words and their definitions). The first item of each pair is called the key (the word) and the second is called the value (the definition). They are useful when you want to store and recall things by name.

In [None]:
students = {1: 'Paul',
            2: 'Shelly',
            3: 'Pat'}
print(students.keys())
print(students.values())
print(students[1])

In [None]:
# You can add more key-value pairs later
students[4] = 'Victoria'
print(students)
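The `students` dictionary uses numbers as keys, but to recall things by name the keys are often strings. A sketch with a hypothetical `grades` dictionary, also showing how to check for a key and how to look one up safely:

```python
# Hypothetical example: final grades stored by student name.
grades = {'Paul': 0.82, 'Shelly': 0.91}

print(grades['Shelly'])  # 0.91
print('Pat' in grades)   # False ('in' checks the keys, not the values)

# .get() returns a default value instead of raising a KeyError
# when the key is missing.
print(grades.get('Pat', 'no grade'))  # no grade

# Assigning to an existing key replaces its value.
grades['Paul'] = 0.85
print(grades)  # {'Paul': 0.85, 'Shelly': 0.91}
```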

<font color='green'>
Create a few lists, tuples and dictionaries of your own. Try manipulating them a bit.
</font>

In [None]:
# Try here:


## Loading Data from Files

Before we load it into Python, have a look at the "student_data.csv" file in a text editor. It's filled with fake student grade data.

<font color='green'>
1. How is the data formatted?<br>
2. How should the data be "read in"?
</font>

After you've answered these questions, run the following cell and look at what is printed. Then run the cell after it to see what the variable "rows" contains.

You can click on the white space to the left of the output to minimize or maximize it.

In [None]:
with open('student_data.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    rows = []
    for row in reader:
        print(row)
        rows.append(row)

In [None]:
rows
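Since the first row holds the column names, it is usually handled separately from the data rows. A sketch using hypothetical rows shaped like `csv.reader` output (the header names here are made up, not read from `student_data.csv`), with list "slices" to split them:

```python
# Hypothetical rows shaped like csv.reader output (header names made up).
example_rows = [
    ['name', 'mt1', 'mt2', 'hw', 'final'],
    ['Pat', '0.8', '0.9', '1.0', '0.7'],
    ['Shelly', '0.7', '0.6', '0.9', '0.8'],
]

header = example_rows[0]  # first row: the column names
data = example_rows[1:]   # slice: every row from index 1 to the end

print(header)
print(len(data))     # 2
print(data[0][1:])   # ['0.8', '0.9', '1.0', '0.7'] (the name is dropped)
```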

## For Loops

A common manipulation in programming is to loop through a bunch of data or lists and do something specific with each item. Here, we have a bunch of students' grade information. Let's say we want to calculate each student's final grade based on the weighting:
- MT1: 25%
- MT2: 25%
- HW: 20%
- Final: 30%

and print out their final score. Since we want to apply the same weighting many times, we'll write a function to do that for us.
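As a worked example of the weighting, take one hypothetical student's scores. The numbers are made up and written as fractions of 1.0 to match `passing_grade = .6`; the real file may use a different scale.

```python
import math

# Hypothetical scores for one student, each as a fraction of 1.0.
mt1, mt2, hw, final_exam = 0.8, 0.9, 1.0, 0.7

# Weighted sum: multiply each score by its weight, then add them up.
#   0.25*0.8 + 0.25*0.9 + 0.2*1.0 + 0.3*0.7
# = 0.2      + 0.225    + 0.2     + 0.21     = 0.835
weighted = 0.25 * mt1 + 0.25 * mt2 + 0.20 * hw + 0.30 * final_exam

# round() hides tiny floating-point errors in the printed value.
print(round(weighted, 3))  # 0.835

# The weights add up to 100%, so perfect scores would give 1.0.
print(math.isclose(0.25 + 0.25 + 0.20 + 0.30, 1.0))  # True
```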

In [None]:
# Function to take grade data and calculate final score
def final_grade(grade_data):
    # The csv reader loads every field as a string, so we need to turn them into numbers
    # Look up "list comprehension" in python to understand the next line
    float_data = [float(data) for data in grade_data]
    return .25*float_data[0]+.25*float_data[1]+.2*float_data[2]+.3*float_data[3]
    
# Look up "slices" in python to understand what [1:] syntax is doing.
for student in rows[1:]:
    print('Student: '+student[0]+', Grade: '+str(final_grade(student[1:])))
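The `final_grade` function hard-codes one index per weight. A sketch of one alternative (not the tutorial's own method) keeps the weights in a tuple and pairs them with the scores using the built-in `zip()`, so changing the weighting is a one-line edit. The names `WEIGHTS` and `weighted_grade` are hypothetical.

```python
# Weights in the same order as the grade columns: MT1, MT2, HW, Final.
WEIGHTS = (0.25, 0.25, 0.20, 0.30)

def weighted_grade(grade_data, weights=WEIGHTS):
    """Hypothetical alternative to final_grade that uses zip()."""
    float_data = [float(value) for value in grade_data]
    # zip() pairs each weight with its score; sum() adds up the products.
    return sum(w * g for w, g in zip(weights, float_data))

# Strings, as csv.reader would return them.
example = ['0.8', '0.9', '1.0', '0.7']
print(round(weighted_grade(example), 3))  # 0.835
```

One trade-off: `zip()` stops at the shorter input, so a row with a missing column would silently be scored on fewer items, whereas `final_grade` raises an `IndexError`.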

<font color='green'>
1. In addition to the final grade, can you add in the correctly weighted grade going into the final? Feel free to copy the 'final_grade' function and make a new version.<br>
2. What if you only want to list the grade information for 'n_students' students (defined above), starting at 'begin_student'? Try looking up the 'enumerate' or 'range' function in Python to get some ideas.
</font>

In [None]:
begin_student = 5
# Use this space to try answering the above questions:


## If Statements

Another common manipulation is to have different branches in code that behave differently depending on the data. This is commonly done with an "if statement." An example would be to print "S" if the student scores 60% or above and "U" if they do not.

In [None]:
for student in rows[1:]:
    grade = final_grade(student[1:])
    if grade >= passing_grade:
        letter = 'S'
    else:
        letter = 'U'
    print('Student: '+student[0]+', Grade: '+str(grade)+', '+letter)
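To check how the 60% boundary is handled, the same branch can be isolated in a small helper. `pass_fail` is a hypothetical name, not part of the tutorial:

```python
def pass_fail(grade, passing_grade=0.6):
    """Return 'S' if grade meets the cutoff, otherwise 'U'."""
    # >= means a grade exactly at the cutoff counts as passing.
    if grade >= passing_grade:
        return 'S'
    else:
        return 'U'

for g in [0.59, 0.6, 0.75]:
    print(g, pass_fail(g))
# 0.59 U
# 0.6 S
# 0.75 S
```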

<font color='green'>
1. Add in another category (S+) for students who get above 80%.<br>
2. Make a variable that keeps track of how many students are in each category and print the result at the end.
</font>