# Intro to Python


There are many reference books on Python. One option is: <br>
Python Crash Course: A Hands-On, Project-Based Introduction to Programming <br>
by Eric Matthes

## What is Python?

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. It supports modules and packages, encouraging modularity in design and code reuse. There are two main lineages of Python: 2.x (an older lineage that reached end of life in January 2020 but still appears in legacy code) and 3.x (the current, actively developed version). We'll be focusing on Python 3.7 in this course.

*Python is a language to automate everyday tasks.*

It also comes with an extensive set of libraries. We'll be using some of these libraries. 
                                                                        

## Working in Jupyter Notebook

Your Jupyter Notebook consists of a number of cells. When you create a new notebook,  it will have a single empty cell.

Cells have types - the most important of which for our purposes are `Markdown` (which is a simple formatting syntax -- these descriptive cells are written in Markdown), or `Code` (which is for Python 3.7 code).

You can change the type of a cell by using the dropdown control above.

You can enter code directly into a cell marked as type `Code` and run it by clicking on the Run button, by selecting `Cell->Run Cells` from the menu, or by typing `SHIFT` + `Return`.

If the cell outputs something, this will be visible right underneath it.  

Try running the following cell as follows: click anywhere on the cell, then click on the Run button in the toolbar. You'll see the output underneath the cell.

In [None]:
print("Hello World!")

You can also run the cells that only have text in them (like this one). Click anywhere on this cell and click on Run in the toolbar. You'll see that nothing will happen. Instead, the cell selection will move down. 

You'll also see `In [x]` to the left of the cell. The number indicates the order in which the interpreter has run the code cells in the notebook. If the interpreter is busy running a cell,
you will see `In [*]` there instead, and the number will replace the `*` character when the cell has completed running.

Remember, the shortcut for the Run button is `SHIFT` + `Return`. You can use that combination on this cell to run it.

A full list of hotkeys is available in the menu `Help->Keyboard Shortcuts`. This might help you get an overview of 
how you can interact with the notebook. 

## New Cells

You can add new cells via the `Insert` menu above. You can select `Insert Cell Above` or `Insert Cell Below`.  

You can also delete cells by selecting `Cut Cells` or `Delete Cells` from the `Edit` menu, or by selecting the cell and using the Scissors icon above.

Try inserting some new cells now below this cell. 

## Python Basics

Python is a dynamically typed language. You don't use an explicit type specifier for declaring variables. 
You don't need to use separators to mark the end of code lines.

In Python, indentation is really important. It replaces brackets to delimit blocks of code. You can use
either tabs or spaces to indent code, but you must be consistent or Python will complain! 

You cannot mix tabs and spaces for indentation in Python 3.  

The style guide for Python (https://www.python.org/dev/peps/pep-0008) suggests using 4 spaces per 
indentation level.

Comments in Python use the hash character, `#`.
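
As a small illustration of the points above, here is a minimal sketch that combines indentation and comments. The names `temperature` and `message` are made up for this example, and the `if`/`else` statement it uses is covered later in this notebook.

```python
# A comment starts with the hash character and runs to the end of the line.
temperature = 80  # hypothetical value, chosen only for this example

if temperature > 75:
    # This block is indented by 4 spaces, so it belongs to the "if" statement.
    message = 'warm'
else:
    message = 'cool'

# This line is not indented, so it runs after the if/else block has finished.
print(message)
```

Removing the indentation inside the `if` block would make Python raise an `IndentationError`.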

## Basic principle in coding (and in Python): 
    
1. Create variable(s), 
2. Assign value(s) to the variable(s)
3. Derive new variables or manipulate existing ones using the "libraries"

### Simple Example: value assignment to variables

Let's calculate the area of a rectangle using the `width` and `height` variables. First, let's assign values to these variables:

In [None]:
# Here, "width" and "height" are variables we generate on the fly.
# Then we assign them values using the equal sign.

width = 3
height = 4

In [None]:
# let's check the value assignment using the print() function:

print(width)

We don't always have to type "print" to display values. In a notebook, we can also put the variable name on the last line of a cell and run it to show the value assigned to it. Try running the code below to see the value of `height`.

In [None]:
height

### Simple Example: calculations

In [None]:
# Let's calculate the area 
area = width * height

# This displays the calculation
print(area)

### Built-in Python Functions

There are a lot of built-in functions in Python. We can use them to perform simple tasks.

In [None]:
# Let's use the "max()" function to find the maximum of a set of values:

max(3,4,5)

In [None]:
# Let's use the "round()" function to round a number
# In this example, we are rounding a number to two decimal places

round(3.456435, 2)
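
A few more built-in functions work the same way. The sketch below is only a small sample, and the variable names (`smallest`, `distance`, `whole`) are made up for illustration.

```python
# min() is the counterpart of max(): it returns the smallest value.
smallest = min(3, 4, 5)

# abs() returns the absolute value of a number.
distance = abs(-7)

# round() without a second argument rounds to the nearest whole number.
whole = round(3.456435)

print(smallest, distance, whole)
```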

### Custom function
You can also create your own "function" and *pass* values to it. This is better than hardcoding the width and height values into the code.

In [None]:
# This creates a function called "CalculateArea"
# The function takes two values and returns the multiplication of these values

def CalculateArea(height, width):
    return height * width

In [None]:
# This passes the value 3 to height, and the value 4 to width
# (arguments are matched to parameters in the order they are listed)

area = CalculateArea(3, 4)

# This displays the value of area
print (area)

Once you define a function, it stays available until you close the application. (It disappears after you close the notebook.) Now, try replacing the values 3 and 4 with two different values to calculate another area.


In [None]:
# Replace the values 4 and 5 with two other integer values and run the cell

area = CalculateArea(4, 5)

print(area)

In [None]:
#You can also send the parameters using keywords:

area = CalculateArea(height=10, width=4)

print(area)

In [None]:
#You can have default value for some parameters

def Volume (height, width, depth=10):
    return height * width * depth

Volume(2,3)

In [None]:
# OR override the default value

Volume(2,3,5)

### Basic Data Types

- float (numbers with decimals, e.g., 3.14)
- int (integer, e.g., 3)
- str (string or text, needs double or single quotes)
 - e.g., `"Hello World!"`
 - e.g., `'Hello World!'`
- bool (Boolean or True/False)

### String example:

String literals in Python can be enclosed in either matched single quotes (') or matched double quotes ("). You can use + to perform concatenation on strings.

In [None]:
# Create two strings
string1 = 'Hello'
string2 = "There"


In [None]:
# This code concatenates both strings using a space character in between
string3 = string1 + " " + string2

print(string3)

In [None]:
# You can easily change between upper/lowercase by calling the string "methods" upper() and lower()

print(string1.upper() + " " + string2.lower())

In [None]:
#You can also insert new line while printing

print(string1.upper() + "\n" + string2.lower())

In [None]:
#How do you use the quotation in a string: use the escape character \

print('This is Mary\'s book')

See https://docs.python.org/3.6/library/stdtypes.html#string-methods for more string methods!
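
To give a taste of what the linked page covers, here is a minimal sketch of a few other string methods. The example string and variable names are hypothetical.

```python
greeting = '  Hello There  '   # example text with extra spaces on both sides

# strip() removes leading and trailing whitespace
trimmed = greeting.strip()

# replace() swaps one piece of text for another
replaced = trimmed.replace('There', 'World')

# split() breaks a string into a list of pieces at the given separator
words = trimmed.split(' ')

# startswith() returns True or False
starts = trimmed.startswith('Hello')

print(trimmed)
print(replaced)
print(words)
print(starts)
```

None of these methods change the original string. Each one returns a new value, which is why the results are stored in new variables.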

### Number data type
You already saw the number data type. You can store numbers in variables.

In [None]:
# Create an age variable:
age = 34

print(age)

In [None]:
#Float is python's decimal numbers:

temp = 75.55

print(temp)

In [None]:
#Can't concatenate numbers with strings unless you change the number to string

print('My age is ' + str(age))

In [None]:
#Or, you can use formatting instead of using str

print('My age is {}'.format(age))

In [None]:
#You can also format several values at once; they fill the {} placeholders in order

print('My age is {}. And the temp is {}'.format(age, temp))
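
Python 3.6 and later also support "f-strings", which place the variable names directly inside the braces. The sketch below assumes the same `age` and `temp` values used above. The `rounded` variable is added only to illustrate a format specifier.

```python
age = 34
temp = 75.55

# The f prefix tells Python to fill in the expressions inside the braces.
message = f'My age is {age}. And the temp is {temp}'
print(message)

# A format specifier after a colon controls how a number is displayed.
# Here, .0f means "show zero digits after the decimal point".
rounded = f'The temp is about {temp:.0f}'
print(rounded)
```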

For more information on numbers: https://docs.python.org/3.6/library/stdtypes.html#numeric-types-int-float-complex

## More Advanced Data Structures

In Python, we usually have a lot of variables. For example, imagine having different "width" values. We would have to create multiple variables for each of them:

width1 = 3

width2 = 4

width3 = 8

(etc.)

It becomes difficult to create a new variable each time there is a new value. Instead, we can take advantage of a single variable that can store "multiple values". 

### Python Lists

Python supports a list type to store multiple values using a single variable name. 


In [None]:
#Let's create a list/array of people:

people = [ 'Mary', 'Kevin', 'Venki', 'Cyril', 'Jane', 'Pierre', 'Xin', 'Andre']

Python uses index numbers to keep track of the list entries.

The index numbers start from zero. For example, for the above list, **'Mary' is the 0th entry.**

Therefore, **'Kevin' is the 1st entry.** We need to use these index values to retrieve these values.

In [None]:
people[1]

We can also retrieve a slice of the list.

In this case, we use the following syntax: [start index : end index]

**Be careful: starting index is inclusive, ending index is exclusive**

If you omit the start index, the slice begins at the first item. If you omit the end index, the slice runs through the last item.

In [None]:
people[0:2]
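
The slicing rules above can be sketched with a few more cases. This example recreates the original `people` list so that it can be run on its own. The names `first_three`, `from_fifth`, and `last_two` are made up for illustration.

```python
people = ['Mary', 'Kevin', 'Venki', 'Cyril', 'Jane', 'Pierre', 'Xin', 'Andre']

# Start index omitted: everything before index 3
first_three = people[:3]

# End index omitted: everything from index 5 onward
from_fifth = people[5:]

# Negative indexes count from the end of the list
last_two = people[-2:]

print(first_three)
print(from_fifth)
print(last_two)
```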

In [None]:
#Let's add one more person to the list/array

people.append('Susan')

people

In [None]:
#Let's remove from the list/array:

people.remove('Kevin')

people

In [None]:
#Sometimes you need the length of a list (for example, to iterate over the items)

len(people)

In [None]:
#Retrieving the last item
people[-1]

There are many other things you can do with lists, but we will not be concerned with them in this class.

## Two-dimensional lists

We can also create two dimensional lists. You can think of these lists as tables with rows and columns.

In [None]:
# Let's create a two dimensional list

ages = [['Mary', 34],
        ['Kevin', 43],
        ['Venki', 51],
        ['Cyril', 22],
        ['Jane', 44],
        ['Pierre', 38],
        ['Xin', 31],
        ['Andre', 25]]

Here, 'Mary, 34' is considered row 0.

Therefore, row 1 is 'Kevin, 43'.

Similarly, the *name* column is considered col 0. So, the *age* column is col 1.

We need to use this indexing schema to retrieve values.

When retrieving values, we need the following format: `list[row index][column index]`

In [None]:
# Let's retrieve Mary. This will be row 0, col 0:

print(ages[0][0])

In [None]:
#Let's retrieve the row for Jane:

print(ages[4])
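
The same `list[row index][column index]` pattern retrieves any single value. Here is a minimal sketch that uses a shortened copy of the `ages` list. The variable names `kevin_age`, `row_count`, and `column_count` are made up for illustration.

```python
# A shortened copy of the two-dimensional list, so this cell runs on its own
ages = [['Mary', 34],
        ['Kevin', 43],
        ['Venki', 51]]

# Kevin's age is in row 1, col 1
kevin_age = ages[1][1]
print(kevin_age)

# len() on the outer list gives the number of rows;
# len() on one row gives the number of columns
row_count = len(ages)
column_count = len(ages[0])
print(row_count, column_count)
```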

## Dictionaries

Dictionaries are similar to lists, but instead of indexing by number, you index with a 'key'. The item that a key corresponds to is called a 'value'. Like lists, you can add, modify, and remove items from dictionaries.


In [None]:
# Let's create the two dimensional list in a dictionary format. 
# Be careful, we don't have any square brackets. Instead, we have curly braces and colon.

ages_dict = {'Mary': 34,
        'Kevin': 43,
        'Venki': 51,
        'Cyril': 22,
        'Jane': 44,
        'Pierre': 38,
        'Xin': 31,
        'Andre': 25}

In [None]:
# We can retrieve a "value" by searching for the "key" value. However, we use square brackets this time:

ages_dict['Mary']

In [None]:
# Let's print all "keys":

ages_dict.keys()

In [None]:
# Let's print all "values"

ages_dict.values()
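
Adding, modifying, and removing dictionary items can be sketched as follows. The example uses a shortened copy of `ages_dict`, and Susan's age is a made-up value.

```python
# A shortened copy of the dictionary, so this cell runs on its own
ages_dict = {'Mary': 34, 'Kevin': 43, 'Venki': 51}

# Add a new item by assigning to a key that does not exist yet
ages_dict['Susan'] = 29   # hypothetical age

# Modify an item by assigning to an existing key
ages_dict['Mary'] = 35

# Remove an item with the del statement
del ages_dict['Kevin']

print(ages_dict)
```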

## Loops in Python

`for` loops in Python iterate over a sequence, such as a list. (There are multiple ways to write `for` loops.) You can use the built-in `range()` function to generate a sequence of numbers. The range is inclusive of the starting number, but exclusive of the terminating number. For example:


In [None]:
# the variable "i" is usually used to "iterate" across a range:

for i in range(0,10):
    print(i)   
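
`range()` also accepts a single argument (the start defaults to 0) or a third argument for the step size. In this minimal sketch, `list()` is used only to make the generated numbers visible, and the variable names are made up.

```python
# With one argument, counting starts at 0 and stops before the given number
first_five = list(range(5))

# The third argument is the step size
evens = list(range(0, 10, 2))

# A negative step counts downward; the end value is still excluded
countdown = list(range(3, 0, -1))

print(first_five)
print(evens)
print(countdown)
```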

You can also create loops using `while`:

In [None]:
i = 0
while i < 10:
    print(i)
    i = i + 1 

# Be careful, here you need to increment i by 1 manually. 

In [None]:
#Iterate through lists/arrays
people = [ 'Mary', 'Kevin', 'Venki', 'Cyril', 'Jane', 'Pierre', 'Xin', 'Andre']

for person in people:
    print (person)

In [None]:
#The loop variable can have any valid name, including a single underscore
people = [ 'Mary', 'Kevin', 'Venki', 'Cyril', 'Jane', 'Pierre', 'Xin', 'Andre']

for _ in people:
    print (_)

In [None]:
#Iterate through dictionary items
ages_dict = {'Mary': 34,
        'Kevin': 43,
        'Venki': 51,
        'Cyril': 22,
        'Jane': 44,
        'Pierre': 38,
        'Xin': 31,
        'Andre': 25}


for _ in ages_dict.keys():
    print (_)

In [None]:
for _ in ages_dict.values():
    print (_)
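
To loop over keys and values together, dictionaries offer the `items()` method. A minimal sketch, using a shortened copy of `ages_dict`; the `lines` list exists only to collect the output for inspection.

```python
ages_dict = {'Mary': 34, 'Kevin': 43, 'Venki': 51}

lines = []
# items() yields one (key, value) pair per entry
for name, age in ages_dict.items():
    line = '{} is {} years old'.format(name, age)
    lines.append(line)
    print(line)
```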

**Loops are very useful when you need to perform the same task over and over again.** We will not be using loops in this class.

## IF conditions in Python

If conditions can help you check for truth values. (They are very similar to the IF functions you use in Excel.) If the condition is true, you can take a certain action, if not, you can take another action. 

Conditional statements are ended with a colon before the block of code they guard. For example:

In [None]:
# Let's set the variable "weather" to sunny
weather = 'sunny'


In [None]:
# This code will display the first statement if condition is True. 
# Otherwise, it displays the second statement. 

if weather == 'sunny':
    print('It is sunny!')
else:
    print('It could be rainy!')

In [None]:
#Checking multiple conditions: 
temp = 80
weather = 'sunny'


if (weather == 'sunny') and (temp > 75):
    print('It is humid!')
else:
    print('It could be dry!')

In [None]:
#Working with lists
people = [ 'Mary', 'Kevin', 'Venki', 'Cyril', 'Jane', 'Pierre', 'Xin', 'Andre']

if 'Mary' in people:
    print ('Mary is in the list')
else:
    print ('Mary is NOT in the list')
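
When there are more than two outcomes, `elif` (short for "else if") adds extra conditions between `if` and `else`. The thresholds and labels in this sketch are arbitrary values chosen for illustration.

```python
temp = 80

# Conditions are checked from top to bottom; only the first match runs
if temp > 85:
    label = 'hot'
elif temp > 65:
    label = 'warm'
else:
    label = 'cold'

print(label)
```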

## Methods
Python has "methods" which are functions specific to "objects". 

In fact, you have already used several methods for the list and dictionary objects.

Methods are usually appended at the end of objects. Here is the notation to use a method: `object.method()`

In [None]:
# Recall the ages dictionary array. Let's recreate it:

ages_dict = {'Mary': 34,
        'Kevin': 43,
        'Venki': 51,
        'Cyril': 22,
        'Jane': 44,
        'Pierre': 38,
        'Xin': 31,
        'Andre': 25}

In [None]:
# Let's try the keys() method of the dictionary object

ages_dict.keys()

The `keys()` method belongs to the dictionary object (here, "ages_dict" is a dictionary object). It won't work with data types that don't define it, such as strings.

In [None]:
# Try running this code.
# It will create an error, because a string object doesn't have a method called "keys()"

txt = 'Hello World!'

txt.keys()

In [None]:
# But a string object has a method called "upper()" to convert it to uppercase.
# Similarly, the dictionary object doesn't have the "upper()" method.

txt.upper()

So, then the question is: how would you know which method to use for each object? The Internet and the Python documentation will be your friend!
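
Python itself can also help. The built-in `dir()` function lists the names available on an object, and `hasattr()` checks for a single name. This is a minimal sketch; the variable names are made up, and the filtering line simply hides the internal names that start with an underscore.

```python
txt = 'Hello World!'

# dir() returns every attribute and method name of the object.
# Keep only the names that do not start with an underscore.
method_names = [name for name in dir(txt) if not name.startswith('_')]
print(method_names)

# hasattr() answers a yes/no question about one specific name
has_upper = hasattr(txt, 'upper')
has_keys = hasattr(txt, 'keys')
print(has_upper, has_keys)
```

In a notebook, running `help(str.upper)` in its own cell displays the documentation for a single method.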

## Classes

In [None]:
class car():
    
    # __init__ runs automatically whenever a new object is created.
    # self must be its first parameter; color and model become "attributes".
    def __init__ (self, color, model):
        self.color = color
        self.model = model
        
    #Let's create a "method" so it can "start":
    def start(self):
        print('You started the car')
    
    #Let's stop the engine:
    def stop(self):
        print('You killed the engine')

In [None]:
#Let's create a car object:

mycar = car('red', 'SUV')

In [None]:
#Let's start the car:

mycar.start()

In [None]:
#Let's stop the engine:

mycar.stop()

In [None]:
#Let's call the "color" attribute:

mycar.color
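
Methods can also read the attributes stored on `self`. The sketch below uses a separate, hypothetical `Truck` class with a made-up `describe()` method so that it does not interfere with the `car` class above. The capitalized class name follows the PEP 8 naming recommendation.

```python
class Truck:

    def __init__(self, color, model):
        self.color = color
        self.model = model

    # This method builds a sentence from the object's own attributes
    def describe(self):
        return 'This is a {} {}'.format(self.color, self.model)


mytruck = Truck('blue', 'pickup')
description = mytruck.describe()
print(description)
```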

Now, you can save this class as a .py file and later "import" into another file.

## Python Packages/Libraries

The default installation of Python comes with a lot of built-in objects, methods, and functions. But sometimes we need specialized packages (also called libraries) to perform special tasks. 

These packages/libraries are written by developers in the Python community. For example, there is a Matplotlib package that was developed to create nice charts and visualizations in Python. 

To use a package/library, we first install it from a trusted source and then import it into our code. Libraries in Python are imported using the `import` keyword. 

## NumPy Package

NumPy is the fundamental package for scientific computing with Python. It includes powerful arrays and many sophisticated functions. NumPy arrays are similar to the Python **lists** you have seen earlier, but they are generally more compact and faster to access. For data science, we typically prefer using NumPy arrays for performance reasons.

**Be careful, numpy wants the values to be of the same data type. It is designed for "homogeneous" tables - such as those that consist only of numbers. There shouldn't be any text in a table.**

After importing a library like NumPy, we access its objects/functions by prefixing them with the library's name (or the short alias we gave it).




In [None]:
# using this code, we import the numpy library and refer to it as np for short

import numpy as np  

In [None]:
# Let's create a one-dimensional numpy array:
myArray = np.array ([1,2,3,4,5,6,7,8])


In [None]:
# Let's print the same numpy array:

myArray

We can also create two-dimensional arrays. You can think of this as rows and columns again.

In [None]:
# Let's recreate the ages array we did earlier using a numpy array:
# Since values need to be homogeneous, let's use numbers for people names.
ages_array = np.array(
        [[0, 34],
        [1, 43],
        [2, 51],
        [3, 22],
        [4, 44],
        [5, 38],
        [6, 31],
        [7, 25]]
        )

In [None]:
print(ages_array)

In [None]:
# We can retrieve the values the same way we did before:
ages_array[0][0]

In [None]:
# This retrieves the first row
ages_array[0][:]

In [None]:
# Let's retrieve all ages. Here, colon means all rows, 1 means the ages column
ages_array[:,1]

In [None]:
# Let's find the mean age:
ages_array[:,1].mean()

In [None]:
# You can also retrieve the size or shape of an array:
# When you run this code, the first value corresponds to total number of rows
# The second value corresponds to total number of columns
ages_array.shape
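
Two of the points made about NumPy can be sketched briefly: calculations apply to every element at once, and an array holds a single data type. The arrays and variable names below are made up for illustration.

```python
import numpy as np

ages = np.array([34, 43, 51, 22])

# Arithmetic is applied to each element, without writing a loop
ages_next_year = ages + 1
print(ages_next_year)

# dtype shows the single data type shared by all values in the array
print(ages.dtype)

# If text is mixed in, NumPy converts every value to text
mixed = np.array([1, 2, 'three'])
print(mixed.dtype)
print(mixed)
```

The second array shows why text should be kept out of numeric tables: once the numbers have been converted to text, calculations such as `mean()` no longer work on them.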

## Matplotlib Package

The Matplotlib package helps you create charts and graphs. If you master this package, you can create very nice visualizations.

In [None]:
# First we need to import it and call it "plt" for short

import matplotlib.pyplot as plt

In [None]:
# Let's create two Python lists:
year = [2000, 2005, 2010, 2015]

GDP = [1.17, 2.43, 3.22, 3.02]

In [None]:
# Now, we need to tell Jupyter to "display" the charts inside the notebook as we create them
%matplotlib inline

In [None]:
# Now, create a line chart
plt.plot(year, GDP)

plt.show()

In [None]:
# We could also create a bar chart:
plt.bar(year, GDP)

plt.show()

The Pyplot library allows you to create highly customized charts. You can insert axis names, data labels, color-coded lines, and other enhancements that make your charts stand out.

In [None]:
# Let's add axis labels:
plt.xlabel('Year')
plt.ylabel('Gross Domestic Product')

# Let's add chart title:
plt.title('This is my chart')

# Let's fill in some color:
plt.fill_between(year, GDP, 0, color='red')

# Finally, show the chart:
plt.show()

## Pandas Package

The Pandas library is similar to NumPy, but it doesn't require a homogeneous table. Therefore, it is better suited to tables that mix data types, such as text and numbers. You'll be using this library in your assignments.

It is designed for analyzing tabular data, such as tables from SQL databases and Excel sheets, which has made it a popular library among data scientists. For more information about pandas visit the following link: http://pandas.pydata.org/pandas-docs/stable/

In [None]:
# Importing pandas

import pandas as pd

We'll be using the "DataFrame" data structure of Pandas. 

You can think of the "DataFrame" as a data type like a Python list, or dictionary. But it provides more features and functionalities.

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table.

Let's import the ages file - which is in CSV format. It looks like this in Excel: 
<img src='images\CSV.jpg' style="width: 200px;">

In [None]:
# Let's import the CSV file. It imports both the values and the column names. 
ages_df = pd.read_csv('example_dataframe.csv')

ages_df

Notice that when you create/import a dataframe, Pandas assigns an "index" to each row - if you don't define your own index. You can see the index as the leftmost column (the numbers that start with 0). This index is used to identify and retrieve the data. 
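
A DataFrame can also be built directly from a dictionary, which is handy for experimenting without a file. In this minimal sketch, the name `example_df` is made up, and the three rows are assumed sample values, not the actual contents of the CSV file.

```python
import pandas as pd

# Each dictionary key becomes a column name;
# each list holds the values for that column
example_df = pd.DataFrame({
    'Name': ['Mary', 'Kevin', 'Venki'],
    'Age': [34, 43, 51],
})

print(example_df)

# The automatically assigned index starts at 0
print(list(example_df.index))

# shape gives (number of rows, number of columns)
print(example_df.shape)
```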

In [None]:
# Let's display the first 5 rows of the dataframe:

ages_df.head()

In [None]:
# We can retrieve using columns names: 

ages_df['Name']

In [None]:
# Retrieve the age column:

ages_df['Age']

In [None]:
# Let's add a new column for "Spouse Age":
ages_df['SpouseAge'] = [31, 44, 50, 25, 48, 35, 29, 28]

ages_df

In [None]:
# Let's create a new calculated column by taking the age difference:
ages_df['AgeDiff'] = ages_df['Age'] - ages_df['SpouseAge']

ages_df

In [None]:
# Let's combine several things we learned:
# Let's create a new column based on whether the age difference is positive or negative.
# If it is negative, the column will have a value of "negative",
# if it is zero or positive, it will have a value of "zero or positive"

# First we need to create a custom function:

def NewColumn (inputdataframe):
    if (inputdataframe['AgeDiff'] < 0):
        return 'negative'
    else:
        return 'zero or positive'
    

# Now, we create a new column called "Sign" and apply the function on it:
ages_df['Sign'] = ages_df.apply(NewColumn, axis=1)

In [None]:
# Let's display the dataframe:

ages_df

In [None]:
# Let's retrieve Kevin's row. We need to use its index number, which is 1
ages_df.loc[1]

# Notice that the record is displayed vertically (transposed) for presentation purposes.

In [None]:
# If you want to retrieve only the age difference for Kevin:
ages_df['AgeDiff'].loc[1]
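
`.loc` also accepts a row label and a column name together, and a comparison on a column can be used to filter rows. This sketch builds a small stand-in DataFrame (`example_df`, a made-up name with assumed sample values) so that it runs without the CSV file.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Name': ['Mary', 'Kevin', 'Venki'],
    'Age': [34, 43, 51],
    'SpouseAge': [31, 44, 50],
})
example_df['AgeDiff'] = example_df['Age'] - example_df['SpouseAge']

# Row label first, then column name
kevin_diff = example_df.loc[1, 'AgeDiff']
print(kevin_diff)

# Keep only the rows where the age difference is negative
negatives = example_df[example_df['AgeDiff'] < 0]
print(negatives)
```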

You can do a lot more with Pandas. For example, its ability to read CSV files makes it well suited to analyzing large data sets with column names. We'll experiment with some of these features in the activity and deliverable.

# Jupyter Notebooks: Important Tips and Tricks

A Jupyter Notebook is a very useful tool for mixing documentation (in Markdown) and code (in our case, in Python) in a single online editable notebook. You can run the code, and evaluate its output live in the notebook.

When you execute code in a Jupyter Notebook, it actually runs in a `kernel` - that is, a separate process managed by your own Jupyter Notebooks server.  The output from this kernel is then rendered in your web browser.

The kernel's environment is built up as you run cells of code. Therefore, it is important in many of the labs to run the code blocks in the order they appear in the notebook. Some of these earlier blocks are importing libraries and setting up data structures and variables that will be referenced later in the lab. If you run the code out of order, you might see strange errors. You can always select `Kernel->Restart` to clear the slate if things go funny.

There is always the chance that something goes wrong. You can try the menu option `Kernel->Reconnect` to see if it is 
possible to connect again to the running kernel without interrupting computations (but some part of your output may 
already be lost).

If you find the Notebook is becoming fully unresponsive after waiting a significant period of time, you can click on the `Kernel` menu, and select `Interrupt` and try running that cell again. You can also select `Restart` but this will clear the kernel of any memory it had previously, so you will need to start running code from the start of the notebook again.

There is a chance that you will see occasional `MemoryError` exception messages in your notebooks. If you do, don't panic -- our advice is simply to select `Restart and Clear Output` under Kernel and try again.  

### Saving your notebook

Most of your changes are saved automatically to your notebooks. But you should click on the Save button frequently to make sure all changes are saved (I lost changes even in autosave mode).

### Timeouts

Your notebook server may shut down after a period of inactivity.

Your session is active as long as you are interacting with it. If you are inactive for a long time, you may see an error message on your notebook. However, your notebooks themselves are safe - you just have to restart Jupyter Notebooks to see them. 
