# Introduction to Python



### Simple Math
Let's get started by using Python as a calculator.

In [1]:
print(7 + 10)

17


In [2]:
# Addition 
print(5 + 5)

# Subtraction
print(5 - 5)

# Multiplication
print(3 * 5)

# Division 
print(10 / 2)

# Exponentiation
print(4 ** 2)

10
0
15
5.0
16


In [3]:
# Combining processes
print(4 * (2 ** 8))

1024


### Variables

A variable is a specific, case-sensitive name that serves as a placeholder for another value. Every time you use a variable name, Python replaces the variable with the actual value. Variables help make your code reproducible and flexible to changes.

Here, we will calculate body mass index (BMI) using BMI = weight(kg) / height(m)^2

In [4]:
# Weight in kg
weight = 68.2

# Height in meters
height = 1.8

# Check to see the value of each variable
weight
height

# Calculate BMI using variables
weight/(height ** 2)

# Saving expression to the variable name `BMI`
BMI = weight/(height ** 2)

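# Two ways to display the value of a variable.
# A bare name is echoed only in an interactive session (a notebook cell echoes only its last expression);
# print() always shows the value and is best practice for use in scripts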
BMI
print(BMI)

21.049382716049383


In [5]:
# Identify the type of the object stored as `BMI`
type(BMI)

float

### Data Types

Variables can store data of different types, and different types can do different things.

Python has several data types built-in by default:

* Single objects
    * **Text** - str
    * **Numeric** - int, float, complex
    * **Boolean** - bool
* Collections of data are described below

You can get the data type of any object by using the `type()` function.


In [6]:
# String - use single or double quotes
type("abc")

x = "This is a string"
type(x)
print(type(x))

<class 'str'>


In [7]:
# Float - decimal, real number
type(1.2)

float

In [8]:
# Integer - whole number
type(5)

int

In [9]:
# Booleans - logical values
# Useful to filter data
type(True)
type(False)

bool

In [10]:
# Combining data types that are the same

# Sum of integers
print(2 + 3)

# Sum of strings
print("ab" + "cd")

5
abcd


### Combining data

In Python, to combine data types that are not the same you need to explicitly convert the type to be the same. You can do this by using the following functions:

* `str()` - convert to string
* `int()` - convert to integer
* `float()` - convert to float
* `bool()` - convert to boolean
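
As a minimal sketch of the four conversion functions listed above (the variable names and values are illustrative and not part of the original notebook):

```python
# Illustrative values; the names below are assumptions for this sketch
count_text = "7"      # a string holding a digit
price = 3.99          # a float

# String to integer and string to float
count = int(count_text)
count_float = float(count_text)
print(count, type(count))
print(count_float, type(count_float))

# Float to integer drops the decimal part (it does not round)
whole_price = int(price)
print(whole_price)    # 3

# Integer to string, so it can be joined to other strings
label = "Total items: " + str(count)
print(label)          # Total items: 7

# bool() is False for 0 and for an empty string, True otherwise
print(bool(0), bool(""), bool(count), bool(count_text))
```

Note that `int()` on a float truncates toward zero; use `round()` when rounding is what you want.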

In [11]:
# Example of combining different data types

# Integer
two_var = 2
print(type(two_var))

# Wrong - tries to combine two different data types and results in error message
# print("I have " + two_var + " ducks")

# Correct - converts integer value to string before combining with other string values
print("I have " + str(two_var) + " ducks")

<class 'int'>
I have 2 ducks


In [12]:
# String examples

# Repeating a string using "multiplication"
print(("duck " * 2) + "goose")

# Converting integers to strings
print("I have " + str(2) + " ducks and " + str(1) + " goose")

duck duck goose
I have 2 ducks and 1 goose


In [13]:
# Boolean examples

# True = 1, False = 0
print(True + False)
print(True + False + 5)

# Any non-zero number is considered to be "True" and equal to 1
print(True + False + bool(5))
print(True + False + bool(0))

# Any non-empty string is considered to be "True" and equal to 1
print(True + False + bool("eight"))

1
6
2
1
2


In [14]:
# Integer examples
print(1 + 2)

# True = 1, False = 0
print(1 + 2 + True)

# An integer character can be converted from string to integer
print(1 + 2 + int("5"))

# Other characters cannot be converted from string to integers - results in error
#print(1 + 2 + int("five"))

3
4
8


### Lists

Lists are used to store multiple items in a single variable.

Python has four built-in data types for storing collections of data, each with different qualities and usage.

* **List** is a collection which is ordered and changeable. Allows duplicate members.
* **Tuple** is a collection which is ordered and unchangeable. Allows duplicate members.
* **Set** is a collection which is unordered and unindexed. Items can be added or removed, but an existing item cannot be changed in place. No duplicate members.
* **Dictionary** is a collection of key-value pairs which is ordered (by insertion, as of Python 3.7) and changeable. No duplicate keys.
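
The differences between the four collection types can be sketched as follows. The variable names and data are illustrative assumptions, not part of the original notebook:

```python
# Illustrative data; the names below are assumptions for this sketch
fruit_list = ["apple", "banana", "apple"]     # list: ordered, changeable, duplicates kept
fruit_tuple = ("apple", "banana", "apple")    # tuple: ordered, unchangeable, duplicates kept
fruit_set = {"apple", "banana", "apple"}      # set: unordered, duplicates dropped
fruit_dict = {"apple": 5, "banana": 10}       # dictionary: key-value pairs, unique keys

print(len(fruit_list))       # 3
print(len(fruit_tuple))      # 3
print(len(fruit_set))        # 2
print(fruit_dict["banana"])  # 10

# A list can be changed in place
fruit_list[0] = "cherry"
print(fruit_list)            # ['cherry', 'banana', 'apple']

# A tuple cannot; the assignment raises a TypeError
try:
    fruit_tuple[0] = "cherry"
except TypeError as error:
    print("TypeError:", error)
```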

***

#### Characteristics of Lists

* Lists are created using square brackets `[]`
* **Ordered**: list items are indexed, the first item has index `[0]`, the second item has index `[1]` and so on
* **Changeable**: list items can be changed, added, and removed after the list has been created
* **Duplicates**: lists can have items with the same value

In [15]:
# List of strings with duplicates
fruit = ["apple", "banana", "cherry", "apple", "cherry", "banana"]
print(fruit)
print(type(fruit))

['apple', 'banana', 'cherry', 'apple', 'cherry', 'banana']
<class 'list'>


In [16]:
# String list
list_str = ["apple", "banana", "cherry"]
print(list_str)

# Integer list
list_int = [1, 5, 7, 9, 3]
print(list_int)

# Boolean list
list_bool = [True, False, False]
print(list_bool)

['apple', 'banana', 'cherry']
[1, 5, 7, 9, 3]
[True, False, False]


In [17]:
# A list with different data types
list_combo = ["abc", 34, True, 40, "male"]
print(list_combo)

['abc', 34, True, 40, 'male']


In [18]:
# Find the length of the list using `len()`
print(len(list_combo))

5


In [19]:
# Fruit variables
apple = 5
banana = 10
cherry = 7

# List combined with strings and variables with integer values
fruits = ["Apples", apple, "Bananas", banana, "Cherries", cherry]
print(fruits)
print(type(fruits))

['Apples', 5, 'Bananas', 10, 'Cherries', 7]
<class 'list'>


In [20]:
# List of lists
fruit_lists = [["Apples", apple],
               ["Bananas", banana],
               ["Cherries", cherry]]
print(fruit_lists)
print(type(fruit_lists))

[['Apples', 5], ['Bananas', 10], ['Cherries', 7]]
<class 'list'>


### Subsetting Lists

**Zero-based indexing**

The first element in a list has index **0**, the second element has index **1**, and so on. You can specifically select elements from your list by using the index value.

Syntax to reference list and specific index value: `mylist[2]`

In [21]:
print(fruits)

['Apples', 5, 'Bananas', 10, 'Cherries', 7]


In [22]:
# Length of list
print(len(fruits))

6


In [23]:
# Subset FIRST item in the list with INDEX 0
fruits[0]

'Apples'

In [24]:
# Subset THIRD item in the list with INDEX 2
fruits[2]

'Bananas'

In [25]:
# Count backwards using negative indexes, starts at -1
# Subset LAST item in the list using INDEX -1
fruits[-1]

7

In [26]:
# Subset the SECOND (index 1) and LAST (index -1) values in the list
# Calculate the sum of the values
app_cher = fruits[1] + fruits[-1]
print(app_cher)

12


### Subsetting Lists

**Slicing**

You can select multiple elements from a list by selecting a range of elements. The result will be a new list of the chosen elements.

`my_list[start:end]`

* **start** index will be included
* **end** index will NOT be included

In [27]:
# Range [included : excluded]
# Slice the FIRST TWO items from the list (INDEX 0 and INDEX 1)
fruits[0:2]

['Apples', 5]

In [28]:
# Subset everything before and NOT including INDEX 4
# Leave the start position empty to indicate "everything"
fruits[:4]

['Apples', 5, 'Bananas', 10]

In [29]:
# Subset INDEX 4 to the end of the list
# Leave the end position empty to indicate "everything"
fruits[4:]

['Cherries', 7]

In [30]:
# Subset INDEX 2 up to and NOT including INDEX 4
fruits[2:4]

['Bananas', 10]

In [31]:
# Select ALL list elements explicitly
fruits[:]

['Apples', 5, 'Bananas', 10, 'Cherries', 7]

In [32]:
# Subset two values and calculate the sum
app_cher = fruits[1] + fruits[-1]
print(app_cher)

12


In [33]:
# Subset lists from lists
print(fruit_lists)

[['Apples', 5], ['Bananas', 10], ['Cherries', 7]]


In [34]:
# Length of lists
print(len(fruit_lists))

3


In [35]:
# Subset the THIRD list using INDEX 2
fruit_lists[2]

['Cherries', 7]

In [36]:
# Select an element from a list, within another list
# Subset the THIRD list (INDEX 2) and the FIRST element (INDEX 0)
fruit_lists[2][0]

'Cherries'

### Manipulating list elements

As stated before, lists are a changeable collection of values. This includes:

* Changing an element
* Adding an element
* Removing an element

In [37]:
print(fruits)

['Apples', 5, 'Bananas', 10, 'Cherries', 7]


In [38]:
# Changing a single element

# Single value replacements do not need brackets
# Replace the second element in the list with a different value
# Here, '5' is replaced by '3'
fruits[1] = 3
print(fruits)

['Apples', 3, 'Bananas', 10, 'Cherries', 7]


In [39]:
# Changing multiple elements

# Must use brackets to indicate a list is being provided
fruits[4:] = ["Oranges", 5]
print(fruits)

['Apples', 3, 'Bananas', 10, 'Oranges', 5]


In [40]:
# Adding an element

# New element must be in list form to add to another list
# Creates a new list and the original list is unchanged
print(fruits + ["Cherries"])
print(fruits)

['Apples', 3, 'Bananas', 10, 'Oranges', 5, 'Cherries']
['Apples', 3, 'Bananas', 10, 'Oranges', 5]


In [41]:
# The new list created by adding an element can be saved as new list with a new variable name
fruits_2 = fruits + ["Cherries", 8]
print(fruits_2)

['Apples', 3, 'Bananas', 10, 'Oranges', 5, 'Cherries', 8]


#### Deleting an element of a list

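* Deleting an element can be done using the `del` statement. It is written here as `del(...)`, but the parentheses are optional: `del my_list[0]` works the same way.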
* As an element is removed from a list, the indexes of the elements that come after the deleted element all change

In [42]:
# Removing elements using `del()`

# Subset the SECOND TO LAST and LAST element (INDEX -2 and INDEX -1)
# Use `:` to indicate "everything" 
print(fruits_2[-2:])

# Remove element
del(fruits_2[-2:])
print(fruits_2)

['Cherries', 8]
['Apples', 3, 'Bananas', 10, 'Oranges', 5]


In [43]:
# Another example
print(fruits_2[2:4])

# Removing elements
del(fruits_2[2:4])
print(fruits_2)

['Bananas', 10]
['Apples', 3, 'Oranges', 5]


In [44]:
# Removing multiple elements in series
# The index changes after each removal
print(fruits_2)
print(fruits_2[-1])

del(fruits_2[-1]); del(fruits_2[-1])
print(fruits_2)

['Apples', 3, 'Oranges', 5]
5
['Apples', 3]


### Making copies and the inner workings of python

* Assigning a list to a new variable name does not create a copy; it only makes a second reference to the original list
* Any change made through either name changes that one list
* To get an independent list, copy it explicitly so that the actual list, not the reference, is copied
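
One way to check whether two names refer to the same list is the `is` operator. The sketch below uses illustrative names (`original`, `alias`, `copied`, `nested`, `shallow`) that are not part of the original notebook:

```python
# Illustrative lists; the names below are assumptions for this sketch
original = [1, 2, 3]

alias = original        # a second name for the SAME list
copied = original[:]    # a new list with the same elements

# `is` checks whether two names refer to the same object
print(alias is original)    # True
print(copied is original)   # False
print(copied == original)   # True, the contents are equal

# A change through the alias shows up in the original, not in the copy
alias.append(4)
print(original)   # [1, 2, 3, 4]
print(copied)     # [1, 2, 3]

# [:] and list() copy only the outer list; inner lists are still shared
nested = [["Apples", 5], ["Bananas", 10]]
shallow = nested[:]
shallow[0][1] = 99
print(nested[0])  # ['Apples', 99]
```

For a list of lists, `copy.deepcopy()` from the standard library `copy` module makes a fully independent copy.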

In [45]:
print(fruits_2)

# Copying reference for an original list
fruits_copy = fruits_2
print(fruits_copy)

['Apples', 3]
['Apples', 3]


In [46]:
# Change in reference copy
# Result is a change in the original list 
del(fruits_copy[1])
print(fruits_copy)
print(fruits_2)

['Apples']
['Apples']


In [47]:
# New list
mylist = [1, 2, 3, 4, 5]
print(mylist)

[1, 2, 3, 4, 5]


In [48]:
# Copying original list explicitly with `[:]`
copy_1 = mylist[:]
print(copy_1)

[1, 2, 3, 4, 5]


In [49]:
# Remove last two elements
# Change in new list only
del(copy_1[-2:])
print(copy_1)
print(mylist)

[1, 2, 3]
[1, 2, 3, 4, 5]


In [50]:
# Copying original list explicitly with list()
copy_2 = list(mylist)
print(copy_2)
print(mylist)

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


In [51]:
# Remove first two elements
# Change in new list only
del(copy_2[0:2])
print(copy_2)
print(mylist)

[3, 4, 5]
[1, 2, 3, 4, 5]


### Functions

A function is a piece of reusable code that performs a specific task.

**Examples of functions**

* `str()`
* `bool()`
* `float()`
* `int()`
* `type()`
* `len()`

You can search online to find functions built in to Python that you may need. Try searching for "maximum value in a list in python" in your search engine.

Many functions can accept multiple arguments. These can also be found on the help page.
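
As a rough sketch of passing more than one argument, the example below calls a few built-in functions with arguments given by position and by name. The list `values` is illustrative:

```python
# Illustrative data; `values` is an assumption for this sketch
values = [2.71828, 3.14159, 1.41421]

# Positional arguments are matched by their order
print(round(values[0], 2))            # 2.72

# Keyword arguments are matched by their name
print(round(values[0], ndigits=3))    # 2.718

# max() and min() accept several values or a single list
print(max(4, 9, 2))                   # 9
print(min(values))                    # 1.41421

# sorted() takes optional keyword arguments such as `reverse`
print(sorted(values, reverse=True))   # [3.14159, 2.71828, 1.41421]
```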

In [52]:
# Find maximum value in a list
max(mylist)

5

In [53]:
# Use the function `round()` with default settings
round(1.2345)

1

In [54]:
# Finding help
help(round)
?round

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



Signature: round(number, ndigits=None)
Docstring:
Round a number to a given precision in decimal digits.

The return value is an integer if ndigits is omitted or None.  Otherwise
the return value has the same type as the number.  ndigits may be negative.
Type:      builtin_function_or_method


In [55]:
# Use the function `round()` with option `ndigits` settings
round(1.2345, 2)

1.23

In [56]:
# Use the function `sorted()`
help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.



In [57]:
# Make new lists
one = ["d", "e", "f"]
two = ["c", "b", "a"]

# Combine lists into a new list
three = one + two
print(three)

['d', 'e', 'f', 'c', 'b', 'a']


In [58]:
# Sort the new list
# sorted() returns a new sorted list; it does not change the original list
# To keep the sorted list, save it with a variable name
print(sorted(three))
print(three)

['a', 'b', 'c', 'd', 'e', 'f']
['d', 'e', 'f', 'c', 'b', 'a']


In [59]:
# Reverse sort the list
sorted(three, reverse = True)

['f', 'e', 'd', 'c', 'b', 'a']

### Methods

**Objects** are values and data structures in Python. Each object has methods associated with it, depending on the type of object.

**Methods** are functions that belong to Python objects. They are called with *dot notation*: `object.method()`. It is important to note that some methods change the object they are called on, and some don't.

* **String**
    * `capitalize()` - returns a copy of the string with the first character converted to uppercase and all other characters converted to lowercase
    * `replace()` - returns a copy of the string with all occurrences of the substring `old` replaced by `new`
    * `upper()` - returns an uppercase copy of the string
    * `index()` - returns the index of the first occurrence of the specified substring in the string
    * `count()` - returns the number of times the specified substring appears in the string
* **Integer**
    * `bit_length()` - returns the number of bits necessary to represent the integer in binary, excluding the sign and leading zeros
* **List**
    * `index()` - returns the index of the first occurrence of the specified element in the list
    * `count()` - returns the number of times the specified element appears in the list
    * `append()` - adds an item to the end of the list
    * `remove()` - removes the first matching element from the list
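
A short sketch of a few of the methods listed above, using illustrative values (`phrase`, `number`, and `birds` are assumptions, not part of the original notebook):

```python
# Illustrative values; the names below are assumptions for this sketch
phrase = "duck duck goose"

# String methods return new values; `phrase` itself is unchanged
print(phrase.replace("duck", "swan"))   # swan swan goose
print(phrase.count("duck"))             # 2
print(phrase.index("goose"))            # 10
print(phrase)                           # duck duck goose

# Integer method
number = 10
print(number.bit_length())              # 4, because 10 is 1010 in binary

# List methods
birds = ["duck", "duck", "goose"]
print(birds.count("duck"))              # 2
print(birds.index("goose"))             # 2
```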

In [60]:
# String methods example
a = "apple"
print(a)

apple


In [61]:
# Capitalize the provided string
a.capitalize()

'Apple'

In [62]:
# The original value is not changed
print(a)

apple


In [63]:
# Make all letters upper case
a.upper()

'APPLE'

In [64]:
# Count the number of times a value is present in string
a.count("p")

2

In [65]:
# Find the first index of a specific value
a.index("p")

1

In [66]:
a.index("e")

4

In [67]:
# List methods example
colors = ["red", "yellow"]
print(colors)

['red', 'yellow']


In [68]:
# Add an element to the list
# Change is made to original list
colors.append("blue")
print(colors)

['red', 'yellow', 'blue']


In [69]:
# Sort list alphabetically
# Change is made to original list
colors.sort()
print(colors)

['blue', 'red', 'yellow']


In [70]:
# Find index of specific value
colors.index("red")

1

In [71]:
# Remove an item from list
# Change is made to original list
colors.remove("yellow")
print(colors)

['blue', 'red']


### Packages

* A package is a directory of Python scripts
* Each script is a module that focuses on a specific task
* Modules define functions, methods, and types
* Thousands of packages are available
    * Numpy
    * Matplotlib
    * Scikit-learn
    * Pandas
* Anaconda comes with many common packages already installed
    * For non-Anaconda users: install specific packages using `pip`
        * https://pip.pypa.io/en/stable/installation/
* To use a package in Python, you must first `import` it
    * `import numpy`
    * `numpy.array([1, 2, 3])`
    * Making a nickname for imported package
        * `import numpy as np`
        * `np.array([1, 2, 3])`
    * Possible but not recommended: only import one function from a package
        * `from numpy import array`
        * `array([1, 2, 3])`
    * Possible but not recommended: import one function from a package with a nickname
        * `from numpy import array as my_array`
        * `my_array([1, 2, 3])`
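
The four import styles above can be tried without installing anything. The sketch below swaps in the standard library `math` module in place of Numpy; the nickname `square_root` is a hypothetical name chosen for illustration:

```python
# Import the whole module and use dot notation
import math
print(math.sqrt(16))        # 4.0

# Import the module under a shorter nickname
import math as m
print(m.sqrt(16))           # 4.0

# Import a single name from the module
from math import sqrt
print(sqrt(16))             # 4.0

# Import a single name under a nickname (hypothetical name)
from math import sqrt as square_root
print(square_root(16))      # 4.0
```

Importing the whole module keeps it clear where each name comes from, which is one reason the dot-notation forms are often preferred.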


In [72]:
# `pi` is a constant that holds the number pi; the name is not defined until `math` is imported
# pi

In [73]:
# `math` is the module that contains the constant `pi`
# To use `pi`, the math module must be imported into Python
import math

In [74]:
# Access a function or constant from a specific module using dot (.) notation
# module_name.name
math.pi

3.141592653589793

In [75]:
# Find the area of a circle
# Equation: Area = pi * r ** 2
area = math.pi * (0.5) ** 2
print(area)
print(round(area, 2))

0.7853981633974483
0.79


### Numpy

"Numerical Python"

- Non-Anaconda users: installation using terminal `pip install numpy`
- A Numpy array (a grid of data) is an alternative to the Python list. An array can contain only one type of data.
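
To illustrate the one-type rule, the sketch below builds arrays from mixed inputs and inspects the `dtype` attribute, which reports the type of the stored elements. The variable names are illustrative, and Numpy is assumed to be installed:

```python
import numpy as np

# Integers only: the array stores integers
int_array = np.array([1, 2, 3])
print(int_array.dtype)

# Integers mixed with a float: every element becomes a float
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers)          # [1.  2.5 3. ]
print(mixed_numbers.dtype)    # float64

# Numbers mixed with a string: every element becomes a string
mixed_text = np.array([1, "two", 3])
print(mixed_text)             # ['1' 'two' '3']
```

Unlike a Python list, which keeps each item's own type, Numpy converts the items to a single common type when the array is created.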

In [76]:
# Importing the package Numpy with the nickname "np"
import numpy as np

In [77]:
# Making Python lists (py_list) and Numpy arrays (np_arrays)
py_list = [10, 30, 50]
np_array = np.array([20, 40, 60])

In [78]:
# Determining type for python list
print(py_list)
type(py_list)

[10, 30, 50]


list

In [79]:
# Determining type for Numpy array
print(np_array)
type(np_array)

[20 40 60]


numpy.ndarray

In [80]:
# Combine Python lists
# Result: lists are concatenated
py_list + py_list

[10, 30, 50, 10, 30, 50]

In [81]:
# Sum of Numpy arrays
# Result: arrays are combined element-wise and the sums are calculated
np_array + np_array

array([ 40,  80, 120])

In [82]:
# Sum of list and array
# Result: the Python list is converted to an array and combined element-wise
py_list + np_array

array([ 30,  70, 110])

In [83]:
# Combine python list and array with concatenate
cat_array = np.concatenate((py_list, np_array))
print(type(cat_array))

<class 'numpy.ndarray'>


In [84]:
# Sort using array method
cat_array.sort()
print(type(cat_array))
print(cat_array)

<class 'numpy.ndarray'>
[10 20 30 40 50 60]


In [85]:
# Multiplying a single value with a Numpy array
# Element-wise calculation
triple_array = cat_array * 3
print(triple_array)

# New array sorted
triple_array.sort()
print(type(triple_array))

[ 30  60  90 120 150 180]
<class 'numpy.ndarray'>


In [86]:
# Subsetting for the FIRST element with INDEX 0
cat_array[0]

10

In [87]:
# Subsetting for the LAST element with INDEX -1
cat_array[-1]

60

In [88]:
# Subsetting for the FIRST element with INDEX 0
py_list[0]

10

In [89]:
# Subsetting for the LAST element with INDEX -1
py_list[-1]

50

In [90]:
# Retrieving array of booleans from condition (> 35)
cat_array > 35

array([False, False, False,  True,  True,  True])

In [91]:
# Compare array to boolean result
print(cat_array)

[10 20 30 40 50 60]


In [92]:
# Subsetting array with array of booleans
cat_array[cat_array > 35]

array([40, 50, 60])

In [93]:
# Python lists cannot be subset with conditions (> 25)
print(py_list)
# py_list > 25

[10, 30, 50]


In [94]:
# Python lists can be converted to Numpy arrays first
# Then the array can be subset with conditions (> 25)
py_array = np.array(py_list)
print(py_array)
print(type(py_array))

[10 30 50]
<class 'numpy.ndarray'>


In [95]:
# Retrieving array of booleans from condition (> 25)
py_array > 25

array([False,  True,  True])

In [96]:
# Subsetting array with array of booleans
py_array[py_array > 25]

array([30, 50])

### Numpy Arrays

So far we have seen that Numpy arrays look very similar to Python lists. When we use `type()` on an array, `<class 'numpy.ndarray'>` is printed to the screen. This means that the object is a Numpy N-dimensional array, indicating it can be multidimensional.

Furthermore, just as **methods** are associated with a type of object, **attributes** are also associated with the object's class, and they can provide more information about the structure of the data. To use an attribute, type the object name, a dot, and the attribute name, without parentheses: `my_array.shape`.
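
A minimal sketch of the difference between attributes and methods on an array, assuming Numpy is installed. The array `grid` is illustrative and not part of the original notebook:

```python
import numpy as np

# Illustrative array with 2 rows and 3 columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Attributes describe the array and take no parentheses
print(grid.shape)   # (2, 3)
print(grid.ndim)    # 2
print(grid.size)    # 6

# Methods perform an action and need parentheses
print(grid.sum())   # 21
print(grid.max())   # 6
```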

In [97]:
# Make a new list of temps from Orlando, Florida
# First element is day of month in January
# Second element is high temp in F
# 7 rows, 2 columns
temps = [[1, 79],
         [5, 77],
         [10, 79],
         [15, 72],
         [20, 80],
         [25, 55],
         [30, 59]]

In [98]:
# Check the new Python list (of lists)
print(temps)
print(type(temps))

[[1, 79], [5, 77], [10, 79], [15, 72], [20, 80], [25, 55], [30, 59]]
<class 'list'>


In [99]:
# Import Numpy if you haven't already
import numpy as np

In [100]:
# Convert the Python list "temps" to a Numpy array
np_temps = np.array(temps)
print(np_temps)

[[ 1 79]
 [ 5 77]
 [10 79]
 [15 72]
 [20 80]
 [25 55]
 [30 59]]


In [101]:
print(type(np_temps))

<class 'numpy.ndarray'>


In [102]:
# 7 rows, 2 columns
print(np_temps.shape)

(7, 2)


Subsetting a multidimensional array is similar to subsetting a list, except that both rows and columns are indicated.

`my_array[rows, columns]`
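
The `[rows, columns]` pattern also combines with the boolean subsetting shown earlier. The sketch below assumes Numpy is installed and uses a small illustrative array, `readings`, laid out as day of month and high temperature:

```python
import numpy as np

# Illustrative data: [day of month, high temperature in F]
readings = np.array([[1, 79],
                     [5, 77],
                     [10, 79],
                     [15, 72]])

# FIRST row (INDEX 0), ALL columns
print(readings[0, :])      # [ 1 79]

# FIRST TWO rows, SECOND column (INDEX 1)
print(readings[:2, 1])     # [79 77]

# Boolean array: which rows have a temperature above 75?
warm = readings[:, 1] > 75
print(warm)                # [ True  True  True False]

# Use the boolean array to pick rows, then take the day column (INDEX 0)
print(readings[warm, 0])   # [ 1  5 10]
```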

In [103]:
# Subset ALL rows and the SECOND column (INDEX 1), containing the temperatures only
# 
np_temps[:, 1]

array([79, 77, 79, 72, 80, 55, 59])

In [104]:
# Subset the temperature for the FIRST DAY of the month
# FIRST row (INDEX 0)
# SECOND column (INDEX 1)
np_temps[0, 1]

79

In [105]:
# Subset the temperature for the LAST recorded day of the month
# LAST row (INDEX -1)
# SECOND column (INDEX 1)
np_temps[-1, 1]

59

In [106]:
# Find the minimum value in the temperature column of the array
# ALL rows (:)
# SECOND column (INDEX 1)
min(np_temps[:, 1])

55

In [107]:
# Find the maximum value in the temperature column of the array
# ALL rows (:)
# SECOND column (INDEX 1)
max(np_temps[:, 1])

80

In [108]:
# Mean of temperatures using Numpy `mean()` function
# ALL rows (:)
# SECOND column (INDEX 1)
np.mean(np_temps[:, 1])

71.57142857142857

In [109]:
# Median of temperatures using Numpy `median()` function
# ALL rows (:)
# SECOND column (INDEX 1)
np.median(np_temps[:, 1])

77.0

In [110]:
# Standard deviation of temperatures using Numpy standard deviation `std()` function
# Rounding std output
# ALL rows (:)
# SECOND column (INDEX 1)
np.std(np_temps[:, 1])
round(np.std(np_temps[:, 1]))

10

### Summary 

* Python is an object-oriented language
* Everything is an object
    * integers, strings, lists, arrays, ...
* Objects, including the objects that are the result of an expression, can be stored as a variable
* Lists can be modified, contain duplicates, and may have different types of data
* Arrays can be generated using the Python package Numpy
    * Arithmetic operations are applied in an element-wise fashion
    * Subsetting multidimensional arrays requires both row and column indices