# 2.3 Introduction to Python

## 2.3.1 Getting Started

This notebook contains the Python lab work and questions found at the end of Chapter 2 of *An Introduction to Statistical Learning with Applications in Python*. Topics include the following:

- Introduction to Basic Python Commands
- Introduction to Supervised Learning Models

## 2.3.2 Basic Commands

The `print()` function outputs a text representation of all its arguments to the console.

In [2]:
print('fit a model with', 11, 'variables')

fit a model with 11 variables


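In IPython and Jupyter, we can append `?` to a function name to display its documentation. (In a plain Python session, `help(print)` shows similar information.)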

In [3]:
print?

[1;31mSignature:[0m [0mprint[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [0msep[0m[1;33m=[0m[1;34m' '[0m[1;33m,[0m [0mend[0m[1;33m=[0m[1;34m'\n'[0m[1;33m,[0m [0mfile[0m[1;33m=[0m[1;32mNone[0m[1;33m,[0m [0mflush[0m[1;33m=[0m[1;32mFalse[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
[1;31mType:[0m      builtin_function_or_method

We can add two numbers together:

In [4]:
5 + 3

8

We can concatenate strings using the `+` operator:

In [6]:
'hello' + ' ' + 'world'

'hello world'

A *string* is an example of a *sequence* in Python. There are three main kinds of sequences: *lists*, *tuples*, and *strings*.
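
All three kinds of sequence share common operations such as `len()` and indexing with square brackets (tuples, covered in more detail below, are written with parentheses). A small sketch with illustrative variable names; index `-1` refers to the last element:

```python
s = 'hello'
lst = [3, 4, 5]
tup = (3, 4, 5)

# len() and indexing work the same way on strings, lists, and tuples
for seq in (s, lst, tup):
    print(type(seq).__name__, len(seq), seq[0], seq[-1])
# str 5 h o
# list 3 3 5
# tuple 3 3 5
```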

**Lists** are created with square brackets `[]`.

In [9]:
x = [3, 4, 5]
x

[3, 4, 5]

Note that the `+` operator does not add elements in two lists pointwise, but rather concatenates the lists together!

In [10]:
y = [6, 7, 8]
x + y

[3, 4, 5, 6, 7, 8]
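
If we do want pointwise addition of two plain lists, one option (a sketch; the next section shows the `numpy` approach) is to pair up elements with the built-in `zip()` inside a list comprehension:

```python
x = [3, 4, 5]
y = [6, 7, 8]

concatenated = x + y                        # [3, 4, 5, 6, 7, 8]
pointwise = [a + b for a, b in zip(x, y)]   # [9, 11, 13]
print(concatenated)
print(pointwise)
```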

## 2.3.3 Introduction to Numerical Python

In Python, we can make use of function *libraries* or *packages*, which are collections of *modules* that are not necessarily included in the base Python distribution. The package we will use in this section is `numpy`, which is an abbreviation for *numerical Python*.

First we must `import` the package. Note that we alias the package as `np`.

In [11]:
import numpy as np

In `numpy`, an *array* is a generic term for a multidimensional set of numbers. We can use `np.array()` to define `x` and `y`, which are one-dimensional arrays (i.e., vectors).

In [12]:
x = np.array([3, 4, 5])
y = np.array([4, 9, 7])

When used on two `numpy` arrays, however, the `+` operator adds the elements of the arrays together pointwise:

In [13]:
x + y

array([ 7, 13, 12])
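
Other arithmetic operators behave the same way on arrays, and an operation between an array and a single number is applied to every element. A quick sketch reusing the values of `x` and `y` from above:

```python
import numpy as np

x = np.array([3, 4, 5])
y = np.array([4, 9, 7])

print(x - y)   # [-1 -5 -2]
print(x * y)   # [12 36 35]
print(x * 2)   # every element doubled: 6, 8, 10
```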

We can represent vectors using one-dimensional `numpy` arrays and matrices using two-dimensional `numpy` arrays. It is possible to represent a matrix using `np.matrix()`, but in this book we use `np.array()` instead: it is the standard choice, and it works with arrays of any number of dimensions ($n$-dimensional arrays), which will come in handy later.
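
One place where the choice matters is multiplication. For two-dimensional `np.array` objects, `*` multiplies entries pointwise, while the `@` operator performs matrix multiplication (`np.matrix`, by contrast, uses `*` for matrix multiplication). A short sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

elementwise = A * A   # entry by entry: [[1, 4], [9, 16]]
matrix_prod = A @ A   # matrix product: [[7, 10], [15, 22]]
print(elementwise)
print(matrix_prod)
```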

Next we create a matrix using `np.array()`:

In [14]:
x = np.array([[1,2], [3,4]])
x

array([[1, 2],
       [3, 4]])

In Python, objects (such as an array created with `np.array()`) have *attributes*, which are values associated with the object. We can access these using the syntax `<object>.<attribute>`, for example to get the number of dimensions of our array `x`:

In [15]:
x.ndim

2

We can also get the *data type* of `x` using `<object>.dtype`. In the following example, `int64` indicates that `x` contains 64-bit integers.

In [17]:
x.dtype

dtype('int64')

`numpy` automatically infers a data type that can represent all of the values we passed in. Since every entry of `x` is an integer, it chose its default integer type (not necessarily the smallest type that could hold these values). If we had initialized the array with any decimal values, the data type would change: the whole array would then contain 64-bit *floating-point numbers* (i.e., real-valued numbers).

In [19]:
np.array([[1,2], [3.0, 4]]).dtype

dtype('float64')
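
To check this inference rule, we can compare arrays built from different inputs. A sketch: `np.issubdtype` is used because the exact default integer width can differ between platforms (older NumPy versions on Windows, for example, defaulted to 32-bit integers).

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])
mixed = np.array([[1, 2], [3.0, 4]])

# All-integer input gets NumPy's default integer type (platform dependent)
print(np.issubdtype(ints.dtype, np.integer))   # True
# One float entry promotes every entry, including the integers, to float64
print(mixed.dtype)                             # float64
print(mixed[0, 0])                             # 1.0
```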

We can see the documentation of `np.array` using `?`:

In [21]:
np.array?

[1;31mDocstring:[0m
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    ``__array__`` method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array. If not given, NumPy will try to use
    a default ``dtype`` that can represent the values (by applying promotion
    rules when necessary.)
copy : bool, optional
    If ``True`` (default), then the array data is copied. If ``None``,
    a copy will only be made if ``__array__`` returns a copy, if obj is
    a nested sequence, or if a copy is needed to satisfy any of the other
    requirements (``dtype``, ``order``, etc.). Note that any copy of
    the data is shallow, i.e., for arrays with object dtype, the new
    array will po

We can also set the data type when the array is initialized by passing a `dtype` argument.

In [23]:
np.array([[1, 2], [3, 4]], dtype=float).dtype

dtype('float64')
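
Besides `float`, the `dtype` argument accepts NumPy type objects such as `np.int8`, and an existing array can be converted with the `astype()` method. A sketch (note that converting floats to integers this way drops the fractional part, truncating toward zero rather than rounding):

```python
import numpy as np

small = np.array([[1, 2], [3, 4]], dtype=np.int8)
print(small.dtype)            # int8

z = np.array([1.7, -2.2, 3.9])
z_int = z.astype(int)         # fractional parts are dropped
print(z_int)                  # values 1, -2, 3
```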

We can find the number of rows and columns using the `shape` attribute.

In [25]:
x.shape

(2, 2)
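
Because `x` here is square, its shape does not reveal which number counts rows. A sketch with a 2-by-3 array (the name `a` is just for illustration) makes the order clear and shows that the tuple can be unpacked; the related `size` attribute gives the total number of elements:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

n_rows, n_cols = a.shape   # shape is (rows, columns)
print(a.shape)             # (2, 3)
print(n_rows, n_cols)      # 2 3
print(a.size)              # 6
```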

A *method* is a function that is associated with an object. For instance, given a `np.array` called `x`, the expression `x.sum()` sums all the elements of the array.

In [27]:
x = np.array([1, 2, 3, 4])
print(x.sum())

10


Alternatively, the `numpy` package provides a `sum()` function to which we can pass the array; `np.sum(x)` accomplishes the same goal of adding the elements together.

In [30]:
print(np.sum(x))

10
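
For two-dimensional arrays, both forms also accept an `axis` argument: `axis=0` sums down each column and `axis=1` sums across each row. A short sketch:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())            # 21, all elements
print(m.sum(axis=0))      # column sums: 5, 7, 9
print(np.sum(m, axis=1))  # row sums: 6, 15
```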


We can use the `reshape()` method to reorganize the elements of an array into a new shape, given as a tuple of dimensions that we pass to the method.

In [31]:
x = np.array([1, 2, 3, 4, 5, 6])
print('beginning x:\n', x)
x_reshape = x.reshape((2,3))
print('reshaped x:\n', x_reshape)

beginning x:
 [1 2 3 4 5 6]
reshaped x:
 [[1 2 3]
 [4 5 6]]
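
The new shape must hold exactly as many elements as the original. One dimension may be given as `-1`, in which case NumPy infers it from the others. A sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy infer this dimension: 6 elements / 3 rows = 2 columns
inferred = x.reshape((3, -1))
print(inferred.shape)        # (3, 2)

# A shape with a different total number of elements is rejected
reshape_failed = False
try:
    x.reshape((4, 2))        # 4 * 2 = 8 slots, but x has only 6 elements
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)
```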


The output also shows that `reshape()` fills the new array one *row* at a time: by default `numpy` uses *row-major ordering*, as opposed to *column-major ordering*, which would fill it one column at a time.
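
`reshape()` also takes an `order` argument. The default, `order='C'`, fills the result row by row; `order='F'` (named after Fortran) fills it column by column, which is the column-major convention. A sketch comparing the two:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

row_major = x.reshape((2, 3), order='C')   # rows: [1, 2, 3] and [4, 5, 6]
col_major = x.reshape((2, 3), order='F')   # rows: [1, 3, 5] and [2, 4, 6]
print(row_major)
print(col_major)
```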

Python also uses 0-based indexing, meaning that index 0 refers to the first element of an array. For example, `x_reshape[0,0]` is the element in the first row and first column:

In [33]:
print(x_reshape[0,0])

1


Similarly, we can access the element in the second row and third column:


In [34]:
print(x_reshape[1,2])

6
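
Negative indices count back from the end, and a colon `:` selects an entire row or column. A sketch using the same values as `x_reshape`:

```python
import numpy as np

x_reshape = np.array([[1, 2, 3],
                      [4, 5, 6]])

print(x_reshape[1, -1])   # 6, the last column of the second row
print(x_reshape[0, :])    # the whole first row: 1, 2, 3
print(x_reshape[:, 1])    # the whole second column: 2, 5
```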


Note that if we modify the first element of `x_reshape`, then we also modify the first element of `x`!

In [35]:
print('x before we modify x_reshape:\n', x)
print('x_reshape before we modify x_reshape:\n', x_reshape)
x_reshape[0,0]=5
print('x_reshape after we modify its top left element:\n', x_reshape)
print('x after we modify top left element of x_reshape:\n', x)

x before we modify x_reshape:
 [1 2 3 4 5 6]
x_reshape before we modify x_reshape:
 [[1 2 3]
 [4 5 6]]
x_reshape after we modify its top left element:
 [[5 2 3]
 [4 5 6]]
x after we modify top left element of x_reshape:
 [5 2 3 4 5 6]


This happened because `reshape()` returns a *view* of `x`: `x_reshape` is a new array object, but it refers to the same underlying data in memory as `x` rather than to a copy of it. More generally, Python variables are *references* to objects in memory rather than the objects themselves, and a `numpy` view likewise refers to another array's data. So when we modify an element through `x_reshape`, we modify the shared data, and the change is visible through `x` as well.
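
We can confirm the shared data with `np.shares_memory()`, and avoid it by asking for an independent copy with the `copy()` method. A sketch (the variable names are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
view = x.reshape((2, 3))                  # a view: new array object, same data
independent = x.reshape((2, 3)).copy()    # an independent copy of the data

print(view is x)                          # False: two distinct array objects
print(np.shares_memory(x, view))          # True
print(np.shares_memory(x, independent))   # False

independent[0, 0] = 99                    # leaves x untouched
view[0, 0] = 5                            # changes x as well
print(x)                                  # [5 2 3 4 5 6]
```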

We've seen that we can modify the elements of an array (the same is true of lists), but what about tuples? Tuples are created with parentheses `()`.

In [36]:
my_tuple = (3, 4, 5)
my_tuple[0] = 2

TypeError: 'tuple' object does not support item assignment

Tuples are *immutable*, meaning that their elements cannot be changed after the tuple is created. Trying to change one raises an *exception*, or error; here, a `TypeError`.
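
An exception can be caught with `try`/`except` so that the program keeps running. To get a modified version of a tuple, one common approach is to build a new one, for example by converting to a list and back. A sketch:

```python
my_tuple = (3, 4, 5)

message = ''
try:
    my_tuple[0] = 2
except TypeError as err:
    message = str(err)   # the error is caught, so execution continues
print(message)           # 'tuple' object does not support item assignment

# To "change" a tuple, build a new one, e.g. via a list
as_list = list(my_tuple)
as_list[0] = 2
new_tuple = tuple(as_list)
print(new_tuple)         # (2, 4, 5)
print(my_tuple)          # (3, 4, 5), unchanged
```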

Some other handy attributes of arrays are:
- `shape`: the array's dimensions, as a tuple
- `ndim`: the number of dimensions
- `T`: the transpose of the array
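
A quick sketch of these attributes on a 2-by-3 array; note that the transpose swaps the shape to `(3, 2)`:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)     # (2, 3)
print(a.ndim)      # 2
print(a.T)         # rows become columns: [1 4], [2 5], [3 6]
print(a.T.shape)   # (3, 2)
```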