## What We Looked At Last Time
* We worked through a few exercises built around lists, dictionaries, and sets.
* We looked at using matplotlib's `FuncAnimation` class to create simple animations in Python.
* We re-introduced NumPy arrays and provided more detail on their functionality.

## What We'll Look At Today
* We'll wrap up our discussion of arrays.
* We'll introduce pandas and data _series_, which provide the cornerstone for many practical data operations in Python.

## Indexing an Array
* One-dimensional `array`s can be **indexed** like lists.
* `array`s with two or more dimensions are handled a bit differently.
    * To select an element in a two-dimensional `array`, specify a tuple containing the element’s row and column indices in **square brackets**.
    * To select a single row, specify only one index in square brackets. 
    * To select a single column, use square brackets and a colon (:) for the row dimension, and the appropriate index for the column following a comma (,).

In [None]:
import numpy as np
#grades of 4 students on 3 different exams
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])


In [None]:
print(grades)
print(grades[0,1]) #Student 0, exam 1

In [None]:
print(grades[1,:]) #All of student 1's exam grades (same as grades[1,0:3])

In [None]:
print(grades[:,2]) #All grades for exam 2
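
The indexing rules above also determine the shape of what comes back. The sketch below re-creates the `grades` array so it runs on its own; the names `row`, `column`, and `column_2d` are illustrative only and are not used elsewhere in these notes.

```python
import numpy as np

# Same data as the grades array above: 4 students, 3 exams
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

row = grades[1]             # one index selects a whole row (same as grades[1, :])
column = grades[:, 2]       # colon for the rows, index for the column
column_2d = grades[:, 2:3]  # slicing the column keeps two dimensions

print(row.shape)        # (3,)
print(column.shape)     # (4,)
print(column_2d.shape)  # (4, 1)
```

Selecting a column with a single index gives a one-dimensional result, while selecting it with a one-element slice keeps it as a column of a two-dimensional array.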

## Multiple Rows/Columns and Slicing
* Multiple sequential rows/columns can be selected using slice notation
* Multiple non-sequential rows/columns can be selected using a list of indices
* It's possible to select a subset of rows and subset of columns simultaneously
* As with slicing a list, the end index of a slice is **not included** in the selected range

In [None]:
print(grades[0:2,:]) #Grades for students 0 and 1

In [None]:
print(grades[:, [0, 2]]) #Test grades for exams 0 and 2 only

In [None]:
print(grades[[0,3],1:3]) #Student 0 and 3's grades on exams 1 and 2


# Views: Shallow Copies
* Views “see” the data in other objects, rather than having their own copies of the data
* Views are technically shallow copies of array objects
* `array` method `view` returns a new array object with a view of the original `array` object’s data
* Changing any actual data within one array will affect the other array's data.

In [None]:
numbers = np.arange(0, 5)
numbersview = numbers.view()
print(numbers)
print(numbersview)

In [None]:
print(id(numbers))
print(id(numbersview)) #even though this array has the same elements, it's a different object

In [None]:
numbers[1] = 10 #This impacts both arrays' data
numbersview[3] = 30 #And so does this!
print(numbers)
print(numbersview)

## Views, Slices and Subsets
* Slices create views (shallow copies) of arrays.
* But indexing with a list of indices actually creates **new array data** (an independent copy).
* With multiple dimensions, if only slices are used, then a view is created, but if even _one_ dimension uses a list of indices, an independent copy is created.

In [None]:
numbers = np.arange(0, 5)
numbersmod1 = numbers[1:4] #This variable references elements 1-3
numbersmod2 = numbers[[0,4]] #This variable holds DUPLICATES of elements 0 and 4
print(numbersmod1)
print(numbersmod2)


In [None]:
#Confirming that all three arrays are technically different objects
print(id(numbers))
print(id(numbersmod1))
print(id(numbersmod2))


In [None]:
numbersmod1[2] = 99
numbersmod2[1] = 100
print(numbersmod1)
print(numbersmod2)
print(numbers) #Note what has changed and what stays the same!
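
Comparing `id` values only shows that the arrays are distinct objects; it says nothing about whether they share data. One way to check sharing directly, sketched here with the illustrative names `sliced` and `picked`, is NumPy's `np.shares_memory` function together with the `base` attribute.

```python
import numpy as np

numbers = np.arange(0, 5)
sliced = numbers[1:4]     # slice: a view of numbers
picked = numbers[[0, 4]]  # list of indices: a new, independent array

# shares_memory reports whether two arrays are backed by the same data
print(np.shares_memory(numbers, sliced))  # True
print(np.shares_memory(numbers, picked))  # False

# A view records the array it was taken from in its base attribute
print(sliced.base is numbers)             # True
```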

# Deep Copies
* When sharing **mutable** values, it’s often necessary to create a **deep copy** of the original data
* This is especially important in multi-core programming, where different transformations are applied on the original data concurrently (if the copies aren't deep, then multiple transformations may be applied in an arbitrary order on the same data!)
* The `array` method `copy` returns a new array object with an independent copy of the original array's data.

In [None]:
mnumbers =  np.array([[1, 2, 3], [2, 4, 6]])
mnumberscopy1 = mnumbers.copy() #A deep copy
mnumberscopy2 = mnumbers[[0,1],0:3] #Also a deep copy (because we used list notation)


In [None]:
mnumbers[0,0]*=10
mnumberscopy1[0,2]*=10
mnumberscopy2[1,1]*=10
print(mnumbers)
print(mnumberscopy1)
print(mnumberscopy2)


# Reshaping and Transposing 

### `reshape` vs. `resize` 
* Method `reshape` returns a _view_ of the original `array` with new dimensions whenever the memory layout allows it (otherwise it returns a copy)
* `reshape` does _not_ modify the original `array`'s shape (but again, remember that a view is a shallow copy, so changes to the data of one will affect the other)
* Method `resize` modifies the original `array`’s shape in-place.

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90]])
grades_rs = grades.reshape(3,2)
grades_rs[1,0] = 1000 #This change will still be seen in the original grades array!
print(grades_rs)
print(grades)


In [None]:
grades.resize(1, 6)
print(grades)
grades[0,0]=500
print(grades_rs)
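
`reshape` only returns a view when the existing data can be reinterpreted without moving it. The sketch below, using the illustrative names `a`, `as_view`, and `as_copy`, shows one case of each; a transposed array is a common situation where a copy is needed.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

as_view = a.reshape(3, 2)  # contiguous data: reshape can return a view
as_copy = a.T.reshape(6)   # transposed data is not contiguous: a copy is made

print(np.shares_memory(a, as_view))  # True
print(np.shares_memory(a, as_copy))  # False
```

When it matters whether the result shares data, checking with `np.shares_memory` is safer than assuming.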

### `flatten` vs. `ravel` 
* Can flatten a multi-dimensional array into a single dimension with methods **`flatten`** and **`ravel`**
    * From most perspectives, these methods are identical.
    * **However**, `flatten` _deep copies_ the original array’s data
    * Method `ravel` produces a _view_ of the original `array` whenever possible, so the result _shares_ the original `array`’s data

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90]])
flattened = grades.flatten()
print(grades)
print(flattened)

In [None]:
grades[0,1] = 1000
flattened[5] = 2000
print(grades) #Changes to flattened do not impact grades
print(flattened) #and vice-versa

In [None]:
#Trying the same thing with ravel shows that the copy is shallow
grades = np.array([[87, 96, 70], [100, 87, 90]])
raveled = grades.ravel()
grades[0,1] = 200
raveled[5] = 300
print(grades) 
print(raveled)

### Transposing Rows and Columns
* Can quickly **transpose** an `array`’s rows and columns
    * “flips” the `array`, so the rows become the columns and the columns become the rows
    * **`T` attribute** (not method!) returns a transposed _view_ (shallow copy) of the `array`

In [None]:
grades_t = grades.T
grades_t[1,1] = 400
print(grades)
print() 
print(grades_t)

### Horizontal and Vertical Stacking
* We can combine arrays by adding more columns or more rows. These operations are known as _horizontal stacking_ and _vertical stacking_, respectively.
* Use `np.hstack` and a _tuple_ of array objects to combine them in sequential, "left-to-right" order.
* Use `np.vstack` and a _tuple_ of array objects to combine them in sequential, "top-to-bottom" order. 

In [None]:
chars1 = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])
chars2 = np.array([['g', 'h', 'i', 'j'], ['k', 'l', 'm', 'n']])
chars3 = np.array(['o','p','q'])
print(chars1)
print()
print(chars2)
print()
print(chars3)


In [None]:
print(np.hstack((chars1,chars2)))
print()
print(np.vstack((chars3,chars1)))


In [None]:
print(np.hstack((chars1,chars3))) #Raises ValueError: dimensions are incompatible (2-D vs. 1-D)

In [None]:
print(np.vstack((chars1,chars2))) #Also raises ValueError: 3 columns vs. 4 columns
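
The two failing cells above come down to a shape rule: for two-dimensional arrays, `np.hstack` needs matching row counts and `np.vstack` needs matching column counts. A minimal sketch that checks shapes and catches the error instead of stopping the notebook (the names `wide` and `stacked_ok` are illustrative):

```python
import numpy as np

chars1 = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])            # shape (2, 3)
chars2 = np.array([['g', 'h', 'i', 'j'], ['k', 'l', 'm', 'n']])  # shape (2, 4)

print(chars1.shape, chars2.shape)

# Both arrays have 2 rows, so horizontal stacking works
wide = np.hstack((chars1, chars2))
print(wide.shape)  # (2, 7)

# 3 columns vs. 4 columns, so vertical stacking fails
try:
    np.vstack((chars1, chars2))
    stacked_ok = True
except ValueError as err:
    stacked_ok = False
    print("vstack failed:", err)
```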

# Intro to Data Science: `pandas` `Series` and `DataFrame`s
* NumPy’s `array` is optimized for homogeneous numeric data that’s accessed via integer indices
* Big data applications must support mixed data types, customized indexing, missing data, data that’s not structured consistently and data that needs to be manipulated into forms appropriate for the databases and data analysis packages you use
* **pandas** is the most popular library for dealing with such data
* Two key collection types exist in pandas:
    * **`Series`** for one-dimensional collections 
    * **`DataFrame`s** for two-dimensional collections

* NumPy and pandas are closely related and share compatibility:
    * `Series` and `DataFrame`s use `array`s “under the hood” 
    * `Series` and `DataFrame`s are valid arguments to many NumPy operations
    * `array`s are valid arguments to many `Series` and `DataFrame` operations
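
A brief sketch of this two-way compatibility, using made-up values and the illustrative names `temps`, `roots`, `as_array`, and `total`:

```python
import numpy as np
import pandas as pd

temps = pd.Series([4.0, 9.0, 16.0])

roots = np.sqrt(temps)          # a NumPy function applied to a Series
print(type(roots).__name__)     # Series

as_array = temps.to_numpy()     # the underlying values as a NumPy array
print(type(as_array).__name__)  # ndarray

total = temps + np.array([1.0, 1.0, 1.0])  # an array used in a Series operation
print(total.tolist())           # [5.0, 10.0, 17.0]
```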

## pandas `Series` 
* An enhanced one-dimensional `array`
* Supports custom indexing, including even non-integer indices like strings
* Offers additional capabilities that make them more convenient for many data-science oriented tasks
    * `Series` may have missing data
    * Many `Series` operations ignore missing data by default
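
As a small preview of these features, the sketch below builds a `Series` with string labels and one missing value. The names and scores are invented for illustration; `index=` is the constructor parameter that supplies custom labels.

```python
import numpy as np
import pandas as pd

# np.nan marks the missing score
scores = pd.Series([86.0, np.nan, 94.0], index=['Ana', 'Ben', 'Cy'])

print(scores['Cy'])    # 94.0 (lookup by string label)
print(scores.count())  # 2 (missing values are not counted)
print(scores.mean())   # 90.0 (the missing value is ignored)
```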

### Starting Small: a basic pandas  `Series`
* The `pd.Series` constructor can be used to create a series from any iterable
* Different underlying iterables can result in different datatypes for the series.
* By default, a series has integer indices numbered sequentially from 0
* Many basic operations available to lists or NumPy arrays can be applied to series.

In [None]:
import numpy as np
import pandas as pd
gradesList = [86, 91, 94, 89]
gradesSeries1 = pd.Series(gradesList)
print(gradesSeries1)

In [None]:
gradesArray = np.array(gradesList)
gradesSeries2 = pd.Series(gradesArray)
print(gradesSeries2) #the dtype may differ from gradesSeries1 (e.g., int32 vs. int64) on some platforms and NumPy versions


### Assignment and Copies with Series
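* Simple assignment does not copy anything: it binds a second name to the same series object, so a change made through either name is visible through both.
* Using the `copy` method will create a deep copy.
* Other series assignment rules largely match those of standard NumPy arrays
    * Using slices has traditionally returned a view; with pandas' Copy-on-Write mode enabled, changes made through a slice no longer reach the original.
    * Using list notation in indexing returns a deep copy.
    * Slices and list-based selections keep the index labels of the original series, so the result's labels may not start at 0 (be careful with these operations). 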


In [None]:
gsCopy1 = gradesSeries1 #No copy: a second name for the same series object
gsCopy2 = gradesSeries1.copy() #Deep copy
gsCopy1[0] = 100
gsCopy2[1] = 100
print(gradesSeries1)
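
The `is` operator makes the difference between plain assignment and `copy` explicit. This is a self-contained sketch with the illustrative names `original`, `alias`, and `duplicate`:

```python
import pandas as pd

original = pd.Series([86, 91, 94, 89])
alias = original             # no copy: a second name for the same object
duplicate = original.copy()  # independent data

alias[0] = 100      # visible through original as well
duplicate[1] = 100  # affects only the copy

print(alias is original)      # True
print(duplicate is original)  # False
print(original.tolist())      # [100, 91, 94, 89]
```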

### More On Datatypes in Series 
* As mentioned previously, data series are built on the NumPy library, and thus they share most of the same data types.
* The method `astype` can be used to _cast_ a series from one compatible data type to another.
* Note that series with mixed datatypes are permitted, but will be listed as being of type "object" and operations applied may have unanticipated outcomes

In [None]:
gradesSeriesAlt = gradesSeries1.astype('int16')
print(gradesSeriesAlt)
print()
print(gradesSeriesAlt>=90)

In [None]:
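gradesSeriesAlt[2]="One-hundred" #Older pandas versions silently convert the series to type "object"; recent versions warn about, or may reject, this assignment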
print(gradesSeriesAlt)


In [None]:
print(gradesSeriesAlt<100) #If the string was stored, this raises a TypeError (a str cannot be compared with an int)