<h1> Useful Packages for Data Science in Python </h1>

<p>
This walkthrough covers some of the most common Python packages for data scientists, chosen in part because they interface well with one another. It also briefly covers the trickier and more useful parts of the Python language itself.
</p>

<h2> Class Objects </h2>

<p>
Much of Python's syntax follows from its being an object-oriented language. Object-oriented programming lets you associate functions with a class, and with every instance of that class, in the form of **methods**. It also lets you bundle several Python objects together by making them **attributes** of a class or of an instance. You do not need to know how to write classes to use Python, though doing so can save a great deal of time. More importantly, you will need to understand how classes work in order to read most Python modules and Python documentation.
</p>

In [179]:
## Define a new class
class test_class:
    ## Classes usually define an __init__ method, which runs whenever a new instance is created; it is optional, as
    ## are all other methods. Note that all variables and functions are currently being defined in the scope of the
    ## class, and not the global scope. Lastly, note that attributes/methods of an instance that are to be accessed
    ## outside the scope of the class must be defined using "self" syntax.
    def __init__(self, value, name, personify=True):
        self.value = value
        if personify:
            self.name = name
        ## Can call other class methods from within a class method
        self.here_I_am()
    def here_I_am(self):
        print(self.value)

## Instantiate a new instance of this class
test_instance = test_class(5.0, 'Hans')

## Retroactively assign attribute to test_class class
setattr(test_class, 'treble', True)
## Note that the "treble" attribute can be accessed as an attribute of both the test_class class generally, and any
## object of the test_class class initialized hereafter.
print(test_class.treble)

print(test_class(8.0, 'Charles').treble)

5.0
True
8.0
True
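
<p>
The difference between attributes that belong to the class and attributes that belong to one instance can be sketched as follows. This is a minimal illustration only; the class name Sample and its attributes units and mass are hypothetical and do not appear elsewhere in this walkthrough.
</p>

```python
class Sample:
    """Hypothetical class used only for illustration."""
    ## Class attribute: shared by every instance of the class
    units = 'mg'

    def __init__(self, mass):
        ## Instance attribute: unique to each instance
        self.mass = mass

    def describe(self):
        ## Methods reach attributes through "self"
        return str(self.mass) + ' ' + self.units

first = Sample(2.5)
second = Sample(4.0)
print(first.describe())
print(second.describe())

## Changing the class attribute affects every instance that has not overridden it
Sample.units = 'g'
print(first.describe())

## Assigning through an instance creates an instance attribute that shadows the class attribute
second.units = 'kg'
print(second.describe())
print(first.describe())
```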


<h2> Iterators </h2>

<p>
The trademark of Python as a programming language is its abundant "syntactic sugar." While pure Python can run tens of times slower than C for the same basic operations (the exact factor depends heavily on the task), what takes 100 lines of C can often be handled in ten lines or fewer of Python. One kind of object that contributes to this ease of writing is the iterator. An iterator is rarely created as a standalone object; rather, iterators are created transiently from iterable objects, such as lists or arrays, and then discarded once the "iteration" they were made for is finished.
</p>

In [153]:
"""
Iterators most often appear when implementing "for-in" syntax, which is of the form:

    for <sub_obj> in <iterable_obj>:

where <iterable_obj> is something like a list, tuple, range object, NumPy array, etc. whose elements can be visited
one at a time.
"""

## Iterating over a range of integers; implemented using the "range" function. Note that the iterator begins at zero
## (Python is zero-indexed) and ends at the integer immediately below the input integer.
for j in range(5):
    print(j)
print('\n')

## Range can also support two integer inputs; "range(m,n)" allows for iteration over all integers j in the range
## m <= j < n
for k in range(26,32):
    print(k)
print('\n')

## Iterate over the items in a list; this is covered in greater depth later
listt = ['meth', 'eth', 'prop', 'but']
for word in listt:
    print(word+'ane')

0
1
2
3
4


26
27
28
29
30
31


methane
ethane
propane
butane


In [154]:
## It is possible to iterate over the elements of multiple iterable objects at the same time. This is most often
## implemented using the zip function. 'first' and 'role' are lists, while 'last' is a tuple.
first = ['Will', 'Jennifer', 'Matt', 'Tom']
last = ('Smith', 'Lopez', 'Damon', 'Hanks')
role = ['actor', 'actress', 'actor', 'actor']

for fir, la, rol in zip(first, last, role):
    print(fir+' '+la+': '+rol)
print('\n')

## Enumerate over the index of an iterable object and that object's elements
for j, fir in enumerate(first):
    print(fir+': '+str(j))

Will Smith: actor
Jennifer Lopez: actress
Matt Damon: actor
Tom Hanks: actor


Will: 0
Jennifer: 1
Matt: 2
Tom: 3
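
<p>
What a for loop does behind the scenes can be sketched with the built-in iter and next functions. This is an illustrative example rather than something you would normally write by hand; the variable names are arbitrary.
</p>

```python
letters = ['a', 'b', 'c']

## iter() asks the iterable object for a fresh iterator
iterator = iter(letters)

## next() pulls one element at a time from the iterator
print(next(iterator))
print(next(iterator))
print(next(iterator))

## Once exhausted, the iterator raises StopIteration. A for loop catches this silently and stops.
try:
    next(iterator)
except StopIteration:
    print('iterator exhausted')

## The original list is untouched and can be iterated over again
collected = [letter for letter in letters]
print(collected)
```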


<h2> Functions </h2>

<h3> "def" syntax </h3>

<p>
**Python functions defined using def syntax can support \*args and \*\*kwargs inputs. These are not covered here. Though they are very useful and worth looking into, they are not necessary for reading and understanding most Python documentation on the web.**
</p>

In [149]:
## Define a function "square" with input argument x. x can be literally any Python object. If a function raises
## errors based on the input object, it is due to the incompatibility of the object with the operations performed
## within the function. The docstring (the triple-quoted string below) is not necessary, but allows for your
## function to be auto-documented (see below).
def square(x):
    """
    This function returns the square of the input. Requires that the input be a float, int, NumPy array, or the
    like.
    """
    sqr = x**2
    return sqr

## Functions can have multiple arguments, and even multiple output arguments.
def fun(x, y, word):
    """
    Returns the sum of the numbers x and y, and the concatenation of the string word and the string typecasts of x
    and y.
    """
    return x+y, word+str(x)+str(y)

## Functions can even have no input arguments, and no output arguments.
def approval():
    print('Approved.\n')
    
## When a function is called without input arguments, it is called using a set of empty parentheses like so:
approval()

## Functions can be called without catching their output:
square(5.0)
## Or whilst catching output. Note that the variable name used below is the same as a variable name used inside the
## definition of "square." Because sqr was only defined within the scope of the function above, by reusing the same
## variable name here, I am not at all overriding any variable definitions as one variable was defined in the scope
## of a function (above) and one was defined in the global scope (below).
sqr = square(5.0)

## When a function returns multiple outputs, the multiple arguments have to be caught in one of the following ways.

## If caught using a single variable, that variable is initialized as a tuple containing all the function outputs.
a = fun(3.0, 4.0, 'cabbage')
print(a)
print('\n')

## If caught using multiple variables, there must be as many variables as there are output arguments.
num, new_word = fun(3.0, 4.0, 'cabbage')
print(num)
print(new_word)
print('\n')

## If you do not wish to catch all output arguments, you can choose to not catch a specific variable by "assigning"
## it to the "_" character
_, new_word = fun(3.0, 4.0, 'cabbage')

## For all intents and purposes, functions can be manipulated and moved around as variables. They can be elements of
## tuples and lists, and can even be passed in as arguments to other functions. The following function takes a
## function as input, and returns a new function as output.
def cube_function(fun):
    def new_function(x):
        return fun(x)**3
    return new_function

## Function inputs can be assigned default values, such that they do not require an input argument. Notice that all
## such arguments have to be defined, in the function definition, after the non-defaulted arguments.
def number_string(base, power=2, co_summand=5.0, printt=True):
    """
    Note that this function prints and has zero output arguments if printt is True, and does not print and has one
    output argument if printt is False.
    """
    numberstr = str(base**power+co_summand)
    if not printt:
        return numberstr
    else:
        print(numberstr)

## Calling this function with different input combinations. Note that the inputs with default values can be passed
## in any order, provided they are passed by keyword.
number_string(2.0, 3.0, 4.2, False)
## Same as:
number_string(2.0, power=3.0, co_summand=4.2, printt=False)
## Same as:
number_string(2.0, printt=False, co_summand=4.2, power=3.0)
## Also valid:
number_string(3.0)
number_string(4.0, printt=False)
number_string(5.0, co_summand=10)

Approved.

(7.0, 'cabbage3.04.0')


7.0
cabbage3.04.0


14.0
35.0
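
<p>
The cell above defines cube_function but never calls it. A short usage sketch follows; it repeats the definitions of square and cube_function so that it runs on its own, and the names sixth_power and operations are introduced here purely for illustration.
</p>

```python
def square(x):
    return x**2

def cube_function(fun):
    ## Returns a new function that cubes whatever "fun" returns
    def new_function(x):
        return fun(x)**3
    return new_function

## Pass the function object itself (no parentheses after "square")
sixth_power = cube_function(square)
print(sixth_power(2.0))

## Functions can also be stored in a list and called later
operations = [square, sixth_power]
results = [op(3) for op in operations]
print(results)
```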


<h3> Anonymous (lambda) functions </h3>

In [150]:
## Lambda functions serve as quick, one-line ways of defining simple functions. You will see them a bit in Python
## module documentation.

## This function:
def funn(x, y):
    return x+y
## is identical to this one:
funn = lambda x,y: x+y

## Same for these two:
def funn2(x):
    return x**2
funn2 = lambda x: x**2
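
<p>
Lambda functions are most often seen as throwaway arguments to other functions. The sketch below uses the built-in sorted and map functions; the records list is made-up example data.
</p>

```python
## Hypothetical example data: (name, number of carbons)
records = [('propane', 3), ('methane', 1), ('butane', 4), ('ethane', 2)]

## Sort by the second element of each tuple
by_carbons = sorted(records, key=lambda pair: pair[1])
print(by_carbons)

## Sort alphabetically by the first element of each tuple
by_name = sorted(records, key=lambda pair: pair[0])
print(by_name)

## Apply a lambda to every element of a range
squares = list(map(lambda x: x**2, range(5)))
print(squares)
```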

<h3> Function Documentation </h3>

<p>
Often, nothing is more helpful than reading Python package tutorials and documentation online. However, one can access the documentation for a class initializer, class method, or a function using the "help" function.
</p>

In [151]:
## See documentation for SciPy's factorial implementation. Note that a function name, without the parentheses after
## the function name, refers to the function as a Python object, and not as a call to the function.
from scipy.special import factorial
help(factorial)

Help on function factorial in module scipy.special.basic:

factorial(n, exact=False)
    The factorial of a number or array of numbers.
    
    The factorial of non-negative integer `n` is the product of all
    positive integers less than or equal to `n`::
    
        n! = n * (n - 1) * (n - 2) * ... * 1
    
    Parameters
    ----------
    n : int or array_like of ints
        Input values.  If ``n < 0``, the return value is 0.
    exact : bool, optional
        If True, calculate the answer exactly using long integer arithmetic.
        If False, result is approximated in floating point rapidly using the
        `gamma` function.
        Default is False.
    
    Returns
    -------
    nf : float or int or ndarray
        Factorial of `n`, as integer or float depending on `exact`.
    
    Notes
    -----
    For arrays with ``exact=True``, the factorial is computed only once, for
    the largest input, with each other result computed in the process.
    The output dtype is in

In [152]:
## See documentation on our functions defined above
help(square)
help(fun)

Help on function square in module __main__:

square(x)
    This function returns the square of the input. Requires that the input be a float, int, NumPy array, or the
    like.

Help on function fun in module __main__:

fun(x, y, word)
    Returns the sum of the numbers x and y, and the concatenation of the string word and the string typecasts of x
    and y.



<h3> Passing objects of mutable type into functions </h3>

<p>
If a mutable object is passed into a function, any operations that modify the object in place within the function also alter the object outside the scope of the function. For this reason, the "copy" method of mutable objects becomes very handy when calling functions on mutable objects.
</p>

In [178]:
## Define function that appends a letter to a list
listt = ['x', 'y', 'z']
def appender(listtt):
    listtt.append('a')
    print(listtt)
for _ in range(5):
    appender(listt)
print('\n')

## Now using the copy method:
listt = ['x', 'y', 'z']
def appender(listtt):
    lizt = listtt.copy()
    lizt.append('a')
    print(lizt)
for _ in range(5):
    appender(listt)

['x', 'y', 'z', 'a']
['x', 'y', 'z', 'a', 'a']
['x', 'y', 'z', 'a', 'a', 'a']
['x', 'y', 'z', 'a', 'a', 'a', 'a']
['x', 'y', 'z', 'a', 'a', 'a', 'a', 'a']


['x', 'y', 'z', 'a']
['x', 'y', 'z', 'a']
['x', 'y', 'z', 'a']
['x', 'y', 'z', 'a']
['x', 'y', 'z', 'a']
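
<p>
Not every operation inside a function affects the caller's object; only in-place modifications do. The following sketch contrasts the two cases. The function names mutate and rebind are hypothetical.
</p>

```python
def mutate(items):
    ## In-place method call: changes the caller's list
    items.append('a')

def rebind(items):
    ## "+" builds a new list, and the assignment binds the local name to it.
    ## The caller's list is unaffected.
    items = items + ['a']
    return items

original = ['x', 'y', 'z']
mutate(original)
print(original)

result = rebind(original)
print(original)
print(result)
```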


<h2> Lists and Tuples </h2>

<p>
Lists and tuples are Python's primary built-in sequence types. NumPy arrays and Pandas DataFrames can also hold diverse data types, and offer many more methods and indexing schemes. Lists and tuples, though limited to a small set of methods and to integer indexing, are relatively lightweight and more permissive about the objects they can hold.
</p>

In [155]:
## Note that the misspellings below are necessary. Else, I would overwrite the typecasting/initializing functions
## "list" and "tuple" from native Python.

## Generate a list using "[ ]"
listt = ['a', 'b', 'c']
## Generate a tuple using "( )"
toople = ('a', 'b', 'c')

## Access the first element of listt and the second and third of toople. Notice that Python is a zero-indexed
## language.
print(listt[0])
print(toople[1])
print(toople[2])

## Get the length of a list/tuple returned as an int
print(len(listt))
print(len(listt) == len(toople))
print('\n')

## The "+" operator is overloaded for lists and tuples, to indicate creating a new list/tuple that is the
## concatenation of the first two
letters = toople + ('c', 'd')
more_letters = listt + ['x', 'y', 'z']
print(letters)
print(more_letters)

a
b
c
3
True


('a', 'b', 'c', 'c', 'd')
['a', 'b', 'c', 'x', 'y', 'z']


In [156]:
## It is possible to iterate over the objects in a list/tuple without explicitly referring to the list/tuple's index
## This...
for j in range(len(listt)):
    print(listt[j])

## ...is equivalent to:
for letter in listt:
    print(letter)
    
## The "*" operator is specially interpreted in the case of lists and tuples to indicate the repetition of the
## list/tuple
a = ['methyl', 'ethyl']
print(a*6)
b = ('propyl', 'butyl')*2
print(b)

a
b
c
a
b
c
['methyl', 'ethyl', 'methyl', 'ethyl', 'methyl', 'ethyl', 'methyl', 'ethyl', 'methyl', 'ethyl', 'methyl', 'ethyl']
('propyl', 'butyl', 'propyl', 'butyl')


<h3> List/tuple comprehensions </h3>

<p>
In Python, there exists syntax which allows for the easy creation of a list or tuple from the elements of a pre-existing iterable object. Knowing how to write these is not only very useful for cutting down time spent writing code; it is also necessary for reading a good deal of Python documentation on the web.
</p>

In [157]:
## Generate a list of random numbers between 0.0 and 1.0 using NumPy
from numpy import random, log10
noise = random.rand(100)
## Create a list of the common logarithms of these random numbers
comLogs = [log10(num) for num in noise]

## Create a tuple of the common logarithms of the numbers in noise THAT ARE SMALLER THAN 0.5.
## Notice that parentheses alone do not create a tuple here; they create a generator expression, so the verbose
## "tuple" function has to be used.
some_comLogs = tuple(log10(num) for num in noise if num < 0.5)
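
<p>
A comprehension is shorthand for an explicit loop that builds up a list. The sketch below shows the equivalence, and what bare parentheses produce instead of a tuple. It uses log10 from the standard math module rather than NumPy, and the values list is made-up example data.
</p>

```python
from math import log10

## Hypothetical example data
values = [0.1, 0.5, 0.01, 0.9, 0.25]

## Comprehension with a filter...
small_logs = [log10(v) for v in values if v < 0.5]

## ...and the equivalent explicit loop
small_logs_loop = []
for v in values:
    if v < 0.5:
        small_logs_loop.append(log10(v))

print(small_logs == small_logs_loop)

## Bare parentheses produce a generator, which computes its elements lazily
lazy = (log10(v) for v in values if v < 0.5)
print(type(lazy).__name__)

## Typecasting the generator to a tuple consumes it
as_tuple = tuple(lazy)
print(len(as_tuple))
```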

<h3> Mutability </h3>

<p>
The biggest difference between lists and tuples is that lists are **mutable**, while tuples are **immutable**. Once a tuple is initialized, the object cannot be changed at all. This allows the tuple to be used in certain scenarios where constancy is crucial, such as in the keys of a Python dictionary (see below). Lists, on the other hand, can be changed; the object at any given index can be replaced, an object can be removed or inserted, etc. Certain properties of lists that demonstrate their mutability are shown below.
</p>

In [158]:
## Generate list
listt = ['a', 'b', 'c']
print(listt)
## append to list
listt.append('d')
print(listt)
## insert object at index 2 of list
listt.insert(2, {})
print(listt)
## remove object at index 1 of list. Notice that the "pop" method returns the removed object.
popped = listt.pop(1)
print(listt)
print('popped: '+str(popped))
print('\n')

## If I initialize listt2 as equal to listt, what actually happens is that listt2 and listt both point to the same
## object in memory. Names bound to tuples, ints, floats, and other immutable types behave the same way, but
## because those objects can never be modified in place, the sharing is harmless. So, if I initialize listt2 as
## equal to listt...
listt2 = listt
## ...and make modifications to listt2...
listt2.append('extra')
## ...because listt2 and listt both point to the same object in physical memory, they are both altered.
print(listt)
print('\n')

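## Fortunately, the mutable container types offered in native Python (lists, dicts, sets) boast a "copy" method,
## which allows for the mutable object to actually be duplicated. This duplicate can be modified without affecting
## the original object. Note that the copy is shallow: mutable objects nested inside it are still shared.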
listt3 = listt.copy()
listt3.append('also_extra')
print(listt)

['a', 'b', 'c']
['a', 'b', 'c', 'd']
['a', 'b', {}, 'c', 'd']
['a', {}, 'c', 'd']
popped: b


['a', {}, 'c', 'd', 'extra']


['a', {}, 'c', 'd', 'extra']
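
<p>
Two consequences of mutability are worth seeing directly: tuples reject item assignment, and the "copy" method of a list duplicates only the outer container. The sketch below uses the standard copy module for the deep copy; the variable names are arbitrary.
</p>

```python
import copy

## Tuples do not support item assignment
toople = ('a', 'b', 'c')
try:
    toople[0] = 'z'
except TypeError as err:
    print('TypeError: ' + str(err))

## The "copy" method is shallow: the nested list is shared between the original and the copy
nested = ['a', ['inner']]
shallow = nested.copy()
shallow[1].append('shared')
print(nested)

## copy.deepcopy duplicates the nested objects as well
deep = copy.deepcopy(nested)
deep[1].append('private')
print(nested)
print(deep)
```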


<h2> Dictionaries </h2>

<p>
Dictionaries are the built-in Python implementation of a hash table. Dictionaries consist of **keys** and **items**. An item is inserted into the dictionary under a key, so that the item can later be retrieved from the dictionary using that key. Initializing or modifying a dictionary generally carries more overhead than doing the same for lists, tuples, and the like. The utility in using them, however, lies in that dictionaries allow you to access the items stored in them in *O(1)* time on average. This makes finding the item you are looking for much faster than iterating over the indices of a list or tuple.
</p>

In [159]:
## Initialize empty dictionary:
book = dict()
## or, equivalently,
book = {}

## Items are inserted into the dictionary in the following manner
key = 'key'
item = 'item'
book[key] = item

## Dictionary items can be any Python object, including class objects introduced by imported packages or the user.
## Dictionary keys, however, cannot be any type of object. They have to be hashable, which in practice means
## objects of immutable type (see "Lists and Tuples" above). What this implies practically is that if you can't
## currently use the native Python object you want as a dictionary key, you can often typecast it into a native
## object that can be used as a key.

## Can't be used as a dictionary key
squares = [j**2 for j in range(1,11)]
## Can be used as a dictionary key. The "tuple" function is needed; parentheses alone would create a generator.
cubes = tuple(j**3 for j in range(11))
## Typecast squares to a tuple so it can be used as a dictionary key
square_tuple = tuple(squares)

## Show off the versatility of dictionaries by inserting keys and items of various type
book[square_tuple] = squares
book['Promethius'] = 'bad movie'
book[78.9] = {}
for j in range(5):
    book['key'+str(j)] = j+5

## Get all dictionary keys. Note that in the output below, produced with an older version of Python, the keys are
## NOT in the order in which we inserted them. Since Python 3.7, dictionaries preserve insertion order, so a
## current interpreter prints the keys in the order in which they were added.
print(book.keys())

dict_keys(['key3', (1, 4, 9, 16, 25, 36, 49, 64, 81, 100), 'Promethius', 'key0', 'key1', 'key2', 'key', 'key4', 78.9])


In [160]:
## Get all key-item pairs in the dictionary
print(book.items())

dict_items([('key3', 8), ((1, 4, 9, 16, 25, 36, 49, 64, 81, 100), [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]), ('Promethius', 'bad movie'), ('key0', 5), ('key1', 6), ('key2', 7), ('key', 'item'), ('key4', 9), (78.9, {})])


In [161]:
## Trying to return the value of a key you have not set yet raises a KeyError
book['missing']

KeyError: 'missing'

In [162]:
## You can construct dictionaries using two iterable objects using the zip function
from numpy.random import randn
noise = randn(5)
names = ['john', 'tom', 'valerie', 'donald', 'susan']
book2 = dict(zip(names,noise))
print(book2.items())

dict_items([('susan', 0.54749716225056855), ('tom', -0.88496032238588229), ('valerie', -0.72388660769207702), ('donald', -0.55601910065312399), ('john', -1.2695674020765195)])
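
<p>
The KeyError shown earlier can be avoided by checking for a key before using it, or by supplying a fallback value. A minimal sketch using the "in" operator and the "get" method of dictionaries follows; the dictionary contents are a small subset of the example above.
</p>

```python
book = {'key': 'item', 'Promethius': 'bad movie'}

## Membership tests look at the keys, and run in O(1) time on average
print('key' in book)
print('missing' in book)

## "get" returns None, or a default of your choosing, instead of raising a KeyError
print(book.get('missing'))
print(book.get('missing', 'no entry'))

## Iterate over key-item pairs
for key, item in book.items():
    print(key + ' -> ' + item)
```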


<h2> NumPy (Numerical Python) and SciPy </h2>

<p>
<a href="https://docs.scipy.org/doc/">NumPy</a> is a great package for any and all manipulation of tensors and numbers. The package consists of many modules that might be useful for a given project, but what is most important about this package is its implementation of tensors (arrays). NumPy arrays are generally the fastest way to store numerical values, considering both array generation time and value access time. They are also one of the most commonly used Pythonic data structures outside native Python, and so are sometimes necessary to interface with other Python modules. <br/> <br/>

<a href="https://docs.scipy.org/doc/">SciPy</a> can be thought of as an extension to NumPy. While it boasts functions which span more statistical applications than NumPy, many of these functions are dependent upon input of NumPy array type.
</p>

In [164]:
import numpy as np

<h3> Generate tensors of arbitrary degree. </h3>

In [165]:
## Create a 5-vector of zeros
a = np.zeros(5)
## Create a 3x5 matrix of zeros
A = np.zeros((3,5))
## Create a 3x4x5x6 4-tensor of ones
AA = np.ones((3,4,5,6))

## Multiply a tensor by a number
B = 5.0*A
## Add a number to each element of a tensor
B = 5.0+A
## Add two tensors of identical shape together
C = A+B

## The shape attribute of NumPy arrays describes the degree of the tensor, as well as each degree's dimensionality.
## The size attribute tells the total number of elements in the tensor. Calling the function "len" on a NumPy array
## returns the dimensionality of the tensor's first degree.
print(C.shape)
print(C.size)
print(len(C))
print('\n')

## It is possible, and sometimes helpful, to generate NumPy arrays from lists.
prerow = [j for j in range(6)]
row = np.array(prerow)
print(row.shape)
prematr = [[j for j in range(6)]]*4
matr = np.array(prematr)
print(matr.shape)

(3, 5)
15
3


(6,)
(4, 6)


<h3> Matrix Operations </h3>

<p>
Operations between NumPy arrays default to elementwise operations. To perform matrix operations, specific functions from NumPy or SciPy (or the "@" operator, in Python 3.5 and later) have to be used.
</p>

In [166]:
## Take the dot (matrix) product of two arrays
## 3x3 identity matrix, times 4.0
I = 4.0*np.eye(3)
## 3x2 matrix; the reshape method returns an array of shape (3,2) by iterating the indices of the original array in
## order
A = np.array([j for j in range(6)]).reshape((3,2))
C = np.dot(I,A)
print(C)
print('\n')

## Dot product of I and a vector
print(np.dot(I,np.array([j for j in range(3)])))
print('\n')

## Matrix exponential of the 3x3 identity matrix
from scipy.linalg import expm
print(expm(np.eye(3)))

[[  0.   4.]
 [  8.  12.]
 [ 16.  20.]]


[ 0.  4.  8.]


[[ 2.71828183  0.          0.        ]
 [ 0.          2.71828183  0.        ]
 [ 0.          0.          2.71828183]]
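
<p>
The distinction between elementwise and matrix operations is easiest to see side by side. The sketch below multiplies the same two small matrices both ways; the matrices themselves are arbitrary examples.
</p>

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

## "*" multiplies matching elements
elementwise = A * B
print(elementwise)

## "@" (equivalently, np.dot for 2-D arrays) performs the matrix product
matrix = A @ B
print(matrix)
print(np.allclose(matrix, np.dot(A, B)))
```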


<h2> Pandas </h2>

<p>
Pandas is the most popular platform for tabulated data structures (primarily, pandas.DataFrame and pandas.Series objects). It interfaces very well with NumPy, and can be used to easily generate NumPy arrays from Pandas DataFrames. <a href="http://pandas.pydata.org/">Pandas</a> tabulated structures make for very quick writing of Python on the human end, but operate using an indexing scheme that runs much slower than the scheme used for NumPy arrays. It is advised to use Pandas for things like making figures for publications, or for generating NumPy arrays from .csv files at the start of an MCMC simulation. <br/> <br/>

**Note that Pandas DataFrames/Series are mutable data types. Because Jupyter notebooks run as a single Python session, from a single kernel that can only be restarted manually (see the kernel tab at the top of the notebook), you will have to try and go through the Pandas portion of this notebook linearly. Jumping out of order between cells in this part of the notebook will likely cause a lot of errors to be thrown.**
</p>

In [167]:
import pandas as pd
import numpy as np

<h3> Generate Pandas DataFrame from NumPy arrays </h3>

In [168]:
## Generate a 2-D array of random numbers
array = np.random.rand(5, 20)
## Create pandas.DataFrame object from 2-D array; can also be done with 1-D arrays
frame = pd.DataFrame(array)
print(frame)

         0         1         2         3         4         5         6   \
0  0.849432  0.759391  0.407645  0.483247  0.442975  0.832181  0.362157   
1  0.440911  0.322243  0.739795  0.111783  0.563349  0.324110  0.838508   
2  0.778631  0.014049  0.724021  0.112391  0.552409  0.027593  0.581519   
3  0.796457  0.039300  0.372154  0.992089  0.563488  0.636457  0.730779   
4  0.736914  0.173529  0.405944  0.870869  0.033657  0.963024  0.739559   

         7         8         9         10        11        12        13  \
0  0.777904  0.970598  0.852417  0.073762  0.644564  0.074347  0.961431   
1  0.615184  0.197405  0.967027  0.387898  0.770207  0.607748  0.023078   
2  0.811544  0.261400  0.970310  0.298188  0.304584  0.727953  0.602061   
3  0.970794  0.098488  0.626079  0.782566  0.037819  0.845896  0.358908   
4  0.940630  0.911218  0.471237  0.435394  0.085970  0.123525  0.054743   

         14        15        16        17        18        19  
0  0.374813  0.787383  0.431750  0

<h3> General DataFrame Properties </h3>

In [169]:
## All DataFrames have an index (the "label-less" column to the far left in the output above)
print(frame.index)

RangeIndex(start=0, stop=5, step=1)


In [170]:
## This index can be renamed, and even made into a new column using the reset_index method
frame.index.name = 'iteration'
frame.reset_index(inplace=True)
print(frame)

   iteration         0         1         2         3         4         5  \
0          0  0.849432  0.759391  0.407645  0.483247  0.442975  0.832181   
1          1  0.440911  0.322243  0.739795  0.111783  0.563349  0.324110   
2          2  0.778631  0.014049  0.724021  0.112391  0.552409  0.027593   
3          3  0.796457  0.039300  0.372154  0.992089  0.563488  0.636457   
4          4  0.736914  0.173529  0.405944  0.870869  0.033657  0.963024   

          6         7         8    ...           10        11        12  \
0  0.362157  0.777904  0.970598    ...     0.073762  0.644564  0.074347   
1  0.838508  0.615184  0.197405    ...     0.387898  0.770207  0.607748   
2  0.581519  0.811544  0.261400    ...     0.298188  0.304584  0.727953   
3  0.730779  0.970794  0.098488    ...     0.782566  0.037819  0.845896   
4  0.739559  0.940630  0.911218    ...     0.435394  0.085970  0.123525   

         13        14        15        16        17        18        19  
0  0.961431  0.374

In [171]:
## The columns in a Pandas DataFrame are ordered, and so the (ordered) names of these columns can be accessed as an
## Index object, which behaves much like a list, through the DataFrame's columns attribute.
print(frame.columns)

Index(['iteration',           0,           1,           2,           3,
                 4,           5,           6,           7,           8,
                 9,          10,          11,          12,          13,
                14,          15,          16,          17,          18,
                19],
      dtype='object')


In [172]:
## Pandas allows for the column names to be reset easily using "=". As long as the object used to rename the columns
## is iterable, and of the same length as the current list of columns, Pandas will typecast a copy of the object
## into the Index type to use as column names.
frame.columns = ['iter']+[chr(num+97) for num in range(20)]
print(frame.columns)

Index(['iter', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
       'n', 'o', 'p', 'q', 'r', 's', 't'],
      dtype='object')


In [173]:
## Access column by column name; very similar to dictionary syntax. This returns a Series object. Series objects are
## by and large incredibly similar to DataFrames, with many of the same attributes and methods. The differences
## between them will not be discussed in this tutorial.
print(frame['s'])

0    0.786857
1    0.008125
2    0.544678
3    0.212315
4    0.038343
Name: s, dtype: float64


In [174]:
## Columns can be removed from a DataFrame using the drop method. The drop method is not like the append method for
## lists, in that while the append method alters the list, the drop method returns a new DataFrame that does not
## point back to the original DataFrame. If you want to permanently drop the column from the DataFrame,
## re-initialize the original DataFrame as the output of the drop method, or set the inplace argument of the drop
## method equal to True

## This method includes an "axis" argument; axis=0 refers to rows, while axis=1 refers to columns. Defaults to
## axis=0.

## Drop a single row
frame.drop(2)
## Drop multiple rows
frame.drop(list(range(3)))
## Drop single column
frame.drop('a', axis=1)
## Drop multiple columns
frame.drop(['a', 'b', 'c'], axis=1)
## PERMANENTLY drop multiple columns (same as "frame.drop(['a', 'b', 'c'], axis=1, inplace=True)")
frame = frame.drop(['a', 'b', 'c'], axis=1)
print(frame)

   iter         d         e         f         g         h         i         j  \
0     0  0.483247  0.442975  0.832181  0.362157  0.777904  0.970598  0.852417   
1     1  0.111783  0.563349  0.324110  0.838508  0.615184  0.197405  0.967027   
2     2  0.112391  0.552409  0.027593  0.581519  0.811544  0.261400  0.970310   
3     3  0.992089  0.563488  0.636457  0.730779  0.970794  0.098488  0.626079   
4     4  0.870869  0.033657  0.963024  0.739559  0.940630  0.911218  0.471237   

          k         l         m         n         o         p         q  \
0  0.073762  0.644564  0.074347  0.961431  0.374813  0.787383  0.431750   
1  0.387898  0.770207  0.607748  0.023078  0.345216  0.394744  0.286500   
2  0.298188  0.304584  0.727953  0.602061  0.553236  0.972256  0.439532   
3  0.782566  0.037819  0.845896  0.358908  0.766832  0.239746  0.418213   
4  0.435394  0.085970  0.123525  0.054743  0.188917  0.620572  0.160770   

          r         s         t  
0  0.828225  0.786857  0.481

In [175]:
## If I permanently drop a row from a DataFrame...
frame.drop(2, inplace=True)
print(frame)
print('\n')
## ...I can reset the index of the DataFrame using the reset_index method. This method contains an "inplace"
## argument that defaults to False. Setting this to True alters the original DataFrame. The method also contains a
## "drop" argument, which defaults to False. When inplace=True and drop=False, the original DataFrame's index is
## reset, AND a new column replicating the original index is inserted into the DataFrame. Setting inplace=True and 
## drop=True resets the index of the original DataFrame without creating a new column.
frame.reset_index(inplace=True, drop=True)
print(frame)

   iter         d         e         f         g         h         i         j  \
0     0  0.483247  0.442975  0.832181  0.362157  0.777904  0.970598  0.852417   
1     1  0.111783  0.563349  0.324110  0.838508  0.615184  0.197405  0.967027   
3     3  0.992089  0.563488  0.636457  0.730779  0.970794  0.098488  0.626079   
4     4  0.870869  0.033657  0.963024  0.739559  0.940630  0.911218  0.471237   

          k         l         m         n         o         p         q  \
0  0.073762  0.644564  0.074347  0.961431  0.374813  0.787383  0.431750   
1  0.387898  0.770207  0.607748  0.023078  0.345216  0.394744  0.286500   
3  0.782566  0.037819  0.845896  0.358908  0.766832  0.239746  0.418213   
4  0.435394  0.085970  0.123525  0.054743  0.188917  0.620572  0.160770   

          r         s         t  
0  0.828225  0.786857  0.481740  
1  0.497793  0.008125  0.698064  
3  0.514652  0.212315  0.573292  
4  0.322543  0.038343  0.889319  


   iter         d         e         f         

In [176]:
## The syntax for inserting a new column into a Pandas DataFrame is similar to the insertion of new items into
## dicts. New columns have to be the same length as the DataFrame index.
frame['subject_name'] = ['Ron', 'Gwen', 'Heather', 'Oswald']
print(frame)

   iter         d         e         f         g         h         i         j  \
0     0  0.483247  0.442975  0.832181  0.362157  0.777904  0.970598  0.852417   
1     1  0.111783  0.563349  0.324110  0.838508  0.615184  0.197405  0.967027   
2     3  0.992089  0.563488  0.636457  0.730779  0.970794  0.098488  0.626079   
3     4  0.870869  0.033657  0.963024  0.739559  0.940630  0.911218  0.471237   

          k         l         m         n         o         p         q  \
0  0.073762  0.644564  0.074347  0.961431  0.374813  0.787383  0.431750   
1  0.387898  0.770207  0.607748  0.023078  0.345216  0.394744  0.286500   
2  0.782566  0.037819  0.845896  0.358908  0.766832  0.239746  0.418213   
3  0.435394  0.085970  0.123525  0.054743  0.188917  0.620572  0.160770   

          r         s         t subject_name  
0  0.828225  0.786857  0.481740          Ron  
1  0.497793  0.008125  0.698064         Gwen  
2  0.514652  0.212315  0.573292      Heather  
3  0.322543  0.038343  0.88931

<h3> Also worth exploring with Pandas: </h3>

<ul style="list-style-type:disc">
  <li>The "read_csv" function in the Pandas module (older code may use "DataFrame.from_csv", which has since been removed from Pandas); creates DataFrames straight from .csv files</li>
  <li>The "apply" methods of both the DataFrame and Series classes</li>
  <li>Logical indexing (very difficult, and seen a lot in my and Dr Meyer's code; willing to explain this whenever it comes up).</li>
  <li>The "melt" method for DataFrames. Difficult to use, but very helpful when iterating over a compound index.</li>
</ul>
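
<p>
As a starting point for the items above, here is a hedged sketch that touches read_csv, logical indexing, apply, and melt in turn. The CSV contents, column names, and threshold are all made up for illustration, and the file is held in memory with io.StringIO so that the example runs on its own.
</p>

```python
import io
import pandas as pd

## Hypothetical CSV contents, held in memory instead of in a file on disk
csv_text = "subject,dose,response\nRon,1.0,0.2\nGwen,2.0,0.9\nHeather,3.0,0.4\nOswald,4.0,0.8\n"
frame = pd.read_csv(io.StringIO(csv_text))

## Logical indexing: the comparison yields a boolean Series, which selects the matching rows
high = frame[frame['response'] > 0.5]
print(high['subject'].tolist())

## apply: call a function on every element of a Series
doubled = frame['dose'].apply(lambda d: 2 * d)
print(doubled.tolist())

## melt: reshape from one column per measurement to one row per (subject, measurement) pair
long_form = frame.melt(id_vars='subject', value_vars=['dose', 'response'])
print(long_form.shape)
```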