<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#General-Python" data-toc-modified-id="General-Python-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>General Python</a></span><ul class="toc-item"><li><span><a href="#Strings" data-toc-modified-id="Strings-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Strings</a></span></li><li><span><a href="#Lists" data-toc-modified-id="Lists-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Lists</a></span></li><li><span><a href="#Dictionaries" data-toc-modified-id="Dictionaries-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Dictionaries</a></span></li><li><span><a href="#Packages" data-toc-modified-id="Packages-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Packages</a></span></li><li><span><a href="#Loops" data-toc-modified-id="Loops-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Loops</a></span></li></ul></li><li><span><a href="#Numpy" data-toc-modified-id="Numpy-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Numpy</a></span></li><li><span><a href="#Matplotlib" data-toc-modified-id="Matplotlib-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Matplotlib</a></span></li><li><span><a href="#PANDAs" data-toc-modified-id="PANDAs-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>PANDAs</a></span></li></ul></div>

# Python tome

Collection of syntax and notes for Python 3 and key modules for data science.

![title](https://regmedia.co.uk/2017/08/23/heftytome.jpg)

## General Python

In [None]:
print(5/8) # In Python 3, print is a function, so its arguments go in parentheses

In [None]:
help(max) # to get some help on a function

In [None]:
??max # alternative way of getting help in Jupyter; brings up the docstring (and source, where available) in a separate pane

In [None]:
savings = 100 # Declaring variables

In [None]:
result = 100 * 1.10 ** 7

In [None]:
# Convert int/float to strings when concatenating them with a string
# Use a backslash at the end of a line to continue the statement on the next line
print("I started with $" + str(savings) + " and now have $" + \
      str(result) + ". Awesome!")
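
As an alternative to concatenating with str(), the same message can be built with an f-string. This is a minimal sketch that assumes Python 3.6 or later; the name `message` and the two-decimal format `:.2f` are choices made for this example, not part of the original notes.

```python
savings = 100
result = 100 * 1.10 ** 7

# Values inside {} are converted to text automatically;
# :.2f rounds the displayed number to two decimal places.
message = f"I started with ${savings} and now have ${result:.2f}. Awesome!"
print(message)
```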

In [None]:
# Unlike int and bool, a list is a compound data type
my_list = ["my", "list", 23, True]

In [None]:
type(my_list) # use type to identify the type of object

In [None]:
# you can put multiple statements on one line by separating them with a semicolon
print('hello'); print(3+5)

### Strings

In [None]:
room = "poolhouse"

In [None]:
print(room.count('o')) # strings come with their own methods

### Lists

In [None]:
my_list = [2,5,9,10,12,14]

In [None]:
my_list[1] # print out second element

In [None]:
my_list[-2] # print out second-to-last element of the list

In [None]:
my_list[1:6] # slicing: start index is included, end index is not

In [None]:
my_list[2:] # slicing: from index 2 until the end of the list

In [None]:
my_list[-1] = 10.5 # replacing list elements

In [None]:
del my_list[2:] # delete list entries from index 2 onwards

When making copies of a list, it is best to use list()

In [None]:
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

In [None]:
areas_copy = list(areas)

In [None]:
areas_copy = areas # not a copy: both names refer to the same list, so changes made through one show up in the other
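
The difference between copying with list() and plain assignment can be checked directly. The sketch below reuses the `areas` values from above; `areas_alias` is a name introduced only for this illustration. Note that list() makes a shallow copy, which is enough for a flat list of numbers.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

areas_alias = areas        # second name for the SAME list object
areas_copy = list(areas)   # new list holding the same elements

areas_alias[0] = 5.0       # change made through the alias

print(areas[0])            # the original list sees the change
print(areas_copy[0])       # the copy keeps the old value
print(areas_alias is areas, areas_copy is areas)
```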

Lists have their own methods

In [None]:
print(areas.index(20,0)); print(areas.count(14.5))

In [None]:
areas.append(24.5) # add to a list
print(areas)

In [None]:
areas.reverse() # reverse
print(areas)

Iterating over lists
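
A minimal sketch of iterating over a list, reusing the `areas` values from above. The names `total` and `doubled` are introduced for this example only.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# A for loop visits each element in order
total = 0.0
for area in areas:
    total += area
print(total)

# A list comprehension builds a new list from each element
doubled = [area * 2 for area in areas]
print(doubled)
```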

### Dictionaries

Dictionaries map keys to values, so they can replace a pair of parallel lists (one holding the labels, one holding the data). The differences between lists and dictionaries are:
- lists are indexed by range of numbers
- dictionaries are indexed by unique keys

In [None]:
my_dict = {
   "key1":"value1",
   "key2":"value2",
}
print(my_dict)

In [None]:
my_dict.keys() # keys method

Keys have to be immutable (they can't be changed), e.g. strings, numbers and tuples work as keys, but lists don't
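
The restriction on keys can be seen with a quick experiment. In this sketch the dictionary `coords`, its contents and the flag `list_key_allowed` are made up for illustration.

```python
coords = {}

# A tuple is immutable, so it is accepted as a key
coords[(1, 2)] = "point A"

# A list is mutable, so Python raises a TypeError
try:
    coords[[1, 2]] = "point A"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(coords)
print(list_key_allowed)
```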

In [None]:
my_dict['key3'] = 'value3' #adding keys
print(my_dict)

### Packages

In [None]:
import math # import all functionality available in a package

In [None]:
from math import radians # just one function

In [None]:
from scipy.linalg import inv as my_inv # just one function with an alias

Collection of useful packages

In [None]:
import math

### Loops

In [None]:
area = 12

In [None]:
# If, Else, Elif syntax
if area > 10:
    print('Big place!')
elif area > 5:
    print('Medium-sized')
else:
    print('Small sized!')

In [None]:
while area > 0: # a while loop repeats its body for as long as the condition is true
    print(area)
    area = area - 1

In [None]:
# for loop syntax
my_list = [1,2,3,4]
for element in my_list:
    print(element)

You can use for loops for iteration

In [None]:
# enumerate gives (index, value) pairs, so you can unpack two loop variables
heights = [1.73, 1.6, 1.82, 1.92]
for index, height in enumerate(heights): # enumerate produces the index
    print(index, height)

In [None]:
# iterating over a dictionary
for key, val in my_dict.items():
    print(key,val) # items() yields (key, value) pairs

## Numpy

Python, by default, doesn't know how to make calculations on lists as a whole.
It is more elegant and quicker to use NumPy to perform calculations
element-wise with 'ndarrays' - fast and efficient multidimensional array
objects. The numpy library also has linear algebra operations, random number
generators, Fourier transforms and other mathematical functions.
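
To illustrate the point about lists, the sketch below multiplies a plain list and a NumPy array by 2. The variable names are chosen for this example; the values are the `heights` used in the loops section.

```python
import numpy as np

heights = [1.73, 1.6, 1.82, 1.92]

# On a list, * repeats the list instead of doing arithmetic
repeated = heights * 2
print(len(repeated))

# On an ndarray, * is applied to every element
np_heights = np.array(heights)
doubled = np_heights * 2
print(doubled)
```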

In [None]:
import numpy as np

In [None]:
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
np_baseball = np.array(baseball) # use np.array to create a Numpy array

Numpy arrays can only contain one type - they are homogeneous arrays. This also speeds up calculations.
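
One consequence of the single-type rule is that mixed input gets converted to a common type. A small sketch, with `mixed` as an illustrative name:

```python
import numpy as np

# An int, a float and a bool are all converted to float
mixed = np.array([1, 2.5, True])

print(mixed)
print(mixed.dtype)
```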

In [None]:
np_weight = np.array([87,62,90,102])
np_height = np.array([1.94,1.8,2,1.99])
bmi = np_weight/(np_height**2) # perform element-wise calculations
bmi

In [None]:
light = bmi < 21; bmi[light] # you can subset with boolean checks

In [None]:
np_weight[1:3] # subsetting with indexes

In [None]:
# Boolean functions
np.logical_and(bmi >= 20,bmi <= 25)
# same with np.logical_or()

In [None]:
# you can create 2d numpy arrays from a python list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]
np_baseball = np.array(baseball)
np_baseball

Iterating over Numpy arrays

In [None]:
for val in np.nditer(np_baseball):
    print(val)

In [None]:
np_baseball.shape # dimensions of array

In [None]:
np_baseball[3] # print out 4th row

In [None]:
np_baseball[:,1] # entire second column

Numpy includes statistical functions

In [None]:
np.mean(np_height)

In [None]:
np.median(np_height)

Numpy is great for simulating pseudo-random numbers. They are pseudo-random because they are generated by mathematical formulae that aim to imitate randomness as closely as possible.

In [None]:
# sets the random seed, so that your results are reproducible between simulations
np.random.seed(123)
np.random.rand()
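
The effect of the seed can be checked by resetting it and drawing again. This is a sketch; `first`, `second`, `rng` and `sample` are names introduced here. The last two lines show `np.random.default_rng`, the Generator interface available in NumPy 1.17 and later, as an alternative to the global seed.

```python
import numpy as np

np.random.seed(123)
first = np.random.rand()

np.random.seed(123)          # same seed again
second = np.random.rand()    # so the same number comes out

print(first == second)

# Alternative: a local generator with its own seed
rng = np.random.default_rng(123)
sample = rng.random()        # float in the interval [0, 1)
print(sample)
```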

## Matplotlib

In [None]:
import matplotlib.pyplot as plt # the subpackage pyplot is of most importance

In [None]:
x = [1,7,6,9,10]
y = [2,6,4,4,6]
plt.plot(x,y) # line plot
plt.show()

When calling the plt.plot() command, the plot won't show until .show() is called. This allows you to add features to the plot before calling it.

In [None]:
plt.clf() # to clean up a plot

Scatterplots

In [None]:
# use s to specify size of dots, c for colour, alpha for opacity
# (s=50 and c='red' are example values)
plt.scatter(x, y, s=50, c='red', alpha=0.8)

In [None]:
plt.xscale('log') # display horizontal axis in log scale, useful when exploring correlation

Histograms

In [None]:
plt.hist(x, bins=5) # histogram; bins=5 is an example value, leave it out to use the default

Customising plots

In [None]:
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Title')

In [None]:
# first list: where the ticks go; second list: the labels shown at those ticks
plt.yticks([0,1,2], ["one","two","three"])
plt.text(x[0], y[0], 'label') # label a data point; plt.text takes a single x and y coordinate

## PANDAs

pandas blends the high-performance, array-computing ideas of Numpy with
the flexible data manipulation capabilities of spreadsheets and relational
databases. The two primary objects are:
- DataFrame: tabular, column-oriented data structure with both row and column
labels.
- Series: a one-dimensional labelled array object

The difference between a two-dimensional Numpy array and a pandas DataFrame: a Numpy array can only hold one data type, whereas a DataFrame can hold a different type in each column (more like a spreadsheet).

In [None]:
import pandas as pd

In [None]:
# creating a dataframe from a dictionary that maps column names to equal-length lists
df = pd.DataFrame(my_dict)
df.index = labels # set index; labels is a list with one label per row
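
A self-contained sketch of building a DataFrame from a dictionary. The dictionary `data`, the row `labels` and all values are hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Keys become column names; each list becomes a column
data = {
    "name": ["alpha", "beta", "gamma"],   # text column
    "score": [1.5, 2.0, 3.5],             # numeric column
}
labels = ["r1", "r2", "r3"]               # one label per row

df = pd.DataFrame(data)
df.index = labels                         # replace the default 0, 1, 2 index

print(df)
print(df.dtypes)                          # each column keeps its own type
```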

In [None]:
pd.read_csv('data.csv') # reading in csv file, assuming data is in same folder

In [None]:
df['column_name'] # return series of column

In [None]:
df[['column_name']] # return column but keeping it a DataFrame

In [None]:
df[['column_1','column_2']] # return subset of dataframe for the selected cols

In [None]:
df[1:4] # return 2nd, 3rd and 4th rows - row access

There are two ways of slicing dataframes:
- loc - label-based
- iloc - integer position-based

.loc

In [None]:
df.loc['row_name'] # returns row as series

In [None]:
df.loc[['row_name']] # returns row as dataframe

In [None]:
df.loc[['row_1','row_2'],['col_1','col_2']] # returns intersection as a dataframe

In [None]:
df.loc[:,['col_1','col_2']] # as above but all rows

.iloc

In [None]:
df.iloc[3,0] # print out 4th row and first column

In [None]:
df.iloc[[3,4],0] # print out 4th and 5th row, and first column

In [None]:
df.iloc[[3,4],[0,1]] # 4th and 5th row, first 2 columns

In [None]:
df.iloc[:,2] # all rows, 3rd column
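
The two indexers can be compared side by side on a small DataFrame. The frame below is made up for illustration. One difference worth noting: a label slice with .loc includes the end label, while a position slice with .iloc excludes the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"col_1": [10, 20, 30, 40], "col_2": [1.5, 2.5, 3.5, 4.5]},
    index=["row_1", "row_2", "row_3", "row_4"],
)

by_label = df.loc["row_2", "col_1"]      # select by row and column label
by_position = df.iloc[1, 0]              # same cell by integer position
print(by_label, by_position)

label_slice = df.loc["row_1":"row_3"]    # end label is included
position_slice = df.iloc[0:2]            # end position is excluded
print(len(label_slice), len(position_slice))
```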

Iterating through dataframes

In [None]:
# for a pandas DataFrame you have to use iterrows(); on each iteration it gives
# the row label and the entire row as a Series
for lab, car in cars.iterrows(): # iterrows() yields (label, Series) pairs
    print(lab) # prints out row label
    print(car) # prints out entire series

You can create new columns with a for loop and iterrows(). Alternatively, a quicker way is to use the .apply() method

In [None]:
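brics["name_length"] = brics["country"].apply(len) # apply a function to every value in the column
brics["COUNTRY"] = brics["country"].apply(str.upper) # for string methods; stored in its own column so name_length is not overwritten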
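
The loop and .apply() approaches can be compared on a small frame. This sketch assumes a `brics` DataFrame with a `country` column; the three-row frame, its index labels and the column name `name_length_loop` are made up for the example.

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"]},
    index=["BR", "RU", "IN"],
)

# Loop version: visit each row and collect the result
lengths = []
for lab, row in brics.iterrows():
    lengths.append(len(row["country"]))
brics["name_length_loop"] = lengths

# apply version: one call handles the whole column
brics["name_length"] = brics["country"].apply(len)

print(brics)
```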