
## Table of Contents

### Numpy/Pandas
1. [Numpy](#arr)
<br><br>
2. [Pandas](#pd)
<br><br>


### Extra
3. [General Tips](#gt)
<br><br>
4. [Extra Tricks](#et)
<br><br>
5. [Syntax](#syn)
<br><br>
6. [Terminology](#tr)
<br><br>
7. [Useful Functions](#uf)

# Arrays
<a id="arr"></a>

In [4]:
import numpy as np

In [226]:
x = arr[:,1]
y = arr[:,2]

In [227]:
np.array([x,y])    # new array based on two arrays or lists


array([[ 58, 195],
       [603, 133]])

In [117]:
list(np.array([1,2,3]))

[1, 2, 3]

In [128]:
np.zeros((2,3,4,5)) + 5       
# 2 - objects in the first(0th) dimension (axis),
# 3 - objects in the second(1st) dimension (axis),
# 4 - objects in the third(2nd) dimension (axis),
# 5 - objects in the fourth(3rd) dimension (axis)


array([[[[5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.]],

        [[5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.]],

        [[5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.]]],


       [[[5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.]],

        [[5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.]],

        [[5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.],
         [5., 5., 5., 5., 5.]]]])

In [125]:
np.ones((2,3,4)) * 5    # specify shape of desired array

array([[[5., 5., 5., 5.],
        [5., 5., 5., 5.],
        [5., 5., 5., 5.]],

       [[5., 5., 5., 5.],
        [5., 5., 5., 5.],
        [5., 5., 5., 5.]]])

In [135]:
np.empty(5, dtype=float)[0]     # if single value is passed for the shape, 1d array is created

4.6425296154988e-310

In [90]:
type(np.array([1,2]))      # type of an array is ndarray

numpy.ndarray

In [145]:
np.array([1.,2.]).dtype     # dtype shows the type of the underlying elements

dtype('float64')

In [146]:
np.array([1.,2.]).astype(int)   # converts all elements to desired type

array([1, 2])

In [92]:
arr = np.ones((2,3))       # shape shows you the dimensions of the array
arr.shape

(2, 3)

In [93]:
arr.size                    # size shows the total number of elements

6

In [152]:
arr.shape

(2, 3)

In [16]:
arr.reshape(3,2)             # reshape reconfigures the array to new dimenstions
                             # you can always reshape into a new shape if the product of the old dimensions
                             # equals the product of the new dimensions

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [154]:
arr

array([[0.01374306, 0.44403838, 0.59207462],
       [0.12455101, 0.17375854, 0.92589919]])

In [153]:
arr[0]                       # indexing with a single value into a 2d array returns a 1d array

array([0.01374306, 0.44403838, 0.59207462])

In [19]:
arr[0,0]                     # indexing with two values into a 2d array returns a value at that position

1.0

In [155]:
arr[0:2,:1]                  # slicing works the same way as with lists

array([[0.01374306],
       [0.12455101]])

In [156]:
np.sum(arr)                  # sum returns the sum of the entire array

2.2740648058462316

In [157]:
arr

array([[0.01374306, 0.44403838, 0.59207462],
       [0.12455101, 0.17375854, 0.92589919]])

In [162]:
np.sum(arr, axis=1) # summing along a desired axis returns an array of the shape as the other axis

array([1.04985606, 1.22420874])

In [160]:
np.sum(arr[:,1])     # summing along a specific column requiers no axis specified   

0.6177969296070263

In [161]:
np.sum(arr, axis=0)[1]  # sum all columns return just one

0.6177969296070263

In [64]:
np.arange(1,101) * 2         # arange creates an array of numbers similar to the range function with list comprehension

array([  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,  26,
        28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,  52,
        54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,  78,
        80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100, 102, 104,
       106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130,
       132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156,
       158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182,
       184, 186, 188, 190, 192, 194, 196, 198, 200])

In [163]:
arr<0                     # we can use boolean conditions on arrats

array([[False, False, False],
       [False, False, False]])

In [164]:
arr

array([[0.01374306, 0.44403838, 0.59207462],
       [0.12455101, 0.17375854, 0.92589919]])

In [165]:
li = [1,2,3,4,5]

In [167]:
li[1:3]

[2, 3]

In [171]:
li[0:1] + li[3:]          # lists cannot index separated indices

[1, 4]

In [172]:
arr

array([[0.01374306, 0.44403838, 0.59207462],
       [0.12455101, 0.17375854, 0.92589919]])

In [173]:
arr<0.15

array([[ True, False, False],
       [ True, False, False]])

In [174]:
arr = np.array([1,2,3,4,5,6])

In [180]:
arr[[True,False,False,False,False,True]]      # arrays can index separated indices

array([1, 6])

In [56]:
arr[arr>3]                                    # we can use boolean conditions to index into arrays

array([4, 5, 6])

In [5]:
arr = np.random.randint(0,1000,(2,3))                 # numpy supports a variety of random operations

In [6]:
arr

array([[933, 353, 553],
       [207,  68, 238]])

In [188]:
arr[:,1]>100                     

array([False,  True])

In [189]:
arr[np.array([False,  True])]

array([[209, 195, 133]])

In [214]:
arr[(arr[:,1]>100) | (arr[:,2]<700)]       # numpy doesn't allow standard 'and' 'or' but & and |

array([[610,  58, 603],
       [209, 195, 133]])

In [213]:
arr

array([[610,  58, 603],
       [209, 195, 133]])

In [208]:
(arr[:,1]>100)

array([False,  True])

In [209]:
(arr[:,2]<700)

array([ True,  True])

In [212]:
(arr[:,1]>100) | (arr[:,2]<700)

array([ True,  True])

In [201]:
arr[(arr[:,1]>100) & (arr[:,2]<200)]

array([[209, 195, 133]])

In [218]:
l = []
for i in range(10):
    l.append(i)

In [220]:
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [219]:
[i for i in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [10]:
arr[:,np.r_[[0,2]]]         # slice reparated inidices

array([[933, 553],
       [207, 238]])

# Pandas
<a id="pd"></a>

In [81]:
import pandas as pd
import numpy as np

In [144]:
df = pd.DataFrame({'A':['hello', 'there'], 'B':[np.nan, 1]})   # create df from a dictionary

In [106]:
df

Unnamed: 0,A,B
0,hello,
1,there,1.0


In [107]:
df.loc[1:0,'B']         # loc is used to index into df with index and column values

Series([], Name: B, dtype: float64)

In [108]:
df.iloc[0:1,0:1]      # iloc is used to index into df with index and column indices

Unnamed: 0,A
0,hello


In [135]:
df.loc[:,'A':'B']      # multiple column selexction with loc returns a df

Unnamed: 0,A,B
0,whats,
1,up,1.0


In [139]:
df['A'].value_counts(normalize=True)

hello    0.333333
you      0.333333
there    0.333333
Name: A, dtype: float64

In [140]:
df['A'].unique()

array(['hello', 'there', 'you'], dtype=object)

In [141]:
df['A'].nunique()

3

Unnamed: 0,A,B
1,there,1.0
2,you,1.0
0,hello,


In [110]:
df['A']               # accessing a single column with [] returns a series

0    hello
1    there
Name: A, dtype: object

In [145]:
df['G'] = df['A'].apply(lambda x: 5 if x=='hello' else 4)

In [146]:
df['F'] = df.apply(lambda row: 5 if row['A']=='hello' and row['G'] == 5 else 17, axis=1)

In [152]:
df

Unnamed: 0,A,B,G,F
0,hello,,5,5
1,there,1.0,4,17


In [170]:
df.sort_values(['G','F'], ascending=[True,False], inplace=True)

In [171]:
df.sort_values('A')

Unnamed: 0,A,B,G,F
1,hello,,5,5
0,there,1.0,4,17


In [175]:
df.reset_index()

Unnamed: 0,A,B,G,F
0,there,1.0,4,17
1,hello,,5,5


In [114]:
df.drop(columns=['G'])

Unnamed: 0,A,B,F
0,hello,,5
1,there,1.0,17


In [115]:
df

Unnamed: 0,A,B,G,F
0,hello,,5,5
1,there,1.0,4,17


In [116]:
df.A                 # accessing a single column with .

0    hello
1    there
Name: A, dtype: object

In [117]:
df.B > 0             # boolean conditions on columns reuturn a series of booleans

0    False
1     True
Name: B, dtype: bool

In [118]:
df.B > 0

0    False
1     True
Name: B, dtype: bool

In [119]:
df[df.B > 0]        # we can use bolean series to index into the df

Unnamed: 0,A,B,G,F
1,there,1.0,4,17


In [120]:
df[(df.B > 0) & (df.A=='there')]       # multiple conditions work the same way as with numpy

Unnamed: 0,A,B,G,F
1,there,1.0,4,17


In [121]:
df['C'] = 2                         # creating a new column with one value sets that value to all rows

In [122]:
df['D'] = [5.0, 7.5]                # creating a new column with a collection of same length assigns all the values

In [123]:
df

Unnamed: 0,A,B,G,F,C,D
0,hello,,5,5,2,5.0
1,there,1.0,4,17,2,7.5


In [124]:
df['A'] = ['what', 'up']            # using the same syntax on an existing column replaces it

In [125]:
df[df['A']=='what']['A']

0    what
Name: A, dtype: object

In [126]:
df.loc[df['A']=='what','A'] 

0    what
Name: A, dtype: object

In [127]:
df.loc[df['A']=='what','A'] = 'whats'

In [128]:
df

Unnamed: 0,A,B,G,F,C,D
0,whats,,5,5,2,5.0
1,up,1.0,4,17,2,7.5


In [129]:
df.isna().sum()                          # returns a series with a sum of all missing values

A    0
B    1
G    0
F    0
C    0
D    0
dtype: int64

In [130]:
df['B'].fillna(df['B'].mean())        # fill the missing values of a column with a specific value

0    1.0
1    1.0
Name: B, dtype: float64

In [95]:
df

Unnamed: 0,A,B,G,F,C,D
0,whats,,5,5,2,5.0
1,up,1.0,4,17,2,7.5


In [98]:
df = df.rename(columns = {'A' : 'a'})                # renames column, updates df, doesn't return anything
df.rename(columns = {'A' : 'a'})                     # renames column, doesn't update df, returns result
df.rename(columns = {'A' : 'a'}, inplace=True)       # renames column, updates df, doesn't return anything
df = df.rename(columns = {'A' : 'a'}, inplace=True)  # ruins everything

In [132]:
df['D'].isin([5.0, 7.5])

0    True
1    True
Name: D, dtype: bool

In [133]:
df[df['D'].isin([5.0])]

Unnamed: 0,A,B,G,F,C,D
0,whats,,5,5,2,5.0


In [185]:
df = pd.DataFrame({'A':['hello', 'there', 'hello'], 'B':[1,2,3], 'C':[5,6,7]})

In [211]:
df

Unnamed: 0,A,B,C
0,hello,1,5
1,there,2,6
2,hello,3,7


In [213]:
df[['A']]

Unnamed: 0,A
0,hello
1,there
2,hello


In [210]:
df.groupby('A')[['C']].mean()

Unnamed: 0_level_0,C
A,Unnamed: 1_level_1
hello,6
there,6


In [214]:
df.groupby('A')['B'].agg(['mean', 'min','max'])

Unnamed: 0_level_0,mean,min,max
A,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
hello,2,1,3
there,2,2,2


In [229]:
df.groupby('A').agg({'B':['mean','min'], 'C':['min']})

Unnamed: 0_level_0,Unnamed: 1_level_0,B,B,C
Unnamed: 0_level_1,A,mean,min,min
0,hello,2,1,5
1,there,2,2,6


In [226]:
df.reset_index?

In [224]:
a_df.columns = [a+'_'+b for a,b in a_df.columns]

In [225]:
a_df

Unnamed: 0,A_,B_mean,B_min,C_min
0,hello,2,1,5
1,there,2,2,6


In [193]:
df.groupby('A')['C'].mean().to_frame()

Unnamed: 0_level_0,C
A,Unnamed: 1_level_1
hello,6
there,6


In [184]:
df.groupby('A').min()

A      B
hello  1    1
       3    1
there  2    1
Name: B, dtype: int64

In [None]:
df.groupby('A').max()

In [None]:
df.groupby('A')['B'].value_counts()

In [None]:
df

In [123]:
df = pd.DataFrame({'A':['NaN', None], 'B':[np.nan, 'np.nan']})

In [99]:
df.replace('NaN', None)

Unnamed: 0,A,B
0,,
1,,np.nan


In [84]:
df.loc[0,'A']

'NaN'

In [88]:
df.loc[0,'B']

nan

In [97]:
pd.read_csv('test.csv')

Unnamed: 0,A,B
0,,5


In [89]:
df.isna()

Unnamed: 0,A,B
0,False,True
1,True,False


In [65]:
np.nan == np.nan, None==None

(False, True)

In [101]:
not True

False

In [66]:
~df.isna()

Unnamed: 0,A,B
0,True,False
1,False,True


In [128]:
df.fillna('Unk')

Unnamed: 0,A,B
0,,Unk
1,Unk,np.nan


In [129]:
df.fillna('Unknown', inplace=True)

In [106]:
df

Unnamed: 0,A,B
0,,Unknown
1,Unknown,np.nan


In [107]:
df['A'].unique()

array(['NaN', 'Unknown'], dtype=object)

In [120]:
df.join(pd.get_dummies(df['A']))    # one hot encoding

Unnamed: 0,A,B,NaN,Unknown
0,,Unknown,1,0
1,Unknown,np.nan,0,1


In [121]:
df.append?

In [109]:
df

Unnamed: 0,A,B
0,,Unknown
1,Unknown,np.nan


In [122]:
pd.get_dummies?

In [125]:
df

Unnamed: 0,A,B
0,,
1,,np.nan


In [126]:
pd.get_dummies(df)        # if no columns specified one hot encodes all non numerical columns

Unnamed: 0,A_NaN,B_np.nan
0,1,0
1,0,1


In [130]:
pd.get_dummies(df, columns=['A','B'])      # only encode specific columns

Unnamed: 0,A_NaN,A_Unknown,B_Unknown,B_np.nan
0,1,0,1,0
1,0,1,0,1


In [132]:
df

Unnamed: 0,A,B
0,,Unknown
1,Unknown,np.nan


In [131]:
df['A'].apply(lambda x: x=='Unknown')       # applying a function on column

0    False
1     True
Name: A, dtype: bool

In [73]:
df[df['A'].apply(lambda x: x=='Unknown')]    # using result to access df

Unnamed: 0,A,B
1,Unknown,np.nan


In [74]:
df[~df['A'].apply(lambda x: x=='Unknown')]   # finding opposite of condition

Unnamed: 0,A,B
0,,Unknown


In [138]:
df['HasUnknown'] = df.apply(lambda x:x['A']=='Unknown' or x['B']=='Unknown', axis=1)  # applying a function to row

In [144]:
df['HasUnknown'] = df.apply(lambda x:any([x[i]=='Unknown' for i in ['A','B']]), axis=1)

In [141]:
any([x[i] == 'Unknown' for i in ['A','B','C']])

True

In [136]:
df

Unnamed: 0,A,B,HasUnknown
0,,Unknown,True
1,Unknown,np.nan,True


In [145]:
df['Unk'] = df['B'].apply(lambda x: x=='Unknown')   # ecoding a categorical column with 2 values

In [146]:
df

Unnamed: 0,A,B,HasUnknown,Unk
0,,Unknown,True,True
1,Unknown,np.nan,True,False


In [375]:
df1 = pd.DataFrame({'C':[0,1], 'B':['a','b']})

In [378]:
df1

Unnamed: 0,C,B
0,0,a
1,1,b


In [376]:
df2 = pd.DataFrame({'A':[5,7], 'Y':['c','a']})

In [377]:
df1.merge(df2, left_on='B', right_on='Y')   # combine two dataframes on a shared key

Unnamed: 0,C,B,A,Y
0,0,a,7,a


In [None]:
df1.merge(df2, on='B')

# General Tips
<a id="gt"></a>

The last line of the error tells you what the problem is

The body of the error tells you exactly where in the code the error occurs

Use comments to write down what you want to do even if you don't immediately know how to do it

Test out parts of your code in a new cell when debugging

When debugging, try inserting in print statements throughout the code to see what the variable look like and find out where mistake happens

It's best to assume that the function only has access to variables declared within itself

# Syntax
<a id="syn"></a>

In [None]:
# [] indexing, selecting, next to: df, df.loc, df.groupby(), df.iloc, lists, arrays, strings

In [None]:
# () next to functions, methods, function definitions, tuple, separating conditional statements, math use

In [None]:
# {} creating sets or dictionaries, dictionary comprehension, json

In [None]:
# <> definitions of functions

In [None]:
# : end of conditionos, function and class definitions,loops, slicing (from here to here), between key and value in dict

In [None]:
# , indication of another argument, separation of parameters or axes

In [None]:
# & | comparing boolean arrays or boolean series (elements outside of sing need to be in ())

In [None]:
# ~ within pandas means not (turns true to false and vice versa)

In [None]:
# ; used to eliminate text output from graphs

# Terminology
<a id="tr"></a>

Iterate (loop, traversing) - the action of going through all the elements of an object (list, range, etc.) 

Concatenate - the action of stiching two lists or two strings together (Ex. [1,2] + [3,4] -> [1,2,3,4])

Cast - change the type of a variable (Ex. int('1') casts '1' from string to integer)

Increment/Decrement - increase or decrease a variable (Ex. i += 1)

Substring - part of a string

Index - location of an object within a list as a whole number starting from 0

Method/Function - an action that changes an object and/or returns another object

Return - the value that comes back as a result of calling a function (Ex. math.sqrt(5.5) returns a value)

Call - execute a function

Print - display text to a user

Immutable - Object that cannot change it's value or size

Unpack - treat an object containing multiple elements as separate elements

Key - Unique identifier within a dictionary

Iterable - an object that can be iterated (traversed or looped over)

Parameter - variable within the function definition

Argument - value that gets passed to a function

Instance - an object of a certain type

Inheritance - taking on aspects of a broader class

Dunder - special methods used in classes (Ex. __ init __, __ str __, __ len __, __ repr __)

Object - the broadest possible class in Python

Axis - dimension within an array

# Useful Functions
<a id="uf"></a>

len() - function that returns the length of an object (Ex. len([1,2,3,4]) = 4)

append() - a function that adds a new element to a list (Ex. li = [1,2] \n li.append(3) adds 3 to existing list)

enumerate() produces a tuple where the first element is the index and the second element is the value

in - checks whether an object is a member of another object (Ex. 1 in [1,2] == True)

''.join(['a','b','c']) -> 'abc'

help(object) - function that returns more information about the object

object? - function that returns more information about the object

function('press shift + tab') -> tells you what arguments you can pass to this function

object.'press tab' -> shows all available methods/attributes