# New Hire Training - Python Fundamentals: Day 1 Lesson Codes
This Jupyter Notebook summarizes the code demonstrated in class by The Marquee Group during the J.P. Morgan New Hire Training.

### Session 1 – Introduction to Python and Python Objects

This session will provide an overview of the Python programming language and familiarize participants with Python’s data types and key object methods. The session will cover:
- Using Jupyter Notebook to write and execute Python code, including key shortcuts and settings
- Creating and manipulating variables
- Introduction to Python data types and methods including numbers, strings and lists

#### Section 1 - Intro to Jupyter Notebooks

In [None]:
#Creating a new Jupyter Notebook:
"""
- New --> Athena 3.7 Notebook
"""

# Jupyter Modes:
"""
- Edit Mode - Green border around a section; lets you edit code inside a Code cell or text in a Markdown cell
- Command Mode - Blue border around a section; keyboard shortcuts let you navigate within the entire notebook

"""

#Useful Shortcuts
"""
- Help --> Keyboard Shortcuts (Hotkey: H) will list all key shortcuts used in Jupyter Notebook
- Most common ones we will use during the program:
    - CTRL + Enter to run a section at a time
    - SHIFT + Enter to run a section and also advance forward to next section
    - Enter (during Command Mode) - enter Edit mode to edit the text/code in current cell
    - Esc (during Edit Mode) - exit Edit mode and return to Command Mode

- Other key Command Mode shortcuts:
    - Y - Change current cell to code type
    - M - Change current cell to markdown type
    - Up / Down arrow keys - Navigate to cell above/below
    - A / B - Insert a new cell above/below
    - C / V / X - Copy/Paste/Cut selected cells
    - D, D - Delete selected cells
    - O - hide/unhide output

- Other key Edit Mode shortcuts:
    - SHIFT + TAB - Provides tooltips while cursor is inside brackets of a function or method
    - CTRL + / - Comments/Uncomments selected lines of code
    - CTRL + D - Deletes selected lines of code
    - Tab - auto-complete code
"""

#Markdown Tips:
"""
Cheat-sheet: https://www.markdownguide.org/cheat-sheet
- #, ##, ### for headers
- **text** to bold text
- *text* to italicize text
- 1., 2., 3. or - to create lists
"""

#Other Tips:
"""
Use # to create single line "comments" in Code
Use triple quotes for multi-line comments or notes
"""

#### Section 2 - Data Types

In [None]:
#To run this section hit CTRL + Enter

x = 5 #integer
z = 2.5 #float

type(x) #this will get output if it's the last line in the cell

In [None]:
print(type(x)) #if running entire file or section, you need to surround with print

x = "Hi" #string
    #note how variables can be re-assigned to be a different data type

greeting = "Hello World"
name = "Bogdan Tudose"

print("Hello " + name)
    #use + to concatenate

# greeting + z 
    #this gives an error, cannot concat float to str

print(greeting + str(z))
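
The concatenation error above comes from mixing `str` and `float`. As a minimal sketch, the built-in conversion functions move values between types in either direction. The values and variable names below are placeholders chosen for illustration.

```python
# Illustrative values; the names are placeholders
text_number = "42"
z = 2.5

as_int = int(text_number)      # str -> int
as_float = float(text_number)  # str -> float
as_text = str(z)               # float -> str

print(as_int + 1)          # arithmetic now works
print("z is " + as_text)   # concatenation now works
print(type(as_int), type(as_float), type(as_text))
```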

#### Section 3 - Strings and String Methods

In [None]:
y = '6' #this is a string
print(y * 100) #will print 6 repeated 100 times
print("-" * 50) #repeats the dash 50 times

#Cleaning up numbers
myNumber = "$123,456,789"
myNumber * 2

In [None]:
myNumber.strip('$') #did not change original
myNumber

In [None]:
myNumber = myNumber.strip('$')
myNumber = myNumber.replace(',', '')
myNumber = int(myNumber)
myNumber

In [None]:
#All in one step
myNumber = "$123,456,789"
myNumber = int(myNumber.strip('$').replace(',', ''))
myNumber

#variableName --> camel case
#variable_name --> snake case

In [None]:
#%%% String Format Method
output = "Sales: {}, Gross Margin {:.2%}"
print(output.format(2000,0.3))

output2 = "Market Cap: {:>9,}"
mkCap1 = 1234
mkCap2 = 3456700
mkCap3 = 100234
print(output2.format(mkCap1))
print(output2.format(mkCap2))
print(output2.format(mkCap3))
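
The same format codes also work inside f-strings (Python 3.6 and later), which place the value next to its format code. This is an alternative sketch rather than the approach shown in class; the variable names are illustrative.

```python
sales = 2000
margin = 0.3
mk_cap = 1234

# The part after the colon is the same format code used with .format()
summary = f"Sales: {sales}, Gross Margin {margin:.2%}"
line = f"Market Cap: {mk_cap:>9,}"  # right-aligned in 9 characters, comma separators

print(summary)
print(line)
```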

#### Section 4 - Lists and List Methods

In [None]:
#Lists are similar to arrays in other programming languages
#Slides 26-32

ticker1 = 'AAPL'
ticker2 = 'SP500'
ticker3 = 'MSFT'

tickers = ['AAPL', 'SP500', 'MSFT', 'TSLA', 'NFLX']

#extract info from a list
tickers[3]

In [None]:
#change one of the items
#NFLX --> AMZN
tickers[4] = 'AMZN' #don't forget counting starts at 0
len(tickers) #size of the array --> 5
tickers

In [None]:
tickers[-1] = 'NFLX' #<-- this changes the last item

#pull out the first 3 items
tickers[0:3] #0 <= index < 3, the end index is excluded

In [None]:
#%%% List Methods
listX = [1, 2, 3]
listY = [100, 200, 300]
listZ = ['a', 'b', 'c']

z = 10

listX.insert(1, z) #adds z as the new second item
listX.append(listY) #adds listY as one entry
listX[4][1] #extracts the "200"

listX.extend(listZ) #breaks apart listZ into multiple "rows"
listX

In [None]:
#%%% Copying Lists

list1 = [1,2,3]
list2 = list1 #Creates a link
list3 = list1.copy() #copies the Values
list4 = list1[:] #copies the Values

list2[0] = 100 #this will change first item of both list1 and list2
list4[0] = 5000 #only changes first item in list4
list1, list2, list3, list4
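
One way to check whether two names are linked or are separate copies is the `is` operator, which tests whether both names point to the same object. A short sketch with illustrative names:

```python
original = [1, 2, 3]
alias = original            # same list object under a second name
shallow = original.copy()   # new list holding the same items

print(alias is original)    # True: both names point to one object
print(shallow is original)  # False: a separate object
print(shallow == original)  # True: the contents are equal

alias.append(4)
print(original)  # the append shows up under both names
print(shallow)   # the copy is unaffected
```

Note that `.copy()` and `[:]` make shallow copies: a list nested inside the original would still be shared. `copy.deepcopy` from the standard library copies nested lists as well.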

<font color = 'blue'> **Exercises** </font>
- Try Exercises 1, 2 from Assignment 1

### Session 2 – Python Data Structures and Control Flow

This session will provide an overview of more advanced data types in Python and common programming control flow concepts such as if statements and looping. The session will cover:
- Differences and uses of various data structures such as sequences, lists, dictionaries, tuples and sets
- Decision making with if/else structures and Boolean logic
- Looping techniques with for and while loops and list comprehension

#### Section 5 - Dictionaries & Tuples

In [None]:
#%%% Dictionaries
#Array using your own "names" for the "rows"
#storing information in key:value pairs
#keys have to be unique; values are looked up by key, not by position
#(since Python 3.7 dictionaries keep the order in which entries were added)

stocks = {'AAPL':'Apple Inc.', 
          'CAT':'Caterpillar', 
          'MSFT':'Microsoft',
          'NFLX':'Netflix'}

#To extract
stocks['CAT']

#To change values
stocks['AAPL'] = "Apple"

#To create a new entry
stocks['FB'] = 'Facebook'

#list of all row names
print(stocks.keys())
#list of all the values
print(stocks.values())
stocks
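
Looking up a key that does not exist raises a `KeyError`. A minimal sketch of two common ways to guard against that, using a small illustrative subset of the dictionary:

```python
stocks = {'AAPL': 'Apple', 'CAT': 'Caterpillar'}  # illustrative subset

# stocks['XYZ'] would raise a KeyError; .get returns a default instead
name = stocks.get('XYZ', 'Unknown')
print(name)

# "in" checks the keys, not the values
print('CAT' in stocks)
print('Caterpillar' in stocks)

# .items() returns the key:value pairs together
print(list(stocks.items()))
```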

In [None]:
#%%% Tuples
#Arrays that store constants
#tax rates, fx rates
taxes = (0.25, 0.35, 0.50)

taxes[1] #this pulls out the 2nd item, 35%

#This doesn't work:
#taxes[0] = 0.45

In [None]:
#%%% Other Data Types
#Sets - similar to lists but hold unique data

set_nums = {100, 5, 9, 2}
print(set_nums)

#useful for removing duplicates
tickers = ['AAPL','TSLA','MSFT','NFLX','AMZN','AAPL','NFLX']
tickers_unique = set(tickers)
tickers_unique
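
A set does not keep the original order of the list. If order matters, one possible approach (a sketch, relying on dictionaries keeping insertion order in Python 3.7+) is to pass the list through `dict.fromkeys`:

```python
tickers = ['AAPL', 'TSLA', 'MSFT', 'NFLX', 'AMZN', 'AAPL', 'NFLX']

# dict keys are unique and keep insertion order (Python 3.7+)
tickers_ordered = list(dict.fromkeys(tickers))
print(tickers_ordered)

# If any consistent order is enough, sorting the set also works
tickers_sorted = sorted(set(tickers))
print(tickers_sorted)
```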

#### Section 6 - If Statements - Boolean Logic

In [None]:
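#OR: or  (use | for element-wise comparisons, e.g. pandas filters)
#AND: and  (use & for element-wise comparisons, e.g. pandas filters)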
#not equal to: !=
#to compare if two things are equal use == not =
    #one "=" is used to change the value of a variable

x = 5
y = 10

# y = x #this changes y to be equal to x
# y == x #this checks if y and x are the same

#If Example
if x > y:
    print("x is greater than y")
    print("x is ", x)
    if x > 100:
        print("x is greater than 100")
elif y > x:
    print("y is greater than x")
else:
    print("x is equal to y")

print("this is outside the if statement")

In [None]:
#Be careful with "Truthy" statements: any non-zero number is treated as True
if 10 - 2:
    print("This IF is True")
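
The sketch below shows the `and` / `or` keywords in ordinary conditions and uses `bool()` to reveal how Python treats different values inside an if statement. The values are chosen only for illustration.

```python
x = 5
y = 10

# Plain Python conditions use the keywords and / or / not
both = x > 0 and y > 0
either = x > 100 or y > 5
print(both, either)

# bool() shows how a value is treated inside an if statement
falsy = [bool(0), bool(0.0), bool(""), bool([]), bool(None)]
truthy = [bool(8), bool(-1), bool("0"), bool([0])]
print(falsy)   # every entry is False
print(truthy)  # every entry is True
```

Note that the string `"0"` and the list `[0]` are truthy because they are not empty, even though they contain a zero.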

#### Section 7 - Looping

In [None]:
#Two main types of loops: For loop, While loop
#For Loop - you know how many times to loop
#While Loop - you are checking a condition (loop with an IF stmt)
    #loop will run as long as condition is True

#%%% For Loop
numbers = [2, 5, 7, 9, 13, 25]

#Looping through a list
#Print the squares and cubes of these numbers really quickly
for num in numbers:
    print("-"*10)
    print(num)
    print("Square:",num ** 2) #square
    print("Cube:",num ** 3) #cube

In [None]:
#More efficient:
output = "X:{} Square:{} Cube:{}"
for num in numbers:
    print("-"*20)
    print(output.format(num, num**2, num**3))

In [None]:
#Looping "X" number of times
for i in range(5): #prints numbers 0, 1, 2, ... 4
    print(i)

for x in range(2,26,2): #range(start, end, skip)
    print(x) #prints 2, 4, 6, etc. stops at 24
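
Two built-in helpers are often used alongside `range` in for loops: `enumerate`, which adds a counter, and `zip`, which walks through two lists in parallel. The tickers and prices below are made up for this sketch.

```python
tickers = ['AAPL', 'MSFT', 'TSLA']   # illustrative data
prices = [150.0, 300.0, 700.0]       # made-up prices

# enumerate gives a counter alongside each item
for i, ticker in enumerate(tickers):
    print(i, ticker)

# zip pairs up items from two lists
for ticker, price in zip(tickers, prices):
    print("{}: {:.2f}".format(ticker, price))
```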

In [None]:
#%%% List Comprehension
nums = [5, 7, 9, 12]
numsCube = [x**3 for x in nums]

#The comprehension above is shorter and usually faster than doing:
cubeList = []
for x in nums:
    cubeList.append(x**3)

print(cubeList)

#Can combine with IF Statements
nums = [3, 50, 29, 12, 100, 62]
squareList = [x**2 for x in nums if x > 50]
print(squareList)

In [None]:
#%%% While Loops
x = 5

while x > 2:
    print(x)
#     this is an infinite loop, it never stops
#         you can interrupt by hitting the Stop button
#         OR hitting I,I after code starts running

In [None]:
while x < 1000:
    print(x)        
    if x == 50:
        break #can stop a while loop earlier with break command
    x += 1  #x = x + 1

In [None]:
#Careful with using floats in If statements or While loops
print(1.1 + 2.2 == 3.3) #Gives False
        #1.1 + 2.2 is stored as 3.3000000000000003, while 3.3 is stored as 3.2999999999999998
print("{:.20f}".format(3.3))

round(1.1 + 2.2, 2) == round(3.3, 2) #Gives True; round to a set number of decimals before comparing
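
Another option for comparing floats is to test whether they are close rather than exactly equal. A minimal sketch using `math.isclose` from the standard library:

```python
import math

total = 1.1 + 2.2

print(total == 3.3)              # False: exact comparison
print(math.isclose(total, 3.3))  # True: compares within a small tolerance
print(abs(total - 3.3) < 1e-9)   # True: the same idea written by hand
```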

<font color = 'blue'> **Exercises** </font>
- Try Exercises 3, 4, 5 from Assignment 1

### Session 3 – Python Functions and Introduction to NumPy

This session will cover the fundamentals of creating custom functions in Python and introduce the concept of Python packages and libraries, including an overview of the NumPy package. The session will cover:
- Creating custom functions
- Function arguments, special parameters and docstrings
- Lambda functions
- Error handling using exceptions
- Importing Python packages
- Overview of NumPy library and NumPy array functions

#### Section 8 - Functions

In [None]:
#%% Section 8 - Functions
#creating your own formulas that don't exist in Python
#also useful for creating "mini programs" that can be reused in your code
    #e.g. function that imports data from files, scrapes websites, etc.

def fnCube(x):
    cube = x ** 3
    return cube #outputting the result

def fnSquare(x):
    return x ** 2

#functions do not always have to return a value
def printGreeting(fName, lName):
    print("Hello " + fName + " " + lName)

#functions can return multiple outputs
def perimAreaRectangle(length, width):
    area = length * width
    perim = 2* (length + width)
    return perim, area 

#functions have to be written and loaded in memory before using them
    #typically declare functions at the top of our code
x = 5
y = 10
print(fnSquare(x))

perim, area = perimAreaRectangle(x, y)
fnCube(y)

In [None]:
#%%% Lambda Functions
#Useful for writing simple "one-liner" functions

#lambda input(s): what to do with the input
lambdaSquare = lambda x: x**2

print(lambdaSquare(25))

hypLength = lambda x, y: (x**2 + y**2) ** 0.5 #could also use math.sqrt
print(hypLength(4,3))

In [None]:
#%%% Docstrings
#Sometimes useful to provide documentation to your function for others to know how to use it properly
def cagr(beg, end, n):
    """
    CAGR function calculates compounded annual growth rate for a simple investment.
    beg = Beginning investment amount, at year 0
    end = Ending investment amount at exit year
    n = Number of years between beginning investment and exit
    """
    return (end / beg) ** (1/n) - 1

#SHIFT + TAB to see docstring as you are typing function
cagr(50, 100, 5)
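
Function arguments can be passed by position or by name, and a parameter can be given a default value. The sketch below uses a hypothetical variant of the function named `growth_rate`; the default of `n=1` is an assumption added for illustration and is not part of the class code.

```python
def growth_rate(beg, end, n=1):
    """Return the compound annual growth rate from beg to end over n years."""
    return (end / beg) ** (1 / n) - 1

print(growth_rate(50, 100, 5))            # positional arguments
print(growth_rate(n=5, end=100, beg=50))  # keyword arguments, in any order
print(growth_rate(100, 110))              # n falls back to its default of 1
print(growth_rate.__doc__)                # the docstring is stored on the function
```

Calling `help(growth_rate)` prints the same docstring along with the function signature.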

#### Section 9 - Error Handling

In [None]:
#%%% Error Handling
# - errors can be prevented by using if statements or try/except
x = 5
y = 0
x / y
#ZeroDivisionError

In [None]:
#Using If Statements
if y == 0:
    print("Cannot divide by 0")
else:
    print(x/y)

In [None]:
#Using generic try/except structure
try:
    print(x/y)
except Exception: #catches ordinary errors; a bare "except:" also catches KeyboardInterrupt
    print("Could not divide x by y")

In [None]:
#Using more specific try/except
# 5 / "2" #TypeError
# 5 / 0 #ZeroDivisionError 
x = 5
y = 2

try:
    #print(x/Y) #will cause a NameError as Y not defined
    print(x/y)
except ZeroDivisionError:
    print("Cannot divide by 0")
except TypeError:
    print("Need to ensure both numbers are floats or integers")
except:
    print("Other error")
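
Error handling is often wrapped inside a function so the same checks can be reused. The following is a sketch only: `safe_divide` is a hypothetical helper, and returning `None` on failure is one design choice among several.

```python
def safe_divide(a, b):
    """Return a / b, or None when the division cannot be done."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by 0")
        return None
    except TypeError as err:
        print("Bad input:", err)  # err holds the original error message
        return None
    else:
        return result  # runs only when no exception was raised

print(safe_divide(5, 2))
print(safe_divide(5, 0))
print(safe_divide(5, "2"))
```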

#### Section 10 - NumPy Library

In [None]:
#%% Section 10 - NumPy Library
#Numpy package provides more advanced mathematical and statistical functions
#Also allows for more advanced calculations with arrays and matrices

#Importing the package
import numpy as np

In [None]:
#%%% Random Numbers
np.random.seed(42) #sets a seed for the random number generator. Set once per execution
print(np.random.rand(10))
print(np.random.rand(10)) #creates another list of 10 random numbers

In [None]:
# Notice setting seed again produces the same sequence as the first time
np.random.seed(42) 
print(np.random.rand(10))
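
Newer versions of NumPy (1.17 and later) also offer a generator object that carries its own seed instead of setting one global seed. This is an alternative sketch, assuming such a NumPy version is installed; it is not the approach used in class.

```python
import numpy as np

rng = np.random.default_rng(42)   # generator with its own seed
first = rng.random(3)             # 3 random floats in [0, 1)

rng_again = np.random.default_rng(42)
second = rng_again.random(3)

print(first)
print(np.array_equal(first, second))  # same seed, same numbers
```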

In [None]:
#Random Integers
print(np.random.randint(1,7, size=100)) 
    #generates 100 random integers from 1 to 6 (exclusive of the 7)
    #similar to rolling dice
    #each number has equal chance of getting drawn

In [None]:
#Random Normal Distribution
#Now the probability of a number being generated will follow the normal (bell curve) distribution.
print(np.random.normal()) #draws a random number from the normal distribution
print(np.random.normal(5,2)) # can specify mean and standard dev
print(np.random.normal(5,2, 10)) # can specify mean and standard dev and size

In [None]:
#%%% Math and Statistical Functions
results = np.random.normal(5,2, 100)
    #notice how results variable is numpy array (not a list)

#Statistical functions
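print(np.mean(results))
print(np.std(results)) #population standard deviation (ddof=0 by default)
print(np.cov(results)) #for a 1-D array this is the sample variance (divides by n-1)
print(np.sum(results))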

#Other math functions
e = np.exp(1)
np.log(e) #natural log
np.sqrt(144)
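
NumPy's `std` and `var` divide by n by default, while `cov` divides by n - 1. The sketch below uses a small made-up array to show the `ddof` argument that switches between the two conventions.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up numbers

pop_std = np.std(data)             # divides by n (ddof=0, the default)
sample_std = np.std(data, ddof=1)  # divides by n - 1
sample_var = np.var(data, ddof=1)

print(pop_std)
print(sample_std)
print(np.isclose(np.cov(data), sample_var))  # np.cov uses n - 1 by default
```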

<font color = 'blue'> **Exercises** </font>
- Try Exercises 6, 7 from Assignment 1

### Session 4 – Introduction to Pandas Package

This session will introduce participants to the Pandas package in Python and familiarize them with the DataFrame object. The session will cover:
- Overview of uses of Pandas package
- Creating and manipulating Pandas DataFrames and Series
- Exploring data sets and accessing rows, columns and slices of DataFrames
- Sorting and filtering DataFrames

#### Section 11 - Intro to Pandas

In [None]:
#Things you can do with pandas package:
"""
Data Manipulation
- Loading and cleaning data
- Filtering, sorting
- Access specific columns or rows
- Rolling metrics - moving average
- Calculate new columns --> formulas to do math
- Pivot table
"""

In [2]:
#%%% Importing pandas
import pandas as pd  
pd.__version__

#pd is the common "alias" used in the Python community
#http://pandas.pydata.org/pandas-docs/stable/

<font color = 'blue'> **Creating DataFrames** </font>

In [3]:
#%%% Creating DataFrames from simple data types

#Using a dictionary
tickers_pe = {'AAPL':25,
             'MSFT':31,
             'AMZN':48,
             'FB':22,
             'NFLX':38,
             'TSLA':99,
             'MCD':24,
             'WMT':22,
             'JNJ':16}
df = pd.DataFrame({"Ticker":list(tickers_pe.keys()),"PE":list(tickers_pe.values())})
df
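
A dictionary can also be turned into a pandas Series, where the keys become the index. This is a sketch using a subset of the P/E values; the names `pe_series` and `pe_df` are illustrative.

```python
import pandas as pd

tickers_pe = {'AAPL': 25, 'MSFT': 31, 'AMZN': 48}  # subset of the values above

pe_series = pd.Series(tickers_pe, name='PE')  # keys become the index
print(pe_series['MSFT'])

# Convert to a two-column DataFrame when needed
pe_df = pe_series.rename_axis('Ticker').reset_index()
print(pe_df)
```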

In [None]:
#Using lists
companies = ['A','B','C','D','E','F','G']
prices = [100, 50, 22.35, 20, 15.90, 60.95, 44.21]
eps = [20.22, 5, 1.13, 1.95, 2.36, -1.11, 1.83]

coData = pd.DataFrame({'Company':companies, 'Price':prices, 'EPS':eps})
coData

<font color = 'blue'> **Importing Data** </font>

In [None]:
sp500 = pd.read_csv("ADAPT2021/StockData/SP500.csv") #using relative path of project
sp500

In [None]:
finData = pd.read_excel("ADAPT2021/ExData/Data Manipulation Worksheet.xlsx", sheet_name="Financing Table Clean")
finData

<font color = 'blue'> **Exploring Data** </font>

In [None]:
#Exploring Data
coData.info()

In [None]:
sp500.info()
#note how Dates are showing up as "object"

In [None]:
finData.info()
#note how DATE is of type datetime

In [None]:
coData.head() #first 5 rows

In [None]:
coData.tail() #last 5 rows

In [None]:
sp500.describe()

<font color = 'blue'> **Cleaning Data** </font>

In [None]:
#%%%Setting up the Data
#Convert the Date column from text to datetime
sp500['Date'] = pd.to_datetime(sp500['Date'])
sp500.info() #Date column is now datetime not object

In [None]:
#changing the index to Dates column
sp500.set_index(['Date'], drop=True, inplace = True)
    #inplace = True is the same thing as sp500 = sp500.set_index....
    #most pandas functions/methods don't change the original table
    #you need to reassign to change the original DataFrame
    #or use inplace=True
    
    #drop=True is the default and can be omitted, drop=False will retain the original column

sp500

#### Section 12 - Accessing/Slicing Data

In [None]:
#%%% Accessing rows
#Method 1 - iloc - integer location - index
    #starts with 0; row 1 in Excel = row 0 in Pandas

sp500.iloc[0] #first row
# pd.options.display.float_format = '{:.2f}'.format #to change format of floats

sp500.iloc[:2]#start at beg, stop row[1]
    #still does open interval for iloc

#Method 2 - loc - new index, which is now Date
sp500.loc['20130930']
sp500.loc['20130930':'20131010'] #inclusive of the last date
sp500.loc['2013-10'] #prints all October of 2013
sp500.loc['2016'] #all 2016

In [None]:
#%%% Accessing columns
#Method 1 - Using df['name column'], similar to Dictionaries
sp500['Adj Close']

#Method 2 - Using df.colName, does not work for columns with spaces in name
sp500.Close

#Multiple columns - provide a list of header names
sp500[['Close','Open']]

In [None]:
#%%% Slicing parts of DataFrame
#Method 1 - Using .loc['rowName','colName']
sp500.loc['2014-09', 'Low'] #column name inside []
sp500.loc['2015-09', ['Low','High']] #multiple columns can be given in a list

#Method 2 - Using .loc['rowName']['colName']
sp500.loc['2016-01']['Volume']#column name inside []
    #Careful when using this method to "Copy" data into another DataFrame
    #It could create a "view" instead of a copy, where tables are linked
    #Use df.loc['row']['col'].copy() or Method 1
    #See Slide 75 for more details

sp500_OC = sp500[['Open','Close']]
sp500_HL = sp500.loc[:,['High','Low']] #another way of copying just values
        # .loc[:] grabs all rows

In [None]:
sp500.loc[:,['High','Low']]

In [None]:
#%%% Output to CSV and Excel
sp500_OC.to_csv("ADAPT2021/Output/SP500 Open Close.csv")
sp500_HL.to_excel("ADAPT2021/Output/SP500 High Low.xlsx")

#### Section 13 - Manipulating Data
- Creating new columns / calculated fields
- Sorting
- Filtering

In [None]:
#Creating new columns
coData['P/E'] = coData['Price'] / coData['EPS']
coData

In [None]:
sp500['Intraday Returns'] = sp500['Close'] / sp500['Open'] - 1
        # (Closing - Open) / Open = Close/Open - 1

#Daily Returns
sp500['Returns'] = sp500['Close'] / sp500['Close'].shift(1) - 1
sp500['Returns 2'] = sp500['Close'].pct_change()
# pd.options.display.float_format = '{:.4f}'.format #to change format of floats
sp500
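
To see that the two return calculations agree, the sketch below applies both to three made-up closing prices. The first return is `NaN` because there is no prior price to compare against.

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])  # made-up closing prices

manual = close / close.shift(1) - 1  # shift(1) lines up each price with the prior one
builtin = close.pct_change()

print(manual.tolist())
print((manual - builtin).abs().max())  # largest difference, ignoring the NaN
```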

<font color = 'blue'> **Sorting Data** </font>

In [None]:
#%%% Sorting Data

#Descending by volume
sp500.sort_values(['Volume'], ascending=False, inplace=True)
#don't forget the inplace=True if you want to change the original
#OR: sp500 = sp500.sort_values....

#Sort multiple cols
sp500.sort_values(['Volume','Close','Open'],
                  ascending=[False,False,True],
                  inplace=True)

#Sort by Index
sp500.sort_index(inplace=True)

<font color = 'blue'> **Filtering Data** </font>

In [None]:
#%%% Filtering Data

# posDays = sp500[ booleanMask  ]
#where a boolean mask is a column of True/False

sp500[sp500['Open'] < sp500['Close']] #method 1 - use condition directly
sp500['isPositive'] = sp500['Open'] < sp500['Close']
sp500[sp500['isPositive']] #method 2 - give it a column of T/F

In [None]:
coData[coData['P/E']>15]

In [None]:
coData[coData['Company']=='F']

In [None]:
finData[(finData['INDUSTRY']=='Finance') & (finData['LEAD UNDERWRITER']=='J.P. Morgan')]
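
Boolean masks can be combined with `&` (and), `|` (or) and `~` (not), with each condition wrapped in parentheses. The sketch below uses a made-up table; the `SIZE` column and its values are hypothetical and do not come from the financing data.

```python
import pandas as pd

# Made-up deals; SIZE is a hypothetical column for illustration
deals = pd.DataFrame({
    'INDUSTRY': ['Finance', 'Energy', 'Finance', 'Tech'],
    'SIZE': [100, 250, 75, 300],
})

is_finance = deals['INDUSTRY'] == 'Finance'

finance_and_large = deals[is_finance & (deals['SIZE'] >= 100)]
finance_or_large = deals[is_finance | (deals['SIZE'] >= 300)]
not_finance = deals[~is_finance]

# isin and between are shortcuts for common conditions
energy_or_tech = deals[deals['INDUSTRY'].isin(['Energy', 'Tech'])]
mid_size = deals[deals['SIZE'].between(100, 250)]  # inclusive on both ends

print(len(finance_and_large), len(finance_or_large), len(not_finance))
print(len(energy_or_tech), len(mid_size))
```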

<font color = 'blue'> **Quick Plotting** </font>

In [None]:
sp500['Close'].plot()

<font color = 'blue'> **Exercises** </font>
- Try Exercise 8 from Assignment 1
- Try mini Pandas assignment below

In [None]:
#%% Questions

#1) Load Apple data set (StockData --> aapl.csv) 
    #and Financing Deals data set (ExData --> Data Manip --> Clean tab)

#2) Find all the days of Apple where closing share price was between 70 and 75
    
#3) Financing Deals data (Data Manip file) --> find all deals done by GS and JPM

#4) Find all the deals done in May of 2006
#5) Find all the deals done by J.P. Morgan in Insurance
#6) Calculate the returns of Apple's closing share price
    #what is the average return and standard deviation?