In [None]:
# Adapted from: http://justinbois.github.io/bootcamp/2020_fsri/lessons/l01_welcome.html

### Module 1 Learning: Working in SageMaker Studio & Review of Python concepts
In this activity, you will become familiar with the SageMaker Studio IDE (Integrated Development Environment) and learn or review some basic Python concepts.

In [None]:
# This is a code cell. The '#' character makes this a commented line.
# The next line is executable code.
print('Hello, world.')

### Cells

A Jupyter notebook consists of cells. The two main types of cells you will use are code cells and markdown cells. First, an overview.

A code cell contains actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar of your Jupyter notebook. Alternatively, you can hit Esc and then y (denoted "Esc - y") while a cell is selected to specify that it is a code cell. Note that you will have to hit Enter after doing this to start editing it.

If you want to execute the code in a code cell, hit Enter while holding down the Shift key (denoted Shift + Enter). Note that code cells are executed in the order you shift-enter them. That is to say, the ordering of the cells for which you hit Shift + Enter is the order in which the code is executed. If you did not explicitly execute a cell early in the document, its results are not known to the Python interpreter. This is a very important point and is often a source of confusion and frustration for students.

Markdown cells contain text. The text is written in markdown, a lightweight markup language. As you are typing the contents of these cells, the results appear as text. Hitting Shift + Enter renders the text in the formatting you specify.

Markdown formatting reference: https://www.ibm.com/docs/en/db2-event-store/2.0.0?topic=notebooks-markdown-jupyter-cheatsheet

You can specify a cell as being a markdown cell in the Jupyter toolbar, or by hitting Esc - m in the cell. Again, you have to hit enter after using the quick keys to bring the cell into edit mode.

In general, when you want to add a new cell, you can click the + icon on the notebook toolbar. The shortcut to insert a cell below is Esc - b and to insert a cell above is Esc - a. Alternatively, you can execute a cell and automatically add a new one below it by hitting Alt + Enter.


If you evaluate a Python expression that returns a value, that value is displayed as output of the code cell. This only happens, however, for the last line of the code cell.

In [None]:
# Would show 9 if this were the last line, but it is not, so shows nothing
4 + 5

# I hope we see 11.
5 + 6
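
To see more than one value from a single cell, one option is to call `print()` on each value instead of relying on the last-line display. A minimal sketch (the variable names `first_sum` and `second_sum` are illustrative and not part of the original notebook):

```python
# Store each result in a variable so it can be printed explicitly.
first_sum = 4 + 5
second_sum = 5 + 6

# print() writes to the cell output no matter where the line sits in the cell.
print(first_sum)
print(second_sum)
```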

Note, however, that if the last line does not return a value (for example, when it assigns a value to a variable), there is no visible output from the code cell.



In [None]:
# Variable assignment, so no visible output.
a = 5 + 6

In [None]:
# However, now if we ask for a, its value will be displayed
a

## Colvin's Top 5 Tips:
For working in Jupyter notebooks and Python for data
### Tip #1
Use the keyboard as much as possible. With a little practice, you can move very quickly around a notebook.
<P>
Bigger list of keyboard shortcuts than shown here:<BR> 
    https://gist.github.com/discdiver/9e00618756d120a8c9fa344ac1c375ac

In [None]:
# To execute a cell, use the keyboard
# for Windows: shift-enter (move to next cell) or ctrl-enter (stay on current cell)
# Mac may be slightly different
print('Using shift-enter will save you lots of time.')

In [None]:
# Use the command mode keystrokes:
# 'Esc' puts the cell into command mode
# 'Enter' puts the cell into edit mode
# Try this sequence:
# - hit the Esc key
# - then hit the 'b' key
# This inserts a cell below. Very fast and useful.

In [None]:
# Insert above
# use the key stroke: Esc a
# Inserts a cell above.

In [None]:
# Delete a cell
# Esc d d

In [None]:
# Undo last action
# Esc z

In [None]:
# Change cell type:
# Esc m (markdown)
# Esc y (code)
#
# Then hit enter to edit the cell.

### Tip #2: Using software from others: (importing explained)
Installing and loading modules. 
This is often difficult on a laptop, but easy on SageMaker Studio Lab.

In [None]:
# Checking Python kernel version
# import a single function from a module
from platform import python_version
python_version()

#### Functions, Modules & Packages
In brief:
- packages are a collection of modules
- modules are a collection of functions
- functions are reusable pieces of code (here, written by someone else) that do useful things
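
The three levels can be seen with the standard library alone. This is a small sketch, not part of the original lesson: `math` is a module, `urllib` is a package that contains the module `urllib.parse`, and `sqrt` and `urlparse` are functions. The variable names are illustrative.

```python
# Import a whole module, then call one of its functions with the module prefix.
import math
root_from_module = math.sqrt(16)

# Import a single function from a module and call it without the prefix.
from math import sqrt
root_from_function = sqrt(16)

# urllib is a package; parse is a module inside it; urlparse is a function.
from urllib.parse import urlparse
host = urlparse('https://pandas.pydata.org/docs/').netloc

print(root_from_module, root_from_function, host)
```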

In [None]:
# If a package is already installed, then we can just import it
import pandas
# No error, so we are OK.
# Then we can immediately use it:
df = pandas.DataFrame(data = [1,2,3])
df.head()

In [None]:
# For frequently used packages, we can assign a short alias
import pandas as pd # Give this package the alias 'pd'
df = pd.DataFrame(data = [1,2,3])
df.head()

In [None]:
# Try to import something that isn't installed
import geopandas
# Error: not installed, so you can't import it

In [None]:
# Pip is a 'package management system' for Python. Use it to install packages, if needed.
%pip install geopandas
import geopandas
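
Before installing, it can help to check whether a package is already available. The sketch below uses the standard library's `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found. The helper name `is_installed` is hypothetical, introduced here only for illustration.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# 'json' ships with Python, so it should always be found.
print('json installed:', is_installed('json'))
# The result for geopandas depends on your environment.
print('geopandas installed:', is_installed('geopandas'))
```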

In [None]:
# You can also "un-import" the module (this removes the name geopandas from the notebook's namespace)
del geopandas
# And use pip to remove it
%pip uninstall geopandas -y

In [None]:
# You can also see which packages are installed
%pip list

#### Practice using an imported package
Let's create a pandas dataframe and use a few of its functions

In [None]:
# Create a list of lists, then load that data into a pandas DataFrame
data_lst = [[1,3,5],[7,9,11],[13,15,17]]  # This is using just the default Python language
# Now let's use pandas
df = pd.DataFrame(data = data_lst)
# Show the dataframe
df

In [None]:
# Get the average value of a column
print('Col 0 mean:',df[0].mean())
# To see pandas documentation for this function, Google "pandas series mean"
# Get the total of the first row
print('Row 0 sum:',df.loc[0].sum()) # Used pandas 'loc' and 'sum' functions
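
The same calculations can be applied to every column or every row at once through the `axis` parameter. A brief sketch using the same data as above (the names `column_means` and `row_sums` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 3, 5], [7, 9, 11], [13, 15, 17]])

# axis=0 works down the rows, giving one result per column.
column_means = df.mean(axis=0)
# axis=1 works across the columns, giving one result per row.
row_sums = df.sum(axis=1)

print(column_means)
print(row_sums)
```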

### Tip #3: Write "good" code
Use descriptive variable names.<BR>
Spend lots of time verifying the output from a cell is exactly what you expect.<BR>
Write comments so others can understand your code (and so you can remember quickly when you come back to it).

In [None]:
# It is often useful to know the type of every variable. Checking types often leads to finding a bug in your code.
# Very bad variable name, as you might assume it is a "Python list"
strange_lst = pd.DataFrame([1,'2',[{'mobile':'(805) 867-5309'}]])
# Also, this is a complex data structure.
strange_lst

In [None]:
# Display the 2nd item in the list.
strange_lst[1]  # This works on a Python list, but raises a KeyError on this pandas DataFrame

In [None]:
# I use the type() function a lot.
type(strange_lst)

In [None]:
# Show me the 2nd item in a pandas DataFrame
print(strange_lst.iloc[1])  # Use the position in the DataFrame (the 2nd element)
# Documentation for iloc: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
#
print(strange_lst.loc[1]) # Use the index of the item in the DataFrame (right now, the index is 0,1,2)
# Documentation for loc: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
# Print the type of the 3rd item in the DataFrame
print('Third item in the list:', strange_lst.iloc[2])
print('Type:',type(strange_lst.iloc[2]))
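
With the default index 0, 1, 2, `iloc` and `loc` return the same rows, which hides the difference between them. The sketch below uses a hypothetical DataFrame named `scores` whose index labels are deliberately out of order, so position and label no longer agree.

```python
import pandas as pd

# Hypothetical data; the index labels are deliberately out of order.
scores = pd.DataFrame({'score': [10, 20, 30]}, index=[2, 0, 1])

# iloc selects by position: the first row, whose label happens to be 2.
by_position = scores.iloc[0]
# loc selects by label: the row labeled 0, which is the second row.
by_label = scores.loc[0]

print('iloc[0] score:', by_position['score'])
print('loc[0] score:', by_label['score'])
```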

In [None]:
# Dig in on 3rd item
complex_item = strange_lst.iloc[2]
complex_item

### Tip #4: Use the Internet to help you write code
- Look for help and examples from the web. You can often find some useful code in less than 1 minute.<BR>
- stackoverflow.com is a great resource.<BR>
- Example Google search: pandas merge two dataframes side by side<BR>
- Leads quickly to: https://stackoverflow.com/questions/23891575/how-to-merge-two-dataframes-side-by-side
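
As an illustration of the kind of snippet such a search tends to turn up, one common approach is `pd.concat` with `axis=1`. This is a sketch with made-up DataFrames named `left` and `right`, not a quotation from the linked page.

```python
import pandas as pd

# Two small, made-up DataFrames with the same default index.
left = pd.DataFrame({'a': [1, 2, 3]})
right = pd.DataFrame({'b': [4, 5, 6]})

# axis=1 places the DataFrames next to each other, aligning rows by index.
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)
```

Always check the result against what you expected (here, 3 rows and 2 columns) before building on code found online.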

#### Practice using the Internet for code

In [None]:
# Question #1: You have a 'legend.jpg' file in your current directory. 
# Display that file below this cell.
# Your code here:

In [None]:
# Question #2: Display a scatter plot of the following points using the matplotlib package
x = [1,3,5,7,9]
y = [2.2,7.1,1.1,7.5,10.1]
# Your code here

### Tip #5
Get comfortable with data structures:
- udemy.com $13 course: https://www.udemy.com/course/python-and-jupyter-notebooks-for-beginners/
- Python native: lists, dicts, tuples.
- Get good at looping through any data structure: lists, dicts, DataFrames, arrays, etc.
- If working with AWS in Python, you need to be a master of dictionaries and JSON (use the json library). AWS responds with JSON.
- Learn a lot about pandas: creating, editing, merging, apply(), filtering
- The NumPy package: pandas is built on NumPy, and NumPy functions often complement pandas.
- Simple visualization with matplotlib. Useful for quick and dirty visual analysis.
- If doing machine learning, a great place to start is scikit-learn. It will teach the fundamentals, after which other ML tools (including AWS SageMaker and those of other cloud providers) will be easier to pick up.
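
To make the point about dictionaries and JSON concrete, here is a minimal sketch using the standard `json` library. The JSON text is made up for illustration and is not a real AWS response; the names `response_text` and `bucket_names` are hypothetical.

```python
import json

# Made-up JSON text for illustration; not a real AWS response.
response_text = '{"Buckets": [{"Name": "raw-data"}, {"Name": "reports"}], "Count": 2}'

# json.loads turns JSON text into Python dicts and lists.
response = json.loads(response_text)

# Loop through a list of dicts and pull out one field from each.
bucket_names = [bucket['Name'] for bucket in response['Buckets']]
print(bucket_names)

# Loop through the keys and values of the top-level dict.
for key, value in response.items():
    print(key, '->', type(value).__name__)

# json.dumps goes the other way; indent=2 makes the text easier to read.
print(json.dumps(response, indent=2))
```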

In [None]:
# The range() function creates a range of numbers, often very useful for loops
x = range(5)
print('The x variable is of type:',type(x))
# Let's use x in a loop
for num in x:
    print(num)
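
`range()` also accepts a start, a stop, and a step, and `enumerate()` is a common companion when you need both the position and the item in a loop. A short sketch (the list `colors` is made up for illustration):

```python
# range(start, stop, step): start at 2, stop before 10, count by 3.
every_third = list(range(2, 10, 3))
print(every_third)

# enumerate() yields (position, item) pairs while looping.
colors = ['red', 'green', 'blue']
for position, color in enumerate(colors):
    print(position, color)
```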

In [None]:
print("\nLet's try a nested loop using 2 ranges.\n")
# This is a nested for loop:
for i in range(5):
    # For every i = 0,1,2,...4, do this    
    for j in range(2):
        # Print current i & 0, then i & 1
        print('i=',i,'j=',j)
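
The same pairs of `i` and `j` can also be produced with `itertools.product` from the standard library, which some people find easier to read once loops are nested more than two deep. This is an alternative sketch, not a replacement for understanding the loop above.

```python
from itertools import product

# product() yields every (i, j) combination in the same order as the two loops.
pairs = list(product(range(5), range(2)))

for i, j in pairs:
    print('i=', i, 'j=', j)
```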

#### Extreme datatypes

In [None]:
# Let's investigate datatypes
# Create a Python list, but make each item a different type
lst = [1,2.0,'Three',{'Four':4},('Five','Six',7)] # int, float, string, dict, tuple
# Display the list:
lst

In [None]:
# Let's append strange_lst (actually a pandas DataFrame) to the real Python list
lst.append(strange_lst)
# Now look at the value and type of each item
for item in lst: # This is a real Python list, but it has a variety of datatypes in its elements
    print('Print the item:\n',item,', Type of item:',type(item))
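
Once you know the types in a mixed list, `isinstance()` lets you act on only some of them. A small sketch with a hypothetical list named `mixed` (the same kinds of items as above, minus the DataFrame):

```python
mixed = [1, 2.0, 'Three', {'Four': 4}, ('Five', 'Six', 7)]

# Collect the type name of each item.
type_names = [type(item).__name__ for item in mixed]
print(type_names)

# Keep only the numeric items by checking against a tuple of types.
numbers = [item for item in mixed if isinstance(item, (int, float))]
print(numbers)
```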

In [None]:
# Question #3: Generate 10 random integers between 1 and 10 and store them in a pandas dataframe
# Your code here

In [None]:
# Question #4: Transpose your dataframe
# Your code here

### One last thing: In Jupyter notebooks, order of cell execution matters.

In [None]:
# Restart the kernel and clear all outputs: Esc 0 0 (this is zero zero)
# This is a fresh beginning for the notebook.
# This variable will be gone until you rerun the cell that defines it.
lst
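
After a restart, asking for a variable whose defining cell has not been rerun raises a `NameError`. The sketch below reproduces that error without restarting by using a name that was never defined (`variable_not_defined_yet` is a placeholder), and catches it so the message can be inspected.

```python
try:
    # This name has never been assigned, just like lst after a kernel restart.
    print(variable_not_defined_yet)
except NameError as error:
    error_message = str(error)
    print('NameError:', error_message)
```

When you see this error in a notebook, the usual fix is to rerun the cell that defines the variable, or to run all cells from the top.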