### Python Jump-start for Data Analysis
This is not your normal "getting started with Python" tutorial. It is intended to help people who are moving into using Python and Pandas for data analysis. They may have learned some Python previously, but that knowledge has gone cold. It can also be used by a non-coder who is running someone else's Jupyter notebook and wants to demystify the gobbledygook. Let's start with the basics about the tutorial itself.</br>

__Using this Tutorial__
* This tutorial is in the form of a Jupyter notebook. These have an `.ipynb` extension and contain two types of cells: 1) "markdown" text like this one and 2) Python code.
* You run an individual cell by clicking in it and pressing Shift+Enter. That is a good way to walk through a tutorial. You can also choose `Run / Run All Cells` to run everything at once.
* You can clear previous outputs and reset variables using the `Kernel / Restart Kernel and Clear All Outputs` menu in JupyterLab.</br>

__Topics in this Tutorial__
* Variables in Python
* Working with Previously-written Open-Source code libraries
* Working with Project-Specific Code Libraries
* Working with Python Lists
* Working with Python Dictionaries
* Iterating Over Lists 
* Python range() objects
* Iterating Over Dictionaries
* The Python zip() to Co-Iterate and Combine Lists
<div style="text-align: right"> J.D. Landgrebe, </br>November 17, 2023 </div>

#### Let's get started with some Python basics

##### Variables in Python
Use `=` to set a variable's value.
* Variable names must start with a letter or an underscore, so `x1` is ok, but not `1x`
* Variable names are case sensitive, so `x1` is not the same as `X1`
* Variable names can contain letters, digits and underscore `_` characters but not other special characters such as spaces or hyphens
* We will get to this in a minute, but Python variables can store single values like the examples below, or they can store more complex things like lists, dictionaries, DataFrame tables, functions and objects
</br></br>...and `print()` is a useful function. Don't forget the enclosing parentheses!

In [43]:
#Examples of setting variables and valid variable names
x = 2
name = 'Francisco'
first_last = 'Francisco Martin'
print(x)
print(name)
print(first_last)

2
Francisco
Francisco Martin
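
The naming rules above can be checked directly. The short sketch below is an illustration rather than part of the original notebook: it shows that `x1` and `X1` are separate variables, and it uses the built-in string method `isidentifier()` to ask Python whether a candidate name is valid.

```python
# Names that differ only by case are two separate variables
x1 = 10
X1 = 20
print(x1, X1)

# isidentifier() reports whether a string follows Python's naming rules
print('x1'.isidentifier())          # True: starts with a letter
print('1x'.isidentifier())          # False: starts with a digit
print('first_last'.isidentifier())  # True: underscores are allowed
print('first-last'.isidentifier())  # False: hyphens are not allowed
```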


##### Working with Previously-written Open-Source code libraries
For many tasks like data analysis, you don't need to write code from scratch. You can import libraries and just use their functions and attributes. There are two kinds of such libraries. The first kind is open-source libraries, many of which are likely pre-installed along with Python on your laptop. Ask a support expert for help if you don't have a library you need. The second kind, project-specific libraries, is covered in the next section.</br></br>A quick library example is the `math` library, which is part of Python's standard library, so it is always available. It comes with `pi` as an attribute and has many useful functions such as the trigonometry example here.

In [23]:
#Import the pre-installed math library
import math

#Print pi from the math library
print('pi = ', math.pi)

#Use the cosine function in the math library
print('cosine(pi) = ', math.cos(math.pi))

pi =  3.141592653589793
cosine(pi) =  -1.0
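
As a further illustration (not from the original notebook), here are a few more `math` functions that tend to come up in data work. All of them are standard `math` functions; the variable names are just for this example.

```python
import math

# Square root of 16
root = math.sqrt(16)
print('sqrt(16) = ', root)

# Convert 180 degrees to radians (a half turn, which is pi radians)
half_turn = math.radians(180)
print('180 degrees in radians = ', half_turn)

# Round 2.3 up and down to the nearest whole numbers
print('ceil and floor of 2.3 = ', math.ceil(2.3), math.floor(2.3))
```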


One more convention worth mentioning about imported open-source libraries: it is common to import certain libraries under a short alias. This avoids needing to constantly type the full name of the library to access its functions. One example is the Pandas library for data analysis (it is lowercase `pandas` in the import statement). EVERYONE assigns this to the letters `pd` as below. Similarly, the NumPy library for numerical work (lowercase `numpy` in the import statement) is imported as `np`. You could use any letters, but don't. Deviating from these conventions will confuse other people in the Python community! These import statements bring in large and powerful code libraries and put them at your disposal!

In [24]:
#Examples of importing open-source libraries (these statements will work if
#you have the libraries pre-installed on your computer)
import pandas as pd
import numpy as np

#print an empty Pandas DataFrame to show use of pd abbreviation for Pandas
print(pd.DataFrame())

Empty DataFrame
Columns: []
Index: []


##### Working with Project-Specific Code Libraries
The topic of custom, project-specific libraries can be a big source of confusion for beginners! Here is a quick explanation of how they work.</br>

Project files such as this notebook may also import custom libraries as a way to isolate and use code that was previously developed and validated, maybe even by you. This is a great practice for keeping things modular. Putting code in a custom library is also a way to conceal long code sections from someone who is not Python-knowledgeable and who just needs to run something from a notebook where short and sweet is ideal.</br>

Such custom libraries typically reside in files with a `.py` extension. For those import statements to work, the `.py` file needs to be visible or accessible to the importing notebook or file. That can get a little messy, but in practice it usually means that, without expert intervention, code will work when the project's collection of files is kept together but not if an individual notebook like this one is separated from the other files. For this reason, it is generally a good idea to copy an entire project folder rather than trying to pick out a single, useful notebook.</br></br>
As an example, this project folder contains a very small code library called `our_demo.py`. It contains a function that prints the numbers up to a specified integer. This will work as long as `our_demo.py` is sitting in the same folder as this notebook. The cell below shows an example of running `our_demo.print_numbers()`.</br></br>
Here is what is in the `our_demo.py` file in case you do not want to take the time to open it and look for yourself. Having the code there means we don't need it cluttering up this Jupyter notebook.
```python
#Demonstration library used by python_jumpstart.ipynb
def print_numbers(n):
    #Print the integers from 1 up to and including n
    for i in range(1, n + 1):
        print(i)
```

In [25]:
import our_demo

our_demo.print_numbers(5)

1
2
3
4
5
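
To make the "visible or accessible" idea more concrete, here is a minimal sketch of what happens behind the scenes. Python searches the folders listed in `sys.path` when it imports a library, and that list normally includes the folder the notebook is running from. The file name `hypothetical_demo.py` and its `double()` function are made up for this illustration and are not part of this project.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder holding a tiny, hypothetical library file
with tempfile.TemporaryDirectory() as folder:
    library_file = Path(folder) / 'hypothetical_demo.py'
    library_file.write_text('def double(n):\n    return 2 * n\n')

    # The import only works once the folder is on Python's search path
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        import hypothetical_demo
        result = hypothetical_demo.double(21)
    finally:
        # Undo the change so later imports behave as before
        sys.path.remove(folder)

print(result)  # 42
```

Keeping a notebook and its `.py` files in the same folder avoids the need for this kind of path handling.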


### Lists and Dictionaries
Python has two data structures that are especially useful for gathering multiple items together in an organized way: lists and dictionaries. These come in handy for tasks like working with a list of related values, data tables (aka DataFrames) or strings such as column names.</br>

##### Working with Python Lists
The first multi-item data structure is the list. Lists can hold zero or more items, and they use square bracket notation. The items do not need to be the same data type, but they often are. Lists have an order, so you can retrieve the first item or the third through tenth items. Here are a couple of useful code snippets that set variables equal to lists.

In [44]:
#An empty list
list1 = []
print(list1)

#A list with some items
list2 = ['a', 'b', 'c']
print(list2)

[]
['a', 'b', 'c']


You can append individual items to a list. This only works if the list already exists, hence the initial `list3 = []` command.

In [27]:
list3 = []
list3.append('z')
list3.append('x')
print(list3)

['z', 'x']


You can remove a list item by its value like this. If the value appears more than once, only the first occurrence is removed.

In [28]:
list3.remove('z')
print(list3)

['x']
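
A couple of details about removing items are worth a quick sketch. The list `letters` below is made up for this illustration. It shows that `remove()` only takes out the first match, that checking with `in` first avoids an error when the value is absent, and that the standard `pop()` method removes and returns the last item.

```python
letters = ['z', 'x', 'z']

# remove() deletes only the first matching value
letters.remove('z')
print(letters)  # ['x', 'z']

# Removing a value that is not in the list raises a ValueError,
# so check for it first with `in`
if 'q' in letters:
    letters.remove('q')

# pop() removes the last item and hands it back
last = letters.pop()
print(last, letters)  # z ['x']
```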


You can access list items by their index and by using what is called "slice" notation. Both use square brackets to enclose the indices. Python indexes from zero, so the first item is at index 0. Negative indices count from the end, so the last item is -1, the next-to-last is -2, etc. Here are a few examples on an example list.

In [45]:
list_subjects = ['chem', 'physics', 'math', 'history']
print(list_subjects)

['chem', 'physics', 'math', 'history']


In [46]:
#First and last items
print(list_subjects[0])
print(list_subjects[-1])

chem
history


In [48]:
#Items from first to second (index 0 and 1)
#two different notations give the same result
# ":2" means up to but not including index 2
print(list_subjects[:2])
print(list_subjects[0:2])

['chem', 'physics']
['chem', 'physics']


In [32]:
#Items from the second to the last (recall that index 1 refers to the 2nd item)
print(list_subjects[1:])

#Second and third items (index 1 and 2)
print(list_subjects[1:3])

['physics', 'math', 'history']
['physics', 'math']
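
Slices also accept negative indices and an optional third "step" value. The following sketch reuses the same list of subjects; the variable names are only for illustration.

```python
list_subjects = ['chem', 'physics', 'math', 'history']

# Last two items: start two from the end and run to the end
last_two = list_subjects[-2:]
print(last_two)  # ['math', 'history']

# Every other item: the third slice value is the step
every_other = list_subjects[::2]
print(every_other)  # ['chem', 'math']

# A step of -1 walks the list backwards
backwards = list_subjects[::-1]
print(backwards)  # ['history', 'math', 'physics', 'chem']

# len() gives the number of items in a list
print(len(list_subjects))  # 4
```

Slicing always produces a new list, so `list_subjects` itself is unchanged by these operations.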


##### Working with Python Dictionaries
Lists are great, but what if you need to look items up by a key instead of a position? That requires a dictionary. Dictionaries hold pairs of keys and values. Like list items, the values can be any data type, including even lists. The keys are most often strings or numbers. Here are examples of creating a dictionary outright and of adding individual items.

In [33]:
#The dictionary uses curly brackets; use square brackets to reference an item by its key
#Each value is the factor to multiply by, e.g. inches * 2.54 = centimeters
convert = {'in_to_cm':2.54, 'mi_to_ft':5280, 'lb_to_kg':0.4536}
print(convert['mi_to_ft'])

5280


In [34]:
#Can add items but only to an existing dictionary
car_types = {}
car_types['Porsche'] = 'Cayenne'
print(car_types)

{'Porsche': 'Cayenne'}


In [35]:
#Can change an item by setting it to a new value
car_types['Porsche'] = '911 Turbo'
print(car_types)

#Can delete an item using the `del` keyword
del convert['in_to_cm']
print(convert)

{'Porsche': '911 Turbo'}
{'mi_to_ft': 5280, 'lb_to_kg': 0.4536}


In [50]:
#Getting fancier - A dictionary where each item's value is a list
dict1 = {'letters':['a', 'b', 'c'], 'numbers':[1, 2, 3]}
print(dict1)

{'letters': ['a', 'b', 'c'], 'numbers': [1, 2, 3]}
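
Looking up a key that does not exist raises a `KeyError`, so it helps to know two safe ways to check. This is a minimal sketch using a small dictionary like the one above; the `'Ferrari'` key and the `'unknown'` default are made up for illustration.

```python
car_types = {'Porsche': '911 Turbo'}

# `in` tests whether a key is present
has_porsche = 'Porsche' in car_types
has_ferrari = 'Ferrari' in car_types
print(has_porsche, has_ferrari)  # True False

# get() returns a fallback value instead of raising an error
model = car_types.get('Ferrari', 'unknown')
print(model)  # unknown

# len() gives the number of key/value pairs
print(len(car_types))  # 1
```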


##### Iterating Over Lists
This is all great, but often the purpose of using a list or dictionary is to be able to iterate over its items. Lists are directly iterable, so a `for` loop can step through the items one at a time.

In [36]:
#Re-create previous list of subjects
list_subjects = ['chem', 'physics', 'math', 'history']

#To iterate over all items, use a loop value like `subj` here
print('Iterate over all list items')
for subj in list_subjects:
    print(subj)

Iterate over all list items
chem
physics
math
history


If you need to work with the index of the list items, Python's `enumerate()` function gives you the index inside the loop.

In [37]:
for i, subj in enumerate(list_subjects):
    print(i, subj)

0 chem
1 physics
2 math
3 history
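
If you would rather count from 1, for example to number rows in a report, `enumerate()` accepts an optional `start` argument. A brief sketch with the same list:

```python
list_subjects = ['chem', 'physics', 'math', 'history']

# start=1 makes the counter begin at 1 instead of 0
for position, subj in enumerate(list_subjects, start=1):
    print(position, subj)
```

This prints the subjects numbered 1 through 4. The counter is only a label; the list itself is still indexed from zero.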


Of course, slice notation is a way to subset the list and loop over just part of it.

In [38]:
for subj in list_subjects[2:]:
    print(subj)

math
history


##### Python range() objects
Separate from iterating over lists, `range()` objects are a way to repeat an action a specific number of times.

In [39]:
#Range items can have just a single argument that defines the "stop before" limit
for i in range(3):
    print(i)
    
print('')

#Two arguments give a start and end (stop before) value
for i in range(4,8):
    print(i)

0
1
2

4
5
6
7
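
A third argument to `range()` sets the step size, which can also be negative to count down. The sketch below wraps each range in `list()` only so the values can be printed at once; the variable names are illustrative.

```python
# Count by twos: start at 0, stop before 10, step 2
evens = list(range(0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]

# Count down: start at 5, stop before 0, step -1
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]

# range(len(...)) produces every valid index of a list
list_subjects = ['chem', 'physics', 'math', 'history']
for i in range(len(list_subjects)):
    print(i, list_subjects[i])
```

For the last pattern, `enumerate()` is usually the more readable choice, but `range(len(...))` shows up often in existing code.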


##### Iterating Over Dictionaries
Sometimes you need to loop over a dictionary's items. Its `.items()` method returns each key together with its value.

In [40]:
#Each value is the factor to multiply by, e.g. inches * 2.54 = centimeters
convert = {'in_to_cm':2.54, 'mi_to_ft':5280, 'lb_to_kg':0.4536}
for key, val in convert.items():
    print(key, val)

in_to_cm 2.54
mi_to_ft 5280
lb_to_kg 0.4536
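
When only the keys or only the values are needed, a dictionary can also be looped over directly or through its `.values()` method. The `scores` dictionary below is made-up example data.

```python
scores = {'chem': 88, 'physics': 92, 'math': 79}

# Looping over a dictionary directly gives its keys
for subject in scores:
    print(subject)

# .values() gives just the values
for score in scores.values():
    print(score)

# The values can be passed straight to functions such as sum()
total = sum(scores.values())
print(total)  # 259
```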


##### The Python zip() to Co-Iterate and Combine Lists
Python has a `zip()` function that pairs up the items of matched lists. It can also be used to build a dictionary's keys and values from two matched lists. This can be very useful. Here are two example usages.

In [41]:
#Iterate over two (or more) matched lists with zip()
list1 = ['apple', 'banana', 'cherry']
list2 = [0.49, 0.79, 0.59]

for fruit, cost in zip(list1, list2):
    print(fruit, cost)

apple 0.49
banana 0.79
cherry 0.59


In [42]:
#Create a dictionary from two matched lists
fruit_costs = dict(zip(list1, list2))
fruit_costs

{'apple': 0.49, 'banana': 0.79, 'cherry': 0.59}
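
Two behaviors of `zip()` are worth knowing before relying on it. It stops at the end of the shortest list without raising an error, and it can combine more than two lists. The lists below are made-up example data.

```python
fruits = ['apple', 'banana', 'cherry']
costs = [0.49, 0.79]  # deliberately one item short

# zip() stops when the shortest list runs out, so 'cherry' is dropped
pairs = list(zip(fruits, costs))
print(pairs)  # [('apple', 0.49), ('banana', 0.79)]

# Three matched lists can be iterated together
quantities = [3, 6, 12]
full_costs = [0.49, 0.79, 0.59]
for fruit, cost, qty in zip(fruits, full_costs, quantities):
    print(fruit, cost, qty)
```

Because mismatched lengths are dropped silently, it is a good habit to compare `len()` of the lists first when they are supposed to match.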