# <center>NPS Python for Data Analysis Primer</center>
## <center> <img src='Images/NPS_Logo.jpg' height=250/></center>
<center style="font-size:24px">LTC Matt Smith</center>
<center style="font-size:24px">NPS Operations Research Dept.</center>
<center>matthew.smith@nps.edu</center>

## Lesson 1: Python and Jupyter Basics

# In This Notebook:

- [Why Python](#Why-Python)
- [Jupyter Notebook Environment](#Jupyter-Notebook-Environment)
- [Basic Python Syntax](#Basic-Python-Syntax)
    - [Indentation](#Indentation)
    - [Comments](#Comments)
    - [Assigning Variables](#Assigning-Variables)
    - [Installing Packages](#Installing-Packages)
- [Data Types](#Data-Types)
    - Individual Items: [Numbers](#Numbers), [Strings](#Strings), [Booleans](#Booleans), [None Type](#None-Type), [Type-Casting](#Type-Casting)
    - Collection of Items: [Lists](#Lists), [Tuples](#Tuples), [Sets](#Sets), [Dicts](#Dicts), [Comprehensions](#Comprehensions)
- [String Matching](#String-Matching)
    - [Regex](#Regex)
- [Code Structures](#Code-Structures)
    - [If-else-elif](#If-else-elif)
    - [For Loops](#For-Loops)
    - [While Loops](#While-Loops)
    - [Try-Except](#Try-Except)
    - [Functions](#Functions)
    - [Classes](#Classes)
- [Working with Files](#Working-with-Files)

# Why Python

Python is a general purpose and high level language, making it easy and intuitive to use.  While it did not start as a data science tool when first launched in 1991, its application for data science took off in the 2010s after the development of key libraries including numpy and scipy for numerical processing and advanced mathematics (2005), pandas for spreadsheets and data management (2010), scikit-learn for machine learning (2010), as well as other subsequent tools for natural language processing (nltk), data visualization (matplotlib, seaborn, bokeh, plotly, streamlit), deep learning (tensorflow, PyTorch), and many others. 

In addition to the new and growing suite of data analysis tools, python's roots as a general purpose language mean it is powerful and useful in a wide range of settings, from programming back end servers (flask, django, etc.) to web scraping and crawling (urllib, scrapy, beautifulsoup), database management, and many other tasks.  This means python can serve as a single language used in every step of your workflow, making it production-ready.

Additionally, while interpreted languages such as Python tend to have inferior performance to lower-level programming languages, many of these extensions for computation-intensive tasks are developed and built upon lower layer Fortran and C implementations for fast and vectorized operations.  

For all these reasons, python is an increasingly popular language for data science and enjoys a robust and active development community.  If you are curious about a specific comparison between python and R for data science, check out this [Python vs R](https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis) article and infographic from datacamp.

# Jupyter Notebook Environment

Before jumping into python, let's give a quick orientation to the Jupyter environment.  The web-based Jupyter notebook environment was one of the main tools developed by the [IPython](http://ipython.org/) (Interactive Python) project, and you will often see the terms IPython and Jupyter used interchangeably.  

These interactive Jupyter notebooks have become one of the most useful tools in modern data science because they allow the analyst to use the full power of the browser to directly explore the results of their coding and analysis.  Jupyter also allows you to create and share rich documents that combine markdown content with code (such as this notebook!).  And Jupyter is not limited to just Python.  The name Jupyter originated from the fact that the notebooks support Julia, Python and R, and they now support kernels (implementations of the language-agnostic Jupyter notebook environment) in over 40 languages. 

To use Jupyter on your local machine, we recommend using the Anaconda distribution, https://www.anaconda.com/download/.  The Anaconda Distribution comes with a bunch of useful things, including:
- Python 3 (the language)
- Anaconda Navigator GUI for spinning up various IDEs (Integrated Development Environments), such as Jupyter Notebook and JupyterLab
- The conda tool for managing python packages and environments


### Jupyter Notebook vs Jupyter Lab vs VSCode
You may notice that there are two Jupyter tools available: JupyterLab and Jupyter Notebook.  In general, Jupyter Notebook is a simpler environment for running python notebooks.  JupyterLab is intended to be the next generation interface that integrates the web-based notebooks with a more modern Integrated Development Environment (IDE). This means that JupyterLab comes with some additional tools and functionality, including:
- A built-in table of contents that is built off of the markdown headings (very useful!)
- Ability to render multiple panes side by side
- Better ability to view and edit .py files and other file types
- Ability to execute any selected block of code (or a single line of code) rather than an entire cell

See https://stackoverflow.com/questions/50982686/what-is-the-difference-between-jupyter-notebook-and-jupyterlab/ for some additional discussion. 

Another IDE for running python code and notebooks is [**VSCode**](https://code.visualstudio.com/).  This is quickly gaining popularity as a robust tool for data analysis workflows due to its simple interface and powerful built-in tools like git version control, integrated terminal, package managers, ability to establish remote connections, community extensions, and more.

If you are new to python and Jupyter, it can be helpful to start with Jupyter Notebook or JupyterLab due to their simpler look and feel.  However, it's worth checking out VSCode as well, as the deeper you get into your data analysis and software development journey, the more valuable you will find the tools and functionality that VSCode provides. 

### Jupyter Interface

The main Jupyter interface gives a view of all the folders, notebooks (with the .ipynb extension for IPython Notebook), and other files available in your workspace, and will look something like this:

![image.png](attachment:image.png)

You can click on any notebook to run it.  To create a new Python notebook, use the New -> Python button in the upper right of JupyterLab environment, or File -> New File in VSCode.  You can use New -> Terminal to launch a terminal/command line interface, which can be handy for managing your computing environment.  

### Running Commands

OK, now let's finally run some code!

To run a cell, you can either hit the "Run" button (looks like a Play button), or type `Shift-Enter` to run a cell and select the cell below.  It's also sometimes useful to use `Alt-Enter`, which runs the current cell then <i>creates</i> a new cell below, or `Ctrl-Enter` to run and remain in the current cell.

In [None]:
#Type Shift-Enter to run the cell and view the output
print('Hello')

There are also several useful options, such as running all cells above, all cells below, or all cells in the entire notebook.  

### Inserting New Cells

One of the advantages of Jupyter is that you can tinker and execute your code in chunks, running one cell at a time.  To add new cells at any point to play around with some code (a great idea!), you can use the Insert -> New Cell Above/Below menu options, or you can use the `+` button (next to save button) to insert new cell below.  You can also use the other handy shortcut buttons to cut and paste cells or move them up and down throughout the notebook.  

### Interrupting Running Code

If you run code and get stuck in a long or infinite loop (we've all been there), you can kill the running code by hitting the STOP button (next to the run button). If the whole notebook seems to be frozen or just doesn't seem to be acting right, it might be worth restarting the Kernel using the Kernel menu options above.  

### Tab Completion

You can use tab completion to view a variable name with just the first few letters.

Note: In VSCode, most of this information will show up automatically, so Tab completion may not do anything in that environment.  

In [None]:
#Run this cell to load the variables
player_name = 'Tiger Woods'
player_city = 'Jupiter, FL'

In [None]:
#Uncomment following line, then type TAB after the 'player' to see the different possible variables
#player

Tab completion can also show you the available methods (functions) or attributes (characteristics) of a given object.  Place the cursor at `player_name.[HERE]` and then hit TAB to see the available string methods.

In [None]:
#Uncomment following line, then type TAB after period
#player_name

Tab can also be used within a function call to learn more about that function.  Move the cursor to `player_name.split([HERE])` and type "Shift-Tab" to see documentation and options for this function.

In [None]:
#Move the cursor in between the parentheses, then type "Shift-Tab" to see the method's documentation
player_name.split()

In [None]:
#Tab completion is also useful to get the name of an attribute within a function.
#Move cursor to after the word 'max' and hit TAB to reveal the function option called 'maxsplit'
#player_name.split(max)

In [None]:
#Having learned about the split() method, we can try it out!
player_name.split()

### Introspection

Add a ? after any variable or function to learn more about it.  Run any of the cells below to display the help info.

In [None]:
player_name?

In [None]:
player_name.split?

In [None]:
#Use wildcard to search for a method
player_name.*find*?

In [None]:
#Use double ?? to get full source code (if available)
player_name.rfind??

### Magic Commands

Jupyter provides many useful shortcuts we can execute using a % symbol, known as magic commands.  These are not python commands, but rather are Jupyter/IPython commands that facilitate common tasks or provide useful information about the running python code.  

In your local environment, create a `scripts` folder, and within that folder create a new file called `print_todays_date.py` with the following code:

```
import datetime as dt

today = dt.date.today().strftime('%Y-%m-%d')
print(today)
```

Then run the Jupyter cell below to execute that file.

In [None]:
#Run another .py file
#Uncomment and run after adding the .py file
#%run scripts/print_todays_date.py

In [None]:
#Use timeit to see how long a script takes
import numpy as np
a = np.random.randn(100,100)
%timeit np.dot(a,a)
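
Magic commands such as `%timeit` only exist inside Jupyter/IPython.  In a plain `.py` script, the standard library's `timeit` module can fill a similar role.  The sketch below is only illustrative: the function `sum_of_squares` and the repetition count are made-up choices, not part of this lesson's code.

```python
import timeit

def sum_of_squares(n):
    # Add up the squares of 0..n-1 using a generator expression
    return sum(i * i for i in range(n))

# Time 100 calls of the function; the result is the total elapsed seconds
elapsed = timeit.timeit(lambda: sum_of_squares(1000), number=100)
print(f"100 runs took {elapsed:.4f} seconds")
```

Unlike `%timeit`, this does not choose the number of repetitions automatically, so `number` needs to be picked by hand.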

Another useful Jupyter feature is that you can directly run terminal commands by preceding them with the `!` character.  

In [None]:
#Can also trigger terminal commands with !
#This is equivalent to opening a terminal and running "python --version"
!python --version

To view and explore the full list of commands, see https://ipython.readthedocs.io/en/stable/interactive/magics.html.  

### Keyboard Shortcuts

These can be very useful.  See Help -> Keyboard shortcuts for full list.  

As an example, type Esc-a (Press and release ESC, and then press and release "a") to insert new cell above, or Esc-b to insert new cell below.

### Markdown

Markdown offers many powerful ways to add pretty formatting to the notebook.  To inspect the markdown code, go to a cell and hit 'Enter' to get into edit mode (or double click in a Markdown cell).  To convert the markdown code back into the formatted display, simply "run" the cell (e.g. with Shift-Enter, or clicking the Run button).  

To create a new markdown cell, insert a new cell and set its type to Markdown using the options in your IDE.

Another useful trick is to convert a code cell to a markdown cell.  To do this, while the cell is active hit the `Esc` key followed by the `m` key (or hit the `y` key to switch back to code mode).  You can then hit `Enter` to get back into edit mode.

Make equations in Latex:

$$
y = \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ \\ y_n \end{bmatrix}_{n \times 1}
=
\begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \ldots & x_{1,p} \\
1 & x_{2,1} & x_{2,2} & \ldots & x_{2,p} \\
\vdots  & \ddots \\
\vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{bmatrix}_{n \times (p+1)}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \\ \beta_p \end{bmatrix}_{(p+1) \times 1}
=X\beta
$$

Make lists:
- point 1
- point 2
  - sub point
  - sub point
  
Format text using html-like tags to <i>italics</i> or <b>bold</b>, or use shortcuts to *italics* or **bold**.

You can also highlight when you have a code snippet with backticks (same button as the ~, in upper left), `print('Hello')`.

For multi-line codes, you can use triple-backticks:

```
a = 7
print(a**2)
```

For added formatting, specify a specific code language with <code>```python</code> as the first line.  This is known as a *code fence*.  

```python
import math
a = 7
print(a*math.pi)
```


You can also inject regular HTML code:
<center style="font-size:24px">Large Centered Text</center>
<p style="font-size:10px; color:red">Some small red text</p>


For much more, see [this Jupyter markdown tutorial](https://www.datacamp.com/community/tutorials/markdown-in-jupyter-notebook)

# Basic Python Syntax

### Indentation

Python explicitly uses indentation, not braces, to parse the written code.  A colon introduces a block of code, and every line in that block must be indented by the same amount of whitespace.

In [None]:
#So this will work:
if (2>1):
    print('That is True')

In [None]:
#But this won't work (uncomment next 2 lines and run to see):
# if (2>1):
# print('That is True')
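
To see the error without breaking the notebook, one option is to keep the badly indented code in a string and hand it to the built-in `compile()` function.  This is just a sketch for illustration; the variable names `bad_source` and `result` are arbitrary.

```python
# Source code with a missing indent after the colon, stored as a string
bad_source = "if 2 > 1:\nprint('That is True')"

try:
    compile(bad_source, "<example>", "exec")
    result = "compiled fine"
except IndentationError as err:
    # Python refuses to parse the block because the print line is not indented
    result = f"IndentationError: {err.msg}"

print(result)
```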

### Comments

Comments use a #

In [None]:
#This is a comment

There's no built-in syntax for multi-line comments, but there are a couple of tricks you can use:

In [None]:
#First trick is to use a multi-line string in between three single or double quotes

'''
This is a multi-line string,
but python will just see it and ignore it
so we can use it as a comment
'''
#Resume code
a=1

In [None]:
#Another trick is to use jupyter shortcut to add or remove comments from a range of lines
#Highlight the following lines then press CTRL and the "/" key together to comment all lines
#Then press CTRL-/ again to uncomment

a = 1
b = 2
print(a+b)

### Assigning Variables

Variables are assigned using a simple equal sign.

In [None]:
a = 1
a

We can also assign multiple variables at once as follows

In [None]:
a, b, c = 1, 2.2, 'three'
print(a,b,c)

Or we can assign multiple variables to the same value

In [None]:
a = b = c = 7
print(a,b,c)
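
Two consequences of these assignment forms are worth a quick look.  The sketch below uses a list (covered later in this lesson) purely as an example of a changeable object; the variable names are arbitrary.

```python
# Multiple assignment lets us swap two variables without a temporary variable
a, b = 1, 2
a, b = b, a
print(a, b)

# Chained assignment binds every name to the SAME object
x = y = []
x.append('shared')
print(y)         # y sees the change because x and y refer to one list
print(x is y)    # 'is' checks whether two names refer to the same object
```

For numbers and strings this sharing is harmless because those values cannot be modified in place, but for lists it can be a surprise.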

### Installing Packages

While python has quite a few built in data types and coding structures, nearly any python data analysis workflow will require importing packages to handle the specialized tasks.  Thankfully, the syntax and mechanics of reading in packages are easy, as shown below.  

In [None]:
#We can import an entire package to use it:
import math
print(math.pi)

When we run `import package`, we load the package and then access its contents through the package name, e.g. `math.pi`.  Sometimes it makes sense to import only a specific function or submodule from a package if that's all we need, which we can do with the following syntax:

In [None]:
from numpy import sqrt
sqrt(16)

In [None]:
#It's also common to use an alias for modules we import
import numpy as np

#Now we can reference any numpy functions off of "np"
np.sqrt(16)
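
The three import styles differ only in the name we use afterwards.  A small sketch using the standard library's `math` module (so it runs without numpy); the alias `m` is an arbitrary choice for illustration.

```python
import math              # access contents as math.<name>
from math import sqrt    # bring one function into the current namespace
import math as m         # same module under a shorter alias

print(math.sqrt(16), sqrt(16), m.sqrt(16))

# All three names point at the very same function object
same_function = (math.sqrt is sqrt) and (m.sqrt is sqrt)
print(same_function)
```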

If a package is not available in your current computing environment, you may have to *install* it into your environment before you *import* it into your current workspace.  For example, here is a fun package `faker` which may not be automatically installed in your environment.

In [None]:
#This might not work if not installed in the current python environment
import faker

If you got a `ModuleNotFoundError`, don't fret, you can simply install the package.  

Most public python packages are available from the public package repository, known as [PyPI](https://pypi.python.org/pypi), which stands for Python Package Index.  We can run the `pip` command from the terminal command line to install the package into our current environment, or we can use the ! command to run a terminal command directly from the notebook as follows:

In [None]:
#Uncomment and run to install
#!pip install faker

In [None]:
#Now we should be able to import and use the package!
import faker

#Use to create some fake names
print('Some fake names using faker...\n')
from faker import Faker
fake = Faker()
for _ in range(5):
  print(fake.name())
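
If you would rather check for a package before importing it, the standard library's `importlib.util.find_spec` can help.  A minimal sketch, assuming we only care about top-level package names; the helper `is_installed` is a hypothetical name made up for this example.

```python
import importlib.util

def is_installed(package_name):
    # find_spec returns None when the top-level package cannot be found
    return importlib.util.find_spec(package_name) is not None

for pkg in ['json', 'faker']:
    if is_installed(pkg):
        print(f'{pkg}: available')
    else:
        print(f'{pkg}: missing (try: pip install {pkg})')
```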

# Data Types
Python has built in data types to handle individual items, such as numbers, strings, or booleans, as well as composite data structures to handle a collection of items, including lists, sets, dicts, and tuples. 

### Numbers

Numbers can be int or float types.

In [None]:
a = 1
b = 1.2
type(a),type(b)

Even though ints and floats are different types, python will usually transition naturally from one to the other when it makes sense.  Of course, sometimes the distinction matters, so it is always good to understand what type you are working with.

In [None]:
a = 10 #int
b = 3 #int
c = a / b #int or float?  
print(c,type(c))

In addition to basic arithmetic like `+`, `-`, `*`, `/`, here's the syntax for other common operations:

In [None]:
#exponents
print('2**3 = ',2**3)

#modulo (remainder)
print('10%3 = ',10%3)

#Floor division (integer quotient)
print('10//3 = ',10//3)
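
A few edge cases of these operators are easy to trip over.  The sketch below shows the built-in `divmod()`, how floor division treats negative numbers, and why floats are usually compared with a tolerance.

```python
import math

# divmod returns the floor-division quotient and the remainder together
quotient, remainder = divmod(10, 3)
print(quotient, remainder)

# Floor division rounds DOWN (toward negative infinity), not toward zero
print(-7 // 2)
print(-7 % 2)   # the remainder is chosen so (-7 // 2) * 2 + (-7 % 2) == -7

# Floats carry small rounding errors, so exact equality can fail
print(0.1 + 0.2 == 0.3)
print(math.isclose(0.1 + 0.2, 0.3))
```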

Any arithmetic operators can also be used to update a variable when joined with the equal sign.

In [None]:
a = 10
a += 10 #same as a = a+10, and results in a = 20
a

**Student Challenge**

We've covered a lot so far.  Here's a quick challenge to test your skills!

Below are some other ways we can update a python variable `a`.  Create a new *Code* cell below, create a new variable with `a = 10.0` which will initialize `a` as a *float* variable, then update the variable `a` using any of the equations shown here.  Verify the updated value of `a`, and also see what data type it is (i.e. `int` or `float`).

In addition to creating and running the code cell, also add a Markdown cell after that so you can write "Challenge successful!". 

Ways to update the value of variable `a`:  
- `a -= 5`
- `a *= 5`
- `a /= 5`
- `a **= 2`  Same as a = a**2 (a squared)
- `a %= 5`  a = a%5, or a modulo 5
- `a //= 5` a = a//5, or floor division

### Strings

Python has a tremendous amount of built-in functionality to work with and analyze string data, which is cited as one reason for its popularity.   

Strings can be defined with single or double quotes.  This flexibility is nice, and also means we can create a string that contains one of the quote types if we want.

In [None]:
s1 = 'this works'
s2 = "this works too"
#Make a string that contains double quotes by enclosing in single quotes
s3 = 'He said "Hello."'
print(s3)

We can also define an *empty string*, which can be a useful way to indicate that we have a string object which doesn't have any data yet.

In [None]:
s_empty = ""
s_empty

We can also create a multi-line string using 3 single or 3 double quotes, which can be a useful functionality for natural language processing, formatting output, or several other applications. 

In [None]:
s = """
I want an officer for a secret and dangerous mission.
I want an Army ORSA.
"""
print(s)

We can automatically inject variable values into a larger string using f-string formatting, which is very useful for creating properly formatted strings for display or coding purposes.  F-strings were introduced in python 3.6 and have largely superseded the earlier %-formatting and str.format() approaches.

In [None]:
#Different string formatting options
name = "Tiger Woods"

#Original % formatting
print('Value of name is %s' %(name))

#Next was str.format
print("Value of name is {}".format(name))

#New and better way is with f-strings
print(f'Value of name is {name}')

Since python 3.6, f-string formatting is the preferred method due to its more concise structure and improved performance.  See https://realpython.com/python-f-strings/ for a good overview.

All of these formatting approaches offer many options for how to display and format numerical data.  A few are shown below.  You can also view the python documentation at https://docs.python.org/3/library/string.html#format-string-syntax.  However, that documentation is dense, so it can be useful to view the older guide at https://pyformat.info/ for a more approachable description of the formatting codes, which can all be used with f-strings.

In [None]:
a = 0.12345
b = 12345

#To specify formatting, add a colon followed by formatting code
print(f"{a:.3f}") #Format as float with 3 decimal points
print(f"{a:.2%}") #Format as percentage with 2 decimal points

print(f"{b}") #Format as integer (same as code "d")
print(f"{b:,}") #Format using commas
print(f"{b:.2e}") #Format as scientific notation

In [None]:
#Another very useful f-string functionality (python 3.8+) is the {var=} shorthand to display a variable name and its value
#We can add in formatting with a colon after the = sign
print(f"{a = }, {b = :,}")
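
The same colon syntax also controls field width, alignment, and padding, which is handy for lining up columns of output.  A short sketch; the variables `label`, `count`, and `ratio` are made up for illustration.

```python
label = 'total'
count = 42
ratio = 3.14159

print(f"{label:<8}|")    # left-align in a field 8 characters wide
print(f"{label:>8}|")    # right-align in the same width
print(f"{label:^9}|")    # center in a field 9 characters wide
print(f"{count:05d}")    # pad an integer with leading zeros to width 5
print(f"{ratio:8.2f}")   # width 8 with 2 decimal places
```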

In [None]:
#We can manipulate the case of strings, which can be useful for matching and cleaning string data
name = 'Tiger woods'
print("name = ",name,"\n")

print("name.upper() = ",name.upper())
print("name.lower() = ",name.lower())
print("name.title() = ",name.title())

We can grab a substring from a string object using the format `name[start:end]`, which makes this a good time to bring up an important point in how python handles indexing in general:
- The starting index of the first element of any python sequence is always 0, not 1.  You can think of the index as specifying the *offset* from the beginning.  
- The ending index specifies the first item that we *don't* include, meaning we go up to, *but not including*, the end index

Here's a useful function to help visualize the index values.

In [None]:
#Create function that displays table showing indices of any python sequence
from IPython.display import HTML, display
def display_indices(seq):
    #display indices of any python sequence seq
    html_code = "<table><tr>"
    #First row: the index of each element
    for i in range(len(seq)):
        html_code += "<td>{}</td>".format(i)
    html_code += "</tr><tr>"
    #Second row: the elements themselves
    for x in seq:
        html_code += "<td>{}</td>".format(x)
    html_code += "</tr></table>"
    display(HTML(html_code))

In [None]:
#Check out index values for each character in name
display_indices(name)

In [None]:
#How would we get just the first name?  We start at index 0, then stop before reaching element 5
name[0:5]

In [None]:
#If you leave off the start, python assumes index 0 by default
print(f'{name[:5] = }')

#If you leave off end index, python just keeps going until the end
print(f'{name[5:] = }')

#You can also go backwards from the end using negative index
print(f'{name[-1] = }')
print(f'{name[-3:] = }')
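
Slices also accept an optional third value, the *step*, written as `name[start:end:step]`.  A brief sketch using the same `name` value as above:

```python
name = 'Tiger woods'

every_other = name[::2]      # start to end, taking every 2nd character
reversed_name = name[::-1]   # a step of -1 walks the string backwards

print(every_other)
print(reversed_name)
```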

**Student Challenge**

Try to print only the last name from the variable `name`.

In [None]:
#Try to grab just the last name:


In [None]:
#We can search for substrings with the find function, which returns the index where substring starts
print("name.find('oo') = {}".format(name.find('oo')))

#Note that find returns only the FIRST occurrence of the substring
print("name.find('o') = {}".format(name.find('o')))

#If substring does not appear, find returns -1
print("name.find('xyz') = {}".format(name.find('xyz')))
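
Some close relatives of `find()` are worth knowing.  The sketch below shows the `in` operator for a simple yes/no check, `index()` which raises an error instead of returning -1, and `rfind()` which searches from the right.

```python
name = 'Tiger woods'

# 'in' answers whether the substring appears at all
print('oo' in name)

# index() behaves like find() but raises ValueError when nothing is found
try:
    position = name.index('xyz')
except ValueError:
    position = None
print(position)

# rfind() returns the index of the LAST occurrence
print(name.rfind('o'))
```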

Some other useful string functions:

In [None]:
s = " apples, oranges, bananas.  "
print("s = '{}'\n".format(s))

#remove any 'whitespace' from beginning or end of string
print("s.strip() = '{}'".format(s.strip()))

#If we want to remove any spaces OR period:
print("s.strip(' .') = '{}'".format(s.strip(' .')))

#Or we can only strip leading characters with s.lstrip(), or rear characters with s.rstrip()
print("s.lstrip() = '{}'".format(s.lstrip()))

#Replace substring
print("s.replace('apples','kiwis') = '{}'".format(s.replace('apples','kiwis')))

#We can REMOVE a substring by replacing it with an empty string 
print("s.replace('apples','') = '{}'".format(s.replace('apples','')))

#Count occurrences
print("s.count(',') = '{}'".format(s.count(',')))

One subtle point you may have noticed is that running a command like `s.strip()` doesn't update the variable `s` itself, it only returns a *new* string value which we can choose to use or ignore.  To actually update the string, you need to re-assign the output of the function back to the variable.

In [None]:
s = " apples, oranges, bananas.  "
s.strip(' .')
#s hasn't changed
print(s)

s = s.strip(' .')
#Now s is actually updated
print(s)
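
The underlying reason is that python strings are *immutable*: no operation can change a string in place.  A small sketch showing what happens if we try, and the usual workaround of building a new string:

```python
s = 'apples'

try:
    s[0] = 'A'    # strings do not support item assignment
except TypeError as err:
    print(f'TypeError: {err}')

# Build a new string instead and re-assign it to the same name
s = 'A' + s[1:]
print(s)
```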

Another useful string function is split(), which splits a string into a list of substrings

In [None]:
s = " apples, oranges, bananas.  "
print('s = "{}"'.format(s))

#By default s.split() will split on any whitespace (spaces, tabs, newlines)
print("s.split() = {}".format(s.split()))

#Note that the first element is 'apples,', a string that CONTAINS a comma
#To choose the characters we split on, pass them as the first argument (named 'sep')
print("s.split(sep=', ') = {}".format(s.split(', ')))
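
In practice `split()` is often combined with `strip()` to get clean pieces, and the `maxsplit` argument limits how many splits are made.  A sketch under those assumptions; the second example string is made up for illustration.

```python
s = " apples, oranges, bananas.  "

# Strip the outer spaces and period first, then split on comma + space
fruits = s.strip(' .').split(', ')
print(fruits)

# maxsplit=1 splits only once, keeping the remainder together
first, rest = "Tiger Woods Jr".split(maxsplit=1)
print(first, '|', rest)
```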

The join() method will join, or combine elements of a sequence into a single string.

In [None]:
" - ".join(['apples','oranges','bananas'])

We can also check if our string startswith or endswith anything in particular

In [None]:
s = " apples, oranges, bananas.  "

s.startswith('apples') #False
s.startswith(' apples') #True
s.endswith('bananas') #False
s.endswith('.  ') #True

### Booleans

Booleans in python are designated with the reserved keywords True and False.

In [None]:
a = True
b = False
print(a,b)

In [None]:
#Boolean expressions
a, b = True, False
print('a = {}, b = {}'.format(a,b))

#a AND b
print('a&b = {}'.format(a&b))

#a OR b
print('a|b = {}'.format(a|b))

#a XOR b
print('a^b = {}'.format(a^b))

#negation
print('not a = {}'.format(not a))
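
The symbols `&` and `|` used above also work on booleans, but everyday python code more often uses the keywords `and`, `or`, and `not`.  The keywords *short-circuit*, meaning the right-hand side is only evaluated when needed.  A small sketch; the variable `value` is made up for illustration.

```python
a, b = True, False
print(a and b, a or b, not a)

# Short-circuiting: the comparison is skipped because the first test fails,
# so we never try to evaluate None > 0 (which would raise a TypeError)
value = None
is_positive = value is not None and value > 0
print(is_positive)
```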

### None Type

Python has a `None` type, used to indicate that a variable exists but isn't assigned a value yet.  This is similar to the `NA` type in R.

In [None]:
a = None
print(a,type(a))

Using None is a useful way to take some action only if a variable has been given a value.  For example, a function argument will often have None as its default value, meaning it will get ignored unless the user specifies a value.

In [None]:
a = None
b = 3

if a:
    print('a exists!')
else:
    print("a doesn't exist!")
    
if b:
    print('b exists!')
else:
    print("b doesn't exist!")
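
One caveat: `if b:` tests whether `b` is *truthy*, so values such as `0` or an empty string take the `else` branch just like `None` does.  When the difference matters, compare against `None` explicitly with `is`.  A sketch; the helper `describe` is a hypothetical name used only here.

```python
def describe(value):
    # 'is None' is True only for None itself, not for 0 or ''
    if value is None:
        return 'no value assigned'
    return f'value is {value!r}'

for candidate in [None, 0, '', 3]:
    print(f'{candidate!r:>5}: truthy={bool(candidate)!s:<5} -> {describe(candidate)}')
```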

### Type-Casting

Python allows variables of one type to be cast in another.

In [None]:
my_int = 10
my_float = 20.0
my_str = '30'
my_bool = True

#Try these to see what happens
print(f"{int(my_float) = }") #20
print(f"{float(my_int) = }") #10.0
print(f"{str(my_float) = }") #'20.0'
print(f"{int(my_str) = }") #30
print(f"{int(my_bool) = }") #1
print(f"{int(not my_bool) = }") #0

Note that the boolean casting essentially gives us a way to indicate if a variable exists and contains data.  This can be useful when we start building control structures, such as if-else statements that should take a certain action if the variable contains data.

In [None]:
print('bool("string data") = {}'.format(bool("string data")))
print('bool("") = {}'.format(bool("")))
print('bool(5) = {}'.format(bool(5)))
print('bool(None) = {}'.format(bool(None)))
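
Not every cast succeeds, and a few results can be surprising.  The sketch below collects some cases worth trying:

```python
print(int(-2.7))      # int() truncates toward zero
print(round(-2.7))    # round() goes to the nearest integer

try:
    int('3.5')        # a string holding a decimal cannot go straight to int
except ValueError as err:
    print(f'ValueError: {err}')

converted = int(float('3.5'))   # convert in two steps instead
print(converted)

# Any non-empty string is truthy, even the text 'False'
print(bool('False'))
```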

### Lists

Lists are one of the workhorse data types in python.  Lists are a sequence of python objects, defined in square brackets

In [None]:
num_list = [5,4,3,2,1]
type(num_list)

In [None]:
#Lists don't have to contain only one data type
#Each list element can be any python object, even another list
x = [3,2,1,0.5,'Blast Off!',[1,2,3]]
x

In [None]:
#What's the length of this list?  6 or 8?
#Length is only 6, because the last list [1,2,3] counts as a single element of the list x
len(x)

As discussed earlier, indexing in python always starts with 0 for the first element, and the ending index specifies the first element to ignore, i.e. we stop before reaching the end index.

In [None]:
#List indexing starts at 0
name_list = ['Alice','Bob','Charlie','Denise','Eddie']
#Make a copy for later
name_list_copy = name_list.copy()
print('name_list = ',name_list,'\n')

print('name_list[1] = {}'.format(name_list[1]))
print('name_list[0:3] = {}'.format(name_list[0:3]))

#Python assumes start is 0 if not specified
print('name_list[:3] = {}'.format(name_list[:3]))

#Python assumes end index is the end of the sequence if not specified
print('name_list[3:] = {}'.format(name_list[3:]))

#We can use negative indexes to go backwards from the end
print('name_list[-1] = {}'.format(name_list[-1]))
print('name_list[-3:] = {}'.format(name_list[-3:]))

In [None]:
#We can find the index of a given item with the index() method
name_list.index('Denise')

There are many built in functions to work with and manipulate lists.  A few key ones are given below.  

In [None]:
#Add new elements to end with append
name_list.append('Fred')
name_list

In [None]:
#What if we want to add a few more names?
new_names = ['Grace','Henry','Ignacio']
name_list.append(new_names)
#What is name_list now?? Take a look
name_list

Hmm, that's probably not what we wanted.  Let's delete the last element, then try again to add it.

There are a few different list methods for removing elements:
- `name_list.remove(x)` removes item x from the list, e.g. `name_list.remove(new_names)`
- `del name_list[i]` deletes the item at index i, or `del name_list[i:j]` deletes items in index range i to j.  To delete just the last element, you can use `del name_list[-1]`.
- `name_list.pop(i)` pops out the element at index i.  That is, it removes that item from the list AND returns it to the user, which can be a useful functionality.

In [None]:
#Remove the list of new_names from name_list
name_list.remove(new_names)

#Note: we also could have removed the last element with name_list = name_list[:-1]
name_list

In [None]:
#Here's how del would work
del name_list[0:2] #Remove the first 2 elements
name_list

In [None]:
#Using pop
deleted_name = name_list.pop(-1)
print(deleted_name, name_list)
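
A few details of these methods are easy to miss.  The sketch below uses a throwaway list called `letters` so that `name_list` is left untouched.

```python
letters = ['a', 'b', 'a', 'c']

# remove() deletes only the FIRST matching item
letters.remove('a')
print(letters)

# remove() raises ValueError if the item is not in the list
try:
    letters.remove('z')
except ValueError:
    print("'z' is not in the list")

# With no index, pop() removes and returns the last item
last = letters.pop()
print(last, letters)
```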

Reset name_list to its original version, and try again to add the list new_names

In [None]:
name_list = name_list_copy.copy()
name_list

In [None]:
#Combine two lists with the + operator
name_list = name_list + new_names 
#We also could have done name_list += new_names
#We also could have done name_list.extend(new_names)
name_list

In [None]:
#Reassign a given index
name_list[2] = 'Reggie'
name_list

In [None]:
#Check if an item is in a list
print('Henry' in name_list,'Susan' in name_list)

In [None]:
#We can count the occurrences of a given item with count() method
y = ['a','b','c','a','a']
y.count('a')

A note of caution: be careful when making a copy of a list.  In python, if you try to make a copy by saying `name_list2 = name_list`, then we don't get a new variable.  Instead, there is still only one python object in memory, and that one object has two references (names) pointing to it.  Hence, changing one variable will change the other.  

In [None]:
name_list2 = name_list
name_list2[1] = 'Xander'

#What does name_list now look like??
name_list

In [None]:
#To avoid this issue, you can make a copy of a list to get a truly new variable
name_list2 = name_list.copy()
#Now name_list2 can be manipulated without altering name_list
name_list2[0] = 'Zoolander'
print(name_list)
print(name_list2)
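
Note that `list.copy()` makes a *shallow* copy: the new list is independent, but any lists nested inside it are still shared.  For nested data, the standard library's `copy.deepcopy` copies every level.  A sketch with made-up variable names:

```python
import copy

nested = [1, 2, [3, 4]]
shallow = nested.copy()          # new outer list, same inner list
deep = copy.deepcopy(nested)     # new outer list AND new inner list

nested[2].append(5)              # change the inner list through the original

print(shallow)   # the shared inner list shows the change
print(deep)      # the deep copy is unaffected
```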

Be careful trying to perform arithmetic on each item in a list.  For multiplication, `n*list` or `list*n` will repeat the list n times.  To apply arithmetic to each element, we either need to use a list comprehension or utilize numpy, both of which are discussed later in this lesson. 

In [None]:
num_list = [1,2,3,4]
3*num_list

In [None]:
num_list*3
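
As a brief preview of list comprehensions, here is one way to multiply each element instead of repeating the list.  The name `tripled` is arbitrary.

```python
num_list = [1, 2, 3, 4]

# Build a new list by applying 3 * x to every element x
tripled = [3 * x for x in num_list]
print(tripled)
print(num_list)   # the original list is unchanged
```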

Python also has built-in functions to sort lists, either numerically or alphabetically.

In [None]:
num_list = [3,2,6,7,3,1,4]
print(f'{num_list = }')

In [None]:
#We can act on the list with python's built-in sorted()
sorted(num_list)

In [None]:
sorted(num_list,reverse=True)

In [None]:
#We can also sort a list by a custom key function
#Here sort by distance from the mean
list_mean = sum(num_list)/len(num_list)
print(f"{list_mean=}")

In [None]:
def dist_from_mean(num):
    return abs(num-list_mean)

#Sort list by absolute distance from mean
sorted(num_list,key=dist_from_mean)

In [None]:
#Note that at this point, the num_list hasn't been altered
num_list

To sort the list itself, we can either assign the sorted output to the list with `num_list = sorted(num_list)`, or we can use the list methods `sort()` or `reverse()` to act on and alter the list in place.

In [None]:
#To alter the list itself, we can use list.sort() 
num_list.sort()
num_list

In [None]:
#Can also reverse the list in place with num_list.reverse()
num_list.reverse() #Since the list was just sorted ascending, this matches num_list.sort(reverse=True)
num_list

### Tuples

Tuples are similar to lists in that they are a sequence of objects, but they are different in that they are **immutable**, meaning they cannot be changed.  

Tuples are defined between () as follows

In [None]:
my_tup = (5,4,3,2,1,0.5,'Blast Off!')
my_tup

In [None]:
#We can also define tuples with just commas and no parentheses
my_tup2 = 1,2,3
my_tup2

As mentioned, tuples are immutable, meaning we cannot alter them

In [None]:
#This will give an error if you try to run it
#my_tup[0] = 10

You may have two burning questions at this point:
1. How the heck do you pronounce tuple?  Turns out there are 2 accepted answers in practice.  As Guido van Rossum, the creator of Python [tweeted](https://twitter.com/gvanrossum/status/86144775731941376), "I pronounce tuple too-pull on Mon/Wed/Fri and tub-pull on Tue/Thu/Sat. On Sunday I don't talk about them. :)"
2. More importantly, why do we need lists and tuples?  Turns out there can be advantages to having an immutable data type that cannot be changed, i.e. so a human cannot mess it up.  Those from a Lean Six Sigma background may be familiar with the concept of [Poka-Yoke](https://en.wikipedia.org/wiki/Poka-yoke), or mistake-proofing a process.  Additionally, if you don't need all the built-in functionality of a list, then a tuple is a simpler way to accomplish your task.

In [None]:
#tuples can be indexed the same way as lists
print('my_tup = {}'.format(my_tup))
print('my_tup[2] = {}'.format(my_tup[2]))
print('my_tup[-3:] = {}'.format(my_tup[-3:]))

In [None]:
#Since tuples are immutable, they only have 2 methods: index() and count()
#You can confirm these are the only two by typing my_tup.[TAB] to see list of available methods
my_tup.index('Blast Off!') #returns index of item

In [None]:
my_tup.count('Blast Off!')

A common usage is to **unpack** tuples into individual variables

In [None]:
a, b, c = my_tup2
print("a = {}, b = {}, c = {}".format(a,b,c))
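
Unpacking also enables a couple of handy idioms.  The sketch below, using illustrative names, shows swapping two variables in one line and using a starred name to collect "everything else" into a list.

```python
point = (3, 4)
x, y = point          # unpack into two variables

# Swap two variables without a temporary variable
x, y = y, x
print(x, y)

# A starred name collects any remaining items into a list
first, *rest = (5, 4, 3, 2, 1)
print(first, rest)
```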

Even though we can't change a tuple, we can use a trick to update it by converting it to a list, changing the list, then converting back to a tuple.

In [None]:
print('my_tup = {}'.format(my_tup))
temp_list = list(my_tup)
temp_list[0] = 100
my_tup = tuple(temp_list)
print('my_tup = {}'.format(my_tup))

### Sets 

Whereas a list is an ordered sequence of objects, with items possibly repeated, a set is an unordered collection of unique objects.  Sets are defined as comma-separated values between {}.

In [None]:
my_set = {1,2,3,'four','five'}
my_set
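
Because sets only keep unique items, a common use is removing duplicates from a list.  A minimal sketch with a made-up `names` list:

```python
names = ['Alice', 'Bob', 'Alice', 'Carol', 'Bob']

# Converting to a set drops the repeated entries
unique_names = set(names)

print(len(names), len(unique_names))
# Sets are unordered, so sort if a predictable order is needed
print(sorted(unique_names))
```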

In [None]:
#Add a single item with add() method
my_set.add('six')
my_set

In [None]:
#Add collection of items with update()
my_set.update([7,8,'nine'])
my_set

In [None]:
#Remove single item with remove()
my_set.remove('five')
my_set

In [None]:
#Can also remove an item with discard()
my_set.discard('four')
my_set

In [None]:
#Note: the discard() method will not raise an error if the item is not in the set, but the remove() method will raise an error.
my_set.discard('Hello') #Will run without raising an error that stops your code

There also are useful methods for comparing and combining two sets.

In [None]:
a = {1,2,3,4,5}
b = {1,2,3,'four','five'}

In [None]:
#Intersection (elements in a AND b)
#Any of these will work
a.intersection(b)
b.intersection(a)
a & b

In [None]:
#Union (elements in a OR b)
#Any of these will work
a.union(b)
b.union(a)
a | b

In [None]:
#Difference (elements in one but not the other)
#Again either will do the same thing
print(f"{a.difference(b) = }")
print(f"{a - b = }")

In [None]:
#Symmetric difference (elements in one set or the other, but not in both)
a.symmetric_difference(b)
a ^ b

In [None]:
#We can use any of these set methods to update the value of a set
#Set a equal to union of a and b:
a |= b #same as a = a|b
a

In [None]:
#Can also check if one set is subset or superset of another
b.issubset(a)

In [None]:
b.issuperset(a)

In [None]:
#Can also check if a value is in a set
'one' in a

### Dicts

Last but not least for the composite data structures is the dictionary object `dict`, which contains a mapping of keys to values.  The dict data type is one of the most important data types in python.

In [None]:
d = {1:'Alice',
     2:'Bob',
     'three':20,
     'four':[1,2,3,4]}
d

In [None]:
#We can now reference any value by referencing its key
d['three']

In [None]:
#If we reference a key that isn't in the dict, we'll get a key error
#d[10] #Will return a key error.  Uncomment and run to see.

In [None]:
#To avoid that issue, we can use the get() method to specify a default value to return if key isn't found
d.get(10,'default value since key is not in dict')

In [None]:
#We can add new key-value pairs
d[5] = (3,2,1)
d

In [None]:
#Or reassign an existing key, since dict can only have one entry per key
d[5] = 'new value'
d

In [None]:
#We can also manipulate an existing value in place.  E.g. we can append a value to the list at d['four']
d['four'].append(100)
d[1] += ' Smith'
d

In [None]:
#delete a key
del d[5]
d

In [None]:
#Check if item is one of the keys in the dict
2 in d

In [None]:
#Combine two dicts
d2 = {7:'Smith',8:'Jones'}
d.update(d2)
d

In [None]:
#We can access sequence of keys or values from a dict
list(d.keys())

In [None]:
list(d.values())

In [None]:
#Or we can access sequence of (key,value) tuples with the items() method
#This can be useful for looping through the items, as we'll see in the section on for loops
d.items()

In [None]:
#We can also have nested dicts, where the value is itself a dict
#This is essentially the format of the popular JSON data format
d[10] = d2
d

This nested dict structure is very common in many data analysis applications.
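
To read values out of a nested dict, chain the keys (and list indexes) one level at a time.  The `record` below is invented purely for illustration.

```python
record = {
    'name': 'Alice',
    'contact': {
        'email': 'alice@example.com',
        'phones': ['111-222-3333', '444-555-6666'],
    },
}

# Chain keys and indexes to drill down one level at a time
email = record['contact']['email']
first_phone = record['contact']['phones'][0]

# get() can be chained too, with defaults for missing keys
fax = record.get('contact', {}).get('fax', 'not provided')

print(email, first_phone, fax)
```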

### Comprehensions

Comprehensions are an extremely useful technique for quickly creating composite data structures (such as lists, tuples, sets, dicts) by automatically grabbing and altering each element in a sequence.  That probably made no sense, so let's look at some examples.  

In [None]:
num_list = [1,2,3,4,5,6,7,8,9,10]

#What if we want a list that replaces each element in num_list with a str version of that element?
#We could start with an empty list, then append one item at a time as we loop through the items in num_list
#OR we can concisely build it using a comprehension as follows:
str_list = [str(i) for i in num_list]
str_list

In [None]:
#We can add conditions, e.g. only capture even numbers
[str(i) for i in num_list if i%2==0]
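
For comparison, here is a sketch of the loop-and-append approach next to the equivalent comprehension.  Both build the same list; the variable names are only illustrative.

```python
num_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Loop version: start empty and append one item at a time
even_strs_loop = []
for i in num_list:
    if i % 2 == 0:
        even_strs_loop.append(str(i))

# Comprehension version: same result in one line
even_strs_comp = [str(i) for i in num_list if i % 2 == 0]

print(even_strs_loop == even_strs_comp)
```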

In [None]:
#By the way, you might be tempted to try str(num_list)
#...but that will do something different...
str(num_list)

It is common to use comprehensions to create dict objects.  In these cases, we can take advantage of the `zip` function, which combines multiple sequences alongside each other.

In [None]:
keys = [1,2,3,4]
values = ['Alice','Bryson','Charlie','Doug']

#Use zip to combine multiple sequences side by side
z = zip(keys,values)
list(z)

In [None]:
#Use zip to build dictionary comprehension
keys = [1,2,3,4]
values = ['Alice','Bryson','Charlie','Doug']
d = {k:v for k,v in zip(keys,values)}
d
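
When no transformation is needed, passing the zipped pairs straight to `dict()` gives the same result; the comprehension form earns its keep once you want to alter keys or values along the way.  A small sketch (the uppercase transformation is just an example):

```python
keys = [1, 2, 3, 4]
values = ['Alice', 'Bryson', 'Charlie', 'Doug']

# No transformation needed: dict() accepts the (key, value) pairs directly
d_direct = dict(zip(keys, values))

# Comprehension: useful when altering items while building the dict
d_upper = {k: v.upper() for k, v in zip(keys, values)}

print(d_direct)
print(d_upper)
```

Keep in mind that `zip` stops at the end of the shortest sequence, so extra items in a longer list are silently dropped.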

Another handy python function is enumerate, which zips together any sequence with a matching sequence of indexes.  

In [None]:
for i, n in enumerate(values):
    print(i,n)

In [None]:
#Which again we can use to create a dict comprehension
d = {i:n for i, n in enumerate(values)}
d

# String Matching

As mentioned, python has a reputation for providing an extensive set of tools for working with string data.  Now that we've covered strings and lists, it's worth taking a quick look at Regex (regular expression), a popular and essential technique for searching and cleaning string data, which is often a crucial step in a data analysis workflow.  

### Regex

Regular Expression, aka regex or regexp, is a popular technique used in many programming languages to search for specific patterns within a string.  Python has a built-in package `re` to handle regex techniques.  A couple quick examples are given below.  For a good quick orientation see the [W3 Python regex tutorial](https://www.w3schools.com/python/python_regex.asp), or for more in depth overview see the [python regex documentation](https://docs.python.org/3/howto/regex.html).

In [None]:
import re

In [None]:
#Say we want to extract the phone numbers from this string
s1 = """Alice's number is 111-222-3333, and Bob's is 444-555-6666"""

To extract the phone numbers, we first have to identify the regex **pattern** we are looking for.  Here, the pattern would be 3 digits (regex character `\d`), followed by a `-`, followed by 3 more digits, etc.  Next, we need to find that pattern in the string using one of the re searching functions.  Here's how it looks using `re.findall()`.

It's also common to build regex patterns using *raw strings*, designated as `r'some_raw_string'`.  The reason for this is that many regex patterns contain the backslash character `\`, which is used as an *escape* character in regular python strings to represent some special symbol or behavior.  
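
To see the difference concretely, the sketch below compares a regular string and a raw string that look identical in source code.

```python
regular = '\n'   # backslash-n is interpreted as ONE newline character
raw = r'\n'      # raw string keeps TWO characters: a backslash and an 'n'

print(len(regular))   # 1
print(len(raw))       # 2
print(raw == '\\n')   # True: same as escaping the backslash by hand
```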

> Fun fact: For the first year+ of using regex, I thought `r'...'` represented a *regex* string, not a *raw* string.  Now I know!

In [None]:
#Here's how we can find all matches of a phone number pattern
pat = r'\d\d\d-\d\d\d-\d\d\d\d'
re.findall(pat,s1) #Return a list of all matches of the pattern in the string

Try running the above cell without the `r` before the string to see if it works.  It may work with a warning, or it may give an error, depending on your python version.

In [None]:
#We can make the pattern more concise by specifying how many of a given item we want
#Here, \d{n} means we want exactly n occurrences of a digit
pat = r'\d{3}-\d{3}-\d{4}'
re.findall(pat,s1) #Return a list of all matches of the pattern in the string

In [None]:
#What if the numbers can use a - or . to separate numbers?
s2 = """Alice's number is 111-222-3333, and Bob's is 444.555.6666"""

#We can specify a set of characters to match by putting them in square brackets
pat = r'\d{3}[-.]\d{3}[-.]\d{4}'
#[-.] means match a SINGLE character that is either a "-" or a "."
re.findall(pat,s2)

In [None]:
#We can also use number ranges
pat = r'[0-9]{3}[-.][0-9]{3}[-.][0-9]{4}'
re.findall(pat,s2)

In [None]:
#What happens if the pattern doesn't exist?
pat = r'[0-9]{9}' #Look for 9 straight digits, which doesn't exist in our string
#In this case, we'll just get an empty list back
re.findall(pat,s2)
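
Beyond `re.findall()`, two other functions in the `re` module come up often when cleaning data: `re.search()` returns the first match (or `None`), and `re.sub()` replaces every match.  The sketch below uses parentheses to define capture groups, which is an addition to the patterns shown above.

```python
import re

text = "Alice's number is 111-222-3333, and Bob's is 444.555.6666"

# Parentheses create numbered groups we can refer back to
pat = r'(\d{3})[-.](\d{3})[-.](\d{4})'

# re.search returns a match object for the FIRST match, or None
match = re.search(pat, text)
first_number = match.group(0)   # the full matched text
area_code = match.group(1)      # contents of the first group

# re.sub replaces every match; \1, \2, \3 refer to the groups
normalized = re.sub(pat, r'\1-\2-\3', text)

print(first_number, area_code)
print(normalized)
```

Note that once a pattern contains groups, `re.findall()` returns tuples of the groups rather than the full matched strings.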

# Code Structures

### If-else-elif

In [None]:
#Syntax for if-else control structures
if 5>3:
    print("That is True!")
else:
    print("That is False!")

In [None]:
#We can also use the elif keyword if we have subsequent conditions
if 1>10:
    print('a')
elif 5>10:
    print('b')
elif 15>10:
    print('c')
else:
    print('d')

In [None]:
#We can create composite boolean statements with the keywords "and" and "or"
#Challenge: Try to edit condition so it is True
if (5>3 and 7=='7'):
    print("Both are True!")
else:
    print("Initial condition was False")
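
A word of caution on combining conditions: python's logical operators are the keywords `and`, `or`, and `not`.  The symbols `&` and `|` are bitwise operators, and they bind more tightly than comparisons like `>` and `==`, which can give surprising results.  A short sketch:

```python
# Logical "and" evaluates each comparison, then combines them
both_true = 5 > 3 and 7 == 7

# Without parentheses, & is applied first: 5 > (3 & 7) == 7
# 3 & 7 is 3, so this becomes 5 > 3 == 7, i.e. (5 > 3) and (3 == 7)
surprising = 5 > 3 & 7 == 7

# With parentheses around each comparison, & behaves as expected
with_parens = (5 > 3) & (7 == 7)

print(both_true, surprising, with_parens)
```

You will see `&` and `|` again with numpy and pandas, where they operate element-wise and the parentheses are required.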

In [None]:
#if statements are also a useful way to check if a variable exists and has a value (i.e. is non-empty)
#If we feed in None or an empty object (empty string, list, etc.), the condition evaluates to False
a = None
b = [] #empty list
c = [1] #non empty list
d = "" #empty string
e = "1" #non empty string
#Try feeding in these different variables to the "if" condition
if (b):
    print("That is True!")
else:
    print("That is False!")
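
To see how python treats each of these values in a condition, we can pass them to the built-in `bool()`.  The list of candidates below adds a few extra cases (`0`, `0.0`, and an empty dict) for illustration.

```python
candidates = [None, [], [1], "", "1", 0, 0.0, {}]

# bool() shows how each value is treated inside an "if" condition
results = [bool(c) for c in candidates]

for c, r in zip(candidates, results):
    print(repr(c), '->', r)
```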

### For Loops

For loops in python take advantage of the ability to loop through the elements of any *iterable* sequence, such as a list.

In [None]:
num_list =[1,2,3,4,5,6,7,8,9,10]
for n in num_list:
    print(n)
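
Lists are not the only iterables.  As a sketch, the loop below walks through the `(key, value)` pairs of a dict using `items()`, and a second loop walks through the characters of a string.  The `ages` dict is made up for the example.

```python
ages = {'Alice': 30, 'Bob': 25}

# Loop through (key, value) pairs, unpacking each pair as we go
lines = []
for name, age in ages.items():
    lines.append(f'{name} is {age}')
print(lines)

# Strings are iterable too: one character at a time
letters = []
for ch in 'abc':
    letters.append(ch)
print(letters)
```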

In [None]:
#python has a built in function range() which gives a sequence of numbers 
#range(n) gives sequence of integers from 0 to n-1 (that is, n total numbers)
for n in range(5):
    print(n)

In [None]:
#We can also optionally specify start or step of the range, as long as all parameters are integers
for n in range(10,20,2):
    print(n)

for loops also have the standard continue and break keywords, which can be very useful for flow control.

In [None]:
#Only print even numbers, and break out of the loop when we get past 10
for n in range(20):
    if n%2==1:
        #number is odd, so skip ahead to next value in the for loop with continue
        continue
    if n>10:
        break
    print(n) #Should only print even numbers <= 10

### While Loops

In [None]:
#syntax for while loops:
i = 0
my_list = []
while(i<10):
    my_list.append(i)
    i += 1
my_list

### Try-Except

If you've coded before in any language, you know that an error in the code can bring the whole workflow to a crashing halt.  The python try-except control structure is a very useful way to try and execute some code that may or may not return an error without having to worry about crashing the notebook and thus preventing any subsequent code from running.

Consider this code where we want to try and type-cast a string as a float, which will raise an exception (python error) if the string does not represent a viable number.

In [None]:
str_list = ['1','2.2','three','Dog','3.14']
for s in str_list:
    try:
        print(float(s))
    except:
        print('Could not convert "{}" to a float'.format(s))

There are also different types of exceptions, and sometimes it makes sense to handle them differently.  For example, `float(s)` can raise a ValueError if s cannot be translated to a float (e.g. for s = 'Dog'), or it can raise a TypeError if s is not a valid argument (e.g. if s is a list).  

In [None]:
x_list = ['1','2.2','three','Dog',[1,2,3],3]
for x in x_list:
    try:
        print(float(x))
    except ValueError:
        print("Got a ValueError for float({})".format(x))
    except TypeError:
        print("Got a TypeError for float({})".format(x))
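
If several exception types should be handled the same way, they can be grouped in a tuple, and `as` gives access to the exception object itself (including its message).  A minimal sketch, with `messages` as an illustrative name:

```python
messages = []
for x in ['2.2', 'Dog', [1, 2, 3]]:
    try:
        value = float(x)
    except (ValueError, TypeError) as err:
        # type(err).__name__ is the exception class name; str(err) is its message
        messages.append(f'{type(err).__name__}: {err}')
    else:
        messages.append(f'ok: {value}')

for m in messages:
    print(m)
```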

In [None]:
#We can use the else control to execute code only if the try command SUCCEEDS

str_list = ['1','2.2','three','Dog','3.14']
for s in str_list:
    try:
        f = float(s)
    except:
        #If try command raised an error, we get here
        print('Could not convert "{}" to a float'.format(s))
    else:
        #If try command succeeded (did not raise an error) we get here
        print('Success for converting "{}" to float'.format(s))
        

In [None]:
#And "finally", we can cover the finally control which will execute code,
#  regardless of whether the try command raised an error
str_list = ['1','2.2','three','Dog','3.14']
for s in str_list:
    try:
        f = float(s)
    except:
        #If try command raised an error, we get here
        print('Could not convert "{}" to a float'.format(s))
    else:
        #If try command succeeded (did not raise an error) we get here
        print('Success for converting "{}" to float'.format(s))
    finally:
        #Always runs this
        print('Done dealing with {}'.format(s))

## Functions

In [None]:
#Basic function syntax
def cube(x):
    return x**3
cube(10)

In [None]:
#We can add as many arguments as we want
def make_list(x,y,z):
    return [x,y,z]
make_list(1,2,3)

In [None]:
#And we can assign default values
def power(base,power=2):
    return base**power
#If we only specify base, then power gets default value of 2
power(10), power(10,3)

In python functions, arguments without defaults are called positional arguments, and those with default values are called keyword arguments.  Any keyword arguments must come after the positional ones.  If you use the name to reference any argument (positional or keyword), you can specify them in any order.

In [None]:
#This will work correctly, even though arguments are out of order
power(power=3,base=10)

It is also common to set default values to None, allowing the function to evaluate whether or not that argument exists.

In [None]:
def power(base,power=None):
    if power is not None: #Explicit None check, so that power=0 is still treated as a provided value
        return base**power
    else:
        return base
power(10)
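
One common reason for the `None` default pattern is that default values are created once, when the function is defined, not each time it is called.  With a mutable default such as a list, that single object is shared across calls.  The two functions below are hypothetical examples written to show the difference.

```python
def append_shared(item, bucket=[]):
    # The default list is created ONCE and reused on every call
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # A new list is created on each call when none is provided
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

shared_1 = append_shared(1)
shared_2 = append_shared(2)   # same list as shared_1, now holding both items

fresh_1 = append_fresh(1)
fresh_2 = append_fresh(2)     # separate lists

print(shared_2, shared_1 is shared_2)
print(fresh_1, fresh_2)
```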

We can also accept an unknown number of positional arguments using "*args" argument.

In [None]:
def make_list(*args):
    #All positional arguments are passed to a tuple called args
    return list(args)
print(make_list(1,2,3,4))
print(make_list('a','b','c','d','e'))

In [None]:
#*args can come after any number of other positional arguments
def make_list_with_name(name,*args):
    return [name] + list(args)
make_list_with_name('Matt',1,2,3,4)

We can also collect an unknown number of keyword arguments as "**kwargs".  Inside the function, these are available as a dict matching keyword names to values.

In [None]:
def print_kwargs(**kwargs):
    #within the function, kwargs is a dict matching any keyword names to values
    print('Keyword arguments:', kwargs)
print_kwargs(a=1,b=2,c=10)

In [None]:
#We can create a function that pulls fixed arguments with additional positional and keyword arguments
def lots_of_args(first_name,last_name,*args,**kwargs):
    print('Name: {} {}'.format(first_name,last_name))
    print('Additional positional args: ',list(args))
    print('Additional keyword args: ',kwargs)
lots_of_args('Matt','Smith',1,2,3,4,a=1,b=2)
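
The `*` and `**` symbols also work in the other direction: when calling a function, they unpack an existing sequence or dict into individual arguments.  The `describe` function below is a hypothetical stand-in that returns its inputs so we can inspect them.

```python
def describe(first_name, last_name, *args, **kwargs):
    # Return everything so we can see how the arguments were assigned
    return first_name, last_name, args, kwargs

names = ('Matt', 'Smith', 1, 2)
options = {'a': 1, 'b': 2}

# *names unpacks into positional arguments, **options into keyword arguments
result = describe(*names, **options)
print(result)
```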

### Lambda Functions

Sometimes we need to utilize some function in one specific instance, but don't need to use it again so there's no need to assign a name to the function.  We can handle these situations with a **lambda function**, similar to what R calls anonymous functions.  Though this may seem strange if you haven't seen them before, they are actually quite useful in many data analysis applications.  For example, it's common to use lambda functions to apply transformations to data structures, or to use them as inputs to functions that take functions as inputs.

In [None]:
#Define a function that takes a function as an input
def apply_func(func,num):
    return func(num)

#We could define and feed in a new function, such as the cubed function from before
def cube(x):
    return x**3
apply_func(cube,10)

In [None]:
#Or we could pass in a lambda function, a more concise approach
apply_func(lambda x: x**3,10)
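
Python's built-in `map()` and `filter()` are two examples of functions that take a function as input, so they pair naturally with lambdas.  A brief sketch:

```python
nums = [1, 2, 3, 4, 5]

# map applies the function to every item
squares = list(map(lambda x: x**2, nums))

# filter keeps only the items where the function returns True
evens = list(filter(lambda x: x % 2 == 0, nums))

print(squares)
print(evens)
```

In everyday python, a list comprehension is often preferred for these simple cases, but the same pattern of passing a lambda shows up frequently in data analysis libraries.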

It is also common to see lambda functions used as the sorting key.  Imagine we have a list of tuples, and we want to sort them.

In [None]:
my_list_of_tuples = [
    ('Ohio',100),
    ('California',500),
    ('Montana',300)
]

How could we sort this list by the numerical values?  To do so, we can use the `key` parameter of the built-in `sorted` function (the list method `sort` accepts the same parameter), which takes in a function that returns the value we want to sort by.  Let's build a function that returns the 2nd element of a tuple, and use it as the key function.

In [None]:
#Define function to return 2nd element of tuple
def grab_value(curr_item):
    return curr_item[1]

#Use it to sort the list of tuples
sorted(my_list_of_tuples,key=grab_value)

We see that it returned a new list sorted by the numerical value.  Rather than define the `grab_value` function before using it as the sorting key, we can accomplish the same thing by using a lambda function as the key.

In [None]:
#Sort using lambda function
sorted(my_list_of_tuples,key=lambda x: x[1])

## Classes

Python is an object-oriented language, and everything in python (numbers, strings, lists, functions, anything else you can think of) is an object.  An object is a data structure that has properties (called attributes) and functions (called methods).  

Classes are the tool used to build new objects.  If you plan to do any development in python, classes are how you will build new tools and packages.  Even if you don't do development yourself, it's still helpful to understand the basics of how classes work so you can understand the code and packages you are dealing with.  

A few basics are given below.  For a more thorough treatment, I'd recommend the Classes chapter in the [Introducing Python](https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch10.html#ch_objects) book on O'Reilly (which by the way is free with a .mil email).

In [None]:
#Classes are defined with the class keyword
class Cat:
    color='black'

a_cat = Cat()

At this point, `a_cat` is an object, and is an instance of the `Cat` class.  a_cat has a single property at this point, but doesn't have any methods yet.

In [None]:
#See color attribute
a_cat.color

In [None]:
#Change the color attribute
a_cat.color='brown'
a_cat.color

We can also assign methods to objects by defining the functions within the class.  One quirk of this syntax that takes a bit to get used to is that the first argument of the function definition within a class always refers to the object itself (and it's customary to refer to this argument as `self`).  Any subsequent arguments accept any arguments passed in when that object method is called.

That probably made no sense, so let's look at an example.

In [None]:
#Define a speak() method that takes an optional argument of "phrase" with a default value of "Meow"
class Cat:
    color='black'
    
    #First argument to any class function refers to the object itself
    #It is customary to call this first argument 'self'
    def speak(self,phrase='Meow'):
        return phrase

#Now if we create a Cat object, we can invoke its speak() method
cat2 = Cat()
print(cat2.speak())
print(cat2.speak('Woof'))

One special function name for all classes is `__init__()` (that's two underscores on each side, sometimes referred to as "dunder" for double-under).  Whenever a new object is created, python automagically checks what class that object belongs to and looks for that class's `__init__()` function; if it exists, python executes it.  This is typically used to initialize the object with initial properties provided by the user, as shown below.

In [None]:
class Cat:
    def __init__(self,color='black',name=None,age=None):
        self.color=color
        if name:
            self.name = name
        if age:
            self.age = age
            
    #Methods are defined at the class level, NOT nested inside __init__
    def speak(self,phrase='Meow'):
        return phrase

#Now we can create Cat objects and specify their color or name
cat1 = Cat(color='brown')
print(cat1.color)
cat2 = Cat(color='blue',name='Mr. Bigglesworth',age=10)
print(cat2.color,cat2.name,cat2.age)
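
One consequence of setting `name` and `age` only when they are provided is that an object created without them will not have those attributes at all.  The sketch below repeats the class definition so it runs on its own, then uses the built-ins `hasattr()` and `getattr()` to check safely.

```python
class Cat:
    def __init__(self, color='black', name=None, age=None):
        self.color = color
        if name:
            self.name = name
        if age:
            self.age = age

cat1 = Cat(color='brown')

# cat1 was created without a name, so the attribute does not exist
has_name = hasattr(cat1, 'name')

# getattr accepts a default to return when the attribute is missing
name_or_default = getattr(cat1, 'name', 'unnamed')

print(has_name, name_or_default)
```

Accessing `cat1.name` directly here would raise an `AttributeError`.  An alternative design is to always assign the attribute (e.g. `self.name = name`), so that it exists with a value of `None`.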

Another key concept is *inheritance*, where a subclass inherits all the properties and methods of its parent class (aka superclass), but with some additional features.  Let's create a subclass of Cat and give it a new method.

In [None]:
#To define a subclass, use the parent class as an argument to the class definition
class aging_Cat(Cat):
    def make_older(self,yrs = 1):
        self.age += yrs


This subclass aging_Cat will inherit all the features of the parent class Cat, unless we override them in the definition of aging_Cat (which is perfectly acceptable).  For example, aging_Cat has the same `__init__` function as the Cat class, and we've given it an *additional* method, `make_older()`, to make the cat older.

In [None]:
cat3 = aging_Cat(age=10)
cat3.age

In [None]:
cat3.make_older(5)
cat3.age
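
To illustrate overriding, here is a sketch of a subclass that replaces an inherited method while still reusing the parent's version through `super()`.  The `LoudCat` class is hypothetical, and `Cat` is redefined in simplified form so the example runs on its own.

```python
class Cat:
    def __init__(self, color='black', name=None, age=None):
        self.color = color
        self.name = name
        self.age = age

    def speak(self, phrase='Meow'):
        return phrase

class LoudCat(Cat):
    # Override speak(), but build on the parent's version via super()
    def speak(self, phrase='Meow'):
        return super().speak(phrase).upper() + '!'

loud = LoudCat(color='orange')
print(loud.speak())            # uses the overridden method
print(loud.color)              # __init__ is inherited from Cat
print(isinstance(loud, Cat))   # a LoudCat is also a Cat
```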

# Working with Files

## Read and Write Files

Python has a built in function, `open()`, which we can use to open, read, and write files.  

Looking at the docstring (by running `open?`, or by typing `open()` and pressing SHIFT-TAB within the parentheses), we see that the first two arguments are `open(file,mode='r')`. The `file` parameter is the filename, and the `mode` specifies whether we want to open the file in read-only mode (`mode='r'`) or in write mode (`mode='w'`), which creates the file or overwrites an existing one.


Let's write some data to a file and then read it back in.

In [None]:
#Uncomment to view docstring
#open?

In [None]:
#Create some text we want to save to a file
my_text = """
I want an officer for a secret and dangerous mission.
I want an Army ORSA.
"""
print(my_text)

In [None]:
#Open a file for writing using write mode
f = open('my_text.txt','w')

#f is a python file handler
f

In [None]:
#We can use the file handler to write data
f.write(my_text)

#Then be sure to close the connection!
f.close()

Now you should see the file `my_text.txt` in your current directory.  

Note that in this implementation, we had to explicitly remember to **close** the file connection with `f.close()`.  Forgetting to do so can lead to resource losses or permission errors when working with files.  

Because of the importance of closing file connections, the most common method of reading and writing files in python is using a *context manager* clause as shown below...

In [None]:
#Preferred method of reading and writing files
with open('my_text.txt','w') as f:
    f.write('Some new data')
    
#Note that this will OVERWRITE the file contents without warning!

This `with` block is known as a context manager because the file handler `f` is only active within the indented code block (the context).  This is particularly helpful for file management as it will automatically close the file connection once we leave the context of the with clause (that is, once we finish the indented block of code).

See https://realpython.com/read-write-files-python/ for a deeper overview of reading and writing files, and see https://realpython.com/why-close-file-python/ for a good discussion of why it is important to close the file connections.  

We can use the same structure to read contents from files.

In [None]:
#Read file contents
#Now use read-only mode ('r') to avoid any accidental changes to the file
with open('my_text.txt','r') as f:
    contents = f.read() #Read all data at once
    
print(contents)
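
Two variations are worth knowing: append mode (`'a'`) adds to the end of a file instead of overwriting it, and looping over the file handler reads one line at a time.  The sketch below writes to a temporary folder (via the standard `tempfile` module) so it does not touch `my_text.txt`; the file name `demo.txt` is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    # 'w' creates the file (or overwrites an existing one)
    with open(path, 'w') as f:
        f.write('first line\n')

    # 'a' appends to the end, keeping the existing contents
    with open(path, 'a') as f:
        f.write('second line\n')

    # Looping over the file handler yields one line at a time
    with open(path, 'r') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)
```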

## Navigating Folders with os

Python's `os` module (for "operating system") is quite handy for navigating files and folders directly from python.  

In [None]:
import os
#get current working directory
os.getcwd()

In [None]:
#Make a new directory
if 'new_folder' not in os.listdir():
    #Will throw error if folder already exists
    os.mkdir('new_folder')

#Now get list of all files in current directory (same as "ls" from command line)
os.listdir()
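
A few other `os` functions that pair well with the above: `os.path.join()` builds paths with the correct separator for your operating system, and `os.makedirs()` with `exist_ok=True` creates a folder (including any missing parent folders) without raising an error if it already exists.  The sketch below works inside a temporary folder so it leaves nothing behind; the folder names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a path in a way that works on Windows, Mac, and Linux
    target = os.path.join(base, 'new_folder', 'sub_folder')

    # Creates new_folder AND sub_folder; safe to call more than once
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)

    created = os.path.isdir(target)
    contents = os.listdir(os.path.join(base, 'new_folder'))

print(created, contents)
```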

For more, see this friendly overview of key `os` functionality: https://www.geeksforgeeks.org/os-module-python-examples/, or take a look at the official documentation https://docs.python.org/3/library/os.html.