# Base Python Coding Overview: Part 1

*Author: Evan Carey, written for BH Analytics*

*Copyright 2017-2019, BH Analytics, LLC*

## Overview

The purpose of this lecture set is to provide a review of base Python concepts and code. This is not intended to be a complete introduction to Python, but rather a review that is biased towards concepts relevant to doing data science with Python.  
   
You need to understand the following concepts in base Python at a minimum:

* Loading Packages
* Working directory and file paths
* String literals (text data)
* Numeric classes
* Date classes
* Python Collections (next lecture)
* Slicing (next lecture)
* Loops / list comprehension (next lecture)
* Functions (next lecture)

## Packages

Python is lean by default. That means when we start Python, there are only a few base functions and objects loaded. We almost always need to load some extra functionality into Python by importing packages. I like to start off each script by importing the needed packages (sometimes called modules or libraries) at the top of the script. I will print out version information as well so it is clear what was run to produce this notebook. 

In [5]:
# load some packages
import sys
import os
import textwrap # adding this to make wrapping text easier for printed materials. 

Now that I have loaded those packages, I can access the objects inside a package by typing `package.object`. In this case, I am getting the Python version by typing `sys.version`, and then calling the `textwrap.fill()` function on that result. 

In [6]:
# Get Version information
print(textwrap.fill(sys.version),'\n')

3.6.7 | packaged by conda-forge | (default, Feb 26 2019, 03:50:56)
[GCC 7.3.0] 
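As a small sketch of the `package.object` pattern described above, the snippet below shows three common import styles. It uses the standard-library `math` module purely for illustration; `math` is not used elsewhere in this notebook.

```python
# Three common ways to import, illustrated with the standard-library math module
import math              # access objects as math.<name>
import math as m         # same module, shorter alias
from math import sqrt    # pull a single name into the current namespace

root_full = math.sqrt(16)    # package.object style
root_alias = m.sqrt(16)      # alias.object style
root_direct = sqrt(16)       # direct name, no prefix needed

print(root_full, root_alias, root_direct)
```

All three calls reach the same function. The alias style is the one you will see most often with data science packages later in the course.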



I am going to add this piece of code to ensure all the Python output comes through in each of the Jupyter notebook cells. Without it, the default behaviour is to display only the final output of each cell. 

In [7]:
## So all output comes through from Ipython
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Check your Working Directory

Python uses the concept of a working directory (exactly like many other programs). Every time we start Python, it is started 'somewhere' on our computer (filesystem). Where Python is started depends on environment variables in your operating system and how you started Python. I like to set the working directory for my session to be wherever my project files are. I will typically organize a directory like the structure below, then set the working directory to the top level:

`project1 --->
        --> Data
        --> Code
        --> Results
        --> Etc...`
        
So in this case, I would set my working directory to be the file location for project1. You should have downloaded a group of files for this class. You need to modify the working directory to be wherever you have downloaded the files to on your computer (on your filesystem). If you downloaded them to your 'documents' folder and named the folder 'python_course' (and your name is Ra!), the file path might be this:

`c:\Users\Ra\documents\python_course`

Figure out where you have downloaded the folder to, and change the code below to assign the correct working directory. We use the os package to change the working directory.

In [8]:
# Working Directory
print("My working directory:\n" + os.getcwd())
# Set Working Directory - CHANGE THIS CODE
os.chdir(r"/home/ra/host/BH_Analytics/Discover/DataEngineering")
# Confirm it changed the working Directory
print("My working directory:\n" + os.getcwd())

My working directory:
/home/ra/host/BH_Analytics/Discover/DataEngineering
My working directory:
/home/ra/host/BH_Analytics/Discover/DataEngineering
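Once the working directory is set, relative paths are resolved against it. The sketch below illustrates this with a throwaway temporary directory standing in for `project1`. The `Data` folder and the `example.csv` file name are hypothetical, and `os.path.join` is used so the path separator is correct on any operating system.

```python
import os
import tempfile

original_dir = os.getcwd()    # remember where we started

# A temporary directory stands in for 'project1' (hypothetical layout)
with tempfile.TemporaryDirectory() as project_dir:
    os.mkdir(os.path.join(project_dir, "Data"))
    os.chdir(project_dir)     # set the working directory to the top level

    # Relative paths are now resolved against the working directory
    data_path = os.path.join("Data", "example.csv")   # hypothetical file name
    data_dir_found = os.path.isdir("Data")
    absolute_path = os.path.abspath(data_path)

    os.chdir(original_dir)    # restore before the temporary directory is removed

print(data_path)
print(data_dir_found)
print(absolute_path)
```

Building paths with `os.path.join` rather than typing separators by hand keeps the same code working on Windows, macOS, and Linux.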


## Base Python Objects

Base Python includes such computer language fundamentals as strings, numbers (both integer and floating point), and boolean types. 

Because Python is object oriented, each instance of a base Python class (like the string that a variable references) has access to the methods of that class. In the case of strings, these include case and search methods.

For an in-depth review of Python's object-orientation, read the chapter of the excellent free online book *A Byte of Python* here: https://python.swaroopch.com/oop.html

We will now discuss numeric values, string values (called string literals), and boolean values. 

## Numeric Values

Let's start by exploring objects that are of a numeric type. Base Python includes integer and floating point numbers. We will investigate much more specific data types in the context of Numpy and Pandas later in the course.
  
  We can do basic addition like so:

In [9]:
#### Numeric types
## Integers
type(3)
## addition with variable assignment
x = 3 + 4
x
type(x)

int

7

int

Floats are different from integers in most computing languages, and Python is no different. A float is simply a numeric value with some decimal precision. Notice that a float can hold a whole number (like `7.0`), but it is still not the same type as an integer.

In [10]:
## Floats
y = 3.6 + 4
y
type(y)

7.6

float

In [11]:
## force a float through addition to a float
type(4)
x = float(3) + 4
x
type(x)

int

7.0

float
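The difference between integers and floats also shows up in division and conversion. A minimal sketch, using arbitrary example values:

```python
# Division with / always returns a float; floor division // of integers returns an integer
a = 7 / 2     # 3.5
b = 7 // 2    # 3
c = 8 / 2     # 4.0, still a float even though it is a whole number

print(type(a), type(b), type(c))

# A float and an integer can compare as equal while being different types
same_value = (7.0 == 7)              # True
same_type = (type(7.0) == type(7))   # False

# Converting a float to an integer drops the decimal part (it does not round)
d = int(7.6)  # 7
print(same_value, same_type, d)
```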

## Reference Assignment (creating new variables)

We can 'create a variable' and store it for later use, just as we did above. It is OK to functionally think about it this way, but let's take a moment to understand what is actually happening. 

When we assign a reference to a value (an object) like this:

In [12]:
var_x = 4

What we are actually doing is adding a pointer from the reference 'var_x' to the object 4. Imagine there is a table of objects, and a table of references. We have now added an extra 'entry' to the table of references (or pointers) called var_x, and linked it to the object resulting from the expression `4`. 


Notice that I did not have to first declare a new variable and give it a name and object type (some languages would require that). Instead, I simply added a reference assignment and Python inferred the rest. 
  
This is actually pretty flexible...I have not made any rules about `var_x` in the code above. One side effect of that is I can pass in a new reference assignment (using the old name) without worrying about the existing reference. The new reference is simply established. We can even add in different object types without issue. 

In [13]:
var_x

4

In [14]:
## Make a new reference assignment
var_x = 7
var_x

7

In [15]:
## make a new reference assignment of a different object type
var_x = 'the'
var_x

'the'

So what happens when I chain multiple variable assignments together? 

In [16]:
## Create a new reference
var_x2 = var_x
var_x
var_x2

'the'

'the'

Now both var_x and var_x2 reference the same object. If I subsequently change the reference for var_x, it does not affect var_x2. 

In [17]:
var_x = 4
var_x
var_x2

4

'the'

This may seem odd, so I will take a moment to elaborate. Consider that you have two different things you are keeping track of: references and objects. A reference assignment does two things: 

*  Evaluates the expression on the right-hand side, which either creates a new object or produces an existing one (as in `var_x2 = var_x`) 
*  Binds the name on the left-hand side to that object

Any subsequent reference assignment does the same two things, and it never changes what other names refer to. 
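One way to check which object a name refers to is the built-in `is` operator, which tests whether two names point at the very same object. A minimal sketch reusing the names from the cells above:

```python
var_x = 'the'
var_x2 = var_x                    # both names now refer to the same string object

same_object = var_x is var_x2     # True: one object, two references
print(same_object)

var_x = 4                         # rebinds var_x only
still_the = var_x2                # var_x2 continues to reference 'the'
print(var_x, still_the)
```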

## Boolean Values

Another simple class of data in Python is the Boolean data class. Boolean values are a data type that takes on two values (True and False). These are technically a form of integers (0/1), but you can think of them as a different class. We will use Boolean values often in practice. Let's explore these now.
  
Boolean objects are most often the result of some comparison, or an argument we use for a function.

In [18]:
## Boolean values: result of a comparison
z = 3 > 4
type(z)

## Assign references to values
z1 = True
z2 = False

bool

You can invert a boolean value by simply prefixing it with not:

In [19]:
z
not z

False

True

Boolean and comparison operators include `and`, `or`, `==` (equal to), `!=` (not equal to), and `<`, `>`, `<=`, `>=`.

In [20]:
True or False
True and False
3 == 4
3 != 4
(3 == 3) and (4==8)

True

False

False

True

False
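Because Boolean values are technically a form of integer, they can take part in arithmetic. The sketch below shows this with arbitrary example scores; counting how many comparisons are true is a common use.

```python
# bool is a subclass of int, so True behaves like 1 and False like 0
is_int = isinstance(True, int)   # True
total = True + True              # 2

# Count how many comparisons are true (example values are arbitrary)
count_above_10 = (22 > 10) + (104 > 10) + (6 > 10)   # 2

print(is_int, total, count_above_10)
```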

## String Literals

The text values in Python are called string literals. You can use either single or double quotes when creating a string literal, but they must match. Note that they still print the same way even when we use double quotes. One reason to pick either double or single quotes is if you want to include an apostrophe in your string: 

In [21]:
# Creating string variables (assigning reference)
var1 = "This is a string."
var2 = 'Also a string.'
var1
var2

'This is a string.'

'Also a string.'

Multiline strings can be indicated by triple quotes. However, you can also hardcode the newline with `\n`:

In [22]:
## Multiline string examples
var3 = """This is a
multiline string"""
var4 = "This is a\nmultiline string"
var3
var4
if var3 == var4:
    print('Yes, they are equal!')

'This is a\nmultiline string'

'This is a\nmultiline string'

Yes, they are equal!


If you wish to include a quote as part of the string, use the other type of quote on the outside. Alternatively, you can escape a quote character with a backslash. 

In [23]:
var5 = 'This is John\'s cat'
var6 = "This is John's cat"
var5
var5 == var6

"This is John's cat"

True

## String Literal Conversions, Indexing, Iterable

You can convert an object to a string using `str()`. You may have imported data (like a zip code) that you want to convert to be a string. 

In [24]:
## String conversion
zip_code = str(75064)
zip_code
type(zip_code)

'75064'

str

Indexing a string works just like the other indexing methods in Python you will see later. We use the `[` square bracket to index, and start counting at 0. 

In [25]:
## First character
zip_code[0]
## Second character
zip_code[1]

'7'

'5'

Indexing a range of positions (called slicing) in Python works with the following syntax:

`x[start:stop:increment]`

Remember, Python starts counting at 0, and the `stop` position itself is not included in the result!

Negative numbers are counted from the end. 

Here are a few examples of indexing a string:

In [26]:
fname = 'John Smith'

# First two letters
fname[0:2]

'Jo'

In [27]:
# Last two Letters
fname[-2:]

'th'

In [28]:
# Every Other Letter
fname[0::2]

'Jh mt'
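A few more slices of the same string may help make the `start:stop:increment` pattern concrete. This is only a sketch; the variable names other than `fname` are introduced here for illustration.

```python
fname = 'John Smith'

first_name = fname[:4]       # 'John'  (start defaults to 0, stop 4 is excluded)
last_name = fname[5:]        # 'Smith' (stop defaults to the end)
middle = fname[-5:-2]        # 'Smi'   (negative positions count from the end)
backwards = fname[::-1]      # 'htimS nhoJ' (an increment of -1 walks in reverse)

print(first_name, last_name, middle, backwards)
```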

Strings are *not mutable*, which means we cannot change them after they have been created. You can see this by attempting to index the string and replace a piece. Some other objects in Python we will encounter are mutable (we can change them), so this distinction matters!

In [29]:
zip_code[0] = '0'

TypeError: 'str' object does not support item assignment
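Since a string cannot be changed in place, the usual approach is to build a new string and, if desired, assign the reference to it. A minimal sketch of two ways to do that:

```python
zip_code = '75064'

# Build a new string from a replacement character plus a slice of the old one
new_zip = '0' + zip_code[1:]              # '05064'

# Or use the replace method, which also returns a new string
replaced = zip_code.replace('7', '0')     # '05064'

# The original string object is unchanged
print(zip_code, new_zip, replaced)
```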

Finally, strings are *iterable*, which simply means they can be iterated through. One example of iteration is a for loop. Not all objects are iterable in Python!

In [30]:
## String literals are iterable
for i in zip_code:
    print(i)

7
5
0
6
4


Remember the error shown below: you will generate it at some point, and the solution is to identify where in your code you are treating a number as iterable. 

In [31]:
## Floats and integers are not iterable!
x1 = 75064
for i in x1:
    print(i)

TypeError: 'int' object is not iterable
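If you really do want to walk through the digits of a number, one option is to convert it to a string first, since strings are iterable. A short sketch, using a list to collect the results (lists are covered in the next lecture):

```python
x1 = 75064

digits = []
for ch in str(x1):          # str(x1) is '75064', which is iterable
    digits.append(int(ch))  # convert each character back to an integer

print(digits)  # [7, 5, 0, 6, 4]
```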

## Escaping Strings

We saw this implicitly just a moment ago. We use the `\` as an escape character in string literals. If you want to indicate a new line, you type `\n`. If you want to indicate a tab, you type `\t`. More details can be found in the Python docs at https://docs.python.org/2.0/ref/strings.html 
  
One issue arising from this is that we cannot type typical Windows paths directly when indicating a location on our filesystem! Consider these two potential paths: 

In [32]:
# Windows Path Example
print("C:\Data\project")
print("C:\Data\t1")

C:\Data\project
C:\Data	1


The `\t` was resolved as a tab!

To fix this, we prefix a string with r to indicate a raw string literal (no escapes), then we can use typical paths. Otherwise we must use forward slashes, or escape the backslash:

In [33]:
## These all point to the same place. 
print(r"C:\Data\t1")
print("C:/Data/t1")
print("C:\\Data\\t1")

C:\Data\t1
C:/Data/t1
C:\Data\t1
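Checking string lengths is one way to see what the escape did. In the sketch below, the ordinary string treats `\t` as a single tab character, while the raw string keeps the backslash and the `t` as two separate characters.

```python
escaped = "a\tb"     # 'a', a tab character, 'b'
raw = r"a\tb"        # 'a', a backslash, 't', 'b'

print(len(escaped), len(raw))   # 3 4

# A raw string and a string with escaped backslashes hold the same characters
path_raw = r"C:\Data\t1"
path_escaped = "C:\\Data\\t1"
print(path_raw == path_escaped)  # True
```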


## Modifying, Splitting and Joining Strings

We can modify strings easily by joining, splitting, or otherwise altering the string. Let's use a basic sentence to show some of these examples. There are several string methods we commonly use to control the case of a string for example:

In [34]:
## Modifying Strings
string1 = 'Mary had a little lamb'
string1.upper()
string1.capitalize()
string1.lower()
string1.title()

'MARY HAD A LITTLE LAMB'

'Mary had a little lamb'

'mary had a little lamb'

'Mary Had A Little Lamb'

Now is a good time to explain what we mean when we say 'method'! In Python, a function that is attached to an object is called a `method`. Since these methods are related to strings, it makes sense to organize and attach them to the string object. These attached functions are simply called 'methods'. The other thing that is attached to objects is 'attributes'. Attributes are simply variables attached to objects. 

Some other useful methods include searching the string for a match:

In [35]:
string1.find('had')

5

Or querying if a string starts with or ends with certain characters:

In [36]:
string1.startswith('Mary')

True

In [37]:
string1.endswith('Mary')

False
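Two related details are worth a quick sketch: `find` returns `-1` when there is no match, and the `in` operator gives a simple True/False answer when you do not need the position.

```python
string1 = 'Mary had a little lamb'

position = string1.find('had')       # 5, the index where the match starts
missing = string1.find('sheep')      # -1 means no match was found
has_lamb = 'lamb' in string1         # True
has_sheep = 'sheep' in string1       # False

# These methods are case sensitive, so normalize the case first if needed
starts_lower = string1.lower().startswith('mary')   # True

print(position, missing, has_lamb, has_sheep, starts_lower)
```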

We can split strings with the `split` method, and add them together with the `+` operator. The result of the `split` method is a collection called a list; lists are discussed in more detail later.

In [38]:
## Split based on spaces
string1_list = string1.split(" ")
string1_list

['Mary', 'had', 'a', 'little', 'lamb']

If we have a list (or some other iterable object) of strings, we can join the pieces of the collection together using the `join` method of a string. In Jupyter or IPython, you can check the help for a string method by typing the following: 

`?str.join`

The way this works looks a bit odd: the separator string comes first, and the iterable is the argument to its `join` method. 

In [39]:
## Join with a new separator
"_".join(string1_list)

'Mary_had_a_little_lamb'
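One detail of `split` that often surprises people: splitting on a single space keeps empty strings wherever spaces repeat, while calling `split()` with no argument splits on any run of whitespace. A sketch with a deliberately messy example string (`messy` is a name introduced here):

```python
messy = 'Mary  had a   little lamb'   # note the repeated spaces

by_space = messy.split(" ")   # empty strings appear between repeated spaces
by_whitespace = messy.split() # runs of whitespace are treated as one separator

print(by_space)
print(by_whitespace)

# Splitting and re-joining is a simple way to normalize spacing
cleaned = " ".join(messy.split())
print(cleaned)
```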

Finally, you can concatenate two strings simply by adding them together. 

In [40]:
## Append text
string1 + ", whose fleece..."

'Mary had a little lamb, whose fleece...'

## String Formatting

This is a somewhat advanced concept, but it is a good time to introduce it. What if you want to print out a string using specific formats to get a 'prettier' print? 
  
  
Strings can be formatted in a more structured manner using the format method. This format method differs from the older Python 2 string formatting method!

  
I will first show you a basic way to print some strings using a for loop. We have not discussed for loops yet, but we will before very long!

Let's assume we have fit a few different models and kept their scores. I use a collection called a 'list' here. More on that to come later as well...

In [41]:
## Structured print of object
## Formatting Strings
mod_scores=[("Model1", 22),
            ("Model2", 104),
            ("Model3", 6)]
for i in mod_scores:
    print(i[0] + " is " + str(i[1]))

Model1 is 22
Model2 is 104
Model3 is 6


You can specify a more general format for each string, which makes printing results prettier. In the code below, I am telling Python to print the first element (element 0) padded to a minimum width of 8 characters, then print the `>>`, then print the second element (element 1) padded to a minimum width of 8 characters. By default, strings are left-aligned and numbers are right-aligned within that width.

String formatting is a big topic, and we don't have time to fully cover it here. You can add quite a bit more flexibility to the string formats, including rounding, left/right justification, and more. Refer to the Python documentation here:  
https://docs.python.org/3.6/library/string.html#string-formatting  
Or you can check out this nice resource:  
https://mkaz.tech/code/python-string-format-cookbook/

In [42]:
## Make structured output using a for loop
for i in mod_scores:
    print("{0:8} >> {1:8}".format(i[0], i[1]))

Model1   >>       22
Model2   >>      104
Model3   >>        6
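As a sketch of the extra flexibility mentioned above, the example below adds explicit alignment (`<` for left, `>` for right) and rounding to two decimal places (`.2f`). The decimal scores are made-up values for illustration. The last lines show the equivalent f-string syntax, which is available from Python 3.6 onward.

```python
# Hypothetical scores with decimals, to show rounding
mod_scores = [("Model1", 22.5),
              ("Model2", 104.25),
              ("Model3", 6.0)]

lines = []
for name, score in mod_scores:
    # name: left-aligned, width 8; score: right-aligned, width 8, 2 decimals
    lines.append("{0:<8} >> {1:>8.2f}".format(name, score))

for line in lines:
    print(line)

# The same format written as an f-string (Python 3.6+)
name, score = mod_scores[0]
f_line = f"{name:<8} >> {score:>8.2f}"
print(f_line == lines[0])
```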


## Checking for methods of an object

At this point you might wonder how can you check for all of these methods for a given object (like a string literal). You can check for the methods and attributes of an object using the following commands:

In [43]:
x = 'the'
help(type(x))
# or you could just type this:
# help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt
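The `help()` output is long. If you only want the names, the built-in `dir()` function returns them as a list of strings. The sketch below filters out the names that start with an underscore (the special methods shown at the top of the help output) to leave the everyday methods.

```python
x = 'the'

public_names = []
for name in dir(x):                  # dir() lists attribute and method names
    if not name.startswith('_'):     # skip special methods such as __add__
        public_names.append(name)

print(public_names)

# You can then ask for help on a single method
# help(str.upper)
```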




## Dates and Times

It's convenient in many situations to format and use dates and times in your work. We can use the datetime module to retrieve and store date values in base Python. We will primarily focus on dates/date-times in the context of Pandas later in the course. 

Question for the class: Are dates strings or numbers? (or maybe something else...)

In [44]:
## Get todays date
import datetime
date_today = datetime.date.today()
date_today
type(date_today)

datetime.date(2019, 8, 5)

datetime.date

In [45]:
## Prints like a string though...
print(date_today)

2019-08-05


But we cannot add this date to a string like this: 

In [46]:
## This code gives error:
"Hello World, today is: " + datetime.date.today()

TypeError: must be str, not datetime.date

The issue is that dates are their own class, not strings. To do this, you must first convert the date to a string. 

In [47]:
## Alternatively, we can create the string then print
day1 = datetime.date.today()
str_message = "Hello World, today is: " + str(day1)
print(str_message)

Hello World, today is: 2019-08-05
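To see that a date is its own kind of object rather than a string or a number, the sketch below uses a fixed date (matching the output above, so the results are reproducible) and inspects it.

```python
import datetime

day1 = datetime.date(2019, 8, 5)     # a fixed date instead of today()

is_string = isinstance(day1, str)    # False: it is a datetime.date object
parts = (day1.year, day1.month, day1.day)   # attributes hold the pieces as integers

# str() gives the ISO format, the same text that print() shows
as_text = str(day1)                  # '2019-08-05'

# The format method converts the date to text for us
message = "Hello World, today is: {}".format(day1)

print(is_string, parts, as_text)
print(message)
```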


## Date formats

We can take this idea of printing a date a bit further. You might think of this as formatting a date into a string object. The method we need is called `strftime()`. We can define the format exactly using specific date codes, demonstrated below:

In [48]:
## Fancier Date Formats
day1.strftime("%B %d, %Y")
str_message = "Hello World, today is: " + day1.strftime("%B %d, %Y")
print(str_message)

'August 05, 2019'

Hello World, today is: August 05, 2019


You may have noticed in the above code that some of the date format codes are capitalized while others are not (**`%B`** vs. **`%d`** vs. **`%Y`**). This is almost a mini-language for describing dates, and it is used in many other languages besides Python. A full list of these formats is here:  
http://strftime.org/

In [49]:
## Print Date in a standard format:
day1.strftime("%B %d, %Y")

'August 05, 2019'

In [50]:
## Just the year!
day1.strftime("%Y")

'2019'

In [51]:
## Short month then day then two digit year
day1.strftime("%b-%d-%y")

'Aug-05-19'

In [52]:
## You can add other text into the format
day1.strftime("%B %d, %Y, was a %A.")

'August 05, 2019, was a Monday.'

Another question that is often raised at this point is how to go from a string to an actual date. You can use the same format code language with the `datetime.datetime.strptime()` method, which returns a `datetime` object (a date plus a time, here midnight). 

In [53]:
string_date = '2016-05-16'
datetime.datetime.strptime(string_date,'%Y-%m-%d')

datetime.datetime(2016, 5, 16, 0, 0)
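A short sketch of the round trip between strings and dates follows. The format string must match the layout of the input text; when it does not, `strptime` raises a `ValueError`. The second input layout (`05/16/2016`) is an example added here for illustration.

```python
import datetime

string_date = '2016-05-16'
parsed = datetime.datetime.strptime(string_date, '%Y-%m-%d')

as_date = parsed.date()                      # keep only the date part
round_trip = as_date.strftime('%Y-%m-%d')    # back to the original text

# A different input layout needs a different format string
us_style = datetime.datetime.strptime('05/16/2016', '%m/%d/%Y').date()
print(as_date, round_trip, us_style == as_date)

# A format that does not match the text raises ValueError
try:
    datetime.datetime.strptime('16/05/2016', '%Y-%m-%d')
    parse_failed = False
except ValueError:
    parse_failed = True
print(parse_failed)
```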

There will be more details on datetimes in the context of Pandas later in the course.


This was a non-comprehensive review of some base Python concepts.

Do you have any questions about what we have covered so far?

Review:

* Loading Packages
* Working directory and file paths
* String literals (text data)
* Numeric classes
* Date classes