# Everything is better with friends: Using SAS in Python applications with SASPy and open-source tools

## Half-Day Tutorial &nbsp;&bullet;&nbsp;  SAS Global Forum 2020

## Section 1. Python Code Conventions and Data Structures

### Example 1.1. Meet the Python Environment

<b><u>Instructions</u></b>: Click anywhere in the code cell immediately below, and run the cell using Shift-Enter. Then attempt the Exercises that follow, only looking at the explanatory notes for hints when needed.

In [None]:
import warnings
warnings.filterwarnings('ignore')

import platform
print(platform.sys.version)

help('modules')

**Line-by-Line Code Explanation**:

* Lines 1-2: Load the `warnings` module, and use its `filterwarnings` function to suppress warnings globally. (This is needed because of warnings generated when Line 7 is executed in SAS University Edition.)
 
* Lines 4-5: Load the `platform` module, and print Python version and build information.

* Line 7: Print all modules currently available to be loaded into the Python kernel.

&nbsp;

**Exercise 1.1.1**. True or False: Changing Line 1 to `IMPORT WARNINGS` would result in an execution error.

&nbsp;

&nbsp;

**Exercise 1.1.2**. True or False: The example code should result in an execution error because there are no terminating semicolons.

&nbsp;

&nbsp;

**Notes about Example 1.1**:
1. To increase performance, only a small number of modules in Python's standard library are available to use directly by default, so the `warnings` and `platform` modules need to be explicitly loaded before use. Python has a large standard library because of its "batteries included" philosophy.
2. Numerous third-party modules are also actively developed and made freely available through sites like https://github.com/ and https://pypi.org/. Two of the third-party modules needed for this tutorial are pandas, which we'll use for DataFrame objects below, and saspy, which allows Python scripts to connect to a SAS kernel for executing SAS code. Both of these modules come pre-installed with SAS University Edition.
3. This example illustrates four ways Python syntax differs from SAS:
  * Unlike SAS, capitalization matters in Python. Changing Line 4 to `IMPORT PLATFORM` would produce an error.
  * Unlike SAS, semicolons are optional in Python, and they are typically only used to separate multiple statements placed on the same line. E.g., Lines 4-5 could be combined into `import platform; print(platform.sys.version)`
  * Unlike SAS, dot-notation has a consistent meaning in Python and can be used to reference objects nested inside each other at any depth. E.g., the `platform` module object invokes the sub-module object `sys` nested inside of it, and `sys` invokes the object `version` nested inside of it. (Think Russian nesting dolls or turduckens.)
  * Unlike SAS, single and double quotes always have identical behavior in Python. E.g., `help('modules')` would produce exactly the same results as `help("modules")`.
4. If an error is displayed, an incompatible kernel has been chosen. This Notebook was developed using the Python 3.5 kernel provided with SAS University Edition as of January 2020.
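
The syntax differences listed in Note 3 can be checked directly. The following is a minimal sketch, not part of the original example; the variable names and the `'<sketch>'` filename are made up for illustration, and the built-in `compile` function is used only to test whether a line of text is valid Python without running it.

```python
import platform; import sys  # semicolons may separate statements on one line

# Dot-notation: platform.sys refers to the same module object as sys itself.
same_module = platform.sys is sys

# Single and double quotes create identical strings.
same_string = 'modules' == "modules"

# Capitalization matters: IMPORT is not a Python keyword, so this text
# fails to compile.
try:
    compile('IMPORT platform', '<sketch>', 'exec')
    uppercase_import_ok = True
except SyntaxError:
    uppercase_import_ok = False

print(same_module, same_string, uppercase_import_ok)
```

Wrapping the uppercase statement in `compile` keeps the error from stopping the cell, so all three checks can be run together.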

&nbsp;

### Example 1.2. Hello, Data!

<b><u>Instructions</u></b>: Click anywhere in the code cell immediately below, and run the cell using Shift-Enter. Then attempt the Exercises that follow, only looking at the explanatory notes for hints when needed.

In [None]:
hello_world_str = 'Hello, Jupyter!'
print(hello_world_str)
print()
if hello_world_str == 'Hello, Jupyter!':
    print(type(hello_world_str))
else:
    print("Error: The string doesn't have the expected value!")

**Line-by-Line Code Explanation**:

* Lines 1-3: Create a string object (`str` for short) named `hello_world_str`, and print its value, followed by a blank line.

* Lines 4-7: Check to see if `hello_world_str` has the expected value. If so, print its type. Otherwise, print an error message.

&nbsp;

**Exercise 1.2.1**. Which of the following changes to the above example would result in an error? (pick all that apply):

* [ ] Removing an equal sign (`=`) in the expression `if hello_world_str == 'Hello, Jupyter!'`
* [ ] Removing the statement: `print()`
* [ ] Unindenting `print(type(hello_world_str))`

&nbsp;

&nbsp;

**Exercise 1.2.2**. Write several lines of Python code to produce the following output:

```
42

<class 'int'>
```

&nbsp;

**Notes about Example 1.2**:
1. This example illustrates three more ways Python differs from SAS:
  * Unlike SAS, variables are dynamically typed in Python. After Line 1 has been used to create `hello_world_str`, it can be assigned a new value later with a completely different type. E.g., we could later use `hello_world_str = 42` to change `type(hello_world_str)` to  `<class 'int'>`. 
  * Unlike SAS, single-equals (`=`) only ever means assignment, and double-equals (`==`) only ever tests for equality, in Python. E.g., changing Line 4 to `if hello_world_str = 'Hello, Jupyter!'` would produce an error.
  * Unlike SAS, indentation is significant and used to determine scope in Python. E.g., unindenting Line 5 would produce an error since the `if` statement would no longer have a body.
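
As a rough illustration of the three points above, the sketch below uses a hypothetical variable `greeting` and a hypothetical helper `compiles`, neither of which appears in the example. The helper relies on the built-in `compile` function to test whether a snippet is valid Python, so the syntax errors can be observed without interrupting the cell.

```python
# Dynamic typing: the same name can be rebound to a value of a different type.
greeting = 'Hello, Jupyter!'
first_type = type(greeting).__name__
greeting = 2020
second_type = type(greeting).__name__
print(first_type, second_type)


def compiles(source):
    """Return True if the source text is syntactically valid Python."""
    try:
        compile(source, '<sketch>', 'exec')
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# A single equals sign is assignment only, so it can't be an if-condition.
print(compiles("if greeting = 2020:\n    print('match')"))
# A double equals sign tests for equality.
print(compiles("if greeting == 2020:\n    print('match')"))
# Without indentation, the if-statement has no body.
print(compiles("if greeting == 2020:\nprint('match')"))
```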

&nbsp;

### Example 1.3. Python Lists and Indexes

<b><u>Instructions</u></b>: Click anywhere in the code cell immediately below, and run the cell using Shift-Enter. Then attempt the Exercises that follow, only looking at the explanatory notes for hints when needed.

In [None]:
hello_world_list = ['Hello', 'list']
print(hello_world_list)
print()
print(type(hello_world_list))

**Line-by-Line Code Explanation**:

* Line 1: Create a list object named `hello_world_list`, which contains two strings.

* Lines 2-4: Print the contents of `hello_world_list`, followed by a blank line and its type.

&nbsp;

**Exercise 1.3.1**. Would the Python statement `print(hello_world_list[1])` display the value `'Hello'` or `'list'`?

&nbsp;

&nbsp;

**Exercise 1.3.2**. True or False: A Python list may only contain values of the same type.

&nbsp;

&nbsp;

**Notes about Example 1.3**:
1. Values in lists are always kept in insertion order, meaning the order they appear in the list's definition, and they can be individually accessed using numerical indexes within bracket notation:
  * `hello_world_list[0]` returns `'Hello'`
  * `hello_world_list[1]` returns `'list'`

2. The left-most element of a list is always at index `0`. Unlike SAS, customized indexing is only available for more sophisticated data structures in Python (e.g., a dictionary, as in Example 1.4 below).

3. Lists are the most fundamental Python data structure and are related to SAS data-step arrays. Unlike a SAS data-step array, however, a Python list object may contain values of different types (such as `str` or `int`). This flexibility has a cost: processing the values of a list without checking their types may cause errors if the list contains unexpected values.
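
The sketch below illustrates zero-based indexing and type checking on a mixed-type list. The list `mixed_list` and its contents are hypothetical and chosen only for illustration; the built-in `isinstance` function is one common way to check a value's type before using it.

```python
# A hypothetical list that mixes strings and integers.
mixed_list = ['SAS', 9, 'Python', 3]

# Indexing starts at 0, so index 1 refers to the second element.
print(mixed_list[0])
print(mixed_list[1])

# Check each value's type before processing it; without the checks,
# adding a string to the integer total would raise a TypeError.
total = 0
labels = []
for value in mixed_list:
    if isinstance(value, int):
        total += value
    elif isinstance(value, str):
        labels.append(value.upper())

print(total)
print(labels)
```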

&nbsp;

### Example 1.4. Python Dictionaries

<b><u>Instructions</u></b>: Click anywhere in the code cell immediately below, and run the cell using Shift-Enter. Then attempt the Exercises that follow, only looking at the explanatory notes for hints when needed.

In [None]:
hello_world_dict = {
        'salutation'      : ['Hello'       , 'dict'],
        'valediction'     : ['Goodbye'     , 'list'],
        'part of speech'  : ['interjection', 'noun'],
}
print(hello_world_dict)
print()
print(type(hello_world_dict))

**Line-by-Line Code Explanation**:

* Lines 1-5: Create a dictionary object (`dict` for short) named `hello_world_dict`, which contains three key-value pairs, where each key is a string and each value is a list of two strings.

* Lines 6-8: Print the contents of `hello_world_dict`, followed by a blank line and its type.

&nbsp;

**Exercise 1.4.1**. What would be displayed by executing the statement `print(hello_world_dict['salutation'])`?

&nbsp;

&nbsp;

**Exercise 1.4.2**. Write a single line of Python code to print the initial element of the list associated with the key `valediction`.

&nbsp;

**Notes about Example 1.4**:

1. Dictionaries are another fundamental Python data structure, which map keys (appearing before the colons in Lines 2-4) to values (appearing after the colons in Lines 2-4). The value associated with each key can be accessed using bracket notation:
  * `hello_world_dict['salutation']` returns `['Hello', 'dict']`
  * `hello_world_dict['valediction']` returns `['Goodbye', 'list']`
  * `hello_world_dict['part of speech']` returns `['interjection', 'noun']`


2. Whenever indexable data structures are nested in Python, indexing methods can be combined. E.g., `hello_world_dict['salutation'][0] == ['Hello', 'dict'][0] == 'Hello'`.
 
3. Dictionaries are more generally called _associative arrays_ or _maps_ and are related to SAS formats and data-step hash tables.

4. In Python 3.5, the print order of key-value pairs may not match insertion order, meaning the order key-value pairs are listed when the dictionary is created. As of Python 3.7 (released in June 2018), insertion order is preserved.
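
To make the notes above concrete, here is a small sketch using a hypothetical dictionary named `language_dict` (not part of the example). It also shows one behavior the notes don't mention: looking up a key that doesn't exist raises a `KeyError`, and keys are case-sensitive.

```python
# A hypothetical dictionary mapping each key to a list of two strings.
language_dict = {
    'name': ['SAS', 'Python'],
    'file extension': ['.sas', '.py'],
}

# Bracket notation with a key returns the associated value (here, a list).
names = language_dict['name']
print(names)

# Indexing can be chained: first by key, then by list position.
second_extension = language_dict['file extension'][1]
print(second_extension)

# Keys are case-sensitive, and a missing key raises KeyError.
try:
    language_dict['Name']
    found_capitalized_key = True
except KeyError:
    found_capitalized_key = False
print(found_capitalized_key)

# In Python 3.7 and later, keys are listed in insertion order.
print(list(language_dict))
```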
  
&nbsp;

### Example 1.5. Introduction to Data Frames

<b><u>Instructions</u></b>: Click anywhere in the code cell immediately below, and run the cell using Shift-Enter. Then attempt the Exercises that follow, only looking at the explanatory notes for hints when needed.

In [None]:
from pandas import DataFrame
hello_world_df = DataFrame(
    {
        'salutation'      : ['Hello'      , 'DataFrame'],
        'valediction'     : ['Goodbye'    , 'dict'],
        'part of speech'  : ['exclamation', 'noun'],
    }
)
print(hello_world_df)
print()
print(hello_world_df.shape)
print()
print(hello_world_df.info())

**Line-by-Line Code Explanation**:

* Line 1: Load the definition of a `DataFrame` object from the `pandas` module. (Think of a DataFrame as a rectangular array of values, with all values in a column having the same type.)

* Lines 2-8: Create a DataFrame object (`df` for short) named `hello_world_df` with dimensions 2x3 (2 rows by 3 columns), with each key-value pair in the dictionary in Lines 3-7 becoming a column that is labelled by the key.

* Lines 9-13: Print the contents of `hello_world_df`, followed by a blank line, the number of rows and columns in it, another blank line, and some information about it.

&nbsp;

**Exercise 1.5.1**. Write a single line of Python code to print the column labelled by `salutation`.

**Exercise 1.5.2**. Write a single line of Python code to print the final element of the column labelled by `valediction`.

**Notes about Example 1.5**:

1. The DataFrame object type is not built into Python, which is why we first have to import its definition from the pandas module.

2. DataFrames can be indexed like dictionaries composed of lists. E.g., `hello_world_df['salutation'][0] == 'Hello'`, just as `['Hello', 'DataFrame'][0] == 'Hello'`.

3. A DataFrame is a tabular data structure with rows and columns, similar to a SAS data set. However, while SAS data sets are typically accessed from disk and processed row-by-row, DataFrames are loaded into memory all at once. This means values in DataFrames can be randomly accessed, but it also means the size of DataFrames can't grow beyond available memory.

4. The dimensions of the DataFrame are determined as follows:
  * The keys `'salutation'`, `'valediction'`, and `'part of speech'` of the dictionary passed to the `DataFrame` constructor function become column labels.
  * Because each key maps to a list of length two, each column will be two elements tall (with an error occurring if the lists are not of uniform length).

5. The DataFrame constructor function can also accept many other object types, including another DataFrame.
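
The following sketch pulls these notes together using a hypothetical DataFrame named `language_df`, which is not part of the example. It assumes pandas is installed and that unequal list lengths are reported as a `ValueError`; the exact wording of the error message may vary between pandas versions.

```python
from pandas import DataFrame

# A hypothetical two-row, two-column DataFrame built from a dictionary of lists.
language_df = DataFrame(
    {
        'name': ['SAS', 'Python'],
        'file extension': ['.sas', '.py'],
    }
)

# Rows come from the list length, and columns come from the dictionary keys.
print(language_df.shape)

# Index by column label first, then by row label (0 is the first row
# under the default index).
print(language_df['name'][0])

# Lists of unequal length can't form a rectangular table.
try:
    DataFrame({'name': ['SAS', 'Python'], 'file extension': ['.sas']})
    unequal_lengths_ok = True
except ValueError as error:
    unequal_lengths_ok = False
    print('Could not build DataFrame:', error)

# The constructor also accepts an existing DataFrame.
language_df_copy = DataFrame(language_df)
print(language_df_copy.equals(language_df))
```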