## Numerics

Python has three distinct numeric data types:
- Integers
- Floating Point
- Complex Numbers - Will not be discussed

In Python, numbers are either numeric literals or the result of built-in operators or functions.  Any numeric literal containing an exponent sign or a decimal point is mapped to a floating-point type.  Whole numbers, including hexadecimal, octal, and binary literals, are mapped to integer types.  Python permits "mixed" arithmetic, meaning numerics of different types can be combined in the same expression.  
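
As a quick illustration of how literals map to types, the cell below writes the same value, 31, in several literal forms. The variable names are arbitrary and chosen only for this sketch.

```python
# Integer literals: decimal, hexadecimal (0x), octal (0o), and binary (0b)
dec_val = 31
hex_val = 0x1F
oct_val = 0o37
bin_val = 0b11111

# A decimal point or an exponent makes the literal a float
pt_val = 31.0
exp_val = 3.1e1

for name, value in [('dec_val', dec_val), ('hex_val', hex_val),
                    ('oct_val', oct_val), ('bin_val', bin_val),
                    ('pt_val', pt_val), ('exp_val', exp_val)]:
    print(name, value, type(value))
```

The four integer literals all print as 31 with type int, while the last two print as 31.0 with type float.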

Note:  The built-in function <b>type()</b> is used to return the object's data type.

In [2]:
nl = '\n'

x = 1
y = 1.5
z = x * y

print (nl                     ,
       'x type is:' , type(x) , nl,
       'y type is:' , type(y) , nl,
       'z type is:' , type(z))


 x type is: <class 'int'> 
 y type is: <class 'float'> 
 z type is: <class 'float'>


In the above example, x is an integer and y is a float.  The product of x and y is assigned to z, which Python creates as a float because the integer operand is converted to float before the multiplication.  Similar to the SAS language, there is no need to declare variables and their associated data types, as they are inferred from their usage.
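
A few more mixed-arithmetic cases are sketched below, reusing x and y from the cell above.  The main rules: int with int stays int, int with float becomes float, and true division (/) always produces a float.

```python
x = 1
y = 1.5

print(type(x + 2))          # <class 'int'>: int op int stays int
print(type(x * y))          # <class 'float'>: the int is converted to float first
print(4 / 2, type(4 / 2))   # 2.0 <class 'float'>: / always returns a float
print(int(2.9), float(2))   # 2 2.0: explicit conversions; int() truncates toward zero
```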

The SAS language, however, does not make a distinction between integers and floats.  The below SAS example illustrates the same program logic as above, written in SAS.  SAS Log output has also been included:

SAS Code:
> ![image.png](attachment:image.png)

Log Output:
> ![image.png](attachment:image.png)

> ![image.png](attachment:image.png)

Results Output:
> ![image.png](attachment:image.png)

The above SAS program creates the temporary SAS dataset WORK.TYPES.  With the dataset created, we can query the SAS DICTIONARY view SASHELP.VCOLUMN and return the "type" associated with the SAS variables x, y, and z.  The PROC PRINT results displayed above show that variables x, y, and z are defined as num, indicating they are numeric.

### Python Operators

The Python interpreter permits a wide range of mathematical expressions and functions to be combined.  Python's expression syntax is very similar to the SAS language, using the operators +, -, *, and / for addition, subtraction, multiplication, and division, respectively.  As in SAS, parentheses (()) are used to group operations and control precedence:
> ![image.png](attachment:image.png)
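
The cell below is a small sketch of Python's arithmetic operators, including floor division (//), remainder (%), and exponentiation (**), which go beyond the four operators named above.  The variable names a and b are arbitrary.

```python
a, b = 7, 2

print(a + b)        # 9   addition
print(a - b)        # 5   subtraction
print(a * b)        # 14  multiplication
print(a / b)        # 3.5 true division always returns a float
print(a // b)       # 3   floor division rounds toward negative infinity
print(-a // b)      # -4  so -7 // 2 is -4, not -3
print(a % b)        # 1   remainder, similar to SAS MOD(7, 2)
print(a ** b)       # 49  exponentiation, same ** operator as SAS
print((a + b) * 2)  # 18  parentheses are evaluated first
print(a + b * 2)    # 11  * binds more tightly than +
```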

### Boolean 

Python's two Boolean values are <b>True</b> and <b>False</b> with the capitalization as shown.  In a numerical context, for example, when used as an argument to arithmetic operations, they behave like integers with values <b>0</b> for <b>False</b> and <b>1</b> for <b>True</b>.

In [3]:
print(bool(0))

False


In [4]:
print(bool(1))

True


SAS does not have a Boolean data type.  As a result, SAS Data Step code is often constructed as a series of cascading <b>IF-THEN/DO</b> blocks used to perform Boolean-style truth tests.  SAS does, however, have implied Boolean tests.  The <b>EXIST</b> function is a good example: it returns 1 if the specified SAS data set exists and 0 if it does not.
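
Because bool is a subclass of int, True and False take part in arithmetic as 1 and 0, much as a SAS comparison expression such as (x > 5) evaluates to 1 or 0.  The sketch below shows this; the flags list is made-up sample data.

```python
print(True + True)            # 2
print(True * 10)              # 10
print(isinstance(True, int))  # True: bool is a subclass of int

flags = [True, False, True, True]
print(sum(flags))             # 3: summing Booleans counts the True values
```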

### Comparison Operators 

Python has eight comparison operators.  They all have the same priority which is higher than that of the Boolean operators.
> ![image.png](attachment:image.png)

The last two Python comparison operators <b>is</b> and <b>is not</b> do not have direct analogs in SAS.  You can think of Python's <b>is</b> and <b>is not</b> as testing object identity (i.e. if two or more objects are the same).  Another way to think of this is:  Do both objects point to the same memory location?  A Python object can be thought of as a memory location holding a data value and a set of associated operations.  This is illustrated below.

In [5]:
#Evaluating equality using ==
x = 32.0
y = 32
if (x==y):
    print ("True. 'x' and 'y' are equal")
else:
    print("False. 'x' and 'y' are not equal")

True. 'x' and 'y' are equal


In the above example, x is assigned the value 32.0 and y is assigned 32.  The if/else statement that follows illustrates Python's conditional construct.  Because == compares values, and the float 32.0 and the integer 32 are numerically equal, the test evaluates to True, as you would expect.

Note:  Python uses == to test equality, in contrast to SAS, which uses = (or EQ).

In [6]:
#Evaluating equality using is
x = 32.0
y = 32
x is y

False

The <b>is</b> operator does not test whether the values assigned to x and y are equivalent.  It tests whether x and y refer to the same object: do x and y point to the same memory location?  The next example further illustrates the point.

In [8]:
x = 32.0
y = x
x is y

True
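
The built-in id() function returns an object's identity, so the sketch below makes the difference between == and is visible.  Here z is built with float(32) to guarantee a separate object; the "False" result for x is z reflects CPython behavior.  As a rule of thumb, compare numbers and strings with == and reserve is for identity checks such as x is None.

```python
x = 32.0
y = x          # y is bound to the very same object as x
z = float(32)  # a separate float object with an equal value

print(x == y, x is y, id(x) == id(y))   # True True True
print(x == z, x is z)                   # True False (in CPython)
```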

Next, let's test the truth values of a few strings:

In [10]:
print(bool(''))

False


In [11]:
print(bool(' '))

True


In [12]:
print(bool('Arbitrary String'))

True


The first test returns <b>False</b> because the string is empty.  The second test returns <b>True</b> because a string containing a single blank is not empty, and the third returns <b>True</b> for the same reason.  This is a departure from how SAS handles missing character values: in SAS, a character variable containing only blanks is considered missing.
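
The same truth-value rules apply to other objects: zero, None, and empty containers are false, and nearly everything else is true.  The helper is_blank() below is a hypothetical function, not part of Python, sketching how you might mimic the SAS "all blanks means missing" rule by stripping whitespace first.

```python
values = ['', ' ', 'Arbitrary String', 0, 0.0, 1, [], [0], None]
for v in values:
    print(repr(v), bool(v))

def is_blank(s):
    """Hypothetical helper: True for '' or whitespace-only strings,
    similar to how SAS treats an all-blank character value as missing."""
    return not s.strip()

print(is_blank(''), is_blank('   '), is_blank(' a '))  # True True False
```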

Here's a simple chained Boolean comparison operation:

In [13]:
x = 20
1 < x < 100

True

Here is a second chained Boolean comparison:

In [14]:
x = 20
10 < x < 20

False
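
A chained comparison such as 1 < x < 100 is shorthand for 1 < x and x < 100, with x evaluated only once.  The sketch below compares the two forms and shows why the second example above is False: 20 is not strictly less than 20.

```python
x = 20

print(1 < x < 100)          # True
print(1 < x and x < 100)    # True: the expanded equivalent
print(10 < x < 20)          # False: x < 20 fails when x is 20
print(10 < x <= 20)         # True: <= includes the upper bound
```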

A fairly common type of Boolean expression is testing for equality and inequality among numbers and strings.  For inequality, Python uses != while the SAS language uses ^= (or NE).

Python Syntax Example:

In [15]:
x=2
y=3
x != y

True

SAS Syntax Example:
> ![image.png](attachment:image.png)

SAS Log Output:
>  ![image.png](attachment:image.png)

Boolean String Equality Test:

In [17]:
s1 = 'String'
s2 = 'string'
s1 == s2

False

This Boolean comparison returns <b>False</b> since the first character in object s1 is "S" and the first character in object s2 is "s".

The same Boolean String Equality Test using SAS syntax:
>![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

### IN/NOT IN 

We can illustrate membership operators with <b>in</b> and <b>not in</b>:

In [18]:
'on' in 'Python is easy to learn'

True

In [19]:
'on' not in 'Python is easy to learn'

False

<b>in</b> evaluates to <b>True</b> if a specified sequence is found in the target string.  Otherwise it evaluates to <b>False</b>.

<b>not in</b> evaluates to <b>False</b> if a specified sequence is found in the target string.  Otherwise it evaluates to <b>True</b>.
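
Membership tests are not limited to strings.  The sketch below, using made-up sample data, applies in to a list and a dictionary; the list case is close to the SAS IN operator in an IF statement such as if state in ('CO', 'NC').  Note that in on a dictionary searches its keys, not its values.

```python
colors = ['red', 'green', 'blue']
print('green' in colors)             # True
print('Green' in colors)             # False: comparison is case-sensitive

codes = {'CO': 'Colorado', 'NC': 'North Carolina'}
print('CO' in codes)                 # True: tests dictionary keys
print('Colorado' in codes)           # False: values are not searched
print('Colorado' in codes.values())  # True: search the values explicitly
```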

### AND/OR/NOT 

The precedence order of Python's Boolean operators and, or, and not is listed below:
> ![image.png](attachment:image.png)

- The operator <b>not</b> yields <b>True</b> if its argument is false; otherwise it yields <b>False</b>.
- The expression x and y first evaluates x; if x is false (for example <b>False</b>, 0, or an empty string), its value is returned; otherwise, y is evaluated and the resulting value is returned.
- The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

Examples:

In [21]:
True and False or True

True

Order of Operations (and is evaluated before or):
1. True and False --> False
2. False or True --> True

In [23]:
(True or False) or True

True

Order of Operations:
1. True or False --> True
2. True or True --> True

In [24]:
#Python Boolean and Example:
s3 = 'Longer String'
'r' in s3 and " " in s3

True

Order of Operations:
1. 'r' in s3 --> True
2. " " in s3 --> True
3. True and True --> True
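
Because in binds more tightly than and and or, each membership test has to be written out in full.  An expression such as 'z' and ' ' in s3 is parsed as 'z' and (' ' in s3), so it never checks for 'z' at all.  The sketch below demonstrates the pitfall.

```python
s3 = 'Longer String'

# Parsed as 'z' and (' ' in s3): 'z' is a non-empty (truthy) string,
# so the result is simply the value of ' ' in s3
print('z' and ' ' in s3)         # True, even though 'z' is not in s3

# Spell out each membership test explicitly
print('z' in s3 and ' ' in s3)   # False
print('r' in s3 and ' ' in s3)   # True
```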

SAS Syntax Equivalent:
>![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

The <b>FINDC</b> function searches the character variable s3 from left to right for the character 'r'.  It returns the position of the first occurrence, in this case position 6.  This causes the first half of the <b>IF</b> predicate to evaluate to true.  Following <b>AND</b>, the second half of the <b>IF</b> predicate uses the <b>FINDC</b> function to search for a blank character, which is found at position 7, so it also evaluates to true.  Since both halves of the <b>IF</b> predicate are true, the statement following <b>THEN</b> executes and writes <b>'True'</b> to the SAS log.

In [25]:
#Python Boolean or Example:
s4 = 'Skinny'
s5 = 'Hunger'

'y' in s4 or 'y' in s5

True

Order of Operations:
1. 'y' in s4 --> True
2. True or ('y' in s5) --> True, without evaluating 'y' in s5 (which would be False), because or stops at the first true operand

SAS Syntax Equivalent:
> ![image.png](attachment:image.png)

Log Output:
> ![image.png](attachment:image.png)

The <b>FINDC</b> function searches the character variable s4 from left to right for the character 'y'.  It returns the position of the first occurrence, in this case position 6.  This causes the first half of the <b>IF</b> predicate to evaluate to true.  Since the first half of the <b>IF</b> predicate is true, the statement following <b>THEN</b> executes and writes <b>'True'</b> to the SAS log.  The <b>ELSE</b> is not executed.

### Numerical Precision

It is a mathematical truth that 0.1 multiplied by 10 produces 1.

In [1]:
x = [.1] * 10
x == 1

False

Why is this False?  Let's examine the first line of the program.  x defines a Python list (an ordered collection of items).  Applying * 10 to a list does not multiply its values; it repeats the list, so x contains ten floats, each with the value 0.1.  Consequently, x == 1 compares an entire list with an integer, and a list never equals an integer, so the result is False.  The list itself is illustrated below:

In [2]:
x = [.1] * 10
print(x)

[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]


The real question is whether the ten values sum to 1.  Adding them one at a time gives:

In [3]:
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1

0.9999999999999999

The explanation for these results lies in how floating-point numbers are represented in computer hardware: as base 2 fractions.  As it turns out, 0.1 cannot be represented exactly as a base 2 fraction; in binary it is an infinitely repeating fraction, so the stored value is only a close approximation of 0.1.
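
The standard library can reveal the exact value that is actually stored for the literal 0.1.  The sketch below uses decimal.Decimal and float.as_integer_ratio(); the denominator it reports is a power of 2 (2**55), which is why the value cannot be exactly one tenth.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(0.1) shows the exact binary value stored for the literal 0.1
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625

# as_integer_ratio() returns that same value as numerator / denominator
num, den = (0.1).as_integer_ratio()
print(num, den)      # 3602879701896397 36028797018963968

# The stored value is not exactly 1/10
print(Fraction(0.1) == Fraction(1, 10))  # False
```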

Fortunately, there are straightforward remedies to this challenge.  Similar to SAS, Python has a number of built-in numeric functions, such as round().  Python's round() function returns a number rounded to a given number of digits after the decimal point.  If the number of digits is omitted from the function call or is <b>None</b>, the function returns the nearest integer to its input value.
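
The sketch below shows how the optional digits argument changes the return type, and one behavior worth knowing when porting SAS logic: Python's round() breaks exact ties by rounding to the nearest even number, which differs from school rounding, so compare results carefully.

```python
print(round(3.14159, 2))    # 3.14
print(round(3.7))           # 4: digits omitted, so an int is returned
print(type(round(3.7)))     # <class 'int'>
print(type(round(3.7, 1)))  # <class 'float'>

# Exact ties go to the nearest even number ("banker's rounding")
print(round(0.5), round(1.5), round(2.5))  # 0 2 2
```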

In [5]:
nl = '\n'

total = 0
values = [.1] * 10   # named 'values' rather than 'list' so the built-in list type is not hidden
#Remember:  [.1] * 10 resolves to [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

for i in values:
    total += i
    
print(nl,
     "Boolean expression:  1 == total is:            ", 1 == total,
     nl,
     "Boolean expression:  1 == round(total) is:     ", 1 == round(total),
     nl,
     "total is:", total,
     nl,
     "total type is:", type(total))


 Boolean expression:  1 == total is:             False 
 Boolean expression:  1 == round(total) is:      True 
 total is: 0.9999999999999999 
 total type is: <class 'float'>


The object <b>total</b> is an accumulator used in the <b>for loop</b>.  The construct += used for the accumulation is equivalent to the SAS expression:
> total = total + i

The numerical precision issue raised here is not unique to Python.  The same challenge exists for SAS, or any other language utilizing floating-point arithmetic, which is to say nearly all computer languages.  

SAS Example:
> ![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

This SAS program uses a <b>DO/END Loop</b> to accumulate values into the total variable.  The <b>inc</b> variable, set to a numeric value of 0.1, is a stand-in for the items in the Python list.  The first <b>IF</b> statement on line 12 compares the value accumulated in variable <b>total</b> with the numeric variable <b>one</b>, which has an integer value of 1.  Similar to the Python example, this comparison evaluates to false.

The <b>IF</b> statement on line 15 uses the <b>ROUND</b> function to round the variable total value (.999...) to the nearest integer value (1).  This <b>IF</b> predicate now evaluates true and writes to the log.  Line 16 does not execute.

The last line of the program writes the value of the variable total using the SAS-supplied 8.3 format, which displays the value 1.000.  The internal representation for the variable <b>total</b> remains .9999999999.
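
Rounding is not the only remedy.  A common alternative is to compare floats with a tolerance rather than exact equality; the sketch below uses math.isclose() (default relative tolerance 1e-09) and math.fsum(), which tracks the low-order bits lost in ordinary addition.

```python
import math

values = [.1] * 10
total = 0
for i in values:
    total += i

print(total == 1)              # False: exact comparison
print(math.isclose(total, 1))  # True: equal within rel_tol=1e-09
print(abs(total - 1) < 1e-9)   # True: a hand-written tolerance test
print(math.fsum(values) == 1)  # True: fsum() compensates for rounding error
```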

### Strings 

In Python, a string is an ordered sequence of Unicode characters.  Strings are immutable, meaning they cannot be updated in place.  Any method applied to a string to modify it, such as replace() or split(), returns a new string (or, for split(), a list of new strings) rather than changing the original.  Strings are enclosed in either single quotes (') or double quotes (").
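
A minimal sketch of immutability: replace() hands back a new string and leaves the original alone, and trying to assign to a single character raises a TypeError.

```python
s = 'Hello World'
t = s.replace('World', 'SAS')   # returns a new string

print(s)   # Hello World  (original unchanged)
print(t)   # Hello SAS

try:
    s[0] = 'J'                  # item assignment is not allowed on a str
except TypeError as err:
    print('TypeError:', err)
```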

If a string needs to include quotes as a part of the string literal, then backslash (\) is used as an escape character.  Alternatively, like SAS, one can use a mixture of single quotes and double quotes assuming they are balanced.

Here are some simple examples:

In [7]:
s5 = 'Hello'
s6 = "World"

print(s5,s6)
print(s5+s6)
print('Type() for s5 is:', type(s5))

Hello World
HelloWorld
Type() for s5 is: <class 'str'>


SAS Equivalent:
> ![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

Python upper() Method:

In [8]:
print(s5 + " " + s6.upper())

Hello WORLD


SAS Equivalent:
> ![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

Python Multiline String:

In [11]:
s7 = '''Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts, and so on...'''
print(s7)

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts, and so on...


Note how three consecutive single quotes (''') are used to define a multiline string; three double quotes (""") work the same way.  A triple-quoted string preserves the spacing and line breaks in the string literal.

The example below illustrates the count() method, which counts non-overlapping occurrences of a substring ('c' in this case) in a target string.  The search is case-sensitive, so the capital C in "Complex" is not counted:

In [12]:
print('Occurrences of the letter "c":', s7.count('c'))

Occurrences of the letter "c": 6


The methods available for the built-in string object are listed in the Python Standard Library documentation at https://docs.python.org/3/library/stdtypes.html#string-methods.

### String Slicing 

Python supports indexing on a number of different objects, and the behavior is similar across them.  For a sequence of characters (a string), Python automatically creates an index that starts at position zero (0) for the first character and increments to the end position of the string (length - 1).  The index can be thought of as a one-dimensional array.

The general form for Python string slicing is:
> string[start : stop : step]

Python string slicing is a powerful form of parsing.  By indexing a string with offsets separated by colons, Python returns a new object identified by these offsets.  <b>Start</b> identifies the lower-bound position, which is inclusive; <b>stop</b> identifies the upper-bound position, which is non-inclusive; <b>step</b> indicates every nth item, with a default value of 1.  Python also permits negative index values, which count from right to left.
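
The sketch below exercises each part of string[start : stop : step] on the same 'Hello World' string used in the examples that follow, including a negative step that walks the string backwards.

```python
s = 'Hello World'

print(s[0:5])    # Hello        start 0 included, stop 5 excluded
print(s[6:])     # World        omitted stop runs to the end
print(s[::2])    # HloWrd       every second character
print(s[::-1])   # dlroW olleH  a negative step walks right to left
print(s[-5:])    # World        the last five characters
```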

At times you may find it easier to refer to characters toward the end of the string.  Python provides an "end-to-beginning" indexer with a start position of -1 for the last character in the string, which decrements toward the beginning.  See the below table as an example:

> ![image.png](attachment:image.png)

A number of SAS character handling functions have modifiers to enable scanning from right to left as opposed to the default behavior of scanning left to right.  Here is a Python vs. SAS Comparison.

Python:

In [13]:
s = 'Hello World'
s[0]

'H'

SAS Comparison:
> ![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

In contrast to Python's index start position of 0, SAS uses an index start position of 1.  The SAS <b>SUBSTR</b> function scans the character variable <b>s</b> (first argument), starts at position 1 (second argument), and extracts 1 character (third argument).

Here's a Python example where there is no start position defined:

In [14]:
s = 'Hello World'
s[:5]

'Hello'

Here the start position defaults to 0 and the stop position is 5, the index of the blank between "Hello" and "World".  Because stop is non-inclusive, that blank is not returned.

The next example shows what happens when an index <b>start</b> value is greater than the length of the sequence being sliced:

In [17]:
s = 'Hello World'
print(len(s))
empty = s[12:]
print(empty)
bool(empty)

11



False

When the index start value is greater than the length of the sliced sequence, Python does not raise an error; rather, it returns an empty string.  Recall from the earlier discussion on Boolean comparisons that empty objects evaluate as <b>False</b>.

Here is an example where the stop value for string slicing is greater than the length of the actual string:

In [18]:
s = 'Hello World'
s[:12]

'Hello World'

When the index stop value is greater than the length of the sliced sequence, then the entire sequence is returned.

The next example identifies the <b>start</b> index position 3, which is included, and the <b>stop</b> index position -1 (indicating the last character in the sequence), which is not included:

In [19]:
s = 'Hello World'
s[3:-1]

'lo Worl'

Since the stop index position is not inclusive, the last character in the sequence is not included.

If we want to include the last letter in this sequence, then we would leave the stop index value blank:

In [20]:
s = 'Hello World'
s[3:]

'lo World'

Here are two examples indexing from right to left:

In [21]:
s = 'Hello World'
s[-11]

'H'

In [22]:
s[-12]

IndexError: string index out of range

The first example uses a single index rather than a slice.  A negative index counts from the end of the string: s[-1] is the last character, 'd', and s[-11] is equivalent to s[len(s) - 11], or s[0], which is 'H'.

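In the second example, -12 reaches past the beginning of the 11-character string, so it is out of range and Python raises an <b>IndexError</b>.  Unlike slicing, which quietly truncates out-of-range bounds, indexing a single position that does not exist is an error.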

Here's an example utilizing the backslash (\) to escape the single quote (') to be part of the returned sequence:

In [23]:
q = 'Python\'s capabilities'
print(q)

Python's capabilities


The same result, using double quotes (") and no backslash (\):

In [25]:
q1 =  "Python's features"
print(q1)

Python's features


## Formatting

This section introduces the basics of Python numeric and string formatting.  We will return to this topic in later chapters.

### Formatting Strings

Formatting Python strings involves defining a string constant containing one or more format codes.  The format codes are replacement fields enclosed in curly braces ({}).  Anything not contained in a replacement field is considered literal text, which is unchanged on output.  The format arguments to be substituted into the replacement fields can be either keyword ({gender}, e.g.) or positional ({0}, {1}, e.g.) arguments.

Here's a format method with a positional argument:

In [26]:
'The subject\'s gender is {0}'.format("Female")

"The subject's gender is Female"

The argument "Female" passed to the <b>format()</b> method is substituted into the replacement field designated by {0} inside the string constant's literal text.  Also notice the use of the backslash (\) to escape the single quote, forming the possessive apostrophe in subject's.

Format specifications, separated from the field name by a colon (:), are used to further control the appearance of the output:

In [28]:
'The subject\'s gender is {0:>10}'.format("Female")

"The subject's gender is     Female"

Here, the format specification in the replacement field uses the alignment option {0:>10} to right-align the replacement value in a field ten characters wide.  By default, the field width is the same size as the string used to fill it.  
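
Right alignment is one of several options.  The sketch below wraps each field in brackets so the padding is visible, and shows left (<), center (^), and right (>) alignment plus a custom fill character placed before the alignment symbol.

```python
print('[{0:<10}]'.format('Female'))   # [Female    ]  left aligned
print('[{0:^10}]'.format('Female'))   # [  Female  ]  centered
print('[{0:>10}]'.format('Female'))   # [    Female]  right aligned
print('[{0:*^10}]'.format('Female'))  # [**Female**]  '*' as the fill character
```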

In the subsequent examples we use this same pattern of format specifications to control the field width and appearance of numerics.

This example illustrates multiple positional arguments.  Note that the replacement fields can refer to the positional arguments in any order:

In [29]:
scale = 'Ratings are:  {0} {1} or {2}'
scale.format('1. Agree', '2. Neutral', '3. Disagree')

'Ratings are:  1. Agree 2. Neutral or 3. Disagree'

In [31]:
scale = 'Ratings are: {2} {0} {1}' 
scale.format('1. Agree', '2. Neutral', '3. Disagree')

'Ratings are: 3. Disagree 1. Agree 2. Neutral'

The format() method also accepts keyword arguments:

In [32]:
location = 'Subject is in {city}, {state} {zip}'
location.format(city='Denver', state='CO', zip='80218')

'Subject is in Denver, CO 80218'

Combining positional and keyword arguments together:

In [33]:
location = 'Subject is in {city}, {state}, {0}'
location.format(80218, city='Denver', state='CO')

'Subject is in Denver, CO, 80218'

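Notice that when combining positional and keyword arguments in the call to format(), the positional arguments must come first, followed by the keyword arguments; a positional argument placed after a keyword argument is a SyntaxError.  Inside the template string itself, the replacement fields can appear in any order.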

F-strings (formatted string literals, available since Python 3.6) are designated with a preceding f and contain replacement expressions in curly braces.  Because f-strings are evaluated at runtime, any valid expression can be used inside the braces.  Here's an example:

In [37]:
radius    = 4
pi        = 3.14159

print("Area of a circle with radius:", radius,
      '\n',
      f"is: {pi * radius **2}")

Area of a circle with radius: 4 
 is: 50.26544


In this example, the formula for calculating the area of a circle is enclosed within a set of curly braces ({}).  At execution time, the expression is evaluated, its result is inserted into the string, and the print() function displays it.
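
F-strings accept the same format specifications as format(), written after a colon inside the braces.  The sketch below reuses radius and pi from the cell above and rounds the area to two decimal places, first on its own and then right-aligned in a field ten characters wide.

```python
radius = 4
pi = 3.14159
area = pi * radius ** 2

print(f"Area of a circle with radius {radius} is {area:.2f}")  # ... is 50.27
print(f"[{area:>10.2f}]")                                       # [     50.27]
```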

### Formatting Integers

The pattern for applying formats to integers is similar to that of strings, the main difference being that the replacement field formats numeric values.  Some format specifications apply regardless of the data type being formatted.  For example, field padding is common to all data types, whereas a comma separator (to indicate thousands) applies only to integers and floats.

Here's an example:

In [42]:
num = 123456789   # avoid naming a variable 'int', which hides the built-in int()
nl = '\n'
print(nl,
     'int unformatted:', num,
     nl,
     'int formatted:', "{:>20,d}".format(num))


 int unformatted: 123456789 
 int formatted:          123,456,789


In this example, we pass a single argument to the format() method and use the format specification {:>20,d}, which right-aligns the integer in a field 20 characters wide and inserts commas as thousands separators.

Here we combine multiple format specifications to achieve the desired appearance:

In [44]:
print("{:>10,d}\n".format(123456789),
     "{:>10,d}".format(1089))

123,456,789
      1,089


In this example, the format specification {:>10,d} indicates the field is right-justified with a width of 10.  The ,d part of the specification indicates the integer uses a comma as the thousands separator.  Because 123,456,789 is 11 characters long, it exceeds the width of 10 and is printed without padding.  This example uses a single print() function call, requiring a new line \n indicator after the first number in order to see the effect of the alignment.

Integers can be displayed with their corresponding octal, hexadecimal, and binary representation:

In [46]:
n = 99   # avoid naming a variable 'int', which hides the built-in int()
nl = '\n'

print(nl,
     'decimal:     ', n,
     nl,
     'hexadecimal: ', "{0:x}".format(n),
     nl,
     'octal:       ', "{0:o}".format(n),
     nl,
     'binary:      ', "{0:b}".format(n))


 decimal:      99 
 hexadecimal:  63 
 octal:        143 
 binary:       1100011


Python Format for Leading Zeros

In [47]:
'Integer 99 displayed as {:04d}'.format(99)

'Integer 99 displayed as 0099'

The format specifier :04d indicates that leading zeros (0) are added to fill a field width of 4.

Python Leading Plus Sign

In [48]:
'{:+3d}'.format(99)

'+99'

The format specification {:+3d} indicates a preceding plus sign (+) using a field width of 3.
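
The sign option has three settings: + always shows a sign, - (the default) shows only minus signs, and a space reserves a blank where a plus sign would go.  The sketch below also combines a sign with zero padding, which is sign-aware: the zeros go between the sign and the digits.

```python
print('{:+d}'.format(99))     # +99
print('{:+d}'.format(-99))    # -99
print('{: d}'.format(99))     # ' 99' with a leading space in place of the sign
print('{:+05d}'.format(99))   # +0099: sign first, then zeros to width 5
```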

### Formatting Floats

Consider the example below.  It illustrates a format specification for floats that displays one digit after the decimal using {0:.1f} or four digits after the decimal using {0:.4f}.  Regardless of how the value is displayed, the internal representation of the value remains the same.

In [49]:
"precision: {0:.1f} or {0:.4f}".format(3.14159265)

'precision: 3.1 or 3.1416'

The next example illustrates a format specification for percentages.  In both Python and SAS, the percent format multiplies the number by 100 and appends a trailing percent (%) sign.

In [50]:
"6.33 as a Percentage of 150: {0:.2%}".format(6.33/150)

'6.33 as a Percentage of 150: 4.22%'

The SAS equivalent uses the SAS-supplied PERCENT8.2 format to display two places after the decimal followed by a percent sign (%).
> ![image.png](attachment:image.png)

SAS Log Output:
> ![image.png](attachment:image.png)

### Datetime Formatting 

The next examples illustrate using the strftime(format) method for date, datetime, and time handling.  Python date, datetime, and time objects support the strftime(format) method, which derives a string representing the date or time from the object.  The format argument contains directives (such as %Y for a four-digit year) that produce the desired appearance when displaying output.  In other words, the strftime(format) method constructs strings from date, datetime, and time objects rather than manipulating these objects directly.

Consider the below example:

In [52]:
from datetime import datetime, date, time
now = datetime.now()
print(now)

2021-03-23 15:20:51.910545


In [53]:
print(type(now))

<class 'datetime.datetime'>


Up to this point, all of the Python examples we have seen use features built into the interpreter.  We have not needed to rely on additional Python modules or programs.  To load other Python modules, we use the import statement.  Here, the first line in our example imports the classes datetime, date, and time from the Python module datetime.

In our example we also create the now object.  datetime.now() captures the current local date and time when that line runs, so the value associated with the now object is a snapshot of time.

Passing the now object to the print() function displays the date and time at which this program executed.

In the next example we introduce formatting directives for date and time formatting:

In [54]:
from datetime import datetime, date, time

nl = '\n'
now = datetime.now()

print('now: '     , now,
     nl           ,
     'Year:'      , now.strftime("%Y"),
     nl           ,
     'Month:'     , now.strftime("%B"),
     nl           ,
     'Day:'       , now.strftime("%d"),
     nl, nl       ,
     'concat1:'   , now.strftime("%A, %B %d, %Y A.D."),
     nl,
     'datetime:'  , now.strftime("%c"))

now:  2021-03-23 15:29:05.081545 
 Year: 2021 
 Month: March 
 Day: 23 
 
 concat1: Tuesday, March 23, 2021 A.D. 
 datetime: Tue Mar 23 15:29:05 2021
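
Because now changes on every run, the sketch below builds a fixed datetime (the timestamp shown above) so the output is repeatable.  It also shows strptime(), which goes the other way, parsing a string into a datetime; the input '23Mar2021' is a made-up value in a DATE9-like layout.  Day and month names such as %A and %B depend on the locale; the comments assume the default English (C) locale.

```python
from datetime import datetime

# A fixed timestamp so the output does not depend on when the cell runs
ts = datetime(2021, 3, 23, 15, 29, 5)

print(ts.strftime("%Y-%m-%d"))         # 2021-03-23
print(ts.strftime("%d/%m/%Y %H:%M"))   # 23/03/2021 15:29
print(ts.strftime("%A, %B %d, %Y"))    # Tuesday, March 23, 2021 (C locale)

# strptime() parses a string into a datetime using the same directives
parsed = datetime.strptime("23Mar2021", "%d%b%Y")
print(parsed)                          # 2021-03-23 00:00:00
```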
