<img src="https://www.mines.edu/webcentral/wp-content/uploads/sites/267/2019/02/horizontallightbackground.jpg" width="100%"> 
### CSCI250 Python Computing: Building a Sensor System
<hr style="height:5px" width="100%" align="left">

# Python data type: `string`

# Objective
* introduce the `string` data type
* discuss methods for interaction with strings

# Resources
* [Python introduction](https://docs.python.org/3/tutorial/introduction.html#strings)
* [Programiz Python tutorial](https://www.programiz.com/python-programming/string)

# Definition 

A `string` is an ordered sequence of characters:
* it is enclosed by single quotes (`'`) or double quotes (`"`)
* special characters can be included with **escape sequences**
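
As a minimal sketch of the two points above (the variable names are illustrative and not part of the original notebook), both quote styles produce equal strings, and each escape sequence such as `\"`, `\t`, or `\n` stands for a single character:

```python
# single and double quotes create equal strings
a = 'sensor'
b = "sensor"
same = (a == b)

# escape sequences: \" is a quote, \t is a tab, \n is a newline
quoted = "she said \"hi\""
table = "x\ty\n1\t2"

# an escape sequence counts as ONE character
n_quoted = len(quoted)   # 13 characters

print(same)
print(quoted)
print(table)
```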

In [1]:
s = "Python is the \"most powerful language you can still read\". - Paul Dubois"

print(  id(s))
print(type(s))
print(     s )

1750670381744
<class 'str'>
Python is the "most powerful language you can still read". - Paul Dubois


Strings that span multiple lines are enclosed between triple quotes (`"""` or `'''`).

In [2]:
s = """
Python is the \"most powerful language you can still read\".

- Paul Dubois
"""
print( s )


Python is the "most powerful language you can still read".

- Paul Dubois



# `string` accessibility

1. indexing
2. slicing
3. mutability
4. unpacking
5. nesting

## 1. indexing 
A `string` element can be retrieved by its index. 

The index starts at `0`. Negative indices count from the end, so `-1` is the last character.

In [None]:
print(s,'\n')

print( s[1] )

print( s[len(s)-5] ) 
print( s[      -5] )
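
The same rules can be checked on a shorter string. This is a small sketch with a hypothetical example string `word`; it also shows that an index past the end raises `IndexError`:

```python
word = 'Python'          # hypothetical example string

first = word[0]          # 'P': indices start at 0
last = word[-1]          # 'n': negative indices count from the end
same = word[len(word) - 2] == word[-2]   # both refer to 'o'

# an index outside the valid range raises IndexError
try:
    word[len(word)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, same, out_of_range)
```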

## 2. slicing
We can retrieve a group of elements from a `string` by **slicing**. The slice `s[a:b]` includes index `a` and excludes index `b`.

In [None]:
print(s,'\n')

print( s[1:7] )
print( s[-12:-1] )
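
A short sketch of the common slice forms, using a hypothetical example string `word`. Omitted bounds default to the ends of the string, and an optional third value sets the step:

```python
word = 'Python'      # hypothetical example string

head = word[:2]      # 'Py'     start defaults to 0
tail = word[2:]      # 'thon'   stop defaults to len(word)
middle = word[1:4]   # 'yth'    the stop index 4 is excluded
every2 = word[::2]   # 'Pto'    step of 2
rev = word[::-1]     # 'nohtyP' a negative step reverses the string

print(head, tail, middle, every2, rev)
```

Unlike indexing, a slice that runs past the end does not raise an error; `word[2:100]` simply returns `'thon'`.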

## 3. mutability
Mutability is the ability to change the content of an object without changing its identity.

The `string` type is **immutable**. 

In [3]:
print( s )
print('id(s) =',id(s) )


Python is the "most powerful language you can still read".

- Paul Dubois

id(s) = 1750670381360


In [4]:
s[11:13] = 'THE'

print( s )
print('id(s) =',id(s) )

TypeError: 'str' object does not support item assignment
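
Because item assignment is not supported, a modified string has to be built as a new object. One possible sketch, using slicing and concatenation on an illustrative string (not the quote used above):

```python
original = 'Python is the best'   # hypothetical example string

# build a NEW string from slices of the old one
modified = original[:10] + 'THE' + original[13:]

print(modified)                    # Python is THE best
print(original)                    # the original is unchanged
print(modified is original)        # False: a different object
```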

## 4. unpacking
Unpacking assigns the individual characters of a `string` to separate variables in a single statement.

In [None]:
s = 'abc'
print(s)

In [None]:
x,y,z = s
print( x,y,z )
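
Two details are worth a quick sketch: the number of names must match the number of characters, and a starred name can collect the remainder. The string `'sensor'` is only an illustration:

```python
s = 'abc'

# the number of names must match the number of characters
try:
    x, y = s
    mismatch_ok = True
except ValueError:
    mismatch_ok = False

# a starred name collects the remaining characters in a list
first, *rest = 'sensor'

print(mismatch_ok)
print(first, rest)
```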

## 5. nesting
`string` types **cannot** be nested: every element of a `string` is a single character.
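
One way to see this, as a small sketch: indexing a string returns another string of length one, never a different kind of container, so there is no deeper level to reach.

```python
s = 'abc'
c = s[0]

print(type(c))        # <class 'str'>
print(len(c))         # 1
print(c[0] == c)      # True: indexing again returns the same character
```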

# `string` specific methods
The methods of a `string` can be listed by typing the variable name, followed by `.` and **TAB**. 

The name of a method followed by `?` displays its docstring.
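
Outside Jupyter, where TAB completion and `?` may not be available, the built-in function `dir` and the `__doc__` attribute give similar information. A minimal sketch:

```python
s = 'Colorado School of Mines'

# names of the public string methods (those not starting with an underscore)
methods = [name for name in dir(s) if not name.startswith('_')]
print(methods[:5])

# the docstring that `s.upper?` would display in Jupyter
print(str.upper.__doc__)
```

Calling `help(str.upper)` in an interactive session prints the same documentation in a formatted form.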

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Explain the meaning of **methods** associated with type `string`.
* Add comments explaining their purpose. 
* Include examples demonstrating their usage.

In [None]:
s = 'Colorado School of Mines'

In [None]:
s.lower()

In [None]:
s.upper()

In [None]:
s.split()

In [None]:
u = s.split()
print(' '.join(u))

In [None]:
s.find('of')

In [None]:
s.count('o')

In [None]:
s.replace('Mines','Minds')

In [None]:
translator = s.maketrans('o','x')
print(translator)

In [None]:
s.translate(translator)
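
For reference, the calls from the cells above can be collected into one sketch, with the value each returns for `s = 'Colorado School of Mines'` noted in a comment. The variable names on the left are illustrative:

```python
s = 'Colorado School of Mines'

low = s.lower()                    # 'colorado school of mines'
up = s.upper()                     # 'COLORADO SCHOOL OF MINES'
words = s.split()                  # ['Colorado', 'School', 'of', 'Mines']
joined = ' '.join(words)           # rebuilds the original string
pos = s.find('of')                 # 16, index where 'of' starts (-1 if absent)
n_o = s.count('o')                 # 6, the count is case-sensitive
new = s.replace('Mines', 'Minds')  # 'Colorado School of Minds'

# maketrans builds a table of code points: {111: 120} maps 'o' to 'x'
table = str.maketrans('o', 'x')
swapped = s.translate(table)       # 'Cxlxradx Schxxl xf Mines'

print(low, up, words, joined, pos, n_o, new, table, swapped, sep='\n')
```

None of these calls changes `s`; each returns a new object.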

# `string` built-in functions

Built-in functions and types that are always available in the Python interpreter:

https://docs.python.org/3.3/library/functions.html

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Explain the meaning and use of **builtins** usable on type `string`.
* Add comments explaining their purpose. 
* Include examples demonstrating their usage.

In [5]:
s

'\nPython is the "most powerful language you can still read".\n\n- Paul Dubois\n'

In [6]:
all(s)
# returns True if every element is truthy (also True for an empty string)
# every character is itself a non-empty string, so no element of a string is ever false

True

In [7]:
any(s)
# returns True if at least one element is truthy; for a string, False only when it is empty

True
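
The behavior of `all` and `any` on strings follows from the truth value of single characters. The sketch below compares strings with a list, which can hold falsy elements; the example values are illustrative:

```python
# every character is a non-empty string, so it is truthy
chars_truthy = (bool('a'), bool(' '), bool('0'))
print(chars_truthy)              # (True, True, True)

print(all('abc'), any('abc'))    # True True
print(all(''), any(''))          # True False: the empty string has no elements

# a false element needs a container that can hold falsy values, e.g. a list
mixed_all = all(['a', '', 'c'])  # False: '' is falsy
falsy_any = any(['', 0, None])   # False: every element is falsy
print(mixed_all, falsy_any)
```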

In [8]:
len(s)
# returns the number of characters in the string

75

In [9]:
list(s)
#returns a list of each character in the string, useful because lists are mutable

['\n',
 'P',
 'y',
 't',
 'h',
 'o',
 'n',
 ' ',
 'i',
 's',
 ' ',
 't',
 'h',
 'e',
 ' ',
 '"',
 'm',
 'o',
 's',
 't',
 ' ',
 'p',
 'o',
 'w',
 'e',
 'r',
 'f',
 'u',
 'l',
 ' ',
 'l',
 'a',
 'n',
 'g',
 'u',
 'a',
 'g',
 'e',
 ' ',
 'y',
 'o',
 'u',
 ' ',
 'c',
 'a',
 'n',
 ' ',
 's',
 't',
 'i',
 'l',
 'l',
 ' ',
 'r',
 'e',
 'a',
 'd',
 '"',
 '.',
 '\n',
 '\n',
 '-',
 ' ',
 'P',
 'a',
 'u',
 'l',
 ' ',
 'D',
 'u',
 'b',
 'o',
 'i',
 's',
 '\n']

In [10]:
sorted(s)
#returns a list of characters sorted in unicode order

['\n',
 '\n',
 '\n',
 '\n',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 '"',
 '"',
 '-',
 '.',
 'D',
 'P',
 'P',
 'a',
 'a',
 'a',
 'a',
 'a',
 'b',
 'c',
 'd',
 'e',
 'e',
 'e',
 'e',
 'f',
 'g',
 'g',
 'h',
 'h',
 'i',
 'i',
 'i',
 'l',
 'l',
 'l',
 'l',
 'l',
 'm',
 'n',
 'n',
 'n',
 'o',
 'o',
 'o',
 'o',
 'o',
 'p',
 'r',
 'r',
 's',
 's',
 's',
 's',
 't',
 't',
 't',
 't',
 'u',
 'u',
 'u',
 'u',
 'u',
 'w',
 'y',
 'y']

In [12]:
max(s)
#returns the character with largest unicode value in the string

'y'

In [13]:
min(s)
#returns the character with smallest unicode value in string

'\n'

In [14]:
ord(s[3])
#returns the unicode code point of the character at that index

116
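
The inverse of `ord` is `chr`. The short sketch below also shows why `max` and `min` treat uppercase letters as smaller than lowercase ones: the comparison uses code points.

```python
code = ord('t')          # 116
char = chr(code)         # 't': chr reverses ord

# uppercase letters have smaller code points than lowercase letters
upper_first = ord('Z') < ord('a')     # 90 < 97

largest = max('Python')  # 'y'
smallest = min('Python') # 'P'

print(code, char, upper_first, largest, smallest)
```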

In [15]:
print(s)
#prints the entire string


Python is the "most powerful language you can still read".

- Paul Dubois



In [16]:
set(s)
# returns a set containing each distinct character of the string exactly once

{'\n',
 ' ',
 '"',
 '-',
 '.',
 'D',
 'P',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'l',
 'm',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u',
 'w',
 'y'}
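
A set is unordered, so it is often combined with `len` or `sorted`. A small sketch on a hypothetical example string:

```python
s = 'Mississippi'            # hypothetical example string

unique = set(s)              # distinct characters, in no particular order
n_total = len(s)             # 11
n_unique = len(unique)       # 4
ordered = sorted(unique)     # ['M', 'i', 'p', 's']

print(n_total, n_unique, ordered)
```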

# `string` constants

Constants defined in the `string` module.

https://docs.python.org/3/library/string.html

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Explain the meaning and use of **constants** defined in the `string` module.
* Add comments explaining their purpose. 
* Include examples demonstrating their usage.

In [18]:
import string
# imports the standard-library module string, which defines the character-class constants used below

In [21]:
print(string)

<module 'string' from 'C:\\Users\\sammb\\anaconda3\\lib\\string.py'>


In [22]:
string.ascii_letters
#displays all the letters in the ascii alphabet

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [23]:
string.ascii_lowercase
#displays all lowercase ascii letters

'abcdefghijklmnopqrstuvwxyz'

In [24]:
string.ascii_uppercase
#displays all uppercase ascii letters

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [25]:
string.digits
#displays all ascii digits

'0123456789'

In [26]:
string.punctuation
#displays all ascii punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [27]:
string.whitespace
#displays all ascii characters that denote white space (newline, space, etc.)

' \t\n\r\x0b\x0c'
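
These constants are ordinary strings, so the `in` operator can use them to classify characters. The sketch below counts character classes in a hypothetical sensor reading; the text of `reading` is made up for illustration:

```python
import string

reading = 'Temp: 23.5 C, Humidity: 40%'   # hypothetical sensor reading

n_digits = sum(1 for c in reading if c in string.digits)        # 5
n_punct = sum(1 for c in reading if c in string.punctuation)    # 5
n_space = sum(1 for c in reading if c in string.whitespace)     # 4

# ascii_letters is the concatenation of the two case-specific constants
combined = string.ascii_lowercase + string.ascii_uppercase

print(n_digits, n_punct, n_space)
print(combined == string.ascii_letters)
```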

# `string` addition
Addition of two `string` types concatenates the inputs.

In [28]:
a = 'blah'
b = 'halb'
a + b

'blahhalb'
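
Both operands of `+` must be strings. The sketch below, with illustrative names, shows the `TypeError` raised when a number is added directly, and `join` as an alternative for concatenating many pieces:

```python
name = 'sensor'
count = 3

# str + int raises TypeError; convert the number to a string first
try:
    label = name + count
except TypeError:
    label = name + str(count)

# join concatenates a whole list of strings in one call
parts = ['a', 'b', 'c']
joined = ''.join(parts)

print(label, joined)
```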

# `string` multiplication
Multiplication of a `string` by an integer repeats the input that many times.

In [29]:
a = 'blah'
a * 3

'blahblahblah'
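
A brief sketch of the edge cases: zero or a negative integer gives the empty string, and a non-integer factor raises `TypeError`.

```python
rule = '-' * 10          # '----------', handy as a separator line
empty = 'blah' * 0       # ''
negative = 'blah' * -2   # '' as well

# a float is not accepted as a repeat count
try:
    'blah' * 1.5
    float_ok = True
except TypeError:
    float_ok = False

print(rule)
print(repr(empty), repr(negative), float_ok)
```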

<img src="https://www.dropbox.com/s/7vd3ezqkyhdxmap/demo.png?raw=1" width="10%" align="left">

# Demo
Consider the text in the file pbd.txt extracted from [**Carl Sagan**](https://en.wikipedia.org/wiki/Carl_Sagan)'s book entitled **Pale Blue Dot**.

*** 
Use Python `string` functions to  
* find the number of characters
* find the number of vowels and consonants
* find the number of capital letters

In [None]:
import string

Load the PBD text from an external file into a string:

In [31]:
with open("pbd.txt", "r", encoding="utf-8") as pbdFile:  # assumes pbd.txt is saved as UTF-8
    myPBD = pbdFile.read()

print(type(myPBD))

<class 'str'>
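
A `UnicodeDecodeError` while reading usually means the file was decoded with a different encoding than the one it was saved in; on Windows, `open` may default to a legacy code page. The sketch below writes a small stand-in file (hypothetical, not `pbd.txt`) and assumes UTF-8. It shows that a curly closing quote is stored as the bytes `e2 80 9d`, which is one plausible source of a stray `0x9d` byte:

```python
import os
import tempfile

# write a small UTF-8 file containing curly quotes (a stand-in for the real text)
text = 'a \u201cpale blue dot\u201d'
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# reading with the matching encoding recovers the text exactly
with open(path, 'r', encoding='utf-8') as f:
    good = f.read()

# the raw bytes contain 0x9d as part of the closing quote
with open(path, 'rb') as f:
    raw = f.read()

print(good == text)
print(b'\x9d' in raw)
```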

In [None]:
print(myPBD)

Count the number of characters using the function `len`:

In [None]:
print(len(myPBD),'characters')

We can remove punctuation from the string using a translator. 

First we need to define a string containing all punctuation symbols: 

In [None]:
pct = string.punctuation
print(pct)

Then build a translation table from the punctuation string and apply it to the text:
* all characters present in `pct` are mapped to `None`, which deletes them.

In [None]:
translator = str.maketrans('','',pct)
myCHR = myPBD.translate(translator)
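
The effect of the three-argument form of `str.maketrans` is easier to see on a short string. In this sketch `sample` is a made-up sentence:

```python
import string

sample = "Hello, world! It's 9:00."   # hypothetical example text

# third argument: characters to delete (each is mapped to None)
table = str.maketrans('', '', string.punctuation)
clean = sample.translate(table)

print(clean)                   # Hello world Its 900
print(table[ord(',')])         # None
```

Note that `string.punctuation` contains only ASCII symbols, so characters such as curly quotes are left in place.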

In [None]:
print(myCHR)

Count the non-punctuation characters with the function `len`:

In [None]:
print(len(myCHR),'non punctuation characters')

Define a string containing all the vowels:

In [None]:
vow = 'aeiou'

Count the vowels by comparing each character to the vowels:

In [None]:
# version 1
vCount = 0
for i in myCHR:
    for v in vow + vow.upper():
        if i == v:
            vCount += 1

print(vCount,'vowels')

Alternatively, we can count the vowels using the function `count`:

In [None]:
# version 2
vCount = 0
for v in vow:
    
    vCount += myCHR.count(v)
    vCount += myCHR.count(v.upper())
    
print(vCount,'vowels')
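
A third variant, sketched here on a short stand-in string rather than the full text, lowers the case once and uses the `in` operator with a generator expression:

```python
text = 'Pale Blue Dot'   # short stand-in for the full text
vowels = 'aeiou'

# lower() makes the test case-insensitive; `in` checks membership in the vowel string
v_count = sum(1 for c in text.lower() if c in vowels)

print(v_count, 'vowels')
```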

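Count the consonants by subtracting the vowels from the number of letters (spaces, newlines, and digits are still present in `myCHR` and are not consonants):

In [None]:
# count the ASCII letters only, then remove the vowels
lCount = sum(1 for i in myCHR if i in string.ascii_letters)
cCount = lCount - vCount

print(cCount,'consonants')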

Count the capital letters by testing the logical value of `isupper`:

In [None]:
uCount = 0
for i in myCHR:

    if( i.isupper() ):
        uCount += 1

print(uCount,'upper case letters')
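
The same count can be written more compactly, because `True` counts as `1` when summed. A sketch on a short stand-in string:

```python
text = 'Pale Blue Dot'   # short stand-in for the full text

# isupper() returns True or False; summing the results counts the True values
u_count = sum(c.isupper() for c in text)

print(u_count, 'upper case letters')
```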