# Strings in Python

A string is a sequence of characters.

In older versions of Python (Python 2), an ordinary string was a sequence of bytes, and by default each character was interpreted as a 7-bit ASCII code. Seven bits can represent 128 unique characters. This was sufficient for all the upper- and lower-case letters of the English alphabet, the digits, and the punctuation marks. There were even enough codes left over for the space and for control characters such as end of line and tab.

The 7 bits were padded on the left with a single 0 bit, so each character was stored in one byte. Since computer programming is an international activity, the letters of the alphabets of other languages had to be included as well. In Python 3, a string is a sequence of Unicode code points, which range from 0 to 0x10FFFF.

ASCII forms a subset of Unicode.

In current Python 3 versions, a string can be encoded to bytes as UTF-8, UTF-16, or UTF-32.
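As a rough illustration of the encodings mentioned above, the sketch below encodes one string three ways and compares the number of bytes produced. The sample text and the variable names are made up for this example.

```python
# Sample text with two non-ASCII characters (written as escapes for clarity).
text = "na\u00efve caf\u00e9"

sizes = {}
for encoding in ("utf-8", "utf-16", "utf-32"):
    data = text.encode(encoding)           # str -> bytes
    assert data.decode(encoding) == text   # bytes -> str gives the text back
    sizes[encoding] = len(data)
    print(encoding, len(data), "bytes")

# An ASCII character has the same code in ASCII and in UTF-8.
ascii_bytes = "A".encode("ascii")
utf8_bytes = "A".encode("utf-8")
print(ord("A"), ascii_bytes, utf8_bytes)
```

The string itself has the same length in characters whichever encoding is chosen; only the byte representation differs.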



In [1]:
# A string literal is defined within single or double quotes.

firstName = 'Alan'
lastName = "Turing"

## String Indexing

You can think of a string as a sequence of characters. 

* The length of a string is given by the built-in function len(). The length is the number of characters in the string, including blank spaces.
* The index of a character in a string gives its position in the string, where the first character has an index of 0 and the last character has an index of (length - 1).
* Python allows negative indexing. The index -1 represents the last character, the index -2 represents the last but one character, and so on.

You can use a for loop to iterate through the string one character at a time.


In [2]:
m_str = "Hello World"

# the character at index 8
print("m_str[8] is: ", m_str[8])

# the character at index -5 (fifth from the end)
print("m_str[-5] is: ", m_str[-5])

# the length of this string
print("len(m_str) is", len(m_str))

print("------------")

for ch in m_str:
    print(ch)

m_str[8] is:  r
m_str[-5] is:  W
len(m_str) is 11
------------
H
e
l
l
o
 
W
o
r
l
d
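
The relationship between positive and negative indices can be checked directly. This is a small sketch that reuses the string from the cell above; the name n is introduced only for this example.

```python
m_str = "Hello World"
n = len(m_str)

# enumerate() pairs each character with its non-negative index i.
for i, ch in enumerate(m_str):
    # The same character is also reachable through the negative index i - n.
    assert m_str[i - n] == ch

# Index -5 and index n - 5 refer to the same position.
print(m_str[-5], m_str[n - 5])
```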


## Concatenation and Repetition
The + symbol is the concatenation operator, and the * symbol is the repetition operator.



In [3]:
# concatenate strings with the + operator

m_str = "spam" + "a" + "lot"
print(m_str)



spamalot


In [4]:
# the * operator repeats the string

m_str = 2 * "spam" + "a" + "lot" * 3
print(m_str)

spamspamalotlotlot


## String Slicing
You can slice a string into substrings by giving a starting index and an ending index. The slice runs from the starting index up to, but not including, the ending index, like so:



In [5]:
m_str = "To be or not to be"
print(m_str)

start = 2
end = 9
subStr = m_str[start:end]

subStr

To be or not to be


' be or '
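
Either index of a slice may be left out, and negative indices work in slices too. The sketch below uses the same sentence as the cell above; the variable names are chosen just for this example.

```python
m_str = "To be or not to be"

first_two = m_str[:2]    # start omitted: the slice begins at index 0
middle = m_str[2:9]      # index 9 itself is not included
rest = m_str[9:]         # end omitted: the slice runs to the end of the string
last_two = m_str[-2:]    # the final two characters

print(repr(first_two), repr(middle), repr(rest), repr(last_two))

# Because the end index is excluded, adjacent slices fit together exactly.
print(first_two + middle + rest == m_str)
```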

## String Library

Python provides a large number of string methods as part of the built-in str type.

Because str is built in, you do not have to import a library to use these methods; you call them on a string, as in m_str.upper().

The main documentation for these methods is here:

https://docs.python.org/3.9/library/stdtypes.html#text-sequence-type-str



<table border = "1" width = "75%">
<tr><th> Function </th><th> Meaning </th></tr>
<tr>
<td>capitalize()</td>
<td>Returns a copy of the string with its first character capitalized and the rest lowercased.</td>
</tr>
<tr>
<td>center (width) </td>
<td> Returns a copy of the string centered in another string of length 
<i>width</i>. </td>
</tr>
<tr>
<td> count (sub) </td>
<td> Returns the number of occurrences of substring <i>sub</i>.</td>
</tr>
<tr>
<td> endswith (suffix) </td>
<td> Returns <i>True</i> if the string ends with the specified <i>suffix</i>
and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> find (sub) </td>
<td> Returns the lowest index in the string where the substring <i>sub</i>
is found and -1 if it is not found. </td>
</tr>
<tr>
<td> isalnum () </td>
<td> Returns <i>True</i> if all the characters are alphanumeric and there
is at least one character, and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> isalpha () </td>
<td> Returns <i>True</i> if all the characters in the string are alphabetic
and there is at least one character, and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> isdigit () </td>
<td> Returns <i>True</i> if all the characters in the string are digits
and there is at least one character, and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> islower () </td>
<td> Returns <i>True</i> if all alphabetic characters are in lower case
and there is at least one cased character, and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> isspace () </td>
<td> Returns <i>True</i> if there are only white space characters, and there
is at least one character, and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> isupper () </td>
<td> Returns <i>True</i> if all alphabetic characters are in upper case
and there is at least one cased character, and <i>False</i> otherwise. </td>
</tr>
<tr>
<td> join (seq) </td>
<td> Returns a string that is the concatenation of the strings in the sequence
<i>seq</i>, separated by the string on which <i>join</i> is called. </td>
</tr>
<tr>
<td> ljust (width) </td>
<td> Returns a string of length <i>width</i> with the original string
left justified in it. </td>
</tr>
<tr>
<td> lower () </td>
<td> Returns a copy of the string converted to lowercase. </td>
</tr>
<tr>
<td> lstrip () </td>
<td> Returns a copy of the string with leading whitespace characters removed. </td>
</tr>
<tr>
<td> replace (old, new) </td>
<td> Returns a copy of the string with all occurrences of the substring
<i>old</i> replaced with <i>new</i>. </td>
</tr>
<tr>
<td> rfind (sub) </td>
<td> Returns the highest index in the string where substring <i>sub</i>
is found and -1 if the substring is not found. </td>
</tr>
<tr>
<td> split ([sep]) </td>
<td> Returns a list of substrings of the string using <i>sep</i>
as the delimiter. If <i>sep</i> is omitted, the string is split on runs of whitespace. </td>
</tr>
<tr>
<td> startswith (prefix) </td>
<td> Returns <i>True</i> if the string starts with the <i>prefix</i> and
<i>False</i> otherwise. </td>
</tr>
<tr>
<td> strip () </td>
<td> Returns a copy of the string with the leading and trailing whitespace
characters removed. </td>
</tr>
<tr>
<td> swapcase () </td>
<td> Returns a copy of the string with uppercase characters converted to
lower case and vice versa. </td>
</tr>
<tr>
<td> upper () </td>
<td> Returns a copy of the string converted to uppercase. </td>
</tr>
</table>
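
A few of the methods in the table can be tried on a single sentence. This is only a sketch; the sample text and variable names are chosen for illustration.

```python
line = "  To be or not to be  "

trimmed = line.strip()            # remove leading and trailing whitespace
words = trimmed.split()           # no separator given: split on whitespace
print(words)
print("-".join(words))            # the separator is the string join is called on

print(trimmed.count("be"))        # number of non-overlapping occurrences
print(trimmed.find("be"), trimmed.rfind("be"), trimmed.find("xyz"))
print(trimmed.replace("be", "code"))

print(trimmed.upper(), "|", trimmed.swapcase())
print(trimmed.startswith("To"), trimmed.endswith("be"))
print("42".isdigit(), "abc123".isalnum(), "abc123".isalpha())

# center() and ljust() pad with spaces up to the requested width.
centered = "spam".center(10)
left = "spam".ljust(10)
print("[" + centered + "]", "[" + left + "]")
```

Every call returns a new value; none of them modifies line or trimmed.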


## String Related Functions

<p>
Strings are <i>immutable</i>, i.e. once created they cannot be changed.
Even though methods like <i>replace()</i> give the
appearance of changing characters in a string, the reality is that the
original string is untouched and a new copy with the replacements is
returned. If the original variable is then assigned the new
string and nothing else refers to the old one, the space in memory occupied by the old string is reclaimed
by the garbage collector.
</p>
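
Immutability can be observed directly. In this sketch the names original and changed are made up for the example: assigning to a single character fails, while replace() hands back a new string.

```python
original = "spam"

try:
    original[0] = "S"                 # strings do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

changed = original.replace("s", "S")  # builds and returns a new string
print(original, changed)              # the original string is unchanged
```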

<p>
Internally, each character in a string is represented by a number, its Unicode code point.
Python allows you to get that number using the
function <i>ord()</i>. It also allows you to convert a valid code point
to a character using the <i>chr()</i> function.
</p>

In [6]:
print(ord('5'))
print(chr(75))

53
K
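
Since ord() and chr() are inverses of each other, a string can be taken apart into numbers and put back together. The following is a minimal sketch with illustrative variable names.

```python
word = "Turing"

codes = [ord(ch) for ch in word]            # characters -> code points
print(codes)

rebuilt = "".join(chr(n) for n in codes)    # code points -> characters
print(rebuilt)

# The digits, and the letters of each case, occupy consecutive code points.
digit_value = ord("5") - ord("0")           # numeric value of the character '5'
next_letter = chr(ord("a") + 1)             # the letter that follows 'a'
print(digit_value, next_letter)
```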


In [7]:
help(ord)

Help on built-in function ord in module builtins:

ord(c, /)
    Return the Unicode code point for a one-character string.



In [8]:
help(chr)

Help on built-in function chr in module builtins:

chr(i, /)
    Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.



In [1]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

You can also make Python evaluate a string as if it were an expression by using the eval() function.

For example, "2 + 3" is a string. However, eval("2 + 3") returns the result of the expression 2 + 3, i.e. 5.

Similarly, you can convert a value into a string by using the str() function. To convert the floating-point number 3.14 into a string, use str(3.14).

In [9]:
eval("2 + 3")

5
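
Because eval() runs whatever expression the string contains, it is usually reserved for strings you trust. When only a literal value has to be read from text, the standard library function ast.literal_eval is one alternative. It is not part of the original notebook and is shown here only as a sketch.

```python
import ast

result = eval("2 + 3")                     # evaluates an arbitrary expression
print(result)

# ast.literal_eval accepts only literals: numbers, strings, lists, dicts, ...
numbers = ast.literal_eval("[1, 2, 3]")
print(numbers)

rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")   # a call, not a literal
except ValueError as exc:
    rejected = True
    print("rejected:", type(exc).__name__)
```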

In [12]:
a = 3.14

print(a)

# str() is a built-in function
b = str(a)

b

3.14


'3.14'
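
Once a number has been converted with str(), the + operator concatenates instead of adding. The sketch below continues the cell above and uses float() to convert the text back to a number; the extra variable names are chosen for illustration.

```python
a = 3.14
b = str(a)                        # float -> str

print(type(a).__name__, type(b).__name__)

joined = b + "15"                 # string concatenation, not arithmetic
print(joined)

back = float(b)                   # str -> float
print(back == a)

# str() works on other kinds of values as well.
print(str(42), str(True), str([1, 2]))
```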
