# Strings

Strings are used in Python to record text information, such as names. A string in Python is a *sequence*, which means Python keeps track of every character in the string and its position. For example, Python understands the string "hello" to be a sequence of letters in a specific order. This means we can use indexing to grab particular letters (like the first letter, or the last letter).

This idea of a sequence is an important one in Python, and we will return to it later.

We'll learn about the following:

    1.) Creating Strings
    2.) Printing Strings
    3.) String Indexing and Slicing
    4.) String Properties
    5.) String Methods
    6.) Print Formatting

## Creating a String
To create a string in Python you need to use either single quotes or double quotes. For example:

In [1]:
# Single word
'hello'

'hello'

In [2]:
# Entire phrase 
'This is also a string'

'This is also a string'

In [3]:
# We can also use double quotes
"String built with double quotes"

'String built with double quotes'

In [4]:
# Be careful with quotes!
' I'm using single quotes, but this will create an error'

SyntaxError: invalid syntax (<ipython-input-4-da9a34b3dc31>, line 2)

The error above occurs because the single quote in <code>I'm</code> ended the string early. You can combine double and single quotes to write the complete statement.
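Besides mixing the two quote types, there are two other common ways around the problem: escaping the inner quote with a backslash, or wrapping the text in triple quotes. The snippet below is a small sketch of both; the variable names `escaped` and `mixed` are only illustrative.

```python
# Escape the inner single quote with a backslash
escaped = 'I\'m using single quotes, and this works'

# Triple quotes allow both kinds of quote inside the string
mixed = """She said "I'm ready" and left."""

print(escaped)
print(mixed)
```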

In [5]:
"Now I'm ready to use the single quotes inside a string!"

"Now I'm ready to use the single quotes inside a string!"

Now let's learn about printing strings!

## Printing a String

A Jupyter notebook automatically displays the value of the last expression in a cell, so a cell containing just a string will echo that string. The reliable way to display strings in your output, however, is the print() function.

In [6]:
# We can simply declare a string
'Hello World'

'Hello World'

In [7]:
# Note that we can't output multiple strings this way
'Hello World 1'
'Hello World 2'

'Hello World 2'

We can use the print() function to print a string.

In [8]:
print('Hello World 1')
print('Hello World 2')
print('Use \n to print a new line')
print('\n')
print('See what I mean?')

Hello World 1
Hello World 2
Use 
 to print a new line


See what I mean?


## String Basics

We can also use a function called len() to check the length of a string!

In [9]:
len('Hello World')

11

Python's built-in len() function counts all of the characters in the string, including spaces and punctuation.

## String Indexing
We know strings are a sequence, which means Python can use indexes to call parts of the sequence. Let's learn how this works.

In Python, we use brackets <code>[]</code> after an object to call its index. We should also note that indexing starts at 0 for Python. Let's create a new object called <code>s</code> and then walk through a few examples of indexing.

In [10]:
# Assign s as a string
s = 'Hello World'

In [11]:
#Check
s

'Hello World'

In [12]:
# Print the object
print(s) 

Hello World


Let's start indexing!

In [13]:
# Show first element (in this case a letter)
s[0]

'H'

In [14]:
s[1]

'e'

In [15]:
s[2]

'l'

We can use a <code>:</code> to perform *slicing* which grabs everything up to a designated point. For example:

In [16]:
# Grab everything past the first term all the way to the length of s which is len(s)
s[1:]

'ello World'

In [17]:
# Note that there is no change to the original s
s

'Hello World'

In [18]:
# Grab everything UP TO the 3rd index
s[:3]

'Hel'

Note the above slicing. Here we're telling Python to grab everything from index 0 up to index 3, without including the character at index 3. You'll notice this a lot in Python, where ranges are usually expressed as "up to, but not including".
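One consequence of the "up to, but not including" rule is that two slices sharing a boundary never overlap and never leave a gap. The sketch below illustrates this with the same string; `head` and `tail` are names chosen only for this example.

```python
s = 'Hello World'

head = s[:3]   # characters at indexes 0, 1, 2
tail = s[3:]   # characters from index 3 to the end

print(head)              # Hel
print(tail)              # lo World
print(head + tail == s)  # True: the two pieces rebuild the original
print(len(s[2:7]))       # 5, which is simply 7 - 2
```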

In [19]:
#Everything
s[:]

'Hello World'

We can also use negative indexing to go backwards.

In [20]:
# Last letter (one index behind 0 so it loops back around)
s[-1]

'd'

In [21]:
# Grab everything but the last letter
s[:-1]

'Hello Worl'

We can also use index and slice notation to grab elements of a sequence by a specified step size (the default is 1). For instance we can use two colons in a row and then a number specifying the frequency to grab elements. For example:

In [22]:
# Grab everything, but go in steps of size 1
s[::1]

'Hello World'

In [23]:
# Grab everything, but go in step sizes of 2
s[::2]

'HloWrd'

In [24]:
# We can use this to print a string backwards
s[::-1]

'dlroW olleH'
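The start, stop and step values can also be combined in a single slice. The following is a short sketch of a few combinations, plus a palindrome check built on the reversing trick; the word `'level'` is just a sample value.

```python
s = 'Hello World'

print(s[0:5:2])   # Hlo   (indexes 0, 2, 4)
print(s[6::2])    # Wrd   (indexes 6, 8, 10)
print(s[-5:])     # World (the last five characters)

# A word is a palindrome if it equals its own reversed copy
word = 'level'
is_palindrome = word == word[::-1]
print(is_palindrome)   # True
```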

## String Properties
It's important to note that strings have a property known as *immutability*. This means that once a string is created, the elements within it cannot be changed or replaced. For example:

In [25]:
s

'Hello World'

In [26]:
# Let's try to change the first letter to 'x'
s[0] = 'x'

TypeError: 'str' object does not support item assignment

Notice how the error tells us directly what we can't do: assign a new value to an item of the string!

Something we *can* do is concatenate strings!

In [27]:
s

'Hello World'

In [28]:
# Concatenate strings!
s + ' concatenate me!'

'Hello World concatenate me!'

In [29]:
# We can reassign s completely though!
s = s + ' concatenate me!'

In [30]:
print(s)

Hello World concatenate me!


In [31]:
s

'Hello World concatenate me!'
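Because a string cannot be edited in place, "changing" a character really means building a new string from pieces of the old one. A minimal sketch, combining slicing and concatenation (the name `changed` is illustrative):

```python
s = 'Hello World'

# s[0] = 'x' would raise TypeError, so assemble a new string instead
changed = 'x' + s[1:]

print(changed)  # xello World
print(s)        # Hello World (the original is untouched)
```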

We can use the multiplication symbol to create repetition!

In [32]:
letter = 'z'

In [33]:
letter*10

'zzzzzzzzzz'

### String concatenation 

In [102]:
s1 = "Python"

In [103]:
s2 = "Language"

In [104]:
s1+s2

'PythonLanguage'

In [105]:
s1+' '+s2

'Python Language'

In [107]:
'Text'+' '+'content are in file'

'Text content are in file'

## Basic Built-in String methods

Objects in Python usually have built-in methods. These methods are functions inside the object that can perform actions or commands on the object itself.

We call methods with a period and then the method name. Methods are in the form:

object.method(parameters)

Where parameters are extra arguments we can pass into the method. 
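Since strings are immutable, string methods return a new string rather than modifying the original, which also means calls can be chained. The sketch below uses strip() and title(), two methods covered later in this section; `cleaned` is an illustrative name.

```python
s = '  hello world  '

# Each method returns a new string, so one call can follow another
cleaned = s.strip().title()

print(cleaned)   # Hello World
print(repr(s))   # '  hello world  ' (s itself is unchanged)
```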

#### capitalize()
Return a copy of the string with its first character capitalized and the rest lowercased.

In [1]:
s = "hello world"

In [2]:
s.capitalize()

'Hello world'

#### center(	width[, fillchar])

Return the string centered in a string of length width. Padding is done using the specified fillchar (default is a space).

In [3]:
s = "Fruits"

In [4]:
s.center(40,'*')

'*****************Fruits*****************'

In [5]:
s.center(30,'=')

'============Fruits============'

In [8]:
s.center(50,'+')

'++++++++++++++++++++++Fruits++++++++++++++++++++++'

#### count(	sub[, start[, end]])

Return the number of occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

In [9]:
s = "Mishapyeeiii"

In [12]:
s.count('s')

1

In [14]:
s.count('i')

4

In [15]:
s.count('e',0,11)

2
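One detail worth knowing is that count() tallies non-overlapping occurrences. The following sketch shows that, along with the optional start and end arguments; the sample strings are arbitrary.

```python
# Occurrences are counted without overlap
overlap = 'aaaa'.count('aa')
print(overlap)                     # 2, not 3

word = 'banana'
print(word.count('an'))            # 2
print(word.count('a', 2))          # 2 (searches 'nana')
print(word.count('a', 0, 2))       # 1 (searches 'ba')
```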

#### decode(	[encoding[, errors]])

In Python 3, decode() is a method of **bytes** objects rather than of strings: it converts encoded bytes back into a string using the **codec** registered for encoding. encoding defaults to 'utf-8'. errors may be given to set a different error handling scheme. The default is 'strict', meaning that decoding errors raise a **UnicodeError**. Other possible values are 'ignore', 'replace' and any other name registered via **codecs.register_error**.

#### encode(	[encoding[,errors]])

Return an encoded version of the string as a bytes object. The default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error.

More documentation:
https://docs.python.org/3/howto/unicode.html

UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.) UTF-8 uses the following rules:

1. If the code point is < 128, it’s represented by the corresponding byte value.

2. If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes, where each byte of the sequence is between 128 and 255.

UTF-8 has several convenient properties:

1. It can handle any Unicode code point.

2. A Unicode string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the null character (U+0000). This means that UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes for anything other than end-of-string markers.

3. A string of ASCII text is also valid UTF-8 text.

4. UTF-8 is fairly compact; the majority of commonly used characters can be represented with one or two bytes.

5. If bytes are corrupted or lost, it’s possible to determine the start of the next UTF-8-encoded code point and resynchronize. It’s also unlikely that random 8-bit data will look like valid UTF-8.

6. UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer and word oriented encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded.

Reference: https://home.unicode.org/

In [16]:
s = 'Test'

In [32]:
# Encode the string to bytes
encoded_string = s.encode(encoding='utf8',errors='strict')
encoded_string

b'Test'

In [33]:
# Decode the bytes back to a string
encoded_string.decode()

'Test'
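With plain ASCII text such as 'Test', the encoded bytes look just like the string. The difference shows up with non-ASCII characters. The sketch below uses the sample word 'café' (written with a \u escape so it does not depend on the file's encoding) to show the byte count and two of the error handlers described above.

```python
text = 'caf\u00e9'   # 'café'

data = text.encode('utf-8')
print(data)        # b'caf\xc3\xa9'
print(len(text))   # 4 characters
print(len(data))   # 5 bytes: 'é' takes two bytes in UTF-8

# Decoding with the same codec restores the original string
print(data.decode('utf-8') == text)   # True

# 'é' has no ASCII form, so errors='strict' would raise UnicodeEncodeError
replaced = text.encode('ascii', errors='replace')
ignored = text.encode('ascii', errors='ignore')
print(replaced)    # b'caf?'
print(ignored)     # b'caf'
```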

#### Emoji printing

In [34]:
"\u0394"  

'Δ'

In [35]:
"\U00000394"

'Δ'

In [28]:
"\N{GREEK CAPITAL LETTER DELTA}"

'Δ'

In [36]:
"\N{GRINNING FACE}"

'😀'

In [48]:
print("\u2714")

✔


In [49]:
print("\u2716")

✖


In [50]:
print("\uFF10")

０


#### endswith(	suffix[, start[, end]])
Return **True** if the string ends with the specified suffix, otherwise return **False**. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

In [52]:
s = "Python is a OOPs language"

In [53]:
s.endswith('language')

True

In [54]:
s.endswith('OOPs')

False

#### expandtabs(	[tabsize])
Return a copy of the string where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.

In [66]:
s = "Python is\ta\tobject oriented\tprogramming\tlanguage"

In [67]:
s.expandtabs()

'Python is       a       object oriented programming     language'

In [68]:
s.expandtabs(2)

'Python is a object oriented programming language'

In [69]:
s.expandtabs(10)

'Python is a         object oriented     programming         language'

#### find(	sub[, start[, end]])

Return the lowest index in the string where substring sub is found, such that sub is contained in the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return **-1** if sub is not found.

In [70]:
s = "Python new language"

In [71]:
s.find('n')

5

In [72]:
s.find('z')

-1

#### rfind(	sub [,start [,end]])
Return the highest index in the string where substring sub is found, such that sub is contained within s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.

In [38]:
s = "Python new language"

In [39]:
s.rfind('n')

13

In [40]:
s[13]

'n'

#### index(	sub[, start[, end]])
Like find(), but raise **ValueError** when the substring is not found.

In [73]:
s.index('n')

5

In [74]:
s.index('z')

ValueError: substring not found

#### rindex(	sub[, start[, end]])
Like rfind() but raises ValueError when the substring sub is not found.

In [41]:
s = "Python new language"

In [42]:
s.rindex('n')

13

In [43]:
s.rindex('Z')

ValueError: substring not found
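The practical difference between the find() family and the index() family is how a missing substring is reported. The following sketch, using the same sample string, shows one way each result might be handled; the variable names are illustrative.

```python
s = 'Python new language'

# find() signals "not found" with -1, so test the result before using it
position = s.find('z')
if position == -1:
    print('no z (find returned -1)')

# index() raises instead, which suits try/except
try:
    s.index('z')
except ValueError as error:
    print('index failed:', error)

# When only presence matters, the in operator is simpler
print('new' in s)   # True

# The optional start argument lets a search resume after the previous hit
first = s.find('n')
second = s.find('n', first + 1)
print(first, second)   # 5 7
```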

#### isalnum(	)
Return True if all characters in the string are alphanumeric and there is at least one character, False otherwise.

In [75]:
s.isalnum()

False

In [76]:
s = "Pyth0n"

In [77]:
s.isalnum()

True

#### isalpha(	)
Return True if all characters in the string are alphabetic and there is at least one character, False otherwise.

In [79]:
s = "python"

In [80]:
s.isalpha()

True

#### isdigit(	)
Return True if all characters in the string are digits and there is at least one character, False otherwise.

In [85]:
s = "Python3 is powerful"

In [86]:
s.isdigit()

False

In [87]:
s = "1234"

In [88]:
s.isdigit()

True

In [89]:
s = "20 years ago Python was developed"

In [90]:
s.isdigit()

False

In [91]:
s = "2"

In [92]:
s.isdigit()

True
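Note that isdigit() checks every character, so a sign, a decimal point, a space, or an empty string all make it return False. A small sketch over a few sample inputs (the list `samples` is made up for illustration):

```python
samples = ['42', '-5', '3.14', '', '4 2']

for text in samples:
    print(repr(text), text.isdigit())

# Only the first sample consists purely of digits
results = [text.isdigit() for text in samples]
print(results)   # [True, False, False, False, False]
```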

#### islower(	)
Return true if all cased characters in the string are lowercase and there is at least one cased character, false otherwise.

In [93]:
s = "Python"

In [60]:
s.islower()

False

In [95]:
s = "python"

In [96]:
s.islower()

True

#### isspace(	)
Return true if there are only whitespace characters in the string and there is at least one character, false otherwise.

In [99]:
s = "p y t h o n"

In [100]:
s.isspace()

False

In [105]:
s = " "

In [106]:
s.isspace()

True

In [107]:
s = " a "

In [108]:
s.isspace()

False

#### istitle(	)
Return true if the string is a titlecased string and there is at least one character, for example uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return false otherwise.

In [109]:
s = "python"

In [110]:
s.istitle()

False

In [112]:
s = "Python"

In [113]:
s.istitle()

True

#### isupper(	)
Return true if all cased characters in the string are uppercase and there is at least one cased character, false otherwise.

The lowercase counterpart, islower(), is described above; both methods are shown below for comparison.

In [56]:
s = "PY"

In [57]:
s.isupper()

True

In [58]:
s = 'python'

In [59]:
s.islower()

True

#### join(	seq)
Return a string which is the concatenation of the strings in the sequence seq. The separator between elements is the string on which the method is called.

In [116]:
s = "Python"

In [121]:
"_".join(s)

'P_y_t_h_o_n'

In [118]:
s

'Python'

In [122]:
s = "=>"

seq = ("A",'B','C') # This is sequence of strings.

s.join(seq)

'A=>B=>C'

In [61]:
s = " "
seq = ('A','B','C')

In [62]:
s.join(seq)

'A B C'
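join() is most often used on a list of words, and it only accepts strings, so other values need converting first. The sketch below shows both cases with made-up sample data.

```python
words = ['Python', 'is', 'fun']
sentence = ' '.join(words)
print(sentence)               # Python is fun
print(''.join(words))         # Pythonisfun

# join() accepts only strings, so convert other values first
numbers = [1, 2, 3]
listing = ', '.join(str(n) for n in numbers)
print(listing)                # 1, 2, 3

# split() and join() undo each other for a fixed separator
line = 'a,b,c'
print(','.join(line.split(',')) == line)   # True
```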

#### ljust(	width[, fillchar])
Return the string left justified in a string of length width. Padding is done using the specified fillchar (default is a space). The original string is returned if width is less than len(s).

In [123]:
s = "Text file"

In [129]:
s.ljust(20,'*')

'Text file***********'

In [130]:
s = "Text"

In [131]:
s.ljust(25,"=")

'Text====================='

#### rjust(	width[, fillchar])
Return the string right justified in a string of length width. Padding is done using the specified fillchar (default is a space). The original string is returned if width is less than len(s).

In [31]:
s = "Text file"

In [33]:
s.rjust(25)

'                Text file'

In [34]:
s.rjust(25,'+')

'++++++++++++++++Text file'

In [37]:
print(s.rjust(25))

                Text file
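A typical use of ljust() and rjust() together is lining up columns of text. The following is only a sketch: the `rows` data and the column widths of 8 and 5 are arbitrary choices for illustration.

```python
rows = [('apple', 3), ('kiwi', 12), ('fig', 150)]

lines = []
for name, quantity in rows:
    # Left-justify the name in 8 columns, right-justify the number in 5
    lines.append(name.ljust(8, '.') + str(quantity).rjust(5))

print('\n'.join(lines))
# apple...    3
# kiwi....   12
# fig.....  150
```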


#### lower(	)
Return a copy of the string converted to lowercase.

In [1]:
s = "PYTHON"

In [2]:
s.lower()

'python'

#### upper(	)
Return a copy of the string converted to uppercase.

In [17]:
s = 'python'

In [18]:
s.upper()

'PYTHON'

#### strip(	[chars])
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:

In [48]:
s = "    Study Python Programming     "

In [49]:
s.strip()

'Study Python Programming'

In [51]:
s = " Python"

In [52]:
s.strip('n')

' Pytho'

In [53]:
s.strip()

'Python'

#### lstrip(	[chars])
Return a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped:


In [3]:
'     Python'.lstrip()

'Python'

In [5]:
s = "    Study Python Programming     "

In [6]:
s.lstrip()

'Study Python Programming     '

#### rstrip(	[chars])
Return a copy of the string with trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped:

In [7]:
s = "    Study Python Programming     "

In [8]:
s.rstrip()

'    Study Python Programming'

In [9]:
s.lstrip().rstrip()

'Study Python Programming'

In [14]:
s = "Python is a programming language"

In [15]:
s.rstrip('age')

'Python is a programming langu'

In [16]:
s.lstrip('Py')

'thon is a programming language'
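The point that chars is a set of characters, not a prefix or suffix, is easy to miss. The sketch below makes it concrete; the strings are sample values only.

```python
url = 'www.example.com'

# Any of c, m, o, w, z or '.' is removed from both ends
stripped = url.strip('cmowz.')
print(stripped)                # example

# Stripping stops at the first character that is not in the set
print('xxhixx'.strip('x'))     # hi
print('xxhixx'.lstrip('x'))    # hixx
print('xxhixx'.rstrip('x'))    # xxhi

# 'age' is not treated as a suffix, so more letters may go than expected
print('garage'.rstrip('age'))  # gar
```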

#### partition(	sep)
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.

In [19]:
s = "Python is a object oriented programing language"

In [21]:
s.partition(',')

('Python is a object oriented programing language', '', '')

In [22]:
s.partition('oriented')

('Python is a object ', 'oriented', ' programing language')

In [23]:
s = 'Python'

In [24]:
s.partition('t')

('Py', 't', 'hon')

#### rpartition(	sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

In [45]:
s = 'Pythoonn'

In [46]:
s.rpartition('o')

('Pytho', 'o', 'nn')

In [47]:
s.rpartition('T')

('', '', 'Pythoonn')
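Because both methods always return exactly three items, the result can be unpacked directly. A short sketch with two hypothetical inputs, a `key=value` setting and a file name:

```python
setting = 'timeout=30'
key, sep, value = setting.partition('=')
print(key, value)          # timeout 30

# rpartition() splits at the last separator, handy for file extensions
filename = 'archive.tar.gz'
stem, dot, extension = filename.rpartition('.')
print(stem, extension)     # archive.tar gz

# An empty middle item means the separator was not found
missing = 'no_equals'.partition('=')
print(missing)             # ('no_equals', '', '')
```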

#### replace(	old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

In [25]:
s = "Python is a popular programming language"

In [26]:
s.replace('Python','Java')

'Java is a popular programming language'

In [27]:
s = 'Python Python Python'

In [28]:
s.replace('Python','CPP')

'CPP CPP CPP'

In [29]:
s

'Python Python Python'

In [30]:
s.replace('Python','CPP',2)

'CPP CPP Python'

#### split(	[sep [,maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If **maxsplit** is given, at most **maxsplit** splits are done (thus, the list will have at most **maxsplit+1** elements).

If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made). Consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, "'1,,2'.split(',')" returns "['1', '', '2']").

The sep argument may consist of multiple characters (for example, "'1, 2, 3'.split(', ')" returns "['1', '2', '3']"). Splitting an empty string with a specified separator returns "['']".

If sep is not specified or is None, a different splitting algorithm is applied. First, whitespace characters (spaces, tabs, newlines, returns, and formfeeds) are stripped from both ends. Then, words are separated by arbitrary length strings of whitespace characters. Consecutive whitespace delimiters are treated as a single delimiter ("'1 2 3'.split()" returns "['1', '2', '3']"). Splitting an empty string or a string consisting of just whitespace returns an empty list.

In [63]:
s = "Python is a OOP programming language"

In [64]:
s.split()

['Python', 'is', 'a', 'OOP', 'programming', 'language']

In [65]:
s.split('OOP')

['Python is a ', ' programming language']

In [66]:
s.split('OOP',2)

['Python is a ', ' programming language']

In [67]:
s = "Texts are word contents are mixed of some Texts"

In [69]:
s.split('Texts')

['', ' are word contents are mixed of some ', '']

In [70]:
s.split('Texts',3)

['', ' are word contents are mixed of some ', '']
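The two splitting algorithms described above behave quite differently on repeated delimiters and on empty strings. The sketch below puts the cases side by side using small sample strings.

```python
# With an explicit separator, consecutive delimiters yield empty strings
print('1,,2'.split(','))         # ['1', '', '2']
print('1  2'.split(' '))         # ['1', '', '2']

# Without a separator, runs of whitespace count as one delimiter
print(' 1  2   3 '.split())      # ['1', '2', '3']

# maxsplit limits the number of splits, not the number of items
limited = 'a,b,c,d'.split(',', 2)
print(limited)                   # ['a', 'b', 'c,d']

# Empty input differs between the two algorithms
print(''.split())                # []
print(''.split(','))             # ['']
```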

#### rsplit([sep [,maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any whitespace string is a separator. Except for splitting from the right, rsplit() behaves like split(), which is described in detail above.

In [71]:
s = "Texts are word contents are mixed of some Texts"

In [72]:
s.rsplit()

['Texts', 'are', 'word', 'contents', 'are', 'mixed', 'of', 'some', 'Texts']

In [73]:
s.rsplit('Texts')

['', ' are word contents are mixed of some ', '']
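Without maxsplit, rsplit() and split() give the same result, as the cells above show. The difference only appears once maxsplit is set. A minimal sketch, using a made-up path string:

```python
path = 'usr/local/lib/python'

from_left = path.split('/', 1)
from_right = path.rsplit('/', 1)

print(from_left)    # ['usr', 'local/lib/python']
print(from_right)   # ['usr/local/lib', 'python']
```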

#### splitlines(	[keepends])
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

In [77]:
s = "Texts are word contents are mixed of some Texts\nLine 2 texts\nLine 3 texts"

In [78]:
s.splitlines()

['Texts are word contents are mixed of some Texts',
 'Line 2 texts',
 'Line 3 texts']

In [85]:
s.splitlines(0)

['Texts are word contents are mixed of some Texts',
 'Line 2 texts',
 'Line 3 texts']

In [86]:
s.splitlines(1)

['Texts are word contents are mixed of some Texts\n',
 'Line 2 texts\n',
 'Line 3 texts']
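splitlines() also recognizes line endings other than '\n', and it does not produce a trailing empty string the way split('\n') does. The sketch below compares the two on a sample text with mixed line endings; passing keepends by keyword is optional but arguably easier to read than 0 or 1.

```python
text = 'first\nsecond\r\nthird\n'

plain = text.splitlines()
kept = text.splitlines(keepends=True)
print(plain)   # ['first', 'second', 'third']
print(kept)    # ['first\n', 'second\r\n', 'third\n']

# split('\n') keeps the '\r' and adds an empty string at the end
naive = text.split('\n')
print(naive)   # ['first', 'second\r', 'third', '']
```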

#### startswith(	prefix[, start[, end]])

Return **True** if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

In [87]:
s = "Python is OOP language"

In [88]:
s.startswith("Python")

True

In [89]:
s.startswith('P')

True

In [90]:
s.startswith('is')

False
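Both startswith() and endswith() accept a tuple of candidates and an optional start position, which the cells above do not exercise. A short sketch follows; the file name is a hypothetical example.

```python
filename = 'report.csv'

# A tuple matches if any one of its items matches
is_table = filename.endswith(('.csv', '.tsv'))
print(is_table)                              # True
print(filename.startswith(('data', 'log')))  # False

# With a start position, the test begins at that index
s = 'Python is OOP language'
print(s.startswith('is', 7))                 # True
```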

#### swapcase(	)
Return a copy of the string with uppercase characters converted to lowercase and vice versa.

In [91]:
s = "PYTHON is a programming language"

In [92]:
s.swapcase()

'python IS A PROGRAMMING LANGUAGE'

#### title(	)
Return a titlecased version of the string: words start with uppercase characters, all remaining cased characters are lowercase.

In [93]:
s = "python is OOP language"

In [94]:
s.title()

'Python Is Oop Language'

#### translate(table)
The string translate() method returns a string where each character is mapped to its corresponding character in the translation table.

The syntax of the **translate()** method is:

   string.translate(table)
   


##### String translate() Parameters

translate() method takes a single parameter:

table - a translation table containing the mapping between two characters; usually created by maketrans()


##### Return value from String translate()

translate() method returns a string where each character is mapped to its corresponding character as per the translation table.


**Reference**

https://www.tutorialspoint.com/python/string_translate.htm

https://www.programiz.com/python-programming/methods/string/translate


In [98]:
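# Characters to replace, their replacements, and characters to delete
firstString = "abc"
secondString = "ghi"
thirdString = "ab"

string = "abcdef"
print("Original string:", string)

# Build the table: a->g, b->h, c->i, then mark a and b for deletion
translation = str.maketrans(firstString, secondString, thirdString)
# Translate the string (deletion takes priority over replacement)
print("Translated string:", string.translate(translation))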

Original string: abcdef
Translated string: idef
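A more everyday use of the same mechanism is cleaning up punctuation. The sketch below builds one table from three strings and another from a dictionary, which str.maketrans() also accepts; the sample sentences and the replacement choices are arbitrary.

```python
# Turn hyphens into spaces and delete a few punctuation marks
table = str.maketrans('-', ' ', '!?.,')
cleaned = 'Hello, World! well-known?'.translate(table)
print(cleaned)    # Hello World well known

# A dictionary maps each single character to its replacement
leet_table = str.maketrans({'a': '4', 'e': '3'})
leet = 'elevate'.translate(leet_table)
print(leet)       # 3l3v4t3
```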


#### zfill(	width)
Return the numeric string left filled with zeros in a string of length width. The original string is returned if width is less than len(s).

The width specifies the length of the returned string from zfill() with '0' digits filled to the left.


##### Return Value from zfill()
The zfill() returns a copy of the string with '0' filled to the left. The length of the returned string depends on the width provided.

- Suppose, the initial length of the string is 10. And, the width is specified 15. In this case, the zfill() returns a copy of the string with five '0' digits filled to the left.

- Suppose, the initial length of the string is 10. And, the width is specified 8. In this case, the zfill() doesn't fill '0' digits to the left and returns a copy of the original string. The length of the returned string in this case will be 10.


In [97]:
s = "Python"

In [100]:
s.zfill(15)

'000000000Python'

In [101]:
s.zfill(10)

'0000Python'
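zfill() is mainly intended for numeric strings, and it treats a leading sign specially: the zeros are inserted after the sign rather than before it. A brief sketch with sample values:

```python
print('42'.zfill(5))      # 00042

# A leading '+' or '-' stays in front of the padding
negative = '-42'.zfill(5)
print(negative)           # -0042
print('+3.5'.zfill(7))    # +0003.5

# A width smaller than the string leaves it unchanged
print('12345'.zfill(3))   # 12345
```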

## Print Formatting

We can use the .format() method to add formatted objects to printed string statements. 

The easiest way to show this is through an example:

In [39]:
'Insert another string with curly brackets: {}'.format('The inserted string')

'Insert another string with curly brackets: The inserted string'
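The curly brackets can also carry a position, a keyword name, or a format specification. The sketch below shows a few common variations, followed by the equivalent f-string syntax available since Python 3.6; `name` and `score` are illustrative values.

```python
name = 'Ada'
score = 93.4567

# Empty brackets are filled in order
print('{} scored {}'.format(name, score))        # Ada scored 93.4567

# Numbers select arguments by position
swapped = '{1} then {0}'.format('first', 'second')
print(swapped)                                   # second then first

# Keywords name the arguments; :.1f rounds to one decimal place
rounded = '{who} scored {pts:.1f}'.format(who=name, pts=score)
print(rounded)                                   # Ada scored 93.5

# An f-string places the expressions directly inside the brackets
inline = f'{name} scored {score:.2f}'
print(inline)                                    # Ada scored 93.46
```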

### Reference

- https://docs.python.org/2.5/lib/string-methods.html

- https://www.tutorialspoint.com/python3/python_strings.htm

- https://www.programiz.com/python-programming/methods/string