# Lec3

## Breaking apart strings

When processing text, the ability to split strings apart is particularly useful.

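* `partition(separator)`: breaks a string into three parts at the first occurrence of the separator: the text before it, the separator itself, and the text after it

* `split()`: breaks a string into a list of words separated by whitespace (optionally takes a separator as an argument)

* `join()`: the inverse of `split()`; joins the items of a list into one string, using the string it is called on as the separator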

In [218]:
s = "one ➡ two ➡ three"
print( s.partition("➡") )
print( s.split() )
print( s.split(" ➡ ") )
print( ";".join( s.split(" ➡ ") ) )

('one ', '➡', ' two ➡ three')
['one', '➡', 'two', '➡', 'three']
['one', 'two', 'three']
one;two;three
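
As a small sketch of the two `partition()` cases worth remembering: it splits only at the first occurrence of the separator, and it still returns three parts when the separator is missing. The sample strings and variable names below are made up for illustration.

```python
# partition() always returns a 3-tuple: (before, separator, after)
setting = "path=/usr/bin=old"
key, sep, value = setting.partition("=")   # splits at the FIRST "=" only
print(key)    # path
print(value)  # /usr/bin=old

# If the separator is not found, the whole string comes back first
# and the other two parts are empty strings
missing = "no separator here".partition("=")
print(missing)  # ('no separator here', '', '')
```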


In [17]:
"This will split all words into a list".split()

['This', 'will', 'split', 'all', 'words', 'into', 'a', 'list']

In [4]:
' '.join(['This', 'will', 'join', 'all', 'words', 'into', 'a', 'string'])

'This will join all words into a string'

In [19]:
'Happy New Year'.find('ew')

7

In [6]:
'Happy New Year'.replace('Happy','Brilliant')

'Brilliant New Year'

### Iterating Through a string

We can iterate through a string using a **[for loop](https://github.com/milaan9/03_Python_Flow_Control/blob/main/005_Python_for_Loop.ipynb)**. Here is an example that counts the number of 'l's in a string.

In [None]:
# Iterating through a string
count = 0
for letter in 'Hello World':
    if letter == 'l':
        count += 1
print(count,'letters found')
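
Running the loop above on `'Hello World'` gives a count of 3. As a cross-check, the same number can be obtained with the built-in `count()` method. This is only a sketch; the variable names are illustrative.

```python
text = 'Hello World'

# Manual count with a for loop
loop_count = 0
for letter in text:
    if letter == 'l':
        loop_count += 1

# The same result using the str.count() method
method_count = text.count('l')

print(loop_count, method_count)  # 3 3
```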

### Built-in functions to work with strings

Various built-in functions that work with sequences work with strings as well.

Some of the commonly used ones are **`enumerate()`** and **`len()`**. The **`enumerate()`** function returns an enumerate object. It contains the index and value of each item in the string as pairs. This can be useful for iteration.

Similarly, **`len()`** returns the length (number of characters) of the string.

In [212]:
name = "ali"
enumerate(name)

enumerate

In [214]:
text = 'cold'  # avoid naming a variable str, which would shadow the built-in type

# enumerate()
list_enumerate = list(enumerate(text))
print('list(enumerate(text)) = ', list_enumerate)

# character count
print('len(text) = ', len(text))

list(enumerate(text)) =  [(0, 'c'), (1, 'o'), (2, 'l'), (3, 'd')]
len(text) =  4
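
Since `enumerate()` is mostly useful inside a loop, here is a short sketch of that usage. The variable names are illustrative; the optional second argument (the starting index) is a standard part of `enumerate()`.

```python
word = 'cold'

# enumerate() yields (index, character) pairs, which can be unpacked directly
for index, char in enumerate(word):
    print(index, char)

# An optional second argument changes the starting index
pairs_from_one = list(enumerate(word, 1))
print(pairs_from_one)  # [(1, 'c'), (2, 'o'), (3, 'l'), (4, 'd')]
```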


### Old style formatting

We can also format strings in the old **`sprintf()`** style used in the C programming language. We use the **`%`** operator to accomplish this.

The **`%`** operator formats a string by inserting the value (or values) that come after it. It relies on the string containing a format specifier that identifies where to insert the value. The most common types of format specifiers are:

   - **`%s`** ➡ string
   - **`%d`** ➡ integer
   - **`%f`** ➡ float
   - **`%o`** ➡ octal
   - **`%x`** ➡ hexadecimal
   - **`%e`** ➡ exponential notation
    
These will be very familiar to anyone who has ever written a C or Java program and follow nearly exactly the same rules as the **[printf() function](https://en.wikipedia.org/wiki/Printf_format_string)**.

In [221]:
x = 36.3456789
print('The value of x is %6.26' %x)  # no conversion type (such as f) after the precision, so this raises an error

ValueError: incomplete format

In [63]:
print('The value of x is %100.3f' %x)

The value of x is                                                                                               36.346


In [19]:
string1= 123
print("Hello %s" % string1)
print("Actual Number = %d" %19)
print("Float of the number = %f" %19)
print("Octal equivalent of the number = %o" %19)
print("Hexadecimal equivalent of the number = %x" %19)
print("Exponential equivalent of the number = %e" %19)

Hello 123
Actual Number = 19
Float of the number = 19.000000
Octal equivalent of the number = 23
Hexadecimal equivalent of the number = 13
Exponential equivalent of the number = 1.900000e+01


When referring to multiple values, parentheses are used. Values are inserted in the order they appear in the parentheses (more on tuples in the next section).

In [36]:
string1, string2 = "World", "!"  # values assumed from the output shown below
print("Hello %s %s. My name is Bond, you can call me %d" %(string1,string2,99))

Hello World !. My name is Bond, you can call me 99


We can also specify the width of the field and the number of decimal places to be used. 
For example:

In [106]:
print('Print width 10: |%10s|'%'x')
print('Print width 10: |%-10s|'%'x') # left justified
print("The number pi = %.1f to 1 decimal places"%3.1415)
print("The number pi = %.2f to 2 decimal places"%3.1415)
print("More space pi = %.10f"%3.1415)
print("Pad pi with 0 = %012.5f"%3.1415) # pad with zeros 

Print width 10: |         x|
Print width 10: |x         |
The number pi = 3.1 to 1 decimal places
The number pi = 3.14 to 2 decimal places
More space pi = 3.1415000000
Pad pi with 0 = 000003.14150


In [7]:
"#".join(["hello","omer"])

'hello#omer'

## Common Python String Methods

There are numerous methods available with the string object. The **`format()`** method that we mentioned above is one of them. 

Strings can be transformed by a variety of functions that are all methods on a string. That is, they are called by writing the method name after the string, separated by a **`.`**. They include:

* Upper vs lower case: **`upper()`**, **`lower()`**, **`capitalize()`**, **`title()`** and **`swapcase()`**, with mostly the obvious meaning. Note that **`capitalize()`** makes the first letter of the string a capital and lowercases the rest, while **`title()`** selects upper case for the first letter of every word. Other frequently used methods include **`join()`**, **`split()`**, **`find()`** and **`replace()`**.


* Padding strings: **`center(n)`**, **`ljust(n)`** and **`rjust(n)`** each place the string into a longer string of length n padded by spaces (centered, left-justified or right-justified respectively). **`zfill(n)`** works similarly but pads with leading zeros.


* Stripping strings: Often we want to remove whitespace. This is achieved with the methods **`strip()`**, **`lstrip()`** and **`rstrip()`**, which remove it from both ends, just the left or just the right respectively. An optional argument can be used to list a set of other characters to be removed instead.
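
The padding and stripping methods listed above can be sketched as follows. The sample strings are made up for illustration; `repr()` is used only to make the remaining whitespace visible.

```python
label = "abc"

# Padding to a total width of 7 characters
print('|' + label.ljust(7) + '|')   # |abc    |
print('|' + label.rjust(7) + '|')   # |    abc|
print('|' + label.center(7) + '|')  # |  abc  |
print("42".zfill(5))                # 00042

# Stripping whitespace from one or both ends
padded = "   data \n"
print(repr(padded.lstrip()))  # 'data \n'
print(repr(padded.rstrip()))  # '   data'
print(repr(padded.strip()))   # 'data'

# With an argument, strip() removes any of the listed characters from the ends
print("--==title==--".strip("-="))  # title
```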

In [109]:
s='012345'
sX=s[:2]+'X'+s[3:] # this creates a new string with 2 replaced by X
print("creating new string",sX,"OK")

sX=s.replace('2','X') # the same thing
print(sX,"still OK")

creating new string 01X345 OK
01X345 still OK


In [225]:
help(s.replace)

Help on built-in function replace:

replace(old, new, count=-1, /) method of builtins.str instance
    Return a copy with all occurrences of substring old replaced by new.

      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    If the optional argument count is given, only the first count occurrences are
    replaced.
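
To illustrate the `count` argument described in the help text, here is a minimal sketch with a made-up sample string:

```python
s = "a-b-c-d"

print(s.replace("-", "+"))     # a+b+c+d  (all occurrences)
print(s.replace("-", "+", 2))  # a+b+c-d  (only the first two)
print(s)                       # a-b-c-d  (the original string is unchanged)
```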



In [111]:
# Example:

s="heLLo wORLd!"
print(s.capitalize(),"vs",s.title())

print("upper case: '%s'"%s.upper(),"lower case: '%s'"%s.lower(),"and swapped: '%s'"%s.swapcase())

print('|%s|' % "Hello World".center(30)) # center in 30 characters

print('|%s|'% "     lots of space             ".strip()) # remove leading and trailing whitespace

print('%s without leading/trailing d,h,L or ! = |%s|' % (s, s.strip("dhL!")))

print("Hello World".replace("World","Class"))

Hello world! vs Hello World!
upper case: 'HELLO WORLD!' lower case: 'hello world!' and swapped: 'HEllO WorlD!'
|         Hello World          |
|lots of space|
heLLo wORLd! without leading/trailing d,h,L or ! = |eLLo wOR|
Hello Class


In [229]:
s = "          ^^           12345                  "
s.strip("^^")  # unchanged: strip() only removes characters from the ends, and the ends here are spaces

'          ^^           12345                  '
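
The string above comes back unchanged because `strip()` stops at the first character that is not in its argument, and here both ends are spaces. One possible way to remove the spaces and the carets together is to list both characters in the argument, as in this sketch:

```python
s = "          ^^           12345                  "

# '^' is not at either end (spaces are), so nothing is removed
print(s.strip("^") == s)    # True

# Listing both the space and '^' removes every such character from both ends
print(repr(s.strip(" ^")))  # '12345'
```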

In [113]:
# capitalize(): Converts the first character of the string to a capital letter and the rest to lowercase

challenge = 'Python Datatypes'
print(challenge.capitalize()) # 'Python datatypes'

Python datatypes


In [54]:
# count(): returns occurrences of substring in string, count(substring, start=.., end=..)

challenge = 'Python Datatypes'
print(challenge.count('y')) # 2
print(challenge.count('y', 6, 14)) # 1
print(challenge.count('ty')) # 1

2
1
1


In [117]:
# endswith(): Checks if a string ends with a specified ending

challenge = 'Python Datatypes'
print(challenge.endswith('es'))   # True
print(challenge.endswith('type')) # False

True
False


In [56]:
# expandtabs(): Replaces tab character with spaces, default tab size is 8. It takes tab size argument

challenge = 'Python\tDatatypes'
print(challenge.expandtabs())   # 'Python  Datatypes'
print(challenge.expandtabs(10)) # 'Python    Datatypes'

Python  Datatypes
Python    Datatypes


In [197]:
# find(): Returns the index of the first occurrence of a substring, or -1 if it is not found
# Signature: str.find(sub[, start[, end]])
# rfind() takes the same arguments but returns the index of the last occurrence

challenge = 'Python Datatypes'
print(challenge.find('y'))  # 1
print(challenge.find('y',1))  # 1
print(challenge.find('y',2,13))  # 12
print(challenge.find('u')) # -1
print("--------------------------------")
print(challenge.rfind('y')) 
print(challenge.rfind('y',1))  
print(challenge.rfind('y',1,11))  
print(challenge.rfind('u')) 

1
1
12
-1
--------------------------------
12
12
1
-1
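
A related method, `index()`, appears in the exercises later on. As a brief sketch of how it differs from `find()`: both return the position of the first match, but `index()` raises `ValueError` instead of returning -1. The variable `position` is illustrative.

```python
challenge = 'Python Datatypes'

print(challenge.find('u'))    # -1, no exception

# index() behaves like find() but raises ValueError when nothing is found
try:
    position = challenge.index('u')
except ValueError:
    position = None
print(position)               # None

# For a simple yes/no check, the `in` operator is usually enough
print('Data' in challenge)    # True
```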


In [58]:
# format(): Formats a string into nicer output
first_name = 'Milaan'
last_name = 'Parmar'
job = 'Lecturer'
country = 'Finland'
sentence = 'I am {} {}. I am a {}. I live in {}.'.format(first_name, last_name, job, country)
print(sentence) # I am Milaan Parmar. I am a Lecturer. I live in Finland.

I am Milaan Parmar. I am a Lecturer. I live in Finland.


In [11]:
# isalpha(): Checks if all characters are alphabetic
challenge = 'PythonDatatypes'
print(challenge.isalpha()) # True
num = '123h'
print(num.isalpha())      # False

True
False


In [78]:
# isdigit(): Checks if all characters are digits

challenge = 'Ninety'
print(challenge.isdigit()) # False
challenge = '917044'
print(challenge.isdigit())   # True

False
True


In [64]:
# isdecimal(): Checks if all characters are decimal characters

num = '30'
print(num.isdecimal()) # True
num = '30.6'
print(num.isdecimal()) # False

True
False


In [82]:
# isidentifier(): Checks if a string is a valid identifier, i.e. a valid variable name

challenge = '2021PythonDatatypes'
print(challenge.isidentifier()) # False, because it starts with a number
challenge = 'Python_Datatypes'
print(challenge.isidentifier()) # True

False
True


In [66]:
# islower(): Checks if all alphabetic characters in a string are lowercase

challenge = 'python datatypes'
print(challenge.islower()) # True
challenge = 'Python datatypes'
print(challenge.islower()) # False

True
False


In [67]:
# isupper(): Checks if all alphabetic characters in a string are uppercase

challenge = 'python datatypes'
print(challenge.isupper()) #  False
challenge = 'PYTHON DATATYPES'
print(challenge.isupper()) # True

False
True


In [68]:
# isnumeric(): Checks if all characters are numeric

num = '90'
print(num.isnumeric())         # True
print('ninety'.isnumeric())    # False

True
False


In [69]:
# join(): Returns a concatenated string

web_tech = ['HTML', 'CSS', 'JavaScript', 'React']
result = '#, '.join(web_tech)
print(result) # 'HTML#, CSS#, JavaScript#, React'

HTML#, CSS#, JavaScript#, React


In [100]:
# strip(): Removes both leading and trailing characters

challenge = ' python datatypes '
print(challenge.strip()) 

python datatypes


In [71]:
# replace(): Replaces substring inside

challenge = 'python datatypes'
print(challenge.replace('datatypes', 'data-types')) # 'python data-types'

python data-types


In [72]:
# split(): Splits a string into a list, at whitespace by default

challenge = 'python datatypes'
print(challenge.split()) # ['python', 'datatypes']

['python', 'datatypes']


In [73]:
# title(): Returns a Title Cased String

challenge = 'python datatypes'
print(challenge.title()) # Python Datatypes

Python Datatypes


In [74]:
# swapcase(): Swaps the case of every character (upper becomes lower and vice versa)
  
challenge = 'python datatypes'
print(challenge.swapcase())   # PYTHON DATATYPES
challenge = 'Python Datatypes'
print(challenge.swapcase())  # pYTHON dATATYPES

PYTHON DATATYPES
pYTHON dATATYPES


In [75]:
# startswith(): Checks if String Starts with the Specified String

challenge = 'python datatypes'
print(challenge.startswith('python')) # True
challenge = '2 python datatypes'
print(challenge.startswith('two')) # False

True
False


#### Inspecting Strings

There are also lots of ways to inspect or check strings. Examples of a few of these are given here:

* Checking the start or end of a string: **`startswith("string")`** and **`endswith("string")`** checks if it starts/ends with the string given as argument

* Capitalisation: There are boolean counterparts for all forms of capitalisation, such as **`isupper()`**, **`islower()`** and **`istitle()`**

* Character type: does the string only contain the characters:
  * 0-9: **`isdecimal()`**. Note there is also **`isnumeric()`** and **`isdigit()`**, which are effectively the same function except for certain Unicode characters
  * a-zA-Z: **`isalpha()`** or combined with digits: **`isalnum()`**
  * non-control code: **`isprintable()`** accepts anything except '\n' and other ASCII control codes
  * \t\n \r (white space characters): **`isspace()`**
  * Suitable as variable name: **`isidentifier()`**
  
* Find elements of string: **`s.count(w)`** finds the number of times **`w`** occurs in **`s`**, while **`s.find(w)`** and **`s.rfind(w)`** find the first and last position of the string **`w`** in **`s`**.

In [76]:
# Example:

s="Hello World"
print("The length of '%s' is"%s,len(s),"characters") # len() gives length of the string

print(s.startswith("Hello") and s.endswith("World")) # check start/end

# count strings
print("There are %d 'l's but only %d World in %s" % (s.count('l'),s.count('World'),s))

print('"el" is at index',s.find('el'),"in",s) #index from 0 or -1

The length of 'Hello World' is 11 characters
True
There are 3 'l's but only 1 World in Hello World
"el" is at index 1 in Hello World

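The character-type checks listed in this section can be compared side by side. The sketch below uses three sample characters chosen for illustration (an ordinary digit, a superscript two and a vulgar fraction) to show where `isdecimal()`, `isdigit()` and `isnumeric()` disagree.

```python
samples = ['5', '²', '½']

# Columns: character, isdecimal(), isdigit(), isnumeric()
for ch in samples:
    print(ch, ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# 5 True True True
# ² False True True
# ½ False False True

print('abc123'.isalnum())        # True
print(' \t\n'.isspace())         # True
print('Hello World'.istitle())   # True
print('line\n'.isprintable())    # False, because of the newline
```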

## 💻 Exercises ➞ <span class='label label-default'>String</span>

1. Concatenate the string **`Python`**, **`4`**, **`Data`**, **`Science`** to a single string, **`Python 4 Data Science`**.

2. Declare a variable named **`course`** and assign it to an initial value **`Python 4 Data Science`**.

3. Print the length of the **`course`** string using the **`len()`** function and **`print()`**.

4. Change all the characters of the variable **`course`** to uppercase and lowercase letters using the **`upper()`** and **`lower()`** methods.

5. Use the **`capitalize()`**, **`title()`** and **`swapcase()`** methods to format the value of the string **`Python 4 Data Science`**.

6. Cut (slice) out the first word of **`Python 4 Data Science`**.

7. Check if the string **`Python 4 Data Science`** contains the word **`Python`** using the methods **`index()`**, **`find()`** or other methods.

8. Change **`Python 4 Data Science`** to **`Python 4 Everybody`** using the **`replace()`** method or other methods.

9. Split the string **`Python 4 Data Science`** using space as the separator (**`split()`**).

11. **`Google, Facebook, Microsoft, Apple, IBM, Oracle, Amazon`** split the string at the comma.

12. What is the character at index 9 in the string **`Python 4 Data Science`**.

13. What is the second last index of the string **`Python 4 Data Science`**.

14. Create an acronym or an abbreviation for the name **`Python 4 Data Science`**.

15. Use **`index()`** to determine the position of the first occurrence of **`D`** in **`Python 4 Data Science`**.

16. Use **`rfind()`** to determine the position of the last occurrence of **`e`** in **`Python 4 Data Science`**.

17. Use **`index()`** or **`find()`** to find the position of the first occurrence of the word **`because`** in the following sentence: 

    - **`We cannot end the sentence with ‘because’, because ‘because’ is a conjunction.`**.

18. Use **`rindex()`** to find the position of the last occurrence of the word **`because`** in the following sentence: 

    - **`We cannot end the sentence with ‘because’, because ‘because’ is a conjunction.`**.

19. Slice out the phrase **`‘because’, because ‘because’`** in the following sentence: 

    - **`We cannot end the sentence with ‘because’, because ‘because’ is a conjunction.`**.

20. Does **`Python 4 Data Science`** start with a substring **`Python`**?

21. Does **`Python 4 Data Science`** contain the substring **`Python`**?

22. **`             Python 4 DataScience                                  `** remove the leading and trailing spaces in the given string.

23. The following list contains the names of some Python libraries: **`['Django', 'Flask', 'Bottle', 'Pyramid', 'Falcon']`**. Join the list using a hash followed by a space (**`'# '`**) as the separator.

24. Which one of the following variable names returns True when we use the method **`isidentifier()`**?
    ```py
    2021PythonDataypes
    Python_Dataypes_2021
    ```
25. Make the following using string formatting methods:

```py
8 + 6 = 14
8 - 6 = 2
8 * 6 = 48
8 / 6 = 1.33
8 % 6 = 2
8 // 6 = 1
8 ** 6 = 262144
```

26. Use a **new line** and **tab** escape sequence to print the following lines.
    ```py
    Name      Age     Country   City
    Milaan    96      Finland   Tampere
    ```

In [43]:
age = int(input("Enter your age: "))

Enter your age:  2


In [45]:
print(age/4)

0.5


In [31]:
type(age)

int

In [47]:
det = input("Enter your first name, last name and birth year separated by #: ")

Enter your first name, last name and birth year separated by #:  Baraa#Sallout#1998


In [55]:
mylist = det.split("#")

In [65]:
fname = mylist[0]
lname = mylist[1]
birthday = mylist[2]
int(mylist[2])

1998

'Sallout'
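
The input-splitting steps above can be gathered into one small helper. This is only a sketch: `parse_details` is a hypothetical name, and a fixed string stands in for `input()` so that the example runs without typing anything.

```python
def parse_details(det):
    """Split 'first#last#year' into (first, last, year as int)."""
    # strip() removes stray spaces that often surround typed input
    fname, lname, year = [part.strip() for part in det.split("#")]
    return fname, lname, int(year)

# A fixed string stands in for input() so the example runs unattended
fname, lname, year = parse_details("Baraa#Sallout#1998")
print(fname, lname, year)  # Baraa Sallout 1998
```

Unpacking into three names at once also acts as a check: if the text does not contain exactly two `#` separators, Python raises a `ValueError` instead of silently continuing.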